Book Introduction

Hadoop应用架构 (Hadoop Application Architectures): PDF | EPUB | TXT | Kindle e-book editions, cloud-drive download

Hadoop应用架构 (Hadoop Application Architectures)
  • Authors: Mark Grover, Ted Malaska, Jonathan Seidman, et al.
  • Publisher: Southeast University Press, Nanjing
  • ISBN: 9787564170011
  • Publication year: 2017
  • Listed page count: 376
  • File size: 42 MB
  • File page count: 400
  • Subject: Data processing software (in English)

PDF Download


Click here for the online PDF download of this book [recommended: cloud decompression, quick and convenient]. Downloads the book directly in PDF format; works on both mobile and PC.
Torrent download [fast via BT]. Reminder: please use the BT client FDM for downloading (see the software download page). Direct-link download [convenient but slow]  [Read this book online]  [Get the decompression code online]

Download Instructions

Hadoop应用架构 (Hadoop Application Architectures), PDF e-book download

The downloaded file is a RAR archive; use decompression software to extract it and obtain the book in PDF format.

We recommend downloading with the BT client Free Download Manager, abbreviated FDM (free, ad-free, multi-platform). All resources on this site are packaged as BT torrents, so a dedicated BT client is required, such as BitComet, qBittorrent, or uTorrent. Because this site's resources are not yet widely seeded, Thunder (Xunlei) is currently not recommended; once a resource becomes popular, it can also be downloaded with Thunder.

(The file page count should be greater than the listed page count, except for multi-volume e-books.)

Note: all archives on this site require a decompression code. Click here to download an archive extraction tool.

Table of Contents

Part Ⅰ. Architectural Considerations for Hadoop Applications  1

1. Data Modeling in Hadoop  1
  Data Storage Options  2
  Standard File Formats  4
  Hadoop File Types  5
  Serialization Formats  7
  Columnar Formats  9
  Compression  12
  HDFS Schema Design  14
  Location of HDFS Files  16
  Advanced HDFS Schema Design  17
  HDFS Schema Design Summary  21
  HBase Schema Design  21
  Row Key  22
  Timestamp  25
  Hops  25
  Tables and Regions  26
  Using Columns  28
  Using Column Families  30
  Time-to-Live  30
  Managing Metadata  31
  What Is Metadata?  31
  Why Care About Metadata?  32
  Where to Store Metadata?  32
  Examples of Managing Metadata  34
  Limitations of the Hive Metastore and HCatalog  34
  Other Ways of Storing Metadata  35
  Conclusion  36

2. Data Movement  39
  Data Ingestion Considerations  39
  Timeliness of Data Ingestion  40
  Incremental Updates  42
  Access Patterns  43
  Original Source System and Data Structure  44
  Transformations  47
  Network Bottlenecks  48
  Network Security  49
  Push or Pull  49
  Failure Handling  50
  Level of Complexity  51
  Data Ingestion Options  51
  File Transfers  52
  Considerations for File Transfers versus Other Ingest Methods  55
  Sqoop: Batch Transfer Between Hadoop and Relational Databases  56
  Flume: Event-Based Data Collection and Processing  61
  Kafka  71
  Data Extraction  76
  Conclusion  77

3. Processing Data in Hadoop  79
  MapReduce  80
  MapReduce Overview  80
  Example for MapReduce  88
  When to Use MapReduce  94
  Spark  95
  Spark Overview  95
  Overview of Spark Components  96
  Basic Spark Concepts  97
  Benefits of Using Spark  100
  Spark Example  102
  When to Use Spark  104
  Abstractions  104
  Pig  106
  Pig Example  106
  When to Use Pig  109
  Crunch  110
  Crunch Example  110
  When to Use Crunch  115
  Cascading  115
  Cascading Example  116
  When to Use Cascading  119
  Hive  119
  Hive Overview  119
  Example of Hive Code  121
  When to Use Hive  125
  Impala  126
  Impala Overview  127
  Speed-Oriented Design  128
  Impala Example  130
  When to Use Impala  131
  Conclusion  132

4. Common Hadoop Processing Patterns  135
  Pattern: Removing Duplicate Records by Primary Key  135
  Data Generation for Deduplication Example  136
  Code Example: Spark Deduplication in Scala  137
  Code Example: Deduplication in SQL  139
  Pattern: Windowing Analysis  140
  Data Generation for Windowing Analysis Example  141
  Code Example: Peaks and Valleys in Spark  142
  Code Example: Peaks and Valleys in SQL  146
  Pattern: Time Series Modifications  147
  Use HBase and Versioning  148
  Use HBase with a RowKey of RecordKey and StartTime  149
  Use HDFS and Rewrite the Whole Table  149
  Use Partitions on HDFS for Current and Historical Records  150
  Data Generation for Time Series Example  150
  Code Example: Time Series in Spark  151
  Code Example: Time Series in SQL  154
  Conclusion  157

5. Graph Processing on Hadoop  159
  What Is a Graph?  159
  What Is Graph Processing?  161
  How Do You Process a Graph in a Distributed System?  162
  The Bulk Synchronous Parallel Model  163
  BSP by Example  163
  Giraph  165
  Read and Partition the Data  166
  Batch Process the Graph with BSP  168
  Write the Graph Back to Disk  172
  Putting It All Together  173
  When Should You Use Giraph?  174
  GraphX  174
  Just Another RDD  175
  GraphX Pregel Interface  177
  vprog()  178
  sendMessage()  179
  mergeMessage()  179
  Which Tool to Use?  180
  Conclusion  180

6. Orchestration  183
  Why We Need Workflow Orchestration  183
  The Limits of Scripting  184
  The Enterprise Job Scheduler and Hadoop  186
  Orchestration Frameworks in the Hadoop Ecosystem  186
  Oozie Terminology  188
  Oozie Overview  188
  Oozie Workflow  191
  Workflow Patterns  194
  Point-to-Point Workflow  194
  Fan-Out Workflow  196
  Capture-and-Decide Workflow  198
  Parameterizing Workflows  201
  Classpath Definition  203
  Scheduling Patterns  204
  Frequency Scheduling  205
  Time and Data Triggers  205
  Executing Workflows  210
  Conclusion  210

7. Near-Real-Time Processing with Hadoop  213
  Stream Processing  215
  Apache Storm  217
  Storm High-Level Architecture  218
  Storm Topologies  219
  Tuples and Streams  221
  Spouts and Bolts  221
  Stream Groupings  222
  Reliability of Storm Applications  223
  Exactly-Once Processing  223
  Fault Tolerance  224
  Integrating Storm with HDFS  225
  Integrating Storm with HBase  225
  Storm Example: Simple Moving Average  226
  Evaluating Storm  233
  Trident  233
  Trident Example: Simple Moving Average  234
  Evaluating Trident  237
  Spark Streaming  237
  Overview of Spark Streaming  238
  Spark Streaming Example: Simple Count  238
  Spark Streaming Example: Multiple Inputs  240
  Spark Streaming Example: Maintaining State  241
  Spark Streaming Example: Windowing  243
  Spark Streaming Example: Streaming versus ETL Code  244
  Evaluating Spark Streaming  245
  Flume Interceptors  246
  Which Tool to Use?  247
  Low-Latency Enrichment, Validation, Alerting, and Ingestion  247
  NRT Counting, Rolling Averages, and Iterative Processing  248
  Complex Data Pipelines  249
  Conclusion  250

Part Ⅱ. Case Studies  250

8. Clickstream Analysis  253
  Defining the Use Case  253
  Using Hadoop for Clickstream Analysis  255
  Design Overview  256
  Storage  257
  Ingestion  260
  The Client Tier  264
  The Collector Tier  266
  Processing  268
  Data Deduplication  270
  Sessionization  272
  Analyzing  275
  Orchestration  276
  Conclusion  279

9. Fraud Detection  281
  Continuous Improvement  281
  Taking Action  282
  Architectural Requirements of Fraud Detection Systems  283
  Introducing Our Use Case  283
  High-Level Design  284
  Client Architecture  286
  Profile Storage and Retrieval  287
  Caching  288
  HBase Data Definition  289
  Delivering Transaction Status: Approved or Denied?  294
  Ingest  295
  Path Between the Client and Flume  296
  Near-Real-Time and Exploratory Analytics  302
  Near-Real-Time Processing  302
  Exploratory Analytics  304
  What About Other Architectures?  305
  Flume Interceptors  305
  Kafka to Storm or Spark Streaming  306
  External Business Rules Engine  306
  Conclusion  307

10. Data Warehouse  309
  Using Hadoop for Data Warehousing  312
  Defining the Use Case  314
  OLTP Schema  316
  Data Warehouse: Introduction and Terminology  317
  Data Warehousing with Hadoop  319
  High-Level Design  319
  Data Modeling and Storage  320
  Ingestion  332
  Data Processing and Access  337
  Aggregations  341
  Data Export  343
  Orchestration  344
  Conclusion  345

A. Joins in Impala  347

Index  353
