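Book Information
Hadoop Application Architectures (Hadoop应用架构)
![Hadoop Application Architectures](https://www.shukui.net/cover/61/34512915.jpg)
- Authors: Mark Grover, Ted Malaska, Jonathan Seidman, et al.
- Publisher: Nanjing: Southeast University Press
- ISBN: 9787564170011
- Publication date: 2017
- Listed page count: 376 pages
- File size: 42 MB
- File page count: 400 pages
- Subject: Data processing software (English)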
Table of Contents
Part Ⅰ. Architectural Considerations for Hadoop Applications
1. Data Modeling in Hadoop
Data Storage Options
Standard File Formats
Hadoop File Types
Serialization Formats
Columnar Formats
Compression
HDFS Schema Design
Location of HDFS Files
Advanced HDFS Schema Design
HDFS Schema Design Summary
HBase Schema Design
Row Key
Timestamp
Hops
Tables and Regions
Using Columns
Using Column Families
Time-to-Live
Managing Metadata
What Is Metadata?
Why Care About Metadata?
Where to Store Metadata?
Examples of Managing Metadata
Limitations of the Hive Metastore and HCatalog
Other Ways of Storing Metadata
Conclusion
2. Data Movement
Data Ingestion Considerations
Timeliness of Data Ingestion
Incremental Updates
Access Patterns
Original Source System and Data Structure
Transformations
Network Bottlenecks
Network Security
Push or Pull
Failure Handling
Level of Complexity
Data Ingestion Options
File Transfers
Considerations for File Transfers versus Other Ingest Methods
Sqoop: Batch Transfer Between Hadoop and Relational Databases
Flume: Event-Based Data Collection and Processing
Kafka
Data Extraction
Conclusion
3. Processing Data in Hadoop
MapReduce
MapReduce Overview
Example for MapReduce
When to Use MapReduce
Spark
Spark Overview
Overview of Spark Components
Basic Spark Concepts
Benefits of Using Spark
Spark Example
When to Use Spark
Abstractions
Pig
Pig Example
When to Use Pig
Crunch
Crunch Example
When to Use Crunch
Cascading
Cascading Example
When to Use Cascading
Hive
Hive Overview
Example of Hive Code
When to Use Hive
Impala
Impala Overview
Speed-Oriented Design
Impala Example
When to Use Impala
Conclusion
4. Common Hadoop Processing Patterns
Pattern: Removing Duplicate Records by Primary Key
Data Generation for Deduplication Example
Code Example: Spark Deduplication in Scala
Code Example: Deduplication in SQL
Pattern: Windowing Analysis
Data Generation for Windowing Analysis Example
Code Example: Peaks and Valleys in Spark
Code Example: Peaks and Valleys in SQL
Pattern: Time Series Modifications
Use HBase and Versioning
Use HBase with a RowKey of RecordKey and StartTime
Use HDFS and Rewrite the Whole Table
Use Partitions on HDFS for Current and Historical Records
Data Generation for Time Series Example
Code Example: Time Series in Spark
Code Example: Time Series in SQL
Conclusion
5. Graph Processing on Hadoop
What Is a Graph?
What Is Graph Processing?
How Do You Process a Graph in a Distributed System?
The Bulk Synchronous Parallel Model
BSP by Example
Giraph
Read and Partition the Data
Batch Process the Graph with BSP
Write the Graph Back to Disk
Putting It All Together
When Should You Use Giraph?
GraphX
Just Another RDD
GraphX Pregel Interface
vprog()
sendMessage()
mergeMessage()
Which Tool to Use?
Conclusion
6. Orchestration
Why We Need Workflow Orchestration
The Limits of Scripting
The Enterprise Job Scheduler and Hadoop
Orchestration Frameworks in the Hadoop Ecosystem
Oozie Terminology
Oozie Overview
Oozie Workflow
Workflow Patterns
Point-to-Point Workflow
Fan-Out Workflow
Capture-and-Decide Workflow
Parameterizing Workflows
Classpath Definition
Scheduling Patterns
Frequency Scheduling
Time and Data Triggers
Executing Workflows
Conclusion
7. Near-Real-Time Processing with Hadoop
Stream Processing
Apache Storm
Storm High-Level Architecture
Storm Topologies
Tuples and Streams
Spouts and Bolts
Stream Groupings
Reliability of Storm Applications
Exactly-Once Processing
Fault Tolerance
Integrating Storm with HDFS
Integrating Storm with HBase
Storm Example: Simple Moving Average
Evaluating Storm
Trident
Trident Example: Simple Moving Average
Evaluating Trident
Spark Streaming
Overview of Spark Streaming
Spark Streaming Example: Simple Count
Spark Streaming Example: Multiple Inputs
Spark Streaming Example: Maintaining State
Spark Streaming Example: Windowing
Spark Streaming Example: Streaming versus ETL Code
Evaluating Spark Streaming
Flume Interceptors
Which Tool to Use?
Low-Latency Enrichment, Validation, Alerting, and Ingestion
NRT Counting, Rolling Averages, and Iterative Processing
Complex Data Pipelines
Conclusion
Part Ⅱ. Case Studies
8. Clickstream Analysis
Defining the Use Case
Using Hadoop for Clickstream Analysis
Design Overview
Storage
Ingestion
The Client Tier
The Collector Tier
Processing
Data Deduplication
Sessionization
Analyzing
Orchestration
Conclusion
9. Fraud Detection
Continuous Improvement
Taking Action
Architectural Requirements of Fraud Detection Systems
Introducing Our Use Case
High-Level Design
Client Architecture
Profile Storage and Retrieval
Caching
HBase Data Definition
Delivering Transaction Status: Approved or Denied?
Ingest
Path Between the Client and Flume
Near-Real-Time and Exploratory Analytics
Near-Real-Time Processing
Exploratory Analytics
What About Other Architectures?
Flume Interceptors
Kafka to Storm or Spark Streaming
External Business Rules Engine
Conclusion
10. Data Warehouse
Using Hadoop for Data Warehousing
Defining the Use Case
OLTP Schema
Data Warehouse: Introduction and Terminology
Data Warehousing with Hadoop
High-Level Design
Data Modeling and Storage
Ingestion
Data Processing and Access
Aggregations
Data Export
Orchestration
Conclusion
A. Joins in Impala
Index