Book Introduction
An Introduction to Parallel Programming = 并行程序设计导论 (English Edition) PDF | Epub | txt | Kindle e-book edition download
- Author:
- Publisher:
- ISBN:
- Publication date: unknown
- Listed page count: 0 pages
- File size: 53 MB
- File page count: 389 pages
- Subject terms:
PDF Download
Download Instructions
An Introduction to Parallel Programming = 并行程序设计导论 (English Edition) PDF e-book download
The downloaded file is a RAR archive; use decompression software to extract the PDF. We recommend downloading with Free Download Manager (FDM), a free, ad-free, multi-platform download tool. All resources on this site are packaged as BitTorrent seeds, so a dedicated BT client such as BitComet, qBittorrent, or uTorrent is required. Because this title is not currently a popular resource on this site, Xunlei (Thunder) is not recommended; once the resource becomes popular, Xunlei can also be used.
(The file page count should be greater than the listed page count, except for multi-volume e-books.)
Note: All archives on this site have a decompression password. Click to download the archive extraction tool.
Table of Contents
CHAPTER 1 Why Parallel Computing? 1
1.1 Why We Need Ever-Increasing Performance 2
1.2 Why We’re Building Parallel Systems 3
1.3 Why We Need to Write Parallel Programs 3
1.4 How Do We Write Parallel Programs? 6
1.5 What We’ll Be Doing 8
1.6 Concurrent, Parallel, Distributed 9
1.7 The Rest of the Book 10
1.8 A Word of Warning 10
1.9 Typographical Conventions 11
1.10 Summary 12
1.11 Exercises 12
CHAPTER 2 Parallel Hardware and Parallel Software 15
2.1 Some Background 15
2.1.1 The von Neumann architecture 15
2.1.2 Processes, multitasking, and threads 17
2.2 Modifications to the von Neumann Model 18
2.2.1 The basics of caching 19
2.2.2 Cache mappings 20
2.2.3 Caches and programs: an example 22
2.2.4 Virtual memory 23
2.2.5 Instruction-level parallelism 25
2.2.6 Hardware multithreading 28
2.3 Parallel Hardware 29
2.3.1 SIMD systems 29
2.3.2 MIMD systems 32
2.3.3 Interconnection networks 35
2.3.4 Cache coherence 43
2.3.5 Shared-memory versus distributed-memory 46
2.4 Parallel Software 47
2.4.1 Caveats 47
2.4.2 Coordinating the processes/threads 48
2.4.3 Shared-memory 49
2.4.4 Distributed-memory 53
2.4.5 Programming hybrid systems 56
2.5 Input and Output 56
2.6 Performance 58
2.6.1 Speedup and efficiency 58
2.6.2 Amdahl’s law 61
2.6.3 Scalability 62
2.6.4 Taking timings 63
2.7 Parallel Program Design 65
2.7.1 An example 66
2.8 Writing and Running Parallel Programs 70
2.9 Assumptions 70
2.10 Summary 71
2.10.1 Serial systems 71
2.10.2 Parallel hardware 73
2.10.3 Parallel software 74
2.10.4 Input and output 75
2.10.5 Performance 75
2.10.6 Parallel program design 76
2.10.7 Assumptions 76
2.11 Exercises 77
CHAPTER 3 Distributed-Memory Programming with MPI 83
3.1 Getting Started 84
3.1.1 Compilation and execution 84
3.1.2 MPI programs 86
3.1.3 MPI_Init and MPI_Finalize 86
3.1.4 Communicators, MPI_Comm_size and MPI_Comm_rank 87
3.1.5 SPMD programs 88
3.1.6 Communication 88
3.1.7 MPI_Send 88
3.1.8 MPI_Recv 90
3.1.9 Message matching 91
3.1.10 The status_p argument 92
3.1.11 Semantics of MPI_Send and MPI_Recv 93
3.1.12 Some potential pitfalls 94
3.2 The Trapezoidal Rule in MPI 94
3.2.1 The trapezoidal rule 94
3.2.2 Parallelizing the trapezoidal rule 96
3.3 Dealing with I/O 97
3.3.1 Output 97
3.3.2 Input 100
3.4 Collective Communication 101
3.4.1 Tree-structured communication 102
3.4.2 MPI_Reduce 103
3.4.3 Collective vs. point-to-point communications 105
3.4.4 MPI_Allreduce 106
3.4.5 Broadcast 106
3.4.6 Data distributions 109
3.4.7 Scatter 110
3.4.8 Gather 112
3.4.9 Allgather 113
3.5 MPI Derived Datatypes 116
3.6 Performance Evaluation of MPI Programs 119
3.6.1 Taking timings 119
3.6.2 Results 122
3.6.3 Speedup and efficiency 125
3.6.4 Scalability 126
3.7 A Parallel Sorting Algorithm 127
3.7.1 Some simple serial sorting algorithms 127
3.7.2 Parallel odd-even transposition sort 129
3.7.3 Safety in MPI programs 132
3.7.4 Final details of parallel odd-even sort 134
3.8 Summary 136
3.9 Exercises 140
3.10 Programming Assignments 147
CHAPTER 4 Shared-Memory Programming with Pthreads 151
4.1 Processes, Threads, and Pthreads 151
4.2 Hello, World 153
4.2.1 Execution 153
4.2.2 Preliminaries 155
4.2.3 Starting the threads 156
4.2.4 Running the threads 157
4.2.5 Stopping the threads 158
4.2.6 Error checking 158
4.2.7 Other approaches to thread startup 159
4.3 Matrix-Vector Multiplication 159
4.4 Critical Sections 162
4.5 Busy-Waiting 165
4.6 Mutexes 168
4.7 Producer-Consumer Synchronization and Semaphores 171
4.8 Barriers and Condition Variables 176
4.8.1 Busy-waiting and a mutex 177
4.8.2 Semaphores 177
4.8.3 Condition variables 179
4.8.4 Pthreads barriers 181
4.9 Read-Write Locks 181
4.9.1 Linked list functions 181
4.9.2 A multi-threaded linked list 183
4.9.3 Pthreads read-write locks 187
4.9.4 Performance of the various implementations 188
4.9.5 Implementing read-write locks 190
4.10 Caches, Cache Coherence, and False Sharing 190
4.11 Thread-Safety 195
4.11.1 Incorrect programs can produce correct output 198
4.12 Summary 198
4.13 Exercises 200
4.14 Programming Assignments 206
CHAPTER 5 Shared-Memory Programming with OpenMP 209
5.1 Getting Started 210
5.1.1 Compiling and running OpenMP programs 211
5.1.2 The program 212
5.1.3 Error checking 215
5.2 The Trapezoidal Rule 216
5.2.1 A first OpenMP version 216
5.3 Scope of Variables 220
5.4 The Reduction Clause 221
5.5 The parallel for Directive 224
5.5.1 Caveats 225
5.5.2 Data dependences 227
5.5.3 Finding loop-carried dependences 228
5.5.4 Estimating π 229
5.5.5 More on scope 231
5.6 More About Loops in OpenMP: Sorting 232
5.6.1 Bubble sort 232
5.6.2 Odd-even transposition sort 233
5.7 Scheduling Loops 236
5.7.1 The schedule clause 237
5.7.2 The static schedule type 238
5.7.3 The dynamic and guided schedule types 239
5.7.4 The runtime schedule type 239
5.7.5 Which schedule? 241
5.8 Producers and Consumers 241
5.8.1 Queues 241
5.8.2 Message-passing 242
5.8.3 Sending messages 243
5.8.4 Receiving messages 243
5.8.5 Termination detection 244
5.8.6 Startup 244
5.8.7 The atomic directive 245
5.8.8 Critical sections and locks 246
5.8.9 Using locks in the message-passing program 248
5.8.10 critical directives, atomic directives, or locks? 249
5.8.11 Some caveats 249
5.9 Caches, Cache Coherence, and False Sharing 251
5.10 Thread-Safety 256
5.10.1 Incorrect programs can produce correct output 258
5.11 Summary 259
5.12 Exercises 263
5.13 Programming Assignments 267
CHAPTER 6 Parallel Program Development 271
6.1 Two n-Body Solvers 271
6.1.1 The problem 271
6.1.2 Two serial programs 273
6.1.3 Parallelizing the n-body solvers 277
6.1.4 A word about I/O 280
6.1.5 Parallelizing the basic solver using OpenMP 281
6.1.6 Parallelizing the reduced solver using OpenMP 284
6.1.7 Evaluating the OpenMP codes 288
6.1.8 Parallelizing the solvers using pthreads 289
6.1.9 Parallelizing the basic solver using MPI 290
6.1.10 Parallelizing the reduced solver using MPI 292
6.1.11 Performance of the MPI solvers 297
6.2 Tree Search 299
6.2.1 Recursive depth-first search 302
6.2.2 Nonrecursive depth-first search 303
6.2.3 Data structures for the serial implementations 305
6.2.4 Performance of the serial implementations 306
6.2.5 Parallelizing tree search 306
6.2.6 A static parallelization of tree search using pthreads 309
6.2.7 A dynamic parallelization of tree search using pthreads 310
6.2.8 Evaluating the pthreads tree-search programs 315
6.2.9 Parallelizing the tree-search programs using OpenMP 316
6.2.10 Performance of the OpenMP implementations 318
6.2.11 Implementation of tree search using MPI and static partitioning 319
6.2.12 Implementation of tree search using MPI and dynamic partitioning 327
6.3 A Word of Caution 335
6.4 Which API? 335
6.5 Summary 336
6.5.1 Pthreads and OpenMP 337
6.5.2 MPI 338
6.6 Exercises 341
6.7 Programming Assignments 350
CHAPTER 7 Where to Go from Here 353
References 357
Index 361