Book Introduction
Numerical Linear Algebra for High-Performance Computers (English Reprint Edition) PDF|Epub|txt|Kindle ebook download
![Numerical Linear Algebra for High-Performance Computers (English Reprint Edition)](https://www.shukui.net/cover/3/30749867.jpg)
- Authors: (US) Dongarra et al.
- Publisher: Tsinghua University Press, Beijing
- ISBN: 9787302244998
- Publication year: 2011
- Listed page count: 345
- File size: 15 MB
- File page count: 362
- Subject: Linear algebra, computational methods (English)
PDF Download
Download Instructions
Numerical Linear Algebra for High-Performance Computers (English Reprint Edition), PDF-format ebook download
The downloaded file is a RAR archive; use extraction software to unpack it and obtain the PDF (a short extraction sketch follows these notes). For downloading we recommend Free Download Manager (FDM): it is free, ad-free, cross-platform, and supports BitTorrent. All resources on this site are packaged as BitTorrent seeds, so a BT-capable client such as BitComet, qBittorrent, or uTorrent is required. Xunlei (Thunder) is currently not recommended, because this site's resources are not popular; once a resource becomes popular, Xunlei will work for downloading as well.
(The file page count should be greater than the listed page count, except for multi-volume ebooks.)
Note: all archives on this site require an extraction password. Click to download the archive extraction tool.
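Since every download unpacks the same way (a password-protected RAR containing the PDF), here is a minimal sketch of batch-extracting the archives. It assumes the third-party Python package `rarfile` (pip install rarfile) plus an `unrar` backend on the system; the folder layout and the `PASSWORD` value are placeholders only, since the real password comes from the extraction tool linked above.

```python
# Minimal sketch: batch-extract downloaded RAR archives to get the PDFs.
# Assumes the third-party `rarfile` package and an `unrar` backend are
# installed. PASSWORD is a placeholder, not the site's actual password.
from pathlib import Path

import rarfile

PASSWORD = "REPLACE_WITH_SITE_PASSWORD"  # placeholder

def extract_rars(folder: str, out_dir: str = "books") -> None:
    """Extract every .rar archive found in `folder` into `out_dir`."""
    Path(out_dir).mkdir(exist_ok=True)
    for archive in Path(folder).glob("*.rar"):
        with rarfile.RarFile(archive) as rf:
            rf.extractall(path=out_dir, pwd=PASSWORD)
        print(f"extracted {archive.name} -> {out_dir}/")

if __name__ == "__main__":
    extract_rars(".")  # look for archives in the current directory
```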
Table of Contents
1 High-Performance Computing
1.1 Trends in Computer Design
1.2 Traditional Computers and Their Limitations
1.3 Parallelism within a Single Processor
1.3.1 Multiple Functional Units
1.3.2 Pipelining
1.3.3 Overlapping
1.3.4 RISC
1.3.5 VLIW
1.3.6 Vector Instructions
1.3.7 Chaining
1.3.8 Memory-to-Memory and Register-to-Register Organizations
1.3.9 Register Set
1.3.10 Stripmining
1.3.11 Reconfigurable Vector Registers
1.3.12 Memory Organization
1.4 Data Organization
1.4.1 Main Memory
1.4.2 Cache
1.4.3 Local Memory
1.5 Memory Management
1.6 Parallelism through Multiple Pipes or Multiple Processors
1.7 Message Passing
1.8 Virtual Shared Memory
1.8.1 Routing
1.9 Interconnection Topology
1.9.1 Crossbar Switch
1.9.2 Timeshared Bus
1.9.3 Ring Connection
1.9.4 Mesh Connection
1.9.5 Hypercube
1.9.6 Multi-staged Network
1.10 Programming Techniques
1.11 Trends: Network-Based Computing
2 Overview of Current High-Performance Computers
2.1 Supercomputers
2.2 RISC-Based Processors
2.3 Parallel Processors
3 Implementation Details and Overhead
3.1 Parallel Decomposition and Data Dependency Graphs
3.2 Synchronization
3.3 Load Balancing
3.4 Recurrence
3.5 Indirect Addressing
3.6 Message Passing
3.6.1 Performance Prediction
3.6.2 Message-Passing Standards
3.6.3 Routing
4 Performance: Analysis, Modeling, and Measurements
4.1 Amdahl's Law
4.1.1 Simple Case of Amdahl's Law
4.1.2 General Form of Amdahl's Law
4.2 Vector Speed and Vector Length
4.3 Amdahl's Law: Parallel Processing
4.3.1 A Simple Model
4.3.2 Gustafson's Model
4.4 Examples of (r∞, n1/2)-values for Various Computers
4.4.1 CRAY J90 and CRAY T90 (One Processor)
4.4.2 General Observations
4.5 LINPACK Benchmark
4.5.1 Description of the Benchmark
4.5.2 Calls to the BLAS
4.5.3 Asymptotic Performance
5 Building Blocks in Linear Algebra
5.1 Basic Linear Algebra Subprograms
5.1.1 Level 1 BLAS
5.1.2 Level 2 BLAS
5.1.3 Level 3 BLAS
5.2 Levels of Parallelism
5.2.1 Vector Computers
5.2.2 Parallel Processors with Shared Memory
5.2.3 Parallel-Vector Computers
5.2.4 Cluster Computing
5.3 Basic Factorizations of Linear Algebra
5.3.1 Point Algorithm: Gaussian Elimination with Partial Pivoting
5.3.2 Special Matrices
5.4 Blocked Algorithms: Matrix-Vector and Matrix-Matrix Versions
5.4.1 Right-Looking Algorithm
5.4.2 Left-Looking Algorithm
5.4.3 Crout Algorithm
5.4.4 Typical Performance of Blocked LU Decomposition
5.4.5 Blocked Symmetric Indefinite Factorization
5.4.6 Typical Performance of Blocked Symmetric Indefinite Factorization
5.5 Linear Least Squares
5.5.1 Householder Method
5.5.2 Blocked Householder Method
5.5.3 Typical Performance of the Blocked Householder Factorization
5.6 Organization of the Modules
5.6.1 Matrix-Vector Product
5.6.2 Matrix-Matrix Product
5.6.3 Typical Performance for Parallel Processing
5.6.4 Benefits
5.7 LAPACK
5.8 ScaLAPACK
5.8.1 The Basic Linear Algebra Communication Subprograms (BLACS)
5.8.2 PBLAS
5.8.3 ScaLAPACK Sample Code
6 Direct Solution of Sparse Linear Systems
6.1 Introduction to Direct Methods for Sparse Linear Systems
6.1.1 Four Approaches
6.1.2 Description of Sparse Data Structure
6.1.3 Manipulation of Sparse Data Structures
6.2 General Sparse Matrix Methods
6.2.1 Fill-in and Sparsity Ordering
6.2.2 Indirect Addressing: Its Effect and How to Avoid It
6.2.3 Comparison with Dense Codes
6.2.4 Other Approaches
6.3 Methods for Symmetric Matrices and Band Systems
6.3.1 The Clique Concept in Gaussian Elimination
6.3.2 Further Comments on Ordering Schemes
6.4 Frontal Methods
6.4.1 Frontal Methods: Link to Band Methods and Numerical Pivoting
6.4.2 Vector Performance
6.4.3 Parallel Implementation of Frontal Schemes
6.5 Multifrontal Methods
6.5.1 Performance on Vector Machines
6.5.2 Performance on RISC Machines
6.5.3 Performance on Parallel Machines
6.5.4 Exploitation of Structure
6.5.5 Unsymmetric Multifrontal Methods
6.6 Other Approaches for Exploitation of Parallelism
6.7 Software
6.8 Brief Summary
7 Krylov Subspaces: Projection
7.1 Notation
7.2 Basic Iteration Methods: Richardson Iteration, Power Method
7.3 Orthogonal Basis (Arnoldi, Lanczos)
8 Iterative Methods for Linear Systems
8.1 Krylov Subspace Solution Methods: Basic Principles
8.1.1 The Ritz-Galerkin Approach: FOM and CG
8.1.2 The Minimum Residual Approach: GMRES and MINRES
8.1.3 The Petrov-Galerkin Approach: Bi-CG and QMR
8.1.4 The Minimum Error Approach: SYMMLQ and GMERR
8.2 Iterative Methods in More Detail
8.2.1 The CG Method
8.2.2 Parallelism in the CG Method: General Aspects
8.2.3 Parallelism in the CG Method: Communication Overhead
8.2.4 MINRES
8.2.5 Least Squares CG
8.2.6 GMRES and GMRES(m)
8.2.7 GMRES with Variable Preconditioning
8.2.8 Bi-CG and QMR
8.2.9 CGS
8.2.10 Bi-CGSTAB
8.2.11 Bi-CGSTAB(ℓ) and Variants
8.3 Other Issues
8.4 How to Test Iterative Methods
9 Preconditioning and Parallel Preconditioning
9.1 Preconditioning and Parallel Preconditioning
9.2 The Purpose of Preconditioning
9.3 Incomplete LU Decompositions
9.3.1 Efficient Implementations of ILU(0) Preconditioning
9.3.2 General Incomplete Decompositions
9.3.3 Variants of ILU Preconditioners
9.3.4 Some General Comments on ILU
9.4 Some Other Forms of Preconditioning
9.4.1 Sparse Approximate Inverse (SPAI)
9.4.2 Polynomial Preconditioning
9.4.3 Preconditioning by Blocks or Domains
9.4.4 Element by Element Preconditioners
9.5 Vector and Parallel Implementation of Preconditioners
9.5.1 Partial Vectorization
9.5.2 Reordering the Unknowns
9.5.3 Changing the Order of Computation
9.5.4 Some Other Vectorizable Preconditioners
9.5.5 Parallel Aspects of Reorderings
9.5.6 Experiences with Parallelism
10 Linear Eigenvalue Problems Ax=λx
10.1 Theoretical Background and Notation
10.2 Single-Vector Methods
10.3 The QR Algorithm
10.4 Subspace Projection Methods
10.5 The Arnoldi Factorization
10.6 Restarting the Arnoldi Process
10.6.1 Explicit Restarting
10.7 Implicit Restarting
10.8 Lanczos' Method
10.9 Harmonic Ritz Values and Vectors
10.10 Other Subspace Iteration Methods
10.11 Davidson's Method
10.12 The Jacobi-Davidson Iteration Method
10.12.1 JDQR
10.13 Eigenvalue Software: ARPACK, P_ARPACK
10.13.1 Reverse Communication Interface
10.13.2 Parallelizing ARPACK
10.13.3 Data Distribution of the Arnoldi Factorization
10.14 Message Passing
10.15 Parallel Performance
10.16 Availability
10.17 Summary
11 The Generalized Eigenproblem
11.1 Arnoldi/Lanczos with Shift-Invert
11.2 Alternatives to Arnoldi/Lanczos with Shift-Invert
11.3 The Jacobi-Davidson QZ Algorithm
11.4 The Jacobi-Davidson QZ Method: Restart and Deflation
11.5 Parallel Aspects
A Acquiring Mathematical Software
A.1 netlib
A.1.1 Mathematical Software
A.2 Mathematical Software Libraries
B Glossary
C Level 1, 2, and 3 BLAS Quick Reference
D Operation Counts for Various BLAS and Decompositions
Bibliography
Index