Book Description

Numerical Linear Algebra for High-Performance Computers (English Reprint Edition): PDF e-book download

Numerical Linear Algebra for High-Performance Computers (English Reprint Edition)
  • Author: Dongarra et al. (USA)
  • Publisher: Tsinghua University Press, Beijing
  • ISBN: 9787302244998
  • Publication year: 2011
  • Listed page count: 345 pages
  • File size: 15 MB
  • File page count: 362 pages
  • Subject: Linear algebra, computational methods (English)

PDF Download

  • Online PDF download (recommended; cloud decompression, quick and convenient, works on both mobile and desktop)
  • Torrent download (fast over BT; please use a BT client such as Free Download Manager)
  • Direct-link download (convenient but slow)
  • Read the book online
  • Get the decompression code online

Download Notes

Numerical Linear Algebra for High-Performance Computers (English Reprint Edition), PDF e-book download.

The downloaded file is a RAR archive; use decompression software to extract the PDF.

We recommend downloading with Free Download Manager (FDM), a free, ad-free, cross-platform BT tool. All resources on this site are packaged as BitTorrent seeds, so a dedicated BT client is required; BitComet, qBittorrent, and uTorrent also work. Thunder (Xunlei) is not recommended for now, since this title is not a popular resource; once it becomes popular, Thunder will work as well.

(The file page count should be greater than the listed page count, except for e-books issued in multiple volumes.)

Note: all archives on this site require a decompression code; the decompression tool can be downloaded from the site.

Table of Contents

1 High-Performance Computing
1.1 Trends in Computer Design
1.2 Traditional Computers and Their Limitations
1.3 Parallelism within a Single Processor
1.3.1 Multiple Functional Units
1.3.2 Pipelining
1.3.3 Overlapping
1.3.4 RISC
1.3.5 VLIW
1.3.6 Vector Instructions
1.3.7 Chaining
1.3.8 Memory-to-Memory and Register-to-Register Organizations
1.3.9 Register Set
1.3.10 Stripmining
1.3.11 Reconfigurable Vector Registers
1.3.12 Memory Organization
1.4 Data Organization
1.4.1 Main Memory
1.4.2 Cache
1.4.3 Local Memory
1.5 Memory Management
1.6 Parallelism through Multiple Pipes or Multiple Processors
1.7 Message Passing
1.8 Virtual Shared Memory
1.8.1 Routing
1.9 Interconnection Topology
1.9.1 Crossbar Switch
1.9.2 Timeshared Bus
1.9.3 Ring Connection
1.9.4 Mesh Connection
1.9.5 Hypercube
1.9.6 Multi-staged Network
1.10 Programming Techniques
1.11 Trends: Network-Based Computing

2 Overview of Current High-Performance Computers
2.1 Supercomputers
2.2 RISC-Based Processors
2.3 Parallel Processors

3 Implementation Details and Overhead
3.1 Parallel Decomposition and Data Dependency Graphs
3.2 Synchronization
3.3 Load Balancing
3.4 Recurrence
3.5 Indirect Addressing
3.6 Message Passing
3.6.1 Performance Prediction
3.6.2 Message-Passing Standards
3.6.3 Routing

4 Performance: Analysis, Modeling, and Measurements
4.1 Amdahl's Law
4.1.1 Simple Case of Amdahl's Law
4.1.2 General Form of Amdahl's Law
4.2 Vector Speed and Vector Length
4.3 Amdahl's Law—Parallel Processing
4.3.1 A Simple Model
4.3.2 Gustafson's Model
4.4 Examples of (r∞, n1/2)-values for Various Computers
4.4.1 CRAY J90 and CRAY T90 (One Processor)
4.4.2 General Observations
4.5 LINPACK Benchmark
4.5.1 Description of the Benchmark
4.5.2 Calls to the BLAS
4.5.3 Asymptotic Performance

5 Building Blocks in Linear Algebra
5.1 Basic Linear Algebra Subprograms
5.1.1 Level 1 BLAS
5.1.2 Level 2 BLAS
5.1.3 Level 3 BLAS
5.2 Levels of Parallelism
5.2.1 Vector Computers
5.2.2 Parallel Processors with Shared Memory
5.2.3 Parallel-Vector Computers
5.2.4 Clusters Computing
5.3 Basic Factorizations of Linear Algebra
5.3.1 Point Algorithm: Gaussian Elimination with Partial Pivoting
5.3.2 Special Matrices
5.4 Blocked Algorithms: Matrix-Vector and Matrix-Matrix Versions
5.4.1 Right-Looking Algorithm
5.4.2 Left-Looking Algorithm
5.4.3 Crout Algorithm
5.4.4 Typical Performance of Blocked LU Decomposition
5.4.5 Blocked Symmetric Indefinite Factorization
5.4.6 Typical Performance of Blocked Symmetric Indefinite Factorization
5.5 Linear Least Squares
5.5.1 Householder Method
5.5.2 Blocked Householder Method
5.5.3 Typical Performance of the Blocked Householder Factorization
5.6 Organization of the Modules
5.6.1 Matrix-Vector Product
5.6.2 Matrix-Matrix Product
5.6.3 Typical Performance for Parallel Processing
5.6.4 Benefits
5.7 LAPACK
5.8 ScaLAPACK
5.8.1 The Basic Linear Algebra Communication Subprograms (BLACS)
5.8.2 PBLAS
5.8.3 ScaLAPACK Sample Code

6 Direct Solution of Sparse Linear Systems
6.1 Introduction to Direct Methods for Sparse Linear Systems
6.1.1 Four Approaches
6.1.2 Description of Sparse Data Structure
6.1.3 Manipulation of Sparse Data Structures
6.2 General Sparse Matrix Methods
6.2.1 Fill-in and Sparsity Ordering
6.2.2 Indirect Addressing—Its Effect and How to Avoid It
6.2.3 Comparison with Dense Codes
6.2.4 Other Approaches
6.3 Methods for Symmetric Matrices and Band Systems
6.3.1 The Clique Concept in Gaussian Elimination
6.3.2 Further Comments on Ordering Schemes
6.4 Frontal Methods
6.4.1 Frontal Methods—Link to Band Methods and Numerical Pivoting
6.4.2 Vector Performance
6.4.3 Parallel Implementation of Frontal Schemes
6.5 Multifrontal Methods
6.5.1 Performance on Vector Machines
6.5.2 Performance on RISC Machines
6.5.3 Performance on Parallel Machines
6.5.4 Exploitation of Structure
6.5.5 Unsymmetric Multifrontal Methods
6.6 Other Approaches for Exploitation of Parallelism
6.7 Software
6.8 Brief Summary

7 Krylov Subspaces: Projection
7.1 Notation
7.2 Basic Iteration Methods: Richardson Iteration, Power Method
7.3 Orthogonal Basis (Arnoldi, Lanczos)

8 Iterative Methods for Linear Systems
8.1 Krylov Subspace Solution Methods: Basic Principles
8.1.1 The Ritz-Galerkin Approach: FOM and CG
8.1.2 The Minimum Residual Approach: GMRES and MINRES
8.1.3 The Petrov-Galerkin Approach: Bi-CG and QMR
8.1.4 The Minimum Error Approach: SYMMLQ and GMERR
8.2 Iterative Methods in More Detail
8.2.1 The CG Method
8.2.2 Parallelism in the CG Method: General Aspects
8.2.3 Parallelism in the CG Method: Communication Overhead
8.2.4 MINRES
8.2.5 Least Squares CG
8.2.6 GMRES and GMRES(m)
8.2.7 GMRES with Variable Preconditioning
8.2.8 Bi-CG and QMR
8.2.9 CGS
8.2.10 Bi-CGSTAB
8.2.11 Bi-CGSTAB(ℓ) and Variants
8.3 Other Issues
8.4 How to Test Iterative Methods

9 Preconditioning and Parallel Preconditioning
9.1 Preconditioning and Parallel Preconditioning
9.2 The Purpose of Preconditioning
9.3 Incomplete LU Decompositions
9.3.1 Efficient Implementations of ILU(0) Preconditioning
9.3.2 General Incomplete Decompositions
9.3.3 Variants of ILU Preconditioners
9.3.4 Some General Comments on ILU
9.4 Some Other Forms of Preconditioning
9.4.1 Sparse Approximate Inverse (SPAI)
9.4.2 Polynomial Preconditioning
9.4.3 Preconditioning by Blocks or Domains
9.4.4 Element by Element Preconditioners
9.5 Vector and Parallel Implementation of Preconditioners
9.5.1 Partial Vectorization
9.5.2 Reordering the Unknowns
9.5.3 Changing the Order of Computation
9.5.4 Some Other Vectorizable Preconditioners
9.5.5 Parallel Aspects of Reorderings
9.5.6 Experiences with Parallelism

10 Linear Eigenvalue Problems Ax = λx
10.1 Theoretical Background and Notation
10.2 Single-Vector Methods
10.3 The QR Algorithm
10.4 Subspace Projection Methods
10.5 The Arnoldi Factorization
10.6 Restarting the Arnoldi Process
10.6.1 Explicit Restarting
10.7 Implicit Restarting
10.8 Lanczos' Method
10.9 Harmonic Ritz Values and Vectors
10.10 Other Subspace Iteration Methods
10.11 Davidson's Method
10.12 The Jacobi-Davidson Iteration Method
10.12.1 JDQR
10.13 Eigenvalue Software: ARPACK, P_ARPACK
10.13.1 Reverse Communication Interface
10.13.2 Parallelizing ARPACK
10.13.3 Data Distribution of the Arnoldi Factorization
10.14 Message Passing
10.15 Parallel Performance
10.16 Availability
10.17 Summary

11 The Generalized Eigenproblem
11.1 Arnoldi/Lanczos with Shift-Invert
11.2 Alternatives to Arnoldi/Lanczos with Shift-Invert
11.3 The Jacobi-Davidson QZ Algorithm
11.4 The Jacobi-Davidson QZ Method: Restart and Deflation
11.5 Parallel Aspects

A Acquiring Mathematical Software
A.1 netlib
A.1.1 Mathematical Software
A.2 Mathematical Software Libraries

B Glossary

C Level 1, 2, and 3 BLAS Quick Reference

D Operation Counts for Various BLAS and Decompositions

Bibliography

Index
