Introduction to High Performance Scientific Computing
Victor Eijkhout
Computers & Technology
Introduction to High Performance Scientific Computing
Free
Description
Contents
Reviews

The field of high performance scientific computing lies at the
crossroads of a number of disciplines and skill sets, and
correspondingly, for someone to be successful at using high
performance computing in science requires at least elementary
knowledge of and skills in all these areas. This book brings
together the strands of numerical modeling, numerical linear
algebra, computer architecture, parallel computing, performance
optimization in a unified manner.



The contents of this book are a combination of theoretical material
and self-guided tutorials on various practical skills. Together,
this teaches a graduate student or advanced undergraduate the
necessary skills to be a successful computational scientist.


Victor Eijkhout is a research scientist at the
Texas Advanced Computing Center of
The University of Texas at Austin.



This book is released under a CC-BY license, thanks to a gift from the Saylor Foundation. Print copies and course materials are available from the author's web page.

Language
English
ISBN
978-1-257-99254-6
I Theory
Single-processor Computing
The Von Neumann architecture
Modern processors
The processing cores
8-bit, 16-bit, 32-bit, 64-bit
Caches: on-chip memory
Graphics, controllers, special purpose hardware
Superscalar processing and instruction-level parallelism
Memory Hierarchies
Busses
Latency and Bandwidth
Registers
Caches
Prefetch streams
Concurrency and memory transfer
Memory banks
TLB and virtual memory
Multicore architectures
Cache coherence
Computations on multicore chips
Locality and data reuse
Data reuse and arithmetic intensity
Locality
Programming strategies for high performance
Peak performance
Pipelining
Cache size
Cache lines
TLB
Cache associativity
Loop tiling
Optimization strategies
Cache aware and cache oblivious programming
Case study: Matrix-vector product
Case study: Goto matrix-matrix product
Power consumption
Derivation of scaling properties
Multicore
Total computer power
Review questions
Parallel Computing
Introduction
Quantifying parallelism
Definitions
Asymptotics
Amdahl's law
Scalability
Simulation scaling
Parallel Computers Architectures
SIMD
MIMD / SPMD computers
The commoditization of supercomputers
Different types of memory access
Symmetric Multi-Processors: Uniform Memory Access
Non-Uniform Memory Access
Logically and physically distributed memory
Granularity of parallelism
Data parallelism
Instruction-level parallelism
Task-level parallelism
Conveniently parallel computing
Medium-grain data parallelism
Task granularity
Parallel programming
Thread parallelism
OpenMP
Distributed memory programming through message passing
Hybrid shared/distributed memory computing
Parallel languages
OS-based approaches
Active messages
Bulk synchronous parallelism
Program design for parallelism
Topologies
Some graph theory
Busses
Linear arrays and rings
2D and 3D arrays
Hypercubes
Switched networks
Cluster networks
Bandwidth and latency
Locality in parallel computing
Multi-threaded architectures
Co-processors
A little history
Bottlenecks
GPU computing
Intel Xeon Phi
Remaining topics
Load balancing
Distributed computing, grid computing, cloud computing
Usage scenarios
Characterization
Capability versus capacity computing
FPGA computing
MapReduce
The top500 list
The top500 list as a recent history of supercomputing
Heterogeneous computing
Computer Arithmetic
Integers
Real numbers
They're not really real numbers
Representation of real numbers
Limitations
Normalized and unnormalized numbers
Representation error
Machine precision
The IEEE 754 standard for floating point numbers
Round-off error analysis
Correct rounding
Addition
Multiplication
Subtraction
Examples
Roundoff error in parallel computations
Compilers and round-off
More about floating point arithmetic
Programming languages
Other computer arithmetic systems
Extended precision
Fixed-point arithmetic
Complex numbers
Conclusions
Numerical treatment of differential equations
Initial value problems
Error and stability
Finite difference approximation: Euler explicit and implicit methods
Boundary value problems
General PDE theory
The Poisson equation in one space dimension
The Poisson equation in two space dimensions
Difference stencils
Other discretization techniques
Initial boundary value problem
Discretization
Stability analysis
Numerical linear algebra
Elimination of unknowns
Linear algebra in computer arithmetic
Roundoff control during elimination
Influence of roundoff on eigenvalue computations
LU factorization
The algorithm
The Cholesky factorization
Uniqueness
Pivoting
Solving the system
Complexity
Block algorithms
Sparse matrices
Storage of sparse matrices
Sparse matrices and graph theory
LU factorizations of sparse matrices
Iterative methods
Abstract presentation
Convergence and error analysis
Computational form
Convergence of the method
Choice of K
Stopping tests
Theory of general iterative methods
Iterating by orthogonalization
Coupled recurrences form of iterative methods
The method of Conjugate Gradients
Derivation from minimization
GMRES
Complexity
Further Reading
High performance linear algebra
Collective operations
Broadcast
Reduction
Allreduce
Allgather
Reduce-scatter
The sparse matrix-vector product
Parallel dense matrix-vector product
Implementing the block-row case
Scalability of the dense matrix-vector product
LU factorization in parallel
Solving a triangular system
Factorization
Parallel sparse matrix-vector product
Parallel efficiency of the sparse matrix-vector product
Memory behaviour of the sparse matrix-vector product
The transpose product
Setup of the sparse matrix-vector product
Parallelism in solving linear systems from PDEs
Computational aspects of iterative methods
Vector operations
Finite element matrix construction
Parallel preconditioners
Jacobi preconditioning
The trouble with parallel ILU
Block Jacobi methods
Parallel ILU
Ordering strategies and parallelism
Nested dissection
Variable reordering and colouring: independent sets
Operator splitting
Parallelism and implicit operations
Wavefronts
Recursive doubling
Approximating implicit by explicit operations, series expansion
Grid updates
Analysis
Communication and work minimizing strategy
Block algorithms on multicore architectures
II Applications
Molecular dynamics
Force Computation
Force Fields
Computing Short-Range Nonbonded Forces
Computing Long-Range Forces
Parallel Decompositions
Atom Decompositions
Force Decompositions
Spatial Decompositions
Neutral Territory Methods
Parallel Fast Fourier Transform
Parallel 1-D FFT
Parallel 3-D FFT
Integration for Molecular Dynamics
Sorting
Brief introduction to sorting
Quicksort
Quicksort in shared memory
Quicksort on a hypercube
Quicksort on a general parallel processor
Bitonic sort
Graph analytics
Traditional graph algorithms
Shortest path algorithms
Parallelization
Floyd-Warshall all-pairs shortest path
Spanning trees
`Real world' graphs
Properties of random graphs
Concepts
Hypertext algorithms
HITS
PageRank
Large-scale computational graph theory
N-body problems
The Barnes-Hut algorithm
The Fast Multipole Method
Full computation
Implementation
Vectorization
Shared memory implementation
Distributed memory implementation
Monte Carlo Methods
Parallel Random Number Generation
Examples
Integration by statistics
Monte Carlo simulation of the Ising model
Computational biology
Dynamic programming approaches
Suffix tree
Big data
Recommender systems
Stochastic gradient descent
Alternating least squares
Other physics applications
Lattice Boltzmann methods
Hartree-Fock / Density Functional Theory
III Appendices
Linear algebra
Norms
Gram-Schmidt orthogonalization
The power method
Nonnegative matrices; Perron vectors
The Gershgorin theorem
Householder reflectors
Complexity
Partial Differential Equations
Partial derivatives
Poisson or Laplace Equation
Heat Equation
Steady state
Taylor series
Graph theory
Definitions
Common types of graphs
Graph colouring and independent sets
Graphs and matrices
Spectral graph theory
Fourier Transforms
Automata theory
Finite State Automatons (FSAs)
General discussion
Parallel Prefix
IV Tutorials
Unix intro
Files and such
Text searching and regular expressions
Command execution
Scripting
Expansion
Shell interaction
The system and other users
The sed and awk tools
Review questions
Compilers and libraries
An introduction to binary files
Simple compilation
Libraries
Managing projects with Make
A simple example
Makefile power tools
Miscellania
Shell scripting in a Makefile
A Makefile for LaTeX
Source code control
Workflow in source code control systems
Subversion or SVN
Mercurial or hg
Scientific Data Storage
Introduction to HDF5
Creating a file
Datasets
Writing the data
Reading
Scientific Libraries
The Portable Extendable Toolkit for Scientific Computing
Libraries for dense linear algebra: Lapack and Scalapack
Plotting with GNUplot
Usage modes
Plotting
Workflow
Good coding practices
Defensive programming
Guarding against memory errors
Testing
Debugging
Invoking gdb
Finding errors
Memory debugging with Valgrind
Stepping through a program
Inspecting values
Breakpoints
Further reading
Performance measurement
Timers
Accurate counters
Profiling tools
Tracing
C/Fortran interoperability
Linker conventions
Arrays
Strings
Subprogram arguments
Input/output
Fortran/C interoperability in Fortran2003
LaTeX for scientific documentation
The idea behind LaTeX, some history
A gentle introduction to LaTeX
A worked out example
Where to take it from here
Review questions
V Projects, codes, index
Class projects
Cache simulation and analysis
Evaluation of Bulk Synchronous Programming
Heat equation
The memory wall
Codes
Hardware event counting
Test setup
Cache size
Cachelines
Cache associativity
TLB
Unrepresentible numbers
Index and list of acronyms
The book hasn't received reviews yet.