Introduction to High Performance Scientific Computing
Free

Introduction to High Performance Scientific Computing

By Victor Eijkhout
Free
Book Description

The field of high performance scientific computing lies at the
crossroads of a number of disciplines and skill sets, and
correspondingly, for someone to be successful at using high
performance computing in science requires at least elementary
knowledge of and skills in all these areas. This book brings
together the strands of numerical modeling, numerical linear
algebra, computer architecture, parallel computing, performance
optimization in a unified manner.

The contents of this book are a combination of theoretical material
and self-guided tutorials on various practical skills. Together,
this teaches a graduate student or advanced undergraduate the
necessary skills to be a successful computational scientist.

Victor Eijkhout is a research scientist at the
Texas Advanced Computing Center of
The University of Texas at Austin.

This book is released under a CC-BY license, thanks to a gift from the Saylor Foundation. Print copies and course materials are available from the author's web page.

Table of Contents
  • I Theory
    • Single-processor Computing
      • The Von Neumann architecture
      • Modern processors
        • The processing cores
        • 8-bit, 16-bit, 32-bit, 64-bit
        • Caches: on-chip memory
        • Graphics, controllers, special purpose hardware
        • Superscalar processing and instruction-level parallelism
      • Memory Hierarchies
        • Busses
        • Latency and Bandwidth
        • Registers
        • Caches
        • Prefetch streams
        • Concurrency and memory transfer
        • Memory banks
        • TLB and virtual memory
      • Multicore architectures
        • Cache coherence
        • Computations on multicore chips
      • Locality and data reuse
        • Data reuse and arithmetic intensity
        • Locality
      • Programming strategies for high performance
        • Peak performance
        • Pipelining
        • Cache size
        • Cache lines
        • TLB
        • Cache associativity
        • Loop tiling
        • Optimization strategies
        • Cache aware and cache oblivious programming
        • Case study: Matrix-vector product
        • Case study: Goto matrix-matrix product
      • Power consumption
        • Derivation of scaling properties
        • Multicore
        • Total computer power
      • Review questions
    • Parallel Computing
      • Introduction
      • Quantifying parallelism
        • Definitions
        • Asymptotics
        • Amdahl's law
        • Scalability
        • Simulation scaling
      • Parallel Computers Architectures
        • SIMD
        • MIMD / SPMD computers
        • The commoditization of supercomputers
      • Different types of memory access
        • Symmetric Multi-Processors: Uniform Memory Access
        • Non-Uniform Memory Access
        • Logically and physically distributed memory
      • Granularity of parallelism
        • Data parallelism
        • Instruction-level parallelism
        • Task-level parallelism
        • Conveniently parallel computing
        • Medium-grain data parallelism
        • Task granularity
      • Parallel programming
        • Thread parallelism
        • OpenMP
        • Distributed memory programming through message passing
        • Hybrid shared/distributed memory computing
        • Parallel languages
        • OS-based approaches
        • Active messages
        • Bulk synchronous parallelism
        • Program design for parallelism
      • Topologies
        • Some graph theory
        • Busses
        • Linear arrays and rings
        • 2D and 3D arrays
        • Hypercubes
        • Switched networks
        • Cluster networks
        • Bandwidth and latency
        • Locality in parallel computing
      • Multi-threaded architectures
      • Co-processors
        • A little history
        • Bottlenecks
        • GPU computing
        • Intel Xeon Phi
      • Remaining topics
        • Load balancing
        • Distributed computing, grid computing, cloud computing
        • Usage scenarios
        • Characterization
      • Capability versus capacity computing
      • FPGA computing
      • MapReduce
      • The top500 list
        • The top500 list as a recent history of supercomputing
      • Heterogeneous computing
    • Computer Arithmetic
      • Integers
      • Real numbers
        • They're not really real numbers
        • Representation of real numbers
        • Limitations
        • Normalized and unnormalized numbers
        • Representation error
        • Machine precision
        • The IEEE 754 standard for floating point numbers
      • Round-off error analysis
        • Correct rounding
        • Addition
        • Multiplication
        • Subtraction
        • Examples
        • Roundoff error in parallel computations
      • Compilers and round-off
      • More about floating point arithmetic
        • Programming languages
        • Other computer arithmetic systems
        • Extended precision
        • Fixed-point arithmetic
        • Complex numbers
      • Conclusions
    • Numerical treatment of differential equations
      • Initial value problems
        • Error and stability
        • Finite difference approximation: Euler explicit and implicit methods
      • Boundary value problems
        • General PDE theory
        • The Poisson equation in one space dimension
        • The Poisson equation in two space dimensions
        • Difference stencils
        • Other discretization techniques
      • Initial boundary value problem
        • Discretization
        • Stability analysis
    • Numerical linear algebra
      • Elimination of unknowns
      • Linear algebra in computer arithmetic
        • Roundoff control during elimination
        • Influence of roundoff on eigenvalue computations
      • LU factorization
        • The algorithm
        • The Cholesky factorization
        • Uniqueness
        • Pivoting
        • Solving the system
        • Complexity
        • Block algorithms
      • Sparse matrices
        • Storage of sparse matrices
        • Sparse matrices and graph theory
        • LU factorizations of sparse matrices
      • Iterative methods
        • Abstract presentation
        • Convergence and error analysis
        • Computational form
        • Convergence of the method
        • Choice of K
        • Stopping tests
        • Theory of general iterative methods
        • Iterating by orthogonalization
        • Coupled recurrences form of iterative methods
        • The method of Conjugate Gradients
        • Derivation from minimization
        • GMRES
        • Complexity
      • Further Reading
    • High performance linear algebra
      • Collective operations
        • Broadcast
        • Reduction
        • Allreduce
        • Allgather
        • Reduce-scatter
      • The sparse matrix-vector product
      • Parallel dense matrix-vector product
        • Implementing the block-row case
        • Scalability of the dense matrix-vector product
      • LU factorization in parallel
        • Solving a triangular system
        • Factorization
      • Parallel sparse matrix-vector product
        • Parallel efficiency of the sparse matrix-vector product
        • Memory behaviour of the sparse matrix-vector product
        • The transpose product
        • Setup of the sparse matrix-vector product
      • Parallelism in solving linear systems from PDEs
      • Computational aspects of iterative methods
        • Vector operations
        • Finite element matrix construction
      • Parallel preconditioners
        • Jacobi preconditioning
        • The trouble with parallel ILU
        • Block Jacobi methods
        • Parallel ILU
      • Ordering strategies and parallelism
        • Nested dissection
        • Variable reordering and colouring: independent sets
      • Operator splitting
      • Parallelism and implicit operations
        • Wavefronts
        • Recursive doubling
        • Approximating implicit by explicit operations, series expansion
      • Grid updates
        • Analysis
        • Communication and work minimizing strategy
      • Block algorithms on multicore architectures
  • II Applications
    • Molecular dynamics
      • Force Computation
        • Force Fields
        • Computing Short-Range Nonbonded Forces
        • Computing Long-Range Forces
      • Parallel Decompositions
        • Atom Decompositions
        • Force Decompositions
        • Spatial Decompositions
        • Neutral Territory Methods
      • Parallel Fast Fourier Transform
        • Parallel 1-D FFT
        • Parallel 3-D FFT
      • Integration for Molecular Dynamics
    • Sorting
      • Brief introduction to sorting
      • Quicksort
        • Quicksort in shared memory
        • Quicksort on a hypercube
        • Quicksort on a general parallel processor
      • Bitonic sort
    • Graph analytics
      • Traditional graph algorithms
        • Shortest path algorithms
        • Parallelization
        • Floyd-Warshall all-pairs shortest path
        • Spanning trees
      • `Real world' graphs
        • Properties of random graphs
        • Concepts
      • Hypertext algorithms
        • HITS
        • PageRank
      • Large-scale computational graph theory
    • N-body problems
      • The Barnes-Hut algorithm
      • The Fast Multipole Method
      • Full computation
      • Implementation
        • Vectorization
        • Shared memory implementation
        • Distributed memory implementation
    • Monte Carlo Methods
      • Parallel Random Number Generation
      • Examples
        • Integration by statistics
        • Monte Carlo simulation of the Ising model
    • Computational biology
      • Dynamic programming approaches
      • Suffix tree
    • Big data
      • Recommender systems
        • Stochastic gradient descent
        • Alternating least squares
    • Other physics applications
      • Lattice Boltzmann methods
      • Hartree-Fock / Density Functional Theory
  • III Appendices
    • Linear algebra
      • Norms
      • Gram-Schmidt orthogonalization
      • The power method
      • Nonnegative matrices; Perron vectors
      • The Gershgorin theorem
      • Householder reflectors
    • Complexity
    • Partial Differential Equations
      • Partial derivatives
      • Poisson or Laplace Equation
      • Heat Equation
      • Steady state
    • Taylor series
    • Graph theory
      • Definitions
      • Common types of graphs
      • Graph colouring and independent sets
      • Graphs and matrices
      • Spectral graph theory
    • Fourier Transforms
    • Automata theory
      • Finite State Automatons (FSAs)
      • General discussion
    • Parallel Prefix
  • IV Tutorials
    • Unix intro
      • Files and such
      • Text searching and regular expressions
      • Command execution
      • Scripting
      • Expansion
      • Shell interaction
      • The system and other users
      • The sed and awk tools
      • Review questions
    • Compilers and libraries
      • An introduction to binary files
      • Simple compilation
      • Libraries
    • Managing projects with Make
      • A simple example
      • Makefile power tools
      • Miscellania
      • Shell scripting in a Makefile
      • A Makefile for LaTeX
    • Source code control
      • Workflow in source code control systems
      • Subversion or SVN
      • Mercurial or hg
    • Scientific Data Storage
      • Introduction to HDF5
      • Creating a file
      • Datasets
      • Writing the data
      • Reading
    • Scientific Libraries
      • The Portable Extendable Toolkit for Scientific Computing
      • Libraries for dense linear algebra: Lapack and Scalapack
    • Plotting with GNUplot
      • Usage modes
      • Plotting
      • Workflow
    • Good coding practices
      • Defensive programming
      • Guarding against memory errors
      • Testing
    • Debugging
      • Invoking gdb
      • Finding errors
      • Memory debugging with Valgrind
      • Stepping through a program
      • Inspecting values
      • Breakpoints
      • Further reading
    • Performance measurement
      • Timers
      • Accurate counters
      • Profiling tools
      • Tracing
    • C/Fortran interoperability
      • Linker conventions
      • Arrays
      • Strings
      • Subprogram arguments
      • Input/output
      • Fortran/C interoperability in Fortran2003
    • LaTeX for scientific documentation
      • The idea behind LaTeX, some history
      • A gentle introduction to LaTeX
      • A worked out example
      • Where to take it from here
      • Review questions
  • V Projects, codes, index
    • Class projects
      • Cache simulation and analysis
      • Evaluation of Bulk Synchronous Programming
      • Heat equation
      • The memory wall
    • Codes
      • Hardware event counting
      • Test setup
      • Cache size
      • Cachelines
      • Cache associativity
      • TLB
      • Unrepresentible numbers
    • Index and list of acronyms
    No review for this book yet, be the first to review.
      No comment for this book yet, be the first to comment
      You May Also Like
      Also Available On
      App store smallGoogle play small
      Categories
      Curated Lists
      • Pattern Recognition and Machine Learning (Information Science and Statistics)
        by Christopher M. Bishop
        Data mining
        by I. H. Witten
        The Elements of Statistical Learning: Data Mining, Inference, and Prediction
        by Various
        See more...
      • CK-12 Chemistry
        by Various
        Concept Development Studies in Chemistry
        by John Hutchinson
        An Introduction to Chemistry - Atoms First
        by Mark Bishop
        See more...
      • Microsoft Word - How to Use Advanced Algebra II.doc
        by Jonathan Emmons
        Advanced Algebra II: Activities and Homework
        by Kenny Felder
        de2de
        by
        See more...
      • The Sun Who Lost His Way
        by
        Tania is a Detective
        by Kanika G
        Firenze_s-Light
        by
        See more...
      • Java 3D Programming
        by Daniel Selman
        The Java EE 6 Tutorial
        by Oracle Corporation
        JavaKid811
        by
        See more...