 Unknown
Computers & Technology
Bayesian Reasoning and Machine Learning
Free
Description
Contents
Reviews
Language
English
ISBN
Unknown
Front Matter
Notation List
Preface
BRML toolbox
Contents
I Inference in Probabilistic Models
Probabilistic Reasoning
Probability Refresher
Interpreting Conditional Probability
Probability Tables
Probabilistic Reasoning
Prior, Likelihood and Posterior
Two dice : what were the individual scores?
Summary
Code
Basic Probability code
General utilities
An example
Exercises
Basic Graph Concepts
Graphs
Numerically Encoding Graphs
Edge list
Clique matrix
Summary
Code
Utility routines
Exercises
Belief Networks
The Benefits of Structure
Modelling independencies
Reducing the burden of specification
Uncertain and Unreliable Evidence
Uncertain evidence
Unreliable evidence
Belief Networks
Conditional independence
The impact of collisions
Graphical path manipulations for independence
d-Separation
Graphical and distributional in/dependence
Markov equivalence in belief networks
Belief networks have limited expressibility
Causality
The do-calculus
Influence diagrams and the do-calculus
Summary
Code
Naive inference demo
Conditional independence demo
Utility routines
Exercises
Graphical Models
Graphical Models
Markov Networks
Markov properties
Markov random fields
Hammersley-Clifford Theorem
Conditional independence using Markov networks
Lattice Models
Chain Graphical Models
Factor Graphs
Conditional independence in factor graphs
Expressiveness of Graphical Models
Summary
Code
Exercises
Efficient Inference in Trees
Marginal Inference
Variable elimination in a Markov chain and message passing
The sum-product algorithm on factor graphs
Dealing with Evidence
Computing the marginal likelihood
The problem with loops
Other Forms of Inference
Max-Product
Finding the N most probable states
Most probable path and shortest path
Mixed inference
Inference in Multiply Connected Graphs
Bucket elimination
Loop-cut conditioning
Message Passing for Continuous Distributions
Summary
Code
Factor graph examples
Most probable and shortest path
Bucket elimination
Message passing on Gaussians
Exercises
The Junction Tree Algorithm
Clustering Variables
Reparameterisation
Clique Graphs
Absorption
Absorption schedule on clique trees
Junction Trees
The running intersection property
Constructing a Junction Tree for Singly-Connected Distributions
Moralisation
Forming the clique graph
Forming a junction tree from a clique graph
Assigning potentials to cliques
Junction Trees for Multiply-Connected Distributions
Triangulation algorithms
The Junction Tree Algorithm
Remarks on the JTA
Computing the normalisation constant of a distribution
The marginal likelihood
Some small JTA examples
Shafer-Shenoy propagation
Finding the Most Likely State
Reabsorption : Converting a Junction Tree to a Directed Network
The Need For Approximations
Bounded width junction trees
Summary
Code
Utility routines
Exercises
Making Decisions
Expected Utility
Utility of money
Decision Trees
Extending Bayesian Networks for Decisions
Syntax of influence diagrams
Solving Influence Diagrams
Messages on an ID
Using a junction tree
Markov Decision Processes
Maximising expected utility by message passing
Bellman's equation
Temporally Unbounded MDPs
Value iteration
Policy iteration
A curse of dimensionality
Variational Inference and Planning
Financial Matters
Options pricing and expected utility
Binomial options pricing model
Optimal investment
Further Topics
Partially observable MDPs
Reinforcement learning
Summary
Code
Sum/Max under a partial order
Junction trees for influence diagrams
Party-Friend example
Chest Clinic with Decisions
Markov decision processes
Exercises
II Learning in Probabilistic Models
Statistics for Machine Learning
Representing Data
Categorical
Ordinal
Numerical
Distributions
The Kullback-Leibler Divergence KL( q|p )
Entropy and information
Classical Distributions
Multivariate Gaussian
Completing the square
Conditioning as system reversal
Whitening and centering
Exponential Family
Conjugate priors
Learning distributions
Properties of Maximum Likelihood
Training assuming the correct model class
Training when the assumed model is incorrect
Maximum likelihood and the empirical distribution
Learning a Gaussian
Maximum likelihood training
Bayesian inference of the mean and variance
Gauss-Gamma distribution
Summary
Code
Exercises
Learning as Inference
Learning as Inference
Learning the bias of a coin
Making decisions
A continuum of parameters
Decisions based on continuous intervals
Bayesian methods and ML-II
Maximum Likelihood Training of Belief Networks
Bayesian Belief Network Training
Global and local parameter independence
Learning binary variable tables using a Beta prior
Learning multivariate discrete tables using a Dirichlet prior
Structure learning
PC algorithm
Empirical independence
Network scoring
Chow-Liu Trees
Maximum Likelihood for Undirected models
General tabular clique potentials
Decomposable Markov networks
Exponential form potentials
Conditional random fields
Pseudo likelihood
Learning the structure
Summary
Code
PC algorithm using an oracle
Demo of empirical conditional independence
Bayes Dirichlet structure learning
Exercises
Naive Bayes
Naive Bayes and Conditional Independence
Estimation using Maximum Likelihood
Binary attributes
Multi-state variables
Text classification
Bayesian Naive Bayes
Tree Augmented Naive Bayes
Learning tree augmented Naive Bayes networks
Summary
Code
Exercises
Learning with Hidden Variables
Hidden Variables and Missing Data
Why hidden/missing variables can complicate proceedings
The missing at random assumption
Maximum likelihood
Identifiability issues
Expectation Maximisation
Variational EM
Classical EM
Application to Belief networks
General case
Convergence
Application to Markov networks
Extensions of EM
Partial M step
Partial E-step
A failure case for EM
Variational Bayes
EM is a special case of variational Bayes
An example: VB for the Asbestos-Smoking-Cancer network
Optimising the Likelihood by Gradient Methods
Undirected models
Summary
Code
Exercises
Bayesian Model Selection
Comparing Models the Bayesian Way
Illustrations : coin tossing
A discrete parameter space
A continuous parameter space
Occam's Razor and Bayesian Complexity Penalisation
A continuous example : curve fitting
Approximating the Model Likelihood
Laplace's method
Bayes information criterion (BIC)
Bayesian Hypothesis Testing for Outcome Analysis
Outcome analysis
Hindep : model likelihood
Hsame : model likelihood
Dependent outcome analysis
Is classifier A better than B?
Summary
Code
Exercises
III Machine Learning
Machine Learning Concepts
Styles of Learning
Supervised learning
Unsupervised learning
Anomaly detection
Online (sequential) learning
Interacting with the environment
Semi-supervised learning
Supervised Learning
Utility and Loss
Using the empirical distribution
Bayesian decision approach
Bayes versus Empirical Decisions
Summary
Exercises
Nearest Neighbour Classification
K-Nearest Neighbours
A Probabilistic Interpretation of Nearest Neighbours
When your nearest neighbour is far away
Summary
Code
Exercises
Unsupervised Linear Dimension Reduction
High-Dimensional Spaces – Low Dimensional Manifolds
Principal Components Analysis
Deriving the optimal linear reconstruction
Maximum variance criterion
PCA algorithm
PCA and nearest neighbours classification
High Dimensional Data
Eigen-decomposition for N<D
PCA via Singular value decomposition
Latent Semantic Analysis
Information retrieval
PCA With Missing Data
Finding the principal directions
Collaborative filtering using PCA with missing data
Matrix Decomposition Methods
Probabilistic latent semantic analysis
Extensions and variations
Applications of PLSA/NMF
Kernel PCA
Canonical Correlation Analysis
SVD formulation
Summary
Code
Exercises
Supervised Linear Dimension Reduction
Supervised Linear Projections
Fisher's Linear Discriminant
Canonical Variates
Dealing with the nullspace
Summary
Code
Exercises
Linear Models
Introduction: Fitting A Straight Line
Linear Parameter Models for Regression
Vector outputs
Regularisation
The Dual Representation and Kernels
Regression in the dual-space
Linear Parameter Models for Classification
Logistic regression
Avoiding overconfident classification
Multiple classes
The Kernel Trick for Classification
Support Vector Machines
Maximum margin linear classifier
Using kernels
Performing the optimisation
Probabilistic interpretation
Soft Zero-One Loss for Outlier Robustness
Summary
Code
Exercises
Bayesian Linear Models
Bayesian linear parameter models
Determining hyperparameters: ML-II
Learning the hyperparameters using EM
Hyperparameter optimisation : using the gradient
Validation likelihood
Prediction and model averaging
Sparse linear models
Classification
Hyperparameter optimisation
Laplace approximation
Variational Gaussian approximation
Local variational approximation
Relevance vector machine for classification
Multi-class case
Summary
Code
Exercises
Gaussian Processes
Non-Parametric Prediction
From parametric to non-parametric
From Bayesian linear models to Gaussian processes
A prior on functions
Gaussian Process Prediction
Regression with noisy training outputs
Covariance Functions
Making new covariance functions from old
Stationary covariance functions
Non-stationary covariance functions
Analysis of Covariance Functions
Smoothness of the functions
Mercer kernels
Fourier analysis for stationary kernels
Gaussian Processes for Classification
Binary classification
Laplace's approximation
Hyperparameter optimisation
Multiple classes
Summary
Code
Exercises
Mixture Models
Density Estimation Using Mixtures
Expectation Maximisation for Mixture Models
Unconstrained discrete tables
Mixture of product of Bernoulli distributions
The Gaussian Mixture Model
EM algorithm
Practical issues
Classification using Gaussian mixture models
The Parzen estimator
K-Means
Bayesian mixture models
Semi-supervised learning
Mixture of Experts
Indicator Models
Joint indicator approach: factorised prior
Polya prior
Mixed Membership Models
Latent Dirichlet allocation
Graph based representations of data
Summary
Code
Exercises
Latent Linear Models
Factor Analysis
Finding the optimal bias
Factor Analysis : Maximum Likelihood
Eigen-approach likelihood optimisation
Expectation maximisation
Interlude: Modelling Faces
Probabilistic Principal Components Analysis
Canonical Correlation Analysis and Factor Analysis
Independent Components Analysis
Summary
Code
Exercises
Latent Ability Models
The Rasch Model
Maximum likelihood training
Bayesian Rasch models
Competition Models
Elo ranking model
Glicko and TrueSkill
Summary
Code
Exercises
IV Dynamical Models
Discrete-State Markov Models
Markov Models
Equilibrium and stationary distribution of a Markov chain
Fitting Markov models
Mixture of Markov models
Hidden Markov Models
The classical inference problems
Filtering p(ht|v1:t)
Parallel smoothing p(ht|v1:T)
Correction smoothing
Sampling from p(h1:T|v1:T)
Most likely joint state
Prediction
Self localisation and kidnapped robots
Natural language models
Learning HMMs
EM algorithm
Mixture emission
The HMM-GMM
Discriminative training
Related Models
Explicit duration model
Input-Output HMM
Linear chain CRFs
Dynamic Bayesian networks
Applications
Object tracking
Automatic speech recognition
Bioinformatics
Part-of-speech tagging
Summary
Code
Exercises
Continuous-state Markov Models
Observed Linear Dynamical Systems
Stationary distribution with noise
Auto-Regressive Models
Training an AR model
AR model as an OLDS
Time-varying AR model
Time-varying variance AR models
Latent Linear Dynamical Systems
Inference
Filtering
Smoothing : Rauch-Tung-Striebel correction method
The likelihood
Most likely state
Time independence and Riccati equations
Learning Linear Dynamical Systems
Identifiability issues
EM algorithm
Subspace Methods
Structured LDSs
Bayesian LDSs
Switching Auto-Regressive Models
Inference
Maximum likelihood learning using EM
Summary
Code
Autoregressive models
Exercises
Switching Linear Dynamical Systems
Introduction
The Switching LDS
Exact inference is computationally intractable
Gaussian Sum Filtering
Continuous filtering
Discrete filtering
The likelihood p(v1:T)
Collapsing Gaussians
Relation to other methods
Gaussian Sum Smoothing
Continuous smoothing
Discrete smoothing
Collapsing the mixture
Using mixtures in smoothing
Relation to other methods
Reset Models
A Poisson reset model
Reset-HMM-LDS
Summary
Code
Exercises
Distributed Computation
Introduction
Stochastic Hopfield Networks
Learning Sequences
A single sequence
Multiple sequences
Boolean networks
Sequence disambiguation
Tractable Continuous Latent Variable Models
Deterministic latent variables
An augmented Hopfield network
Neural Models
Stochastically spiking neurons
Hopfield membrane potential
Dynamic synapses
Leaky integrate and fire models
Summary
Code
Exercises
V Approximate Inference
Sampling
Introduction
Univariate sampling
Rejection sampling
Multivariate sampling
Ancestral Sampling
Dealing with evidence
Perfect sampling for a Markov network
Gibbs Sampling
Gibbs sampling as a Markov chain
Structured Gibbs sampling
Remarks
Markov Chain Monte Carlo (MCMC)
Markov chains
Metropolis-Hastings sampling
Auxiliary Variable Methods
Hybrid Monte Carlo
Swendson-Wang
Slice sampling
Importance Sampling
Sequential importance sampling
Particle filtering as an approximate forward pass
Summary
Code
Exercises
Deterministic Approximate Inference
Introduction
The Laplace approximation
Properties of Kullback-Leibler Variational Inference
Bounding the normalisation constant
Bounding the marginal likelihood
Bounding marginal quantities
Gaussian approximations using KL divergence
Marginal and moment matching properties of minimising KL( p|q )
Variational Bounding Using KL( q|p )
Pairwise Markov random field
General mean field equations
Asynchronous updating guarantees approximation improvement
Structured variational approximation
Local and KL Variational Approximations
Local approximation
KL variational approximation
Mutual Information Maximisation : A KL Variational Approach
The information maximisation algorithm
Linear Gaussian decoder
Loopy Belief Propagation
Classical BP on an undirected graph
Loopy BP as a variational procedure
Expectation Propagation
MAP for Markov networks
Pairwise Markov networks
Attractive binary Markov networks
Potts model
Summary
Code
Exercises
End Matter
VI Appendix
Background Mathematics
Linear Algebra
Vector algebra
The scalar product as a projection
Lines in space
Planes and hyperplanes
Matrices
Linear transformations
Determinants
Matrix inversion
Computing the matrix inverse
Eigenvalues and eigenvectors
Matrix decompositions
Multivariate Calculus
Higher derivatives
Matrix calculus
Inequalities
Convexity
Jensen's inequality
Optimisation
Multivariate Optimisation