Pragmatic Machine Learning with Python
US$ 19.95
The publisher has enabled DRM protection, which means that you need to use the BookFusion iOS, Android or Web app to read this eBook. This eBook cannot be used outside of the BookFusion platform.
Language: English
ISBN: 9789389845365
Cover Page
Title Page
Copyright Page
Dedication
About the Author
Acknowledgement
Preface
Errata
Table of Contents
1. Introduction to Machine Learning and Mathematical Preliminaries
Structure
Objective
Purpose of machine learning
What is a machine learning model?
What is a dataset?
What are the variables and features?
Predictor and target variables
Types of variables: Continuous and categorical
Lifecycle of a machine learning model
Pre-conditions of a successful ML project
Different types of the learning process
Supervised learning
Unsupervised learning
Parameter and hyperparameter
Machine learning models by objective
Predictive machine learning
Descriptive machine learning
Machine learning models by problem type
Classification model
Regression model
Clustering model
Dimensionality reduction model
Machine learning models by assumptions
Parametric model
Non-parametric model
Accuracy of the ML model
Training and testing dataset
Accuracy for classification
Accuracy for regression
Accuracy for clustering
Bias-Variance decomposition
Underfitting and overfitting
Mathematical concepts in machine learning
Definition of data point
Dataset as a vector space
Norm of a Vector
Euclidean distance
Similarity of vectors
Eigenvalues and Eigenvectors
Variable transformation and imputation
Scaling and normalization
Min-max scaling
Standard scaling
Categorical to continuous variable transformation
One-hot encoding
Continuous to categorical variable transformation
Imputation
Measures of variance
Coefficient of variance (CV)
Conclusion
2. Classification
Structure
Objective
Problem formulation
Binary and multi-class
Class boundary
Linear and non-linear class boundary
The general approach for solving classification
A brief introduction to scikit-learn
Training process – fit function
Testing/validation process – predict function
Concept of pipeline
Without pipeline
With Pipeline
The Bayesian approach of classification
Applying Bayes theorem in classification
Prior and posterior probability
Formulation
Naïve Bayes classifier
Conditional independence
Accuracy
Example using an abstract dataset
Training process
Validation with a data instance
Laplace estimation
Handling continuous attributes
Naïve Bayes Classifier using Python
Pre-processing
Training and testing set
Building the pipeline
Gaussian Naïve Bayes
Multinomial Naïve Bayes
Advantage of Naïve Bayes Classifier
The disadvantage of Naïve Bayes Classifier
Logistic Regression Classifier
Training process
Logistic Regression Classifier using Python
Pre-processing and building the pipeline
Overfitting and regularization
Regularization
Multi-Class Logistic Regression
Advantage of Logistic Regression Classifier
The disadvantage of Logistic Regression Classifier
Decision Tree Classifier
Anatomy of a Decision Tree
Handling continuous and categorical attributes
Categorical attribute
Continuous attribute
Measure and technique of splitting a node
Impurity and mathematical measures
ID3 algorithm of building Decision Tree
Python implementation of the Decision Tree
Pre-processing and building the pipeline
Visualization of the Decision Tree using Python
Advantage of Decision Tree Classifier
The disadvantage of Decision Tree Classifier
Class imbalance problem
Alternative metrics
Confusion Matrix
Ratio based metrics
Accuracy metric for multi-class and imbalanced dataset
Receiver Operating Characteristic Curve (ROC)
Python implementation of ROC generation
Mitigating class imbalance problem
Class weight adjustment approach
Sampling-based approach
Ensemble classification models
Bagging
RandomForest model
Python implementation of RandomForest
Multi-label classification models
Problem formulation
Problem decomposition and Umbrella classification scheme approach
Binary relevance scheme
Classifier chain scheme
Label powerset scheme
Comparison of each scheme
Accuracy metrics
Hamming loss metric
Python implementation of multi-label classifier
Conclusion
3. Regression
Structure
Objectives
Mathematical problem definition of Regression
Linear vs. non-linear relationships
Conversion between linear and non-linear relationships
Building a linear regression model
General approach to solving linear regression
Ordinary Least Squares (OLS)
Gradient Descent
Accuracy of linear regression (R² measure)
Selection of features in linear regression
Adjusted R²
Forward selection
Backward selection
Forward or backward: When to use what
Key points to remember in linear regression
Polynomial regression
Regularization
L1 regularization or Lasso
L2 regularization or Ridge
Parametric regression models to explain facts
Tree-based regression
Comparison of different regression techniques
Conclusion
4. Clustering
Structure
Objectives
Formal definition of clustering
Concept of cluster
Similarity metrics
Center-based clustering
K-means clustering
Basic K-means algorithm
Hyper-parameters
Python implementation of K-means clustering
Pre-processing
Pipeline creation
Clustering as new feature space
The sensitivity of KMeans with centroid initialization
Visualization of clusters
Accuracy metrics
Cohesion and separation of clusters
Silhouette coefficient
Python implementation of cluster metric
Advantages of KMeans
Disadvantages of KMeans
Determining optimal K in KMeans
Elbow method
XMeans clustering
Computation of Log-likelihood
Python implementation of XMeans
Density-based clustering
DBSCAN clustering
Python implementation of DBSCAN clustering
Visualization of clusters
Advantages of DBSCAN
Disadvantages of DBSCAN
Determining optimal parameters of DBSCAN
K-distance plot for determining Eps
Hierarchical clustering
Python implementation of Agglomerative clustering
Visualization of clusters
Visualization of hierarchical clusters with a dendrogram
Clustering to solve a classification problem
Computation of class probabilities
Classification process
Python implementation of the model
Visualization of clusters and classification accuracy
Key points to remember about clustering-based classification
Conclusion
5. Deep Learning
Structure
Objectives
What is deep learning?
Why is deep learning required?
Neural network
Anatomy of a neural network
Perceptron
Activation function
Sigmoid or logistic
Tanh
ReLU (Rectified Linear Unit)
Linear
Layers
Loss function
Mean Squared Error (MSE)
Cross-Entropy Loss
Optimizer
Stochastic Gradient Descent (SGD)
Mini-batch Gradient Descent
Building a neural network model
Training process of neural network
Forward propagation
Backward propagation
Stopping criteria
Applying neural network for classification and regression problem
Deciding the number of hidden layers and perceptrons
Classification problem
The heuristic approach of deciding the number of nodes in hidden layers
Regression problem
Conventions of building MLP (Multilayer Perceptron) model
Different types of neural network
Convolutional Neural Network (CNN)
Convolution operation
Significance of convolution in the neural network model
Anatomy of a CNN
Convolution layer
Max/average-pool layer
Feature engineering of images using CNN
A brief introduction to PyTorch for designing a neural network
CNN using PyTorch
Input and output channel
Auto-Encoder
Disadvantages of deep learning and neural network
Conclusion
6. Miscellaneous Unsupervised Learning
Structure
Objectives
Dimensionality reduction
Principal Component Analysis (PCA)
Computation of PC from Covariance and Eigenvectors
PCA and Co-relation coefficient
Classification using principal components
Regression using principal components
Key points to remember about PCA
Unsupervised outlier detection
Outlier detection using Auto-Encoder
Architecture of the Auto-Encoder for outlier detection
Metric to measure outlier factor
Training of Auto-Encoder
Testing the result
Key points to remember about Auto Encoder based outlier detection
Outlier detection using clustering
Center-based clustering algorithm for outlier detection
Density-based clustering algorithm for outlier detection
Testing the accuracy
Key points to remember about DBSCAN based outlier detection
Outlier detection using Isolation Forest
Isolation Tree
Isolation Forest
Accuracy
Key points to remember about Isolation Forest-based outlier detection
Conclusion
7. Text Mining
Structure
Objectives
Analyzing text
What are a corpus and document?
Pre-processing of text
Steps of cleaning text
Vector space models of text
TF-IDF model
Word2Vec model
Skip-Gram Word2Vec model
CBOW (Continuous Bag of Words) Word2Vec model
Comparison of Skip-Gram and CBOW
Doc2Vec model
Average of Word2Vec
Distributed Memory Model (PV-DM) of Doc2Vec
Distributed Bag of Words of Paragraph Vectors Model (PV-DBOW) of Doc2Vec
Comparison of different Doc2Vec models
Comparisons of different vector space models
Text classification techniques
Visualization techniques for text
Histogram
Word Cloud
Naïve Bayes Classifier for text
TF-IDF with Naïve Bayes classifier
Doc2Vec with Naïve Bayes classifier
Measuring text similarity
Text clustering
Conclusion
8. Machine Learning Models in Production
Structure
Objectives
Challenges of putting a model into production
Exposing model as a service
Save and load a model
A brief introduction to Flask
Exposing model as Flask REST API
Adding scalability support
Scalability for storage
Scalability for computing
Apache Spark-MLlib and pipeline
Building platform-independent model descriptor
Predictive Model Markup Language (PMML)
Elements/tags of a PMML document
Generation of PMML document from a model
Installation of required libraries
scikit-learn pipeline to PMML conversion
How to use PMML document
Python client for PMML model
Java client for PMML model
Building overall architecture
Model deployed in batch mode
Model deployed in ad-hoc/real-time mode
Conclusion
9. Case Studies and Storytelling
Structure
Objectives
What is data science storytelling?
Machine learning model
Visualizations
Facts
Case study 1: Analysis of sales-profit for superstore sales data from Tableau user group using multivariate regression techniques
Data source and problem definition
Data exploration
Data filtering
Data pre-processing
Building the model
Analysis of result and dimension-measure relationships
Subcategory vs. profit analysis
Quantity vs. profit analysis
Postal code vs. profit analysis
Sales vs. profit analysis
Discount vs. profit analysis
Product name vs. profit analysis
Case study 2: Prediction of movie genres with multilabel text classification
Data source and problem definition
Data exploration
Building the model
Analysis of result and testing the model
Case study 3: Classification of natural images using CNN and PyTorch
Data source and problem definition
Data exploration
Effect of applying convolution filter
Building the model
Analysis of result and testing the model
Conclusion