Computational and Inferential Thinking: The Foundations of Data Science
Ani Adhikari
Computers & Technology
Free

Data Science is about drawing useful conclusions from large and diverse data sets through exploration, prediction, and inference. Exploration involves identifying patterns in information. Prediction involves using information we know to make informed guesses about values we wish we knew. Inference involves quantifying our degree of certainty: will those patterns we found also appear in new observations? How accurate are our predictions? Our primary tools for exploration are visualizations and descriptive statistics, for prediction are machine learning and optimization, and for inference are statistical tests and models.
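
As a rough illustration of the three activities described above, here is a minimal Python sketch. It is not taken from the book: the data (hours studied and exam scores) are made up, the helper names `fit_line` and `bootstrap_slopes` are hypothetical, and it uses only the Python standard library rather than the book's own tools. Exploration is represented by descriptive statistics, prediction by a least-squares line, and inference by a bootstrap percentile interval for the slope.

```python
import random
import statistics

# Hypothetical toy data, invented purely for illustration.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 73, 79, 82]

# Exploration: descriptive statistics summarize patterns in the data.
mean_score = statistics.mean(scores)
sd_score = statistics.pstdev(scores)


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar


# Prediction: use what we know to guess a value we have not observed.
slope, intercept = fit_line(hours, scores)
predicted_score = slope * 9 + intercept  # guess for 9 hours of study


def bootstrap_slopes(xs, ys, repetitions=1000, seed=0):
    """Resample the (x, y) pairs with replacement and refit the line each time."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    slopes = []
    for _ in range(repetitions):
        sample = [rng.choice(pairs) for _ in pairs]
        sample_x = [p[0] for p in sample]
        sample_y = [p[1] for p in sample]
        if len(set(sample_x)) > 1:  # skip resamples where no line can be fit
            slopes.append(fit_line(sample_x, sample_y)[0])
    return sorted(slopes)


# Inference: quantify how uncertain the estimated slope is.
slopes = bootstrap_slopes(hours, scores)
lower = slopes[int(0.025 * len(slopes))]
upper = slopes[int(0.975 * len(slopes))]

print("mean score:", mean_score, "SD:", round(sd_score, 2))
print("predicted score for 9 hours:", round(predicted_score, 2))
print("approximate 95% interval for the slope:", round(lower, 2), "to", round(upper, 2))
```

The interval here is only a sketch of the idea of quantifying certainty; with eight made-up points it says nothing about real students.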

Statistics is a central component of data science because statistics studies how to make robust conclusions with incomplete information. Computing is a central component because programming allows us to apply analysis techniques to the large and diverse data sets that arise in real-world applications: not just numbers, but text, images, videos, and sensor readings. Data science is all of these things, but it is more than the sum of its parts because of the applications. Through understanding a particular domain, data scientists learn to ask appropriate questions about their data and correctly interpret the answers provided by our inferential and computational tools.

This is the textbook for the Foundations of Data Science class at UC Berkeley. Read online at the book's website.

Language
English
ISBN
9124423963
Introduction
Data Science
Introduction
Computational Tools
Statistical Techniques
Why Data Science?
Plotting the Classics
Literary Characters
Another Kind of Character
Causality and Experiments
John Snow and the Broad Street Pump
Snow’s “Grand Experiment”
Establishing Causality
Randomization
Endnote
Programming in Python
Expressions
Numbers
Names
Example: Growth Rates
Call Expressions
Data Types
Strings
String Methods
Comparisons
Sequences
Arrays
Ranges
More on Arrays
Tables
Sorting Rows
Selecting Rows
Example: Population Trends
Example: Trends in Gender
Visualization
Categorical Distributions
Numerical Distributions
Overlaid Graphs
Functions and Tables
Applying Functions to Columns
Classifying by One Variable
Cross-Classifying
Joining Tables by Columns
Bike Sharing in the Bay Area
Randomness
Conditional Statements
Iteration
The Monty Hall Problem
Finding Probabilities
Sampling
Empirical Distributions
Sampling from a Population
At the Roulette Table
Empirical Distribution of a Statistic
Testing Hypotheses
Jury Selection
Terminology of Testing
Error Probabilities
Example: Deflategate
Estimation
Percentiles
The Bootstrap
Confidence Intervals
Using Confidence Intervals
Why the Mean Matters
Properties of the Mean
Variability
The SD and the Normal Curve
The Central Limit Theorem
The Variability of the Sample Mean
Choosing a Sample Size
Prediction
Correlation
The Regression Line
The Method of Least Squares
Least Squares Regression
Visual Diagnostics
Numerical Diagnostics
Inference for Regression
A Regression Model
Inference for the True Slope
Prediction Intervals
Classification
Nearest Neighbors
Training and Testing
Rows of Tables
Implementing the Classifier
The Accuracy of the Classifier
Comparing Two Samples
Two Categorical Distributions
A/B Testing
Causality
Updating Predictions
A "More Likely Than Not" Binary Classifier
Making Decisions