# thinkstats2

Book Description
• Preface
  • How I wrote this book
  • Using the code
• Exploratory data analysis
  • A statistical approach
  • The National Survey of Family Growth
  • Importing the data
  • DataFrames
  • Variables
  • Transformation
  • Validation
  • Interpretation
  • Exercises
  • Glossary
• Distributions
  • Histograms
  • Representing histograms
  • Plotting histograms
  • NSFG variables
  • Outliers
  • First babies
  • Summarizing distributions
  • Variance
  • Effect size
  • Reporting results
  • Exercises
  • Glossary
• Probability mass functions
  • Pmfs
  • Plotting PMFs
  • Other visualizations
  • The class size paradox
  • DataFrame indexing
  • Exercises
  • Glossary
• Cumulative distribution functions
  • The limits of PMFs
  • Percentiles
  • CDFs
  • Representing CDFs
  • Comparing CDFs
  • Percentile-based statistics
  • Random numbers
  • Comparing percentile ranks
  • Exercises
  • Glossary
• Modeling distributions
  • The exponential distribution
  • The normal distribution
  • Normal probability plot
  • The lognormal distribution
  • The Pareto distribution
  • Generating random numbers
  • Why model?
  • Exercises
  • Glossary
• Probability density functions
  • PDFs
  • Kernel density estimation
  • The distribution framework
  • Hist implementation
  • Pmf implementation
  • Cdf implementation
  • Moments
  • Skewness
  • Exercises
  • Glossary
• Relationships between variables
  • Scatter plots
  • Characterizing relationships
  • Correlation
  • Covariance
  • Pearson's correlation
  • Nonlinear relationships
  • Spearman's rank correlation
  • Correlation and causation
  • Exercises
  • Glossary
• Estimation
  • The estimation game
  • Guess the variance
  • Sampling distributions
  • Sampling bias
  • Exponential distributions
  • Exercises
  • Glossary
• Hypothesis testing
  • Classical hypothesis testing
  • HypothesisTest
  • Testing a difference in means
  • Other test statistics
  • Testing a correlation
  • Testing proportions
  • Chi-squared tests
  • First babies again
  • Errors
  • Power
  • Replication
  • Exercises
  • Glossary
• Linear least squares
  • Least squares fit
  • Implementation
  • Residuals
  • Estimation
  • Goodness of fit
  • Testing a linear model
  • Weighted resampling
  • Exercises
  • Glossary
• Regression
  • StatsModels
  • Multiple regression
  • Nonlinear relationships
  • Data mining
  • Prediction
  • Logistic regression
  • Estimating parameters
  • Implementation
  • Accuracy
  • Exercises
  • Glossary
• Time series analysis
  • Importing and cleaning
  • Plotting
  • Linear regression
  • Moving averages
  • Missing values
  • Serial correlation
  • Autocorrelation
  • Prediction
  • Exercises
  • Glossary
• Survival analysis
  • Survival curves
  • Hazard function
  • Inferring survival curves
  • Kaplan-Meier estimation
  • The marriage curve
  • Estimating the survival curve
  • Confidence intervals
  • Cohort effects
  • Extrapolation
  • Expected remaining lifetime
  • Exercises
  • Glossary
• Analytic methods
  • Normal distributions
  • Sampling distributions
  • Representing normal distributions
  • Central limit theorem
  • Testing the CLT
  • Applying the CLT
  • Correlation test
  • Chi-squared test
  • Discussion
  • Exercises
