Photo by the editor
To learn data science, you also need a solid foundation in mathematics. Statistics is one of the fundamental mathematical skills in data science.
However, learning statistics can be intimidating, especially if you come from a major other than math or computer science. To help you get started, we’ve put together a list of free books for learning statistics for data science.
Most of these books take a practical approach to statistical concepts, which is what you need to apply statistics effectively as a data scientist. So let’s take a look at these statistics books.
The Introductory Statistics book is an accessible introduction to statistics, covering topics typically taught in a one-semester college introductory statistics course.
Available for free on OpenStax and written by a team of expert authors, this book focuses on an applied approach to statistics rather than theory, and includes examples and exercises on each topic.
This book will help you learn the following:
- Sampling and data
- Descriptive statistics
- Topics in probability and random variables
- Normal distribution
- Central limit theorem
- Confidence intervals
- Hypothesis testing
- Chi-square distribution
- Linear regression and correlation
- F distribution and one-way ANOVA
Link: Introductory Statistics 2e
Introduction to Modern Statistics is a free online textbook from the OpenIntro project, authored by Mine Çetinkaya-Rundel and Johanna Hardin.
If you want to learn the basics of statistics for effective data analysis, this book is for you. The content of this book is as follows:
- Introduction to data
- Exploratory data analysis
- Regression modeling
- Foundations of inference
- Statistical inference
- Inferential modeling
Link: Introduction to Modern Statistics
Think Stats by Allen B. Downey will help you learn and practice statistical concepts using Python.
So you can apply your Python skills to learn concepts related to statistics and probability and work with data effectively. As you work through this book, you’ll write short Python programs and practice on real data sets to solidify your understanding of statistical concepts.
The topics covered are:
- Exploratory data analysis
- Distributions
- Probability mass functions
- Cumulative distribution functions
- Modeling of distributions
- Probability density functions
- Relationships between variables
- Estimation
- Hypothesis testing
- Linear least squares
- Regression
- Survival analysis
- Analytic methods
Link: Think Stats 2e
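To give a flavor of the short programs this style of learning involves, here is a minimal sketch of two topics from the list above: a probability mass function and a cumulative distribution function computed from a sample. It is not taken from the book. The function names `pmf` and `cdf` and the sample values are made up for illustration, and only the Python standard library is used.

```python
from collections import Counter


def pmf(values):
    """Map each distinct value to its relative frequency in the sample."""
    counts = Counter(values)
    n = len(values)
    return {x: c / n for x, c in sorted(counts.items())}


def cdf(values):
    """Map each distinct value x to the fraction of the sample that is <= x."""
    running_total = 0.0
    result = {}
    for x, p in pmf(values).items():
        running_total += p
        result[x] = running_total
    return result


# Made-up sample, for illustration only (hypothetical data).
sample = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

sample_pmf = pmf(sample)
sample_cdf = cdf(sample)

for x in sample_pmf:
    print(f"x={x}  pmf={sample_pmf[x]:.2f}  cdf={sample_cdf[x]:.2f}")
```

The PMF values sum to 1, and the CDF at the largest value reaches 1 (up to floating-point rounding), which makes for a quick sanity check on any data set you try this with.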
Computational and Inferential Thinking: The Foundations of Data Science by Ani Adhikari, John DeNero, and David Wagner will help you learn the basics of statistics in data science.
This book was created as a companion to Data 8: Foundations of Data Science, a course offered at the University of California, Berkeley. Topics covered in this book include:
- Introduction to data science
- Python programming
- Data types, sequences and tables
- Visualization
- Functions and arrays
- Randomness
- Sampling and empirical distributions
- Hypothesis testing
- Estimation
- Regression
- Classification
Link: Computational and Inferential Thinking: The Foundations of Data Science
Probabilistic Programming and Bayesian Methods for Hackers, or simply Bayesian Methods for Hackers, is a popular book on Bayesian methods in statistics.
“Bayesian Methods for Hackers”: An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python 😉 – Source
You will become familiar with probability theory and Bayesian inference as you work with the PyMC package. The content of this book is as follows:
- Introduction to Bayesian methods
- PyMC library
- Markov chain Monte Carlo
- The law of large numbers
- Loss functions
- Priors
Link: Probabilistic Programming and Bayesian Methods for Hackers
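One topic from the list above, the law of large numbers, is easy to see with a small simulation. The sketch below is not from the book and does not use PyMC. It relies only on the standard library, and the function name `running_means`, the seed, and the number of rolls are arbitrary choices made for illustration. It rolls a fair six-sided die many times and tracks the running average, which should drift toward the expected value of 3.5.

```python
import random


def running_means(n_rolls, seed=0):
    """Simulate fair die rolls and return the running mean after each roll."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)  # randint includes both endpoints
        means.append(total / i)
    return means


means = running_means(100_000)

# Compare the running mean at a few sample sizes with the expected value 3.5.
for n in (10, 1_000, 100_000):
    print(f"after {n:>7} rolls: mean = {means[n - 1]:.3f}")
```

With only a few rolls the average can land far from 3.5. As the number of rolls grows, the gap tends to shrink, which is the behavior the law of large numbers describes.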
I hope you found this roundup of free statistics books helpful. The combination of theory and hands-on practice should help you refine your data science skills and make more informed decisions when working with large real-world datasets.
If you prefer free courses or want to supplement your reading with courses, check out 5 Free Courses to Master Statistics for Data Science.
Bala Priya C is a software developer and technical writer from India. She likes working at the intersection of mathematics, programming, data analytics and content creation. Her areas of interest and specialization include DevOps, data analytics and natural language processing. She enjoys reading, writing, coding and coffee! She is currently working on learning and sharing her knowledge with the developer community by writing tutorials, guides, reviews, and more. Bala also creates engaging resource overviews and coding tutorials.
