
Image generated with ChatGPT
Learning statistics is a fundamental part of your journey towards becoming a data scientist, data analyst, or even an AI engineer. Most of the machine learning models used in current technology are statistical models. So, having a good understanding of statistics will make it easier for you to learn and build advanced AI technologies.
In this blog, we will look at 10 GitHub repositories that will help you master statistics. These repositories include code samples, books, Python libraries, guides, documentation, and visual learning materials.
1. Practical Statistics for Data Scientists
Repository: gedeck/practical-statistics-for-data-scientists
This repository offers practical examples and code snippets from the book “Practical Statistics for Data Scientists” that cover fundamental statistical techniques and concepts. It is a great starting point for data scientists who want to apply statistical methods to real-world scenarios.
The book’s code repository contains complete R and Python code examples. If you prefer the Jupyter Notebook coding style, it also contains equivalent examples as Jupyter notebooks for Python and R.
2. Probabilistic Programming and Bayesian Methods for Hackers
Repository: CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
This repository provides an interactive, hands-on introduction to Bayesian methods using Python. The content is presented as Jupyter notebooks using nbviewer, making it straightforward to follow the theory and Python code for Bayesian models and probabilistic programming.
This interactive book includes an introduction to Bayesian methods, information on getting started with the Python PyMC library, Markov Chain Monte Carlo, the law of large numbers, loss functions, and more.
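To make the law of large numbers concrete, here is a minimal sketch using only the Python standard library. It is not taken from the book; the function name and the fixed seed are illustrative assumptions. It simulates rolls of a fair six-sided die and shows the sample mean drifting toward the expected value of 3.5 as the number of rolls grows.

```python
import random

EXPECTED_VALUE = 3.5  # mean of a fair six-sided die: (1 + 2 + ... + 6) / 6


def sample_mean_of_die_rolls(n_rolls, seed=0):
    """Return the average of n_rolls simulated fair die rolls.

    The seed is fixed only so that repeated runs give the same numbers.
    """
    rng = random.Random(seed)
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    return sum(rolls) / n_rolls


# The gap between the sample mean and 3.5 tends to shrink as n grows.
for n in (10, 1_000, 100_000):
    mean = sample_mean_of_die_rolls(n)
    gap = abs(mean - EXPECTED_VALUE)
    print(f"n={n:>7}: sample mean = {mean:.4f}, gap = {gap:.4f}")
```

The convergence is probabilistic, so a single small sample can still land far from 3.5; only the large-sample averages are reliably close.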
3. Statsmodels: Statistical Modeling and Econometrics in Python
Repository: statsmodels/statsmodels
4. TensorFlow Probability
Repository: tensorflow/probability
TensorFlow Probability is a library for probabilistic reasoning and statistical analysis in TensorFlow. It extends the core TensorFlow library with tools for building and training probabilistic models, making it an excellent resource for those interested in combining deep learning with statistical modeling.
The documentation includes examples of linear mixed effects models, hierarchical linear models, probabilistic principal components analysis, Bayesian neural networks, and more.
5. The Probability and Statistics Cookbook
Repository: mavam/stat-cookbook
This repository is a collection of recipes for solving common statistical problems, serving as a helpful source of quick solutions and examples for various statistical tasks. It provides concise guidance on probability and statistics, including concepts such as continuous distributions, probability theory, random variables, expectation, variance, and inequalities. You can use the make command to build the cookbook locally, or download the PDF. The repository also contains LaTeX files for various statistical concepts.
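As a small illustration of three of the concepts listed above (expectation, variance, and inequalities), the following sketch computes exact values for a fair six-sided die and checks Chebyshev's inequality against them. It is not code from the cookbook, which is a LaTeX reference; the helper names and the choice of k = 2 are assumptions made for this example.

```python
from fractions import Fraction


def expectation(pmf):
    """E[X] for a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var[X] = E[(X - E[X])^2] for the same kind of distribution."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


# A fair die: each face has probability 1/6 (exact rational arithmetic).
die = {face: Fraction(1, 6) for face in range(1, 7)}

mu = expectation(die)
var = variance(die)

# Chebyshev's inequality: P(|X - mu| >= k) <= Var[X] / k^2 for any k > 0.
k = 2
exact_tail = sum(p for x, p in die.items() if abs(x - mu) >= k)
chebyshev_bound = var / k**2

print(f"E[X] = {mu}, Var[X] = {var}")
print(f"P(|X - mu| >= {k}) = {exact_tail} <= bound {chebyshev_bound}")
```

Using `Fraction` keeps every quantity exact, which makes it easy to compare the results with the closed-form values in a reference sheet.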
6. Seeing Theory
Repository: seeing-theory/Seeing-Theory
Seeing Theory is a visual introduction to probability and statistics. This repository features interactive visualizations and explanations that make complicated statistical concepts more accessible and easier to understand, especially for visual learners.
This is a highly interactive book for beginners, covering various topics such as probability basics, compound probability, probability distributions, frequentist inference, Bayesian inference, and regression analysis.
7. Statistics Mathematics with Python
Repository: tirthajyoti/Stats-Maths-with-Python
This repository contains scripts and Jupyter notebooks covering general statistics, mathematical programming, and scientific computing using Python. It is a valuable resource for anyone looking to strengthen their statistical and mathematical programming skills.
It includes examples of Bayes’ rule, Brownian motion, hypothesis testing, linear regression, and more.
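For a sense of what a Bayes’ rule example looks like in Python, here is a minimal sketch. It is not taken from the repository; the function name and the numbers (a 1% base rate, 95% sensitivity, 5% false positive rate) are hypothetical values chosen for illustration.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Apply Bayes' rule: P(condition | positive test).

    prior               -- P(condition)
    sensitivity         -- P(positive | condition)
    false_positive_rate -- P(positive | no condition)
    """
    # Total probability of a positive result, with or without the condition.
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Hypothetical numbers: a rare condition and a fairly accurate test.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {p:.3f}")
```

With these assumed inputs the posterior stays well below the test's sensitivity, because false positives from the much larger healthy group outnumber true positives.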
8. Python for Probability, Statistics, and Machine Learning
Repository: unpingco/Python-for-Probability-Statistics-and-Machine-Learning
This repository contains code examples and Jupyter notebooks from the book “Python for Probability, Statistics, and Machine Learning”, which cover a wide range of topics from the basics of probability and statistics to advanced machine learning techniques.
In the “chapters” folder, there are three subfolders containing Jupyter notebooks for statistics, probability, and machine learning. Each notebook contains code, output, and a description explaining the methodology, code, and results.
9. VIP Probability and Statistics Cheat Sheets
Repository: shervinea/stanford-cme-106-probability-and-statistics
This repository contains VIP cheat sheets for Stanford’s Probability and Statistics for Engineers course. The cheat sheets provide concise summaries of key concepts and formulas, making them a useful resource for students and professionals.
This is a popular cheat sheet covering topics such as conditional probability, random variables, parameter estimation, hypothesis testing, and more.
10. Basic Mathematics for Machine Learning
Repository: hrnbot/Basic-Mathematics-for-Machine-Learning
Understanding the basics of mathematics is crucial to mastering machine learning and statistics. This repository aims to demystify mathematics and help you learn the basics of algebra, calculus, statistics, probability, vectors, and matrices using Python Jupyter notebooks.
Final thoughts
The learning resources hosted on GitHub are created by experts and the open-source community, with the goal of sharing knowledge to make it easier for beginners to learn data science and statistics. You’ll learn statistics by reading theory, working through code examples, understanding mathematical concepts, building projects, performing various analyses, and exploring popular statistical tools. All of these activities are covered by the GitHub repositories mentioned above. These resources are free, and anyone can contribute to improving them. So learn and build amazing things.
Abid Ali Awan (@1abidaliawan) is a certified data science professional who loves building machine learning models. He currently focuses on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in Technology Management and a Bachelor’s degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.
