
Photo by the author
If you’re preparing for data science interviews, you know how overwhelming it can be to sift through all the online resources available. It’s easy to get lost in the details. That’s why I’m excited to present a hidden gem of a resource: the “Data Science Interviews” book by Dip Ranjan Chatterjee.
This freely available online book covers the key topics you need to know for data science interviews, from statistics and model building to algorithms, neural networks, and business analysis. But what sets it apart from other resources is its focus on providing only the essential information that will help you prepare for your interview. This makes it an ideal resource for busy data scientists who need to quickly brush up on a wide range of concepts. Here are a few things that I think make this book unique:
- Real-world interview questions: The book includes interview questions from companies like Google, DoorDash, and Airbnb, along with detailed solutions and case studies.
- Updated content: The book is constantly updated with new sections, questions, and richer content.
- Cheat sheets and references: The book includes cheat sheets that provide a quick guide to various topics, as well as additional resources for those who want to delve deeper into the topic.
Don’t panic if you come across a section followed by the ⚠️ symbol. This simply means that these sections are still a work in progress and are subject to change. Here are the main sections covered in this book:
1. Statistics
This section covers the basics of statistics that are necessary for analyzing data and building models. Topics include probability foundations, probability distributions, central limit theorem, Bayesian and frequentist reasoning, hypothesis testing, and A/B testing.
2. Building a model
This part of the book guides you through the process of building a successful model, from data collection to model selection. It also teaches the data preprocessing techniques every data scientist needs, including feature scaling, handling outliers, dealing with missing values, and encoding categorical variables. The section closes with a subsection on hyperparameter optimization and the well-known open-source tools used for it.
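To make the preprocessing steps named above concrete, here is a minimal sketch using only the Python standard library. The toy dataset and variable names are hypothetical and are not taken from the book; in practice you would more likely reach for pandas or scikit-learn.

```python
from statistics import mean

# Hypothetical toy dataset: each row is (age, city); None marks a missing age.
rows = [(25, "Paris"), (None, "Tokyo"), (35, "Paris"), (45, "Lima")]

# 1. Missing values: fill a missing age with the mean of the observed ages.
observed = [age for age, _ in rows if age is not None]
fill_value = mean(observed)
ages = [age if age is not None else fill_value for age, _ in rows]

# 2. Feature scaling: min-max scale the ages into the range [0, 1].
lo, hi = min(ages), max(ages)
scaled_ages = [(age - lo) / (hi - lo) for age in ages]

# 3. Encoding categorical variables: one-hot encode the city column,
#    with one output column per distinct city (in alphabetical order).
cities = sorted({city for _, city in rows})
one_hot = [[int(city == c) for c in cities] for _, city in rows]

print(scaled_ages)
print(cities, one_hot)
```

Mean imputation and min-max scaling are only one choice each; median imputation or standardization may suit skewed data better.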
3. Algorithms
4. Python
Python is a versatile language used in data science for a variety of tasks. This section contains the following subsections:
- Theoretical: It covers some basic Python concepts such as mesh grid, statistical methods, range vs xrange, switch case, and lambda functions.
- Basics: This covers the common programming fundamentals you need to solve Python interview questions, such as lists, tuples, and dictionaries, as well as control flow with loops and conditions.
- Coding algorithms from scratch: Companies often ask candidates to code algorithms from scratch during the coding round. The general steps to code an algorithm from scratch are discussed here.
- Questions: It includes some sample questions related to statistics, data manipulation and NLP.
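As a quick illustration of two of the concepts listed under the theoretical subsection, here is a small sketch of a lambda function and of how `range` behaves. The sample data is made up for this example and does not come from the book.

```python
# lambda: an anonymous, single-expression function, often used as a sort key.
candidates = [("Ana", 82), ("Bo", 91), ("Cy", 77)]  # hypothetical (name, score) pairs
ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)

# range vs xrange: xrange only exists in Python 2. In Python 3, range is
# itself lazy, so a huge range is cheap to create and supports len() and
# indexing without building a list of a billion integers.
big = range(10**9)
length = len(big)
last = big[-1]

print(ranked)
print(length, last)
```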
5. SQL
In data science interviews, SQL queries are often used to assess a candidate’s ability to work with data and solve complex problems. This section covers the basics of SQL, including joins, temporary tables, table variables and CTEs, window functions, time functions, stored procedures, indexing, and performance tuning. The Temp Table vs. Table Variable vs. CTE section explains the differences between these three temporary data structures and when to use each. You will also learn how to create and use stored procedures. The Performance Tuning section provides various tips for optimizing SQL queries. Overall, it will give you a solid foundation in SQL.
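To give a feel for the temp table versus CTE distinction, here is a hedged sketch that runs the same aggregation both ways using Python's built-in `sqlite3` module. The `orders` table and its data are hypothetical. SQLite has no table variables (those are a SQL Server feature), so only the other two structures are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 30.0), ("ana", 70.0), ("bo", 20.0), ("cy", 120.0)],
)

# Temp table: materialized once and kept until the connection closes,
# so later statements can reuse it.
conn.execute(
    "CREATE TEMP TABLE totals AS "
    "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"
)
temp_result = conn.execute(
    "SELECT customer FROM totals WHERE total >= 100 ORDER BY customer"
).fetchall()

# CTE: a named subquery that exists only for the single statement using it.
cte_result = conn.execute(
    """
    WITH totals_cte AS (
        SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer
    )
    SELECT customer FROM totals_cte WHERE total >= 100 ORDER BY customer
    """
).fetchall()

conn.close()
print(temp_result, cte_result)
```

Both queries return the same customers; the practical difference is scope and reuse, which is the kind of trade-off an interviewer may ask you to explain.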
6. Analytical thinking
While the book has other sections such as Excel, Neural Networks, NLP, Machine Learning Frameworks, and Business Intelligence, I would like to highlight this one in particular. I think it’s unique because it covers business scenarios and behavioral management questions that are becoming increasingly important in data science interviews. Companies are looking not only for technical knowledge, but also for candidates who can think strategically and communicate effectively.
For example, here is a question Salesforce asked in one of its interviews:
“As a data analyst at Salesforce, you talk to a Product Manager who wants to understand Salesforce’s user base. What would be your approach?”
By reviewing scenario-based questions, you will be well prepared for your interviews.
7. Cheat sheets
Instead of spending hours searching for cheat sheets online, you can find quick and comprehensive guides on topics like NumPy, Pandas, SQL, Statistics, Git, Power BI, Python Basics, Keras, and R Basics in one place. These guides are perfect for quickly refreshing your knowledge before a job interview or for use during a coding challenge.
I fully understand the importance of having reliable and comprehensive resources to prepare for job interviews, and I believe this book does the job. I’m sure it will help you succeed. I wish you all the best on your data science preparation journey! If you have any questions, please contact us.
Kanwal Mehreen is a machine learning engineer and technical writer with a deep passion for data science and the intersection of artificial intelligence and medicine. She is co-author of the e-book “Maximizing Productivity with ChatGPT”. As a 2022 Google Generation Scholar for APAC, she promotes diversity and academic excellence. She is also recognized as a Teradata Diversity in Tech Scholar, a Mitacs Globalink Research Scholar, and a Harvard WeCode Scholar. Kanwal is a staunch supporter of change and founded FEMCodes to empower women in STEM fields.
