# SQL + Python is simply not enough
For years, the formula seemed simple: learn SQL + learn Python = get a data job. This was especially true as mid-sized companies became "data-driven." Hiring managers were happy to hire anyone who could write a halfway decent GROUP BY and wrangle a pandas DataFrame without breaking anything. Do you know what PostgreSQL is? Get in, you've got the job! This worked for a while. Until it didn't.
In case you haven't noticed, the data scientist job market has undergone structural changes. Yes, SQL and Python are still critical; they appear in every job description. But they have been relegated from differentiators to entry requirements.
You’re probably still optimizing answers to the interview questions you practiced three years ago. Forget it. This article addresses the gap between what candidates are preparing for and what companies currently need.
# What the labor market is actually asking for
A January 2026 summary by Future Proof Data Science, based on over 700 job advertisements for data scientists, showed that Python and SQL are still among the three most critical skills, but machine learning and AI skills are in second and fourth place.

Image source: Future Proof Data Science
Not all AI-related postings require working knowledge of AI, but 1 in 3 do. The most requested specific AI skills are:
- Large language models (LLMs)
- Retrieval-augmented generation (RAG)
- Prompt engineering
- Vector databases
This points to growing demand for data specialists who can build and deploy AI systems.
Remember that both the direction and the speed of these changes matter. It reminds me of how machine learning went from a niche requirement in 2012 to almost universal in 2020.
The second story is less visible, but probably more immediate for most candidates: the basic engineering bar has skyrocketed. Data engineering skills (pipelines, orchestration, cloud platforms, data quality control) and production machine learning skills (model monitoring, drift detection, evaluation design) are now core expectations rather than bonuses in data analytics job ads.
A glance at any major job portal confirms this: alongside skills related to artificial intelligence, roles titled "Data Scientist" regularly list Snowflake, dbt, Airflow, and ETL pipeline ownership as requirements, not nice-to-haves.
There are four skills you probably lack. These are the new differentiators in the current labor market.
# Skill #1: Data modeling
// What is this
Data modeling is the skill of designing the structure, relationships, and storage of data. Think of it as deciding what tables to create, what they represent, and how they relate to each other.
// Why it became a differentiator
Improvements in tooling have changed the landscape. Snowflake, dbt, and BigQuery have all made it easier for data scientists to own the data transformation layer. In other words, modeling decisions that once belonged to data engineers are now handed over to data scientists.
If you get your data schema wrong, you're in dangerous waters. Typically, these errors are not immediately visible. Once they become obvious, it's too late. Your machine learning work has already been affected by feature engineering built on data at the wrong granularity, a direct consequence of a poorly modeled foundation.
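To make the granularity problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` and `order_items` tables and their values are hypothetical, invented purely for illustration. It shows how joining an order-level amount to item-level rows silently inflates a customer's total spend, and how aggregating to the order grain before joining avoids it.

```python
import sqlite3

# Hypothetical toy schema: one row per order, and one row per item in an order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
CREATE TABLE order_items (item_id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
INSERT INTO orders VALUES (1, 10, 100.0), (2, 10, 50.0);
INSERT INTO order_items VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 1, 'C'), (4, 2, 'A');
""")

# Wrong grain: the join repeats each order's amount once per item,
# so the "total spend" feature is inflated.
inflated = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o
    JOIN order_items i ON i.order_id = o.order_id
    WHERE o.customer_id = 10
""").fetchone()[0]

# Right grain: collapse items to one row per order first, then join.
correct, n_items = conn.execute("""
    SELECT SUM(o.amount), SUM(ic.n_items)
    FROM orders o
    JOIN (
        SELECT order_id, COUNT(*) AS n_items
        FROM order_items
        GROUP BY order_id
    ) ic ON ic.order_id = o.order_id
    WHERE o.customer_id = 10
""").fetchone()

print("inflated total:", inflated)
print("correct total:", correct, "items:", n_items)
```

Neither query raises an error, which is exactly why this kind of mistake tends to surface late.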
// How to get it
Take the real dataset you’re working with and redesign its schema from scratch. Ask yourself the following questions:
- What are the entities?
- How do they relate to each other?
- What grain makes sense?
- What queries will be run most often?
Then read about dimensional modeling. Kimball's approach, described in detail in his book The Data Warehouse Toolkit, remains a useful point of reference.
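As a starting point for that exercise, the sketch below builds a very small star schema in SQLite from Python: one fact table with an explicitly declared grain and two dimension tables. All table names, column names, and rows are assumptions made up for this example; it is one possible layout in the spirit of dimensional modeling, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: descriptive context used for filtering and grouping.
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    customer_name TEXT,
    region TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    full_date TEXT,
    month TEXT
);
-- Fact table. Declared grain: one row per order line.
CREATE TABLE fact_order_line (
    order_id INTEGER,
    line_number INTEGER,
    customer_key INTEGER REFERENCES dim_customer (customer_key),
    date_key INTEGER REFERENCES dim_date (date_key),
    quantity INTEGER,
    revenue REAL,
    PRIMARY KEY (order_id, line_number)
);
INSERT INTO dim_customer VALUES (1, 'Acme', 'EU'), (2, 'Globex', 'US');
INSERT INTO dim_date VALUES
    (20260105, '2026-01-05', '2026-01'),
    (20260210, '2026-02-10', '2026-02');
INSERT INTO fact_order_line VALUES
    (100, 1, 1, 20260105, 2, 40.0),
    (100, 2, 1, 20260105, 1, 15.0),
    (101, 1, 2, 20260105, 3, 60.0),
    (102, 1, 1, 20260210, 1, 20.0);
""")

# The query you expect to run most often should be easy to express:
# here, revenue by region and month.
revenue_by_region_month = conn.execute("""
    SELECT c.region, d.month, SUM(f.revenue) AS revenue
    FROM fact_order_line f
    JOIN dim_customer c USING (customer_key)
    JOIN dim_date d USING (date_key)
    GROUP BY c.region, d.month
    ORDER BY c.region, d.month
""").fetchall()

for row in revenue_by_region_month:
    print(row)
```

Writing the grain down as a comment next to the fact table is a small habit that forces you to answer the "what grain makes sense?" question before any features are built on top.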
# Skill #2: Performance optimization
// What is this
Performance optimization requires understanding why a query runs the way it does and how to make it run faster, cheaper, or at scale. You can optimize SQL queries, but also Python pipelines and data workflows more broadly, since data scientists are increasingly the end-to-end owners of data.
// Why it became a differentiator
First, data volume has increased to the point where a correct but inefficient query can cost hundreds of dollars and cause production downtime.
Second, as mentioned earlier, data scientists now need to own a much larger portion of the pipeline than they did before. Your code must be production-ready, not just runnable in Jupyter notebooks.

// How to get it
Select a few complicated SQL queries you've written, run EXPLAIN ANALYZE on them, and read what the query planner actually did. Then use this to optimize your query. You'll likely find at least one index, restructuring, or rewrite that improves each query.
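If you don't have a PostgreSQL instance handy, the same habit can be practiced locally. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in; unlike PostgreSQL's EXPLAIN ANALYZE, it does not execute the query or report timings, it only shows the chosen plan. The `orders` table, the index name, and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

QUERY = "SELECT SUM(amount) FROM orders WHERE customer_id = 42"


def query_plan(connection, sql):
    """Return the planner's description of how it would run `sql`."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


plan_before = query_plan(conn, QUERY)

# Add an index on the filtered column, then ask the planner again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

The thing to look for is the switch from a full table scan to an index search. On a real warehouse the plan output looks different, but the reading skill transfers.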
For a slow Python pipeline, profile it. There are two main tools for timing:
- cProfile: Run it with `python -m cProfile -s cumulative your_script.py` and look at the top of the results to see the functions using the most cumulative time.
- line_profiler: Goes deeper by showing the execution time line by line within a specific function. Use it when cProfile tells you which function is slow and you need to know why.
For memory, use memory_profiler.
Find the bottleneck. Is it slow because a Python loop should be vectorized? Is data loaded into memory all at once rather than in chunks? Fix it and measure the difference.
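The profile, fix, and re-measure loop can also be run from inside a script. The following is a minimal sketch using only the standard library's cProfile and pstats; `slow_total` and `fast_total` are hypothetical functions written for this example, and the built-in `sum` stands in for whatever vectorized or chunked rewrite fits your real pipeline.

```python
import cProfile
import io
import pstats


def slow_total(values):
    """Deliberately slow: copies a growing list on every iteration."""
    seen = []
    total = 0
    for value in values:
        seen = seen + [value]  # creates a new list each time (quadratic work)
        total += value
    return total


def fast_total(values):
    """Same result, without the repeated copying."""
    return sum(values)


def profile(func, *args):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 by cumulative time
    return result, buffer.getvalue()


data = list(range(5_000))
slow_result, slow_report = profile(slow_total, data)
fast_result, fast_report = profile(fast_total, data)

print(slow_report)
print(fast_report)
```

Comparing the two reports side by side is the "measure the difference" step: the fix only counts if the numbers move and the result stays the same.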
# Skill #3: Infrastructure Awareness
// What is this
This skill means you understand the systems that data lives in and moves through. These systems include cloud platforms, distributed computing, data pipelines, storage formats, and cost models.
You should know enough about the infrastructure to design systems that can be deployed on it.
// Why it became a differentiator
Once again, much of the data engineering work has fallen into the hands of the data scientist. If you’re dependent on data engineers for every infrastructure decision, you’re effectively creating a bottleneck – and that’s not what hiring managers are looking for.
Infrastructure awareness covers several main, interconnected areas, and you will most likely need to familiarize yourself with the tools behind them, such as Snowflake, BigQuery, dbt, and Airflow.

// How to get it
Schedule a session with your data engineering team. Sit down with them and ask them to walk you through the pipeline from end to end. Understand where the data is located, how it is partitioned, and what happens when something breaks.
Then move on to building a small pipeline yourself: take advantage of a free cloud tier, understand the cost and execution metrics, and then intentionally break the pipeline to understand how it fails.
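Before touching a cloud account, you can rehearse the "build it, then break it" exercise locally. The sketch below is a deliberately tiny extract-transform-load pipeline in plain Python; the CSV contents, function names, and the in-memory list standing in for a warehouse are all assumptions for illustration. It validates every row before loading, so a bad input fails loudly instead of leaving a half-loaded table.

```python
import csv
import io

# Hypothetical inputs: one clean file and one with a corrupted value.
GOOD_CSV = "order_id,amount\n1,10.5\n2,20.0\n"
BROKEN_CSV = "order_id,amount\n1,10.5\n2,not_a_number\n"


def extract(text):
    """Parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Convert types and fail with the offending line number on bad data."""
    cleaned = []
    for line_number, row in enumerate(rows, start=2):  # line 1 is the header
        try:
            amount = float(row["amount"])
        except ValueError as exc:
            raise ValueError(
                f"line {line_number}: bad amount {row['amount']!r}"
            ) from exc
        cleaned.append({"order_id": int(row["order_id"]), "amount": amount})
    return cleaned


def load(rows, target):
    """Append validated rows to the target and report how many were loaded."""
    target.extend(rows)
    return len(rows)


def run_pipeline(text, target):
    # Transform everything before loading anything, so a failure
    # never leaves the target partially written.
    return load(transform(extract(text)), target)


warehouse = []
loaded = run_pipeline(GOOD_CSV, warehouse)

try:
    run_pipeline(BROKEN_CSV, warehouse)
    failure = None
except ValueError as exc:
    failure = str(exc)

print("rows loaded:", loaded)
print("failure:", failure)
print("rows in warehouse after failure:", len(warehouse))
```

The questions to ask of a real pipeline are the same ones this toy answers: where does it fail, what does the error tell you, and what state is the target left in?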
# Skill #4: Design RAG systems, evaluate LLM results and conduct AI experiments
// What is this
This skill set is about practical AI work. You need to know how to design retrieval-augmented generation (RAG) systems (connecting LLMs to real data sources), build evaluation frameworks (measuring whether an LLM-based feature actually works), and run experiments with AI features.
// Why it became a differentiator
The reason is AI tooling. It has made it possible to build a RAG pipeline without extensive research knowledge. Frameworks like LangChain and LlamaIndex, combined with cloud-native vector databases, have significantly lowered this barrier.
So the question is no longer whether it can be built. Yes, it can. But can it be built well, evaluated, and trusted in production? Answering this question is what you need to be able to do: define metrics, design experiments, and measure results.
As you apply these skills, you will use tools such as LangChain, LlamaIndex, and cloud-native vector databases.

// How to get it
Find some interview questions that will help you refine your thinking about artificial intelligence. Here are some examples from the AI and GenAI product interview questions on StrataScratch.
Example No. 1: Measuring the implementation of AI features in retail stores
How would you measure the impact of implementing an AI-powered inventory recommendation system in a sample of retail stores? How would you design the experiment and account for store-level variation?
Example No. 2: RAG system architecture
Describe how you would design a RAG system from scratch. What components are needed, and how would you optimize retrieval quality?
Once you have clarified your thinking, build a small RAG application: select a domain, embed the document corpus, connect the retrieval step, and evaluate the results with structured metrics.
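To see the moving parts before reaching for a framework, here is a toy sketch of the retrieval and evaluation halves only. It makes several simplifying assumptions: the "embedding" is a plain bag-of-words count rather than a learned model, the three documents and the labeled questions are invented, and there is no LLM generation step. The metric is hit rate at k, meaning the share of questions whose expected document appears in the top k results.

```python
import math
import re
from collections import Counter

# Hypothetical mini corpus for a retail support domain.
DOCS = {
    "returns": "Customers can return items within 30 days for a full refund.",
    "shipping": "Standard shipping takes five business days; express shipping takes two.",
    "warranty": "All electronics include a one year warranty covering defects.",
}


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# "Index" the corpus once, up front.
INDEX = {doc_id: embed(text) for doc_id, text in DOCS.items()}


def retrieve(query, k=1):
    """Return the ids of the k documents most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(
        INDEX, key=lambda doc_id: cosine(query_vector, INDEX[doc_id]), reverse=True
    )
    return ranked[:k]


# Labeled evaluation set: (question, id of the document that answers it).
EVAL_SET = [
    ("How many days do I have to return an item for a refund?", "returns"),
    ("How long does express shipping take?", "shipping"),
    ("Is there a warranty on electronics?", "warranty"),
]


def hit_rate_at_k(eval_set, k=1):
    """Fraction of questions whose expected document is in the top k."""
    hits = sum(expected in retrieve(question, k) for question, expected in eval_set)
    return hits / len(eval_set)


print("hit rate at 1:", hit_rate_at_k(EVAL_SET, k=1))
```

Swapping `embed` for a real embedding model and `INDEX` for a vector database changes the quality, not the shape: you still need a labeled evaluation set and a metric you can track as you change things.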
Also design an experiment: write a hypothesis, define metrics, and think of a valid test to evaluate it.
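One way to turn a hypothesis into something testable is a store-level permutation test, sketched below with the standard library only. The numbers are invented for illustration (weekly sales per store, in thousands), the hypothesis is "stores with the AI feature sell more on average than stores without it", and the metric is the difference in mean sales. A real design would also need randomized assignment and enough stores to absorb store-level variation.

```python
import random
from statistics import mean

# Hypothetical weekly sales per store (thousands), one value per store.
treated_stores = [12.1, 9.8, 11.4, 10.9, 12.6, 11.0]  # feature enabled
control_stores = [9.9, 10.2, 9.5, 10.8, 9.7, 10.1]    # feature disabled


def permutation_test(treated, control, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns the observed difference and an estimated p-value: the share of
    random relabelings whose difference is at least as extreme as observed.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


observed_diff, p_value = permutation_test(treated_stores, control_stores)
print("observed difference in means:", round(observed_diff, 3))
print("estimated p-value:", round(p_value, 4))
```

The unit of analysis here is the store, not the individual transaction, which is the point of the "account for store-level variation" part of the interview question.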
# Conclusion
Four skills (data modeling, performance optimization, infrastructure awareness, and practical AI skills) make up the gap between you and the job market. I hope you don't fall into it. To make sure that doesn't happen, this article has given practical advice on how to acquire each of them.
Nate Rosidi is a data scientist and product strategist. He is also an adjunct professor of analytics and the founder of StrataScratch, a platform that helps data scientists prepare for job interviews using real interview questions from top companies. Nate writes about the latest career trends, gives interview advice, shares data science projects, and discusses all things SQL.
