Friday, May 22, 2026

How to go from text to SQL with LLMs

[Image: Text to SQL with LLMs. Image by author | Canva]

Thanks to large language models (LLMs), everyone is a coder today! That is the message you get from LLM promotional materials. Like any advertisement, it is obviously not true. Coding is much more than producing code at breakneck speed. However, translating English (or another natural language) into executable SQL queries is one of the most compelling uses of LLMs, and it has its place in the world.

# Why use LLMs to generate SQL?

There are several advantages to using LLMs to generate SQL and, as with everything, some disadvantages too.

[Figure: Using LLMs to generate SQL]

# Two types of text-to-SQL LLMs

We can distinguish two broad types of text-to-SQL technology currently available, based on their access to the database schema.

  1. LLMs without direct access
  2. LLMs with direct access

// 1. LLMs without direct access to the database schema

These LLMs do not connect to a real database or run queries against it. The closest you can get is uploading the datasets you want to query. These tools rely on you providing context about your schema.

Examples of tools:

Use cases:

  • Query drafting and prototyping
  • Learning and teaching
  • Static code generation for later review

// 2. LLMs with direct access to the database schema

These LLMs connect directly to your live data sources, such as PostgreSQL, Snowflake, BigQuery, or Redshift. They let you generate SQL queries, execute them, and return the results live from your database.

Examples of tools:

Use cases:

  • Conversational analytics for business users
  • Real-time data exploration
  • AI assistants embedded in BI platforms

# Step by step: how to go from text to SQL

The basic text-to-SQL workflow is similar whether you use a disconnected or a connected LLM.

[Figure: From text to SQL]

We will try to solve an interview question from Shopify and Amazon using the above steps in ChatGPT.

// 1. Define the schema

For your query to work on your data, the LLM must clearly understand your data structure. This typically includes:

  • Table names
  • Column names and types
  • Relationships between tables (joins, keys)

This information can be passed directly in the prompt or retrieved dynamically using vector search within a retrieval-augmented generation (RAG) pipeline.
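When the schema is passed directly in the prompt, it does not have to be typed by hand. The sketch below is one possible way to turn table names, column types, and key relationships into plain text. It assumes an in-memory SQLite database as a stand-in for your real system, and `describe_schema` is a hypothetical helper name, not part of any tool mentioned in this article.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> str:
    """Return a plain-text description of every user table in the database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    lines = []
    for table in tables:
        lines.append(f'Table "{table}" has the following columns and data types:')
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, col_type, _notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            suffix = " (primary key)" if pk else ""
            lines.append(f"  {name}: {col_type}{suffix}")
        # PRAGMA foreign_key_list yields (id, seq, table, from, to, ...).
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            lines.append(f"  {fk[3]} references {fk[2]}.{fk[4]}")
    return "\n".join(lines)


# Hypothetical, trimmed-down versions of the two tables used later in the article.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, city TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        cust_id INTEGER REFERENCES customers(id),
        order_date TEXT,
        total_order_cost INTEGER
    );
    """
)
schema_text = describe_schema(conn)
print(schema_text)
```

The resulting text can be pasted at the top of a prompt. Other database systems expose the same information through their own catalogs, so the two PRAGMA statements would need to be swapped out there.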

// 2. Prompt with natural language

The prompt will usually consist of two segments:

  • Schema definition
  • Question(s) for which we need a SQL answer

Example: Let me first give you a prompt structure that includes placeholders. Then we will write an actual prompt.

We will use role-play prompting, which means instructing ChatGPT to take on a specific role.

Here’s how to structure the prompt.

Dataset: My dataset consists of [number of tables] tables.

The first one is [table name] with the following columns and data types: [column names and data types]

The second table is [table name] with the following columns and data types: [column names and data types]

Question: [provide a question to be answered]

Assumptions: [provide assumptions for solving the question]

Role: [describe a role LLM has to play]
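If you reuse this structure often, it can be filled in programmatically. The following is a minimal sketch, assuming the schema is available as a dictionary. `build_prompt` and its parameters are illustrative names, and the wording "Table 1 is ..." is a simplification of "The first one is ..." so that it works for any number of tables.

```python
def build_prompt(tables, question, assumptions, role):
    """Fill the Dataset / Question / Assumptions / Role structure.

    `tables` maps each table name to a {column_name: data_type} dict.
    """
    parts = [f"Dataset: My dataset consists of {len(tables)} tables."]
    for position, (name, columns) in enumerate(tables.items(), start=1):
        column_lines = "\n".join(
            f"{column}: {dtype}" for column, dtype in columns.items()
        )
        parts.append(
            f'Table {position} is "{name}" with the following columns '
            f"and data types:\n{column_lines}"
        )
    parts.append(f"Question: {question}")
    parts.append(f"Assumptions: {assumptions}")
    parts.append(f"Role: {role}")
    # Separate the segments with blank lines so the prompt stays readable.
    return "\n\n".join(parts)


prompt = build_prompt(
    tables={
        "customers": {"id": "bigint", "first_name": "text", "city": "text"},
        "orders": {
            "id": "bigint",
            "cust_id": "bigint",
            "order_date": "date",
            "total_order_cost": "bigint",
        },
    },
    question="Find the customers with the highest daily total order cost.",
    assumptions="Every first name in the dataset is unique.",
    role="Act as a SQL expert and write a PostgreSQL query that answers the question.",
)
print(prompt)
```

Keeping the schema, question, assumptions, and role as separate arguments makes it easy to change one of them without retyping the rest.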

Let’s write an actual prompt that solves our interview question.

Here it is, and we hope it will produce a correct SQL solution. The column names, data types, question, and assumption are pasted from the question itself.

Dataset: My dataset consists of two tables.

The first one is "customers" with the following columns and data types:

address: text
city: text
first_name: text
id: bigint
last_name: text
phone_number: text

The second table is "orders" with the following columns and data types:

cust_id: bigint
id: bigint
order_date: date
order_details: text
total_order_cost: bigint

Question: Find the customers with the highest daily total order cost between 2019-02-01 and 2019-05-01. If a customer had more than one order on a certain day, sum the order costs on a daily basis. Output each customer's first name, total cost of their items, and the date.

Assumption: For simplicity, you can assume that every first name in the dataset is unique.

Role: Act as a SQL expert and write a PostgreSQL query that answers the question.

// 3. Generate the SQL query

At this stage, the LLM of your choice generates the SQL query.

Example: Here’s the PostgreSQL code ChatGPT produced for us.

SELECT
    c.first_name,
    daily_totals.order_date,
    daily_totals.total_cost
FROM (
    SELECT
        cust_id,
        order_date,
        SUM(total_order_cost) AS total_cost
    FROM orders
    WHERE order_date BETWEEN '2019-02-01' AND '2019-05-01'
    GROUP BY cust_id, order_date
) AS daily_totals
JOIN customers c ON c.id = daily_totals.cust_id
WHERE (daily_totals.cust_id, daily_totals.order_date) IN (
    SELECT
        cust_id,
        order_date
    FROM (
        SELECT
            cust_id,
            order_date,
            SUM(total_order_cost) AS total_cost,
            RANK() OVER (ORDER BY SUM(total_order_cost) DESC) AS rnk
        FROM orders
        WHERE order_date BETWEEN '2019-02-01' AND '2019-05-01'
        GROUP BY cust_id, order_date
    ) ranked
    WHERE rnk = 1
);

// 4. Execute the query

Now you can execute the query, either directly (if your LLM supports live connections) or by copying and running it in your database system.

Example: In our case, we will copy the code into the StrataScratch code editor and use the “Check Solution” button to validate the solution.

This is what it outputs, and it is the correct solution. Bravo for ChatGPT! It nailed it on the first try!

first_name   order_date   total_cost
Jill         2019-04-19   275
Mark         2019-04-19   275
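If you prefer running a query from code instead of pasting it into an editor, the same kind of check can be reproduced locally. This sketch assumes SQLite as a stand-in for PostgreSQL and a handful of made-up rows, not the real interview dataset. The query is a simplified rewrite that uses a CTE and `MAX`, not the exact statement ChatGPT produced. The sample rows are chosen so that two customers tie for the highest daily total, which is the case that is easiest to get wrong.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        cust_id INTEGER,
        order_date TEXT,
        total_order_cost INTEGER
    );
    -- Made-up rows: Ann and Bob tie at 150 for the highest daily total.
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES
        (1, 1, '2019-03-01', 100),
        (2, 1, '2019-03-01', 50),
        (3, 2, '2019-04-10', 150),
        (4, 3, '2019-03-05', 120),
        (5, 2, '2019-06-01', 900);  -- outside the date range, must be ignored
    """
)

# Simplified equivalent of the generated query: sum per customer and day,
# then keep every row whose daily total equals the overall maximum.
QUERY = """
WITH daily_totals AS (
    SELECT cust_id, order_date, SUM(total_order_cost) AS total_cost
    FROM orders
    WHERE order_date BETWEEN '2019-02-01' AND '2019-05-01'
    GROUP BY cust_id, order_date
)
SELECT c.first_name, d.order_date, d.total_cost
FROM daily_totals AS d
JOIN customers AS c ON c.id = d.cust_id
WHERE d.total_cost = (SELECT MAX(total_cost) FROM daily_totals)
ORDER BY c.first_name
"""

rows = conn.execute(QUERY).fetchall()
for first_name, order_date, total_cost in rows:
    print(first_name, order_date, total_cost)
```

Both tied customers should be returned, and the large order outside the date range should not affect the result. A small fixture like this is a cheap way to test a generated query before pointing it at real data.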

// 5. Review, visualize, and refine

Depending on why you are using an LLM to write SQL, this step may be optional. In the business world, you would typically present the query output in a user-friendly format, which usually involves:

  • Showing the results as a table and/or chart
  • Allowing follow-up requests (e.g. “Can you include the customer’s city?”) and providing the changed query and output

# Pitfalls and best practices

In our example, ChatGPT immediately came up with the correct answer. That does not mean it always will, especially as the data and requirements get more complicated. Using LLMs to get SQL queries from text is not without pitfalls. You can avoid them by applying some best practices if you want LLM query generation to be part of your data science workflow.

[Figure: Pitfalls and best practices]
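One precaution that is often applied before executing generated SQL against a live database is to check that the statement only reads data. The sketch below is an assumption about what such a guard could look like, not a practice taken from the figure above. `looks_read_only` is a hypothetical helper, and the check is a deliberately conservative heuristic, not a SQL parser.

```python
import re

# Keywords that modify data or schema. The list is illustrative, not exhaustive.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke)\b",
    re.IGNORECASE,
)


def looks_read_only(sql: str) -> bool:
    """Heuristic: accept only a single SELECT (or WITH ... SELECT) statement."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    if not re.match(r"(?i)(select|with)\b", statement):
        return False  # does not start like a read query
    return FORBIDDEN.search(statement) is None


print(looks_read_only("SELECT first_name FROM customers;"))
print(looks_read_only("SELECT 1; DELETE FROM orders"))
```

Because it works on raw text, this guard also rejects harmless queries that contain a semicolon or one of the keywords inside a string literal. Running generated queries under a database role with read-only permissions is a sturdier safeguard, and the two can be combined.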

# Conclusion

LLMs can be your best friend when you want to create SQL queries from text. However, to make the best of these tools, you need a clear understanding of what you want to achieve and of the use cases where LLMs are beneficial.

This article provides such guidelines, along with an example of how to prompt an LLM in natural language and get working SQL code.

Nate Rosidi is a data scientist and works in product strategy. He also teaches analytics and is the founder of StrataScratch, a platform that helps data scientists prepare for interviews with real interview questions from top companies. Nate writes about the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.
