# Introduction
Data has become an essential resource for any successful business because it provides the information needed to make informed decisions. Given its importance, many companies build systems to store and analyze data. However, obtaining and analyzing the necessary data often becomes challenging as the complexity of the data system grows.
With the advent of generative AI, working with data has become much easier: we can now use plain natural language and get largely accurate output that closely matches our inputs. This also applies to data processing and analysis with SQL, where we can simply ask for a query to be written for us.
In this article, we will develop a straightforward API application that translates natural language into SQL queries that our database can understand. We will use three main tools: OpenAI, FastAPI, and SQLite.
Here’s the plan.
# Creating Text-to-SQL applications
First, we will prepare everything needed for the project. All you need to provide is an OpenAI API key, which we will use to access the generative model. We will use Docker to containerize the application; it can be run locally with Docker Desktop.
The other components are simpler: SQLite is already available once Python is installed, and FastAPI will be installed later from the requirements file.
The overall structure of the project looks like this:
text_to_sql_app/
├── app/
│   ├── __init__.py
│   ├── database.py
│   ├── openai_utils.py
│   └── main.py
├── demo.db
├── init_db.sql
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
└── .env
Create the structure above, or use the accompanying repository to make things easier. We will go through each file in turn to understand how the application is built.
Let’s start by filling the .env file with the OpenAI API key we acquired earlier:
OPENAI_API_KEY=YOUR-API-KEY
Then go to requirements.txt and list the libraries we will use:
fastapi
uvicorn
sqlalchemy
openai
pydantic
python-dotenv
Then we move on to the __init__.py file, where we put the following code:
from pathlib import Path
from dotenv import load_dotenv
load_dotenv(dotenv_path=Path(__file__).resolve().parent.parent / ".env", override=False)
The code above loads the .env file so that the environment contains the keys we need.
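If the key is missing, the OpenAI client will only fail later with a less obvious error. As a small sketch (this helper is not part of the project files), you can fail fast instead:

```python
import os

def require_env(name: str) -> str:
    """Return a required environment variable, failing fast with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return value

# Example: call require_env("OPENAI_API_KEY") before constructing the OpenAI client.
```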
Next, we will write database.py, a file that connects to the SQLite database we will create later (demo.db) and lets us run SQL queries against it.
from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session
ENGINE = create_engine("sqlite:///demo.db", future=True, echo=False)
def run_query(sql: str) -> list[dict]:
    with Session(ENGINE) as session:
        rows = session.execute(text(sql)).mappings().all()
        return [dict(r) for r in rows]
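The .mappings() call is what turns each row into a dict-like object. The same rows-as-dicts idea can be illustrated with only the standard library's sqlite3 module (an illustration only, not part of the project files):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become dict-convertible
conn.executescript(
    "CREATE TABLE customers (id INTEGER, name TEXT);"
    "INSERT INTO customers VALUES (1, 'Alice');"
)
rows = [dict(r) for r in conn.execute("SELECT * FROM customers")]
print(rows)  # [{'id': 1, 'name': 'Alice'}]
```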
Then we will prepare openai_utils.py, a file that accepts the database schema and the input question. The output will be JSON containing an SQL query (with a prompt-level guard against write operations).
import os
import json
from openai import OpenAI
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
_SYSTEM_PROMPT = """
You convert natural-language questions into read-only SQLite SQL.
Never output INSERT / UPDATE / DELETE.
Return JSON: { "sql": "..." }.
"""

def text_to_sql(question: str, schema: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0.1,
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": _SYSTEM_PROMPT},
            {"role": "user",
             "content": f"schema:\n{schema}\n\nquestion: {question}"},
        ],
    )
    payload = json.loads(response.choices[0].message.content)
    return payload["sql"]
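Setting response_format to a JSON object makes valid JSON very likely, but the reply can still lack the expected key. A more defensive parser, sketched here as a hypothetical helper, avoids assuming payload["sql"] exists:

```python
import json

def extract_sql(raw: str) -> str:
    """Parse the model reply defensively instead of indexing payload['sql'] directly."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model did not return valid JSON: {raw!r}") from exc
    sql = payload.get("sql")
    if not isinstance(sql, str) or not sql.strip():
        raise ValueError("Model reply is missing a usable 'sql' field")
    return sql
```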
With the helpers and the connection ready, we can build the application itself with FastAPI. The app accepts a natural-language question, combines it with the database schema, converts it into a SQL SELECT query, runs the query against the SQLite database, and returns the results as JSON. It is exposed as an API that we can call from the CLI.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from sqlalchemy import inspect
from .database import ENGINE, run_query
from .openai_utils import text_to_sql
app = FastAPI(title="Text-to-SQL Demo")
class NLRequest(BaseModel):
    question: str

@app.on_event("startup")
def capture_schema() -> None:
    insp = inspect(ENGINE)
    global SCHEMA_STR
    SCHEMA_STR = "\n".join(
        f"CREATE TABLE {t} ({', '.join(c['name'] for c in insp.get_columns(t))})"
        for t in insp.get_table_names()
    )

@app.post("/query")
def query(req: NLRequest):
    try:
        sql = text_to_sql(req.question, SCHEMA_STR)
        if not sql.lstrip().lower().startswith("select"):
            raise ValueError("Only SELECT statements are allowed")
        return {"sql": sql, "result": run_query(sql)}
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))
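The startswith("select") check blocks the obvious cases, but it is easy to evade (a second statement after a semicolon) and it rejects legitimate read-only queries that begin with WITH. A slightly stricter keyword-based guard, shown here only as a sketch, could look like this:

```python
def is_read_only(sql: str) -> bool:
    """Small allow-list check: one statement, SELECT/WITH only, no write keywords.
    A sketch only; a real deployment should also open the database read-only."""
    stripped = sql.strip().rstrip(";").lower()
    if ";" in stripped:  # reject multi-statement input outright
        return False
    forbidden = {"insert", "update", "delete", "drop", "alter",
                 "create", "replace", "attach", "pragma", "vacuum"}
    words = stripped.split()
    if not words or words[0] not in ("select", "with"):
        return False
    return not any(kw in words for kw in forbidden)
```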
That’s all we need for the main application. The next thing to prepare is the database. Use the schema and sample data below in init_db.sql as an example; you can always change them if you want.
DROP TABLE IF EXISTS order_items;
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS payments;
DROP TABLE IF EXISTS products;
DROP TABLE IF EXISTS customers;
CREATE TABLE customers (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
country TEXT,
signup_date DATE
);
CREATE TABLE products (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
category TEXT,
price REAL
);
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
customer_id INTEGER,
order_date DATE,
total REAL,
FOREIGN KEY (customer_id) REFERENCES customers(id)
);
CREATE TABLE order_items (
order_id INTEGER,
product_id INTEGER,
quantity INTEGER,
unit_price REAL,
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (order_id) REFERENCES orders(id),
FOREIGN KEY (product_id) REFERENCES products(id)
);
CREATE TABLE payments (
id INTEGER PRIMARY KEY,
order_id INTEGER,
payment_date DATE,
amount REAL,
method TEXT,
FOREIGN KEY (order_id) REFERENCES orders(id)
);
INSERT INTO customers (id, name, country, signup_date) VALUES
(1,'Alice','USA','2024-01-05'),
(2,'Bob','UK','2024-03-10'),
(3,'Choi','KR','2024-06-22'),
(4,'Dara','ID','2025-01-15');
INSERT INTO products (id, name, category, price) VALUES
(1,'Laptop Pro','Electronics',1500.00),
(2,'Noise-Canceling Headphones','Electronics',300.00),
(3,'Standing Desk','Furniture',450.00),
(4,'Ergonomic Chair','Furniture',250.00),
(5,'Monitor 27"','Electronics',350.00);
INSERT INTO orders (id, customer_id, order_date, total) VALUES
(1,1,'2025-02-01',1850.00),
(2,2,'2025-02-03',600.00),
(3,3,'2025-02-05',350.00),
(4,1,'2025-02-07',450.00);
INSERT INTO order_items (order_id, product_id, quantity, unit_price) VALUES
(1,1,1,1500.00),
(1,2,1,300.00),
(1,5,1,350.00),
(2,3,1,450.00),
(2,4,1,250.00),
(3,5,1,350.00),
(4,3,1,450.00);
INSERT INTO payments (id, order_id, payment_date, amount, method) VALUES
(1,1,'2025-02-01',1850.00,'Credit Card'),
(2,2,'2025-02-03',600.00,'PayPal'),
(3,3,'2025-02-05',350.00,'Credit Card'),
(4,4,'2025-02-07',450.00,'Bank Transfer');
Then run the following command in your CLI to create the SQLite database for the project.
sqlite3 demo.db < init_db.sql
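If you don't have the sqlite3 CLI installed, the same database can be created from Python's standard library. The helper below is hypothetical (not part of the project files) and assumes init_db.sql sits in the current directory:

```python
import sqlite3
from pathlib import Path

def init_database(db_path: str, sql_path: str) -> None:
    """Execute every statement in an .sql file against a SQLite database."""
    with sqlite3.connect(db_path) as conn:
        conn.executescript(Path(sql_path).read_text())

# Usage: init_database("demo.db", "init_db.sql")
```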
Once the database is ready, we will create a Dockerfile to containerize the application.
FROM python:3.12-slim
WORKDIR /code
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
We will also create a docker-compose.yml file to make running the application easier.
services:
  text2sql:
    build: .
    env_file: .env
    ports:
      - "8000:8000"
    restart: unless-stopped
    volumes:
      - ./demo.db:/code/demo.db
Once everything is ready, launch Docker Desktop and run the following commands to build and start the application.
docker compose build --no-cache
docker compose up -d
If everything went well, you can test the application with the command below. We will ask how many customers there are in the data.
curl -X POST "http://localhost:8000/query" -H "Content-Type: application/json" -d '{"question":"How many customers?"}'
The output will look like this.
{"sql":"SELECT COUNT(*) AS customer_count FROM customers;","result":[{"customer_count":4}]}
We can try something more intricate, such as the number of orders for each customer:
curl -X POST "http://localhost:8000/query" -H "Content-Type: application/json" -d '{"question":"What is the number of orders placed by each customer?"}'
The output looks like this:
{"sql":"SELECT customer_id, COUNT(*) AS number_of_orders FROM orders GROUP BY customer_id;","result":[{"customer_id":1,"number_of_orders":2},{"customer_id":2,"number_of_orders":1},{"customer_id":3,"number_of_orders":1}]}
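The same requests can be made without curl by using Python's standard library. The helper below is hypothetical and assumes the container is running on localhost:8000:

```python
import json
from urllib import request

def ask(question: str, url: str = "http://localhost:8000/query") -> dict:
    """POST a natural-language question to the /query endpoint and return its JSON reply."""
    body = json.dumps({"question": question}).encode("utf-8")
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Usage (with the container running): ask("How many customers?")
```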
That's all you need to build a basic text-to-SQL application. You can further enhance it with a front-end interface and a more intricate system tailored to your needs.
# Summary
Data is at the heart of modern business, and companies use it to make decisions. In many cases, the systems holding that data are complex, and generative AI can help us navigate them.
In this article, we learned how to create a straightforward text-to-SQL application using OpenAI, FastAPI, and SQLite.
I hope this helped!
Cornellius Yudha Wijaya is an assistant data analytics manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and written media. Cornellius writes on a variety of topics related to artificial intelligence and machine learning.
