Photo by the editor# Entry
In this digital era, data has become an easier commodity to store. With so much data for enterprises, analyzing it to gain insights has become more significant than ever.
In most enterprises, data is stored in a structured database and SQL is used to obtain data. Thanks to SQL, we can query data in any form, as long as the script is valid.
The problem is that sometimes the query to get the data we need is sophisticated and non-dynamic. In this case, we can utilize SQL stored procedures to simplify tedious scripts into elementary calls.
This article discusses creating data analysis automation scripts using SQL stored procedures.
Inquisitive? Here’s how to do it.
# SQL Stored Procedures
Therefore, it is worth learning SQL stored procedures that allow you to simplify your code and automate repetitive tasks.
Let’s try it out with an example. In this tutorial I will utilize MySQL for database i stock market data from Kaggle for a table example. Set up MySQL Workbench on your local machine and create a schema in which we can store the table. In my example, I created a database called finance_db with the so-called table stock_data.
We can query the data using something like below.
USE finance_db;
SELECT * FROM stock_data;
Generally, a stored procedure has the following structure.
DELIMITER $$
CREATE PROCEDURE procedure_name(param_1, param_2, . . ., param_n)
BEGIN
instruct_1;
instruct_2;
. . .
instruct_n;
END $$
DELIMITER ;
As you can see, the stored procedure can receive parameters passed to our query.
Let’s analyze the actual implementation. For example, we can create a stored procedure to aggregate stock metrics for a specific date range.
USE finance_db;
DELIMITER $$
CREATE PROCEDURE AggregateStockMetrics(
IN p_StartDate DATE,
IN p_EndDate DATE
)
BEGIN
SELECT
COUNT(*) AS TradingDays,
AVG(Close) AS AvgClose,
MIN(Low) AS MinLow,
MAX(High) AS MaxHigh,
SUM(Volume) AS TotalVolume
FROM stock_data
WHERE
(p_StartDate IS NULL OR Date >= p_StartDate)
AND (p_EndDate IS NULL OR Date <= p_EndDate);
END $$
DELIMITER ;
In the above query, we have created a stored procedure called AggregateStockMetrics. This procedure takes the start and end dates as parameters. The parameters are then used as conditions to filter the data.
You can call a stored procedure like this:
CALL AggregateStockMetrics('2015-01-01', '2015-12-31');
The procedure will be performed with the parameters provided by us. Because a stored procedure is stored in the database, it can be used from any script that connects to the database that contains the procedure.
With stored procedures, we can easily reuse logic in other environments. For example, I will call a procedure from Python using a MySQL connector.
To do this, first install the library:
pip install mysql-connector-python
Then create a function that connects to the database, calls the stored procedure, retrieves the result, and closes the connection.
import mysql.connector
def call_aggregate_stock_metrics(start_date, end_date):
cnx = mysql.connector.connect(
user="your_username",
password='your_password',
host="localhost",
database="finance_db"
)
cursor = cnx.cursor()
try:
cursor.callproc('AggregateStockMetrics', [start_date, end_date])
results = []
for result in cursor.stored_results():
results.extend(result.fetchall())
return results
finally:
cursor.close()
cnx.close()
The result will be similar to the result below.
[(39, 2058.875660431691, 1993.260009765625, 2104.27001953125, 140137260000.0)]
That's everything you need to know about SQL Stored Procedures. You can extend this further for automation by using a scheduler in your pipeline.
# Summary
SQL stored procedures enable you to encapsulate sophisticated queries into lively, single-unit functions that can be reused for repeatable data analysis tasks. The procedures are stored in the database and can be easily accessed through various scripts or applications such as Python.
I hope this helped!
Cornelius Yudha Vijaya is an assistant data analytics manager and data writer. Working full time at Allianz Indonesia, he loves sharing Python tips and data through social media and writing media. Cornellius writes on a variety of topics related to artificial intelligence and machine learning.
