How to Build a Lightweight Data Pipeline with Airtable and Python

Image by Editor | ChatGPT

# Introduction

Airtable offers not only a flexible, spreadsheet-like interface for storing and analyzing data, but also an API for programmatic interaction. In other words, you can connect it to external tools and technologies, such as Python, to build data pipelines or processing workflows that write their results back to your Airtable database (or simply “base”, in Airtable jargon).

This article shows how to create a simple, ETL-like pipeline in Python. We will stick to the free tier, so the approach works without paid features.

# Airtable Dataset Setup

While the pipeline built in this article can be easily adapted to other datasets, readers who are new to Airtable and need a project and a stored dataset as a starting point can follow the recent introductory tutorial to Airtable and create a tabular dataset called “Customers”, containing 200 rows and the following columns (see image):

Customers dataset/table in Airtable | Image by Author

# Airtable-Python Data Pipeline

In Airtable, click your user avatar (at the time of writing, a circular avatar in the bottom-left corner of the app interface) and select “Builder Hub”. On the new screen (see screenshot below), click “Personal access tokens” and then “Create token”. Give it a name and make sure you add at least these two scopes: data.records:read and data.records:write. Likewise, select the base that contains your customers table in the “Access” section, so that your token is granted access to this base.

Creating the Airtable API token | Image by Author

Once the token is created, copy it and store it carefully in a safe place, because it will be shown only once. We will need it later. The token starts with pat, followed by a long alphanumeric code.

Another key piece of information we need to build a Python pipeline that interacts with Airtable is the ID of our base. Go back to your base in the Airtable web interface, and you should see that its URL in the browser has a syntax like: https://airtable.com/app[xxxxxx]/xxxx/xxxx. The part we are interested in copying is the app[xxxx] ID contained between two consecutive slashes (/): this is the base ID we need.
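If you prefer not to copy the ID by hand, the extraction can be sketched in a few lines of Python. This is only an illustration: the helper name extract_base_id, the regular expression, and the sample URL are assumptions made for this example, not part of Airtable's tooling.

```python
import re

# Assumption: a base ID is the path segment that starts with "app" and
# consists of letters and digits. The exact length is not checked here.
BASE_ID_PATTERN = re.compile(r"airtable\.com/(app[A-Za-z0-9]+)(?=[/?#]|$)")


def extract_base_id(url: str) -> str:
    """Return the 'app...' segment of an Airtable base URL (hypothetical helper)."""
    match = BASE_ID_PATTERN.search(url)
    if match is None:
        raise ValueError(f"No base ID found in URL: {url!r}")
    return match.group(1)


# Made-up URL, for illustration only
print(extract_base_id("https://airtable.com/appEXAMPLE123/tblEXAMPLE456/viwEXAMPLE789"))
# prints: appEXAMPLE123
```

Raising an error on a URL without an app segment makes a copy-paste mistake visible before any API call is attempted.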

With the base ID at hand, and assuming you already have a populated table called “Customers” in your base, we are ready to start our Python program. I will use a notebook to code it. If you use an IDE, you may need to change the part where the three environment variables are defined so that they are read from a .env file instead. In this version, for simplicity and ease of illustration, we will define them directly in our notebook. Let’s start by installing the necessary dependencies:

!pip install pyairtable python-dotenv

Next, we define the Airtable environment variables. Note that for the first two, you need to replace the placeholder value with your actual access token and base ID, respectively:

import os
from dotenv import load_dotenv # Necessary only if reading variables from a .env file
from pyairtable import Api, Table
import pandas as pd

PAT = "pat-xxx" # Your PAT (Personal Access Token) is pasted here
BASE_ID = "app-xxx" # Your Airtable Base ID is pasted here
TABLE_NAME = "Customers"

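api = Api(PAT)
# Build the table handle from the Api instance; Table(PAT, BASE_ID, TABLE_NAME)
# is the legacy constructor in recent pyairtable releases
table = api.table(BASE_ID, TABLE_NAME)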
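For readers working in an IDE rather than a notebook, here is a minimal sketch of the .env variant mentioned earlier. The variable names AIRTABLE_PAT, AIRTABLE_BASE_ID and AIRTABLE_TABLE_NAME, as well as the helper read_airtable_settings, are assumptions chosen for this example; only load_dotenv comes from the python-dotenv package installed above.

```python
import os

try:
    # load_dotenv() copies the entries of a .env file into os.environ
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # fall back to variables already exported in the shell


def read_airtable_settings(env=os.environ):
    """Hypothetical helper: collect the three settings and fail early if one is missing."""
    names = ("AIRTABLE_PAT", "AIRTABLE_BASE_ID", "AIRTABLE_TABLE_NAME")  # assumed names
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    pat, base_id, table_name = (env[name] for name in names)
    # Light sanity checks based on the prefixes described in this article
    if not pat.startswith("pat"):
        raise ValueError("AIRTABLE_PAT should start with 'pat'")
    if not base_id.startswith("app"):
        raise ValueError("AIRTABLE_BASE_ID should start with 'app'")
    return pat, base_id, table_name


# Usage (commented out so the sketch runs without real credentials):
# PAT, BASE_ID, TABLE_NAME = read_airtable_settings()
```

Keeping the token out of the source file also means it does not end up in version control by accident.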

With the Airtable Python API instance set up and a connection point to the customers table of our base in place, we can now read the entire dataset contained in our Airtable table and load it into a Pandas DataFrame. Just be careful to use the exact column names from the source table as the string arguments inside the get() method calls:

rows = []
for rec in table.all():  # honors 5 rps; auto-retries on 429s
    fields = rec.get("fields", {})
    rows.append({
        "id": rec["id"],
        "CustomerID": fields.get("CustomerID"),
        "Gender": fields.get("Gender"),
        "Age": fields.get("Age"),
        "Annual Income (k$)": fields.get("Annual Income (k$)"),
        "Spending Score (1-100)": fields.get("Spending Score (1-100)"),
        "Income class": fields.get("Income Class"),
    })

df = pd.DataFrame(rows)
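To see why the loop relies on get() rather than direct indexing, it helps to look at the shape of a record. The sketch below uses made-up records and a hypothetical flatten helper; it assumes, as the Airtable API behaves to my knowledge, that empty cells are simply left out of the "fields" dictionary.

```python
# Made-up records in the shape the loop above iterates over:
# an "id" plus a "fields" dictionary keyed by column name.
sample_records = [
    {"id": "recA", "fields": {"CustomerID": 1, "Age": 19, "Annual Income (k$)": 15}},
    {"id": "recB", "fields": {"CustomerID": 2, "Age": 21}},  # income cell is empty
]

COLUMNS = ["CustomerID", "Age", "Annual Income (k$)"]


def flatten(record, columns):
    """Hypothetical helper: one flat dict per record, None for absent fields."""
    fields = record.get("fields", {})
    row = {"id": record["id"]}
    for column in columns:
        row[column] = fields.get(column)  # .get() avoids a KeyError on empty cells
    return row


sample_rows = [flatten(record, COLUMNS) for record in sample_records]
print(sample_rows[1])
# prints: {'id': 'recB', 'CustomerID': 2, 'Age': 21, 'Annual Income (k$)': None}
```

A misspelled column name behaves exactly like an empty cell here: the value silently becomes None, which is why the column names must match the source table character for character.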

Once the data has been loaded, it is time to apply a simple transformation. For simplicity, we will apply just one transformation, but we could apply as many as needed, just as we usually do when preprocessing or cleaning datasets with Pandas. We will create a new binary attribute, called Is High Value, to mark high-value customers, i.e. those whose income and spending score are both high:

def high_value(row):
    try:
        return (row["Spending Score (1-100)"] >= 70) and (row["Annual Income (k$)"] >= 70)
    except TypeError:
        return False

df["Is High Value"] = df.apply(high_value, axis=1)
df.head()

Resulting data set:

Data transformation with Python and Pandas | Image by Author

Finally, it is time to write the changes back to Airtable, including the new data for the new column. There is one small caveat: we first need to manually create a new column named “High Value” in our Airtable customers table, with its type set to “Checkbox” (the equivalent of a binary categorical attribute). Once this blank column has been created, run the following code in your Python program, and the new data will be automatically added to Airtable!

updates = []
for _, r in df.iterrows():
    if pd.isna(r["id"]):
        continue
    updates.append({
        "id": r["id"],
        "fields": {
            "High Value": bool(r["Is High Value"])
        }
    })

if updates:
    table.batch_update(updates)

Time to go back to Airtable and see what changed in our source customers table! If at first glance you see no changes and the new column still looks empty, don’t panic yet. Not many customers are labeled as “high value”, and you may need to scroll down a little to see some marked with a green tick:

Updated customers table | Image by Author

That’s it! You just built your own lightweight, ETL-like data pipeline based on a two-way interaction between Airtable and Python. Well done!

# Wrapping Up

This article focused on showcasing the data capabilities of Airtable, a versatile and user-friendly platform for data management and analysis that combines features of spreadsheets and relational databases with AI-powered functions. In particular, we showed how to run a lightweight data transformation pipeline with the Airtable Python API that reads data from Airtable, transforms it, and loads it back into Airtable, all within the capabilities and limitations of Airtable’s free version.

Iván Palomares Carrascosa is a leader, writer, speaker, and adviser in AI, machine learning, deep learning, and LLMs. He trains and guides others in harnessing AI in the real world.
