Thursday, April 30, 2026

Using Pandas AI for data analysis


Do you work in the data field using Python? If so, I bet most of you use Pandas for data manipulation.

If you don’t know it, Pandas is an open-source Python package developed specifically for data analysis and manipulation. It is one of the most commonly used packages, and one you usually learn when starting your data science journey with Python.

What is Pandas AI? I think you are reading this article because you want to know about it.

Well, as you know, we live in times when generative artificial intelligence is everywhere. Imagine being able to analyze your data using generative AI; everything would be much simpler.

This is what Pandas AI provides. With simple prompts, we can quickly analyze and manipulate our dataset without having to send it somewhere.

In this article, we will explore how to use Pandas AI for data analysis tasks. We will cover the following:

  • Pandas AI setup
  • Data mining with Pandas AI
  • Data visualization with Pandas AI
  • Advanced usage of Pandas AI

If you’re ready to learn, let’s get started!

Pandas AI is a Python package that brings Large Language Model (LLM) capabilities to the Pandas API. We can use the standard Pandas API with a generative AI enhancement that turns Pandas into a conversational tool.

We mainly want to use Pandas AI because of the simple process the package provides. It can automatically analyze data from a simple prompt, without the need for complex code.

Enough introduction. Let’s get hands-on.

We need to install the package first before anything else.

pip install pandasai

Next, we need to set up the LLM we want to use for Pandas AI. There are several options, such as OpenAI GPT and HuggingFace. However, in this tutorial we will use OpenAI GPT.

Setting up an OpenAI model in Pandas AI is straightforward, but you will need an OpenAI API key. If you don’t have one, you can get one on the OpenAI website.

If everything is ready, let’s set up the Pandas AI LLM using the code below.

from pandasai.llm import OpenAI

llm = OpenAI(api_token="Your OpenAI API Key")
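Hardcoding the key in a script makes it easy to leak by accident. A minimal sketch of one alternative, assuming the key is stored in an environment variable: the variable name OPENAI_API_KEY and the helper read_api_key are illustrative choices for this example, not part of Pandas AI.

```python
import os


def read_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption of this sketch; use whatever
    name your own environment defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key


# Usage with the configuration above (requires pandasai to be installed):
# llm = OpenAI(api_token=read_api_key())
```

Failing early with a clear error is a deliberate choice here: a missing key would otherwise only surface later, on the first chat call.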

You can now perform data analysis with Pandas AI.

Data mining with Pandas AI

Let’s start with a sample dataset and try data mining with Pandas AI. For this example, I will use the Titanic data from the Seaborn package.

import seaborn as sns
from pandasai import SmartDataframe

data = sns.load_dataset('titanic')
df = SmartDataframe(data, config = {'llm': llm})

We need to pass the data to the Pandas AI SmartDataframe object to initialize Pandas AI. We can then perform conversational actions on our DataFrame.

Let’s try asking a simple question.

response = df.chat("""Return the survived class in percentage""")

response

The percentage of passengers who survived is: 38.38%

From the prompt alone, Pandas AI can work out a solution and answer our question.
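Because the answer comes from LLM-generated code, it is worth cross-checking a figure like this with plain Pandas now and then. The sketch below assumes a 0/1 column named survived, as in the Titanic data; the helper survival_percentage and the toy frame are hypothetical and only serve the illustration.

```python
import pandas as pd


def survival_percentage(frame: pd.DataFrame, column: str = "survived") -> float:
    """Percentage of rows where the 0/1 column equals 1, rounded to 2 decimals."""
    return round(float(frame[column].mean() * 100), 2)


# Tiny hypothetical frame standing in for the Titanic data (not real records).
toy = pd.DataFrame({"survived": [1, 0, 0, 1]})
print(survival_percentage(toy))  # 50.0
```

Calling survival_percentage(data) on the Seaborn Titanic frame should give a value to compare against the percentage Pandas AI reported.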

We can also ask Pandas AI questions whose answers come back as a DataFrame object. For example, here are several prompts for analyzing the data.

#Data Summary
summary = df.chat("""Can you get me the statistical summary of the dataset""")

#Class percentage
surv_pclass_perc = df.chat("""Return the survived in percentage breakdown by pclass""")

#Missing Data
missing_data_perc = df.chat("""Return the missing data percentage for the columns""")

#Outlier Data
outlier_fare_data = df.chat("""Please provide me the data rows that
contain outlier data based on fare column""")

Photo by the author

In the image above, you can see that Pandas AI can return information as a DataFrame object, even when the prompt is quite complex.

However, Pandas AI cannot handle overly complex calculations because the package is limited by the LLM that we pass to the SmartDataframe object. I’m sure that in the future, Pandas AI will be able to perform much more detailed analysis as LLM capabilities evolve.
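When a prompt is too much for the LLM, or when you want to check what it returned, the same questions can be answered directly in Pandas. The following is a minimal sketch of the missing-data and outlier prompts. The 1.5 × IQR rule is an assumption of this example (the prompt does not say which outlier definition the LLM will choose), and the helper names and toy values are hypothetical.

```python
import pandas as pd


def missing_percentage(frame: pd.DataFrame) -> pd.Series:
    """Percentage of missing values in each column."""
    return frame.isna().mean() * 100


def iqr_outliers(frame: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule is an assumption of this sketch, not necessarily the
    definition an LLM would pick for the same prompt.
    """
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    mask = (frame[column] < q1 - k * iqr) | (frame[column] > q3 + k * iqr)
    return frame[mask]


# Hypothetical values, not taken from the Titanic dataset.
toy = pd.DataFrame({
    "fare": [7.0, 8.0, 9.0, 10.0, 500.0],
    "age": [22.0, None, 30.0, None, 41.0],
})
print(missing_percentage(toy))
print(iqr_outliers(toy, "fare"))
```

If the rows returned by Pandas AI differ from the IQR result, that does not mean either is wrong; it may simply reflect a different outlier definition.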

Data visualization with Pandas AI

Pandas AI is useful for data mining, and it can also perform data visualization. As long as we ask for it in the prompt, Pandas AI will return the visualization.

Let’s try a simple example.

response = df.chat('Please provide me the fare data distribution visualization')

response

Photo by the author

In the example above, we ask Pandas AI to visualize the distribution of the fare column. The output is a bar chart of the fare distribution in the dataset.
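To make clear what such a distribution chart contains, here is a rough sketch of the underlying computation in plain Pandas: the values are grouped into bins and counted, and the counts become the bar heights. The bin edges, the sample fares, and the helper binned_counts are assumptions made for this illustration; the binning Pandas AI chooses may differ.

```python
import pandas as pd


def binned_counts(values: pd.Series, edges: list[float]) -> pd.Series:
    """Count how many values fall into each (left, right] bin."""
    bins = pd.cut(values, bins=edges)
    # sort_index keeps the bins in ascending order rather than by frequency.
    return bins.value_counts().sort_index()


# Hypothetical fares and bin edges chosen only for illustration.
fares = pd.Series([5.0, 7.0, 8.0, 10.0, 30.0, 80.0])
counts = binned_counts(fares, [0, 10, 50, 100])
print(counts)
# counts.plot(kind="bar") would draw the chart if matplotlib is installed.
```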

As with data mining, you can request any data visualization you want. However, Pandas AI still cannot cope with more complex visualization processes.

Here are some other examples of data visualization using Pandas AI.

kde_plot = df.chat("""Please plot the kde distribution of age column and separate them with survived column""")

box_plot = df.chat("""Return me the box plot visualization of the age column separated by sex""")

heat_map = df.chat("""Give me heat map plot to visualize the numerical columns correlation""")

count_plot = df.chat("""Visualize the categorical column sex and survived""")

Photo by the author

The plots look nice and neat. You can keep asking Pandas AI for more details if needed.

Advanced usage of Pandas AI

We can use several built-in Pandas AI APIs to improve the Pandas AI experience.

Clearing the cache

By default, all prompts and results from a Pandas AI object are stored in a local directory to reduce processing time and the time it takes for Pandas AI to invoke the model.

However, this cache can sometimes make the Pandas AI result irrelevant because it reuses past results. Therefore, it is good practice to clear the cache. You can clear it with the following code.

import pandasai as pai
pai.clear_cache()

You can also disable the cache from the start.

df = SmartDataframe(data, config={"enable_cache": False, "llm": llm})

This way, no prompts or results are stored from the very beginning.
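To see why a cache can return an outdated answer, consider the toy model below. It is purely conceptual and is not how Pandas AI implements its cache: the class PromptCache and the stand-in fake_model are hypothetical names invented for this sketch. The point is that a repeated prompt is served from storage without calling the model again, until the cache is cleared.

```python
import hashlib


class PromptCache:
    """Toy in-memory cache keyed by a hash of the prompt text.

    Illustrative only: this is not how Pandas AI implements its cache.
    """

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get_or_compute(self, prompt: str, compute) -> str:
        key = self._key(prompt)
        if key not in self._store:
            # Only call the (expensive) model on a cache miss.
            self._store[key] = compute(prompt)
        return self._store[key]

    def clear(self) -> None:
        self._store.clear()


calls = []


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call; records each invocation."""
    calls.append(prompt)
    return f"answer #{len(calls)}"


cache = PromptCache()
first = cache.get_or_compute("average fare", fake_model)
second = cache.get_or_compute("average fare", fake_model)  # served from cache
cache.clear()
third = cache.get_or_compute("average fare", fake_model)   # recomputed
print(first, second, third, sep=" | ")
```

In this toy version the key depends only on the prompt, so a change in the underlying data would go unnoticed. That is the kind of situation where clearing or disabling the cache helps.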

Custom head

It is possible to pass a sample head DataFrame to Pandas AI. This is helpful if you don’t want to share some private data with the LLM or just want to provide an example to Pandas AI.

To do this, you can use the code below.

from pandasai import SmartDataframe
import pandas as pd

# head df
head_df = data.sample(5)

df = SmartDataframe(data, config={
    "custom_head": head_df,
    'llm': llm
})
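Note that data.sample(5) still contains real rows. If the goal is privacy, one option is to build the head from placeholder values instead. The sketch below is one possible approach under that assumption: the helper masked_head and the sample records are hypothetical, and it is only meant for text-like columns.

```python
import pandas as pd


def masked_head(frame: pd.DataFrame, sensitive: list[str], n: int = 5) -> pd.DataFrame:
    """Return the first n rows with sensitive columns replaced by placeholders."""
    head = frame.head(n).copy()  # copy so the original frame is never modified
    for col in sensitive:
        # Keep the column name but swap real values for synthetic labels.
        head[col] = [f"{col}_{i}" for i in range(len(head))]
    return head


# Hypothetical records used only to demonstrate the helper.
people = pd.DataFrame({
    "name": ["Ana", "Budi", "Chen"],
    "salary": [5000, 6000, 4500],
})
head_df = masked_head(people, sensitive=["name"], n=2)
print(head_df)
```

The resulting head_df can then be passed as "custom_head" in the same way as above. For numeric columns, placeholder strings would change the column type, so synthetic numbers would be a better fit there.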

Pandas AI skills and agents

Pandas AI allows users to pass an example function, called a skill, and have it executed based on the Agent’s decision. For example, the code below works with two different DataFrames, and we pass a sample plotting function for the Pandas AI agent to execute.

import pandas as pd
from pandasai import Agent
from pandasai.skills import skill

employees_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
}

salaries_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)

# The function docstring gives the model more context on how to use this skill
@skill
def plot_salaries(names: list[str], salaries: list[int]):
    """
    Displays a bar chart with names on the x-axis and salaries on the y-axis
    Args:
        names (list[str]): Employees' names
        salaries (list[int]): Salaries
    """
    # plot bars
    import matplotlib.pyplot as plt

    plt.bar(names, salaries)
    plt.xlabel("Employee Name")
    plt.ylabel("Salary")
    plt.title("Employee Salaries")
    plt.xticks(rotation=45)

    # Add the salary value above each bar
    for i, salary in enumerate(salaries):
        plt.text(i, salary + 1000, str(salary), ha="center", va="bottom")
    plt.show()


agent = Agent([employees_df, salaries_df], config = {'llm': llm})
agent.add_skills(plot_salaries)

response = agent.chat("Plot the employee salaries against names")

The agent decides whether or not it should use the function we assigned to Pandas AI.

The combination of skills and agent provides more controlled DataFrame analysis results.
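For reference, this is roughly the manual work the agent has to figure out before it can call the skill: join the two tables on their shared key and pull out the two lists. The code the agent actually generates is not shown in this article and may look different; this is only a sketch of the equivalent Pandas steps.

```python
import pandas as pd

employees_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
})
salaries_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
})

# Join on the shared key so each name lines up with its salary.
merged = employees_df.merge(salaries_df, on="EmployeeID", how="inner")

names = merged["Name"].tolist()
salaries = merged["Salary"].tolist()
print(list(zip(names, salaries)))
# plot_salaries(names, salaries) would then draw the chart.
```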

We have seen how easy it is to use Pandas AI to support our data analysis. By using the power of the LLM, we can reduce some of the coding work related to data analysis and instead focus on critical work.

In this article, we learned how to set up Pandas AI, perform data mining and visualization with Pandas AI, and use some of its advanced features. There’s a lot more you can do with the package, so visit its documentation to find out more.

Cornellius Yudha Vijaya is an assistant data analytics manager and data writer. While working full time at Allianz Indonesia, he loves sharing Python and data tips through social media and writing media. Cornellius writes on a variety of topics related to artificial intelligence and machine learning.
