
Photo by the author | Canva
According to a data science report from Anaconda, data scientists spend almost 60% of their time cleaning and organizing data. These are routine, time-consuming tasks, which makes them ideal candidates for ChatGPT to take over.
In this article, we will examine five routine tasks that ChatGPT can handle, including data cleaning and organization, if you apply the right prompts. To show how it works in practice, we will use a real data project from Gett, a London-based ride-hailing app similar to Uber, that has been used in its recruitment process.
Case Study: Analyzing Gett's Failed Ride Orders
In this data project, Gett asks you to analyze unsuccessful rider orders, examining key matching metrics to understand why some customers did not get a car.
Here is a description of the data.

Now let’s explore the data by uploading it to ChatGPT.
In the next five steps, we will walk through the routine tasks ChatGPT can handle in this data project. The steps are shown below.

Step 1: Data Exploration and Analysis
In data exploration, we use the same functions every time, such as head(), info(), or describe().
When prompting ChatGPT, we will include these key functions in the prompt. We will also paste the project description and attach the dataset.

We will use the prompt below. Just replace the text inside the square brackets with the project description. You can find the project description here:
Here is the data project description: [paste here]
Perform basic EDA, show head, info, and summary stats, missing values, and correlation heatmap.
Here is the output.

As you can see, ChatGPT summarizes the dataset, highlighting key columns and missing values, and then creates a correlation heatmap to explore relationships between variables.
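For reference, the code ChatGPT runs behind the scenes looks roughly like the sketch below. The tiny DataFrame here is a synthetic stand-in (the column names are borrowed from the Gett data description), not the real dataset.

```python
import pandas as pd

# Synthetic stand-in for the Gett orders data; values are illustrative
df = pd.DataFrame({
    "order_gk": [1, 2, 3, 4],
    "order_status_key": [4, 9, 4, 9],
    "m_order_eta": [60.0, None, 120.0, None],
    "cancellations_time_in_seconds": [46.0, 128.0, None, 300.0],
})

print(df.head())                     # first rows
df.info()                            # dtypes and non-null counts
print(df.describe())                 # summary statistics
print(df.isna().sum())               # missing values per column
corr = df.corr(numeric_only=True)    # correlation matrix (plot it as a heatmap)
print(corr)
```

On the real dataset you would load the CSV with pd.read_csv instead of building the DataFrame by hand.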
Step 2: Data cleaning
Both datasets contain missing values.

Let’s write a prompt to handle them.
Clean this dataset: identify and handle missing values appropriately (e.g., drop or impute based on context). Provide a summary of the cleaning steps.
Here is a summary of what ChatGPT did:

ChatGPT converted the date column, dropped the incorrect orders, and imputed the missing values in m_order_eta.
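A minimal sketch of those three cleaning steps is below. The column names and values are illustrative stand-ins, and the median imputation is one reasonable choice; ChatGPT's exact strategy may differ.

```python
import pandas as pd

# Illustrative stand-in: a date column with one unparseable entry and a missing ETA
df = pd.DataFrame({
    "order_datetime": ["2023-01-01 18:08:07", "2023-01-01 20:57:32", "not a date"],
    "m_order_eta": [60.0, None, 120.0],
})

# 1. Convert the date column; unparseable entries become NaT
df["order_datetime"] = pd.to_datetime(df["order_datetime"], errors="coerce")

# 2. Drop the incorrect orders (rows whose datetime failed to parse)
df = df.dropna(subset=["order_datetime"]).reset_index(drop=True)

# 3. Impute missing ETAs with the median (assumed strategy)
df["m_order_eta"] = df["m_order_eta"].fillna(df["m_order_eta"].median())

print(df)
```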
Step 3: Generate visualizations
To make full use of your data, it is critical to visualize the right things. Instead of generating random charts, we can guide ChatGPT by giving it a link to a reference source, a technique known as retrieval-augmented generation.
We will use this article. Here is the prompt:
Before generating visualizations, read this article on choosing the right plots for different data types and distributions: [LINK]. Then, show the most suitable visualizations for this dataset, explain why each was selected, and produce the plots in this chat by running code on the dataset.
Here is the output.

Here are the six different charts we generated with ChatGPT.

For each chart, you can see the plot itself along with an explanation of why it was chosen.
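The idea of matching plot type to data type can be sketched in a few lines of matplotlib. The data below is a synthetic stand-in, and the chart choices (bar chart for a categorical column, histogram for a numeric one) are generic examples, not necessarily the six charts ChatGPT produced.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Synthetic stand-in; the real charts come from the full Gett dataset
df = pd.DataFrame({
    "order_status_key": [4, 9, 4, 9, 9, 4],
    "m_order_eta": [60, 120, 90, 300, 45, 150],
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
# Categorical column -> bar chart of counts
df["order_status_key"].value_counts().plot.bar(ax=axes[0], title="Orders by status")
# Numeric column -> histogram of its distribution
df["m_order_eta"].plot.hist(ax=axes[1], bins=5, title="ETA distribution")
fig.tight_layout()
fig.savefig("failed_orders_overview.png")
```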
Step 4: Prepare Your Dataset for Machine Learning
Now that we have handled the missing values and explored the dataset, the next step is to prepare it for machine learning. This involves steps such as encoding categorical variables and scaling numerical features.
Here is our prompt.
Prepare this dataset for machine learning: encode categorical variables, scale numerical features, and return a clean DataFrame ready for modeling. Explain each step briefly.
Here is the output.

Now your features have been scaled and encoded, so your dataset is ready for a machine learning model.
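Encoding and scaling boil down to a couple of pandas and scikit-learn calls, sketched below on a synthetic stand-in (the column names mirror the Gett description but the values are made up).

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Illustrative stand-in; the real features come from the cleaned Gett dataset
df = pd.DataFrame({
    "is_driver_assigned_key": [1, 0, 1, 0],     # categorical feature
    "m_order_eta": [60.0, 120.0, 90.0, 300.0],  # numerical feature
    "order_status_key": [4, 9, 4, 9],           # target, left untouched
})

# One-hot encode categorical features (drop_first avoids a redundant column)
df = pd.get_dummies(df, columns=["is_driver_assigned_key"], drop_first=True)

# Standardize numerical features to zero mean and unit variance
scaler = StandardScaler()
df[["m_order_eta"]] = scaler.fit_transform(df[["m_order_eta"]])

print(df.head())
```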
Step 5: Applying a Machine Learning Model
Let’s get to machine learning modeling. We will use the following prompt structure to apply a basic machine learning model.
Use this dataset to predict [target variable]. Use [model type] and report evaluation metrics such as [accuracy, precision, recall, F1-score]. Use the 5 most appropriate features and explain your modeling steps.
Let’s update this prompt based on our project.
Use this dataset to predict order_status_key. Apply a multi-class classification model (e.g., random forest) and report evaluation metrics such as accuracy, precision, recall, and F1-score. Use only the 5 most appropriate features and explain your modeling steps.
Now paste it into the ongoing conversation and review the results.
Here is the output.

As you can see, the model performed well. Maybe too well?
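What ChatGPT runs at this step looks roughly like the scikit-learn sketch below. The features and target here are random synthetic stand-ins, so the scores will be unremarkable; on the real data, the target is order_status_key.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in: random features and a binary stand-in target
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "m_order_eta": rng.normal(120, 30, n),
    "is_driver_assigned_key": rng.integers(0, 2, n),
    "cancellations_time_in_seconds": rng.normal(100, 20, n),
})
y = rng.integers(0, 2, n)  # stand-in for order_status_key

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, pred):.2f}")
print(classification_report(y_test, pred))
```

A suspiciously perfect score usually signals data leakage: for example, a feature such as cancellation time that only exists for cancelled orders effectively encodes the target and should be dropped.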
Bonus: Gemini CLI
Google launched Gemini CLI, an open-source agent that you can interact with from the terminal. You can install it with the code below. (It offers 60 model requests per minute and 1,000 requests per day free of charge.)
In addition to ChatGPT, you can also use Gemini CLI to support routine data science tasks such as cleaning and exploration, and even to build an app that automates these tasks.
Gemini CLI provides a simple command-line interface and is available at no cost. Let’s start by installing it using the code below.
sudo npm install -g @google/gemini-cli
After running the above command, open your terminal and run gemini to start building with it.
You will then see the Gemini CLI as shown in the screenshot below.

Gemini CLI allows you to run code, ask questions, and even create applications directly from the terminal. In this case, we will use Gemini CLI to build an app that automates everything we have done so far: EDA, cleaning, visualization, and modeling.
To build this app, we will use a prompt that covers all the steps. It is shown below.
Build an app that automates EDA, cleans the data, creates automatic data visualizations, prepares the dataset for machine learning, and applies a machine learning model after the user selects the target variable.
It will ask you for approval when creating directories or running code in the terminal.

After a few approval steps like these, the app will be ready, as shown below.

Now let’s check it out.


Final thoughts
In this article, we first used ChatGPT to support routine tasks such as data cleaning, exploration, and visualization. Then we went a step further, using it to prepare our dataset for machine learning and to apply a machine learning model.
Finally, we used Gemini CLI to create a solution that performs all these steps with just a few clicks.
To demonstrate all of this, we used a data project from Gett. Although AI is not yet completely reliable for every task, you can use it to handle routine tasks and save a lot of time.
Nate Rosidi is a data scientist and works in product strategy. He is also an adjunct professor teaching analytics, and the founder of StrataScratch, a platform helping data scientists prepare for interviews with real questions from top companies. Nate writes about the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.
