
Photo by the author | Canva
According to a data science report from Anaconda, data scientists spend almost 60% of their time cleaning and organizing data. These are routine, time-consuming tasks, which makes them ideal candidates for ChatGPT to take over.
In this article, we will examine five routine tasks that ChatGPT can handle, including data cleaning and organization, if you apply the right prompts. To show how it works in practice, we will use a real data project from Gett, a London-based ride-hailing app similar to Uber, that has been used in its recruitment process.
Case Study: Analyzing Gett's Failed Ride Orders
In this data project, Gett asks you to analyze unsuccessful rider orders, examining key matching metrics to understand why some customers did not get a car.
Here is a description of the data.

Now let’s explore the data by uploading it to ChatGPT.
In the next five steps, we will walk through the routine tasks ChatGPT can handle in this data project. The steps are shown below.

Step 1: Data Exploration and Analysis
In data exploration, we use the same functions every time, such as head(), info(), or describe().
When prompting ChatGPT, we will include these key functions in the prompt. We will also paste the project description and attach the dataset.

We will use the prompt below. Just replace the text inside the square brackets with the project description. You can find the project description here:
Here is the data project description: [paste here]
Perform basic EDA, show head, info, and summary stats, missing values, and correlation heatmap.
Here is the output.

As you can see, ChatGPT summarizes the dataset, highlighting key columns and missing values, and then creates a correlation heatmap to explore relationships between variables.
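For reference, the code ChatGPT runs behind the scenes looks roughly like the sketch below. The tiny DataFrame here is a synthetic stand-in (the column names are borrowed from the Gett data description), not the real dataset.

```python
import pandas as pd

# Synthetic stand-in for the Gett orders data; values are illustrative
df = pd.DataFrame({
    "order_gk": [1, 2, 3, 4],
    "order_status_key": [4, 9, 4, 9],
    "m_order_eta": [60.0, None, 120.0, None],
    "cancellations_time_in_seconds": [46.0, 128.0, None, 300.0],
})

print(df.head())                     # first rows
df.info()                            # dtypes and non-null counts
print(df.describe())                 # summary statistics
print(df.isna().sum())               # missing values per column
corr = df.corr(numeric_only=True)    # correlation matrix (plot it as a heatmap)
print(corr)
```

On the real dataset you would load the CSV with pd.read_csv instead of building the DataFrame by hand.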
Step 2: Data cleaning
Both datasets contain missing values.

Let’s write a prompt to handle them.
Clean this dataset: identify and handle missing values appropriately (e.g., drop or impute based on context). Provide a summary of the cleaning steps.
Here is a summary of what ChatGPT did:

ChatGPT converted the date column, dropped the incorrect orders, and imputed the missing values in m_order_eta.
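A minimal sketch of those three cleaning steps is below. The column names and values are illustrative stand-ins, and the median imputation is one reasonable choice; ChatGPT's exact strategy may differ.

```python
import pandas as pd

# Illustrative stand-in: a date column with one unparseable entry and a missing ETA
df = pd.DataFrame({
    "order_datetime": ["2023-01-01 18:08:07", "2023-01-01 20:57:32", "not a date"],
    "m_order_eta": [60.0, None, 120.0],
})

# 1. Convert the date column; unparseable entries become NaT
df["order_datetime"] = pd.to_datetime(df["order_datetime"], errors="coerce")

# 2. Drop the incorrect orders (rows whose datetime failed to parse)
df = df.dropna(subset=["order_datetime"]).reset_index(drop=True)

# 3. Impute missing ETAs with the median (assumed strategy)
df["m_order_eta"] = df["m_order_eta"].fillna(df["m_order_eta"].median())

print(df)
```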
Step 3: Generate visualizations
To make full use of your data, it is critical to visualize the right things. Instead of generating random charts, we can guide ChatGPT by giving it a link to a reference source, a technique known as retrieval-augmented generation.
We will use this article. Here is the prompt:
Before generating visualizations, read this article on choosing the right plots for different data types and distributions: [LINK]. Then, show the most suitable visualizations for this dataset, explain why each was selected, and produce the plots in this chat by running code on the dataset.
Here is the output.

Here are the six different charts we generated with ChatGPT.

For each chart, you can see the plot itself along with an explanation of why it was chosen.
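The idea of matching plot type to data type can be sketched in a few lines of matplotlib. The data below is a synthetic stand-in, and the chart choices (bar chart for a categorical column, histogram for a numeric one) are generic examples, not necessarily the six charts ChatGPT produced.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Synthetic stand-in; the real charts come from the full Gett dataset
df = pd.DataFrame({
    "order_status_key": [4, 9, 4, 9, 9, 4],
    "m_order_eta": [60, 120, 90, 300, 45, 150],
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
# Categorical column -> bar chart of counts
df["order_status_key"].value_counts().plot.bar(ax=axes[0], title="Orders by status")
# Numeric column -> histogram of its distribution
df["m_order_eta"].plot.hist(ax=axes[1], bins=5, title="ETA distribution")
fig.tight_layout()
fig.savefig("failed_orders_overview.png")
```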
Step 4: Prepare Your Dataset for Machine Learning
Now that we have handled the missing values and explored the dataset, the next step is to prepare it for machine learning. This involves steps such as encoding categorical variables and scaling numerical features.
Here is our prompt.
Prepare this dataset for machine learning: encode categorical variables, scale numerical features, and return a clean DataFrame ready for modeling. Explain each step briefly.
Here is the output.

Now your features have been scaled and encoded, so your dataset is ready for a machine learning model.
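Encoding and scaling boil down to a couple of pandas and scikit-learn calls, sketched below on a synthetic stand-in (the column names mirror the Gett description but the values are made up).

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Illustrative stand-in; the real features come from the cleaned Gett dataset
df = pd.DataFrame({
    "is_driver_assigned_key": [1, 0, 1, 0],     # categorical feature
    "m_order_eta": [60.0, 120.0, 90.0, 300.0],  # numerical feature
    "order_status_key": [4, 9, 4, 9],           # target, left untouched
})

# One-hot encode categorical features (drop_first avoids a redundant column)
df = pd.get_dummies(df, columns=["is_driver_assigned_key"], drop_first=True)

# Standardize numerical features to zero mean and unit variance
scaler = StandardScaler()
df[["m_order_eta"]] = scaler.fit_transform(df[["m_order_eta"]])

print(df.head())
```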
Step 5: Applying a Machine Learning Model
Let’s get to machine learning modeling. We will use the following prompt structure to apply a basic machine learning model.
Use this dataset to predict [target variable]. Use [model type] and report evaluation metrics such as [accuracy, precision, recall, F1-score]. Use the 5 most appropriate features and explain your modeling steps.
Let’s update this prompt based on our project.
Use this dataset to predict order_status_key. Apply a multi-class classification model (e.g., random forest) and report evaluation metrics such as accuracy, precision, recall, and F1-score. Use only the 5 most appropriate features and explain your modeling steps.
Now paste it into the ongoing conversation and review the results.
Here is the output.

As you can see, the model performed well. Maybe too well?
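What ChatGPT runs at this step looks roughly like the scikit-learn sketch below. The features and target here are random synthetic stand-ins, so the scores will be unremarkable; on the real data, the target is order_status_key.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in: random features and a binary stand-in target
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "m_order_eta": rng.normal(120, 30, n),
    "is_driver_assigned_key": rng.integers(0, 2, n),
    "cancellations_time_in_seconds": rng.normal(100, 20, n),
})
y = rng.integers(0, 2, n)  # stand-in for order_status_key

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, pred):.2f}")
print(classification_report(y_test, pred))
```

A suspiciously perfect score usually signals data leakage: for example, a feature such as cancellation time that only exists for cancelled orders effectively encodes the target and should be dropped.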
Bonus: Gemini CLI
Google launched Gemini CLI, an open-source agent that you can interact with from the terminal. You can install it with the code below. (It offers 60 model requests per minute and 1,000 requests per day free of charge.)
In addition to ChatGPT, you can also use Gemini CLI to support routine data science tasks such as cleaning and exploration, and even to build an app that automates these tasks.
Gemini CLI provides a simple command-line interface and is available at no cost. Let’s start by installing it using the code below.
sudo npm install -g @google/gemini-cli
After running the above command, open your terminal and run gemini to start building with it.
You will then see the Gemini CLI as shown in the screenshot below.

Gemini CLI allows you to run code, ask questions, and even create applications directly from the terminal. In this case, we will use Gemini CLI to build an app that automates everything we have done so far: EDA, cleaning, visualization, and modeling.
To build this app, we will use a prompt that covers all the steps. It is shown below.
Build an app that automates EDA, cleans the data, creates automatic data visualizations, prepares the dataset for machine learning, and applies a machine learning model after the user selects the target variable.
It will ask you for approval when creating directories or running code in the terminal.

After a few approval steps like these, the app will be ready, as shown below.

Now let’s check it out.


Final thoughts
In this article, we first used ChatGPT to support routine tasks such as data cleaning, exploration, and visualization. Then we went a step further, using it to prepare our dataset for machine learning and to apply a machine learning model.
Finally, we used Gemini CLI to create a solution that performs all these steps with just a few clicks.
To demonstrate all of this, we used a data project from Gett. Although AI is not yet completely reliable for every task, you can use it to handle routine tasks and save a lot of time.
Nate Rosidi is a data scientist and works in product strategy. He is also an adjunct professor teaching analytics, and the founder of StrataScratch, a platform helping data scientists prepare for interviews with real questions from top companies. Nate writes about the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.
