10 Useful Python One-Liners for CSV Processing


# Introduction

CSV files are everywhere in data workflows, from database exports to API responses and spreadsheet downloads. While pandas works great, sometimes you need quick solutions that you can code with Python without having to install pandas.

Python’s built-in csv module combined with list comprehensions and generator expressions can handle the most common CSV tasks in a single line of code. These one-liners are perfect for quick data exploration, ETL debugging, or working in restricted environments where external libraries are not available.

Let’s use a sample business dataset with 50 records, data.csv, and get started!

🔗 Link to code on GitHub

# 1. Find the sum of a column

Calculate the sum of any numeric column across all rows.

print(f"Total: ${sum(float(r[3]) for r in __import__('csv').reader(open(path)) if r[0] != 'transaction_id'):,.2f}")

Here, path is a variable that stores the path to the sample CSV file. In this example, run in Google Colab, path = "/content/data.csv".

Output:

Total: $1,814,359.75

Here, __import__('csv') imports the built-in csv module. The generator expression skips the header row, converts the column values to floats, sums them, and formats the result as currency. Adjust the column index (3) and the header check as needed.
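For comparison, the same sum can be written as a short function that skips the header with next() instead of comparing against a header name. This is only a sketch: the sample rows and every header name other than transaction_id are hypothetical, and column_total is a name made up for illustration.

```python
import csv
import io

# Hypothetical sample data. Only the 'transaction_id' header and the
# position of the amount column (index 3) come from the article.
SAMPLE = """transaction_id,customer_name,category,amount
T001,Acme Corp,Hardware,45000.00
T002,Beta Inc,Software,12500.50
T003,Gamma Solutions,Software,78900.00
"""

def column_total(csv_file, index=3):
    """Sum one numeric column, skipping the first (header) row."""
    reader = csv.reader(csv_file)
    next(reader, None)  # discard the header row, if there is one
    return sum(float(row[index]) for row in reader)

# With a real file you would pass an open handle instead, e.g.
#   with open(path, newline="") as f: total = column_total(f)
total = column_total(io.StringIO(SAMPLE))
print(f"Total: ${total:,.2f}")
```

Skipping by position rather than by name means the function keeps working if the first header is renamed, at the cost of assuming the file always has exactly one header row.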

# 2. Group by maximum

Find which group has the highest aggregate value in the entire dataset.

print(max({r[5]: sum(float(row[3]) for row in __import__('csv').reader(open(path)) if row[5] == r[5] and row[0] != 'transaction_id') for r in __import__('csv').reader(open(path)) if r[0] != 'transaction_id'}.items(), key=lambda x: x[1]))

Output:

('Mike Rodriguez', 502252.0)

The dictionary comprehension groups by column 5, summing the values in column 3 for each group. The outer pass collects the group keys, and an inner pass over the file performs the aggregation for each key, so the file is re-read once per data row. max() with a lambda key then finds the group with the highest total. Adjust the column indexes for other grouping operations.
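Because the one-liner re-opens the file for every row, a single-pass version may be preferable on larger files. The sketch below accumulates totals in a defaultdict. The sample rows, the header names other than transaction_id, and the function name top_group are all hypothetical; only the column positions (amount at index 3, group key at index 5) follow the article.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows laid out like the article's file:
# amount at index 3, grouping column at index 5.
SAMPLE = """transaction_id,customer_name,category,amount,date,sales_rep,region,tier
T001,Acme Corp,Hardware,45000.00,2024-01-05,Rep A,North America,Enterprise
T002,Beta Inc,Software,12500.50,2024-01-06,Rep B,Europe,SMB
T003,Gamma Solutions,Software,78900.00,2024-01-07,Rep B,Europe,Enterprise
"""

def top_group(csv_file, key_index=5, value_index=3):
    """Return (group, total) for the group with the largest total."""
    totals = defaultdict(float)
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    for row in reader:
        totals[row[key_index]] += float(row[value_index])
    return max(totals.items(), key=lambda item: item[1])

best = top_group(io.StringIO(SAMPLE))
print(best)
```

The file is read once and memory grows with the number of distinct groups, not the number of rows.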

# 3. Filter and display a subset of rows

Display only rows that meet a specific condition with formatted output.

print("\n".join(f"{r[1]}: ${float(r[3]):,.2f}" for r in __import__('csv').reader(open(path)) if r[7] == 'Enterprise' and r[0] != 'transaction_id'))

Output:

Acme Corp: $45,000.00
Gamma Solutions: $78,900.00
Zeta Systems: $156,000.00
Iota Industries: $67,500.25
Kappa LLC: $91,200.75
Nu Technologies: $76,800.25
Omicron LLC: $128,900.00
Sigma Corp: $89,700.75
Phi Corp: $176,500.25
Omega Technologies: $134,600.50
Alpha Solutions: $71,200.25
Matrix Systems: $105,600.25

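The generator expression filters rows where column 7 equals 'Enterprise', skips the header, and formats columns 1 and 3. Using "\n".join(...) prints one formatted row per line and avoids the list of None values that a list comprehension of print() calls would produce.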

# 4. Group by sum distribution

Get totals for each unique value in a grouping column.

print({g: f"${sum(float(row[3]) for row in __import__('csv').reader(open(path)) if row[6] == g and row[0] != 'transaction_id'):,.2f}" for g in set(row[6] for row in __import__('csv').reader(open(path)) if row[0] != 'transaction_id')})

Output:

{'Asia Pacific': '$326,551.75', 'Europe': '$502,252.00', 'North America': '$985,556.00'}

The dictionary comprehension first extracts the unique values from column 6 with set(), then calculates the sum of column 3 for each group. The generator expressions keep memory usage low, although the file is read once per group. Change the column indexes to group by different fields.

# 5. Threshold filter with sorting

Find and rank all records above a specific numeric threshold.

print([(n, f"${v:,.2f}") for n, v in sorted([(r[1], float(r[3])) for r in list(__import__('csv').reader(open(path)))[1:] if float(r[3]) > 100000], key=lambda x: x[1], reverse=True)])

Output:

[('Phi Corp', '$176,500.25'), ('Zeta Systems', '$156,000.00'), ('Omega Technologies', '$134,600.50'), ('Omicron LLC', '$128,900.00'), ('Matrix Systems', '$105,600.25')]

This filters rows where column 3 exceeds 100000, creates tuples of name and numeric value, sorts them by value in descending order, and then formats the values as currency for display. Adjust the threshold and columns as needed.

# 6. Count unique values

Quickly determine how many different values exist in any column.

print(len(set(r[2] for r in __import__('csv').reader(open(path)) if r[0] != 'transaction_id')))

Output:

Here, set() collects the unique values from column 2 and len() counts them. This is useful for checking the diversity of your data or finding distinct categories.
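If you also want to know how often each distinct value occurs, collections.Counter gives both the distinct count and the frequencies in one pass. This is a sketch with hypothetical sample rows and a made-up function name, value_counts; the column index 2 matches the one-liner above.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; the column at index 2 is the one being counted.
SAMPLE = """transaction_id,customer_name,category,amount
T001,Acme Corp,Hardware,45000.00
T002,Beta Inc,Software,12500.50
T003,Gamma Solutions,Software,78900.00
"""

def value_counts(csv_file, index=2):
    """Count how many times each value appears in one column."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    return Counter(row[index] for row in reader)

counts = value_counts(io.StringIO(SAMPLE))
print(len(counts))            # number of distinct values
print(counts.most_common())   # values ordered from most to least frequent
```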

# 7. Conditional aggregation

Calculate averages or other statistics for specific subsets of data.

print(f"Average: ${sum(float(r[3]) for r in __import__('csv').reader(open(path)) if r[6] == 'North America' and r[0] != 'transaction_id') / sum(1 for r in __import__('csv').reader(open(path)) if r[6] == 'North America' and r[0] != 'transaction_id'):,.2f}")

Output:

This one-liner calculates the average of column 3 for the rows that match the condition in column 6. It divides the sum by the count, each computed with a generator expression. It reads the file twice but keeps memory usage low.
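One way to avoid the second read is statistics.fmean, which accepts any iterable and computes the mean in a single pass. The sketch below assumes Python 3.8 or later; the sample rows, header names other than transaction_id, and the function name conditional_average are hypothetical.

```python
import csv
import io
from statistics import fmean

# Hypothetical rows: amount at index 3, region at index 6, as in the article.
SAMPLE = """transaction_id,customer_name,category,amount,date,sales_rep,region,tier
T001,Acme Corp,Hardware,45000.00,2024-01-05,Rep A,North America,Enterprise
T002,Beta Inc,Software,12500.50,2024-01-06,Rep B,Europe,SMB
T003,Gamma Solutions,Software,78900.00,2024-01-07,Rep B,Europe,Enterprise
"""

def conditional_average(csv_file, match_index, match_value, value_index=3):
    """Average one column over the rows whose match column equals match_value."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    values = (
        float(row[value_index])
        for row in reader
        if row[match_index] == match_value
    )
    # fmean raises statistics.StatisticsError if no row matches.
    return fmean(values)

average = conditional_average(io.StringIO(SAMPLE), 6, "Europe")
print(f"Average: ${average:,.2f}")
```

Unlike the one-liner, which fails with ZeroDivisionError when nothing matches, this version surfaces the empty case as a StatisticsError that you can catch explicitly.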

# 8. Multi-column filter

Apply multiple filter conditions to different columns at once.

print("\n".join(f"{r[1]} | {r[2]} | ${float(r[3]):,.2f}" for r in __import__('csv').reader(open(path)) if r[2] == 'Software' and float(r[3]) > 50000 and r[0] != 'transaction_id'))

Output:

Zeta Systems | Software | $156,000.00
Iota Industries | Software | $67,500.25
Omicron LLC | Software | $128,900.00
Sigma Corp | Software | $89,700.75
Phi Corp | Software | $176,500.25
Omega Technologies | Software | $134,600.50
Nexus Corp | Software | $92,300.75
Apex Industries | Software | $57,800.00

This combines multiple filter conditions with the and operator, checks string equality and numeric comparisons, and formats the output with pipe separators for a clean display.

# 9. Calculate column statistics

Generate minimum, maximum and average statistics for numeric columns in one go.

vals = [float(r[3]) for r in __import__('csv').reader(open(path)) if r[0] != 'transaction_id']; print(f"Min: ${min(vals):,.2f} | Max: ${max(vals):,.2f} | Avg: ${sum(vals)/len(vals):,.2f}"); print(vals)

Output:

Min: $8,750.25 | Max: $176,500.25 | Avg: $62,564.13
[45000.0, 12500.5, 78900.0, 23400.75, 8750.25, 156000.0, 34500.5, 19800.0, 67500.25, 91200.75, 28750.0, 43200.5, 76800.25, 15600.75, 128900.0, 52300.5, 31200.25, 89700.75, 64800.0, 22450.5, 176500.25, 38900.75, 27300.0, 134600.5, 71200.25, 92300.75, 18900.5, 105600.25, 57800.0]

This creates a list of numeric values from column 3 and then calculates the minimum, maximum, and average in one line. A semicolon separates the statements. Holding the list requires more memory than streaming, but it is faster than reading the file multiple times.
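When the file is too large to hold the column in a list, the same three statistics can be tracked while streaming. This is a minimal sketch under assumed data: the sample rows and the function name column_stats are hypothetical, and only the amount column at index 3 follows the article.

```python
import csv
import io

# Hypothetical sample data with the amount in column 3.
SAMPLE = """transaction_id,customer_name,category,amount
T001,Acme Corp,Hardware,45000.00
T002,Beta Inc,Software,12500.50
T003,Gamma Solutions,Software,78900.00
T004,Delta Ltd,Services,23400.75
"""

def column_stats(csv_file, index=3):
    """Return (min, max, mean) of one column in a single streaming pass."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    count, total = 0, 0.0
    lowest, highest = float("inf"), float("-inf")
    for row in reader:
        value = float(row[index])
        count += 1
        total += value
        lowest = min(lowest, value)
        highest = max(highest, value)
    if count == 0:
        raise ValueError("no data rows to summarize")
    return lowest, highest, total / count

lowest, highest, mean = column_stats(io.StringIO(SAMPLE))
print(f"Min: ${lowest:,.2f} | Max: ${highest:,.2f} | Avg: ${mean:,.2f}")
```

Memory use stays constant regardless of row count, at the cost of a few more lines than the list-based one-liner.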

# 10. Export filtered data

Create a new CSV file containing only the rows that match your criteria.

__import__('csv').writer(open('filtered.csv','w',newline="")).writerows([r for r in list(__import__('csv').reader(open(path)))[1:] if float(r[3]) > 75000])

This reads the CSV file, filters the rows based on the condition, and saves them to a new file. The newline="" parameter prevents extra blank lines in the output. Note that this example omits the header (it uses [1:]), so write it explicitly if you need a header in the output.
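If the output does need the header, one possible approach is to read it with next() and write it before the filtered rows. The sketch below uses in-memory buffers so it runs anywhere; the sample rows, header names other than transaction_id, and the function name export_filtered are hypothetical.

```python
import csv
import io

# Hypothetical sample data with the amount in column 3.
SAMPLE = """transaction_id,customer_name,category,amount
T001,Acme Corp,Hardware,45000.00
T002,Beta Inc,Software,12500.50
T003,Gamma Solutions,Software,78900.00
"""

def export_filtered(src, dst, threshold=75000, value_index=3):
    """Copy the header plus every row whose value exceeds the threshold."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader, None)
    if header is not None:
        writer.writerow(header)  # keep the header in the output
    writer.writerows(
        row for row in reader if float(row[value_index]) > threshold
    )

# With real files you would use context managers so both are closed, e.g.
#   with open(path, newline="") as src, \
#        open("filtered.csv", "w", newline="") as dst:
#       export_filtered(src, dst)
source = io.StringIO(SAMPLE)
target = io.StringIO(newline="")
export_filtered(source, target)
print(target.getvalue())
```

Using with blocks also guarantees the output file is flushed and closed, which the one-liner leaves to the interpreter.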

# Summary

I hope you find these one-liners for CSV processing helpful.

These one-liners are useful for:

Rapid data exploration and verification
Basic data transformations
Prototyping before writing full scripts

But you should avoid them for:

Production data processing
Files requiring complex error handling
Multi-stage transformations

These techniques rely only on Python’s built-in csv module, so they work whenever you need a quick solution with no setup. Happy analyzing!

Bala Priya C is a software developer and technical writer from India. She likes working at the intersection of mathematics, programming, data analytics, and content creation. Her areas of interest and specialization include DevOps, data analytics, and natural language processing. She enjoys reading, writing, coding, and coffee! She is currently working on learning and sharing her knowledge with the developer community by writing tutorials, guides, reviews, and more. Bala also creates engaging resource overviews and coding tutorials.
