
Image by the author, generated with ChatGPT
The data quality bottleneck every data scientist knows
You have just received a new dataset. Before diving into the analysis, you need to understand what you are working with: how many values are missing? Which columns are problematic? What is the overall data quality score?
Most data scientists spend 15-30 minutes manually examining each new dataset: loading it into pandas, running .info(), .describe(), and .isnull().sum(), then creating visualizations to understand the missing-data patterns. This routine becomes tedious when you evaluate several datasets every day.
What if you could paste any CSV URL and get a professional data quality report in under 30 seconds? No Python environment setup, no manual coding, no switching between tools.
The solution: a four-node n8n workflow
n8n (pronounced “n-eight-n”) is an open-source workflow automation platform that connects various services, APIs, and tools through a visual drag-and-drop interface. While most people associate workflow automation with business processes such as email marketing or customer service, n8n can also automate data science tasks that traditionally require custom scripts.
Unlike standalone Python scripts, n8n workflows are visual, reusable, and easy to modify. You can connect data sources, perform transformations, run analyses, and deliver results, all without switching between tools or environments. Each workflow consists of “nodes” that represent different actions, connected to form an automated pipeline.
Our automated data quality analyzer consists of four connected nodes:

- Manual Trigger – starts the workflow when you click “Execute workflow”
- HTTP Request – downloads any CSV file from a URL
- Code node – analyzes the data and generates quality metrics
- HTML node – produces a clean, professional report
Building the workflow: step-by-step implementation
Prerequisites
- An n8n account (free 14-day trial at n8n.io)
- The pre-built workflow template (JSON file provided)
- Any CSV dataset accessible via a public URL (test examples are provided below)
Step 1: Import the workflow template
Instead of building from scratch, we will use a pre-configured template that includes all of the analysis logic:
- Download the workflow file
- Open n8n and click “Import from File”
- Select the downloaded JSON file; all four nodes will appear automatically
- Save the workflow under your preferred name
The imported workflow contains four connected nodes with all of the analysis code already configured.
Step 2: Understanding the workflow
Let’s walk through what each node does:
Manual Trigger: Starts the analysis when you click “Execute workflow”. Ideal for on-demand data quality checks.
HTTP Request node: Downloads CSV data from any public URL. Pre-configured to handle most standard CSV formats and return the raw text needed for analysis.
Code node: The analysis engine. It contains robust CSV-parsing logic that handles common variations in delimiters, quoted fields, and missing-value formats. It automatically:
- Parses CSV data with smart field detection
- Identifies missing values in many formats (null, empty, “N/A”, etc.)
- Calculates quality scores and severity ratings
- Generates specific recommendations
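The parsing and scoring steps above can be sketched in plain JavaScript. This is a minimal illustration of the kind of logic the Code node runs, not the template’s actual code; the missing-value token list and the scoring formula (percentage of non-missing cells) are our assumptions:

```javascript
// Simplified sketch of a CSV quality analysis, in the spirit of the Code node.
// Missing-value tokens and the scoring formula are illustrative assumptions.
const MISSING_TOKENS = new Set(["", "null", "na", "n/a", "none", "nan", "-"]);

// Detect the most likely delimiter by counting candidates in the header line.
function detectDelimiter(headerLine) {
  const candidates = [",", ";", "\t", "|"];
  return candidates.reduce((best, d) =>
    headerLine.split(d).length > headerLine.split(best).length ? d : best);
}

// Split one CSV line, respecting double-quoted fields.
function splitLine(line, delim) {
  const fields = [];
  let cur = "", inQuotes = false;
  for (const ch of line) {
    if (ch === '"') inQuotes = !inQuotes;
    else if (ch === delim && !inQuotes) { fields.push(cur); cur = ""; }
    else cur += ch;
  }
  fields.push(cur);
  return fields;
}

function analyzeCsv(text) {
  const lines = text.trim().split(/\r?\n/);
  const delim = detectDelimiter(lines[0]);
  const headers = splitLine(lines[0], delim).map(h => h.trim());
  const missingPerColumn = Object.fromEntries(headers.map(h => [h, 0]));
  const rows = lines.slice(1);

  for (const line of rows) {
    const fields = splitLine(line, delim);
    headers.forEach((h, i) => {
      const value = (fields[i] ?? "").trim().toLowerCase();
      if (MISSING_TOKENS.has(value)) missingPerColumn[h]++;
    });
  }

  const totalCells = rows.length * headers.length;
  const missingCells = Object.values(missingPerColumn).reduce((a, b) => a + b, 0);
  const qualityScore =
    totalCells === 0 ? 0 : +(100 * (1 - missingCells / totalCells)).toFixed(2);
  return { rowCount: rows.length, columnCount: headers.length, missingPerColumn, qualityScore };
}
```

Inside n8n, the same function would read the CSV text from the HTTP Request node’s output and return the result object for the HTML node to render.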
HTML node: Transforms the analysis results into a clean, professional report with color-coded quality scores and tidy formatting.
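As a rough sketch of what such a report-building step looks like, the snippet below renders a result object into color-coded HTML. The field names (qualityScore, missingPerColumn, and so on) are hypothetical, chosen for illustration rather than taken from the template:

```javascript
// Illustrative sketch: turn analysis results into a color-coded HTML report.
// The input shape ({ rowCount, columnCount, missingPerColumn, qualityScore })
// is an assumption, not the template's exact schema.
function scoreColor(score) {
  if (score >= 85) return "#2e7d32"; // green: excellent or better
  if (score >= 60) return "#f9a825"; // amber: needs cleaning
  return "#c62828";                  // red: significant work required
}

function renderReport(result) {
  const rows = Object.entries(result.missingPerColumn)
    .map(([col, n]) => `<tr><td>${col}</td><td>${n}</td></tr>`)
    .join("");
  return `
<h1>Data Quality Report</h1>
<p>Score: <strong style="color:${scoreColor(result.qualityScore)}">${result.qualityScore}%</strong></p>
<p>${result.rowCount} rows, ${result.columnCount} columns</p>
<table><tr><th>Column</th><th>Missing</th></tr>${rows}</table>`;
}
```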
Step 3: Point the workflow at your data
To analyze your own dataset:
- Click the HTTP Request node
- Replace the URL with the URL of your CSV dataset:
- Current: https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/recent-grads.csv
- Your data: https://your-domain.com/your-dataset.csv
- Save the workflow

That’s it! The analysis logic automatically adapts to different CSV structures, column names, and data types.
Step 4: Execute and view the results
- Click “Execute workflow” in the top toolbar
- Watch the nodes process; each displays a green check mark when it completes
- Click the HTML node and select the “HTML” tab to view the report
- Copy the report or take screenshots to share with your team
The whole process takes less than 30 seconds once the workflow is configured.
Understanding the results
The color-coded quality score gives you an immediate assessment of your dataset:
- 95-100%: Ideal (or near-ideal) data quality, ready for immediate analysis
- 85-94%: Excellent quality, minimal cleaning needed
- 75-84%: Good quality, some preprocessing required
- 60-74%: Fair quality, moderate cleaning needed
- Below 60%: Low quality, significant data work required
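These bands are easy to encode as a small helper if you want the Code node to emit a label alongside the raw score; a sketch (the function name and label wording are ours):

```javascript
// Map a quality score (0-100) to the severity bands listed above.
function qualityBand(score) {
  if (score >= 95) return "Ideal: ready for immediate analysis";
  if (score >= 85) return "Excellent: minimal cleaning needed";
  if (score >= 75) return "Good: some preprocessing required";
  if (score >= 60) return "Fair: moderate cleaning needed";
  return "Low: significant data work required";
}
```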
Note: This implementation uses a straightforward scoring system based on missing data. More advanced quality metrics, such as data consistency checks, outlier detection, or schema validation, can be added in future versions.
Here’s what the final report looks like:
Our sample analysis shows a quality score of 99.42%, indicating that the dataset is largely complete and ready for analysis with minimal preprocessing.
Dataset overview:
- 173 total records: a compact but sufficient sample size, ideal for quick exploratory analysis
- 21 total columns: a manageable number of features that allows focused observations
- 4 columns with missing data: only a few fields contain gaps
- 17 complete columns: most fields are fully populated
Testing with different datasets
To see how the workflow handles different data quality patterns, try these sample datasets:
- Iris dataset (https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv) typically shows a perfect score (100%) with no missing values.
- Titanic dataset (https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv) shows a more realistic score of 67.6% due to missing data in columns such as Age and Cabin.
- Your own data: upload it to GitHub (raw URL) or use any public CSV URL
Based on the quality score, you can decide on the next steps: above 95% means you can go straight to exploratory data analysis; 85-94% suggests minimal cleaning of the identified problematic columns; 75-84% indicates moderate preparatory work; anything lower calls for a deeper assessment before committing to significant data preparation. The workflow automatically adapts to any CSV structure, enabling quick assessment of many datasets and helping you prioritize data preparation work.
Next steps
1. Email integration
Add a Send Email node to deliver reports to stakeholders automatically by connecting it after the HTML node. This turns your workflow into a distribution system in which quality reports are sent to project managers, data engineers, or clients every time you analyze a new dataset. You can customize the email template to include executive summaries or specific recommendations based on the quality score.
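Ahead of the Send Email node, a short Code-node step could assemble the subject and body from the analysis results. A sketch, where the datasetUrl parameter and the result fields are hypothetical names chosen for illustration:

```javascript
// Build an email subject and plain-text summary from an analysis result.
// datasetUrl and the result field names are illustrative assumptions.
function buildEmail(datasetUrl, result) {
  const subject = `Data quality ${result.qualityScore}% - ${datasetUrl}`;
  const gaps = Object.entries(result.missingPerColumn)
    .filter(([, n]) => n > 0)                 // only columns with gaps
    .map(([col, n]) => `${col} (${n})`)
    .join(", ") || "none";
  const body = [
    `Dataset: ${datasetUrl}`,
    `Quality score: ${result.qualityScore}%`,
    `Rows: ${result.rowCount}, Columns: ${result.columnCount}`,
    `Columns with missing data: ${gaps}`,
  ].join("\n");
  return { subject, body };
}
```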
2. Scheduled analysis
Replace the Manual Trigger with a Schedule Trigger to analyze datasets automatically at regular intervals, ideal for monitoring frequently updated data sources. Configure daily, weekly, or monthly checks of your key datasets to catch quality degradation early. This proactive approach helps you identify data pipeline problems before they affect downstream analyses or model performance.
3. Multi-dataset analysis
Modify the workflow to accept a list of CSV URLs and generate a comparative quality report across many datasets at once. This batch-processing approach is invaluable when evaluating data sources for a new project or running regular organization-wide data audits. You can build summary dashboards that rank datasets by quality score, helping you decide which data sources need immediate attention and which are ready for analysis.
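The comparative report boils down to ranking already-analyzed datasets by score. A sketch of that step, assuming each dataset’s analysis yields an object with url and qualityScore fields (our own shape, for illustration; in n8n the HTTP Request node would loop over the URL list first):

```javascript
// Rank analyzed datasets by quality score so low-quality sources surface first.
// Each entry is { url, qualityScore }; this shape is an illustrative assumption.
function rankByQuality(results) {
  return [...results]
    .sort((a, b) => a.qualityScore - b.qualityScore) // worst first: needs attention
    .map((r, i) => ({ rank: i + 1, ...r, needsAttention: r.qualityScore < 75 }));
}
```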
4. Additional file formats
Extend the workflow to handle data formats beyond CSV by modifying the parsing logic in the Code node. For JSON files, adapt the data extraction to handle nested structures and arrays; Excel files can be supported by adding a preprocessing step that converts XLSX to CSV. Supporting multiple formats turns the quality analyzer into a universal tool for any data source in your organization, regardless of how the data is stored or delivered.
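For JSON sources, one way to reuse the existing missing-value logic is to first flatten nested records into flat, dotted column names that can be scored like CSV columns. A sketch of such a helper (ours, not part of the template):

```javascript
// Flatten a nested JSON record into a single-level object with dotted keys,
// so each leaf value becomes a "column" the quality checks can score.
function flattenRecord(obj, prefix = "") {
  const out = {};
  for (const [key, value] of Object.entries(obj)) {
    const name = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      Object.assign(out, flattenRecord(value, name)); // recurse into nested objects
    } else {
      out[name] = value; // leaves (including nulls and arrays) become columns
    }
  }
  return out;
}
```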
Conclusion
This n8n workflow shows how visual automation can streamline routine data science work while preserving the technical depth that data scientists require. With your existing coding background, you can customize the JavaScript analysis logic, extend the HTML report templates, and integrate with your preferred data infrastructure, all within an intuitive visual interface.
The modular design makes the workflow particularly valuable for data scientists who understand both the technical requirements and the business context of data quality assessment. Unlike rigid no-code tools, n8n lets you modify the underlying analysis logic while keeping the visual transparency that makes workflows easy to share, debug, and maintain. You can start from this foundation and gradually add sophisticated features such as statistical anomaly detection, custom quality metrics, or integration with an existing MLOps pipeline.
Most importantly, this approach bridges the gap between data science expertise and organizational accessibility. Your technical colleagues can modify the code, while non-technical stakeholders can run the workflow and immediately interpret the results. This combination of technical sophistication and user-friendly execution makes n8n ideal for data scientists who want to scale their impact beyond individual analyses.
Born in India and raised in Japan, Vinod brings a global perspective to data science and machine learning education. He bridges the gap between emerging AI technologies and practical implementation for working professionals, creating accessible learning paths for complex topics such as agentic AI, performance optimization, and AI engineering. He focuses on practical machine learning implementation and on mentoring the next generation of data specialists through live sessions and personalized guidance.
