STEP 01
Upload & Size Validation
CSV, Excel, JSON, TSV, Google Sheets. File integrity check. Row/column count validation with smart warnings.
STEP 02
Smart Goal Selection
Machine Learning (Classification · Regression · Clustering · Time-Series), Research, or Visualisation. Every later step adapts its strategy to the chosen goal.
GOAL-AWARE
STEP 03
Custom Rules Engine
Define your own validation rules before scanning, e.g. Age 0–120, Revenue > 0, valid email format. Rules are saved for reuse.
NOVEL FEATURE
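As a rough illustration of what such rules could look like in code, here is a minimal sketch using the three examples from the card. The column names, the `RULES` mapping, and `validate_rows` are hypothetical and not the product's actual rule format; the email pattern is deliberately loose.

```python
import re
from typing import Any, Callable

# Loose email shape check: something@something.something (an assumption, not RFC-complete).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical rule set: column name -> predicate returning True for valid values.
RULES: dict[str, Callable[[Any], bool]] = {
    "age": lambda v: isinstance(v, (int, float)) and 0 <= v <= 120,
    "revenue": lambda v: isinstance(v, (int, float)) and v > 0,
    "email": lambda v: isinstance(v, str) and EMAIL_RE.match(v) is not None,
}

def validate_rows(rows, rules=RULES):
    """Return (row_index, column, value) for every value that breaks a rule."""
    violations = []
    for i, row in enumerate(rows):
        for column, is_valid in rules.items():
            if column in row and not is_valid(row[column]):
                violations.append((i, column, row[column]))
    return violations

rows = [
    {"age": 34, "revenue": 1200.0, "email": "ana@example.com"},
    {"age": 150, "revenue": -5, "email": "not-an-email"},
]
print(validate_rows(rows))
```

Keeping each rule as a plain predicate is one way to make rules easy to store and reuse across uploads.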
STEP 04
Full Dataset Scanner
Missing values, duplicates, type errors, outliers, format issues, bias, data freshness, and PII detection (Luhn-validated card numbers, email, phone, ID).
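For readers unfamiliar with the Luhn check mentioned above, the following is a minimal sketch of the standard checksum. The function name `luhn_valid` and the choice to ignore spaces and hyphens are assumptions made for illustration; how the scanner itself applies the check is not stated.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    cleaned = number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not cleaned.isascii() or not cleaned.isdigit():
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for position, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0

print(luhn_valid("79927398713"))  # classic valid test number
print(luhn_valid("79927398710"))  # last digit changed
```

A passing checksum only suggests a value might be a card-style identifier; it does not confirm that the number belongs to a real account.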
STEP 05
Health Score + Trust Scores
Six-dimensional score (0–100) plus individual column trust ratings. Filter columns below your trust threshold.
PATENT-WORTHY
STEP 06
Risk & Severity Analyzer
Every issue classified as Low / Medium / High. Full risk table with affected column, row count, severity badge, and recommended fix.
STEP 07
Visualisation Dashboard
Missing value heatmap, outlier box plots, distribution histograms, categorical frequency, anomaly timeline for date columns.
STEP 08
Structure Map + Column Meanings
DOB → Date of Birth. ZIP → Location Code. Correlation heatmap. Foreign key detection. Causal dependency mapping.
NOVEL FEATURE
STEP 09
AI Story Narrator
Plain-English story of your dataset using only real scan results. No invented facts. Simple language for non-technical users.
AI POWERED
STEP 10
Smart Cleaning Suggestions
Goal-aware recommendations per issue: fill with the median, remove rows, or predict by regression, with the pros and risks of each explained.
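To make the trade-off between two of these options concrete, here is a small sketch comparing median fill with row removal on a single column. The helper names and the sample `ages` list are hypothetical; regression-based prediction is left out for brevity.

```python
from statistics import median

def fill_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to compute a median from
    fill = median(observed)
    return [fill if v is None else v for v in values]

def drop_missing(values):
    """Remove None entries, shrinking the column."""
    return [v for v in values if v is not None]

ages = [23, None, 31, 45, None, 120]
print(fill_with_median(ages))  # keeps all 6 rows
print(drop_missing(ages))      # keeps 4 rows
```

Median fill preserves dataset size but flattens variance; dropping rows keeps only observed values but loses the other columns of those rows.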
STEP 11
What-If Simulator
Toggle cleaning strategies before applying them. Estimated health score, dataset size, and ML accuracy update instantly; for large datasets the estimates are computed on a sample.
INTERACTIVE
STEP 12
Cleaning Preview Confirmation
User approves every sensitive action. Keep / Cap / Remove outliers. Zero row deletion without explicit confirmation.
STEP 13
Auto-Cleanse + Post-Clean Summary
Fill missing values, remove duplicates, fix types, normalise dates, convert number words to digits. Immediate before/after comparison card.
STEP 14
PII Anonymisation
Mask · Hash · Tokenise · Drop — per column. GDPR compliance status updated after each action.
PRIVACY
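The four per-column actions can be sketched roughly as follows. Everything here (function names, the `actions` mapping, the salt, the token format) is a hypothetical illustration rather than the product's implementation.

```python
import hashlib
import secrets

def mask(value: str, visible: int = 2) -> str:
    """Star out everything except the last `visible` characters."""
    if visible <= 0:
        return "*" * len(value)
    return "*" * max(len(value) - visible, 0) + value[-visible:]

def hash_value(value: str, salt: str) -> str:
    """Salted SHA-256 digest: the same input always maps to the same output."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

class Tokeniser:
    """Maps each distinct value to a random token, kept in an in-memory table."""
    def __init__(self):
        self._tokens = {}

    def tokenise(self, value: str) -> str:
        if value not in self._tokens:
            self._tokens[value] = "tok_" + secrets.token_hex(8)
        return self._tokens[value]

def anonymise(rows, actions, salt="demo-salt"):
    """Apply one action per column; columns without an action are kept as-is."""
    tokeniser = Tokeniser()
    result = []
    for row in rows:
        clean = {}
        for column, value in row.items():
            action = actions.get(column, "keep")
            if action == "drop":
                continue
            if action == "mask":
                clean[column] = mask(str(value))
            elif action == "hash":
                clean[column] = hash_value(str(value), salt)
            elif action == "tokenise":
                clean[column] = tokeniser.tokenise(str(value))
            else:
                clean[column] = value
        result.append(clean)
    return result

rows = [{"name": "Ana Silva", "phone": "5551234567", "ssn": "123-45-6789", "city": "Porto"}]
print(anonymise(rows, {"name": "tokenise", "phone": "mask", "ssn": "drop"}))
```

Hashing and tokenising are forms of pseudonymisation: values can still be linked across rows, so whether a given action is sufficient depends on the compliance context.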
STEP 15
Advanced Bias Detection
Class imbalance, intersectional bias (Age + Gender combined), demographic parity score, equal opportunity score.
AI ETHICS
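One common way to quantify demographic parity is the gap between the highest and lowest positive-outcome rate across groups. The sketch below assumes that definition, which may differ from the score the tool reports; the field names and toy records are made up to show how an intersectional view can reveal a gap that a single attribute understates.

```python
from collections import defaultdict

def positive_rates(records, group_keys, outcome_key):
    """Share of positive outcomes per group; a group is a tuple of attribute values."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for record in records:
        group = tuple(record[key] for key in group_keys)
        counts[group][0] += 1 if record[outcome_key] else 0
        counts[group][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}

def demographic_parity_gap(rates):
    """Difference between the best- and worst-treated group (0.0 means parity)."""
    return max(rates.values()) - min(rates.values())

# Toy records, invented purely for illustration.
records = [
    {"age_band": "18-40", "gender": "F", "approved": 1},
    {"age_band": "18-40", "gender": "F", "approved": 1},
    {"age_band": "18-40", "gender": "M", "approved": 1},
    {"age_band": "18-40", "gender": "M", "approved": 0},
    {"age_band": "41+", "gender": "F", "approved": 0},
    {"age_band": "41+", "gender": "F", "approved": 0},
    {"age_band": "41+", "gender": "M", "approved": 1},
    {"age_band": "41+", "gender": "M", "approved": 1},
]

by_gender = positive_rates(records, ["gender"], "approved")
by_both = positive_rates(records, ["age_band", "gender"], "approved")
print(demographic_parity_gap(by_gender))  # gap on gender alone
print(demographic_parity_gap(by_both))    # gap on age band + gender combined
```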
STEP 16
Synthetic Data Generator
Privacy-safe artificial dataset. User sets row count and differential privacy ε value. Quality validated via KS test and correlation comparison.
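The KS test compares the empirical distributions of a real and a synthetic column. As a hedged sketch, the two-sample KS statistic (the largest gap between the two empirical CDFs) can be computed as below; `ks_statistic` is a hypothetical helper, it returns only the statistic and no p-value, and the tool's own validation procedure is not specified.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # The largest gap between two step functions occurs at one of the data points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

real = [1, 2, 3, 4]
synthetic = [3, 4, 5, 6]
print(ks_statistic(real, synthetic))
```

A statistic near 0 means the two columns are distributed similarly; near 1 means they barely overlap. In practice a library routine such as `scipy.stats.ks_2samp` also provides a p-value.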
STEP 17
ML Performance Comparison
Accuracy, Precision, Recall, F1 — raw vs clean vs synthetic. Feature importance chart shows which columns matter most after cleaning.
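For reference, the four metrics can be computed from a binary confusion matrix as sketched below. The labels and predictions are toy values invented for the example, not results from any real comparison, and `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
predictions = {
    "raw": [1, 1, 0, 1, 0, 0, 0, 1],
    "clean": [1, 0, 1, 1, 0, 0, 0, 0],
}
for name, y_pred in predictions.items():
    print(name, binary_metrics(y_true, y_pred))
```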
STEP 18
Compliance Report
GDPR · HIPAA · ISO 8000 · EU AI Act. Each gap shows: Issue · Framework · Risk Level · Required Fix.
ENTERPRISE
STEP 19
Reliability Certificate + Data Dictionary
Downloadable PDF certificate. Full data dictionary with column meanings, trust scores, and cleaning actions taken.
STEP 20
Data Contract Generator
YAML schema generated from clean data. All future uploads validated automatically against this contract. Zero code needed.
PATENT-WORTHY
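To give a feel for contract-based validation, here is a minimal sketch. The contract is shown as the Python dict a YAML loader would produce; its field names (`type`, `min`, `max`, `nullable`) and the `check_upload` helper are assumptions, since the generated schema's actual layout is not described.

```python
# Hypothetical contract, as it might look after loading the YAML file.
CONTRACT = {
    "columns": {
        "age": {"type": "int", "min": 0, "max": 120, "nullable": False},
        "email": {"type": "str", "nullable": False},
        "revenue": {"type": "float", "min": 0, "nullable": True},
    }
}

# Accepted Python types per contract type name (ints are allowed where floats are expected).
TYPES = {"int": int, "float": (int, float), "str": str}

def check_upload(rows, contract=CONTRACT):
    """Return a list of human-readable contract violations for the given rows."""
    errors = []
    for i, row in enumerate(rows):
        for name, spec in contract["columns"].items():
            value = row.get(name)
            if value is None:
                if not spec.get("nullable", False):
                    errors.append(f"row {i}: {name} is missing")
                continue
            if not isinstance(value, TYPES[spec["type"]]):
                errors.append(f"row {i}: {name} should be {spec['type']}")
                continue
            if "min" in spec and value < spec["min"]:
                errors.append(f"row {i}: {name} below {spec['min']}")
            if "max" in spec and value > spec["max"]:
                errors.append(f"row {i}: {name} above {spec['max']}")
    return errors

upload = [
    {"age": 30, "email": "a@b.co", "revenue": 10.5},
    {"age": 130, "email": None, "revenue": None},
]
print(check_upload(upload))
```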
STEP 21
Export Cleaning Code
Full Python (pandas) + R (tidyverse) scripts matching every cleaning action applied. Ready to run immediately.
STEP 22
Colab Notebook + Session Summary
One-click Colab with AutoML. Final session summary screen with all 6 downloads. Option to re-upload and compare versions.