VeritasData — Intelligent Data Quality Platform

Start Here

Upload Your Dataset

CSV · Excel · JSON · TSV · Google Sheets — all supported

⬆

Drag & drop your file here

or click anywhere here to browse · Maximum 100 MB

CSV XLSX JSON TSV

— SELECT YOUR GOAL —

◉

Machine Learning

Classification · Regression · Clustering

▦

Research / Analysis

Statistics · Insights · Reports

△

Visualization

Charts · Dashboards · Plots

Health Score

Six-dimensional quality scoring

Every dataset receives a comprehensive quality score across six critical dimensions — giving you an honest picture of your data's health before you use it.

DATASET HEALTH SCORE

/ 100

Excellent Quality

Completeness

Consistency

Duplicate Risk

Bias Risk

Outlier Risk

Freshness

DETECTED ISSUES

HIGH

Outliers Detected

Salary column — 5 values above IQR bound

5 rows

HIGH

PII Detected

Email column — personal data unmasked

1,200

MED

Missing Values

Age column — 12% empty

143 rows

MED

Class Imbalance

Gender: 92% Male — bias risk

All rows

LOW

Date Format Mix

DOB — 3 formats detected

48 rows

LOW

Duplicate Records

20 exact duplicate rows

20 rows

61 → 87

Score After Cleaning

71% → 86%

ML Accuracy Gain
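
The "above IQR bound" flag in the sample report can be read as Tukey's upper fence, Q3 + 1.5 × IQR. The sketch below shows that rule under stated assumptions: the 1.5 multiplier, the inclusive quantile method, and the function name `iqr_upper_outliers` are illustrative choices, not confirmed details of the platform.

```python
import statistics


def iqr_upper_outliers(values, k=1.5):
    """Return the values that lie above Q3 + k * IQR (Tukey's upper fence)."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    upper_bound = q3 + k * (q3 - q1)
    return [v for v in values if v > upper_bound]


# Hypothetical salary column (in thousands) with one extreme value.
salaries = [40, 42, 45, 47, 50, 52, 55, 58, 60, 500]
flagged = iqr_upper_outliers(salaries)
```

A larger `k` flags fewer rows, which is one way a Keep / Cap / Remove decision could be tuned before anything is deleted.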

Full Pipeline

28-step intelligence pipeline

Every dataset passes through a complete automated pipeline — from raw upload to certified clean output.

Upload & Validate

File format check, size validation, corruption detection

Size Validation

Row/column count, small dataset warning, large dataset sampling

Goal Selection

ML / Research / Visualization — adapts full strategy

Custom Rules

User-defined validation rules applied before scanning

Dataset Scanner

Missing · Duplicates · Types · Outliers · Formats · PII · Bias

Health Score

Six-dimensional quality score (0–100)

Trust Scores

Per-column reliability rating with filter

Risk Analyzer

Low / Medium / High severity per issue

Visualisation

Heatmap · Outlier chart · Distribution · Timeline

Structure Map

Column meaning detector + relationship map

AI Narrator

Plain-English dataset story from scan results only

Smart Suggestions

Goal-aware cleaning recommendations with risk/benefit

What-If Simulator

Test strategies before applying — instant preview

Auto-Cleanse

Fill · Deduplicate · Fix types · Normalise dates

PII Anonymiser

Mask · Hash · Tokenise · Drop, with GDPR status updated after each action

Bias Detection

Class imbalance + intersectional bias + fairness metrics

Synthetic Data

Privacy-safe artificial dataset generation + validation

ML Comparison

Accuracy · Precision · Recall · F1 before vs after

Compliance Report

GDPR · HIPAA · ISO 8000 · EU AI Act mapping

Certificate

Dataset Reliability Certificate — downloadable PDF

Capabilities

Everything your data needs

◈

PII Detection Engine

Automatically identifies emails, phone numbers, credit card patterns, national IDs, and IP addresses. Flags GDPR risk per column.

◉

AI Dataset Story Narrator

Generates a plain-English narrative of your dataset's problems using only real scan results — no invented information.

▦

What-If Cleaning Simulator

Test cleaning strategies before applying them. Toggle outlier removal, fill methods, and deduplication — see score impact instantly.

△

Intersectional Bias Detection

Detects combined bias across multiple columns — e.g. Age + Gender together, not just each in isolation. Includes fairness metrics.

◫

Synthetic Data Generator

Creates privacy-safe artificial datasets that statistically mirror your real data. Includes a differential privacy slider and a quality validator.

⬡

Data Contract Generator

Auto-generates YAML schema contracts. Future uploads of the same dataset are validated against the contract automatically.

▤

Regulatory Compliance

Maps findings to GDPR, HIPAA, ISO 8000, and EU AI Act. Generates a compliance gap report with a risk level and a recommended fix for each gap.

✦

Column Trust Scores

Every column receives a 0–100 reliability score. Filter to show only columns below your trust threshold — instantly prioritise fixes.

⊕

Colab Notebook Export

One-click Google Colab notebook with cleaned data, cleaning code, AutoML training, evaluation metrics, and all visualisations.

Step by Step

The complete workflow

From raw upload to certified clean dataset — every step documented.

STEP 01

Upload & Size Validation

CSV, Excel, JSON, TSV, Google Sheets. File integrity check. Row/column count validation with smart warnings.

STEP 02

Smart Goal Selection

Machine Learning (Classification · Regression · Clustering · Time-Series), Research, or Visualization. Strategy adapts entirely.

GOAL-AWARE

STEP 03

Custom Rules Engine

Define your own validation rules before scanning, for example Age 0–120, Revenue > 0, or a valid Email format. Rules are saved for reuse.

NOVEL FEATURE
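
As an illustration of the three example rules in this step (Age 0–120, Revenue > 0, Email format), here is a minimal sketch of how such rules might be expressed and applied. The `RULES` mapping, the `validate_rows` helper, and the simplified email pattern are all assumptions for the example, not the platform's rule syntax.

```python
import re

# Deliberately simple email pattern: something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical rule set: column name -> predicate returning True when valid.
RULES = {
    "age": lambda v: isinstance(v, (int, float)) and 0 <= v <= 120,
    "revenue": lambda v: isinstance(v, (int, float)) and v > 0,
    "email": lambda v: isinstance(v, str) and EMAIL_RE.match(v) is not None,
}


def validate_rows(rows, rules=RULES):
    """Return (row_index, column, value) for every value that breaks a rule."""
    violations = []
    for index, row in enumerate(rows):
        for column, is_valid in rules.items():
            if column in row and not is_valid(row[column]):
                violations.append((index, column, row[column]))
    return violations


rows = [
    {"age": 34, "revenue": 1200.0, "email": "a@example.com"},
    {"age": 150, "revenue": -5, "email": "not-an-email"},
]
violations = validate_rows(rows)
```

Keeping each rule as a named predicate is one way to make rules easy to save and reapply to later uploads.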

STEP 04

Full Dataset Scanner

Missing values, duplicates, type errors, outliers, format issues, bias, PII detection (Luhn check, email, phone, ID), data freshness.
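
The Luhn check named in this step is the standard checksum used to tell plausible payment card numbers from arbitrary digit strings. The sketch below implements the public algorithm; the function name `luhn_valid` and the choice to ignore spaces and dashes are assumptions, and a passing check only means the digits are well-formed, not that a real card exists.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Keep ASCII digits only, so "4111 1111 1111 1111" and "4111-1111-..." both work.
    digits = [int(ch) for ch in number if "0" <= ch <= "9"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


looks_like_card = luhn_valid("4111 1111 1111 1111")  # widely used test number
```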

STEP 05

Health Score + Trust Scores

Six-dimensional score (0–100) plus individual column trust ratings. Filter columns below your trust threshold.

PATENT-WORTHY
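
The page does not say how the six dimensions are combined into one number. The sketch below shows one plausible reading, a weighted average of six 0–100 sub-scores. The equal default weights, the dimension keys, and the convention that every sub-score is already oriented so that higher means healthier are assumptions, not the platform's formula.

```python
# Hypothetical keys for the six dimensions listed on the score card.
DIMENSIONS = (
    "completeness",
    "consistency",
    "duplicate_risk",
    "bias_risk",
    "outlier_risk",
    "freshness",
)


def health_score(subscores, weights=None):
    """Combine six 0-100 sub-scores (higher = healthier) into one 0-100 score."""
    if weights is None:
        weights = {name: 1.0 for name in DIMENSIONS}  # assumed equal weighting
    missing = [name for name in DIMENSIONS if name not in subscores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    total_weight = sum(weights[name] for name in DIMENSIONS)
    weighted_sum = sum(subscores[name] * weights[name] for name in DIMENSIONS)
    return round(weighted_sum / total_weight, 1)


example = {
    "completeness": 100,
    "consistency": 90,
    "duplicate_risk": 80,
    "bias_risk": 70,
    "outlier_risk": 60,
    "freshness": 50,
}
score = health_score(example)
```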

STEP 06

Risk & Severity Analyzer

Every issue classified as Low / Medium / High. Full risk table with affected column, row count, severity badge, and recommended fix.

STEP 07

Visualisation Dashboard

Missing value heatmap, outlier box plots, distribution histograms, categorical frequency, anomaly timeline for date columns.

STEP 08

Structure Map + Column Meanings

DOB → Date of Birth. ZIP → Location Code. Correlation heatmap. Foreign key detection. Causal dependency mapping.

NOVEL FEATURE

STEP 09

AI Story Narrator

Plain-English story of your dataset using only real scan results. No invented facts. Simple language for non-technical users.

AI POWERED

STEP 10

Smart Cleaning Suggestions

Goal-aware recommendations per issue. Fill with median vs remove rows vs regression prediction — pros and risks explained.

STEP 11

What-If Simulator

Toggle strategies before applying. Estimated health score, dataset size, and ML accuracy update instantly. Sampled for large datasets.

INTERACTIVE

STEP 12

Cleaning Preview Confirmation

User approves every sensitive action. Keep / Cap / Remove outliers. No rows are deleted without explicit confirmation.

STEP 13

Auto-Cleanse + Post-Clean Summary

Fill missing values, remove duplicates, fix types, normalise dates, convert numbers written as words to digits. Immediate before/after comparison card.
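
Date normalisation can be sketched as trying a short list of known formats and emitting ISO 8601. The three formats below and the day-first reading of slash-separated dates are assumptions chosen for the example; which formats a real column contains has to come from the scan.

```python
from datetime import datetime

# Assumed candidate formats, tried in order. "%d/%m/%Y" treats 12/04/1990 as 12 April.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def normalise_date(text, formats=CANDIDATE_FORMATS):
    """Return the date as YYYY-MM-DD, or None if no candidate format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


mixed = ["1990-04-12", "12/04/1990", "12.04.1990", "not a date"]
normalised = [normalise_date(value) for value in mixed]
```

Returning `None` instead of guessing leaves unparseable values visible for review rather than silently rewriting them.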

STEP 14

PII Anonymisation

Mask · Hash · Tokenise · Drop — per column. GDPR compliance status updated after each action.

PRIVACY
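
The four per-column actions can be sketched with the standard library alone. Everything named here (`mask_email`, `hash_value`, `Tokeniser`, `drop_column`, the salt handling) is a hypothetical illustration of the general techniques, not the platform's implementation.

```python
import hashlib
import secrets


def mask_email(value):
    """Mask: keep the first character and the domain, hide the rest."""
    local, _, domain = value.partition("@")
    return f"{local[:1]}***@{domain}" if domain else "***"


def hash_value(value, salt):
    """Hash: salted SHA-256, same input and salt always give the same digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


class Tokeniser:
    """Tokenise: replace each distinct value with a random, stable token."""

    def __init__(self):
        self._tokens = {}

    def tokenise(self, value):
        if value not in self._tokens:
            self._tokens[value] = "tok_" + secrets.token_hex(8)
        return self._tokens[value]


def drop_column(rows, column):
    """Drop: return copies of the rows without the given column."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


masked = mask_email("alice@example.com")
digest = hash_value("alice@example.com", salt="demo-salt")
tokeniser = Tokeniser()
token = tokeniser.tokenise("alice@example.com")
```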

STEP 15

Advanced Bias Detection

Class imbalance, intersectional bias (Age + Gender combined), demographic parity score, equal opportunity score.

AI ETHICS
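
The two fairness scores named in this step have common textbook definitions: demographic parity compares positive-prediction rates across groups, and equal opportunity compares true positive rates. The sketch below reports each as a max-minus-min gap, which is an assumed presentation; the function names are hypothetical. Passing a tuple such as `(age_band, gender)` as the group key gives the intersectional variant.

```python
def _rate(outcomes):
    """Share of 1s in a list of 0/1 outcomes."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def demographic_parity_gap(groups, predictions):
    """Gap between the highest and lowest positive-prediction rate per group."""
    by_group = {}
    for group, predicted in zip(groups, predictions):
        by_group.setdefault(group, []).append(predicted)
    rates = {group: _rate(values) for group, values in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates


def equal_opportunity_gap(groups, labels, predictions):
    """Gap between the highest and lowest true positive rate per group."""
    by_group = {}
    for group, label, predicted in zip(groups, labels, predictions):
        if label == 1:  # only actual positives count towards the TPR
            by_group.setdefault(group, []).append(predicted)
    rates = {group: _rate(values) for group, values in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates


groups = ["M", "M", "M", "M", "F", "F", "F", "F"]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
parity_gap, parity_rates = demographic_parity_gap(groups, predictions)
opportunity_gap, tpr_by_group = equal_opportunity_gap(groups, labels, predictions)
```

A gap of 0 means the groups are treated identically on that metric; larger gaps indicate more disparity.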

STEP 16

Synthetic Data Generator

Privacy-safe artificial dataset. User sets row count and differential privacy ε value. Quality validated via KS test and correlation comparison.
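
The KS test mentioned here compares the empirical distributions of a real column and its synthetic counterpart. The sketch below computes the two-sample Kolmogorov-Smirnov statistic in pure Python; the function name is hypothetical, and no p-value or pass threshold is shown because the page does not state one.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    a, b = sorted(real), sorted(synthetic)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


real_ages = [1, 2, 3, 4]
synthetic_ages = [3, 4, 5, 6]
gap = ks_statistic(real_ages, synthetic_ages)
```

For a statistic with a p-value, `scipy.stats.ks_2samp` is an established alternative if SciPy is available.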

STEP 17

ML Performance Comparison

Accuracy, Precision, Recall, F1 — raw vs clean vs synthetic. Feature importance chart shows which columns matter most after cleaning.

STEP 18

Compliance Report

GDPR · HIPAA · ISO 8000 · EU AI Act. Each gap shows: Issue · Framework · Risk Level · Required Fix.

ENTERPRISE

STEP 19

Reliability Certificate + Data Dictionary

Downloadable PDF certificate. Full data dictionary with column meanings, trust scores, and cleaning actions taken.

STEP 20

Data Contract Generator

YAML schema generated from clean data. All future uploads validated automatically against this contract. Zero code needed.

PATENT-WORTHY
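
The contract idea can be sketched as two steps: infer a schema from clean rows, then check later uploads against it. The example keeps the contract as a plain dictionary rather than YAML to stay dependency-free, and the fields it records (`type`, `nullable`) and the helper names are assumptions, not the generated contract's actual layout.

```python
def infer_contract(rows):
    """Record the first seen type and nullability of each column in clean data."""
    contract = {}
    for row in rows:
        for column, value in row.items():
            spec = contract.setdefault(column, {"type": None, "nullable": False})
            if value is None:
                spec["nullable"] = True
            elif spec["type"] is None:
                spec["type"] = type(value).__name__
    return contract


def validate_against_contract(rows, contract):
    """Return (row_index, column, problem) for every contract violation."""
    errors = []
    for index, row in enumerate(rows):
        for column, spec in contract.items():
            if column not in row:
                errors.append((index, column, "missing column"))
                continue
            value = row[column]
            if value is None:
                if not spec["nullable"]:
                    errors.append((index, column, "null not allowed"))
            elif spec["type"] is not None and type(value).__name__ != spec["type"]:
                found = type(value).__name__
                errors.append((index, column, f"expected {spec['type']}, got {found}"))
    return errors


clean = [
    {"id": 1, "email": "a@x.io", "age": 30},
    {"id": 2, "email": "b@x.io", "age": None},
]
contract = infer_contract(clean)
errors = validate_against_contract([{"id": "3", "email": None}], contract)
```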

STEP 21

Export Cleaning Code

Full Python (pandas) + R (tidyverse) scripts matching every cleaning action applied. Ready to run immediately.

STEP 22

Colab Notebook + Session Summary

One-click Colab with AutoML. Final session summary screen with all 6 downloads. Option to re-upload and compare versions.

Your data, verified. Completely.

Upload Your Dataset

Dataset Report

Six-dimensional quality scoring

28-step intelligence pipeline

Everything your data needs

The complete workflow

Dataset Reliability Certificate

Built on proven technology

Your data deserves to be verified.
