kaviarasanmani/README.md



🧪 I don't just test data pipelines; I build the tools that test them.

I'm a Senior SDET (SDET III) at UST specializing in Data Quality Engineering and ETL Automation Testing. With 4+ years of experience validating production-scale PySpark pipelines (10M+ records/day), I sit at the intersection of data engineering and QA, catching the bugs that hide inside your data, not just your code.

In 2026, I published ValidateX, a lightweight Python data quality validation framework, to PyPI. After years of writing the same validation boilerplate across projects, I decided to ship it as a library instead.

pip install validatex

🚀 Featured: ValidateX

A lightweight, production-ready data quality validation framework for Python.
Supports Pandas & PySpark • 25+ built-in expectations • Weighted quality scoring • Modern HTML reports

PyPI Latest Version • Build Status • Code Coverage • Tests • License: MIT • Python

import pandas as pd
import validatex as vx

suite = (
    vx.ExpectationSuite("production_data")
    .add("expect_column_to_not_be_null",          column="user_id")
    .add("expect_column_values_to_be_unique",      column="user_id")
    .add("expect_column_values_to_be_between",     column="age", min_value=0, max_value=150)
    .add("expect_column_values_to_match_regex",    column="email", regex=r"^[\w.]+@[\w]+\.\w+$")
)

df = pd.read_csv("users.csv")    # hypothetical input; any DataFrame with these columns works
result = vx.validate(df, suite)
print(result.summary())          # Data Quality Score: 97/100
result.to_html("report.html")    # Beautiful dark-theme HTML report

Why ValidateX?

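|                 | ValidateX                         | Great Expectations                      |
|-----------------|-----------------------------------|-----------------------------------------|
| Setup           | pip install → validate in 5 lines | Multi-step setup with contexts & stores |
| Quality Score   | ✅ Weighted 0–100                 | ❌                                      |
| Severity Levels | ✅ Critical / Warning / Info      | ❌                                      |
| CI/CD CLI       | ✅ Built-in                       | ❌                                      |
| Learning Curve  | Minutes                           | Hours to days                           |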
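
As a rough illustration of what a weighted 0–100 score with severity levels can look like, here is a minimal sketch. The weights and the `quality_score` helper are assumptions made for this example, not ValidateX's actual implementation.

```python
# Hypothetical severity weights; ValidateX's real weighting may differ.
SEVERITY_WEIGHTS = {"critical": 3.0, "warning": 2.0, "info": 1.0}


def quality_score(results):
    """Return a 0-100 score from (severity, passed) pairs.

    Every check adds its severity weight to the total; only passing
    checks add their weight to the earned amount.
    """
    total = sum(SEVERITY_WEIGHTS[severity] for severity, _ in results)
    if total == 0:
        return 100.0  # no checks ran, so nothing failed
    earned = sum(
        SEVERITY_WEIGHTS[severity] for severity, passed in results if passed
    )
    return round(100.0 * earned / total, 1)


checks = [
    ("critical", True),   # e.g. user_id is never null
    ("critical", True),   # e.g. user_id is unique
    ("warning", False),   # e.g. some ages fall outside 0..150
    ("info", True),       # e.g. email matches the pattern
]
print(quality_score(checks))  # 7 of 9 weight points earned -> 77.8
```

With this kind of weighting, a failed critical check lowers the score more than a failed info-level check, which is the point of combining severity with a single number.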

📦 PyPI • 💻 GitHub • 📖 Docs


💼 What I Do

┌──────────────────────────────────────────────────────────────┐
│  ETL Testing          →  Validate PySpark pipelines at scale │
│  Data Quality         →  Schema checks, SCD-2, drift detect  │
│  Test Automation      →  Selenium + Robot Framework + pytest │
│  Open Source          →  Building tools the data world needs │
│  CI/CD Integration    →  Jenkins, GitHub Actions, Airflow    │
└──────────────────────────────────────────────────────────────┘
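
To make the SCD-2 item concrete, a check of this kind could be sketched in pandas roughly as follows. The column names (`customer_id`, `valid_from`, `valid_to`, `is_current`) and the `scd2_violations` helper are hypothetical, picked only for this example; a PySpark version would follow the same two rules.

```python
import pandas as pd


def scd2_violations(df, key="customer_id"):
    """Return the business keys that break two common SCD-2 rules."""
    # Rule 1: every key has exactly one row flagged as current.
    current_counts = df.groupby(key)["is_current"].sum()
    bad_current = current_counts[current_counts != 1].index.tolist()

    # Rule 2: within a key, a version must not start before the previous one ends.
    ordered = df.sort_values([key, "valid_from"])
    prev_end = ordered.groupby(key)["valid_to"].shift()
    overlaps = ordered[ordered["valid_from"] < prev_end]
    bad_overlap = sorted(overlaps[key].unique().tolist())

    return {"current_flag": bad_current, "overlap": bad_overlap}


# Made-up history: customer 1 is clean, customer 2 breaks both rules.
history = pd.DataFrame({
    "customer_id": [1, 1, 2, 2],
    "valid_from": pd.to_datetime(
        ["2024-01-01", "2024-06-01", "2024-01-01", "2024-03-01"]
    ),
    "valid_to": pd.to_datetime(
        ["2024-06-01", "2099-12-31", "2024-04-01", "2099-12-31"]
    ),
    "is_current": [False, True, True, True],
})
print(scd2_violations(history))
```
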

By the numbers from my 4+ years in production:

  • 🔴 60% reduction in data quality issues through automated testing frameworks
  • ⚡ 40% reduction in manual reconciliation effort via Python automation
  • 📊 10M+ records/day validated across PySpark ETL pipelines
  • 🧪 96% code coverage on ValidateX (66 tests passing)

πŸ› οΈ Tech Stack

Data & ETL

Python • PySpark • Pandas • SQL • Apache Airflow • Azure Databricks • Apache Iceberg

Testing & Automation

Robot Framework • Selenium • pytest • Postman • Jenkins

Cloud & Storage

Azure Data Lake • MongoDB • Git


📂 Projects

🧪 ValidateX (Published on PyPI)

Open-source Python data quality validation framework. Pandas + PySpark support, 25+ expectations, severity scoring, HTML reports, CLI for CI/CD integration.

pip install validatex

Python ETL pipeline for Indian stock market data β€” bulk ingestion via CSV/Excel, OHLCV schema normalization, API constraint handling, Streamlit control layer. A hands-on data engineering project focused on ingestion, transformation, and delivery.
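
The OHLCV schema normalization step might look something like the sketch below. The header aliases, the `normalize_ohlcv` helper, and the sample rows are all assumptions made for illustration; the project's actual mapping is not shown here.

```python
import pandas as pd

# Hypothetical mapping from vendor-specific headers to one canonical schema.
COLUMN_ALIASES = {
    "date": "date", "timestamp": "date",
    "open": "open", "open price": "open",
    "high": "high", "high price": "high",
    "low": "low", "low price": "low",
    "close": "close", "close price": "close",
    "volume": "volume", "total traded quantity": "volume",
}
OHLCV_COLUMNS = ["date", "open", "high", "low", "close", "volume"]


def normalize_ohlcv(raw):
    """Rename, reorder and type-cast a raw price table to the OHLCV schema."""
    renamed = raw.rename(
        columns=lambda c: COLUMN_ALIASES.get(str(c).strip().lower(), c)
    )
    missing = [c for c in OHLCV_COLUMNS if c not in renamed.columns]
    if missing:
        raise ValueError(f"missing OHLCV columns: {missing}")

    out = renamed[OHLCV_COLUMNS].copy()  # extra columns are dropped here
    out["date"] = pd.to_datetime(out["date"])
    price_cols = ["open", "high", "low", "close"]
    out[price_cols] = out[price_cols].apply(pd.to_numeric, errors="coerce")
    out["volume"] = pd.to_numeric(out["volume"], errors="coerce")
    return out.sort_values("date").reset_index(drop=True)


# Made-up rows, as they might arrive from a CSV/Excel upload.
raw = pd.DataFrame({
    "Timestamp": ["2024-01-02", "2024-01-01"],
    "Open Price": ["101.5", "100.0"],
    "High Price": ["103.0", "102.0"],
    "Low Price": ["100.5", "99.0"],
    "Close Price": ["102.0", "101.0"],
    "Total Traded Quantity": ["1500", "1200"],
    "Series": ["EQ", "EQ"],
})
clean = normalize_ohlcv(raw)
print(clean.dtypes)
```

Failing fast on missing columns keeps a malformed upload from silently producing a partial table further down the pipeline.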


πŸ… Certifications

  • πŸ† Databricks β€” Data Governance Fundamentals (Jan 2026)
  • πŸ† Databricks β€” Databricks Fundamentals (Nov 2025)
  • πŸ“œ Big Data Analytics with Hadoop & Apache Spark β€” LinkedIn Learning (Sep 2025)
  • πŸ“œ Selenium WebDriver with Python β€” Udemy (Apr 2025)
  • πŸ“œ Getting Started in Test Automation Engineering β€” LinkedIn Learning (Apr 2025)

✍️ Writing

I write about data engineering, ETL automation, and real-world pipeline challenges on Medium.

πŸ“ medium.com/@kavim1996


📊 GitHub Stats


🤝 Let's Connect


"Bad data is worse than no data: it gives you false confidence."
That's why I build systems that catch it before it reaches your dashboards.
