
H0wMind/pyspark-etl-boilerplate


simplepyetl

Boilerplate ETL project using PySpark

Installation

pip install -e .

Development Setup

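  1. Create and activate a virtual environment:
    python -m venv .venv
    source .venv/bin/activate  (on Windows: .venv\Scripts\activate)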
  2. Install development dependencies:
    pip install -r requirements-dev.txt
  3. Install package in editable mode:
    pip install -e .

Usage

from simplepyetl import split_first_last_name
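The README does not document the signature or behavior of split_first_last_name, so check its docstring before relying on it. As a hedged illustration of the kind of rule such a helper might apply, the plain-Python sketch below splits a full name at the first run of whitespace and returns an empty last name for single-word input. The name split_full_name and the edge-case handling are assumptions for illustration, not the package's API.

```python
from typing import Optional, Tuple


def split_full_name(full_name: Optional[str]) -> Tuple[str, str]:
    """Split a full name into (first, last) at the first run of whitespace.

    Hypothetical helper for illustration; not the simplepyetl API.
    """
    # Trim outer whitespace; None or an empty string yields ("", "").
    parts = (full_name or "").strip().split(maxsplit=1)
    first = parts[0] if parts else ""
    # Everything after the first whitespace run is treated as the last name.
    last = parts[1] if len(parts) > 1 else ""
    return first, last


print(split_full_name("Ada Lovelace"))           # ('Ada', 'Lovelace')
print(split_full_name("Jean Claude Van Damme"))  # ('Jean', 'Claude Van Damme')
print(split_full_name("Cher"))                   # ('Cher', '')
```

Splitting only once keeps multi-word surnames together. In a PySpark pipeline the same rule is usually expressed with built-in column functions rather than row-level Python, which avoids the serialization overhead of Python UDFs.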

Testing

Run the test suite from the repository root:

python -m unittest discover -s tests
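This command imports every module matching the default pattern test*.py inside tests/. A common pattern for PySpark tests is to start one local SparkSession per test class in setUpClass and stop it in tearDownClass, since session startup is slow. The hypothetical module below (for example tests/test_name_split_spark.py) sketches that pattern. It exercises Spark's built-in pyspark.sql.functions.split directly rather than split_first_last_name, whose signature is not documented here, and it skips itself when pyspark is not installed. The limit argument of split requires Spark 3.0 or later.

```python
import unittest

try:
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
except ImportError:  # report a skip instead of an import error during discovery
    SparkSession = None


@unittest.skipIf(SparkSession is None, "pyspark is not installed")
class NameSplitSparkTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # One local, single-threaded SparkSession shared by every test in the class.
        cls.spark = (
            SparkSession.builder.master("local[1]")
            .appName("simplepyetl-tests")
            .getOrCreate()
        )

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()

    def test_split_on_first_space(self):
        df = self.spark.createDataFrame([("Ada Lovelace",)], ["full_name"])
        # limit=2 keeps everything after the first space in the second element.
        parts = F.split(F.col("full_name"), " ", 2)
        row = df.select(
            parts.getItem(0).alias("first_name"),
            parts.getItem(1).alias("last_name"),
        ).first()
        self.assertEqual((row.first_name, row.last_name), ("Ada", "Lovelace"))
```

Once you have confirmed the arguments and return type of split_first_last_name, swap the inline F.split expression for a call to it.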

GitHub Actions

Unit tests run automatically on pushes and pull requests, via the GitHub Actions workflow in .github/workflows/test_unittest.yaml.

About

Boilerplate PySpark ETL project with a modular structure for building, testing, and deploying scalable data pipelines.
