Auditing ML Models for Fairness

Reproducible pipelines for baseline and fairness‑aware training on the Adult Census Income dataset.

This code accompanies the paper “Auditing Machine‑Learning Models for Fairness: Theory and Case Study, a Reproducible Pipeline from Metrics to Mitigation.”
Everything below was tested with Docker 20.10+.

Repository layout

.
├── Biased_Models/                   # accuracy‑only benchmark
│   ├── ML_Biased.py
│   ├── audit_adult.py
│   ├── Dockerfile
│   └── requirements.txt
└── Fair_Machine_Learning_Models/    # same pipeline + ExponentiatedGradient
    ├── Fair_machine_learning.py
    ├── audit.py
    ├── Dockerfile
    └── requirements.txt

Each directory is completely self‑contained, with its own Docker image, requirements, and outputs. The original outputs are kept in the repository, except for the .joblib model files, which were removed because of their size.

1 · Baseline (accuracy‑only) run

cd "Biased Models"

# build
docker build -t adult_bias_benchmark .

# run (results will be written inside audit/, models/ and predictions/)
docker run --rm -v "$(pwd)":/app adult_bias_benchmark

What you get

Biased_Models/
├─ audit/
│  ├─ metrics_<best>.csv
│  ├─ parity_gaps_<best>.png
│  ├─ roc_age_<best>.png
│  ├─ … (other plots)
│  └─ audit_provenance.json
├─ models/
│  └─ model_<all_models>.joblib
└─ predictions/
   └─ preds_<all_models>.csv

The benchmark tunes 14 classifiers with nested cross‑validation (5×3 CV), using Optuna's TPE sampler for the hyper‑parameter search, and then audits the best performer.

2 · Fairness‑aware run (Equalised Odds constraint)

cd "Fair machine learning models"

# build
docker build -t adult_bias_fair .

# run
docker run --rm -v "$(pwd)":/app adult_bias_fair

Outputs

Fair_Machine_Learning_Models/
├─ audit/
│  ├─ metrics_XGB_fair.csv
│  ├─ parity_gaps_XGB_fair.png
│  ├─ roc_age_band_xgb_fair.png
│  ├─ … (other plots)
│  └─ audit_provenance.json
├─ models/
│  └─ model_XGB_fair.joblib
└─ predictions/
   └─ preds_XGB.csv

This pipeline takes the best hyper‑parameters found in the baseline, wraps the model in Fairlearn’s Exponentiated Gradient with an Equalised‑Odds constraint (ε = 0.005), and re‑audits the result.

3 · Compare the two runs

Baseline <best_model>: higher accuracy, larger demographic‑parity (DP) and equalised‑odds (EO) gaps
Fair XGB: lower accuracy, smaller DP and EO gaps
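To make the two gaps concrete, here is a small sketch of one common way to compute them: the DP gap as the spread in selection rates across groups, and the EO gap as the larger of the spreads in true‑positive and false‑positive rates. The function name fairness_gaps and the toy arrays are illustrative only, and the audit scripts may define the gaps differently.

```python
import numpy as np


def fairness_gaps(y_true, y_pred, groups):
    """Return DP and EO gaps for binary labels and predictions.

    Assumes every group contains both positive and negative true labels;
    otherwise a per-group rate is undefined (NaN).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)

    selection, tpr, fpr = [], [], []
    for g in np.unique(groups):
        in_group = groups == g
        selection.append(y_pred[in_group].mean())            # P(pred=1 | group)
        tpr.append(y_pred[in_group & (y_true == 1)].mean())  # P(pred=1 | y=1, group)
        fpr.append(y_pred[in_group & (y_true == 0)].mean())  # P(pred=1 | y=0, group)

    dp_gap = max(selection) - min(selection)
    eo_gap = max(max(tpr) - min(tpr), max(fpr) - min(fpr))
    return {"dp_gap": float(dp_gap), "eo_gap": float(eo_gap)}


# Toy example: group "a" gets one false positive, group "b" gets none.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gaps = fairness_gaps(y_true, y_pred, groups)
print(gaps)  # {'dp_gap': 0.25, 'eo_gap': 0.5}
```

A gap of 0 means the groups are treated identically under that criterion, so smaller is fairer for both numbers.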

Data

If adult.csv is absent, the scripts automatically download it via fairlearn.datasets.fetch_adult() and cache a local copy.
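The load‑or‑download behaviour can be approximated with a few lines like the following. This is a sketch rather than the scripts' actual logic: the helper names load_adult and _fetch_from_fairlearn, the injectable fetch argument, and the "target" column name are assumptions made for the example.

```python
from pathlib import Path

import pandas as pd


def _fetch_from_fairlearn():
    """Download the Adult dataset via Fairlearn (requires network access)."""
    from fairlearn.datasets import fetch_adult

    X, y = fetch_adult(as_frame=True, return_X_y=True)
    return X.assign(target=y)  # "target" is a hypothetical column name


def load_adult(path="adult.csv", fetch=None):
    """Read the cached CSV if present; otherwise fetch it and cache it."""
    path = Path(path)
    if path.exists():
        return pd.read_csv(path)
    df = (fetch or _fetch_from_fairlearn)()
    df.to_csv(path, index=False)  # cache for the next run
    return df


# Typical use (the first call downloads, later calls read the local file):
# df = load_adult("adult.csv")
```

Passing a custom fetch callable makes it easy to exercise the caching path without touching the network.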

License

MIT (see LICENSE).

Citation

Please cite the forthcoming CORE 2025 paper (DOI to appear):

@inproceedings{rodriguez2025fairaudit,
  title     = {Auditing Machine‑Learning Models for Fairness:
               Theory and Case Study, a Reproducible Pipeline from Metrics to Mitigation},
  author    = {Rodríguez Rodríguez, Noé Oswaldo and Rosas Alatriste, Carolina
               and Alarcón Paredes, Antonio and Yáñez Márquez, Cornelio
               and Recio García, Juan Antonio},
  booktitle = {Proceedings of the CORE 2025 Conference},
  year      = {2025}
}
