This repository contains the official implementation of the paper:
"Is Graph Unlearning Ready for Practice? A Benchmark on Efficiency, Utility, and Forgetting"
We introduce a unified benchmark framework to evaluate multiple graph unlearning techniques across diverse datasets — measuring efficiency, utility, and forgetting.
This benchmark provides:
- A standardized evaluation of graph unlearning methods.
- Comparisons on time, memory, accuracy, and forgetting behavior.
- Support for multiple datasets and GNN architectures.
- Evaluation across three core pillars:
| Pillar | What we measure |
|---|---|
| Efficiency | Runtime and peak GPU memory vs. retraining from scratch |
| Utility | Accuracy, per-node fidelity, logit-space L2 distance, weight-space distance |
| Forgetting | MIA AUROC, unlearning inversion attack, noisy-labeler attack |
- Python: 3.8.0
- CUDA: Ensure the CUDA version is compatible with your PyTorch installation.
git clone https://github.com/idea-iitd/Unlearning_Benchmark.git
cd Unlearning_Benchmarkpip install -e .Example for CUDA 12.1:
pip install torch==2.2.1 torchvision==0.17.1 torchaudio --index-url https://download.pytorch.org/whl/cu121Required Versions:
torch==2.2.1torchvision==0.17.1
Example for CUDA 12.x:
pip install cupy-cuda12xFor other CUDA versions, refer to the official CuPy Installation Guide.
pip install -r requirements.txtIf you encounter build errors, install the precompiled wheels from the PyTorch Geometric Installation Guide.
Example for CUDA 12.1:
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.2.1+cu121.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-2.2.1+cu121.html
pip install torch-geometricFor other CUDA versions, replace cu121 with your version (e.g., cu118).
Important: Always run
GOLDfirst for a given dataset + ratio combination.
GOLD creates the train/test split, unlearning index files, and the retrained reference
model that all evaluation compares against.
python GULib-master/main.py \
--dataset_name cora \
--base_model GCN \
--unlearning_methods GOLD \
--num_epochs 100 \
--batch_size 64 \
--unlearn_ratio 0.1 \
--num_runs 1 \
--cal_mem TrueTo unlearn a model, run unlearn_model.sh or use the following command:
python GULib-master/main.py \
--dataset_name cora \
--base_model GCN \
--unlearning_methods MEGU \
--attack False \
--num_epochs 100 \
--batch_size 64 \
--unlearn_ratio 0.1 \
--num_runs 1 \
--cal_mem TrueThis command will train, unlearn, and save the unlearned model.
| Argument | Description | Example |
|---|---|---|
--cuda <device> |
Specify GPU device to use | --cuda 0 |
--dataset_name <name> |
Graph dataset name | --dataset_name cora |
--base_model <model> |
Base GNN model architecture | GCN, GAT, GIN |
--unlearning_methods <method> |
Unlearning method | MEGU, GIF, GraphEraser, GUIDE, GNNDelete, IDEA, Projector, ScaleGUN, CGU, COGNAC, ETR, GOLD |
--unlearn_ratio <value> |
Fraction of data to unlearn | 0.1 |
--num_unlearned_nodes <N> |
Absolute count of nodes to unlearn | 271 |
--unlearn_task <task> |
Unlearning granularity | node, edge, feature |
--num_epochs <N> |
Number of training epochs | 100 |
--batch_size <N> |
Batch size | 64 |
--num_runs <N> |
Independent runs to average over | 5 |
--attack <True/False> |
Enable MIA during unlearning | False |
--attack_type <name> |
Forgetting attack for evaluation | MIattack, TrendAttack, MRattack |
--cal_mem <True/False> |
Record time and memory stats | True |
To record efficiency metrics (time and memory usage), pass --cal_mem True to main.py.
Results are stored in:
efficiency_stats.txt
Run utility_stats.sh or call evaluate_unlearning.py directly:
bash utility_stats.shpython GULib-master/evaluate_unlearning.py \
--dataset_name cora \
--base_model GCN \
--unlearning_methods MEGU \
--unlearn_ratio 0.1 \
--unlearn_task node \
--num_runs 1This computes:
- Accuracy
- Fidelity (per-node prediction agreement with GOLD)
- Logit L2 distance (output distribution similarity to GOLD)
- Weight-space distance — L2, Cosine, and Relative-L2 between the unlearned and GOLD model parameters.
Reported automatically for methods that can support it: MEGU, GIF, IDEA, COGNAC, ETR.
To evaluate forgetting performance, pass --attack_type to evaluate_unlearning.py:
python GULib-master/evaluate_unlearning.py \
--dataset_name cora \
--unlearning_methods MEGU \
--unlearn_ratio 0.1 \
--attack_type MIattackWhere --attack_type can be:
| Attack | What it tests |
|---|---|
MIattack |
Membership Inference — can an adversary tell if a node was in training? |
TrendAttack |
Inversion attack — can deleted edges be reconstructed from logits? |
MRattack |
Noisy-labeler — does the model assign high-confidence original labels to deleted nodes? |
An AUROC close to 0.5 indicates strong forgetting. Values above 0.5 indicate residual leakage.
Note: Utility results for GraphEraser and GUIDE are automatically stored during unlearning time in
GraphEraser_utility_stats.txtandGUIDE_utility_stats.txtfiles. And for getting forgetting results for them pass the--attack_typeargument tomain.pyinstead.
By default all methods use randomly sampled training nodes. To evaluate on a structured deletion strategy, generate the node set first, then run unlearning and evaluation as normal:
# Step 1 — overwrite the initial random node set with your chosen strategy
python GULib-master/generate_workload_sets.py \
--dataset_name cora --unlearn_ratio 0.1 \
--strategy high_freq # or: second_freq | low_degree | high_degree | random
# Step 2 — run unlearning (reads the node set as usual)
python GULib-master/main.py \
--dataset_name cora --base_model GCN \
--unlearning_methods MEGU --unlearn_ratio 0.1
# Step 3 — evaluate as usual
python GULib-master/evaluate_unlearning.py \
--dataset_name cora --unlearning_methods MEGU \
--unlearn_ratio 0.1| Dataset | Nodes | Edges | Type |
|---|---|---|---|
| Cora | 2,708 | 5,278 | Homophily |
| Citeseer | 3,327 | 4,732 | Homophily |
| Photo | 7,487 | 119,043 | Homophily |
| ogbn-arxiv | 169,343 | 1,166,243 | Homophily |
| Amazon-ratings | 24,492 | 93,050 | Heterophily |
| Roman-empire | 22,662 | 32,927 | Heterophily |
| 232,965 | 114,615,892 | Homophily (scalability) |
| Method | Paradigm | Model-agnostic | Cont. Training | Train. Mode | Guarantee |
|---|---|---|---|---|---|
| MEGU | Learning-based | ✓ | ✓ | Post-hoc | — |
| GIF | Influence function | ✓ | — | Post-hoc | — |
| IDEA | IF + certified | ✓ | — | Post-hoc | — |
| GST | Influence function | ✓ | ✓ | Post-hoc | — |
| ETR | IF + Learning | ✓ | ✓ | Post-hoc | — |
| COGNAC | Corrective | ✓ | ✓ | Post-hoc | — |
| GNNDelete | Learning-based | ✓ | — | Post-hoc | — |
| GraphEraser | SISA / Partition | ✓ | ✓ | Train-time | ✓ |
| GUIDE | SISA / Partition | ✓ | ✓ | Train-time | ✓ |
| Projector | Projection | — | — | Train-time | ✓ |
| ScaleGUN | Certified (linear GNN, binary) | — | ✓ | Train-time | ✓ |
| CGU | Certified (linear GNN, binary) | — | ✓ | Train-time | ✓ |
Unlearning_Benchmark/
├── GULib-master/
│ ├── config.py # Derived file paths
│ ├── evaluate_unlearning.py # Compute all metrics (utility and forgetting)
│ ├── generate_workload_sets.py # generates node sets for robustness analysis
│ ├── main.py # Train + unlearn
│ ├── parameter_parser.py # All CLI arguments and defaults
│ ├── unlearning_manager.py # method-name → class dispatch
│ ├── unlearning/unlearning_methods/ # One subfolder per method
│ │ ├── MEGU/megu.py
│ │ ├── GIF/gif.py
│ │ └── …
│ ├── pipeline/ # Base pipeline classes
│ │ ├── Learning_based_pipeline.py
│ │ ├── IF_based_pipeline.py
│ │ └── Shard_based_pipeline.py
│ ├── task/ # Trainer classes
│ ├── attack/
│ │ ├── MIA_attack.py # MIA AUROC scorer (evaluation)
│ │ ├── shadow_model.py # Shadow/attack model classes (training)
│ │ ├── Trend_attack.py
│ │ └── Membership_Recall_Attack.py
│ ├── dataset/ # Dataset loaders and splits
│ ├── model/ # GNN architectures and model zoo
│ └── utils/ # Logging and data utilities
├── unlearn_model.sh # Run training + unlearning
├── utility_stats.sh # Run evaluation metrics
└── requirements.txt
See CONTRIBUTIONS.md for step-by-step instructions on adding new unlearning methods, datasets, and attack families.
@inproceedings{jain2026is,
title = {Is Graph Unlearning Ready for Practice? A Benchmark on Efficiency, Utility, and Forgetting},
author = {Samyak Jain and Ronak Kalvani and Sainyam Galhotra and Sayan Ranu},
booktitle = {The Fourteenth International Conference on Learning Representations},
year = {2026},
url = {https://openreview.net/forum?id=gSPkuTTWgU}
}For questions, issues, or contributions, please open a GitHub issue or contact the authors.