161 changes: 161 additions & 0 deletions dev/release/README.md
@@ -63,6 +63,7 @@ Although some tasks can only be performed by a PMC member, many tasks can be per
| Task | Role Required |
|--------------------------------------------------------------| ------------- |
| Create PR against datafusion-site with updated documentation | None |
| Publish Python wheels to PyPI | PMC |

## Detailed Guide

@@ -281,6 +282,166 @@ dot -Tsvg dev/release/crate-deps.dot > dev/release/crate-deps.svg
(cd ballista-cli && cargo publish)
```

### Publish Python Wheels to PyPI

Only approved releases of the tarball should be published to PyPI, in order to
conform to Apache Software Foundation governance standards. The Python wheels
that get uploaded must be the same artifacts that the community voted on — they
are downloaded from the release candidate's CI run, not rebuilt.

#### Prerequisites

A DataFusion committer can publish the [`ballista` package on
Suggested change:
- A DataFusion committer can publish the [`ballista` package on
+ A DataFusion PMC member can publish the [`ballista` package on

PyPI](https://pypi.org/project/ballista/) after an official project release has
been made. One-time setup:

- Create accounts on [pypi.org](https://pypi.org) and
[test.pypi.org](https://test.pypi.org) (separate accounts).
- Ask an existing maintainer of the `ballista` PyPI project — listed on the
project page — to add you as a maintainer. The request should be made on the
dev mailing list so it is publicly tracked.
- Generate project-scoped API tokens for both PyPI and TestPyPI.
- Configure `~/.pypirc`:

```ini
[distutils]
index-servers =
    pypi
    testpypi

[pypi]
username = __token__
password = pypi-...

[testpypi]
repository = https://test.pypi.org/legacy/
username = __token__
password = pypi-...
```
@martin-g martin-g Apr 28, 2026

Secure `~/.pypirc`: `chmod 600 ~/.pypirc`


- Install `twine`:

```bash
pip install twine
```

also says that `requests` is needed

#### Download the Voted-On Wheels

Once the vote passes and the final tag has been created from the RC commit,
download the same wheels that were voted on from the RC's CI run. Retagging the
RC commit does not trigger a fresh build, so the RC artifacts remain the
canonical source.

> **Artifact retention warning:** GitHub Actions artifacts are retained for 90
> days by default. If more than 90 days pass between cutting the RC and
> publishing to PyPI, the wheels will have been deleted and cannot be recovered:
> the voted-on artifacts can no longer be published, and a new RC and a new vote
> are required. Plan the vote and the post-vote publish so the publish step
> happens comfortably inside the 90-day window. If in doubt, find the RC's run
> at `https://github.com/apache/datafusion-ballista/actions` and check the
> `expires_at` value of its artifacts.
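
The retention arithmetic can be sketched in a few lines of Python. This is a minimal illustration, assuming the 90-day default and an ISO 8601 creation timestamp in UTC. `days_until_expiry` is a hypothetical helper, not part of the release scripts, and the dates are made up.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the repository uses GitHub's default 90-day retention.
RETENTION = timedelta(days=90)


def days_until_expiry(created_at: str, now: datetime) -> int:
    """Whole days left before an artifact created at `created_at` expires.

    A negative result means the artifact has already expired.
    """
    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return (created + RETENTION - now).days


# Made-up dates, for illustration only.
now = datetime(2026, 4, 28, tzinfo=timezone.utc)
print(days_until_expiry("2026-03-01T12:00:00Z", now))
```

A small or negative result means a new RC may be needed before the wheels can be published.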

```bash
export GH_TOKEN=... # GitHub PAT with read access to actions
mkdir ballista-pypi-<version> && cd ballista-pypi-<version>
python ../dev/release/download-python-wheels.py <version>-rc<N>
ls *.whl *.tar.gz # confirm filenames carry the right version
```

The merged artifact should contain one of each of the following platform wheels
(file naming uses [PEP 425](https://peps.python.org/pep-0425/) tags):

- `ballista-<version>-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl`
- `ballista-<version>-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl`
- `ballista-<version>-cp310-abi3-macosx_*_arm64.whl`
- `ballista-<version>-cp310-abi3-win_amd64.whl`
- `ballista-<version>.tar.gz` (sdist)
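
The list above can be checked mechanically before upload. The following is a sketch only: `expected_patterns` and `find_problems` are hypothetical helpers, the version `1.2.3` and the filenames are made up, and the patterns are copied from the list above rather than derived from CI.

```python
from fnmatch import fnmatchcase


def expected_patterns(version):
    """Filename patterns for the artifacts one release is expected to contain."""
    prefix = f"ballista-{version}"
    return [
        f"{prefix}-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        f"{prefix}-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
        f"{prefix}-cp310-abi3-macosx_*_arm64.whl",
        f"{prefix}-cp310-abi3-win_amd64.whl",
        f"{prefix}.tar.gz",
    ]


def find_problems(version, filenames):
    """Return {pattern: matches} for every pattern not matched exactly once."""
    problems = {}
    for pattern in expected_patterns(version):
        matches = sorted(f for f in filenames if fnmatchcase(f, pattern))
        if len(matches) != 1:
            problems[pattern] = matches
    return problems


# Hypothetical directory listing with two macOS arm64 wheels.
files = [
    "ballista-1.2.3-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "ballista-1.2.3-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
    "ballista-1.2.3-cp310-abi3-macosx_11_0_arm64.whl",
    "ballista-1.2.3-cp310-abi3-macosx_14_0_arm64.whl",
    "ballista-1.2.3-cp310-abi3-win_amd64.whl",
    "ballista-1.2.3.tar.gz",
]
for pattern, matches in find_problems("1.2.3", files).items():
    print(pattern, "->", matches or "MISSING")
```

An empty result means every expected artifact is present exactly once. Anything else should be resolved by hand before running `twine`.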

> **Known CI caveat:** the merged artifact currently contains the macOS arm64
> wheel twice (jobs `build-python-mac-win`'s macOS leg and `build-macos-x86_64`
> both run on `macos-latest`, which is now arm64) and **no** macOS x86_64 wheel.
> Keep one copy of the arm64 wheel and delete any duplicate before upload.
> Tracked in [#1608](https://github.com/apache/datafusion-ballista/issues/1608).
Not sure this note is needed here.
The issue will be resolved soon (before next release).


#### Validate the Artifacts

```bash
twine check *.whl *.tar.gz
```

The `download-python-wheels.py` script also writes `.asc` GPG signatures and
`.sha256` / `.sha512` checksum files alongside each artifact. Those are for ASF
SVN — PyPI rejects them. Pass explicit globs to `twine` so only the wheels and
sdist are considered.
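
The effect of the explicit globs can be illustrated in Python. This is a sketch under stated assumptions: `uploadable` is a hypothetical helper, the filenames are made up, and the suffix rule simply mirrors the `*.whl *.tar.gz` globs used with `twine`.

```python
def uploadable(filenames):
    """Keep only wheels and sdists, dropping signature and checksum sidecars."""
    return sorted(f for f in filenames if f.endswith((".whl", ".tar.gz")))


# Hypothetical directory listing after running the download script.
listing = [
    "ballista-1.2.3-cp310-abi3-win_amd64.whl",
    "ballista-1.2.3-cp310-abi3-win_amd64.whl.asc",
    "ballista-1.2.3-cp310-abi3-win_amd64.whl.sha256",
    "ballista-1.2.3-cp310-abi3-win_amd64.whl.sha512",
    "ballista-1.2.3.tar.gz",
    "ballista-1.2.3.tar.gz.asc",
]
print(uploadable(listing))
```

Because the sidecar files end in `.asc`, `.sha256`, or `.sha512`, a suffix match on `.whl` and `.tar.gz` excludes them without naming them.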

#### TestPyPI Dry-Run

PyPI uploads are immutable: once a version is published it cannot be replaced
or re-uploaded, only yanked. A TestPyPI dry-run takes a few minutes and catches
the common ways a release goes wrong.

```bash
twine upload --repository testpypi *.whl *.tar.gz

python -m venv /tmp/ballista-pypi-smoke
source /tmp/ballista-pypi-smoke/bin/activate
pip install -i https://test.pypi.org/simple/ \
--extra-index-url https://pypi.org/simple/ \
ballista==<version>
python -c "from ballista import BallistaSessionContext; print('ok')"
deactivate
```

`--extra-index-url` is required because TestPyPI does not mirror dependencies
like `pyarrow` and `datafusion`.

#### Upload to PyPI

```bash
twine upload *.whl *.tar.gz
```

If the upload fails partway through, re-run with `--skip-existing` to retry only
the files that did not get through.

#### Verify

Confirm the new version appears at
`https://pypi.org/project/ballista/<version>/`. Then in another fresh
virtual environment:

```bash
python -m venv /tmp/ballista-pypi-verify
source /tmp/ballista-pypi-verify/bin/activate
pip install ballista==<version>
python -c "from ballista import BallistaSessionContext; print('ok')"
deactivate
```

Idea: Instead of using `<version>`, I'd suggest using `${BALLISTA_VERSION}`, which is defined earlier. This way the release manager can copy, paste, and run the code without any manual edits.

#### Recovery

**`twine check` fails.** The artifacts shipped from CI are malformed (bad
metadata, missing `LICENSE.txt`, etc.). Do not proceed. Open an issue, fix in
`python/pyproject.toml` or the `generate-license` job, cut a new RC, re-vote.
Do not hand-edit wheels.

**TestPyPI smoke install or import fails.** Same recovery: the wheels are
broken, so cut a new RC. The version stays published on TestPyPI. You can yank
it from the project's management page there (`twine` has no `yank` command) so
it does not resolve, but the filenames are permanently consumed on TestPyPI.

**PyPI upload fails partway.** Some wheels uploaded, others did not. Re-run
with `--skip-existing`:

```bash
twine upload --skip-existing *.whl *.tar.gz
```

If a *broken* file actually made it to PyPI, it cannot be replaced. Yanking the
release from the project's management page on PyPI (`twine` has no `yank`
command) removes the version from `pip install ballista` resolution, but the
version number is permanently consumed. Recovery requires bumping to
`<version>.post1` and starting over from "Download the Voted-On Wheels", which
in turn requires cutting a new RC, since post-releases must also be voted on.
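
The post-release bump follows the `.postN` convention of PEP 440. A minimal sketch of the version arithmetic, assuming a plain release version string; `next_post_release` is a hypothetical helper and the version numbers are made up:

```python
import re


def next_post_release(version: str) -> str:
    """Return the next PEP 440 post-release of `version`."""
    match = re.fullmatch(r"(.+?)\.post(\d+)", version)
    if match:
        # Already a post-release: increment its counter.
        return f"{match.group(1)}.post{int(match.group(2)) + 1}"
    return f"{version}.post1"


print(next_post_release("1.2.3"))
print(next_post_release("1.2.3.post1"))
```

The sketch does not handle pre-release or local version segments. It only shows how the replacement version number is formed.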

### Publish Docker Images

Pushing a release tag causes Docker images to be published.
21 changes: 19 additions & 2 deletions dev/release/download-python-wheels.py
@@ -53,18 +53,35 @@ def main():
        "Accept": "application/vnd.github.v3+json",
        "Authorization": f"token {ghp_token}",
    }
    url = f"https://api.github.com/repos/apache/datafusion-ballista/actions/runs?branch={tag}"

    # Resolve the tag to a commit SHA. Filtering runs by branch name does not
    # work reliably: when the RC tag points at a commit that is also the head
    # of a release branch (e.g. branch-51), GitHub associates the successful
    # workflow run with the branch rather than the tag, and the branch query
    # only returns the (possibly cancelled) tag-triggered run.
    commit_url = f"https://api.github.com/repos/apache/datafusion-ballista/commits/{tag}"
    resp = requests.get(commit_url, headers=headers)
    resp.raise_for_status()
    sha = resp.json()["sha"]
    print(f"Tag {tag} resolves to commit {sha}")

    url = f"https://api.github.com/repos/apache/datafusion-ballista/actions/runs?head_sha={sha}"
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()

    artifacts_url = None
    for run in resp.json()["workflow_runs"]:
        if run["name"] != "Python Release Build":
            continue
        if run.get("conclusion") != "success":
            continue
        artifacts_url = run["artifacts_url"]
        break

    if artifacts_url is None:
        print("ERROR: Could not find python wheel binaries from Github Action run")
        print(
            f"ERROR: Could not find a successful 'Python Release Build' run for "
            f"commit {sha}. Re-run the workflow or cut a new RC.")
        sys.exit(1)
    print(f"Found artifacts url: {artifacts_url}")
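
The run-selection logic in this hunk can be exercised without network access by factoring it into a pure function. The following is a sketch for illustration, not part of the PR: `pick_artifacts_url` is a hypothetical name, and the sample data uses placeholder URLs rather than real API responses.

```python
def pick_artifacts_url(workflow_runs, workflow_name="Python Release Build"):
    """Return the artifacts URL of the first successful run of the workflow."""
    for run in workflow_runs:
        if run["name"] != workflow_name:
            continue
        if run.get("conclusion") != "success":
            continue
        return run["artifacts_url"]
    return None


# Placeholder data shaped like the fields the script reads from the API.
sample_runs = [
    {"name": "Python Release Build", "conclusion": "cancelled",
     "artifacts_url": "https://example.invalid/runs/1/artifacts"},
    {"name": "Rust", "conclusion": "success",
     "artifacts_url": "https://example.invalid/runs/2/artifacts"},
    {"name": "Python Release Build", "conclusion": "success",
     "artifacts_url": "https://example.invalid/runs/3/artifacts"},
]
print(pick_artifacts_url(sample_runs))
```

The cancelled tag-triggered run and the unrelated workflow are both skipped, which is the behaviour the `conclusion` check is meant to add.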
