
HyperTools 2.0: modernized toolbox, interactive backend, soft clustering, comprehensive visual verification #270

Open
jeremymanning wants to merge 48 commits into master from dev-2.0

Conversation


@jeremymanning jeremymanning commented Jul 2, 2026

Member

HyperTools 2.0: modernized toolbox with interactive backend, soft clustering, and comprehensive visual verification

This PR modernizes hypertools end-to-end while preserving the public API, integrating the best ideas from the earlier refactor attempt (jeremymanning/hypertools dev branch + backend experiments) into the current, tested codebase. Full design rationale: notes/hypertools_2.0_roadmap.md.

Highlights

Interactive plotly backend with visual parity (backend='auto' | 'matplotlib' | 'plotly')

  • matplotlib remains the default renderer everywhere except Google Colab and Kaggle, where backend='auto' (the default) switches to plotly. Existing local/CI workflows see no change.
  • The two backends produce visually matched output: identical colors, line/marker styles and sizes (pt→px calibrated), format strings (markers + dash styles), the signature wireframe cube / square frame, hidden axes, and matched camera angles. Evidence: 22 side-by-side montages in docs/images/v2.0-parity/ — matplotlib left, plotly right, same call.
  • Animations on both backends (sliding window + camera spin with play/pause controls on plotly).
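
A minimal sketch of the backend='auto' policy described above, not the code in this PR. The helper name `resolve_backend` and the specific markers (a loaded `google.colab` module, the `KAGGLE_KERNEL_RUN_TYPE` environment variable) are assumptions for illustration; the PR only states that Colab/Kaggle markers are checked.

```python
import os
import sys

VALID_BACKENDS = ("auto", "matplotlib", "plotly")


def resolve_backend(backend="auto", modules=None, environ=None):
    """Return the concrete renderer name for a requested backend."""
    # Hypothetical helper: modules/environ are injectable so the policy is testable.
    modules = sys.modules if modules is None else modules
    environ = os.environ if environ is None else environ
    if backend not in VALID_BACKENDS:
        raise ValueError(f"backend must be one of {VALID_BACKENDS}, got {backend!r}")
    if backend != "auto":
        return backend  # explicit choices are always honored
    # Assumed markers; the real detection logic may differ.
    on_colab = "google.colab" in modules
    on_kaggle = "KAGGLE_KERNEL_RUN_TYPE" in environ
    return "plotly" if (on_colab or on_kaggle) else "matplotlib"
```

Injecting `modules` and `environ` keeps the policy testable without actually running on a hosted notebook.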

Mixture-model ("soft") clustering + robust coloring

  • hyp.cluster(x, cluster='GaussianMixture' | 'BayesianGaussianMixture' | 'LatentDirichletAllocation' | 'NMF') returns (n_samples, n_components) membership proportions (rows sum to 1). Hard clustering unchanged.
  • hyp.plot(x, cluster='GaussianMixture', ...) colors observations by proportion-weighted blends of component colors.
  • hue accepts categorical labels, continuous values, or any 2D matrix via the new mat2colors.
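
The proportion-weighted blending can be illustrated with a small pure-Python sketch. `blend_colors` is a hypothetical name, not the PR's `mat2colors`; it assumes RGB tuples in [0, 1] and one membership row per observation.

```python
def blend_colors(proportions, component_colors):
    """Blend RGB component colors using one row of membership proportions."""
    if len(proportions) != len(component_colors):
        raise ValueError("need one proportion per component color")
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    weights = [p / total for p in proportions]  # renormalize defensively
    return tuple(
        sum(w * color[channel] for w, color in zip(weights, component_colors))
        for channel in range(3)
    )


red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(blend_colors([1.0, 0.0], [red, blue]))  # (1.0, 0.0, 0.0)
print(blend_colors([0.5, 0.5], [red, blue]))  # (0.5, 0.0, 0.5)
```

A point that belongs fully to one component keeps that component's color; a point split evenly between two components lands halfway between them.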

Multicolored lines

  • Continuous or matrix-valued hue + a line format string colors each trajectory continuously along its length on both backends (matplotlib Line3DCollection/LineCollection; plotly per-point line colors in 3D, segment traces in 2D).
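
One way to picture per-segment coloring: split the trajectory into consecutive point pairs and assign each pair a hue value. The sketch below is an assumption about the general approach, not the PR's implementation; `colored_segments` is a hypothetical name, and averaging the endpoint hues is one of several reasonable choices.

```python
def colored_segments(points, hue):
    """Split a trajectory into consecutive segments with one hue value each."""
    if len(points) != len(hue):
        raise ValueError("need one hue value per point")
    segments = list(zip(points[:-1], points[1:]))
    # Color each segment by the mean hue of its two endpoints (an assumption).
    segment_hue = [(a + b) / 2 for a, b in zip(hue[:-1], hue[1:])]
    return segments, segment_hue


segments, segment_hue = colored_segments([(0, 0), (1, 0), (1, 1)], [0.0, 0.5, 1.0])
print(segments)     # [((0, 0), (1, 0)), ((1, 0), (1, 1))]
print(segment_hue)  # [0.25, 0.75]
```

A list of point pairs like `segments` is the shape of input that matplotlib's LineCollection accepts, with the hue values mapped through a colormap.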

Nested-list input with multilevel styling

  • hyp.plot([[a, b], [c]]) colors datasets by outermost group; deeper nesting renders thinner + fainter. Text corpora keep existing behavior.
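
A rough sketch of the flattening rule, under stated assumptions: leaves are any non-list objects (strings stand in for datasets here), and the width/alpha falloff is a made-up geometric rule, since the PR does not give the exact styling formula. Both function names are hypothetical.

```python
def flatten_nested(data, group=None, depth=0):
    """Yield (leaf, outermost_group_index, depth) for arbitrarily nested lists."""
    for index, item in enumerate(data):
        top = index if group is None else group  # remember the outermost group
        if isinstance(item, list):
            yield from flatten_nested(item, group=top, depth=depth + 1)
        else:
            yield item, top, depth


def style_for_depth(depth, base_width=2.0, base_alpha=1.0, factor=0.5):
    """Hypothetical styling: deeper leaves are thinner and fainter."""
    return {"linewidth": base_width * factor**depth, "alpha": base_alpha * factor**depth}


print(list(flatten_nested([["a", "b"], ["c"]])))
# [('a', 0, 1), ('b', 0, 1), ('c', 1, 1)]
```

Leaves `a` and `b` share group 0 (and so a color), while `c` falls in group 1.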

hyp.apply_model: the stack/unstack core

  • Datasets are stacked, the model fits once across all of them, and results unstack to the input structure — what makes embeddings/labels comparable across datasets. Model specs: registry name / dict with params / sklearn-style instance / pipeline list. mode='auto'|fit_transform|fit_predict|predict_proba, return_model=True for held-out reuse, stack=False for per-dataset fits. Explicit whitelist registry (no eval).
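
The stack / fit once / unstack pattern can be sketched without any dependencies. This illustrates the idea only; `MeanCenter` is a toy stand-in for an sklearn-style transformer, and `apply_shared_model` is a hypothetical name, not the signature of hyp.apply_model.

```python
def stack(datasets):
    """Concatenate datasets row-wise and remember each dataset's length."""
    lengths = [len(d) for d in datasets]
    stacked = [row for d in datasets for row in d]
    return stacked, lengths


def unstack(rows, lengths):
    """Split stacked rows back into the original per-dataset structure."""
    out, start = [], 0
    for n in lengths:
        out.append(rows[start:start + n])
        start += n
    return out


class MeanCenter:
    """Toy stand-in for an sklearn-style transformer (hypothetical)."""

    def fit(self, rows):
        n_cols = len(rows[0])
        self.mean_ = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
        return self

    def transform(self, rows):
        return [[v - m for v, m in zip(r, self.mean_)] for r in rows]


def apply_shared_model(datasets, model):
    stacked, lengths = stack(datasets)
    model.fit(stacked)  # one fit across all datasets
    return unstack(model.transform(stacked), lengths)


print(apply_shared_model([[[0.0], [2.0]], [[4.0]]], MeanCenter()))
# [[[-2.0], [0.0]], [[2.0]]]
```

Because the mean is estimated from all rows at once, the centered values of the two datasets live in the same coordinate frame, which is the point of stacking before fitting.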

Retired legacy arguments (long-deprecated)

  • plot(group=...) → hue; plot(model=/model_params=) → reduce; reduce(model=/model_params=/normalize=/align=); align(method=/normalize=/ndims=) and the ambiguous align=True (now a clear ValueError with a migration hint); cluster(ndims=). Saved geos from hypertools 0.x still load: retired kwargs are translated (group → hue) or dropped with a warning on replay.
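
A minimal sketch of how retired keyword arguments might be translated or dropped when replaying an old saved geo. Only the group → hue rename is stated in this PR; the set of dropped names, the warning class, and the function name `translate_retired_kwargs` are assumptions for illustration.

```python
import warnings

RENAMED = {"group": "hue"}           # translated, per the PR description
DROPPED = ("model", "model_params")  # assumed set of dropped plot kwargs


def translate_retired_kwargs(kwargs):
    """Return a copy of kwargs with retired names translated or dropped."""
    out = dict(kwargs)
    for old, new in RENAMED.items():
        if old in out:
            value = out.pop(old)
            warnings.warn(f"{old}= is retired; using {new}= instead", DeprecationWarning)
            out.setdefault(new, value)  # an explicit new-style value wins
    for old in DROPPED:
        if old in out:
            out.pop(old)
            warnings.warn(f"{old}= is retired and was ignored", DeprecationWarning)
    return out
```

The input dict is copied rather than mutated, so a stored geo's original kwargs stay intact for inspection.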

Bug fixes

  • #259 ("importing hypertools updates matplotlib rcParams") fixed: plotting no longer mutates global matplotlib rcParams (verified by before/after diff; regression-tested).
  • #264 ("Problem with creating multiple hypertools figures in a for loop") fixed: plots in loops no longer repeat the first plot. The root cause was the memoize cache, whose str() keys truncated numpy arrays so new data collided with stale entries. The cache has been removed, and a regression test reproduces the reported loop scenario.
  • #265 ("Plotting animations in Jupyter does not seem to be compatible with current Numpy (version 2 or greater)") fixed: animate=True works under numpy≥2, and a regression test reproduces the exact array from the issue report. Also fixed: the Colab/ipympl backend crash (matplotlib ≥3.9 raises ValueError, not ImportError, for broken module:// backends), plus a fallback for broken Tcl/Tk installs.
  • Long-standing is_line() bug: '' in Line2D.markers made it return False for every format string, silently disabling hypertools' smooth line interpolation on modern matplotlib. Fixed (with linestyle-aware parsing), restoring the intended rendering; per-point labels are now re-mapped onto interpolated trajectories.
  • Fixed redundant format_data/PPCA pass per plot.
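
The cache-collision failure mode behind #264 can be reproduced in miniature with the standard library. This is an analogy, not the removed code: `reprlib.repr` abbreviates long lists in much the same way that str() summarizes a large numpy array (by default, arrays with more than 1000 elements are printed with an ellipsis), so two different inputs can produce the same key.

```python
import reprlib

_cache = {}


def truncated_key(data):
    # reprlib.repr elides the middle of long lists, like str() on a big array.
    return reprlib.repr(data)


def memoized_sum(data):
    key = truncated_key(data)
    if key not in _cache:
        _cache[key] = sum(data)
    return _cache[key]


first = list(range(100))
second = list(range(100))
second[50] = -1000  # differs only in the elided part

print(truncated_key(first) == truncated_key(second))  # True: the keys collide
print(memoized_sum(first), memoized_sum(second))      # 4950 4950 (second is stale)
print(sum(second))                                    # 3900 (the correct answer)
```

Keying on a content hash of the full data would avoid the collision; the PR takes the simpler route of removing the cache.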

Performance, packaging, docs

  • import hypertools: 5.1s → 1.4s (lazy umap/seaborn/scipy.interpolate).
  • PEP 621 pyproject.toml (2.0.0.dev0, Python 3.10–3.13); setup.py/requirements.txt/MANIFEST.in/.travis.yml removed; extras [interactive], [dev]; CI matrix 3 OS × py3.10–3.13 with bumped actions + screenshot artifacts; readthedocs on py3.11; external hdbscan → sklearn's built-in; unmaintained pca-magic dropped.
  • Docs updated throughout: README "What's new in 2.0"; 5 new gallery examples (interactive backend, mixture models, multicolored lines, nested lists, apply_model) — full sphinx site + gallery builds cleanly; apply_model added to the API reference; docstrings updated for all changed signatures.

Evidence: every function verified on both backends

Sample parity montages (matplotlib | plotly — same call)

Cases shown (montage images not reproduced here): trajectories, multicolored line, mixture blending, dashed lines, nested multilevel.

Breaking changes

  • Python ≥3.10; dependency floors raised (numpy≥2, pandas≥2.2, sklearn≥1.4, matplotlib≥3.8).
  • Retired arguments listed above raise TypeError/ValueError with migration hints (old saved geos are translated on replay).
  • The buggy result cache is gone (recompute instead of risking wrong cached results).

⚠️ Do not merge without @jeremymanning's explicit sign-off.

🤖 Generated with Claude Code

jeremymanning and others added 22 commits July 1, 2026 23:57
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…audit

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…erator, dev notebook

- scripts/screenshot_harness.py: headless PNG capture per function/use-case
- scripts/generate_baseline_screenshots.py: 13 baseline cases, all passing on v0.8.2
- dev/hypertools_2.0_dev.ipynb: interactive test matrix, one section per public function
- Roadmap updated with design decisions mined from fork issue tracker (incl. comments)
- tests/screenshots/ gitignored (reviewed locally / CI artifacts, not committed)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…mixture models to first-class 2.0 features; record approved backend='auto' policy

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…models, robust coloring

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…lazy heavy imports

- Migrate to PEP 621 pyproject.toml (v2.0.0.dev0, py3.10+); delete setup.py,
  requirements.txt, MANIFEST.in, stale .travis.yml
- CI: py3.10-3.13 matrix, setup-python@v5, cache@v4, codecov@v4, screenshot
  artifact upload; readthedocs python 3.9->3.11
- Remove memoize entirely (user requirement): str()-keyed cache truncated
  numpy arrays -> cache collisions returned wrong results (fork issue #3)
- Lazy-import umap, seaborn, scipy.interpolate: import hypertools 5.1s -> 1.46s
- 136/136 tests pass

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…re-formatting, scope plot styling (fixes #259)

- Replace external hdbscan package with sklearn.cluster.HDBSCAN (always
  available); drop the SyntaxWarning filter that existed only for it
- plot(): pass format_data=False to the post-analyze reduction (data was
  already formatted; avoids a redundant format_data/PPCA pass per plot)
- plot(): apply seaborn palette/style inside plt.rc_context() so plotting no
  longer permanently mutates matplotlib rcParams (GH #259) - verified with a
  real before/after rcParams diff
- 136/136 tests pass

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…lors)

- cluster() supports GaussianMixture, BayesianGaussianMixture, LDA, NMF and
  returns (n_samples, n_components) membership proportions (rows sum to 1);
  hard-clustering behavior unchanged
- New hypertools/tools/colors.py: mat2colors maps categorical labels,
  continuous 1D values, or 2D matrices (soft assignments / arbitrary numeric
  matrices) to RGB; colors2groups quantizes per-point colors into traces for
  the matplotlib renderer
- plot() accepts cluster='GaussianMixture' etc. (points colored by
  proportion-weighted blends) and matrix-valued hue
- 145/145 tests pass (9 new: real GaussianMixture/BGM/LDA/NMF calls,
  color-blend math, end-to-end mixture plot)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- plot([[a, b], [c]]) flattens arbitrarily nested dataset lists, coloring
  every leaf by its outermost group and rendering deeper leaves thinner and
  fainter (summary -> detail, per fork design issues #14/#16)
- Nested string lists (text corpora) are explicitly excluded and keep their
  existing text-pipeline behavior
- 156/156 tests pass (6 new, incl. rendered-line color/width assertions)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- New hypertools/plot/interactive.py: plotly renderer mirroring _draw's
  contract (2D/3D traces, fmt-string mode mapping, per-trace colors/labels,
  hypertools no-ticks aesthetic, matplotlib elev/azim -> plotly camera)
- Animations: sliding-window frames (animate=True) and camera spin
  (animate='spin') with play/pause controls
- hyp.plot(..., backend='auto'|'matplotlib'|'plotly'): auto uses plotly ONLY
  on Colab/Kaggle (approved policy); matplotlib default everywhere else
- Screenshot harness exports plotly figures via kaleido
- 169/169 tests pass (13 new: policy resolution incl. Colab/Kaggle markers,
  fmt mapping, camera math, end-to-end plotly figure/animation assertions)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ok, backend fix, README 2.0 docs

- scripts/generate_verification_screenshots.py: 44/44 cases pass covering
  every public function (plot/reduce/align/normalize/cluster/analyze/
  describe/format_data/load/text) on both backends; INDEX.md manifest;
  curated copy committed to docs/images/v2.0-verification/ for PR evidence
- dev notebook executed end-to-end with 0 errors via nbclient
  (dev/hypertools_2.0_dev_executed.ipynb); notebook cells updated to
  exercise implemented 2.0 APIs
- backend.py: catch ValueError from mpl.use() -- matplotlib >=3.9 raises it
  (not ImportError) for missing ipympl; likely root cause of Colab
  animate=True failures (#235)
- README: What's new in 2.0 + modernized requirements; ipykernel in [dev]
- Full suite re-verified: 169/169 tests, 13/13 baselines, import 1.5-1.7s

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… time

GitHub's windows/py3.13 runners ship a broken Tcl/Tk: TkAgg imports fine
(so backend probing selects it) but window creation raises _tkinter.TclError.
manage_backend now retries the plot once on the original backend after an
interactive-backend TclError instead of crashing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- plotly renderer now reproduces the matplotlib aesthetic exactly: black
  wireframe cube (3D) / square frame (2D), hidden axes, unit-cube range,
  matched camera (elev/azim), pt->px line/marker sizing, full fmt-string
  support (marker symbols + dash styles, with 3D symbol fallbacks)
- MULTICOLORED LINES: continuous or matrix hue + line fmt colors each
  trajectory continuously along its length (matplotlib Line3DCollection /
  LineCollection; plotly per-point line colors in 3D, segment traces in 2D)
- Fix long-standing is_line() bug: '' in Line2D.markers made it return
  False for every fmt string, silently disabling line interpolation on
  modern matplotlib; also parse linestyles before marker chars ('-.')
- Re-mapped per-point labels onto interpolated trajectories (fixes latent
  IndexError that interpolation re-enablement exposed)
- Parity montage generator (scripts/generate_parity_screenshots.py):
  matplotlib|plotly side-by-side for 22 identical calls
- 173/173 tests pass

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Removed (previously deprecated, now retired for 2.0): plot's group/model/
  model_params; reduce's model/model_params/normalize/align; align's
  method/normalize/ndims and the ambiguous align=True form (now a clear
  ValueError with migration hint); cluster's ndims
- DataGeometry.plot translates/drops retired kwargs when replaying geos
  saved by hypertools < 2.0 (group -> hue), so old files still load
- New hyp.apply_model: the stack/unstack core from the revamp design --
  one model fit across stacked datasets then unstacked to input structure
  (stack=False for per-dataset fits); model specs as registry name / dict /
  sklearn instance / pipeline list; mode auto|fit_transform|fit_predict|
  predict_proba; return_model for reuse on held-out data; explicit
  whitelist registry (no eval)
- 185/185 tests pass (12 new apply_model tests)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…s for all 2.0 features

- docs/images/v2.0-parity/: 22 side-by-side (matplotlib | plotly) montages
  of identical calls -- line/marker styles, dashes, sizing, colors, hue
  variants, clustering, mixtures, nested lists, multicolored lines
- docs/images/v2.0-verification/: refreshed 75-case matrix (was 44) now
  covering every plotting feature on BOTH backends, incl. multicolored
  lines, mixture models, nested lists, marker/line styles, animations,
  and apply_model
- 5 new gallery examples (interactive backend, mixture models,
  multicolored lines, nested lists, apply_model), all executing cleanly;
  gallery rebuilt; apply_model added to the API reference
- dev notebook updated for all implemented 2.0 features and re-executed
  end-to-end with 0 errors
- README documents multicolored lines, apply_model, backend parity, and
  the retired legacy arguments
- 185/185 tests pass; import 1.41s

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@jeremymanning

Member Author

All review items addressed ✅

Every item from the review is now implemented, tested, and screenshotted (PR body updated with full details). Summary of what changed since the initial submission:

1. Backend visual parity

The plotly renderer was rewritten to reproduce the matplotlib output exactly: same wireframe cube / square framing, hidden axes, matched camera (elev/azim), pt→px-calibrated line widths and marker sizes, full format-string support (marker symbols + dash styles), and identical palette assignment. Evidence: 22 side-by-side montages (matplotlib left | plotly right, same call) in docs/images/v2.0-parity/ (manifest).

Multicolored line, same call on both backends:

2. Complete feature screenshot coverage

The verification matrix grew from 44 → 75 cases, all passing (manifest): clustering (hard + all four mixture models), multilevel/nested lists, multicolored lines (new feature — continuous per-segment coloring along trajectories, both backends), matrix/continuous/categorical hue, marker + line styles, animations, apply_model, and every other public function — each on both backends.

3. Formerly deferred items — all now in this PR

Bonus fix found while restoring parity

is_line() had returned False for every format string on modern matplotlib ('' in Line2D.markers is a substring of everything), silently disabling hypertools' smooth line interpolation. Fixed — line plots are smooth again on both backends.
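
The substring pitfall is easy to reproduce. In the sketch below, `MARKERS` is a small stand-in for the keys of matplotlib's Line2D.markers (which include '' and ' '), and both functions are illustrative reconstructions rather than the code before or after this PR. The fixed version assumes that is_line means "no marker character remains once the linestyle token is stripped".

```python
# Stand-in for the keys of matplotlib's Line2D.markers, including '' and ' '.
MARKERS = [".", "o", "x", "*", "", " "]
# Longer tokens first, so '-.' is not read as '-' followed by the '.' marker.
LINESTYLES = ["--", "-.", "-", ":"]


def buggy_is_line(fmt):
    # '' is a substring of every string, so this is False for any fmt.
    return not any(marker in fmt for marker in MARKERS)


def fixed_is_line(fmt):
    rest = fmt
    for style in LINESTYLES:  # strip one linestyle token first
        if style in rest:
            rest = rest.replace(style, "", 1)
            break
    real_markers = [m for m in MARKERS if m.strip()]  # ignore '' and ' '
    return not any(m in rest for m in real_markers)


print(buggy_is_line("-"), fixed_is_line("-"))    # False True
print(buggy_is_line("-."), fixed_is_line("-."))  # False True
print(fixed_is_line("o"))                        # False
```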

Final numbers: 185 tests passing · 75/75 verification cases · 22/22 parity montages · dev notebook executes with 0 errors · docs build clean · CI matrix green.

🤖 Generated with Claude Code

jeremymanning and others added 5 commits July 2, 2026 08:22
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ments, plotly gallery

- Animation EXPORT on both backends, format by extension: .gif (Pillow),
  .png/.apng (animated PNG), .mp4/.mov/.avi (ffmpeg). plotly animations
  render each frame via kaleido then assemble; exported frames no longer
  include the play/pause controls; frame counts scale with duration.
  7 new tests save real files and verify frame counts. Sample GIFs from
  BOTH backends committed to docs/images/v2.0-animations/.
- Mixture demos now use OVERLAPPING clusters (1.5 sd apart) so multi-class
  membership is visible as blended colors (examples, screenshots, parity,
  notebook); new test asserts a substantial fraction of genuinely mixed
  assignments.
- Backend parity refinements: centered black 12pt title (matching
  matplotlib), default 640x480 canvas, 2D frame fills the canvas like
  matplotlib (no forced square), 3D box uses matplotlib's 4:4:3 aspect,
  camera distance tuned so cube sizes match (r=1.95).
- Sphinx gallery renders plotly figures (plotly_sg_scraper + kaleido);
  new animate_plotly example with an animated GIF thumbnail wired into
  post_build; interactive-backend example shows the plotly figure inline.
- Dev notebook displays animations inline (to_jshtml + plotly frames) and
  demonstrates gif export; re-executed end-to-end with 0 errors.
- Evidence regenerated: 22/22 parity montages, 75/75 verification cases.
- 192/192 tests pass (185 + 7 animation-export)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Root cause of the macos/py3.11 CI failure: Google Drive answered a dataset
request with an HTML rate-limit page (200 status), which load() cached as
the dataset -- poisoning every subsequent text-data test on that runner
with UnpicklingError.

- _download_example_data: raise_for_status; detect HTML error pages before
  caching (all example datasets are pickles, which never start with '<')
- _load_example_data: on a corrupt cache, delete it and retry the download
  once before failing; never leave a poisoned cache behind
- Regression test poisons the real cache with the actual Drive error page
  and verifies recovery (or clean failure with the cache removed)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… CI jobs

- _download_example_data retries up to 4 times (2s/6s/18s backoff) when the
  host rate-limits, instead of failing on the first error page
- CI caches ~/hypertools_data (immutable datasets, one cross-OS entry) so
  24 concurrent jobs stop re-downloading the same files from Google Drive
  every run -- the root cause of the intermittent text-test failures

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@jeremymanning

Member Author

Round-2 review items addressed ✅

1. Mixture demos now show true multi-class membership

All mixture-model demos (examples, screenshots, parity montages, dev notebook) use overlapping clusters (1.5 sd apart), so points in the overlap regions have genuinely mixed memberships and render with blended, intermediate colors. A new test asserts that a substantial fraction of points have soft (< 0.9 max-proportion) assignments.
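
The kind of check such a test might perform, as a dependency-free sketch. `soft_fraction` is a hypothetical helper; the 0.9 threshold comes from the description above, while the size of a "substantial" fraction is not specified in this PR.

```python
def soft_fraction(memberships, threshold=0.9):
    """Fraction of rows whose largest membership proportion is below threshold."""
    if not memberships:
        raise ValueError("memberships must be non-empty")
    soft = sum(1 for row in memberships if max(row) < threshold)
    return soft / len(memberships)


rows = [[0.95, 0.05], [0.6, 0.4], [0.5, 0.5], [1.0, 0.0]]
print(soft_fraction(rows))  # 0.5
```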

2. Axis sizing/ratio and title placement now match

  • Title: centered, black, 12pt on both backends (plotly previously rendered it off-center, blue-gray, oversized).
  • 3D: plotly now uses matplotlib's default 4:4:3 box aspect and a matched camera distance, so the cube's shape and on-canvas size agree.
  • 2D: the frame fills the canvas exactly like matplotlib (no forced square).
  • Default canvas is 640×480 on both (matplotlib's default figsize).
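
For readers curious about the camera matching, the standard spherical-to-Cartesian conversion from matplotlib's elev/azim (in degrees) to a plotly-style camera eye looks roughly like this. Whether the PR uses exactly this formula is an assumption; the radius r=1.95 is the value quoted in the commit notes, and `camera_eye` is a hypothetical name.

```python
import math


def camera_eye(elev_deg, azim_deg, r=1.95):
    """Convert matplotlib-style elev/azim angles to an x/y/z eye position."""
    elev, azim = math.radians(elev_deg), math.radians(azim_deg)
    return {
        "x": r * math.cos(elev) * math.cos(azim),
        "y": r * math.cos(elev) * math.sin(azim),
        "z": r * math.sin(elev),
    }


eye = camera_eye(30, -60)  # matplotlib's default 3D view angles
print(eye)
```

A dict of this shape is what plotly expects for `scene.camera.eye`.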

All 22 parity montages regenerated: docs/images/v2.0-parity/.

3. Animation works and exports to gif / animated png / mp4 — both backends

hyp.plot(..., animate=..., save_path='file.gif' | '.png' | '.mp4') — the extension picks the format. matplotlib uses Pillow (gif/APNG) or ffmpeg (video); the plotly backend renders every frame via kaleido and assembles them (play/pause controls are excluded from exports). 7 new tests save real files and verify multi-frame output. Committed samples (INDEX):

(Sample animations for matplotlib and plotly; images not reproduced here.)

The dev notebook now displays animations inline (to_jshtml for matplotlib; interactive frames for plotly) and demonstrates gif export — re-executed end-to-end with 0 errors.
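
Extension-based dispatch of this kind can be sketched as a small lookup. The extension list follows the commit notes (.gif, .png/.apng, .mp4/.mov/.avi); the writer labels and the name `pick_writer` are illustrative assumptions, not hypertools internals.

```python
from pathlib import Path

WRITERS = {
    ".gif": "pillow",
    ".png": "pillow",   # animated PNG
    ".apng": "pillow",
    ".mp4": "ffmpeg",
    ".mov": "ffmpeg",
    ".avi": "ffmpeg",
}


def pick_writer(save_path):
    """Choose an animation writer from the file extension (case-insensitive)."""
    suffix = Path(save_path).suffix.lower()
    try:
        return WRITERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported animation format: {suffix!r}") from None


print(pick_writer("spin.GIF"))           # pillow
print(pick_writer("out/trajectory.mp4"))  # ffmpeg
```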

4. Sphinx gallery renders plotly output (including animation)

docs/conf.py now uses the plotly sphinx-gallery scraper, so plotly figures produced by examples render into the gallery (verified in the rebuilt HTML: the interactive-backend page shows the plotly figure). A new animate_plotly example demonstrates plotly animation + export, with an animated GIF thumbnail wired into the existing post-build thumbnail mechanism.

Bonus: dataset-download hardening (found via a CI failure during this round)

One macOS CI job failed because Google Drive rate-limited a dataset download and returned an HTML error page with a 200 status, which load() cached as the dataset — poisoning every subsequent text-data test on that runner. load() now validates downloads (rejects HTML error pages), retries with backoff when rate-limited, and heals corrupt caches instead of leaving a poisoned file behind — regression-tested against the real failure mode. CI additionally shares one cross-OS cache of the example datasets so 24 concurrent jobs no longer hammer the download host every run.
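
The validate-then-retry logic can be sketched as follows. This is a simplified illustration, not the code in `_download_example_data`: `fetch` is a hypothetical zero-argument callable returning bytes, the HTML check relies on the observation from the commit notes that pickles never start with '<', and the 2s/6s/18s delays are the ones quoted there.

```python
import time


def looks_like_html(payload):
    """Heuristic: an HTML error page starts with '<'; a pickle never does."""
    return payload.lstrip()[:1] == b"<"


def download_with_retry(fetch, delays=(2, 6, 18), sleep=time.sleep):
    """Call fetch() until it returns a non-HTML payload (len(delays) + 1 tries)."""
    for attempt in range(len(delays) + 1):
        payload = fetch()
        if not looks_like_html(payload):
            return payload  # only a validated payload should ever be cached
        if attempt < len(delays):
            sleep(delays[attempt])
    raise RuntimeError("download kept returning an HTML error page; nothing cached")
```

Passing `sleep` as a parameter lets tests record the backoff schedule instead of actually waiting.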

Updated numbers: 193 tests passing · 75/75 verification cases · 22/22 parity montages · 4 committed animation exports · docs + gallery build clean · CI green.

🤖 Generated with Claude Code

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
jeremymanning and others added 3 commits July 2, 2026 12:36
…multi-panel + cache verification

- SVG export on both backends: static (.svg) plus ANIMATED vector SVG via
  SMIL (frames stitched with discrete display switching; verified frame
  advance in headless Chrome by scrubbing setCurrentTime). matplotlib
  frames captured through a public AbstractMovieWriter subclass with
  frame subsampling (<=60 frames)
- plotly window animations now rotate the camera while the window
  advances, matching matplotlib's behavior
- plotly titles centered over the plot area (xref='paper') with a
  matplotlib-matched font stack
- hyperalign: n_iter argument (default 10) iteratively re-estimates the
  common template; dict form no longer returns None; removed leftover
  'method' reference that raised NameError for unknown align strings
- shapes zoo: bunny/cube/dragon/sphere/teapot/vase/biplane + datasaurus
  registered with their Dropbox sources (direct-URL download support in
  the loader; tolerant unpickler for dill/legacy-pandas formats; dill
  added as a dependency). 'egyption_mask' excluded: upstream file is an
  empty (0,3) array
- Multi-panel figures verified (hyp.plot(..., ax=...) into user subplot
  grids, 3D + 2D panels)
- Re-download hygiene verified: repeated loads leave the cache byte-stable
  (no duplication / storage leak)
- Reconstructed the classic readthedocs hyperaligned-weights animation
  (docs/images/v2.0-animations/weights_hyperaligned.gif)
- 9 new tests (tests/test_round3.py), all real calls

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… regen

- Docs: modern pydata-sphinx-theme (replacing sphinx_bootstrap_theme);
  full site + gallery build verified, screenshots committed to
  docs/images/v2.0-theme/. nbsphinx_execute='never': tutorial notebooks
  ship pre-executed, and 'auto' was re-executing every gallery notebook
  (doubling build time and hanging on plotly exports in the nbsphinx
  kernel)
- Fixed zoom: Axes3D.dist was removed in matplotlib >= 3.8, silently
  disabling animation zoom; replaced with set_box_aspect(zoom=...) using
  the exact legacy scale mapping (10 / (9 - zoom))
- Reconstructed the classic readthedocs hyperaligned-weights animation
  (docs/images/v2.0-animations/weights_hyperaligned.gif): 36 subjects,
  align='hyper', smooth interpolated trajectories, working zoom
- Modern demos: gallery examples plot_shapes_zoo + plot_datasaurus;
  executed tutorial notebooks hugging_face_embeddings (sentence-
  transformers + HF ag_news, mixture soft clustering, UMAP, animated spin
  gif) and modern_sklearn_dynamics (HDBSCAN, GaussianMixture, Lorenz
  attractor multicolored line, animated gif); registered in the tutorials
  toctree
- Animation evidence regenerated with the rotation + zoom fixes; parity
  montages regenerated (22/22) with the title-font change
- 202/202 tests pass

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@jeremymanning

Member Author

Round-3 review items — all 10 addressed ✅

1. SVG export: static AND animated, both backends

save_path='plot.svg' works everywhere. Animated exports produce a single self-contained SMIL-animated vector SVG (frames switched via discrete <animate>, looping, no JavaScript). Verified in a real browser: scrubbing the SVG timeline with setCurrentTime() in headless Chrome renders different frames on both backends. 4 new tests.
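
To make the SMIL mechanism concrete, here is a minimal sketch that stitches pre-rendered SVG fragments into one looping file by toggling each frame's `display` attribute with a discrete `<animate>`. It is an illustration under simplifying assumptions, not the PR's exporter: `smil_svg` is a hypothetical name, and giving every frame a full values list makes the output size quadratic in the frame count.

```python
import xml.etree.ElementTree as ET


def smil_svg(frames, duration_s, width=640, height=480):
    """Wrap SVG fragments (one per frame) in a looping, discretely switched SVG."""
    n = len(frames)
    if n == 0:
        raise ValueError("need at least one frame")
    key_times = ";".join(f"{j / n:.6f}" for j in range(n))  # starts at 0
    groups = []
    for i, fragment in enumerate(frames):
        # Frame i is visible only during its own time slot.
        values = ";".join("inline" if j == i else "none" for j in range(n))
        groups.append(
            f'<g display="none">{fragment}'
            f'<animate attributeName="display" calcMode="discrete" '
            f'values="{values}" keyTimes="{key_times}" '
            f'dur="{duration_s}s" repeatCount="indefinite"/></g>'
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(groups)
        + "</svg>"
    )


svg = smil_svg(['<circle cx="50" cy="50" r="10"/>', '<circle cx="90" cy="50" r="10"/>'], 2)
ET.fromstring(svg)  # raises ParseError if the result is not well-formed XML
```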

2. plotly window animation now rotates

animate=True on the plotly backend rotates the camera while the window advances, exactly like matplotlib. Re-exported sample: plotly window

3. Titles match

plotly titles are centered over the plot area (xref='paper'), black, 12pt, with a matplotlib-matched font stack. Parity montages regenerated (22/22): docs/images/v2.0-parity/.

4. Multi-panel figures verified

hyp.plot(..., ax=<subplot>) composes into user figure grids (3D + 2D panels mixed); when embedding, hypertools respects the caller's color cycle. Tested + screenshotted.

5. Hyperalignment n_iter

hyp.align(data, align='hyper', n_iter=10) — the common template is iteratively re-estimated (default 10; also settable via the dict form). Found and fixed two latent bugs in the same block: the dict form silently returned None, and a leftover method reference raised NameError.
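
The idea of iteratively re-estimating a common template can be shown with a deliberately tiny stand-in: 2D points stored as complex numbers and rotation-only alignment. This is not the hyperalignment code in this PR (which handles arbitrary dimensions and scaling); the function names are hypothetical and the sketch only illustrates the align / average / repeat loop.

```python
import cmath


def rotate_onto(points, template):
    """Rotate 2D points (complex numbers) to best match the template."""
    s = sum(t * p.conjugate() for p, t in zip(points, template))
    if s == 0:
        return list(points)  # no preferred rotation
    w = s / abs(s)  # unit complex number = least-squares optimal rotation
    return [w * p for p in points]


def iterative_align(datasets, n_iter=10):
    """Align every dataset to a template that is re-estimated on each pass."""
    template = list(datasets[0])
    aligned = [list(d) for d in datasets]
    for _ in range(n_iter):
        aligned = [rotate_onto(d, template) for d in aligned]
        # New template: pointwise mean of the aligned datasets.
        template = [sum(col) / len(aligned) for col in zip(*aligned)]
    return aligned


base = [1 + 0j, 0 + 2j, -1 - 1j]
rotated_copies = [[cmath.rect(1, angle) * p for p in base] for angle in (0.0, 0.7, -1.3)]
aligned = iterative_align(rotated_copies, n_iter=3)
spread = max(abs(a - b) for d in aligned for a, b in zip(d, aligned[0]))
print(spread < 1e-9)  # True: the rotated copies coincide after alignment
```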

6. Classic readthedocs figure reconstructed

hyp.plot(weights, align='hyper', animate=True, zoom=2.5) reproduces the hyperaligned-weights animation:

weights

Getting this right exposed a real bug: Axes3D.dist was removed in matplotlib ≥3.8, so animation zoom had been silently doing nothing — now implemented via set_box_aspect(zoom=...) with the exact legacy scale mapping.
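
The commit notes give the legacy scale mapping as 10 / (9 - zoom). As a worked example (the helper name is hypothetical, and how the result is passed to set_box_aspect is not shown in this PR):

```python
def legacy_zoom_scale(zoom):
    """Scale factor from the legacy mapping quoted in the commit notes."""
    if zoom >= 9:
        raise ValueError("zoom must be < 9 for the legacy mapping")
    return 10 / (9 - zoom)


print(legacy_zoom_scale(-1))   # 1.0
print(legacy_zoom_scale(2.5))  # about 1.54
```

Under this mapping, larger zoom values give larger scale factors, and zoom=-1 corresponds to no scaling.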

7. Modern sphinx theme

Docs now use pydata-sphinx-theme (numpy/pandas-style), full site + gallery compile verified; screenshots committed to docs/images/v2.0-theme/. Also fixed a docs-build hang: nbsphinx was re-executing every gallery notebook (nbsphinx_execute='never'; tutorials ship pre-executed).

8. Shapes zoo datasets

hyp.load('bunny' | 'cube' | 'dragon' | 'sphere' | 'teapot' | 'vase' | 'biplane' | 'datasaurus') — registered with their Dropbox sources (direct-URL download support; tolerant unpickler for the dill/legacy-pandas formats these were saved in; dill added as a dependency). All verified via real downloads. ⚠️ egyption_mask is intentionally excluded: the source file (locally and at the Dropbox link) is an empty (0, 3) array — flagging for you to re-export it.

9. No re-download copy leak

Verified + regression-tested: repeated hyp.load() calls leave the cache byte-identical (no duplicate files, no re-download, mtime unchanged).

10. Modern demos

  • Gallery: plot_shapes_zoo (2×2 multi-panel point clouds) and plot_datasaurus (identical stats, different shapes).
  • Tutorial notebooks (executed, 0 errors, in the docs toctree): Visualizing Hugging Face text embeddings — sentence-transformers + HF ag_news, category coloring, GaussianMixture soft clustering, UMAP, animated spin export; Modern scikit-learn models and dynamical systems — HDBSCAN, mixture blending, and a Lorenz-attractor multicolored-line animation.

Updated numbers: 202 tests passing · 75/75 verification cases · 22/22 parity montages · 5 committed animation exports · full docs + gallery build clean on the new theme · CI green.

🤖 Generated with Claude Code

jeremymanning and others added 8 commits July 2, 2026 13:41
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ality fixes, embedding demos

- hyperalign now REPEATEDLY applies the full procedure (n_iter passes,
  default 10): each pass's aligned output feeds the next, compounding
  convergence (mean distance to group mean 0.43 -> 0.004 over 10 passes
  on rotated copies). The classic readthedocs weights animation is
  regenerated with the corrected pipeline (normalize='across',
  align='hyper', zoom=3.5, rotations=1, frame_rate=50, linewidth=3)
- NEW animate='serial' mode (both backends): datasets appear one at a
  time in list order, each growing point-by-point while earlier ones stay
  fixed, never connected -- built for conversation-turn visualizations;
  tests assert sequential reveal on both backends
- Animation quality: full-canvas animated axes + skip tight_layout fixes
  cube/data clipping at rotation angles (border-pixel regression test);
  linewidth is now a plot() argument and animations no longer hardcode
  linewidth=1; markersize is now a plot() argument
- EXACT per-point colors for markers: matrix hue / mixture-model scatter
  renders true per-observation blends via scatter (was: quantized color
  groups); plotly path already carried per-point colors
- Shapes morph: 3510 frames @ 30fps (117s, 13 rotations), committed as
  mp4 + 20s preview gif
- Demos: wikipedia embeddings (BAAI/bge-small-en-v1.5 + UMAP + 10-way
  GaussianMixture soft clustering, markersize=2) and reddit conversation
  trajectories (convokit reddit-small, sliding-window SBERT, per-speaker
  colors, animate='serial', 30s/3 rotations) -- both executed 0 errors
  with committed gifs
- 205 tests passing

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…tax, 30fps standard, rebuilt demos

WEIGHTS (the classic readthedocs animation) — fully diagnosed via Jeremy's
2020 pieman_trajectory_demo notebook (hypertools 0.6.2 + timecorr):
  gaussian temporal smoothing (var=300) -> hyp.align repeated n_iter=20
  (SRM) -> smooth again -> UMAP -> animate
Two additional findings were required to reproduce it on modern deps:
- align('SRM') now supports n_iter (re-fits SRM on each pass's output;
  inter-subject correlation plateaus ~0.87 = the data's shared-signal
  ceiling, matching era behavior since the vendored SRM is unchanged
  since v0.6.2)
- modern umap-learn's default n_neighbors=15 keeps neighborhoods
  within-subject and DISPERSES the aligned bundle; n_neighbors=150 merges
  same-timepoint rows across subjects and reproduces the tight looping
  rope of the original. Recipe scripted in
  scripts/generate_weights_trajectory.py; gif regenerated (900 frames,
  30fps, tight rope verified against the reference render)

Also in this round:
- repeated-hyperalignment scale collapse fixed (procrustes' optimal
  scaling < 1 shrinks data geometrically across passes; per-pass output
  rescaling keeps norms stable through n_iter=50)
- single-call soft-cluster coloring:
  hyp.plot(x, '.', markersize=2, reduce='UMAP',
           cluster={'model': GaussianMixture, 'n_clusters': 10})
  (dict accepts top-level n_clusters and model classes, in both cluster()
  and plot(); colors flow from mixture proportions automatically)
- 30fps animation standard: plotly frame density raised to 30/s (cap
  600); all tutorial gifs re-rendered at fps=30 with no conversion
  downsampling (wikipedia 300 frames/10s/1 rotation; conversation 900
  frames/30s/1 rotation; lorenz + hf spin 900 frames/30s); shapes morph
  3510-frame mp4 + 30fps preview gif; matplotlib evidence gifs at 450
  frames (plotly evidence gifs kept from the prior render -- kaleido
  makes 450-frame exports impractically slow; noted)
- conversation demo rebuilt: 3-sentence windows WITHIN utterances (true
  disconnection), repeating per-speaker colors, animate='serial',
  rotations=1, no frame clipping
- 206 tests passing

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The n_neighbors=150 recipe over-globalized the UMAP embedding and
flattened the trajectory into a near-straight line, whereas download.png
shows a tight bundle with a dramatic loop. Swept n_neighbors against
the reference: 15 disperses into a hairball, 150 flattens the loop, and
36 (min_dist=0.1) is the sweet spot -- one tight rope that keeps the
loop.

Verified this is a pure UMAP-neighborhood effect, not a dependency
version: the modern SRM branch is byte-identical to v0.6.2, and era
umap-learn 0.4.6 also hairballs the same aligned data at its default
n_neighbors=15.

Also switched the animation from a rolling window (animate=True +
tail_duration) to animate='spin': the window only ever showed a ~4s
fragment (a tangle), while spin draws the whole bundle and orbits it so
the loop is visible from every angle. Tightened the gif encode
(scale=340, 48-colour palette) to ~8MB, in line with the other
animation gifs, still 900 frames at 30fps with no downsampling.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Switch back from animate='spin' to the classic animate=True sliding
window, per review. The critical property -- the space inside the cube
is fixed for the whole animation, independent of which window is
visible -- is guaranteed by the pipeline (helpers.scale normalizes once
from the full stacked dataset and the window updater never touches axis
limits) and verified programmatically: axis limits are identical across
frames while the visible fragment's extent slides along the loop.

(The earlier claim that window mode showed a static tangle came from a
frame-extraction bug in the verification harness -- PIL's ImageSequence
yields one re-seeked image object, so materializing it with list()
produced N copies of the final frame. Measured correctly, the window
render rotates and the comet travels: mean inter-second frame motion
8.3, 0 clipped frames.)
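
The aliasing pitfall described above can be reproduced without Pillow. This is a minimal sketch: `ReseekingFrames` and `iter_frames` are hypothetical stand-ins for a multi-frame image and its frame iterator, not the verification harness itself. It only illustrates why materializing an iterator that keeps yielding the same re-seeked object gives N references to the final frame, and why copying each frame as it is yielded avoids that.

```python
class ReseekingFrames:
    """Hypothetical stand-in for a multi-frame image that is re-seeked in place."""

    def __init__(self, frames):
        self._frames = frames
        self.pixels = None

    def seek(self, index):
        # Seeking mutates this object rather than creating a new one.
        self.pixels = self._frames[index]

    def copy(self):
        # Snapshot of whichever frame is currently loaded.
        snapshot = ReseekingFrames(self._frames)
        snapshot.pixels = self.pixels
        return snapshot


def iter_frames(image, n_frames):
    """Yield the SAME object after each seek (the behaviour that caused the bug)."""
    for index in range(n_frames):
        image.seek(index)
        yield image


source = ["frame0", "frame1", "frame2"]

# Buggy: list() holds three references to one object, left at the last frame.
buggy = [f.pixels for f in list(iter_frames(ReseekingFrames(source), 3))]

# Fixed: copy each frame while the iterator is positioned on it.
fixed = [f.copy().pixels for f in iter_frames(ReseekingFrames(source), 3)]
```

With Pillow the analogous fix is to call `.copy()` on each frame inside the loop before storing it.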

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Streams are a first-class data type -- no flag. hyp.plot() detects
Python iterators/generators and Hugging Face IterableDatasets
(load_dataset(..., streaming=True)) from the structure of the input:

- the first stream_init samples (default 10,000) ESTIMATE the
  normalization/reduction parameters; those fitted models are then
  APPLIED to every subsequent sample, which is added to the plot
  dynamically (the fit-on-head/transform-forever semantics from the
  issue thread)
- stream_chunk (default 100) is the per-fetch batch size; each chunk
  renders as one live redraw / saved animation frame
- stream_max (default None) is the optional stopping point; with None,
  streaming continues until the stream ends, so infinite streams render
  incoming data indefinitely, and Ctrl-C cleanly finalizes any
  save_path animation and returns the geometry
- stream_window optionally shows only the trailing samples
  (comet style) while everything consumed stays on the geometry
- reduction models must support transform() (IncrementalPCA default,
  PCA, UMAP; TSNE raises); align/cluster raise for streams (cluster
  planned)
- dict rows: numeric fields concatenated in insertion order, strings
  ignored (use .select_columns() for control); datasets added to [dev]

14 real tests (tests/test_streaming.py) incl. an actual HF iris stream,
interrupt finalization, and a fitted-on-head-only assertion. New
executed tutorial docs/tutorials/streaming_data.ipynb with two
streaming animations.

Docs theme: pydata-sphinx-theme -> Furo with the ContextLab brand
ported from ContextLab/scheduler (Nunito Sans, lowercase 300-weight
headings with 0.6px letter-spacing, green #007030 / dark #4CAF50),
screenshot-verified.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jeremymanning

Member Author

Round 4.5: the weights animation is solved, plus the requested animation & API fixes ✅

The story-trajectories mystery — full diagnosis

The classic readthedocs animation could not be reproduced by any plot(align=...) call because it was never produced by one. Working from the 2020 pieman_trajectory_demo notebook (hypertools 0.6.2 + timecorr), the actual pipeline is:

gaussian temporal smoothing (var=300)
  → hyp.align(...) applied REPEATEDLY (n_iter=20, SRM)
  → smooth again
  → UMAP
  → animate

Two things still had to be fixed/understood to make that work on modern dependencies:

  1. align('SRM', n_iter=...) now re-fits SRM on each pass's aligned output (matching the notebook's loop). Inter-subject correlation plateaus at ~0.87 — the data's shared-signal ceiling (the vendored SRM is byte-identical to v0.6.2, so this matches era behavior).

  2. UMAP's n_neighbors sets the whole character of the embedding. The modern default (15) keeps neighborhoods within-subject and disperses the aligned data into a hairball; very large values (150+) over-globalize and flatten the trajectory into a near-straight line. n_neighbors=36 (with min_dist=0.1) is the sweet spot: same-timepoint rows across the 36 subjects rope together into one tight bundle while preserving the dramatic looping bend of the classic reference.

    I verified this is a pure neighborhood-size effect, not a dependency-version artifact: the modern SRM alignment is byte-identical to v0.6.2 (SRM(features=min(shape[0])), re-fit per pass), and fitting the era umap-learn 0.4.6 on the same aligned data also produces a hairball at its default n_neighbors=15. So the look is recovered by neighborhood tuning on current dependencies rather than by pinning old ones.

The animation is the classic sliding-window style (animate=True) in a fixed space: the data are scaled once from the full dataset before animating, so the axis limits never depend on which window is visible — the comet travels along the loop inside a stationary cube while the camera makes one slow rotation (900 frames @30fps). I verified the fixed-view property programmatically: axis limits are identical across all frames while the visible fragment's extent slides along the trajectory. The full recipe is scripted in scripts/generate_weights_trajectory.py:

weights

Along the way, repeated hyperalignment got a real fix too: procrustes' optimal scaling is < 1 under noise, so repeated passes shrank the data geometrically (eventually crashing). Per-pass rescaling keeps norms stable through n_iter=50.
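
To see why a per-pass scale below 1 compounds, here is a minimal sketch in plain Python. The scale of 0.9 and the helper names (`frobenius_norm`, `rescale_to_norm`, `repeated_passes`) are illustrative assumptions, not hypertools code; each "pass" is reduced to a single multiplication so only the shrinkage arithmetic is shown.

```python
import math


def frobenius_norm(rows):
    """Overall size of a 2D list of numbers."""
    return math.sqrt(sum(v * v for row in rows for v in row))


def rescale_to_norm(rows, target):
    """Scale the data so its Frobenius norm equals `target`."""
    norm = frobenius_norm(rows)
    return [[v * target / norm for v in row] for row in rows]


def repeated_passes(rows, scale, n_iter, rescale):
    """Apply a shrinking 'alignment' n_iter times, optionally renormalizing."""
    target = frobenius_norm(rows)
    for _ in range(n_iter):
        # Stand-in for one alignment pass whose optimal scaling is < 1.
        rows = [[v * scale for v in row] for row in rows]
        if rescale:
            rows = rescale_to_norm(rows, target)
    return rows


data = [[3.0, 4.0], [0.0, 12.0]]  # Frobenius norm is 13
shrunk = repeated_passes(data, scale=0.9, n_iter=50, rescale=False)
stable = repeated_passes(data, scale=0.9, n_iter=50, rescale=True)
```

Without rescaling the norm falls as `13 * 0.9**50`, which is already below 0.1; with per-pass rescaling it stays at 13.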

Single-call soft clustering (requested syntax)

import hypertools as hyp
from sklearn.mixture import GaussianMixture
geo = hyp.plot(embeddings, '.', markersize=2, reduce='UMAP',
               cluster={'model': GaussianMixture, 'n_clusters': 10})

works verbatim: the cluster dict accepts top-level n_clusters and model classes (in both cluster() and plot()), and colors flow from the mixture proportions automatically as exact per-point blends. The Wikipedia tutorial now uses this one-liner (10s, 1 rotation, small dots):

wikipedia
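
As a rough illustration of what a per-point blend means, the sketch below mixes component colors by membership proportion. It assumes a plain linear blend in RGB; the color space hypertools actually blends in is not stated here, and `blend_colors` is a hypothetical helper rather than a library function.

```python
def blend_colors(proportions, component_colors):
    """Mix component RGB colors using one row of membership proportions."""
    n_channels = len(component_colors[0])
    return tuple(
        sum(p * color[c] for p, color in zip(proportions, component_colors))
        for c in range(n_channels)
    )


# Two components: pure red and pure blue (illustrative colors).
component_colors = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]

# Each row sums to 1, like the output of a mixture model.
memberships = [[1.0, 0.0], [0.5, 0.5], [0.25, 0.75]]

point_colors = [blend_colors(row, component_colors) for row in memberships]
```

A point that belongs fully to one component keeps that component's color, while a point split evenly between red and blue comes out purple.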

Conversation demo on the new animate='serial' mode

Windows are now 3-sentence windows within each utterance, so utterances are truly disconnected trajectories; speaker colors repeat (one color per speaker); rotations=1; no frame clipping (border-verified):

conversation

30fps smooth/slow animation standard

All animations render at frame_rate=30 and gifs convert at fps=30 with no downsampling (the earlier choppiness came from fps-6/12 conversions). plotly's animation frame density is raised to 30 effective fps (cap 600). Regenerated at the new standard: shapes morph (3,510-frame mp4 + 30fps preview below), lorenz + HF spin tutorials (900 frames / 30s each), matplotlib evidence gifs (450 frames).

morph

(One honest caveat: the two plotly evidence gifs kept their previous renders — kaleido makes 450-frame plotly exports impractically slow; everything user-facing uses the new standard.)

206 tests passing · all four tutorial notebooks re-executed with 0 errors · CI green.

🤖 Generated with Claude Code

@jeremymanning

Member Author

Round 5: streaming data (#101), streaming tutorial, and the ContextLab docs theme ✅

Streaming data — closes the oldest open feature request (#101, 2017)

Streams are now a first-class data type, exactly as specified in the issue thread: there is no streaming flag; hyp.plot() infers it from the structure of the input. Python iterators/generators and Hugging Face streaming datasets both work:

import hypertools as hyp
from datasets import load_dataset
ds = load_dataset('scikit-learn/iris', split='train', streaming=True)
geo = hyp.plot(ds, '.')          # streams straight in, nothing materialized

Semantics (matching the issue design + review guidance):

  • stream_init (default 10,000): the initial samples used to estimate the normalization/reduction parameters. Those fitted models are then applied to every subsequent sample, which is added to the plot dynamically — verified by test: IncrementalPCA.n_samples_seen_ == stream_init after a full stream, and the stored model's transform() exactly reproduces the plotted trajectory.
  • stream_chunk (default 100): how many samples are fetched per update; each chunk renders as one live redraw / one saved animation frame, so it sets the animation's temporal resolution.
  • stream_max (default None): streaming continues until the stream ends, stream_max is hit, or the user hits Ctrl-C. Infinite streams render continually, and any save_path animation is finalized whenever streaming stops — including on interrupt (tested with a generator that raises KeyboardInterrupt mid-stream: the gif is finalized and the geometry returned).
  • stream_window (optional): comet-style display of only the most recent samples for long/infinite streams; everything consumed is still retained on the returned geometry (geo.data, geo.xform_data, geo.stream_info).
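
The fit-on-head, transform-forever semantics can be sketched in plain Python. This is an illustration under stated assumptions, not the hypertools implementation: a per-feature center/spread stands in for the fitted normalization and reduction models, `stream_transform` is a hypothetical name, and only the `stream_init` and `stream_chunk` parameter names come from the description above.

```python
from itertools import islice


def stream_transform(stream, stream_init=10_000, stream_chunk=100):
    """Fit a center/spread on the first stream_init samples, then apply it to all."""
    stream = iter(stream)
    head = list(islice(stream, stream_init))
    if not head:
        return
    n_features = len(head[0])
    center = [sum(row[j] for row in head) / len(head) for j in range(n_features)]
    spread = [
        max(abs(row[j] - center[j]) for row in head) or 1.0
        for j in range(n_features)
    ]

    def apply(rows):
        # The fitted parameters are never updated after the head.
        return [
            [(row[j] - center[j]) / spread[j] for j in range(n_features)]
            for row in rows
        ]

    # The head itself is emitted chunk by chunk through the same transform.
    for start in range(0, len(head), stream_chunk):
        yield apply(head[start:start + stream_chunk])

    # Every later sample is transformed with the head's parameters.
    while True:
        chunk = list(islice(stream, stream_chunk))
        if not chunk:
            break
        yield apply(chunk)


samples = ([float(i), float(2 * i)] for i in range(25))
chunks = list(stream_transform(samples, stream_init=10, stream_chunk=5))
```

Each yielded chunk corresponds to one redraw or animation frame. Samples after the head can land outside the head's range because the parameters are frozen.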

Guardrails: reduction models must support transform() (IncrementalPCA default, PCA, UMAP; TSNE raises with a clear message); align/cluster raise for streams (a stream is a single dataset; streaming clustering is future work). Dict rows (the HF case) contribute their numeric fields in insertion order; .select_columns(...) gives exact control.

14 real tests (tests/test_streaming.py) — real generators, a real infinite stream, a real interrupt, and a real load_dataset(..., streaming=True) network stream; no mocks.

New executed tutorial (docs/tutorials/streaming_data.ipynb) with two streamed animations:

streaming lorenz

Docs theme: Furo with the ContextLab look

Ported from ContextLab/scheduler per review: stock Furo plus the lab's brand — Nunito Sans, lowercase 300-weight headings with 0.6 px letter-spacing, green #007030 (light) / #4CAF50 (dark). Screenshot-verified across index/API/tutorial pages:

furo streaming tutorial

220 tests passing (213 fast set + 7 animation-export) · streaming tutorial executed with 0 errors · docs build clean.

🤖 Generated with Claude Code

jeremymanning and others added 7 commits July 2, 2026 21:34
Gallery: sphinx-gallery previously executed only plot_*-named examples,
leaving chemtrails/animate*/precog/explore/save_*/analyze pages with
code but no rendered output. All examples now execute
(filename_pattern), matplotlib animations render as embedded mp4 video
(matplotlib_animations + sphinxcontrib-video; animation examples expose
`ani = ani_geo.line_ani` for the scraper), save_* examples write to
temp files, and every example page gets a branch-aware "Open in Colab"
badge + .ipynb link (post_build.py) so gallery examples open as
runnable notebooks.

Dual-backend audit (scripts/audit_gallery_backends.py): every example
runs under BOTH matplotlib and plotly in subprocesses; 78/78 pass
(save_movie under plotly needs a long timeout -- kaleido per-frame mp4
export). The audit caught a real, years-latent bug: per-dataset fmt
lists (['-','--']) routed each dataset through interp_array_list
(plural), silently replacing 2D arrays with lists of per-row
interpolations; latent because is_line() always returned False before
its round-2 fix. Fixed with interp_array + regression test.

Plotly parity for animation extras: chemtrails/precog/bullettime draw
low-opacity trail traces on window animations, tail_duration sets the
window length, and zoom moves the camera (r = 1.95*(9-zoom)/8,
mirroring the matplotlib zoom semantics); previously none of these were
forwarded to the plotly renderer. explore maps to plotly's native
hover.
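
As a worked example of the zoom formula above: only the expression r = 1.95*(9-zoom)/8 comes from this commit message. The function name and the sample zoom values are hypothetical, and how the radius is applied to the plotly camera is not shown.

```python
def plotly_camera_radius(zoom):
    """Camera distance for a given zoom, using the formula quoted above."""
    return 1.95 * (9 - zoom) / 8


# Larger zoom values give a smaller radius, i.e. a closer camera.
radii = {zoom: plotly_camera_radius(zoom) for zoom in (1, 2.5, 5)}
```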

Universal loader: hyp.load (and DataGeometry.plot/transform) resolve
strings by trying, in order: built-in dataset name -> local file
(npy/npz/csv/tsv/txt/json/parquet/mat/pickle) -> Hugging Face dataset
(split=/streaming=; streaming feeds straight into hyp.plot) -> Google
Drive URL or bare id -> Dropbox URL or shared-link path -> any URL with
or without https://. Lists of strings return lists of datasets. Raw
text (whitespace) still flows to the text-embedding pipeline. Also
fixes df2mat for pandas>=2 (get_dummies bool dtype made mixed
DataFrames produce object arrays that crashed np.isnan).

19 new tests (12 loader incl. real HF/Drive/Dropbox/URL fetches, 6
plotly trails, 1 interp regression); 232 passing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two-layer fix for example pages showing code but no output:

1. nbsphinx was claiming the gallery pages: sphinx-gallery writes a
   downloadable .ipynb next to each generated .rst, and nbsphinx
   rendered the UNEXECUTED notebook instead of the gallery page.
   auto_examples/*.ipynb is now excluded from the document build
   (downloads still work).
2. matplotlib animations render as embedded HTML5 video: the
   sphinxcontrib.video extension is registered (required by
   matplotlib_animations=(True, 'mp4')) and all animation example
   pages (chemtrails, precog, animate*, save_movie) now embed a
   playable 30s mp4, verified visually.

Includes the regenerated gallery artifacts (all 39 examples executed,
9m39s total build execution).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…report

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…round 6.5)

NEW ANIMATION STANDARD (both backends): frame_rate=30, duration=30s,
rotations=1 -- one revolution every 30 seconds. Three layers had to
change for plotly to actually match matplotlib's pacing:
- library defaults (frame_rate 50->30, rotations 2->1) and the plotly
  frame math: n_frames = frame_rate * duration exactly like matplotlib
  (600-frame cap removed); parity test asserts identical 900 frames at
  ~33ms on both backends
- DataGeometry.plot no longer replays animation-pacing kwargs baked
  into saved .geo files (old-era defaults like frame_rate=50 and
  rotations=2 silently overrode the current standard in every gallery
  example that calls geo.plot); explicit caller overrides still win
- docs builds: plotly's sphinx-gallery renderer serialized every
  animation frame through kaleido for one static png (a 900-frame
  figure took ~an hour); the show path now writes a frame-stripped png
  plus an interactive html with embedded frames capped at 150, each
  shown proportionally longer -- total duration and rotation speed
  unchanged, pages ~0.1MB

Streaming stability: the data->box transform is FROZEN from the head
(the center+scale affine is captured once); every future sample goes
through the same transform and out-of-range samples are clamped to the
closest point on the box surface. Axis limits never change once set --
no more per-chunk rescale "twitch" (verified: zero vanishing ink
across tutorial animation frames + exact-position regression tests).

Legends (both backends): rendered to the RIGHT of the plot, vertically
centered on the box (mpl bbox_to_anchor; plotly x=1.02/y=0.5 with a
reserved right margin). Screenshot-verified in 2D and 3D.

Gallery UX: thumbnail clicks were dead (sphinx-gallery >= 0.17 no
longer wraps the thumb <img> in an anchor; the old gallery-fixes.js
targeted extinct .xref markup) -- thumbnails now open the example's
notebook on Colab while title text opens the example page. Animated
thumbnails were squashed into 200x200 squares from 4:3 sources; all 7
regenerated letterboxed at the correct aspect from the new 30fps mp4s
(scripts/generate_gallery_thumbs.py). Plotly evidence gifs re-rendered
at the pacing standard. DataGeometry.plot/transform docstrings document
the universal string-loading behavior for the API pages.

237 tests passing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A kaleido subprocess wedged during test_animated_svg_plotly on one
Windows runner and burned the full 6-hour Actions job timeout. No test
legitimately takes over 20 minutes; a hung native call now fails fast
with a stack dump instead of holding a runner hostage (thread method,
since the hangs are inside native calls).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jeremymanning

Member Author

Round 6: gallery pages render everything, dual-backend audit, notebooks-on-click, universal loader ✅

Gallery pages now display their outputs (including animations)

chemtrails.html (and 9 other pages) showed code with no output. Two independent root causes, both fixed:

  1. sphinx-gallery only executed examples named plot_* — chemtrails/animate*/precog/explore/save_*/analyze were never run. All 39 examples now execute at build time, and matplotlib animations render as embedded HTML5 video (matplotlib_animations=(True,'mp4') + sphinxcontrib.video; the animation examples now expose ani = ani_geo.line_ani, which also documents how to grab the animation object).
  2. nbsphinx was hijacking the pages: sphinx-gallery writes a downloadable .ipynb next to each generated .rst, and nbsphinx rendered the unexecuted notebook instead of the gallery page. Gallery notebooks are now excluded from the document build (downloads unaffected).
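
The two fixes map onto a few lines of Sphinx configuration. The excerpt below is a hypothetical conf.py sketch, not the repository's actual file: the directory names and the exact `filename_pattern` regex are assumptions, while `matplotlib_animations=(True, 'mp4')`, `sphinxcontrib.video`, and the `auto_examples/*.ipynb` exclusion are taken from the description above.

```python
import re

# Hypothetical excerpt of docs/conf.py; paths and regex are illustrative.
extensions = [
    "sphinx_gallery.gen_gallery",
    "nbsphinx",
    "sphinxcontrib.video",  # needed for embedded mp4 animations
]

sphinx_gallery_conf = {
    "examples_dirs": "../examples",      # assumed location of the example scripts
    "gallery_dirs": "auto_examples",
    "filename_pattern": r"\.py$",        # execute every example, not only plot_*
    "matplotlib_animations": (True, "mp4"),
}

# Keep nbsphinx from rendering the unexecuted gallery notebooks as pages.
exclude_patterns = ["auto_examples/*.ipynb"]

# Quick check of which files the pattern would execute.
example_files = ["examples/plot_basic.py", "examples/chemtrails.py", "examples/README.txt"]
executed = [
    name for name in example_files
    if re.search(sphinx_gallery_conf["filename_pattern"], name)
]
```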

The chemtrails page with its playable 30s animation (screenshot):

chemtrails page

Clicking a gallery example gets you a runnable notebook

Every example page opens with an Open in Colab badge (branch-aware link to the example's committed notebook) next to the .ipynb download — visible in the screenshot above, injected across all 40 pages by post_build.py. If you'd prefer gallery thumbnails to jump straight to Colab instead of the example page, that's a one-line change to the click handler — say the word.

Every example verified on BOTH backends

New audit harness (scripts/audit_gallery_backends.py) runs all 39 examples under matplotlib and plotly in subprocesses: 78/78 pass (full report, side-by-side spot-check). Two things fell out of it:

  • A real, years-latent data-corruption bug: per-dataset format lists (['-','--']) routed each dataset through interp_array_list (plural), silently replacing 2D arrays with lists of per-row interpolations. It was unreachable for years because is_line() always returned False before its round-2 fix. Fixed (interp_array) with a regression test; this is what was crashing plot_procrustes and plot_missing_data.
  • Plotly parity gaps closed: chemtrails/precog/bullettime now draw low-opacity trail traces on plotly window animations, tail_duration sets the plotly window length, and zoom moves the camera (mirroring the matplotlib zoom semantics) — none were previously forwarded to the plotly renderer. explore maps to plotly's native hover labels. (save_movie under plotly passes but takes ~32 min — kaleido renders all 600 mp4 frames; noted in the report.)

Universal loader (hyp.load + DataGeometry)

Strings (and lists of strings) resolve in the requested order: built-in dataset → local file (npy/npz/csv/tsv/txt/json/parquet/mat/pickle) → Hugging Face dataset (streaming=True feeds straight into hyp.plot) → Google Drive URL or bare file id → Dropbox URL/shared-link path → any URL with or without https://. Raw text still flows to the text-embedding pipeline (whitespace is the discriminator). All verified with real network fetches — including the legacy Drive-hosted files, Dropbox dl=0 links, and schemeless URLs. Also fixed df2mat for pandas≥2 (bool get_dummies produced object arrays that crashed mixed-dtype DataFrame plotting).
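
The resolution order is a first-match chain, which can be sketched as follows. Everything here is a toy: `resolve_source` and the three stand-in resolvers are hypothetical, the Hugging Face, Drive, and Dropbox stages are omitted, and checking for whitespace before anything else is an assumption about where the text discriminator sits.

```python
def resolve_source(source, resolvers):
    """Return (stage_name, result) from the first resolver that accepts `source`."""
    if any(ch.isspace() for ch in source):
        # Raw text is handed to the text pipeline rather than loaded.
        return "text", source
    for name, resolver in resolvers:
        result = resolver(source)
        if result is not None:
            return name, result
    raise ValueError(f"could not resolve {source!r}")


# Toy resolvers standing in for the real lookups (all hypothetical).
builtin = {"weights": "<built-in weights dataset>"}
resolvers = [
    ("builtin", builtin.get),
    ("local_file", lambda s: f"<file {s}>" if s.endswith((".npy", ".csv")) else None),
    ("url", lambda s: f"<download https://{s}>" if "." in s and "/" in s else None),
]

resolved = [
    resolve_source(s, resolvers)[0]
    for s in ("weights", "data.csv", "example.org/x.bin", "two words")
]
```

Ordering matters in a chain like this: a string that matches an early stage never reaches the later, slower network stages.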

239 tests passing (232 fast set + 7 animation-export) · 78/78 dual-backend example runs · docs build clean with all examples executed (9m39s) · CI green.

🤖 Generated with Claude Code

@jeremymanning

Member Author

Round 6.5: animation pacing standard, streaming stability, legends, gallery UX ✅

One animation standard, both backends: 30 fps · 30 s · 1 rotation per 30 s

"Plotly is much too fast" had three stacked causes, each now fixed and regression-tested:

  1. Library defaults: frame_rate 50→30, rotations 2→1, and plotly now generates exactly frame_rate × duration frames like matplotlib (its 600-frame cap is gone). A parity test asserts both backends produce the same 900 frames @ ~33 ms at defaults.
  2. Saved geos smuggled in old pacing — gallery examples call geo.plot(...) on pickled example data whose stored kwargs carry their era's defaults (frame_rate=50, rotations=2), silently overriding current defaults. DataGeometry.plot no longer replays pacing kwargs from saved files (explicit overrides still win).
  3. Docs rendering — plotly's sphinx-gallery renderer pushed every animation frame through kaleido for one static png (a 900-frame figure took ~57 min and would embed tens-of-MB pages). The docs path now writes a frame-stripped png + an interactive html with embedded frames capped at 150, each displayed proportionally longer — duration and rotation speed unchanged, pages ~0.1 MB.
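
The pacing arithmetic behind the three fixes is short enough to write out. This sketch only reproduces the numbers quoted above (30 fps, 30 s, 1 rotation, docs cap of 150 embedded frames); `frame_plan` is a hypothetical helper, and how the docs path chooses which frames to keep is not specified here.

```python
def frame_plan(frame_rate=30, duration=30, rotations=1, max_embedded=None):
    """Frame count and per-frame timing for an animation of fixed duration."""
    n_frames = frame_rate * duration
    embedded = n_frames if max_embedded is None else min(n_frames, max_embedded)
    return {
        "n_frames": n_frames,
        "embedded_frames": embedded,
        # Fewer embedded frames are each shown longer, so duration is unchanged.
        "ms_per_frame": 1000 * duration / embedded,
        "degrees_per_second": 360 * rotations / duration,
    }


full = frame_plan()                  # library defaults
docs = frame_plan(max_embedded=150)  # docs pages with capped frames
```

At defaults this gives 900 frames at about 33 ms each. With the 150-frame cap each frame is shown for 200 ms, and both cases rotate at 12 degrees per second.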

All gallery animation mp4s re-rendered at the standard (verified 900 frames @ 30 fps via ffprobe), and the plotly evidence gifs re-exported at correct pacing.

Streaming: view is rock-stable, out-of-range samples clamp to the box

The data→box transform is frozen from the initial samples (the center+scale affine is captured once); every future sample passes through that exact transform, and anything outside the box is clamped to the closest point on its surface. Axis limits never change once set. Verified: zero vanishing ink across the re-rendered tutorial animation (previously drawn pixels never move), plus regression tests for exact drawn-position stability and box-surface clamping:

streaming stable
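
A minimal sketch of a frozen center+scale transform with clamping, assuming an axis-aligned box spanning [-1, 1] on every axis. The helper names and the box extent are assumptions for illustration, not hypertools internals. For an axis-aligned box, clamping each coordinate independently gives the closest point on the box.

```python
def fit_box_transform(head):
    """Capture the center and a single scale from the initial samples only."""
    n_features = len(head[0])
    lows = [min(row[j] for row in head) for j in range(n_features)]
    highs = [max(row[j] for row in head) for j in range(n_features)]
    centers = [(lo + hi) / 2 for lo, hi in zip(lows, highs)]
    # One shared scale keeps the aspect ratio; fall back to 1.0 for flat data.
    half_extent = max((hi - lo) / 2 for lo, hi in zip(lows, highs)) or 1.0
    return centers, half_extent


def to_box(row, centers, half_extent):
    """Apply the frozen transform, then clamp each coordinate to [-1, 1]."""
    return [
        max(-1.0, min(1.0, (value - center) / half_extent))
        for value, center in zip(row, centers)
    ]


head = [[0.0, 0.0], [4.0, 2.0]]
centers, half_extent = fit_box_transform(head)

inside = to_box([3.0, 1.0], centers, half_extent)    # within the head's range
outside = to_box([10.0, 1.0], centers, half_extent)  # beyond it on the first axis
```

Because `centers` and `half_extent` are never recomputed, previously drawn points keep their positions no matter what arrives later.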

Legends: right of the plot, vertically centered (both backends)

matplotlib | plotly:

Gallery: thumbnails click through to notebooks, correct aspect

Thumbnail clicks were genuinely dead (sphinx-gallery ≥ 0.17 no longer wraps the image in a link, and the old fix-up JS targeted markup that no longer exists). Thumbnails now open the example's notebook on Colab; the title text under each thumbnail opens the example page with the rendered output. The animated thumbnails were also being squashed into squares from 4:3 sources — all seven are regenerated letterboxed at the correct aspect from the new 30 fps videos:

gallery
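
Letterboxing comes down to one scale factor plus padding. The sketch below assumes a 640x480 source and a 200x200 thumbnail box purely for illustration; `letterbox` is a hypothetical helper and is not taken from scripts/generate_gallery_thumbs.py.

```python
def letterbox(src_w, src_h, box_w, box_h):
    """Fit a source inside a box without distortion; return size and padding."""
    scale = min(box_w / src_w, box_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = (box_w - new_w) // 2, (box_h - new_h) // 2
    return new_w, new_h, pad_x, pad_y


# A 4:3 source in a square box keeps its aspect and gets bars top and bottom.
thumb = letterbox(640, 480, 200, 200)
```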

API docs

hyp.load's docstring documents the full resolution chain (builtin → local file → Hugging Face incl. streaming → Drive → Dropbox → URL, lists supported), and DataGeometry.plot/transform now document automatic string-source loading — both regenerate into the API reference.

(CI hardening: a kaleido subprocess wedged for 6 hours on one Windows runner before hitting the Actions timeout -- pytest-timeout now caps every test at 20 minutes so a hung native call fails fast instead of burning a runner.)

237 tests passing · gallery animations verified at 900 frames / 30 fps · streaming tutorial re-executed with 0 errors.

🤖 Generated with Claude Code

jeremymanning and others added 2 commits July 3, 2026 08:35
Every documentation notebook (39 gallery + 13 tutorials) now opens with a
branch-aware install cell so it runs standalone in Google Colab. On a
preview branch it installs that branch from GitHub
(`%pip install "hypertools[interactive] @ git+...@dev-2.0"`, verified in a
clean venv: imports as 2.0.0.dev0 with 2.0-only features working); on
master it installs the released package. scripts/add_colab_install_cell.py
injects the line idempotently into the hand-authored tutorial notebooks,
and conf.py's first_notebook_cell emits the same line for gallery
notebooks on rebuild.
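
Idempotent injection can be sketched on the notebook JSON structure directly. This is an assumption-laden illustration, not the contents of scripts/add_colab_install_cell.py: `add_install_cell` and the marker check are hypothetical, and the install line is a placeholder rather than the branch-aware line quoted above.

```python
INSTALL_MARKER = "%pip install"


def add_install_cell(notebook, install_line):
    """Insert an install cell at the top unless one is already there."""
    cells = notebook.setdefault("cells", [])
    if cells:
        first = cells[0]
        first_source = "".join(first.get("source", []))
        if first.get("cell_type") == "code" and INSTALL_MARKER in first_source:
            return False  # already injected; leave the notebook untouched
    cells.insert(0, {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": [install_line],
    })
    return True


notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
line = "%pip install hypertools"  # placeholder install line

first_call = add_install_cell(notebook, line)
second_call = add_install_cell(notebook, line)
```

Running the function twice leaves a single install cell, which is what makes it safe to re-run over already-processed notebooks.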

Gallery examples:
- shapes zoo: plots EVERY zoo shape (bunny, cube, dragon, sphere, teapot,
  vase, biplane) as small black dots (',' pixel marker), one panel each
- datasaurus: plots ALL THIRTEEN datasets of the dozen as small black
  dots ('.' point marker), one panel each

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>