HyperTools 2.0: modernized toolbox, interactive backend, soft clustering, comprehensive visual verification#270
jeremymanning wants to merge 48 commits into
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…audit Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…erator, dev notebook
- scripts/screenshot_harness.py: headless PNG capture per function/use-case
- scripts/generate_baseline_screenshots.py: 13 baseline cases, all passing on v0.8.2
- dev/hypertools_2.0_dev.ipynb: interactive test matrix, one section per public function
- Roadmap updated with design decisions mined from the fork's issue tracker (incl. comments)
- tests/screenshots/ gitignored (reviewed locally / CI artifacts, not committed)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…mixture models to first-class 2.0 features; record approved backend='auto' policy Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…models, robust coloring Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…lazy heavy imports
- Migrate to PEP 621 pyproject.toml (v2.0.0.dev0, py3.10+); delete setup.py, requirements.txt, MANIFEST.in, stale .travis.yml
- CI: py3.10-3.13 matrix, setup-python@v5, cache@v4, codecov@v4, screenshot artifact upload; readthedocs python 3.9->3.11
- Remove memoize entirely (user requirement): its str()-keyed cache truncated numpy arrays, so cache collisions returned wrong results (fork issue #3)
- Lazy-import umap, seaborn, scipy.interpolate: import hypertools 5.1s -> 1.46s
- 136/136 tests pass
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
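To see why the removed str()-keyed memoize could return wrong results, here is a minimal sketch (not the retired hypertools code). numpy summarizes arrays larger than its print threshold (1000 elements by default) with "...", so two arrays that differ only in the middle produce the same string and therefore the same cache key. `str_key` and `content_key` are hypothetical names; hashing shape, dtype and raw bytes is one collision-resistant alternative.

```python
import hashlib

import numpy as np


def str_key(arr):
    # The retired idea (sketch): key the cache on str(arr). numpy prints only
    # the first and last few values of large arrays, so the middle is invisible.
    return str(arr)


def content_key(arr):
    # Collision-resistant alternative (sketch): hash shape, dtype and raw bytes.
    arr = np.ascontiguousarray(arr)
    h = hashlib.sha256()
    h.update(str(arr.shape).encode())
    h.update(str(arr.dtype).encode())
    h.update(arr.tobytes())
    return h.hexdigest()


a = np.arange(2000)
b = a.copy()
b[500] = -1  # differs only in the middle of the array

print(str_key(a) == str_key(b))          # True: the keys collide
print(content_key(a) == content_key(b))  # False
```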
…re-formatting, scope plot styling (fixes #259)
- Replace external hdbscan package with sklearn.cluster.HDBSCAN (always available); drop the SyntaxWarning filter that existed only for it
- plot(): pass format_data=False to the post-analyze reduction (data was already formatted; avoids a redundant format_data/PPCA pass per plot)
- plot(): apply seaborn palette/style inside plt.rc_context() so plotting no longer permanently mutates matplotlib rcParams (GH #259), verified with a real before/after rcParams diff
- 136/136 tests pass
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…lors)
- cluster() supports GaussianMixture, BayesianGaussianMixture, LDA, NMF and returns (n_samples, n_components) membership proportions (rows sum to 1); hard-clustering behavior unchanged
- New hypertools/tools/colors.py: mat2colors maps categorical labels, continuous 1D values, or 2D matrices (soft assignments / arbitrary numeric matrices) to RGB; colors2groups quantizes per-point colors into traces for the matplotlib renderer
- plot() accepts cluster='GaussianMixture' etc. (points colored by proportion-weighted blends) and matrix-valued hue
- 145/145 tests pass (9 new: real GaussianMixture/BGM/LDA/NMF calls, color-blend math, end-to-end mixture plot)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
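The internals of mat2colors aren't shown in this PR. The sketch below assumes a "proportion-weighted blend" is a convex combination of palette colors, weighted by each point's membership proportions; `blend_colors` is a hypothetical helper, not the hypertools API.

```python
def blend_colors(proportions, palette):
    """Blend RGB palette colors by per-point membership proportions.

    proportions: list of rows, each a list of n_components weights summing to 1
    palette: list of n_components (r, g, b) tuples with channels in [0, 1]
    """
    blended = []
    for row in proportions:
        if abs(sum(row) - 1.0) > 1e-6:
            raise ValueError("each row of proportions must sum to 1")
        # Weighted sum of the component colors, channel by channel
        blended.append(tuple(
            sum(w * color[c] for w, color in zip(row, palette))
            for c in range(3)
        ))
    return blended


palette = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]  # component 0 red, component 1 blue
print(blend_colors([[1.0, 0.0], [0.5, 0.5]], palette))
# [(1.0, 0.0, 0.0), (0.5, 0.0, 0.5)]
```

A point split evenly between two clusters lands halfway between their colors, which is what makes overlapping-cluster membership visible.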
- plot([[a, b], [c]]) flattens arbitrarily nested dataset lists, coloring every leaf by its outermost group and rendering deeper leaves thinner and fainter (summary -> detail, per fork design issues #14/#16)
- Nested string lists (text corpora) are explicitly excluded and keep their existing text-pipeline behavior
- 156/156 tests pass (6 new, incl. rendered-line color/width assertions)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
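The flattening rule can be sketched as a recursive walk that tags each leaf with the index of its outermost group and its nesting depth. The 1/depth width and opacity falloff is an assumed scaling, not necessarily the one hypertools uses, and `flatten_nested` is a hypothetical helper. Integers stand in for data arrays, and the sketch does not implement the string-list exclusion.

```python
def flatten_nested(data, base_width=2.0, base_alpha=1.0):
    """Return (leaf, group, depth, linewidth, alpha) for every leaf of a nested list.

    group is the index of the top-level element the leaf came from; depth is 1
    for a leaf placed directly in the top-level list.
    """
    out = []

    def walk(node, group, depth):
        if isinstance(node, list):
            for child in node:
                walk(child, group, depth + 1)
        else:
            # Deeper leaves are drawn thinner and fainter (assumed 1/depth scaling)
            out.append((node, group, depth, base_width / depth, base_alpha / depth))

    for group, top in enumerate(data):
        walk(top, group, 1)
    return out


for leaf in flatten_nested([[10, [11]], [12]]):
    print(leaf)
```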
- New hypertools/plot/interactive.py: plotly renderer mirroring _draw's contract (2D/3D traces, fmt-string mode mapping, per-trace colors/labels, hypertools no-ticks aesthetic, matplotlib elev/azim -> plotly camera)
- Animations: sliding-window frames (animate=True) and camera spin (animate='spin') with play/pause controls
- hyp.plot(..., backend='auto'|'matplotlib'|'plotly'): auto uses plotly ONLY on Colab/Kaggle (approved policy); matplotlib is the default everywhere else
- Screenshot harness exports plotly figures via kaleido
- 169/169 tests pass (13 new: policy resolution incl. Colab/Kaggle markers, fmt mapping, camera math, end-to-end plotly figure/animation assertions)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
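The elev/azim -> camera mapping can be sketched as the standard spherical-to-Cartesian conversion: matplotlib views the scene from direction (cos elev · cos azim, cos elev · sin azim, sin elev), and plotly takes the camera position as `layout.scene.camera.eye`. This assumes the renderer places plotly's eye along that same direction; `elev_azim_to_eye` is a hypothetical helper, and r=1.95 is the camera distance reported in a later commit.

```python
import math


def elev_azim_to_eye(elev, azim, r=1.95):
    """Convert matplotlib 3D view angles (degrees) to a plotly camera eye dict."""
    e, a = math.radians(elev), math.radians(azim)
    return {
        "x": r * math.cos(e) * math.cos(a),
        "y": r * math.cos(e) * math.sin(a),
        "z": r * math.sin(e),
    }


# matplotlib's default 3D view is elev=30, azim=-60
eye = elev_azim_to_eye(30, -60)
camera = {"eye": eye, "up": {"x": 0, "y": 0, "z": 1}}  # e.g. fig.update_layout(scene_camera=camera)
```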
…ok, backend fix, README 2.0 docs
- scripts/generate_verification_screenshots.py: 44/44 cases pass covering every public function (plot/reduce/align/normalize/cluster/analyze/describe/format_data/load/text) on both backends; INDEX.md manifest; curated copy committed to docs/images/v2.0-verification/ for PR evidence
- dev notebook executed end-to-end with 0 errors via nbclient (dev/hypertools_2.0_dev_executed.ipynb); notebook cells updated to exercise implemented 2.0 APIs
- backend.py: catch ValueError from mpl.use() -- matplotlib >=3.9 raises it (not ImportError) for missing ipympl; likely root cause of Colab animate=True failures (#235)
- README: What's new in 2.0 + modernized requirements; ipykernel in [dev]
- Full suite re-verified: 169/169 tests, 13/13 baselines, import 1.5-1.7s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… time GitHub's windows/py3.13 runners ship a broken Tcl/Tk: TkAgg imports fine (so backend probing selects it) but window creation raises _tkinter.TclError. manage_backend now retries the plot once on the original backend after an interactive-backend TclError instead of crashing. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
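The retry-once behavior can be sketched as a small wrapper. `plot_with_fallback`, `FakeTclError` and the injected `switch_backend` callable are hypothetical stand-ins so the sketch runs without Tk; in real code the caught exception would be `_tkinter.TclError` and the switch would go through matplotlib's backend machinery.

```python
class FakeTclError(Exception):
    """Stand-in for _tkinter.TclError so the sketch runs without a Tk install."""


def plot_with_fallback(plot_fn, interactive_backend, original_backend,
                       switch_backend, retry_on=(FakeTclError,)):
    """Try plotting on the interactive backend; on a window-creation error,
    switch back to the original backend and retry exactly once."""
    switch_backend(interactive_backend)
    try:
        return plot_fn()
    except retry_on:
        switch_backend(original_backend)
        return plot_fn()  # a second failure propagates to the caller


# Usage with a fake environment where TkAgg imports fine but windows fail
state, calls = {"backend": None}, []


def switch(name):
    state["backend"] = name
    calls.append(name)


def draw():
    if state["backend"] == "TkAgg":
        raise FakeTclError("no display name and no $DISPLAY")
    return "plotted on " + state["backend"]


print(plot_with_fallback(draw, "TkAgg", "Agg", switch))  # plotted on Agg
```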
- plotly renderer now reproduces the matplotlib aesthetic exactly: black
wireframe cube (3D) / square frame (2D), hidden axes, unit-cube range,
matched camera (elev/azim), pt->px line/marker sizing, full fmt-string
support (marker symbols + dash styles, with 3D symbol fallbacks)
- MULTICOLORED LINES: continuous or matrix hue + line fmt colors each
trajectory continuously along its length (matplotlib Line3DCollection /
LineCollection; plotly per-point line colors in 3D, segment traces in 2D)
- Fix long-standing is_line() bug: '' in Line2D.markers made it return
False for every fmt string, silently disabling line interpolation on
modern matplotlib; also parse linestyles before marker chars ('-.')
- Re-mapped per-point labels onto interpolated trajectories (fixes latent
IndexError that interpolation re-enablement exposed)
- Parity montage generator (scripts/generate_parity_screenshots.py):
matplotlib|plotly side-by-side for 22 identical calls
- 173/173 tests pass
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
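The is_line() bug comes down to a Python fact: `'' in s` is True for every string. The sketch below assumes is_line treats a format string as a line when it contains no marker symbol, and uses a hypothetical subset of marker codes; it is not the hypertools source. Stripping the linestyle before scanning for markers keeps the '.' in '-.' from being read as a point marker.

```python
# Hypothetical subset of matplotlib marker codes. Like Line2D.markers, it
# includes '' (the "no marker" key), which is what triggered the bug.
MARKERS = {"", ".", "o", "s", "^", "v", "x", "+", "*", "D"}
LINESTYLES = ("--", "-.", "-", ":")  # check two-character styles first


def is_line_buggy(fmt):
    # '' is a substring of every string, so this returns False for every fmt
    return fmt is None or all(m not in fmt for m in MARKERS)


def is_line(fmt):
    """True if the fmt string contains no marker symbol."""
    if fmt is None:
        return True
    rest = fmt
    for style in LINESTYLES:
        if style in rest:
            # Remove the linestyle first so the '.' in '-.' isn't read as a marker
            rest = rest.replace(style, "", 1)
            break
    return not any(ch in MARKERS for ch in rest)


print(is_line_buggy("-"), is_line("-"), is_line("-."), is_line("o-"))
# False True True False
```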
- Removed (previously deprecated, now retired for 2.0): plot's group/model/model_params; reduce's model/model_params/normalize/align; align's method/normalize/ndims and the ambiguous align=True form (now a clear ValueError with migration hint); cluster's ndims
- DataGeometry.plot translates/drops retired kwargs when replaying geos saved by hypertools < 2.0 (group -> hue), so old files still load
- New hyp.apply_model: the stack/unstack core from the revamp design -- one model fit across stacked datasets, then unstacked to the input structure (stack=False for per-dataset fits); model specs as registry name / dict / sklearn instance / pipeline list; mode auto|fit_transform|fit_predict|predict_proba; return_model for reuse on held-out data; explicit whitelist registry (no eval)
- 185/185 tests pass (12 new apply_model tests)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
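The stack/unstack idea can be sketched in a few lines: concatenate the datasets row-wise, fit one model on the stack, then split the output back at the original boundaries. `apply_model_sketch` and the toy `Centerer` model are hypothetical; this is not the hyp.apply_model signature, which also handles registry names, pipelines and modes.

```python
import numpy as np


def stack(datasets):
    """Concatenate (n_i, d) arrays and remember where each one ends."""
    lengths = [len(x) for x in datasets]
    return np.vstack(datasets), np.cumsum(lengths)[:-1]


def unstack(stacked, split_points):
    """Split rows back into the original per-dataset structure."""
    return np.split(stacked, split_points)


class Centerer:
    """Toy stand-in for an sklearn-style model: subtracts the mean seen in fit()."""
    def fit(self, x):
        self.mean_ = x.mean(axis=0)
        return self

    def transform(self, x):
        return x - self.mean_

    def fit_transform(self, x):
        return self.fit(x).transform(x)


def apply_model_sketch(datasets, model, stack_data=True):
    if stack_data:
        # One model fit across all datasets, then unstacked to the input structure
        stacked, splits = stack(datasets)
        return unstack(model.fit_transform(stacked), splits), model
    # Per-dataset fits (analogous to stack=False); no single model to return
    return [type(model)().fit_transform(x) for x in datasets], None


rng = np.random.default_rng(0)
data = [rng.normal(size=(5, 3)), rng.normal(loc=3.0, size=(8, 3))]
out, fitted = apply_model_sketch(data, Centerer())
# The returned model can be reused on held-out data: fitted.transform(new_x)
```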
…s for all 2.0 features
- docs/images/v2.0-parity/: 22 side-by-side (matplotlib | plotly) montages of identical calls -- line/marker styles, dashes, sizing, colors, hue variants, clustering, mixtures, nested lists, multicolored lines
- docs/images/v2.0-verification/: refreshed 75-case matrix (was 44) now covering every plotting feature on BOTH backends, incl. multicolored lines, mixture models, nested lists, marker/line styles, animations, and apply_model
- 5 new gallery examples (interactive backend, mixture models, multicolored lines, nested lists, apply_model), all executing cleanly; gallery rebuilt; apply_model added to the API reference
- dev notebook updated for all implemented 2.0 features and re-executed end-to-end with 0 errors
- README documents multicolored lines, apply_model, backend parity, and the retired legacy arguments
- 185/185 tests pass; import 1.41s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
All review items addressed ✅

Every item from the review is now implemented, tested, and screenshotted (PR body updated with full details). Summary of what changed since the initial submission:

1. Backend visual parity

The plotly renderer was rewritten to reproduce the matplotlib output exactly: same wireframe cube / square framing, hidden axes, matched camera (elev/azim), pt→px-calibrated line widths and marker sizes, full format-string support (marker symbols + dash styles), and identical palette assignment. Evidence: 22 side-by-side montages (matplotlib left | plotly right, same call) in

Multicolored line, same call on both backends:

2. Complete feature screenshot coverage

The verification matrix grew from 44 → 75 cases, all passing (manifest): clustering (hard + all four mixture models), multilevel/nested lists, multicolored lines (new feature: continuous per-segment coloring along trajectories, both backends), matrix/continuous/categorical hue, marker + line styles, animations,

3. Formerly deferred items: all now in this PR
Bonus fix found while restoring parity
Final numbers: 185 tests passing · 75/75 verification cases · 22/22 parity montages · dev notebook executes with 0 errors · docs build clean · CI matrix green.

🤖 Generated with Claude Code
…ments, plotly gallery
- Animation EXPORT on both backends, format by extension: .gif (Pillow), .png/.apng (animated PNG), .mp4/.mov/.avi (ffmpeg). plotly animations render each frame via kaleido, then assemble; exported frames no longer include the play/pause controls; frame counts scale with duration. 7 new tests save real files and verify frame counts. Sample GIFs from BOTH backends committed to docs/images/v2.0-animations/.
- Mixture demos now use OVERLAPPING clusters (1.5 sd apart) so multi-class membership is visible as blended colors (examples, screenshots, parity, notebook); a new test asserts a substantial fraction of genuinely mixed assignments.
- Backend parity refinements: centered black 12pt title (matching matplotlib), default 640x480 canvas, 2D frame fills the canvas like matplotlib (no forced square), 3D box uses matplotlib's 4:4:3 aspect, camera distance tuned so cube sizes match (r=1.95).
- Sphinx gallery renders plotly figures (plotly_sg_scraper + kaleido); new animate_plotly example with an animated GIF thumbnail wired into post_build; interactive-backend example shows the plotly figure inline.
- Dev notebook displays animations inline (to_jshtml + plotly frames) and demonstrates gif export; re-executed end-to-end with 0 errors.
- Evidence regenerated: 22/22 parity montages, 75/75 verification cases.
- 192/192 tests pass (185 + 7 animation-export)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
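Choosing the export route by file extension is essentially a lookup table plus a frame-count rule. This is a hypothetical sketch (`EXPORT_ROUTES` and `export_plan` are illustrative names, and the route strings are labels, not hypertools identifiers); on the matplotlib side the corresponding writers would be `matplotlib.animation.PillowWriter` and `FFMpegWriter`.

```python
from pathlib import Path

# Extension -> export route, following the list in the commit message.
# Route names are descriptive labels only.
EXPORT_ROUTES = {
    ".gif": "pillow",
    ".png": "apng",
    ".apng": "apng",
    ".mp4": "ffmpeg",
    ".mov": "ffmpeg",
    ".avi": "ffmpeg",
}


def export_plan(save_path, duration, frame_rate):
    """Return (route, n_frames) for an animation saved to save_path."""
    suffix = Path(save_path).suffix.lower()
    try:
        route = EXPORT_ROUTES[suffix]
    except KeyError:
        raise ValueError(f"unsupported animation format: {suffix!r}") from None
    # Frame count scales with duration
    return route, int(round(duration * frame_rate))


print(export_plan("weights.GIF", duration=30, frame_rate=30))  # ('pillow', 900)
```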
Root cause of the macos/py3.11 CI failure: Google Drive answered a dataset request with an HTML rate-limit page (200 status), which load() cached as the dataset, poisoning every subsequent text-data test on that runner with UnpicklingError.
- _download_example_data: raise_for_status; detect HTML error pages before caching (all example datasets are pickles, which never start with '<')
- _load_example_data: on a corrupt cache, delete it and retry the download once before failing; never leave a poisoned cache behind
- Regression test poisons the real cache with the actual Drive error page and verifies recovery (or clean failure with the cache removed)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
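Both guards can be sketched with the standard library alone. `looks_like_html`, `save_to_cache` and `load_cached` are hypothetical names, not the hypertools functions. The check relies on the fact stated above: a pickle begins with an opcode byte (b'\x80' for protocol 2 and later), never '<', and unpickling an HTML page fails with "invalid load key, '<'".

```python
import os
import pickle
import tempfile


def looks_like_html(payload: bytes) -> bool:
    # Pickles start with an opcode byte (b'\x80' for protocol 2+), never '<'
    return payload.lstrip()[:1] == b"<"


def save_to_cache(path, payload: bytes):
    if looks_like_html(payload):
        raise OSError("server returned an HTML page instead of data; not caching")
    with open(path, "wb") as f:
        f.write(payload)


def load_cached(path, download):
    """Load a pickled dataset; on a corrupt cache, delete it and re-download once."""
    for _ in range(2):
        if not os.path.exists(path):
            save_to_cache(path, download())
        try:
            with open(path, "rb") as f:
                return pickle.load(f)
        except (pickle.UnpicklingError, EOFError):
            os.remove(path)  # never leave a poisoned cache behind
    raise OSError(f"could not load a valid dataset from {path}")


cache = os.path.join(tempfile.mkdtemp(), "data.pkl")
with open(cache, "wb") as f:  # simulate a cache poisoned by a rate-limit page
    f.write(b"<!DOCTYPE html><html>rate limited</html>")
print(load_cached(cache, lambda: pickle.dumps({"x": 1})))  # {'x': 1}
```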
… CI jobs
- _download_example_data retries up to 4 times (2s/6s/18s backoff) when the host rate-limits, instead of failing on the first error page
- CI caches ~/hypertools_data (immutable datasets, one cross-OS entry) so 24 concurrent jobs stop re-downloading the same files from Google Drive every run; those repeated downloads were the root cause of the intermittent text-test failures
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
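A retry loop with 2s/6s/18s waits is geometric backoff with a factor of 3. The sketch assumes "up to 4 times" means four attempts in total, which matches the three listed delays. `fetch_with_backoff` and `RateLimited` are hypothetical names; injecting `sleep` keeps the sketch testable without actually waiting.

```python
import time


class RateLimited(Exception):
    """Raised by a fetch function when the host answers with a rate-limit page."""


def fetch_with_backoff(fetch, attempts=4, first_delay=2.0, factor=3.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times, waiting 2s, 6s, 18s between tries."""
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except RateLimited:
            if attempt == attempts:
                raise  # out of attempts: surface the error
            sleep(delay)
            delay *= factor
```

Passing `sleep=waits.append` in a test records the delays instead of sleeping.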
Round-2 review items addressed ✅

1. Mixture demos now show true multi-class membership

All mixture-model demos (examples, screenshots, parity montages, dev notebook) use overlapping clusters (1.5 sd apart), so points in the overlap regions have genuinely mixed memberships and render with blended, intermediate colors. A new test asserts that a substantial fraction of points have soft (< 0.9 max-proportion) assignments.

2. Axis sizing/ratio and title placement now match
All 22 parity montages regenerated:

3. Animation works and exports to gif / animated png / mp4 (both backends)

The dev notebook now displays animations inline (

4. Sphinx gallery renders plotly output (including animation)

Bonus: dataset-download hardening (found via a CI failure during this round)

One macOS CI job failed because Google Drive rate-limited a dataset download and returned an HTML error page with a 200 status, which

Updated numbers: 193 tests passing · 75/75 verification cases · 22/22 parity montages · 4 committed animation exports · docs + gallery build clean · CI green.

🤖 Generated with Claude Code
…multi-panel + cache verification
- SVG export on both backends: static (.svg) plus ANIMATED vector SVG via SMIL (frames stitched with discrete display switching; verified frame advance in headless Chrome by scrubbing setCurrentTime). matplotlib frames captured through a public AbstractMovieWriter subclass with frame subsampling (<=60 frames)
- plotly window animations now rotate the camera while the window advances, matching matplotlib's behavior
- plotly titles centered over the plot area (xref='paper') with a matplotlib-matched font stack
- hyperalign: n_iter argument (default 10) iteratively re-estimates the common template; dict form no longer returns None; removed leftover 'method' reference that raised NameError for unknown align strings
- shapes zoo: bunny/cube/dragon/sphere/teapot/vase/biplane + datasaurus registered with their Dropbox sources (direct-URL download support in the loader; tolerant unpickler for dill/legacy-pandas formats; dill added as a dependency). 'egyption_mask' excluded: upstream file is an empty (0,3) array
- Multi-panel figures verified (hyp.plot(..., ax=...) into user subplot grids, 3D + 2D panels)
- Re-download hygiene verified: repeated loads leave the cache byte-stable (no duplication / storage leak)
- Reconstructed the classic readthedocs hyperaligned-weights animation (docs/images/v2.0-animations/weights_hyperaligned.gif)
- 9 new tests (tests/test_round3.py), all real calls
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
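"Frames stitched with discrete display switching" can be sketched as one `<g>` group per frame. Each group is hidden by default, and a SMIL `<animate attributeName="display" calcMode="discrete">` makes it visible only during its own slice of the loop. This is a hypothetical minimal version (`smil_svg` is an illustrative name, frames are plain polylines), not the hypertools exporter, and browser SMIL support varies.

```python
import xml.etree.ElementTree as ET


def smil_svg(frames, duration_s=3.0, width=200, height=150):
    """Build an animated SVG that shows one <g> per frame, switched by SMIL.

    frames: list of point lists [(x, y), ...] in pixel coordinates.
    """
    n = len(frames)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for i, pts in enumerate(frames):
        points = " ".join(f"{x:.1f},{y:.1f}" for x, y in pts)
        # Visible only during [i/n, (i+1)/n) of each cycle; discrete = hard switches
        key_times = f"0;{i / n:.4f};{(i + 1) / n:.4f}"
        parts.append(
            '<g display="none">'
            f'<polyline points="{points}" fill="none" stroke="black"/>'
            '<animate attributeName="display" values="none;inline;none" '
            f'keyTimes="{key_times}" dur="{duration_s}s" '
            'calcMode="discrete" repeatCount="indefinite"/>'
            '</g>'
        )
    parts.append("</svg>")
    return "".join(parts)


svg = smil_svg([[(10, 10), (50, 40)], [(20, 10), (60, 50)], [(30, 10), (70, 60)]])
```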
… regen
- Docs: modern pydata-sphinx-theme (replacing sphinx_bootstrap_theme); full site + gallery build verified, screenshots committed to docs/images/v2.0-theme/. nbsphinx_execute='never': tutorial notebooks ship pre-executed, and 'auto' was re-executing every gallery notebook (doubling build time and hanging on plotly exports in the nbsphinx kernel)
- Fixed zoom: Axes3D.dist was removed in matplotlib >= 3.8, silently disabling animation zoom; replaced with set_box_aspect(zoom=...) using the exact legacy scale mapping (10 / (9 - zoom))
- Reconstructed the classic readthedocs hyperaligned-weights animation (docs/images/v2.0-animations/weights_hyperaligned.gif): 36 subjects, align='hyper', smooth interpolated trajectories, working zoom
- Modern demos: gallery examples plot_shapes_zoo + plot_datasaurus; executed tutorial notebooks hugging_face_embeddings (sentence-transformers + HF ag_news, mixture soft clustering, UMAP, animated spin gif) and modern_sklearn_dynamics (HDBSCAN, GaussianMixture, Lorenz attractor multicolored line, animated gif); registered in the tutorials toctree
- Animation evidence regenerated with the rotation + zoom fixes; parity montages regenerated (22/22) with the title-font change
- 202/202 tests pass
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
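The zoom mapping is small enough to state as two functions. The 10 / (9 - zoom) scale is the formula given above; it is consistent with the legacy code having set `Axes3D.dist = 9 - zoom` against matplotlib's old default distance of 10, which is an inference rather than something this PR states. The plotly distance r = 1.95 * (9 - zoom) / 8 comes from a later commit. The function names are hypothetical. The product of the two is constant, so zooming in enlarges the matplotlib box by exactly the factor that the plotly camera moves closer.

```python
def mpl_zoom_scale(zoom):
    """Box-aspect zoom that reproduces the legacy dist-based zoom.

    Usage (matplotlib >= 3.3): ax.set_box_aspect(None, zoom=mpl_zoom_scale(zoom))
    """
    if zoom >= 9:
        raise ValueError("zoom must be < 9 (the legacy mapping diverges at 9)")
    return 10.0 / (9.0 - zoom)


def plotly_camera_distance(zoom, base=1.95):
    """Plotly eye distance: r = 1.95 * (9 - zoom) / 8, so zoom=1 gives r=1.95."""
    return base * (9.0 - zoom) / 8.0


for z in (0, 1, 3.5):
    print(z, round(mpl_zoom_scale(z), 4), round(plotly_camera_distance(z), 4))
```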
Round-3 review items: all 10 addressed ✅

1. SVG export: static AND animated, both backends
2. plotly window animation now rotates
3. Titles match

plotly titles are centered over the plot area (

4. Multi-panel figures verified
5. Hyperalignment
…ality fixes, embedding demos
- hyperalign now REPEATEDLY applies the full procedure (n_iter passes, default 10): each pass's aligned output feeds the next, compounding convergence (mean distance to group mean 0.43 -> 0.004 over 10 passes on rotated copies). The classic readthedocs weights animation is regenerated with the corrected pipeline (normalize='across', align='hyper', zoom=3.5, rotations=1, frame_rate=50, linewidth=3)
- NEW animate='serial' mode (both backends): datasets appear one at a time in list order, each growing point-by-point while earlier ones stay fixed, never connected; built for conversation-turn visualizations; tests assert sequential reveal on both backends
- Animation quality: full-canvas animated axes plus skipping tight_layout fix cube/data clipping at rotation angles (border-pixel regression test); linewidth is now a plot() argument and animations no longer hardcode linewidth=1; markersize is now a plot() argument
- EXACT per-point colors for markers: matrix hue / mixture-model scatter renders true per-observation blends via scatter (was: quantized color groups); the plotly path already carried per-point colors
- Shapes morph: 3510 frames @ 30fps (117s, 13 rotations), committed as mp4 + 20s preview gif
- Demos: wikipedia embeddings (BAAI/bge-small-en-v1.5 + UMAP + 10-way GaussianMixture soft clustering, markersize=2) and reddit conversation trajectories (convokit reddit-small, sliding-window SBERT, per-speaker colors, animate='serial', 30s/3 rotations); both executed with 0 errors, with committed gifs
- 205 tests passing
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
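The 'serial' reveal order can be sketched as a per-frame schedule: how many points of each dataset are visible. Datasets fill in list order, each point by point, and finished datasets stay fully drawn; each dataset would still be drawn as its own trace so they are never connected. `serial_reveal` is a hypothetical helper, and the even pacing across frames is an assumption.

```python
def serial_reveal(lengths, n_frames):
    """For each frame, the number of visible points in each dataset."""
    total = sum(lengths)
    schedule = []
    for f in range(1, n_frames + 1):
        shown = round(total * f / n_frames)  # points revealed so far (evenly paced)
        counts = []
        for n in lengths:
            take = min(n, shown)  # earlier datasets fill up before later ones start
            counts.append(take)
            shown -= take
        schedule.append(counts)
    return schedule


print(serial_reveal([2, 3], n_frames=5))
# [[1, 0], [2, 0], [2, 1], [2, 2], [2, 3]]
```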
…tax, 30fps standard, rebuilt demos
WEIGHTS (the classic readthedocs animation) — fully diagnosed via Jeremy's
2020 pieman_trajectory_demo notebook (hypertools 0.6.2 + timecorr):
gaussian temporal smoothing (var=300) -> hyp.align repeated n_iter=20
(SRM) -> smooth again -> UMAP -> animate
Two additional findings were required to reproduce it on modern deps:
- align('SRM') now supports n_iter (re-fits SRM on each pass's output;
inter-subject correlation plateaus ~0.87 = the data's shared-signal
ceiling, matching era behavior since the vendored SRM is unchanged
since v0.6.2)
- modern umap-learn's default n_neighbors=15 keeps neighborhoods
within-subject and DISPERSES the aligned bundle; n_neighbors=150 merges
same-timepoint rows across subjects and reproduces the tight looping
rope of the original. Recipe scripted in
scripts/generate_weights_trajectory.py; gif regenerated (900 frames,
30fps, tight rope verified against the reference render)
Also in this round:
- repeated-hyperalignment scale collapse fixed (procrustes' optimal
scaling < 1 shrinks data geometrically across passes; per-pass output
rescaling keeps norms stable through n_iter=50)
- single-call soft-cluster coloring:
hyp.plot(x, '.', markersize=2, reduce='UMAP',
cluster={'model': GaussianMixture, 'n_clusters': 10})
(dict accepts top-level n_clusters and model classes, in both cluster()
and plot(); colors flow from mixture proportions automatically)
- 30fps animation standard: plotly frame density raised to 30/s (cap
600); all tutorial gifs re-rendered at fps=30 with no conversion
downsampling (wikipedia 300 frames/10s/1 rotation; conversation 900
frames/30s/1 rotation; lorenz + hf spin 900 frames/30s); shapes morph
3510-frame mp4 + 30fps preview gif; matplotlib evidence gifs at 450
frames (plotly evidence gifs kept from the prior render -- kaleido
makes 450-frame exports impractically slow; noted)
- conversation demo rebuilt: 3-sentence windows WITHIN utterances (true
disconnection), repeating per-speaker colors, animate='serial',
rotations=1, no frame clipping
- 206 tests passing
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
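The two alignment ideas in these commits, feeding each pass's output into the next and rescaling after every pass, can be shown on a toy problem. This sketch uses plain orthogonal Procrustes with an isotropic scale, not hypertools' hyperalign or SRM, and the function names are hypothetical. The optimal scale is at most ‖template‖/‖source‖, and the group mean of noisy copies is shorter than the copies themselves, so unscaled passes shrink the data. Restoring each dataset's norm after every pass stops that.

```python
import numpy as np


def procrustes_rotate(source, target):
    """Orthogonal Procrustes: rotation R minimizing ||source @ R - target||_F,
    applied with the optimal isotropic scale (< 1 when the fit is noisy)."""
    u, sing, vt = np.linalg.svd(source.T @ target)
    rotation = u @ vt
    scale = sing.sum() / np.sum(source ** 2)
    return scale * source @ rotation


def repeated_align(datasets, n_iter=10, rescale=True):
    """Align every dataset to the running group mean, feeding each pass's output
    into the next; optionally restore each dataset's norm after every pass."""
    current = [x.copy() for x in datasets]
    for _ in range(n_iter):
        template = np.mean(current, axis=0)
        aligned = [procrustes_rotate(x, template) for x in current]
        if rescale:
            aligned = [a * (np.linalg.norm(x) / np.linalg.norm(a))
                       for a, x in zip(aligned, current)]
        current = aligned
    return current
```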
The n_neighbors=150 recipe over-globalized the UMAP embedding and flattened the trajectory into a near-straight line, whereas the reference render (download.png) shows a tight bundle with a dramatic loop. Swept n_neighbors against the reference: 15 disperses into a hairball, 150 flattens the loop, and 36 (min_dist=0.1) is the sweet spot: one tight rope that keeps the loop. Verified this is a pure UMAP-neighborhood effect, not a dependency-version issue: the modern SRM branch is byte-identical to v0.6.2, and era umap-learn 0.4.6 also hairballs the same aligned data at its default n_neighbors=15.

Also switched the animation from a rolling window (animate=True + tail_duration) to animate='spin': the window only ever showed a ~4s fragment (a tangle), while spin draws the whole bundle and orbits it so the loop is visible from every angle. Tightened the gif encode (scale=340, 48-colour palette) to ~8MB, in line with the other animation gifs, still 900 frames at 30fps with no downsampling.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Switch back from animate='spin' to the classic animate=True sliding window, per review. The critical property -- the space inside the cube is fixed for the whole animation, independent of which window is visible -- is guaranteed by the pipeline (helpers.scale normalizes once from the full stacked dataset and the window updater never touches axis limits) and verified programmatically: axis limits are identical across frames while the visible fragment's extent slides along the loop. (The earlier claim that window mode showed a static tangle was a frame-extraction bug in the verification harness -- PIL's ImageSequence yields one re-seeked image object, so materializing it with list() produced N copies of the final frame. Measured correctly, the window render rotates and the comet travels: mean inter-second frame motion 8.3, 0 clipped frames.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
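The frame-extraction bug is a general aliasing pitfall: if an iterator re-seeks and yields the same mutable object every time, `list()` collects N references to one object, which ends up holding the last frame. The stand-in class below is hypothetical and only mimics that behavior so it runs without Pillow. With Pillow itself, the usual fix is to copy inside the loop: `[f.copy() for f in ImageSequence.Iterator(img)]`.

```python
class ReseekingSequence:
    """Stand-in for an iterator that re-seeks ONE shared object instead of
    yielding fresh frames (the behavior described above for ImageSequence)."""

    def __init__(self, frames):
        self.frames = frames
        self.current = {"frame": None}  # the single shared, mutable object

    def __iter__(self):
        for f in self.frames:
            self.current["frame"] = f  # "seek": mutate in place
            yield self.current


wrong = list(ReseekingSequence([0, 1, 2]))                   # 3 references, 1 object
right = [dict(f) for f in ReseekingSequence([0, 1, 2])]      # copy while iterating
print([f["frame"] for f in wrong], [f["frame"] for f in right])
# [2, 2, 2] [0, 1, 2]
```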
Streams are a first-class data type -- no flag. hyp.plot() detects Python iterators/generators and Hugging Face IterableDatasets (load_dataset(..., streaming=True)) from the structure of the input:
- the first stream_init samples (default 10,000) ESTIMATE the normalization/reduction parameters; those fitted models are then APPLIED to every subsequent sample, which is added to the plot dynamically (the fit-on-head/transform-forever semantics from the issue thread)
- stream_chunk (default 100) is the per-fetch batch size; each chunk renders as one live redraw / saved animation frame
- stream_max (default None) streams continually; infinite streams render incoming data indefinitely, and Ctrl-C cleanly finalizes any save_path animation and returns the geometry
- stream_window optionally shows only the trailing samples (comet style) while everything consumed stays on the geometry
- reduction models must support transform() (IncrementalPCA default, PCA, UMAP; TSNE raises); align/cluster raise for streams (cluster planned)
- dict rows: numeric fields concatenated in insertion order, strings ignored (use .select_columns() for control); datasets added to [dev]

14 real tests (tests/test_streaming.py) incl. an actual HF iris stream, interrupt finalization, and a fitted-on-head-only assertion. New executed tutorial docs/tutorials/streaming_data.ipynb with two streaming animations.

Docs theme: pydata-sphinx-theme -> Furo with the ContextLab brand ported from ContextLab/scheduler (Nunito Sans, lowercase 300-weight headings with 0.6px letter-spacing, green #007030 / dark #4CAF50), screenshot-verified.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
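The fit-on-head/transform-forever semantics can be sketched with the standard library. Per-feature z-scoring stands in for the normalization/reduction models, the defaults are shrunk for the demo (the commit's defaults are 10,000 and 100), and `stream_transform` is a hypothetical generator, not the hypertools internals.

```python
import itertools
import statistics


def stream_transform(stream, stream_init=5, stream_chunk=2, stream_max=None):
    """Fit z-scoring on the first `stream_init` samples, then apply that frozen
    transform to the head and to every later chunk (one chunk per redraw)."""
    it = iter(stream)
    if stream_max is not None:
        it = itertools.islice(it, stream_max)
    head = list(itertools.islice(it, stream_init))
    if not head:
        return
    # ESTIMATE: parameters come from the head only
    cols = list(zip(*head))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns

    def apply(rows):
        return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

    yield apply(head)
    # APPLY: every later chunk reuses the frozen parameters
    while True:
        chunk = list(itertools.islice(it, stream_chunk))
        if not chunk:
            return
        yield apply(chunk)


rows = ([float(i), 2.0 * i] for i in range(9))  # any iterator or generator works
for chunk in stream_transform(rows):
    print(len(chunk), chunk[0])
```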
Round-4.5: the weights animation is solved, plus the requested animation & API fixes ✅

The story-trajectories mystery: full diagnosis

The classic readthedocs animation could not be reproduced by any

Two things still had to be fixed/understood to make that work on modern dependencies:
The animation is the classic sliding-window style (

Along the way, repeated hyperalignment got a real fix too: procrustes' optimal scaling is < 1 under noise, so repeated passes shrank the data geometrically (eventually crashing). Per-pass rescaling keeps norms stable through

Single-call soft clustering (requested syntax)

from sklearn.mixture import GaussianMixture
import hypertools as hyp

geo = hyp.plot(embeddings, '.', markersize=2, reduce='UMAP',
               cluster={'model': GaussianMixture, 'n_clusters': 10})

works verbatim: the cluster dict accepts top-level

Conversation demo on the new
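One way a cluster spec like `{'model': GaussianMixture, 'n_clusters': 10}` can be resolved is sketched below. Assumptions, all labeled: the spec may be a registry name, a class, or a dict with 'model', optional 'params' and extra top-level keys; a top-level `n_clusters` is renamed to `n_components` only when the model's constructor takes `n_components` and not `n_clusters`, as sklearn's mixture models do. `build_cluster_model` and the stub classes are hypothetical, not hypertools code.

```python
import inspect


class GaussianMixtureStub:
    """Stand-in for sklearn.mixture.GaussianMixture, which takes n_components."""
    def __init__(self, n_components=1, covariance_type="full"):
        self.n_components = n_components
        self.covariance_type = covariance_type


class KMeansStub:
    """Stand-in for sklearn.cluster.KMeans, which takes n_clusters."""
    def __init__(self, n_clusters=8):
        self.n_clusters = n_clusters


def build_cluster_model(spec, registry):
    """Turn a name, class, or dict spec into a model instance."""
    if isinstance(spec, (str, type)):
        spec = {"model": spec}
    spec = dict(spec)
    model = spec.pop("model")
    params = dict(spec.pop("params", {}))
    params.update(spec)  # remaining top-level keys (e.g. n_clusters) become params
    cls = registry[model] if isinstance(model, str) else model
    accepted = inspect.signature(cls).parameters
    # Mixture models name the cluster count n_components
    if "n_clusters" in params and "n_clusters" not in accepted and "n_components" in accepted:
        params["n_components"] = params.pop("n_clusters")
    return cls(**params)


registry = {"GaussianMixture": GaussianMixtureStub, "KMeans": KMeansStub}
gm = build_cluster_model({"model": GaussianMixtureStub, "n_clusters": 10}, registry)
print(type(gm).__name__, gm.n_components)  # GaussianMixtureStub 10
```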
Round 5: streaming data (#101), streaming tutorial, and the ContextLab docs theme ✅

Streaming data: closes the oldest open feature request (#101, 2017)

Streams are now a first-class data type, exactly as specified in the issue thread: no streaming flag.

from datasets import load_dataset
import hypertools as hyp

ds = load_dataset('scikit-learn/iris', split='train', streaming=True)
geo = hyp.plot(ds, '.')  # streams straight in, nothing materialized

Semantics (matching the issue design + review guidance):
Guardrails: reduction models must support

14 real tests (

New executed tutorial (docs/tutorials/streaming_data.ipynb) with two streamed animations:

Docs theme: Furo with the ContextLab look

Ported from ContextLab/scheduler per review. Stock Furo plus the lab's brand: Nunito Sans, lowercase 300-weight headings with 0.6 px letter-spacing, green

220 tests passing (213 fast set + 7 animation-export) · streaming tutorial executed with 0 errors · docs build clean.

🤖 Generated with Claude Code
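The dict-row rule from the streaming commit ("numeric fields concatenated in insertion order, strings ignored") can be sketched as follows. Letting list/tuple fields of numbers contribute all their entries is an assumption, and `row_to_vector` is a hypothetical helper. Dicts preserve insertion order in Python 3.7+, so iterating `row.values()` gives the field order.

```python
from numbers import Real


def row_to_vector(row):
    """Concatenate a dict row's numeric fields in insertion order; skip strings."""
    vec = []
    for value in row.values():
        if isinstance(value, str):
            continue  # text fields are ignored
        if isinstance(value, Real):
            vec.append(float(value))
        elif isinstance(value, (list, tuple)) and all(isinstance(v, Real) for v in value):
            vec.extend(float(v) for v in value)  # assumed: numeric sequences are flattened
    return vec


print(row_to_vector({"a": 5.1, "label": "setosa", "b": 1, "emb": [0.5, 0.25]}))
# [5.1, 1.0, 0.5, 0.25]
```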
Gallery: sphinx-gallery previously executed only plot_*-named examples, leaving chemtrails/animate*/precog/explore/save_*/analyze pages with code but no rendered output. All examples now execute (filename_pattern), matplotlib animations render as embedded mp4 video (matplotlib_animations + sphinxcontrib-video; animation examples expose `ani = ani_geo.line_ani` for the scraper), save_* examples write to temp files, and every example page gets a branch-aware "Open in Colab" badge + .ipynb link (post_build.py) so gallery examples open as runnable notebooks.

Dual-backend audit (scripts/audit_gallery_backends.py): every example runs under BOTH matplotlib and plotly in subprocesses; 78/78 pass (save_movie under plotly needs a long timeout -- kaleido per-frame mp4 export). The audit caught a real, years-latent bug: per-dataset fmt lists (['-','--']) routed each dataset through interp_array_list (plural), silently replacing 2D arrays with lists of per-row interpolations; it stayed latent because is_line() always returned False before its round-2 fix. Fixed with interp_array + regression test.

Plotly parity for animation extras: chemtrails/precog/bullettime draw low-opacity trail traces on window animations, tail_duration sets the window length, and zoom moves the camera (r = 1.95*(9-zoom)/8, mirroring the matplotlib zoom semantics); previously none of these were forwarded to the plotly renderer. explore maps to plotly's native hover.

Universal loader: hyp.load (and DataGeometry.plot/transform) resolve strings by trying, in order: built-in dataset name -> local file (npy/npz/csv/tsv/txt/json/parquet/mat/pickle) -> Hugging Face dataset (split=/streaming=; streaming feeds straight into hyp.plot) -> Google Drive URL or bare id -> Dropbox URL or shared-link path -> any URL with or without https://. Lists of strings return lists of datasets. Raw text (strings containing whitespace) still flows to the text-embedding pipeline.

Also fixes df2mat for pandas>=2 (get_dummies' bool dtype made mixed DataFrames produce object arrays that crashed np.isnan).

19 new tests (12 loader incl. real HF/Drive/Dropbox/URL fetches, 6 plotly trails, 1 interp regression); 232 passing.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
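"Resolve strings by trying, in order" is a first-success chain: each resolver either returns a dataset, returns None when the string isn't its kind of source, or raises (for example, a failed download), and either of the last two falls through to the next resolver. The resolvers below are hypothetical stubs that only classify strings (Dropbox is omitted for brevity); the real loader actually fetches data.

```python
import os


def resolve(source, resolvers):
    """Return (resolver_name, dataset) from the first resolver that succeeds."""
    tried = []
    for name, resolver in resolvers:
        tried.append(name)
        try:
            result = resolver(source)
        except Exception:
            continue  # a failed attempt falls through to the next resolver
        if result is not None:
            return name, result
    raise ValueError(f"could not resolve {source!r} (tried: {', '.join(tried)})")


# Hypothetical stub resolvers, in the order listed in the commit message
BUILTINS = {"weights": "<built-in weights>"}


def builtin(s):
    return BUILTINS.get(s)


def local_file(s):
    return f"<file {s}>" if os.path.isfile(s) else None


def hugging_face(s):
    raise LookupError("stub: pretend the Hub lookup failed")


def google_drive(s):
    return f"<drive {s}>" if "drive.google.com" in s else None


def any_url(s):
    looks_like_url = s.startswith(("http://", "https://")) or "." in s.split("/")[0]
    return f"<url {s}>" if looks_like_url and "/" in s else None


RESOLVERS = [("builtin", builtin), ("local", local_file), ("huggingface", hugging_face),
             ("drive", google_drive), ("url", any_url)]
print(resolve("example.com/data.csv", RESOLVERS))  # ('url', '<url example.com/data.csv>')
```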
Two-layer fix for example pages showing code but no output:
1. nbsphinx was claiming the gallery pages: sphinx-gallery writes a downloadable .ipynb next to each generated .rst, and nbsphinx rendered the UNEXECUTED notebook instead of the gallery page. auto_examples/*.ipynb is now excluded from the document build (downloads still work).
2. matplotlib animations render as embedded HTML5 video: the sphinxcontrib.video extension is registered (required by matplotlib_animations=(True, 'mp4')) and all animation example pages (chemtrails, precog, animate*, save_movie) now embed a playable 30s mp4, verified visually.

Includes the regenerated gallery artifacts (all 39 examples executed, 9m39s total build execution).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
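The two layers of this fix correspond to a few lines of Sphinx configuration. Below is a sketch of the relevant docs/conf.py settings; the `examples_dirs` value and the exact extension list are assumptions, while `exclude_patterns` (Sphinx), `filename_pattern` and `matplotlib_animations` (sphinx-gallery) are real options.

```python
# docs/conf.py (excerpt, sketch). Directory names are assumptions.
extensions = [
    "sphinx_gallery.gen_gallery",
    "nbsphinx",
    "sphinxcontrib.video",  # needed for matplotlib_animations=(True, "mp4")
]

# Keep nbsphinx from rendering sphinx-gallery's unexecuted download notebooks
exclude_patterns = ["_build", "auto_examples/*.ipynb"]

sphinx_gallery_conf = {
    "examples_dirs": "../examples",      # assumed location of the example scripts
    "gallery_dirs": "auto_examples",
    "filename_pattern": r".*\.py",       # execute every example, not just plot_*
    "matplotlib_animations": (True, "mp4"),
}
```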
…report Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…round 6.5)
NEW ANIMATION STANDARD (both backends): frame_rate=30, duration=30s, rotations=1, i.e. one revolution every 30 seconds. Three layers had to change for plotly to actually match matplotlib's pacing:
- library defaults (frame_rate 50->30, rotations 2->1) and the plotly frame math: n_frames = frame_rate * duration exactly like matplotlib (600-frame cap removed); a parity test asserts identical 900 frames at ~33ms on both backends
- DataGeometry.plot no longer replays animation-pacing kwargs baked into saved .geo files (old-era defaults like frame_rate=50 and rotations=2 silently overrode the current standard in every gallery example that calls geo.plot); explicit caller overrides still win
- docs builds: plotly's sphinx-gallery renderer serialized every animation frame through kaleido for one static png (a 900-frame figure took about an hour); the show path now writes a frame-stripped png plus an interactive html with embedded frames capped at 150, each shown proportionally longer, so total duration and rotation speed are unchanged and pages are ~0.1MB

Streaming stability: the data->box transform is FROZEN from the head (the center+scale affine is captured once); every future sample goes through the same transform and out-of-range samples are clamped to the closest point on the box surface. Axis limits never change once set, so there is no more per-chunk rescale "twitch" (verified: zero vanishing ink across tutorial animation frames + exact-position regression tests).

Legends (both backends): rendered to the RIGHT of the plot, vertically centered on the box (mpl bbox_to_anchor; plotly x=1.02/y=0.5 with a reserved right margin). Screenshot-verified in 2D and 3D.

Gallery UX: thumbnail clicks were dead (sphinx-gallery >= 0.17 no longer wraps the thumb <img> in an anchor; the old gallery-fixes.js targeted extinct .xref markup); thumbnails now open the example's notebook on Colab while the title text opens the example page. Animated thumbnails were squashed into 200x200 squares from 4:3 sources; all 7 were regenerated letterboxed at the correct aspect from the new 30fps mp4s (scripts/generate_gallery_thumbs.py). Plotly evidence gifs re-rendered at the pacing standard. DataGeometry.plot/transform docstrings document the universal string-loading behavior for the API pages.

237 tests passing.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
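The pacing arithmetic behind the 30 fps · 30 s · 1 rotation standard, including the docs-only frame cap, fits in one function. `animation_pacing` is a hypothetical helper that restates the numbers above: 900 frames at ~33 ms each and 0.4° of camera rotation per frame. Capping the embedded html at 150 frames gives 200 ms and 2.4° per frame, with the same total duration and rotation speed.

```python
def animation_pacing(frame_rate=30, duration=30, rotations=1, max_embedded=None):
    """Return (n_frames, ms_per_frame, degrees_per_frame) for one animation.

    With max_embedded, frames are subsampled and each is shown proportionally
    longer, so total duration and rotation speed stay the same.
    """
    n_frames = int(frame_rate * duration)
    if max_embedded is not None and n_frames > max_embedded:
        n_frames = max_embedded
    ms_per_frame = 1000.0 * duration / n_frames
    degrees_per_frame = 360.0 * rotations / n_frames
    return n_frames, ms_per_frame, degrees_per_frame


print(animation_pacing())                  # 900 frames, ~33.3 ms, 0.4 degrees each
print(animation_pacing(max_embedded=150))  # 150 frames, 200 ms, 2.4 degrees each
```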
A kaleido subprocess wedged during test_animated_svg_plotly on one Windows runner and burned the full 6-hour Actions job timeout. No test legitimately takes over 20 minutes; a hung native call now fails fast with a stack dump instead of holding a runner hostage (thread method, since the hangs are inside native calls). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
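The commit doesn't name the tool. pytest-timeout's `thread` method behaves along these lines: a timer thread dumps every thread's stack, then terminates the process. The sketch below shows the same idea in plain Python, but raises `TimeoutError` instead of exiting. `run_with_watchdog` is a hypothetical helper, and `sys._current_frames()` is a CPython-specific introspection function.

```python
import sys
import threading
import traceback


def run_with_watchdog(fn, timeout_s):
    """Run fn() in a daemon thread; if it is still running after timeout_s,
    dump every thread's stack and fail fast instead of hanging."""
    result = {}

    def target():
        try:
            result["value"] = fn()
        except BaseException as exc:  # re-raised in the caller below
            result["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        dump = [f"--- thread {tid} ---\n" + "".join(traceback.format_stack(frame))
                for tid, frame in sys._current_frames().items()]
        raise TimeoutError(f"timed out after {timeout_s}s\n" + "\n".join(dump))
    if "error" in result:
        raise result["error"]
    return result.get("value")
```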
Round 6: gallery pages render everything, dual-backend audit, notebooks-on-click, universal loader ✅

Gallery pages now display their outputs (including animations)
The chemtrails page with its playable 30s animation (screenshot): Clicking a gallery example gets you a runnable notebookEvery example page opens with an Open in Colab badge (branch-aware link to the example's committed notebook) next to the Every example verified on BOTH backendsNew audit harness (
Universal loader (
|
Round 6.5: animation pacing standard, streaming stability, legends, gallery UX ✅
One animation standard, both backends: 30 fps · 30 s · 1 rotation per 30 s
"Plotly is much too fast" had three stacked causes, each now fixed and regression-tested:
All gallery animation mp4s re-rendered at the standard (verified 900 frames @ 30 fps via ffprobe), and the plotly evidence gifs re-exported at correct pacing.
Streaming: view is rock-stable, out-of-range samples clamp to the box
The data→box transform is frozen from the initial samples (the center+scale affine is captured once); every future sample passes through that exact transform, and anything outside the box is clamped to the closest point on its surface. Axis limits never change once set. Verified: zero vanishing ink across the re-rendered tutorial animation (previously drawn pixels never move), plus regression tests for exact drawn-position stability and box-surface clamping.
Legends: right of the plot, vertically centered (both backends)
[Screenshots: matplotlib | plotly]
Gallery: thumbnails click through to notebooks, correct aspect
Thumbnail clicks were genuinely dead (sphinx-gallery ≥ 0.17 no longer wraps the image in a link, and the old fix-up JS targeted markup that no longer exists). Thumbnails now open the example's notebook on Colab; the title text under each thumbnail opens the example page with the rendered output. The animated thumbnails were also being squashed into squares from 4:3 sources; all seven are regenerated letterboxed at the correct aspect from the new 30 fps videos.
API docs
(CI hardening: a kaleido subprocess wedged for 6 hours on one Windows runner before hitting the Actions timeout --
237 tests passing · gallery animations verified at 900 frames / 30 fps · streaming tutorial re-executed with 0 errors.
🤖 Generated with Claude Code
Every documentation notebook (39 gallery + 13 tutorials) now opens with a
branch-aware install cell so it runs standalone in Google Colab. On a
preview branch it installs that branch from GitHub
(`%pip install "hypertools[interactive] @ git+...@dev-2.0"`, verified in a
clean venv: imports as 2.0.0.dev0 with 2.0-only features working); on
master it installs the released package. scripts/add_colab_install_cell.py
injects the line idempotently into the hand-authored tutorial notebooks,
and conf.py's first_notebook_cell emits the same line for gallery
notebooks on rebuild.
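A hedged sketch of an idempotent injector in the spirit of scripts/add_colab_install_cell.py, operating on the notebook JSON as a plain dict (nbformat v4 layout). The marker comment, the helper names, and the master-branch command are assumptions; the git URL is passed in by the caller rather than filled in.

```python
INSTALL_MARKER = "# hypertools: colab install cell"  # hypothetical detection tag


def install_line(branch, repo_git_url):
    """Branch-aware install command: released package on master, the branch itself elsewhere."""
    if branch == "master":
        return "%pip install hypertools"
    return f'%pip install "hypertools[interactive] @ git+{repo_git_url}@{branch}"'


def add_install_cell(notebook, line):
    """Prepend an install cell to an nbformat-v4 notebook dict unless one is already there."""
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        if INSTALL_MARKER in text:
            return False  # already injected: running the script again changes nothing
    notebook.setdefault("cells", []).insert(0, {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": [INSTALL_MARKER + "\n", line],
    })
    return True


nb = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
line = install_line("dev-2.0", "https://example.com/hypertools.git")  # placeholder URL
print(add_install_cell(nb, line), add_install_cell(nb, line), len(nb["cells"]))  # True False 1
```

Detecting an existing cell by marker keeps reruns safe; a real script would also need to decide whether to rewrite a stale cell when the branch changes.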
Gallery examples:
- shapes zoo: plots EVERY zoo shape (bunny, cube, dragon, sphere, teapot,
vase, biplane) as small black dots (',' pixel marker), one panel each
- datasaurus: plots ALL THIRTEEN datasets of the dozen as small black
dots ('.' point marker), one panel each
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

HyperTools 2.0: modernized toolbox with interactive backend, soft clustering, and comprehensive visual verification
This PR modernizes hypertools end-to-end while preserving the public API, integrating the best ideas from the earlier refactor attempt (jeremymanning/hypertools dev branch + backend experiments) into the current, tested codebase. Full design rationale: notes/hypertools_2.0_roadmap.md.
Highlights
Interactive plotly backend with visual parity (backend='auto' | 'matplotlib' | 'plotly')
backend='auto' (the default) switches to plotly only on Google Colab and Kaggle. Existing local/CI workflows see zero change.
docs/images/v2.0-parity/: matplotlib left, plotly right, same call.
Mixture-model ("soft") clustering + robust coloring
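Proportion-weighted coloring can be sketched in a few lines of numpy. to_proportions and blend_colors are hypothetical helpers, not hypertools functions: the first row-normalizes non-negative membership scores (GaussianMixture.predict_proba rows already sum to 1, while NMF activations generally do not), and the second mixes RGB component colors with a matrix product.

```python
import numpy as np


def to_proportions(scores):
    """Row-normalize non-negative membership scores so every row sums to 1."""
    scores = np.clip(np.asarray(scores, dtype=float), 0.0, None)
    totals = scores.sum(axis=1, keepdims=True)
    safe = np.where(totals > 0, totals, 1.0)
    # Rows with no mass at all fall back to uniform membership.
    return np.where(totals > 0, scores / safe, 1.0 / scores.shape[1])


def blend_colors(proportions, component_colors):
    """Mix component RGB colors by membership: (n, k) @ (k, 3) -> (n, 3)."""
    return np.asarray(proportions, dtype=float) @ np.asarray(component_colors, dtype=float)


palette = np.array([[1.0, 0.0, 0.0],   # component 0: red
                    [0.0, 0.0, 1.0]])  # component 1: blue
props = to_proportions([[3.0, 1.0], [0.0, 0.0], [0.0, 5.0]])
print(blend_colors(props, palette))
```

An observation split 75/25 between the two components comes out mostly red with a blue tint; a pure member keeps its component's color exactly.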
hyp.cluster(x, cluster='GaussianMixture' | 'BayesianGaussianMixture' | 'LatentDirichletAllocation' | 'NMF') returns (n_samples, n_components) membership proportions (rows sum to 1). Hard clustering unchanged.
hyp.plot(x, cluster='GaussianMixture', ...) colors observations by proportion-weighted blends of component colors.
hue accepts categorical labels, continuous values, or any 2D matrix via the new mat2colors.
Multicolored lines
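Continuous coloring along a line reduces to splitting the trajectory into consecutive two-point segments, each with its own color value. The (n - 1, 2, d) array below is the segment layout that matplotlib's LineCollection (2D) and Line3DCollection (3D) accept. colored_segments is a hypothetical helper, and averaging the hue at each segment's endpoints is one reasonable choice, not necessarily what hypertools does.

```python
import numpy as np


def colored_segments(points, values):
    """Split an (n, d) trajectory into n - 1 segments with one color value each."""
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    # segments[i] = [points[i], points[i + 1]]  ->  shape (n - 1, 2, d)
    segments = np.stack([points[:-1], points[1:]], axis=1)
    # Each segment takes the mean of the hue values at its two endpoints.
    segment_values = 0.5 * (values[:-1] + values[1:])
    return segments, segment_values


t = np.linspace(0.0, 1.0, 5)
segs, vals = colored_segments(np.column_stack([t, t ** 2]), t)
# With matplotlib (2D): lc = LineCollection(segs, cmap="viridis"); lc.set_array(vals); ax.add_collection(lc)
```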
hue + a line format string colors each trajectory continuously along its length on both backends (matplotlib Line3DCollection/LineCollection; plotly per-point line colors in 3D, segment traces in 2D).
Nested-list input with multilevel styling
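One way to turn a nested list into per-dataset styles, as a sketch: the outermost index picks the color group, and each further level of nesting scales linewidth and alpha by a constant factor. flatten_with_styles and the 0.6 shrink factor are illustrative assumptions; the leaves below are stand-in labels for arrays, and the sketch treats any list as a grouping level, so it does not reproduce the special handling of text corpora.

```python
def flatten_with_styles(nested, base_width=2.0, base_alpha=1.0, shrink=0.6):
    """Return (dataset, group, depth, linewidth, alpha) for every leaf dataset."""
    styled = []

    def walk(item, group, depth):
        if isinstance(item, list):
            for child in item:
                walk(child, group, depth + 1)  # one level deeper: thinner, fainter
        else:
            factor = shrink ** depth
            styled.append((item, group, depth, base_width * factor, base_alpha * factor))

    for group, top in enumerate(nested):
        # The outermost index is the color group; its direct members sit at depth 0.
        for child in (top if isinstance(top, list) else [top]):
            walk(child, group, 0)
    return styled


for row in flatten_with_styles([["a", ["b"]], ["c"]]):
    print(row)
```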
hyp.plot([[a, b], [c]]) colors datasets by outermost group; deeper nesting renders thinner + fainter. Text corpora keep existing behavior.
hyp.apply_model: the stack/unstack core
mode='auto'|fit_transform|fit_predict|predict_proba, return_model=True for held-out reuse, stack=False for per-dataset fits. Explicit whitelist registry (no eval).
Retired legacy arguments (long-deprecated)
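Replaying a saved geo with retired keyword arguments can be sketched as a rename table plus a drop list. The tables below are illustrative (only group→hue is spelled out in this description), and translate_replay_kwargs is a hypothetical name.

```python
import warnings

# Illustrative tables; only group -> hue is confirmed by the PR description.
RENAMED = {"group": "hue"}
DROPPED = {"model", "model_params"}


def translate_replay_kwargs(saved):
    """Return a copy of saved plot kwargs with retired names renamed or dropped."""
    translated = {}
    for key, value in saved.items():
        if key in RENAMED:
            new_key = RENAMED[key]
            if new_key in saved:
                # Both spellings saved: keep the modern one.
                warnings.warn(f"ignoring retired {key!r}; {new_key!r} is also set")
                continue
            translated[new_key] = value
        elif key in DROPPED:
            warnings.warn(f"dropping retired argument {key!r} on replay")
        else:
            translated[key] = value
    return translated


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    replayed = translate_replay_kwargs({"group": [0, 1, 1], "model": "PCA", "fmt": "-"})
print(replayed, len(caught))
```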
plot(group=...) → hue; plot(model=/model_params=) → reduce; reduce(model=/model_params=/normalize=/align=); align(method=/normalize=/ndims=) and the ambiguous align=True (now a clear ValueError with a migration hint); cluster(ndims=).
Saved geos from hypertools 0.x still load: retired kwargs are translated (group→hue) or dropped with a warning on replay.
Bug fixes
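The cache collision is easy to reproduce with numpy alone: arrays longer than the default print threshold (1000 elements) are summarized to their first and last three elements, so str() of two different arrays can be identical. The fix in this PR was to delete the cache; array_key (hypothetical) is shown only for contrast, as a content-based key that hashes the full buffer, dtype, and shape.

```python
import hashlib

import numpy as np

a = np.arange(2000)
b = a.copy()
b[1000] = -1  # differs only in the middle

# Above the default print threshold numpy prints just the first and last
# three values, so the changed element never reaches the string.
print(str(a) == str(b))  # True: a str()-keyed cache would hand b the result computed for a


def array_key(arr):
    """Content-based key: hashes every byte plus dtype and shape (no truncation)."""
    arr = np.ascontiguousarray(arr)
    digest = hashlib.sha256(arr.tobytes())
    digest.update(f"{arr.dtype}|{arr.shape}".encode())
    return digest.hexdigest()


print(array_key(a) == array_key(b))  # False
```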
str()-keys truncated numpy arrays so new data collided with stale entries. Cache removed; regression test reproduces the reported loop scenario.
animate=True under numpy≥2: regression test reproduces the exact array from the issue report; also fixed the Colab/ipympl backend crash (matplotlib ≥3.9 raises ValueError, not ImportError, for broken module:// backends) and a broken-Tcl/Tk fallback.
is_line() bug: '' in Line2D.markers made it return False for every format string, silently disabling hypertools' smooth line interpolation on modern matplotlib. Fixed (with linestyle-aware parsing), restoring the intended rendering; per-point labels are now re-mapped onto interpolated trajectories.
format_data/PPCA pass per plot.
Performance, packaging, docs
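Deferring heavy imports can be as simple as a proxy that imports on first attribute access. LazyModule is a hypothetical helper, demonstrated here on the standard-library colorsys module; a module-level __getattr__ (PEP 562) is an alternative, and the commit does not say which technique hypertools uses for umap, seaborn, and scipy.interpolate.

```python
import importlib


class LazyModule:
    """Stand-in for a module that is imported on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only reached for names not set in __init__, i.e. the wrapped module's attributes.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


colorsys_lazy = LazyModule("colorsys")  # no import attempted yet
print(colorsys_lazy._module is None)             # True
print(colorsys_lazy.rgb_to_hsv(1.0, 0.0, 0.0))  # (0.0, 1.0, 1.0); the import happens here
```

The cost moves from import hypertools to the first call that actually needs the dependency.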
import hypertools: 5.1s → 1.4s (lazy umap/seaborn/scipy.interpolate).
pyproject.toml (2.0.0.dev0, Python 3.10–3.13); setup.py/requirements.txt/MANIFEST.in/.travis.yml removed; extras [interactive], [dev]; CI matrix 3 OS × py3.10–3.13 with bumped actions + screenshot artifacts; readthedocs on py3.11; external hdbscan → sklearn's built-in; unmaintained pca-magic dropped.
apply_model added to the API reference; docstrings updated for all changed signatures.
Evidence: every function verified on both backends
plot, reduce, align, normalize, cluster, analyze, describe, format_data, load, apply_model, text input) across use cases on both backends: line/marker/dash styles, hue variants, clustering (hard + mixture), nested lists, multicolored lines, missing data, animations. Regenerate: python scripts/generate_verification_screenshots.py.
python scripts/generate_parity_screenshots.py.
dev/hypertools_2.0_dev_executed.ipynb.
Sample parity montages (matplotlib | plotly, same call)
Breaking changes
TypeError/ValueError with migration hints (old saved geos are translated on replay).
🤖 Generated with Claude Code