Fix large GEDCOM files failing to load or freezing the browser#296
Conversation
…M store
Firefox caps history.pushState state at 640KB; Safari at ~512KB. The upload
handler was passing the raw GEDCOM string through navigate(..., {state}),
which calls history.pushState. A 10MB GEDCOM caused an immediate SecurityError
in Firefox and Safari, preventing the chart from ever loading.
Add a bounded in-memory store (gedcom_store.ts) capped to 1 entry — uploading
a new file evicts the previous one, preventing unbounded memory growth during
a session. navigate() now carries only a file fingerprint hash in the URL;
loadGedcom() retrieves the GEDCOM from the store by hash.
Update embedded.ts to store its injected GEDCOM under the key 'embedded'
before calling loadGedcom(), keeping the data path consistent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…B sample hash The upload handler called md5() over the full GEDCOM string before navigating — ~240ms for a 10MB file, blocking the main thread before the user saw any feedback. Replace with a hash over file.name|file.size|file.lastModified plus the first and last 4KB of content. Runtime drops to ~1ms. The 4KB sample captures the GEDCOM header (software name, export date, submitter info) which differs between any two real exports, providing strong collision resistance for a local session cache without a cryptographic guarantee. Extract fileFingerprint to src/util/file_fingerprint.ts so it can be unit tested independently of the React component. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…gress callback convertGedcom was synchronous, blocking the main thread for the full parse duration (~1.2s parseGedcom + ~700ms gedcomEntriesToJson in Safari). The browser spinner could not update and GC could not run between steps. Make convertGedcom async and insert await setTimeout(0) yields between the four major steps (parse, convert, normalize, index). This breaks the work into chunks, allows incremental GC, and keeps the spinner responsive. Add onProgress?(status: string) => void to convertGedcom, prepareData, loadGedcom, loadFromUrl, loadAndPrepareFile, and the DataSource interface. All data source implementations (UploadedDataSource, GedcomUrlDataSource, GoogleDriveDataSource) forward the callback down the load pipeline. WikiTree and Embedded accept the parameter but do not use it. Also skip the sessionStorage JSON.stringify for GEDCOM files over 512KB. The serialized TopolaData (chartData 7.85MB + raw parse-gedcom tree 25MB = 33MB) always exceeded the 5MB sessionStorage limit, wasting 10-20s in Safari on a write that was guaranteed to fail. Simplify loadGedcom to (hash, onProgress?) now that the GEDCOM string is always retrieved from the in-memory store rather than passed as a parameter. Remove console.log timing instrumentation and window.__topolaDebug globals that were left in from debugging. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rough D3 layout
Two related issues caused the chart to freeze or appear unresponsive:
1. The Chart useEffect has no dependency array, so it ran after every React
render. Each call re-created a JsonDataProvider from all 23K individuals,
re-ran the ancestor/descendant layout, and called getComputedTextLength for
every visible text element (~500ms per call in Chrome, longer in Safari).
For unrelated state changes this froze the browser 3-5 extra times per load.
Fix: return early in renderChart() when neither initialRender nor
resetPosition is set — the SVG is already correct.
2. The "Loading..." progress pill was mounted only in AppState.LOADING. When
data finished parsing, React transitioned to SHOWING_CHART and unmounted the
pill before the D3 layout (the slow part) had run. Users saw a blank screen
with no feedback for several seconds.
Fix: replace the state-gated pill with a persistent fixed overlay driven by
a loadingStatus string, visible through both LOADING and the initial chart
render. Add onFirstRender() callback to Chart; App clears loadingStatus when
the first D3 layout completes.
Wire the onProgress callback from the load pipeline into setLoadingStatus so
users see step-by-step progress ("Step 1/4: parsing GEDCOM..." through
"Rendering chart (23,909 people)...").
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Building the full-text search index over 23K individuals takes ~600ms synchronously. This was deferred to the first keystroke, freezing the input field for up to 3 seconds in Safari with no user feedback. Schedule the build via requestIdleCallback (5s timeout) after data loads, with a 200ms setTimeout fallback for browsers without requestIdleCallback. The index is typically warm before the user types. getOrBuildIndex() still builds synchronously on the first keystroke if idle scheduling has not yet fired (e.g. user types immediately after load). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ingerprint
gedcom_store.spec.ts
- Returns undefined for unknown hash
- Stores and retrieves GEDCOM by hash
- Evicts the previous entry when a new file is stored (bounded-to-1 behavior)
- Returns undefined for a hash that was evicted
load_data_store.spec.ts
- loadGedcom throws ERROR_LOADING_UPLOADED_FILE when hash is absent from store
- loadGedcom successfully parses GEDCOM retrieved from the in-memory store
- onProgress callback receives all four step messages in the correct order
upload_menu.spec.ts (via src/util/file_fingerprint.ts)
- Returns a valid 32-character hex hash
- Produces identical hashes for identical inputs
- Differs for different filename, file size, content beyond 4KB boundary,
and image file list
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Thank your for all the fixes! Do you think you could ask Claude to split the PR into multiple smaller ones? Like one per fix? It will make reviewing the changes easier. |
|
In the Donatso chart, the "Rendering chart" pill doesn't disappear. Also makes |
|
I reviewed the code. You don't need to split it (unless you already did 😄 ) |
The progress pill persisted indefinitely on the Donatso chart because
DonatsoChart did not call onFirstRender — only the D3-based Chart did.
Add onFirstRender? to DonatsoChartProps and call it from the existing
useEffect on the first render, consistent with how Chart handles it.
Pass onFirstRender={() => setLoadingStatus('')} from renderChart in app.tsx.
Update visual regression snapshots for macOS (darwin) to reflect the new
progress pill and its correct dismissal across all chart types. The 12
non-Donatso failures were pre-existing snapshot staleness; chart-donatso
was the one newly introduced by this PR.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Fixed the Donatso pill issue — added |
The _onProgress parameter accepted by loadData was never forwarded to the onMessage event handler, silently dropping all step-progress callbacks for embedded GEDCOM loads. The progress pill stayed at 'Loading…' for the full parse duration instead of showing the four-step breakdown. Thread onProgress through the event-listener closure by adding it as a parameter to onMessage and passing it to loadGedcom.
Jest caches module instances, so the module-level let variables persisted
across tests. Running with --randomize or in parallel could cause test 1
('returns undefined for unknown hash') to see a hash left by a previous test.
Export clearStoredGedcom() for testing and call it in beforeEach so each test
starts from a clean slate regardless of execution order.
onFirstRender fires when the chart SVG is in the DOM, before the D3 fade-in animation completes (~400ms). A reviewer noted this as a potential issue, but moving it into animationPromise.then() would keep the progress pill visible for 400ms after the chart is already visible and interactive, creating a confusing overlap. The current timing is correct. Add a comment explaining the design intent.
Both call sites in the loadData callback omitted the onProgress parameter that all other sources (UPLOADED, GEDCOM_URL, GOOGLE_DRIVE) already forward. WikiTree's implementation ignores it today (_onProgress), but wiring it here ensures parity and makes future progress reporting possible without touching app.tsx again. Embedded now receives it and threads it to loadGedcom.
The shared 'let handle' variable was typed as a union of setTimeout and requestIdleCallback return types, requiring unsafe casts (handle as number, handle as ReturnType<typeof setTimeout>) in the cleanup closures. Each cleanup closure captured the shared variable by reference rather than value. Use a const declaration in each branch so TypeScript infers the correct type automatically and no casts are needed. The early-return form also makes the two branches structurally symmetric.
The listener registered in loadData was never removed, causing it to accumulate on every loadData call. In a multi-file workflow the second load would have two listeners firing on the same message, the first with a stale resolve/reject pair. Wrap resolve and reject so the listener removes itself on first settlement, making the registration effectively once-per-load.
|
Fixed — added While fixing this I also addressed a few issues surfaced during code review:
All 89 unit tests and 13 visual tests pass. |
|
Thank you! |
Summary
Large GEDCOM files (10K+ individuals) either failed silently in Firefox/Safari or froze Chrome for 10–20+ seconds. This PR fixes five bugs in the viewer. The underlying chart rendering bug (exponential pedigree traversal) is addressed separately in PeWu/topola#110.
Bugs fixed
1. Firefox and Safari reject the uploaded file immediately
navigate(..., {state: {data: gedcom}})passes the GEDCOM throughhistory.pushState. Firefox caps this at 640KB; Safari at ~512KB. A 10MB GEDCOM caused an immediateSecurityError— the chart never loaded in either browser.Fix: Store the GEDCOM in a bounded in-memory store (
gedcom_store.ts, capped to 1 entry — new upload evicts the previous).navigate()carries only a content fingerprint hash.2.
JSON.stringifyblocks the browser for 10–20 secondsprepareDataserialized the fullTopolaData(33MB total) to sessionStorage on every load — always failing the 5MB limit.Fix: Skip serialization when the raw GEDCOM exceeds 512KB.
3. Chart re-renders 3–5 times per load, each freezing the browser
The
ChartuseEffecthas no dependency array, so it ran after every React render. Each call re-ran the full D3 layout.Fix: Return early in
renderChart()when neitherinitialRendernorresetPositionis set.4. Progress indicator disappears before the chart renders
The "Loading…" pill was shown only during
AppState.LOADING, disappearing before the D3 layout ran.Fix: Persistent fixed overlay driven by
loadingStatusstate, cleared viaonFirstRendercallback after the chart DOM is ready (intentionally pre-animation — the chart is interactive at this point, and keeping the pill during the 400ms fade-in would create a confusing overlap).5. Donatso chart: progress pill never cleared
DonatsoChartdid not callonFirstRender, soloadingStatuswas never cleared.Fix: Add
onFirstRender?toDonatsoChartPropsand call it in the first-render branch of DonatsoChart'suseEffect.Additional improvements
convertGedcomis async withawait setTimeout(0)yields between the four major steps, keeping the spinner responsive and allowing incremental GC. AnonProgresscallback threads through the full load pipeline (all five datasources: uploaded, URL, Google Drive, WikiTree, embedded) showing step-by-step status.requestIdleCallbackafter data loads. Event listener for idle callbacks usesconst-scoped handles to eliminate type casts.md5()(~240ms for 10MB) with metadata + 4KB content sample (~1ms), extracted tosrc/util/file_fingerprint.ts.window.addEventListenerinEmbeddedDataSource.loadDatawas never removed, accumulating listeners across calls. Now removed on first settlement.gedcom_store.spec.tsaddsbeforeEach(clearStoredGedcom)to prevent Jest module cache from causing order-dependent test failures.Tests
All passing: 89 unit tests, 13 visual regression tests (macOS snapshots updated to include new progress overlay).
Dependency note
The root-cause chart rendering fix (exponential pedigree traversal in
topola) is in PeWu/topola#110. This viewer PR is independently mergeable.Co-authored-by: Claude Sonnet 4.6 noreply@anthropic.com