feat(page-number): per-field PAGE value-format switches & case-insensitive field dispatch (SD-3006) #3599

Open
luccas-harbour wants to merge 11 commits into main from luccas/sd-3006-feature-page-number-fields

@luccas-harbour
Contributor

Summary

Two related fidelity fixes for PAGE fields in headers/footers, plus a refactor that makes formatPageNumber the single source of truth.

  1. Case-insensitive field dispatch. OOXML field type names are case-insensitive, but the field-reference preprocessors dispatched on the raw first token ("PAGE" only, not "page"). A lowercase PAGE/NUMPAGES field in a repeated footer fell through to cached static text and showed the same number on every page.

  2. PAGE value-format switches. The \* value-format switches on a PAGE instruction (Arabic, Roman/roman, ALPHABETIC/alphabetic, ArabicDash) are now parsed into a run-local pageNumberFormat override and applied independently of section numbering. Previously a { PAGE \* roman } footer ignored its own switch and rendered using the section's format.

Changes

Field dispatch (super-converter)

  • New extractFieldKeyword helper normalizes the dispatch token to upper case while leaving the original instruction text intact for downstream processors.
  • Routed fldSimple/fldChar dispatch and the header/footer page-field scan (preProcessPageFieldsOnly) through it.
  • Made the HYPERLINK target regex case-insensitive and anchored.
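
A minimal sketch of the keyword normalization described above. The name extractFieldKeyword comes from this PR; the function body, the `processors` map, and `dispatchField` are illustrative assumptions and not the code in the diff. The point is that only the dispatch token is upper-cased, while the processor still receives the original instruction text.

```javascript
// Hypothetical body for extractFieldKeyword: upper-case the first token only,
// so the caller can still hand the untouched instruction to the processor.
function extractFieldKeyword(instruction) {
  if (typeof instruction !== 'string') return '';
  const match = instruction.trim().match(/^\S+/);
  return match ? match[0].toUpperCase() : '';
}

// Illustrative dispatch table (assumed shape): keys are upper-case keywords.
const processors = new Map([
  ['PAGE', (instruction) => ({ type: 'page', instruction })],
  ['NUMPAGES', (instruction) => ({ type: 'numpages', instruction })],
]);

function dispatchField(instruction) {
  const processor = processors.get(extractFieldKeyword(instruction));
  // Unknown keywords return null so the caller can keep the cached result runs.
  return processor ? processor(instruction) : null;
}
```

With this shape, `dispatchField('page \\* roman')` reaches the PAGE processor and the processor still sees the lowercase instruction as written in the document.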

Value-format switches

  • New page-instruction.js: parsePageInstruction (parse \* switches → pageNumberFormat) and pageNumberFormatToInstructionSwitch (inverse, for export).
  • page-preprocessor stores the original instruction and parsed pageNumberFormat on the sd:autoPageNumber node.
  • Round-trips instruction + pageNumberFormat through the autoPageNumber translator (preserving imported instruction text, synthesizing a PAGE \* <switch> for new formatted nodes) and the page-number extension node.
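
The parse and export directions above can be sketched as follows. The function names parsePageInstruction and pageNumberFormatToInstructionSwitch come from this PR, and numberInDash is the format name mentioned in the review below; the return shape, the switch table, and the `buildPageInstruction` helper are assumptions made for illustration.

```javascript
// Assumed mapping from `\*` switch names (case-sensitive) to format names.
const SWITCH_TO_FORMAT = new Map([
  ['Arabic', 'decimal'],
  ['Roman', 'upperRoman'],
  ['roman', 'lowerRoman'],
  ['ALPHABETIC', 'upperLetter'],
  ['alphabetic', 'lowerLetter'],
  ['ArabicDash', 'numberInDash'],
]);
const FORMAT_TO_SWITCH = new Map([...SWITCH_TO_FORMAT].map(([sw, fmt]) => [fmt, sw]));

// Returns the first recognized value-format switch; others (e.g. MERGEFORMAT)
// are skipped. The { pageNumberFormat } return shape is an assumption.
function parsePageInstruction(instruction) {
  const switchPattern = /\\\*\s*([A-Za-z]+)/g;
  let match;
  while ((match = switchPattern.exec(instruction ?? '')) !== null) {
    const format = SWITCH_TO_FORMAT.get(match[1]);
    if (format) return { pageNumberFormat: format };
  }
  return { pageNumberFormat: null };
}

// Inverse lookup used on export; null when the format has no switch.
function pageNumberFormatToInstructionSwitch(format) {
  return FORMAT_TO_SWITCH.get(format) ?? null;
}

// Hypothetical export helper: keep imported instruction text verbatim,
// synthesize `PAGE \* <switch>` only for nodes that never had one.
function buildPageInstruction({ instruction, pageNumberFormat }) {
  if (instruction) return instruction;
  const sw = pageNumberFormatToInstructionSwitch(pageNumberFormat);
  return sw ? `PAGE \\* ${sw}` : 'PAGE';
}
```

Keeping the imported instruction untouched means a field such as `PAGE \* Roman \* MERGEFORMAT` exports byte-for-byte, while a newly created formatted node gets a synthesized switch.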

Layout pipeline

  • Added pageNumberFormat to the TextRun contract and threaded it through the v1 layout-adapter (text-run / generic-token converters), layout-bridge, layout-resolved, and the DOM painter's resolveRunText.
  • Stamped a section-aware displayNumber (pre-format numeric value) on Page / HeaderFooterPage / resolved pages, plumbed via pageResolver → resolveHeaderFooterTokens → render context, so the format applies to the correct numeric value.
  • Included pageNumberFormat in cache keys: block-version, versionSignature, run merge-hash, and header/footer content hash — format changes now invalidate cached layouts.
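
A rough sketch of how the run-local override and the cache key could interact. resolveRunText, pageNumberFormat, and displayNumber are names from this PR; the `token` and `sectionFormat` fields, the stand-in `formatNumber`, and `runSignature` are assumptions for illustration only.

```javascript
// Stand-in formatter supporting two formats, enough to show the data flow.
function formatNumber(value, format) {
  return format === 'numberInDash' ? `- ${value} -` : String(value);
}

// Hypothetical resolver: the run-local override wins over the section format,
// and both are applied to the page's pre-format displayNumber.
function resolveRunText(run, page) {
  if (run.token !== 'pageNumber') return run.text;
  const format = run.pageNumberFormat ?? page.sectionFormat ?? 'decimal';
  return formatNumber(page.displayNumber, format);
}

// Hypothetical cache signature: the format is part of the key, so changing
// only the format produces a different signature and invalidates the cache.
function runSignature(run) {
  return [run.token ?? '', run.text ?? '', run.pageNumberFormat ?? ''].join('|');
}
```

Two runs that differ only in pageNumberFormat get different signatures here, which is the property the cache-key change relies on.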

Refactor

  • Moved formatPageNumber + PageNumberFormat into @superdoc/contracts as the single source of truth; pageNumbering re-exports them.

Behavior changes

  • upperLetter/lowerLetter now render as repeated letters (AA, BB, CC) to match Word, instead of the previous Excel-style sequence (AA, AB).
  • ArabicDash renders as - N - (with spacing) and unknown formats fall back to decimal.
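
To make the behavior changes concrete, here is a minimal formatter sketch. formatPageNumber is the function named in this PR, but the `(value, format)` signature, the helper names, and the handling of non-positive values are assumptions; the actual implementation in @superdoc/contracts may differ.

```javascript
// Standard subtractive Roman numeral conversion for positive integers.
function toRoman(value) {
  const numerals = [
    [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'], [100, 'C'], [90, 'XC'],
    [50, 'L'], [40, 'XL'], [10, 'X'], [9, 'IX'], [5, 'V'], [4, 'IV'], [1, 'I'],
  ];
  let remaining = value;
  let out = '';
  for (const [amount, glyph] of numerals) {
    while (remaining >= amount) {
      out += glyph;
      remaining -= amount;
    }
  }
  return out;
}

// Word-style letters: A..Z, then AA, BB, ... (one letter repeated),
// rather than the spreadsheet-style AA, AB, AC sequence.
function toRepeatedLetter(value, baseCharCode) {
  const index = (value - 1) % 26;
  const count = Math.floor((value - 1) / 26) + 1;
  return String.fromCharCode(baseCharCode + index).repeat(count);
}

// Assumed signature; unknown formats and non-positive values fall back to decimal.
function formatPageNumber(value, format) {
  if (!Number.isInteger(value) || value < 1) return String(value);
  switch (format) {
    case 'upperRoman': return toRoman(value);
    case 'lowerRoman': return toRoman(value).toLowerCase();
    case 'upperLetter': return toRepeatedLetter(value, 65); // 'A'
    case 'lowerLetter': return toRepeatedLetter(value, 97); // 'a'
    case 'numberInDash': return `- ${value} -`;
    default: return String(value);
  }
}
```

Under these assumptions page 27 formats as AA and page 28 as BB, where the previous Excel-style sequence would have produced AA and AB.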

Tests

  • Unit coverage: field-keyword, page-instruction/page-preprocessor, preProcessNodesForFldChar, preProcessPageFieldsOnly, autoPageNumber-translator, resolvePageTokens, resolveHeaderFooterTokens, cacheInvalidation, painter text-run, and the layout-adapter token converters.
  • Behavior specs: lowercase PAGE footer resolves per page (footer-page-keyword-case.spec.ts) and formatted footer page fields, with new shared story-fixtures helpers.

OOXML field type names are case-insensitive, but the field-reference
preprocessors dispatched on the raw first token (e.g. only "PAGE",
not "page"). A lowercase PAGE/NUMPAGES field in a repeated footer fell
through to the cached static text and showed the same number on every
page.

Add a shared extractFieldKeyword helper that normalizes the dispatch
token to upper case while leaving the original instruction text intact
for downstream processors, and route fldSimple/fldChar dispatch and the
header/footer page-field scan through it. Make the HYPERLINK target
regex case-insensitive and anchored. Cover the new behavior with unit
tests and a behavior spec asserting a lowercase PAGE footer resolves
per page.

Parse the `\*` value-format switches on PAGE field instructions (Arabic,
Roman/roman, ALPHABETIC/alphabetic, ArabicDash) into a run-local
pageNumberFormat override, and apply it independently of section numbering
when resolving page-number tokens.

- add parsePageInstruction / pageNumberFormatToInstructionSwitch in a new
  page-instruction.js; page-preprocessor stores the original instruction and
  parsed format on sd:autoPageNumber
- round-trip instruction + pageNumberFormat through the autoPageNumber
  translator and the page-number extension node (preserve imported
  instruction text, synthesize a switch for new formatted nodes)
- add pageNumberFormat to TextRun and thread it through layout-bridge,
  layout-resolved, painters (resolveRunText), and stamp section-aware
  displayNumber on pages so formatting uses the pre-format numeric value
- move formatPageNumber + PageNumberFormat into @superdoc/contracts as the
  single source of truth; re-export from pageNumbering
- include pageNumberFormat in block-version, merge, and hash signatures so
  format changes invalidate cached layouts

upperLetter/lowerLetter now render as repeated letters (AA, BB, CC) to match
Word instead of the previous Excel-style sequence (AA, AB).

@luccas-harbour luccas-harbour requested a review from a team as a code owner on June 1, 2026 19:46
@linear-code

linear-code Bot commented Jun 1, 2026

SD-3006

@github-actions
Contributor

github-actions Bot commented Jun 1, 2026

I tried to verify against the ECMA-376 MCP server, but the spec tool calls were blocked (permission not granted in this environment) after several attempts. I completed the review against ECMA-376 §17.16.4.1 (General-Formatting-Switch) from reference knowledge instead — flagging that so you can re-run the MCP check if you want the live citation.

Status: PASS

The two changed handler files (autoPageNumber-translator.js + test) are OOXML-compliant.

What I checked on the OOXML-facing side (the decode path, which is what actually emits XML):

  • Field structure: emits a standard complex field, w:fldChar (begin) → w:instrText → w:fldChar (separate) → w:fldChar (end), with xml:space="preserve" on the w:instrText. All valid per the field grammar. The missing cached-result run between separate and end is pre-existing and legal (the result text is optional). ✓
  • PAGE field + \* switch: the instruction text is freeform w:instrText content (not schema-constrained), and the synthesized switches map correctly: Arabic/Roman/roman/ALPHABETIC/alphabetic match the ECMA-376 general-formatting-switch names and casing exactly (uppercase Roman/ALPHABETIC → upper variants, lowercase → lower variants). ✓ (https://ooxml.dev/spec?q=general-formatting-switch)
  • Round-trip: encode preserves the original instruction verbatim (e.g. PAGE \* Roman \* MERGEFORMAT) and decode re-emits it, falling back to synthesizing PAGE \* <switch> only for new nodes. No format intent is baked into resolved text. ✓

One thing worth noting (not a failure):

  • ArabicDash (page-instruction.js:7, drives the translator's numberInDash output) — this switch is not in the ECMA-376 §17.16.4.1 enumerated \* format list; it's a Microsoft Word extension. It's interoperable (Word reads/writes it) and w:instrText is plain text, so it's not a schema violation — but it's the one value here that isn't spec-documented. Fine to keep for Word parity; just flagging it isn't backed by the standard.

No non-existent elements/attributes, no missing required attributes, and no incorrect defaults in the changed handler.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63d72a5e31

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


@cubic-dev-ai cubic-dev-ai Bot left a comment

cubic analysis

3 issues found across 43 files

Linked issue analysis

Linked issue: SD-3006: Feature: Page Number Fields

| Status | Acceptance criteria | Notes |
| --- | --- | --- |
|  | Visible PAGE output matches Word in representative corpus docs | The PR centralizes page formatting (formatPageNumber), threads run-local pageNumberFormat through the layout + painter, stamps section-aware displayNumber, and includes unit + behavior tests validating formatted output in headers/footers and token resolution. |
| ⚠️ | Header/footer page numbers render in the correct place and format | The PR fixes per-page formatting and per-field format overrides (format applied at render/measurement time) and adds tests that validate the formatted content in footers. However, there is no clear change or test specifically asserting layout placement (geometry) adjustments: the work targets correct content/format and caching invalidation rather than explicit placement adjustments, so placement aspect remains unverified by the diff. |
|  | Section-aware page number behavior remains stable after pagination settles | The PR plumbs a section-aware numeric value (displayNumber) through pageResolver → header/footer resolution → painter; includes changes to incrementalLayout/pageResolver, layoutHeaderFooter, resolved-layout, and tests that exercise per-page resolution and caching invalidation ensuring section-aware formatting is applied to the correct numeric value. |

Comment thread packages/layout-engine/painters/dom/src/renderer.ts
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.

Only dispatch the SEQ pre-processor for uppercase SEQ instructions so
lowercase `seq` fields keep their cached visible result runs instead of
being re-resolved. Also recurse into run-wrapped content when extracting
resolved text so cached numbers nested inside runs are captured.
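
The recursion into run-wrapped content could look roughly like this. The node shape (`type`, `text`, `elements`, in the style of an xml-js tree) and the function name `extractResolvedText` are assumptions; the commit does not show the actual helper.

```javascript
// Hypothetical helper: concatenate text from a list of nodes, descending into
// wrappers such as w:r so cached numbers nested inside runs are not missed.
function extractResolvedText(nodes) {
  let text = '';
  for (const node of nodes ?? []) {
    if (node.type === 'text' && typeof node.text === 'string') {
      text += node.text;
    } else if (Array.isArray(node.elements)) {
      text += extractResolvedText(node.elements);
    }
  }
  return text;
}
```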

@cubic-dev-ai cubic-dev-ai Bot left a comment

2 issues found across 11 files (changes from recent commits).

Comment thread packages/layout-engine/layout-bridge/src/layoutHeaderFooter.ts
Comment thread packages/layout-engine/painters/dom/src/renderer.ts
@luccas-harbour
Contributor Author

luccas-harbour commented Jun 1, 2026

cubic analysis

3 issues found across 43 files

Linked issue analysis

Linked issue: SD-3006: Feature: Page Number Fields

| Status | Acceptance criteria | Notes |
| --- | --- | --- |
|  | Visible PAGE output matches Word in representative corpus docs | The PR centralizes page formatting (formatPageNumber), threads run-local pageNumberFormat through the layout + painter, stamps section-aware displayNumber, and includes unit + behavior tests validating formatted output in headers/footers and token resolution. |
| ⚠️ | Header/footer page numbers render in the correct place and format | The PR fixes per-page formatting and per-field format overrides (format applied at render/measurement time) and adds tests that validate the formatted content in footers. However, there is no clear change or test specifically asserting layout placement (geometry) adjustments: the work targets correct content/format and caching invalidation rather than explicit placement adjustments, so placement aspect remains unverified by the diff. |
|  | Section-aware page number behavior remains stable after pagination settles | The PR plumbs a section-aware numeric value (displayNumber) through pageResolver → header/footer resolution → painter; includes changes to incrementalLayout/pageResolver, layoutHeaderFooter, resolved-layout, and tests that exercise per-page resolution and caching invalidation ensuring section-aware formatting is applied to the correct numeric value. |

According to ECMA-376, the PAGE instruction controls what value is displayed, not where it is placed.

Placement comes from normal WordprocessingML layout:

  • Header/footer story selection: w:headerReference / w:footerReference in w:sectPr choose which header/footer applies to first/even/default pages.
  • Header/footer region geometry: w:pgMar controls header/footer offsets and page text extents, especially w:header, w:footer, w:left, and w:right (§17.6.11).
  • Paragraph layout inside that story: page number fields are inline content in paragraphs, so placement is controlled by paragraph properties like w:jc alignment (§17.3.1.13), indentation, styles, and tab stops.
  • Tabs: common Word output uses tabs/custom tab stops to place header/footer content left/center/right. w:tab run content advances to paragraph tab stops (§17.3.3.32), and w:tabs/w:tab define positions and alignment (§17.3.1.37-38, §17.18.84).

w:pgNumType controls page-number formatting/counting for the section: start number, numeric format, chapter separator/style (§17.6.12). It does not place the number.

So for SuperDoc: render PAGE as inline field text in the header/footer layout. Its x/y placement should fall out of header/footer geometry plus paragraph alignment/tabs/styles, not from special positioning logic attached to the PAGE instruction itself. Therefore this falls outside the scope of this ticket.
