Skip to content

Package native QTI items as QTI/IMSCP archives#6029

Merged
rtibbles merged 4 commits into
learningequality:unstablefrom
rtibblesbot:issue-6000-33c560
Jul 4, 2026
Merged

Package native QTI items as QTI/IMSCP archives#6029
rtibbles merged 4 commits into
learningequality:unstablefrom
rtibblesbot:issue-6000-33c560

Conversation

@rtibblesbot

@rtibblesbot rtibblesbot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Summary

Exercise nodes whose assessment items are natively-authored QTI (AssessmentItem.type == QTI) now publish by packaging the item XML already stored in raw_data verbatim into a QTI/IMSCP zip, instead of rebuilding item XML from structured fields.

Referenced media is written to items/images/, matching the legacy pydantic path's layout. The item XML's src/href/srcset references are rewritten in place to the new images/... path — the only bytes allowed to differ from the authored raw_data.

A native item that fails schema validation, or references media with no matching File row, is logged and excluded from the package rather than aborting the whole publish.

Dispatch also changes: a node routes to QTI packaging unless it has a perseus_question item (previously: only when it had a free_response item). QTIExerciseGenerator already synthesizes valid QTI XML for every legacy structured-field type, so this is a deliberate broadening — nodes with only legacy structured-field items now get QTI packages instead of Perseus packages on next republish.

References

Closes #6000. Builds on #6005 (QTI schema validation) and #5999 (QTI raw_data round-trip through save/sync).

Reviewer guidance

  • archive.py _create_native_qti_item/_write_qti_media_files (archive.py:203-268): the remapping is a targeted text substitution (media.py's rewrite_qti_media_paths), not a parse/serialize round-trip, so every other byte of raw_data is untouched.
  • publish.py:366-373: please confirm the dispatch-rule broadening's scope is intended, not just the diff mechanics.
  • Publish-time failures for a single native item are logged and the item is excluded; only packaging bugs (duplicate identifier, missing root identifier) raise ValueError.
  • That ValueError propagates through publish_channel to ChannelViewSet.publish's existing error handling — no new plumbing needed.

AI usage

Implemented with Claude Code from a maintainer-authored plan (issue comment), following TDD per task. Ran the full assessment/publish/QTI test suite and pre-commit hooks (flake8, reorder-python-imports, black) before each commit. Also ran an automated /simplify pass over the diff, using 4 independent review agents across reuse, simplification, efficiency, and altitude angles. Applied the one actionable finding: reused an existing QTI XML test fixture instead of hand-rolling an equivalent one.


@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly

How was this generated?
  • Ran pre-flight CI checks (lint, format, tests) and verified all pass
  • Rebased onto the target branch and resolved any conflicts
  • Reorganized commit history into clean, logical commits
  • Audited the diff to ensure only issue-relevant files are changed
  • Built PR body from the repository's PR template with evidence blocks
@rtibblesbot

🟡 Waiting for feedback

Last updated: 2026-07-04 00:45 UTC

rtibblesbot and others added 2 commits July 3, 2026 11:04
Route type=QTI assessment items through publish verbatim instead of
rebuilding item XML from structured fields, since their authored XML
is already stored in raw_data.

- Validate raw_data against the QTI schema before packaging
- Resolve and write referenced media alongside the item
- Extract the media-reference scanner into its own module, shared with
  the AssessmentItem viewset
- Refactor QTIExerciseGenerator's manifest bookkeeping to a
  generator-agnostic resource list so both the pydantic-built and
  native paths populate it

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Dispatch to QTIExerciseGenerator unless a node has a perseus_question
item, instead of only when it has a free_response item.
QTIExerciseGenerator already synthesizes valid QTI XML for every
legacy structured-field type; perseus_question is the only type it
can't represent.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
@rtibblesbot rtibblesbot marked this pull request as ready for review July 3, 2026 18:36

@rtibbles rtibbles left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an excellent start, but a bit more cleanup is needed, I think.

files_by_name = {
f"{f.checksum}.{f.file_format_id}": f for f in assessment_item.files.all()
}
missing = sorted(filenames - files_by_name.keys())

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What purpose does this serve? It breaks publishing for the sake of book keeping - we should log an error here, but carry on.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logs and continues now — _write_qti_media_files calls logging.error with the missing filenames instead of raising, and returns only the files that are actually linked. The item and the rest of the archive still publish; the raw_data still references the missing checksum but that reference was already dangling before this change reached publish.

error_messages = "; ".join(
f"line {e.line}, column {e.column}: {e.message}" for e in result.errors
)
raise ValueError(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why not just skip it? Why raise a ValueError here and break the publishing?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skips the item now instead: on validation failure _create_native_qti_item logs an error and returns None; create_assessment_item propagates that, and process_assessment_item (base.py) skips writing/registering the item when the result is None. The rest of the node's items, and the rest of the channel, still publish.

Root cause for why this mattered: a ValueError out of create_exercise_archive() isn't caught anywhere in publish.py's recurse_nodes/publish_channel — it was propagating all the way out and aborting the entire channel publish, not just this node.

Flagging explicitly: this changes the issue's stated acceptance criterion ("failures stop publish with a clear error") and the plan's Task 3 note ("On failure, raise ValueError immediately"). Treating your live review as the more current signal on intended behavior here, but wanted the deviation on record rather than silent.

)

filepath = f"items/{identifier}.xml"
file_dependencies = self._write_qti_media_files(assessment_item)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not seeing where we are updating the media file paths to stop them just all sitting in the items folder bare. That's a significant change from the current export implementation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional, not an omission. Native items keep raw_data byte-identical per the acceptance criteria, and the authored XML's src/href values are bare <checksum>.<ext> strings with no path prefix (see the anchored regex in qti/media.py's get_qti_media_references) — that's the TipTap editor's permanentSrc convention. Since we can't rewrite those attributes without breaking byte-identical fidelity, the referenced file has to physically sit next to the item XML (items/) for the bare relative reference to resolve; it can't go in items/images/ the way the legacy pydantic-built path does, since that path generates and owns its own XML and can prefix image paths freely.

No code change planned here — let me know if you want that byte-identical constraint relaxed instead.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should relax the byte identical requirement here. The important thing is that these filepath remappings are the only changes that get made.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — media is now written to items/images/ (matching the legacy layout) and the item XML's src/href/data/srcset values referencing those files are remapped to the images/ prefix that resolves correctly from the item's own location.

The rewrite (rewrite_qti_media_paths in qti/media.py) is a targeted text substitution over the raw XML string, not a parse/serialize round-trip — it only swaps the matched attribute-value characters for their new path, so it can't introduce incidental formatting changes (e.g. lxml re-serialization normalizes <img ... /> to <img .../>, which we verified and avoided). Items with no media references are unaffected and remain byte-identical to raw_data, as before.

Covered by new tests in test_media.py (rewrite_qti_media_paths behavior directly) and an updated test_native_qti_item_media_included_and_addressed in test_exercise_creation.py that asserts the zip layout, manifest, and exact rewritten item XML.

)


qti_reference_attributes = ("src", "href", "data")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good consolidation from this file - I was worried at first we were duplicating this logic!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep — get_filenames_from_qti_item now just delegates to get_qti_media_references in qti/media.py, shared with the native packaging path added in this PR.

A ValueError from QTIExerciseGenerator wasn't caught anywhere in
publish.py's recurse_nodes/publish_channel, so any single failure
aborted the entire channel publish rather than just the offending item.

- Schema-invalid native items are now logged and excluded from the
  package instead of raising.
- Media references with no linked File are now logged and dropped
  from the manifest instead of raising.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
@rtibblesbot

Copy link
Copy Markdown
Contributor Author

I was unable to complete this task after multiple attempts. Manual review and intervention may be needed.

@rtibbles rtibbles left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do, indeed, want better file organization.

)

filepath = f"items/{identifier}.xml"
file_dependencies = self._write_qti_media_files(assessment_item)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should relax the byte identical requirement here. The important thing is that these filepath remappings are the only changes that get made.

Media files referenced by native QTI items were being written bare into
items/ alongside the item XML, unlike the legacy generator's items/images/
layout. Write them to items/images/ instead, and rewrite the item XML's
src/href/data/srcset values to the images/ prefix that resolves to that
directory. The rewrite is a targeted text substitution (not a parse/
serialize round-trip) so it changes only the remapped path values -
everything else in the authored raw_data is preserved byte-for-byte.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

@rtibbles rtibbles left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, having the extra nesting under images should help to keep things organized.

@rtibbles rtibbles merged commit 3ae5aca into learningequality:unstable Jul 4, 2026
25 checks passed
@rtibblesbot rtibblesbot deleted the issue-6000-33c560 branch July 4, 2026 00:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[QTI] Publish raw_data QTI assessment items as QTI/IMSCP packages

2 participants