Package native QTI items as QTI/IMSCP archives by rtibblesbot · Pull Request #6029 · learningequality/studio

rtibblesbot · 2026-07-03T18:11:32Z

Summary

Exercise nodes whose assessment items are natively-authored QTI (AssessmentItem.type == QTI) now publish by packaging the item XML already stored in raw_data verbatim into a QTI/IMSCP zip, instead of rebuilding item XML from structured fields.

Referenced media is written to items/images/, matching the legacy pydantic path's layout. The item XML's src/href/srcset references are rewritten in place to the new images/... path — the only bytes allowed to differ from the authored raw_data.

A native item that fails schema validation, or references media with no matching File row, is logged and excluded from the package rather than aborting the whole publish.

Dispatch also changes: a node routes to QTI packaging unless it has a perseus_question item (previously: only when it had a free_response item). QTIExerciseGenerator already synthesizes valid QTI XML for every legacy structured-field type, so this is a deliberate broadening — nodes with only legacy structured-field items now get QTI packages instead of Perseus packages on next republish.

References

Closes #6000. Builds on #6005 (QTI schema validation) and #5999 (QTI raw_data round-trip through save/sync).

Reviewer guidance

archive.py _create_native_qti_item/_write_qti_media_files (archive.py:203-268): the remapping is a targeted text substitution (media.py's rewrite_qti_media_paths), not a parse/serialize round-trip, so every other byte of raw_data is untouched.
publish.py:366-373: please confirm the dispatch-rule broadening's scope is intended, not just the diff mechanics.
Publish-time failures for a single native item are logged and the item is excluded; only packaging bugs (duplicate identifier, missing root identifier) raise ValueError.
That ValueError propagates through publish_channel to ChannelViewSet.publish's existing error handling — no new plumbing needed.

AI usage

Implemented with Claude Code from a maintainer-authored plan (issue comment), following TDD per task. Ran the full assessment/publish/QTI test suite and pre-commit hooks (flake8, reorder-python-imports, black) before each commit. Also ran an automated /simplify pass over the diff, using 4 independent review agents across reuse, simplification, efficiency, and altitude angles. Applied the one actionable finding: reused an existing QTI XML test fixture instead of hand-rolling an equivalent one.

@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly

How was this generated?

Ran pre-flight CI checks (lint, format, tests) and verified all pass
Rebased onto the target branch and resolved any conflicts
Reorganized commit history into clean, logical commits
Audited the diff to ensure only issue-relevant files are changed
Built PR body from the repository's PR template with evidence blocks

	🟡 Waiting for feedback Last updated: 2026-07-04 00:45 UTC

Route type=QTI assessment items through publish verbatim instead of rebuilding item XML from structured fields, since their authored XML is already stored in raw_data. - Validate raw_data against the QTI schema before packaging - Resolve and write referenced media alongside the item - Extract the media-reference scanner into its own module, shared with the AssessmentItem viewset - Refactor QTIExerciseGenerator's manifest bookkeeping to a generator-agnostic resource list so both the pydantic-built and native paths populate it Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

Dispatch to QTIExerciseGenerator unless a node has a perseus_question item, instead of only when it has a free_response item. QTIExerciseGenerator already synthesizes valid QTI XML for every legacy structured-field type; perseus_question is the only type it can't represent. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

rtibbles

This is an excellent start, but a bit more cleanup is needed, I think.

rtibbles · 2026-07-03T20:23:27Z

+        files_by_name = {
+            f"{f.checksum}.{f.file_format_id}": f for f in assessment_item.files.all()
+        }
+        missing = sorted(filenames - files_by_name.keys())


What purpose does this serve? It breaks publishing for the sake of book keeping - we should log an error here, but carry on.

Logs and continues now — _write_qti_media_files calls logging.error with the missing filenames instead of raising, and returns only the files that are actually linked. The item and the rest of the archive still publish; the raw_data still references the missing checksum but that reference was already dangling before this change reached publish.

rtibbles · 2026-07-03T20:24:55Z

+            error_messages = "; ".join(
+                f"line {e.line}, column {e.column}: {e.message}" for e in result.errors
+            )
+            raise ValueError(


So why not just skip it? Why raise a ValueError here and break the publishing?

Skips the item now instead: on validation failure _create_native_qti_item logs an error and returns None; create_assessment_item propagates that, and process_assessment_item (base.py) skips writing/registering the item when the result is None. The rest of the node's items, and the rest of the channel, still publish.

Root cause for why this mattered: a ValueError out of create_exercise_archive() isn't caught anywhere in publish.py's recurse_nodes/publish_channel — it was propagating all the way out and aborting the entire channel publish, not just this node.

Flagging explicitly: this changes the issue's stated acceptance criterion ("failures stop publish with a clear error") and the plan's Task 3 note ("On failure, raise ValueError immediately"). Treating your live review as the more current signal on intended behavior here, but wanted the deviation on record rather than silent.

rtibbles · 2026-07-03T20:26:50Z

+            )
+
+        filepath = f"items/{identifier}.xml"
+        file_dependencies = self._write_qti_media_files(assessment_item)


I am not seeing where we are updating the media file paths to stop them just all sitting in the items folder bare. That's a significant change from the current export implementation.

Intentional, not an omission. Native items keep raw_data byte-identical per the acceptance criteria, and the authored XML's src/href values are bare <checksum>.<ext> strings with no path prefix (see the anchored regex in qti/media.py's get_qti_media_references) — that's the TipTap editor's permanentSrc convention. Since we can't rewrite those attributes without breaking byte-identical fidelity, the referenced file has to physically sit next to the item XML (items/) for the bare relative reference to resolve; it can't go in items/images/ the way the legacy pydantic-built path does, since that path generates and owns its own XML and can prefix image paths freely.

No code change planned here — let me know if you want that byte-identical constraint relaxed instead.

Yes, we should relax the byte identical requirement here. The important thing is that these filepath remappings are the only changes that get made.

Done — media is now written to items/images/ (matching the legacy layout) and the item XML's src/href/data/srcset values referencing those files are remapped to the images/ prefix that resolves correctly from the item's own location.

The rewrite (rewrite_qti_media_paths in qti/media.py) is a targeted text substitution over the raw XML string, not a parse/serialize round-trip — it only swaps the matched attribute-value characters for their new path, so it can't introduce incidental formatting changes (e.g. lxml re-serialization normalizes <img ... /> to <img .../>, which we verified and avoided). Items with no media references are unaffected and remain byte-identical to raw_data, as before.

Covered by new tests in test_media.py (rewrite_qti_media_paths behavior directly) and an updated test_native_qti_item_media_included_and_addressed in test_exercise_creation.py that asserts the zip layout, manifest, and exact rewritten item XML.

rtibbles · 2026-07-03T20:28:51Z

    )


-qti_reference_attributes = ("src", "href", "data")


Good consolidation from this file - I was worried at first we were duplicating this logic!

Yep — get_filenames_from_qti_item now just delegates to get_qti_media_references in qti/media.py, shared with the native packaging path added in this PR.

A ValueError from QTIExerciseGenerator wasn't caught anywhere in publish.py's recurse_nodes/publish_channel, so any single failure aborted the entire channel publish rather than just the offending item. - Schema-invalid native items are now logged and excluded from the package instead of raising. - Media references with no linked File are now logged and dropped from the manifest instead of raising. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

rtibblesbot · 2026-07-03T21:11:39Z

I was unable to complete this task after multiple attempts. Manual review and intervention may be needed.

rtibbles

I do, indeed, want better file organization.

rtibbles · 2026-07-03T23:06:53Z

+            )
+
+        filepath = f"items/{identifier}.xml"
+        file_dependencies = self._write_qti_media_files(assessment_item)


Yes, we should relax the byte identical requirement here. The important thing is that these filepath remappings are the only changes that get made.

Media files referenced by native QTI items were being written bare into items/ alongside the item XML, unlike the legacy generator's items/images/ layout. Write them to items/images/ instead, and rewrite the item XML's src/href/data/srcset values to the images/ prefix that resolves to that directory. The rewrite is a targeted text substitution (not a parse/ serialize round-trip) so it changes only the remapped path values - everything else in the authored raw_data is preserved byte-for-byte. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

rtibbles

Looks good to me, having the extra nesting under images should help to keep things organized.

rtibblesbot and others added 2 commits July 3, 2026 11:04

rtibblesbot marked this pull request as ready for review July 3, 2026 18:36

rtibbles requested changes Jul 3, 2026

View reviewed changes

rtibbles approved these changes Jul 4, 2026

View reviewed changes

rtibbles merged commit 3ae5aca into learningequality:unstable Jul 4, 2026
25 checks passed

rtibblesbot deleted the issue-6000-33c560 branch July 4, 2026 00:46

Uh oh!

Uh oh!

Conversation

rtibblesbot commented Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

References

Reviewer guidance

AI usage

🟡 Waiting for feedback

Uh oh!

rtibbles left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rtibblesbot commented Jul 3, 2026

Uh oh!

rtibbles left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rtibbles left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rtibblesbot commented Jul 3, 2026 •

edited

Loading