feat(threads): persist and restore chat history across summarization …#3026

Open: buptwz wants to merge 2 commits into bytedance:main from buptwz:feat/summarization-message-archive
buptwz commented May 17, 2026

Closes #2012
Related to #2424

Summary

When the agent's context window fills up, DeerFlowSummarizationMiddleware removes
old messages from LangGraph state. Those messages were then permanently lost from
the UI: users could see only the most recent portion of a long conversation.

The existing handler in hooks.ts relied on the internal LangGraph state key
SummarizationMiddleware.before_model and a positional index _messages[2],
which are fragile and can break silently across LangGraph version changes.

This PR replaces that with a two-layer mechanism: a file archive for persistence
across page reloads, and a custom stream event for the live streaming session.

What Changed

Backend

  • Add MessageArchiveHook (BeforeSummarizationHook) that appends archived messages
    to {thread_dir}/message_archive.jsonl, deduped by message id. Registered
    unconditionally alongside memory_flush_hook.
  • Emit a messages_archived custom stream event in DeerFlowSummarizationMiddleware
    via get_stream_writer() before summary generation. Failure is swallowed so
    summarization is never interrupted.
  • Add GET /api/threads/{id}/message-archive endpoint that reads the JSONL file and
    returns {"data": [...], "has_more": false}. Returns an empty list for threads
    that predate this feature.
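
The archive behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the helper names `append_to_archive` and `read_archive` are hypothetical, messages are assumed to be plain dicts with an optional `id` key, and messages without an id are assumed to be written every time. Only the file name and the response shape come from the description.

```python
import json
from pathlib import Path
from typing import Any

ARCHIVE_NAME = "message_archive.jsonl"  # file name taken from the PR description


def read_archive(thread_dir: Path) -> dict[str, Any]:
    """Return the archive in the response shape named in the description."""
    archive = thread_dir / ARCHIVE_NAME
    if not archive.exists():
        # Threads that predate the feature have no file: report an empty list.
        return {"data": [], "has_more": False}
    data = []
    with archive.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                data.append(json.loads(line))
    return {"data": data, "has_more": False}


def append_to_archive(thread_dir: Path, messages: list[dict[str, Any]]) -> int:
    """Append messages that are not archived yet; return how many were written."""
    seen = {m.get("id") for m in read_archive(thread_dir)["data"]}
    thread_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    with (thread_dir / ARCHIVE_NAME).open("a", encoding="utf-8") as fh:
        for msg in messages:
            msg_id = msg.get("id")
            if msg_id is not None and msg_id in seen:
                continue  # already archived in an earlier summarization cycle
            fh.write(json.dumps(msg, ensure_ascii=False) + "\n")
            if msg_id is not None:
                seen.add(msg_id)
            written += 1
    return written
```

Re-reading the file to build the id set keeps the sketch stateless; a real hook might cache the ids or bound the read for very long threads.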

Frontend (hooks.ts)

  • Remove the fragile onUpdateEvent SummarizationMiddleware block and summarizedRef.
  • Handle messages_archived in onCustomEvent: move archived messages into history
    and reset messagesRef.
  • Fetch the archive on mount in useThreadHistory so full history is restored on
    page reload.

Design Notes

  • No LangGraph state growth: nothing is stored in ThreadState or checkpoints
  • No dependency on LangGraph internal event key names or message positions
  • Follows existing patterns: same shared-filesystem assumption as memory.json,
    same BeforeSummarizationHook protocol as memory_flush_hook, same custom
    stream mode already used by task_running / llm_retry
  • MessageArchiveHook writes to disk before the stream event is emitted, so a
    page reload mid-stream will still recover the archive
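
The ordering in the last note (write to disk first, then emit, and never let a failed emit interrupt summarization) might look like the sketch below. The function `archive_then_notify`, its `persist` and `emit` callables, and the event payload shape are all assumptions made for illustration; in the PR the writer comes from `get_stream_writer()`, which is stood in for here by a plain callable.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)

Message = dict[str, Any]


def archive_then_notify(
    messages: list[Message],
    persist: Callable[[list[Message]], None],
    emit: Callable[[dict[str, Any]], None],
) -> bool:
    """Persist first, then emit; return True if the event was delivered."""
    # The disk write comes first, so a page reload mid-stream can still
    # recover the archived messages from the file.
    persist(messages)
    try:
        # Hypothetical payload shape; only the event name is from the PR.
        emit({"type": "messages_archived", "messages": messages})
    except Exception:
        # Swallow emit failures so summarization itself is never interrupted.
        logger.warning("messages_archived event not delivered", exc_info=True)
        return False
    return True
```

In this sketch a failure in `persist` still propagates; whether the real hook also swallows write errors is not stated in the description.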

Verification

# New tests
pytest tests/test_message_archive_hook.py tests/test_summarization_middleware.py -q

# CI regression suite
pytest tests/test_harness_boundary.py tests/test_docker_sandbox_mode_detection.py \
       tests/test_provisioner_kubeconfig.py tests/test_memory_updater.py -q

CLAassistant commented May 17, 2026

CLA assistant check
All committers have signed the CLA.

buptwz and others added 2 commits May 19, 2026 08:47
…cycles

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
buptwz force-pushed the feat/summarization-message-archive branch from 6d816dc to 1847ef1 on May 19, 2026 00:47
Development

Successfully merging this pull request may close these issues.

Suggestions on the Summarization display experience
