
fix(chat): persist thread transcripts outside checkpoints#2385

Closed
LittleChenLiya wants to merge 4 commits into
bytedance:mainfrom
LittleChenLiya:fix/transcript-store

Conversation

@LittleChenLiya
Collaborator

@LittleChenLiya LittleChenLiya commented Apr 21, 2026


Problem Cause

UI chat history depended on LangGraph checkpoint history. Once a long conversation triggered summarization, the checkpointed messages were compressed to fit the model context, so restoring chat history from checkpoint history could drop earlier visible messages in the frontend.

Checkpoints are better suited for agent/model state and should not be responsible for the complete UI transcript.

Changes

  • Added independent thread transcript storage that saves UI-visible messages per thread.
  • Synchronized visible messages when a run starts and completes, preventing summarization from affecting chat history display.
  • Added a thread messages read endpoint so the frontend can prefer the canonical transcript and merge it with live stream messages.
  • Filtered summary placeholders and cleaned up transcripts when deleting a thread.
  • This PR is the main-line solution for the summarization/chat history issue and supersedes the closed Preserve chat history after summarization #2424.

Related Issue

Closes #2012
Supersedes #2424

@LittleChenLiya LittleChenLiya marked this pull request as ready for review April 21, 2026 02:03
…pr2385

# Conflicts:
#	backend/app/gateway/routers/threads.py
#	backend/app/gateway/services.py
#	frontend/src/components/workspace/messages/message-list.tsx
@LittleChenLiya
Collaborator Author

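I rechecked the conflicts between this PR and the latest main, and I am moving it back to draft for now.

The main issue is not an ordinary textual conflict. Main has since added GET /api/threads/{thread_id}/messages, which returns paginated display messages from the run event store; this PR adds the same path to return the independent transcript. A mechanical merge would shadow main's message pagination endpoint and conflict with the direction of the follow-up history pagination fixes.

A more reasonable way to converge is to stop adding a parallel transcript API and instead keep fixing edge cases on top of main's run event store message history (for example pagination boundaries, SSE recovery, and display recovery after summarization). This PR is therefore not suitable to simply rebase and keep in the ready state.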
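For readers unfamiliar with paginated message history, the general idea can be sketched as below. This is a hypothetical illustration only: the main-line endpoint's real parameters and response shape are not shown in this thread, so `page_messages`, `limit`, `before`, and `next_before` are all assumed names, and messages are assumed to be dicts with an `id` field ordered oldest to newest.

```python
from typing import Any, Optional


def page_messages(
    messages: list[dict[str, Any]],
    limit: int = 50,
    before: Optional[str] = None,
) -> dict[str, Any]:
    """Return up to `limit` messages that precede the message with id `before`.

    With `before=None` (or an unknown id) the newest page is returned.
    """
    end = len(messages)
    if before is not None:
        ids = [m["id"] for m in messages]
        if before in ids:
            end = ids.index(before)
    start = max(0, end - limit)
    page = messages[start:end]
    # A cursor is only handed back while older messages remain.
    next_before = page[0]["id"] if start > 0 and page else None
    return {"messages": page, "next_before": next_before}
```

The boundary cases mentioned above (the first page, the last page, and a cursor that no longer exists) are where this kind of logic tends to need the most care.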

@LittleChenLiya LittleChenLiya marked this pull request as draft May 24, 2026 05:32
@LittleChenLiya LittleChenLiya requested a review from Copilot May 26, 2026 02:52
Contributor

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Introduces a canonical, durable transcript for threads (separate from checkpoint/model context) and wires it into the UI so message history remains stable across summarization/context compression.

Changes:

  • Added backend transcript storage utilities with filtering + deduplication, plus tests covering key behaviors.
  • Added a threads API endpoint to fetch the canonical transcript and integrated transcript lifecycle with run execution and thread deletion.
  • Updated the frontend message list to fetch the canonical transcript and merge it with live streaming messages.
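The merge of a canonical transcript with live streaming messages can be illustrated with a small sketch. The PR's frontend is TypeScript and its actual merge logic is not shown in this thread; the Python below only demonstrates one plausible rule, assuming every message carries a stable `id` and that a live copy of a message should replace the stored copy. `merge_messages` is a hypothetical name.

```python
from typing import Any


def merge_messages(
    canonical: list[dict[str, Any]],
    live: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Merge by id: keep canonical order, prefer live copies, append new live messages."""
    # Dicts preserve insertion order, so canonical messages keep their positions.
    merged = {m["id"]: m for m in canonical}
    for message in live:
        # Reassigning an existing key keeps its position; a new key goes to the end.
        merged[message["id"]] = message
    return list(merged.values())
```

Keying on `id` means a message that is still streaming overwrites its earlier partial copy instead of appearing twice.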

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file
| File | Description |
| --- | --- |
| frontend/src/components/workspace/messages/message-list.tsx | Fetches canonical thread transcript and merges it with live messages for rendering. |
| backend/app/gateway/transcripts.py | Implements transcript normalization/filtering/dedup + persistence in the Store. |
| backend/app/gateway/services.py | Appends submitted messages and syncs final checkpoint messages into the transcript. |
| backend/app/gateway/routers/threads.py | Adds /threads/{thread_id}/messages endpoint and deletes transcript on thread deletion. |
| backend/packages/harness/deerflow/agents/middlewares/summarization_middleware.py | Tags summary messages as hidden/summary via additional_kwargs. |
| backend/tests/test_transcripts.py | Adds test coverage for transcript append/filter/dedup behavior. |
| backend/tests/test_summarization_middleware.py | Asserts summary messages are tagged as hidden + summary marker. |
Comments suppressed due to low confidence (1)

backend/packages/harness/deerflow/agents/middlewares/summarization_middleware.py:190

  • There are two _build_new_messages definitions; the later one overrides the earlier in Python, making the first implementation dead code and potentially confusing for future changes. Remove the shadowed method or consolidate the behavior into a single override to keep the tagging logic in one place.
    def _build_new_messages(self, summary: str) -> list[HumanMessage]:
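The shadowing behavior described in this comment is standard Python: a class body is executed top to bottom, so a second `def` with the same name rebinds it and the first becomes unreachable. The minimal sketch below uses a hypothetical `Middleware` class and placeholder return values, not the PR's real implementation.

```python
class Middleware:
    def _build_new_messages(self, summary: str) -> list[str]:
        # First definition: discarded once the name is rebound below.
        return [f"untagged: {summary}"]

    def _build_new_messages(self, summary: str) -> list[str]:  # noqa: F811
        # Later definition: this is the method actually bound on the class.
        return [f"tagged: {summary}"]
```

Linters such as pyflakes report this pattern as a redefinition (F811), which is one way to catch it before review.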

Comment thread frontend/src/components/workspace/messages/message-list.tsx
Comment thread frontend/src/components/workspace/messages/message-list.tsx Outdated
Comment thread frontend/src/components/workspace/messages/message-list.tsx Outdated
Comment thread frontend/src/components/workspace/messages/message-list.tsx
Comment thread backend/app/gateway/transcripts.py Outdated
Comment thread backend/packages/harness/deerflow/agents/middlewares/summarization_middleware.py Outdated
@LittleChenLiya
Collaborator Author

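Closing this PR for now to avoid continuing in parallel with main's run event store message history and pagination endpoint. Going forward, the summarization/chat history work will follow the existing main-line message history endpoint and pagination fixes such as #3161 and #3188.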


Development

Successfully merging this pull request may close these issues.

Suggestions on the Summarization display experience

2 participants