fix(runs): expose active progress counters by kibabsquirrel · Pull Request #3148 · bytedance/deer-flow

kibabsquirrel · 2026-05-22T01:37:22Z

Summary

related to #3116
Fixes the active-run visibility part of BUG-005 by exposing near-real-time progress counters for running runs.

During long runs, RunJournal already accumulates token usage and message counts, but those values were only written to runs at completion. This made active runs appear as total_tokens=0, llm_call_count=0, and message_count=0 until the worker finalized the run.

This PR adds a throttled active progress snapshot path:

RunJournal reports accumulated usage snapshots after LLM/subagent usage is settled.
RunManager / RunStore persist progress without changing run status.
RunResponse exposes token/message counters for run detail and list APIs.
/api/threads/{thread_id}/token-usage?include_active=true can include running run snapshots while preserving the completed-only default.

Progress flushing uses leading + trailing throttling: the first settled usage update is reported promptly, and updates suppressed by the throttle window are followed by a trailing flush with the latest snapshot. This avoids stale active counters during long tool/file/network phases after a burst of LLM calls.

Scope

This implements layer 1 of BUG-005: active progress counters.

It intentionally does not implement layer 2 durable structured trace or layer 3 raw stream frame persistence.

Key Decisions

Use settled usage snapshots rather than token-by-token telemetry, because provider usage is reliably available after LLM/subagent calls complete and this keeps progress writes bounded to roughly one write per throttle interval per active run.
Use leading + trailing throttling so active counters update promptly without writing on every callback or stream chunk.
Keep /token-usage completed-only by default, with include_active=true as an explicit opt-in for callers that want running snapshots included.
Persist progress only for running runs without changing status; the final completion path remains the authoritative source of final totals.
Treat omitted progress fields as “do not update” rather than defaulting them to 0, so partial progress updates remain safe and consistent across stores.

Why Not Durable Trace Here

Durable structured trace is a separate design concern. DeerFlow already has run_events and external tracing integrations, but making local trace durable by default requires decisions around backend defaults, payload size limits, sensitive content, retention, event schema, and query semantics.

Keeping this PR focused avoids mixing active cost visibility with trace persistence and keeps the database write path lightweight.

Semantics

Progress counters are snapshots of settled usage, not token-by-token telemetry. They update after LLM/subagent usage is available and are throttled to avoid high-frequency writes.

The final completion path writes the authoritative final totals.

Test Plan

PYTHONPATH=. uv run pytest tests/test_run_repository.py tests/test_run_journal.py tests/test_thread_token_usage.py tests/test_worker_langfuse_metadata.py tests/test_persistence_scaffold.py -q
uv run ruff check packages/harness/deerflow/runtime/journal.py packages/harness/deerflow/runtime/runs/manager.py packages/harness/deerflow/runtime/runs/store/memory.py packages/harness/deerflow/runtime/runs/store/base.py packages/harness/deerflow/persistence/run/sql.py tests/test_run_journal.py tests/test_run_repository.py

…x/bug-005

kibabsquirrel · 2026-05-22T01:52:51Z

A few trade-off notes:

This PR uses settled usage snapshots, not token/chunk-level telemetry. With the current 5s throttle, bursty updates are coalesced into roughly one progress write per interval per active run, plus leading/final snapshots.

/token-usage keeps the completed-only default; include_active=true only opts into running run rows and does not add checkpoint/event scans.

I also left interrupted-run accounting unchanged here, since that feels like a separate product/accounting semantic decision from active progress visibility.

ShenAC-SAC · 2026-05-22T02:49:25Z

@kibabsquirrel Thanks for working on this.

Overall, I think the scope is reasonable. Focusing this PR on active progress counters, while leaving durable trace / raw stream persistence out of scope, keeps the change bounded.

The settled-usage snapshot approach also makes sense to me. Keeping /token-usage completed-only by default and requiring include_active=true for running snapshots is a good compatibility choice.

A few review notes:

Final progress may write counters more than once

Your trade-off note mentions leading + final snapshots. My main concern is the final snapshot part.

RunJournal.flush() writes a final progress snapshot through:

await self._progress_reporter(self.get_completion_data())

Then run_agent() immediately writes the authoritative final totals through:

completion = journal.get_completion_data()
await run_manager.update_run_completion(run_id, status=record.status.value, **completion)

Given the recent SQLite/write-pressure issues, I think we should avoid unnecessary duplicate writes on run finalization. Is the final progress write in flush() required, or can the authoritative final write stay only in update_run_completion()?

update_run_progress() semantics should be tightened

The PR describes this as active/running progress visibility, but update_run_progress() does not appear to guard that the run is still running.

I think we should either:

restrict progress updates to status == "running" at the store layer, or
rename/re-document it as a more general usage snapshot update if terminal runs may also be updated.

Otherwise this API may be easy to misuse later and could become semantic maintenance debt.

One test is timing-sensitive

test_throttled_progress_flush_emits_trailing_snapshot depends on a real sleep interval:

progress_flush_interval = 0.01
await asyncio.sleep(0.03)

It passed locally, but this kind of timing-based async test can become flaky under CI load. It may be more robust to drive it with an asyncio.Event, injected clock/sleep, or another deterministic synchronization point.

Minor maintainability note

The token/message counter fields are now repeated through RunRecord, RunResponse, RunStore, SQL store, and memory store. This is acceptable for this PR, but if run observability keeps expanding, it may be worth extracting a small RunUsageSnapshot / typed dict later to avoid field drift.

No blocker from tests/lint on my side, but I would prefer discussing points 1 and 2 with you before merge.

Copilot

Pull request overview

Adds near-real-time visibility into active run progress (tokens, LLM calls, message counts) by periodically persisting settled usage snapshots during execution, and optionally including running runs in thread-level token aggregation.

Changes:

Implement throttled progress snapshot reporting in RunJournal, wired from the worker to RunManager.update_run_progress().
Extend run stores/repository to persist progress snapshots and optionally include running runs in /token-usage aggregation (include_active=true).
Expose token/message counters on run list/detail responses and add/extend tests for the new behavior.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
backend/packages/harness/deerflow/runtime/journal.py	Adds throttled progress snapshot scheduling + final flush behavior.
backend/packages/harness/deerflow/runtime/runs/worker.py	Wires `RunJournal.progress_reporter` to persist active progress via `RunManager`.
backend/packages/harness/deerflow/runtime/runs/manager.py	Adds in-memory + store-backed `update_run_progress()` and hydrates counters from store rows.
backend/packages/harness/deerflow/runtime/runs/store/base.py	Extends the RunStore interface with `update_run_progress()` and `include_active` aggregation flag.
backend/packages/harness/deerflow/runtime/runs/store/memory.py	Implements progress snapshot persistence and `include_active` aggregation for the in-memory store.
backend/packages/harness/deerflow/persistence/run/sql.py	Implements SQL-backed `update_run_progress()` and `include_active` token aggregation.
backend/app/gateway/routers/thread_runs.py	Exposes counters on `RunResponse` and adds `include_active` query param to `/token-usage`.
backend/tests/test_run_journal.py	Adds tests covering snapshot emission + throttling/trailing flush behavior.
backend/tests/test_run_repository.py	Adds tests for `update_run_progress()` + `include_active` aggregation.
backend/tests/test_thread_token_usage.py	Adds API test for `include_active=true` behavior.

kibabsquirrel · 2026-05-22T04:30:54Z

@ShenAC-SAC Thanks, these points make sense.

For (1), I agree the final progress write in flush() was unnecessary in the worker path. I updated this so flush() only cancels pending delayed progress work and no longer performs an extra final progress write.

For (2), agreed as well. The intended semantics are active/running progress, not a general post-hoc usage update. I tightened the path so progress updates only apply to running runs, while terminal totals remain owned by update_run_completion().

For (3), I made the trailing-flush test less timing-sensitive by waiting on an asyncio.Event from the reporter instead of relying only on a fixed sleep.

For (4), agreed as a follow-up. I kept the fields explicit in this first-layer PR to avoid expanding the scope, but a small RunUsageSnapshot / typed dict would make sense if run observability keeps growing.

I also followed up on the Copilot comments around update_run_progress() defaults. The runtime path already reports full snapshots, but I tightened the interface so omitted counters are treated as “do not update” rather than defaulting to 0, which keeps the SQL and memory stores consistent for partial updates as well.

ShenAC-SAC · 2026-05-22T04:46:33Z

@kibabsquirrel In backend/packages/harness/deerflow/runtime/runs/worker.py, I noticed journal = None is assigned twice in run_agent(). This looks like a pre-existing cleanup issue, not introduced by this PR, but it may be worth removing the duplicate assignment to keep the code tidy.

kibabsquirrel · 2026-05-22T05:05:12Z

@ShenAC-SAC Good catch, I've cleaned it up in this PR

WillemJiang · 2026-05-22T08:26:39Z

@kibabsquirrel please fix the lint error with the below command

cd backend
make format

kibabsquirrel · 2026-05-22T09:17:27Z

@WillemJiang solved it. I will make sure it is checked by configuring git hooks😂

kibabsquirrel added 3 commits May 22, 2026 09:31

fix(runs): expose active progress counters

bc4c7ff

Merge branch 'main' of https://github.com/bytedance/deer-flow into fi…

58fbe99

…x/bug-005

fix(runs): avoid delayed progress flush on completion

6617730

ShenAC-SAC added the reviewing The PR is in reviewing status label May 22, 2026

WillemJiang requested a review from Copilot May 22, 2026 03:45

Copilot started reviewing on behalf of WillemJiang May 22, 2026 03:46 View session

Copilot AI reviewed May 22, 2026

View reviewed changes

Comment thread backend/packages/harness/deerflow/persistence/run/sql.py

Comment thread backend/packages/harness/deerflow/runtime/runs/store/base.py Outdated

fix(runs): tighten progress snapshot semantics

01ed846

ShenAC-SAC mentioned this pull request May 22, 2026

fix: harden run finalization persistence #3155

Merged

fix(runs): preserve omitted progress fields

2d768d0

chore(runs): remove duplicate journal initialization

702ac1d

kibabsquirrel force-pushed the fix/bug-005-active-progress-counters branch from 483b19f to 702ac1d Compare May 22, 2026 08:50

WillemJiang approved these changes May 22, 2026

View reviewed changes

WillemJiang merged commit 2eeb597 into bytedance:main May 22, 2026
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(runs): expose active progress counters#3148

fix(runs): expose active progress counters#3148
WillemJiang merged 6 commits into
bytedance:mainfrom
kibabsquirrel:fix/bug-005-active-progress-counters

kibabsquirrel commented May 22, 2026 •

edited

Loading

Uh oh!

kibabsquirrel commented May 22, 2026

Uh oh!

ShenAC-SAC commented May 22, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

kibabsquirrel commented May 22, 2026 •

edited

Loading

Uh oh!

ShenAC-SAC commented May 22, 2026

Uh oh!

kibabsquirrel commented May 22, 2026

Uh oh!

WillemJiang commented May 22, 2026

Uh oh!

kibabsquirrel commented May 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

kibabsquirrel commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Scope

Key Decisions

Why Not Durable Trace Here

Semantics

Test Plan

Uh oh!

kibabsquirrel commented May 22, 2026

Uh oh!

ShenAC-SAC commented May 22, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

kibabsquirrel commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ShenAC-SAC commented May 22, 2026

Uh oh!

kibabsquirrel commented May 22, 2026

Uh oh!

WillemJiang commented May 22, 2026

Uh oh!

kibabsquirrel commented May 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

kibabsquirrel commented May 22, 2026 •

edited

Loading

kibabsquirrel commented May 22, 2026 •

edited

Loading