Skip to content

fix(runs): expose active progress counters#3148

Merged
WillemJiang merged 6 commits into
bytedance:mainfrom
kibabsquirrel:fix/bug-005-active-progress-counters
May 22, 2026
Merged

fix(runs): expose active progress counters#3148
WillemJiang merged 6 commits into
bytedance:mainfrom
kibabsquirrel:fix/bug-005-active-progress-counters

Conversation

@kibabsquirrel
Copy link
Copy Markdown
Contributor

@kibabsquirrel kibabsquirrel commented May 22, 2026

Summary

related to #3116
Fixes the active-run visibility part of BUG-005 by exposing near-real-time progress counters for running runs.

During long runs, RunJournal already accumulates token usage and message counts, but those values were only written to runs at completion. This made active runs appear as total_tokens=0, llm_call_count=0, and message_count=0 until the worker finalized the run.

This PR adds a throttled active progress snapshot path:

  • RunJournal reports accumulated usage snapshots after LLM/subagent usage is settled.
  • RunManager / RunStore persist progress without changing run status.
  • RunResponse exposes token/message counters for run detail and list APIs.
  • /api/threads/{thread_id}/token-usage?include_active=true can include running run snapshots while preserving the completed-only default.

Progress flushing uses leading + trailing throttling: the first settled usage update is reported promptly, and updates suppressed by the throttle window are followed by a trailing flush with the latest snapshot. This avoids stale active counters during long tool/file/network phases after a burst of LLM calls.

Scope

This implements layer 1 of BUG-005: active progress counters.

It intentionally does not implement layer 2 durable structured trace or layer 3 raw stream frame persistence.

Key Decisions

  • Use settled usage snapshots rather than token-by-token telemetry, because provider usage is reliably available after LLM/subagent calls complete and this keeps progress writes bounded to roughly one write per throttle interval per active run.
  • Use leading + trailing throttling so active counters update promptly without writing on every callback or stream chunk.
  • Keep /token-usage completed-only by default, with include_active=true as an explicit opt-in for callers that want running snapshots included.
  • Persist progress only for running runs without changing status; the final completion path remains the authoritative source of final totals.
  • Treat omitted progress fields as “do not update” rather than defaulting them to 0, so partial progress updates remain safe and consistent across stores.

Why Not Durable Trace Here

Durable structured trace is a separate design concern. DeerFlow already has run_events and external tracing integrations, but making local trace durable by default requires decisions around backend defaults, payload size limits, sensitive content, retention, event schema, and query semantics.

Keeping this PR focused avoids mixing active cost visibility with trace persistence and keeps the database write path lightweight.

Semantics

Progress counters are snapshots of settled usage, not token-by-token telemetry. They update after LLM/subagent usage is available and are throttled to avoid high-frequency writes.

The final completion path writes the authoritative final totals.

Test Plan

  • PYTHONPATH=. uv run pytest tests/test_run_repository.py tests/test_run_journal.py tests/test_thread_token_usage.py tests/test_worker_langfuse_metadata.py tests/test_persistence_scaffold.py -q
  • uv run ruff check packages/harness/deerflow/runtime/journal.py packages/harness/deerflow/runtime/runs/manager.py packages/harness/deerflow/runtime/runs/store/memory.py packages/harness/deerflow/runtime/runs/store/base.py packages/harness/deerflow/persistence/run/sql.py tests/test_run_journal.py tests/test_run_repository.py

@kibabsquirrel
Copy link
Copy Markdown
Contributor Author

A few trade-off notes:

This PR uses settled usage snapshots, not token/chunk-level telemetry. With the current 5s throttle, bursty updates are coalesced into roughly one progress write per interval per active run, plus leading/final snapshots.

/token-usage keeps the completed-only default; include_active=true only opts into running run rows and does not add checkpoint/event scans.

I also left interrupted-run accounting unchanged here, since that feels like a separate product/accounting semantic decision from active progress visibility.

@ShenAC-SAC
Copy link
Copy Markdown
Collaborator

@kibabsquirrel Thanks for working on this.

Overall, I think the scope is reasonable. Focusing this PR on active progress counters, while leaving durable trace / raw stream persistence out of scope, keeps the change bounded.

The settled-usage snapshot approach also makes sense to me. Keeping /token-usage completed-only by default and requiring include_active=true for running snapshots is a good compatibility choice.

A few review notes:

  1. Final progress may write counters more than once

Your trade-off note mentions leading + final snapshots. My main concern is the final snapshot part.

RunJournal.flush() writes a final progress snapshot through:

await self._progress_reporter(self.get_completion_data())

Then run_agent() immediately writes the authoritative final totals through:

completion = journal.get_completion_data()
await run_manager.update_run_completion(run_id, status=record.status.value, **completion)

Given the recent SQLite/write-pressure issues, I think we should avoid unnecessary duplicate writes on run finalization. Is the final progress write in flush() required, or can the authoritative final write stay only in update_run_completion()?

  1. update_run_progress() semantics should be tightened

The PR describes this as active/running progress visibility, but update_run_progress() does not appear to guard that the run is still running.

I think we should either:

  • restrict progress updates to status == "running" at the store layer, or
  • rename/re-document it as a more general usage snapshot update if terminal runs may also be updated.

Otherwise this API may be easy to misuse later and could become semantic maintenance debt.

  1. One test is timing-sensitive

test_throttled_progress_flush_emits_trailing_snapshot depends on a real sleep interval:

progress_flush_interval = 0.01
await asyncio.sleep(0.03)

It passed locally, but this kind of timing-based async test can become flaky under CI load. It may be more robust to drive it with an asyncio.Event, injected clock/sleep, or another deterministic synchronization point.

  1. Minor maintainability note

The token/message counter fields are now repeated through RunRecord, RunResponse, RunStore, SQL store, and memory store. This is acceptable for this PR, but if run observability keeps expanding, it may be worth extracting a small RunUsageSnapshot / typed dict later to avoid field drift.

No blocker from tests/lint on my side, but I would prefer discussing points 1 and 2 with you before merge.

@ShenAC-SAC ShenAC-SAC added the reviewing The PR is in reviewing status label May 22, 2026
@WillemJiang WillemJiang requested a review from Copilot May 22, 2026 03:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds near-real-time visibility into active run progress (tokens, LLM calls, message counts) by periodically persisting settled usage snapshots during execution, and optionally including running runs in thread-level token aggregation.

Changes:

  • Implement throttled progress snapshot reporting in RunJournal, wired from the worker to RunManager.update_run_progress().
  • Extend run stores/repository to persist progress snapshots and optionally include running runs in /token-usage aggregation (include_active=true).
  • Expose token/message counters on run list/detail responses and add/extend tests for the new behavior.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
backend/packages/harness/deerflow/runtime/journal.py Adds throttled progress snapshot scheduling + final flush behavior.
backend/packages/harness/deerflow/runtime/runs/worker.py Wires RunJournal.progress_reporter to persist active progress via RunManager.
backend/packages/harness/deerflow/runtime/runs/manager.py Adds in-memory + store-backed update_run_progress() and hydrates counters from store rows.
backend/packages/harness/deerflow/runtime/runs/store/base.py Extends the RunStore interface with update_run_progress() and include_active aggregation flag.
backend/packages/harness/deerflow/runtime/runs/store/memory.py Implements progress snapshot persistence and include_active aggregation for the in-memory store.
backend/packages/harness/deerflow/persistence/run/sql.py Implements SQL-backed update_run_progress() and include_active token aggregation.
backend/app/gateway/routers/thread_runs.py Exposes counters on RunResponse and adds include_active query param to /token-usage.
backend/tests/test_run_journal.py Adds tests covering snapshot emission + throttling/trailing flush behavior.
backend/tests/test_run_repository.py Adds tests for update_run_progress() + include_active aggregation.
backend/tests/test_thread_token_usage.py Adds API test for include_active=true behavior.

Comment thread backend/packages/harness/deerflow/persistence/run/sql.py
Comment thread backend/packages/harness/deerflow/runtime/runs/store/base.py Outdated
@kibabsquirrel
Copy link
Copy Markdown
Contributor Author

kibabsquirrel commented May 22, 2026

@ShenAC-SAC Thanks, these points make sense.

For (1), I agree the final progress write in flush() was unnecessary in the worker path. I updated this so flush() only cancels pending delayed progress work and no longer performs an extra final progress write.

For (2), agreed as well. The intended semantics are active/running progress, not a general post-hoc usage update. I tightened the path so progress updates only apply to running runs, while terminal totals remain owned by update_run_completion().

For (3), I made the trailing-flush test less timing-sensitive by waiting on an asyncio.Event from the reporter instead of relying only on a fixed sleep.

For (4), agreed as a follow-up. I kept the fields explicit in this first-layer PR to avoid expanding the scope, but a small RunUsageSnapshot / typed dict would make sense if run observability keeps growing.

I also followed up on the Copilot comments around update_run_progress() defaults. The runtime path already reports full snapshots, but I tightened the interface so omitted counters are treated as “do not update” rather than defaulting to 0, which keeps the SQL and memory stores consistent for partial updates as well.

@ShenAC-SAC
Copy link
Copy Markdown
Collaborator

@kibabsquirrel In backend/packages/harness/deerflow/runtime/runs/worker.py, I noticed journal = None is assigned twice in run_agent(). This looks like a pre-existing cleanup issue, not introduced by this PR, but it may be worth removing the duplicate assignment to keep the code tidy.

@kibabsquirrel
Copy link
Copy Markdown
Contributor Author

@ShenAC-SAC Good catch, I've cleaned it up in this PR

@WillemJiang
Copy link
Copy Markdown
Collaborator

@kibabsquirrel please fix the lint error with the below command

cd backend
make format

@kibabsquirrel kibabsquirrel force-pushed the fix/bug-005-active-progress-counters branch from 483b19f to 702ac1d Compare May 22, 2026 08:50
@kibabsquirrel
Copy link
Copy Markdown
Contributor Author

@WillemJiang solved it. I will make sure it is checked by configuring git hooks😂

@WillemJiang WillemJiang merged commit 2eeb597 into bytedance:main May 22, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

reviewing The PR is in reviewing status

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants