fix(checkpointer): use AsyncConnectionPool for postgres to prevent stale connection errors (#3223) #3226

Merged
WillemJiang merged 6 commits into main from fix-3223 on Jun 1, 2026

@WillemJiang (Collaborator) commented May 26, 2026

Fixes #3223, #3254

Replace AsyncPostgresSaver.from_conn_string() with an explicit
AsyncConnectionPool that has check_connection enabled, so dead idle
connections are detected and replaced on checkout instead of raising
OperationalError.
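
A minimal sketch of the change described above, not the PR's actual diff. It assumes psycopg_pool 3.2 or later (where the pool accepts a `check` callback) and a langgraph-checkpoint-postgres release whose `AsyncPostgresSaver` accepts a pool as `conn`. The names `pool_connection_kwargs` and `open_postgres_checkpointer` are hypothetical, and the `autocommit` / `prepare_threshold` values are an assumption meant to mirror what `from_conn_string()` sets on its single connection.

```python
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator


def pool_connection_kwargs() -> dict[str, Any]:
    """Per-connection settings (assumed values, see the note above)."""
    return {"autocommit": True, "prepare_threshold": 0}


@asynccontextmanager
async def open_postgres_checkpointer(conn_string: str) -> AsyncIterator[Any]:
    # Imported lazily so this module still loads without the postgres extras.
    from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
    from psycopg.rows import dict_row
    from psycopg_pool import AsyncConnectionPool

    pool = AsyncConnectionPool(
        conn_string,
        kwargs={**pool_connection_kwargs(), "row_factory": dict_row},
        # Validate each connection on checkout; a dead one is discarded
        # and replaced instead of being handed to the caller.
        check=AsyncConnectionPool.check_connection,
        open=False,
    )
    async with pool:  # opens the pool on entry, closes it on exit
        saver = AsyncPostgresSaver(conn=pool)
        await saver.setup()
        yield saver
```

Used as `async with open_postgres_checkpointer(conn_string) as saver: ...`, the saver borrows a checked connection from the pool for each operation rather than holding one long-lived connection.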

Copilot AI (Contributor) left a comment

Pull request overview

This PR addresses long-lived async Postgres checkpointer failures by switching from a single connection created via AsyncPostgresSaver.from_conn_string() to an explicitly managed psycopg_pool.AsyncConnectionPool configured to validate connections on checkout, reducing “connection is closed” OperationalErrors in idle scenarios.

Changes:

  • Switch async Postgres checkpointer creation from from_conn_string() to AsyncConnectionPool (with the dict_row row factory and connection checking).
  • Add a unit test asserting the async Postgres checkpointer path constructs the saver using a connection pool.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File and description:

  • backend/packages/harness/deerflow/runtime/checkpointer/async_provider.py: Build async Postgres checkpointer using AsyncConnectionPool and pass the pool into AsyncPostgresSaver.
  • backend/tests/test_checkpointer.py: Add an async test to validate the Postgres checkpointer uses a connection pool rather than from_conn_string().

Comment thread backend/tests/test_checkpointer.py Outdated
WillemJiang and others added 3 commits May 26, 2026 10:56
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
  Enable TCP keepalive probes on the AsyncConnectionPool to prevent
  idle postgres connections from being dropped by the server or network
  middleware. Combined with the existing check_connection callback, this
  provides defense-in-depth against stale connection errors.

  Fixes #3254
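
As a rough illustration of the keepalive commit message above: libpq exposes TCP keepalive through the `keepalives`, `keepalives_idle`, `keepalives_interval` and `keepalives_count` connection parameters, which psycopg accepts as connection keyword arguments. The helper names and the numeric values below are assumptions chosen for the example, not the values used in this PR.

```python
def keepalive_kwargs(idle: int = 30, interval: int = 10, count: int = 5) -> dict[str, int]:
    """libpq keepalive settings (illustrative defaults, not the PR's values)."""
    return {
        "keepalives": 1,                  # turn TCP keepalive on
        "keepalives_idle": idle,          # idle seconds before the first probe
        "keepalives_interval": interval,  # seconds between unanswered probes
        "keepalives_count": count,        # unanswered probes before giving up
    }


def approx_dead_peer_detection_seconds(kwargs: dict[str, int]) -> int:
    """Rough upper bound on how long a silently dropped peer goes unnoticed."""
    return kwargs["keepalives_idle"] + kwargs["keepalives_interval"] * kwargs["keepalives_count"]
```

These entries would be merged into the `kwargs` dictionary given to `AsyncConnectionPool`, so every pooled connection is opened with them. Keepalive probes keep idle connections from looking dead to network middleware, while the checkout check catches any connection that died anyway.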
@rayhpeng (Collaborator)

The direction looks good to me: replacing the single async Postgres connection with AsyncConnectionPool, enabling checkout validation via check_connection, and adding TCP keepalive should make the async checkpointer much more resilient to stale idle connections. CI is green, and the earlier async context manager test issue appears to be addressed.

I’m good with merging this. Two non-blocking suggestions:

  1. Consider extracting the duplicated Postgres pool construction into a small helper, since both the legacy checkpointer path and the unified database path now need to stay in sync.
  2. Consider adding a test for the unified database.backend == "postgres" path as well, since the PR changes that path too.

Also, the PR body mentions Fixes #3223, while the latest commit mentions Fixes #3254; it may be worth clarifying whether this PR is intended to close both issues.

@WillemJiang (Collaborator, Author) commented May 29, 2026

@rayhpeng Here's a summary of the changes in 8f8bc60:

async_provider.py — Extracted two helpers from the duplicated postgres code:

  • _build_postgres_pool(conn_string) — constructs the AsyncConnectionPool with keepalive + connection checking (was copy-pasted in both _async_checkpointer and _async_checkpointer_from_database)
  • _ensure_postgres_imports() — validates and returns the postgres dependencies (was also duplicated)

Both paths now call these shared helpers, so pool kwargs stay in sync by construction.

test_checkpointer.py: Added test_database_postgres_uses_connection_pool, which verifies that the unified database.backend == "postgres" path creates an AsyncConnectionPool with keepalive, constructs the saver with conn=pool, and calls setup(). Mirrors the existing legacy-path test.
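
The three assertions listed above can be sketched with `unittest.mock` alone. This is not the repository's test: `make_checkpointer` is a hypothetical stand-in for the code under test that takes the pool and saver classes as parameters, whereas the real test more likely patches the imported modules.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def make_checkpointer(conn_string, pool_cls, saver_cls):
    """Hypothetical stand-in for the code under test, with injected classes."""
    pool = pool_cls(conn_string, kwargs={"keepalives": 1}, open=False)
    async with pool:
        saver = saver_cls(conn=pool)
        await saver.setup()
        return saver


def check_postgres_path_uses_pool() -> bool:
    # MagicMock supports "async with"; make __aexit__ falsy so that
    # exceptions raised inside the block are not silently swallowed.
    pool = MagicMock(name="pool")
    pool.__aexit__.return_value = False
    pool_cls = MagicMock(name="AsyncConnectionPool", return_value=pool)
    saver = MagicMock(name="saver")
    saver.setup = AsyncMock()
    saver_cls = MagicMock(name="AsyncPostgresSaver", return_value=saver)

    result = asyncio.run(
        make_checkpointer("postgresql://example/db", pool_cls, saver_cls)
    )

    pool_cls.assert_called_once()                                   # pool created
    assert pool_cls.call_args.kwargs["kwargs"]["keepalives"] == 1   # with keepalive
    saver_cls.assert_called_once_with(conn=pool)                    # saver built on pool
    saver.setup.assert_awaited_once()                               # setup() awaited
    return result is saver
```

Asserting on the constructor arguments rather than on a live database keeps the test fast and lets it run where no Postgres server is available.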

BTW, #3254 is the same issue as #3223.

@WillemJiang WillemJiang merged commit 031d6fb into main Jun 1, 2026
12 checks passed
@WillemJiang WillemJiang deleted the fix-3223 branch June 1, 2026 01:05
zhongli-sz pushed a commit to zhongli-sz/deer-flow that referenced this pull request Jun 1, 2026
…tale connection errors (bytedance#3223) (bytedance#3226)

* fix(checkpointer): use AsyncConnectionPool for postgres to prevent stale connection errors (bytedance#3223)

  Replace AsyncPostgresSaver.from_conn_string() with an explicit
  AsyncConnectionPool that has check_connection enabled, so dead idle
  connections are detected and replaced on checkout instead of raising
  OperationalError.

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Fixed the unit test error and lint error

* fix(checkpointer): add TCP keepalive to postgres connection pool (bytedance#3254)

  Enable TCP keepalive probes on the AsyncConnectionPool to prevent
  idle postgres connections from being dropped by the server or network
  middleware. Combined with the existing check_connection callback, this
  provides defense-in-depth against stale connection errors.

  Fixes bytedance#3254

* Changed the code as review suggestion

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Wingxxx pushed a commit to Wingxxx/deer-flow that referenced this pull request Jun 1, 2026
Development

Successfully merging this pull request may close these issues.

Using postgres as the backend storage leads to database connection errors after a long time

3 participants