fix(checkpointer): use AsyncConnectionPool for postgres to prevent stale connection errors (#3223)#3226
Conversation
…tale connection errors (#3223) Replace AsyncPostgresSaver.from_conn_string() with an explicit AsyncConnectionPool that has check_connection enabled, so dead idle connections are detected and replaced on checkout instead of raising OperationalError.
There was a problem hiding this comment.
Pull request overview
This PR addresses long-lived async Postgres checkpointer failures by switching from a single connection created via AsyncPostgresSaver.from_conn_string() to an explicitly managed psycopg_pool.AsyncConnectionPool configured to validate connections on checkout, reducing “connection is closed” OperationalErrors in idle scenarios.
Changes:
- Replace async Postgres checkpointer creation to use
AsyncConnectionPool(withdict_rowrow factory and connection checking) instead offrom_conn_string(). - Add a unit test asserting the async Postgres checkpointer path constructs the saver using a connection pool.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| backend/packages/harness/deerflow/runtime/checkpointer/async_provider.py | Build async Postgres checkpointer using AsyncConnectionPool and pass the pool into AsyncPostgresSaver. |
| backend/tests/test_checkpointer.py | Add an async test to validate the Postgres checkpointer uses a connection pool rather than from_conn_string(). |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Enable TCP keepalive probes on the AsyncConnectionPool to prevent idle postgres connections from being dropped by the server or network middleware. Combined with the existing check_connection callback, this provides defense-in-depth against stale connection errors. Fixes #3254
|
The direction looks good to me: replacing the single async Postgres connection with I’m good with merging this. Two non-blocking suggestions:
Also, the PR body mentions |
|
@rayhpeng Here's a summary of the changes 8f8bc60 async_provider.py — Extracted two helpers from the duplicated postgres code:
Both paths now call these shared helpers, so pool kwargs stay in sync by construction. test_checkpointer.py — Added test_database_postgres_uses_connection_pool that verifies the unified database.backend == "postgres" path creates an AsyncConnectionPool with keepalive, constructs the saver with |
…tale connection errors (bytedance#3223) (bytedance#3226) * fix(checkpointer): use AsyncConnectionPool for postgres to prevent stale connection errors (bytedance#3223) Replace AsyncPostgresSaver.from_conn_string() with an explicit AsyncConnectionPool that has check_connection enabled, so dead idle connections are detected and replaced on checkout instead of raising OperationalError. * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * Fixed the unit test error and lint error * fix(checkpointer): add TCP keepalive to postgres connection pool (bytedance#3254) Enable TCP keepalive probes on the AsyncConnectionPool to prevent idle postgres connections from being dropped by the server or network middleware. Combined with the existing check_connection callback, this provides defense-in-depth against stale connection errors. Fixes bytedance#3254 * Changed the code as review suggestion --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…tale connection errors (bytedance#3223) (bytedance#3226) * fix(checkpointer): use AsyncConnectionPool for postgres to prevent stale connection errors (bytedance#3223) Replace AsyncPostgresSaver.from_conn_string() with an explicit AsyncConnectionPool that has check_connection enabled, so dead idle connections are detected and replaced on checkout instead of raising OperationalError. * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * Fixed the unit test error and lint error * fix(checkpointer): add TCP keepalive to postgres connection pool (bytedance#3254) Enable TCP keepalive probes on the AsyncConnectionPool to prevent idle postgres connections from being dropped by the server or network middleware. Combined with the existing check_connection callback, this provides defense-in-depth against stale connection errors. Fixes bytedance#3254 * Changed the code as review suggestion --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Fixes #3223, #3254
Replace AsyncPostgresSaver.from_conn_string() with an explicit
AsyncConnectionPool that has check_connection enabled, so dead idle
connections are detected and replaced on checkout instead of raising
OperationalError.