Resume classic remoting reader after failed handoff#3205
Open
He-Pin wants to merge 1 commit into
Open
Conversation
fc181bd to
704455e
Compare
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes a classic remoting edge case where a passive read handoff can leave the original outbound reader stuck in notReading if the read-only handoff endpoint fails, preventing future inbound messages on the original association.
Changes:
- Track per-read-only-endpoint “resumable reader” fallback state and resume the original writable reader when the read-only endpoint terminates (only if remote address + UID still match).
- Add
EndpointWriter.ResumeReadinghandling acrossReliableDeliverySupervisor,EndpointWriter, andEndpointReaderto re-enable reads after stop/handoff. - Add regression coverage for resume-reading behavior and correct a stale passive-connections config reference in tests/docs.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| remote/src/main/scala/org/apache/pekko/remote/Remoting.scala | Adds fallback tracking for passive read handoffs and resumes original reader on read-only endpoint termination when safe. |
| remote/src/main/scala/org/apache/pekko/remote/Endpoint.scala | Introduces/handles ResumeReading across supervisor/writer/reader states, including gated/idle transitions. |
| remote/src/test/scala/org/apache/pekko/remote/ReliableDeliverySupervisorSpec.scala | Adds unit-level regression coverage for ResumeReading in idle/gated scenarios. |
| remote/src/test/scala/org/apache/pekko/remote/classic/RemotingSpec.scala | Adds an integration regression test covering failed passive handoff and successful subsequent delivery. |
| cluster/src/multi-jvm/scala/org/apache/pekko/cluster/RestartNodeSpec.scala | Updates commented config key to the correct classic remoting passive-connections setting. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pjfanning
reviewed
Jun 27, 2026
He-Pin
commented
Jun 27, 2026
| case s @ Send(_: SystemMessage, _, _, _) => tryBuffer(s.copy(seqOpt = Some(nextSeq()))) | ||
| case s: Send => context.system.deadLetters ! s | ||
| case EndpointWriter.FlushAndStop => context.stop(self) | ||
| case EndpointWriter.ResumeReading => |
Member
Author
There was a problem hiding this comment.
There was no such case, so it can never resume.
a29905f to
3b57554
Compare
He-Pin
added a commit
to He-Pin/incubator-pekko
that referenced
this pull request
Jun 27, 2026
Motivation: PR validation failed in BindCanonicalAddressSpec when tls-tcp used the reference.conf default keystore path. Modification: Reuse the existing ArterySpecSupport TLS test configuration for the bind canonical address spec so tls-tcp tests resolve the test keystore and truststore resources. Result: The TLS bind canonical address cases no longer fail with NoSuchFileException: keystore. Tests: - rtk bash -lc 'export JAVA_HOME=/Users/hepin/Library/Java/JavaVirtualMachines/azul-17.0.17/Contents/Home; export PATH="/Users/hepin/.sdkman/candidates/java/current/bin:/Users/hepin/.nvm/versions/node/v22.23.0/lib/node_modules/@openai/codex/node_modules/@openai/codex-darwin-arm64/vendor/aarch64-apple-darwin/codex-path:/Users/hepin/.nvm/versions/node/v22.23.0/lib/node_modules/@openai/codex/node_modules/@openai/codex-darwin-arm64/vendor/aarch64-apple-darwin/codex-path:/Users/hepin/.codex/tmp/arg0/codex-arg02IyHp2:/Applications/Wukong.app/Contents/Resources/resources/dingsync/realdoc.app/Contents/MacOS:/Users/hepin/.local/bin:/Users/hepin/.nvm/versions/node/v22.23.0/bin:/Users/hepin/.mimocode/bin:/Users/hepin/Library/Python/3.10/bin:/Users/hepin/.qoderwork/bin:/Users/hepin/.opencode/bin:/Users/hepin/.antigravity/antigravity/bin:/Users/hepin/bin:/Users/hepin/.sdkman/candidates/java/current/bin:/Users/hepin/Library/Java/JavaVirtualMachines/openjdk-25.0.2/Contents/Home/bin:/Applications/Wukong.app/Contents/Resources/resources/dingsync/realdoc.app/Contents/MacOS:/Users/hepin/.local/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/opt/pmk/env/global/bin:/Library/Apple/usr/bin:/Applications/Wireshark.app/Contents/MacOS:/Users/hepin/.tmo/bin:/Users/hepin/.local/bin:/Users/hepin/.cargo/bin:/Users/hepin/Library/Application Support/JetBrains/Toolbox/scripts:/Users/hepin/Library/Application Support/Coursier/bin:/Users/hepin/.local/bin"; java -version; sbt "remote / Test / testOnly org.apache.pekko.remote.artery.BindCanonicalAddressSpec"' / passed (15 tests) - rtk scalafmt --mode diff-ref=origin/main --non-interactive / passed - rtk scalafmt --list --mode diff-ref=origin/main --non-interactive / passed - rtk bash -lc 'git diff --check' / passed References: Refs apache#3205
062e3ef to
3b57554
Compare
He-Pin
added a commit
to He-Pin/incubator-pekko
that referenced
this pull request
Jun 27, 2026
Motivation: PR validation failed in BindCanonicalAddressSpec when tls-tcp used the reference.conf default keystore path. Modification: Reuse the existing ArterySpecSupport TLS test configuration for the bind canonical address spec so tls-tcp tests resolve the test keystore and truststore resources. Result: The TLS bind canonical address cases no longer fail with NoSuchFileException: keystore. Tests: - rtk bash -lc 'export JAVA_HOME=/Users/hepin/Library/Java/JavaVirtualMachines/azul-17.0.17/Contents/Home; export PATH="/Users/hepin/.sdkman/candidates/java/current/bin:/Users/hepin/.nvm/versions/node/v22.23.0/lib/node_modules/@openai/codex/node_modules/@openai/codex-darwin-arm64/vendor/aarch64-apple-darwin/codex-path:/Users/hepin/.nvm/versions/node/v22.23.0/lib/node_modules/@openai/codex/node_modules/@openai/codex-darwin-arm64/vendor/aarch64-apple-darwin/codex-path:/Users/hepin/.codex/tmp/arg0/codex-arg02IyHp2:/Applications/Wukong.app/Contents/Resources/resources/dingsync/realdoc.app/Contents/MacOS:/Users/hepin/.local/bin:/Users/hepin/.nvm/versions/node/v22.23.0/bin:/Users/hepin/.mimocode/bin:/Users/hepin/Library/Python/3.10/bin:/Users/hepin/.qoderwork/bin:/Users/hepin/.opencode/bin:/Users/hepin/.antigravity/antigravity/bin:/Users/hepin/bin:/Users/hepin/.sdkman/candidates/java/current/bin:/Users/hepin/Library/Java/JavaVirtualMachines/openjdk-25.0.2/Contents/Home/bin:/Applications/Wukong.app/Contents/Resources/resources/dingsync/realdoc.app/Contents/MacOS:/Users/hepin/.local/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/opt/pmk/env/global/bin:/Library/Apple/usr/bin:/Applications/Wireshark.app/Contents/MacOS:/Users/hepin/.tmo/bin:/Users/hepin/.local/bin:/Users/hepin/.cargo/bin:/Users/hepin/Library/Application Support/JetBrains/Toolbox/scripts:/Users/hepin/Library/Application Support/Coursier/bin:/Users/hepin/.local/bin"; java -version; sbt "remote / Test / testOnly org.apache.pekko.remote.artery.BindCanonicalAddressSpec"' / passed (15 tests) - rtk scalafmt --mode diff-ref=origin/main --non-interactive / passed - rtk scalafmt --list --mode diff-ref=origin/main --non-interactive / passed - rtk bash -lc 'git diff --check' / passed References: Refs apache#3205
He-Pin
added a commit
that referenced
this pull request
Jun 27, 2026
Motivation: PR validation failed in BindCanonicalAddressSpec when tls-tcp used the reference.conf default keystore path. Modification: Reuse the existing ArterySpecSupport TLS test configuration for the bind canonical address spec so tls-tcp tests resolve the test keystore and truststore resources. Result: The TLS bind canonical address cases no longer fail with NoSuchFileException: keystore. Tests: - rtk bash -lc 'export JAVA_HOME=/Users/hepin/Library/Java/JavaVirtualMachines/azul-17.0.17/Contents/Home; export PATH="/Users/hepin/.sdkman/candidates/java/current/bin:/Users/hepin/.nvm/versions/node/v22.23.0/lib/node_modules/@openai/codex/node_modules/@openai/codex-darwin-arm64/vendor/aarch64-apple-darwin/codex-path:/Users/hepin/.nvm/versions/node/v22.23.0/lib/node_modules/@openai/codex/node_modules/@openai/codex-darwin-arm64/vendor/aarch64-apple-darwin/codex-path:/Users/hepin/.codex/tmp/arg0/codex-arg02IyHp2:/Applications/Wukong.app/Contents/Resources/resources/dingsync/realdoc.app/Contents/MacOS:/Users/hepin/.local/bin:/Users/hepin/.nvm/versions/node/v22.23.0/bin:/Users/hepin/.mimocode/bin:/Users/hepin/Library/Python/3.10/bin:/Users/hepin/.qoderwork/bin:/Users/hepin/.opencode/bin:/Users/hepin/.antigravity/antigravity/bin:/Users/hepin/bin:/Users/hepin/.sdkman/candidates/java/current/bin:/Users/hepin/Library/Java/JavaVirtualMachines/openjdk-25.0.2/Contents/Home/bin:/Applications/Wukong.app/Contents/Resources/resources/dingsync/realdoc.app/Contents/MacOS:/Users/hepin/.local/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/opt/pmk/env/global/bin:/Library/Apple/usr/bin:/Applications/Wireshark.app/Contents/MacOS:/Users/hepin/.tmo/bin:/Users/hepin/.local/bin:/Users/hepin/.cargo/bin:/Users/hepin/Library/Application Support/JetBrains/Toolbox/scripts:/Users/hepin/Library/Application Support/Coursier/bin:/Users/hepin/.local/bin"; java -version; sbt "remote / Test / testOnly org.apache.pekko.remote.artery.BindCanonicalAddressSpec"' / passed (15 tests) - rtk scalafmt --mode diff-ref=origin/main --non-interactive / passed - rtk scalafmt --list --mode diff-ref=origin/main --non-interactive / passed - rtk bash -lc 'git diff --check' / passed References: Refs #3205
Motivation: Classic remoting passive read handoff can leave the original outbound reader stuck in notReading if the read-only handoff endpoint fails. Modification: Track passive read handoff fallback endpoints and resume the original writable reader when the read-only endpoint terminates while the original endpoint and UID still match. Carry fallback state across replacement read-only endpoints, clear stale fallback state on UID/address changes, handle ResumeReading in supervisor states, add logging, and add regression coverage. Also correct the stale classic remoting passive-connections config text. Result: Classic remoting can recover future inbound messages on the original association after a failed passive read handoff. Tests: - rtk sbt "remote / Test / testOnly org.apache.pekko.remote.ReliableDeliverySupervisorSpec" / passed (4 tests) - rtk sbt "remote / Test / testOnly org.apache.pekko.remote.classic.RemotingSpec -- -z \"resume the outbound reader\"" / passed - rtk sbt "remote / Test / testOnly org.apache.pekko.remote.classic.RemotingSpec" / passed (22 tests) - rtk scalafmt --list --mode diff-ref=origin/main / passed (JVM Unsafe warnings only) - rtk bash -lc 'git diff --check' / passed - rtk sbt sortImports / failed: Scalafix/scala-meta NoSuchMethodError in local tooling; unrelated generated change was reverted - qodercli stdout review / No must-fix findings - independent subAgent review / No must-fix findings References: Fixes apache#3206
e971975 to
76fbfda
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Classic remoting passive read handoff can leave the original outbound reader stuck in
notReadingif the read-only handoff endpoint fails.PR validation also exposed an unrelated
BindCanonicalAddressSpecfailure in thetls-tcpcases, where the spec fell back to thereference.confdefaultkeystoreandtruststorerelative paths.Modification
Track passive read handoff fallback endpoints and resume the original writable reader when the read-only endpoint terminates while the original endpoint and UID still match.
Carry fallback state across replacement read-only endpoints, clear stale fallback state on UID/address changes, handle
ResumeReadingin supervisor states, add logging, and add regression coverage. Also correct a stale classic remoting passive-connections config reference.Also include the TLS test configuration fix from #3210, reusing
ArterySpecSupport.tlsConfiginBindCanonicalAddressSpec.commonConfigso the existing PR validation can complete.Result
Classic remoting can recover future inbound messages on the original association after a failed passive read handoff.
The TLS bind canonical address cases no longer fail with
NoSuchFileException: keystoreduring PR validation.Tests
rtk sbt "remote / Test / testOnly org.apache.pekko.remote.ReliableDeliverySupervisorSpec"/ passed (4 tests)rtk sbt "remote / Test / testOnly org.apache.pekko.remote.classic.RemotingSpec -- -z \"resume the outbound reader\""/ passedrtk sbt "remote / Test / testOnly org.apache.pekko.remote.classic.RemotingSpec"/ passed (22 tests)rtk bash -lc "export JAVA_HOME=$(/usr/libexec/java_home -v 17); export PATH=\"$JAVA_HOME/bin:$PATH\"; java -version; sbt \"remote / Test / testOnly org.apache.pekko.remote.artery.BindCanonicalAddressSpec\""/ passed (15 tests)rtk scalafmt --list --mode diff-ref=origin/main/ passed (JVM Unsafe warnings only)rtk bash -lc "git diff --check"/ passedrtk sbt sortImports/ failed: Scalafix/scala-metaNoSuchMethodErrorin local tooling; unrelated generated change was revertedqoderclistdout review / No must-fix findingsReferences
Fixes #3206
Refs #3210