fix: preserve Docker runner exit status#13402
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @Zakharden. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Signed-off-by: Zakhar Dvurechensky <72825626+Zakharden@users.noreply.github.com>
87ec70e to
e08eeda
Compare
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Fixes a bug where the local Docker runner returned a success exit code (0) even when the container process failed, because auto_remove=True could cause inspection to fail with NotFound.
Changes:
- Switch
auto_removefromTruetoFalseso the container persists until its exit status is read. - Move container removal into a
finallyblock viacontainer.remove(force=True). - Update existing tests for the new
auto_removevalue and add a regression test verifying the exit code is propagated and cleanup occurs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| sdk/python/kfp/local/docker_task_handler.py | Disables auto-remove, reads StatusCode, and removes container in a finally block. |
| sdk/python/kfp/local/docker_task_handler_test.py | Updates auto_remove expectations and adds a regression test for exit status propagation and removal. |
Description of your changes
Fixes the local Docker runner path so container failures are propagated back to callers.
The previous implementation used
auto_remove=Trueand then inspected the container afterrun(). When Docker removed the container before inspection,docker.errors.NotFoundwas treated as success and the local task returned0, even if the container process failed.This change keeps the container available until its status is read, returns the Docker
StatusCode, and removes the container in a cleanup block. The tests cover the regression and the cleanup behavior.Fixes #13332
How has this code been tested?
python3 -m py_compile sdk/python/kfp/local/docker_task_handler.py sdk/python/kfp/local/docker_task_handler_test.pyI could not run the unit test module locally because this environment is missing test/runtime dependencies:
python3 -m pytest ...failed becausepytestis not installedPYTHONPATH=sdk/python python3 -m unittest ...failed becausedocstring_parseris not installedChecklist