chore: add sandbox memory profiling tools #3249
Open
LittleChenLiya wants to merge 4 commits into
@copilot review
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds tooling and documentation to baseline and compare Kubernetes sandbox memory usage, plus provisioner toggles to disable in-sandbox services to reduce idle memory.
Changes:
- Introduces a kubectl-based memory profiling script that outputs JSON/Markdown and can optionally sample top RSS processes.
- Adds provisioner environment flags to pass DISABLE_JUPYTER/DISABLE_CODE_SERVER into sandbox pods.
- Adds docs and tests covering the new profiling workflow and provisioner behavior.
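
For readers who want a feel for what a kubectl-based snapshot involves, here is a minimal sketch, not the script from this PR. It assumes the input is the plain text printed by `kubectl top pod` (three columns: name, CPU, memory) and turns it into JSON-friendly rows plus a Markdown table. All function names (`parse_memory`, `parse_cpu_millicores`, `parse_top_pods`, `to_markdown`) are hypothetical and chosen only for illustration.

```python
import re

# Suffixes used by Kubernetes resource quantities (binary and decimal).
_UNITS = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '850Mi' or '1Gi' into bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", quantity.strip())
    if match is None:
        raise ValueError(f"unrecognized memory quantity: {quantity!r}")
    number, suffix = match.groups()
    if suffix and suffix not in _UNITS:
        raise ValueError(f"unknown memory suffix: {suffix!r}")
    return int(float(number) * _UNITS.get(suffix, 1))


def parse_cpu_millicores(quantity: str) -> int:
    """Convert '12m' (millicores) or '1' (whole cores) into millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_top_pods(text: str) -> list:
    """Parse `kubectl top pod` text into rows sorted by memory, largest first."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and anything with an unexpected shape.
        if len(parts) != 3 or parts[0] == "NAME":
            continue
        name, cpu, memory = parts
        rows.append({
            "pod": name,
            "cpu_millicores": parse_cpu_millicores(cpu),
            "memory_bytes": parse_memory(memory),
        })
    return sorted(rows, key=lambda row: row["memory_bytes"], reverse=True)


def to_markdown(rows: list) -> str:
    """Render parsed rows as a Markdown table with memory in MiB."""
    lines = ["| Pod | CPU (m) | Memory (MiB) |", "|---|---|---|"]
    for row in rows:
        mib = row["memory_bytes"] / 1024**2
        lines.append(f"| {row['pod']} | {row['cpu_millicores']} | {mib:.1f} |")
    return "\n".join(lines)
```

Keeping the parsing separate from the `kubectl` invocation means the logic can be unit-tested against captured text without a cluster, which matches the PR's stated goal of reproducible data.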
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| scripts/sandbox_memory_profile.py | New CLI script to snapshot pod memory/CPU and (optionally) per-process RSS into JSON/Markdown reports. |
| docker/provisioner/app.py | Adds env-driven toggles to inject DISABLE_JUPYTER / DISABLE_CODE_SERVER into sandbox container spec. |
| docker/provisioner/README.md | Documents new provisioner env vars for disabling Jupyter and code-server. |
| docker/docker-compose.yaml | Wires new env vars through docker-compose deployment. |
| docker/docker-compose-dev.yaml | Wires new env vars through dev docker-compose deployment. |
| backend/tests/test_sandbox_memory_profile_script.py | Adds unit tests for parsing/merging/reporting logic in the new script. |
| backend/tests/test_provisioner_pvc_volumes.py | Adds tests asserting sandbox env toggles are included/omitted as configured. |
| backend/docs/SANDBOX_MEMORY_PROFILING.md | New guide describing what to measure and how to capture/compare memory snapshots. |
| backend/docs/README.md | Links the new sandbox memory profiling guide from docs index. |
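
The env-driven toggles described for docker/provisioner/app.py could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in this PR: the provisioner-side flag names (`SANDBOX_DISABLE_JUPYTER`, `SANDBOX_DISABLE_CODE_SERVER`), the accepted truthy strings, the injected value `"true"`, and the helper name `build_sandbox_env` are all hypothetical. Only the injected variable names DISABLE_JUPYTER and DISABLE_CODE_SERVER come from the review summary.

```python
import os
from typing import Dict, List, Mapping

# Strings treated as "enabled" (an assumption, not taken from the PR).
_TRUTHY = {"1", "true", "yes", "on"}

# Hypothetical provisioner-side flag -> variable injected into the sandbox container.
_TOGGLES = {
    "SANDBOX_DISABLE_JUPYTER": "DISABLE_JUPYTER",
    "SANDBOX_DISABLE_CODE_SERVER": "DISABLE_CODE_SERVER",
}


def build_sandbox_env(environ: Mapping[str, str] = os.environ) -> List[Dict[str, str]]:
    """Return container env entries for every toggle that is switched on.

    Toggles that are unset or falsy are omitted entirely, so the sandbox
    image keeps its default behavior unless a flag is explicitly enabled.
    """
    env = []
    for flag, target in _TOGGLES.items():
        if environ.get(flag, "").strip().lower() in _TRUTHY:
            env.append({"name": target, "value": "true"})
    return env
```

The returned list uses the `name`/`value` shape of a Kubernetes container `env` field, so it can be merged into a pod spec dictionary. Omitting disabled toggles rather than emitting `"false"` is a design choice that mirrors the "included/omitted as configured" behavior the tests in this PR are described as asserting.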
Problem Cause
The AIO sandbox starts several long-running services while idle, and Pod memory can approach 1 GiB. Changing Kubernetes requests/limits or reducing concurrency alone cannot explain each sandbox's baseline memory, so reproducible profiling data is needed first.
Changes
Related Issue
Related to #3213