bytedance · LittleChenLiya · May 26, 2026 · May 28, 2026 · May 30, 2026 · May 30, 2026
diff --git a/backend/docs/README.md b/backend/docs/README.md
@@ -19,6 +19,7 @@ This directory contains detailed documentation for the DeerFlow backend.
 | [STREAMING.md](STREAMING.md) | Token-level streaming design: Gateway vs DeerFlowClient paths, `stream_mode` semantics, per-id dedup |
 | [FILE_UPLOAD.md](FILE_UPLOAD.md) | File upload functionality |
 | [PATH_EXAMPLES.md](PATH_EXAMPLES.md) | Path types and usage examples |
+| [SANDBOX_MEMORY_PROFILING.md](SANDBOX_MEMORY_PROFILING.md) | Sandbox memory baseline and runtime comparison guide |
 | [summarization.md](summarization.md) | Context summarization feature |
 | [plan_mode_usage.md](plan_mode_usage.md) | Plan mode with TodoList |
 | [AUTO_TITLE_GENERATION.md](AUTO_TITLE_GENERATION.md) | Automatic title generation |

diff --git a/backend/docs/SANDBOX_MEMORY_PROFILING.md b/backend/docs/SANDBOX_MEMORY_PROFILING.md
@@ -0,0 +1,94 @@
+# Sandbox Memory Profiling
+
+This guide describes how to capture a repeatable memory baseline before
+changing the sandbox runtime. Issue #3213 reports per-sandbox memory near
+1 GiB in Kubernetes. Before adding or recommending a new provider, capture the
+current AIO sandbox baseline and compare candidates with the same DeerFlow workload.
+
+## What to Measure
+
+Measure at least these samples:
+
+1. Empty sandbox after it becomes ready.
+2. After a simple bash command.
+3. After a Python task that imports common packages.
+4. After a Node task when Node-based workloads are expected.
+5. After generating files under `/mnt/user-data/outputs`.
+6. After release and warm reuse.
+7. At the target concurrency level, for example 10, 50, or 100 sandboxes.
+
+`kubectl top` reports Kubernetes/container working set memory. Treat it as a
+capacity signal, not exclusive RSS/PSS. Pod-level memory includes every
+container in the Pod and may include cache charged to the cgroup. If a result
+looks surprising, inspect the sandbox processes and cgroup metrics on the node
+before drawing conclusions.
+
+## Capture a Snapshot
+
+Run this from the repository root:
+
+```bash
+python scripts/sandbox_memory_profile.py \
+  --namespace deer-flow \
+  --selector app=deer-flow-sandbox \
+  --sample empty \
+  --include-processes \
+  --format markdown
+```
+
+Use a descriptive `--sample` value for each phase:
+
+```bash
+python scripts/sandbox_memory_profile.py --sample after-bash --format json
+python scripts/sandbox_memory_profile.py --sample after-python --format json
+python scripts/sandbox_memory_profile.py --sample after-artifact --format json
+```
+
+`--include-processes` runs `kubectl exec ... ps` in each sandbox Pod and adds
+the highest-RSS processes to the report. This helps distinguish Pod-level cgroup
+memory from process RSS. The two numbers will not match exactly because cgroup
+memory can include cache and other kernel-accounted memory.
+
+Save the raw JSON when comparing backends so totals, pod names, images,
+requests, limits, and timestamps can be audited later.
+
+## Known AIO Memory Levers
+
+The AIO sandbox image supports `DISABLE_JUPYTER` and `DISABLE_CODE_SERVER`.
+In provisioner mode, set `SANDBOX_DISABLE_JUPYTER=true` and
+`SANDBOX_DISABLE_CODE_SERVER=true` when the deployment does not need the in-sandbox
+Jupyter or code-server services. Other values such as `1` or `yes` are ignored.
+
+On a local kind baseline, the default idle sandbox was about 0.8 GiB. Disabling
+Jupyter and code-server reduced the idle Pod to about 0.4 GiB. Temporarily
+stopping browser, VNC, browser MCP, and Openbox reduced it further to about
+0.18 GiB, so browser-stack lazy startup should be evaluated as a separate,
+larger change.
+
+## Candidate Runtime Matrix
+
+For AIO, CubeSandbox, OpenSandbox, gVisor, Kata, or another candidate, compare
+the same workload and record:
+
+| Area | Required Evidence |
+| --- | --- |
+| Capacity | Pod or instance count, total memory, average memory, max memory |
+| Startup | Ready latency at 1, 10, 50, and 100 concurrent sandboxes |
+| Commands | Bash output, timeout behavior, failure shape |
+| Files | `read_file`, `write_file`, binary `update_file`, `list_dir`, `glob`, `grep` |
+| Uploads | Files uploaded by the gateway are visible inside the sandbox |
+| Artifacts | Files written to `/mnt/user-data/outputs` are readable by the backend artifact API |
+| Paths | `/mnt/user-data/workspace`, `/mnt/user-data/uploads`, `/mnt/user-data/outputs`, `/mnt/acp-workspace`, and skills paths keep their expected semantics |
+| Isolation | Different users and threads cannot read each other's data |
+| Cleanup | Release, idle timeout, process restart, and orphan cleanup free resources |
+| Operations | Deployment prerequisites, privileged components, networking, storage, and upgrade path |
+
+## PR Guidance
+
+Do not claim that a new provider fixes high-concurrency memory usage until the
+same DeerFlow workload has been measured on both the current AIO sandbox and the
+candidate backend.
+
+For an experimental provider PR, prefer `Related to #3213` unless the PR also
+includes reproducible DeerFlow workload data that demonstrates the target memory
+reduction and preserves uploads, outputs, artifacts, and isolation behavior.
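
The Capacity row in the runtime matrix asks for Pod count, total, average, and max memory. As a rough illustration of how those figures could be derived from `kubectl top pod --no-headers` text, here is a minimal sketch. It is not the `scripts/sandbox_memory_profile.py` implementation, which this diff does not show; `parse_memory` and `summarize` are hypothetical names, and the sample Pod names and values are made up.

```python
import re

# Binary and decimal suffixes used by Kubernetes memory quantities.
_UNITS = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}
_QUANTITY = re.compile(r"^(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti|k|M|G|T)?$")


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '812Mi' to bytes (hypothetical helper)."""
    match = _QUANTITY.match(quantity.strip())
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit or ""])


def summarize(top_output: str) -> dict:
    """Summarize `kubectl top pod --no-headers` text: NAME CPU MEMORY per line."""
    usage = {}
    for line in top_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        usage[fields[0]] = parse_memory(fields[2])
    if not usage:
        return {"pods": 0, "total_bytes": 0, "avg_bytes": 0, "max_bytes": 0}
    total = sum(usage.values())
    return {
        "pods": len(usage),
        "total_bytes": total,
        "avg_bytes": total // len(usage),
        "max_bytes": max(usage.values()),
    }


# Illustrative input only; these are not measured values.
sample = "sandbox-a   12m   812Mi\nsandbox-b   3m    410Mi\n"
print(summarize(sample))
```

These totals are still working-set figures, so the caveat from "What to Measure" applies: they are a capacity signal and not exclusive RSS/PSS.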
diff --git a/backend/tests/test_provisioner_pvc_volumes.py b/backend/tests/test_provisioner_pvc_volumes.py
@@ -153,6 +153,37 @@ def test_pod_spec_has_volume_mounts(self, provisioner_module):
         pod = provisioner_module._build_pod("sandbox-1", "thread-1")
         assert len(pod.spec.containers[0].volume_mounts) == 2
 
+    def test_pod_includes_configured_sandbox_env(self, provisioner_module):
+        """Provisioner should pass optional sandbox memory-saving toggles to Pods."""
+        provisioner_module.SANDBOX_DISABLE_JUPYTER = " TRUE "
+        provisioner_module.SANDBOX_DISABLE_CODE_SERVER = "true"
+
+        pod = provisioner_module._build_pod("sandbox-1", "thread-1")
+
+        env = {item.name: item.value for item in pod.spec.containers[0].env}
+        assert env == {
+            "DISABLE_JUPYTER": "true",
+            "DISABLE_CODE_SERVER": "true",
+        }
+
+    def test_pod_omits_disabled_sandbox_env(self, provisioner_module):
+        """False values should preserve the sandbox image defaults."""
+        provisioner_module.SANDBOX_DISABLE_JUPYTER = "false"
+        provisioner_module.SANDBOX_DISABLE_CODE_SERVER = ""
+
+        pod = provisioner_module._build_pod("sandbox-1", "thread-1")
+
+        assert pod.spec.containers[0].env == []
+
+    def test_pod_omits_non_true_sandbox_env_values(self, provisioner_module):
+        """Only explicit true should opt in to sandbox service disable flags."""
+        provisioner_module.SANDBOX_DISABLE_JUPYTER = "1"
+        provisioner_module.SANDBOX_DISABLE_CODE_SERVER = "yes"
+
+        pod = provisioner_module._build_pod("sandbox-1", "thread-1")
+
+        assert pod.spec.containers[0].env == []
+
     def test_pod_pvc_mode_uses_user_scoped_subpath(self, provisioner_module):
         """Pod should use a user-scoped subPath for PVC user-data."""
         provisioner_module.SKILLS_PVC_NAME = "skills-pvc"
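
The three new tests above pin down the opt-in rule for the sandbox toggles: a value is forwarded only when it equals `true` after trimming whitespace and lowercasing, and it is always forwarded as the literal `"true"`. The following is a minimal sketch of logic that would satisfy those assertions. `sandbox_env_from_flags` is a hypothetical helper, and it returns plain dicts for illustration; the actual `_build_pod` code is not part of this excerpt and presumably builds Kubernetes client objects instead.

```python
def sandbox_env_from_flags(disable_jupyter: str, disable_code_server: str) -> list[dict]:
    """Return env entries only for flags explicitly set to 'true'.

    Hypothetical helper: mirrors the behaviour the tests assert, not the
    provisioner's real implementation.
    """
    flags = {
        "DISABLE_JUPYTER": disable_jupyter,
        "DISABLE_CODE_SERVER": disable_code_server,
    }
    return [
        {"name": name, "value": "true"}  # normalized value, regardless of input casing
        for name, raw in flags.items()
        if (raw or "").strip().lower() == "true"
    ]


# Same inputs as the tests: " TRUE " opts in; "false", "", "1", "yes" do not.
print(sandbox_env_from_flags(" TRUE ", "true"))
print(sandbox_env_from_flags("false", ""))
print(sandbox_env_from_flags("1", "yes"))
```

Accepting only `true` keeps the image defaults in place whenever a value is ambiguous, which matches the intent stated in the test docstrings.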