fix: only add string_view life support for transient sources#6096
fix: only add string_view life support for transient sources#6096henryiii wants to merge 3 commits into
Conversation
b8a651e to
a46ca00
Compare
13bc9b4 to
c31659c
Compare
| // a bound function (no life support frame) the latter cannot be made safe, so `add_patient` | ||
| // rightly throws rather than producing a dangling view. | ||
| inline bool &loading_from_transient_source() { | ||
| static thread_local bool flag = false; |
There was a problem hiding this comment.
This seems like an ugly hack. Surely claude can suggest a different way?
There was a problem hiding this comment.
We don't pass anything through, though? Didn't seem obvious to me but I can poke at it a bit.
|
Should we rollback #6092 until this or similar solution is submitted? The bug is real but the problems caused by that change are not worth it with this one. |
I haven't look at this PR yet, but rolling back #6092 sounds like the right step, to cleanly go back to the original, long-lived baseline. @charlesbeattie if you send the rollback, I'll merge it straightaway. |
…ind#6092)" This re-applies pybind#6092 (reverting pybind#6097) so the follow-up fixes in this PR can build on it. Assisted-by: ClaudeCode:claude-opus-4.8
PR pybind#6092 added loader_life_support::add_patient(src) to keep the source object alive when loading a string view, fixing a real use-after-free when a container of views is built from a non-sequence iterable (e.g. a generator): list_caster materializes a temporary tuple that owns the strings and destroys it when load() returns, before the bound function body runs. add_patient throws when there is no life support frame, so casting to a view outside a bound function (e.g. a manual py::cast<std::string_view>) now raises instead of relying on the caller-owned source, a regression from pybind#6092. For these view-into-src cases registration is best effort: inside a bound function it keeps src alive (fixing the UAF), and outside one the caller owns src's lifetime as before. Add try_add_patient(), which returns false instead of throwing when there is no frame, and use it at the three view load sites. add_patient() keeps its strict contract for value-creating conversions. Assisted-by: ClaudeCode:claude-opus-4.8
Refine the previous commit. Best-effort registration (try_add_patient) silently produces a dangling view when a container of views is built from a generator outside a bound function: there the materialized temporary is released before the view is used, and with no frame nothing keeps it alive. Such a cast cannot be made safe, so it should fail loudly, while a view into a durable, caller-owned object needs no life support at all. The view caster cannot tell a durable source from a pybind11-managed transient one; that provenance lives in the container caster. Introduce an ambient transient_source_guard that the list, set, map, and array casters set around their generator/materialized paths, and have the string caster keep the source alive only when loading from a transient source (via the throwing add_patient, so try_add_patient is no longer needed). This means: - views into durable sources (direct arguments, sequences, manual casts) add no life support and no longer throw outside a bound function, and - a generator used outside a frame throws, rather than silently dangling. The guard restores (rather than clears) the previous value, so a durable container nested in a transient one is correctly treated as transient. Verified with AddressSanitizer: the in-frame generator case is clean, the out-of-frame durable cases succeed, and the out-of-frame generator case throws. Assisted-by: ClaudeCode:claude-opus-4.8
c31659c to
e18b834
Compare
|
🤖 AI text below 🤖 I now fully understand the mechanism. Here's the key finding: The There's a much simpler alternative. if (IsView && loader_life_support::active()) {
loader_life_support::add_patient(src);
}This deletes the thread-local flag, the There's exactly one behavioral tradeoff, which is the whole reason the guard was introduced, so I want to confirm it with you before finalizing the plan. The guard-free approach (gate |
|
(The guard-free approach is commit 2 already, so leaving commit 3 so reviewers can see both approaches) |
🤖 AI text below 🤖
Description
Re-applies #6092 and fixes the regression that caused it to be reverted.
#6092 was reverted on master in #6097 because, while it fixed a real use-after-free, it also introduced a regression (details below). This PR re-introduces #6092 and then corrects it, so it is structured as three commits:
Background
#6092 added
loader_life_support::add_patient(src)whenever a string view (std::string_view) is loaded, to fix a real use-after-free: when a container of views is built from a non-sequence iterable (e.g. a generator), the container caster materializes a temporary (atuple/dict, or the generator's per-iteration items) that owns the strings and releases it before the bound function body runs, leaving the views dangling. Confirmed under AddressSanitizer (freed-by:pybind11::tuple::~tuple()→tupledealloc→unicode_dealloc; UAF read in the function body). So the fix is needed — simply reverting #6092 reintroduces the UAF.However,
add_patientthrows when there is no life support frame (i.e. outside a bound function dispatch), so casting a view via a manualpy::cast<std::string_view>(obj)in helper/embedding code started throwing even though the caller ownsobj— a regression hit in real codebases. That regression is why #6092 was reverted.Approach
The view caster cannot tell a durable, caller-owned source from a transient, pybind11-managed one (a tuple materialized from a generator, or a generator element the iterator only borrows). That provenance lives in the container caster. This PR introduces an ambient
detail::transient_source_guardthat thelist,set,map, andarraycasters set around their generator/materialized paths, and the string caster keeps the source alive only when loading from a transient source.Result:
This addresses both goals: life support is added only when actually needed, and a generator used outside a frame fails loudly instead of silently dangling. The guard restores (rather than clears) the previous value, so a durable container nested in a transient one is correctly treated as transient.
The pre-existing
add_patient(utfNbytes)for UTF-16/32 views stays unconditional: there the view points into a freshly re-encoded temporary that always needs life support regardless of source.Tests
test_interpreter.cpp(no-frame context, whichm.deflambdas can't reproduce since they always have a frame): casting a view from a durable source (str/bytes/bytearray/list) outside a bound function succeeds; castingstd::vector<std::string_view>from a generator outside a bound function throwscast_error.test_stlgenerator test continues to cover the in-frame UAF (CI runs it under ASAN).Verified locally with a standalone libpython+ASAN harness across the full matrix above: in-frame generator is ASAN-clean, out-of-frame durable cases succeed, out-of-frame generator throws.
Suggested changelog entry:
std::string_view) are now kept alive only when loaded from a transient source (such as a generator), fixing both a use-after-free and a regression where casting a view from a durable object outside a bound function would throw.