Skip to content

fix: only add string_view life support for transient sources#6096

Open
henryiii wants to merge 3 commits into
pybind:masterfrom
henryiii:fix-string-view-life-support-no-frame
Open

fix: only add string_view life support for transient sources#6096
henryiii wants to merge 3 commits into
pybind:masterfrom
henryiii:fix-string-view-life-support-no-frame

Conversation

@henryiii

@henryiii henryiii commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

🤖 AI text below 🤖

Description

Re-applies #6092 and fixes the regression that caused it to be reverted.

#6092 was reverted on master in #6097 because, while it fixed a real use-after-free, it also introduced a regression (details below). This PR re-introduces #6092 and then corrects it, so it is structured as three commits:

  1. revert the revert — re-applies fix: add life support to handles cast to string_view #6092 on top of the current (reverted) master.
  2. don't throw from string_view life support outside a bound function
  3. only add string_view life support for transient sources

Background

#6092 added loader_life_support::add_patient(src) whenever a string view (std::string_view) is loaded, to fix a real use-after-free: when a container of views is built from a non-sequence iterable (e.g. a generator), the container caster materializes a temporary (a tuple/dict, or the generator's per-iteration items) that owns the strings and releases it before the bound function body runs, leaving the views dangling. Confirmed under AddressSanitizer (freed-by: pybind11::tuple::~tuple()tupledeallocunicode_dealloc; UAF read in the function body). So the fix is needed — simply reverting #6092 reintroduces the UAF.

However, add_patient throws when there is no life support frame (i.e. outside a bound function dispatch), so casting a view via a manual py::cast<std::string_view>(obj) in helper/embedding code started throwing even though the caller owns obj — a regression hit in real codebases. That regression is why #6092 was reverted.

Approach

The view caster cannot tell a durable, caller-owned source from a transient, pybind11-managed one (a tuple materialized from a generator, or a generator element the iterator only borrows). That provenance lives in the container caster. This PR introduces an ambient detail::transient_source_guard that the list, set, map, and array casters set around their generator/materialized paths, and the string caster keeps the source alive only when loading from a transient source.

Result:

source inside bound function outside (no frame)
durable (direct arg, list/set/dict, manual cast) no life support added works (no throw)
transient (generator/iterator) kept alive via life support throws (cannot be made safe)

This addresses both goals: life support is added only when actually needed, and a generator used outside a frame fails loudly instead of silently dangling. The guard restores (rather than clears) the previous value, so a durable container nested in a transient one is correctly treated as transient.

The pre-existing add_patient(utfNbytes) for UTF-16/32 views stays unconditional: there the view points into a freshly re-encoded temporary that always needs life support regardless of source.

Tests

  • New Catch2 tests in test_interpreter.cpp (no-frame context, which m.def lambdas can't reproduce since they always have a frame): casting a view from a durable source (str/bytes/bytearray/list) outside a bound function succeeds; casting std::vector<std::string_view> from a generator outside a bound function throws cast_error.
  • fix: add life support to handles cast to string_view #6092's existing test_stl generator test continues to cover the in-frame UAF (CI runs it under ASAN).

Verified locally with a standalone libpython+ASAN harness across the full matrix above: in-frame generator is ASAN-clean, out-of-frame durable cases succeed, out-of-frame generator throws.

Suggested changelog entry:

  • String views (e.g. std::string_view) are now kept alive only when loaded from a transient source (such as a generator), fixing both a use-after-free and a regression where casting a view from a durable object outside a bound function would throw.

@henryiii henryiii force-pushed the fix-string-view-life-support-no-frame branch from b8a651e to a46ca00 Compare June 26, 2026 15:12
@henryiii henryiii changed the title fix: don't throw from string_view life support outside a bound function fix: only add string_view life support for transient sources Jun 26, 2026
@henryiii henryiii force-pushed the fix-string-view-life-support-no-frame branch 3 times, most recently from 13bc9b4 to c31659c Compare June 26, 2026 16:05
// a bound function (no life support frame) the latter cannot be made safe, so `add_patient`
// rightly throws rather than producing a dangling view.
inline bool &loading_from_transient_source() {
static thread_local bool flag = false;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an ugly hack. Surely claude can suggest a different way?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't pass anything through, though? Didn't seem obvious to me but I can poke at it a bit.

@charlesbeattie

Copy link
Copy Markdown
Contributor

Should we rollback #6092 until this or similar solution is submitted? The bug is real but the problems caused by that change are not worth it with this one.

@rwgk

rwgk commented Jun 28, 2026

Copy link
Copy Markdown
Collaborator

Should we rollback #6092 until this or similar solution is submitted? The bug is real but the problems caused by that change are not worth it with this one.

I haven't look at this PR yet, but rolling back #6092 sounds like the right step, to cleanly go back to the original, long-lived baseline. @charlesbeattie if you send the rollback, I'll merge it straightaway.

henryiii added a commit that referenced this pull request Jun 29, 2026
)

This reverts commit 59d7cb2 (#6092).

#6092 introduced regressions that are being addressed in #6096; reverting

Assisted-by: ClaudeCode:claude-opus-4.8
henryiii added 3 commits June 29, 2026 00:42
…ind#6092)"

This re-applies pybind#6092 (reverting pybind#6097) so the follow-up fixes in this PR can build on it.

Assisted-by: ClaudeCode:claude-opus-4.8
PR pybind#6092 added loader_life_support::add_patient(src) to keep the source
object alive when loading a string view, fixing a real use-after-free when
a container of views is built from a non-sequence iterable (e.g. a
generator): list_caster materializes a temporary tuple that owns the
strings and destroys it when load() returns, before the bound function
body runs.

add_patient throws when there is no life support frame, so casting to a
view outside a bound function (e.g. a manual py::cast<std::string_view>)
now raises instead of relying on the caller-owned source, a regression
from pybind#6092.

For these view-into-src cases registration is best effort: inside a bound
function it keeps src alive (fixing the UAF), and outside one the caller
owns src's lifetime as before. Add try_add_patient(), which returns false
instead of throwing when there is no frame, and use it at the three view
load sites. add_patient() keeps its strict contract for value-creating
conversions.

Assisted-by: ClaudeCode:claude-opus-4.8
Refine the previous commit. Best-effort registration (try_add_patient)
silently produces a dangling view when a container of views is built from
a generator outside a bound function: there the materialized temporary is
released before the view is used, and with no frame nothing keeps it
alive. Such a cast cannot be made safe, so it should fail loudly, while a
view into a durable, caller-owned object needs no life support at all.

The view caster cannot tell a durable source from a pybind11-managed
transient one; that provenance lives in the container caster. Introduce an
ambient transient_source_guard that the list, set, map, and array casters
set around their generator/materialized paths, and have the string caster
keep the source alive only when loading from a transient source (via the
throwing add_patient, so try_add_patient is no longer needed). This means:

- views into durable sources (direct arguments, sequences, manual casts)
  add no life support and no longer throw outside a bound function, and
- a generator used outside a frame throws, rather than silently dangling.

The guard restores (rather than clears) the previous value, so a durable
container nested in a transient one is correctly treated as transient.

Verified with AddressSanitizer: the in-frame generator case is clean, the
out-of-frame durable cases succeed, and the out-of-frame generator case
throws.

Assisted-by: ClaudeCode:claude-opus-4.8
@henryiii henryiii force-pushed the fix-string-view-life-support-no-frame branch from c31659c to e18b834 Compare June 29, 2026 04:43
@henryiii

Copy link
Copy Markdown
Collaborator Author

🤖 AI text below 🤖

I now fully understand the mechanism. Here's the key finding:

The transient_source_guard exists for exactly one reason: to carry "is this source transient?" provenance from the container casters down into the string_caster, so the view caster can throw loudly when a generator-sourced view is cast outside a bound function (no life-support frame).

There's a much simpler alternative. loader_life_support already knows whether a frame is active (tls_current_frame() != nullptr); it just doesn't expose it. The view caster's only problem in #6092 was that add_patient throws when there's no frame. So instead of distinguishing transient vs. durable, we can simply gate the call on whether a frame exists:

if (IsView && loader_life_support::active()) {
    loader_life_support::add_patient(src);
}

This deletes the thread-local flag, the transient_source_guard class, and all four stl.h container edits — collapsing the change back to roughly #6092 plus a one-line guard and a small accessor.

There's exactly one behavioral tradeoff, which is the whole reason the guard was introduced, so I want to confirm it with you before finalizing the plan.

The guard-free approach (gate add_patient on frame presence) changes one edge case: casting a view from a generator/iterator OUTSIDE a bound function (e.g. a manual py::cast<vector<string_view>>(gen) at top level). With the guard it throws a clear cast_error; without it, it silently produces a dangling view again (same as pre-#6092). The in-frame case — real function arguments, which is the actual UAF #6092 fixed — is fully safe either way. Is losing that loud out-of-frame throw acceptable? (Drop guard, accept tradeoff / Keep the guard)

@henryiii

Copy link
Copy Markdown
Collaborator Author

(The guard-free approach is commit 2 already, so leaving commit 3 so reviewers can see both approaches)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants