[Native Image] pthread_setspecific: wrong arguments crash on shutdown — riscv64 only, root cause in PosixPlatformThreads.createUnmanagedThreadLocal() #13386

@gounthar

Describe the Issue

A native-image binary on riscv64 runs fine but crashes on the way out:

Hello, riscv64!
pthread_setspecific(key, value): wrong arguments

The Hello, riscv64! line appears, so the program's own code ran to completion. Shutdown is what fails. Exit code is non-zero; the process aborts after pthread_setspecific fails.

This seems to be riscv64-only. The same program compiles and exits cleanly on amd64 and aarch64. I spent a while assuming I'd built something wrong before the disassembly pointed somewhere more specific.

Using the latest version of GraalVM can resolve many issues.

(No riscv64 distribution exists yet; this issue is part of the effort to enable one.)

GraalVM Version

Built from gounthar/graal fork at commit 3575f11303fd (tracking 25.1.0-dev), JAVA_HOME pointing to labs-openjdk jvmci-25.1-b17 built natively on riscv64. I checked PosixPlatformThreads.java in the upstream oracle/graal tree and the same code is there.

Operating System and Version

Linux banana-pi-f3 6.6.66-3-deepin-riscv64 #1 SMP RISC-V GNU/Linux
Debian 13 (Trixie), rv64gc, SpacemiT K1

Troubleshooting Confirmation

Run Command

./helloworld   # native-image binary compiled with --tool:llvm-backend

Expected Behavior

Clean exit, code 0.

Actual Behavior

Hello, riscv64!
pthread_setspecific(key, value): wrong arguments

Non-zero exit (SIGABRT after pthread_setspecific fails its key validation).

Steps to Reproduce

  1. Build substratevm for riscv64 using labs-openjdk jvmci-25.1-b17 on a native riscv64 machine
  2. Compile a minimal Hello World: native-image --tool:llvm-backend -cp . HelloWorld -o helloworld
  3. Run ./helloworld

Additional Context

What I think is happening

As best I can tell, the crash path is: IsolateThreadCache.clear() -> PosixPlatformThreads.setUnmanagedThreadLocalValue() -> pthread_setspecific() with a garbage key.

I think the issue is in PosixPlatformThreads.createUnmanagedThreadLocal(). It calls StackValue.get(Pthread.pthread_key_tPointer.class) to allocate stack space for the key. Because pthread_key_tPointer is @CPointerTo(nameOfCType = "size_t"), StackValue allocates 8 bytes. pthread_key_create() then writes 4 bytes (the actual pthread_key_t, which is unsigned int on Linux). The subsequent key.read() does an 8-byte load, and the upper 32 bits are whatever happened to be on the stack.
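The slot behavior described above can be simulated in plain Java. This is only a sketch: it uses a little-endian ByteBuffer in place of the real stack slot, and the class name, method name, and stale value 0xDEADBEEFCAFEBABE are all made up for illustration. It does not call any SubstrateVM API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical simulation of the suspected bug, not SubstrateVM code.
final class KeySlotDemo {
    /** 8-byte slot with leftover contents, a 4-byte store of the key, then an 8-byte load. */
    static long readKeyAsEightBytes(long staleSlotContents, int key) {
        ByteBuffer slot = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        slot.putLong(0, staleSlotContents); // whatever happened to be on the stack
        slot.putInt(0, key);                // stands in for pthread_key_create writing 4 bytes
        return slot.getLong(0);             // stands in for key.read() loading 8 bytes
    }

    public static void main(String[] args) {
        long observed = readKeyAsEightBytes(0xDEADBEEFCAFEBABEL, 2);
        System.out.printf("0x%016x%n", observed); // prints 0xdeadbeef00000002
    }
}
```

The low 32 bits hold the real key (2); the high 32 bits are whatever the slot held before the store.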

Where amd64 and riscv64 seem to diverge: amd64 glibc validates the key with a 32-bit cmpl, so the garbage upper bits are never examined. riscv64 glibc uses a 64-bit bltu, so a key value like 0x????????00000002 looks >= 1024 (PTHREAD_KEYS_MAX), pthread_setspecific returns EINVAL, and the process aborts.

The disassembly seems consistent with this (from objdump -d):

addi a0,s0,-48          # a0 = &stack_slot (8 bytes, from StackValue.get)
jalr  (pthread_key_create@plt)   # writes 4 bytes into the slot
ld    s2,-48(s0)        # 8-byte load, upper 4 bytes are stack garbage
...
mv    a0,s2             # key with garbage upper bits
jalr  (pthread_setspecific@plt)  # EINVAL on riscv64, silent pass on amd64
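
The difference between the 32-bit and 64-bit range checks can be sketched in Java as follows. This is an illustration of the comparison widths only, not glibc's actual code; the class and method names are hypothetical, and the limit of 1024 is the figure quoted above.

```java
// Hypothetical illustration of the two comparison widths, not glibc code.
final class KeyRangeCheckDemo {
    static final int PTHREAD_KEYS_MAX = 1024; // limit quoted in the issue

    /** Models a 32-bit unsigned compare (as with cmpl): upper 32 bits are ignored. */
    static boolean validWith32BitCompare(long rawKey) {
        return Integer.compareUnsigned((int) rawKey, PTHREAD_KEYS_MAX) < 0;
    }

    /** Models a 64-bit unsigned compare (as with bltu): upper 32 bits take part. */
    static boolean validWith64BitCompare(long rawKey) {
        return Long.compareUnsigned(rawKey, PTHREAD_KEYS_MAX) < 0;
    }

    public static void main(String[] args) {
        long dirtyKey = 0xDEADBEEF00000002L; // key 2 with made-up garbage in the upper half
        System.out.println(validWith32BitCompare(dirtyKey)); // prints true
        System.out.println(validWith64BitCompare(dirtyKey)); // prints false
    }
}
```

With a clean key (upper half zero) both checks agree, which would explain why the problem only shows up when the slot happens to hold non-zero leftovers.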

What I tried

Zeroing the stack slot before calling pthread_key_create. In PosixPlatformThreads.java, createUnmanagedThreadLocal():

Pthread.pthread_key_tPointer key = StackValue.get(Pthread.pthread_key_tPointer.class);
// pthread_key_t is unsigned int (4 bytes) but StackValue allocates size_t (8 bytes).
// Zero the slot so the upper 4 bytes are clean on riscv64.
((WordPointer) key).write(Word.zero());
PosixUtils.checkStatusIs0(Pthread.pthread_key_create(key, Word.nullPointer()), ...);

With this change, the binary ran and exited cleanly on the F3. I'm not confident this is the right fix though. The deeper issue may be that pthread_key_tPointer is typed as size_t when the underlying C type is unsigned int; fixing the typedef might be cleaner, though I haven't worked out whether that ripples elsewhere. I'd rather flag it and be wrong than stay quiet.
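
To compare the two candidate remedies, here is a small simulation in plain Java, again with a ByteBuffer standing in for the stack slot. The names are hypothetical and nothing here is SubstrateVM API: zeroThenRead models the zeroing workaround, and readFourBytes models what a correctly sized (unsigned int) pointer type would presumably do.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical simulation of the two remedies, not SubstrateVM code.
final class KeySlotFixDemo {
    private static ByteBuffer slotWith(long staleContents) {
        ByteBuffer slot = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        slot.putLong(0, staleContents); // leftover stack contents
        return slot;
    }

    /** Workaround: zero all 8 bytes, let the 4-byte store happen, then load 8 bytes. */
    static long zeroThenRead(long staleContents, int key) {
        ByteBuffer slot = slotWith(staleContents);
        slot.putLong(0, 0L);  // models ((WordPointer) key).write(Word.zero())
        slot.putInt(0, key);  // models pthread_key_create writing 4 bytes
        return slot.getLong(0);
    }

    /** Alternative: load only the 4 bytes that were written and zero-extend them. */
    static long readFourBytes(long staleContents, int key) {
        ByteBuffer slot = slotWith(staleContents);
        slot.putInt(0, key);
        return Integer.toUnsignedLong(slot.getInt(0));
    }

    public static void main(String[] args) {
        System.out.println(zeroThenRead(0xDEADBEEFCAFEBABEL, 2));  // prints 2
        System.out.println(readFourBytes(0xDEADBEEFCAFEBABEL, 2)); // prints 2
    }
}
```

Both variants yield a clean key in this model. The second avoids depending on the slot's prior contents at all, which is the argument for fixing the type rather than zeroing.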

It's worth checking other StackValue.get(Pthread.pthread_key_tPointer.class) call sites too; if this pattern appears elsewhere, the same width mismatch would be there.

I'm not a SubstrateVM contributor and this is my first real read of this part of the codebase. I came at it from the riscv64 side, trying to get native-image working on the Banana Pi F3 as part of getting GraalVM onto riscv64 at all. I've signed (or will sign before this is acted on) the OCA.

If the fix is in the wrong place or the wrong approach, I'm happy to test patches on hardware; the F3 is available.
