cts: load a bare HSA code object and fix CPU-coupled lifecycle tests #10

Open
zjgarvey wants to merge 1 commit into users/awoloszyn/updated_streaming from users/zjgarvey/cts-gpu-test-fixes

Contributor

@zjgarvey commented on May 27, 2026

The executable CTS compiled its noop kernel with `clang++ -x hip --offload-device-only -c`, which wraps the gfx code object in a `__CLANG_OFFLOAD_BUNDLE__` fat binary. The native loader wants a bare HSA code object and rejected it ("does not begin with ELF magic"). Pass `--no-gpu-bundle-output` so clang emits the unwrapped object.
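The rejection above comes down to the first bytes of the file. As a minimal sketch (not the loader's actual code; `ContainerKind` and `ClassifyCodeObject` are hypothetical names introduced here), a test helper could tell a bare ELF code object from a bundled one by checking the leading magic:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical helper type, not part of the CTS or the loader.
enum class ContainerKind { kElf, kClangOffloadBundle, kUnknown };

// Classifies a compiled device binary by its leading magic bytes.
inline ContainerKind ClassifyCodeObject(const void* data, std::size_t size) {
  assert(data != nullptr || size == 0);
  // Standard ELF identification bytes: 0x7F 'E' 'L' 'F'.
  static constexpr unsigned char kElfMagic[4] = {0x7F, 'E', 'L', 'F'};
  // Magic string at the start of a clang offload bundle.
  static constexpr std::string_view kBundleMagic = "__CLANG_OFFLOAD_BUNDLE__";

  if (size >= sizeof(kElfMagic) &&
      std::memcmp(data, kElfMagic, sizeof(kElfMagic)) == 0) {
    return ContainerKind::kElf;
  }
  if (size >= kBundleMagic.size() &&
      std::memcmp(data, kBundleMagic.data(), kBundleMagic.size()) == 0) {
    return ContainerKind::kClangOffloadBundle;
  }
  return ContainerKind::kUnknown;
}
```

A check like this in the build or test setup would turn a missing `--no-gpu-bundle-output` into an explicit "still bundled" message rather than a generic ELF-magic failure.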

A raw code object carries no HAL reflection, so the loader projects the kernel's whole kernarg segment as dispatch constants (`kernarg_segment_size / sizeof(uint32_t)`; the COV5 implicit kernarg block makes this 64 even for a no-argument kernel). The test now asserts `constant_count > 0` rather than `== 0` and feeds that many zero-filled constants to the dispatch, because the dispatch validates the constant data size.
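The arithmetic can be sketched as follows. This is an illustration only: the function names are hypothetical, and the 256-byte figure is simply what a count of 64 implies (64 × 4 bytes), not a value read from the loader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of 32-bit dispatch constants implied by a kernarg segment size
// given in bytes. Hypothetical helper mirroring the formula in the text.
inline std::size_t ConstantCountFromKernargSize(
    std::size_t kernarg_segment_size) {
  // Assumption: the segment size is a whole number of 32-bit words.
  assert(kernarg_segment_size % sizeof(std::uint32_t) == 0);
  return kernarg_segment_size / sizeof(std::uint32_t);
}

// Builds the zero-filled constant payload a test could pass to the dispatch.
inline std::vector<std::uint32_t> ZeroFilledConstants(std::size_t count) {
  return std::vector<std::uint32_t>(count, 0u);
}
```

For a 256-byte segment this yields 64 constants, so a test that sizes its payload from the reported `constant_count` keeps passing if the implicit block changes size, whereas a hard-coded count would not.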

The lifecycle suite's CPU-coupled cases assumed `main.cpp` had initialized the CPU accelerator, which it only does for a `cpu:N` device spec, so they failed on `gpu:0`. "CPU init and shutdown" is CPU-specific (it uses `cpu_device_count`) and now skips on non-CPU runs; "Double init returns ALREADY_EXISTS" initializes whichever accelerator matches the test device, which keeps coverage on both.
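The skip decision depends only on the kind of device named in the spec. A minimal sketch, assuming specs of the form `cpu:N` or `gpu:N`; `DeviceKind`, `DeviceSpec`, `ParseDeviceSpec`, and `ShouldSkipCpuOnlyCase` are hypothetical names and not taken from the CTS sources:

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical types for illustration only.
enum class DeviceKind { kCpu, kGpu };

struct DeviceSpec {
  DeviceKind kind;
  unsigned index;
};

// Parses "cpu:N" or "gpu:N". Returns std::nullopt for anything else.
inline std::optional<DeviceSpec> ParseDeviceSpec(std::string_view spec) {
  const std::size_t colon = spec.find(':');
  if (colon == std::string_view::npos) return std::nullopt;

  const std::string_view kind_text = spec.substr(0, colon);
  const std::string_view index_text = spec.substr(colon + 1);
  // Reject empty or very long indices so the loop below cannot overflow.
  if (index_text.empty() || index_text.size() > 9) return std::nullopt;

  unsigned index = 0;
  for (char c : index_text) {
    if (c < '0' || c > '9') return std::nullopt;
    index = index * 10 + static_cast<unsigned>(c - '0');
  }

  if (kind_text == "cpu") return DeviceSpec{DeviceKind::kCpu, index};
  if (kind_text == "gpu") return DeviceSpec{DeviceKind::kGpu, index};
  return std::nullopt;
}

// A CPU-specific case runs only when the test device is a CPU.
inline bool ShouldSkipCpuOnlyCase(const DeviceSpec& spec) {
  return spec.kind != DeviceKind::kCpu;
}
```

With this shape, a CPU-only case would skip on `gpu:0` and run on `cpu:0`, while a device-agnostic case such as the double-init check could branch on `spec.kind` to pick the accelerator it initializes.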

Verified on `gpu:0` (MI300X/gfx942): `hrx_cts_executable` passes (22 assertions); `hrx_cts_lifecycle` has 3 passed / 1 skipped. On `cpu:0`, the executable suite returns early (GPU-only) and the lifecycle suite passes 6 assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@stellaraccident requested a review from AWoloszyn on May 29, 2026 00:04