cts: load a bare HSA code object and fix CPU-coupled lifecycle tests #10
Open
zjgarvey wants to merge 1 commit into
The executable CTS compiled its noop kernel with `clang++ -x hip
--offload-device-only -c`, which wraps the gfx code object in a
__CLANG_OFFLOAD_BUNDLE__ fat binary. The native loader wants a bare HSA code
object and rejected it ("does not begin with ELF magic"). Pass
--no-gpu-bundle-output so clang emits the unwrapped object.
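
For reference, the two container formats can be told apart by their leading bytes. The sketch below is illustrative only and is not the loader's code: `ClassifyImage` and `ImageKind` are hypothetical names, and the only facts it relies on are the 4-byte ELF magic (`0x7f 'E' 'L' 'F'`) and the `__CLANG_OFFLOAD_BUNDLE__` marker named above.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical classification of a compiled device image by its leading bytes.
enum class ImageKind { kBareElf, kOffloadBundle, kUnknown };

inline ImageKind ClassifyImage(std::string_view data) {
  // ELF files start with 0x7f 'E' 'L' 'F'. The literal is split so the hex
  // escape does not swallow the following 'E'.
  constexpr std::string_view kElfMagic("\x7f" "ELF", 4);
  // Uncompressed clang offload bundles start with this ASCII marker.
  constexpr std::string_view kBundleMagic = "__CLANG_OFFLOAD_BUNDLE__";

  // substr clamps the length, so short inputs compare unequal instead of
  // reading out of bounds.
  if (data.substr(0, kElfMagic.size()) == kElfMagic) {
    return ImageKind::kBareElf;
  }
  if (data.substr(0, kBundleMagic.size()) == kBundleMagic) {
    return ImageKind::kOffloadBundle;
  }
  return ImageKind::kUnknown;
}
```

A check along these lines in a test fixture would flag a bundled image before it reaches the loader, rather than relying on the loader's "does not begin with ELF magic" error.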
A raw code object carries no HAL reflection, so the loader projects the
kernel's whole kernarg segment as dispatch constants (kernarg_segment_size /
sizeof(uint32_t); the COV5 implicit kernarg block makes this 64 even for a
no-argument kernel). Assert constant_count > 0 rather than == 0, and feed that
many zero-filled constants to the dispatch, which validates the size.
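
As a worked instance of the arithmetic: a 64-constant projection corresponds to a 256-byte kernarg segment (64 × 4; the byte size is inferred from the count above, not read from the loader). A minimal sketch with hypothetical helper names, not the CTS code itself:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of 32-bit dispatch constants needed to cover a kernarg segment of
// the given size in bytes. Hypothetical helper for illustration.
inline std::size_t ConstantCountForKernarg(std::size_t kernarg_segment_size) {
  // Assumption: the segment size is a whole number of 32-bit words.
  assert(kernarg_segment_size % sizeof(std::uint32_t) == 0);
  return kernarg_segment_size / sizeof(std::uint32_t);
}

// Zero-filled constant buffer of exactly the count the executable reports,
// so a dispatch that validates the constant count accepts it.
inline std::vector<std::uint32_t> MakeZeroConstants(std::size_t constant_count) {
  return std::vector<std::uint32_t>(constant_count, 0u);
}
```

With these helpers, `ConstantCountForKernarg(256)` yields 64, and `MakeZeroConstants(64)` produces the zero-filled buffer a no-argument kernel would be dispatched with.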
The lifecycle suite's CPU-coupled cases assumed main.cpp had initialized the
CPU accelerator, which it only does on a cpu:N device spec, so they failed on
gpu:0. "CPU init and shutdown" is CPU-specific (cpu_device_count) and now skips
on non-CPU runs; "Double init returns ALREADY_EXISTS" initializes whichever
accelerator matches the test device, keeping coverage on both.
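
The gating can be sketched as below, assuming the device spec is a string of the form `cpu:N` or `gpu:N` as in the runs reported here. `DeviceKind`, `ParseDeviceKind`, and `ShouldSkipCpuOnlyCase` are hypothetical names for illustration; the suite's real helpers and skip mechanism are not shown.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical accelerator kinds derived from a device spec such as "gpu:0".
enum class DeviceKind { kCpu, kGpu, kUnknown };

inline DeviceKind ParseDeviceKind(std::string_view spec) {
  // Take everything before the first ':'; if there is no ':', find() returns
  // npos and substr() yields the whole string.
  const std::string_view prefix = spec.substr(0, spec.find(':'));
  if (prefix == "cpu") return DeviceKind::kCpu;
  if (prefix == "gpu") return DeviceKind::kGpu;
  return DeviceKind::kUnknown;
}

// A CPU-specific case (for example "CPU init and shutdown") only makes sense
// when the test device is a CPU; on any other device it should be skipped.
inline bool ShouldSkipCpuOnlyCase(DeviceKind kind) {
  return kind != DeviceKind::kCpu;
}
```

Under this sketch a `gpu:0` run skips the CPU-only case, while the double-init case would branch on the parsed kind and initialize the matching accelerator.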
Verified on gpu:0 (MI300X/gfx942): hrx_cts_executable passes (22 assertions);
hrx_cts_lifecycle has 3 passed / 1 skipped. On cpu:0: hrx_cts_executable
returns early (GPU-only) and hrx_cts_lifecycle passes 6 assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>