[CI] run_llama.sh: Replace llama-cli usage with llama-bench #2231
Merged
mhalk merged 1 commit on May 26, 2026
- llama-cli tends to hang during the prefetch/load step; llama-bench handles this step more robustly.
  - Note: cold and warm caches give the same performance results.
- Fixed conformity with the current (changed) cache directory structure.
- Support fallback as before.
PR author: Let's see how this performs :)
Motivation
Intermittent hangs of `llama-cli` processes during the prefetch step.

Technical Details
This avoids the `llamacpp-perf` hang by removing the normal-path `llama-cli -hf ... --prompt /exit -st` prefetch step. Modern `llama-bench` supports `-hf` directly, so when no cached GGUF is available, the script now lets `llama-bench` resolve, download, and benchmark the requested HF model itself.

The existing cached-model behavior is preserved: if cached GGUF files are found, the script still benchmarks them through the existing `llama-bench -m` loop. The fallback also handles the current HF cache layout by finding the symlinked GGUFs in the snapshot directories.

Test Plan
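The cached-vs-download dispatch described above can be sketched roughly as follows. This is a minimal sketch, not the actual `run_llama.sh` code: `find_cached_ggufs`, `bench_model`, and the `LLAMA_BENCH` override are hypothetical names, the `-hf` usage follows the description above, and any additional `llama-bench` arguments the real script passes are omitted.

```shell
# Hypothetical sketch; function and variable names are illustrative,
# not taken from run_llama.sh.
LLAMA_BENCH="${LLAMA_BENCH:-llama-bench}"  # override (e.g. with echo) for a dry run

# List every *.gguf under a cache directory, one path per line.
# -L makes find follow symlinks, so HF-style snapshots/<rev>/*.gguf links
# into blobs/ are reported (-type f then tests the link target), while
# dangling links are skipped. Plain files in a flat layout also match.
find_cached_ggufs() {
    local cache_dir="$1"
    [ -d "$cache_dir" ] || return 0
    find -L "$cache_dir" -type f -name '*.gguf' 2>/dev/null | sort
}

# Benchmark cached GGUFs via -m if any exist; otherwise let llama-bench
# resolve and download the model itself via -hf.
bench_model() {
    local cache_dir="$1" hf_model="$2" files
    files=$(find_cached_ggufs "$cache_dir")
    if [ -z "$files" ]; then
        "$LLAMA_BENCH" -hf "$hf_model"
        return
    fi
    printf '%s\n' "$files" | while IFS= read -r f; do
        # </dev/null keeps the benchmark from consuming the file list on stdin
        "$LLAMA_BENCH" -m "$f" </dev/null
    done
}
```

Setting `LLAMA_BENCH=echo` turns `bench_model "$CACHE_DIR" "$HF_MODEL"` (both placeholders) into a dry run that prints the commands it would execute. Following symlinks with `find -L` rather than matching `-type l` means a snapshot link whose blob was never fully downloaded is not treated as a cached model.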
- Run the script with [warm, cold] cache x [with, without] fix, 100 times each.
- Document hangs/timeouts and the reported performance results.
- Sporadic runs of the nightly scripts.
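A repeat-with-timeout loop of the kind the test plan describes could look like the sketch below. It is an illustration under assumptions, not the harness actually used: `run_trials` and the log layout are hypothetical, and it relies on GNU coreutils `timeout`, which exits with status 124 when the time limit is reached.

```shell
# Hypothetical harness sketch; run_trials is an illustrative name.
# Relies on GNU coreutils `timeout`, which exits with 124 on a timeout.
#
# Usage: run_trials N SECONDS LOG_DIR cmd [args...]
# Runs cmd N times, saves each run's output to LOG_DIR/run_<i>.log,
# appends "<i> <rc>" to LOG_DIR/rc.txt, and prints a one-line summary.
run_trials() {
    local n="$1" limit="$2" log_dir="$3"
    shift 3
    local i=1 rc ok=0 timed_out=0 failed=0
    mkdir -p "$log_dir"
    while [ "$i" -le "$n" ]; do
        timeout "$limit" "$@" >"$log_dir/run_$i.log" 2>&1
        rc=$?
        echo "$i $rc" >>"$log_dir/rc.txt"
        case "$rc" in
            0) ok=$((ok + 1)) ;;
            124) timed_out=$((timed_out + 1)) ;;
            *) failed=$((failed + 1)) ;;
        esac
        i=$((i + 1))
    done
    echo "ok=$ok timeout=$timed_out fail=$failed"
}
```

Each of the four configurations would be one call, e.g. `run_trials 100 42 logs/cold-fix ./run_llama.sh` (the script path and log directory are placeholders); clearing the model cache before cold runs is left to the caller. If a hung process ignores SIGTERM, `timeout -k` can escalate to SIGKILL, in which case the exit status is 137 rather than 124.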
Test Result
Tested on `gfx950`: without the fix, timed-out runs stalled in `llama-cli -hf ... --prompt /exit -st` at `Loading model...`, before any benchmark rows were emitted. With the fix, that prefetch path is avoided, and successful runs emit benchmark output in the same format with roughly equivalent performance.

The single "cold-fix" timeout appears to have been caused by a slow download (I chose a rather narrow timeout of 42 s).
The return code of that run was `-15`, while the return codes of the other hanging runs were all `-9`.

All 10 manual nightly script runs returned results in the expected time and range.
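Assuming these return codes come from Python's `subprocess` module, where a negative value -N means the child was terminated by signal N, `-15` corresponds to SIGTERM and `-9` to SIGKILL. A POSIX shell reports the same events as 128+N (143 and 137). The hypothetical helper below decodes either convention; it is only an illustration and not part of the PR.

```shell
# Hypothetical helper: map a return code to the signal that ended the
# process, accepting Python-style (-N) and shell-style (128+N) codes.
describe_rc() {
    local rc="$1" sig
    if [ "$rc" -lt 0 ]; then
        sig=$(( -rc ))          # Python subprocess: -N means signal N
    elif [ "$rc" -gt 128 ]; then
        sig=$(( rc - 128 ))     # shell: 128+N means signal N
    else
        echo "exit status $rc"  # ordinary exit, no signal involved
        return
    fi
    echo "killed by SIG$(kill -l "$sig")"
}
```

For example, `describe_rc -15` prints `killed by SIGTERM` and `describe_rc -9` prints `killed by SIGKILL`.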
Submission Checklist