fix(litellm): detect GPT-5 family when model has openai/ or azure/ prefix#2401
fix(litellm): detect GPT-5 family when model has openai/ or azure/ prefix#2401sizickp wants to merge 4 commits into
Conversation
GPT-5 codex models only accept `temperature=1`. The handler converted
`temperature` into `reasoning_effort` only when the model string started
with `gpt-5`, but users frequently set the full provider-qualified name
in their config (e.g. `openai/gpt-5.1-codex-max`). That prefix made the
`startswith('gpt-5')` check fail, so `temperature=0.2` was sent and
litellm rejected the request with `UnsupportedParamsError`. The primary
model then always fell back to the secondary one — silently downgrading
quality and doubling per-call latency/cost.
Normalize the model name by stripping the `openai/` / `azure/` prefix
before the GPT-5 check, and rebuild the routed name with the original
provider so Azure routing is preserved.
Adds regression tests for `openai/gpt-5*` (including `codex-max`) and
for the `_thinking` suffix together with a prefix.
Review Summary by QodoFix GPT-5 detection with provider-prefixed model names
WalkthroughsDescription• Strip provider prefix before GPT-5 detection to handle prefixed model names • Preserve original provider prefix in routed model name for Azure compatibility • Add regression tests for openai/gpt-5* variants and _thinking suffix handling Diagramflowchart LR
A["Model name with prefix<br/>openai/gpt-5.1-codex-max"] -->|removeprefix| B["Normalized base<br/>gpt-5.1-codex-max"]
B -->|startswith check| C["GPT-5 path triggered"]
C -->|set reasoning_effort| D["Drop temperature"]
D -->|rebuild with prefix| E["Route to openai/gpt-5.1-codex-max"]
File Changes1. pr_agent/algo/ai_handlers/litellm_ai_handler.py
|
Code Review by Qodo
1.
|
| model_base = model.removeprefix('openai/').removeprefix('azure/') | ||
| if model_base.startswith('gpt-5'): | ||
| # Use configured reasoning_effort or default to MEDIUM |
There was a problem hiding this comment.
1. Azure+prefix skips gpt-5 🐞 Bug ≡ Correctness
In Azure mode, chat_completion() prepends azure/ before computing model_base, and the new removeprefix() chain only removes a single leading provider prefix. If the configured model is already provider-qualified (e.g. openai/gpt-5* or azure/gpt-5*), model_base can still start with openai/ or azure/, so the GPT-5 branch is skipped and temperature is sent (triggering the same LiteLLM UnsupportedParamsError this PR aims to fix).
Agent Prompt
### Issue description
In `LiteLLMAIHandler.chat_completion()`, GPT-5 detection relies on `model_base` computed via `removeprefix('openai/').removeprefix('azure/')`. When `self.azure` is enabled, the function prepends `azure/` to the model first; if the incoming model already has a provider prefix, the resulting string can contain multiple provider segments (e.g. `azure/openai/gpt-5...` or `azure/azure/gpt-5...`). The current normalization removes at most one leading prefix, so `model_base` may still begin with a provider segment and fail `startswith('gpt-5')`, causing `temperature` to be passed to GPT-5.
### Issue Context
Models come directly from user configuration/fallback lists and can be provider-qualified strings.
### Fix Focus Areas
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[409-413]
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[433-461]
- pr_agent/algo/pr_processing.py[341-354]
### Suggested fix
Refactor provider handling so you do not build composite provider prefixes:
1) Determine/normalize the provider prefix separately from the model name (don’t mutate `model` with `'azure/' + model` before parsing).
2) Strip provider prefixes in a loop (or parse once) until the remaining model name is the real base (`gpt-5...`).
3) Then rebuild the final routed model string once (provider + base model), ensuring exactly one provider prefix.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
b7fe4eb to
0d9006b
Compare
|
Persistent review updated to latest commit 0d9006b |
0d9006b to
8c4377f
Compare
|
Persistent review updated to latest commit 8c4377f |
…efixes Address two routing issues raised by the Qodo review on the initial fix: 1. Stacked prefixes (Azure mode + provider-qualified config). With self.azure prepending "azure/" to a model that already had an "openai/" or "azure/" prefix in config, the single removeprefix() call left a residual provider segment, GPT-5 detection failed, and temperature was still sent (the original bug this PR fixes). 2. Explicit "azure/" rewritten to "openai/" when self.azure was false. The previous rebuild used "azure/ if self.azure else openai/", which ignored an explicit "azure/" the user had written in config and silently rerouted the request to OpenAI. Strip provider prefixes in a loop so any number of stacked segments is removed, and choose the rebuilt prefix by priority: Azure mode > explicit prefix in user config > "openai/" default. Capture the user's original model string before the azure auto-prepend so the explicit prefix is still visible at rebuild time. Tests added for: - explicit "azure/" preserved when self.azure is false - Azure mode with bare / openai/- / azure/- prefixed configs all producing exactly one "azure/" prefix
Four `with patch(...)` statements in the GPT-5 provider-prefix tests exceeded the repo's 120-character line limit (ruff line-length=120 in pyproject.toml). Wrap them onto multiple lines so the test file does not regress lint compliance for the newly added cases.
8c4377f to
66026ff
Compare
|
Persistent review updated to latest commit 66026ff |
Address Qodo review feedback: the added Group 8 tests instantiate LiteLLMAIHandler() directly, and its __init__ branches on environment variables such as AWS_USE_IMDS, OPENAI_API_KEY, and the AWS_* set. With those vars potentially leaking from the runner environment, the tests could behave differently across machines or CI runners (e.g. attempt boto3 import when AWS_USE_IMDS is set). Explicitly delete the relevant env vars via monkeypatch.delenv() at the start of each new test, so the constructor takes a deterministic path regardless of where the suite runs.
|
Persistent review updated to latest commit 79513ba |
|
Error running review: usage: ensemble_review.py [-h] [--post] pr |
|
📡 Fetching PR #2401... 🚀 Starting ensemble review with 3 models... ============================================================ 📋 Ensemble PR Review (Critical Issues Only)PR: fix(litellm): detect GPT-5 family when model has openai/ or azure/ prefix 🔷 GPT-5 MiniNo critical issues found Rating: 10/10 — The change is a small, well-contained normalization and reconstruction of the model string to detect GPT-5 regardless of provider prefixes, with tests covering the important cases (openai/ and azure/ prefixes, _thinking suffix, Azure mode). I did not see any of the listed critical problems: no new secrets, no SQL/injection, no race conditions, no unhandled exceptions that would obviously crash (string ops are safe), and behavior is backward-compatible. Confidence: High 🟠 Gemini 3.1 Flash1. Critical IssuesNo critical issues found. The PR correctly addresses a bug where GPT-5 models (which require 2. RatingRating: 10/10 3. ConfidenceConfidence: High 🔵 DeepSeek V4 FlashAnalysisCritical IssuesNo critical issues found. Rating10/10 — The change is minimal, targeted, and well-tested. It correctly handles provider prefixes for GPT-5 model detection without introducing any security vulnerabilities, crashes, data loss, breaking changes, race conditions, injection flaws, hardcoded secrets, or unhandled exceptions. The logic for stripping and rebuilding prefixes is sound and matches the described intent. All edge cases (stacked prefixes, Azure mode, explicit prefixes, ConfidenceHigh — The diff is small, the logic is straightforward, and the test suite comprehensively validates the new behavior. No hidden risks are evident. 🤖 Review generated via ensemble of 3 models (DeepSeek V4 Flash, Gemini 3 Flash, GPT-5 Mini) |
Summary
GPT-5 codex models only accept
temperature=1. The LiteLLM handler swapstemperatureforreasoning_effortonly when the model string starts withgpt-5, but users frequently put the full provider-qualified name in[config].model(e.g.openai/gpt-5.1-codex-max). That prefix makesmodel.startswith('gpt-5')returnFalse, so the handler still sendstemperature=0.2and litellm rejects the call:The primary model always errors, and PR-Agent silently falls back to the secondary model on every invocation — degrading answer quality vs. the configured model, doubling per-call latency, and burning input tokens on a request that can never succeed.
Fix
In
pr_agent/algo/ai_handlers/litellm_ai_handler.py::chat_completion, normalize the model name by stripping theopenai//azure/prefix before the GPT-5 detection, then rebuild the routed name with the original provider prefix (so Azure routing is preserved whenself.azureis set).This is a minimal, surgical change — no behavior changes for unprefixed names, no widening of the heuristic; just the prefix-aware detection.
Repro
configuration.toml:Before: every call logs the warning above and falls back. After: the GPT-5 branch runs,
reasoning_effortis set,temperatureis dropped, and the primary model is used.Test plan
test_gpt5_with_openai_prefix_triggers_reasoning_effortcoveringopenai/gpt-5,openai/gpt-5.1-codex,openai/gpt-5.1-codex-max,openai/gpt-5.4-mini. Assertsreasoning_effortis set,temperatureis not in kwargs, and the model name is not double-prefixed (openai/openai/...).test_gpt5_with_openai_prefix_strips_thinking_suffixforopenai/gpt-5_thinking— verifies suffix is removed and provider prefix kept exactly once.test_litellm_*suites still pass (68 tests).