Skip to content

fix(evaluation): return last assistant message on max_iterations fallback#173

Open
Ricardo-M-L wants to merge 1 commit into
agent-infra:mainfrom
Ricardo-M-L:fix/agent-loop-max-iterations-fallback
Open

fix(evaluation): return last assistant message on max_iterations fallback#173
Ricardo-M-L wants to merge 1 commit into
agent-infra:mainfrom
Ricardo-M-L:fix/agent-loop-max-iterations-fallback

Conversation

@Ricardo-M-L
Copy link
Copy Markdown

Summary

Fixes #172.

When the evaluation agent loop hits max_iterations, both AzureOpenAIAgentLoop and OpenAIAgentLoop returned messages[-1].get(\"content\", \"\"). Since each tool call appends a {\"role\": \"tool\", ...} entry to messages, the final element is typically a tool response — not the assistant's reasoning — so the evaluation harness treats raw tool output as the agent's answer.

This PR walks messages in reverse to find the last assistant message with non-empty content, falling back to \"\" only if none exists.

Changes

  • evaluation/agent_loop.py — apply the fallback walk in both agent loops (identical 4-line change in each)
  • evaluation/tests/test_max_iterations_fallback.py — 3 unit tests per loop covering:
    • last message is a tool response → returns earlier assistant content
    • only tool responses present → returns \"\"
    • last message is assistant → returned directly

Notes

🤖 Generated with Claude Code

…back

When the agent loop hits `max_iterations` without producing a final
answer, both `AzureOpenAIAgentLoop` and `OpenAIAgentLoop` returned
`messages[-1].get("content", "")`. But the loop appends a
`{"role": "tool", ...}` message after each tool call, so the last
message is often a tool response — not the assistant's last
reasoning/answer.

Walk the list in reverse to find the last assistant message with
non-empty content instead, falling back to "" only if none exists.

Also add unit tests covering:
 - last message is a tool response -> returns earlier assistant content
 - only tool responses present -> returns ""
 - last message is assistant -> returned directly

Broken out from the closed PR agent-infra#167 per the "one fix per PR" policy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

evaluation: agent loop max_iterations fallback returns tool response instead of assistant message

1 participant