Skip to content

fix(adapters): harden native hooks — edge-case tests + CHANGELOG [v2]#1608

Closed
miyannishar wants to merge 2 commits into
microsoft:mainfrom
miyannishar:fix/native-hooks-review-v2
Closed

fix(adapters): harden native hooks — edge-case tests + CHANGELOG [v2]#1608
miyannishar wants to merge 2 commits into
microsoft:mainfrom
miyannishar:fix/native-hooks-review-v2

Conversation

@miyannishar
Copy link
Copy Markdown
Collaborator

Summary

This is the successor to #1593, rebased on main and addressing every issue raised in that review.

Closes / relates to #1571 (deprecate wrap() API in favour of native hooks).


What changed

🧪 Test coverage

All four adapter test suites now cover the edge cases flagged by the test-generator and docs-sync bots in #1593:

Anthropic (test_anthropic_hooks.py)

  • UUID session-ID format assertion (ant-hook-<hex12>)
  • Uniqueness across 10 rapid-construction instances
  • empty messages list does not raise
  • Blocked pattern detected in assistant-role messages
  • Client exceptions propagate without mutating token state
  • Cumulative token tracking across 3 calls
  • Exact-1 DeprecationWarning assertion for both wrap() and wrap_client()

Semantic Kernel (test_semantic_kernel_hooks.py)

  • Context key lookup updated to UUID-keyed _ctx (was hardcoded 'sk-filter')
  • Cross-filter isolation: exhausting f1 does not affect f2
  • Wildcard Plugin.* allowlist scope
  • Nested dict / None arguments handled without crash
  • call_count assertions corrected (filter + pre_execute each increment → ×2 per invocation)
  • Error-match patterns corrected to actual message text

PydanticAI (test_pydantic_ai_hooks.py)

  • Empty-string prompt passes guard
  • Long clean prompt (≥ 20 kB) passes guard
  • Long prompt with buried blocked pattern is caught
  • Multiple tool calls sequence: before_tool_execute × 3 with distinct call_number audit entries
  • Audit log .audit_log returns a copy — mutation does not affect internal state
  • None tool result returned unchanged
  • Full lifecycle integration test asserting exact event sequence

conftest.py

  • Pre-stubs llama_index module tree so all four suites can be collected on Python 3.9 (CI uses 3.10+, where the real package is available)

📖 CHANGELOG

Added [Unreleased] entries for:

  • Four new native-hook factory methods across all adapters
  • Deprecation notices for all wrap() variants with migration guidance

Checklist

  • 98/98 tests pass locally
  • Rebased on latest main
  • DeprecationWarning emits exactly once per wrap() call
  • All session IDs use uuid.uuid4() — no time-collision risk
  • CHANGELOG updated

Bot notes addressed: test-generator: missing edge cases, docs-sync: no CHANGELOG entry

Nishar added 2 commits April 29, 2026 20:56
Fixes 5 issues raised by @imran-siddique:

BLOCKING:
- SK wrap_kernel(), PydanticAI wrap(): remove contextlib.suppress(Exception)
  which was silently swallowing ALL exceptions (not just DeprecationWarning).
  Now uses warnings.catch_warnings() alone to suppress only the nested
  DeprecationWarning, so real errors (invalid kernel, policy validation
  failures) propagate correctly.

BUG:
- test_smolagents_hooks: test_blocks_pattern_in_observation incorrectly
  assumed a step with no tool calls but a blocked observation would pass.
  GovernanceStepCallback scans observations unconditionally; the test now
  correctly asserts PolicyViolationError is raised in both cases (no-tool-
  call step and allowed-tool-call step with blocked observation).

WARNINGS:
- Anthropic wrap_client(): suppress nested DeprecationWarning from kernel.wrap()
  so callers see only one warning instead of two.
- Anthropic wrap()/as_message_hook(), SK GovernanceFunctionFilter: replace
  int(time.time()) session IDs with uuid.uuid4().hex[:12] to prevent
  same-second session ID collisions in audit trails.
- SK GovernanceFunctionFilter: use the uuid-based ID as the _contexts dict
  key so calling as_filter() multiple times no longer overwrites the first
  filter's context entry.

Part of: microsoft#1593
…CHANGELOG

Bot-flagged issues resolved:

## Test coverage (docs-sync + test-generator bot)
- anthropic: add UUID session-ID format tests, empty messages list,
  assistant-message blocked-pattern scan, client exception propagation,
  cumulative token tracking, exact-one-deprecation assertion
- semantic_kernel: fix context key lookup (now UUID-keyed, not 'sk-filter'),
  add cross-filter isolation, wildcard scope, nested blocked-arg, None-args,
  correct call_count assertions (increments by 2: filter + pre_execute)
- pydantic_ai: add empty/long prompt edge cases, multiple-tool sequence,
  call_number monotonicity, audit-log immutability (returns copy),
  full lifecycle integration test, repr-updates-after-calls
- conftest: pre-stub llama_index tree so all four suites run on Python 3.9

## CHANGELOG (docs-sync bot)
- Added native hooks entry under [Unreleased] detailing new factory methods
- Added Deprecated section for all wrap() variants with migration notes

All 98 hook tests now pass locally.

Relates to microsoft#1571
@github-actions
Copy link
Copy Markdown

🤖 AI Agent: code-reviewer — Action Items:

TL;DR: 0 blockers, 1 warning. Solid improvements; minor follow-up suggested.

# Sev Issue Where
1 Warn DeprecationWarning suppression logic could mask unrelated warnings anthropic_adapter.py, pydantic_ai_adapter.py, semantic_kernel_adapter.py

Action Items:

  • None.
# Follow-up Warnings Where
1 Review DeprecationWarning suppression to ensure it doesn't unintentionally mask other warnings. Fine as follow-up PRs.

@github-actions
Copy link
Copy Markdown

🤖 AI Agent: breaking-change-detector — API Compatibility

API Compatibility

Severity Change Impact
High Deprecated wrap() and wrap_client() methods across all adapters. Existing code using these methods will encounter DeprecationWarning and will need to migrate to new methods (as_message_hook(), as_filter(), as_step_callback(), as_capability()) before the next major release.
High UUID-based session identifiers replace timestamp-based identifiers. Code relying on the previous session ID format (ant-<timestamp>) may break if it parses or validates session IDs.
Medium Context key lookup updated to UUID-keyed _ctx in Semantic Kernel. Code relying on hardcoded 'sk-filter' keys in wrapper._contexts will no longer work.
Medium Duplicate filter registration isolation in Semantic Kernel. Code assuming shared call counters across multiple filters will need adjustment.
Medium Audit log immutability in PydanticAI. Code attempting to modify the returned audit log directly will no longer affect the internal state.

@github-actions github-actions Bot added documentation Improvements or additions to documentation tests labels Apr 30, 2026
@github-actions
Copy link
Copy Markdown

🤖 AI Agent: security-scanner — View details

No security issues found.

@github-actions
Copy link
Copy Markdown

🤖 AI Agent: test-generator — View details

Test coverage looks good. No gaps identified.

@github-actions
Copy link
Copy Markdown

🤖 AI Agent: docs-sync-checker — Docs Sync

Docs Sync

  • GovernanceMessageHook in anthropic_adapter.py -- missing docstring
  • GovernanceFunctionFilter in semantic_kernel_adapter.py -- missing docstring
  • GovernanceCapability in pydantic_ai_adapter.py -- missing docstring
  • README.md -- section on adapter APIs needs update to reflect new native hooks and deprecations

@github-actions github-actions Bot added the size/L Large PR (< 500 lines) label Apr 30, 2026
@github-actions
Copy link
Copy Markdown

PR Review Summary

Check Status Details
🔍 Code Review ❌ Failed Issues detected
🛡️ Security Scan ✅ Completed Analysis complete
🔄 Breaking Changes ⚠️ Warning See details
📝 Docs Sync ✅ Completed Analysis complete
🧪 Test Coverage ✅ Completed Analysis complete

Verdict: ❌ Changes needed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation size/L Large PR (< 500 lines) tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant