feat(adapters): native hooks for Anthropic, Semantic Kernel, smolagents, PydanticAI by miyannishar · Pull Request #1593 · microsoft/agent-governance-toolkit

miyannishar · 2026-04-29T21:15:19Z

Summary

Complete the native-hooks migration for all four remaining agent framework adapters: Anthropic, Semantic Kernel, smolagents, and PydanticAI. Each adapter now exposes a factory method that returns a native framework hook — replacing the legacy monkey-patching wrap() pattern with a composable, non-invasive integration path.

Related to #1571

Changes

Anthropic (`anthropic_adapter.py`)

GovernanceMessageHook — stateless hook implementing a create() method that wraps client.messages.create()
- Pre-execution: content scanning, tool allowlist validation, token-limit enforcement
- Post-execution: tool_use block validation, token tracking, audit trail
AnthropicKernel.as_message_hook() — factory method (recommended entry point)
wrap() / wrap_client() — deprecated with DeprecationWarning, directing to as_message_hook()

Semantic Kernel (`semantic_kernel_adapter.py`)

GovernanceFunctionFilter — async callable conforming to SK's add_filter("auto_function_invocation", ...) protocol
- Validates function names against allowlist (supports wildcards)
- Blocked-pattern scanning in arguments
- max_tool_calls enforcement
- Per-invocation audit recording
SemanticKernelWrapper.as_filter() — factory method (recommended entry point)
wrap() / wrap_kernel() — deprecated with DeprecationWarning, directing to as_filter()

Smolagents (`smolagents_adapter.py`)

GovernanceStepCallback — callable implementing __call__(step, agent) for smolagents' `step_callbacks``
- Tool blocklist/allowlist enforcement
- Blocked-pattern scanning in arguments and observations
- max_tool_calls enforcement
- Step-level audit trail
SmolagentsKernel.as_step_callback() — factory method (recommended entry point)
wrap() — deprecated with DeprecationWarning, directing to as_step_callback()

PydanticAI (`pydantic_ai_adapter.py`)

GovernanceCapability — capability implementing lifecycle hooks:
- before_run: prompt content scanning
- before_tool_execute: tool allowlist, blocked patterns, call-count limits
- after_tool_execute: audit recording
- after_run: completion recording
PydanticAIKernel.as_capability() — factory method (recommended entry point)
wrap() — deprecated with DeprecationWarning, directing to as_capability()

Package exports (`init.py`)

Added AnthropicGovernanceHook, SKGovernanceFilter, SmolagentsGovernanceCallback, PydanticAIGovernanceCapability, SmolagentsKernel

Migration Guide

Before (legacy monkey-patching):

# Anthropic
governed = kernel.wrap(client)

# Semantic Kernel  
governed = wrapper.wrap(sk_kernel)

# Smolagents
kernel.wrap(agent)

# PydanticAI
governed = kernel.wrap(agent)

After (native hooks — recommended):

# Anthropic
hook = kernel.as_message_hook()
response = hook.create(client, model="claude-sonnet-4-20250514", ...)

# Semantic Kernel
gov_filter = wrapper.as_filter()
sk_kernel.add_filter("auto_function_invocation", gov_filter)

# Smolagents
callback = kernel.as_step_callback()
agent = CodeAgent(tools=[...], step_callbacks=[callback])

# PydanticAI
capability = kernel.as_capability()
agent = Agent("openai:gpt-4o", capabilities=[capability])

Tests

Adapter	Test File	Test Count
Anthropic	`test_anthropic_hooks.py`	12
Semantic Kernel	`test_semantic_kernel_hooks.py`	10
Smolagents	`test_smolagents_hooks.py`	14
PydanticAI	`test_pydantic_ai_hooks.py`	16

Design Decisions

Non-invasive governance — All hooks integrate at the framework's native lifecycle points (callbacks, filters, capabilities) instead of wrapping/proxying objects, avoiding type-checking issues and SDK breakage.
Backward compatibility — Legacy wrap() methods still work but emit DeprecationWarning with clear migration instructions.
Stateless hooks — Hooks can be freely shared, re-instantiated, or composed without side effects on the underlying framework clients.

…ticAI Complete the native-hooks migration for all four remaining adapters: Anthropic: - Add GovernanceMessageHook + as_message_hook() factory - Pre-execution: content scanning, tool allowlist, token limits - Post-execution: tool_use validation, token tracking, audit - Deprecate wrap() and wrap_client() with migration guidance Semantic Kernel: - Add GovernanceFunctionFilter + as_filter() factory - Uses SK's native add_filter('auto_function_invocation', ...) system - Validates function names, blocked patterns, call counts - Deprecate wrap() and wrap_kernel() with migration guidance Smolagents: - Add GovernanceStepCallback + as_step_callback() factory - Implements step_callbacks protocol: __call__(step, agent) - Validates tool names, blocked patterns, observations - Deprecate wrap() with migration guidance PydanticAI: - Add GovernanceCapability + as_capability() factory - Lifecycle hooks: before/after_run, before/after_tool_execute - Pre-execution policy gating, post-execution drift detection - Deprecate wrap() with migration guidance Package exports: - Export AnthropicGovernanceHook, SKGovernanceFilter, SmolagentsGovernanceCallback, PydanticAIGovernanceCapability from integrations __init__.py Tests: - test_anthropic_hooks.py: 12 tests - test_semantic_kernel_hooks.py: 10 tests - test_smolagents_hooks.py: 14 tests - test_pydantic_ai_hooks.py: 16 tests Part of: microsoft#1571

github-actions · 2026-04-29T21:15:46Z

🤖 AI Agent: security-scanner — View details

No security issues found.

github-actions · 2026-04-29T21:15:47Z

🤖 AI Agent: docs-sync-checker — Docs Sync

Docs Sync

GovernanceMessageHook in anthropic_adapter.py -- missing docstring
GovernanceCapability in pydantic_ai_adapter.py -- missing docstring
GovernanceFunctionFilter in semantic_kernel_adapter.py -- missing docstring
README.md -- migration guide needs update to reflect new native hooks
CHANGELOG.md -- missing entry for new native hooks and deprecation of wrap() methods

github-actions · 2026-04-29T21:15:49Z

🤖 AI Agent: breaking-change-detector — API Compatibility

API Compatibility

Severity	Change	Impact
Breaking	Deprecated `wrap()` and `wrap_client()` methods in `AnthropicKernel`.	Existing code using these methods will emit `DeprecationWarning`.
Breaking	Deprecated `wrap()` method in `PydanticAIKernel`.	Existing code using this method will emit `DeprecationWarning`.
Breaking	Deprecated `wrap()` method in `SemanticKernelWrapper`.	Existing code using this method will emit `DeprecationWarning`.
Breaking	Deprecated `wrap_kernel()` function in `semantic_kernel_adapter.py`.	Existing code using this function will emit `DeprecationWarning`.
Breaking	Deprecated `wrap()` function in `pydantic_ai_adapter.py`.	Existing code using this function will emit `DeprecationWarning`.

github-actions · 2026-04-29T21:15:54Z

🤖 AI Agent: test-generator — `anthropic_adapter.py`

Test Coverage Analysis

`anthropic_adapter.py`

Existing coverage:
- test_anthropic_hooks.py includes 12 tests covering the GovernanceMessageHook and its create() method.
- Tests validate pre-execution checks (blocked patterns, tool allowlist, token limits), post-execution checks (tool use validation, token tracking), and audit trail recording.
Missing coverage:
- Edge cases for token limits (e.g., exactly at the limit, slightly exceeding).
- Scenarios with no tools specified in the request.
- Handling of malformed messages or tools inputs.
- Behavior when client.messages.create() raises an exception.
Suggested test cases:
1. test_create_token_limit_boundary — Test create() with max_tokens exactly at the policy limit and slightly above it.
2. test_create_no_tools_specified — Validate behavior when no tools are provided in the request.
3. test_create_malformed_messages — Test create() with malformed messages (e.g., missing content field, invalid types).
4. test_create_client_exception — Simulate an exception from client.messages.create() and ensure proper error handling.

`semantic_kernel_adapter.py`

Existing coverage:
- test_semantic_kernel_hooks.py includes 10 tests covering the GovernanceFunctionFilter and its integration with add_filter.
- Tests validate function name allowlist, blocked-pattern scanning in arguments, and max_tool_calls enforcement.
Missing coverage:
- Wildcard support in function name allowlist.
- Handling of invalid or malformed function arguments.
- Behavior when add_filter is called multiple times with the same filter.
Suggested test cases:
1. test_function_name_wildcard_matching — Verify that wildcard patterns in the function name allowlist are correctly matched.
2. test_invalid_function_arguments — Test behavior when function arguments are malformed or missing required fields.
3. test_duplicate_filter_registration — Ensure that registering the same filter multiple times does not cause unexpected behavior.

`smolagents_adapter.py`

Existing coverage:
- test_smolagents_hooks.py includes 14 tests covering the GovernanceStepCallback and its __call__ method.
- Tests validate tool allowlist/blocklist enforcement, blocked-pattern scanning, and step-level audit trail recording.
Missing coverage:
- Scenarios with partial failures in step execution.
- Handling of concurrent step callbacks.
- Behavior when step or agent inputs are malformed or missing fields.
Suggested test cases:
1. test_partial_step_failure — Simulate a failure in one step and ensure subsequent steps are handled correctly.
2. test_concurrent_step_callbacks — Test for race conditions when multiple step callbacks are executed concurrently.
3. test_malformed_step_input — Validate behavior when step or agent inputs are malformed or incomplete.

`pydantic_ai_adapter.py`

Existing coverage:
- test_pydantic_ai_hooks.py includes 16 tests covering the GovernanceCapability and its lifecycle hooks (before_run, before_tool_execute, after_tool_execute, after_run).
- Tests validate prompt content scanning, tool allowlist/blocklist enforcement, and audit recording.
Missing coverage:
- Edge cases for before_run (e.g., empty or excessively long prompts).
- Scenarios with multiple tools in a single execution.
- Handling of expired or invalid governance policies.
Suggested test cases:
1. test_before_run_empty_prompt — Validate behavior when the prompt is empty.
2. test_before_run_long_prompt — Test behavior with a prompt exceeding typical length limits.
3. test_multiple_tool_calls — Verify correct handling of multiple tool calls in a single execution, including max_tool_calls enforcement.
4. test_expired_governance_policy — Simulate an expired policy and ensure proper error handling.

`init.py`

Existing coverage:
- No direct tests for __init__.py as it primarily handles imports and exports.
Missing coverage:
- Validation of new exports (AnthropicGovernanceHook, SKGovernanceFilter, SmolagentsGovernanceCallback, PydanticAIGovernanceCapability).
Suggested test cases:
1. test_import_anthropic_governance_hook — Verify that AnthropicGovernanceHook can be imported and instantiated.
2. test_import_sk_governance_filter — Verify that SKGovernanceFilter can be imported and used.
3. test_import_smolagents_governance_callback — Verify that SmolagentsGovernanceCallback can be imported and used.
4. test_import_pydantic_ai_governance_capability — Verify that PydanticAIGovernanceCapability can be imported and used.

General Recommendations

Ensure that all deprecation warnings (wrap() methods) are tested to confirm they are raised correctly.
Add tests for concurrency and race conditions, especially for shared state (e.g., tool_call_count in GovernanceCapability).
Validate input handling for all hooks and capabilities, including malformed or missing fields.
Test edge cases for policy evaluation, such as conflicting policies or boundary conditions.

github-actions · 2026-04-29T21:16:08Z

🤖 AI Agent: code-reviewer — Feedback on Pull Request: feat(adapters): native hooks for Anthropic, Semantic Kernel, smolagents, PydanticAI

Feedback on Pull Request: feat(adapters): native hooks for Anthropic, Semantic Kernel, smolagents, PydanticAI

CRITICAL Issues

Policy Engine Correctness:
- Anthropic Adapter:
  - In the GovernanceMessageHook.create() method, the pre-execution checks for messages only validate the content field. If the SDK introduces new fields that could contain sensitive or malicious data, these fields might bypass governance checks. Consider implementing a more comprehensive validation mechanism that inspects all relevant fields in the messages payload.
- PydanticAI Adapter:
  - The GovernanceCapability.before_tool_execute() method validates tool_name and arguments, but it does not account for nested or dynamically generated arguments that might bypass blocked patterns. Ensure recursive validation of deeply nested structures to avoid policy circumvention.
Trust/Identity:
- Audit Trail Integrity:
  - The audit logs in all adapters (e.g., GovernanceMessageHook, GovernanceCapability) are stored in memory (self._audit or self._ctx). This approach is vulnerable to tampering during runtime. Consider implementing cryptographic signing or hashing of audit entries to ensure their integrity and non-repudiation.
Sandbox Escape Vectors:
- Tool Execution:
  - The GovernanceCapability.before_tool_execute() method does not explicitly sanitize or validate the arguments passed to tools. This could allow malicious payloads to execute in the tool's context. Ensure strict input validation and sanitization for all tool arguments.

WARNING Issues

Backward Compatibility:
- The deprecation of the wrap() methods across all adapters introduces potential breaking changes for existing users who rely on these methods. While the DeprecationWarning provides guidance, ensure that:
  - The deprecation period is well-documented and communicated.
  - The legacy methods remain functional until the next major version release.
- Consider adding a migration script or utility to help users transition from wrap() to the new factory methods.

SUGGESTIONS

Thread Safety:
- The adapters (e.g., GovernanceMessageHook, GovernanceCapability) maintain state in instance variables (self._ctx, self._tool_call_count, etc.). In concurrent agent execution scenarios, this could lead to race conditions or inconsistent state. Consider:
  - Using thread-local storage for per-thread state.
  - Documenting whether these hooks are thread-safe or require external synchronization.
OWASP Agentic Top 10 Compliance:
- Audit Trail Completeness:
  - The audit logs currently capture key events (e.g., run_start, tool_blocked, tool_executed). However, they do not include metadata such as the user ID, session ID, or IP address. Adding this information would improve traceability and compliance with audit requirements.
- Error Handling:
  - Ensure that all exceptions raised by governance checks (e.g., PolicyViolationError) are logged with sufficient context (e.g., offending input, policy details) to facilitate debugging and forensic analysis.
Type Safety and Pydantic Validation:
- The adapters rely on dynamic dictionaries (e.g., kwargs, arguments) for input validation. Consider leveraging Pydantic models to enforce schema validation and type safety for these inputs. This would reduce the risk of runtime errors and improve maintainability.
Documentation:
- The migration guide in the PR description is helpful but could be expanded to include:
  - A detailed comparison of the legacy and new integration patterns.
  - Common pitfalls during migration and how to address them.
  - Examples of how to test the new hooks in isolation.
Testing Coverage:
- While the PR includes tests for all adapters, ensure that:
  - Edge cases (e.g., deeply nested arguments, malformed inputs) are thoroughly tested.
  - Concurrent execution scenarios are tested to identify potential race conditions or state corruption.
Performance Optimization:
- The GovernanceMessageHook.create() method performs multiple checks (e.g., blocked patterns, tool validation) sequentially. For large payloads or high-frequency calls, this could introduce latency. Consider optimizing the checks (e.g., using compiled regex patterns, caching policy lookups).

Summary of Recommendations

Address the CRITICAL issues related to policy engine correctness, audit trail integrity, and sandbox escape vectors.
Mitigate the WARNING issue by ensuring backward compatibility and providing migration support.
Implement the SUGGESTIONS to improve thread safety, compliance, type safety, documentation, testing, and performance.

This PR introduces a significant improvement in governance integration by adopting native hooks, but careful attention to the identified issues will ensure robustness, security, and user adoption.

github-actions · 2026-04-29T21:16:28Z

PR Review Summary

Check	Status	Details
🔍 Code Review	❌ Failed	Issues detected
🛡️ Security Scan	✅ Completed	Analysis complete
🔄 Breaking Changes	⚠️ Warning	See details
📝 Docs Sync	✅ Completed	Analysis complete
🧪 Test Coverage	❌ Failed	Issues detected

Verdict: ❌ Changes needed

imran-siddique

Code Review: Anthropic, Semantic Kernel, smolagents, PydanticAI native hooks

Impressive scope covering 4 adapters in one PR, @miyannishar. Architecture is solid across all four. Issues found:

Blocking

1. contextlib.suppress(Exception) silently swallows ALL errors
In both semantic_kernel_adapter.py (wrap_kernel()) and pydantic_ai_adapter.py (wrap()), the pattern:
\\python
with contextlib.suppress(Exception), warnings.catch_warnings():
warnings.simplefilter('ignore', DeprecationWarning)
return wrapper.wrap(kernel)
\
catches all exceptions, not just deprecation warnings. If wrap() raises a real error (invalid kernel, policy validation), it's silently eaten and returns None. Remove contextlib.suppress(Exception) and keep only warnings.catch_warnings().

2. Anthropic wrap_client() double-emits deprecation
Unlike SK and PydanticAI which suppress the nested warning, Anthropic doesn't. Users see two warnings per call.

Security/Correctness

3. Session ID collisions
GovernanceMessageHook and GovernanceFunctionFilter use int(time.time()) for session IDs. Two hooks created in the same second share a session ID, corrupting audit trails. Use uuid.uuid4().hex[:12] instead.

4. Repeated as_filter()/as_message_hook() calls overwrite context entries
GovernanceFunctionFilter hardcodes wrapper._contexts['sk-filter']. Calling as_filter() twice (as shown in the docstring example for both filter types) overwrites the first filter's context.

5. Smolagents test will fail at runtime
test_blocks_pattern_in_observation passes a step with observation='Result: rm -rf / completed' to a callback with blocked_patterns=['rm -rf']. But GovernanceStepCallback scans observations unconditionally, so this will raise PolicyViolationError before reaching the assertion. Fix either the test expectation or the implementation.

Warnings

6. Post-execution token limit check is detective, not preventive
In Anthropic GovernanceMessageHook.create(), the token check runs after client.messages.create() returns. Tokens are already billed. Document this limitation.

7. Deleted # Google Gemini section comment in __init__.py
Gemini entries now appear under the Anthropic section header. Restore the comment.

Nits

PydanticAI GovernanceCapability maintains redundant dual counters (_tool_call_count and _ctx.call_count)
SK tests use deprecated asyncio.get_event_loop().run_until_complete()

The contextlib.suppress(Exception) issue (#1) is the main blocker since it can mask real failures silently.

imran-siddique

Updated review (condensed):

TL;DR: 1 blocker (error suppression), 1 test bug, rest are nice-to-haves.

#	Sev	Issue	Where
1	Block	`contextlib.suppress(Exception)` swallows ALL errors, not just deprecation warnings	SK `wrap_kernel()`, PydanticAI `wrap()`
2	Bug	smolagents `test_blocks_pattern_in_observation` will raise before reaching assertion	test file
3	Warn	Anthropic `wrap_client()` double-emits deprecation (SK/PydanticAI suppress it correctly)	`wrap_client()`
4	Warn	Session IDs use `int(time.time())` -- collide within same second	Anthropic, SK
5	Warn	Repeated `as_filter()` calls overwrite context entries	SK `GovernanceFunctionFilter`

#1: Remove contextlib.suppress(Exception), keep only warnings.catch_warnings().

#2: Either the test expectation is wrong or observation scanning should skip when no tool calls.

#3-5 are fine as follow-ups.

imran-siddique

Approving native hooks migration for remaining adapters.

imran-siddique · 2026-04-30T03:47:25Z

Closing - rebased and re-opened as a new PR to resolve merge conflicts with recently merged native-hooks PRs.

Blockers: - Remove contextlib.suppress(Exception) in SK wrap_kernel() and PydanticAI wrap() -- was silently swallowing real errors, not just DeprecationWarning. Use warnings.catch_warnings() only. - Fix Anthropic wrap_client() double-emitting deprecation by wrapping the inner kernel.wrap() call with warnings.catch_warnings(). - Fix smolagents test_blocks_pattern_in_observation: the fixture kernel has blocked_patterns=[rm -rf] so observation scanning fires even without a tool call. Corrected both sub-cases to expect raises. Warnings: - Replace int(time.time()) session IDs with uuid.uuid4().hex[:12] to eliminate same-second collision and audit trail corruption. - Make GovernanceFunctionFilter context key unique per instance (was hardcoded 'sk-filter'), preventing context overwrites when as_filter() is called multiple times on the same wrapper.

miyannishar mentioned this pull request Apr 29, 2026

refactor(adk): deprecate wrap()/unwrap()/get_callbacks() in favor of GovernancePlugin #1571

Closed

github-actions Bot added the tests label Apr 29, 2026

github-actions Bot added the size/XL Extra large PR (500+ lines) label Apr 29, 2026

imran-siddique requested changes Apr 29, 2026

View reviewed changes

imran-siddique reviewed Apr 29, 2026

View reviewed changes

imran-siddique approved these changes Apr 30, 2026

View reviewed changes

imran-siddique closed this Apr 30, 2026

imran-siddique mentioned this pull request Apr 30, 2026

feat(adapters): add native hooks for Anthropic, SK, smolagents, PydanticAI #1605

Merged

This was referenced Apr 30, 2026

fix(adapters): address review feedback from #1593 — error suppression, session IDs, test bug #1607

Closed

fix(adapters): harden native hooks — edge-case tests + CHANGELOG [v2] #1608

Closed

Conversation

miyannishar commented Apr 29, 2026

Summary

Changes

Anthropic (anthropic_adapter.py)

Semantic Kernel (semantic_kernel_adapter.py)

Smolagents (smolagents_adapter.py)

PydanticAI (pydantic_ai_adapter.py)

Package exports (__init__.py)

Migration Guide

Tests

Design Decisions

Uh oh!

github-actions Bot commented Apr 29, 2026

Uh oh!

github-actions Bot commented Apr 29, 2026

Docs Sync

Uh oh!

github-actions Bot commented Apr 29, 2026

API Compatibility

Uh oh!

github-actions Bot commented Apr 29, 2026

Test Coverage Analysis

anthropic_adapter.py

semantic_kernel_adapter.py

smolagents_adapter.py

pydantic_ai_adapter.py

__init__.py

General Recommendations

Uh oh!

github-actions Bot commented Apr 29, 2026

Feedback on Pull Request: feat(adapters): native hooks for Anthropic, Semantic Kernel, smolagents, PydanticAI

CRITICAL Issues

WARNING Issues

SUGGESTIONS

Summary of Recommendations

Uh oh!

github-actions Bot commented Apr 29, 2026

PR Review Summary

Uh oh!

imran-siddique left a comment

Choose a reason for hiding this comment

Code Review: Anthropic, Semantic Kernel, smolagents, PydanticAI native hooks

Blocking

Security/Correctness

Warnings

Nits

Uh oh!

imran-siddique left a comment

Choose a reason for hiding this comment

Uh oh!

imran-siddique left a comment

Choose a reason for hiding this comment

Uh oh!

imran-siddique commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Anthropic (`anthropic_adapter.py`)

Semantic Kernel (`semantic_kernel_adapter.py`)

Smolagents (`smolagents_adapter.py`)

PydanticAI (`pydantic_ai_adapter.py`)

Package exports (`init.py`)

`anthropic_adapter.py`

`semantic_kernel_adapter.py`

`smolagents_adapter.py`

`pydantic_ai_adapter.py`

`init.py`