feat(adapters): native hooks for Anthropic, Semantic Kernel, smolagents, PydanticAI#1593

Closed
miyannishar wants to merge 1 commit into
microsoft:mainfrom
miyannishar:feat/remaining-native-hooks

Conversation

@miyannishar
Collaborator

Summary

Complete the native-hooks migration for all four remaining agent framework adapters: Anthropic, Semantic Kernel, smolagents, and PydanticAI. Each adapter now exposes a factory method that returns a native framework hook — replacing the legacy monkey-patching wrap() pattern with a composable, non-invasive integration path.

Related to #1571

Changes

Anthropic (anthropic_adapter.py)

  • GovernanceMessageHook — stateless hook implementing a create() method that wraps client.messages.create()
    • Pre-execution: content scanning, tool allowlist validation, token-limit enforcement
    • Post-execution: tool_use block validation, token tracking, audit trail
  • AnthropicKernel.as_message_hook() — factory method (recommended entry point)
  • wrap() / wrap_client() — deprecated with DeprecationWarning, directing to as_message_hook()

Semantic Kernel (semantic_kernel_adapter.py)

  • GovernanceFunctionFilter — async callable conforming to SK's add_filter("auto_function_invocation", ...) protocol
    • Validates function names against allowlist (supports wildcards)
    • Blocked-pattern scanning in arguments
    • max_tool_calls enforcement
    • Per-invocation audit recording
  • SemanticKernelWrapper.as_filter() — factory method (recommended entry point)
  • wrap() / wrap_kernel() — deprecated with DeprecationWarning, directing to as_filter()
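
The PR text says the function allowlist supports wildcards but does not show how they are matched. A minimal sketch, assuming shell-style globs via the standard library's `fnmatch`; the helper name `is_function_allowed` and the function names are hypothetical, and the real adapter may use a different matching rule:

```python
from fnmatch import fnmatchcase


def is_function_allowed(name: str, allowlist: list[str]) -> bool:
    """Return True if `name` matches any allowlist entry (shell-style wildcards)."""
    # Assumption: an empty allowlist allows nothing. The real adapter might
    # instead treat an empty allowlist as "no restriction".
    return any(fnmatchcase(name, pattern) for pattern in allowlist)


allowlist = ["math-*", "search-web"]
for candidate in ("math-add", "search-web", "shell-exec"):
    print(candidate, is_function_allowed(candidate, allowlist))
```

`fnmatchcase` is used rather than `fnmatch` so that matching stays case-sensitive on every platform.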

Smolagents (smolagents_adapter.py)

  • GovernanceStepCallback — callable implementing __call__(step, agent) for smolagents' step_callbacks
    • Tool blocklist/allowlist enforcement
    • Blocked-pattern scanning in arguments and observations
    • max_tool_calls enforcement
    • Step-level audit trail
  • SmolagentsKernel.as_step_callback() — factory method (recommended entry point)
  • wrap() — deprecated with DeprecationWarning, directing to as_step_callback()

PydanticAI (pydantic_ai_adapter.py)

  • GovernanceCapability — capability implementing lifecycle hooks:
    • before_run: prompt content scanning
    • before_tool_execute: tool allowlist, blocked patterns, call-count limits
    • after_tool_execute: audit recording
    • after_run: completion recording
  • PydanticAIKernel.as_capability() — factory method (recommended entry point)
  • wrap() — deprecated with DeprecationWarning, directing to as_capability()

Package exports (__init__.py)

  • Added AnthropicGovernanceHook, SKGovernanceFilter, SmolagentsGovernanceCallback, PydanticAIGovernanceCapability, SmolagentsKernel

Migration Guide

Before (legacy monkey-patching):

# Anthropic
governed = kernel.wrap(client)

# Semantic Kernel  
governed = wrapper.wrap(sk_kernel)

# Smolagents
kernel.wrap(agent)

# PydanticAI
governed = kernel.wrap(agent)

After (native hooks — recommended):

# Anthropic
hook = kernel.as_message_hook()
response = hook.create(client, model="claude-sonnet-4-20250514", ...)

# Semantic Kernel
gov_filter = wrapper.as_filter()
sk_kernel.add_filter("auto_function_invocation", gov_filter)

# Smolagents
callback = kernel.as_step_callback()
agent = CodeAgent(tools=[...], step_callbacks=[callback])

# PydanticAI
capability = kernel.as_capability()
agent = Agent("openai:gpt-4o", capabilities=[capability])

Tests

| Adapter | Test File | Test Count |
| --- | --- | --- |
| Anthropic | test_anthropic_hooks.py | 12 |
| Semantic Kernel | test_semantic_kernel_hooks.py | 10 |
| Smolagents | test_smolagents_hooks.py | 14 |
| PydanticAI | test_pydantic_ai_hooks.py | 16 |

Design Decisions

  1. Non-invasive governance — All hooks integrate at the framework's native lifecycle points (callbacks, filters, capabilities) instead of wrapping/proxying objects, avoiding type-checking issues and SDK breakage.
  2. Backward compatibility — Legacy wrap() methods still work but emit DeprecationWarning with clear migration instructions.
  3. Stateless hooks — Hooks can be freely shared, re-instantiated, or composed without side effects on the underlying framework clients.

…ticAI

Complete the native-hooks migration for all four remaining adapters:

Anthropic:
- Add GovernanceMessageHook + as_message_hook() factory
- Pre-execution: content scanning, tool allowlist, token limits
- Post-execution: tool_use validation, token tracking, audit
- Deprecate wrap() and wrap_client() with migration guidance

Semantic Kernel:
- Add GovernanceFunctionFilter + as_filter() factory
- Uses SK's native add_filter('auto_function_invocation', ...) system
- Validates function names, blocked patterns, call counts
- Deprecate wrap() and wrap_kernel() with migration guidance

Smolagents:
- Add GovernanceStepCallback + as_step_callback() factory
- Implements step_callbacks protocol: __call__(step, agent)
- Validates tool names, blocked patterns, observations
- Deprecate wrap() with migration guidance

PydanticAI:
- Add GovernanceCapability + as_capability() factory
- Lifecycle hooks: before/after_run, before/after_tool_execute
- Pre-execution policy gating, post-execution drift detection
- Deprecate wrap() with migration guidance

Package exports:
- Export AnthropicGovernanceHook, SKGovernanceFilter,
  SmolagentsGovernanceCallback, PydanticAIGovernanceCapability
  from integrations __init__.py

Tests:
- test_anthropic_hooks.py: 12 tests
- test_semantic_kernel_hooks.py: 10 tests
- test_smolagents_hooks.py: 14 tests
- test_pydantic_ai_hooks.py: 16 tests

Part of: microsoft#1571
@github-actions

🤖 AI Agent: security-scanner — View details

No security issues found.

@github-actions github-actions Bot added the size/XL Extra large PR (500+ lines) label Apr 29, 2026
@github-actions

🤖 AI Agent: docs-sync-checker — Docs Sync

  • GovernanceMessageHook in anthropic_adapter.py -- missing docstring
  • GovernanceCapability in pydantic_ai_adapter.py -- missing docstring
  • GovernanceFunctionFilter in semantic_kernel_adapter.py -- missing docstring
  • README.md -- migration guide needs update to reflect new native hooks
  • CHANGELOG.md -- missing entry for new native hooks and deprecation of wrap() methods

@github-actions

🤖 AI Agent: breaking-change-detector — API Compatibility

| Severity | Change | Impact |
| --- | --- | --- |
| Breaking | Deprecated wrap() and wrap_client() methods in AnthropicKernel. | Existing code using these methods will emit DeprecationWarning. |
| Breaking | Deprecated wrap() method in PydanticAIKernel. | Existing code using this method will emit DeprecationWarning. |
| Breaking | Deprecated wrap() method in SemanticKernelWrapper. | Existing code using this method will emit DeprecationWarning. |
| Breaking | Deprecated wrap_kernel() function in semantic_kernel_adapter.py. | Existing code using this function will emit DeprecationWarning. |
| Breaking | Deprecated wrap() function in pydantic_ai_adapter.py. | Existing code using this function will emit DeprecationWarning. |

@github-actions

🤖 AI Agent: test-generator — `anthropic_adapter.py`

Test Coverage Analysis

anthropic_adapter.py

  • Existing coverage:
    • test_anthropic_hooks.py includes 12 tests covering the GovernanceMessageHook and its create() method.
    • Tests validate pre-execution checks (blocked patterns, tool allowlist, token limits), post-execution checks (tool use validation, token tracking), and audit trail recording.
  • Missing coverage:
    • Edge cases for token limits (e.g., exactly at the limit, slightly exceeding).
    • Scenarios with no tools specified in the request.
    • Handling of malformed messages or tools inputs.
    • Behavior when client.messages.create() raises an exception.
  • Suggested test cases:
    1. test_create_token_limit_boundary — Test create() with max_tokens exactly at the policy limit and slightly above it.
    2. test_create_no_tools_specified — Validate behavior when no tools are provided in the request.
    3. test_create_malformed_messages — Test create() with malformed messages (e.g., missing content field, invalid types).
    4. test_create_client_exception — Simulate an exception from client.messages.create() and ensure proper error handling.

semantic_kernel_adapter.py

  • Existing coverage:
    • test_semantic_kernel_hooks.py includes 10 tests covering the GovernanceFunctionFilter and its integration with add_filter.
    • Tests validate function name allowlist, blocked-pattern scanning in arguments, and max_tool_calls enforcement.
  • Missing coverage:
    • Wildcard support in function name allowlist.
    • Handling of invalid or malformed function arguments.
    • Behavior when add_filter is called multiple times with the same filter.
  • Suggested test cases:
    1. test_function_name_wildcard_matching — Verify that wildcard patterns in the function name allowlist are correctly matched.
    2. test_invalid_function_arguments — Test behavior when function arguments are malformed or missing required fields.
    3. test_duplicate_filter_registration — Ensure that registering the same filter multiple times does not cause unexpected behavior.

smolagents_adapter.py

  • Existing coverage:
    • test_smolagents_hooks.py includes 14 tests covering the GovernanceStepCallback and its __call__ method.
    • Tests validate tool allowlist/blocklist enforcement, blocked-pattern scanning, and step-level audit trail recording.
  • Missing coverage:
    • Scenarios with partial failures in step execution.
    • Handling of concurrent step callbacks.
    • Behavior when step or agent inputs are malformed or missing fields.
  • Suggested test cases:
    1. test_partial_step_failure — Simulate a failure in one step and ensure subsequent steps are handled correctly.
    2. test_concurrent_step_callbacks — Test for race conditions when multiple step callbacks are executed concurrently.
    3. test_malformed_step_input — Validate behavior when step or agent inputs are malformed or incomplete.

pydantic_ai_adapter.py

  • Existing coverage:
    • test_pydantic_ai_hooks.py includes 16 tests covering the GovernanceCapability and its lifecycle hooks (before_run, before_tool_execute, after_tool_execute, after_run).
    • Tests validate prompt content scanning, tool allowlist/blocklist enforcement, and audit recording.
  • Missing coverage:
    • Edge cases for before_run (e.g., empty or excessively long prompts).
    • Scenarios with multiple tools in a single execution.
    • Handling of expired or invalid governance policies.
  • Suggested test cases:
    1. test_before_run_empty_prompt — Validate behavior when the prompt is empty.
    2. test_before_run_long_prompt — Test behavior with a prompt exceeding typical length limits.
    3. test_multiple_tool_calls — Verify correct handling of multiple tool calls in a single execution, including max_tool_calls enforcement.
    4. test_expired_governance_policy — Simulate an expired policy and ensure proper error handling.

__init__.py

  • Existing coverage:
    • No direct tests for __init__.py as it primarily handles imports and exports.
  • Missing coverage:
    • Validation of new exports (AnthropicGovernanceHook, SKGovernanceFilter, SmolagentsGovernanceCallback, PydanticAIGovernanceCapability).
  • Suggested test cases:
    1. test_import_anthropic_governance_hook — Verify that AnthropicGovernanceHook can be imported and instantiated.
    2. test_import_sk_governance_filter — Verify that SKGovernanceFilter can be imported and used.
    3. test_import_smolagents_governance_callback — Verify that SmolagentsGovernanceCallback can be imported and used.
    4. test_import_pydantic_ai_governance_capability — Verify that PydanticAIGovernanceCapability can be imported and used.

General Recommendations

  • Ensure that all deprecation warnings (wrap() methods) are tested to confirm they are raised correctly.
  • Add tests for concurrency and race conditions, especially for shared state (e.g., tool_call_count in GovernanceCapability).
  • Validate input handling for all hooks and capabilities, including malformed or missing fields.
  • Test edge cases for policy evaluation, such as conflicting policies or boundary conditions.

@github-actions

🤖 AI Agent: code-reviewer — Feedback on Pull Request: feat(adapters): native hooks for Anthropic, Semantic Kernel, smolagents, PydanticAI

CRITICAL Issues

  1. Policy Engine Correctness:

    • Anthropic Adapter:
      • In the GovernanceMessageHook.create() method, the pre-execution checks for messages only validate the content field. If the SDK introduces new fields that could contain sensitive or malicious data, these fields might bypass governance checks. Consider implementing a more comprehensive validation mechanism that inspects all relevant fields in the messages payload.
    • PydanticAI Adapter:
      • The GovernanceCapability.before_tool_execute() method validates tool_name and arguments, but it does not account for nested or dynamically generated arguments that might bypass blocked patterns. Ensure recursive validation of deeply nested structures to avoid policy circumvention.
  2. Trust/Identity:

    • Audit Trail Integrity:
      • The audit logs in all adapters (e.g., GovernanceMessageHook, GovernanceCapability) are stored in memory (self._audit or self._ctx). This approach is vulnerable to tampering during runtime. Consider implementing cryptographic signing or hashing of audit entries to ensure their integrity and non-repudiation.
  3. Sandbox Escape Vectors:

    • Tool Execution:
      • The GovernanceCapability.before_tool_execute() method does not explicitly sanitize or validate the arguments passed to tools. This could allow malicious payloads to execute in the tool's context. Ensure strict input validation and sanitization for all tool arguments.
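
The nested-argument concern raised above for before_tool_execute() could be addressed by walking the argument structure before matching. A minimal sketch, assuming blocked patterns are plain case-insensitive substrings; `iter_strings` and `find_blocked_pattern` are hypothetical helpers, not part of the toolkit:

```python
from typing import Any, Iterator, Optional


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string found anywhere inside a nested structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from iter_strings(key)   # keys can carry payloads too
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple, set)):
        for item in value:
            yield from iter_strings(item)
    # Other types (numbers, None, ...) are skipped in this sketch.


def find_blocked_pattern(arguments: Any, blocked_patterns: list[str]) -> Optional[str]:
    """Return the first blocked pattern found in any nested string, else None."""
    for text in iter_strings(arguments):
        lowered = text.lower()
        for pattern in blocked_patterns:
            if pattern.lower() in lowered:
                return pattern
    return None


args = {"cmd": {"steps": ["ls", {"run": "rm -rf /"}]}}
print(find_blocked_pattern(args, ["rm -rf"]))
```

Very deep or self-referential structures would need a depth limit, which this sketch omits.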

WARNING Issues

  1. Backward Compatibility:
    • The deprecation of the wrap() methods across all adapters introduces potential breaking changes for existing users who rely on these methods. While the DeprecationWarning provides guidance, ensure that:
      • The deprecation period is well-documented and communicated.
      • The legacy methods remain functional until the next major version release.
    • Consider adding a migration script or utility to help users transition from wrap() to the new factory methods.

SUGGESTIONS

  1. Thread Safety:

    • The adapters (e.g., GovernanceMessageHook, GovernanceCapability) maintain state in instance variables (self._ctx, self._tool_call_count, etc.). In concurrent agent execution scenarios, this could lead to race conditions or inconsistent state. Consider:
      • Using thread-local storage for per-thread state.
      • Documenting whether these hooks are thread-safe or require external synchronization.
  2. OWASP Agentic Top 10 Compliance:

    • Audit Trail Completeness:
      • The audit logs currently capture key events (e.g., run_start, tool_blocked, tool_executed). However, they do not include metadata such as the user ID, session ID, or IP address. Adding this information would improve traceability and compliance with audit requirements.
    • Error Handling:
      • Ensure that all exceptions raised by governance checks (e.g., PolicyViolationError) are logged with sufficient context (e.g., offending input, policy details) to facilitate debugging and forensic analysis.
  3. Type Safety and Pydantic Validation:

    • The adapters rely on dynamic dictionaries (e.g., kwargs, arguments) for input validation. Consider leveraging Pydantic models to enforce schema validation and type safety for these inputs. This would reduce the risk of runtime errors and improve maintainability.
  4. Documentation:

    • The migration guide in the PR description is helpful but could be expanded to include:
      • A detailed comparison of the legacy and new integration patterns.
      • Common pitfalls during migration and how to address them.
      • Examples of how to test the new hooks in isolation.
  5. Testing Coverage:

    • While the PR includes tests for all adapters, ensure that:
      • Edge cases (e.g., deeply nested arguments, malformed inputs) are thoroughly tested.
      • Concurrent execution scenarios are tested to identify potential race conditions or state corruption.
  6. Performance Optimization:

    • The GovernanceMessageHook.create() method performs multiple checks (e.g., blocked patterns, tool validation) sequentially. For large payloads or high-frequency calls, this could introduce latency. Consider optimizing the checks (e.g., using compiled regex patterns, caching policy lookups).
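
For the thread-safety suggestion (item 1 above), one option is to guard the call counter with a lock so the limit check and the increment happen atomically. A minimal sketch; `ToolCallCounter` is a hypothetical class, and the real hooks may keep their counters elsewhere:

```python
import threading


class ToolCallCounter:
    """Counts tool calls and enforces a maximum, safely across threads."""

    def __init__(self, max_tool_calls: int) -> None:
        self._max = max_tool_calls
        self._count = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Reserve one call slot; return False once the limit is reached."""
        with self._lock:
            if self._count >= self._max:
                return False
            self._count += 1
            return True


def run_concurrently(counter: ToolCallCounter, threads: int, attempts: int) -> int:
    """Hammer the counter from several threads; return how many calls were allowed."""
    allowed = []

    def worker() -> None:
        granted = sum(counter.try_acquire() for _ in range(attempts))
        allowed.append(granted)  # list.append is atomic in CPython

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return sum(allowed)


print(run_concurrently(ToolCallCounter(max_tool_calls=100), threads=8, attempts=50))
```

Without the lock, two threads could both pass the limit check before either increments, letting the count exceed max_tool_calls.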

Summary of Recommendations

  • Address the CRITICAL issues related to policy engine correctness, audit trail integrity, and sandbox escape vectors.
  • Mitigate the WARNING issue by ensuring backward compatibility and providing migration support.
  • Implement the SUGGESTIONS to improve thread safety, compliance, type safety, documentation, testing, and performance.

This PR introduces a significant improvement in governance integration by adopting native hooks, but careful attention to the identified issues will ensure robustness, security, and user adoption.

@github-actions

PR Review Summary

| Check | Status | Details |
| --- | --- | --- |
| 🔍 Code Review | ❌ Failed | Issues detected |
| 🛡️ Security Scan | ✅ Completed | Analysis complete |
| 🔄 Breaking Changes | ⚠️ Warning | See details |
| 📝 Docs Sync | ✅ Completed | Analysis complete |
| 🧪 Test Coverage | ❌ Failed | Issues detected |

Verdict: ❌ Changes needed

Member

@imran-siddique imran-siddique left a comment

Code Review: Anthropic, Semantic Kernel, smolagents, PydanticAI native hooks

Impressive scope covering 4 adapters in one PR, @miyannishar. Architecture is solid across all four. Issues found:

Blocking

1. contextlib.suppress(Exception) silently swallows ALL errors
In both semantic_kernel_adapter.py (wrap_kernel()) and pydantic_ai_adapter.py (wrap()), the pattern:
```python
with contextlib.suppress(Exception), warnings.catch_warnings():
    warnings.simplefilter('ignore', DeprecationWarning)
    return wrapper.wrap(kernel)
```
catches all exceptions, not just deprecation warnings. If wrap() raises a real error (invalid kernel, policy validation), it's silently eaten and returns None. Remove contextlib.suppress(Exception) and keep only warnings.catch_warnings().
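
The failure mode is easy to reproduce in isolation. A minimal sketch, where `legacy_wrap` is a hypothetical stand-in for wrapper.wrap() and the two `wrap_kernel_*` functions illustrate the pattern before and after the requested change:

```python
import contextlib
import warnings


def legacy_wrap(kernel):
    """Hypothetical stand-in for wrapper.wrap(): warns, then validates."""
    warnings.warn("wrap() is deprecated", DeprecationWarning, stacklevel=2)
    if kernel is None:
        raise ValueError("invalid kernel")
    return ("governed", kernel)


def wrap_kernel_buggy(kernel):
    with contextlib.suppress(Exception), warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return legacy_wrap(kernel)
    # Reached only when an exception was suppressed: implicit `return None`.


def wrap_kernel_fixed(kernel):
    # Only the nested DeprecationWarning is silenced; real errors propagate.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return legacy_wrap(kernel)


print(wrap_kernel_buggy(None))  # prints None: the ValueError was swallowed
try:
    wrap_kernel_fixed(None)
except ValueError as exc:
    print("raised:", exc)
```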

2. Anthropic wrap_client() double-emits deprecation
Unlike SK and PydanticAI which suppress the nested warning, Anthropic doesn't. Users see two warnings per call.
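
A sketch of the double emission and one way to avoid it, assuming the outer function should emit its own warning and silence only the nested one. `Kernel` here is a hypothetical stand-in for AnthropicKernel, and the warning messages are illustrative:

```python
import warnings


class Kernel:
    """Hypothetical stand-in for AnthropicKernel."""

    def wrap(self, client):
        warnings.warn("wrap() is deprecated; use as_message_hook()",
                      DeprecationWarning, stacklevel=2)
        return client


def wrap_client_double(client):
    warnings.warn("wrap_client() is deprecated; use as_message_hook()",
                  DeprecationWarning, stacklevel=2)
    return Kernel().wrap(client)  # emits a second warning


def wrap_client_single(client):
    warnings.warn("wrap_client() is deprecated; use as_message_hook()",
                  DeprecationWarning, stacklevel=2)
    with warnings.catch_warnings():
        # Silence only the nested wrap() warning; exceptions still propagate.
        warnings.simplefilter("ignore", DeprecationWarning)
        return Kernel().wrap(client)


def count_deprecations(func) -> int:
    """Call func once and count the DeprecationWarnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func(object())
    return sum(issubclass(w.category, DeprecationWarning) for w in caught)


print(count_deprecations(wrap_client_double), count_deprecations(wrap_client_single))
```

The same `count_deprecations` helper doubles as a test that each legacy entry point warns exactly once.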

Security/Correctness

3. Session ID collisions
GovernanceMessageHook and GovernanceFunctionFilter use int(time.time()) for session IDs. Two hooks created in the same second share a session ID, corrupting audit trails. Use uuid.uuid4().hex[:12] instead.
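
The collision is straightforward to demonstrate. In this sketch the `session-` prefix and both helper names are hypothetical; only the `int(time.time())` and `uuid.uuid4().hex[:12]` expressions come from the review:

```python
import time
import uuid


def session_id_from_clock() -> str:
    """Second-resolution ID: every call within the same second is identical."""
    return f"session-{int(time.time())}"


def session_id_from_uuid() -> str:
    """Random 48-bit ID: collisions are improbable at realistic volumes."""
    return f"session-{uuid.uuid4().hex[:12]}"


clock_ids = {session_id_from_clock() for _ in range(1000)}
uuid_ids = {session_id_from_uuid() for _ in range(1000)}
print(len(clock_ids), "distinct clock-based IDs")
print(len(uuid_ids), "distinct uuid-based IDs")
```

Truncating to 12 hex characters keeps IDs short at the cost of entropy; the full `uuid4().hex` is an option if audit volume is very high.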

4. Repeated as_filter()/as_message_hook() calls overwrite context entries
GovernanceFunctionFilter hardcodes wrapper._contexts['sk-filter']. Calling as_filter() twice (as shown in the docstring example for both filter types) overwrites the first filter's context.
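
A sketch of the overwrite and a per-instance key that avoids it. `Wrapper` is a hypothetical stand-in for SemanticKernelWrapper, and the context dictionaries are simplified to a single counter:

```python
import uuid


class Wrapper:
    """Hypothetical stand-in for SemanticKernelWrapper."""

    def __init__(self) -> None:
        self._contexts = {}

    def as_filter_hardcoded(self) -> dict:
        ctx = {"call_count": 0}
        self._contexts["sk-filter"] = ctx  # second call replaces the first
        return ctx

    def as_filter_unique(self) -> dict:
        ctx = {"call_count": 0}
        key = f"sk-filter-{uuid.uuid4().hex[:12]}"  # one key per filter instance
        self._contexts[key] = ctx
        return ctx


hardcoded = Wrapper()
hardcoded.as_filter_hardcoded()
hardcoded.as_filter_hardcoded()
print(len(hardcoded._contexts))  # prints 1: the first context is gone

unique = Wrapper()
unique.as_filter_unique()
unique.as_filter_unique()
print(len(unique._contexts))  # prints 2
```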

5. Smolagents test will fail at runtime
test_blocks_pattern_in_observation passes a step with observation='Result: rm -rf / completed' to a callback with blocked_patterns=['rm -rf']. But GovernanceStepCallback scans observations unconditionally, so this will raise PolicyViolationError before reaching the assertion. Fix either the test expectation or the implementation.
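
If the implementation is kept as-is, the test needs to expect the exception. A minimal sketch of that shape using stand-in classes (`Step`, `StepCallback`, and a local `PolicyViolationError` are all hypothetical simplifications); in the real suite, `pytest.raises(PolicyViolationError)` would be the natural way to write it:

```python
class PolicyViolationError(Exception):
    """Stand-in for the toolkit's exception of the same name."""


class Step:
    """Simplified stand-in for a smolagents step object."""

    def __init__(self, observation: str) -> None:
        self.observation = observation


class StepCallback:
    """Simplified stand-in that scans observations unconditionally."""

    def __init__(self, blocked_patterns: list[str]) -> None:
        self.blocked_patterns = blocked_patterns

    def __call__(self, step: Step, agent=None) -> None:
        for pattern in self.blocked_patterns:
            if pattern in (step.observation or ""):
                raise PolicyViolationError(f"blocked pattern: {pattern!r}")


def test_blocks_pattern_in_observation() -> None:
    callback = StepCallback(blocked_patterns=["rm -rf"])
    step = Step(observation="Result: rm -rf / completed")
    try:
        callback(step, agent=None)
    except PolicyViolationError:
        return  # expected: the observation contains a blocked pattern
    raise AssertionError("expected PolicyViolationError")


test_blocks_pattern_in_observation()
```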

Warnings

6. Post-execution token limit check is detective, not preventive
In Anthropic GovernanceMessageHook.create(), the token check runs after client.messages.create() returns. Tokens are already billed. Document this limitation.
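
The distinction can be sketched with a fake client. Everything below is a hypothetical simplification: `governed_create` is not the hook's real signature, and the fake only mimics the `messages.create()` call shape and the `usage.input_tokens` / `usage.output_tokens` fields of a response:

```python
from types import SimpleNamespace


class PolicyViolationError(Exception):
    """Stand-in for the toolkit's exception of the same name."""


def make_fake_client(input_tokens: int, output_tokens: int):
    """Build a fake client that records how many API calls were made."""
    state = SimpleNamespace(calls=0)

    def create(**kwargs):
        state.calls += 1
        usage = SimpleNamespace(input_tokens=input_tokens, output_tokens=output_tokens)
        return SimpleNamespace(usage=usage)

    client = SimpleNamespace(messages=SimpleNamespace(create=create))
    return client, state


def governed_create(client, max_tokens_limit: int, **kwargs):
    # Preventive: reject before any request is sent, so nothing is billed.
    if kwargs.get("max_tokens", 0) > max_tokens_limit:
        raise PolicyViolationError("requested max_tokens exceeds policy limit")
    response = client.messages.create(**kwargs)
    # Detective: the call already happened; this can only report the overrun.
    used = response.usage.input_tokens + response.usage.output_tokens
    if used > max_tokens_limit:
        raise PolicyViolationError(f"used {used} tokens, limit {max_tokens_limit}")
    return response


client, state = make_fake_client(input_tokens=80, output_tokens=40)
try:
    governed_create(client, max_tokens_limit=100, max_tokens=50)
except PolicyViolationError as exc:
    print(exc, "| API calls made:", state.calls)
```

The pre-check bounds only output tokens via `max_tokens`; input size is not known exactly until the response reports usage, which is why the post-check remains detective.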

7. Deleted # Google Gemini section comment in __init__.py
Gemini entries now appear under the Anthropic section header. Restore the comment.

Nits

  • PydanticAI GovernanceCapability maintains redundant dual counters (_tool_call_count and _ctx.call_count)
  • SK tests use deprecated asyncio.get_event_loop().run_until_complete()
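
For the asyncio nit, `asyncio.run()` is the usual replacement when a synchronous test needs to drive a coroutine. A minimal sketch with a hypothetical filter and handler; the real SK filter receives a framework context object rather than a dict:

```python
import asyncio


async def passthrough_filter(context: dict, next_handler) -> None:
    """Hypothetical filter: records the call, then continues the chain."""
    context["seen"] = True
    await next_handler(context)


async def final_handler(context: dict) -> None:
    """Hypothetical end of the chain, standing in for the function call."""
    context["invoked"] = True


def run_filter_once() -> dict:
    context: dict = {}
    # asyncio.run() creates a fresh event loop, runs the coroutine, closes the loop.
    asyncio.run(passthrough_filter(context, final_handler))
    return context


print(run_filter_once())
```

Projects that already use an async test plugin can declare the tests as `async def` instead and skip the explicit loop handling.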

The contextlib.suppress(Exception) issue (#1) is the main blocker since it can mask real failures silently.

Member

@imran-siddique imran-siddique left a comment

Updated review (condensed):

TL;DR: 1 blocker (error suppression), 1 test bug, rest are nice-to-haves.

| # | Sev | Issue | Where |
| --- | --- | --- | --- |
| 1 | Block | contextlib.suppress(Exception) swallows ALL errors, not just deprecation warnings | SK wrap_kernel(), PydanticAI wrap() |
| 2 | Bug | smolagents test_blocks_pattern_in_observation will raise before reaching assertion | test file |
| 3 | Warn | Anthropic wrap_client() double-emits deprecation (SK/PydanticAI suppress it correctly) | wrap_client() |
| 4 | Warn | Session IDs use int(time.time()) -- collide within same second | Anthropic, SK |
| 5 | Warn | Repeated as_filter() calls overwrite context entries | SK GovernanceFunctionFilter |

#1: Remove contextlib.suppress(Exception), keep only warnings.catch_warnings().

#2: Either the test expectation is wrong or observation scanning should skip when no tool calls.

#3-5 are fine as follow-ups.

Member

@imran-siddique imran-siddique left a comment

Approving native hooks migration for remaining adapters.

@imran-siddique
Member

Closing - rebased and re-opened as a new PR to resolve merge conflicts with recently merged native-hooks PRs.

miyannishar pushed a commit to miyannishar/agent-governance-toolkit that referenced this pull request Apr 30, 2026
Blockers:
- Remove contextlib.suppress(Exception) in SK wrap_kernel() and
  PydanticAI wrap() -- was silently swallowing real errors, not just
  DeprecationWarning. Use warnings.catch_warnings() only.
- Fix Anthropic wrap_client() double-emitting deprecation by wrapping
  the inner kernel.wrap() call with warnings.catch_warnings().
- Fix smolagents test_blocks_pattern_in_observation: the fixture kernel
  has blocked_patterns=[rm -rf] so observation scanning fires even
  without a tool call. Corrected both sub-cases to expect raises.

Warnings:
- Replace int(time.time()) session IDs with uuid.uuid4().hex[:12]
  to eliminate same-second collision and audit trail corruption.
- Make GovernanceFunctionFilter context key unique per instance
  (was hardcoded 'sk-filter'), preventing context overwrites when
  as_filter() is called multiple times on the same wrapper.

Labels

size/XL Extra large PR (500+ lines) tests

2 participants