feat(crewai): add native GovernanceHooks using CrewAI execution hooks by miyannishar · Pull Request #1588 · microsoft/agent-governance-toolkit

miyannishar · 2026-04-29T20:49:51Z

Summary

Replaces fragile proxy-based monkey-patching in the CrewAI adapter with CrewAI's native execution hooks (@before_tool_call, @after_tool_call, @before_llm_call, @after_llm_call) introduced in CrewAI 0.80+.

Resolves #1587

Changes

New: `GovernanceHooks` class

Registers four global governance hooks that intercept every tool and LLM call across all agents:

| Hook | Governance Action |
| --- | --- |
| `before_tool_call` | Tool allowlist, blocked-pattern scan on args/name, Cedar/OPA `pre_execute` gate, max call count |
| `after_tool_call` | Blocked-pattern scan on output, `post_execute` drift detection |
| `before_llm_call` | Content filter on input messages, Cedar/OPA `pre_execute` gate |
| `after_llm_call` | Blocked-pattern scan on LLM response, `post_execute` drift detection |

New: `CrewAIKernel.as_hooks()` factory

kernel = CrewAIKernel(policy=GovernancePolicy(
    blocked_patterns=["DROP TABLE"],
    allowed_tools=["search", "calculator"],
))
hooks = kernel.as_hooks()        # register governance hooks
result = my_crew.kickoff()       # hooks intercept every call
hooks.unregister()               # clean up

Deprecated: `wrap()` and module-level `wrap()`

Both now emit a DeprecationWarning pointing to as_hooks(). Backward compatibility is fully maintained: all 12 existing regression tests pass unchanged.

Export

CrewAIGovernanceHooks exported from agent_os.integrations.

Testing

- 43 new tests covering all four hook types, Cedar evaluator integration, deprecation warnings, and backward compatibility
- 12 existing regression tests pass unchanged (full backward compatibility)
- Tests use a stub crewai.hooks module, since CrewAI is not installed in CI

Design Decisions

- Native over proxy: Hooks intercept at the framework level without mutating crew/agent/tool objects, preserving isinstance() checks and composability
- Fail-closed: All policy violations block or raise PolicyViolationError
- Graceful degradation: A try/except import ensures the adapter works even when crewai.hooks is unavailable (falls back to legacy wrap())
- Pattern parity: as_hooks() mirrors ADK's as_plugin() and OpenAI's as_hooks() for a consistent API surface

Replace fragile proxy-based monkey-patching with CrewAI's native execution hooks (@before_tool_call, @after_tool_call, @before_llm_call, @after_llm_call) introduced in CrewAI 0.80+.

Changes:
- Add GovernanceHooks class that registers four global governance hooks for tool allowlist, blocked-pattern scanning, Cedar/OPA pre_execute gates, and output content filtering
- Add CrewAIKernel.as_hooks() factory method as the recommended integration path
- Deprecate CrewAIKernel.wrap() with DeprecationWarning pointing to as_hooks()
- Export CrewAIGovernanceHooks from integrations package
- Add 43 new tests covering all four hook types, deprecation warnings, Cedar evaluator integration, and backward compatibility
- All 12 existing CrewAI regression tests pass unchanged

Resolves microsoft#1576

github-actions · 2026-04-29T20:50:11Z

🤖 AI Agent: docs-sync-checker — Docs Sync

Docs Sync

- GovernanceHooks in crewai_adapter.py: missing docstrings for methods _make_after_llm_call and _make_before_llm_call.
- README.md: update needed to reflect the new GovernanceHooks class and as_hooks() method as the recommended integration approach.
- CHANGELOG.md: missing entry for the introduction of GovernanceHooks and deprecation of wrap().

github-actions · 2026-04-29T20:50:15Z

🤖 AI Agent: breaking-change-detector — API Compatibility

API Compatibility

| Severity | Change | Impact |
| --- | --- | --- |
| Breaking | `wrap()` method is deprecated in `CrewAIKernel`. | Users relying on `wrap()` will need to migrate to the new `as_hooks()` method in the future. |
| Breaking | Module-level `wrap()` function is deprecated. | Users relying on the module-level `wrap()` function will need to transition to `as_hooks()`. |
| Potential | `GovernanceHooks` requires `crewai.hooks` module (available in CrewAI 0.80+). | Users with older versions of CrewAI or without `crewai.hooks` will need to upgrade or use the legacy `wrap()` method. |
| Potential | `GovernanceHooks` registers global hooks, which may interfere with other CrewAI usages. | Users with multiple CrewAI integrations in the same process may encounter unexpected behavior. |

github-actions · 2026-04-29T20:50:15Z

🤖 AI Agent: security-scanner — View details

No security issues found.

github-actions · 2026-04-29T20:50:20Z

🤖 AI Agent: code-reviewer — Review Summary

Review Summary

This pull request introduces a significant enhancement to the microsoft/agent-governance-toolkit repository by replacing the fragile proxy-based monkey-patching in the CrewAI adapter with native execution hooks provided by CrewAI version 0.80+. The new GovernanceHooks class provides a robust and modular way to enforce governance policies for tool and LLM calls. The PR also maintains backward compatibility with the deprecated wrap() method, ensuring a smooth transition for existing users.

Below is a detailed review of the changes, focusing on the specified focus areas.

CRITICAL

Fail-Closed Behavior in GovernanceHooks
- The GovernanceHooks class is designed to enforce a fail-closed policy, which is a critical security feature. However, the unregister() method does not actually remove the hooks from CrewAI's global registry, as noted in the comments. This could lead to a situation where hooks remain active even after they are "unregistered," potentially causing unintended behavior or security gaps.
- Recommendation: Investigate whether CrewAI provides a mechanism to unregister hooks. If not, consider implementing a workaround (e.g., replacing the hook functions with no-op implementations) to ensure hooks are effectively disabled when unregister() is called.
Blocked-Pattern Matching
- The matches_pattern method is used extensively to scan for blocked patterns in tool names, arguments, and outputs. However, the implementation of matches_pattern is not included in the diff, so it is unclear if it is robust against edge cases such as partial matches, case sensitivity, or encoding issues.
- Recommendation: Review the implementation of matches_pattern to ensure it is secure and handles edge cases correctly. Consider adding tests for edge cases, such as patterns with special characters or patterns embedded in larger strings.
Thread Safety
- The GovernanceHooks class uses a shared execution context (self._ctx) and maintains state (e.g., self._registered, self._hook_fns). If multiple threads or asynchronous tasks are using the same GovernanceHooks instance, this could lead to race conditions or inconsistent state.
- Recommendation: Use thread-safe data structures or synchronization mechanisms (e.g., threading.Lock) to ensure thread safety when accessing or modifying shared state.
Cedar/OPA Integration
- The pre_execute and post_execute methods are used for policy evaluation, but their implementations are not included in the diff. It is unclear how these methods handle untrusted input or whether they are vulnerable to injection attacks.
- Recommendation: Review the pre_execute and post_execute implementations to ensure they properly sanitize and validate input. Add tests to verify that malicious input cannot bypass governance policies.
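The thread-safety recommendation above could be approached along these lines. This is a sketch under assumptions: `CallBudget` is a hypothetical stand-in for the call counter on the shared execution context, and the real `self._ctx` may be structured differently. The limit check and the increment happen under one lock, so blocked calls never inflate the count.

```python
import threading


class CallBudget:
    """Hypothetical stand-in for the shared context's call counter."""

    def __init__(self, max_calls: int) -> None:
        self._max_calls = max_calls
        self._count = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Atomically reserve one call; return False once the budget is spent."""
        with self._lock:
            if self._count >= self._max_calls:
                return False  # rejected calls do not touch the counter
            self._count += 1
            return True

    @property
    def count(self) -> int:
        with self._lock:
            return self._count


budget = CallBudget(max_calls=50)
results = []


def worker() -> None:
    # Each worker attempts 20 calls against the shared budget.
    for _ in range(20):
        results.append(budget.try_acquire())


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With eight workers attempting 160 calls in total, exactly 50 attempts succeed regardless of scheduling, because the check and the increment cannot interleave.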

WARNING

Backward Compatibility
- The wrap() method is marked as deprecated and emits a DeprecationWarning. While this maintains backward compatibility for now, the eventual removal of wrap() in v1.0 will be a breaking change for users relying on this method.
- Recommendation: Clearly document the deprecation timeline and provide migration guides to help users transition to the new as_hooks() API. Consider gathering feedback from users to ensure the transition is smooth.
Global Hook Registration
- The GovernanceHooks class registers hooks globally, which could lead to conflicts if multiple GovernanceHooks instances are registered simultaneously.
- Recommendation: Add safeguards to prevent multiple GovernanceHooks instances from being registered at the same time. For example, maintain a global registry of active GovernanceHooks instances and raise an error if a new instance is registered while another is active.
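One possible shape for such a safeguard is sketched below. Every name here (`SingleActiveRegistry`, `HookRegistrationError`, `claim`, `release`) is hypothetical and not part of the adapter or of CrewAI; the sketch only illustrates the "one active instance per process" idea.

```python
import threading


class HookRegistrationError(RuntimeError):
    """Hypothetical error raised when a second instance tries to register."""


class SingleActiveRegistry:
    """Hypothetical process-wide record of the one active hooks owner."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._owner = None

    def claim(self, owner: object) -> None:
        """Mark `owner` as active, or raise if a different owner is active."""
        with self._lock:
            if self._owner is not None and self._owner is not owner:
                raise HookRegistrationError(
                    "another GovernanceHooks instance is already active"
                )
            self._owner = owner

    def release(self, owner: object) -> None:
        """Clear the active owner, but only if `owner` holds the claim."""
        with self._lock:
            if self._owner is owner:
                self._owner = None


registry = SingleActiveRegistry()
first, second = object(), object()
registry.claim(first)
try:
    registry.claim(second)
    conflict = False
except HookRegistrationError:
    conflict = True
registry.release(first)
registry.claim(second)  # succeeds once the first owner has released
```

Raising on a conflicting claim keeps the behavior fail-closed: a second registration is rejected loudly instead of silently stacking two policies.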

SUGGESTIONS

Logging Improvements
- The logging statements in the GovernanceHooks class provide useful information, but they could be enhanced with additional context, such as the specific agent or crew involved in a policy violation.
- Recommendation: Include more contextual information (e.g., agent name, crew ID) in log messages to aid debugging and auditing.
Testing Coverage
- The PR mentions 43 new tests, but it is unclear if these tests cover all edge cases, such as:
  - Nested or complex data structures in tool arguments or LLM messages.
  - Performance under high concurrency.
  - Behavior when crewai.hooks is unavailable.
- Recommendation: Ensure comprehensive test coverage, including edge cases and performance under load. Consider using tools like pytest-xdist to test concurrency.
Documentation
- The new GovernanceHooks class and as_hooks() method are well-documented, but the documentation could benefit from additional examples and use cases, especially for the new hooks.
- Recommendation: Add more detailed examples to the documentation, including how to handle common scenarios like blocked patterns, allowlists, and Cedar/OPA integration.
Error Handling
- The PolicyViolationError is raised for policy violations, but it is unclear if this exception is logged or handled in a way that provides actionable insights to users.
- Recommendation: Ensure that all PolicyViolationError instances include sufficient context (e.g., the specific policy violated, the input that caused the violation) and that these errors are logged appropriately.
Performance Considerations
- The hooks perform multiple checks (e.g., pattern matching, policy evaluation) for every tool and LLM call. This could introduce performance overhead, especially in high-throughput scenarios.
- Recommendation: Profile the performance of the hooks under realistic workloads. Consider optimizing the matches_pattern method and other frequently called functions.
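To make the pattern-matching and performance points concrete, a scanner can compile all blocked patterns once and reuse the compiled object on every call. This is a sketch under assumptions: it treats patterns as literal substrings matched case-insensitively, while the actual `matches_pattern` semantics are not shown in the diff. `compile_blocked` and `first_match` are hypothetical names.

```python
import re
from typing import Iterable, Optional


def compile_blocked(patterns: Iterable[str]) -> Optional[re.Pattern]:
    """Build one case-insensitive alternation of escaped literal patterns."""
    escaped = [re.escape(p) for p in patterns if p]
    if not escaped:
        return None  # nothing to block
    return re.compile("|".join(escaped), re.IGNORECASE)


def first_match(compiled: Optional[re.Pattern], text: str) -> Optional[str]:
    """Return the matched substring, or None when the text is clean."""
    if compiled is None:
        return None
    found = compiled.search(text)
    return found.group(0) if found else None


blocked = compile_blocked(["DROP TABLE", "rm -rf"])
print(first_match(blocked, "select 1; drop table users"))
print(first_match(blocked, "select 1"))
```

Escaping each pattern with `re.escape` keeps special characters in a policy string from being interpreted as regex syntax, which addresses the "patterns with special characters" edge case raised earlier in this review.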

Conclusion

This PR introduces a robust and modular approach to governance in the CrewAI adapter, addressing several limitations of the previous proxy-based implementation. While the changes are well-designed and maintain backward compatibility, there are critical areas that require attention, particularly around fail-closed behavior, thread safety, and policy evaluation. Addressing these issues will ensure the security and reliability of the new governance hooks.

github-actions · 2026-04-29T20:50:26Z

🤖 AI Agent: test-generator — `agent_os/integrations/crewai_adapter.py`

Test Coverage Analysis

`agent_os/integrations/crewai_adapter.py`

Existing Coverage:

GovernanceHooks Class:
- The PR mentions 43 new tests covering all four hook types (before_tool_call, after_tool_call, before_llm_call, after_llm_call), Cedar evaluator integration, deprecation warnings, and backward compatibility.
- The register and unregister methods are likely tested for basic functionality, as the PR mentions that hooks are being tested.
- The wrap() method is explicitly mentioned as being covered by 12 existing regression tests.
CrewAIKernel Class:
- The wrap() method is covered by existing regression tests.
- The as_hooks() method is likely covered by the new tests, as it is the primary entry point for the new functionality.

Missing Coverage:

Edge Cases for GovernanceHooks:
- Handling of malformed inputs, such as invalid ToolCallHookContext or LLMCallHookContext objects.
- Behavior when crewai.hooks is unavailable (e.g., _HOOKS_AVAILABLE is False).
- Concurrent registration and unregistration of hooks (potential race conditions).
- Behavior when unregister() is called multiple times or without prior registration.
- Handling of unexpected exceptions within hook functions (e.g., pre_execute or post_execute raising errors).
Policy Evaluation Edge Cases:
- Boundary conditions for max_tool_calls (e.g., exactly at the limit, just below, just above).
- Conflicting policies (e.g., a tool is both in the allowlist and matches a blocked pattern).
- Behavior when Cedar/OPA evaluator returns unexpected results or errors.
Trust Scoring Edge Cases:
- Handling of expired certificates or invalid trust scores in pre_execute and post_execute.
- Behavior when post_execute drift detection returns ambiguous or null results.
Chaos Experiments:
- Simulating partial failures in hooks (e.g., before_tool_call succeeds but after_tool_call fails).
- Timeout handling in pre_execute and post_execute.
Concurrency:
- Simultaneous tool/LLM calls from multiple agents and crews.
- Race conditions in shared state (e.g., ctx.call_count).
Input Validation:
- Malformed inputs to kickoff() and kickoff_async() (e.g., non-dict inputs, missing keys).
- Injection attempts in tool names, arguments, or LLM messages.

Suggested Test Cases:

test_hooks_registration_failure
- Simulate the absence of crewai.hooks and verify that register() raises a RuntimeError.
test_hooks_unregister_without_register
- Call unregister() without prior registration and verify no errors occur.
test_hooks_concurrent_registration
- Simulate concurrent calls to register() and unregister() and verify no race conditions or inconsistent states.
test_policy_conflicting_rules
- Define a policy where a tool is both in the allowlist and matches a blocked pattern. Verify the blocked pattern takes precedence.
test_max_tool_calls_boundary
- Test max_tool_calls at boundary conditions (e.g., exactly at the limit, just below, just above).
test_pre_execute_timeout
- Simulate a timeout in the pre_execute evaluator and verify the tool/LLM call is blocked.
test_post_execute_partial_failure
- Simulate a failure in post_execute drift detection and verify the appropriate exception is raised.
test_llm_input_injection
- Test LLM input messages with potential injection attempts (e.g., SQL injection, script tags) and verify they are blocked.
test_tool_output_blocked_pattern
- Simulate a tool output containing a blocked pattern and verify the appropriate exception is raised.
test_concurrent_tool_calls
- Simulate multiple agents making concurrent tool calls and verify ctx.call_count is updated correctly without race conditions.
test_malformed_tool_call_context
- Pass a malformed ToolCallHookContext to before_tool_call and verify graceful handling or appropriate errors.
test_deprecated_wrap_warning
- Verify that using the wrap() method emits a DeprecationWarning.

By adding these test cases, the coverage of edge cases and robustness of the new GovernanceHooks and CrewAIKernel functionality can be significantly improved.
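As an illustration of the last suggested case, a deprecation test can be written with the standard `warnings` module alone. This is a sketch: `wrap_stub` is a hypothetical stand-in for the deprecated `wrap()`, and the warning text is assumed, not copied from the adapter.

```python
import warnings


def wrap_stub(crew: object) -> object:
    """Hypothetical stand-in for a deprecated wrap(): warns, then returns the crew."""
    warnings.warn(
        "wrap() is deprecated; use as_hooks() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return crew


def test_deprecated_wrap_warning() -> None:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # ensure the warning is not filtered out
        result = wrap_stub("crew")
    assert result == "crew"  # behavior is unchanged apart from the warning
    assert len(caught) == 1  # exactly one warning, not a duplicate
    assert issubclass(caught[0].category, DeprecationWarning)
    assert "as_hooks()" in str(caught[0].message)


test_deprecated_wrap_warning()
```

Asserting on the exact number of recorded warnings also catches the double-warning case, where a module-level wrapper and the method it delegates to both warn.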

github-actions · 2026-04-29T20:50:47Z

PR Review Summary

| Check | Status | Details |
| --- | --- | --- |
| 🔍 Code Review | ❌ Failed | Issues detected |
| 🛡️ Security Scan | ✅ Completed | Analysis complete |
| 🔄 Breaking Changes | ⚠️ Warning | See details |
| 📝 Docs Sync | ✅ Completed | Analysis complete |
| 🧪 Test Coverage | ❌ Failed | Issues detected |

Verdict: ❌ Changes needed

imran-siddique

Code Review: CrewAI GovernanceHooks

Clean API design, @miyannishar. The as_hooks() pattern is consistent with the other adapters. Two blocking issues:

Blocking

1. unregister() doesn't actually remove hooks from CrewAI
unregister() sets self._registered = False and clears self._hook_fns, but the closures registered with CrewAI still hold live references and continue enforcing governance. The user believes they cleaned up, but governance is silently still active.

Fix: add a guard on self._registered inside each hook closure:
```python
def governance_before_tool(context):
    if not self._registered:
        return None  # passthrough when unregistered
    # ... existing logic
```

2. Non-string tool/LLM results bypass blocked-pattern scanning
after_tool_call and after_llm_call only scan str results. Dict, list, or object results skip scanning entirely. A tool returning {"query": "DROP TABLE users"} passes undetected. The before_tool_call hook already handles this correctly by coercing with str(tool_input). Apply the same pattern to after hooks.
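The coercion suggested in item 2 could be sketched as follows. Assumptions: `contains_blocked` is a hypothetical helper, and matching is done as a case-insensitive substring test, which may differ from the real `matches_pattern`.

```python
from typing import Any, Sequence


def contains_blocked(result: Any, blocked_patterns: Sequence[str]) -> bool:
    """Scan a result of any type by coercing it to text first.

    Assumption: case-insensitive substring matching.
    """
    text = result if isinstance(result, str) else str(result)
    lowered = text.lower()
    return any(p.lower() in lowered for p in blocked_patterns)


patterns = ["DROP TABLE"]
print(contains_blocked({"query": "DROP TABLE users"}, patterns))  # dict result
print(contains_blocked(["safe", "values"], patterns))             # list result
```

`str()` on a dict or list includes the nested string values, so the blocked text becomes visible to the scanner. Objects with an uninformative `__repr__` would still evade this, so a stricter serializer may be worth considering for custom result types.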

Warnings

3. call_count incremented before max-check
Blocked calls still inflate the counter. Check before incrementing:
```python
if kernel.policy.max_tool_calls and ctx.call_count + 1 > kernel.policy.max_tool_calls:
    return False
ctx.call_count += 1
```

4. Double deprecation warning from module-level wrap()
Same pattern as other adapters. Suppress the inner warning.

5. before_llm_call uses inconsistent content extraction for Cedar/OPA
The per-message loop handles dict/str/object, but combined_input for Cedar/OPA only handles two types. Extract a helper _extract_content(msg) and use in both places.

imran-siddique

Updated review (condensed):

TL;DR: 2 blockers. Fix those and this ships.

| # | Sev | Issue | Where |
| --- | --- | --- | --- |
| 1 | Block | `unregister()` doesn't guard hook closures -- governance silently persists after cleanup | `unregister()` |
| 2 | Block | Non-string tool/LLM results bypass blocked-pattern scanning | `after_tool_call`, `after_llm_call` |
| 3 | Warn | `call_count` incremented before max-check (blocked calls inflate counter) | `before_tool_call` |
| 4 | Warn | Double deprecation warning from module-level `wrap()` | `wrap()` |
| 5 | Warn | `combined_input` for Cedar/OPA uses different content extraction than pattern scanner | `before_llm_call` |

#1: Add if not self._registered: return None guard inside each hook closure.

#2: Coerce with str(tool_result) before scanning, matching before_tool_call pattern.

#3-5 are fine as follow-ups.
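For reference, the guard from #1 can be exercised end to end with a stand-in registry. This is a sketch under assumptions: `GLOBAL_BEFORE_TOOL_HOOKS` and `GuardedHooks` are hypothetical, and the context is a plain dict here, where the real hook receives a `ToolCallHookContext`.

```python
from typing import Any, Callable, List, Optional

# Hypothetical stand-in for CrewAI's global hook list.
GLOBAL_BEFORE_TOOL_HOOKS: List[Callable[[Any], Optional[bool]]] = []


class GuardedHooks:
    """Sketch of hooks that become passthroughs once unregistered."""

    def __init__(self, allowed_tools: List[str]) -> None:
        self._allowed_tools = allowed_tools
        self._registered = False

    def register(self) -> None:
        def governance_before_tool(context: Any) -> Optional[bool]:
            if not self._registered:
                return None  # passthrough when unregistered
            if context["tool_name"] not in self._allowed_tools:
                return False  # block the call
            return None  # allow the call

        GLOBAL_BEFORE_TOOL_HOOKS.append(governance_before_tool)
        self._registered = True

    def unregister(self) -> None:
        # The closure stays in the global list but stops enforcing.
        self._registered = False


hooks = GuardedHooks(allowed_tools=["search"])
hooks.register()
hook = GLOBAL_BEFORE_TOOL_HOOKS[0]
blocked_while_active = hook({"tool_name": "shell"})
hooks.unregister()
after_unregister = hook({"tool_name": "shell"})
```

The closure reads `self._registered` at call time, so flipping the flag takes effect immediately even though the function object is never removed from the registry.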

imran-siddique

Approving native hooks migration.

…microsoft#1588)

Replace fragile proxy-based monkey-patching with CrewAI's native execution hooks (@before_tool_call, @after_tool_call, @before_llm_call, @after_llm_call) introduced in CrewAI 0.80+.

Changes:
- Add GovernanceHooks class that registers four global governance hooks for tool allowlist, blocked-pattern scanning, Cedar/OPA pre_execute gates, and output content filtering
- Add CrewAIKernel.as_hooks() factory method as the recommended integration path
- Deprecate CrewAIKernel.wrap() with DeprecationWarning pointing to as_hooks()
- Export CrewAIGovernanceHooks from integrations package
- Add 43 new tests covering all four hook types, deprecation warnings, Cedar evaluator integration, and backward compatibility
- All 12 existing CrewAI regression tests pass unchanged

Resolves microsoft#1576

Co-authored-by: Nishar <you@example.com>

github-actions Bot added the tests label Apr 29, 2026

miyannishar mentioned this pull request Apr 29, 2026

Deprecate CrewAIKernel.wrap() in favor of native as_hooks() #1589

Closed

github-actions Bot added the size/XL Extra large PR (500+ lines) label Apr 29, 2026

miyannishar mentioned this pull request Apr 29, 2026

feat(autogen): add native GovernanceInterventionHandler via AutoGen v0.4+ hooks #1591

Merged

imran-siddique requested changes Apr 29, 2026

imran-siddique reviewed Apr 29, 2026

imran-siddique approved these changes Apr 30, 2026

imran-siddique merged commit 1436169 into microsoft:main Apr 30, 2026
13 of 14 checks passed

This was referenced Apr 30, 2026

feat(adapters): add native hooks for Anthropic, SK, smolagents, PydanticAI #1605

Merged

fix(tests): fix 48 native-hooks test failures in docker CI #1610

Merged
