
feat(crewai): add native GovernanceHooks using CrewAI execution hooks#1588

Merged
imran-siddique merged 1 commit into microsoft:main from miyannishar:feat/crewai-native-hooks
Apr 30, 2026

Conversation

@miyannishar
Collaborator

Summary

Replaces fragile proxy-based monkey-patching in the CrewAI adapter with CrewAI's native execution hooks (@before_tool_call, @after_tool_call, @before_llm_call, @after_llm_call) introduced in CrewAI 0.80+.

Resolves #1587

Changes

New: GovernanceHooks class

Registers four global governance hooks that intercept every tool and LLM call across all agents:

| Hook | Governance Action |
| --- | --- |
| before_tool_call | Tool allowlist, blocked-pattern scan on args/name, Cedar/OPA pre_execute gate, max call count |
| after_tool_call | Blocked-pattern scan on output, post_execute drift detection |
| before_llm_call | Content filter on input messages, Cedar/OPA pre_execute gate |
| after_llm_call | Blocked-pattern scan on LLM response, post_execute drift detection |
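
As a rough illustration of the before_tool_call row, the allowlist and blocked-pattern checks might look like the sketch below. It is not the adapter's code: `Policy` and `check_tool_call` are hypothetical names, and case-insensitive substring matching is an assumption, since the PR does not show how `matches_pattern` is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical stand-in for GovernancePolicy (field names follow the PR example)."""
    blocked_patterns: list = field(default_factory=list)
    allowed_tools: list = field(default_factory=list)


def check_tool_call(policy: Policy, tool_name: str, tool_input: object) -> bool:
    """Return True when the call may proceed, False when it should be blocked."""
    # Allowlist: an empty list is treated here as "no restriction" (an assumption).
    if policy.allowed_tools and tool_name not in policy.allowed_tools:
        return False
    # Scan both the tool name and its stringified arguments.
    haystack = f"{tool_name} {tool_input}".lower()
    return not any(p.lower() in haystack for p in policy.blocked_patterns)


policy = Policy(blocked_patterns=["DROP TABLE"], allowed_tools=["search", "calculator"])
```

With this policy, a `search` call with harmless arguments passes, while a tool outside the allowlist or an argument containing the blocked pattern is rejected.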

New: CrewAIKernel.as_hooks() factory

kernel = CrewAIKernel(policy=GovernancePolicy(
    blocked_patterns=["DROP TABLE"],
    allowed_tools=["search", "calculator"],
))
hooks = kernel.as_hooks()        # register governance hooks
result = my_crew.kickoff()       # hooks intercept every call
hooks.unregister()               # clean up

Deprecated: wrap() and module-level wrap()

Both now emit DeprecationWarning pointing to as_hooks(). Full backward compatibility maintained — all 12 existing regression tests pass unchanged.

Export

CrewAIGovernanceHooks exported from agent_os.integrations.

Testing

  • 43 new tests covering all four hook types, Cedar evaluator integration, deprecation warnings, and backward compatibility
  • 12 existing regression tests pass unchanged (full backward compatibility)
  • Tests use stub crewai.hooks module since CrewAI is not installed in CI
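
One way such a stub could be built, assuming the tests only need the four decorator names to exist and to record what they decorate, is sketched here. `install_stub_hooks` is a hypothetical helper, not something taken from the PR's test suite.

```python
import sys
import types


def install_stub_hooks():
    """Install a fake `crewai.hooks` module whose decorators record the hooks."""
    registry = {"before_tool_call": [], "after_tool_call": [],
                "before_llm_call": [], "after_llm_call": []}

    def make_decorator(kind):
        def decorator(fn):
            registry[kind].append(fn)  # remember the hook so tests can call it
            return fn
        return decorator

    hooks_mod = types.ModuleType("crewai.hooks")
    for kind in registry:
        setattr(hooks_mod, kind, make_decorator(kind))

    pkg = types.ModuleType("crewai")
    pkg.hooks = hooks_mod
    sys.modules["crewai"] = pkg
    sys.modules["crewai.hooks"] = hooks_mod
    return registry


registry = install_stub_hooks()
from crewai.hooks import before_tool_call  # resolves to the stub


@before_tool_call
def probe(context):
    return None
```

In a real pytest suite the `sys.modules` entries would more likely be set through `monkeypatch.setitem` so they are restored after each test.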

Design Decisions

  1. Native over proxy: Hooks intercept at the framework level without mutating crew/agent/tool objects, preserving isinstance() checks and composability
  2. Fail-closed: All policy violations block or raise PolicyViolationError
  3. Graceful degradation: try/except import ensures the adapter works even when crewai.hooks is unavailable (falls back to legacy wrap())
  4. Pattern parity: as_hooks() mirrors ADK's as_plugin() and OpenAI's as_hooks() for consistent API surface
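
The graceful-degradation decision (item 3) can be sketched as a guarded import. The helper names below are hypothetical; the review comments further down suggest the adapter tracks this with a module-level `_HOOKS_AVAILABLE` flag, but its exact form is not shown in this thread.

```python
import importlib


def load_hooks_module(module_name: str = "crewai.hooks"):
    """Return the hooks module, or None when it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def choose_integration(hooks_module) -> str:
    """Pick the integration path based on whether native hooks are importable."""
    return "as_hooks" if hooks_module is not None else "legacy_wrap"
```

Catching `ImportError` also covers `ModuleNotFoundError`, so a missing `crewai` package and a `crewai` too old to ship a `hooks` submodule both fall back to the legacy path.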

Related

Replace fragile proxy-based monkey-patching with CrewAI's native
execution hooks (@before_tool_call, @after_tool_call, @before_llm_call,
@after_llm_call) introduced in CrewAI 0.80+.

Changes:
- Add GovernanceHooks class that registers four global governance hooks
  for tool allowlist, blocked-pattern scanning, Cedar/OPA pre_execute
  gates, and output content filtering
- Add CrewAIKernel.as_hooks() factory method as the recommended
  integration path
- Deprecate CrewAIKernel.wrap() with DeprecationWarning pointing to
  as_hooks()
- Export CrewAIGovernanceHooks from integrations package
- Add 43 new tests covering all four hook types, deprecation warnings,
  Cedar evaluator integration, and backward compatibility
- All 12 existing CrewAI regression tests pass unchanged

Resolves microsoft#1576
@github-actions

🤖 AI Agent: docs-sync-checker — Docs Sync

Docs Sync

  • GovernanceHooks in crewai_adapter.py -- missing docstring for methods _make_after_llm_call and _make_before_llm_call.
  • README.md -- update needed to reflect the new GovernanceHooks class and as_hooks() method as the recommended integration approach.
  • CHANGELOG.md -- missing entry for the introduction of GovernanceHooks and deprecation of wrap().

@github-actions

🤖 AI Agent: breaking-change-detector — API Compatibility

API Compatibility

| Severity | Change | Impact |
| --- | --- | --- |
| Breaking | wrap() method is deprecated in CrewAIKernel. | Users relying on wrap() will need to migrate to the new as_hooks() method in the future. |
| Breaking | Module-level wrap() function is deprecated. | Users relying on the module-level wrap() function will need to transition to as_hooks(). |
| Potential | GovernanceHooks requires crewai.hooks module (available in CrewAI 0.80+). | Users with older versions of CrewAI or without crewai.hooks will need to upgrade or use the legacy wrap() method. |
| Potential | GovernanceHooks registers global hooks, which may interfere with other CrewAI usages. | Users with multiple CrewAI integrations in the same process may encounter unexpected behavior. |

@github-actions

🤖 AI Agent: security-scanner — View details

No security issues found.

@github-actions

🤖 AI Agent: code-reviewer — Review Summary

Review Summary

This pull request introduces a significant enhancement to the microsoft/agent-governance-toolkit repository by replacing the fragile proxy-based monkey-patching in the CrewAI adapter with native execution hooks provided by CrewAI version 0.80+. The new GovernanceHooks class provides a robust and modular way to enforce governance policies for tool and LLM calls. The PR also maintains backward compatibility with the deprecated wrap() method, ensuring a smooth transition for existing users.

Below is a detailed review of the changes, focusing on the specified focus areas.


CRITICAL

  1. Fail-Closed Behavior in GovernanceHooks

    • The GovernanceHooks class is designed to enforce a fail-closed policy, which is a critical security feature. However, the unregister() method does not actually remove the hooks from CrewAI's global registry, as noted in the comments. This could lead to a situation where hooks remain active even after they are "unregistered," potentially causing unintended behavior or security gaps.
    • Recommendation: Investigate whether CrewAI provides a mechanism to unregister hooks. If not, consider implementing a workaround (e.g., replacing the hook functions with no-op implementations) to ensure hooks are effectively disabled when unregister() is called.
  2. Blocked-Pattern Matching

    • The matches_pattern method is used extensively to scan for blocked patterns in tool names, arguments, and outputs. However, the implementation of matches_pattern is not included in the diff, so it is unclear if it is robust against edge cases such as partial matches, case sensitivity, or encoding issues.
    • Recommendation: Review the implementation of matches_pattern to ensure it is secure and handles edge cases correctly. Consider adding tests for edge cases, such as patterns with special characters or patterns embedded in larger strings.
  3. Thread Safety

    • The GovernanceHooks class uses a shared execution context (self._ctx) and maintains state (e.g., self._registered, self._hook_fns). If multiple threads or asynchronous tasks are using the same GovernanceHooks instance, this could lead to race conditions or inconsistent state.
    • Recommendation: Use thread-safe data structures or synchronization mechanisms (e.g., threading.Lock) to ensure thread safety when accessing or modifying shared state.
  4. Cedar/OPA Integration

    • The pre_execute and post_execute methods are used for policy evaluation, but their implementations are not included in the diff. It is unclear how these methods handle untrusted input or whether they are vulnerable to injection attacks.
    • Recommendation: Review the pre_execute and post_execute implementations to ensure they properly sanitize and validate input. Add tests to verify that malicious input cannot bypass governance policies.
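
For the thread-safety item above, a lock-protected counter is one possible shape. This is a sketch under assumptions: `CallCounter` is a hypothetical replacement for a bare `ctx.call_count` integer, and the PR does not say how the adapter stores its count.

```python
import threading


class CallCounter:
    """Hypothetical thread-safe replacement for a bare `ctx.call_count` integer."""

    def __init__(self, limit: int):
        self._limit = limit
        self._count = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Atomically check the limit and count the call; False means blocked."""
        with self._lock:
            if self._count >= self._limit:
                return False
            self._count += 1
            return True

    @property
    def count(self) -> int:
        with self._lock:
            return self._count


counter = CallCounter(limit=100)
results = []


def worker():
    for _ in range(50):
        results.append(counter.try_acquire())


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the check and the increment happen under the same lock, eight threads making 400 attempts in total are admitted exactly 100 times.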

WARNING

  1. Backward Compatibility

    • The wrap() method is marked as deprecated and emits a DeprecationWarning. While this maintains backward compatibility for now, the eventual removal of wrap() in v1.0 will be a breaking change for users relying on this method.
    • Recommendation: Clearly document the deprecation timeline and provide migration guides to help users transition to the new as_hooks() API. Consider gathering feedback from users to ensure the transition is smooth.
  2. Global Hook Registration

    • The GovernanceHooks class registers hooks globally, which could lead to conflicts if multiple GovernanceHooks instances are registered simultaneously.
    • Recommendation: Add safeguards to prevent multiple GovernanceHooks instances from being registered at the same time. For example, maintain a global registry of active GovernanceHooks instances and raise an error if a new instance is registered while another is active.
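
The safeguard recommended for global hook registration could be sketched as a process-wide "one active instance" guard. `ExclusiveHooks` and `HooksAlreadyActiveError` are hypothetical names used only for illustration; whether the adapter should reject or merge a second registration is a design choice this thread leaves open.

```python
import threading


class HooksAlreadyActiveError(RuntimeError):
    """Raised when a second hook set is registered while one is active."""


class ExclusiveHooks:
    """Hypothetical stand-in for GovernanceHooks with a process-wide guard."""

    _active = None
    _guard = threading.Lock()

    def register(self):
        with ExclusiveHooks._guard:
            current = ExclusiveHooks._active
            if current is not None and current is not self:
                raise HooksAlreadyActiveError("another hook set is already registered")
            ExclusiveHooks._active = self
        return self

    def unregister(self):
        with ExclusiveHooks._guard:
            # Only the owner may clear the slot; other instances are ignored.
            if ExclusiveHooks._active is self:
                ExclusiveHooks._active = None


first = ExclusiveHooks().register()
second = ExclusiveHooks()
try:
    second.register()
    second_rejected = False
except HooksAlreadyActiveError:
    second_rejected = True
first.unregister()
second_registered_later = second.register() is second
second.unregister()
```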

SUGGESTIONS

  1. Logging Improvements

    • The logging statements in the GovernanceHooks class provide useful information, but they could be enhanced with additional context, such as the specific agent or crew involved in a policy violation.
    • Recommendation: Include more contextual information (e.g., agent name, crew ID) in log messages to aid debugging and auditing.
  2. Testing Coverage

    • The PR mentions 43 new tests, but it is unclear if these tests cover all edge cases, such as:
      • Nested or complex data structures in tool arguments or LLM messages.
      • Performance under high concurrency.
      • Behavior when crewai.hooks is unavailable.
    • Recommendation: Ensure comprehensive test coverage, including edge cases and performance under load. Consider using tools like pytest-xdist to test concurrency.
  3. Documentation

    • The new GovernanceHooks class and as_hooks() method are well-documented, but the documentation could benefit from additional examples and use cases, especially for the new hooks.
    • Recommendation: Add more detailed examples to the documentation, including how to handle common scenarios like blocked patterns, allowlists, and Cedar/OPA integration.
  4. Error Handling

    • The PolicyViolationError is raised for policy violations, but it is unclear if this exception is logged or handled in a way that provides actionable insights to users.
    • Recommendation: Ensure that all PolicyViolationError instances include sufficient context (e.g., the specific policy violated, the input that caused the violation) and that these errors are logged appropriately.
  5. Performance Considerations

    • The hooks perform multiple checks (e.g., pattern matching, policy evaluation) for every tool and LLM call. This could introduce performance overhead, especially in high-throughput scenarios.
    • Recommendation: Profile the performance of the hooks under realistic workloads. Consider optimizing the matches_pattern method and other frequently called functions.
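
To make the error-handling suggestion concrete, a violation error that carries its own context might look like the following. The class shape is an assumption; the toolkit's real `PolicyViolationError` is not shown in this thread and may carry different fields.

```python
import logging

logger = logging.getLogger("governance.sketch")


class PolicyViolationError(Exception):
    """Hypothetical shape; the toolkit's real exception may differ."""

    def __init__(self, rule: str, hook: str, detail: str):
        self.rule = rule
        self.hook = hook
        self.detail = detail
        super().__init__(f"[{hook}] policy '{rule}' violated: {detail}")


def block(rule: str, hook: str, detail: str):
    """Log the violation with structured context, then fail closed by raising."""
    err = PolicyViolationError(rule, hook, detail)
    logger.warning("policy violation", extra={"rule": rule, "hook": hook})
    raise err


try:
    block("blocked_patterns", "after_tool_call", "matched 'DROP TABLE'")
except PolicyViolationError as exc:
    caught_error = exc
```

Logging the matched rule rather than the full offending input keeps audit logs useful without copying potentially sensitive payloads into them.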

Conclusion

This PR introduces a robust and modular approach to governance in the CrewAI adapter, addressing several limitations of the previous proxy-based implementation. While the changes are well-designed and maintain backward compatibility, there are critical areas that require attention, particularly around fail-closed behavior, thread safety, and policy evaluation. Addressing these issues will ensure the security and reliability of the new governance hooks.

@github-actions

🤖 AI Agent: test-generator — `agent_os/integrations/crewai_adapter.py`

Test Coverage Analysis

agent_os/integrations/crewai_adapter.py

Existing Coverage:

  1. GovernanceHooks Class:

    • The PR mentions 43 new tests covering all four hook types (before_tool_call, after_tool_call, before_llm_call, after_llm_call), Cedar evaluator integration, deprecation warnings, and backward compatibility.
    • The register and unregister methods are likely tested for basic functionality, as the PR mentions that hooks are being tested.
    • The wrap() method is explicitly mentioned as being covered by 12 existing regression tests.
  2. CrewAIKernel Class:

    • The wrap() method is covered by existing regression tests.
    • The as_hooks() method is likely covered by the new tests, as it is the primary entry point for the new functionality.

Missing Coverage:

  1. Edge Cases for GovernanceHooks:

    • Handling of malformed inputs, such as invalid ToolCallHookContext or LLMCallHookContext objects.
    • Behavior when crewai.hooks is unavailable (e.g., _HOOKS_AVAILABLE is False).
    • Concurrent registration and unregistration of hooks (potential race conditions).
    • Behavior when unregister() is called multiple times or without prior registration.
    • Handling of unexpected exceptions within hook functions (e.g., pre_execute or post_execute raising errors).
  2. Policy Evaluation Edge Cases:

    • Boundary conditions for max_tool_calls (e.g., exactly at the limit, just below, just above).
    • Conflicting policies (e.g., a tool is both in the allowlist and matches a blocked pattern).
    • Behavior when Cedar/OPA evaluator returns unexpected results or errors.
  3. Trust Scoring Edge Cases:

    • Handling of expired certificates or invalid trust scores in pre_execute and post_execute.
    • Behavior when post_execute drift detection returns ambiguous or null results.
  4. Chaos Experiments:

    • Simulating partial failures in hooks (e.g., before_tool_call succeeds but after_tool_call fails).
    • Timeout handling in pre_execute and post_execute.
  5. Concurrency:

    • Simultaneous tool/LLM calls from multiple agents and crews.
    • Race conditions in shared state (e.g., ctx.call_count).
  6. Input Validation:

    • Malformed inputs to kickoff() and kickoff_async() (e.g., non-dict inputs, missing keys).
    • Injection attempts in tool names, arguments, or LLM messages.

Suggested Test Cases:

  1. test_hooks_registration_failure

    • Simulate the absence of crewai.hooks and verify that register() raises a RuntimeError.
  2. test_hooks_unregister_without_register

    • Call unregister() without prior registration and verify no errors occur.
  3. test_hooks_concurrent_registration

    • Simulate concurrent calls to register() and unregister() and verify no race conditions or inconsistent states.
  4. test_policy_conflicting_rules

    • Define a policy where a tool is both in the allowlist and matches a blocked pattern. Verify the blocked pattern takes precedence.
  5. test_max_tool_calls_boundary

    • Test max_tool_calls at boundary conditions (e.g., exactly at the limit, just below, just above).
  6. test_pre_execute_timeout

    • Simulate a timeout in the pre_execute evaluator and verify the tool/LLM call is blocked.
  7. test_post_execute_partial_failure

    • Simulate a failure in post_execute drift detection and verify the appropriate exception is raised.
  8. test_llm_input_injection

    • Test LLM input messages with potential injection attempts (e.g., SQL injection, script tags) and verify they are blocked.
  9. test_tool_output_blocked_pattern

    • Simulate a tool output containing a blocked pattern and verify the appropriate exception is raised.
  10. test_concurrent_tool_calls

    • Simulate multiple agents making concurrent tool calls and verify ctx.call_count is updated correctly without race conditions.
  11. test_malformed_tool_call_context

    • Pass a malformed ToolCallHookContext to before_tool_call and verify graceful handling or appropriate errors.
  12. test_deprecated_wrap_warning

    • Verify that using the wrap() method emits a DeprecationWarning.

By adding these test cases, the coverage of edge cases and robustness of the new GovernanceHooks and CrewAIKernel functionality can be significantly improved.

@github-actions

PR Review Summary

| Check | Status | Details |
| --- | --- | --- |
| 🔍 Code Review | ❌ Failed | Issues detected |
| 🛡️ Security Scan | ✅ Completed | Analysis complete |
| 🔄 Breaking Changes | ⚠️ Warning | See details |
| 📝 Docs Sync | ✅ Completed | Analysis complete |
| 🧪 Test Coverage | ❌ Failed | Issues detected |

Verdict: ❌ Changes needed

Member

@imran-siddique imran-siddique left a comment


Code Review: CrewAI GovernanceHooks

Clean API design, @miyannishar. The as_hooks() pattern is consistent with the other adapters. Two blocking issues:

Blocking

1. unregister() doesn't actually remove hooks from CrewAI
unregister() sets self._registered = False and clears self._hook_fns, but the closures registered with CrewAI still hold live references and continue enforcing governance. The user believes they cleaned up, but governance is silently still active.

Fix: add a self._registered guard inside each hook closure:

```python
def governance_before_tool(context):
    if not self._registered:
        return None  # passthrough when unregistered
    # ... existing logic
```

2. Non-string tool/LLM results bypass blocked-pattern scanning
after_tool_call and after_llm_call only scan str results. Dict, list, or object results skip scanning entirely. A tool returning {"query": "DROP TABLE users"} passes undetected. The before_tool_call hook already handles this correctly by coercing with str(tool_input). Apply the same pattern to after hooks.
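
A minimal sketch of that coercion, assuming case-insensitive substring matching (the real `matches_pattern` is not shown in this thread) and a hypothetical helper name `contains_blocked`:

```python
def contains_blocked(result: object, blocked_patterns) -> bool:
    """Scan any result type by coercing it to text first."""
    text = result if isinstance(result, str) else str(result)
    lowered = text.lower()
    return any(p.lower() in lowered for p in blocked_patterns)
```

One caveat: `str()` on an arbitrary object without a custom `__str__` or `__repr__` yields only the default representation, so object results may need an explicit serialization step before scanning.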

Warnings

3. call_count incremented before max-check
Blocked calls still inflate the counter. Check before incrementing:
```python
if kernel.policy.max_tool_calls and ctx.call_count + 1 > kernel.policy.max_tool_calls:
    return False
ctx.call_count += 1
```
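
The boundary behaviour of the check-then-increment ordering can be seen in a standalone sketch. `Ctx` and `admit_call` are hypothetical stand-ins for the adapter's execution context and hook logic.

```python
from dataclasses import dataclass


@dataclass
class Ctx:
    """Hypothetical stand-in for the adapter's execution context."""
    call_count: int = 0


def admit_call(ctx: Ctx, max_tool_calls):
    """Count the call only if it stays within the limit; None or 0 means unlimited."""
    if max_tool_calls and ctx.call_count + 1 > max_tool_calls:
        return False
    ctx.call_count += 1
    return True


ctx = Ctx()
outcomes = [admit_call(ctx, 2) for _ in range(4)]
```

With a limit of 2, the first two calls are admitted and the next two are blocked, and the counter stays at 2 instead of drifting upward with each rejected attempt.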

4. Double deprecation warning from module-level wrap()
Same pattern as other adapters. Suppress the inner warning.
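
One way to suppress the inner warning is to wrap the delegated call in `warnings.catch_warnings()`. The `Kernel` class and message texts below are hypothetical; only the suppression pattern is the point.

```python
import warnings


class Kernel:
    """Hypothetical stand-in for CrewAIKernel."""

    def wrap(self, crew):
        warnings.warn("Kernel.wrap() is deprecated; use as_hooks()",
                      DeprecationWarning, stacklevel=2)
        return crew


def wrap(crew):
    """Module-level shim: warn once, then silence the method's own warning."""
    warnings.warn("wrap() is deprecated; use as_hooks()",
                  DeprecationWarning, stacklevel=2)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return Kernel().wrap(crew)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = wrap("crew")
```

Note that `catch_warnings` modifies global filter state and is documented as not thread-safe, which matters if the shim can be called concurrently.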

5. before_llm_call uses inconsistent content extraction for Cedar/OPA
The per-message loop handles dict/str/object, but combined_input for Cedar/OPA only handles two types. Extract a helper _extract_content(msg) and use in both places.

Member

@imran-siddique imran-siddique left a comment


Updated review (condensed):

TL;DR: 2 blockers. Fix those and this ships.

| # | Sev | Issue | Where |
| --- | --- | --- | --- |
| 1 | Block | unregister() doesn't guard hook closures -- governance silently persists after cleanup | unregister() |
| 2 | Block | Non-string tool/LLM results bypass blocked-pattern scanning | after_tool_call, after_llm_call |
| 3 | Warn | call_count incremented before max-check (blocked calls inflate counter) | before_tool_call |
| 4 | Warn | Double deprecation warning from module-level wrap() | wrap() |
| 5 | Warn | combined_input for Cedar/OPA uses different content extraction than pattern scanner | before_llm_call |

#1: Add if not self._registered: return None guard inside each hook closure.

#2: Coerce with str(tool_result) before scanning, matching before_tool_call pattern.

#3-5 are fine as follow-ups.

Member

@imran-siddique imran-siddique left a comment


Approving native hooks migration.

@imran-siddique imran-siddique merged commit 1436169 into microsoft:main Apr 30, 2026
13 of 14 checks passed
imran-siddique pushed a commit to imran-siddique/agent-governance-toolkit that referenced this pull request May 4, 2026
…microsoft#1588)

Co-authored-by: Nishar <you@example.com>

Labels

size/XL Extra large PR (500+ lines) tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CrewAI adapter: use native execution hooks instead of wrap() workaround

2 participants