feat(openai): native RunHooks lifecycle + BaseIntegration inheritance by miyannishar · Pull Request #1582 · microsoft/agent-governance-toolkit

miyannishar · 2026-04-29T19:37:55Z

Summary

Refactors OpenAIAgentsKernel to extend BaseIntegration and implement the OpenAI Agents SDK's native RunHooks lifecycle interface, replacing the fragile proxy-based wrap()/wrap_runner() workarounds with first-class framework hooks.

Closes #1576

Motivation

The existing OpenAI adapter relied on proxy wrapping (wrap(), wrap_runner()) to intercept agent and tool execution. This approach:

Could not intercept lifecycle events between agent turns (e.g., handoffs)
Required fragile __getattr__ delegation that broke with SDK updates
Was inconsistent with the ADK adapter's native BasePlugin integration pattern
Did not inherit BaseIntegration's Cedar/OPA policy evaluation

This PR aligns the OpenAI adapter with the ADK adapter's architecture by using the SDK's native RunHooks interface.

Changes

Core Adapter (`openai_agents_sdk.py`)

Area	Before	After
Base class	Standalone class	Extends `BaseIntegration`
Primary API	`kernel.wrap(agent)` + `kernel.wrap_runner(Runner)`	`Runner.run(agent, hooks=kernel.as_hooks())`
Lifecycle hooks	None (proxy-based interception)	`GovernanceRunHooks(RunHooks)` with 5 callbacks
Policy evaluation	Local checks only	Cedar/OPA via `pre_execute()`/`post_execute()` + local checks
Tool identity	Not available in Cedar context	`tool_name`/`tool_args` threaded into Cedar context
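Threading tool identity into the policy context means a rule can target one tool instead of the whole agent. The toy example below shows why that matters. It does not call Cedar or OPA. `build_policy_context`, `toy_evaluate`, and the context keys are hypothetical and are not the toolkit's actual schema.

```python
from typing import Any, Mapping, Optional


def build_policy_context(
    agent_name: str,
    tool_name: Optional[str] = None,
    tool_args: Optional[Mapping[str, Any]] = None,
) -> dict:
    """Assemble a hypothetical policy-evaluation context.

    Without tool_name/tool_args a policy can only reason about the agent;
    with them it can allow or deny individual tool invocations.
    """
    context: dict = {"agent": agent_name}
    if tool_name is not None:
        context["tool_name"] = tool_name
        context["tool_args"] = dict(tool_args or {})
    return context


def toy_evaluate(context: Mapping[str, Any]) -> bool:
    """Stand-in for a Cedar/OPA decision: deny one tool, allow everything else."""
    return context.get("tool_name") != "shell_exec"
```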

`GovernanceRunHooks` Lifecycle Coverage

Callback	Governance Action
`on_agent_start`	Content filter, Cedar/OPA gate, `pre_execute()`
`on_agent_end`	Output validation via `post_execute()`, audit recording
`on_tool_start`	Tool allow/blocklist, budget enforcement, Cedar gate with tool identity
`on_tool_end`	Output content filter, audit recording
`on_handoff`	Handoff limit enforcement, audit trail
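The five callbacks in the table can be sketched as a single class. The method names and argument order follow the OpenAI Agents SDK's `RunHooks` interface as documented at the time of writing. So that it runs without the SDK, the class does not subclass `agents.RunHooks`. `Policy`, `PolicyViolationError`, and the substring filter are hypothetical stand-ins, not the adapter's code. Input filtering in `on_agent_start` is omitted because that callback receives only the context and the agent.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


class PolicyViolationError(Exception):
    """Hypothetical error raised when a governance check fails."""


@dataclass
class Policy:
    """Hypothetical policy; field names echo the PR's constructor arguments."""

    blocked_tools: set = field(default_factory=set)
    blocked_patterns: list = field(default_factory=list)
    max_tool_calls: int = 10
    max_handoffs: int = 3


class GovernanceHooksSketch:
    """Stand-in for a RunHooks subclass covering the five callbacks."""

    def __init__(self, policy: Policy) -> None:
        self.policy = policy
        self.tool_calls = 0
        self.handoffs = 0
        self.audit: list = []  # (event, detail) tuples

    def _filter(self, text: str, where: str) -> None:
        lowered = text.lower()
        for pattern in self.policy.blocked_patterns:
            if pattern.lower() in lowered:
                raise PolicyViolationError(f"{where}: blocked pattern {pattern!r}")

    async def on_agent_start(self, context: Any, agent: Any) -> None:
        self.audit.append(("agent_start", agent.name))

    async def on_agent_end(self, context: Any, agent: Any, output: Any) -> None:
        self._filter(str(output), "agent output")
        self.audit.append(("agent_end", agent.name))

    async def on_tool_start(self, context: Any, agent: Any, tool: Any) -> None:
        if tool.name in self.policy.blocked_tools:
            raise PolicyViolationError(f"tool {tool.name!r} is blocked")
        if self.tool_calls >= self.policy.max_tool_calls:
            raise PolicyViolationError("tool-call budget exhausted")
        self.tool_calls += 1
        self.audit.append(("tool_start", tool.name))

    async def on_tool_end(self, context: Any, agent: Any, tool: Any, result: str) -> None:
        self._filter(result, f"output of {tool.name}")
        self.audit.append(("tool_end", tool.name))

    async def on_handoff(self, context: Any, from_agent: Any, to_agent: Any) -> None:
        if self.handoffs >= self.policy.max_handoffs:
            raise PolicyViolationError("handoff limit reached")
        self.handoffs += 1
        self.audit.append(("handoff", f"{from_agent.name}->{to_agent.name}"))
```

In a real run the SDK awaits these methods. To exercise them directly, pass any objects that have a `.name` attribute.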

Backward Compatibility

All three legacy methods are preserved but now emit a DeprecationWarning:

wrap() → use as_hooks()
wrap_runner() → use as_hooks()
create_tool_guard() → handled by on_tool_start
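A typical deprecate-and-delegate shim looks like the following. `KernelSketch` is hypothetical, and its `wrap()` body does not reproduce the adapter's proxy logic. The point is the `warnings.warn(..., DeprecationWarning, stacklevel=2)` call, which attributes the warning to the caller's line.

```python
import warnings
from typing import Any


class KernelSketch:
    """Hypothetical kernel showing the deprecate-and-delegate pattern."""

    def as_hooks(self) -> dict:
        # Stand-in for the hooks object returned by the new primary API.
        return {"kind": "hooks", "kernel": self}

    def wrap(self, agent: Any) -> Any:
        warnings.warn(
            "wrap() is deprecated; pass hooks=kernel.as_hooks() to Runner.run() instead",
            DeprecationWarning,
            stacklevel=2,  # report the caller's line, not this method's
        )
        # The legacy path keeps working; here it simply returns the agent.
        return agent
```

Outside `__main__` and test runners, Python hides `DeprecationWarning` by default. Callers may need `-W default::DeprecationWarning` to see it.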

Bug Fixes

ctx.events → ctx.tool_calls: Fixed wrap_runner() to use the correct ExecutionContext field
Double-block in on_agent_start: Prevented content filter and Cedar/OPA gate from both triggering on the same violation
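The double-block fix comes down to ordering. Run the content filter first and consult the policy gate only if the filter passes, so one bad input yields one violation record. Below is a minimal sketch. The `content_ok` and `policy_ok` callables are hypothetical stand-ins for the content filter and the Cedar/OPA gate.

```python
from typing import Callable


class PolicyViolationError(Exception):
    """Hypothetical governance error."""


def gate_input(
    text: str,
    content_ok: Callable[[str], bool],
    policy_ok: Callable[[str], bool],
    violations: list,
) -> None:
    """Run the content filter, then the policy gate; record and raise at most once."""
    if not content_ok(text):
        violations.append("content_filter")
        raise PolicyViolationError("blocked by content filter")
    # Only reached when the content filter passed, so the same input
    # can never be reported by both gates.
    if not policy_ok(text):
        violations.append("policy_gate")
        raise PolicyViolationError("denied by policy")
```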

Code Quality

Standardized all docstrings/comments to match ADK/LangChain adapter conventions (Google-style Args:/Returns:/Raises: sections, consistent section separators)

Test Results

Suite	Result
`test_openai_agents_sdk_adapter.py`	63/63 ✅
`test_coverage_boost.py` (OAI section)	21/21 ✅
Full regression	3070 passed, 46 skipped, 0 failures

Usage Example

```python
import asyncio

from agents import Agent, Runner
from agent_os.integrations.openai_agents_sdk import OpenAIAgentsKernel

# Placeholder agent; the name and instructions are illustrative.
agent = Agent(name="analyst", instructions="Analyze the data you are given.")


async def main() -> None:
    # Basic governance
    kernel = OpenAIAgentsKernel(
        blocked_tools=["shell_exec"],
        blocked_patterns=["DROP TABLE"],
    )
    result = await Runner.run(agent, "Analyze data", hooks=kernel.as_hooks())

    # With Cedar policy evaluation
    kernel = OpenAIAgentsKernel.from_cedar("policies/governance.cedar")
    result = await Runner.run(agent, "...", hooks=kernel.as_hooks())


asyncio.run(main())
```

Refactor OpenAIAgentsKernel to extend BaseIntegration and implement the OpenAI Agents SDK's native RunHooks lifecycle interface, replacing the fragile proxy-based wrap()/wrap_runner() workarounds.

Changes:
- Implement GovernanceRunHooks(RunHooks) with all 5 lifecycle callbacks: on_agent_start, on_agent_end, on_tool_start, on_tool_end, on_handoff
- Extend BaseIntegration for centralized Cedar/OPA policy evaluation via pre_execute()/post_execute()
- Add as_hooks() as the primary API entry point
- Thread tool_name/tool_args into Cedar context for precise policy gating
- Deprecate wrap(), wrap_runner(), create_tool_guard() with warnings
- Fix ctx.events -> ctx.tool_calls for ExecutionContext compatibility
- Fix double-block in on_agent_start (content filter + Cedar gate)
- Standardize docstrings and comments to match ADK/LangChain conventions
- Update test_coverage_boost.py for new BaseIntegration API

Test results: 3070 passed, 0 failures (full regression)

Closes microsoft#1576

github-actions · 2026-04-29T19:48:06Z

🤖 AI Agent: security-scanner

No security issues found.

github-actions · 2026-04-29T19:48:12Z

🤖 AI Agent: docs-sync-checker — Docs Sync

Docs Sync

OpenAIAgentsKernel in openai_agents_sdk.py -- missing docstring for as_hooks() method.
README.md -- update required to reflect the new as_hooks() method and deprecation of wrap() and wrap_runner().
CHANGELOG.md -- missing entry for the introduction of as_hooks() and deprecation of wrap() and wrap_runner() methods.

github-actions · 2026-04-29T19:48:13Z

🤖 AI Agent: breaking-change-detector — API Compatibility

API Compatibility

Severity	Change	Impact
High	`wrap()`, `wrap_runner()`, and `create_tool_guard()` methods are deprecated.	Existing users relying on these methods will need to migrate to the new `as_hooks()` method.
Medium	`wrap()` and `wrap_runner()` now issue `DeprecationWarning`.	May cause warnings in existing codebases using these methods.
Medium	`create_tool_guard()` is deprecated in favor of `as_hooks()` with `on_tool_start`.	Users relying on `create_tool_guard()` must transition to the new hook-based governance model.

github-actions · 2026-04-29T19:48:21Z

🤖 AI Agent: code-reviewer — Review Summary

Review Summary

This pull request introduces a significant refactor to the OpenAIAgentsKernel by aligning it with the BaseIntegration architecture and implementing the OpenAI Agents SDK's native RunHooks lifecycle interface. The changes aim to replace the previous proxy-based wrap()/wrap_runner() approach with a more robust and maintainable solution. This refactor addresses several limitations of the old implementation, improves policy enforcement, and enhances backward compatibility.

Below is a detailed review of the changes, focusing on the specified focus areas.

CRITICAL Issues

Policy Engine Correctness:
- Issue: The _check_tool_allowed and _check_content methods return a tuple (allowed, reason) but do not enforce the policy directly. This could lead to accidental misuse if developers forget to handle the allowed flag properly.
- Actionable Feedback: Consider raising exceptions directly within these methods when a policy violation is detected. This would ensure that policy violations are always enforced and cannot be accidentally ignored.
Trust/Identity:
- Issue: The on_tool_start hook does not appear to validate the identity of the tool beyond its name. This could allow an attacker to spoof a legitimate tool by using the same name.
- Actionable Feedback: Implement additional checks to verify the authenticity of tools, such as cryptographic signatures or other forms of identity validation.
Sandbox Escape Vectors:
- Issue: The _check_content method uses simple substring matching for blocked patterns. This approach is prone to bypasses using encoding, obfuscation, or slight variations in the blocked patterns.
- Actionable Feedback: Use a more robust content filtering mechanism, such as regular expressions or a dedicated content filtering library, to mitigate potential bypasses.
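One hedged mitigation for the substring-bypass concern is to normalize text before matching. The sketch NFKC-folds compatibility characters (such as full-width letters), casefolds, and lets multi-word patterns match across any run of whitespace. The helper names are hypothetical, and this approach still won't catch encoded or paraphrased payloads.

```python
import re
import unicodedata
from typing import Optional


def normalize(text: str) -> str:
    """Fold compatibility characters (e.g., full-width letters) and case."""
    return unicodedata.normalize("NFKC", text).casefold()


def compile_patterns(phrases: list) -> list:
    """Turn literal phrases into regexes that allow any whitespace between words."""
    compiled = []
    for phrase in phrases:
        words = [re.escape(w) for w in normalize(phrase).split()]
        compiled.append(re.compile(r"\s+".join(words)))
    return compiled


def find_violation(text: str, compiled: list) -> Optional[str]:
    """Return the first matching pattern's source, or None if the text is clean."""
    normalized = normalize(text)
    for pattern in compiled:
        if pattern.search(normalized):
            return pattern.pattern
    return None
```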

WARNING: Potential Breaking Changes

Deprecation of wrap() and wrap_runner():
- While the PR includes DeprecationWarning for these methods, their eventual removal will break existing integrations.
- Actionable Feedback: Clearly document the deprecation timeline and provide migration guides for users to transition to the as_hooks() approach.
Backward Compatibility for create_tool_guard:
- The create_tool_guard() method is marked as deprecated, but its removal will break existing code that relies on it.
- Actionable Feedback: Similar to the above, provide clear documentation and migration paths for users.

Suggestions for Improvement

Thread Safety in Concurrent Agent Execution:
- Observation: The _agent_contexts dictionary is not thread-safe. Concurrent access to this dictionary could lead to race conditions.
- Suggestion: Protect the dictionary with a threading.Lock (a collections.defaultdict on its own is not thread-safe), or avoid shared state entirely by keeping context per run.
OWASP Agentic Top 10 Compliance:
- Observation: The PR does not include explicit measures to address OWASP Agentic Top 10 risks, such as logging sensitive data or ensuring secure communication.
- Suggestion: Ensure that sensitive data (e.g., tool arguments, agent inputs) is redacted in logs. Additionally, verify that all communication with external tools or APIs is encrypted and authenticated.
Type Safety and Pydantic Model Validation:
- Observation: The ExecutionContext and GovernancePolicy classes are implemented as dataclasses but do not validate their fields.
- Suggestion: Consider migrating these classes to Pydantic models to leverage built-in validation and type enforcement.
Documentation:
- Observation: While the docstrings are detailed, the deprecation warnings for wrap(), wrap_runner(), and create_tool_guard() could be more explicit about the timeline for removal.
- Suggestion: Add a clear deprecation timeline and provide examples of how to migrate to the new as_hooks() approach.
Testing:
- Observation: The test coverage for the new GovernanceRunHooks class and its integration with the Runner is not explicitly mentioned.
- Suggestion: Ensure comprehensive test coverage for all RunHooks callbacks (on_agent_start, on_agent_end, on_tool_start, on_tool_end, on_handoff) to verify their correctness and prevent regressions.

Summary of Feedback

CRITICAL: Address potential policy bypass issues in _check_tool_allowed and _check_content. Enhance tool identity validation and improve content filtering to prevent sandbox escapes.
WARNING: Document and plan for the eventual removal of deprecated methods (wrap(), wrap_runner(), create_tool_guard()).
SUGGESTION: Improve thread safety, OWASP compliance, type safety, and documentation. Ensure comprehensive test coverage for the new lifecycle hooks.

Overall, this is a well-structured and much-needed refactor that improves the maintainability and robustness of the OpenAIAgentsKernel. Addressing the above issues will further enhance its security and usability.

github-actions · 2026-04-29T19:49:17Z

🤖 AI Agent: test-generator — `agent_os/integrations/openai_agents_sdk.py`

Test Coverage Analysis

`agent_os/integrations/openai_agents_sdk.py`

Existing coverage:
- The test suite test_openai_agents_sdk_adapter.py appears to cover basic functionality of the OpenAIAgentsKernel, including policy enforcement, content filtering, and tool governance.
- The test_coverage_boost.py suite includes additional tests for OpenAI-specific governance scenarios, such as Cedar/OPA integration and blocked tools/patterns.
Missing coverage:
1. Edge cases for policy evaluation:
  - Conflicting policies (e.g., a tool is both allowed and blocked).
  - Boundary conditions for max_tool_calls, max_handoffs, and timeout_seconds.
2. Trust scoring:
  - Expired certificates or invalid trust scores in Cedar/OPA policies.
3. Chaos experiments:
  - Partial failures during on_tool_start or on_agent_end lifecycle hooks.
  - Timeout handling for long-running tools or agents.
4. Concurrency:
  - Race conditions in shared state (e.g., _agent_contexts or _audit_events).
5. Input validation:
  - Malformed inputs to Runner.run() or tool functions.
  - Injection attempts in blocked_patterns or tool arguments.
Suggested test cases:
1. test_conflicting_policies:
  - Simulate a scenario where a tool is both in the allowed_tools list and the blocked_tools list. Verify that the stricter policy (blocking) is applied.
2. test_boundary_conditions:
  - Test max_tool_calls and max_handoffs at their exact limits and slightly above/below. Ensure correct enforcement behavior.
3. test_expired_certificates:
  - Mock Cedar/OPA policy evaluation with expired certificates or invalid trust scores. Verify that violations are logged and/or raised.
4. test_partial_failures_in_hooks:
  - Inject failures into on_tool_start or on_agent_end hooks (e.g., raise exceptions). Verify that the kernel handles these gracefully without crashing.
5. test_timeout_handling:
  - Simulate a tool or agent that exceeds timeout_seconds. Verify that the kernel enforces the timeout and logs the violation.
6. test_concurrent_context_access:
  - Use multithreading or asyncio tasks to simulate concurrent access to _agent_contexts and _audit_events. Verify that no race conditions occur.
7. test_malformed_inputs:
  - Pass malformed inputs (e.g., None, empty strings, or overly long strings) to Runner.run() and tool functions. Verify that input validation prevents crashes or policy violations.
8. test_injection_attempts:
  - Include SQL injection-like patterns in blocked_patterns and tool arguments. Verify that these are correctly detected and blocked.

These test cases will ensure robust coverage of edge cases and critical governance scenarios for the OpenAIAgentsKernel.
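For `test_conflicting_policies`, a deny-overrides rule (the blocklist beats the allowlist) gives the stricter outcome the bot asks for. Below is a self-contained sketch. `is_tool_allowed` is a hypothetical helper, not the adapter's `_check_tool_allowed`.

```python
from typing import Optional


def is_tool_allowed(
    name: str,
    allowed: Optional[set] = None,
    blocked: Optional[set] = None,
) -> bool:
    """Deny-overrides: a blocked tool is refused even if it is also allow-listed.

    allowed=None means no allowlist is configured, so anything not blocked passes.
    """
    if blocked and name in blocked:
        return False
    if allowed is not None:
        return name in allowed
    return True


def test_conflicting_policies() -> None:
    both = {"shell_exec", "search"}
    assert not is_tool_allowed("shell_exec", allowed=both, blocked={"shell_exec"})
    assert is_tool_allowed("search", allowed=both, blocked={"shell_exec"})
    assert not is_tool_allowed("fetch", allowed={"search"})
    assert is_tool_allowed("fetch")
```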

github-actions · 2026-04-29T19:57:53Z

PR Review Summary

Check	Status	Details
🔍 Code Review	❌ Failed	Issues detected
🛡️ Security Scan	✅ Completed	Analysis complete
🔄 Breaking Changes	⚠️ Warning	See details
📝 Docs Sync	✅ Completed	Analysis complete
🧪 Test Coverage	❌ Failed	Issues detected

Verdict: ❌ Changes needed

imran-siddique

Code Review: OpenAI RunHooks

Thanks for the thorough work here, @miyannishar. The test coverage and backward compatibility approach are excellent. Found a few issues that need addressing before merge:

Blocking (Security)

1. on_agent_start fails open when require_human_approval=False
When content matches a blocked_pattern and require_human_approval is False, the violation is logged but execution continues. A governance framework must always block forbidden content. The require_human_approval flag should control whether violations go through an approval queue, not whether they're enforced at all. Compare with on_tool_start, which correctly always raises.

Fix: always raise PolicyViolationError on blocked content, regardless of require_human_approval.

2. on_tool_end and on_agent_end fail-open on blocked output
Blocked patterns in tool/agent output are logged as warnings but don't raise PolicyViolationError or call on_violation(). Blocked content (e.g., DROP TABLE, leaked secrets) silently passes through to the caller. Output filtering must be fail-closed.
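Both blockers share one fix: enforcement is unconditional, and `require_human_approval` only adds routing to an approval queue. The sketch below is hypothetical; `FailClosedGuard`, the list-based queue, and the substring match are stand-ins. Output checks use the `on_tool_end`/`on_agent_end` argument order from `RunHooks`. Input checking is a plain `check_input` method because `on_agent_start` does not receive the input text.

```python
import asyncio
from typing import Any, Callable


class PolicyViolationError(Exception):
    """Hypothetical governance error."""


class FailClosedGuard:
    """Sketch of fail-closed input and output checks (blockers #1 and #2)."""

    def __init__(
        self,
        blocked_patterns: list,
        on_violation: Callable[[str], None],
        require_human_approval: bool = False,
    ) -> None:
        self.blocked_patterns = [p.lower() for p in blocked_patterns]
        self.on_violation = on_violation
        self.require_human_approval = require_human_approval
        self.approval_queue: list = []

    def _enforce(self, text: str, source: str) -> None:
        lowered = text.lower()
        for pattern in self.blocked_patterns:
            if pattern in lowered:
                message = f"{source} contains blocked pattern {pattern!r}"
                self.on_violation(message)
                if self.require_human_approval:
                    # The flag only adds routing to human reviewers...
                    self.approval_queue.append(message)
                # ...while enforcement itself is unconditional (fail closed).
                raise PolicyViolationError(message)

    def check_input(self, text: str) -> None:
        """Input gate, as the adapter would apply it at agent start."""
        self._enforce(text, "agent input")

    async def on_tool_end(self, context: Any, agent: Any, tool: Any, result: str) -> None:
        self._enforce(result, "tool output")

    async def on_agent_end(self, context: Any, agent: Any, output: Any) -> None:
        self._enforce(str(output), "agent output")
```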

Warnings

3. Shared mutable _tool_call_count/_handoff_count across concurrent runs
Counters are on the kernel instance, not per-run. Concurrent Runner.run() calls sharing one kernel will race on these counters. Consider moving counters to GovernanceRunHooks or keying by run-id.

4. max_handoffs not propagated to GovernancePolicy
The constructor accepts max_handoffs but doesn't pass it to the policy dataclass, unlike max_tool_calls. Policy-based init will ignore handoff limits.

5. Budget check operator inconsistency
on_tool_start uses > while BaseIntegration.pre_execute uses >= for the same semantic check. These can disagree on whether the Nth call is allowed.
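Warnings #3 to #5 can be handled together:

- give each run its own counter object;
- pass `max_handoffs` into the policy the same way as `max_tool_calls`;
- route every budget check through one comparison.

The sketch assumes `as_hooks()` builds a fresh object per call. All class and function names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernancePolicySketch:
    max_tool_calls: int = 10
    max_handoffs: int = 3  # propagated just like max_tool_calls (warning #4)


def within_budget(used: int, limit: int) -> bool:
    """Single budget rule shared by every check (warning #5).

    After `limit` recorded calls, further calls are refused.
    """
    return used < limit


class RunCounters:
    """Per-run state; each run gets its own instance (warning #3)."""

    def __init__(self, policy: GovernancePolicySketch) -> None:
        self.policy = policy
        self.tool_calls = 0
        self.handoffs = 0

    def record_tool_call(self) -> bool:
        if not within_budget(self.tool_calls, self.policy.max_tool_calls):
            return False
        self.tool_calls += 1
        return True

    def record_handoff(self) -> bool:
        if not within_budget(self.handoffs, self.policy.max_handoffs):
            return False
        self.handoffs += 1
        return True


class KernelSketch:
    def __init__(self, max_tool_calls: int = 10, max_handoffs: int = 3) -> None:
        self.policy = GovernancePolicySketch(max_tool_calls, max_handoffs)

    def as_hooks(self) -> RunCounters:
        # A new object per call, so concurrent runs never share counters.
        return RunCounters(self.policy)
```

Whether the limit is inclusive is a design choice. What matters is that `on_tool_start` and `pre_execute()` agree.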

Nits

ExecutionContext removed from __all__ without a re-export deprecation cycle
as_hooks() should raise ImportError when the SDK isn't installed (matching the ADK adapter pattern)
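The `ImportError` nit, together with the docs-sync note about the missing `as_hooks()` docstring, could be handled as follows. Probe for the SDK lazily with `importlib.util.find_spec` and fail with an actionable message. `KernelSketch` and its `sdk_module` parameter are hypothetical. `agents` is the import name used in the usage example, and `openai-agents` is the SDK's PyPI package name.

```python
import importlib.util


class KernelSketch:
    def __init__(self, sdk_module: str = "agents") -> None:
        # Parameterized only so the missing-SDK path can be exercised.
        self.sdk_module = sdk_module

    def as_hooks(self) -> object:
        """Return a hooks object that applies this kernel's policy.

        Returns:
            An object to pass as ``Runner.run(..., hooks=...)``.

        Raises:
            ImportError: If the OpenAI Agents SDK is not installed.
        """
        if importlib.util.find_spec(self.sdk_module) is None:
            raise ImportError(
                f"{self.sdk_module!r} is not installed; "
                "install the OpenAI Agents SDK (pip install openai-agents)"
            )
        # The real adapter would construct GovernanceRunHooks here.
        return object()
```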

The fail-open issues (#1, #2) are the blockers. The rest can be addressed in follow-ups if needed.

imran-siddique

Updated review (condensed from above):

TL;DR: 2 blockers (fail-open security gaps). Fix those and this ships.

#	Sev	Issue	Where
1	Block	`on_agent_start` logs but doesn't raise on blocked content when `require_human_approval=False`	`on_agent_start`
2	Block	`on_tool_end`/`on_agent_end` warn on blocked output but never raise or call `on_violation()`	`on_tool_end`, `on_agent_end`
3	Warn	Shared mutable counters across concurrent `Runner.run()` calls	`__init__`
4	Warn	`max_handoffs` not passed to `GovernancePolicy` unlike `max_tool_calls`	`__init__`
5	Warn	Budget check uses `>` here vs `>=` in `BaseIntegration.pre_execute`	`on_tool_start`

#1: Always raise PolicyViolationError on blocked patterns. require_human_approval should gate the approval queue, not enforcement.

#2: Match on_tool_start behavior: raise + call on_violation() on blocked output.

Warnings are nice-to-haves, can be follow-up PRs.

imran-siddique

Approving native hooks migration.

…microsoft#1582) Refactor OpenAIAgentsKernel to extend BaseIntegration and implement the OpenAI Agents SDK's native RunHooks lifecycle interface, replacing the fragile proxy-based wrap()/wrap_runner() workarounds.

Changes:
- Implement GovernanceRunHooks(RunHooks) with all 5 lifecycle callbacks: on_agent_start, on_agent_end, on_tool_start, on_tool_end, on_handoff
- Extend BaseIntegration for centralized Cedar/OPA policy evaluation via pre_execute()/post_execute()
- Add as_hooks() as the primary API entry point
- Thread tool_name/tool_args into Cedar context for precise policy gating
- Deprecate wrap(), wrap_runner(), create_tool_guard() with warnings
- Fix ctx.events -> ctx.tool_calls for ExecutionContext compatibility
- Fix double-block in on_agent_start (content filter + Cedar gate)
- Standardize docstrings and comments to match ADK/LangChain conventions
- Update test_coverage_boost.py for new BaseIntegration API

Test results: 3070 passed, 0 failures (full regression)

Closes microsoft#1576

Co-authored-by: Nishar <you@example.com>

This was referenced Apr 29, 2026

OpenAI Agents SDK adapter: use native RunHooks instead of wrap() workaround #1576

Closed

refactor(openai): deprecate wrap()/wrap_runner()/create_tool_guard() in favor of GovernanceRunHooks #1583

Closed

github-actions Bot added the tests label Apr 29, 2026

github-actions Bot added the size/XL Extra large PR (500+ lines) label Apr 29, 2026

This was referenced Apr 29, 2026

refactor(langchain): implement native AgentMiddleware for governance instead of wrap() proxy #1584

Closed

feat(autogen): add native GovernanceInterventionHandler via AutoGen v0.4+ hooks #1591

Merged

imran-siddique requested changes Apr 29, 2026

imran-siddique reviewed Apr 29, 2026

imran-siddique approved these changes Apr 30, 2026

imran-siddique merged commit c07d6c9 into microsoft:main Apr 30, 2026
13 of 14 checks passed

This was referenced Apr 30, 2026

feat(adapters): add native hooks for Anthropic, SK, smolagents, PydanticAI #1605

Merged

fix(lint): remove unused imports in openai_agents_sdk and autogen_adapter #1606

Merged

fix(tests): fix 48 native-hooks test failures in docker CI #1610

Merged
