
feat(openai): native RunHooks lifecycle + BaseIntegration inheritance #1582

Merged
imran-siddique merged 1 commit into microsoft:main from miyannishar:feat/openai-agents-native-runhooks on Apr 30, 2026


@miyannishar
Collaborator

Summary

Refactors OpenAIAgentsKernel to extend BaseIntegration and implement the OpenAI Agents SDK's native RunHooks lifecycle interface, replacing the fragile proxy-based wrap()/wrap_runner() workarounds with first-class framework hooks.

Closes #1576

Motivation

The existing OpenAI adapter relied on proxy wrapping (wrap(), wrap_runner()) to intercept agent and tool execution. This approach:

  • Could not intercept lifecycle events between agent turns (e.g., handoffs)
  • Required fragile __getattr__ delegation that broke with SDK updates
  • Was inconsistent with the ADK adapter's native BasePlugin integration pattern
  • Did not inherit BaseIntegration's Cedar/OPA policy evaluation

This PR aligns the OpenAI adapter with the ADK adapter's architecture by using the SDK's native RunHooks interface.

Changes

Core Adapter (openai_agents_sdk.py)

| Area | Before | After |
|---|---|---|
| Base class | Standalone class | Extends BaseIntegration |
| Primary API | kernel.wrap(agent) + kernel.wrap_runner(Runner) | Runner.run(agent, hooks=kernel.as_hooks()) |
| Lifecycle hooks | None (proxy-based interception) | GovernanceRunHooks(RunHooks) with 5 callbacks |
| Policy evaluation | Local checks only | Cedar/OPA via pre_execute()/post_execute() + local checks |
| Tool identity | Not available in Cedar context | tool_name/tool_args threaded into Cedar context |
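The "Tool identity" row means a policy rule can see which tool is being called and with which arguments, not just which agent is running. Below is a minimal sketch of assembling that kind of context. It assumes a plain dict and a hypothetical build_tool_context helper. The field names are illustrative and are not the adapter's actual Cedar context schema.

```python
import json
from typing import Any


def build_tool_context(agent_name: str, tool_name: str, tool_args: Any) -> dict:
    """Build a policy-evaluation context that carries tool identity.

    Hypothetical helper: field names are illustrative, not the adapter's schema.

    Args:
        agent_name: Name of the agent invoking the tool.
        tool_name: Name of the tool about to run.
        tool_args: Tool arguments as a JSON string, a mapping, or None.

    Returns:
        A dict that can be handed to a policy engine as request context.
    """
    # Tool arguments often arrive as a JSON string; decode when possible
    if isinstance(tool_args, str):
        try:
            tool_args = json.loads(tool_args)
        except json.JSONDecodeError:
            pass  # keep undecodable strings as raw text
    if tool_args is None:
        tool_args = {}
    elif not isinstance(tool_args, dict):
        tool_args = {"raw": tool_args}
    return {"agent_name": agent_name, "tool_name": tool_name, "tool_args": tool_args}
```

For example, build_tool_context("analyst", "sql_query", '{"query": "SELECT 1"}') yields tool_args == {"query": "SELECT 1"}. A rule can then condition on both the tool name and its arguments.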

GovernanceRunHooks Lifecycle Coverage

| Callback | Governance action |
|---|---|
| on_agent_start | Content filter, Cedar/OPA gate, pre_execute() |
| on_agent_end | Output validation via post_execute(), audit recording |
| on_tool_start | Tool allow/blocklist, budget enforcement, Cedar gate with tool identity |
| on_tool_end | Output content filter, audit recording |
| on_handoff | Handoff limit enforcement, audit trail |
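The callback-to-action mapping above can be sketched as a hooks class. This is an SDK-free sketch, not the PR's GovernanceRunHooks.

In the real adapter, the class subclasses agents.RunHooks, and each callback receives a RunContextWrapper as its first argument. Here, plain objects with a .name attribute stand in for agents and tools. PolicyViolationError, GovernanceHooksSketch, and its constructor parameters are illustrative. Input filtering and the Cedar/OPA gate are left out to keep the sketch short.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, Iterable


class PolicyViolationError(Exception):
    """Raised when a governance rule blocks execution (illustrative)."""


class GovernanceHooksSketch:
    """SDK-free stand-in for a RunHooks subclass; one instance per run."""

    def __init__(
        self,
        blocked_tools: Iterable[str] = (),
        blocked_patterns: Iterable[str] = (),
        max_handoffs: int = 5,
    ) -> None:
        self.blocked_tools = set(blocked_tools)
        self.blocked_patterns = [p.lower() for p in blocked_patterns]
        self.max_handoffs = max_handoffs
        self.handoff_count = 0
        self.audit: list = []

    def _check_content(self, text: str) -> None:
        # Fail closed: any blocked pattern in the text raises
        lowered = text.lower()
        for pattern in self.blocked_patterns:
            if pattern in lowered:
                raise PolicyViolationError(f"blocked pattern: {pattern!r}")

    async def on_agent_start(self, context: Any, agent: Any) -> None:
        # Input filtering and the Cedar/OPA gate are omitted in this sketch
        self.audit.append(("agent_start", agent.name))

    async def on_agent_end(self, context: Any, agent: Any, output: Any) -> None:
        self._check_content(str(output))
        self.audit.append(("agent_end", agent.name))

    async def on_tool_start(self, context: Any, agent: Any, tool: Any) -> None:
        if tool.name in self.blocked_tools:
            raise PolicyViolationError(f"tool blocked: {tool.name}")
        self.audit.append(("tool_start", tool.name))

    async def on_tool_end(self, context: Any, agent: Any, tool: Any, result: str) -> None:
        self._check_content(result)
        self.audit.append(("tool_end", tool.name))

    async def on_handoff(self, context: Any, from_agent: Any, to_agent: Any) -> None:
        # max_handoffs handoffs are allowed; the next one raises
        self.handoff_count += 1
        if self.handoff_count > self.max_handoffs:
            raise PolicyViolationError("handoff limit exceeded")
        self.audit.append(("handoff", f"{from_agent.name}->{to_agent.name}"))


async def demo() -> None:
    hooks = GovernanceHooksSketch(blocked_tools=["shell_exec"], blocked_patterns=["DROP TABLE"])
    analyst = SimpleNamespace(name="analyst")
    await hooks.on_agent_start(None, analyst)
    try:
        await hooks.on_tool_start(None, analyst, SimpleNamespace(name="shell_exec"))
    except PolicyViolationError as exc:
        print(exc)  # tool blocked: shell_exec
    print(hooks.audit)  # [('agent_start', 'analyst')]


asyncio.run(demo())
```

Keeping the counters on the hooks object, rather than on a shared kernel, means each run that gets its own hooks instance also gets its own limits.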

Backward Compatibility

All three legacy methods are preserved with DeprecationWarning:

  • wrap() → use as_hooks()
  • wrap_runner() → use as_hooks()
  • create_tool_guard() → handled by on_tool_start
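A common way to keep a legacy entry point working while steering callers to the new API is warnings.warn(..., DeprecationWarning, stacklevel=2). The stacklevel=2 makes the warning point at the caller's line rather than at library code.

The sketch below shows that pattern on a hypothetical KernelSketch class. The real methods' bodies and warning text are not shown in the PR, and returning the agent unchanged is purely illustrative.

```python
import warnings


class KernelSketch:
    """Hypothetical kernel showing the deprecate-but-keep pattern."""

    def as_hooks(self) -> object:
        """Return a hooks object (stand-in for GovernanceRunHooks)."""
        return object()

    def wrap(self, agent: object) -> object:
        """Deprecated: pass hooks=kernel.as_hooks() to Runner.run instead."""
        warnings.warn(
            "wrap() is deprecated; use Runner.run(agent, ..., hooks=kernel.as_hooks())",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return agent  # the real adapter keeps its legacy behaviour; identity here


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    KernelSketch().wrap("agent")
print(caught[0].category.__name__)  # DeprecationWarning
```

Python hides DeprecationWarning by default unless the triggering code runs in __main__ or a test runner such as pytest enables it. Library users may therefore need -W default to see these warnings.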

Bug Fixes

  • ctx.events → ctx.tool_calls: Fixed wrap_runner() to use the correct ExecutionContext field
  • Double-block in on_agent_start: Prevented content filter and Cedar/OPA gate from both triggering on the same violation
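One way to guarantee that a single violation is reported once is to short-circuit. Run the cheap local content filter first, and consult the policy engine only if the filter passes.

The sketch below assumes a hypothetical gate_agent_input function. Its policy_allows callable stands in for the Cedar/OPA gate, and the violations list stands in for whatever audit log the adapter uses.

```python
from typing import Callable


class PolicyViolationError(Exception):
    """Raised when governance blocks execution (illustrative)."""


def gate_agent_input(
    text: str,
    blocked_patterns: list,
    policy_allows: Callable[[str], bool],
    violations: list,
) -> None:
    """Apply the content filter, then the policy gate, raising at most once.

    Args:
        text: Agent input to check.
        blocked_patterns: Case-insensitive substrings to reject.
        policy_allows: Stand-in for a Cedar/OPA decision on the input.
        violations: Log that receives exactly one entry per blocked call.

    Raises:
        PolicyViolationError: On the first failing check only.
    """
    lowered = text.lower()
    for pattern in blocked_patterns:
        if pattern.lower() in lowered:
            violations.append(f"content:{pattern}")
            raise PolicyViolationError(f"blocked pattern: {pattern!r}")
    # Only reached when the content filter passed, so nothing is recorded twice
    if not policy_allows(text):
        violations.append("policy")
        raise PolicyViolationError("denied by policy engine")
```

In this sketch, input that both matches a blocked pattern and would be denied by policy produces a single "content:..." entry.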

Code Quality

  • Standardized all docstrings/comments to match ADK/LangChain adapter conventions (Google-style Args:/Returns:/Raises: sections, consistent section separators)

Test Results

| Suite | Result |
|---|---|
| test_openai_agents_sdk_adapter.py | 63/63 ✅ |
| test_coverage_boost.py (OAI section) | 21/21 ✅ |
| Full regression | 3070 passed, 46 skipped, 0 failures |

Usage Example

import asyncio

from agents import Agent, Runner
from agent_os.integrations.openai_agents_sdk import OpenAIAgentsKernel

# Example agent (name and instructions are illustrative)
agent = Agent(name="analyst", instructions="Analyze the data you are given.")


async def main() -> None:
    # Basic governance
    kernel = OpenAIAgentsKernel(
        blocked_tools=["shell_exec"],
        blocked_patterns=["DROP TABLE"],
    )
    result = await Runner.run(agent, "Analyze data", hooks=kernel.as_hooks())
    print(result.final_output)

    # With Cedar policy evaluation
    kernel = OpenAIAgentsKernel.from_cedar("policies/governance.cedar")
    result = await Runner.run(agent, "...", hooks=kernel.as_hooks())


asyncio.run(main())

Refactor OpenAIAgentsKernel to extend BaseIntegration and implement
the OpenAI Agents SDK's native RunHooks lifecycle interface, replacing
the fragile proxy-based wrap()/wrap_runner() workarounds.

Changes:
- Implement GovernanceRunHooks(RunHooks) with all 5 lifecycle callbacks:
  on_agent_start, on_agent_end, on_tool_start, on_tool_end, on_handoff
- Extend BaseIntegration for centralized Cedar/OPA policy evaluation
  via pre_execute()/post_execute()
- Add as_hooks() as the primary API entry point
- Thread tool_name/tool_args into Cedar context for precise policy gating
- Deprecate wrap(), wrap_runner(), create_tool_guard() with warnings
- Fix ctx.events -> ctx.tool_calls for ExecutionContext compatibility
- Fix double-block in on_agent_start (content filter + Cedar gate)
- Standardize docstrings and comments to match ADK/LangChain conventions
- Update test_coverage_boost.py for new BaseIntegration API

Test results: 3070 passed, 0 failures (full regression)

Closes microsoft#1576
@github-actions

🤖 AI Agent: security-scanner

No security issues found.

@github-actions

🤖 AI Agent: docs-sync-checker — Docs Sync

Docs Sync

  • OpenAIAgentsKernel in openai_agents_sdk.py -- missing docstring for as_hooks() method.
  • README.md -- update required to reflect the new as_hooks() method and deprecation of wrap() and wrap_runner().
  • CHANGELOG.md -- missing entry for the introduction of as_hooks() and deprecation of wrap() and wrap_runner() methods.

@github-actions

🤖 AI Agent: breaking-change-detector — API Compatibility

API Compatibility

| Severity | Change | Impact |
|---|---|---|
| High | wrap(), wrap_runner(), and create_tool_guard() methods are deprecated. | Existing users relying on these methods will need to migrate to the new as_hooks() method. |
| Medium | wrap() and wrap_runner() now issue DeprecationWarning. | May cause warnings in existing codebases using these methods. |
| Medium | create_tool_guard() is deprecated in favor of as_hooks() with on_tool_start. | Users relying on create_tool_guard() must transition to the new hook-based governance model. |

github-actions bot added the tests label on Apr 29, 2026
@github-actions

🤖 AI Agent: code-reviewer — Review Summary

Review Summary

This pull request introduces a significant refactor to the OpenAIAgentsKernel by aligning it with the BaseIntegration architecture and implementing the OpenAI Agents SDK's native RunHooks lifecycle interface. The changes aim to replace the previous proxy-based wrap()/wrap_runner() approach with a more robust and maintainable solution. This refactor addresses several limitations of the old implementation and improves policy enforcement while preserving backward compatibility.

Below is a detailed review of the changes, organized by review focus area.


CRITICAL Issues

  1. Policy Engine Correctness:

    • Issue: The _check_tool_allowed and _check_content methods return a tuple (allowed, reason) but do not enforce the policy directly. This could lead to accidental misuse if developers forget to handle the allowed flag properly.
    • Actionable Feedback: Consider raising exceptions directly within these methods when a policy violation is detected. This would ensure that policy violations are always enforced and cannot be accidentally ignored.
  2. Trust/Identity:

    • Issue: The on_tool_start hook does not appear to validate the identity of the tool beyond its name. This could allow an attacker to spoof a legitimate tool by using the same name.
    • Actionable Feedback: Implement additional checks to verify the authenticity of tools, such as cryptographic signatures or other forms of identity validation.
  3. Sandbox Escape Vectors:

    • Issue: The _check_content method uses simple substring matching for blocked patterns. This approach is prone to bypasses using encoding, obfuscation, or slight variations in the blocked patterns.
    • Actionable Feedback: Use a more robust content filtering mechanism, such as regular expressions or a dedicated content filtering library, to mitigate potential bypasses.
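Items 1 and 3 can be addressed together with two changes:

  • an enforce() helper that raises instead of returning (allowed, reason);
  • matching against a normalized form of the text with precompiled regular expressions.

The sketch below is a minimal illustration; ContentPolicy and the other names are hypothetical. NFKC normalization, case folding, and whitespace collapsing narrow the bypass space but do not close it. Encoded payloads such as base64, or SQL comments between keywords, would need separate handling.

```python
import re
import unicodedata


class PolicyViolationError(Exception):
    """Raised when content matches a blocked pattern (illustrative)."""


def _normalize(text: str) -> str:
    # Fold Unicode compatibility forms (e.g. fullwidth letters), collapse whitespace, casefold
    folded = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", folded).casefold()


class ContentPolicy:
    """Compile blocked patterns once and enforce them by raising."""

    def __init__(self, blocked_patterns: list) -> None:
        # Each pattern is a literal phrase; any run of whitespace may separate its words
        self._rules = [
            (p, re.compile(r"\s+".join(re.escape(w) for w in _normalize(p).split())))
            for p in blocked_patterns
            if p.strip()  # an empty pattern would match everything
        ]

    def enforce(self, text: str) -> None:
        """Raise PolicyViolationError if text matches any blocked pattern."""
        normalized = _normalize(text)
        for original, regex in self._rules:
            if regex.search(normalized):
                raise PolicyViolationError(f"blocked pattern matched: {original!r}")
```

Because enforce() raises, a caller cannot accidentally ignore a violation the way it could ignore a returned flag.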

WARNING: Potential Breaking Changes

  1. Deprecation of wrap() and wrap_runner():

    • While the PR includes DeprecationWarning for these methods, their eventual removal will break existing integrations.
    • Actionable Feedback: Clearly document the deprecation timeline and provide migration guides for users to transition to the as_hooks() approach.
  2. Backward Compatibility for create_tool_guard:

    • The create_tool_guard() method is marked as deprecated, but its removal will break existing code that relies on it.
    • Actionable Feedback: Similar to the above, provide clear documentation and migration paths for users.

Suggestions for Improvement

  1. Thread Safety in Concurrent Agent Execution:

    • Observation: The _agent_contexts dictionary is not thread-safe. Concurrent access to this dictionary could lead to race conditions.
    • Suggestion: Guard access with a threading.Lock (for example, around a collections.defaultdict), or avoid shared state entirely by keeping per-run data on a per-run object such as the hooks instance.
  2. OWASP Agentic Top 10 Compliance:

    • Observation: The PR does not include explicit measures to address OWASP Agentic Top 10 risks, such as logging sensitive data or ensuring secure communication.
    • Suggestion: Ensure that sensitive data (e.g., tool arguments, agent inputs) is redacted in logs. Additionally, verify that all communication with external tools or APIs is encrypted and authenticated.
  3. Type Safety and Pydantic Model Validation:

    • Observation: The ExecutionContext and GovernancePolicy classes are implemented as dataclass objects but lack validation for their fields.
    • Suggestion: Consider migrating these classes to Pydantic models to leverage built-in validation and type enforcement.
  4. Documentation:

    • Observation: While the docstrings are detailed, the deprecation warnings for wrap(), wrap_runner(), and create_tool_guard() could be more explicit about the timeline for removal.
    • Suggestion: Add a clear deprecation timeline and provide examples of how to migrate to the new as_hooks() approach.
  5. Testing:

    • Observation: The test coverage for the new GovernanceRunHooks class and its integration with the Runner is not explicitly mentioned.
    • Suggestion: Ensure comprehensive test coverage for all RunHooks callbacks (on_agent_start, on_agent_end, on_tool_start, on_tool_end, on_handoff) to verify their correctness and prevent regressions.
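Suggestion 3 names Pydantic. If the adapter would rather avoid that dependency, a stdlib dataclass can fail fast in __post_init__; this is a swapped-in stdlib technique, not Pydantic.

The sketch below uses a hypothetical PolicySketch, not the project's GovernancePolicy. Its fields are modeled on the limits discussed in this PR (max_tool_calls, max_handoffs, timeout_seconds).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicySketch:
    """Hypothetical governance policy that validates its own fields."""

    max_tool_calls: int = 10
    max_handoffs: int = 5
    timeout_seconds: float = 300.0

    def __post_init__(self) -> None:
        # Reject bad limits at construction time rather than mid-run
        for name in ("max_tool_calls", "max_handoffs"):
            value = getattr(self, name)
            # bool is a subclass of int, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                raise ValueError(f"{name} must be a non-negative int, got {value!r}")
        timeout = self.timeout_seconds
        if isinstance(timeout, bool) or not isinstance(timeout, (int, float)) or timeout <= 0:
            raise ValueError(f"timeout_seconds must be a positive number, got {timeout!r}")
```

Unlike Pydantic, this performs no coercion. A string such as "5" is rejected rather than converted, which is the stricter behavior for a security policy.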

Summary of Feedback

  • CRITICAL: Address potential policy bypass issues in _check_tool_allowed and _check_content. Enhance tool identity validation and improve content filtering to prevent sandbox escapes.
  • WARNING: Document and plan for the eventual removal of deprecated methods (wrap(), wrap_runner(), create_tool_guard()).
  • SUGGESTION: Improve thread safety, OWASP compliance, type safety, and documentation. Ensure comprehensive test coverage for the new lifecycle hooks.

Overall, this is a well-structured and much-needed refactor that improves the maintainability and robustness of the OpenAIAgentsKernel. Addressing the above issues will further enhance its security and usability.

@github-actions

🤖 AI Agent: test-generator — `agent_os/integrations/openai_agents_sdk.py`

Test Coverage Analysis

agent_os/integrations/openai_agents_sdk.py

  • Existing coverage:

    • The test suite test_openai_agents_sdk_adapter.py appears to cover basic functionality of the OpenAIAgentsKernel, including policy enforcement, content filtering, and tool governance.
    • The test_coverage_boost.py suite includes additional tests for OpenAI-specific governance scenarios, such as Cedar/OPA integration and blocked tools/patterns.
  • Missing coverage:

    1. Edge cases for policy evaluation:
      • Conflicting policies (e.g., a tool is both allowed and blocked).
      • Boundary conditions for max_tool_calls, max_handoffs, and timeout_seconds.
    2. Trust scoring:
      • Expired certificates or invalid trust scores in Cedar/OPA policies.
    3. Chaos experiments:
      • Partial failures during on_tool_start or on_agent_end lifecycle hooks.
      • Timeout handling for long-running tools or agents.
    4. Concurrency:
      • Race conditions in shared state (e.g., _agent_contexts or _audit_events).
    5. Input validation:
      • Malformed inputs to Runner.run() or tool functions.
      • Injection attempts in blocked_patterns or tool arguments.
  • Suggested test cases:

    1. test_conflicting_policies:
      • Simulate a scenario where a tool is both in the allowed_tools list and the blocked_tools list. Verify that the stricter policy (blocking) is applied.
    2. test_boundary_conditions:
      • Test max_tool_calls and max_handoffs at their exact limits and slightly above/below. Ensure correct enforcement behavior.
    3. test_expired_certificates:
      • Mock Cedar/OPA policy evaluation with expired certificates or invalid trust scores. Verify that violations are logged and/or raised.
    4. test_partial_failures_in_hooks:
      • Inject failures into on_tool_start or on_agent_end hooks (e.g., raise exceptions). Verify that the kernel handles these gracefully without crashing.
    5. test_timeout_handling:
      • Simulate a tool or agent that exceeds timeout_seconds. Verify that the kernel enforces the timeout and logs the violation.
    6. test_concurrent_context_access:
      • Use multithreading or asyncio tasks to simulate concurrent access to _agent_contexts and _audit_events. Verify that no race conditions occur.
    7. test_malformed_inputs:
      • Pass malformed inputs (e.g., None, empty strings, or overly long strings) to Runner.run() and tool functions. Verify that input validation prevents crashes or policy violations.
    8. test_injection_attempts:
      • Include SQL injection-like patterns in blocked_patterns and tool arguments. Verify that these are correctly detected and blocked.

These test cases will ensure robust coverage of edge cases and critical governance scenarios for the OpenAIAgentsKernel.
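As an illustration of test_conflicting_policies (suggested test 1), the sketch below encodes deny-overrides semantics. Cedar resolves conflicts the same way: a matching forbid policy overrides a matching permit. The is_tool_allowed helper is hypothetical and stands in for the kernel's real allow/blocklist check, which this PR does not show.

```python
from typing import Optional


def is_tool_allowed(tool_name: str, allowed_tools: Optional[set], blocked_tools: set) -> bool:
    """Deny-overrides: a blocklist entry always wins over an allowlist entry."""
    if tool_name in blocked_tools:
        return False
    # None means "no allowlist configured": everything not blocked is permitted
    return allowed_tools is None or tool_name in allowed_tools


def test_conflicting_policies() -> None:
    # "sql_query" appears on both lists; blocking must take precedence
    assert not is_tool_allowed("sql_query", {"sql_query", "search"}, {"sql_query"})
    assert is_tool_allowed("search", {"sql_query", "search"}, {"sql_query"})
    assert not is_tool_allowed("shell_exec", {"search"}, set())  # not allowlisted
    assert is_tool_allowed("anything", None, set())  # no allowlist configured


if __name__ == "__main__":
    test_conflicting_policies()
```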

github-actions bot added the size/XL (Extra large PR, 500+ lines) label on Apr 29, 2026
@github-actions

PR Review Summary

| Check | Status | Details |
|---|---|---|
| 🔍 Code Review | ❌ Failed | Issues detected |
| 🛡️ Security Scan | ✅ Completed | Analysis complete |
| 🔄 Breaking Changes | ⚠️ Warning | See details |
| 📝 Docs Sync | ✅ Completed | Analysis complete |
| 🧪 Test Coverage | ❌ Failed | Issues detected |

Verdict: ❌ Changes needed

@imran-siddique (Member) left a comment

Code Review: OpenAI RunHooks

Thanks for the thorough work here, @miyannishar. The test coverage and backward compatibility approach are excellent. Found a few issues that need addressing before merge:

Blocking (Security)

1. on_agent_start fails open when require_human_approval=False

When content matches a blocked_pattern and require_human_approval is False, the violation is logged but execution continues. A governance framework must always block forbidden content. The require_human_approval flag should control whether violations go through an approval queue, not whether they're enforced at all. Compare with on_tool_start, which correctly always raises.

Fix: always raise PolicyViolationError on blocked content, regardless of require_human_approval.

2. on_tool_end and on_agent_end fail open on blocked output
Blocked patterns in tool/agent output are logged as warnings but don't raise PolicyViolationError or call on_violation(). Blocked content (e.g., DROP TABLE, leaked secrets) silently passes through to the caller. Output filtering must be fail-closed.
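The requested fix for blockers 1 and 2 comes down to this control flow:

  • detection always raises;
  • on_violation() is always notified;
  • require_human_approval only decides whether the violation is also routed to an approval queue.

The sketch below is a minimal illustration. OutputGuard and approval_queue are hypothetical names, and on_violation is modeled as a simple callback; none of this is the adapter's own code.

```python
from typing import Callable, Optional


class PolicyViolationError(Exception):
    """Raised when governance blocks content (illustrative)."""


class OutputGuard:
    """Fail-closed content filter: a blocked pattern always raises."""

    def __init__(
        self,
        blocked_patterns: list,
        require_human_approval: bool = False,
        on_violation: Optional[Callable[[str], None]] = None,
    ) -> None:
        self.blocked_patterns = [p.lower() for p in blocked_patterns]
        self.require_human_approval = require_human_approval
        self.on_violation = on_violation
        self.approval_queue: list = []

    def check(self, text: str, where: str) -> str:
        """Return text unchanged if clean; otherwise notify and raise."""
        lowered = text.lower()
        for pattern in self.blocked_patterns:
            if pattern in lowered:
                reason = f"{where}: blocked pattern {pattern!r}"
                if self.on_violation is not None:
                    self.on_violation(reason)  # notify before raising
                if self.require_human_approval:
                    self.approval_queue.append(reason)  # queued for review, still blocked
                raise PolicyViolationError(reason)
        return text
```

The same check() call would serve on_agent_start input as well as on_tool_end and on_agent_end output. That keeps all three paths consistent with on_tool_start's always-raise behavior.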

Warnings

3. Shared mutable _tool_call_count/_handoff_count across concurrent runs
Counters are on the kernel instance, not per-run. Concurrent Runner.run() calls sharing one kernel will race on these counters. Consider moving counters to GovernanceRunHooks or keying by run-id.

4. max_handoffs not propagated to GovernancePolicy
The constructor accepts max_handoffs but doesn't pass it to the policy dataclass, unlike max_tool_calls. Policy-based init will ignore handoff limits.

5. Budget check operator inconsistency
on_tool_start uses > while BaseIntegration.pre_execute uses >= for the same semantic check. These can disagree on whether the Nth call is allowed.
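Warnings 3 to 5 share one fix shape:

  • keep counters on a per-run object, with a fresh one per as_hooks() call;
  • pass every limit through the policy object;
  • route all budget checks through one comparison.

The sketch below illustrates that shape. PolicyLimits, RunState, KernelSketch, and budget_exhausted are hypothetical names. It assumes the convention of rejecting a call when the calls already made are >= the limit, which matches the >= test the review attributes to pre_execute.

```python
import asyncio
from dataclasses import dataclass


class PolicyViolationError(Exception):
    """Raised when a run exceeds a governance limit (illustrative)."""


@dataclass(frozen=True)
class PolicyLimits:
    """Hypothetical policy carrying both limits, so neither is dropped."""

    max_tool_calls: int = 10
    max_handoffs: int = 5


def budget_exhausted(used: int, limit: int) -> bool:
    """Single shared check; used counts calls already made in this run."""
    return used >= limit


class RunState:
    """Per-run counters; one instance per Runner.run() call, never shared."""

    def __init__(self, limits: PolicyLimits) -> None:
        self.limits = limits
        self.tool_calls = 0
        self.handoffs = 0

    def start_tool(self) -> None:
        if budget_exhausted(self.tool_calls, self.limits.max_tool_calls):
            raise PolicyViolationError("tool-call budget exhausted")
        self.tool_calls += 1

    def start_handoff(self) -> None:
        if budget_exhausted(self.handoffs, self.limits.max_handoffs):
            raise PolicyViolationError("handoff budget exhausted")
        self.handoffs += 1


class KernelSketch:
    """Holds configuration only; mutable run state lives in RunState."""

    def __init__(self, max_tool_calls: int = 10, max_handoffs: int = 5) -> None:
        # Both limits flow into the policy object (cf. warning 4)
        self.policy = PolicyLimits(max_tool_calls=max_tool_calls, max_handoffs=max_handoffs)

    def as_hooks(self) -> RunState:
        return RunState(self.policy)  # fresh counters for every run


async def fake_run(kernel: KernelSketch, tool_calls: int) -> int:
    """Simulate one run making tool_calls calls, yielding between them."""
    state = kernel.as_hooks()
    for _ in range(tool_calls):
        state.start_tool()
        await asyncio.sleep(0)  # let other concurrent runs interleave
    return state.tool_calls


async def main() -> None:
    kernel = KernelSketch(max_tool_calls=3, max_handoffs=1)
    # Two concurrent runs each use their full budget without interfering
    print(await asyncio.gather(fake_run(kernel, 3), fake_run(kernel, 3)))  # [3, 3]


asyncio.run(main())
```

With counters on a shared kernel instead, the second concurrent run above would exhaust the first run's budget.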

Nits

  • ExecutionContext removed from __all__ without a re-export deprecation cycle
  • as_hooks() should raise ImportError when the SDK isn't installed (match the ADK adapter pattern)

The fail-open issues (#1, #2) are the blockers. The rest can be addressed in follow-ups if needed.
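For the as_hooks() nit, a common pattern is to probe for the optional dependency with importlib.util.find_spec and raise an ImportError with an install hint. The module name agents and the package name openai-agents are the OpenAI Agents SDK's import name and PyPI name. The require_module helper, KernelSketch, and the message wording are hypothetical, not the ADK adapter's actual code.

```python
import importlib.util


def require_module(module_name: str, pip_name: str) -> None:
    """Raise ImportError with an install hint if module_name is missing.

    Args:
        module_name: Importable top-level module, e.g. "agents".
        pip_name: Distribution to suggest, e.g. "openai-agents".

    Raises:
        ImportError: If the module cannot be found.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name!r} is not installed; run `pip install {pip_name}` to use as_hooks()"
        )


class KernelSketch:
    """Hypothetical kernel whose hooks factory fails loudly without the SDK."""

    sdk_module = "agents"

    def as_hooks(self) -> object:
        require_module(self.sdk_module, "openai-agents")
        # The real adapter would construct and return GovernanceRunHooks here
        return object()
```

Failing inside as_hooks() rather than at import time keeps the module importable for users who only need the other integrations.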

@imran-siddique (Member) left a comment

Updated review (condensed from above):

TL;DR: 2 blockers (fail-open security gaps). Fix those and this ships.

| # | Sev | Issue | Where |
|---|---|---|---|
| 1 | Block | on_agent_start logs but doesn't raise on blocked content when require_human_approval=False | on_agent_start |
| 2 | Block | on_tool_end/on_agent_end warn on blocked output but never raise or call on_violation() | on_tool_end, on_agent_end |
| 3 | Warn | Shared mutable counters across concurrent Runner.run() calls | __init__ |
| 4 | Warn | max_handoffs not passed to GovernancePolicy, unlike max_tool_calls | __init__ |
| 5 | Warn | Budget check uses > here vs >= in BaseIntegration.pre_execute | on_tool_start |

#1: Always raise PolicyViolationError on blocked patterns. require_human_approval should gate the approval queue, not enforcement.

#2: Match on_tool_start behavior: raise + call on_violation() on blocked output.

Warnings are nice-to-haves, can be follow-up PRs.

@imran-siddique (Member) left a comment

Approving native hooks migration.

imran-siddique merged commit c07d6c9 into microsoft:main on Apr 30, 2026
13 of 14 checks passed
imran-siddique pushed a commit to imran-siddique/agent-governance-toolkit that referenced this pull request May 4, 2026
…microsoft#1582)


Co-authored-by: Nishar <you@example.com>

Labels

size/XL (Extra large PR, 500+ lines), tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenAI Agents SDK adapter: use native RunHooks instead of wrap() workaround

2 participants