Optimize trace callback performance (Issue #115)#394
Conversation
Implement performance optimizations for trace callbacks to reduce Python-to-C transition overhead: C-Level Optimizations (_tracers.c): - Add opcode-to-handler caching for single-handler scenarios - Implement streamlined post-op callback processing - Add cache statistics tracking (hits/misses/last_opcode) - Optimize handler table iteration order (reverse for cache locality) - Add inline fast-path check function for future enhancements - Add get_cache_stats() method for debugging Python-Level Optimizations (tracers.py): - Add fast-path early exit for primitive types (int, str, etc.) - Optimize exception handling to avoid extra checks - Inline attribute lookups to reduce function call overhead - Add documentation about performance optimizations Header Updates (_tracers.h): - Add cache fields to CTracer struct - Add performance optimization comments Testing: - Add tracers_performance_test.py with unit tests - Add tracers_performance_benchmark.py for profiling Documentation: - Update changelog.rst with performance improvements - Add inline documentation explaining optimizations These changes significantly reduce trace callback overhead, especially for cases where many opcodes are traced but few require actual processing.
Add comprehensive documentation explaining the trace callback performance optimizations implemented for Issue pschanely#115. Includes: - Summary of changes - Technical details - Performance impact analysis - Future work suggestions - Backward compatibility notes
Convert unittest-style tests to pytest-style to match project conventions.
pschanely
left a comment
There was a problem hiding this comment.
I'm not sure these will boost performance and preserve the behaviors that we want. But I don't want to dissuade you (and LLM collaborators!) from trying. A clean test run and benchmark results are enough to make me happy.
I have some initial reactions below that you might find helpful. Honestly, I think naively porting tracers.py (and probably opcode_intercept.py) to C is going to be your best bet. But I don't think that will be trivial to do, either.
| /* PERFORMANCE OPTIMIZATION: Process post-op callbacks more efficiently | ||
| * by reusing the frame reference and batching operations. | ||
| */ |
There was a problem hiding this comment.
Looks like a nice refactor! But I'm not clear on what's now getting reused to make it a performance improvement.
| } | ||
|
|
||
|
|
||
| /* PERFORMANCE OPTIMIZATION: Check cache first for single-handler scenarios */ |
There was a problem hiding this comment.
I am suspicious that this saves us much. I'd be interested to see the benchmark results with this change in isolation.
| /* PERFORMANCE OPTIMIZATION: Iterate handlers in reverse order for | ||
| * better cache locality with recently added modules. |
There was a problem hiding this comment.
I recall that I've tried to do this in reverse order and some things break, but haven't investigated much.
| # PERFORMANCE OPTIMIZATION: Quick check for common untraceable types | ||
| # This avoids expensive attribute lookups for common cases |
There was a problem hiding this comment.
If you're seeing speedups, I'm betting it's from this change. Unfortunately, it won't work - we commonly need to intercept method calls on native instances, for a variety of reasons. (e.g. we need to swap out the implementation for one that can tolerate symbolic arguments)
This PR implements performance optimizations for trace callbacks as described in Issue #115.
Changes
C Extension Optimizations
Python-Level Optimizations
Testing & Documentation
Technical Impact
These optimizations reduce Python-to-C transition overhead by avoiding Python function calls for no-op scenarios and caching handler lookups for repetitive opcode patterns.
Files Changed
crosshair/_tracers.c&_tracers.h- Core C optimizationscrosshair/tracers.py- Python fast-pathcrosshair/tracers_performance_test.py- New testscrosshair/tracers_performance_benchmark.py- New benchmarkPERFORMANCE_OPTIMIZATIONS.md- Technical documentationdoc/source/changelog.rst- Updated changelogCloses #115