Add SQLite cache backend #157
Open
JacksonKaunismaa wants to merge 1 commit into
Replace JSON bin-file approach with per-model SQLite databases. Activated via SQLITE_CACHE=true env var or use_sqlite=True in get_cache_manager().

Key improvements:
- O(1) lookup by primary key (no loading entire 28MB bin files)
- WAL mode for concurrent readers without blocking
- Connection pooling (reuse across calls)
- zstd compression (~559MB JSON → 3MB SQLite)
- Schema versioning (stale entries = clean cache miss)
- Batch lookups via SQL IN clause
- Built-in hit/miss/cost statistics

Benchmark (3500 entries, 10k lookups with 65% miss rate, 28MB/bin):
File-based: 484.9s (event loop frozen 8 min)
SQLite: 2.5s (193x faster)

The pathology: FileBasedCacheManager reloads the ENTIRE bin from disk on every cache miss (to check if another process wrote the entry). With 6500 misses × 28MB bins = 182GB of JSON parsing serialized on the event loop. SQLite misses are a single B-tree lookup returning NULL.

Also fixes pre-existing pyright errors in cache_manager.py (nullable responses field on LLMCache, redis type annotations).
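As a rough illustration of how several of these pieces can fit together (primary-key lookup, WAL mode, a reused connection, compressed values, schema versioning, batched `IN` lookups, hit/miss counters), here is a minimal sketch. It is not the PR's `SQLiteCacheManager`: the class name `TinySQLiteCache`, the table layout, and `SCHEMA_VERSION` are hypothetical, and the standard library's `zlib` stands in for zstd so that the example has no third-party dependency.

```python
import json
import sqlite3
import zlib

SCHEMA_VERSION = 1  # hypothetical; bump when the stored format changes


class TinySQLiteCache:
    """Illustrative only; not the PR's SQLiteCacheManager."""

    def __init__(self, path):
        # One long-lived connection per database, reused across calls.
        self.conn = sqlite3.connect(path)
        # WAL lets readers proceed while a writer is active.
        # (It only takes effect for file-backed databases, not ":memory:".)
        self.conn.execute("PRAGMA journal_mode=WAL")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, version INTEGER NOT NULL, value BLOB NOT NULL)"
        )
        self.conn.commit()
        self.hits = 0
        self.misses = 0

    def put(self, key, obj):
        # zlib is a stand-in for zstd here.
        blob = zlib.compress(json.dumps(obj).encode("utf-8"))
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO cache (key, version, value) VALUES (?, ?, ?)",
                (key, SCHEMA_VERSION, blob),
            )

    def get(self, key):
        # A row written under another schema version simply reads as a miss.
        row = self.conn.execute(
            "SELECT value FROM cache WHERE key = ? AND version = ?",
            (key, SCHEMA_VERSION),
        ).fetchone()
        if row is None:
            self.misses += 1
            return None
        self.hits += 1
        return json.loads(zlib.decompress(row[0]))

    def get_many(self, keys):
        # Batch lookup: one query with an IN clause instead of one query per key.
        keys = list(keys)
        if not keys:
            return {}
        placeholders = ",".join("?" * len(keys))
        rows = self.conn.execute(
            f"SELECT key, value FROM cache WHERE version = ? AND key IN ({placeholders})",
            (SCHEMA_VERSION, *keys),
        ).fetchall()
        return {k: json.loads(zlib.decompress(v)) for k, v in rows}
```

A miss here is a single indexed query that returns no row, so nothing is re-read or re-parsed. SQLite caps the number of bound parameters per statement, so a real implementation would need to chunk very large batches.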
Summary
- Adds `SQLiteCacheManager` as a new cache backend, selectable via the `CacheBackend.SQLITE` enum value
- Per-model `.sqlite` files with WAL mode, connection pooling, zstd compression, and schema versioning
- Fixes pre-existing pyright errors in `cache_manager.py` (nullable `responses`, redis type annotations)
- Replaces `use_redis: bool` with a `cache_backend: CacheBackend` enum on `InferenceAPI` and `BatchInferenceAPI`
Motivation
`FileBasedCacheManager` reloads the entire bin file from disk on every cache miss, even if the bin is already in memory. With an accumulated cache (e.g. 543MB across 20 bins from past runs), 10,000 concurrent lookups at a 65% miss rate cause ~182GB of JSON parsing serialized on the event loop, freezing it for 8+ minutes.
Benchmark (3,500 entries, 10k lookups, 65% miss rate, 28MB/bin)
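The ~182GB figure follows directly from the benchmark parameters. As a back-of-the-envelope check (the function name is hypothetical, and 1GB is taken as 1000MB):

```python
def reload_cost(lookups, miss_rate, bin_size_mb):
    """Return (misses, total MB parsed) if every miss reloads one whole bin."""
    misses = round(lookups * miss_rate)
    return misses, misses * bin_size_mb


misses, total_mb = reload_cost(lookups=10_000, miss_rate=0.65, bin_size_mb=28)
print(misses, total_mb / 1000)  # prints: 6500 182.0
```

The cost scales with the number of misses times the bin size, so it grows as the cache accumulates entries, whereas an indexed miss in SQLite touches only a few pages.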
Usage
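A minimal sketch of enum-based backend selection, for illustration only. The member names come from this description; the string values, the `resolve_backend` helper, and the fallback to the `SQLITE_CACHE` environment variable mentioned in the commit message are assumptions, not the PR's actual code.

```python
import os
from enum import Enum


class CacheBackend(Enum):
    # Member names are from the PR description; the string values are assumed.
    FILE = "file"
    SQLITE = "sqlite"
    REDIS = "redis"


def resolve_backend(cache_backend=None, env=None):
    """Hypothetical helper: an explicit argument wins, then SQLITE_CACHE, else FILE."""
    if cache_backend is not None:
        # Accepts either a CacheBackend member or its string value.
        return CacheBackend(cache_backend)
    env = os.environ if env is None else env
    if env.get("SQLITE_CACHE", "").lower() == "true":
        return CacheBackend.SQLITE
    return CacheBackend.FILE


# With the PR applied, the selection would presumably be passed as
# InferenceAPI(cache_backend=CacheBackend.SQLITE); here we only resolve it.
print(resolve_backend(CacheBackend.SQLITE, env={}))
```

An enum keeps the three backends mutually exclusive, which a pair of booleans such as `use_redis` and `use_sqlite` would not.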
Values:
`CacheBackend.FILE` (default, existing behavior), `CacheBackend.SQLITE`, `CacheBackend.REDIS`.
Test plan
- `FileBasedCacheManager` tests still pass
- `test_api_cache.py` integration tests with `CacheBackend.SQLITE`