Add SQLite cache backend #157

Open
JacksonKaunismaa wants to merge 1 commit into main from sqlite-cache-backend

JacksonKaunismaa (Contributor) commented Feb 28, 2026

Summary

  • Adds SQLiteCacheManager as a new cache backend, selectable via CacheBackend.SQLITE enum
  • Per-model .sqlite files with WAL mode, connection pooling, zstd compression, schema versioning
  • Fixes pre-existing pyright errors in cache_manager.py (nullable responses, redis type annotations)
  • Replaces use_redis: bool with cache_backend: CacheBackend enum on InferenceAPI and BatchInferenceAPI

Motivation

FileBasedCacheManager reloads the entire bin file from disk on every cache miss, even if the bin is already in memory. With accumulated cache (e.g. 543MB across 20 bins from past runs), 10,000 concurrent lookups at a 65% miss rate produce 6,500 misses, each reloading a ~28MB bin. That adds up to ~182GB of JSON parsing serialized on the event loop, which freezes it for 8+ minutes.
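The ~182GB figure can be checked with a back-of-envelope calculation. This is only a sketch, assuming every miss re-parses exactly one bin of about 28MB (the per-bin size quoted in the benchmark); the variable names are illustrative.

```python
# Back-of-envelope estimate of JSON re-parsing caused by cache misses.
# Assumption: every miss reloads exactly one ~28 MB bin from disk.
lookups = 10_000
miss_rate = 0.65
bin_size_mb = 28

misses = round(lookups * miss_rate)        # number of lookups that miss
reparsed_gb = misses * bin_size_mb / 1000  # MB -> GB (decimal units)

print(f"{misses} misses x {bin_size_mb} MB = {reparsed_gb:.0f} GB parsed")
```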

Benchmark (3,500 entries, 10k lookups, 65% miss rate, 28MB/bin)

| Metric | File-based | SQLite |
|---|---|---|
| 10,000 lookups | 484.9s | 2.5s |
| Event loop blocked | 8 min (frozen) | 2.5s |
| Throughput | 21 lookups/s | 3,985 lookups/s |
| Cache on disk | 559 MB | 3 MB |
| Populate 3,500 entries | 341s | 2.1s |
| Speedup | | 193x |

Usage

from safetytooling.apis import InferenceAPI, CacheBackend

api = InferenceAPI(cache_backend=CacheBackend.SQLITE)

Values: CacheBackend.FILE (default, existing behavior), CacheBackend.SQLITE, CacheBackend.REDIS.
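For callers migrating from the old boolean flag, the mapping onto the enum might look roughly like the sketch below. The member names mirror the values listed above. The string values and the `resolve_backend` helper are hypothetical, written only to illustrate the idea, and are not code from this PR.

```python
from enum import Enum
from typing import Optional


class CacheBackend(Enum):
    # Member names come from the PR description; the string values are assumed.
    FILE = "file"
    SQLITE = "sqlite"
    REDIS = "redis"


def resolve_backend(
    cache_backend: Optional[CacheBackend] = None,
    use_redis: Optional[bool] = None,
) -> CacheBackend:
    """Hypothetical helper: map the old boolean flag onto the new enum."""
    if cache_backend is not None:
        return cache_backend  # an explicit enum value always wins
    if use_redis:
        return CacheBackend.REDIS  # old use_redis=True behavior
    return CacheBackend.FILE  # FILE is the documented default
```

An enum leaves room for a third backend without adding a second boolean, and it rules out contradictory combinations such as two flags both set to true.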

Test plan

  • 21 new tests covering save/load, batch, compression, schema versioning, stats, moderation, embeddings, WAL mode
  • Existing 7 FileBasedCacheManager tests still pass
  • Stress-tested with realistic entry sizes (28MB/bin, 10k concurrent lookups)
  • Run existing test_api_cache.py integration tests with CacheBackend.SQLITE

JacksonKaunismaa force-pushed the sqlite-cache-backend branch 6 times, most recently from 372c765 to 9de867e on March 1, 2026 01:15
Replace JSON bin-file approach with per-model SQLite databases.
Activated via SQLITE_CACHE=true env var or use_sqlite=True in get_cache_manager().

Key improvements:
- Indexed lookup by primary key (one O(log n) B-tree search, no loading entire 28MB bin files)
- WAL mode for concurrent readers without blocking
- Connection pooling (reuse across calls)
- zstd compression (~559MB JSON → 3MB SQLite)
- Schema versioning (stale entries = clean cache miss)
- Batch lookups via SQL IN clause
- Built-in hit/miss/cost statistics

Benchmark (3500 entries, 10k lookups with 65% miss rate, 28MB/bin):
  File-based: 484.9s (event loop frozen 8 min)
  SQLite:       2.5s (193x faster)

The pathology: FileBasedCacheManager reloads the ENTIRE bin from disk
on every cache miss (to check if another process wrote the entry).
With 6500 misses × 28MB bins = 182GB of JSON parsing serialized on
the event loop. SQLite misses are a single B-tree lookup returning NULL.

Also fixes pre-existing pyright errors in cache_manager.py (nullable
responses field on LLMCache, redis type annotations).
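
Several of the ideas in the commit message above (WAL mode, primary-key lookup, compression, schema versioning, batch lookup via IN) can be sketched with the standard library alone. This is not the SQLiteCacheManager implementation: the table layout, the function names, and `SCHEMA_VERSION = 1` are assumptions, `zlib` stands in for zstd to keep the example dependency-free, and connection pooling is omitted.

```python
import os
import sqlite3
import tempfile
import zlib

SCHEMA_VERSION = 1  # assumed value; bumping it makes older rows invisible


def open_cache(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # readers do not block the writer
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        " key TEXT PRIMARY KEY,"
        " schema_version INTEGER NOT NULL,"
        " value BLOB NOT NULL)"
    )
    return conn


def save(conn: sqlite3.Connection, key: str, payload: str) -> None:
    blob = zlib.compress(payload.encode("utf-8"))  # zlib stands in for zstd
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
            (key, SCHEMA_VERSION, blob),
        )


def load_many(conn: sqlite3.Connection, keys: list[str]) -> dict[str, str]:
    """Batch lookup via SQL IN; missing or stale keys are simply absent."""
    if not keys:
        return {}
    placeholders = ",".join("?" * len(keys))
    rows = conn.execute(
        "SELECT key, value FROM cache"
        f" WHERE schema_version = ? AND key IN ({placeholders})",
        (SCHEMA_VERSION, *keys),
    ).fetchall()
    return {k: zlib.decompress(v).decode("utf-8") for k, v in rows}


with tempfile.TemporaryDirectory() as tmp:
    conn = open_cache(os.path.join(tmp, "model.sqlite"))
    save(conn, "prompt-hash-1", '{"completion": "hello"}')
    found = load_many(conn, ["prompt-hash-1", "prompt-hash-2"])
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    conn.close()
```

In this sketch a miss costs one index probe that returns no row, rather than a re-parse of a whole bin. Very large key lists would need chunking, since SQLite caps the number of bound parameters per statement.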
JacksonKaunismaa changed the title from "Add SQLite cache backend (193x faster at scale)" to "Add SQLite cache backend" on Mar 1, 2026