All notable changes to Shimmy will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.1.0 - 2025-09-02
- Initial release of Shimmy - The 5MB alternative to Ollama
- Core inference engine with llama.cpp backend integration
- Full OpenAI API compatibility:
POST /v1/chat/completions- OpenAI-compatible chat endpointGET /v1/models- List available models
- Native Shimmy API:
POST /api/generate- JSON generation with optional SSE streamingGET /ws/generate- WebSocket streaming generationGET /health- Health check endpointGET /api/models- Native model listing
- CLI commands:
shimmy serve- Start the inference servershimmy list- List available modelsshimmy discover- Discover models in filesystemshimmy generate- Command-line text generationshimmy probe- Test model loading
- Model format support:
- GGUF models via llama.cpp integration
- SafeTensors detection and guidance
- Auto-discovery from filesystem
- Template system:
- ChatML template support
- Llama3 template support
- OpenChat template support
- Cross-platform support:
- Linux (x86_64, ARM64)
- Windows (x86_64)
- macOS (x86_64, ARM64)
- Performance optimizations:
- 5.1MB single binary size
- <100ms startup time
- <50MB memory overhead
- Release build with LTO and size optimization
- Integration guides:
- VSCode Copilot configuration
- Continue.dev setup
- Cursor IDE integration
- Generic OpenAI API client configuration
- Package distribution:
- GitHub Releases (direct binary downloads)
- crates.io (Rust package manager)
- npm (Node.js wrapper package)
- Docker Hub (container images)
- PyPI (Python wrapper package)
- Development infrastructure:
- Comprehensive test suite (27 unit tests + 4 integration tests)
- GitHub Actions CI/CD pipeline
- Cross-platform build automation
- Multi-package-manager release automation
- Documentation:
- Complete API documentation
- Quick start guide (30-second setup)
- Integration examples
- Performance benchmarks
- Architecture documentation
- Language: Rust 2021 edition
- Dependencies: tokio, axum, llama-cpp-2, serde, clap
- Features: Optional
llamafeature for actual inference - License: MIT (free forever)
- Minimum supported Rust version: 1.70+
- Binary size: 5.1MB (vs Ollama's 680MB)
- Startup time: <100ms (vs Ollama's 5-10s)
- Memory usage: <50MB baseline (vs Ollama's 200MB+)
- API compatibility: 100% OpenAI compatibility (vs Ollama's partial)
Shimmy is committed to being free forever with no asterisks, no "free for now" periods, and no pivot to paid services. The MIT license ensures this commitment is legally binding.