Talk. It types. Scribe is a speech-to-text CLI and tray app that pipes transcribed text straight into the focused window. It supports local and cloud-based APIs, batch and streaming workflows.
- Records from your mic and transcribes via one of five backends — Vosk (local, streaming), Whisper (local, batch), Whisper FUTO (local, batch — ACFT-tuned for short dictations), OpenAI (cloud, batch or streaming), Groq (cloud, batch).
- Delivers the transcript four ways: paste into the focused window (default), copy to clipboard, print to the terminal, or write to a file.
- Runs as a system tray icon with a single Record button, or as an interactive terminal TUI — same menu in both.
- Hooks into your DE's keyboard shortcuts via
SIGUSR1(toggle recording) andSIGUSR2(cancel). - Cross-platform: tested on Ubuntu (X11 and Wayland), macOS, Windows; works under Termux for clipboard / terminal output.
sudo apt-get install portaudio19-dev xclip # Ubuntu; macOS: brew install portaudio
pip install scribe-cli[all]
export GROQ_API_KEY=YOURAPIKEY # or OPENAI_API_KEY, or skip and run localSee documentation below for setting up keyboard input on Ubuntu Wayland.
In a terminal:
scribeThis launches the system tray icon. Press Record, speak, press Stop —
the transcription lands in the focused window. Scribe picks the first
backend whose key / dependency is present, in order groq →
openai → whisper-futo → whisper → vosk, so with GROQ_API_KEY
set the command above is equivalent to:
scribe --backend groq --model whisper-large-v3-turboYou can override the defaults or drop the tray entirely:
scribe --backend openai --model gpt-4o-mini-transcribe # OpenAI sweet spot
scribe --backend openai --model gpt-realtime-whisper # OpenAI streaming
scribe --backend whisper --model small # local, no API key
scribe --frontend terminal # interactive TUI menu
scribe --record # start recording immediately on launch (works in tray or terminal)
scribe --record --frontend terminal --mode file # one-shot batched dictation → file
scribe --record --frontend terminal --mode file --stream # streamed: chunks appended live as you speak
scribe --mode clipboard # copy to clipboard, no keystroke
scribe --mode terminal # only print to stdout
scribe --mode file -o transcript.txt # append to a file (no keystroke / clipboard)With --no-interactive (terminal frontend only), scribe skips the
interactive menu and starts recording right away — handy for scripted,
one-shot transcriptions.
Bias the recogniser toward names, jargon, or a domain glossary with
--prompt "free text hint" and --words word1 word2 ... (each also
accepts a --prompt-file / --words-file companion). See
docs/backends.md › Vocabulary biasing
for what each backend does with them.
| Backend | --backend |
Default model | Streaming model(s) | Requires |
|---|---|---|---|---|
| Groq (cloud) | groq |
whisper-large-v3-turbo |
— | GROQ_API_KEY |
| OpenAI (cloud) | openai |
gpt-4o-mini-transcribe |
gpt-realtime-whisper |
OPENAI_API_KEY |
| Whisper FUTO (local) | whisper-futo |
small |
— | pip install scribe-cli[whisper-futo] |
| Whisper (local) | whisper |
small |
— | pip install scribe-cli[whisper] |
| Vosk (local) | vosk |
language-dependent | all Vosk models | pip install scribe-cli[vosk] |
Whether a transcription appears live as you speak or all at once when you stop depends on the model picked — see docs/backends.md.
Groq is the recommended cloud backend by default — extremely fast
(by a wide margin compared to other cloud STT options, especially in
Stream mode where the per-chunk roundtrip latency dominates the
perceived speed), quite accurate, and the free tier is generous
enough for everyday dictation. Sign up at
console.groq.com, create an API key
under Settings → API Keys, and export it as GROQ_API_KEY.
I personally use OpenAI with gpt-4o-mini-transcribe as it is also fast and perhaps more accurate for my accent-tainted English.
- Installation & dependencies — PortAudio, extras, Ubuntu / GNOME tray libs.
- Backends in detail — model lists, when to pick which, the realtime model.
- Output modes & typer backends — keystroke vs
clipboard, Wayland /
eitype,--type-direct. - System tray & global hotkeys — menu tree, icon
states,
SIGUSR1/SIGUSR2. - Desktop entry & autostart (
scribe-install) — GNOME / KDE launcher integration. - Fine tuning & CLI reference — every
scribe --helpflag with examples.
- bard — TTS sibling of scribe,
same tray/CLI architecture in reverse: highlight text, hear it
spoken. Shares the
desktop-ai-corebackbone (frontends, providers, dialog helpers).
Initially developed for Python 3 on Ubuntu 24.04 (GNOME + Wayland);
works on macOS and Windows too. Wayland keystroke injection is
convoluted but solved. For dependencies of
individual subsystems, check pynput (keyboard) and pystray (tray
icon).
