Voice input (speech-to-text)

The desktop app has a mic button in the composer. Click it to record from your default input device (it turns red while listening), then click again to stop. The audio is encoded to WAV locally and transcribed via an OpenAI-compatible /audio/transcriptions endpoint, and the resulting text is dropped into the input box for you to edit and send.
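To make the "encoded to WAV locally" step concrete, here is a minimal sketch of 16-bit PCM WAV encoding using only the Rust standard library. It is not the app's actual code: the function names `encode_wav_pcm16` and `f32_to_i16` are hypothetical, and it assumes the capture callback hands over interleaved samples that have already been collected into a buffer.

```rust
/// Hypothetical helper: wrap interleaved 16-bit PCM samples in a
/// canonical 44-byte RIFF/WAVE header and return the whole file as bytes.
fn encode_wav_pcm16(samples: &[i16], sample_rate: u32, channels: u16) -> Vec<u8> {
    const BITS_PER_SAMPLE: u16 = 16;
    let block_align = channels * (BITS_PER_SAMPLE / 8); // bytes per frame
    let byte_rate = sample_rate * u32::from(block_align);
    let data_len = (samples.len() * 2) as u32; // 2 bytes per sample

    let mut out = Vec::with_capacity(44 + samples.len() * 2);
    // RIFF chunk descriptor
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes());
    out.extend_from_slice(b"WAVE");
    // "fmt " sub-chunk: 16 bytes, format tag 1 = uncompressed PCM
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes());
    out.extend_from_slice(&1u16.to_le_bytes());
    out.extend_from_slice(&channels.to_le_bytes());
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&BITS_PER_SAMPLE.to_le_bytes());
    // "data" sub-chunk: little-endian samples
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for s in samples {
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}

/// Hypothetical helper: convert a float sample in [-1.0, 1.0] to i16,
/// clamping out-of-range input. Assumes the input device yields f32 samples.
fn f32_to_i16(sample: f32) -> i16 {
    (sample.clamp(-1.0, 1.0) * f32::from(i16::MAX)) as i16
}
```

For example, `encode_wav_pcm16(&pcm, 16_000, 1)` would produce a mono 16 kHz file body; the sample rate and channel count here are illustrative, not values the app is known to use.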

Capture is done natively in the Rust backend with cpal (WebKitGTK, the Linux webview, doesn't support the browser Speech APIs), so it needs the ALSA dev library at build time — see Installation.

Choosing a model

Configure the provider and model under Settings → Voice input. It defaults to the openai provider with gpt-4o-transcribe, OpenAI's current best transcription model, which has a lower word error rate than whisper-1 at the same price.

| Provider | Example models | Notes |
| --- | --- | --- |
| openai | gpt-4o-transcribe, gpt-4o-mini-transcribe, whisper-1 | whisper-1 is the only Whisper id OpenAI hosts (Whisper V2) |
| openrouter | openai/gpt-4o-transcribe, openai/whisper-1, google/chirp-3 | Uses OpenRouter's JSON audio API; needs the fully-qualified slug |
| local | whisper-large-v3, whisper-large-v3-turbo | Point a provider at a whisper.cpp / faster-whisper server for fully offline transcription |

TIP

whisper-large-v3 is not a hosted OpenAI model id — it's open-source weights you self-host. Run it via a local server and use the local provider.

How the request is routed

la-core picks the request shape automatically from the provider's base URL:

  • openrouter.ai → JSON body with base64-encoded audio (input_audio.data).
  • everything else (OpenAI, whisper.cpp, faster-whisper, Groq) → a multipart/form-data upload with a file field.

Both return { "text": ... }.
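The routing rule above can be sketched as a small function over the provider's base URL. This is an illustration under stated assumptions, not la-core's implementation: `RequestShape`, `host_of`, and `request_shape` are hypothetical names, the host is extracted with simple string splitting rather than a URL parser (so bracketed IPv6 hosts are not handled), and matching subdomains of openrouter.ai is an assumption of this sketch.

```rust
/// Hypothetical enum naming the two request shapes described above.
#[derive(Debug, PartialEq, Eq)]
enum RequestShape {
    /// JSON body carrying base64-encoded audio (OpenRouter).
    JsonBase64,
    /// multipart/form-data upload with a `file` field (everything else).
    MultipartForm,
}

/// Hypothetical helper: extract the host from a base URL without a URL crate.
/// Strips the scheme, path, userinfo, and port. Does not handle IPv6 literals.
fn host_of(base_url: &str) -> &str {
    let rest = base_url.split_once("://").map_or(base_url, |(_, r)| r);
    let authority = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()
        .unwrap_or("");
    let host_port = authority.rsplit_once('@').map_or(authority, |(_, h)| h);
    host_port.split(':').next().unwrap_or("")
}

/// Pick the request shape from the provider's base URL.
/// Assumption: only openrouter.ai (and its subdomains) gets the JSON shape.
fn request_shape(base_url: &str) -> RequestShape {
    let host = host_of(base_url).to_ascii_lowercase();
    if host == "openrouter.ai" || host.ends_with(".openrouter.ai") {
        RequestShape::JsonBase64
    } else {
        RequestShape::MultipartForm
    }
}
```

Matching on the parsed host rather than on a substring of the whole URL avoids misrouting a lookalike host such as notopenrouter.ai, or a local server whose path happens to contain "openrouter.ai".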

Released under the MIT License.