
Speaches
OpenAI API-compatible server for speech-to-text and text-to-speech

Speaches is an OpenAI API-compatible server for speech-to-text, translation, and text-to-speech, designed to be a local “model server” for voice workflows. It supports streaming and realtime interactions so applications can transcribe or generate audio with minimal integration changes.
Key Features
- OpenAI API compatibility for integrating with existing OpenAI SDKs and tools (a minimal transcription sketch follows this list)
- Streaming transcription via Server-Sent Events (SSE) for incremental results
- Speech-to-text powered by faster-whisper, with support for transcription and translation
- Text-to-speech using Piper and Kokoro models
- Realtime API support for low-latency voice interactions
- Dynamic model loading and offloading based on request parameters and inactivity
- CPU and GPU execution support
- Deployable with Docker or Docker Compose, and designed to be highly configurable
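Because the endpoints mirror the OpenAI audio API, existing OpenAI SDKs can usually be pointed at a local instance with only a base-URL change. The sketch below uses the official OpenAI Python SDK; the base URL (http://localhost:8000/v1), the placeholder API key, and the model id (Systran/faster-whisper-small) are assumptions about a typical local deployment, not fixed values.

```python
# Minimal sketch: transcribing a local audio file against a self-hosted,
# OpenAI-compatible speech server. Base URL, API key, and model id are
# assumptions about a typical local deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local server endpoint
    api_key="not-needed-for-local-use",   # placeholder; local servers often ignore it
)

with open("meeting.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="Systran/faster-whisper-small",  # assumed faster-whisper model id
        file=audio_file,
    )

print(transcript.text)
```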
Use Cases
- Replace hosted speech APIs with a self-managed, OpenAI-compatible voice backend
- Build realtime voice assistants that need streaming STT and fast TTS responses (a TTS sketch follows this list)
- Batch transcription/translation pipelines for recordings with optional sentiment analysis
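On the TTS side, the same SDK can request synthesized speech and stream it to a file. The model id and voice name below (a Kokoro model and voice) are assumptions for illustration; the actual ids depend on which Piper or Kokoro models the server has loaded.

```python
# Minimal sketch: generating speech from text via the OpenAI-compatible
# /v1/audio/speech endpoint of a local server. Model id and voice name are
# assumptions; substitute whatever model the server exposes.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local server endpoint
    api_key="not-needed-for-local-use",
)

with client.audio.speech.with_streaming_response.create(
    model="speaches-ai/Kokoro-82M-v1.0-ONNX",  # assumed Kokoro model id
    voice="af_sky",                            # assumed voice name
    input="Hello from a locally hosted text-to-speech server.",
) as response:
    response.stream_to_file("hello.mp3")
```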
Speaches is a practical choice when you want OpenAI-style endpoints for voice features while retaining control over models and infrastructure. It fits well into existing OpenAI-oriented application stacks while focusing specifically on TTS/STT workloads.
Tech Stack: Docker, Python
Similar Services

Ollama
Run and manage large language models locally with an API
Ollama is a local LLM runtime that lets you pull, run, and customize models, offering a CLI and REST API for chat, generation, and embeddings.
LocalAI
OpenAI-compatible local AI inference server and API
Run LLM, image, and audio models locally with an OpenAI-compatible API, optional GPU acceleration, and a built-in web UI for managing and testing models.

Jina
Cloud-native Python framework for serving multimodal AI services
Open-source Python framework to build, scale, and deploy multimodal AI services and pipelines with gRPC/HTTP/WebSocket support and Kubernetes/Docker integration.
Willow
Open-source, privacy-focused voice assistant platform
Self-hosted voice assistant platform for ESP32 devices with on-device wake-word and command recognition, Home Assistant integration, and an optional inference server for...


Unblink
AI camera monitoring with federated vision workers
Open-source AI camera monitoring that routes camera streams through a relay/node proxy and broadcasts frames to federated AI workers for detections, summaries, and alerts...
withoutBG
Open-source image background removal with local models and hosted API
Open-source background-removal toolkit offering Focus/Snap local models, a Docker web app and Python SDK, plus a Pro API (Inferentia‑accelerated) for production use.
