Willow
Self-hosted voice assistant platform for ESP32 devices with on-device wake-word and command recognition, Home Assistant integration, and an optional inference server for STT/TTS/LLM.
Willow is an open-source, privacy-focused voice assistant platform designed for low-cost ESP32-S3 hardware. It provides fast on-device wake-word and command recognition and can optionally integrate with a self-hosted inference server for high-quality speech-to-text, TTS, and LLM tasks. (heywillow.io)
Key Features
- On-device wake-word engine and voice-activity detection with configurable wake words and up to hundreds of on-device commands. (heywillow.io)
- Integration with Home Assistant, openHAB and generic REST endpoints for home automation and custom workflows. (heywillow.io)
- Willow Inference Server (WIS) option: a performance-optimized server that supports speech-to-text (Whisper models), TTS, and optional LLM inference over REST, WebRTC, and WebSocket transports. WIS is optimized for CUDA GPUs for low-latency workloads and ships with deployment scripts and Docker Compose support. (github.com)
- Device management and OTA flashing via the Willow Application Server (WAS) with a provided Docker image to simplify onboarding. (heywillow.io)
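Because WIS exposes its speech-to-text over plain HTTP, a client can submit raw audio with nothing more than the standard library. The sketch below builds such a request; the `/api/stt` route, `model` parameter, and host name are illustrative assumptions rather than the documented WIS API, so consult the WIS README for the actual endpoints.

```python
# Minimal sketch of preparing an HTTP request that carries WAV audio to a
# self-hosted Willow Inference Server for transcription. The route and
# query parameter here are hypothetical placeholders.
import urllib.request


def build_stt_request(base_url: str, wav_bytes: bytes,
                      model: str = "medium") -> urllib.request.Request:
    """Construct (but do not send) a POST request with raw WAV audio."""
    url = f"{base_url}/api/stt?model={model}"  # hypothetical WIS route
    return urllib.request.Request(
        url,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


# Sending it would be a one-liner once the URL matches a real deployment:
# with urllib.request.urlopen(build_stt_request(...)) as resp: ...
req = build_stt_request("https://wis.local:19000", b"RIFF....WAVE")
```

Keeping request construction separate from transport makes the client easy to unit-test without a running server.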
Use Cases
- Privacy-first smart-home voice control: local wake-word and command recognition that triggers Home Assistant automations without cloud transcription.
- On-premises speech processing: a self-hosted WIS providing low-latency speech-to-text and TTS for accessibility, transcription, or edge-assistant applications.
- Developer integrations: embed Willow devices into custom REST/WebRTC workflows or use WIS to add LLM-powered assistants to local networks. (github.com)
Limitations and Considerations
- Advanced WIS features (LLM, high-quality TTS) expect a CUDA-capable GPU and NVIDIA drivers; CPU-only setups are supported but significantly slower, and some features may be unavailable. (github.com)
- Primary device target is the ESP32-S3-BOX family; other hardware may require additional porting or tuning. (heywillow.io)
Willow combines a small-footprint device runtime with an optional, high-performance inference server to enable private, low-latency voice assistants and on-premises speech workflows. It is actively developed with documentation, Docker deployment options, and community discussion channels for support. (heywillow.io)


