Speechify

Best Self-hosted Alternatives to Speechify

A curated collection of the 3 best self-hosted alternatives to Speechify.

Speechify is a cloud text-to-speech platform that converts text, PDFs, web pages, documents, and email into natural-sounding audio using AI voices. It provides browser and mobile clients, configurable reading speeds, voice selection, and accessibility features for listening and learning.

Alternatives List

#1
ebook2audiobook

Self-hostable tool to convert non-DRM eBooks into audiobooks with chapter support, metadata, multilingual TTS engines, and optional voice cloning via a web UI or CLI.

ebook2audiobook is a tool for generating audiobooks from non-DRM, legally acquired eBooks using multiple text-to-speech (TTS) engines. It can run with a Gradio web interface or in headless/CLI mode, and supports multilingual narration with optional voice cloning.

Key Features

  • Converts many input formats including EPUB, MOBI/AZW3, FB2, PDF, DOC/DOCX, HTML, RTF, TXT, and image-based documents
  • OCR support for scanned pages and image-based eBooks
  • Multiple TTS engine options (including XTTSv2 and others) with broad language coverage
  • Optional voice cloning using a provided reference voice file
  • Supports custom XTTSv2 model uploads (e.g., zipped model artifacts)
  • Outputs common audiobook/audio formats including MP3, M4B, M4A, AAC, FLAC, OGG, WAV, and WebM
  • Runs on CPU or accelerators (CUDA and other backends depending on environment)

Use Cases

  • Converting personal eBook libraries into listenable audiobooks with chapters and metadata
  • Producing multilingual narration for accessibility, language learning, or travel
  • Creating custom-voice narration for personal use using voice cloning

Limitations and Considerations

  • Intended for non-DRM, legally acquired eBooks; DRM-protected sources require separate lawful handling
  • OCR quality and document structure (especially EPUB chapter boundaries) can affect chapter splitting and narration results

It is well suited to users who want a local web UI and a batch-capable CLI for audiobook generation while keeping flexibility in TTS engines, languages, and output formats. GPU acceleration and suitable TTS models can significantly improve throughput and audio quality for larger books.

18.3k stars
1.5k forks
#2
Speakr

Speakr is a self-hosted web app for recording or uploading audio, transcribing with AI (including diarization), and turning conversations into searchable, shareable notes.

Speakr is a personal, self-hosted web application that turns audio recordings into organized, searchable notes using AI transcription and post-processing. It supports both cloud and self-hosted ASR/LLM backends and is designed for privacy-conscious individuals and teams.

Key Features

  • In-browser recording and audio file upload
  • AI transcription with optional speaker diarization and audio-transcript sync
  • Voice profiles via speaker embeddings when using a compatible WhisperX ASR service
  • Interactive chat and semantic “inquire” mode to query recordings using natural language
  • Tag-based organization with custom prompts, ASR settings, and prompt stacking
  • Sharing and collaboration with granular permissions, groups, and group-scoped tags
  • Retention policies and automatic deletion with tag-based protection
  • REST API v1 with OpenAPI/Swagger UI
  • Single Sign-On via OIDC providers
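
To illustrate what "retention policies and automatic deletion with tag-based protection" could mean in practice, here is a minimal sketch. It is not Speakr's code: the `Recording` class, the `expired_recordings` function, and the tag names are hypothetical, and the real application may apply its rules differently.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Recording:
    # Hypothetical record shape, used only for this illustration.
    title: str
    created_at: datetime
    tags: set = field(default_factory=set)


def expired_recordings(recordings, retention_days, protected_tags, now):
    """Return recordings older than the retention window that carry no protected tag."""
    cutoff = now - timedelta(days=retention_days)
    return [
        r for r in recordings
        if r.created_at < cutoff and not (r.tags & protected_tags)
    ]
```

Under these assumptions, a recording tagged with any protected tag is never selected for deletion, regardless of its age.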

Use Cases

  • Meeting and standup transcription with searchable summaries and action items
  • Research, interviews, and personal voice notes exported into a knowledge base
  • Team knowledge capture for architecture decisions and client calls with controlled sharing

Limitations and Considerations

  • Some advanced features (voice profiles/embeddings) require a separate WhisperX ASR service and typically a GPU
  • LLM-powered summaries/chat depend on configuring a compatible text model provider

Speakr combines transcription, organization, and collaboration in a single web UI, while keeping data under your control. Its tagging, sharing, and retention features make it suitable for both personal note-taking and team workflows around recorded conversations.

2.8k stars
220 forks
#3
OpenReader WebUI

Next.js web app that reads EPUB, PDF, DOCX, MD and TXT using pluggable TTS providers, offering real-time read-along highlighting, word timestamps, and audiobook export.

OpenReader WebUI is a web application that converts documents into spoken audio using pluggable text-to-speech providers. It supports EPUB, PDF, DOCX, Markdown and plain text files and provides a read-along experience with configurable narration and export options.

Key Features

  • Supports EPUB, PDF, DOCX, MD and TXT document formats with in-page read-along highlighting
  • Multi-provider TTS support (OpenAI-compatible endpoints, DeepInfra, Kokoro/Orpheus FastAPI and other OpenAI-style APIs)
  • Word-by-word timestamps (optional) produced server-side for precise highlighting
  • Sentence-aware narration that joins sentences split across page or chapter boundaries for smoother playback
  • Audiobook export to m4b/mp3 with resumable, chapter-based generation and audio caching
  • Local-first storage using Dexie/IndexedDB with optional server-side /docstore for shared documents
  • Optimized Next.js TTS proxy that requests audio server-side and caches audio for repeat playback
  • Theming and UI customization options with Tailwind-based interface
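
For readers unfamiliar with "OpenAI-compatible" TTS endpoints, the sketch below shows the general shape of such a request. It follows the OpenAI-style `/v1/audio/speech` convention (JSON body with `model`, `input`, `voice`, `response_format`, `speed`). The function name `build_speech_request`, the default model and voice names, and the base URL are assumptions for illustration; valid values depend on the provider you run, and this is not taken from OpenReader WebUI's own code.

```python
import json
import urllib.request


def build_speech_request(base_url, text, voice="alloy", model="tts-1",
                         api_key=None, speed=1.0):
    """Build (but do not send) a POST request for an OpenAI-style speech endpoint."""
    payload = {
        "model": model,            # provider-specific; "tts-1" is only a placeholder
        "input": text,             # the text to be spoken
        "voice": voice,            # provider-specific voice name
        "response_format": "mp3",
        "speed": speed,
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Local servers often need no key; hosted providers usually do.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` and writing the response bytes to a file would yield the audio, assuming a compatible server is listening at the given base URL.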

Use Cases

  • Listen to ebooks and documents hands-free with synchronized read-along highlighting
  • Produce downloadable audiobooks from personal document collections with chapter structure
  • Integrate local or cloud TTS providers for accessible reading workflows and study aids

Limitations and Considerations

  • Requires an accessible TTS API provider or compatible OpenAI-style endpoint; quality and latency depend on the chosen provider
  • Word-level highlighting is optional and requires a separate whisper.cpp binary for timestamp generation
  • DOCX conversion and some exports rely on external tooling (LibreOffice for DOCX, FFmpeg for m4b creation)
  • Performance and parallel processing depend on available server hardware and TTS provider throughput

OpenReader WebUI is focused on flexible, high-quality TTS for documents with strong local-first behavior and configurable provider support. It is best suited for users who can provide or run a compatible TTS API and who need precise read-along and audiobook export features.

279 stars
42 forks

Why choose an open source alternative?

  • Data ownership: Keep your data on your own servers
  • No vendor lock-in: Freedom to switch or modify at any time
  • Cost savings: Reduce or eliminate subscription fees
  • Transparency: Audit the code and know exactly what's running