Transkriptor

Best Self-Hosted Alternatives to Transkriptor

A curated collection of the 5 best self-hosted alternatives to Transkriptor.

Transkriptor is an AI transcription service that converts audio and video into editable text. It offers speaker recognition, timestamps, in-browser editing, and export options for meetings, interviews, podcasts, and content-creation workflows.

Alternatives List

#1
ebook2audiobook

Self-hostable tool to convert non-DRM eBooks into audiobooks with chapter support, metadata, multilingual TTS engines, and optional voice cloning via a web UI or CLI.

ebook2audiobook is a tool for generating audiobooks from non-DRM, legally acquired eBooks using multiple text-to-speech (TTS) engines. It can run with a Gradio web interface or in headless/CLI mode, and supports multilingual narration with optional voice cloning.

Key Features

  • Converts many input formats including EPUB, MOBI/AZW3, FB2, PDF, DOC/DOCX, HTML, RTF, TXT, and image-based documents
  • OCR support for scanned pages and image-based eBooks
  • Multiple TTS engine options (including XTTSv2 and others) with broad language coverage
  • Optional voice cloning using a provided reference voice file
  • Supports custom XTTSv2 model uploads (e.g., zipped model artifacts)
  • Outputs common audiobook/audio formats including MP3, M4B, M4A, AAC, FLAC, OGG, WAV, and WebM
  • Runs on CPU or accelerators (CUDA and other backends depending on environment)

Use Cases

  • Converting personal eBook libraries into listenable audiobooks with chapters and metadata
  • Producing multilingual narration for accessibility, language learning, or travel
  • Creating custom-voice narration for personal use using voice cloning

Limitations and Considerations

  • Intended for non-DRM, legally acquired eBooks; DRM-protected sources require separate lawful handling
  • OCR quality and document structure (especially EPUB chapter boundaries) can affect chapter splitting and narration results

It is well suited for users who want a local web UI and a batch-capable CLI for audiobook generation, while keeping flexibility in TTS engines, languages, and output formats. GPU acceleration and suitable TTS models can significantly improve throughput and audio quality for larger books.

17k stars
1.4k forks

#2
Speakr

Speakr is a self-hosted web app for recording or uploading audio, transcribing with AI (including diarization), and turning conversations into searchable, shareable notes.

Speakr is a personal, self-hosted web application that turns audio recordings into organized, searchable notes using AI transcription and post-processing. It supports both cloud and self-hosted ASR/LLM backends and is designed for privacy-conscious individuals and teams.

Key Features

  • In-browser recording and audio file upload
  • AI transcription with optional speaker diarization and audio-transcript sync
  • Voice profiles via speaker embeddings when using a compatible WhisperX ASR service
  • Interactive chat and semantic “inquire” mode to query recordings using natural language
  • Tag-based organization with custom prompts, ASR settings, and prompt stacking
  • Sharing and collaboration with granular permissions, groups, and group-scoped tags
  • Retention policies and automatic deletion with tag-based protection
  • REST API v1 with OpenAPI/Swagger UI
  • Single Sign-On via OIDC providers
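
The retention feature listed above (automatic deletion with tag-based protection) can be illustrated with a minimal Python sketch. The `Recording` class, the `expired_recordings` helper, and the `keep` tag are hypothetical names chosen for illustration; they are not Speakr's actual data model or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Recording:
    """Hypothetical stand-in for a stored recording (not Speakr's real model)."""
    title: str
    created_at: datetime
    tags: set[str] = field(default_factory=set)


def expired_recordings(
    recordings: list[Recording],
    now: datetime,
    retention_days: int,
    protected_tags: set[str],
) -> list[Recording]:
    """Return recordings older than the retention window, skipping any
    recording that carries at least one protected tag."""
    cutoff = now - timedelta(days=retention_days)
    return [
        r for r in recordings
        if r.created_at < cutoff and not (r.tags & protected_tags)
    ]
```

With a 30-day window and `{"keep"}` as the protected tag set, an old recording tagged `keep` would be left alone while an untagged recording of the same age would be selected for deletion.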

Use Cases

  • Meeting and standup transcription with searchable summaries and action items
  • Research, interviews, and personal voice notes exported into a knowledge base
  • Team knowledge capture for architecture decisions and client calls with controlled sharing

Limitations and Considerations

  • Some advanced features (voice profiles/embeddings) require a separate WhisperX ASR service and typically a GPU
  • LLM-powered summaries/chat depend on configuring a compatible text model provider

Speakr combines transcription, organization, and collaboration in a single web UI, while keeping data under your control. Its tagging, sharing, and retention features make it suitable for both personal note-taking and team workflows around recorded conversations.

2.7k stars
212 forks

#3
Scriberr

Scriberr is a self-hosted, privacy-focused AI transcription app for audio and video, with speaker diarization, word-level timestamps, summaries, and transcript chat.

Scriberr is an open-source application for transcribing audio and video locally, designed to keep recordings private by avoiding third-party cloud processing. It provides a web-based interface to upload, record, review, and work with transcripts, with optional integration for LLM-powered transcript chat and summaries.

Key Features

  • Local/offline transcription using modern speech-to-text models (including Whisper and newer model options)
  • Speaker diarization to separate and label different speakers
  • Word-level timestamps and transcript playback follow-along with seeking from text
  • Built-in audio recorder plus note-taking/annotation while listening
  • Transcript summarization and “chat with your audio” (supports local LLMs via Ollama and OpenAI-compatible providers)
  • Automation-friendly features such as an API and folder watcher for auto-processing new files
  • PWA support for a more native app-like experience on desktop and mobile

Use Cases

  • Transcribe meetings, interviews, and lectures without uploading sensitive audio to external services
  • Process large batches of recordings automatically via folder watching and API-driven workflows
  • Create searchable, annotated transcripts and generate summaries for personal knowledge capture
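
Folder watching, as mentioned above, usually amounts to detecting files that have appeared since the last check. A minimal polling sketch in Python follows; the `find_new_files` helper and the extension list are assumptions made for illustration and do not reflect how Scriberr implements its watcher.

```python
from pathlib import Path

# Assumed set of audio extensions; a real deployment would configure this.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}


def find_new_files(folder: Path, seen: set[Path]) -> list[Path]:
    """Return audio files in `folder` that were not seen on earlier calls.

    `seen` is updated in place, so calling this repeatedly (for example
    once per polling interval) yields each file only once.
    """
    new_files = sorted(
        p for p in folder.iterdir()
        if p.is_file()
        and p.suffix.lower() in AUDIO_EXTENSIONS
        and p not in seen
    )
    seen.update(new_files)
    return new_files
```

A caller would typically invoke `find_new_files` in a loop with a short sleep between iterations and hand each returned path to the transcription step.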

Limitations and Considerations

  • High-accuracy transcription and diarization can be resource-intensive; GPU acceleration is recommended for best performance
  • Some advanced features (like transcript chat) may require configuring external or local LLM providers

Scriberr is a strong fit for privacy-conscious users who want reliable local transcription with a polished review experience and workflow automation options. It combines transcription, organization, and AI-assisted analysis into a single self-hostable service.

1.9k stars
133 forks

#4
File Wizard

Self-hosted web UI for file conversion, OCR for PDFs/images, and local Whisper-based audio transcription, wrapping common CLI tools with background jobs and history.

File Wizard is a browser-based utility for converting files, running OCR on PDFs/images, and transcribing audio. It provides a simple web UI that orchestrates common command-line tools and local ML models, with job tracking and a persistent history.

Key Features

  • Convert between many document, image, audio, and video formats by wrapping external tools (configurable via a YAML settings file)
  • OCR for PDFs and images using Tesseract and OCRmyPDF, including generating searchable PDFs
  • Audio transcription using local Whisper models (faster-whisper), with subtitle-style outputs supported by Whisper tooling
  • Drag-and-drop web interface with responsive dark UI
  • Background job processing with real-time status updates and stored job history
  • Optional OAuth/OIDC-based access control configuration (can run without auth in local-only mode)
  • Optional CUDA-enabled container image for GPU-accelerated transcription

Use Cases

  • Convert office documents and ebooks into consistent archival formats (PDF, EPUB, DOCX)
  • Turn scanned PDFs into searchable documents with OCR
  • Create transcripts/subtitles from meeting recordings and other audio files

Limitations and Considerations

  • Not safe to expose publicly without strong authentication and isolation; wrapping converters can introduce arbitrary command execution risk if misconfigured
  • Conversion fidelity and supported formats depend on the installed external tools and their build options
  • Transcription performance varies significantly by model size and whether GPU acceleration is available
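
The command-execution risk noted above typically arises when a user-supplied filename is pasted into a shell string. The following Python sketch shows one common mitigation: split the command template into tokens first, then substitute placeholders per token, so a hostile filename stays a single argument. The `build_command` helper, the `{input}`/`{output}` placeholders, and the `convert-tool` name are hypothetical and are not File Wizard's actual configuration format.

```python
import shlex


def build_command(template: str, input_path: str, output_path: str) -> list[str]:
    """Expand a converter template into an argument list (hypothetical helper).

    Only tokens that are exactly a placeholder are substituted, and the
    substitution happens after tokenizing, so spaces or shell
    metacharacters in a filename cannot create extra arguments.
    """
    placeholders = {"{input}": input_path, "{output}": output_path}
    return [placeholders.get(token, token) for token in shlex.split(template)]
```

The resulting list is meant to be passed to `subprocess.run(argv, check=True)` without `shell=True`. This sketch does not address filenames that begin with `-`, which a tool may still interpret as options; resolving paths to absolute form before substitution is one way to handle that.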

File Wizard fits well for homelabs and internal teams that want a single, lightweight web interface to run conversions, OCR workflows, and local speech-to-text processing. Its tool-based architecture makes it extensible, but it should be deployed with careful security controls when used beyond local environments.

777 stars
42 forks

#5
ZipCaptions

ZipCaptions

Open-source PWA that generates live captions and transcripts in the browser; supports broadcasts, OBS/vMix integration, and optional Azure AI captions.

ZipCaptions is a browser-native, open-source application that produces live closed-captions and transcripts from audio sources. It runs as a Progressive Web App and focuses on client-side captioning with optional cloud-backed AI captioning for higher accuracy.

Key Features

  • In-browser real-time speech-to-text captioning (browser engine) without mandatory server processing
  • Optional cloud AI captions using Azure Cognitive Services for improved accuracy (paid feature)
  • PWA installable experience; supports persistent overlay and browser integrations for live streams and broadcasts
  • Streaming/broadcast support with joinable caption streams and direct integration guidance for OBS, vMix, and other production tools
  • Local transcript storage with export options (SRT, VTT, TXT) for use with video or documentation workflows
  • Multiple languages and dialect selection in settings to improve recognition quality

Use Cases

  • Live event accessibility: provide open or closed captions for conferences, worship services, classrooms, and streamed events
  • Broadcast/production workflows: feed live captions into OBS, vMix, or browser-source panels for real-time on-screen titles
  • Post-session captioning: record and export session transcripts in subtitle formats for video publishing and archiving
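
To make the subtitle export mentioned above concrete, here is a minimal Python sketch that turns timed transcript segments into SRT text. The `(start, end, text)` tuple shape and the function names are assumptions for illustration; they are not ZipCaptions' internal format or code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues
    separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

WebVTT output would follow the same pattern with a `WEBVTT` header line and a period instead of a comma before the milliseconds.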

Limitations and Considerations

  • Cloud AI captions require Azure Cognitive Services and are restricted to paying supporters; the browser engine remains the free/default option
  • Browser and OS differences can affect microphone access and caption reliability (known issues documented for specific Chrome versions and some mobile builds)
  • Transcripts are stored locally per device by design; syncing across devices requires manual export/import

ZipCaptions prioritizes accessibility-first, client-side captioning with optional cloud AI for higher accuracy. It is intended for event captioning and production integration where low-cost, privacy-conscious captioning is required.

56 stars
8 forks

Why choose an open-source alternative?

  • Data ownership: Keep your data on your own servers
  • No vendor lock-in: Freedom to switch or modify at any time
  • Cost savings: Reduce or eliminate subscription fees
  • Transparency: Audit the code and know exactly what's running