
Docspell
Personal document management system with OCR and metadata suggestions

Docspell is a personal document management system designed to collect documents from scanners, email, and file uploads, then organize them for fast retrieval. It combines OCR and assisted metadata extraction to reduce manual tagging and improve searchability.
Key Features
- Document ingestion from multiple sources, including email integration and file uploads
- OCR processing (when needed) to make scanned documents text-searchable
- Full-text search with filters based on tags and other metadata
- Tagging and metadata management, including custom metadata fields
- Assisted metadata suggestions (for example, correspondents, tags, and dates) using NLP-based extraction
- REST/HTTP API for automation and external integrations
- Mobile-friendly single-page web application interface
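As a rough illustration of the API-driven intake mentioned above, the following Python sketch prepares a multipart upload for each document found in a local folder. It is a minimal sketch under stated assumptions, not a reference client: the endpoint path `/api/v1/open/upload/item/<source-id>` is assumed to match the source-upload URL pattern of your Docspell version, and the base URL, source id, folder, and helper names (`upload_url`, `pending_documents`, `build_upload_request`) are hypothetical. Check the API specification of your own instance before relying on it.

```python
from __future__ import annotations

import uuid
from pathlib import Path
from urllib.parse import quote
from urllib.request import Request


def upload_url(base_url: str, source_id: str) -> str:
    """Build the upload URL (ASSUMED endpoint pattern; verify for your version)."""
    return f"{base_url.rstrip('/')}/api/v1/open/upload/item/{quote(source_id, safe='')}"


def pending_documents(inbox: Path, suffixes=(".pdf", ".png", ".jpg")) -> list[Path]:
    """Return files in the inbox folder whose extension suggests a document or scan."""
    return sorted(
        p for p in inbox.iterdir() if p.is_file() and p.suffix.lower() in suffixes
    )


def build_upload_request(url: str, path: Path) -> Request:
    """Prepare (but do not send) a multipart/form-data POST with one 'file' part."""
    boundary = uuid.uuid4().hex
    body = b"".join(
        [
            f"--{boundary}\r\n".encode(),
            (
                'Content-Disposition: form-data; name="file"; '
                f'filename="{path.name}"\r\n'
            ).encode(),
            b"Content-Type: application/octet-stream\r\n\r\n",
            path.read_bytes(),
            f"\r\n--{boundary}--\r\n".encode(),
        ]
    )
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return Request(url, data=body, headers=headers, method="POST")
```

To actually submit a prepared request, pass it to `urllib.request.urlopen`. Building the request separately from sending it makes it easy to inspect or log what would be uploaded before pointing the script at a real instance.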
Use Cases
- Digitizing and organizing household paperwork (bills, letters, contracts)
- Centralizing small team or office document archives with searchable metadata
- Automating document intake from email and scanners into a searchable repository
Limitations and Considerations
- OCR and document conversion depend on external tools (for example Tesseract and related converters) and may require additional setup
- NLP/auto-suggestion capabilities rely on Stanford CoreNLP and can increase resource usage
Docspell is well-suited for individuals and small groups who want an efficient workflow for collecting, tagging, and searching documents. Its API, OCR pipeline, and assisted metadata extraction make it a practical choice for building a lightweight document archive with minimal manual effort.
Similar Services

Stirling PDF
Self-hosted PDF editing, conversion, OCR, and automation platform
Open-source PDF platform to edit, convert, OCR, sign, redact, and automate PDF workflows via a web UI and REST API.

Paperless-ngx
Document management system with OCR, search, and automated filing
Paperless-ngx is an open-source document management system that ingests scans and files, runs OCR, and turns them into a searchable, taggable document archive.

Reactive Resume
Privacy-focused, open-source resume builder
Open-source resume builder for creating, customizing, exporting and publishing resumes with templates, PDF export, public sharing and optional OpenAI assistance.

CyberChef
Browser-based toolkit for data decoding, encoding and analysis
CyberChef is a web-based “cyber” toolkit for encoding/decoding, encryption/decryption, compression, hashing, parsing, and data transformation using drag-and-drop recipes.

ArchiveBox
Open-source self-hosted web archiving and snapshotting tool
Self-hosted tool to collect and preserve webpages, media, and bookmarks in durable formats (HTML, PDF, WARC, MP4) with a CLI, web UI, and search.

ebook2audiobook
Convert eBooks into audiobooks with TTS and optional voice cloning
Self-hostable tool to convert non-DRM eBooks into audiobooks with chapter support, metadata, multilingual TTS engines, and optional voice cloning via a web UI or CLI.




