Docspell

Personal document management system with OCR and metadata suggestions

2.2k stars, 170 forks
Last commit: 14d ago
Repo age: 7 years

Docspell is a personal document management system designed to collect documents from scanners, email, and file uploads, then organize them for fast retrieval. It combines OCR and assisted metadata extraction to reduce manual tagging and improve searchability.

Key Features

  • Document ingestion from multiple sources, including email integration and file uploads
  • OCR processing, when needed, to make scanned documents text-searchable
  • Full-text search with filters based on tags and other metadata
  • Tagging and metadata management, including custom metadata fields
  • Assisted metadata suggestions (for example, correspondents, tags, and dates) using NLP-based extraction
  • REST/HTTP API for automation and external integrations
  • Mobile-friendly single-page web application interface

Use Cases

  • Digitizing and organizing household paperwork (bills, letters, contracts)
  • Centralizing small team or office document archives with searchable metadata
  • Automating document intake from email and scanners into a searchable repository

Limitations and Considerations

  • OCR and document conversion depend on external tools (for example, Tesseract and related converters), which may require additional setup
  • NLP-based auto-suggestions rely on Stanford CoreNLP, which can noticeably increase memory and CPU usage

Docspell is well-suited for individuals and small groups who want an efficient workflow for collecting, tagging, and searching documents. Its API, OCR pipeline, and assisted metadata extraction make it a practical choice for building a lightweight document archive with minimal manual effort.

Similar Services

Stirling PDF

Self-hosted PDF editing, conversion, OCR, and automation platform

74.6k stars, 6.3k forks
Last commit: 7h ago

Open-source PDF platform to edit, convert, OCR, sign, redact, and automate PDF workflows via a web UI and REST API.

Alternative to: Adobe Acrobat (+19 more)

Paperless-ngx

Document management system with OCR, search, and automated filing

36.9k stars, 2.3k forks
Last commit: 1d ago

Paperless-ngx is an open-source document management system that ingests scans and files, runs OCR, and turns them into a searchable, taggable document archive.

Alternative to: DocuWare (+6 more)

Reactive Resume

Privacy-focused, open-source resume builder

35.4k stars, 3.9k forks
Last commit: 1d ago

Open-source resume builder for creating, customizing, exporting and publishing resumes with templates, PDF export, public sharing and optional OpenAI assistance.

Alternative to: Resume.io (+5 more)

CyberChef

Browser-based toolkit for data decoding, encoding and analysis

34.1k stars, 3.9k forks
Last commit: 1d ago

CyberChef is a web-based “cyber” toolkit for encoding/decoding, encryption/decryption, compression, hashing, parsing, and data transformation using drag-and-drop recipes.

ArchiveBox

Open-source self-hosted web archiving and snapshotting tool

26.9k stars, 1.5k forks
Last commit: 1d ago

Self-hosted tool to collect and preserve webpages, media, and bookmarks in durable formats (HTML, PDF, WARC, MP4) with a CLI, web UI, and search.

Alternative to: Internet Archive Wayback Machine (+3 more)

ebook2audiobook

Convert eBooks into audiobooks with TTS and optional voice cloning

18.3k stars, 1.5k forks
Last commit: 5d ago

Self-hostable tool to convert non-DRM eBooks into audiobooks with chapter support, metadata, multilingual TTS engines, and optional voice cloning via a web UI or CLI.

Alternative to: Speechify (+7 more)