
Papermerge
Open-source document management system for scanned documents

Papermerge is a web-based document management system focused on scanned documents and digital archives. It extracts text via OCR, indexes documents for full-text search, and provides a desktop-like web UI for organizing and managing document collections.
Key Features
- OCR processing of scanned PDFs and images (uses open-source OCR tooling to extract searchable text).
- Full-text search with support for multiple search backends and indexing options.
- OpenAPI-compliant REST API for automation and integrations.
- Document versioning so original and processed versions (for example OCRed versions) are retained.
- Categories, tags and user-defined custom fields (metadata) per document type for structured organization.
- Page management: reorder, rotate, cut, move or extract individual pages within documents.
- Multi-user access, group ownership and share controls for documents and folders.
- Modern, responsive frontend with dual-panel browsing, drag-and-drop and internationalization.
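The versioning and page-management features above can be pictured with a small in-memory model. This is a minimal sketch under stated assumptions, not Papermerge's actual data model or API: the names `Page`, `DocumentVersion`, `Document`, `rotate`, `reorder` and `extract` are hypothetical. Each page edit appends a new version instead of overwriting the previous one, so the original upload stays available.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Page:
    number: int        # page number in the original scan
    rotation: int = 0  # clockwise rotation in degrees: 0, 90, 180 or 270


@dataclass(frozen=True)
class DocumentVersion:
    number: int
    pages: tuple[Page, ...]
    note: str


@dataclass
class Document:
    """Hypothetical model: every page edit appends a version; none is overwritten."""

    title: str
    versions: list[DocumentVersion] = field(default_factory=list)

    @classmethod
    def from_scan(cls, title: str, page_count: int) -> Document:
        pages = tuple(Page(number=n) for n in range(1, page_count + 1))
        return cls(title, [DocumentVersion(1, pages, "original upload")])

    @property
    def latest(self) -> DocumentVersion:
        return self.versions[-1]

    def _commit(self, pages: list[Page], note: str) -> DocumentVersion:
        version = DocumentVersion(self.latest.number + 1, tuple(pages), note)
        self.versions.append(version)
        return version

    def rotate(self, index: int, degrees: int) -> DocumentVersion:
        pages = list(self.latest.pages)
        page = pages[index]
        pages[index] = replace(page, rotation=(page.rotation + degrees) % 360)
        return self._commit(pages, f"rotate page {index + 1} by {degrees} degrees")

    def reorder(self, order: list[int]) -> DocumentVersion:
        # `order` lists 0-based indexes of the latest version's pages in their new order.
        if sorted(order) != list(range(len(self.latest.pages))):
            raise ValueError("order must be a permutation of the page indexes")
        return self._commit([self.latest.pages[i] for i in order], "reorder pages")

    def extract(self, indexes: list[int], title: str) -> Document:
        # In this sketch extraction copies pages into a new document; the source is unchanged.
        pages = tuple(self.latest.pages[i] for i in indexes)
        return Document(title, [DocumentVersion(1, pages, f"extracted from {self.title}")])
```

For example, calling `rotate(1, 90)` and then `reorder([2, 0, 1])` on a three-page scan leaves three versions, with version 1 still holding the unrotated pages in their original order.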
Use Cases
- Long-term archival of scanned documents for small-to-medium organizations and personal archives.
- Processing receipts, invoices and administrative paperwork with metadata and searchable OCR text.
- Managing contract and record versioning with searchable history and page-level edits.
Limitations and Considerations
- Robust full-text search on large archives typically requires deploying a dedicated search backend (e.g., Elasticsearch, Solr or Xapian); minimal bundled setups may omit advanced search.
- OCR and indexing are resource-intensive at scale and commonly run in background workers; production deployments should provision worker processes and sufficient CPU/RAM.
- The public demo instance is intentionally limited (for example, OCR and full-text search may be disabled) and is reset periodically, so it is useful only for exploring the UI and basic flows.
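Running OCR and indexing in background workers follows a common queue-and-worker pattern: the upload path only enqueues a job, and a fixed number of workers drain the queue, which caps how much CPU and RAM the OCR step can consume at once. The following is a hypothetical sketch of that pattern, not Papermerge's worker implementation: `OcrJob`, `OcrWorkerPool` and `fake_ocr` are invented names, `fake_ocr` stands in for a real OCR engine, and threads are used only to keep the example self-contained (CPU-bound OCR in production would normally run in separate worker processes).

```python
from __future__ import annotations

import queue
import threading
from dataclasses import dataclass


@dataclass
class OcrJob:
    document_id: int
    page_count: int


def fake_ocr(job: OcrJob) -> str:
    # Hypothetical stand-in for a real OCR engine, which is the CPU/RAM-heavy step.
    return " ".join(f"text-of-page-{n}" for n in range(1, job.page_count + 1))


class OcrWorkerPool:
    """Fixed number of workers draining a job queue; submit() never waits on OCR."""

    def __init__(self, workers: int) -> None:
        self.jobs: queue.Queue[OcrJob | None] = queue.Queue()
        self.index: dict[int, str] = {}  # document_id -> extracted text (toy "search index")
        self._lock = threading.Lock()
        self._threads = [threading.Thread(target=self._run, daemon=True) for _ in range(workers)]
        for thread in self._threads:
            thread.start()

    def submit(self, job: OcrJob) -> None:
        self.jobs.put(job)  # the upload path only enqueues and returns

    def _run(self) -> None:
        while True:
            job = self.jobs.get()
            if job is None:  # sentinel: stop this worker
                return
            text = fake_ocr(job)
            with self._lock:
                self.index[job.document_id] = text

    def shutdown(self) -> None:
        # One sentinel per worker, queued after all pending jobs, so every job runs first.
        for _ in self._threads:
            self.jobs.put(None)
        for thread in self._threads:
            thread.join()
```

Raising `workers` processes more documents in parallel at the cost of more CPU and memory, which is the same trade-off a production deployment makes when provisioning worker processes.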
Papermerge is a focused solution for turning scanned documents into searchable, organized archives with metadata and version control. It exposes a programmable API and can be integrated into automated ingestion pipelines for document-centric workflows.
Similar Services

Stirling PDF
Self-hosted PDF editing, conversion, OCR, and automation platform
Open-source PDF platform to edit, convert, OCR, sign, redact, and automate PDF workflows via a web UI and REST API.

Paperless-ngx
Document management system with OCR, search, and automated filing
Paperless-ngx is an open-source document management system that ingests scans and files, runs OCR, and turns them into a searchable, taggable document archive.

Reactive Resume
Privacy-focused, open-source resume builder
Open-source resume builder for creating, customizing, exporting and publishing resumes with templates, PDF export, public sharing and optional OpenAI assistance.

CyberChef
Browser-based toolkit for data decoding, encoding and analysis
CyberChef is a web-based “cyber” toolkit for encoding/decoding, encryption/decryption, compression, hashing, parsing, and data transformation using drag-and-drop recipes.

ArchiveBox
Open-source self-hosted web archiving and snapshotting tool
Self-hosted tool to collect and preserve webpages, media, and bookmarks in durable formats (HTML, PDF, WARC, MP4) with a CLI, web UI, and search.

ebook2audiobook
Convert eBooks into audiobooks with TTS and optional voice cloning
Self-hostable tool to convert non-DRM eBooks into audiobooks with chapter support, metadata, multilingual TTS engines, and optional voice cloning via a web UI or CLI.
