Papermerge

Papermerge

Open-source document management system for scanned documents

2.9kstars
303forks
Last commit: 3mo ago
Repo age: 6y old
Papermerge screenshot

Papermerge is a web-based document management system focused on scanned documents and digital archives. It extracts text via OCR, indexes documents for full-text search, and provides a desktop-like web UI for organizing and managing document collections.

Key Features

  • OCR processing of scanned PDFs and images (uses open-source OCR tooling to extract searchable text).
  • Full-text search with support for multiple search backends and indexing options.
  • OpenAPI-compliant REST API for automation and integrations.
  • Document versioning so original and processed versions (for example OCRed versions) are retained.
  • Categories, tags and user-defined custom fields (metadata) per document type for structured organization.
  • Page management: reorder, rotate, cut, move or extract individual pages within documents.
  • Multi-user access, group ownership and share controls for documents and folders.
  • Modern, responsive frontend with dual-panel browsing, drag-and-drop and internationalization.

Use Cases

  • Long-term archival of scanned documents for small-to-medium organizations and personal archives.
  • Processing receipts, invoices and administrative paperwork with metadata and searchable OCR text.
  • Managing contract and record versioning with searchable history and page-level edits.

Limitations and Considerations

  • Robust full-text search typically requires deploying an external search backend (e.g., Elasticsearch, Solr, Xapian) for large archives; bundled minimal setups may omit advanced search.
  • OCR and indexing are resource-intensive at scale and commonly run in background workers; production deployments should provision worker processes and sufficient CPU/RAM.
  • The public demo instance is intentionally limited (for example, OCR and full-text search may be disabled) and is reset periodically, so it is useful only for exploring the UI and basic flows.

Papermerge is a focused solution for turning scanned documents into searchable, organized archives with metadata and version control. It exposes a programmable API and can be integrated into automated ingestion pipelines for document-centric workflows.

Categories:

Tags:

Tech Stack:

Share:

Similar Services

Stirling PDF

Stirling PDF

Self-hosted PDF editing, conversion, OCR, and automation platform

74.6k
6.3k
Last commit: 7h ago

Open-source PDF platform to edit, convert, OCR, sign, redact, and automate PDF workflows via a web UI and REST API.

Alternative to:
Adobe Acrobat
Adobe Acrobat
+19
Paperless-ngx

Paperless-ngx

Document management system with OCR, search, and automated filing

36.9k
2.3k
Last commit: 1d ago

Paperless-ngx is an open-source document management system that ingests scans and files, runs OCR, and turns them into a searchable, taggable document archive.

Alternative to:
DocuWare
DocuWare
+6
Reactive Resume

Reactive Resume

Privacy-focused, open-source resume builder

35.4k
3.9k
Last commit: 1d ago

Open-source resume builder for creating, customizing, exporting and publishing resumes with templates, PDF export, public sharing and optional OpenAI assistance.

Alternative to:
Resume.io
Resume.io
+5
CyberChef

CyberChef

Browser-based toolkit for data decoding, encoding and analysis

34.1k
3.9k
Last commit: 1d ago

CyberChef is a web-based “cyber” toolkit for encoding/decoding, encryption/decryption, compression, hashing, parsing, and data transformation using drag-and-drop recipes.

ArchiveBox

ArchiveBox

Open-source self-hosted web archiving and snapshotting tool

26.9k
1.5k
Last commit: 1d ago

Self-hosted tool to collect and preserve webpages, media, and bookmarks in durable formats (HTML, PDF, WARC, MP4) with a CLI, web UI, and search.

Alternative to:
Internet Archive Wayback Machine
Internet Archive Wayback Machine
+3
ebook2audiobook

ebook2audiobook

Convert eBooks into audiobooks with TTS and optional voice cloning

18.3k
1.5k
Last commit: 5d ago

Self-hostable tool to convert non-DRM eBooks into audiobooks with chapter support, metadata, multilingual TTS engines, and optional voice cloning via a web UI or CLI.

Alternative to:
Speechify
Speechify
+7