Paperless-AI
AI extension for Paperless‑ngx providing automated analysis and RAG

Paperless-AI is an AI-powered extension for Paperless‑ngx that automates document classification, metadata extraction and semantic search. It integrates with OpenAI-compatible APIs and local model backends to provide chat-style Q&A over a Paperless‑ngx archive.
Key Features
- Automated document processing: detects new documents in Paperless‑ngx and extracts title, tags, document type, and correspondent.
- Retrieval-Augmented Generation (RAG) chat: semantic search and contextual Q&A across the full document archive.
- Multi-backend model support: works with OpenAI-compatible APIs, Ollama (local models), DeepSeek-R1, Azure, and several other OpenAI-format backends.
- Manual review UI: web interface to manually trigger AI processing, review results, and adjust settings.
- Smart tagging and rule engine: configurable rules to control which documents are processed and what tags are applied.
- Docker-first distribution: official Docker image and docker-compose support for containerized deployment and persistent storage.
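The rule-based filtering mentioned above can be pictured with a small sketch. It is not Paperless-AI's actual rule engine: the `Document` and `Rule` classes, the tag names, and the include/exclude semantics are all hypothetical and only illustrate how a tag rule might decide which documents get sent to the model.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical, simplified view of a document record."""
    id: int
    title: str
    tags: set[str] = field(default_factory=set)


@dataclass
class Rule:
    """Hypothetical rule: accept documents with any `include` tag and no `exclude` tag."""
    include: set[str] = field(default_factory=set)
    exclude: set[str] = field(default_factory=set)

    def matches(self, doc: Document) -> bool:
        # Any excluded tag rejects the document outright.
        if self.exclude & doc.tags:
            return False
        # An empty include set means "no restriction".
        return not self.include or bool(self.include & doc.tags)


def select_for_processing(docs: list[Document], rule: Rule) -> list[Document]:
    """Return only the documents the rule allows, preserving input order."""
    return [d for d in docs if rule.matches(d)]


docs = [
    Document(1, "Electricity bill", {"inbox"}),
    Document(2, "Signed contract", {"inbox", "ai-processed"}),
    Document(3, "Old receipt", set()),
]
rule = Rule(include={"inbox"}, exclude={"ai-processed"})
selected_ids = [d.id for d in select_for_processing(docs, rule)]
print(selected_ids)
```

Marking already-handled documents with an exclusion tag (here the made-up `ai-processed`) is one simple way to avoid sending the same document to the model twice.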
Use Cases
- Quickly find facts across scanned bills, contracts and receipts via natural-language Q&A instead of manual search.
- Automatically tag and classify incoming documents to reduce manual filing and speed up archival workflows.
- Create structured metadata from free-text documents for downstream automation or reporting.
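To make the "structured metadata" use case concrete, here is a minimal sketch of turning a model's JSON reply into a typed record. The four fields mirror the ones listed under Key Features; the JSON shape, the `Metadata` class, and `parse_metadata` are assumptions for illustration, not the format Paperless-AI itself uses.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metadata:
    """Hypothetical structured record built from a model reply."""
    title: str
    tags: list[str]
    document_type: Optional[str]
    correspondent: Optional[str]


def parse_metadata(raw: str) -> Metadata:
    """Parse an assumed JSON reply and normalise its fields."""
    data = json.loads(raw)
    title = str(data.get("title", "")).strip()
    if not title:
        raise ValueError("model response has no title")
    # Trim tags, drop empty ones, and de-duplicate case-insensitively
    # while keeping the first spelling and the original order.
    seen: set[str] = set()
    tags: list[str] = []
    for tag in data.get("tags", []):
        cleaned = str(tag).strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            tags.append(cleaned)
    # Empty strings are treated as "not provided".
    return Metadata(
        title=title,
        tags=tags,
        document_type=data.get("document_type") or None,
        correspondent=data.get("correspondent") or None,
    )


raw_reply = (
    '{"title": " Invoice 2024-03 ", '
    '"tags": ["Invoice", "invoice ", "", "Utilities"], '
    '"document_type": "Invoice", "correspondent": ""}'
)
meta = parse_metadata(raw_reply)
print(meta)
```

Normalising tags at this step is one place where noisy model output can be caught before it reaches the archive.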
Limitations and Considerations
- Quality and consistency of automatic tags and correspondents vary by model and prompt; some users report noisy or incorrect tags that require manual cleanup.
- Resource usage with local model backends (e.g., Ollama) can be heavy; users have reported long-running sessions and elevated GPU/CPU usage, depending on model choice and document volume.
- Processing can halt on model or API errors (for example, exceeded context length or API failures); large archives may need retry logic and monitoring.
- Requires a running Paperless‑ngx instance, valid API credentials, and a configured model backend.
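As a rough illustration of the retry logic mentioned in the list above, the sketch below wraps a model call in exponential backoff. `TransientModelError` and `with_retries` are hypothetical names, not part of Paperless-AI or any client library. Only temporary failures are retried here; a context-length error would fail the same way on every attempt, so it is better logged and skipped than retried.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientModelError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. timeouts)."""


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientModelError:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Usage: a stand-in call that fails twice and then succeeds.
calls = {"n": 0}


def flaky_call() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientModelError("temporary failure")
    return "ok"


delays: list[float] = []
result = with_retries(flaky_call, attempts=3, base_delay=1.0, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting; in production the default `time.sleep` applies.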
Paperless-AI provides an accessible way to add AI-driven classification and semantic search to a Paperless‑ngx archive, with flexible backend choices and a modern web UI. It is best suited for users who want automated tagging and conversational access to large document collections, but it should be configured and monitored to manage resource use and tag quality.
Similar Services

Stirling PDF
Self-hosted PDF editing, conversion, OCR, and automation platform
Open-source PDF platform to edit, convert, OCR, sign, redact, and automate PDF workflows via a web UI and REST API.

Paperless-ngx
Document management system with OCR, search, and automated filing
Paperless-ngx is an open-source document management system that ingests scans and files, runs OCR, and turns them into a searchable, taggable document archive.

Reactive Resume
Privacy-focused, open-source resume builder
Open-source resume builder for creating, customizing, exporting and publishing resumes with templates, PDF export, public sharing and optional OpenAI assistance.

CyberChef
Browser-based toolkit for data decoding, encoding and analysis
CyberChef is a web-based “cyber” toolkit for encoding/decoding, encryption/decryption, compression, hashing, parsing, and data transformation using drag-and-drop recipes.

ArchiveBox
Open-source self-hosted web archiving and snapshotting tool
Self-hosted tool to collect and preserve webpages, media, and bookmarks in durable formats (HTML, PDF, WARC, MP4) with a CLI, web UI, and search.

ebook2audiobook
Convert eBooks into audiobooks with TTS and optional voice cloning
Self-hostable tool to convert non-DRM eBooks into audiobooks with chapter support, metadata, multilingual TTS engines, and optional voice cloning via a web UI or CLI.

Tech Stack: Docker, Python, npm, Node.js