Aleph
Document and data indexing, entity search, and investigative analysis

Aleph is an investigative data platform for ingesting and indexing large collections of documents and structured datasets, making them searchable and easier to analyze. It is designed to help researchers find people, companies, and connections across many sources, including watchlists and prior research.

Key Features
- Ingests and indexes documents (such as PDF, Word, and HTML) and structured data (such as CSV and spreadsheets)
- Full-text search and browsing across datasets and uploaded materials
- Entity-centric exploration focused on people, companies, and other known entities
- Cross-referencing and matching entities against watchlists and reference datasets
- Operational workflows for managing data imports and collections

Use Cases
- Investigative journalism: search leaks, filings, and datasets for names and relationships
- OSINT research: unify and query diverse sources (documents plus tabular data)
- Compliance or due diligence research: check entities against internal or external lists

Limitations and Considerations
- The open-source version is in a sunsetting phase, with official maintenance planned to end after December 2025

Aleph is well-suited for teams that need to turn large, heterogeneous collections of files and tables into a searchable investigative corpus. Its emphasis on entity discovery and cross-referencing makes it particularly useful for research-driven analysis workflows.
Tech Stack:

Similar Services

Meilisearch
Fast search engine API with full-text, vector, and hybrid search
Meilisearch is a lightning-fast search engine API for apps and websites, offering typo-tolerant full-text search plus vector and AI-ready hybrid retrieval.

ArchiveBox
Open-source self-hosted web archiving and snapshotting tool
Self-hosted tool to collect and preserve webpages, media, and bookmarks in durable formats (HTML, PDF, WARC, MP4) with a CLI, web UI, and search.

Typesense
Fast, typo-tolerant search engine with keyword and vector search
Typesense is a developer-friendly search engine for instant, typo-tolerant search-as-you-type with faceting, filtering, geo search, and vector/semantic search APIs.

SearXNG
Privacy-focused metasearch engine for aggregating web results
SearXNG is a privacy-respecting metasearch engine that aggregates results from many search services without tracking or profiling users.

ZincSearch
Lightweight open-source search engine for full-text indexing
ZincSearch is a Go-based, lightweight search engine for full-text indexing with Elasticsearch API-compatible ingestion, a Vue UI, and a schema-less document model.

Onyx Community Edition
Self-hosted AI chat and enterprise search for any LLM
Open-source platform for AI chat, RAG, agents, and enterprise search across your team’s connected knowledge sources, compatible with hosted and local LLMs.

JavaScript
Docker
TypeScript
Python
Node.js