Aleph

Document and data indexing, entity search, and investigative analysis

2.3k stars
326 forks
Last commit: 29d ago
Repo age: 12 years

Aleph is an investigative data platform for ingesting and indexing large collections of documents and structured datasets, making them searchable and easier to analyze. It is designed to help researchers find people, companies, and connections across many sources, including watchlists and prior research.

Key Features

  • Ingests and indexes documents (such as PDF, Word, and HTML) and structured data (such as CSV and spreadsheets)
  • Full-text search and browsing across datasets and uploaded materials
  • Entity-centric exploration focused on people, companies, and other known entities
  • Cross-referencing and matching entities against watchlists and reference datasets
  • Operational workflows for managing data imports and collections

Use Cases

  • Investigative journalism: search leaks, filings, and datasets for names and relationships
  • OSINT research: unify and query diverse sources (documents plus tabular data)
  • Compliance or due diligence research: check entities against internal or external lists

Limitations and Considerations

  • The open-source version is in a sunsetting phase, with official maintenance planned to end after December 2025

Aleph is well-suited for teams that need to turn large, heterogeneous collections of files and tables into a searchable investigative corpus. Its emphasis on entity discovery and cross-referencing makes it particularly useful for research-driven analysis workflows.

Similar Services

Meilisearch

Fast search engine API with full-text, vector, and hybrid search

55.4k stars
2.3k forks
Last commit: 2d ago

Meilisearch is a lightning-fast search engine API for apps and websites, offering typo-tolerant full-text search plus vector and AI-ready hybrid retrieval.

Alternative to: Algolia (+16 more)

ArchiveBox

Open-source self-hosted web archiving and snapshotting tool

26.4k stars
1.4k forks
Last commit: 11d ago

Self-hosted tool to collect and preserve webpages, media, and bookmarks in durable formats (HTML, PDF, WARC, MP4) with a CLI, web UI, and search.

Alternative to: Internet Archive Wayback Machine (+3 more)

Typesense

Fast, typo-tolerant search engine with keyword and vector search

25k stars
850 forks
Last commit: 2d ago

Typesense is a developer-friendly search engine for instant, typo-tolerant search-as-you-type with faceting, filtering, geo search, and vector/semantic search APIs.

Alternative to: Algolia (+19 more)

SearXNG

Privacy-focused metasearch engine for aggregating web results

24.2k stars
2.4k forks
Last commit: 22h ago

SearXNG is a privacy-respecting metasearch engine that aggregates results from many search services without tracking or profiling users.

Alternative to: Google Search (+6 more)

ZincSearch

A lightweight open-source search engine for full-text indexing.

17.7k stars
762 forks
Last commit: 1mo ago

ZincSearch is a Go-based, lightweight search engine for full-text indexing with Elasticsearch API-compatible ingestion, a Vue UI, and a schema-less document model.

Alternative to: Elastic Cloud (Elasticsearch Service) (+7 more)

Onyx Community Edition

Self-hosted AI chat and enterprise search for any LLM

17.1k stars
2.3k forks
Last commit: 16h ago

Open-source platform for AI chat, RAG, agents, and enterprise search across your team’s connected knowledge sources, compatible with hosted and local LLMs.

Alternative to: Onyx (+19 more)