What is the best free alternative to Glean?

We have 13 open source alternatives to Glean that you can self-host for free.

Can I self-host an alternative to Glean?

Yes! All 13 alternatives listed here can be self-hosted on your own servers, giving you full control over your data and privacy.

Are these Glean alternatives really free?

Yes, all alternatives are open source and free to use. Some may offer paid hosting or premium features, but the core software is always free.

Best Self-hosted Alternatives to Glean

A curated collection of the 13 best self-hosted alternatives to Glean.

Glean is a permissions-aware enterprise AI search and knowledge discovery platform that indexes data across workplace apps (Google Workspace, Microsoft 365, Slack, etc.) to surface documents, answers, and people. It supports conversational queries, relevance ranking, and integrations for enterprise search workflows.

Meilisearch

Meilisearch is a lightning-fast search engine API for apps and websites, offering typo-tolerant full-text search plus vector and AI-ready hybrid retrieval.

Meilisearch is an open source search engine exposed through an API, designed to provide fast, relevant search experiences for websites and applications. It combines traditional full-text search with optional vector-based semantic retrieval to support hybrid search and AI retrieval workflows.

Key Features

REST API for indexing documents and running searches
Search-as-you-type with low-latency results
Typo tolerance and configurable ranking/relevancy tuning
Filtering, faceting, and sorting for building rich search UIs
Geosearch for location-based filtering and ranking
Vector storage and vector search for semantic retrieval and hybrid search
API key-based access control, including tenant tokens for multi-tenancy

Use Cases

Site and application search with instant results and typo tolerance
E-commerce and catalog search with facets, filters, and sorting
AI retrieval and RAG pipelines using hybrid (full-text + vector) search

Limitations and Considerations

Some advanced capabilities (for example sharding and certain snapshot features) are reserved for the Enterprise Edition under a non-open-source license
Telemetry is enabled by default but can be disabled

Meilisearch is well-suited for teams that want a developer-friendly search API that is easy to integrate, performs well out of the box, and can evolve from classic keyword search to modern hybrid AI retrieval as needs grow.
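
As a rough illustration of the REST search workflow described above, the sketch below builds (but does not send) a search request that combines a query, a filter, and facets. The host, the `products` index, the attribute names, and the API key are placeholders assumed for this example; the filtered and faceted attributes would also need to be declared as filterable in the index settings.

```python
import json
import urllib.request


def build_meilisearch_search(host, index_uid, api_key, query,
                             filter_expr=None, facets=None, limit=20):
    """Build a POST request for the /indexes/{index_uid}/search route."""
    body = {"q": query, "limit": limit}
    if filter_expr:
        body["filter"] = filter_expr
    if facets:
        body["facets"] = facets
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/indexes/{index_uid}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_meilisearch_search(
    host="http://localhost:7700",      # assumed local instance
    index_uid="products",              # hypothetical index name
    api_key="PLACEHOLDER_KEY",         # placeholder, not a real key
    query="runing shoes",              # misspelled on purpose (typo tolerance)
    filter_expr="brand = 'acme' AND price < 100",
    facets=["brand", "color"],
)
# To execute against a running server: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the example runnable without a server and makes the payload easy to inspect.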

56.1k stars

2.4k forks

AnythingLLM

AnythingLLM is an all-in-one desktop and Docker app for chatting with documents using RAG, running AI agents, and connecting to local or hosted LLMs and vector databases.

AnythingLLM is a full-stack AI application for building a private ChatGPT-like experience around your own documents and content. It supports local and hosted LLMs, integrates with multiple vector database backends, and organizes content into isolated workspaces for cleaner context management.

Key Features

Retrieval-augmented generation (RAG) to chat with PDFs, DOCX, TXT, CSV, codebases, and more
Workspace-based organization with separated context and optional document sharing
AI agents, including a no-code agent builder and MCP compatibility
Supports local and commercial LLM providers (including Ollama and llama.cpp-compatible models)
Multiple vector database options (default local-first setup, with external backends available)
Multi-user deployment with permissions (Docker deployment)
Embeddable website chat widget (Docker deployment)
Developer API for integrations and automation

Use Cases

Internal knowledge base chat for teams (policies, runbooks, product docs)
Private document Q&A for sensitive datasets and client files
Building agent-assisted workflows that reference curated business content

AnythingLLM is a strong choice when you want a configurable, privacy-conscious AI application that can run locally or on a server, while staying flexible about which LLM and vector database you use.
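
To make the RAG idea concrete, here is a minimal, generic sketch of the retrieval step: rank stored chunks by cosine similarity to a query vector and paste the best matches into a prompt. It is not AnythingLLM's implementation; the three-dimensional vectors are hand-written stand-ins for real embeddings, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, k=2):
    """Return the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble a grounded prompt from the retrieved chunks."""
    context = "\n".join(f"- {c['text']}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy workspace: vectors are made up for illustration, not real embeddings.
workspace = [
    {"text": "VPN access requires a hardware token.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Expense reports are due on the 5th.", "vector": [0.0, 0.2, 0.9]},
    {"text": "Tokens are issued by the IT helpdesk.", "vector": [0.8, 0.3, 0.1]},
]
query_vector = [1.0, 0.2, 0.0]  # assumed embedding of the question below
top = retrieve(query_vector, workspace, k=2)
prompt = build_prompt("How do I get VPN access?", top)
```

In a real deployment the vectors would come from an embedding model and the search would run inside a vector database rather than a Python list.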

55k stars

5.9k forks

Khoj

Self-hostable personal AI 'second brain' for chat, semantic search, custom agents, automations and integration with local or cloud LLMs.

Khoj is an open-source personal AI platform that combines chat, semantic document search, custom agents and scheduled automations. It can run locally or as a cloud-hosted service and integrates with local or remote LLMs to answer questions, generate content and automate research.

Key Features

Multi-client access: web, desktop, Obsidian, Emacs, mobile (PWA) and chat integrations (e.g., WhatsApp).
Model-agnostic LLM support: connect local GGUF models or remote OpenAI-compatible, Anthropic and Google-compatible endpoints; supports on-device and cloud models.
Semantic search and embeddings: document ingestion (PDF, Markdown, Word, org-mode, Notion, images) with vector storage and retrieval for fast, contextual search.
Custom agents and automations: build agents with distinct personas, tools and knowledge bases; schedule research tasks and email newsletters.
Document processing and code tools: built-in extractors, simple code execution sandbox support (local Terrarium or remote sandboxes) and image generation features.
Enterprise & self-hosting options: deploy via Docker or pip, use Postgres with pgvector for embeddings, and configure authentication and domains.

Use Cases

Personal knowledge management: query a private document corpus and get grounded answers across notes, PDFs and files.
Research automation: schedule recurring research queries and receive summarized results by email.
Team/private deployments: host a private assistant for a team with custom agents, model selection and on-premise data control.

Limitations and Considerations

Some optional integrations require extra setup or external services (e.g., code sandboxes, email providers); self-hosting needs correct environment configuration.
A few plugins/integrations may be unmaintained or platform-specific; users should check the chosen connectors and follow the docs for compatibility and maintenance status.

Khoj is designed to be extensible and model-agnostic, emphasizing private data control and flexible deployment. It is suited for individuals and teams who need a searchable, automatable assistant that can run with local or cloud language models.

32.6k stars

2k forks

Typesense

Typesense is a developer-friendly search engine for instant, typo-tolerant search-as-you-type with faceting, filtering, geo search, and vector/semantic search APIs.

Typesense is an open source search engine designed for low-latency, “search-as-you-type” experiences. It focuses on developer-friendly operations and an easy-to-use API, while supporting both traditional full-text search and modern vector-based retrieval.

Key Features

Typo-tolerant fuzzy search optimized for instant results
Search-as-you-type autocomplete and relevance tuning at query time
Faceting, filtering, grouping/distinct, and dynamic sorting
Geo search for location-based queries
Synonyms and pinning/merchandising controls for curated results
Vector and semantic search, including hybrid retrieval patterns
Scoped API keys and multi-tenant access patterns
High-availability options via replication

Use Cases

Site and in-app search for documentation, content, and product catalogs
E-commerce discovery with facets, filtering, sorting, and pinned results
Semantic search and hybrid keyword+vector retrieval for knowledge bases

Typesense is well-suited for teams that want a streamlined search stack with strong defaults, low operational complexity, and an HTTP API that integrates easily into modern applications.
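
For a sense of what the HTTP API looks like, the sketch below assembles a search request with filtering and faceting without sending it. The host, the `products` collection, the field names, and the API key are assumptions made for this example.

```python
import urllib.request
from urllib.parse import urlencode


def build_typesense_search(host, collection, api_key, q, query_by,
                           filter_by=None, facet_by=None):
    """Build a GET request for /collections/{collection}/documents/search."""
    params = {"q": q, "query_by": query_by}
    if filter_by:
        params["filter_by"] = filter_by
    if facet_by:
        params["facet_by"] = facet_by
    url = (
        f"{host.rstrip('/')}/collections/{collection}"
        f"/documents/search?{urlencode(params)}"
    )
    # The API key travels in the X-TYPESENSE-API-KEY header.
    return urllib.request.Request(url, headers={"X-TYPESENSE-API-KEY": api_key})


search_request = build_typesense_search(
    host="http://localhost:8108",      # assumed local instance
    collection="products",             # hypothetical collection
    api_key="PLACEHOLDER_KEY",         # placeholder, not a real key
    q="hoodie",
    query_by="name,description",       # hypothetical fields
    filter_by="price:<100",
    facet_by="brand",
)
# To execute against a running server: urllib.request.urlopen(search_request)
```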

25.3k stars

861 forks

Onyx Community Edition

Open-source platform for AI chat, RAG, agents, and enterprise search across your team’s connected knowledge sources, compatible with hosted and local LLMs.

Onyx Community Edition is an open-source, self-hostable AI platform that combines a team chat UI with enterprise search and retrieval-augmented generation (RAG). It is designed to work with a wide range of LLM providers as well as locally hosted models, including deployments in airgapped environments.

Key Features

AI chat interface designed to work with multiple LLM providers and self-hosted LLMs
RAG with hybrid retrieval and contextual grounding over ingested and uploaded content
Connectors to many external knowledge sources with metadata ingestion
Custom agents with configurable instructions, knowledge, and actions
Web search integration and deep-research style multi-step querying
Collaboration features such as chat sharing, feedback collection, and user management
Enterprise-oriented access controls including RBAC and support for SSO (depending on configuration)

Use Cases

Company-wide AI assistant grounded in internal documents and connected tools
Knowledge discovery and enterprise search across large document collections
Building task-focused AI agents that can retrieve context and trigger actions

Limitations and Considerations

Some advanced organization-focused capabilities may differ between Community and Enterprise editions
Retrieval quality and permissions mirroring depend on connector availability and configuration

Onyx CE is a strong fit for teams that want an extensible, transparent AI assistant and search layer over internal knowledge. It emphasizes configurable retrieval, integrations, and deployability across diverse infrastructure setups.
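
One common way to combine keyword and vector results in a hybrid retrieval setup is reciprocal rank fusion (RRF). The sketch below shows that generic technique only; it is not taken from Onyx, whose actual ranking logic may differ, and the document IDs are invented for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword index and a vector index.
keyword_hits = ["hr-policy", "vpn-guide", "onboarding"]
vector_hits = ["vpn-guide", "security-faq", "hr-policy"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists rise to the top, which is the behavior hybrid search aims for: exact keyword matches and semantically similar passages reinforce each other.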

17.6k stars

2.4k forks

Paperless-AI

Extension for Paperless‑ngx that uses OpenAI-compatible backends and Ollama to auto-classify, tag, index, and enable RAG-powered document chat and semantic search.

Paperless-AI is an AI-powered extension for Paperless‑ngx that automates document classification, metadata extraction and semantic search. It integrates with OpenAI-compatible APIs and local model backends to provide chat-style Q&A over a Paperless‑ngx archive.

Key Features

Automated document processing: detects new documents in Paperless‑ngx and extracts title, tags, document type, and correspondent.
Retrieval-Augmented Generation (RAG) chat: semantic search and contextual Q&A across the full document archive.
Multi-backend model support: works with OpenAI-compatible APIs, Ollama (local models), DeepSeek-r1, Azure and several other OpenAI-format backends.
Manual review UI: web interface to manually trigger AI processing, review results, and adjust settings.
Smart tagging and rule engine: configurable rules to control which documents are processed and what tags are applied.
Docker-first distribution: official Docker image and docker-compose support for containerized deployment and persistent storage.

Use Cases

Quickly find facts across scanned bills, contracts and receipts via natural-language Q&A instead of manual search.
Automatically tag and classify incoming documents to reduce manual filing and speed up archival workflows.
Create structured metadata from free-text documents for downstream automation or reporting.

Limitations and Considerations

Quality and consistency of automatic tags and correspondents varies by model and prompt; some users report noisy or incorrect tags that require manual cleanup.
Resource behavior with local model backends (e.g., Ollama) can be heavy; users have reported long-running sessions or elevated GPU/CPU usage depending on model choice and volume.
Processing can halt on model/API errors (for example, context-length or API failures); robust retry/monitoring may be required in large archives.
Requires a running Paperless‑ngx instance and appropriate API credentials and model/back-end configuration to operate.

Paperless-AI provides an accessible way to add AI-driven classification and semantic search to a Paperless‑ngx archive, with flexible backend choices and a modern web UI. It is best suited for users who want automated tagging and conversational access to large document collections but should be configured and monitored to manage resource use and tag quality.

5.3k stars

259 forks

YaCy

YaCy is a self-hostable search engine with crawler and indexing, supporting decentralized P2P search, standalone search portals, and intranet/file search.

YaCy is a self-hosted search engine stack combining a web crawler, an index, and a web UI for searching and managing content. It can run as a standalone search portal, an intranet search appliance, or as part of a decentralized peer-to-peer network that exchanges index data for web search.

Key Features

Built-in web crawler with scheduling to keep indexes fresh
Search UI plus administration interface for configuring crawls, indexes, and peers
Peer-to-peer mode for sharing index data without relying on a central operator
Standalone mode for private, local-only search results from your own index
Intranet search with network scanning to discover HTTP, FTP, and SMB servers
HTTP-based interfaces with XML/JSON outputs for many pages and functions

Use Cases

Run a private search portal for a curated set of websites you crawl
Provide intranet search across internal web services and shared resources
Participate in a community-operated decentralized web search network

Limitations and Considerations

Precompiled packages may be released less frequently; building from source is commonly recommended
Requires Java (11+) and can be resource-intensive depending on crawl and index size

YaCy is suited to organizations and individuals who want control over crawling and indexing, and who prefer privacy-aware search without dependence on a centralized search provider. Its flexible modes make it useful both for private indexing and for distributed web search participation.

3.8k stars

476 forks

Aleph

Aleph indexes documents and structured datasets to enable fast search, entity extraction, and cross-referencing for investigative research and OSINT workflows.

Aleph is an investigative data platform for ingesting and indexing large collections of documents and structured datasets, making them searchable and easier to analyze. It is designed to help researchers find people, companies, and connections across many sources, including watchlists and prior research.

Key Features

Ingests and indexes documents (such as PDF, Word, and HTML) and structured data (such as CSV and spreadsheets)
Full-text search and browsing across datasets and uploaded materials
Entity-centric exploration focused on people, companies, and other known entities
Cross-referencing and matching entities against watchlists and reference datasets
Supports operational workflows for managing data imports and collections

Use Cases

Investigative journalism: search leaks, filings, and datasets for names and relationships
OSINT research: unify and query diverse sources (documents plus tabular data)
Compliance or due diligence research: check entities against internal or external lists

Limitations and Considerations

The open-source version is in a sunsetting phase, with official maintenance planned to end after December 2025

Aleph is well-suited for teams that need to turn large, heterogeneous collections of files and tables into a searchable investigative corpus. Its emphasis on entity discovery and cross-referencing makes it particularly useful for research-driven analysis workflows.

2.3k stars

332 forks

Diskover

Diskover indexes file systems with Elasticsearch to provide fast file search, metadata analytics, and storage visibility across on-prem, NAS, and cloud storage.

Diskover is a data management and analytics platform for unstructured file data that crawls storage, enriches file metadata, and indexes it for fast search and reporting. It is designed to help teams understand what they have, where it lives, and how storage is being used.

Key Features

Crawls and indexes heterogeneous storage (local file systems, NFS/SMB shares, and other supported sources)
Elasticsearch-backed indexing for fast file search and filtering
Storage usage analytics to identify cold data, growth trends, and large consumers
Duplicate file discovery and wasted-space analysis
Extensible metadata enrichment via plugins
Web UI for search, reporting, and operational visibility

Use Cases

Storage capacity planning and cost optimization by finding cold/unused or duplicate data
Rapid file discovery and investigation across large shares and mixed storage
Data hygiene initiatives such as organizing, tagging, and preparing curated datasets for analytics

Limitations and Considerations

Requires running and maintaining an Elasticsearch cluster for indexing and search
Crawling very large environments may require tuning and scheduling to manage resource usage

Diskover fits organizations and advanced homelabs that need centralized visibility into file data sprawl and want searchable metadata at scale. It pairs a crawler/indexer with a web interface to turn unstructured storage into actionable insights for cleanup, governance, and operations.
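
Duplicate file discovery is often done by grouping files by size and then by content hash. The sketch below illustrates that general approach on a temporary directory; it is an assumption about how such a check can work, not a description of Diskover's internals, and `find_duplicates` is a hypothetical helper.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def find_duplicates(root):
    """Return groups of paths under root whose contents are identical."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    groups = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in 1 MiB blocks so large files are not loaded at once.
                for block in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(block)
            groups[(size, digest.hexdigest())].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


# Small demo: two identical files and one same-sized file with other content.
with tempfile.TemporaryDirectory() as root:
    for name, content in [("a.txt", b"same"), ("b.txt", b"same"), ("c.txt", b"diff")]:
        with open(os.path.join(root, name), "wb") as handle:
            handle.write(content)
    duplicates = [
        [os.path.basename(path) for path in group]
        for group in find_duplicates(root)
    ]
```

Comparing sizes first avoids hashing files that cannot possibly match, which matters when crawling large shares.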

1.8k stars

182 forks

SecureAI Tools

Self-hosted private AI tools for chat and document Q&A, supporting local Ollama inference or OpenAI-compatible APIs, with built-in authentication and user management.

SecureAI Tools is a self-hosted web app for private AI productivity, focused on AI chat and chatting with your own documents. It can run models locally via Ollama or connect to OpenAI-compatible providers, and includes built-in access controls for multi-user use.

Key Features

Chat interface for interacting with LLMs
Document Q&A (PDF support) with offline document processing
Local model inference via Ollama, with optional GPU acceleration
Support for remote OpenAI-compatible APIs as an alternative to local inference
Built-in email/password authentication and basic user management
Optimized self-hosting experience with Docker Compose and setup scripts
Integrations including Paperless-ngx and Google Drive

Use Cases

Private, family or small-team AI assistant with account-based access
Ask questions and summarize PDFs and organized document collections
Run local LLMs on a workstation or home server to keep data on-premises

Limitations and Considerations

Document chat is currently focused on PDFs; broader file-type support is still evolving
Local inference performance depends heavily on available RAM/GPU, especially on non-Apple systems

SecureAI Tools is a practical option for users who want a privacy-oriented AI chat experience combined with document Q&A, and the flexibility to choose between local models and OpenAI-compatible providers.

1.7k stars

86 forks

Apache Solr

Scalable enterprise search platform supporting full-text, vector, faceted and geospatial search with SolrCloud clustering and a web admin UI.

Apache Solr is an open-source, high-performance search platform that extends the Apache Lucene library to provide full-text, vector and geospatial search capabilities. It exposes REST-like APIs, a responsive admin UI and tooling for indexing, querying and cluster management.

Key Features

Full-text search with advanced query parsing, scoring, spellcheck, highlighting and suggestions.
Dense-vector (ANN) search and text-to-vector integration for neural/semantic search workflows.
Faceting, aggregations and JSON Facet API for powerful drill-down and analytics.
Scalable SolrCloud mode with distributed indexing, replica management and centralized configuration.
Built-in admin UI, metrics (JMX), plugin/extension points and rich document parsing (Apache Tika integration).

Use Cases

Site and application search for e-commerce, media catalogs and documentation with faceted navigation and relevance tuning.
Semantic search and recommendations using dense-vector indexing and external embedding providers.
Large-scale, multi-tenant search deployments requiring distributed indexing, high availability and automated failover (SolrCloud).

Limitations and Considerations

SolrCloud relies on ZooKeeper for cluster coordination, which adds an operational component to manage and monitor.
Vector search and "text-to-vector" features typically require external embedding services or model integrations to produce vectors; performance and storage costs should be evaluated for large vector collections.

Apache Solr is a mature, extensible search engine suited for both small projects and massive, production search clusters. It combines Lucene search primitives with cluster orchestration, extensibility and modern features like neural search to support a wide range of search and discovery applications.
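
As a small illustration of the REST-like query interface and faceting, the sketch below builds a URL for Solr's `select` handler with facet fields. The base URL, the `catalog` collection, and the field names are placeholders assumed for this example; the URL is only constructed, not requested.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, collection, query, facet_fields=(), rows=10):
    """Build a /solr/{collection}/select URL with optional field facets."""
    params = [("q", query), ("rows", str(rows))]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field may be repeated, once per field to facet on.
        params.extend(("facet.field", field) for field in facet_fields)
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{urlencode(params)}"


url = build_solr_select_url(
    base_url="http://localhost:8983",   # assumed local instance
    collection="catalog",               # hypothetical collection
    query="title:laptop",               # hypothetical field
    facet_fields=["category", "brand"],
)
```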

1.6k stars

810 forks

Fess

Fess is an open-source enterprise search server with a built-in crawler, web-based administration, and OpenSearch/Elasticsearch-backed full-text search.

Fess is an enterprise full-text search server designed to index and search content from multiple sources such as websites, file systems, and data stores. It provides a browser-based administration UI and can run anywhere a Java runtime (or Docker) is available.

Key Features

Web-based admin console to configure crawlers, indexing, and search UI settings
Built-in crawler for web content, file systems, and network shares, with support for many document formats (for example PDF and Microsoft Office)
Search backed by OpenSearch (and can also utilize Elasticsearch)
Faceted search, drill-down, and result labeling to improve discovery
Search and click log collection for analysis and relevance tuning
Extensible architecture with plugins and integrations, including JSON-based API output
Secure crawling and search options, including authenticated content and SSO integrations

Use Cases

Internal enterprise search across intranet sites, shared folders, and document repositories
Site search for public or private websites with embeddable JavaScript integration
Unified search portal across multiple business systems via connectors and plugins

Fess is a practical choice when you need a deployable, configurable search server with crawling, administration, and extensibility packaged into a single solution. It fits well for organizations that want full control over indexing pipelines and search behavior while relying on OpenSearch-compatible search capabilities.

1.1k stars

171 forks

Amurex

Amurex is an open-source AI copilot that unifies knowledge search across Notion, Drive and Obsidian, automates meetings, and triages email.

Amurex is an open-source AI copilot designed to live in the background of your existing tools. It unifies knowledge search across connected apps and automates meeting capture and follow-up tasks. It can be self-hosted for data control and privacy.

Key Features

Unified search across connected apps (Notion, Google Drive, Obsidian, and more)
Meetings on Autopilot: records, transcribes, summarizes, and tracks action items
Inbox categorization and email prioritization
Self-hosted, open-source architecture for data control
Local-mode/inference options (Ollama) with flexible AI backends (OpenAI, Groq, Mistral)

Use Cases

Personal knowledge management: search across notes, docs, and sources from multiple tools
Meeting automation: capture transcripts, generate summaries, and assign follow-ups
Email triage and workflow orchestration within your existing toolchain

Limitations and Considerations

Requires a self-hosted backend and either valid API keys (OpenAI, Groq, etc.) for online use or LOCAL mode for local inference
Setup can be complex: involves Supabase/PostgreSQL, Redis, and containerized services
Local inference requires Ollama and related local dependencies
AGPL-3.0 licensing obligations and data residency depend on how you deploy

Amurex combines cross-tool search with automated meeting workflows in a privacy-conscious, open-source package and supports flexible self-hosted deployments. It is actively developed with multiple AI backends and deployment options to suit professional workflows.

146 stars

49 forks

Why choose an open source alternative?

• Data ownership: Keep your data on your own servers
• No vendor lock-in: Freedom to switch or modify at any time
• Cost savings: Reduce or eliminate subscription fees
• Transparency: Audit the code and know exactly what's running
