Elastic Cloud (Elasticsearch Service)

Best Self-Hosted Alternatives to Elastic Cloud (Elasticsearch Service)

A curated collection of the 18 best self-hosted alternatives to Elastic Cloud (Elasticsearch Service).

Elastic Cloud is a managed Elasticsearch service for deploying, scaling, and operating Elasticsearch clusters on AWS, Azure, and Google Cloud. It provides full-text search, indexing, analytics, ingest pipelines, and integrated Kibana visualization and management tools.

Alternatives List

#1
Meilisearch

Meilisearch is a lightning-fast search engine API for apps and websites, offering typo-tolerant full-text search plus vector and AI-ready hybrid retrieval.

Meilisearch is an open source search engine exposed through an API, designed to provide fast, relevant search experiences for websites and applications. It combines traditional full-text search with optional vector-based semantic retrieval to support hybrid search and AI retrieval workflows.

Key Features

  • REST API for indexing documents and running searches
  • Search-as-you-type with low-latency results
  • Typo tolerance and configurable ranking/relevancy tuning
  • Filtering, faceting, and sorting for building rich search UIs
  • Geosearch for location-based filtering and ranking
  • Vector storage and vector search for semantic retrieval and hybrid search
  • API key-based access control, including tenant tokens for multi-tenancy
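
As a rough illustration of the REST search API with filtering and faceting listed above, the sketch below builds (but does not send) a search request. It assumes Meilisearch's documented `POST /indexes/{index_uid}/search` route and Bearer-token authentication; the host, index name, filter expression, and key are hypothetical placeholders.

```python
import json
from urllib.request import Request


def build_search_request(base_url, index_uid, query, api_key,
                         filter_expr=None, facets=None, limit=20):
    """Build a Meilisearch-style search request without sending it."""
    body = {"q": query, "limit": limit}
    if filter_expr is not None:
        body["filter"] = filter_expr      # e.g. "brand = 'Acme'"
    if facets:
        body["facets"] = list(facets)     # attributes to return facet counts for
    return Request(
        url=f"{base_url.rstrip('/')}/indexes/{index_uid}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Hypothetical values: a local instance, a "products" index, a placeholder key.
req = build_search_request(
    "http://localhost:7700/", "products", "runing shoes", "PLACEHOLDER_KEY",
    filter_expr="brand = 'Acme'", facets=["brand"],
)
```

Passing `req` to `urllib.request.urlopen` would execute the search against a running instance; the misspelled query is deliberate, since typo tolerance is handled server-side.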

Use Cases

  • Site and application search with instant results and typo tolerance
  • E-commerce and catalog search with facets, filters, and sorting
  • AI retrieval and RAG pipelines using hybrid (full-text + vector) search

Limitations and Considerations

  • Some advanced capabilities (for example sharding and certain snapshot features) are reserved for the Enterprise Edition under a non-open-source license
  • Telemetry is enabled by default but can be disabled

Meilisearch is well-suited for teams that want a developer-friendly search API that is easy to integrate, performs well out of the box, and can evolve from classic keyword search to modern hybrid AI retrieval as needs grow.

55.4k stars
2.3k forks

#2
ClickHouse

Open-source OLAP database designed for real-time analytics at scale.

ClickHouse is an open-source, column-oriented SQL database designed for real-time analytics. It scales from a laptop deployment to hundreds of servers and supports real-time ingestion, high concurrency, and petabyte-scale workloads.

Key Features

  • Full JOIN support with advanced join algorithms for fast analytics across normalized datasets
  • Built for high concurrency with cloud-native architecture for scalable, low-latency queries
  • Lightweight data mutations that update/delete only affected rows without rewriting large datasets
  • Flexible schema-on-write with JSON ingestion for semi-structured data
  • Scales horizontally to petabyte-scale workloads through sharding and replication
  • Pluggable storage architecture supporting SSDs, spinning disks, and object storage
  • Backups to object storage and point-in-time snapshots for data protection
  • Interoperability with 70+ file formats and open lake formats for reporting and analytics
  • Complete SQL support with an optimizer, nested data structures, and hundreds of analytical functions
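
To make the SQL interface concrete, here is a minimal sketch that prepares a parameterized query for ClickHouse's HTTP interface. It assumes the documented `{name:Type}` placeholder syntax with matching `param_<name>` URL arguments and the `default_format` setting; the table `access_logs`, its columns, and the host are hypothetical.

```python
from urllib.parse import urlencode


def build_clickhouse_query(base_url, sql, params=None, output_format="JSONEachRow"):
    """Return (url, body) for a ClickHouse HTTP query; nothing is sent."""
    query_args = {"default_format": output_format}
    for name, value in (params or {}).items():
        # Each {name:Type} placeholder in the SQL is bound via param_<name>.
        query_args[f"param_{name}"] = str(value)
    url = f"{base_url.rstrip('/')}/?{urlencode(query_args)}"
    return url, sql.encode("utf-8")


# Hypothetical table and columns, used only for illustration.
url, body = build_clickhouse_query(
    "http://localhost:8123",
    "SELECT status, count() AS hits FROM access_logs "
    "WHERE day = {day:Date} GROUP BY status",
    params={"day": "2024-01-01"},
)
```

Sending `body` as the POST payload to `url` would run the aggregation; binding values through parameters avoids splicing user input into the SQL text.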

Use Cases

  • Real-time analytics and observability dashboards for applications and infrastructure
  • Data warehousing and large-scale analytical reporting
  • ML and GenAI data preparation and feature engineering pipelines

Conclusion

ClickHouse delivers fast analytics at scale with strong SQL support, real-time ingestion, and a resilient, distributed architecture. It is suitable for observability, data warehousing, and GenAI workloads across on-premises and cloud environments.

45.2k stars
8k forks

#3
Typesense

Typesense is a developer-friendly search engine for instant, typo-tolerant search-as-you-type with faceting, filtering, geo search, and vector/semantic search APIs.

Typesense is an open source search engine designed for low-latency, “search-as-you-type” experiences. It focuses on developer-friendly operations and an easy-to-use API, while supporting both traditional full-text search and modern vector-based retrieval.

Key Features

  • Typo-tolerant fuzzy search optimized for instant results
  • Search-as-you-type autocomplete and relevance tuning at query time
  • Faceting, filtering, grouping/distinct, and dynamic sorting
  • Geo search for location-based queries
  • Synonyms and pinning/merchandising controls for curated results
  • Vector and semantic search, including hybrid retrieval patterns
  • Scoped API keys and multi-tenant access patterns
  • High-availability options via replication
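
The query-time options above map onto URL parameters. The sketch below assembles a search URL under the assumption that Typesense's documented `GET /collections/{name}/documents/search` route, its `q`, `query_by`, `filter_by`, and `facet_by` parameters, and the `X-TYPESENSE-API-KEY` header apply; the collection, field names, and key are hypothetical.

```python
from urllib.parse import urlencode


def build_typesense_search(base_url, collection, query, query_by, api_key,
                           filter_by=None, facet_by=None):
    """Return (url, headers) for a Typesense-style search; nothing is sent."""
    params = {"q": query, "query_by": ",".join(query_by)}
    if filter_by:
        params["filter_by"] = filter_by          # e.g. "num_employees:>100"
    if facet_by:
        params["facet_by"] = ",".join(facet_by)  # fields to compute facet counts on
    url = (f"{base_url.rstrip('/')}/collections/{collection}"
           f"/documents/search?{urlencode(params)}")
    return url, {"X-TYPESENSE-API-KEY": api_key}


# Hypothetical collection and fields, used only for illustration.
url, headers = build_typesense_search(
    "http://localhost:8108", "companies", "stark", ["company_name"],
    "PLACEHOLDER_KEY", filter_by="num_employees:>100",
)
```

A scoped API key, as mentioned in the feature list, would take the place of the placeholder key so that each tenant can only search its own subset of documents.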

Use Cases

  • Site and in-app search for documentation, content, and product catalogs
  • E-commerce discovery with facets, filtering, sorting, and pinned results
  • Semantic search and hybrid keyword+vector retrieval for knowledge bases

Typesense is well-suited for teams that want a streamlined search stack with strong defaults, low operational complexity, and an HTTP API that integrates easily into modern applications.

25k stars
850 forks

#4
Vector

Open-source observability pipeline to collect, transform, and route logs and metrics with a single, high-performance binary and programmable transforms.

Vector is an open-source, high-performance observability data pipeline for collecting, transforming, and routing logs and metrics. It is implemented as a single, memory-safe binary and supports agent, sidecar, and aggregator deployment modes. (vector.dev)

Key Features

  • Built in Rust for memory safety and high throughput (single binary distribution).
  • Programmable transforms using the Vector Remap Language (VRL) for flexible data enrichment and parsing.
  • Wide list of first-class components: dozens of sources, transforms, and sinks (e.g., Kafka, S3, Elasticsearch, Prometheus integrations).
  • GraphQL API with a built-in playground for inspecting topology, metrics, and live queries.
  • Delivery and buffering guarantees designed for reliability in production pipelines.

Use Cases

  • Centralize logs and metrics from heterogeneous systems and route them to vendors or long-term stores.
  • Perform in-pipeline enrichment, filtering, and redaction to improve data quality and privacy before export.
  • Replace or consolidate multiple agents/forwarders to reduce operational cost and complexity.

Limitations and Considerations

  • Metrics support is marked as beta; traces are indicated as forthcoming, so full unified telemetry coverage may be incomplete for some users.
  • Some advanced integrations and vendor-specific capabilities may require configuration tuning; large-scale deployments should validate topology and buffering settings for their workload.

Vector provides a compact, performant toolkit for observability pipelines focused on reliability, vendor neutrality, and powerful in-flight transforms. It is widely used in production and maintained by an active open-source community.

21.1k stars
2k forks

#5
ZincSearch

ZincSearch is a Go-based, lightweight search engine for full-text indexing with Elasticsearch API-compatible ingestion, a Vue UI, and a schema-less document model.

ZincSearch is a lightweight, self-hosted search engine written in Go that provides full-text indexing with an Elasticsearch-compatible ingestion API and a dedicated Vue-based UI. It is designed to be simple to install and resource-efficient, making it suitable for app search and small-scale search workloads.

Key Features

  • Full-text indexing capability
  • Single binary distribution with multi-platform releases
  • Web UI for querying data (built with Vue)
  • Compatibility with Elasticsearch APIs for data ingestion (single-record and bulk)
  • Out-of-the-box authentication
  • Schema-less data model: different documents in the same index can have different fields
  • Index storage on disk
  • Aggregation support
  • Built on the Bluge indexing library for efficient search
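
Elasticsearch-compatible bulk ingestion generally means newline-delimited JSON in which an action line precedes each document. The sketch below produces such a payload, assuming ZincSearch accepts the standard Elasticsearch bulk body as the feature list suggests; the index name and documents are hypothetical, and the endpoint path and credentials are deliberately left out because they depend on the deployment.

```python
import json


def to_bulk_ndjson(index_name, documents):
    """Serialize documents into the Elasticsearch bulk NDJSON format."""
    if not documents:
        return ""
    lines = []
    for doc in documents:
        # Action line: index this document into the named index.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself; fields may differ per document.
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical documents with different field sets (schema-less model).
payload = to_bulk_ndjson("books", [
    {"title": "Dune"},
    {"title": "Emma", "year": 1815},
])
```

Because the data model is schema-less, the two documents above can carry different fields without any mapping being declared first.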

Use Cases

  • App search and site search for applications and websites
  • Lightweight indexing of documents, emails, product catalogs, or similar data
  • Quick, self-hosted search deployments for small teams or private environments

Limitations and Considerations

  • Kibana is not supported; ZincSearch provides its own Vue-based UI

Conclusion

ZincSearch offers a compact, Go-based search solution for full-text indexing with Elasticsearch API compatibility and a native UI. It is well-suited for simple app search workloads and smaller on-premise deployments that require self-hosted indexing. (github.com)

17.7k stars
762 forks

#6
Onyx Community Edition

Open-source platform for AI chat, RAG, agents, and enterprise search across your team’s connected knowledge sources, compatible with hosted and local LLMs.

Onyx Community Edition is an open-source, self-hostable AI platform that combines a team chat UI with enterprise search and retrieval-augmented generation (RAG). It is designed to work with a wide range of LLM providers as well as locally hosted models, including deployments in airgapped environments.

Key Features

  • AI chat interface designed to work with multiple LLM providers and self-hosted LLMs
  • RAG with hybrid retrieval and contextual grounding over ingested and uploaded content
  • Connectors to many external knowledge sources with metadata ingestion
  • Custom agents with configurable instructions, knowledge, and actions
  • Web search integration and deep-research style multi-step querying
  • Collaboration features such as chat sharing, feedback collection, and user management
  • Enterprise-oriented access controls including RBAC and support for SSO (depending on configuration)

Use Cases

  • Company-wide AI assistant grounded in internal documents and connected tools
  • Knowledge discovery and enterprise search across large document collections
  • Building task-focused AI agents that can retrieve context and trigger actions

Limitations and Considerations

  • Some advanced organization-focused capabilities may differ between Community and Enterprise editions
  • Retrieval quality and permissions mirroring depend on connector availability and configuration

Onyx CE is a strong fit for teams that want an extensible, transparent AI assistant and search layer over internal knowledge. It emphasizes configurable retrieval, integrations, and deployability across diverse infrastructure setups.

17.1k stars
2.3k forks

#7
Apache Druid

Apache Druid is a real-time analytics (OLAP) database delivering sub-second queries on streaming and batch data with high concurrency at scale.

Apache Druid is a high-performance real-time analytics database designed for interactive OLAP queries on large, high-cardinality datasets. It supports both streaming and batch ingestion and is optimized for low-latency queries under high concurrency.

Key Features

  • Sub-second interactive query engine optimized for high-dimensional, high-cardinality data
  • Native streaming ingestion designed for query-on-arrival use cases
  • Columnar storage with time indexing, dictionary encoding, bitmap indexes, and compression
  • SQL API plus native query APIs over HTTP, including JDBC connectivity
  • Built-in web console for ingestion setup, query exploration, and cluster visibility
  • Elastic, loosely coupled architecture separating ingestion, query, and coordination services
  • Tiering and quality-of-service controls to prioritize mixed workloads
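
For a sense of the SQL-over-HTTP API, the sketch below builds a JSON payload for an hourly event count. It assumes Druid's documented SQL endpoint (`POST /druid/v2/sql`), which takes a JSON body with a `query` field and an optional `resultFormat`; the datasource name is hypothetical and should come from trusted configuration, since it is interpolated into the SQL text.

```python
import json


def build_druid_sql_payload(datasource, hours=24):
    """Return the JSON body for an hourly event-count query (not sent)."""
    sql = (
        "SELECT TIME_FLOOR(__time, 'PT1H') AS hour_bucket, COUNT(*) AS event_count "
        f'FROM "{datasource}" '
        f"WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '{int(hours)}' HOUR "
        "GROUP BY 1 ORDER BY 1"
    )
    # "object" asks for each row as a JSON object keyed by column name.
    return json.dumps({"query": sql, "resultFormat": "object"})


# Hypothetical datasource name, used only for illustration.
payload = build_druid_sql_payload("clickstream", hours=6)
```

Filtering on `__time` matters here: Druid's time indexing lets it skip data outside the requested interval.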

Use Cases

  • Powering real-time analytics dashboards and embedded analytics in user-facing applications
  • Ad-hoc operational analytics on event, clickstream, and observability-style data
  • High-concurrency OLAP analytics on time-series and event data from streaming platforms

Limitations and Considerations

  • Operates as a distributed system with multiple service types, which can increase operational complexity compared to single-node databases
  • Designed primarily for analytics workloads; it is not a general-purpose OLTP database

Apache Druid is well-suited for organizations that need fast, consistent analytical queries on continuously arriving data. Its storage format and distributed architecture make it effective for high-scale, high-concurrency real-time analytics applications.

13.9k stars
3.8k forks

#8
OpenSearch

OpenSearch is an Apache 2.0 open source distributed search and analytics engine for indexing, querying, and analyzing large-scale data with REST APIs.

OpenSearch is an Apache 2.0-licensed, community-driven distributed search and analytics engine designed for indexing and querying large volumes of data. It provides a RESTful API and is commonly used as the core search backend for applications and as a foundation for log and event analytics.

Key Features

  • Distributed indexing and search for horizontal scalability and high availability
  • RESTful API for indexing, querying, and cluster operations
  • Full-text search and relevance scoring for unstructured and semi-structured data
  • Aggregations for analytical queries over large datasets
  • Extensible architecture with plugins for additional capabilities
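
Full-text search and aggregations are typically combined in one request body. The sketch below builds such a body using the standard `match` query and `terms` aggregation of the query DSL, intended for an index's `_search` endpoint; the field names are hypothetical, and the aggregation field is assumed to be keyword-typed.

```python
import json


def build_search_body(field, text, facet_field, size=10, buckets=5):
    """Build a query DSL body: full-text match plus a terms aggregation."""
    return {
        "size": size,                              # number of hits to return
        "query": {"match": {field: text}},         # analyzed full-text match
        "aggs": {
            f"by_{facet_field}": {
                # Bucket hits by distinct values of a keyword-typed field.
                "terms": {"field": facet_field, "size": buckets}
            }
        },
    }


# Hypothetical log index fields, used only for illustration.
body = build_search_body("message", "connection timeout", "level")
encoded = json.dumps(body)
```

The same body shape serves both application search (hits) and log analytics (bucket counts), which is why index design has such a large effect on cost.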

Use Cases

  • Powering application search for websites, product catalogs, and documentation
  • Centralized log search and analytics for infrastructure and applications
  • Building analytics experiences over event, text, and time-based datasets

Limitations and Considerations

  • Operational complexity can be significant for large clusters (sizing, tuning, shard management)
  • Query performance and cost depend heavily on index design and workload patterns

OpenSearch is a strong fit when you need scalable search and analytics with an open ecosystem and a well-known REST interface. It can serve as a primary search backend or as a core component in broader observability and analytics pipelines.

12.2k stars
2.4k forks

#9
Manticore Search

Manticore Search is a fast open-source search database for full-text, faceted, and vector search with SQL (MySQL protocol) and HTTP JSON APIs.

Manticore Search is an open-source search database designed for building fast full-text and hybrid (text + filters) search applications. It provides a SQL-first experience with MySQL protocol compatibility and an HTTP JSON API for programmatic indexing and querying.

Key Features

  • Full-text search with relevance ranking (BM25-style), highlighting, and many match operators
  • SQL interface with MySQL protocol support for querying and management
  • HTTP JSON API, including Elasticsearch-compatible bulk writes for easier ingestion
  • Real-time indexing so newly inserted or updated documents are searchable immediately
  • Advanced search capabilities such as faceting, geo-spatial search, autocomplete, fuzzy search, and spell correction
  • Vector search (KNN) to support semantic and similarity search scenarios
  • Multiple storage modes, including row-wise and optional columnar storage for larger datasets
  • High-availability options including built-in replication and load balancing
  • Built-in backup and restore tooling (including SQL BACKUP)

Use Cases

  • Application search for catalogs, marketplaces, documentation, and knowledge bases
  • Log/event search and analytics-style querying on large datasets
  • Hybrid search combining keyword relevance with filters, geo, and vector similarity

Limitations and Considerations

  • Not fully ACID-compliant; transaction semantics differ from general-purpose relational databases
  • Some features (such as columnar storage) may require additional components and tuning depending on workload

Manticore Search is well-suited when you need a high-performance, resource-efficient search engine with familiar SQL workflows and flexible APIs. It aims to be an approachable alternative to Elasticsearch for many search and analytics scenarios.

11.6k stars
622 forks

#10
Quickwit

Open-source cloud-native search engine for observability data on object storage with an Elasticsearch/OpenSearch-compatible API.

Quickwit is a cloud-native open-source search engine built for observability data, including logs and traces. It runs compute separately from storage and supports querying data directly on object storage for scalable, cost-efficient search.

Key Features

  • Full-text search and aggregation queries
  • Elasticsearch-compatible API, so Elasticsearch or OpenSearch clients can be used with Quickwit
  • Jaeger-native and OTEL-native support for logs and traces
  • Schemaless indexing and analytics
  • Sub-second search on cloud storage (e.g., S3, Azure Blob, Google Cloud Storage)
  • Decoupled compute and storage with stateless indexers & searchers
  • Grafana data source
  • Kubernetes-ready with a Helm chart
  • RESTful API

Use Cases

  • Log management across large-scale deployments
  • Distributed tracing analytics for microservices
  • Real-time search and exploration of observability data to troubleshoot incidents

Conclusion

Quickwit is a scalable, open-source solution designed to search and analyze vast observability datasets directly on cloud storage. Its architecture emphasizes decoupled compute/storage, compatibility with popular tooling, and ease of deployment on Kubernetes.

10.8k stars
506 forks

#11
Graylog

Graylog is an open source platform for collecting, indexing, searching, and alerting on logs and machine data from many sources in one place.

Graylog is a centralized log management platform for ingesting, storing, and analyzing logs and machine data at scale. It helps teams search across multiple data sources, detect operational issues, and support security monitoring workflows.

Key Features

  • Centralized collection of logs via common inputs such as Syslog and GELF
  • Search, filtering, and field extraction for structured log analysis
  • Streams and pipelines to route, transform, and enrich messages
  • Dashboards and visualizations for operational and security monitoring
  • Alerting and notifications based on queries and event conditions
  • Integrations for common log shippers and message brokers (for example Kafka and AMQP)

Use Cases

  • Troubleshooting application and infrastructure incidents using centralized search
  • Building operational dashboards for service health and error tracking
  • Security monitoring and investigations using aggregated log data

Limitations and Considerations

  • Typically relies on an external search backend (commonly Elasticsearch or OpenSearch), which adds operational complexity
  • License is SSPL, which can be a consideration for some organizations

Graylog is a strong fit for teams that need a mature log analysis workflow with flexible ingestion options and powerful search. It is commonly used to improve observability, incident response, and security-focused log monitoring in a single system.

7.9k stars
1.1k forks

#12
Aleph

Aleph indexes documents and structured datasets to enable fast search, entity extraction, and cross-referencing for investigative research and OSINT workflows.

Aleph is an investigative data platform for ingesting and indexing large collections of documents and structured datasets, making them searchable and easier to analyze. It is designed to help researchers find people, companies, and connections across many sources, including watchlists and prior research.

Key Features

  • Ingests and indexes documents (such as PDF, Word, and HTML) and structured data (such as CSV and spreadsheets)
  • Full-text search and browsing across datasets and uploaded materials
  • Entity-centric exploration focused on people, companies, and other known entities
  • Cross-referencing and matching entities against watchlists and reference datasets
  • Supports operational workflows for managing data imports and collections

Use Cases

  • Investigative journalism: search leaks, filings, and datasets for names and relationships
  • OSINT research: unify and query diverse sources (documents plus tabular data)
  • Compliance or due diligence research: check entities against internal or external lists

Limitations and Considerations

  • The open-source version is in a sunsetting phase, with official maintenance planned to end after December 2025

Aleph is well-suited for teams that need to turn large, heterogeneous collections of files and tables into a searchable investigative corpus. Its emphasis on entity discovery and cross-referencing makes it particularly useful for research-driven analysis workflows.

2.3k stars
326 forks

#13
Parseable

Parseable ingests, analyzes, and extracts insights from MELT telemetry data with predictive analytics and a unified SQL/NL querying interface.

Parseable is a full-stack observability platform built to ingest, analyze and extract insights from all types of telemetry (MELT) data. It can run locally, in the cloud, or as a managed service, providing a unified way to explore signals across the stack.

Key Features

  • Unified signals across MELT data for a single source of truth
  • Predictive analytics and anomaly forecasting to anticipate issues
  • Natural language and SQL querying across telemetry
  • Hybrid execution engine with columnar storage and indexing for fast queries
  • Granular access control and federated IAM
  • Open standards and vendor-neutral design (OTel, Parquet compatibility)
  • Cloud-ready with BYOC options

Use Cases

  • Full-stack observability of applications, databases, infrastructure and networks
  • AI workloads observability for telemetry from AI models and LLMs
  • Product observability to analyze user behavior, feature adoption, and performance

Conclusion

Parseable provides predictive observability with a unified data model, enabling faster insights and proactive incident response across the full telemetry stack.

2.3k stars
158 forks

#14
Diskover

Diskover indexes file systems with Elasticsearch to provide fast file search, metadata analytics, and storage visibility across on-prem, NAS, and cloud storage.

Diskover is a data management and analytics platform for unstructured file data that crawls storage, enriches file metadata, and indexes it for fast search and reporting. It is designed to help teams understand what they have, where it lives, and how storage is being used.

Key Features

  • Crawls and indexes heterogeneous storage (local file systems, NFS/SMB shares, and other supported sources)
  • Elasticsearch-backed indexing for fast file search and filtering
  • Storage usage analytics to identify cold data, growth trends, and large consumers
  • Duplicate file discovery and wasted-space analysis
  • Extensible metadata enrichment via plugins
  • Web UI for search, reporting, and operational visibility

Use Cases

  • Storage capacity planning and cost optimization by finding cold/unused or duplicate data
  • Rapid file discovery and investigation across large shares and mixed storage
  • Data hygiene initiatives such as organizing, tagging, and preparing curated datasets for analytics

Limitations and Considerations

  • Requires running and maintaining an Elasticsearch cluster for indexing and search
  • Crawling very large environments may require tuning and scheduling to manage resource usage

Diskover fits organizations and advanced homelabs that need centralized visibility into file data sprawl and want searchable metadata at scale. It pairs a crawler/indexer with a web interface to turn unstructured storage into actionable insights for cleanup, governance, and operations.

1.7k stars
180 forks

#15
Apache Solr

Scalable enterprise search platform supporting full-text, vector, faceted and geospatial search with SolrCloud clustering and a web admin UI.

Apache Solr is an open-source, high-performance search platform that extends the Apache Lucene library to provide full-text, vector and geospatial search capabilities. It exposes REST-like APIs, a responsive admin UI and tooling for indexing, querying and cluster management. (lucene.apache.org)

Key Features

  • Full-text search with advanced query parsing, scoring, spellcheck, highlighting and suggestions. (solr.apache.org)
  • Dense-vector (ANN) search and text-to-vector integration for neural/semantic search workflows. (solr.apache.org)
  • Faceting, aggregations and JSON Facet API for powerful drill-down and analytics. (solr.apache.org)
  • Scalable SolrCloud mode with distributed indexing, replica management and centralized configuration. (solr.apache.org)
  • Built-in admin UI, metrics (JMX), plugin/extension points and rich document parsing (Apache Tika integration). (solr.apache.org)
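
Faceted queries against Solr are usually plain URL parameters. The sketch below assembles a `/select` URL with field faceting, assuming the standard `q`, `rows`, `facet`, and `facet.field` parameters; the host, collection, and field names are hypothetical.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, collection, query, facet_fields=(), rows=10):
    """Build a Solr /select URL with optional field faceting (not sent)."""
    params = [("q", query), ("rows", rows)]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field is repeated once per field to facet on.
        params.extend(("facet.field", name) for name in facet_fields)
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical collection and fields, used only for illustration.
url = build_solr_select_url(
    "http://localhost:8983", "products", "title:laptop", ["brand", "category"],
)
```

A list of tuples is used instead of a dict so that `facet.field` can appear more than once in the query string.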

Use Cases

  • Site and application search for e-commerce, media catalogs and documentation with faceted navigation and relevance tuning.
  • Semantic search and recommendations using dense-vector indexing and external embedding providers.
  • Large-scale, multi-tenant search deployments requiring distributed indexing, high availability and automated failover (SolrCloud).

Limitations and Considerations

  • SolrCloud relies on ZooKeeper for cluster coordination, which adds an operational component to manage and monitor. (solr.apache.org)
  • Vector search and "text-to-vector" features typically require external embedding services or model integrations to produce vectors; performance and storage costs should be evaluated for large vector collections. (solr.apache.org)

Apache Solr is a mature, extensible search engine suited for both small projects and massive, production search clusters. It combines Lucene search primitives with cluster orchestration, extensibility and modern features like neural search to support a wide range of search and discovery applications. (lucene.apache.org)

1.5k stars
804 forks

#16
sist2

sist2 is a fast, low-memory file system indexer with a web UI for searching file contents and metadata, with Elasticsearch or SQLite backends.

sist2 (Simple incremental search tool) is a lightning-fast file system indexer that scans directories and builds a searchable index of file contents and metadata. It provides a mobile-friendly web interface and supports either Elasticsearch or a lightweight SQLite (FTS5) search backend.

Key Features

  • Incremental, multi-threaded scanning optimized for speed and low memory usage
  • Web UI for searching and browsing results, including thumbnails and metadata
  • Supports Elasticsearch indexing or a simpler SQLite-based search backend
  • Content extraction and metadata parsing for many common formats (documents, media, ebooks)
  • Recursive scanning inside archive files (including archives within archives)
  • Optional OCR via Tesseract for images and supported ebook/document formats
  • Manual tagging in the UI and automatic tagging via user scripts
  • Basic statistics and disk utilization visualizations

Use Cases

  • Personal or team “desktop search” for large document and media collections
  • Building a searchable archive of mixed file types (PDFs, photos, videos, ebooks)
  • Indexing NAS or server directories to quickly locate files by content or metadata

Limitations and Considerations

  • Elasticsearch provides more features but has a significantly higher resource footprint than SQLite
  • Archive scanning is single-threaded and some seek-heavy media formats in archives may be limited

sist2 is well-suited for users who want fast local file indexing with a modern web search experience and flexible backend options depending on resources and feature needs.

1.2k stars
72 forks

#17
Fess

Fess is an open-source enterprise search server with a built-in crawler, web-based administration, and OpenSearch/Elasticsearch-backed full-text search.

Fess is an enterprise full-text search server designed to index and search content from multiple sources such as websites, file systems, and data stores. It provides a browser-based administration UI and can run anywhere a Java runtime (or Docker) is available.

Key Features

  • Web-based admin console to configure crawlers, indexing, and search UI settings
  • Built-in crawler for web content, file systems, and network shares, with support for many document formats (for example PDF and Microsoft Office)
  • Search backed by OpenSearch (and can also utilize Elasticsearch)
  • Faceted search, drill-down, and result labeling to improve discovery
  • Search and click log collection for analysis and relevance tuning
  • Extensible architecture with plugins and integrations, including JSON-based API output
  • Secure crawling and search options, including authenticated content and SSO integrations

Use Cases

  • Internal enterprise search across intranet sites, shared folders, and document repositories
  • Site search for public or private websites with embeddable JavaScript integration
  • Unified search portal across multiple business systems via connectors and plugins

Fess is a practical choice when you need a deployable, configurable search server with crawling, administration, and extensibility packaged into a single solution. It fits well for organizations that want full control over indexing pipelines and search behavior while relying on OpenSearch-compatible search capabilities.

1.1k stars
170 forks

#18
Nimtable

Lightweight web UI and REST control plane for exploring, inspecting, and managing Apache Iceberg catalogs and tables with Docker deployment and engine integrations.

Nimtable is a lightweight control plane and observability platform for Apache Iceberg lakehouses. It provides a browser-based console and REST API to browse catalog metadata, inspect table layouts, run ad-hoc metadata queries, and orchestrate maintenance tasks delegated to compute engines.

Key Features

  • Browser console to explore catalogs, schemas, tables, partitions, snapshots, and manifests
  • REST API and optional Iceberg REST Catalog endpoint for query engines
  • Run SQL from the browser for quick metadata inspection
  • Visualizations of file and snapshot distribution to surface optimization opportunities
  • Integrations to delegate compaction/maintenance to external engines (e.g., Spark, RisingWave)
  • Docker Compose deployment and PostgreSQL metadata storage by default

Feature details and deployment guidance are documented in the project README and the RisingWave docs (github.com).

Use Cases

  • Inspect and troubleshoot Iceberg table metadata, snapshots, and file layout to find optimization targets
  • Operate and orchestrate compaction/maintenance jobs by delegating work to Spark, RisingWave, or other engines
  • Provide a standards-compliant Iceberg REST Catalog endpoint for query engines and interactive exploration

Limitations and Considerations

  • Fine-grained RBAC and advanced access-control features are listed as roadmap items and may be limited or absent in current releases
  • Caching, some monitoring/analytics features, and advanced scheduling/compaction strategies are planned but may not be production-complete

The roadmap and known feature gaps are described in the repository documentation (github.com).

Nimtable is intended as a lightweight, developer-facing control plane to simplify catalog inspection and routine maintenance for Iceberg lakehouses. It is designed to be run alongside existing catalogs and compute engines and to provide a consolidated UI and REST API for metadata operations.

429 stars
24 forks

Why choose an open source alternative?

  • Data ownership: Keep your data on your own servers
  • No vendor lock-in: Freedom to switch or modify at any time
  • Cost savings: Reduce or eliminate subscription fees
  • Transparency: Audit the code and know exactly what's running