Diskover

File system indexing, search, and storage analytics platform

1.7k stars
180 forks
Last commit: 4mo ago
Repo age: 9y

Diskover is a data management and analytics platform for unstructured file data that crawls storage, enriches file metadata, and indexes it for fast search and reporting. It is designed to help teams understand what they have, where it lives, and how storage is being used.

Key Features

  • Crawls and indexes heterogeneous storage (local file systems, NFS/SMB shares, and other supported sources)
  • Elasticsearch-backed indexing for fast file search and filtering
  • Storage usage analytics to identify cold data, growth trends, and large consumers
  • Duplicate file discovery and wasted-space analysis
  • Extensible metadata enrichment via plugins
  • Web UI for search, reporting, and operational visibility
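The crawl, duplicate-discovery, and cold-data ideas in the list above can be illustrated with a minimal Python sketch. This is not Diskover's implementation or API: the function names (`crawl`, `find_duplicates`, `cold_files`) and the record fields are hypothetical, and it uses only the standard library rather than Elasticsearch. It assumes duplicates are detected by grouping files by size and then by SHA-256 content hash, which is one common approach.

```python
import hashlib
import os
import stat
import time
from collections import defaultdict


def crawl(root):
    """Walk a directory tree and yield one metadata record per regular file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, devices
            yield {"path": path, "size": st.st_size, "mtime": st.st_mtime}


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(records):
    """Group non-empty files by size, then by content hash.

    Hashing only files that share a size avoids reading most of the tree.
    Returns a list of groups, each a sorted list of paths with equal content.
    """
    by_size = defaultdict(list)
    for rec in records:
        if rec["size"] > 0:
            by_size[rec["size"]].append(rec["path"])

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def cold_files(records, max_age_days, now=None):
    """Return paths whose last modification is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return [r["path"] for r in records if r["mtime"] < cutoff]
```

To try it, build the record list once with `records = list(crawl("/some/share"))` and pass it to both `find_duplicates(records)` and `cold_files(records, 365)`. A real deployment would send such records to a search index instead of holding them in memory.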

Use Cases

  • Storage capacity planning and cost optimization by finding cold/unused or duplicate data
  • Rapid file discovery and investigation across large shares and mixed storage
  • Data hygiene initiatives such as organizing, tagging, and preparing curated datasets for analytics

Limitations and Considerations

  • Requires running and maintaining an Elasticsearch cluster for indexing and search
  • Crawling very large environments may require tuning and scheduling to manage resource usage

Diskover fits organizations and advanced homelabs that need centralized visibility into file data sprawl and want searchable metadata at scale. It pairs a crawler/indexer with a web interface to turn unstructured storage into actionable insights for cleanup, governance, and operations.

Similar Services

Meilisearch

Fast search engine API with full-text, vector, and hybrid search

55.4k stars
2.3k forks
Last commit: 2d ago

Meilisearch is a lightning-fast search engine API for apps and websites, offering typo-tolerant full-text search plus vector and AI-ready hybrid retrieval.

Alternative to: Algolia (+16 more)

ArchiveBox

Open-source self-hosted web archiving and snapshotting tool

26.4k stars
1.4k forks
Last commit: 11d ago

Self-hosted tool to collect and preserve webpages, media, and bookmarks in durable formats (HTML, PDF, WARC, MP4) with a CLI, web UI, and search.

Alternative to: Internet Archive Wayback Machine (+3 more)

Typesense

Fast, typo-tolerant search engine with keyword and vector search

25k stars
850 forks
Last commit: 2d ago

Typesense is a developer-friendly search engine for instant, typo-tolerant search-as-you-type with faceting, filtering, geo search, and vector/semantic search APIs.

Alternative to: Algolia (+19 more)

SearXNG

Privacy-focused metasearch engine for aggregating web results

24.2k stars
2.4k forks
Last commit: 22h ago

SearXNG is a privacy-respecting metasearch engine that aggregates results from many search services without tracking or profiling users.

Alternative to: Google Search (+6 more)

ZincSearch

A lightweight open-source search engine for full-text indexing.

17.7k stars
762 forks
Last commit: 1mo ago

ZincSearch is a Go-based, lightweight search engine for full-text indexing with Elasticsearch API-compatible ingestion, a Vue UI, and a schema-less document model.

Alternative to: Elastic Cloud (Elasticsearch Service) (+7 more)

Onyx Community Edition

Self-hosted AI chat and enterprise search for any LLM

17.1k stars
2.3k forks
Last commit: 16h ago

Open-source platform for AI chat, RAG, agents, and enterprise search across your team’s connected knowledge sources, compatible with hosted and local LLMs.

Alternative to: Onyx (+19 more)