Best Self-hosted Alternatives to Grafana Cloud

A curated collection of the 19 best self-hosted alternatives to Grafana Cloud.

Grafana Cloud is a managed observability platform providing hosted Grafana plus managed backends (Mimir for metrics, Loki for logs, Tempo for traces) to collect, store, visualize, and alert on metrics, logs, and traces for monitoring and APM.

Alternatives List

#1
Netdata

Open-source, agent-based monitoring platform delivering per-second metrics, edge ML anomaly detection, tiered time-series storage and centralized cloud UI.

Netdata is an open-source, agent-based observability platform that collects, stores, and visualizes per-second metrics across infrastructure and applications. It combines a lightweight edge agent, a tiered time-series store, and optional centralized Cloud/Parent components for unified views and collaboration.

Key Features

  • Per-second, real-time metrics collection with millisecond responsiveness and auto-generated dashboards.
  • Edge-based machine learning: unsupervised anomaly detection and per-metric ML models running on the agent.
  • Tiered, high-efficiency time-series storage (compact samples, ZSTD compression) with configurable retention and archiving.
  • Distributed Parent–Child streaming pipeline for horizontal scaling, multi-node aggregation, and long-term retention.
  • Broad integrations (800+ collectors) and export/archival targets including Prometheus, InfluxDB, OpenTSDB, and Graphite.
  • Low resource footprint (designed for minimal CPU/RAM impact) and zero-configuration auto-discovery on supported platforms.
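
As a rough illustration of the tiering idea in the storage bullet above, the sketch below averages fixed-size groups of high-resolution samples into a coarser tier. It is not Netdata's storage engine; the function name `downsample` and the sample values are hypothetical.

```python
def downsample(samples, factor):
    """Average each complete group of `factor` consecutive samples into one point."""
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, len(samples) - factor + 1, factor)
    ]

# Hypothetical per-second readings reduced to one point per 3 seconds.
per_second = [1, 2, 3, 4, 5, 6]
coarse_tier = downsample(per_second, 3)
```

Each coarser tier trades resolution for retention: the same disk budget covers a longer time span.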

Use Cases

  • Infrastructure and system monitoring: per-second visibility into CPU, memory, disks, network, sensors, and kernel metrics.
  • Container and Kubernetes observability: native containerd/Docker and Kubernetes integrations for pod, node, and cluster troubleshooting.
  • Incident troubleshooting and AIOps: anomaly detection, root-cause analysis, blast-radius identification, and automated reporting to accelerate incident resolution.

Limitations and Considerations

  • The Netdata UI and Netdata Cloud components are delivered as closed-source offerings while the Agent is open-source; organizations requiring fully open-source stacks should evaluate this split.
  • OpenTelemetry support is noted as "coming soon" in documentation; users relying heavily on OpenTelemetry may need to plan integrations or use exporters.
  • Feature parity varies by platform (Linux has the most comprehensive coverage); some platform-specific collectors or deep kernel metrics are not available everywhere.

Netdata offers a high-resolution, low-overhead approach to full-stack monitoring with built-in ML and flexible scaling via Parents and Netdata Cloud. It is well-suited for teams needing real-time troubleshooting, container/Kubernetes visibility, and efficient time-series retention while weighing the tradeoffs of closed-source UI/cloud components.

77.9k stars
6.4k forks
#2
Grafana

Grafana is an open source observability and data visualization platform for querying, graphing, and alerting on metrics, logs, and traces across many data sources.

Grafana is an open source observability and data visualization platform for querying, visualizing, and alerting on metrics, logs, and traces across many backends. It provides interactive dashboards and exploration workflows so teams can monitor systems and troubleshoot issues from a single interface.

Key Features

  • Dashboards with flexible visualizations and templating for reusable views
  • Explore workflows for ad-hoc querying and drilldowns across time ranges and data sources
  • Unified alerting with rule evaluation and multi-channel notifications
  • Pluggable data source and panel ecosystem to integrate with many metrics, log, and trace systems
  • Sharing and collaboration features for teams (dashboards, annotations, and permissions)

Use Cases

  • Infrastructure and Kubernetes monitoring using time-series backends
  • Centralized log exploration and correlation with metrics for incident response
  • Application observability by visualizing traces and service performance trends

Limitations and Considerations

  • The experience and capabilities depend heavily on the chosen data sources and plugins
  • Operating at very large scale can require careful tuning of storage backends and dashboard/query design

Grafana is well-suited for organizations that want a single “pane of glass” across diverse telemetry sources. Its extensible plugin model and alerting make it a common foundation for observability stacks in both homelabs and enterprise environments.

72.4k stars
13.5k forks
#3
Prometheus

Prometheus is an open-source monitoring and time-series database for collecting metrics, querying with PromQL, and alerting on system and application health.

Prometheus is an open-source systems and service monitoring platform built around a time-series database. It collects metrics from instrumented targets, lets you query them with PromQL, and supports alerting based on rules.

Key Features

  • Multi-dimensional time series data model using labels for flexible filtering and aggregation
  • PromQL query language for ad-hoc analysis, dashboards, and alert conditions
  • Pull-based metric scraping over HTTP with support for static configs and service discovery
  • Alert rule evaluation with alert generation (commonly paired with Alertmanager)
  • Federation support for hierarchical and cross-environment aggregation
  • Remote write/read integrations for long-term storage and interoperability
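
To make the label-based data model concrete, here is a minimal Python sketch of what a PromQL aggregation such as `sum by (job) (http_requests_total)` computes over labeled series. It is not Prometheus code; the metric name, labels, and values are made up for illustration.

```python
from collections import defaultdict

# Hypothetical samples: each series is identified by a metric name plus labels.
SERIES = [
    ({"__name__": "http_requests_total", "job": "api", "instance": "a:9090"}, 120.0),
    ({"__name__": "http_requests_total", "job": "api", "instance": "b:9090"}, 80.0),
    ({"__name__": "http_requests_total", "job": "web", "instance": "c:9090"}, 50.0),
]

def sum_by(series, metric, label):
    """Sum the values of all series of `metric`, grouped by one label."""
    totals = defaultdict(float)
    for labels, value in series:
        if labels.get("__name__") == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)

result = sum_by(SERIES, "http_requests_total", "job")
```

Because every dimension is a label, the same data can be regrouped by `instance` or any other label without changing how it was collected.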

Use Cases

  • Monitoring Kubernetes clusters and cloud-native services via dynamic service discovery
  • Application and infrastructure telemetry for SRE/DevOps dashboards and alerting
  • Central metrics collection for microservices, batch jobs (via the Pushgateway), and exporters

Limitations and Considerations

  • Built-in storage is optimized for a single-node TSDB; long-term retention and global scale typically require external remote storage integrations

Prometheus is a strong fit when you want a reliable, standards-based metrics platform with powerful querying and a broad ecosystem of exporters and integrations. It is widely used for cloud-native monitoring and alert-driven operations.

62.9k stars
10.2k forks
#4
Grafana Loki

Grafana Loki is a Prometheus-inspired log aggregation system that indexes labels (not log contents) for cost-effective storage and fast querying, with Grafana integration.

Grafana Loki is a horizontally scalable, highly available log aggregation system inspired by Prometheus. It stores logs efficiently by indexing only metadata labels for each log stream, rather than performing full-text indexing.

Key Features

  • Label-based log indexing and querying aligned with Prometheus-style labels
  • Horizontally scalable architectures (single binary or microservices) with multi-tenancy support
  • Cost-efficient storage by keeping logs compressed and indexing only metadata
  • Native integration with Grafana for exploration, dashboards, and correlation with metrics
  • Multiple ingestion options via agents and clients (including Grafana Alloy and legacy Promtail)
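
The label-only indexing described above can be pictured with a small in-memory model: streams are looked up by their labels, and log lines are scanned only after that selection, roughly what a LogQL query like `{app="api"} |= "error"` expresses. This is a conceptual sketch, not Loki's implementation; the data and the `query` helper are hypothetical.

```python
# Hypothetical model: only the label set is indexed; log lines are stored
# as opaque content and scanned at query time.
streams = {
    (("app", "api"), ("env", "prod")): ["GET /health 200", "error: db timeout"],
    (("app", "web"), ("env", "prod")): ["error: template missing"],
}

def query(streams, matchers, contains):
    """Select streams by exact label matchers, then filter lines by substring."""
    hits = []
    for labels, lines in streams.items():
        label_map = dict(labels)
        if all(label_map.get(k) == v for k, v in matchers.items()):
            hits.extend(line for line in lines if contains in line)
    return hits

result = query(streams, {"app": "api"}, "error")
```

The trade-off is visible here: narrowing by labels is cheap, while the text filter has to read every line in the selected streams.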

Use Cases

  • Centralized aggregation of Kubernetes and container logs with label-based filtering
  • Incident investigation by correlating metrics and logs using shared labels
  • Multi-team or multi-environment log collection with tenant isolation

Limitations and Considerations

  • Not designed for full-text indexing; queries are primarily optimized around labels and structured metadata

Loki is a strong fit when you want an operationally simpler, Prometheus-like approach to logs with efficient storage and fast label-based queries. It is commonly deployed as part of a Grafana-centric observability stack for monitoring and troubleshooting.

27.7k stars
3.9k forks
#5
SigNoz

SigNoz is an open-source platform that collects and correlates logs, metrics, and traces using OpenTelemetry for unified observability.

SigNoz is an open-source observability platform designed to collect, store, and visualize logs, metrics, and traces in a single interface. Built on OpenTelemetry, SigNoz enables correlated signals and unified dashboards, with ClickHouse serving as the log datastore.

Key Features

  • Unified observability across logs, metrics, and traces
  • OpenTelemetry-native ingestion with semantic conventions
  • ClickHouse-backed log storage for fast queries
  • DIY query builder, PromQL support, and flexible dashboards
  • Alerts across signals with anomaly detection capabilities
  • Tracing visuals including flamegraphs and detailed span views

Use Cases

  • Instrumenting applications with OpenTelemetry to achieve end-to-end visibility across services
  • Correlating logs, metrics, and traces to troubleshoot microservices and distributed systems
  • Providing centralized observability for cloud-native environments with unified dashboards

Conclusion: SigNoz offers a single, OpenTelemetry-native platform to observe modern applications through correlated signals, scalable storage, and flexible visualization and alerting capabilities. It emphasizes openness, data correlation, and end-to-end debugging across logs, metrics, and traces.

25.9k stars
2k forks
#6
Vector

Open-source observability pipeline to collect, transform, and route logs and metrics with a single, high-performance binary and programmable transforms.

Vector is an open-source, high-performance observability data pipeline for collecting, transforming, and routing logs and metrics. It is implemented as a single, memory-safe binary and supports agent, sidecar, and aggregator deployment modes.

Key Features

  • Built in Rust for memory safety and high throughput (single binary distribution).
  • Programmable transforms using the Vector Remap Language (VRL) for flexible data enrichment and parsing.
  • Wide list of first-class components: dozens of sources, transforms, and sinks (e.g., Kafka, S3, Elasticsearch, Prometheus integrations).
  • GraphQL API with a built-in playground for inspecting topology, metrics, and live queries.
  • Delivery and buffering guarantees designed for reliability in production pipelines.
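
In Vector itself, transforms are written in VRL. The Python sketch below only illustrates the kind of in-flight redaction such a transform performs; the event shape, the `redact` function, and the email pattern are assumptions made for this example.

```python
import re

# Deliberately simple email pattern, good enough for a sketch.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact(event):
    """Return a copy of the event with email addresses masked in `message`."""
    out = dict(event)
    out["message"] = EMAIL.sub("[REDACTED]", event["message"])
    return out

event = {"host": "web-1", "message": "login failed for bob@example.com"}
clean = redact(event)
```

Redacting before the data leaves the pipeline means downstream sinks never store the sensitive value.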

Use Cases

  • Centralize logs and metrics from heterogeneous systems and route them to vendors or long-term stores.
  • Perform in-pipeline enrichment, filtering, and redaction to improve data quality and privacy before export.
  • Replace or consolidate multiple agents/forwarders to reduce operational cost and complexity.

Limitations and Considerations

  • Metrics support is marked as beta; traces are indicated as forthcoming, so full unified telemetry coverage may be incomplete for some users.
  • Some advanced integrations and vendor-specific capabilities may require configuration tuning; large-scale deployments should validate topology and buffering settings for their workload.

Vector provides a compact, performant toolkit for observability pipelines focused on reliability, vendor neutrality, and powerful in-flight transforms. It is widely used in production and maintained by an active open-source community.

21.4k stars
2k forks
#7
Beszel

Beszel is a lightweight server monitoring platform with historical metrics, Docker/Podman container stats, configurable alerts, multi-user access, and an API.

Beszel is a lightweight server monitoring platform built around a central hub and per-host agents. It collects historical system metrics and container statistics and presents them in a simple web interface with alerting.

Key Features

  • Hub-and-agent architecture for monitoring multiple systems from a single dashboard
  • Historical metrics for host resources (CPU, memory, disk usage and I/O, network, load)
  • Docker and Podman container stats with per-container CPU, memory, and network history
  • Configurable alerts for CPU, memory, disk, bandwidth, temperature, load average, and system status
  • Multi-user support with admin sharing of monitored systems
  • OAuth2/OIDC authentication with optional password-auth disablement
  • Automatic backups with restore support, including S3-compatible storage targets
  • REST API for integrating metrics and management into scripts and applications

Use Cases

  • Homelab or small fleet monitoring with minimal resource overhead
  • Tracking server and container performance trends over time
  • Basic alerting for capacity and health signals (disk, bandwidth, temperature, uptime)

Beszel fits teams and individuals who want straightforward monitoring without the complexity of larger observability stacks. Its small footprint, container awareness, and built-in backups make it practical for self-managed environments.

19.6k stars
641 forks
#8
VictoriaMetrics

Fast, resource-efficient time series database compatible with Prometheus and Grafana, for scalable monitoring and long-term metrics storage.

VictoriaMetrics is a high-performance time series database designed for monitoring and observability workloads. It can act as long-term storage for Prometheus and integrates well with common metrics ecosystems such as Grafana.

Key Features

  • Single-node and clustered deployment options
  • Prometheus-compatible ingestion (including remote write) and querying, with support for PromQL and MetricsQL
  • Multi-protocol ingestion support, including Graphite, InfluxDB line protocol, OpenTSDB, CSV, and JSON line formats
  • High ingestion throughput and efficient storage compression for high-cardinality metrics
  • Stream aggregation for transforming and aggregating incoming metrics
  • Built-in features for operational safety such as relabeling and cardinality limiting
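
One of the ingestion formats listed above, the InfluxDB line protocol, is simple enough to build by hand. The sketch below formats a single point with float fields; it skips the escaping rules for spaces and commas, and the measurement, tag, and field names are hypothetical.

```python
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Format one point as `measurement,tag=value field=value timestamp`."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {ts_ns}"

line = to_line_protocol(
    "cpu", {"host": "web-1"}, {"usage": 42.5}, 1700000000000000000
)
```

A line like this would be sent to whichever line-protocol write endpoint the database exposes; check the VictoriaMetrics documentation for the exact URL and port.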

Use Cases

  • Cost-effective long-term storage backend for Prometheus metrics
  • Centralized metrics ingestion from many sources (Kubernetes, IoT, APM) with unified querying
  • High-volume telemetry storage and analytics where resource efficiency is critical

VictoriaMetrics is well-suited for teams that need a Prometheus-compatible TSDB with strong performance characteristics, flexible ingestion options, and scalable deployment models.

16.4k stars
1.6k forks
#9
WatchYourLAN

Self-hosted lightweight LAN IP/ARP scanner with web dashboard, new-host notifications, online/offline history, and metrics export to Prometheus or InfluxDB for Grafana.

WatchYourLAN is a lightweight network IP scanner that continuously discovers devices on your local network using ARP scanning and presents results in a web interface. It helps you keep an inventory of hosts, track availability over time, and get notified when new devices appear.

Key Features

  • Web GUI to view discovered hosts and their status
  • New-host detection with notifications via Shoutrrr-supported channels
  • Online/offline history tracking with configurable retention
  • Host list/inventory for the scanned interfaces
  • Optional metrics export via Prometheus endpoint and InfluxDB 2.x integration for Grafana dashboards
  • Supports SQLite by default, with optional PostgreSQL storage
  • Configurable via web UI, config file, or environment variables
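
New-host detection boils down to comparing each scan against a known inventory. The sketch below shows that comparison only; it is not WatchYourLAN's code, and the data structures and addresses are invented for the example.

```python
def new_hosts(known_macs, scan):
    """Return scan entries whose MAC address is not in the known inventory."""
    return [host for host in scan if host["mac"] not in known_macs]

known = {"aa:bb:cc:00:00:01"}
scan = [
    {"ip": "192.168.1.10", "mac": "aa:bb:cc:00:00:01"},
    {"ip": "192.168.1.42", "mac": "aa:bb:cc:00:00:02"},
]
fresh = new_hosts(known, scan)
```

Keying on the MAC address rather than the IP avoids false alarms when DHCP hands a known device a new address.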

Use Cases

  • Detect unknown devices joining a home or office LAN
  • Track device availability and troubleshoot intermittent connectivity
  • Feed LAN device presence metrics into Prometheus/InfluxDB and visualize in Grafana

Limitations and Considerations

  • No built-in authentication; access should be protected with a reverse proxy/SSO layer or network controls
  • Requires ARP scanning capabilities and typically needs host networking to scan the LAN effectively

WatchYourLAN is a practical choice for lightweight LAN discovery and basic intrusion awareness, with a simple UI and integrations for notifications and observability stacks. It fits well in homelabs and small networks where continuous device monitoring is needed.

6.8k stars
235 forks
#10
OneUptime

Self-hostable observability platform for uptime monitoring, alerting, incident management, on-call, status pages, logs, and APM in one integrated suite.

OneUptime is a self-hostable, open-source platform for monitoring and managing online services. It combines uptime monitoring, alerting and on-call, incident workflows, and customer-facing status pages, alongside broader observability capabilities.

Key Features

  • Uptime and response-time monitoring for websites and APIs with alerting
  • On-call scheduling and escalation policies
  • Incident management workflows (creation, assignment, updates, postmortems)
  • Public status pages to communicate outages and maintenance
  • Logs management with search and analysis
  • Application performance monitoring (metrics/traces-focused observability)
  • Integrations and workflow automation with external tools

Use Cases

  • Monitor production services and notify responders when availability or latency degrades
  • Run a structured incident response process with on-call rotations and escalation
  • Keep customers informed during outages via a hosted or self-managed status page

OneUptime is designed to replace multiple point solutions with a single integrated platform, helping teams reduce operational toil and respond to downtime more effectively.

6.5k stars
323 forks
#11
Zabbix

Zabbix is an open-source monitoring and observability platform for networks, servers, VMs, applications, and cloud infrastructure, with alerting and dashboards.

Zabbix is an enterprise-class, open-source distributed monitoring and observability solution for tracking performance and availability across IT and OT environments. It collects metrics from agents and agentless sources and provides centralized visibility, alerting, and reporting.

Key Features

  • Agent-based and agentless metric collection for servers, network devices, services, and applications
  • Automatic discovery and template-based monitoring for rapid onboarding
  • Real-time problem detection, correlation, and root-cause analysis workflows
  • Flexible alerting and notifications with multiple delivery channels and integrations
  • Dashboards and visualizations including graphs, maps, and topology views
  • Distributed monitoring for remote sites and large environments, including multi-tenant use
  • Built-in reporting, auditing, SLA calculations, and HTTP-based data streaming

Use Cases

  • Infrastructure monitoring for networks, servers, virtual machines, and container platforms
  • Application and service monitoring with proactive alerting and SLA tracking
  • Centralized observability for multi-site or managed service provider environments

Zabbix is a mature, scalable platform suited for organizations that need deep visibility across diverse systems with strong alerting and flexible data collection options. It can serve as a unified monitoring backbone for both small deployments and large, distributed environments.

5.7k stars
1.2k forks
#12
Pulse

Real-time monitoring dashboard for Proxmox, Docker/Podman, and Kubernetes with smart alerts, agent auto-discovery, metrics history, and optional AI insights.

Pulse is a unified monitoring platform that brings Proxmox (VE/PBS/PMG), Docker/Podman, and Kubernetes visibility into a single dashboard. It combines real-time health, historical metrics, and alerting, with optional AI-assisted insights for troubleshooting and root-cause analysis.

Key Features

  • Unified dashboard for nodes, VMs, containers, and Kubernetes workloads
  • Agent-based monitoring with platform auto-detection
  • Persistent metrics history with configurable retention
  • Smart alerting with webhook-based notifications and integrations
  • Proxmox-focused capabilities like backup visibility (PBS) and related infrastructure views
  • Optional AI assistant features for natural-language querying and alert/finding analysis
  • Security-oriented design including credential encryption at rest and scoped access
  • SSO support via OIDC for centralized authentication

Use Cases

  • Monitor a homelab or SMB stack running Proxmox plus Docker and/or Kubernetes
  • Consolidate multiple hosts/clusters into a “single pane of glass” dashboard
  • Reduce noisy alerting by correlating issues and investigating incidents faster

Pulse is well-suited for operators who want practical infrastructure monitoring without building a large, complex observability stack. Its unified agent and Proxmox-first focus make it particularly attractive for Proxmox-centric environments.

4.7k stars
195 forks
#13
dashdot

Dashdot is a modern server dashboard built with React and Node.js for real-time server monitoring on self-hosted systems.

Dashdot is a modern server dashboard designed for smaller private servers. It provides a real-time overview of host metrics and system status via a polished glassy UI.

Key Features

  • Real-time system metrics including CPU, memory, disk, and network usage presented in a responsive dashboard
  • Web-based UI built with React and Node.js, designed for easy self-hosted deployment
  • Docker-based quick-install with multi-architecture images (AMD64 and ARM)
  • Lightweight, glassmorphism design with customizable widgets
  • Comprehensive installation and configuration options documented on the official site
  • Live demo linked from the project's official repository

Use Cases

  • Monitoring small private servers and home labs
  • Observability of multiple VPS or private servers from a single dashboard
  • Quick onboarding for admins needing at-a-glance status of disks, networks, memory, and CPU

Limitations and Considerations

  • The speed test feature can consume significant bandwidth; you can reduce impact by adjusting the speed test interval via an environment variable as described in the installation docs

Conclusion

Dashdot provides real-time server metrics through a modern, self-hosted dashboard. It can be deployed via Docker and explored via a live demo; official docs cover installation and configuration.

3.4k stars
121 forks
#14
CheckCle

Self-hosted monitoring for servers, services, SSL, incidents and status pages; React frontend with a Go backend and embedded SQLite data store.

CheckCle is an open-source monitoring platform that provides real-time uptime, service and infrastructure monitoring, incident tracking, and alerts for full‑stack systems and applications. It combines a modern web UI with an embedded backend to offer status pages, distributed checks, and reports for operators and DevOps teams.

Key Features

  • Uptime and service monitoring for HTTP, DNS, Ping, and TCP-based services (API/SMTP/FTP etc.).
  • Distributed regional checks and incident history (UP/DOWN/WARNING/PAUSE) with maintenance scheduling.
  • SSL and domain monitoring (issuer, expiration date, days left, notifications).
  • Infrastructure server monitoring (agent-based metrics for CPU, RAM, disk, network) with one-line agent install; Windows support listed as beta.
  • Operational/public status pages, reports and analytics, and a web-based admin panel with user management, data retention and theme/language settings.
  • Built for containerized deployment: Docker Compose and single-container run commands provided for quick installs.
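
The "days left" figure in SSL monitoring is plain date arithmetic on the certificate's expiry timestamp. The sketch below assumes the expiry date has already been read from the certificate; the function name, dates, and 30-day warning threshold are illustrative, not CheckCle defaults.

```python
from datetime import datetime, timezone

def days_left(not_after, now):
    """Whole days remaining until the certificate expires (negative if expired)."""
    return (not_after - now).days

expiry = datetime(2025, 3, 1, tzinfo=timezone.utc)
checked_at = datetime(2025, 2, 1, tzinfo=timezone.utc)
remaining = days_left(expiry, checked_at)
should_warn = remaining <= 30  # hypothetical alert threshold
```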

Use Cases

  • DevOps and SRE teams needing an on‑premises/open‑source solution for uptime and incident tracking across services and regions.
  • Small to medium organizations wanting self-hosted status pages and SSL/expiration monitoring without third‑party SaaS.
  • Community projects, labs, or training environments that require a simple deployable monitoring stack with a web UI.

Limitations and Considerations

  • CheckCle uses an embedded PocketBase/SQLite datastore for persistence; this design simplifies deployment but can limit horizontal scaling and very high‑volume telemetry scenarios, so plan backups and host volume persistence accordingly.
  • Windows server/agent support is marked as beta in the documentation and may lack feature parity or production stability compared to Linux agents.
  • Built‑in notification integrations are focused on email, Telegram, Discord and Slack; larger ecosystems or enterprise integrations may require custom work.

In summary, CheckCle is a focused open‑source monitoring tool designed for teams that prefer a self‑hosted, container-friendly system with a React frontend and Go/embedded datastore backend. It is well suited for uptime, SSL, and server metric monitoring, with tradeoffs around embedded datastore scale and Windows agent maturity.

2.4k stars
156 forks
#15
Emoncms

Open-source web app to collect, process, store, and visualize energy, temperature, and other environmental time-series data with dashboards, graphs, and an API.

Emoncms is an open-source web application for processing, logging, and visualizing energy, temperature, and other environmental sensor data. It is part of the OpenEnergyMonitor ecosystem and is commonly used to build local energy monitoring and reporting systems.

Key Features

  • Input processing pipeline to transform, scale, filter, and route incoming measurements into stored feeds
  • Time-series feed storage optimized for sensor data logging, including built-in PHP-based engines (e.g., PHPFina and PHPTimeSeries)
  • Dashboards and advanced graphing via modular components (dashboard and graph modules)
  • HTTP API for posting data and querying feeds for integration with external devices and systems
  • Optional Redis buffering and processing to reduce disk writes and support certain input processors
  • CSV export and tools for backups/imports depending on installed modules
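
An input processing pipeline of the kind described above can be thought of as an ordered list of steps applied to each incoming reading. The sketch below is a generic model, not Emoncms code; the step names (`scale`, `offset`, `log_to_feed`) and the feed name are hypothetical.

```python
def run_pipeline(value, steps):
    """Apply (operation, argument) steps in order; return the value and feed writes."""
    writes = []
    for op, arg in steps:
        if op == "scale":
            value *= arg
        elif op == "offset":
            value += arg
        elif op == "log_to_feed":
            writes.append((arg, value))
        else:
            raise ValueError(f"unknown step: {op}")
    return value, writes

final, writes = run_pipeline(
    2300, [("scale", 0.5), ("offset", 10), ("log_to_feed", "house_power")]
)
```

Order matters: moving the logging step ahead of the scaling step would store the raw reading instead of the calibrated one.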

Use Cases

  • Home and building energy monitoring (electricity, solar PV, heat, hot water)
  • Logging and visualization of temperature, humidity, and other environmental metrics
  • Creating shareable dashboards for energy and sustainability reporting

Limitations and Considerations

  • Some features and workflows depend on optional modules and background workers; deployments without Redis may have reduced functionality for certain processors
  • Official installation guidance and testing focus on Linux environments (notably Debian/Ubuntu and Raspberry Pi OS)

Emoncms is a practical choice when you need a customizable, self-managed platform to ingest sensor readings, store them as time series, and present them through dashboards and graphs. Its API- and module-driven design makes it suitable for both DIY monitoring setups and more integrated energy data systems.

1.3k stars
534 forks
#16
Fitbit Fetch Script and InfluxDB Grafana Integration

Python service that pulls Fitbit health metrics via the Fitbit Web API, stores them in InfluxDB, and provides Grafana dashboards for long-term trend visualization.

A Python-based data collection service that retrieves personal health and activity metrics from the Fitbit Web API, writes them into a local InfluxDB time-series database, and visualizes the results in Grafana. It is designed for ongoing automatic syncing as well as historical backfilling to build long-term health trends.

Key Features

  • Automatic data collection from the Fitbit API with OAuth 2.0 token refresh
  • Stores metrics in InfluxDB for time-series analysis (best supported on InfluxDB 1.11)
  • Grafana dashboard support, including heatmaps and long-term trend panels
  • Collects a broad set of metrics such as heart rate (including intraday), steps, sleep, SpO2, HRV, breathing rate, activity minutes, and device battery
  • Historical backfilling mode designed to respect Fitbit rate limits and handle 429 responses
  • Docker Compose stack for running the fetcher, InfluxDB, and Grafana together
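
Handling 429 responses during a backfill usually means waiting and retrying rather than failing. The sketch below shows one way to do that with an injected `fetch` callable; it is not the project's code, and the `(status, retry_after, body)` tuple shape is an assumption made so the example runs without network access.

```python
import time

def fetch_with_backoff(fetch, sleep=time.sleep, max_attempts=5):
    """Call fetch() until it returns a non-429 status or attempts run out.

    `fetch` is a hypothetical callable returning (status, retry_after, body).
    """
    for attempt in range(max_attempts):
        status, retry_after, body = fetch()
        if status != 429:
            return body
        # Prefer the server's wait hint; otherwise back off exponentially.
        sleep(retry_after if retry_after is not None else 2 ** attempt)
    raise RuntimeError("rate limit still in effect after retries")

# Usage with a fake API that rate-limits the first two calls.
responses = iter([(429, 1, None), (429, None, None), (200, None, {"steps": 9000})])
waits = []
data = fetch_with_backoff(lambda: next(responses), sleep=waits.append)
```

Injecting `sleep` keeps the function testable; in real use the default `time.sleep` applies the waits.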

Use Cases

  • Personal health and fitness dashboard with long-term trends and daily summaries
  • Homelab time-series tracking of wearable metrics in InfluxDB with Grafana
  • Historical analysis by backfilling months/years of Fitbit data for reporting

Limitations and Considerations

  • Requires creating a Fitbit developer application and configuring OAuth tokens
  • InfluxDB 2.x support is described as limited and may produce a less detailed dashboard; InfluxDB 1.11 is strongly recommended
  • InfluxDB 3 OSS has query-time limitations that can make long-term visualization harder

It works well for users who want ownership of their Fitbit-derived metrics in their own database and prefer Grafana for visualization. The included schema and dashboards make it practical to deploy as a repeatable, automated pipeline.

828 stars
66 forks
#17
Nimtable

Lightweight web UI and REST control plane for exploring, inspecting, and managing Apache Iceberg catalogs and tables with Docker deployment and engine integrations.

Nimtable is a lightweight control plane and observability platform for Apache Iceberg lakehouses. It provides a browser-based console and REST API to browse catalog metadata, inspect table layouts, run ad-hoc metadata queries, and orchestrate maintenance tasks delegated to compute engines.

Key Features

  • Browser console to explore catalogs, schemas, tables, partitions, snapshots, and manifests
  • REST API and optional Iceberg REST Catalog endpoint for query engines
  • Run SQL from the browser for quick metadata inspection
  • Visualizations of file and snapshot distribution to surface optimization opportunities
  • Integrations to delegate compaction/maintenance to external engines (e.g., Spark, RisingWave)
  • Docker Compose deployment and PostgreSQL metadata storage by default

(Feature details and deployment guidance documented in the project README and RisingWave docs).

Use Cases

  • Inspect and troubleshoot Iceberg table metadata, snapshots, and file layout to find optimization targets
  • Operate and orchestrate compaction/maintenance jobs by delegating work to Spark, RisingWave, or other engines
  • Provide a standards-compliant Iceberg REST Catalog endpoint for query engines and interactive exploration

Limitations and Considerations

  • Fine-grained RBAC and advanced access-control features are listed as roadmap items and may be limited or absent in current releases
  • Caching, some monitoring/analytics features, and advanced scheduling/compaction strategies are planned but may not be production-complete

(Roadmap and known feature gaps are described in the repository documentation).

Nimtable is intended as a lightweight, developer-facing control plane to simplify catalog inspection and routine maintenance for Iceberg lakehouses. It is designed to be run alongside existing catalogs and compute engines and to provide a consolidated UI and REST API for metadata operations.

439 stars
23 forks
#18
Scraparr

Lightweight Prometheus exporter that exposes metrics from the *arr suite (Sonarr, Radarr, Lidarr, etc.) for monitoring and Grafana dashboards.

Scraparr is a Prometheus exporter that collects and exposes metrics from the *arr suite (Sonarr, Radarr, Lidarr and similar services). It provides a scrapeable HTTP metrics endpoint intended for integration with Prometheus and visualization with Grafana.

Key Features

  • Exposes detailed metrics for *arr services (requests, queue, backlog, import/scan status, per-series details when enabled)
  • Prometheus-compatible /metrics HTTP endpoint (default port 7100)
  • Configurable via config.yaml or environment variables; supports multiple service instances via config file aliases
  • Lightweight Python implementation with Docker and Docker Compose deployment options
  • Built for extensibility and community contributions
  • Suitable for integration into alerting and dashboarding stacks (Prometheus + Grafana)
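
An exporter's `/metrics` endpoint returns plain text in the Prometheus exposition format. The sketch below renders one gauge in that format; the metric name `arr_queue_items` and the `alias` label are invented for illustration and are not taken from Scraparr's actual output.

```python
def render_gauge(name, help_text, samples):
    """Render a gauge as Prometheus text exposition lines."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

text = render_gauge(
    "arr_queue_items",
    "Items in the download queue (hypothetical metric).",
    [({"alias": "sonarr"}, 3), ({"alias": "radarr"}, 0)],
)
```

Distinguishing instances with a label, as the alias mechanism does, lets several services share one metric name without colliding.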

Use Cases

  • Monitor health, API availability, and backlog of Sonarr/Radarr/Lidarr instances
  • Feed metrics into Prometheus for alerting on failed downloads, stalled queues, or connectivity issues
  • Provide a Grafana dashboard view of *arr performance and activity across multiple instances

Limitations and Considerations

  • Environment variables do not support configuring multiple instances; multiple services require the config.yaml with aliases to avoid metric name collisions
  • Requires proper API keys and reachable URLs for each *arr service; Docker variants may need host network adjustments for local service access
  • Community-maintained Helm and Unraid templates exist but may not be officially maintained by the project

Scraparr is a focused tool for exporting *arr application metrics to Prometheus. It is lightweight and configuration-driven, making it easy to add to existing monitoring stacks for visibility into media automation components.

372 stars
15 forks
#19
LogForge

Self-hosted Docker monitoring: real-time logs, per-container terminals, rules-based alerts and safe auto-remediation for developer teams.

LogForge is a developer-focused monitoring and alerting dashboard for Docker environments. It autodetects containers, streams live logs and provides UI-driven rules, notifications and safe remediation actions for containerised services.

Key Features

  • Automatic Docker service discovery and status (running, crashed, stopped)
  • Real-time log streaming and filtering per container
  • Interactive per-container terminal access and file system viewer
  • UI-driven Alert Engine with one-click rule templates and scoped rules
  • Safe auto-remediation (restart/stop/kill/start/run scripts) with cooldowns, backoff and verification delays
  • Multi-step actions and notification channels (Email, Slack, Discord, Telegram, Gotify and others)
  • Alert history, acknowledgement, duplicate-rule protection and noise controls (case sensitivity, AND/OR matches, ignore lists)
  • Test notifications, health/self-check endpoints and configurable container grouping
  • Docker Compose friendly deployment and minimal operational overhead
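
The cooldown guardrail mentioned above can be sketched as a small state machine that refuses to repeat an action too soon. This is an illustration of the idea, not LogForge's implementation; the class name, method name, and 60-second window are assumptions.

```python
class Remediator:
    """Guardrail sketch: refuse to repeat an action inside its cooldown window."""

    def __init__(self, cooldown_s):
        self.cooldown_s = cooldown_s
        self.last_run = {}  # container name -> time of last allowed action

    def should_restart(self, container, now):
        last = self.last_run.get(container)
        if last is not None and now - last < self.cooldown_s:
            return False  # still cooling down; skip to avoid restart loops
        self.last_run[container] = now
        return True

r = Remediator(cooldown_s=60)
decisions = [r.should_restart("api", t) for t in (0, 30, 61)]
```

Without a cooldown, a container that crashes on startup would be restarted in a tight loop, which is the failure mode this guardrail is meant to prevent.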

Use Cases

  • Local development and staging: tail container logs, open interactive shells, and diagnose crashes without SSH.
  • Small teams running Dockerized services: set up keyword- and event-based alerts to detect regressions and performance issues quickly.
  • Automated incident response: define safe, guardrailed remediation workflows to restart or run validated scripts when containers fail.

Limitations and Considerations

  • Core backend is source-available and interacts directly with the Docker socket; several non-core components (Alert Engine, Notifier and other tooling) are proprietary/restricted per the project's licensing notes.
  • Designed primarily for Docker-first workflows; integrations with large-scale observability stacks (e.g., Loki/ELK) may require additional tooling or customization.

LogForge provides a compact, self-hosted alternative to heavyweight observability stacks with an emphasis on developer workflows and safe automation. It is intended for teams that want quick visibility and guarded remediation for Docker container fleets.

285 stars
16 forks

Why choose an open source alternative?

  • Data ownership: Keep your data on your own servers
  • No vendor lock-in: Freedom to switch or modify at any time
  • Cost savings: Reduce or eliminate subscription fees
  • Transparency: Audit the code and know exactly what's running