Datadog

Best Self-hosted Alternatives to Datadog

A curated collection of the 12 best self-hosted alternatives to Datadog.

Datadog is a cloud monitoring and observability platform that collects metrics, logs, traces, and security signals from infrastructure and applications. It provides dashboards, alerts, APM, log management, synthetic monitoring, and analytics for incident response.

Alternatives List

#1
Netdata

Open-source, agent-based monitoring platform delivering per-second metrics, edge ML anomaly detection, tiered time-series storage and centralized cloud UI.

Netdata is an open-source, agent-based observability platform that collects, stores, and visualizes per-second metrics across infrastructure and applications. It combines a lightweight edge agent, a tiered time-series store, and optional centralized Cloud/Parent components for unified views and collaboration.

Key Features

  • Per-second, real-time metrics collection with millisecond responsiveness and auto-generated dashboards.
  • Edge-based machine learning: unsupervised anomaly detection and per-metric ML models running on the agent.
  • Tiered, high-efficiency time-series storage (compact samples, ZSTD compression) with configurable retention and archiving.
  • Distributed Parent–Child streaming pipeline for horizontal scaling, multi-node aggregation, and long-term retention.
  • Broad integrations (800+ collectors) and export/archival targets including Prometheus, InfluxDB, OpenTSDB, and Graphite.
  • Low resource footprint (designed for minimal CPU/RAM impact) and zero-configuration auto-discovery on supported platforms.
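The tiered storage idea in the list above can be illustrated with a minimal sketch, assuming each higher tier keeps a (min, max, average) summary for every fixed-size window of lower-tier samples. This is a generic downsampling illustration and not Netdata's actual storage engine; the function name and window size are hypothetical.

```python
def downsample(samples, window):
    """Collapse consecutive `window`-sized groups of samples into
    (min, max, average) summaries, as a coarser tier might store them."""
    summaries = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        summaries.append((min(chunk), max(chunk), sum(chunk) / len(chunk)))
    return summaries


# Hypothetical per-second CPU readings rolled up into 2-second summaries.
per_second = [1.0, 3.0, 2.0, 6.0]
coarse_tier = downsample(per_second, window=2)
```

Keeping min and max alongside the average means short spikes stay visible even after the raw per-second points have been discarded.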

Use Cases

  • Infrastructure and system monitoring: per-second visibility into CPU, memory, disks, network, sensors, and kernel metrics.
  • Container and Kubernetes observability: native containerd/Docker and Kubernetes integrations for pod, node, and cluster troubleshooting.
  • Incident troubleshooting and AIOps: anomaly detection, root-cause analysis, blast-radius identification, and automated reporting to accelerate incident resolution.

Limitations and Considerations

  • The Netdata UI and Netdata Cloud components are delivered as closed-source offerings while the Agent is open-source; organizations requiring fully open-source stacks should evaluate this split.
  • OpenTelemetry support is noted as "coming soon" in documentation; users relying heavily on OpenTelemetry may need to plan integrations or use exporters.
  • Feature parity varies by platform (Linux has the most comprehensive coverage); some platform-specific collectors or deep kernel metrics are not available everywhere.

Netdata offers a high-resolution, low-overhead approach to full-stack monitoring with built-in ML and flexible scaling via Parents and Netdata Cloud. It is well-suited for teams needing real-time troubleshooting, container/Kubernetes visibility, and efficient time-series retention while weighing the tradeoffs of closed-source UI/cloud components.

77.9k stars
6.4k forks

#2
Grafana

Grafana is an open source observability and data visualization platform for querying, graphing, and alerting on metrics, logs, and traces across many data sources.

Grafana is an open source observability and data visualization platform for querying, visualizing, and alerting on metrics, logs, and traces across many backends. It provides interactive dashboards and exploration workflows so teams can monitor systems and troubleshoot issues from a single interface.

Key Features

  • Dashboards with flexible visualizations and templating for reusable views
  • Explore workflows for ad-hoc querying and drilldowns across time ranges and data sources
  • Unified alerting with rule evaluation and multi-channel notifications
  • Pluggable data source and panel ecosystem to integrate with many metrics, log, and trace systems
  • Sharing and collaboration features for teams (dashboards, annotations, and permissions)

Use Cases

  • Infrastructure and Kubernetes monitoring using time-series backends
  • Centralized log exploration and correlation with metrics for incident response
  • Application observability by visualizing traces and service performance trends

Limitations and Considerations

  • The experience and capabilities depend heavily on the chosen data sources and plugins
  • Operating at very large scale can require careful tuning of storage backends and dashboard/query design

Grafana is well-suited for organizations that want a single “pane of glass” across diverse telemetry sources. Its extensible plugin model and alerting make it a common foundation for observability stacks in both homelabs and enterprise environments.

72.4k stars
13.5k forks

#3
Prometheus

Prometheus is an open-source monitoring and time-series database for collecting metrics, querying with PromQL, and alerting on system and application health.

Prometheus is an open-source systems and service monitoring platform built around a time-series database. It collects metrics from instrumented targets, lets you query them with PromQL, and supports alerting based on rules.

Key Features

  • Multi-dimensional time series data model using labels for flexible filtering and aggregation
  • PromQL query language for ad-hoc analysis, dashboards, and alert conditions
  • Pull-based metric scraping over HTTP with support for static configs and service discovery
  • Alert rule evaluation with alert generation (commonly paired with Alertmanager)
  • Federation support for hierarchical and cross-environment aggregation
  • Remote write/read integrations for long-term storage and interoperability
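To make the label-based data model in the list above concrete, here is a toy in-memory sketch. It is not Prometheus code and does not use any Prometheus API; the metric name, labels, values, and helper functions are all hypothetical. It only mimics what a PromQL expression such as sum by (job) (http_requests_total{status="500"}) does conceptually: filter series by label, then aggregate by another label.

```python
from collections import defaultdict

# Hypothetical samples: (metric name, label set, value).
SAMPLES = [
    ("http_requests_total", {"job": "api", "status": "200"}, 120.0),
    ("http_requests_total", {"job": "api", "status": "500"}, 3.0),
    ("http_requests_total", {"job": "web", "status": "200"}, 80.0),
    ("http_requests_total", {"job": "web", "status": "500"}, 1.0),
]


def select(samples, name, **matchers):
    """Keep samples with the given name whose labels equal every matcher."""
    return [
        (n, labels, value)
        for n, labels, value in samples
        if n == name and all(labels.get(k) == want for k, want in matchers.items())
    ]


def sum_by(samples, label):
    """Sum sample values grouped by a single label."""
    totals = defaultdict(float)
    for _, labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


errors_per_job = sum_by(select(SAMPLES, "http_requests_total", status="500"), "job")
```

Because every series is identified by its name plus its labels, the same data can be sliced by job, status, or any other dimension without changing how it was collected.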

Use Cases

  • Monitoring Kubernetes clusters and cloud-native services via dynamic service discovery
  • Application and infrastructure telemetry for SRE/DevOps dashboards and alerting
  • Central metrics collection for microservices, exporters, and batch jobs (via the Pushgateway pattern)

Limitations and Considerations

  • Built-in storage is optimized for a single-node TSDB; long-term retention and global scale typically require external remote storage integrations

Prometheus is a strong fit when you want a reliable, standards-based metrics platform with powerful querying and a broad ecosystem of exporters and integrations. It is widely used for cloud-native monitoring and alert-driven operations.

62.9k stars
10.2k forks

#4
Grafana Loki

Grafana Loki is a Prometheus-inspired log aggregation system that indexes labels (not log contents) for cost-effective storage and fast querying, with Grafana integration.

Grafana Loki is a horizontally scalable, highly available log aggregation system inspired by Prometheus. It stores logs efficiently by indexing only metadata labels for each log stream, rather than performing full-text indexing.

Key Features

  • Label-based log indexing and querying aligned with Prometheus-style labels
  • Horizontally scalable architectures (single binary or microservices) with multi-tenancy support
  • Cost-efficient storage by keeping logs compressed and indexing only metadata
  • Native integration with Grafana for exploration, dashboards, and correlation with metrics
  • Multiple ingestion options via agents and clients (including Grafana Alloy and legacy Promtail)
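The label-only indexing approach described above can be sketched with a toy store, assuming that log streams are keyed by their label set and that line contents are scanned only after streams have been selected by label. This is an illustration of the idea, not Loki's implementation; the class, method names, and sample log lines are hypothetical.

```python
from collections import defaultdict


class ToyLogStore:
    """Indexes streams by label set only; log contents are never indexed."""

    def __init__(self):
        # Key: frozenset of (label, value) pairs. Value: list of raw log lines.
        self._streams = defaultdict(list)

    def push(self, labels, line):
        self._streams[frozenset(labels.items())].append(line)

    def query(self, matchers, contains=""):
        """Select streams whose labels include all matchers, then scan
        the selected lines for a substring (a brute-force, grep-like filter)."""
        wanted = set(matchers.items())
        hits = []
        for label_set, lines in self._streams.items():
            if wanted <= label_set:
                hits.extend(line for line in lines if contains in line)
        return hits


store = ToyLogStore()
store.push({"app": "api", "env": "prod"}, "GET /health 200")
store.push({"app": "api", "env": "prod"}, "GET /orders 500 timeout")
store.push({"app": "web", "env": "prod"}, "GET / 500")
api_errors = store.query({"app": "api"}, contains="500")
```

The index stays small because it grows with the number of distinct label sets rather than with log volume; the trade-off is that content filters have to scan the lines of every selected stream.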

Use Cases

  • Centralized aggregation of Kubernetes and container logs with label-based filtering
  • Incident investigation by correlating metrics and logs using shared labels
  • Multi-team or multi-environment log collection with tenant isolation

Limitations and Considerations

  • Not designed for full-text indexing; queries are primarily optimized around labels and structured metadata

Loki is a strong fit when you want an operationally simpler, Prometheus-like approach to logs with efficient storage and fast label-based queries. It is commonly deployed as part of a Grafana-centric observability stack for monitoring and troubleshooting.

27.7k stars
3.9k forks

#5
SigNoz

SigNoz is an open-source platform that collects and correlates logs, metrics, and traces using OpenTelemetry for unified observability.

SigNoz is an open-source observability platform designed to collect, store, and visualize logs, metrics, and traces in a single interface. Built on OpenTelemetry, SigNoz enables correlated signals and unified dashboards, with ClickHouse as the backing datastore.

Key Features

  • Unified observability across logs, metrics, and traces
  • OpenTelemetry-native ingestion with semantic conventions
  • ClickHouse-backed log storage for fast queries
  • Query builder, PromQL support, and flexible dashboards
  • Alerts across signals with anomaly detection capabilities
  • Tracing visuals including flamegraphs and detailed span views

Use Cases

  • Instrumenting applications with OpenTelemetry to achieve end-to-end visibility across services
  • Correlating logs, metrics, and traces to troubleshoot microservices and distributed systems
  • Providing centralized observability for cloud-native environments with unified dashboards

SigNoz offers a single, OpenTelemetry-native platform for observing modern applications through correlated signals, scalable storage, and flexible visualization and alerting. It emphasizes openness, data correlation, and end-to-end debugging across logs, metrics, and traces.

25.9k stars
2k forks

#6
VictoriaMetrics

Fast, resource-efficient time series database compatible with Prometheus and Grafana, for scalable monitoring and long-term metrics storage.

VictoriaMetrics is a high-performance time series database designed for monitoring and observability workloads. It can act as long-term storage for Prometheus and integrates well with common metrics ecosystems such as Grafana.

Key Features

  • Single-node and clustered deployment options
  • Prometheus-compatible ingestion (including remote write) and querying, with support for PromQL and MetricsQL
  • Multi-protocol ingestion support, including Graphite, InfluxDB line protocol, OpenTSDB, CSV, and JSON line formats
  • High ingestion throughput and efficient storage compression for high-cardinality metrics
  • Stream aggregation for transforming and aggregating incoming metrics
  • Built-in features for operational safety such as relabeling and cardinality limiting
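As an illustration of one of the ingestion formats listed above, the sketch below builds a single InfluxDB line protocol record. It assumes float-only field values and tag and field names that need no escaping; the helper name and the sample measurement are hypothetical, and sending the line to a server is deliberately left out.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one record as: measurement,tag=value field=value timestamp.

    Simplifying assumptions: all fields are floats, and no name or value
    contains spaces, commas, or equals signs that would need escaping.
    """
    tag_part = "".join(f",{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(
        f"{key}={float(value)}" for key, value in sorted(fields.items())
    )
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


line = to_line_protocol(
    "cpu", {"host": "node1"}, {"usage": 42.5}, 1700000000000000000
)
```

Tags play the role that labels play in Prometheus, which is what lets a multi-protocol store query data from different sources in a uniform way.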

Use Cases

  • Cost-effective long-term storage backend for Prometheus metrics
  • Centralized metrics ingestion from many sources (Kubernetes, IoT, APM) with unified querying
  • High-volume telemetry storage and analytics where resource efficiency is critical

VictoriaMetrics is well-suited for teams that need a Prometheus-compatible TSDB with strong performance characteristics, flexible ingestion options, and scalable deployment models.

16.4k stars
1.6k forks

#7
Pulse

Real-time monitoring dashboard for Proxmox, Docker/Podman, and Kubernetes with smart alerts, agent auto-discovery, metrics history, and optional AI insights.

Pulse is a unified monitoring platform that brings Proxmox (VE/PBS/PMG), Docker/Podman, and Kubernetes visibility into a single dashboard. It combines real-time health, historical metrics, and alerting, with optional AI-assisted insights for troubleshooting and root-cause analysis.

Key Features

  • Unified dashboard for nodes, VMs, containers, and Kubernetes workloads
  • Agent-based monitoring with platform auto-detection
  • Persistent metrics history with configurable retention
  • Smart alerting with webhook-based notifications and integrations
  • Proxmox-focused capabilities like backup visibility (PBS) and related infrastructure views
  • Optional AI assistant features for natural-language querying and alert/finding analysis
  • Security-oriented design including credential encryption at rest and scoped access
  • SSO support via OIDC for centralized authentication

Use Cases

  • Monitor a homelab or SMB stack running Proxmox plus Docker and/or Kubernetes
  • Consolidate multiple hosts/clusters into a “single pane of glass” dashboard
  • Reduce noisy alerting by correlating issues and investigating incidents faster

Pulse is well-suited for operators who want practical infrastructure monitoring without building a large, complex observability stack. Its unified agent and Proxmox-first focus make it particularly attractive for Proxmox-centric environments.

4.7k stars
195 forks

#8
Parseable

Parseable ingests, analyzes, and extracts insights from MELT telemetry data with predictive analytics and a unified SQL/NL querying interface.

Parseable is a full-stack observability platform built to ingest, analyze and extract insights from all types of telemetry (MELT) data. It can run locally, in the cloud, or as a managed service, providing a unified way to explore signals across the stack.

Key Features

  • Unified signals across MELT data for a single source of truth
  • Predictive analytics and anomaly forecasting to anticipate issues
  • Natural language and SQL querying across telemetry
  • Hybrid execution engine with columnar storage and indexing for fast queries
  • Granular access control and federated IAM
  • Open standards and vendor-neutral design (OTel, Parquet compatibility)
  • Cloud-ready with BYOC options

Use Cases

  • Full-stack observability of applications, databases, infrastructure and networks
  • AI workloads observability for telemetry from AI models and LLMs
  • Product observability to analyze user behavior, feature adoption, and performance

Parseable provides predictive observability with a unified data model, enabling faster insights and proactive incident response across the full telemetry stack.

2.3k stars
159 forks

#9
Kubetail

Kubetail is a real-time Kubernetes logging dashboard and CLI that merges multi-container workload logs into a single timeline, running on desktop or inside your cluster.

Kubetail is a real-time logging dashboard for Kubernetes, optimized for tailing logs across multi-container workloads. It merges container logs into a single chronological timeline and can be used from a web UI or directly in the terminal.

Key Features

  • Merge logs from all containers in a workload (e.g., Deployments, DaemonSets, StatefulSets, CronJobs) into one unified timeline
  • Real-time streaming in a browser dashboard or via a CLI output mode
  • Filtering by workload, absolute/relative time range, node properties, and grep-style searching
  • Tracks container lifecycle changes to keep the log stream consistent as pods/containers are replaced
  • Uses the Kubernetes API to fetch logs directly (no requirement to forward logs to an external service)
  • Can run locally on a desktop or be installed into a cluster
  • Desktop mode supports switching between multiple clusters
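Merging several container logs into one chronological timeline, as described above, can be sketched as a k-way merge. This is not Kubetail's code; it assumes each container's log is already ordered by timestamp and that timestamps are RFC 3339 strings in one consistent UTC format, so that they sort correctly as plain text. The function name and sample entries are hypothetical.

```python
import heapq


def merge_timelines(*streams):
    """Merge per-container logs into one timeline ordered by timestamp.

    Each stream is an iterable of (timestamp, container, message) tuples
    that is already sorted by timestamp.
    """
    return list(heapq.merge(*streams, key=lambda entry: entry[0]))


app_log = [
    ("2024-01-01T00:00:01Z", "app", "starting"),
    ("2024-01-01T00:00:04Z", "app", "ready"),
]
sidecar_log = [
    ("2024-01-01T00:00:02Z", "sidecar", "proxy up"),
]
timeline = merge_timelines(app_log, sidecar_log)
```

heapq.merge consumes its inputs lazily, so the same approach also suits live streams where new lines keep arriving; the list() call here is only for demonstration.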

Use Cases

  • Debugging production incidents by tailing logs across multiple pods and containers in real time
  • Following request flows across ephemeral containers during rollouts or autoscaling events
  • Day-to-day Kubernetes workload troubleshooting without setting up a full log shipping pipeline

Limitations and Considerations

  • Primarily focused on real-time tailing; historic log retention and advanced analytics depend on additional components and are still evolving

Kubetail provides a practical, privacy-friendly way to explore Kubernetes logs in real time using a polished dashboard and CLI. It is well-suited for teams that want immediate visibility into workload logs without introducing a separate logging backend.

1.6k stars
111 forks

#10
Nimtable

Lightweight web UI and REST control plane for exploring, inspecting, and managing Apache Iceberg catalogs and tables with Docker deployment and engine integrations.

Nimtable is a lightweight control plane and observability platform for Apache Iceberg lakehouses. It provides a browser-based console and REST API to browse catalog metadata, inspect table layouts, run ad-hoc metadata queries, and orchestrate maintenance tasks delegated to compute engines.

Key Features

  • Browser console to explore catalogs, schemas, tables, partitions, snapshots, and manifests
  • REST API and optional Iceberg REST Catalog endpoint for query engines
  • Run SQL from the browser for quick metadata inspection
  • Visualizations of file and snapshot distribution to surface optimization opportunities
  • Integrations to delegate compaction/maintenance to external engines (e.g., Spark, RisingWave)
  • Docker Compose deployment and PostgreSQL metadata storage by default

(Feature details and deployment guidance are documented in the project README and the RisingWave docs.)

Use Cases

  • Inspect and troubleshoot Iceberg table metadata, snapshots, and file layout to find optimization targets
  • Operate and orchestrate compaction/maintenance jobs by delegating work to Spark, RisingWave, or other engines
  • Provide a standards-compliant Iceberg REST Catalog endpoint for query engines and interactive exploration

Limitations and Considerations

  • Fine-grained RBAC and advanced access-control features are listed as roadmap items and may be limited or absent in current releases
  • Caching, some monitoring/analytics features, and advanced scheduling/compaction strategies are planned but may not be production-complete

(The roadmap and known feature gaps are described in the repository documentation.)

Nimtable is intended as a lightweight, developer-facing control plane to simplify catalog inspection and routine maintenance for Iceberg lakehouses. It is designed to be run alongside existing catalogs and compute engines and to provide a consolidated UI and REST API for metadata operations.

439 stars
23 forks

#11
Scraparr

Lightweight Prometheus exporter that exposes metrics from the *arr suite (Sonarr, Radarr, Lidarr, etc.) for monitoring and Grafana dashboards.

Scraparr is a Prometheus exporter that collects and exposes metrics from the *arr suite (Sonarr, Radarr, Lidarr and similar services). It provides a scrapeable HTTP metrics endpoint intended for integration with Prometheus and visualization with Grafana.

Key Features

  • Exposes detailed metrics for *arr services (requests, queue, backlog, import/scan status, per-series details when enabled)
  • Prometheus-compatible /metrics HTTP endpoint (default port 7100)
  • Configurable via config.yaml or environment variables; supports multiple service instances via config file aliases
  • Lightweight Python implementation with Docker and Docker Compose deployment options
  • Built for extensibility and community contributions
  • Suitable for integration into alerting and dashboarding stacks (Prometheus + Grafana)
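To show what a Prometheus-compatible /metrics endpoint returns, here is a minimal parser for a subset of the text exposition format. It is a sketch under stated assumptions: it ignores timestamps and escaped characters in label values, and the metric names and the alias label in the sample are hypothetical, not Scraparr's actual output.

```python
import re

# Matches: name{label="value",...} value   (the label block is optional).
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')

# Hypothetical sample of exposition text.
SAMPLE = """\
# HELP example_queue_total Hypothetical number of queued items
# TYPE example_queue_total gauge
example_queue_total{alias="sonarr-main"} 4
example_up 1
"""


def parse_metrics(text):
    """Return a list of (name, labels, value), skipping comments and blanks."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue  # Outside the subset this sketch understands.
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


parsed = parse_metrics(SAMPLE)
```

In a real setup Prometheus does this parsing itself when it scrapes the exporter; a script like this is mainly useful for a quick manual check of what the endpoint exposes.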

Use Cases

  • Monitor health, API availability, and backlog of Sonarr/Radarr/Lidarr instances
  • Feed metrics into Prometheus for alerting on failed downloads, stalled queues, or connectivity issues
  • Provide a Grafana dashboard view of *arr performance and activity across multiple instances

Limitations and Considerations

  • Environment variables do not support configuring multiple instances; multiple services require the config.yaml with aliases to avoid metric name collisions
  • Requires proper API keys and reachable URLs for each *arr service; Docker variants may need host network adjustments for local service access
  • Community-maintained Helm and Unraid templates exist but may not be officially maintained by the project

Scraparr is a focused tool for exporting *arr application metrics to Prometheus. It is lightweight and configuration-driven, making it easy to add to existing monitoring stacks for visibility into media automation components.

372 stars
15 forks

#12
GlitchTip

Open-source error tracking, performance monitoring and uptime checks compatible with Sentry SDKs; available self-hosted or as a hosted SaaS.

GlitchTip is an open-source error tracking and observability platform that implements a Sentry-compatible intake API. It provides error aggregation, basic APM-style transaction visibility, and uptime monitoring via a Django backend paired with an Angular frontend.

Key Features

  • Sentry-compatible event intake allowing existing Sentry client SDKs to report errors and transactions.
  • Error aggregation and issue grouping with searchable issue lists and event details.
  • Application performance monitoring that surfaces slow requests, database calls, and transaction traces.
  • Uptime monitoring (ping-style checks) with alerts delivered via email or webhooks.
  • Deployable with Docker and Docker Compose; a Kubernetes Helm chart is available for cluster installs.
  • Backend built on Django with worker tasks via Celery; PostgreSQL is the primary data store.
  • Optional cache/message broker usage of Valkey/Redis for improved performance and Celery brokering.
  • Hosted SaaS offering available alongside comprehensive self-hosting docs and Docker images.
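Sentry-compatible intake means an SDK only needs a DSN to find where to send events. The sketch below derives the envelope endpoint from a DSN following the general Sentry DSN layout (scheme://public_key@host/project_id). It assumes a self-hosted instance follows that same layout; the host, key, and project ID are hypothetical, and in practice the Sentry SDK does this internally.

```python
from urllib.parse import urlsplit


def envelope_endpoint(dsn):
    """Return (envelope_url, public_key) for a Sentry-style DSN."""
    parts = urlsplit(dsn)
    # The last path segment is the project ID; anything before it is a prefix.
    prefix, _, project_id = parts.path.rstrip("/").rpartition("/")
    host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    url = f"{parts.scheme}://{host}{prefix}/api/{project_id}/envelope/"
    return url, parts.username


# Hypothetical DSN for a self-hosted instance.
url, public_key = envelope_endpoint("https://abc123@glitchtip.example.com/7")
```

Because the endpoint is derived entirely from the DSN, switching an application between backends that accept this format is typically a matter of changing one configuration value.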

Use Cases

  • Centralize and triage runtime exceptions and stack traces from web and mobile apps using existing Sentry SDKs.
  • Monitor web application latency and identify slow endpoints and database calls for performance troubleshooting.
  • Keep track of site uptime with scheduled pings and receive alerts when endpoints fail to respond.

Limitations and Considerations

  • Some enterprise SSO workflows (notably multi-tenant SAML SSO) remain an area of ongoing discussion and work. Social and OAuth providers are supported via django-allauth, but full multi-tenant SAML support is not yet standard.
  • For larger deployments, Valkey/Redis is recommended for Celery brokering, caching, and sessions; Postgres-only mode is experimental and may yield lower performance.
  • Feature parity with commercial Sentry varies; a few advanced grouping, fingerprinting and analytics features are under active development or improvement.

GlitchTip is suited for teams that need a budget-friendly, open-source alternative for error tracking and basic observability while retaining compatibility with Sentry client tooling. It supports both small single-server installs and larger containerized deployments with documented configuration and upgrade paths.

Why choose an open-source alternative?

  • Data ownership: Keep your data on your own servers
  • No vendor lock-in: Freedom to switch or modify at any time
  • Cost savings: Reduce or eliminate subscription fees
  • Transparency: Audit the code and know exactly what's running