Best Self-hosted Alternatives to Datadog APM

A curated collection of the 11 best self-hosted alternatives to Datadog APM.

Datadog APM provides distributed tracing and application performance monitoring, collecting traces, service metrics, latency and error data, visualizing request flows, and enabling root-cause analysis across microservices and underlying infrastructure.

Alternatives List

#1
Grafana

Grafana is an open source observability and data visualization platform for querying, graphing, and alerting on metrics, logs, and traces across many data sources.

Grafana is an open source observability and data visualization platform for querying, visualizing, and alerting on metrics, logs, and traces across many backends. It provides interactive dashboards and exploration workflows so teams can monitor systems and troubleshoot issues from a single interface.

Key Features

  • Dashboards with flexible visualizations and templating for reusable views
  • Explore workflows for ad-hoc querying and drilldowns across time ranges and data sources
  • Unified alerting with rule evaluation and multi-channel notifications
  • Pluggable data source and panel ecosystem to integrate with many metrics, log, and trace systems
  • Sharing and collaboration features for teams (dashboards, annotations, and permissions)

Use Cases

  • Infrastructure and Kubernetes monitoring using time-series backends
  • Centralized log exploration and correlation with metrics for incident response
  • Application observability by visualizing traces and service performance trends

Limitations and Considerations

  • The experience and capabilities depend heavily on the chosen data sources and plugins
  • Operating at very large scale can require careful tuning of storage backends and dashboard/query design

Grafana is well-suited for organizations that want a single “pane of glass” across diverse telemetry sources. Its extensible plugin model and alerting make it a common foundation for observability stacks in both homelabs and enterprise environments.

72.4k stars
13.5k forks
#2
Prometheus

Prometheus is an open-source monitoring and time-series database for collecting metrics, querying with PromQL, and alerting on system and application health.

Prometheus is an open-source systems and service monitoring platform built around a time-series database. It collects metrics from instrumented targets, lets you query them with PromQL, and supports alerting based on rules.

Key Features

  • Multi-dimensional time series data model using labels for flexible filtering and aggregation
  • PromQL query language for ad-hoc analysis, dashboards, and alert conditions
  • Pull-based metric scraping over HTTP with support for static configs and service discovery
  • Alert rule evaluation with alert generation (commonly paired with Alertmanager)
  • Federation support for hierarchical and cross-environment aggregation
  • Remote write/read integrations for long-term storage and interoperability
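As a rough illustration of the label-based data model, the sketch below renders samples in the Prometheus text exposition format and aggregates them by one label, similar in spirit to PromQL's `sum by (...)`. The function names and sample data are made up for this example and are not part of any Prometheus client library.

```python
def escape_label_value(value):
    # The text exposition format escapes backslash, double quote and newline
    # inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name, labels, value):
    """Render one sample line, e.g. name{k="v"} 1 (hypothetical helper)."""
    if not labels:
        return f"{name} {value}"
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


def sum_by(samples, label):
    """Sum sample values grouped by a single label (illustrative only)."""
    totals = {}
    for labels, value in samples:
        key = labels.get(label, "")
        totals[key] = totals.get(key, 0.0) + value
    return totals


if __name__ == "__main__":
    # Hypothetical series: one metric name, several label combinations.
    samples = [
        ({"service": "api", "code": "200"}, 10.0),
        ({"service": "api", "code": "500"}, 2.0),
        ({"service": "web", "code": "200"}, 5.0),
    ]
    for labels, value in samples:
        print(render_sample("http_requests_total", labels, value))
    print(sum_by(samples, "service"))
```

Each distinct label combination is its own time series, which is why filtering and aggregation can be expressed purely in terms of labels.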

Use Cases

  • Monitoring Kubernetes clusters and cloud-native services via dynamic service discovery
  • Application and infrastructure telemetry for SRE/DevOps dashboards and alerting
  • Central metrics collection for microservices, batch jobs (via the Pushgateway), and exporters

Limitations and Considerations

  • Built-in storage is optimized for a single-node TSDB; long-term retention and global scale typically require external remote storage integrations

Prometheus is a strong fit when you want a reliable, standards-based metrics platform with powerful querying and a broad ecosystem of exporters and integrations. It is widely used for cloud-native monitoring and alert-driven operations.

62.9k stars
10.2k forks
#3
Sentry

Sentry is a developer-focused platform for error tracking, performance monitoring, and tracing to help teams detect, investigate, and fix issues faster.

Sentry is a debugging platform that helps developers detect, trace, and fix application issues by connecting errors with performance and runtime context. It supports many SDKs and integrates with common development workflows to speed up investigation and resolution.

Key Features

  • Error and exception aggregation with stack traces and release context
  • Application Performance Monitoring (APM) with distributed tracing and transaction breakdowns
  • Alerting and issue triage tools to prioritize impactful problems
  • Source code and deployment context support (for example commits and releases)
  • Broad SDK ecosystem across languages and frameworks for capturing events and traces

Use Cases

  • Monitor production applications for crashes and regressions after releases
  • Investigate latency and bottlenecks using traces and transaction performance data
  • Centralize error reporting across multi-service, multi-language environments

Limitations and Considerations

  • Full-feature deployments typically require multiple components and supporting services, increasing operational complexity

Sentry is well-suited for teams that want a single platform to correlate errors, traces, and performance signals. It provides actionable context to reduce time-to-diagnosis and improve application reliability.

43.2k stars
4.6k forks
#4
SigNoz

SigNoz is an open-source platform that collects and correlates logs, metrics, and traces using OpenTelemetry for unified observability.

SigNoz is an open-source observability platform designed to collect, store, and visualize logs, metrics, and traces in a single interface. Built on OpenTelemetry, SigNoz enables correlated signals and unified dashboards, with ClickHouse serving as the log datastore.

Key Features

  • Unified observability across logs, metrics, and traces
  • OpenTelemetry-native ingestion with semantic conventions
  • ClickHouse-backed log storage for fast queries
  • DIY query builder, PromQL support, and flexible dashboards
  • Alerts across signals with anomaly detection capabilities
  • Tracing visuals including flamegraphs and detailed span views

Use Cases

  • Instrumenting applications with OpenTelemetry to achieve end-to-end visibility across services
  • Correlating logs, metrics, and traces to troubleshoot microservices and distributed systems
  • Providing centralized observability for cloud-native environments with unified dashboards

SigNoz offers a single, OpenTelemetry-native platform to observe modern applications through correlated signals, scalable storage, and flexible visualization and alerting capabilities. It emphasizes openness, data correlation, and end-to-end debugging across logs, metrics, and traces.

25.9k stars
2k forks
#5
OneUptime

Self-hostable observability platform for uptime monitoring, alerting, incident management, on-call, status pages, logs, and APM in one integrated suite.

OneUptime is a self-hostable, open-source platform for monitoring and managing online services. It combines uptime monitoring, alerting and on-call, incident workflows, and customer-facing status pages, alongside broader observability capabilities.

Key Features

  • Uptime and response-time monitoring for websites and APIs with alerting
  • On-call scheduling and escalation policies
  • Incident management workflows (creation, assignment, updates, postmortems)
  • Public status pages to communicate outages and maintenance
  • Logs management with search and analysis
  • Application performance monitoring (metrics/traces-focused observability)
  • Integrations and workflow automation with external tools

Use Cases

  • Monitor production services and notify responders when availability or latency degrades
  • Run a structured incident response process with on-call rotations and escalation
  • Keep customers informed during outages via a hosted or self-managed status page

OneUptime is designed to replace multiple point solutions with a single integrated platform, helping teams reduce operational toil and respond to downtime more effectively.

6.5k stars
323 forks
#6
Parseable

Parseable ingests, analyzes, and extracts insights from MELT telemetry data with predictive analytics and a unified SQL/NL querying interface.

Parseable is a full-stack observability platform built to ingest, analyze, and extract insights from all types of MELT (metrics, events, logs, and traces) telemetry data. It can run locally, in the cloud, or as a managed service, providing a unified way to explore signals across the stack.

Key Features

  • Unified signals across MELT data for a single source of truth
  • Predictive analytics and anomaly forecasting to anticipate issues
  • Natural language and SQL querying across telemetry
  • Hybrid execution engine with columnar storage and indexing for fast queries
  • Granular access control and federated IAM
  • Open standards and vendor-neutral design (OTel, Parquet compatibility)
  • Cloud-ready with bring-your-own-cloud (BYOC) options

Use Cases

  • Full-stack observability of applications, databases, infrastructure and networks
  • AI workloads observability for telemetry from AI models and LLMs
  • Product observability to analyze user behavior, feature adoption, and performance

Parseable provides predictive observability with a unified data model, enabling faster insights and proactive incident response across the full telemetry stack.

2.3k stars
159 forks
#7
Swetrix

Privacy-first, cookie-less open-source web analytics with session analysis, real-user performance monitoring, error tracking and feature flags. Self-hostable or available as managed cloud.

Swetrix is an open-source, privacy-focused web analytics platform that collects anonymised, cookie-less metrics about website traffic, sessions, performance and client-side errors. It provides both a self-hostable Community Edition and a managed cloud offering with additional features.

Key Features

  • Cookie-less, privacy-first tracking that collects anonymised pageviews, events and session data without cross-device identifiers
  • Core analytics: top pages, traffic sources, UTM campaigns, geolocation and device/browser breakdowns
  • Session analytics and user flows to visualise journeys and pageview sequences
  • Funnels, goals and custom events for conversion tracking and behaviour analysis
  • Real-user performance monitoring (TTFB, DNS, TLS, render and other frontend timing metrics)
  • Client-side error tracking with aggregation by page, browser, device and geolocation
  • Feature flags and rollout controls to target segments and measure feature impact
  • Experiments / A/B testing (managed cloud) with exposure tracking and statistical comparisons
  • Revenue analytics integrations (Stripe, Paddle) and CSV/API data export for portability
  • Lightweight TypeScript tracking script and real-time React dashboard built for low overhead
  • Deployable via Docker with a backend API, MySQL for core data, ClickHouse for analytics storage and Redis for caching

Use Cases

  • Privacy-compliant website analytics for small businesses, blogs and SaaS sites that want to avoid cookie banners
  • Monitoring frontend performance and client-side errors to detect regressions and improve page speed
  • Running feature flags and A/B experiments (cloud) to optimize conversions and measure feature impact

Limitations and Considerations

  • The Community Edition (self-hosted) provides core analytics, sessions, funnels, performance and error tracking but lacks some managed-cloud features (experiments, revenue analytics, AI insights, and built-in alert/email reports)
  • GeoIP accuracy in self-hosted deployments depends on the chosen GeoIP database and may be less precise than the managed cloud's premium DB
  • Scaling analytics requires appropriate ClickHouse and infrastructure configuration; self-hosters must manage upgrades, backups and operational costs

Swetrix bundles core web analytics, performance monitoring and error tracking in a privacy-first package suitable for self-hosting or using a managed cloud. It focuses on essential, low-footprint analytics while offering expanded features in its cloud offering.

889 stars
48 forks
#8
Traefik Log Dashboard

Real-time dashboard to analyze Traefik logs with GeoIP, status code breakdowns, filters, and multi-agent metrics via a Go agent and web UI.

Traefik Log Dashboard is a real-time analytics platform for Traefik reverse proxy access and error logs. It combines a lightweight agent that parses logs and exposes metrics with a web dashboard that visualizes traffic, status codes, and geographic origin of requests.

Key Features

  • Multi-agent architecture to monitor multiple Traefik instances from one dashboard
  • Real-time log parsing with position tracking for efficient tailing
  • Automatic GeoIP enrichment for IP geolocation out of the box
  • Status code and service-level metrics to spot errors and hot paths
  • Advanced filtering (include/exclude), including geographic and custom filters
  • Background alerting support via Discord webhooks and summary/threshold alerts
  • Optional terminal-based dashboard (CLI)
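To make "position tracking" concrete, here is a minimal sketch of tailing a log file by remembering a byte offset between reads. It is a generic illustration in Python, not the project's Go agent; the function name and the truncation handling are assumptions made for this example.

```python
import os


def read_new_lines(log_path, position):
    """Return (lines, new_position) for data appended since `position`.

    Hypothetical helper: the caller persists `new_position` and passes it
    back on the next call, so earlier bytes are never re-read.
    """
    size = os.path.getsize(log_path)
    if size < position:
        # The file shrank, so assume it was rotated or truncated and restart.
        position = 0
    with open(log_path, "rb") as handle:
        handle.seek(position)
        chunk = handle.read()
    # Consume only complete lines; a partial trailing line is left for the
    # next call, once the writer has finished it.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8", errors="replace").splitlines()
    return lines, position + end
```

Because only the offset is stored, each poll costs roughly the size of the newly appended data rather than the size of the whole log.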

Use Cases

  • Troubleshoot Traefik routing issues by inspecting recent access and error logs
  • Monitor reverse proxy traffic patterns, error rates, and service utilization
  • Identify suspicious or unexpected traffic sources using geographic insights

Limitations and Considerations

  • Some features (such as alerting integrations) may require additional external services (for example Discord webhooks)
  • GeoIP accuracy depends on the bundled GeoIP dataset and may not be perfect

Traefik Log Dashboard is well-suited for operators who want a focused, Traefik-specific view of proxy activity without adopting a full log aggregation stack. Its agent-plus-dashboard design keeps log ingestion lightweight while still enabling rich, near real-time visibility.

734 stars
21 forks
#9
Scraparr

Lightweight Prometheus exporter that exposes metrics from the *arr suite (Sonarr, Radarr, Lidarr, etc.) for monitoring and Grafana dashboards.

Scraparr is a Prometheus exporter that collects and exposes metrics from the *arr suite (Sonarr, Radarr, Lidarr and similar services). It provides a scrapeable HTTP metrics endpoint intended for integration with Prometheus and visualization with Grafana.

Key Features

  • Exposes detailed metrics for *arr services (requests, queue, backlog, import/scan status, per-series details when enabled)
  • Prometheus-compatible /metrics HTTP endpoint (default port 7100)
  • Configurable via config.yaml or environment variables; supports multiple service instances via config file aliases
  • Lightweight Python implementation with Docker and Docker Compose deployment options
  • Built for extensibility and community contributions
  • Suitable for integration into alerting and dashboarding stacks (Prometheus + Grafana)
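For a sense of what a Prometheus-compatible /metrics response looks like to a consumer, the sketch below parses a small text-format payload into samples. The metric name `example_queue_total` and the `alias` label are hypothetical placeholders, not Scraparr's actual metric names, and the parser is deliberately simplified (for example, it does not unescape label values).

```python
import re

# name, optional {labels}, value, optional integer timestamp
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse text-format metrics into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # ignore anything this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical payload; real metric and label names will differ.
EXAMPLE = """\
# HELP example_queue_total Items waiting in the queue (made-up metric)
# TYPE example_queue_total gauge
example_queue_total{alias="sonarr-main"} 3
example_queue_total{alias="radarr-4k"} 12
"""

if __name__ == "__main__":
    for name, labels, value in parse_metrics(EXAMPLE):
        print(name, labels, value)
```

In practice Prometheus does this parsing itself during a scrape; a snippet like this is mainly useful for quick manual checks of an exporter's output.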

Use Cases

  • Monitor health, API availability, and backlog of Sonarr/Radarr/Lidarr instances
  • Feed metrics into Prometheus for alerting on failed downloads, stalled queues, or connectivity issues
  • Provide a Grafana dashboard view of *arr performance and activity across multiple instances

Limitations and Considerations

  • Environment variables do not support configuring multiple instances; multiple services require the config.yaml with aliases to avoid metric name collisions
  • Requires proper API keys and reachable URLs for each *arr service; Docker variants may need host network adjustments for local service access
  • Community-maintained Helm and Unraid templates exist but may not be officially maintained by the project

Scraparr is a focused tool for exporting *arr application metrics to Prometheus. It is lightweight and configuration-driven, making it easy to add to existing monitoring stacks for visibility into media automation components.

372 stars
15 forks
#10
LogForge

Self-hosted Docker monitoring: real-time logs, per-container terminals, rules-based alerts and safe auto-remediation for developer teams.

LogForge is a developer-focused monitoring and alerting dashboard for Docker environments. It autodetects containers, streams live logs and provides UI-driven rules, notifications and safe remediation actions for containerised services.

Key Features

  • Automatic Docker service discovery and status (running, crashed, stopped)
  • Real-time log streaming and filtering per container
  • Interactive per-container terminal access and file system viewer
  • UI-driven Alert Engine with one-click rule templates and scoped rules
  • Safe auto-remediation (restart/stop/kill/start/run scripts) with cooldowns, backoff and verification delays
  • Multi-step actions and notification channels (Email, Slack, Discord, Telegram, Gotify and others)
  • Alert history, acknowledgement, duplicate-rule protection and noise controls (case sensitivity, AND/OR matches, ignore lists)
  • Test notifications, health/self-check endpoints and configurable container grouping
  • Docker Compose friendly deployment and minimal operational overhead
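The idea behind cooldowns and backoff in auto-remediation can be sketched as a small guard that decides whether another action is allowed yet. This is a generic illustration, not LogForge's implementation; the class name, parameters, and default values are assumptions chosen for the example.

```python
class RemediationGuard:
    """Rate-limits remediation actions with a cooldown and exponential backoff."""

    def __init__(self, cooldown_s=60.0, backoff_factor=2.0, max_delay_s=900.0):
        self.cooldown_s = cooldown_s
        self.backoff_factor = backoff_factor
        self.max_delay_s = max_delay_s
        self.attempts = 0        # consecutive remediation attempts so far
        self.next_allowed = 0.0  # earliest time the next action may run

    def should_act(self, now):
        """True if the cooldown from the previous attempt has elapsed."""
        return now >= self.next_allowed

    def record_attempt(self, now):
        """Register an action and return the delay before the next one."""
        delay = min(
            self.cooldown_s * self.backoff_factor ** self.attempts,
            self.max_delay_s,
        )
        self.attempts += 1
        self.next_allowed = now + delay
        return delay

    def record_healthy(self):
        """Reset the backoff once the container is verified healthy again."""
        self.attempts = 0
        self.next_allowed = 0.0
```

A caller would check `should_act(now)` before restarting a container, call `record_attempt(now)` after doing so, and call `record_healthy()` once a verification check passes, so a crash-looping container is restarted less and less often instead of continuously.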

Use Cases

  • Local development and staging: tail container logs, open interactive shells, and diagnose crashes without SSH.
  • Small teams running Dockerized services: set up keyword- and event-based alerts to detect regressions and performance issues quickly.
  • Automated incident response: define safe, guardrailed remediation workflows to restart or run validated scripts when containers fail.

Limitations and Considerations

  • Core backend is source-available and interacts directly with the Docker socket; several non-core components (Alert Engine, Notifier and other tooling) are proprietary/restricted per the project's licensing notes.
  • Designed primarily for Docker-first workflows; integrations with large-scale observability stacks (e.g., Loki/ELK) may require additional tooling or customization.

LogForge provides a compact, self-hosted alternative to heavyweight observability stacks with an emphasis on developer workflows and safe automation. It is intended for teams that want quick visibility and guarded remediation for Docker container fleets.

285 stars
16 forks
#11
GlitchTip

Open-source error tracking, performance monitoring and uptime checks compatible with Sentry SDKs; available self-hosted or as a hosted SaaS.

GlitchTip is an open-source error tracking and observability platform that implements a Sentry-compatible intake API. It provides error aggregation, basic APM-style transaction visibility, and uptime monitoring via a Django backend paired with an Angular frontend.

Key Features

  • Sentry-compatible event intake allowing existing Sentry client SDKs to report errors and transactions.
  • Error aggregation and issue grouping with searchable issue lists and event details.
  • Application performance monitoring that surfaces slow requests, database calls, and transaction traces.
  • Uptime monitoring (ping-style checks) with alerts delivered via email or webhooks.
  • Deployable with Docker and Docker Compose; a Kubernetes Helm chart is available for cluster installs.
  • Backend built on Django with worker tasks via Celery; PostgreSQL is the primary data store.
  • Optional cache/message broker usage of Valkey/Redis for improved performance and Celery brokering.
  • Hosted SaaS offering available alongside comprehensive self-hosting docs and Docker images.
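Sentry-compatible intake generally means an SDK can be pointed at a different server simply by changing its DSN. The sketch below shows how a DSN of the usual `scheme://public_key@host/project_id` shape maps to an envelope ingest URL, following the convention Sentry SDKs use. The hostname and key are placeholders, and the helper name is made up for this example; check the server's own documentation for the endpoints it accepts.

```python
from urllib.parse import urlsplit


def parse_dsn(dsn):
    """Split a Sentry-style DSN into its parts (hypothetical helper)."""
    parts = urlsplit(dsn)
    if not parts.username or not parts.hostname:
        raise ValueError("DSN must look like scheme://public_key@host/project_id")
    # The last path segment is the project ID; anything before it is an
    # optional path prefix on the server.
    prefix, _, project_id = parts.path.rstrip("/").rpartition("/")
    if not project_id:
        raise ValueError("DSN is missing a project ID")
    host = parts.hostname + (f":{parts.port}" if parts.port else "")
    return {
        "public_key": parts.username,
        "project_id": project_id,
        "envelope_url": f"{parts.scheme}://{host}{prefix}/api/{project_id}/envelope/",
    }


if __name__ == "__main__":
    # Placeholder DSN pointing at a self-hosted instance.
    print(parse_dsn("https://abc123@glitchtip.example.com/1"))
```

Existing applications instrumented with a Sentry SDK would normally need no code changes beyond swapping this DSN value.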

Use Cases

  • Centralize and triage runtime exceptions and stack traces from web and mobile apps using existing Sentry SDKs.
  • Monitor web application latency and identify slow endpoints and database calls for performance troubleshooting.
  • Keep track of site uptime with scheduled pings and receive alerts when endpoints fail to respond.

Limitations and Considerations

  • Some enterprise SSO workflows, notably multi-tenant SAML SSO, are still under discussion and development; social/OAuth providers are supported via django-allauth, but full multi-tenant SAML support is not yet standard.
  • For larger deployments, Valkey/Redis is recommended for Celery brokering, caching, and sessions; Postgres-only mode is experimental and may yield lower performance.
  • Feature parity with commercial Sentry varies; a few advanced grouping, fingerprinting and analytics features are under active development or improvement.

GlitchTip is suited for teams that need a budget-friendly, open-source alternative for error tracking and basic observability while retaining compatibility with Sentry client tooling. It supports both small single-server installs and larger containerized deployments with documented configuration and upgrade paths.

Why choose an open source alternative?

  • Data ownership: Keep your data on your own servers
  • No vendor lock-in: Freedom to switch or modify at any time
  • Cost savings: Reduce or eliminate subscription fees
  • Transparency: Audit the code and know exactly what's running