
Langfuse
Open-source platform for LLM observability, evals, and prompt management

Langfuse is an open-source LLM engineering platform that helps teams develop, monitor, evaluate, and debug LLM-powered applications. It provides end-to-end visibility into LLM calls and related app logic (RAG, embeddings, agent steps), alongside tools to iterate on prompts and measure quality over time.
Key Features
- End-to-end tracing and observability for LLM applications, including nested operations and user sessions
- Metrics and analytics to monitor model behavior and application performance
- Evaluation workflows, including LLM-as-a-judge, human labeling, user feedback, and custom eval pipelines via API/SDK
- Prompt management with versioning and collaborative iteration
- Datasets and dataset runs for benchmarks, regression testing, and continuous improvement
- Playground for testing prompts and model configurations, connected to production traces
- Broad integrations (e.g., OpenTelemetry, OpenAI SDK wrappers, LangChain, LlamaIndex, LiteLLM) and a comprehensive API with typed SDKs
Use Cases
- Debugging production LLM apps by inspecting traces across retrieval, tools, and agent actions
- Running systematic evaluations and regression tests on prompts and model changes
- Building internal LLMOps workflows using Langfuse datasets, APIs, and metrics
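As a rough illustration of the regression-testing use case, the sketch below scores two versions of an app against the same small dataset and flags a drop in the average score. Everything in it is hypothetical and self-contained: `run_dataset`, `exact_match`, and the two stand-in apps are invented for this example and are not Langfuse's dataset API.

```python
def exact_match(expected, actual):
    """Score 1.0 when the answers match after trimming and lowercasing, else 0.0."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0


def run_dataset(dataset, app, scorer):
    """Run `app` on every dataset item and return the mean score."""
    scores = [scorer(item["expected"], app(item["input"])) for item in dataset]
    return sum(scores) / len(scores)


# Hypothetical benchmark items (input plus expected output).
dataset = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]


def baseline_app(question):
    # Stand-in for the current prompt/model version.
    return {"2+2": "4", "capital of France": "Paris"}.get(question, "")


def candidate_app(question):
    # Stand-in for a changed prompt/model version that gets one item wrong.
    return {"2+2": "5", "capital of France": "paris"}.get(question, "")


baseline_score = run_dataset(dataset, baseline_app, exact_match)
candidate_score = run_dataset(dataset, candidate_app, exact_match)
regressed = candidate_score < baseline_score
```

In a real setup the dataset, the runs, and the scores would be stored in the platform rather than in local variables, so that runs can be compared over time.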
Limitations and Considerations
- Getting full value typically requires instrumenting the LLM application, either through the SDKs or through integration hooks
Langfuse combines observability with prompt and evaluation tooling to shorten the iteration loop for LLM applications. It fits teams that need both operational insight and a structured workflow for improving quality over time.
Tech Stack: OpenTelemetry, Docker, TypeScript, Node.js
Similar Services

Opik
LLM observability and evaluation platform for traces, tests, and dashboards
Opik is an open-source platform to trace, evaluate, and monitor LLM apps, RAG pipelines, and agent workflows with automated evaluations and production dashboards.

Agenta
Open-source LLMOps platform for prompts, evals, and observability
Agenta is an open-source LLMOps platform with a prompt playground, prompt/version management, LLM evaluation, and production observability for LLM apps.

BirdNET-Analyzer
Machine-learning tool for analyzing bird vocalizations in audio
Open-source BirdNET toolkit to batch-process audio recordings and identify bird species from their vocalizations using machine learning models.