Self-hosted projects tagged “Data Ingestion”
24 open source projects with this tag

Open-source platform for self-hosted automation agents
Huginn is an open-source, self-hosted, and extensible automation platform that runs agents to monitor web data, process events, and trigger actions.


Open-source, event-driven workflow orchestration and scheduling platform
Declarative, API-first orchestration platform for scheduled and event-driven workflows with a plugin ecosystem, UI editor, CI/CD and Terraform integration.

High-performance observability data pipeline written in Rust
Open-source observability pipeline to collect, transform, and route logs and metrics with a single, high-performance binary and programmable transforms.

A lightweight open-source search engine for full-text indexing
ZincSearch is a Go-based, lightweight search engine for full-text indexing with Elasticsearch API-compatible ingestion, a Vue UI, and a schema-less document model.

Centralized log management and analysis platform
Graylog is an open source platform for collecting, indexing, searching, and alerting on logs and machine data from many sources in one place.

Self-hosted data logger and analytics for Tesla vehicles
Open-source Tesla telemetry logger that records driving, charging and location data to PostgreSQL and provides Grafana dashboards plus MQTT integration.

High-performance routing engine for OpenStreetMap data
OSRM is a high-performance routing engine for OpenStreetMap data, providing an HTTP API for routing, map matching, distance tables, and more.

Web UI for SMART monitoring of hard drives
Self-hosted S.M.A.R.T monitoring dashboard that collects SMART data, visualizes historical trends, and alerts on drive health.

Open-source geocoding and reverse-geocoding using OpenStreetMap data
Nominatim provides geocoding (name/address → coordinates) and reverse geocoding (coordinates → address) powered by OpenStreetMap, with import tooling and a public API.

Open-source self-hosted data visualization dashboards
Chartbrew is an open-source platform to build live dashboards and reports by connecting SQL/NoSQL databases and REST APIs.

Multimodal trip planning and transit routing server
OpenTripPlanner (OTP) is an open source multimodal routing engine that builds networks from GTFS and OpenStreetMap to produce itineraries and real-time transit trip plans...

Document and data indexing, entity search, and investigative analysis
Aleph indexes documents and structured datasets to enable fast search, entity extraction, and cross-referencing for investigative research and OSINT workflows.

An observability platform for predictive insights across MELT telemetry
Parseable ingests, analyzes, and extracts insights from MELT telemetry data with predictive analytics and a unified SQL/NL querying interface.


Open-source platform for legally compliant email archiving
Self-hosted email archiving platform for ingesting, storing, indexing and searching emails from Gmail, Microsoft 365, IMAP, PST and more.


Energy and environmental time-series logging and visualization
Open-source web app to collect, process, store, and visualize energy, temperature, and other environmental time-series data with dashboards, graphs, and an API.

AI-powered back-office automation for purchase order entry
Automates purchase order ingestion, validation, and ERP posting for distributors, manufacturers and wholesalers using AI-driven item matching and configurable workflows.

Fetch Fitbit API data into InfluxDB and visualize it with Grafana
Python service that pulls Fitbit health metrics via the Fitbit Web API, stores them in InfluxDB, and provides Grafana dashboards for long-term trend visualization.


VFS-based automated media management and streaming platform
Open-source media management system that exposes a FUSE-based virtual filesystem, automates discovery/scraping/downloading, and integrates with Plex/Jellyfin/Emby.

Sharable, self-hosted wishlist application for friends and family
Self-hosted SvelteKit wishlist app that scrapes product metadata, supports groups, registry mode, PWA, OpenID Connect, and Docker deployment.

Minne: a graph-powered read-it-later and personal knowledge base
Self-hosted graph-powered personal knowledge base with AI search, chat, and multi-format ingestion.

Ruby on Rails engine for building digital repository applications
Open-source repository engine from the Samvera community for building institutional digital repositories with flexible metadata, workflows, and search integration.

Self-hosted manga tracker aggregating metadata from multiple sources
Mantium is a self-hosted manga tracker that collects manga metadata (not images) from multiple sources and provides a dashboard and an embeddable iframe.


Multi-source threat intelligence and IOC aggregation platform
Mistborn aggregates threat intelligence from multiple sources to enrich, normalize, and distribute IOCs for security analysis and incident response workflows.

Self-hosted Spotify/Deezer music download manager
Self-hosted music download manager that fetches Spotify content and falls back to Deezer for lossless sources, built on a FastAPI backend with Celery tasks and Redis caching.