Self-hosted projects tagged “Data Ingestion”
24 open source projects with this tag

Huginn
Open-source platform for self-hosted automation agents
Huginn is an open-source, self-hosted, and extensible automation platform that runs agents to monitor web data, process events, and trigger actions.

Kestra
Open-source, event-driven workflow orchestration and scheduling platform
Declarative, API-first orchestration platform for scheduled and event-driven workflows with a plugin ecosystem, UI editor, CI/CD and Terraform integration.

Vector
High-performance observability data pipeline written in Rust
Open-source observability pipeline to collect, transform, and route logs and metrics with a single, high-performance binary and programmable transforms.

ZincSearch
Lightweight open-source search engine for full-text indexing
ZincSearch is a lightweight, Go-based search engine for full-text indexing with Elasticsearch-compatible ingestion APIs, a Vue UI, and a schema-less document model.

Graylog
Centralized log management and analysis platform
Graylog is an open source platform for collecting, indexing, searching, and alerting on logs and machine data from many sources in one place.

TeslaMate
Self-hosted data logger and analytics for Tesla vehicles
Open-source Tesla telemetry logger that records driving, charging, and location data to PostgreSQL and provides Grafana dashboards plus MQTT integration.

Open Source Routing Machine (OSRM)
High-performance routing engine for OpenStreetMap data
OSRM is a high-performance routing engine for OpenStreetMap data, providing an HTTP API for routing, map matching, distance tables, and more.

Scrutiny
Web UI for S.M.A.R.T. monitoring of hard drives
Self-hosted S.M.A.R.T. monitoring dashboard that collects drive health data, visualizes historical trends, and alerts on drive health problems.

Nominatim
Open-source geocoding and reverse-geocoding using OpenStreetMap data
Nominatim provides geocoding (name/address → coordinates) and reverse geocoding (coordinates → address) powered by OpenStreetMap, with import tooling and a public API.

Chartbrew
Open-source self-hosted data visualization dashboards
Chartbrew is an open-source platform to build live dashboards and reports by connecting SQL/NoSQL databases and REST APIs.

OpenTripPlanner
Multimodal trip planning and transit routing server
OpenTripPlanner (OTP) is an open source multimodal routing engine that builds networks from GTFS and OpenStreetMap to produce itineraries and real-time transit trip plans...

Aleph
Document and data indexing, entity search, and investigative analysis
Aleph indexes documents and structured datasets to enable fast search, entity extraction, and cross-referencing for investigative research and OSINT workflows.

Parseable
Observability platform for predictive insights across MELT telemetry
Parseable ingests and analyzes MELT (metrics, events, logs, and traces) telemetry data, offering predictive analytics and a unified SQL and natural-language querying interface.

Open Archiver
Open-source platform for legally compliant email archiving
Self-hosted email archiving platform for ingesting, storing, indexing and searching emails from Gmail, Microsoft 365, IMAP, PST and more.

Emoncms
Energy and environmental time-series logging and visualization
Open-source web app to collect, process, store, and visualize energy, temperature, and other environmental time-series data with dashboards, graphs, and an API.

Panora
AI-powered back-office automation for purchase order entry
Automates purchase order ingestion, validation, and ERP posting for distributors, manufacturers, and wholesalers using AI-driven item matching and configurable workflows.

Fitbit Fetch Script and InfluxDB Grafana Integration
Fetch Fitbit API data into InfluxDB and visualize it with Grafana
Python service that pulls Fitbit health metrics via the Fitbit Web API, stores them in InfluxDB, and provides Grafana dashboards for long-term trend visualization.

Riven
VFS-based automated media management and streaming platform
Open-source media management system that exposes a FUSE-based virtual filesystem, automates discovery/scraping/downloading, and integrates with Plex/Jellyfin/Emby.

Wishlist
Sharable, self-hosted wishlist application for friends and family
Self-hosted SvelteKit wishlist app that scrapes product metadata and supports groups, registry mode, PWA installation, OpenID Connect, and Docker deployment.

Minne
Graph-powered read-it-later app and personal knowledge base
Self-hosted graph-powered personal knowledge base with AI search, chat, and multi-format ingestion.

Hyrax
Ruby on Rails engine for building digital repository applications
Open-source repository engine from the Samvera community for building institutional digital repositories with flexible metadata, workflows, and search integration.

Mantium
Self-hosted manga tracker aggregating metadata from multiple sources
Mantium is a self-hosted manga tracker that collects manga metadata (not images) from multiple sources and provides a dashboard and an embeddable iframe.

Mistborn
Multi-source threat intelligence and IOC aggregation platform
Mistborn aggregates threat intelligence from multiple sources to enrich, normalize, and distribute IOCs for security analysis and incident response workflows.

Spotizerr
Self-hosted Spotify/Deezer music download manager
Self-hosted music download manager that fetches Spotify content and falls back to Deezer for lossless sources, built on a FastAPI backend with Celery tasks and Redis caching.