Webarchive
Simple web archive: save pages as PDF, headers, or single-file HTML
Webarchive is a lightweight, Go-based web archiving service designed for personal or home-network use. It exposes a REST API and optional web UI to capture and store web pages in multiple formats for offline access and evidence preservation.
Key Features
- Save pages in multiple formats: PDF, captured HTTP headers, and single-file HTML with embedded resources
- REST API for adding pages, listing archives, retrieving page metadata, and downloading stored files
- Built-in basic web UI and Docker Compose support for easy deployment
- Configurable via environment variables (DB path, API address, UI options, PDF rendering settings)
- PDF generation via an external wkhtmltopdf binary with configurable viewport, DPI, and print/media options
- Local file storage with per-page result IDs and simple file retrieval endpoints
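As a rough illustration of how environment-driven PDF settings might map onto the renderer, the sketch below reads three settings and turns them into a wkhtmltopdf argument list. The variable names (PDF_DPI, PDF_VIEWPORT, PDF_MEDIA_PRINT), their defaults, and the helper names are hypothetical and are not Webarchive's documented configuration; the flags --dpi, --viewport-size, --print-media-type, and --no-print-media-type are standard wkhtmltopdf options.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// pdfSettings holds the rendering options this sketch cares about.
type pdfSettings struct {
	DPI        int
	Viewport   string
	MediaPrint bool
}

// envOr returns the value of an environment variable, or fallback if it is unset or empty.
func envOr(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok && v != "" {
		return v
	}
	return fallback
}

// loadPDFSettings reads hypothetical variable names; the real names may differ.
func loadPDFSettings() (pdfSettings, error) {
	dpi, err := strconv.Atoi(envOr("PDF_DPI", "300"))
	if err != nil {
		return pdfSettings{}, fmt.Errorf("PDF_DPI: %w", err)
	}
	mediaPrint, err := strconv.ParseBool(envOr("PDF_MEDIA_PRINT", "true"))
	if err != nil {
		return pdfSettings{}, fmt.Errorf("PDF_MEDIA_PRINT: %w", err)
	}
	return pdfSettings{
		DPI:        dpi,
		Viewport:   envOr("PDF_VIEWPORT", "1920x1080"),
		MediaPrint: mediaPrint,
	}, nil
}

// wkhtmltopdfArgs builds the argument list for: wkhtmltopdf [options] <url> <output>.
func wkhtmltopdfArgs(s pdfSettings, url, out string) []string {
	args := []string{"--dpi", strconv.Itoa(s.DPI), "--viewport-size", s.Viewport}
	if s.MediaPrint {
		args = append(args, "--print-media-type")
	} else {
		args = append(args, "--no-print-media-type")
	}
	return append(args, url, out)
}

func main() {
	s, err := loadPDFSettings()
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid PDF settings:", err)
		return
	}
	// The resulting slice could be passed to exec.Command("wkhtmltopdf", args...).
	fmt.Println(wkhtmltopdfArgs(s, "https://example.com", "page.pdf"))
}
```

Keeping the argument builder separate from the environment lookup makes the flag mapping easy to check without touching the process environment.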
Use Cases
- Personal archival of articles, issue threads, or documentation for offline reading and long-term reference
- Creating PDF snapshots of web pages for records, reporting, or legal evidence
- Capturing HTTP response headers and a single-file HTML version for debugging, change tracking, or lightweight backups
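To make the header-capture use case concrete, here is a minimal Go sketch that fetches a URL and writes out the status line and response headers as text. It is not taken from Webarchive's source; the function names captureHeaders and formatHeaders and the output layout are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"sort"
	"strings"
	"time"
)

// formatHeaders renders a status line followed by one "Name: value" line per
// header value, sorted by header name so that two captures are easy to diff.
func formatHeaders(statusLine string, headers map[string][]string) string {
	names := make([]string, 0, len(headers))
	for name := range headers {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	b.WriteString(statusLine + "\n")
	for _, name := range names {
		for _, value := range headers[name] {
			fmt.Fprintf(&b, "%s: %s\n", name, value)
		}
	}
	return b.String()
}

// captureHeaders performs a GET request and returns the formatted headers.
func captureHeaders(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return formatHeaders(resp.Proto+" "+resp.Status, resp.Header), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: headers <url>")
		return
	}
	client := &http.Client{Timeout: 15 * time.Second}
	out, err := captureHeaders(client, os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "capture failed:", err)
		return
	}
	fmt.Print(out)
}
```

Sorting the header names gives a stable text form, which is what makes a saved capture useful for change tracking.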
Limitations and Considerations
- PDF export requires an external wkhtmltopdf binary available in PATH; PDF fidelity depends on that tool
- No built-in authentication, access control, or multi-tenant support yet
- The UI is minimal (a single basic theme) and the feature set is intentionally simple; advanced browsing and search features are limited
- Storage is local only; SQL-backed and multi-storage options are listed as roadmap items and are not yet available
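Because PDF export depends on wkhtmltopdf being available in PATH, a deployment can verify that before relying on the PDF format. The following is a small Go sketch of such a preflight check using the standard library's exec.LookPath; the helper name requireBinary is hypothetical, and this is not a description of how Webarchive itself performs the check.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// requireBinary reports where name resolves in PATH, or a descriptive error.
func requireBinary(name string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%s not found in PATH: %w", name, err)
	}
	return path, nil
}

func main() {
	path, err := requireBinary("wkhtmltopdf")
	if err != nil {
		// Without the binary, only the non-PDF formats would be usable.
		fmt.Fprintln(os.Stderr, "PDF export unavailable:", err)
		return
	}
	fmt.Println("using", path)
}
```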
Webarchive is suited for users who need a compact, API-driven archiver they can run locally. It focuses on reliability and simplicity rather than advanced multi-user features or full enterprise workflows.
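For a sense of what driving such an archiver from code can look like, the Go sketch below builds a request that asks the service to archive a page. Everything specific in it is an assumption for illustration: the /api/v1/pages path, the JSON field names url and formats, the format names, and the localhost:8080 address are hypothetical, so check the project's own API documentation for the real values.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strings"
)

// addPageBody is a hypothetical request payload; the real API may use
// different field names.
type addPageBody struct {
	URL     string   `json:"url"`
	Formats []string `json:"formats"`
}

// newAddPageRequest builds a POST request against a hypothetical
// /api/v1/pages endpoint under baseURL.
func newAddPageRequest(baseURL, pageURL string, formats []string) (*http.Request, error) {
	payload, err := json.Marshal(addPageBody{URL: pageURL, Formats: formats})
	if err != nil {
		return nil, err
	}
	endpoint := strings.TrimRight(baseURL, "/") + "/api/v1/pages"
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// The address and format names below are placeholders, not documented values.
	req, err := newAddPageRequest("http://localhost:8080", "https://example.com", []string{"pdf", "headers"})
	if err != nil {
		fmt.Fprintln(os.Stderr, "building request failed:", err)
		return
	}
	// Printed rather than sent so the sketch runs without a server;
	// http.DefaultClient.Do(req) would submit it.
	fmt.Println(req.Method, req.URL)
}
```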
Similar Services

Stirling PDF
Self-hosted PDF editing, conversion, OCR, and automation platform
Open-source PDF platform to edit, convert, OCR, sign, redact, and automate PDF workflows via a web UI and REST API.

Paperless-ngx
Document management system with OCR, search, and automated filing
Paperless-ngx is an open-source document management system that ingests scans and files, runs OCR, and turns them into a searchable, taggable document archive.

Reactive Resume
Privacy-focused, open-source resume builder
Open-source resume builder for creating, customizing, exporting and publishing resumes with templates, PDF export, public sharing and optional OpenAI assistance.

CyberChef
Browser-based toolkit for data decoding, encoding and analysis
CyberChef is a web-based “cyber” toolkit for encoding/decoding, encryption/decryption, compression, hashing, parsing, and data transformation using drag-and-drop recipes.

ArchiveBox
Open-source self-hosted web archiving and snapshotting tool
Self-hosted tool to collect and preserve webpages, media, and bookmarks in durable formats (HTML, PDF, WARC, MP4) with a CLI, web UI, and search.

ebook2audiobook
Convert eBooks into audiobooks with TTS and optional voice cloning
Self-hostable tool to convert non-DRM eBooks into audiobooks with chapter support, metadata, multilingual TTS engines, and optional voice cloning via a web UI or CLI.

Tech Stack: Go, HTML, Docker