Webarchive


Simple web archive: save pages as PDF, headers, or single-file HTML

176 stars
3 forks
Last commit: 10mo ago
Repo age: 3y old

Webarchive is a lightweight, Go-based web archiving service designed for personal or home-network use. It exposes a REST API and optional web UI to capture and store web pages in multiple formats for offline access and evidence preservation.

Key Features

  • Save pages in multiple formats: PDF, captured HTTP headers, and single-file HTML with embedded resources
  • REST API for adding pages, listing archives, retrieving page metadata, and downloading stored files
  • Built-in basic web UI (configurable via environment variables) and Docker Compose support for easy deployment
  • Configurable via environment variables (DB path, API address, UI options, PDF rendering settings)
  • PDF generation via an external wkhtmltopdf binary with configurable viewport, DPI, and print/media options
  • Local file storage with per-page result IDs and simple file retrieval endpoints

Use Cases

  • Personal archival of articles, issue threads, or documentation for offline reading and long-term reference
  • Creating PDF snapshots of web pages for records, reporting, or legal evidence
  • Capturing HTTP response headers and a single-file HTML version for debugging, change tracking, or lightweight backups

Limitations and Considerations

  • PDF export requires an external wkhtmltopdf binary available in PATH; PDF fidelity depends on that tool
  • No built-in authentication or multi-user controls; access control and multi-tenant use are not implemented yet
  • UI is minimal (single basic theme) and feature set is intentionally simple; advanced browsing/search features are limited
  • Storage backends are basic/local by default; SQL-backed or multi-storage options are listed as roadmap items and not yet available
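Because PDF export depends on wkhtmltopdf being on the PATH, a deployment can check for it up front and report a clear error instead of failing on the first capture. This is a minimal Go sketch using the standard `exec.LookPath`; `findPDFTool` is a hypothetical helper, not part of Webarchive.

```go
package main

import (
	"fmt"
	"os/exec"
)

// findPDFTool returns the resolved path of the named renderer, or an error
// explaining that PDF export will not work without it.
func findPDFTool(name string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("PDF export unavailable: %q not found in PATH: %w", name, err)
	}
	return path, nil
}

func main() {
	if path, err := findPDFTool("wkhtmltopdf"); err != nil {
		fmt.Println(err)
	} else {
		fmt.Println("using", path)
	}
}
```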

Webarchive is suited for users who need a compact, API-driven archiver they can run locally. It focuses on reliability and simplicity rather than advanced multi-user features or full enterprise workflows.


Similar Services

Stirling PDF

Self-hosted PDF editing, conversion, OCR, and automation platform

73.1k stars
6.2k forks
Last commit: 16h ago

Open-source PDF platform to edit, convert, OCR, sign, redact, and automate PDF workflows via a web UI and REST API.

Alternative to: Adobe Acrobat (+19 more)

Paperless-ngx

Document management system with OCR, search, and automated filing

35.7k stars
2.3k forks
Last commit: 17h ago

Paperless-ngx is an open-source document management system that ingests scans and files, runs OCR, and turns them into a searchable, taggable document archive.

Alternative to: DocuWare (+6 more)

Reactive Resume

Privacy-focused, open-source resume builder

34.5k stars
3.8k forks
Last commit: 9d ago

Open-source resume builder for creating, customizing, exporting and publishing resumes with templates, PDF export, public sharing and optional OpenAI assistance.

Alternative to: Resume.io (+5 more)

CyberChef

Browser-based toolkit for data decoding, encoding and analysis

33.8k stars
3.8k forks
Last commit: 5mo ago

CyberChef is a web-based “cyber” toolkit for encoding/decoding, encryption/decryption, compression, hashing, parsing, and data transformation using drag-and-drop recipes.

ArchiveBox

Open-source self-hosted web archiving and snapshotting tool

26.4k stars
1.4k forks
Last commit: 11d ago

Self-hosted tool to collect and preserve webpages, media, and bookmarks in durable formats (HTML, PDF, WARC, MP4) with a CLI, web UI, and search.

Alternative to: Internet Archive Wayback Machine (+3 more)

ebook2audiobook

Convert eBooks into audiobooks with TTS and optional voice cloning

17k stars
1.4k forks
Last commit: 1d ago

Self-hostable tool to convert non-DRM eBooks into audiobooks with chapter support, metadata, multilingual TTS engines, and optional voice cloning via a web UI or CLI.

Alternative to: Speechify (+7 more)