Webarchive

Simple web archive: save pages as PDF, headers, or single-file HTML

Repository

185 stars

3 forks

Last commit: 11mo ago

Repo age: 3y old

Webarchive is a lightweight, Go-based web archiving service designed for personal or home-network use. It exposes a REST API and optional web UI to capture and store web pages in multiple formats for offline access and evidence preservation.

Key Features

Save pages in multiple formats: PDF, captured HTTP headers, and single-file HTML with embedded resources
REST API for adding pages, listing archives, retrieving page metadata, and downloading stored files
Built-in basic web UI (configurable via environment variables) and Docker Compose support for easy deployment
Configurable via environment variables (DB path, API address, UI options, PDF rendering settings)
PDF generation via an external wkhtmltopdf binary with configurable viewport, DPI, and print/media options
Local file storage with per-page result IDs and simple file retrieval endpoints
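
The REST workflow described above (submit a page, then list or download results) could be scripted along the following lines. This is a minimal sketch, not the project's documented API: the base address `http://localhost:5001`, the `/api/v1/pages` path, and the `url`/`formats` JSON fields are assumptions made for illustration and should be checked against the project's own README before use.

```python
import json
import urllib.request

# Assumption: address where a local Webarchive instance listens (hypothetical).
API_BASE = "http://localhost:5001"


def build_add_page_request(page_url, formats, base=API_BASE):
    """Build (but do not send) a POST request asking the service to archive a page.

    The endpoint path and JSON field names are hypothetical placeholders.
    """
    body = json.dumps({"url": page_url, "formats": formats}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/api/v1/pages",  # hypothetical endpoint
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the payload without a running instance; calling `send(build_add_page_request("https://example.com", ["pdf", "headers"]))` would then perform the actual call.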

Use Cases

Personal archival of articles, issue threads, or documentation for offline reading and long-term reference
Creating PDF snapshots of web pages for records, reporting, or legal evidence
Capturing HTTP response headers and a single-file HTML version for debugging, change tracking, or lightweight backups

Limitations and Considerations

PDF export requires an external wkhtmltopdf binary available in PATH; PDF fidelity depends on that tool
No built-in authentication, access control, or multi-user/multi-tenant support yet
The UI is minimal (a single basic theme) and the feature set is intentionally simple; advanced browsing and search features are limited
Storage is local by default; SQL-backed or multi-storage options are listed as roadmap items and are not yet available
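
Because PDF export depends on wkhtmltopdf being on PATH, it can help to check for the binary and preview the kind of command line involved before deploying. The sketch below is an illustration, not Webarchive's actual invocation: the helper names and default values are hypothetical, while `--dpi`, `--viewport-size`, `--print-media-type`, and `--no-print-media-type` are standard wkhtmltopdf options.

```python
import shutil


def wkhtmltopdf_available(binary="wkhtmltopdf"):
    """Return True if the given binary can be found on PATH."""
    return shutil.which(binary) is not None


def build_pdf_command(url, output_path, dpi=96, viewport="1920x1080",
                      print_media=False, binary="wkhtmltopdf"):
    """Assemble an argument list for rendering a URL to a PDF file.

    Default values here are illustrative, not the service's defaults.
    """
    cmd = [binary, "--dpi", str(dpi), "--viewport-size", viewport]
    # Choose between the print and screen CSS media types.
    cmd.append("--print-media-type" if print_media else "--no-print-media-type")
    cmd += [url, output_path]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once `wkhtmltopdf_available()` returns `True`; keeping the command as a list avoids shell quoting issues with URLs.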

Webarchive is suited for users who need a compact, API-driven archiver they can run locally. It focuses on reliability and simplicity rather than advanced multi-user features or full enterprise workflows.

Similar Services

Stirling PDF

Self-hosted PDF editing, conversion, OCR, and automation platform

74.6k stars

6.3k forks

Last commit: 7h ago

Open-source PDF platform to edit, convert, OCR, sign, redact, and automate PDF workflows via a web UI and REST API.

Alternative to: Adobe Acrobat and 19 others

Paperless-ngx

Document management system with OCR, search, and automated filing

36.9k stars

2.3k forks

Last commit: 1d ago

Paperless-ngx is an open-source document management system that ingests scans and files, runs OCR, and turns them into a searchable, taggable document archive.

Alternative to: DocuWare and 6 others

Reactive Resume

Privacy-focused, open-source resume builder

35.4k stars

3.9k forks

Last commit: 1d ago

Open-source resume builder for creating, customizing, exporting and publishing resumes with templates, PDF export, public sharing and optional OpenAI assistance.

Alternative to: Resume.io and 5 others

CyberChef

Browser-based toolkit for data decoding, encoding and analysis

34.1k stars

3.9k forks

Last commit: 1d ago

CyberChef is a web-based “cyber” toolkit for encoding/decoding, encryption/decryption, compression, hashing, parsing, and data transformation using drag-and-drop recipes.

ArchiveBox

Open-source self-hosted web archiving and snapshotting tool

26.9k stars

1.5k forks

Last commit: 1d ago

Self-hosted tool to collect and preserve webpages, media, and bookmarks in durable formats (HTML, PDF, WARC, MP4) with a CLI, web UI, and search.

Alternative to: Internet Archive Wayback Machine and 3 others

ebook2audiobook

Convert eBooks into audiobooks with TTS and optional voice cloning

18.3k stars

1.5k forks

Last commit: 5d ago

Self-hostable tool to convert non-DRM eBooks into audiobooks with chapter support, metadata, multilingual TTS engines, and optional voice cloning via a web UI or CLI.

Alternative to: Speechify and 7 others
