Open-source self-hosted web archiving

ArchiveBox is a powerful self-hosted internet archiving solution written in Python. You feed it URLs of pages you want to archive, and it saves them to disk in a variety of formats, depending on your setup and the content of each page.

Run ArchiveBox via Docker Compose (recommended), Docker, Apt, Brew, or Pip (see below).

pip3 install archivebox               # or install via apt or brew instead

archivebox init                       # run this in an empty folder
archivebox add 'https://example.com'  # start adding URLs to archive
curl https://example.com/rss.xml | archivebox add  # or add via stdin
archivebox schedule --every=day https://example.com/rss.xml
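
The commands above assume an empty folder for `archivebox init` and a clean stream of URLs on stdin for `archivebox add`. As a minimal sketch of how those two preconditions could be checked before calling ArchiveBox, the helpers below are hypothetical (`is_empty_dir` and `clean_urls` are illustrative names, not part of ArchiveBox), and the file name `urls.txt` in the commented usage lines is likewise an assumption.

```shell
#!/bin/sh
# Sketch only: is_empty_dir and clean_urls are hypothetical helpers,
# not part of ArchiveBox itself.

# Succeed only if the given path is a directory with no entries
# (dotfiles count as entries).
is_empty_dir() {
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# Read lines on stdin: strip carriage returns, keep only lines that
# start with http:// or https://, and drop duplicates while
# preserving the original order.
clean_urls() {
    tr -d '\r' | grep -E '^https?://' | awk '!seen[$0]++'
}

# Possible usage (assumes archivebox is installed and urls.txt exists):
#   is_empty_dir . && archivebox init
#   clean_urls < urls.txt | archivebox add
```

Filtering before piping keeps blank lines, comments, and repeated entries in a hand-maintained list from reaching `archivebox add`. The empty-folder check simply refuses to initialize in a directory that already holds files.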

For each URL added, ArchiveBox saves several types of HTML snapshot (wget, Chrome headless, singlefile), a PDF, a screenshot, a WARC archive, any git repositories,

 

 

 
