Web pages and even entire websites disappear every day, never to be seen again. Sometimes the author decides to take them offline. Sometimes they pass away. Sometimes (especially with free services), the hosting provider simply purges everything to save a buck. The good news is that you can do something about it, at least on a personal level.
If you aren't too tech-savvy, you can easily archive individual pages for free using the Wayback Machine. Or, if you're prepared to pay a very reasonable price for the privilege, Pinboard will archive all of your bookmarks for you.
But what happens if you need to archive an entire website? Nobody wants to manually copy and paste individual URLs into an archiving service. If you're prepared to get your hands dirty, there's Wget.
Wget is a command line tool for fetching content from web servers. Using the right command line arguments, it is very effective at downloading a whole website. Wget comes installed on most Linux distributions. Windows users will need to find a Windows build online such as the (now quite old) GnuWin32 packages, or use Cygwin.
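Before going further, it's worth confirming that Wget is actually available on your system. A quick check might look like this (the `apt` command shown in the fallback message is just one distribution's package manager; substitute your own):

```shell
# Check whether wget is installed, and print its version if so.
if command -v wget >/dev/null 2>&1; then
    wget --version | head -n 1
else
    # The install command varies by platform; apt is Debian/Ubuntu.
    echo "wget not found - try: sudo apt install wget"
fi
```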
Archiving a Website
To archive a website with Wget, combine the flags described below into a single command.

Explanation
- -m (Mirror): Turns on mirror-friendly settings like infinite recursion depth, timestamps, etc.
- -p (Page Requisites): Includes page dependencies such as images, style sheets, etc. in the download.
- -c (Continue): Resumes a website that has been partially downloaded.
- -k (Convert Links): Converts absolute hyperlinks into relative hyperlinks for offline viewing.
- -E (Adjust Extension): Changes web page file extensions to .html for offline viewing.
- --user-agent="": Tells Wget not to identify itself. This can be useful if a website is known to detect and block Wget. You can also set the user agent string to that of a web browser.
- -e robots=off: Tells Wget to ignore robots.txt, if it exists. robots.txt is used to restrict which pages web crawlers such as Wget, GoogleBot, etc. will access.
- --wait 1: Tells Wget to wait 1 second between each page or resource download, which avoids placing an unreasonable load on the server.
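Putting the flags above together, a full invocation might look like this (example.com is a placeholder for the site you want to archive):

```shell
# Mirror a site with page requisites, resume support, link conversion,
# .html extensions, a blank user agent, robots.txt ignored, and a
# 1-second delay between requests.
wget -m -p -c -k -E \
     --user-agent="" \
     -e robots=off \
     --wait 1 \
     https://example.com/
```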
The smallest websites only take a few minutes to download. Others can take significantly longer. When Wget is done, you'll have a folder structure matching the website that you downloaded. Open it up, and then open the index.html page to see the finished result.
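Opening index.html straight from disk usually works, but some pages behave better when served over HTTP. One minimal way to browse the mirror locally, assuming Python 3 is installed and the downloaded folder is named example.com:

```shell
# Serve the mirrored site at http://localhost:8000/
cd example.com
python3 -m http.server 8000
```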