I'm looking for an application that will allow me to download the entire contents of a web site in one throw. In the old days, I would have used GoZilla for this purpose, but GoZilla seems to have fallen on hard times.
Ideally, I'd like to be able to download just the pages on a site that contain particular keywords, but the whole site would do.
Can anybody recommend an application that does this that (a) runs on Windows XP, (b) ain't spyware, and (c) is free- or shareware?
Thanks much, folks!
Have you tried Internet Explorer's offline favorites feature?
Posted by: Theocoid | October 17, 2006 at 10:30 AM
Looks to me like the Scrapbook extension for Firefox might be what you're looking for:
https://addons.mozilla.org/firefox/427/
Posted by: Mike Koenecke | October 17, 2006 at 10:57 AM
Secret project #4?
Posted by: David B. | October 17, 2006 at 11:38 AM
No, this has nothing to do with secret project #4. That's something I am working on at the moment (with a bunch of other people), though.
Posted by: Jimmy Akin | October 17, 2006 at 11:46 AM
Jimmy,
If you don't mind using the command line, you could try Wget. The main page is here and the Windows port is here.
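Once it's installed, grabbing a whole site is a one-liner. From memory (so double-check the switches against the documentation), something like:

  wget -r -l inf -k -p http://www.example.com/

-r turns on recursive retrieval, -l inf removes the depth limit, -k converts the links so the local copy is browsable offline, and -p pulls in the images and stylesheets each page needs. The www.example.com URL is just a placeholder, of course.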
Posted by: Publius | October 17, 2006 at 11:50 AM
At THIS MOMENT!!!??? WOW!! ;-)
Posted by: David B. | October 17, 2006 at 11:51 AM
Oops. Forgot a closing tag there. Anyway, both links still work (in Firefox at least), only the first spills over to the edge of the second.
Posted by: Publius | October 17, 2006 at 11:52 AM
Yes, I second the suggestion of Publius. Try wget. It's a command-line utility, which means that it can also be easily run from within a shell script (or batch file on MS-Windows).
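For example, a tiny batch file along these lines (untested, and the URL is just a placeholder) would let you re-run the same grab whenever you want a fresh copy:

  @echo off
  rem Mirror the site into the current folder; change the URL to the site you want.
  wget -m -k http://www.example.com/

Save it as something like grabsite.bat and run it by hand, or point Windows' Scheduled Tasks at it.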
Posted by: Thome E. Vaughan | October 17, 2006 at 01:02 PM
I'd look into Jplucker. I don't know if it would exactly fit your needs.
Posted by: Tony | October 17, 2006 at 01:26 PM
IsiloX? Freeware I believe.
Posted by: Martin | October 17, 2006 at 01:50 PM
Jimmy,
Last summer, we had a job managing some campgrounds. We used the free software at http://www.httrack.com/ and got to read an awful lot of This Rock magazine articles...
enjoy!
Posted by: DennisE | October 17, 2006 at 03:05 PM
If you use Firefox, you can give the "Down Them All!" download manager extension a try. According to the website: "DownThemAll is absolutely freeware and open-source. No Adware, no Spyware."
https://addons.mozilla.org/firefox/201/
Posted by: Wil | October 17, 2006 at 03:24 PM
Take a look at http://www.webstripper.net
Posted by: | October 17, 2006 at 03:42 PM
WinHTTrack.
http://www.httrack.com/page/2/en/index.html
Posted by: Geoff | October 17, 2006 at 06:17 PM
I use Firefox with the Spiderzilla extension based on WinHTTrack from www.httrack.com. WinHTTrack can be used without Firefox.
I don't think DownThemAll will capture entire websites.
Posted by: Leo | October 17, 2006 at 07:25 PM
I have used WinHTTrack with good success to capture whole sites several links deep if you want, and all links are made relative so they are browsable offline.
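If you'd rather script it than click through the wizard, HTTrack also comes with a command-line version (httrack.exe on Windows). I'm going from memory on the exact switches, so check httrack --help, but it's roughly:

  httrack "http://www.example.com/" -O "C:\mysites\example"

where -O sets the folder the mirror gets written into.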
God bless, www.shiningpeak.com
Posted by: ShiningPeak | October 17, 2006 at 09:03 PM
I've used winhttrack too. Very good product.
Posted by: David Hart | October 17, 2006 at 10:20 PM
I use WebZIP from Spidersoft. It works great and downloads everything exactly as it is on the server (which it sounds like you want). It has multiple options, so you can set it to take just what you want. The program is shareware with a free trial period. (If you get your hands on an old version, the trial period doesn't end, and the old versions work just fine if you are gathering text sites rather than database- or PHP-driven sites.)
http://www.spidersoft.com/
Posted by: Lurker #59 | October 17, 2006 at 10:22 PM
Jimmy, it sounds like you want a web mirroring program, not just a program that downloads chosen files or downloads all linked files in a particular page, some levels deep. A program that can do web mirroring synchronizes every file hosted in an online directory with a backup location on your computer.
So I don't think you want the Firefox extension Download Them All, because all it does is download everything linked on a given page. Scrapbook is nice, but similarly, it has you start on a given page and then grab whatever is linked up to X levels deep, so it's still possible to miss parts of the site, because it depends on HTML links to find content. I seriously doubt Scrapbook would be able to get everything from a site like catholic.com even if it were set to go many levels deep.
A cursory search on Tucows gave me the shareware result "AJC Directory Synchronizer". I'm sure there are many other site mirroring tools. I think you'll want to use the terms "site" and "mirror" in your search criteria on download sites such as Tucows or Download.com, to get specifically what you're looking for as opposed to the rest.
Posted by: Karen | October 18, 2006 at 12:27 AM
Oh, just wanted to add: Besides "site" and "mirror", try additionally the search term "synchronize".
I also found this page for you which can give you an intro to mirroring: http://www.boutell.com/newfaq/creating/mirroring.html
They recommend wget. Here's how it says to use wget to mirror a site:
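(If I recall the page right, it's the basic mirror command:)

  wget -m http://xxx.yyy

The -m switch is wget's mirror mode, which turns on recursion and timestamping.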
where http://xxx.yyy of course would be replaced with, say, http://www.catholic.com
I'd create a new folder just for this first, to keep your downloads tidy, and then enter into the new folder from your command prompt, before issuing the wget command.
(If Catholic.com is the site you want to mirror, make sure to use the "www" in the URL to avoid downloading a bunch of forum posts from forums.catholic.com, assuming you don't want to grab those.)
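(If I'm remembering the switch names right, wget can also be told explicitly which hosts to skip, for instance:

  wget -m --exclude-domains forums.catholic.com http://www.catholic.com/

though by default it stays on the host you started from anyway unless you add the -H switch.)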
Posted by: Karen | October 18, 2006 at 01:06 AM
http://pagesucker.com/ is shareware that has a free demo. It is easy to run and will put the website into a file.
Posted by: Julene | October 18, 2006 at 05:00 PM
I'd second the recommendation for wget if you're looking for something scriptable/repeatable. It has loads of parameters/switches for tweaking how it crawls through a website and what it does with the resulting files. I've only ever played with it in the Unix/Linux/OSX domain, but I'm sure there are Windows ports available.
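For instance (syntax from memory, so double-check against the man page), something like this keeps the crawl polite and limits what gets pulled down:

  wget -m -k -p -np -w 2 --limit-rate=50k http://www.example.com/docs/

Here -np stops it from climbing above the starting directory, -w 2 waits two seconds between requests, and --limit-rate caps the bandwidth it will use.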
Posted by: Scott | October 19, 2006 at 11:57 AM
I would like to find something that would accomplish the same job, but for Safari (on a Mac of course). Any ideas?
Posted by: Michael Sullivan | October 19, 2006 at 01:06 PM
Free utility that I've used for years
http://www.webreaper.net/
Michael
Posted by: Michael | October 20, 2006 at 06:46 AM
See www.isilo.com. It does what you ask for, and its output can be read on PC, Pocket PC, Palm, and Linux (with the appropriate reader for each platform).
Posted by: T. Skrovanek | October 21, 2006 at 03:29 PM
I think Adobe Acrobat Pro v7.0 can do what you're talking about. I do something similar with Adobe Acrobat Standard 5.0, and I think 7.0 pro has this feature and then some.
Of course, I'm pretty sure it will download them as PDFs, and that would present some difficulty if you wanted to convert them over to another format, I think. Also, Adobe software is pricey.
So, FWIW,
Posted by: Anonymous | October 22, 2006 at 10:33 AM