Overview
Wget is a network utility that retrieves files from the Web using HTTP and
FTP, the two most widely used Internet protocols. It works non-interactively, so it can
run in the background even after you have logged off. The program supports recursive retrieval
of HTML pages as well as FTP sites. You can use Wget
to mirror archives and home pages, or to traverse the Web like a WWW robot.
Examples
The examples are divided into three sections,
for clarity. The first section is a tutorial for beginners. The second section
explains some of the more complex program features. The third section contains advice for
mirror administrators, as well as even more complex features (that some would call
perverted).
The simplest use is to download a single URL:
wget http://foo.bar.com/
But what happens if the connection is slow,
and the file is large? The connection will probably fail before the whole file is
retrieved, perhaps more than once. In this case, Wget keeps trying until it either
gets the whole file or exceeds the default number of retries (20). It is
easy to change the number of tries to 45, to ensure that the whole file will arrive
safely:
wget --tries=45 http://foo.bar.com/jpg/flyweb.jpg
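The effect of '--tries' can be sketched as a plain shell loop: re-run the download until it succeeds or the attempt limit is reached. This is an illustration only; 'attempt' is a made-up stand-in for one download attempt, and it always fails here so the loop runs out of tries.

```shell
tries=0
max_tries=3
attempt() { false; }   # hypothetical stand-in for one download attempt; always fails
until attempt; do
  tries=$((tries + 1))
  if [ "$tries" -ge "$max_tries" ]; then
    echo "giving up after $tries tries"   # prints: giving up after 3 tries
    break
  fi
done
```

Wget is smarter than this sketch: between tries it can also resume a partially retrieved file instead of starting over.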
wget -t 45 -o log http://foo.bar.com/jpg/flyweb.jpg &
The ampersand at the end of the line puts
Wget in the background, and '-o log' writes the progress messages to the file 'log'
instead of the terminal. To remove the limit on the number of retries, use '-t inf'.
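What the trailing ampersand does can be shown with any long-running command; here 'sleep' stands in for the wget run so the sketch works without a network connection.

```shell
sleep 1 &          # in the article this would be: wget -t 45 -o log http://... &
pid=$!             # the shell reports the PID of the background job
wait "$pid"        # block until the background job finishes
status=$?
echo "background job exited with status $status"   # prints: background job exited with status 0
```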
Wget also handles FTP URLs; the transcript shows the steps of the FTP dialogue:
wget ftp://foo.download.com/welcome.msg
=> 'welcome.msg'
Connecting to foo.download.com:21... connected!
Logging in as anonymous ... Logged in!
==> TYPE I ... done. ==> CWD not needed.
==> PORT ... done. ==> RETR welcome.msg ... done.
A quiet ('-q') recursive retrieval, with up to 45 tries per file:
wget -q --tries=45 -r \
http://download-east.oracle.com/otndoc/oracle9i/901_doc
wget -i file
If you specify '-' as the file name, the URLs will be
read from standard input.
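A sketch of the '-i' workflow: put one URL per line in a file and hand the file to Wget. The file name and URLs below are made up, and the wget calls are left commented out because the hosts do not exist.

```shell
# Build a list of URLs, one per line (hypothetical addresses)
cat > url-list.txt <<'EOF'
http://foo.bar.com/one.html
http://foo.bar.com/two.html
EOF

# wget -i url-list.txt          # fetch every URL in the file
# wget -i - < url-list.txt      # same, reading the list from standard input
wc -l < url-list.txt            # the list holds two URLs
```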
Retrieve the site recursively, with one try per document, saving the log to 'gnulog':
wget -r -t1 http://foo.bar.com/ -o gnulog
Retrieve only the first level of Yahoo links ('-l1' limits the recursion depth to 1):
wget -r -l1 http://www.yahoo.com/
Display the server response headers ('-S'):
wget -S http://www.lycos.com/
Suppose you want to download all the GIFs from a directory on an HTTP server:
wget -r -l1 --no-parent -A.gif http://host/dir/
It is a bit of a kludge, but it works perfectly.
'-r -l1' means to retrieve recursively, with a maximum
depth of 1. '--no-parent' means that references to the
parent directory are ignored, and '-A.gif' means to
download only the GIF files. '-A "*.gif"' would have
worked too.
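How an accept pattern such as '-A "*.gif"' selects files: Wget compares each file name against shell-style wildcard patterns. The helper below is a made-up illustration that mimics the matching with the shell's own 'case' globbing; it is not Wget's actual code.

```shell
matches_accept() {
  # Accept names ending in .gif, reject everything else
  case "$1" in
    *.gif) echo "accept $1" ;;
    *)     echo "reject $1" ;;
  esac
}
matches_accept logo.gif     # prints: accept logo.gif
matches_accept index.html   # prints: reject index.html
```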
To continue an interrupted recursive retrieval without re-downloading the files you already have, use '-nc' (no-clobber):
wget -nc -r http://foo.bar.com/
To retrieve a file over FTP with the username and password embedded in the URL:
wget ftp://name:password@foo.bar.com/myfile
If you wish Wget to keep a mirror of a page (or
FTP subdirectories), use '--mirror', which is shorthand for '-r -N -l inf --no-remove-listing'. You can
add Wget to the crontab file, asking it to recheck a site each Sunday:
0 0 * * 0 wget --mirror ftp://x.y.z/pub -o /var/weeklog
To mirror only the HTML files of a site:
wget --mirror -A.html http://www.w3.org/
You can find the sources of Wget, with all the
documentation, at the following links:
http://www.gnu.org/software/wget/wget.html
http://www.lns.cornell.edu/public/COMP/info/wget/wget_toc.html
http://www.interlog.com/~tcharron/wgetwin.html