wget is a robust program that works very well even on slow or unstable networks. If the connection drops partway through a download, it can resume from the point where the transfer was interrupted.

  * [[http://coffeenix.net/board_print.php?bd_code=168|Help (Korean)]]
  * [[https://www.gnu.org/software/wget/manual/wget.html|GNU Wget 1.18 Manual]]

Possibly a better choice than [[웹집]] ...
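To resume an interrupted download, re-run wget with ''-c'' (a minimal sketch; the URL is a placeholder):

<code>
# -c / --continue picks up a partially-downloaded file where it left off
wget -c http://example.com/big-file.iso
</code>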
Mirroring a directory tree over FTP (''-r'' recursive, ''-nv'' terse output, ''-nH'' no host-name directory, ''-N'' skip files not newer than the local copy, ''-P'' save under the given prefix):

<code>
wget -r -nv -nH -N ftp://211.45.156.111/public_html/data/pages -P /var

wget -r -nv -nH -N ftp://id:[email protected]/html/data/pages/info.txt -P /home/www
</code>
=====Options=====
<file>
--recursive
</file>
Tells wget to recursively download pages, starting from the specified URL.

<file>
--level=1
</file>
Tells wget to stop after one level of recursion. This can be raised to download more deeply, or set to 0, which means "no limit".

<file>
--no-clobber
</file>
Skip downloads that would overwrite existing files.

<file>
--page-requisites
</file>
Tells wget to download all the resources (images, CSS, JavaScript, ...) that are needed for the page to work.

<file>
--html-extension
</file>
Adds a ".html" extension to downloaded files, with the double purpose of making the browser recognize them as HTML files and solving naming conflicts for "generated" URLs, when there are no directories with "index.html" but just a framework that responds dynamically with generated pages.

<file>
--convert-links
</file>
After the download is complete, converts the links in the documents so they are suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, and hyperlinks to non-HTML content.

<file>
--no-parent
</file>
Never ascend to the parent directory when retrieving recursively.

<file>
--domains=www.example.com
</file>
Sets the domains to be followed; the value is a comma-separated list of domains.
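A sketch combining the options above into a single command (the host and path are placeholders):

<code>
wget --recursive --level=1 --no-clobber \
     --page-requisites --html-extension --convert-links \
     --no-parent --domains=www.example.com \
     http://www.example.com/docs/
</code>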

====Avoiding imposed download limits====

Many web servers limit how many pages a user can download in a given amount of time, or which user-agents can access given pages. To get around such limits, some extra options may be added.
<file>
-U "Mozilla/5.0 (X11; U; Linux; en-US; rv:1.9.1.16) Gecko/20110929 Firefox/3.5.16"
</file>
Tells wget to use a fake user-agent, emulating that of a web browser (in this case, Firefox 3.5 on Linux).

<file>
--wait=3
</file>
Tells wget to wait at least 3 seconds between retrievals.

<file>
--random-wait
</file>
Tells wget to wait a random time between 0 and double the value given with --wait between requests.
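Putting the three together (a sketch; the host is a placeholder):

<code>
wget -r --wait=3 --random-wait \
     -U "Mozilla/5.0 (X11; U; Linux; en-US; rv:1.9.1.16) Gecko/20110929 Firefox/3.5.16" \
     http://www.example.com/
</code>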

<file>
-P prefix
--directory-prefix=prefix
</file>
Sets the directory prefix. The directory prefix is the directory where all other files and sub-directories will be saved to, i.e. the top of the retrieval tree. The default is ''.'' (the current directory).
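For instance, to save a mirror under /tmp/mirror instead of the current directory (a sketch; host and path are placeholders):

<code>
wget -r -P /tmp/mirror http://www.example.com/
</code>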
=====Examples=====

====Downloading an entire site====

[[http://www.linuxjournal.com/content/downloading-entire-web-site-wget|Source]]
<file>
wget \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --restrict-file-names=windows \
     --domains website.org \
     --no-parent \
     --limit-rate=20k \
     --referer=125.209.222.141 \
     www.website.org/tutorials/html/
</file>

What each option does:

<file>
--recursive: download the entire Web site.
--domains website.org: don't follow links outside website.org.
--no-parent: don't follow links outside the directory tutorials/html/.
--page-requisites: get all the elements that compose the page (images, CSS and so on).
--html-extension: save files with the .html extension.
--convert-links: convert links so that they work locally, off-line.
--restrict-file-names=windows: modify filenames so that they will work in Windows as well.
--no-clobber: don't overwrite any existing files (used in case the download is interrupted and resumed).
--limit-rate=20k: limit the download speed to 20 kilobytes per second.
--referer=125.209.222.141: send this address in the Referer header.
</file>

A related option is ''--post-data=string'', which makes wget use the POST method and send string as the request body.
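A typical use of ''--post-data'' is submitting a login form and keeping the session cookie for later requests (a sketch; the URL and form field names are hypothetical):

<code>
# Log in via POST and store the session cookie for subsequent runs
wget --save-cookies cookies.txt --keep-session-cookies \
     --post-data "user=foo&password=bar" \
     http://example.com/login
</code>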
====Downloading images====

<code>
wget -r -np --reject "*.txt" http://192.168.0.100/images
</code>

  * ''-r'' is short for ''--recursive''.
  * The ''-l'' (lower-case L) option sets the recursion depth; by default wget descends up to 5 levels of subdirectories.
  * ''-np'' means no-parent: when recursing, do not download files from the parent directory.
  * Assuming the folder holds both images and text, the ''--reject'' option excludes the txt files. [[http://ngee.tistory.com/376|Source: ngee]]
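Conversely, ''-A''/''--accept'' keeps only files matching the listed patterns (a sketch against the same placeholder host):

<code>
# Download only jpg and png files, skipping everything else
wget -r -np -A "*.jpg,*.png" http://192.168.0.100/images
</code>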
====Downloading multiple files====

Put the URLs of the files to download in a text file, one per line, then use the ''-i'' option:

<code>
wget -i halo.txt
</code>
<file txt halo.txt>
http://cdn.p30download.com/?b=p30dl-console&f=Halo.3.ODST.iMARS.DVD1_p30download.com.part1.rar
http://cdn.p30download.com/?b=p30dl-console&f=Halo.3.ODST.iMARS.DVD1_p30download.com.part2.rar
http://cdn.p30download.com/?b=p30dl-console&f=Halo.3.ODST.iMARS.DVD1_p30download.com.part3.rar
http://cdn.p30download.com/?b=p30dl-console&f=Halo.3.ODST.iMARS.DVD1_p30download.com.part4.rar
http://cdn.p30download.com/?b=p30dl-console&f=Halo.3.ODST.iMARS.DVD1_p30download.com.part5.rar
http://cdn.p30download.com/?b=p30dl-console&f=Halo.3.ODST.iMARS.DVD1_p30download.com.part6.rar
http://cdn.p30download.com/?b=p30dl-console&f=Halo.3.ODST.iMARS.DVD1_p30download.com.part7.rar
http://cdn.p30download.com/?b=p30dl-console&f=Halo.3.ODST.iMARS.DVD2_p30download.com.part1.rar
http://cdn.p30download.com/?b=p30dl-console&f=Halo.3.ODST.iMARS.DVD2_p30download.com.part2.rar
http://cdn.p30download.com/?b=p30dl-console&f=Halo.3.ODST.iMARS.DVD2_p30download.com.part3.rar
http://cdn.p30download.com/?b=p30dl-console&f=Halo.3.ODST.iMARS.DVD2_p30download.com.part4.rar
http://cdn.p30download.com/?b=p30dl-console&f=Halo.3.ODST.iMARS.DVD2_p30download.com.part5.rar
http://cdn.p30download.com/?b=p30dl-console&f=Halo.3.ODST.iMARS.DVD2_p30download.com.part6.rar
http://cdn.p30download.com/?b=p30dl-console&f=Halo.3.ODST.iMARS.DVD2_p30download.com.part7.rar
</file>
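If the batch is interrupted, adding ''-c'' lets the same list be re-run without re-downloading the finished parts (assuming the server supports resuming):

<code>
wget -c -i halo.txt
</code>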