4 posts tagged 'wget'

  1. 2012.04.13 Pulling down an entire website with wget -r
  2. 2010.09.11 curl
  3. 2010.04.10 wget 4
  4. 2010.04.10 wget - cookie-related options
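Linux/Ubuntu 2012. 4. 13. 23:20
I looked this up just in case, and... wow, I had no idea such a powerful command was sitting this close at hand. OTL
On Windows you need a dedicated site-grabbing program such as WebZip,
but on Linux wget takes care of it easily!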

-r
--recursive
Turn on recursive retrieving.

[Link : http://linux.die.net/man/1/wget]

[Link : http://www.cyberciti.biz/faq/wget-recursive-download-command/]
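In practice -r is rarely used alone. Below is a rough sketch of how it might be combined with a few other GNU wget options when grabbing a whole site. The helper name mirror_site, the default depth of 3, and the one-second wait are my own illustrative assumptions, not something from the man page excerpt above.

```shell
#!/bin/sh
# mirror_site URL [DEPTH]
# Hypothetical helper: recursively fetch URL with GNU wget.
mirror_site() {
    url=$1
    depth=${2:-3}              # assumed default recursion depth

    # --no-parent       : never climb above the starting directory
    # --page-requisites : also fetch images/CSS needed to render each page
    # --convert-links   : rewrite links so the local copy can be browsed
    # --wait=1          : pause one second between requests
    wget --recursive \
         --level="$depth" \
         --no-parent \
         --page-requisites \
         --convert-links \
         --wait=1 \
         "$url"
}
```

For example, `mirror_site http://example.com/docs/ 2` would fetch that directory two levels deep (example.com is a placeholder address).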

Posted by 구차니
Linux 2010. 9. 11. 22:49
This is the tool that was used when downloading the Android platform.

curl is a tool to transfer data from or to a server, using one of the supported protocols (HTTP, HTTPS, FTP, FTPS, TFTP, DICT, TELNET, LDAP or FILE). The command is designed to work without user interaction.

curl offers a busload of useful tricks like proxy support, user authentication, ftp upload, HTTP post, SSL (https:) connections, cookies, file transfer resume and more. As you will see below, the amount of features will make your head spin!

[Link : http://linux.die.net/man/1/curl]
[Link : http://curl.haxx.se/]

curl supports every protocol that wget supports,
so in terms of features it feels similar to wget, only somewhat more powerful.

GNU Wget is a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.

[Link : http://linux.die.net/man/1/wget]
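To make the "similar to wget" point concrete, here is a minimal sketch of a download helper that uses curl when it is installed and falls back to wget otherwise. The function name fetch is hypothetical. One difference worth knowing: wget follows HTTP redirects by default, while curl only does so with -L.

```shell
#!/bin/sh
# fetch URL OUTFILE
# Hypothetical helper: download URL to OUTFILE with curl if present, else wget.
fetch() {
    url=$1
    out=$2
    if command -v curl >/dev/null 2>&1; then
        # -L follows redirects, -o names the output file
        curl -L -o "$out" "$url"
    else
        # -O names the output file (wget follows redirects on its own)
        wget -O "$out" "$url"
    fi
}
```

Usage would look like `fetch http://example.com/file.tar.gz file.tar.gz`, where the address is only a placeholder.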

Posted by 구차니
Linux 2010. 4. 10. 18:11
I wonder what wget stands for... maybe it is called wget because it does what ftp's get command does, but on the Web?

Anyway, as of 2010-04-10, Ubuntu 9.10 is still shipping wget version 1.11.4.
$ wget -V
GNU Wget 1.11.4

Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://www.gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Originally written by Hrvoje Niksic <hniksic@xemacs.org>.
Currently maintained by Micah Cowan <micah@cowan.name>.
The latest release from GNU is 1.12; my guess is that 1.11.4 was packaged out of a habit of avoiding even-numbered versions.

[Link : http://www.gnu.org/software/wget/]
[Link : http://ftp.gnu.org/gnu/wget/]


The libraries that wget depends on are as follows.
$ ldd /usr/bin/wget
        linux-gate.so.1 =>  (0xb78c7000)
        libdl.so.2 => /lib/tls/i686/cmov/libdl.so.2 (0xb78ac000)
        librt.so.1 => /lib/tls/i686/cmov/librt.so.1 (0xb78a3000)
        libssl.so.0.9.8 => /lib/i686/cmov/libssl.so.0.9.8 (0xb785c000)
        libcrypto.so.0.9.8 => /lib/i686/cmov/libcrypto.so.0.9.8 (0xb7716000)
        libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xb75d1000)
        /lib/ld-linux.so.2 (0xb78c8000)
        libpthread.so.0 => /lib/tls/i686/cmov/libpthread.so.0 (0xb75b8000)
        libz.so.1 => /lib/libz.so.1 (0xb75a2000)

The file is roughly 237K. (It is bulkier than it looks!)
$ ll /usr/bin/wget
-rwxr-xr-x 1 root root 242516 2009-10-07 00:12 /usr/bin/wget

The wget in BusyBox appears to leave out the cookie-related options.
wget

    wget [-c|--continue] [-s|--spider] [-q|--quiet] [-O|--output-document file]
        [--header 'header: value'] [-Y|--proxy on/off] [-P DIR]
        [-U|--user-agent agent] url

    Retrieve files via HTTP or FTP

    Options:

            -s      Spider mode - only check file existence
            -c      Continue retrieval of aborted transfer
            -q      Quiet
            -P      Set directory prefix to DIR
            -O      Save to filename ('-' for stdout)
            -U      Adjust 'User-Agent' field
            -Y      Use proxy ('on' or 'off')

[Link : http://www.busybox.net/downloads/BusyBox.html]
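Since the BusyBox usage above does list --header, one possible workaround for the missing cookie options is to build the Cookie header by hand from a cookie file. The following is only a sketch under assumptions: the helper name cookie_header is made up, and the input is assumed to be a Netscape-format cookie file (tab-separated, cookie name in column 6, value in column 7).

```shell
#!/bin/sh
# cookie_header FILE
# Hypothetical helper: print a "Cookie: n1=v1; n2=v2" header built from a
# Netscape-format cookie file (assumed tab-separated, 7 fields per cookie).
cookie_header() {
    awk -F'\t' '
        /^#/ || NF < 7 { next }      # skip comment, blank, and short lines
        { pairs = pairs (pairs ? "; " : "") $6 "=" $7 }
        END { if (pairs) print "Cookie: " pairs }
    ' "$1"
}
```

It could then be used as `wget --header "$(cookie_header cookies.txt)" -O file url`, with cookies.txt, file, and url standing in for real values. Note that this ignores expiry, domain, and path matching, which a real cookie implementation would check.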

Posted by 구차니
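wget is a utility that downloads web pages and files over protocols such as HTTP and FTP.
When I converted a YouTube address so that the video could be downloaded as a file and tried fetching it,
all I got was a 403 Forbidden error and nothing was downloaded.

However, if I save the cookies and then load them again for the download attempt, the file comes down properly.
My guess is that the cookie identifies the session, and once the session differs the previously obtained address is invalidated.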

--no-cookies
Disable the use of cookies. Cookies are a mechanism for maintaining server-side state. The server sends the client a cookie using the Set-Cookie header, and the client responds with the same cookie upon further requests. Since cookies allow the server owners to keep track of visitors and for sites to exchange this information, some consider them a breach of privacy. The default is to use cookies; however, storing cookies is not on by default.


--load-cookies file
Load cookies from file before the first HTTP retrieval. file is a textual file in the format originally used by Netscape's cookies.txt file.

You will typically use this option when mirroring sites that require that you be logged in to access some or all of their content. The login process typically works by the web server issuing an http cookie upon receiving and verifying your credentials. The cookie is then resent by the browser when accessing that part of the site, and so proves your identity.

Mirroring such a site requires Wget to send the same cookies your browser sends when communicating with the site. This is achieved by ‘--load-cookies’—simply point Wget to the location of the cookies.txt file, and it will send the same cookies your browser would send in the same situation. Different browsers keep textual cookie files in different locations:

Netscape 4.x.
The cookies are in ~/.netscape/cookies.txt.
Mozilla and Netscape 6.x.
Mozilla's cookie file is also named cookies.txt, located somewhere under ~/.mozilla, in the directory of your profile. The full path usually ends up looking somewhat like ~/.mozilla/default/some-weird-string/cookies.txt.
Internet Explorer.
You can produce a cookie file Wget can use by using the File menu, Import and Export, Export Cookies. This has been tested with Internet Explorer 5; it is not guaranteed to work with earlier versions.
Other browsers.
If you are using a different browser to create your cookies, ‘--load-cookies’ will only work if you can locate or produce a cookie file in the Netscape format that Wget expects.

If you cannot use ‘--load-cookies’, there might still be an alternative. If your browser supports a “cookie manager”, you can use it to view the cookies used when accessing the site you're mirroring. Write down the name and value of the cookie, and manually instruct Wget to send those cookies, bypassing the “official” cookie support:

          wget --no-cookies --header "Cookie: name=value"


--save-cookies file
Save cookies to file before exiting. This will not save cookies that have expired or that have no expiry time (so-called “session cookies”), but also see ‘--keep-session-cookies’.


--keep-session-cookies
When specified, causes ‘--save-cookies’ to also save session cookies. Session cookies are normally not saved because they are meant to be kept in memory and forgotten when you exit the browser. Saving them is useful on sites that require you to log in or to visit the home page before you can access some pages. With this option, multiple Wget runs are considered a single browser session as far as the site is concerned.

Since the cookie file format does not normally carry session cookies, Wget marks them with an expiry timestamp of 0. Wget's ‘--load-cookies’ recognizes those as session cookies, but it might confuse other browsers. Also note that cookies so loaded will be treated as other session cookies, which means that if you want ‘--save-cookies’ to preserve them again, you must use ‘--keep-session-cookies’ again.


[Link : http://www.gnu.org/software/wget/manual/html_node/HTTP-Options.html]
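The options quoted above can be put together into a two-step "visit first, download second" flow. This is a minimal sketch, not a definitive recipe: the helper name fetch_with_session, the COOKIE_JAR variable, and the default file name cookies.txt are all assumptions of mine. It adds --keep-session-cookies so that cookies without an expiry time are saved too.

```shell
#!/bin/sh
# fetch_with_session PAGE_URL FILE_URL OUTFILE
# Hypothetical helper: visit PAGE_URL to collect cookies, then download
# FILE_URL while sending those cookies back.
fetch_with_session() {
    page_url=$1
    file_url=$2
    out=$3
    jar=${COOKIE_JAR:-cookies.txt}   # assumed cookie file name

    # Step 1: load the page, discard its body, keep the cookies
    # (session cookies included).
    wget --save-cookies "$jar" --keep-session-cookies \
         -O /dev/null "$page_url" || return 1

    # Step 2: request the real file with the saved cookies.
    wget --load-cookies "$jar" -O "$out" "$file_url"
}
```

The first wget run behaves like opening the page in a browser; the second run then looks like part of the same browser session as far as the site is concerned.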

This is the YouTube cookie file saved with --save-cookies. Hmm... what does it all mean?
$ cat yt.cookie
# HTTP cookie file.
# Generated by Wget on 2010-04-10 11:57:59.
# Edit at your own risk.

.youtube.com    TRUE    /       FALSE   1586228278      PREF    f1=50000000&f2=8000000
.youtube.com    TRUE    /       FALSE   1291604278      VISITOR_INFO1_LIVE      FNfBrJzTQY
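As for what the columns mean: in the Netscape cookie file format each line carries seven tab-separated fields, namely the domain, whether subdomains are included, the path, whether the cookie is restricted to secure connections, the expiry time as Unix epoch seconds, the cookie name, and the cookie value. So PREF above expires at epoch 1586228278, about ten years after the file was generated, and VISITOR_INFO1_LIVE at 1291604278, about eight months after. A small sketch that labels the fields follows; the helper name describe_cookies is hypothetical.

```shell
#!/bin/sh
# describe_cookies FILE
# Hypothetical helper: print each cookie of a Netscape-format cookie file
# with its seven fields labelled. Assumes tab-separated fields.
describe_cookies() {
    awk -F'\t' '
        /^#/ || NF < 7 { next }      # skip comment, blank, and short lines
        {
            printf "domain=%s subdomains=%s path=%s secure=%s expires=%s name=%s value=%s\n",
                   $1, $2, $3, $4, $5, $6, $7
        }
    ' "$1"
}
```

Running `describe_cookies yt.cookie` would print one labelled line per cookie. With GNU date, an expiry value can be turned into a readable date with `date -d @1586228278`.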

$ wget "http://www.youtube.com/watch?v=mdljV2uEs1A" --save-cookies yt.cookie
$ wget --load-cookies=yt.cookie "http://v22.lscache2.c.youtube.com/videoplayback?ip=211.0.0.0&sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Calgorithm%2Cburst%2Cfactor&fexp=904405%2C900037&algorithm=throttle-factor&itag=35&ipbits=8&burst=40&sver=3&expire=1270890000&key=yt1&signature=5C611E956FB97E74D3435F8815A7A2376E3C61D4.C2C593CDDE0C15671462BB13C5404EC6927F7F7D&factor=1.25&id=99d963576b84b350" -O file.mp4
--2010-04-10 12:33:26--  http://v22.lscache2.c.youtube.com/videoplayback?ip=211.0.0.0&sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Calgorithm%2Cburst%2Cfactor&fexp=904405%2C900037&algorithm=throttle-factor&itag=35&ipbits=8&burst=40&sver=3&expire=1270890000&key=yt1&signature=5C611E956FB97E74D3435F8815A7A2376E3C61D4.C2C593CDDE0C15671462BB13C5404EC6927F7F7D&factor=1.25&id=99d963576b84b350
Resolving v22.lscache2.c.youtube.com... 74.125.167.33
Connecting to v22.lscache2.c.youtube.com|74.125.167.33|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 15708973 (15M) [video/x-flv]
Saving to: `file.mp4'

100%[==========================================================================>] 15,708,973   105K/s   in 2m 0s

2010-04-10 12:35:27 (128 KB/s) - `file.mp4' saved [15708973/15708973]

Because the URL is so long, if you do not specify a separate file name you get:
Cannot write to `videoplayback?ip=211.0.0.0&sparams=id,expire,ip,ipbits,itag,algorithm,burst,factor&fexp=904405,900037&algorithm=throttle-factor&itag=35&ipbits=8&burst=40&sver=3&expire=1270890000&key=yt1&signature=5C611E956FB97E74D3435F8815A7A2376E3C61D4.C2C593CDDE0C15671462BB13C5404EC6927F7F7D&factor=1.25&id=99d963576b84b350' (File name too long).
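The file name that wget derives from the address is longer than 255 characters, so it fails with an error saying the name is too long to be used as a file name.
You must give a separate name with -O filename.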
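One way to guard against this ahead of time is to check the length of the name wget would derive and fall back to a short name when it is too long. The sketch below assumes the 255 limit mentioned above and treats everything after the last slash (query string included) as the candidate name; the helper name safe_output_name is hypothetical.

```shell
#!/bin/sh
# safe_output_name URL FALLBACK
# Hypothetical helper: print the file name derived from URL, or FALLBACK
# when that name is empty or longer than 255 characters.
safe_output_name() {
    url=$1
    fallback=$2
    name=${url##*/}      # text after the last slash, query string included

    # ${#name} counts characters; for plain ASCII URLs that equals bytes.
    if [ -z "$name" ] || [ "${#name}" -gt 255 ]; then
        echo "$fallback"
    else
        echo "$name"
    fi
}
```

It could be used as `wget -O "$(safe_output_name "$url" file.mp4)" "$url"`. The check is done on the percent-encoded URL, which is at least as long as the decoded name wget reports in its error message, so it errs on the safe side.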

2010/04/09 - YouTube video page fmt_map, fmt_url_map, fmt_list, fmt_stream_map
[Link : http://kldp.org/node/75150]


Posted by 구차니