
Sunday, December 18, 2011

Power to the terminal command line: download any video by yourself

Those of us who, like me, don't have a flat-rate Internet connection know how careful you have to be when watching an interesting online video, such as a video tutorial or a webinar, on a site that doesn't let you save it for offline viewing: if you have to watch it again later, you approach your time or volume limit all the faster the larger the video is.

However, it is not quite true that a website can prevent the download of an online video: your browser has to download it anyway, and it even stores it temporarily for as long as it needs it.
If this surprises you, you might find it useful to take a closer look at what your browser does when you click to watch a video online.

The video, like any other resource on the web page, has to be linked from the page your browser is showing: this means that the browser has to find its URL by parsing the page code.
But if you look at the page source, sometimes you cannot find that URL; what you see instead is something like:
<script src="">
<embed allowscriptaccess="always" flashvars="file= 8917218&amp;autostart=false&amp;config=" height="180" menu="false" pluginspage="" quality="high" src="" type="application/x-shockwave-flash" width="320"></embed>

This means the video location will be revealed dynamically when the video player's script is executed.

Using a tool that debugs the browser's behaviour, like Safari's Web Inspector, we can easily find the exact GET request used to retrieve the video: most of the time it is the largest object in the page, so if we sort the requests by object size or by loading time it will be the first result:|86291812c0025268a81210a7c89cfbac&crap=mp4?start=0&id=playerId&client=FLASH%20MAC%2011,1,102,55&version=4.3.132&width=710

Taking a closer look at this GET, we will see that a query is performed using the parameters given in the above script:

file: modern-benoni-1-the-amanovs-thrilling-battle
token: 8917218|86291812c0025268a81210a7c89cfbac
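As a minimal sketch of how these parameters can be picked out of a query string, the shell function below splits the string on `&` and prints the value of one key. The function name `get_param` and the exact layout of the `query` variable are assumptions made for illustration; only the two values come from the request above.

```shell
#!/bin/sh
# Print the value of one parameter from a query string.
# Usage: get_param NAME QUERY   (get_param is a hypothetical helper)
get_param() {
    name=$1
    query=$2
    # Put each "key=value" pair on its own line, then keep the wanted key.
    printf '%s\n' "$query" | tr '&' '\n' | while IFS= read -r pair; do
        case $pair in
            "$name"=*) printf '%s\n' "${pair#*=}" ;;
        esac
    done
}

# Assumed layout of the query; the two values are the ones listed above.
query='file=modern-benoni-1-the-amanovs-thrilling-battle&token=8917218|86291812c0025268a81210a7c89cfbac'

get_param file "$query"
get_param token "$query"
```

The query is kept in single quotes because `&` and `|` are special characters for the shell.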

Besides, a GET sent by a browser usually declares a "User-Agent" and a "Referer", which the server uses to know the kind of browser and the page the request comes from.
If the site needs to know something more about who is making the request, it will expect a cookie, a small piece of text stored by the browser to uniquely identify the current user or browsing session. You can find this cookie in the Web Inspector by matching the web site hostname.

Now you have all the information needed to request the video from the web server yourself; you just need a tool to send this GET manually: wget or cURL, to name only the best known.

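With curl, these are the options to set the Referer, the User-Agent and the cookie, and to choose the output file:

-e/--referer <URL> Send the "Referer Page" information to the server.
-A/--user-agent <agent string> Specify the User-Agent string to send to the server.
-b/--cookie <name=data> Pass the data to the server as a cookie (the data should be in the format "NAME1=VALUE1; NAME2=VALUE2").
-o/--output <file> Write output to <file> instead of stdout.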
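To keep such a long command manageable, the pieces can be assembled in a small script. The following is only a sketch: `join_cookies` and `fetch_video` are hypothetical helper names, and every value marked as a placeholder has to be replaced with what the Web Inspector shows for your own session. Running the script as written only prints the cookie string; it sends no request.

```shell
#!/bin/sh
# Join NAME=VALUE pairs into the "NAME1=VALUE1; NAME2=VALUE2" form used by -b.
join_cookies() {
    result=
    for pair in "$@"; do
        if [ -z "$result" ]; then
            result=$pair
        else
            result="$result; $pair"
        fi
    done
    printf '%s\n' "$result"
}

# Run curl with the Referer, User-Agent, cookie and output file options.
# $1 = referer, $2 = user agent, $3 = cookie string, $4 = output file, $5 = video URL
fetch_video() {
    curl -e "$1" -A "$2" -b "$3" -o "$4" "$5"
}

# Placeholder cookie values: copy the real ones from your own browser session.
cookies=$(join_cookies 'PHPSESSID=PLACEHOLDER' 'tf_login_id=PLACEHOLDER')
printf '%s\n' "$cookies"

# Example call (not executed here; both URLs are placeholders):
# fetch_video 'http://example.com/page' 'Mozilla/5.0' "$cookies" test_GET.mp4 'http://example.com/video'
```

Passing every value in double quotes keeps the spaces, semicolons and pipes inside the cookie and the URL from being interpreted by the shell.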

Putting it all together, we can test whether it works by downloading the file manually:

$ curl -e '' -A 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.52.7 (KHTML, like Gecko) Version/5.1.2 Safari/534.52.7' -o test_GET.mp4 -b '__utmz=1.7324923106.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmc=1; __utmb=; __utma=1.721394078.1214124012.1214859116.8217491996.3; meebo-cim-session=3142ff122e11209bb12c; mcim=7281478shfjf8g1ASASGKJgakgaskg88&AgANGJJGAKKhagsjgasgkgkasgkkgahsAFSkgjj73263853930139501915hgn3031nc90000128tnguj%2hjfd9hgh32h88813jkgkgkkgk138; _chartbeat2=98124091hhhfaksn.1849812495199; ki_t=1324118915713%3B1324118915713%3B1324118915713%3B1%3B1; ki_u=ababa666-12de-5121-0241-fef88ccac10a; view_counter=hsahgk3r7881r7v318h13921r8; __qca=P0-841249119-8817214891285; cal=hdasjsfhjhjAJSFJHFAJ28389391239r9vkdjskfjkjfakj34h1jhjGKGKjkj3k12j3rkvjk13jrkvjk1j2kjgkjk21kjg8214781249fa99214899982191248SHHJEJBedbeuEBbeF; tf_login_id=xhejak; PHPSESSID=sakjgjk2383gakj3t98gj1jj11; __utmz=21314121.7381273182.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmc=56795640; __utmb=21314121.1.10.2138921496; __utma=12144141.521442111.1214148871.8912481811.1142819189.1; __utmxx=72121411.; __utmx=21314121.' ';streamer=lighttpd&token=8917218|86291812c0025268a81210a7c89cfbac&crap=mp4?start=0&id=playerId&client=FLASH%20MAC%2011,1,102,55&version=4.3.132&width=710'

% Total % Received % Xferd  Average Speed   Time    Time     Time  Current
                            Dload  Upload   Total   Spent    Left  Speed
5 22.8M 5 1182k    0     0  35135      0  0:11:20  0:00:34  0:10:46 12231^C

Keep in mind that the cookies and the token might be valid only within the current browser session, so the previous test may work only while your browser session with the web site is still open.
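If the session has expired, the server may answer with something other than the video, for instance an error page; that is an assumption about the site, not something observed here. A rough way to check the result is to look for the `ftyp` marker that usually sits at bytes 5 to 8 of an MP4 file. The sketch below uses a hypothetical helper, `looks_like_mp4`, and is only a heuristic, not a full validation of the file.

```shell
#!/bin/sh
# Succeed if bytes 5-8 of the file spell "ftyp", as they usually do in an MP4.
# looks_like_mp4 is a hypothetical helper; the check is only a heuristic.
looks_like_mp4() {
    magic=$(dd if="$1" bs=1 skip=4 count=4 2>/dev/null)
    [ "$magic" = "ftyp" ]
}

# Check the file written by the curl test, if it is there.
if [ -f test_GET.mp4 ]; then
    if looks_like_mp4 test_GET.mp4; then
        echo "test_GET.mp4 looks like a video"
    else
        echo "test_GET.mp4 is probably not a video: refresh the cookies and the token"
    fi
fi
```

A download interrupted with ^C still passes this check as long as the first bytes arrived, so it tells you whether the server sent a video, not whether the file is complete.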
