Tuesday, February 23, 2010

A Programmer's Toolbox

Every programmer needs to have a scripting language in their toolbox.  The more languages the better, but at least one is a necessity.  For the longest time I've been torn between a few different general-purpose scripting languages, endlessly debating the relative strengths and weaknesses of each:  "Python is fairly ubiquitous, but the libraries aren't consistent."  "Ruby has consistent libraries, but there seems to be a steep learning curve before I can be a ninja."  All the while forgetting, of course, that in the end it doesn't matter which one I choose, so long as it solves the particular problem at hand.

Well, the other day I had just such a problem, one that I needed a scripting language to solve.  I was re-re-reading one of Steve Yegge's terrific blog posts (yes, I've read them a few times).  Specifically, it was the one about 10 challenging books every programmer should read.  One of the books he mentioned was Structure and Interpretation of Computer Programs by Harold Abelson and Gerald Sussman.  I'd heard of the book before, and being a fan of Lisp I'd always wanted to read it, but somehow never got around to it.  I'd even seen the free copy online and the lectures posted on the book's website.  Finally I decided that I'd done enough procrastinating, and while I wasn't going to want to read through the book, the lectures might be a good thing to watch when I have some time.  The only problem was that I didn't want to have to download each of the 20 torrents and start them up myself.  Clicking on links is for suckers, in case you didn't know.

I needed a way to automate the downloading of all of the torrent files.  I'd heard of an HTML-parsing library for Python called Beautiful Soup, but I'd also heard good things about the late _why's Hpricot, which does the same job for Ruby.  Since I'd already dipped my toes in the Ruby water for a small tool I started working on the other day (more on that another time), I figured I'd try Hpricot.  After about 10 minutes of playing around with the library I had a working script that would scan through the page and download all of the torrents of the .avi versions of the lectures.  Below is the code I wrote, with an explanation to follow.

require 'rubygems'
require 'hpricot'
require 'open-uri'

url = 'http://groups.csail.mit.edu/mac/classes/6.001/abelson-sussman-lectures/'
document = open(url) { |f| Hpricot(f) }

document.search('a') do |a|
  if a[:href] =~ /\.avi\.torrent$/ then  # escaped dots match literal periods
    open(File.basename(a[:href]), 'wb') do |f|
      f.write open(url + a[:href]) { |remote| remote.read }
    end
  end
end

The above code makes fairly heavy use of blocks, one of my favourite features of Ruby.  The code is fairly straightforward once you understand that methods and blocks implicitly return the result of their last expression.  Here's a play-by-play of what's going on:

  • Lines 1-3: A few libraries are loaded.
  • Line 6: The web page containing the links I want to read is loaded and parsed.  Since the block being passed returns the result of its last expression, the parsed document is what gets assigned.
  • Line 8: The document is searched for all of its links.  For each of the elements found the passed block is executed.
  • Line 9: Weed out any links that don't target the files I'm looking for.
  • Line 10: Open a local file to write the contents of the file pointed to by the link.
  • Line 11: Read the target of the link.  The result of the block passed will be written to the local file.
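
The two ideas the script leans on, blocks handing back their last expression and a pattern that picks out the right links, can be tried without touching the network.  What follows is only a minimal sketch: the with_resource helper and the sample hrefs are made up for illustration and are not part of the script above or of Hpricot.

```ruby
# Hypothetical helper that mimics the shape of open(...) { |f| ... }:
# it yields its argument and returns whatever the block evaluates to.
def with_resource(resource)
  yield resource # last expression, so the block's value becomes the method's value
end

# No explicit `return` is needed inside the block.
length = with_resource('hello') { |s| s.length }

# Made-up hrefs standing in for the links found on a page.
hrefs = ['lec1a.avi.torrent', 'lec1a.mp4.torrent', 'index.html', 'xaviytorrent']

# Escaping the dots makes them match literal periods; \z anchors the
# pattern to the end of the string.
wanted = hrefs.select { |href| href =~ /\.avi\.torrent\z/ }

puts length
puts wanted.inspect
```

With an unescaped pattern such as /.avi.torrent/, the last sample string would also match, because a bare dot matches any character.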

That's it.  14 lines of code and I didn't have to click a single one of the links.  Yes, I probably could have done it faster manually, but taking the time to write a script turns this little task into a great saw-sharpening exercise, and the productivity gained from that is well worth the investment of time.