Thursday, December 13, 2007

Old-School Shop Guide

I rediscovered this super-compact reference-and-tool-and-measuring device while looking for a tool. It is jam-packed with handy information for anyone doing things mechanical or woodworking. I got this from a friend of a family friend -- I bought her lathe after her husband, an avid (and skilled) woodworker, had passed away. She wanted his tools to go to someone who would love them and use them, which was me, which I do. The lathe is good, but I've discovered after the fact that the throw-ins were the best part. The chisels are *top notch*, but still pale in comparison to getting "his old woodworking magazines." This turned out to be almost every issue of Fine Woodworking magazine, beginning in its first year of publication; somewhere in the stack was this nifty Shop Guide.
I think this thing is so neat -- so much information in such a small space. My own mechanical data reference table (more here) has more numbers but less intrinsic functionality ... What's really neat about this shop guide is how they used the shape of the guide itself as a tool. Print this onto heavy cardstock and punch, brothers, punch with care... enjoy!


Leveraging the Bittorrent Underground for semantic data and media

I just ran across a pretty interesting site called coverbrowser.com, which uses a variety of image APIs to pull in comic book, game, book, music, movie and other cover art. (Read the technical details here).

It reminded me of an idea I had a while back but will never get around to implementing -- maybe you will, or for all I know someone's already been doing it for years. (Sidenote: I've had some people express interest in this, and have worked out some parts of it, but just don't have the time to complete it right now. If you'd like to help develop it, get in touch.)

Many of the movie and music torrents on the, ahem, "Unauthorized Evaluation Copy" bittorrent sites contain hi-res scans of their cover art, and all of the major bittorrent sites maintain topic-specific RSS feeds.

As long as the torrent indexes the files individually (and not as an opaque .zip or .rar) -- and most do index individually -- you can target specific files within the torrent. I don't know whether you could chop the large, copyright-problematic files you don't want out of the torrent itself, or whether you'd have to hack Azureus or another bittorrent client (instructing it to get only *.{png,gif,jpg,jpeg,bmp,tiff,tff} or what have you). Either way, you would only be pushing out the bandwidth required to grab the images and not the accompanying multi-megabyte files, and you would only be getting the information to which you presumably have fair-use rights.
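
To make that concrete, here's a rough sketch of picking the image files out of a torrent's metadata by hand -- nothing battle-tested, the .torrent filename is made up, and a real client would still have to be told to fetch only those file indexes:

    IMAGE_EXTS = ('.png', '.gif', '.jpg', '.jpeg', '.bmp', '.tiff')

    def bdecode(data, pos=0):
        """Just-enough bencode parser for reading a .torrent's metadata."""
        c = data[pos]
        if c == 'i':                               # integer: i<digits>e
            end = data.index('e', pos)
            return int(data[pos+1:end]), end + 1
        if c == 'l':                               # list: l<items>e
            items, pos = [], pos + 1
            while data[pos] != 'e':
                item, pos = bdecode(data, pos)
                items.append(item)
            return items, pos + 1
        if c == 'd':                               # dict: d<key><value>...e
            d, pos = {}, pos + 1
            while data[pos] != 'e':
                key, pos = bdecode(data, pos)
                d[key], pos = bdecode(data, pos)
            return d, pos + 1
        colon = data.index(':', pos)               # string: <length>:<bytes>
        length = int(data[pos:colon])
        return data[colon+1:colon+1+length], colon + 1 + length

    meta, _ = bdecode(open('some-album.torrent', 'rb').read())
    # multi-file torrents list their payload under info/files;
    # each entry's 'path' is a list of path components
    for idx, f in enumerate(meta['info'].get('files', [])):
        path = '/'.join(f['path'])
        if path.lower().endswith(IMAGE_EXTS):
            print '%3d  %9d bytes  %s' % (idx, f['length'], path)

From there you'd map those file indexes onto their piece ranges (or just let the hacked client skip everything else).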

So you'd set up a daemon process that would

  • watch the Movies and Music RSS feeds from whichever (or all) of the sites,
  • identify albums whose cover art you lack,
  • pull in the bittorrent,
  • but download only the cover art
  • and perhaps also process any of the accompanying semantic data
You might have to get yourself a seedbox to make this work, but they're not unaffordable.
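
The watcher itself could be as dumb as this sketch. It assumes the feedparser library; the feed URLs are placeholders, and have_cover_art is a stub for whatever check against your own catalog you'd actually run:

    import time, urllib2
    import feedparser                       # third-party RSS/Atom parsing library

    FEEDS = [                               # placeholder URLs -- substitute your sites' real feeds
        'http://tracker.example/rss/music',
        'http://tracker.example/rss/movies',
    ]

    def have_cover_art(title):
        return False                        # stub: look the title up in your own catalog

    seen = set()
    while True:
        for feed_url in FEEDS:
            for entry in feedparser.parse(feed_url).entries:
                if entry.link in seen or have_cover_art(entry.title):
                    continue
                seen.add(entry.link)
                # most trackers put the .torrent URL in an enclosure or in the link itself
                enclosures = entry.get('enclosures') or []
                torrent_url = enclosures[0]['href'] if enclosures else entry.link
                torrent_data = urllib2.urlopen(torrent_url).read()
                # ... hand torrent_data to the image-only picker sketched above ...
                print 'queued:', entry.title
        time.sleep(30 * 60)                 # poll every half hour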

I think this would lead to a large stream of incoming cover art for music and other media files, complete with a reasonable amount of semantic information.

There's probably a lot of other crowdsourced semantic data flowing through the underground that a torrenting robot like this could harvest, if someone actually built one. (And yes, I feel yucky using "crowdsourced" and "semantic data" in the same sentence.)


Wednesday, December 5, 2007

Moving from Perl to Python with XML and Templating

Mr. XKCD is correct in this. (My friend Dr. Larsson has been saying this all along.) As I move from data munging to data working-with, I've been moving from Perl to Python. Recommended:
  • lxml is a beautiful interface for dealing with XML in Python. You get XPath and validation and namespaces and all that hooha but you don't have to think hard and you don't have to write SAX stream parsers or walk a DOM path. You just say crap like
    from lxml    import etree   
    from urllib2 import urlopen
    # Load file
    uri   = "http://vizsage.com/apps/baseball/results/parkinfo/parkinfo-all.xml"
    parks = etree.ElementTree(file=urlopen(uri))
    # for each park (<park> tag anywhere in document)
     for park in parks.xpath('//park'):
       # dump its id, dates of service, game count and name (@attr is XPath for 'corresponding attribute')
      print ' -- '.join(
        [ s+': '+','.join(park.xpath('@'+s)) 
          for s in ('parkID', 'beg', 'end', 'games', 'name',) 
        ])
    
    and you get this in return
    parkID: MIL01 -- beg: 1878-05-14 -- end: 1878-09-14 -- games: 25   -- name: Milwaukee Base-Ball Grounds
    parkID: MIL02 -- beg: 1884-09-27 -- end: 1885-09-25 -- games: 14   -- name: Wright Street Grounds
    parkID: MIL03 -- beg: 1891-09-10 -- end: 1891-10-04 -- games: 20   -- name: Borchert Field
    parkID: MIL04 -- beg: 1901-05-03 -- end: 1901-09-12 -- games: 70   -- name: Lloyd Street Grounds
    parkID: MIL05 -- beg: 1953-04-14 -- end: 2000-09-28 -- games: 3484 -- name: County Stadium
    parkID: MIL06 -- beg: 2001-04-06 -- end: NULL       -- games: 486  -- name: Miller Park
  • lxml.objectify is the replacement for Perl's XML::Simple we've all been looking for. You just say gimme and it pulls in an XML file as the corresponding do-what-I-mean data structure (identical elements become arrays, tree leaves become atoms, tree structures become maps); a tiny example follows this list.
  • Kid Templating is a great solution for XML transmogrifying, and I think I like it much better than XSLT. It looks perfect for your "Anything => XML" purposes, which is the hard part. I suppose XSLT can do the "XML => anything" tasks but those always look like stunts; the whole point of XML is that "Turn XML into whatever" tasks are easy, especially given a simple API like lxml or lxml.objectify.
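
Here's the sort of gimme I mean by lxml.objectify's do-what-I-mean behavior -- a little sketch against an inline document rather than the park data above:

    from lxml import objectify

    doc = objectify.fromstring("""<parks>
      <park name="County Stadium"><games>3484</games></park>
      <park name="Miller Park"><games>486</games></park>
    </parks>""")
    # repeated <park> elements behave like a list, and the <games> leaves come
    # back already typed as ints -- no DOM walking, no casting ritual
    for park in doc.park:
        print park.get('name'), int(park.games)
    print 'total games:', sum(int(p.games) for p in doc.park)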
