Vizsage: Leveraging the Bittorrent Underground for semantic data and media

I just ran across a pretty interesting site called coverbrowser.com, which uses a variety of image APIs to pull in comic book, game, book, music, movie and other cover art. (Read the technical details here).

It reminded me of an idea I had a while back but which I will never get around to implementing; maybe you will, or for all I know someone's already been doing it for years. (Sidenote: I've had some people express interest in this, and have worked out some parts of it, but just don't have the time to complete it right now. If you'd like to help develop it, get in touch.)

Many of the movie and music torrents on the, ahem, "Unauthorized Evaluation Copy" bittorrent sites contain hi-res scans of their cover art, and all of the major bittorrent sites maintain topic-specific RSS feeds.

As long as the torrent indexes the files individually (and not as an opaque .zip or .rar), and most do index individually, you can target specific files within the torrent. I don't know whether you could chop all the large, copyright-problematic files that you don't want out of the torrent, or whether you'd have to hack Azureus or another bittorrent client (instructing it to get only *.{png,gif,jpg,jpeg,bmp,tif,tiff} or what have you). Either way, you would then only be using the bandwidth required to grab the images and not the accompanying multi-megabyte files, and you would only be getting the material to which you presumably have fair use rights.
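To make the "target specific files" part concrete, here's a rough sketch of reading a .torrent's file list and picking out the image entries. It assumes the standard bencoded metainfo layout (an `info` dictionary with either a `files` list or a single `name`); the names `bdecode`, `image_file_indexes` and `SAMPLE_TORRENT` are mine, made up for illustration, and the sample bytes are a hand-built toy rather than a real torrent. It only identifies which file indexes you'd want; telling a client to fetch just those is a separate step I haven't worked out here.

```python
IMAGE_EXTENSIONS = (".png", ".gif", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff")


def bdecode(data, pos=0):
    """Decode one bencoded value from bytes; return (value, next_position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                      # list: l<items>e
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        result, pos = {}, pos + 1
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    colon = data.index(b":", pos)         # byte string: <length>:<bytes>
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


def image_file_indexes(torrent_bytes):
    """Return (index, path) pairs for files whose extension looks like an image."""
    meta, _ = bdecode(torrent_bytes)
    info = meta[b"info"]
    if b"files" in info:                  # multi-file torrent: one entry per file
        paths = [b"/".join(entry[b"path"]).decode("utf-8", "replace")
                 for entry in info[b"files"]]
    else:                                 # single-file torrent: just the name
        paths = [info[b"name"].decode("utf-8", "replace")]
    return [(index, path) for index, path in enumerate(paths)
            if path.lower().endswith(IMAGE_EXTENSIONS)]


# Hand-built toy metainfo (hypothetical): one small cover scan, one large track.
SAMPLE_TORRENT = (
    b"d4:infod5:filesl"
    b"d6:lengthi100e4:pathl5:Album9:cover.JPGee"
    b"d6:lengthi5000000e4:pathl5:Album9:track.mp3ee"
    b"e4:name5:Albumee"
)

print(image_file_indexes(SAMPLE_TORRENT))
```

If the torrent is one opaque archive, the file list has a single .zip or .rar entry and nothing matches, which is exactly the case where this approach doesn't help.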

So you'd set up a daemon process that would

watch the Movies and the Music RSS feeds off whichever or all of the sites,
identify albums whose cover art you lack,
pull in the bittorrent,
but download only the cover art
and perhaps also process any of the accompanying semantic data
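The first two steps of that list might look something like the sketch below: parse an RSS feed and keep only the items whose cover art you don't already have. This is a guess at the shape, not a working robot. I'm assuming the feed is plain RSS 2.0 with the torrent URL in either an `enclosure` or the `link` element, and that matching on a lowercased title is good enough to decide what you "lack"; `torrents_missing_art`, `SAMPLE_FEED` and the example.invalid URLs are all made up for illustration.

```python
import xml.etree.ElementTree as ET


def torrents_missing_art(rss_xml, known_titles):
    """Yield (title, torrent_url) for feed items not already in known_titles.

    known_titles is a set of lowercased titles we already hold art for
    (an assumed, simplistic way of tracking what has been collected).
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        enclosure = item.find("enclosure")
        if enclosure is not None:
            url = enclosure.get("url")
        else:
            url = item.findtext("link")
        if title and url and title.lower() not in known_titles:
            yield title, url


# Toy feed (hypothetical) standing in for a site's Music RSS feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item>
    <title>Artist - Album One</title>
    <enclosure url="http://example.invalid/1.torrent" type="application/x-bittorrent"/>
  </item>
  <item>
    <title>Artist - Album Two</title>
    <link>http://example.invalid/2.torrent</link>
  </item>
</channel></rss>"""

already_have = {"artist - album one"}
for title, url in torrents_missing_art(SAMPLE_FEED, already_have):
    print(title, url)
```

A real daemon would wrap this in a polling loop, fetch each feed over HTTP, and hand the resulting URLs to whatever does the selective download.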

You might have to get yourself a seedbox to make this work, but they're not unaffordable.

I think this would lead to a large stream of incoming cover art for music and other media files, complete with a reasonable amount of semantic information.

There's probably a lot of other crowdsourced semantic data flowing through the underground, if someone actually created such a torrenting robot. (And yes, I feel yucky using "crowdsourced" and "semantic data" in the same sentence).

Labels: art, bittorrent, cover, data, dataset, feed, legal, legitimate, movie, music, piracy, rss, semantic, underground

Posted by flip on Thursday, December 13, 2007 at 12:33 AM