
Owning my Metadata

Dear Lazyweb,

I'd like someone to invent a 'Metadata reclaimer': a program to screenscrape all my amazon ratings, flickr tags, facebook posts, etc.

I try, as far as possible, to only use apps that let me keep ownership of my metadata. As our friend pud has remarked, all successful internet enterprises share the same business model: either

  • People pay to Enter Data into your Database (eBay, Google AdWords, Flickr, Second Life, World of Warcraft, IMDB pro, Craigslist), or, less defensibly,
  • People Enter Data into Your Database For Free while Other People Pay to Get it Out (rapidshare, iTunes Music Store, Pud's Internal Memos; with youtube, myspace, epinions, etc., viewers pay with the tenuous currency of their ad brain).

There's nothing wrong with that; all these companies levelled their playing field in some fundamental and important way. (Well, nothing wrong unless you're the loathsome gracenote.com (formerly cddb), who turned an open community-generated resource into a closed database, without even the courtesy of a copy to fork from.)

But it's fair to ask that I be able to export my copy of the data I've added to their business asset, and to do so easily.

Sites that play well with others:
  • my del.icio.us tags and bookmarks
  • my bloglines/google reader feeds
  • my librarything.com everything
  • my last.fm history
  • my iTunes playcounts, tags and ratings: mostly, I think?
  • Firefox bookmarks and history

Sites with an 'I gave up my metadata and all I got was this stupid webpage' policy:

  • facebook posts, friends, photos, everything
  • flickr tags &c
  • amazon recommendations
  • Google Calendar: mostly no (at least, the last time I tried to sync my address books it was a Giant Pain in the Ass: nothing was durably ID'd, and recurring events were semantically incorrect. Yes, I'd love to have 96 separate entries for my Grandmother's birthday!)
  • eBay bids, purchases, ratings
  • Blogger: blogs, yes, if you host your site remotely. However, you can't even /list/ the Blogger comments you've made, let alone export them.
  • I believe Myspace's engineers can't even spell XML
(I could be wrong about any of these except the last one).

I'm picturing something with a plugin architecture: the main app handles the screenscraping, authentication, form submission, web crawling and file export details; the plugin supplies URL wildcards and the regexps that pull the data back into semantic structure. With XML export, a motivated plugin author or well-itched user could supply a decent XSLT stylesheet to render that metadata in a useful local fashion (with helpful links back to the main site). It would be useful to have plugins (trivial) and stylesheets (no more or less so) even for sites like Last.fm and LibraryThing that Do The Right Thing by granting transparent access to your metadata.
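For the curious, here's a minimal sketch of that split in Python, under loud assumptions: `Plugin`, `Reclaimer`, the `ratings.example.com` URL and the sample page are all hypothetical, and the real crawling/login layer is stubbed out as an injected `fetch` function. The plugin contributes only a URL wildcard and a regex with named groups; the core app picks the plugin, runs the regex, and writes generic XML that a stylesheet could take from there.

```python
import fnmatch
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Plugin:
    """Hypothetical plugin: which URLs it handles, and how to parse them."""
    site: str               # label written into the exported XML
    url_pattern: str        # shell-style wildcard matched against page URLs
    item_regex: re.Pattern  # one match per metadata item; named groups become fields


class Reclaimer:
    """Hypothetical core app: routes URLs to plugins, scrapes, exports XML."""

    def __init__(self, fetch: Callable[[str], str]):
        # Fetching (auth, cookies, crawling) is injected so it can be swapped or faked.
        self.fetch = fetch
        self.plugins: List[Plugin] = []

    def register(self, plugin: Plugin) -> None:
        self.plugins.append(plugin)

    def plugin_for(self, url: str) -> Plugin:
        # First registered plugin whose wildcard matches wins (case-sensitive match).
        for plugin in self.plugins:
            if fnmatch.fnmatchcase(url, plugin.url_pattern):
                return plugin
        raise LookupError(f"no plugin handles {url}")

    def scrape(self, url: str) -> List[Dict[str, str]]:
        plugin = self.plugin_for(url)
        page = self.fetch(url)
        return [m.groupdict() for m in plugin.item_regex.finditer(page)]

    def export_xml(self, url: str) -> str:
        plugin = self.plugin_for(url)
        root = ET.Element("metadata", site=plugin.site, source=url)
        for item in self.scrape(url):
            el = ET.SubElement(root, "item")
            for field, value in item.items():
                ET.SubElement(el, field).text = value
        return ET.tostring(root, encoding="unicode")


# Usage with a made-up page standing in for a real "my ratings" screen.
SAMPLE_PAGE = (
    '<li data-title="Moneyball">4 stars</li>'
    '<li data-title="Ball Four">5 stars</li>'
)

ratings_plugin = Plugin(
    site="example-ratings",
    url_pattern="https://ratings.example.com/user/*",
    item_regex=re.compile(r'<li data-title="(?P<title>[^"]+)">(?P<stars>\d) stars</li>'),
)

app = Reclaimer(fetch=lambda url: SAMPLE_PAGE)
app.register(ratings_plugin)
print(app.export_xml("https://ratings.example.com/user/me"))
```

Keeping the plugin down to "a wildcard plus a regex" is what makes writing one trivial; the trade-off is that regexes over HTML are brittle, so a sturdier version would probably let plugins hand back parsed HTML instead.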

Much of this may exist in some form or another; for example the Aperture/iPhoto plugin will apparently sync your flickr and iPhoto tags, and embed the result into the app database. But going from XML => app is more flexible -- and possibly easier -- than the other way 'round.

I one-off'ed this a while back for my Amazon ratings, but I just noticed I've gone from ~350 to ~650 'things rated' since then. I'm hoping the LazyWeb has solved my problem, since I'm not sure where I put those scripts. (Ironic, considering my previous post.)
