
Moving from Perl to Python with XML and Templating

Mr. XKCD is correct in this (my friend Dr. Larsson has been saying so all along). As I move from data munging to actually working with data, I've been moving from Perl to Python. Recommended:
  • lxml is a beautiful interface for dealing with XML in Python. You get XPath and validation and namespaces and all that hooha, but you don't have to think hard, and you don't have to write SAX stream parsers or walk a DOM tree. You just say crap like:
    from urllib.request import urlopen  # Python 3 home of the old urllib2.urlopen
    from lxml import etree
    # Load the file
    uri   = "http://vizsage.com/apps/baseball/results/parkinfo/parkinfo-all.xml"
    parks = etree.ElementTree(file=urlopen(uri))
    # For each park (a <park> tag anywhere in the document)
    for park in parks.xpath('//park'):
      # dump its id, time of service and name (@attr is XPath for 'corresponding attribute')
      print(' -- '.join(
        [ s+': '+','.join(park.xpath('@'+s))
          for s in ('parkID', 'beg', 'end', 'games', 'name',)
        ]))
    
    and you get this in return:
    parkID: MIL01 -- beg: 1878-05-14 -- end: 1878-09-14 -- games: 25   -- name: Milwaukee Base-Ball Grounds
    parkID: MIL02 -- beg: 1884-09-27 -- end: 1885-09-25 -- games: 14   -- name: Wright Street Grounds
    parkID: MIL03 -- beg: 1891-09-10 -- end: 1891-10-04 -- games: 20   -- name: Borchert Field
    parkID: MIL04 -- beg: 1901-05-03 -- end: 1901-09-12 -- games: 70   -- name: Lloyd Street Grounds
    parkID: MIL05 -- beg: 1953-04-14 -- end: 2000-09-28 -- games: 3484 -- name: County Stadium
    parkID: MIL06 -- beg: 2001-04-06 -- end: NULL       -- games: 486  -- name: Miller Park
  • lxml.objectify is the replacement for Perl's XML::Simple we've all been looking for. You just say gimme and it pulls in an XML file as the corresponding do-what-I-mean data structure (identical elements become arrays, tree leaves become atoms, tree structures become maps).
  • Kid Templating is a great solution for XML transmogrifying, and I think I like it much better than XSLT. It looks perfect for your "Anything => XML" purposes, which is the hard part. I suppose XSLT can do the "XML => anything" tasks but those always look like stunts; the whole point of XML is that "Turn XML into whatever" tasks are easy, especially given a simple API like lxml or lxml.objectify.
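
To make the lxml.objectify point concrete, here is a minimal sketch. The inline XML is a made-up sample loosely modeled on the park listing above; its layout (a <parks> root, with <name> and <games> as child elements) is an assumption for illustration and is probably not how the real parkinfo file is organized.

```python
from lxml import objectify

# Hypothetical sample data: the <parks> root and the <name>/<games> child
# elements are assumptions, not the layout of the real parkinfo-all.xml.
SAMPLE = b"""<parks>
  <park parkID="MIL05"><name>County Stadium</name><games>3484</games></park>
  <park parkID="MIL06"><name>Miller Park</name><games>486</games></park>
</parks>"""

root = objectify.fromstring(SAMPLE)

# Identical elements act like an array: iterate over them or index them.
names = [str(park.name) for park in root.park]

# Leaves act like atoms: .pyval is the plain Python value (an int here).
total_games = sum(park.games.pyval for park in root.park)

# XML attributes are still fetched the usual etree way, with .get().
newest_id = root.park[1].get('parkID')

print(names, total_games, newest_id)
```

Child elements turn into plain Python attribute access, which is most of what made XML::Simple pleasant in the first place.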
