
Retrosheet Eventfile Inconsistencies

Here are a few inconsistent records in the retrosheet.org event files of 2007 Sep 23. I'm using chadwick and not the retrosheet DOS utils, but I think I've sourced all of these to the original event files.

Weird Attendance in gamelog GL1941.TXT:
  WS1194107220 (WS1 vs DET) has '1500 e' as its attendance
Weird Start Time in eventfiles:
Many starttime records lack an AM or PM. I assume the start times map to 24-hour times as follows:
   daynight  start_time   24hr Time
   D or N    0            Unknown
   D         1000..1259   1000h to 1259h
   D         100..459     1300h to 1659h
   N         500..1159    1700h to 2359h
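
A minimal sketch of that mapping in Python, in case it helps to see it spelled out. The function name `to_24h` is hypothetical, and the night range is assumed to be 500..1159 mapping to 1700h..2359h. Anything outside the table (including negative times) comes back as None rather than being guessed at.

```python
from typing import Optional


def to_24h(daynight: str, start_time: int) -> Optional[int]:
    """Map a (daynight flag, 12-hour HMM start_time) pair to a 24-hour HHMM integer.

    Returns None when the time is unknown (0), negative, malformed,
    or not covered by the assumed mapping table.
    """
    if start_time <= 0:
        return None  # 0 means unknown; negative values are bad data
    if start_time % 100 >= 60:
        return None  # minutes part is not a valid clock value
    if daynight == "D":
        if 1000 <= start_time <= 1259:
            return start_time  # late morning through 12:59pm, unchanged
        if 100 <= start_time <= 459:
            return start_time + 1200  # 1:00pm through 4:59pm
    elif daynight == "N":
        if 500 <= start_time <= 1159:
            return start_time + 1200  # 5:00pm through 11:59pm
    return None  # flag and time disagree under the assumed table


if __name__ == "__main__":
    for flag, t in [("D", 135), ("N", 705), ("D", 605), ("N", 135), ("D", -195)]:
        print(flag, t, to_24h(flag, t))
```

Under this sketch, a pair such as D with 605 or N with 135 maps to None, which is one way to flag the records where the daynight flag and the start time disagree.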
In that case, here are some weird start times reported by cwgame:
  - Negative start time:
      2003 D 0  -195 SEA 2003 04 15        SEA200304150    info,starttime,-2:05PM   info,daynight,day
  - No daynight flag:
      1998 D 0   506 LAN 1998 08 30        LAN199808300    info,starttime,5:06      -- no daynight --
  - Plainly inconsistent daynight flag:
      1985 D 1   605 CIN 1985 06 21        CIN198506211    info,starttime,6:05PM    info,daynight,day
      1960 N 0   135 BOS 1960 04 19        BOS196004190    info,starttime,1:35PM    info,daynight,night
  - Second half of a double header, listed as a day game despite 5pm or later start:
      1966 D 2   507 BAL 1966 10 02        BAL196610022    info,starttime,5:07PM    info,daynight,day
      2001 D 2   500 PHI 2001 05 27        PHI200105272    info,starttime,5:00PM    info,daynight,day
      2001 D 2   519 PIT 2001 06 03        PIT200106032    info,starttime,5:19PM    info,daynight,day
      2001 D 2   625 MIN 2001 05 26        MIN200105262    info,starttime,6:25PM    info,daynight,day
      2001 D 2   719 CHA 2001 09 04        CHA200109042    info,starttime,7:19PM    info,daynight,day
      2001 D 2   738 CHN 2001 08 20        CHN200108202    info,starttime,7:38PM    info,daynight,day
      2001 D 2   752 PIT 2001 09 03        PIT200109032    info,starttime,7:52PM    info,daynight,day
      2001 D 2   753 SLN 2001 08 03        SLN200108032    info,starttime,7:53PM    info,daynight,day
  - Start times that appear to be after midnight (this could be correct):
      1996 N 1    35 CIN 1996 06 25        CIN199606251    info,starttime,0:35      info,daynight,night
      1998 N 0   105 LAN 1998 06 13        LAN199806130    info,starttime,1:05      info,daynight,night
      1966 N 2  1207 BAL 1966 06 08        BAL196606082    info,starttime,12:07AM   info,daynight,night
 
These eventfile games have more than one "info,daynight" record:
  ATL197004150    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197004160    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197005260    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197006191    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197006192    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197006200    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197006210    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197007031    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197007032    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197007050    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197009220    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197009230    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197009240    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197009250    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197009260    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197009270    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  HOU197006220    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  HOU197008031    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  HOU197008032    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  HOU197008040    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  HOU197009010    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  HOU197009110    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  HOU197009130    info,starttime,0:00PM   info,daynight,day       info,daynight,night
This eventfile game is missing an "info,daynight" record:
  LAN199808300    info,starttime,5:06
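
The same check (games with zero or several "info,daynight" records) can be sketched in Python. This is only an illustration, not the method used for the listings above: the function names `daynight_counts` and `odd_games` are hypothetical, and the sketch assumes each game starts with an `id,GAMEID` line followed by its `info,` records.

```python
from typing import Dict, Iterable


def daynight_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Count "info,daynight" records for each game in a stream of event file lines."""
    counts: Dict[str, int] = {}
    game_id = None
    for raw in lines:
        line = raw.strip()  # also drops any trailing \r from DOS line endings
        if line.startswith("id,"):
            game_id = line[3:]
            counts[game_id] = 0
        elif line.startswith("info,daynight") and game_id is not None:
            counts[game_id] += 1
    return counts


def odd_games(lines: Iterable[str]) -> Dict[str, int]:
    """Return only the games whose daynight record count is not exactly one."""
    return {g: n for g, n in daynight_counts(lines).items() if n != 1}


if __name__ == "__main__":
    sample = [
        "id,ATL197004150",
        "info,daynight,day",
        "info,daynight,night",
        "id,LAN199808300",
        "info,starttime,5:06",
    ]
    print(odd_games(sample))
```

To run it over real files, the lines of each event file could be fed in one file at a time, which also sidesteps problems caused by a file that lacks a final newline.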
File Structure in eventfile 2001HOU.EVN:
  2001HOU.EVN lacks a trailing newline (unix commands hate this).
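
A missing final newline matters for anything that concatenates files (such as `cat *.EV*`), because the last line of one file gets glued to the first line of the next. A small Python sketch for spotting such files follows; the helper names `lacks_final_newline` and `files_missing_final_newline` are hypothetical.

```python
from typing import Iterable, List


def lacks_final_newline(data: bytes) -> bool:
    """True if non-empty file contents do not end with a newline byte."""
    return len(data) > 0 and not data.endswith(b"\n")


def files_missing_final_newline(paths: Iterable[str]) -> List[str]:
    """Return the paths whose contents do not end with a newline."""
    missing = []
    for path in paths:
        with open(path, "rb") as fh:  # binary mode, so \r\n is left untouched
            if lacks_final_newline(fh.read()):
                missing.append(path)
    return missing


if __name__ == "__main__":
    import glob

    for name in files_missing_final_newline(sorted(glob.glob("*.EV*"))):
        print(name)
```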
Here are the unix commands I used to dump all that info. Sorry for the one-linerism.
# How many have a negative starttime?
grep 'info,starttime,-' *.EV*

# How many have missing or extra "info,daynight" fields?
# -- pull out the info, daynight and starttime records in order
# -- slurp the whole file as one giant string with internal linebreaks;
# -- split each stretch following an id,XXXX record into one line
# -- dump lines that have none or more than one daynight record
  cat *.EV* | egrep '^(id,|info,daynight|info,starttime)' | \
    perl -e '$_ = join(" ",<>); s/[\r\n]+/!!!/g; @games= (split /id,/, $_);
      shift @games;
      for $game (@games) {
          $game =~ s/!!!/\t/g; print "$game\n" if (($game !~ m/daynight/) || ($game =~ m/daynight.*daynight/));
      }'

# How many have a start_time and daynight_flag that disagree?
# -- use cwgame to pull off the gameID,start_time,daynight_flag records;
#    put it into a temporary file    
# -- Use a big stupid regex to find
#    . start_time of 500..799 and marked day
#    . start_time of 100..499 and marked night
#    . start_time of 1200..1299 and marked night
#    . start_time that is <  100
#    . start_time that is negative
( for ((year=1957;$year<=2006;year++)) ; do \
     for teamfile in ${year}*.[Ee][Vv]* ; do \
     cwgame -y $year -f '0-0,4-4,6-6' $teamfile 2>/dev/null ; \
     done; \
  done ) > /tmp/starttimeIDs.txt
cat /tmp/starttimeIDs.txt | \
  perl -ne '(m/"(\w\w\w)(\d\d\d\d)(\d\d)(\d\d)(\d)",(12\d\d|[1234]\d\d|\d\d|[1-9]|-\d+),"(N)"/ ||
    m/"(\w\w\w)(\d\d\d\d)(\d\d)(\d\d)(\d)",((?:5|6|7)\d\d|.*-.*|\d\d|[1-9]),"(D)"/)    &&
    printf "%s %s %5d %s %s %s %s\n", $7, $5, $6, $1, $2, $3, $4;' | sort
