Retrosheet Eventfile Inconsistencies
Here are a few inconsistent records in the retrosheet.org event files of 2007 Sep 23. I'm using chadwick and not the retrosheet DOS utils, but I think I've sourced all of these to the original event files.

Weird Attendance in gamelog GL1941.TXT:
WS1194107220 (WS1 vs DET) has '1500 e' as its attendance

Weird Start Time in eventfiles:
Many starttime records lack an AM or PM. I assume the times map as follows:
    daynight  start_time  24hr Time
    D or N    0           Unknown
    D         1000..1259  1000h to 1259h
    D         100..459    1300h to 1659h
    N         500..1150   1700h to 2350h

In that case, here are some weird start times reported by cwgame:
- Negative start time:
    2003 D 0  -195 SEA 2003 04 15 SEA200304150 info,starttime,-2:05PM info,daynight,day
- No daynight flag:
    1998 D 0   506 LAN 1998 08 30 LAN199808300 info,starttime,5:06 -- no daynight --
- Plainly inconsistent daynight flag:
    1985 D 1   605 CIN 1985 06 21 CIN198506211 info,starttime,6:05PM info,daynight,day
    1960 N 0   135 BOS 1960 04 19 BOS196004190 info,starttime,1:35PM info,daynight,night
- Second half of a double header, listed as a day game despite 5pm or later start:
    1966 D 2   507 BAL 1966 10 02 BAL196610022 info,starttime,5:07PM info,daynight,day
    2001 D 2   500 PHI 2001 05 27 PHI200105272 info,starttime,5:00PM info,daynight,day
    2001 D 2   519 PIT 2001 06 03 PIT200106032 info,starttime,5:19PM info,daynight,day
    2001 D 2   625 MIN 2001 05 26 MIN200105262 info,starttime,6:25PM info,daynight,day
    2001 D 2   719 CHA 2001 09 04 CHA200109042 info,starttime,7:19PM info,daynight,day
    2001 D 2   738 CHN 2001 08 20 CHN200108202 info,starttime,7:38PM info,daynight,day
    2001 D 2   752 PIT 2001 09 03 PIT200109032 info,starttime,7:52PM info,daynight,day
    2001 D 2   753 SLN 2001 08 03 SLN200108032 info,starttime,7:53PM info,daynight,day
- Start times that appear to be after midnight (this could be correct):
    1996 N 1    35 CIN 1996 06 25 CIN199606251 info,starttime,0:35 info,daynight,night
    1998 N 0   105 LAN 1998 06 13 LAN199806130 info,starttime,1:05 info,daynight,night
    1966 N 2  1207 BAL 1966 06 08 BAL196606082 info,starttime,12:07AM info,daynight,night

These eventfile games have more than one "info,daynight" record:
    ATL197004150 info,starttime,0:00PM info,daynight,day info,daynight,night
    ATL197004160 info,starttime,0:00PM info,daynight,day info,daynight,night
    ATL197005260 info,starttime,0:00PM info,daynight,day info,daynight,night
    ATL197006191 info,starttime,0:00PM info,daynight,day info,daynight,night
    ATL197006192 info,starttime,0:00PM info,daynight,day info,daynight,night
    ATL197006200 info,starttime,0:00PM info,daynight,day info,daynight,night
    ATL197006210 info,starttime,0:00PM info,daynight,day info,daynight,night
    ATL197007031 info,starttime,0:00PM info,daynight,day info,daynight,night
    ATL197007032 info,starttime,0:00PM info,daynight,day info,daynight,night
    ATL197007050 info,starttime,0:00PM info,daynight,day info,daynight,night
    ATL197009220 info,starttime,0:00PM info,daynight,day info,daynight,night
    ATL197009230 info,starttime,0:00PM info,daynight,day info,daynight,night
    ATL197009240 info,starttime,0:00PM info,daynight,day info,daynight,night
    ATL197009250 info,starttime,0:00PM info,daynight,day info,daynight,night
    ATL197009260 info,starttime,0:00PM info,daynight,day info,daynight,night
    ATL197009270 info,starttime,0:00PM info,daynight,day info,daynight,night
    HOU197006220 info,starttime,0:00PM info,daynight,day info,daynight,night
    HOU197008031 info,starttime,0:00PM info,daynight,day info,daynight,night
    HOU197008032 info,starttime,0:00PM info,daynight,day info,daynight,night
    HOU197008040 info,starttime,0:00PM info,daynight,day info,daynight,night
    HOU197009010 info,starttime,0:00PM info,daynight,day info,daynight,night
    HOU197009110 info,starttime,0:00PM info,daynight,day info,daynight,night
    HOU197009130 info,starttime,0:00PM info,daynight,day info,daynight,night

This eventfile game is missing an "info,daynight" record:
    LAN199808300 info,starttime,5:06

File Structure in eventfile 2001HOU.EVN:
2001HOU.EVN lacks a trailing newline (unix commands hate this).

Here are the unix commands I used to dump all that info. Sorry for the one-linerism.
    # How many have a negative starttime?
    grep 'info,starttime,-' *.EV*

    # How many have missing or extra "info,daynight" fields?
    # -- pull out the info, daynight and starttime records in order
    # -- slurp the whole file as one giant string with internal linebreaks;
    # -- split each stretch following an id,XXXX record into one line
    # -- dump lines that have none or more than one daynight record
    cat *.EV* | egrep '^(id,|info,daynight|info,starttime)' | \
      perl -e '$_ = join(" ",<>); s/[\r\n]+/!!!/g; @games= (split /id,/, $_); shift @games; for $game (@games) { $game =~ s/!!!/\t/g; print "$game\n" if (($game !~ m/daynight/) || ($game =~ m/daynight.*daynight/)); }'

    # How many have a start_time and daynight_flag that disagree?
    # -- use cwgame to pull off the gameID,start_time,daynight_flag records;
    #    put it into a temporary file
    # -- Use a big stupid regex to find
    #    . start_time that is > 500 and marked day
    #    . start_time that is < 500 and marked night
    #    . start_time that is > 1200 and marked night
    #    . start_time that is < 100
    #    . start_time that is negative
    ( for ((year=1957;$year<=2006;year++)) ; do \
        for teamfile in ${year}*.[Ee][Vv]* ; do \
          cwgame -y $year -f '0-0,4-4,6-6' $teamfile 2>/dev/null ; \
        done; \
      done ) > /tmp/starttimeIDs.txt
    cat /tmp/starttimeIDs.txt | \
      perl -ne '(m/"(\w\w\w)(\d\d\d\d)(\d\d)(\d\d)(\d)",(12\d\d|[1234]\d\d|\d\d|[1-9]|-\d+),"(N)"/ || m/"(\w\w\w)(\d\d\d\d)(\d\d)(\d\d)(\d)",((?:5|6|7)\d\d|.*-.*|\d\d|[1-9]),"(D)"/) && printf "%s %s %5d %s %s %s %s\n", $7, $5, $6, $1, $2, $3, $4;' | sort
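
For anyone who finds the regex hard to follow, the start-time mapping assumed earlier in this post and the trailing-newline problem can also be sketched as two small shell functions. This is only an illustration under stated assumptions: the names to_24h and lacks_final_newline are made up here and are not part of chadwick or the retrosheet tools, the night range is taken to run to 1159 (the mapping in this post gives 1150), and start times are expected as the unpadded integers cwgame prints.

```sh
# Hypothetical helpers, not part of chadwick or retrosheet.

# to_24h FLAG START_TIME
# Maps a cwgame-style start_time (135, 705, 1205, ...) plus a D/N flag to a
# 24-hour time using the mapping assumed in this post. Prints "unknown" for 0
# and "suspect" for anything the mapping does not cover (negative times,
# a missing flag, day games at 5pm or later, night games before 5pm).
# Assumes START_TIME has no leading zeros.
to_24h() {
  case $2 in
    ''|*[!0-9]*)
      # empty, negative, or otherwise non-numeric
      echo suspect
      ;;
    *)
      if [ "$2" -eq 0 ]; then
        echo unknown
      elif [ "$1" = D ] && [ "$2" -ge 1000 ] && [ "$2" -le 1259 ]; then
        echo "$2"                 # late morning / noon hour, already 24h
      elif [ "$1" = D ] && [ "$2" -ge 100 ] && [ "$2" -le 459 ]; then
        echo $(($2 + 1200))       # afternoon
      elif [ "$1" = N ] && [ "$2" -ge 500 ] && [ "$2" -le 1159 ]; then
        echo $(($2 + 1200))       # evening (upper bound 1159 is an assumption)
      else
        echo suspect
      fi
      ;;
  esac
}

# lacks_final_newline FILE
# Succeeds if FILE is non-empty and its last byte is not a newline.
# Command substitution strips a trailing newline, so a non-empty result
# means the file ends in some other byte.
lacks_final_newline() {
  [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]
}
```

With these loaded, `to_24h D 135` prints 1335, `to_24h D 605` prints suspect, and a loop such as `for f in *.EV*; do lacks_final_newline "$f" && echo "$f"; done` lists files with the same problem as 2001HOU.EVN. Unlike the regex, this flags everything outside the assumed mapping, so it will also report games that may turn out to be correct.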