Liveblogging SABR 40!
Just wrapped up two more research presentations:
Sanctioned Post-Season Series by Marty Pankin and Mike Canton
Using Marcel the Monkey to Help Understand Different Eras in Baseball by Andy Andres
These are kind of raw reports, and I apologize for any typos. I’m typing as fast as I can most of the time. For those of you who have never been to one of these things, each research presentation is 20 minutes long, with 5 minutes for questions, and then bam, the schedule turns over to the next one.
Sanctioned Post-Season Series
Marty Pankin and Mike Canton
These sanctioned post-season series between local rivals were very important. They often drew over 40,000 fans, the newspapers covered them extensively, and players took the games very seriously (often getting into fights). This is a very overlooked part of baseball history, and it’s very significant to be getting the play-by-play info up on Retrosheet at last. The players’ records are valid and meaningful, as well.
These games took place between rivals within the same city, like the Cubs and White Sox. They all took place between 1905 and 1942, and all of those after 1917 were in Chicago: 24 series in Chicago, 3 in Ohio (1910, 1911, 1917), 2 Giants vs. Yankees (1910, 1914), and other such examples.
190 games total count as sanctioned.
We now have PBP for 155 of the 190 games.
Unsanctioned games were played as early as 1882.
What’s the difference between sanctioned and unsanctioned? Sanctioned games were certified by the National Commission.
National Commission formed in 1905
-established World Series conditions
-authorized other interleague series
Sanctioned Series had to be:
-played under the same conditions as the WS
-best of seven (or nine in 1921)
-no games played after the winner was determined
-umpires chosen by the National Commission
-rosters could only include players who played in the regular season
Basically these were not “exhibition” games and were taken seriously.
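The “best of seven, no games after the winner is determined” rule boils down to a clinch number: four wins in a best-of-seven, five in a best-of-nine. A minimal Python sketch of that stopping rule, assuming each game’s result is recorded as the winning team’s label with `None` for a tie (ties are played but count toward neither team); the function name and input format are hypothetical, for illustration only.

```python
from typing import Optional, Sequence


def games_actually_played(results: Sequence[Optional[str]], best_of: int = 7) -> list:
    """Truncate a sequence of scheduled game results at the clinching game.

    Each entry is the winning team's label, or None for a tie.
    Hypothetical helper illustrating the sanctioned-series stopping rule.
    """
    clinch = best_of // 2 + 1  # 4 wins in a best-of-7, 5 in a best-of-9
    wins = {}
    played = []
    for winner in results:
        played.append(winner)
        if winner is None:
            continue  # a tie is played but counts for neither team
        wins[winner] = wins.get(winner, 0) + 1
        if wins[winner] == clinch:
            break  # series decided: no further games are played
    return played
```

For example, `games_actually_played(["A", "A", "B", "A", "A", "B"])` keeps only the first five results, because team A’s fourth win ends the series.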
Chicago once played a best-of-15 series… except there was one rainout, it was tied 7-7, and the players’ contracts only ran through October 15th.
The White Sox proposed a doubleheader to break the tie, but the Cubs didn’t want that, as Joe Tinker was already planning to leave town to get married in Kansas City, so the series ended in a tie.
Unsanctioned example: 1913 St. Louis, the two worst teams in baseball. It ended in a tie, so we didn’t find out which team was the worst.
Comparative attendance, World Series versus City Series:
1908-1917: very comparable attendance, though not quite as good as the WS for the Cubs and not as good for the White Sox
1928-1933: attendance was down except for weekend games, plus the Depression
1935-1939: attendance definitely much worse for the City Series than for the World Series, which was likely why these series did not continue after WWII
The Cubs actually blew a 3-0 lead in the 1912 Chicago series, long before the Yankees did it against the Red Sox in 2004. The White Sox’s Ed Walsh was the pitching star. Over the course of 10 days he started 4 times plus made 2 relief appearances, and after that he went downhill.
Many tales of Cubs failure follow, and then the story of Grover Cleveland Loudermilk.
There would have been more New York City series, except that the Giants kept getting into the World Series, which put the kibosh on playing their local rivals.
Using Marcel the Monkey to Help Understand Different Eras in Baseball
Andy is a college professor at Boston University, but he also teaches a course in sabermetrics at Tufts University. His students all read Moneyball and the famous article by Stephen Jay Gould about why there will not be any more .400 hitters. Gould claims in the article that batting average has stayed steady throughout the ages. But is that really so? Well, there is some consistency, but the data is also pretty noisy. Even noisier is a look at runs scored per game, averaged by year.
Andy shows a chart of runs scored across all leagues for all time, and it doesn’t look that consistent. The 19th century had many rule changes and shows drastic inconsistency, so we’re going to cut that out and start around 1901.
We look at runs per game because that’s the currency of the game. Looking at the chart from 1900 on, the deadball era is still a lower run-scoring environment: trick pitches, dirty balls, large stadiums, etc. Then came the live-ball era and Babe Ruth; after Ray Chapman died, they started replacing dirty baseballs with clean ones (which also didn’t get as squishy), and records got shattered. 1920-1940 seems to be a grouping.
Then it gets sketchier deciding how to group things. For 1941-1960, the war years changed things drastically, plus there was integration. The war years definitely make it dicey.
And then what do we call 1961 to the present? The expansion era, maybe? But it’s questionable whether expansion always leads to higher run-scoring environments. The DH era? The free agency era?
And most recently, the mid-90s to the present: the Interleague and/or Wild Card Era? The Steroid Era? The Power Era?
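One way to make these era groupings concrete is to bucket each season by era and pool runs and games within each bucket. A minimal Python sketch, assuming the boundaries floated in the talk (1901-1919, 1920-1940, 1941-1960, 1961 on; the unsettled mid-90s split is left out) and hypothetical `(year, runs, team_games)` league-season totals as input; no real data is included.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical era boundaries, loosely following the groupings discussed above.
ERA_STARTS = [1901, 1920, 1941, 1961]
ERA_NAMES = ["1901-1919", "1920-1940", "1941-1960", "1961-present"]


def era_of(year: int) -> str:
    """Map a season to its era label; pre-1901 seasons are excluded, as in the talk."""
    if year < ERA_STARTS[0]:
        raise ValueError("pre-1901 seasons are excluded")
    # bisect_right counts how many era start years are <= year.
    return ERA_NAMES[bisect_right(ERA_STARTS, year) - 1]


def runs_per_game_by_era(seasons):
    """Pool (year, runs, team_games) league totals into runs per team-game for each era."""
    runs = defaultdict(int)
    games = defaultdict(int)
    for year, season_runs, team_games in seasons:
        era = era_of(year)
        runs[era] += season_runs
        games[era] += team_games
    return {era: runs[era] / games[era] for era in runs}
```

Pooling totals weights each season by the games played in it; averaging the yearly runs-per-game values instead would weight every season equally, which matters when schedule length changes.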
Next Andy showed a graph splitting out the run-scoring trends by league. There was a big difference between the AL and NL in the 1930s. In the DH era, the leagues didn’t visibly separate until 1979, even though the DH started in 1973.
Home runs trend upward, with a clear pattern in recent years. Strikeouts have gone up, too. They sure seem to correlate (R-squared value 0.7202).
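For a simple straight-line fit of one series against another (here, home runs against strikeouts), R-squared is the square of the Pearson correlation. A minimal Python sketch of that calculation; the inputs would be per-season rates, which aren’t reproduced here, and the function name is hypothetical.

```python
def r_squared(xs, ys):
    """R-squared of a least-squares line fit of ys on xs (equal to Pearson r squared)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series of at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation and variation around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)
```

Two series that both trend upward over the years will show a high R-squared even without a direct link between them, so the number describes how closely they move together rather than why.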
Next, a graph of sac bunts, stolen bases, and home runs together, and then the same stats averaged per team since 1900.
Marcel the Monkey: he is an actual monkey who appeared on Friends and in the movie Outbreak. He is the inspiration for a baseball projection system:
Marcel is a projection system developed and described by Tom Tango (www.tangotiger.net/marcel), “so simple a monkey can do it”:
-take 3 previous years of data for a player
-regress these data to the league average for those years
-perform a simple age adjustment (29 years old for peak performance)
Simple, but the minimum needed to produce good baseball projections.
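The three steps above can be sketched for a single rate stat, home runs per plate appearance. This is a minimal Python sketch, not Tango’s own code: the 5/4/3 season weights, the 1,200 PA of league-average performance blended in for regression, and the age multipliers (+0.6% per year under 29, -0.3% per year over) are assumptions based on Tango’s published write-up and should be checked against his page; `Season` and `marcel_hr_rate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumed constants, based on Tango's write-up; verify before relying on them.
SEASON_WEIGHTS = (5, 4, 3)  # most recent season first
REGRESSION_PA = 1200        # PA of league-average performance blended in
PEAK_AGE = 29


@dataclass
class Season:
    pa: int                # player's plate appearances that season
    hr: int                # player's home runs that season
    league_hr_rate: float  # league HR per PA that season


def marcel_hr_rate(seasons: Sequence[Season], age: int) -> float:
    """Project next season's HR/PA from up to three prior seasons (most recent first).

    `age` is the player's age in the projected season (assumption).
    """
    if not seasons:
        raise ValueError("need at least one prior season")
    used = list(zip(SEASON_WEIGHTS, seasons))  # anything beyond three seasons is ignored
    # Step 1: weighted totals from the player's last three seasons.
    weighted_hr = sum(w * s.hr for w, s in used)
    weighted_pa = sum(w * s.pa for w, s in used)
    # Step 2: regress toward the league average for those same years.
    league_rate = sum(w * s.league_hr_rate for w, s in used) / sum(w for w, _ in used)
    rate = (weighted_hr + REGRESSION_PA * league_rate) / (weighted_pa + REGRESSION_PA)
    # Step 3: simple age adjustment around the assumed peak age of 29.
    if age < PEAK_AGE:
        rate *= 1 + (PEAK_AGE - age) * 0.006
    else:
        rate *= 1 + (PEAK_AGE - age) * 0.003
    return rate
```

Multiplying the projected rate by a playing-time estimate turns it into a home run count like the Greenberg projection below; Marcel’s own playing-time formula isn’t reproduced here.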
Andy took data from 1901 to 2009, computed batting Marcels, got 29,663 projections, and then found the following:
Hank Greenberg was projected to hit 42 HR in 1941. He was drafted into the Army early that season and hit only 2.
Lou Gehrig: the same kind of huge miss.
Babe Ruth and Harmon Killebrew missed the other way, far exceeding their projections.
Andres Galarraga in 1993: way up.
Sooo, if you took all the Marcels, which year would be the most off from the projections?
The audience shouted out many possibilities: 1969, 1931, and many others. But the worst? 1981, the strike year, with a severe decrease because of lost ABs. That also produced a ringing effect.
Some other years definitely stuck out, such as 1931 and 1947, but they average out to zero.
1969 does show a big jump in slugging.
1946 drops: hitters who had kept playing during the war suddenly had to face pitchers coming back from it.
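Finding the year that was “most off” means grouping every projection’s miss (actual minus projected) by season and comparing the per-season averages. A minimal Python sketch, assuming hypothetical `(year, projected, actual)` records for one stat; Andy’s exact error measure wasn’t described, so ranking seasons by the size of the mean signed error is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def mean_error_by_year(records: Iterable[Tuple[int, float, float]]) -> Dict[int, float]:
    """Average signed miss (actual - projected) for each season."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for year, projected, actual in records:
        totals[year] += actual - projected
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in totals}


def most_off_year(records: Iterable[Tuple[int, float, float]]) -> int:
    """Season whose average miss is largest in magnitude, in either direction."""
    errors = mean_error_by_year(records)
    return max(errors, key=lambda year: abs(errors[year]))
```

Using the signed mean keeps the direction of the miss (a lost-AB year like 1981 shows up as large and negative), but misses in opposite directions within one season cancel; averaging absolute errors instead would rank seasons by overall noise.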
One suggestion from the audience: can you pull out DHs, pinch-hitters, etc., and see if that takes out some of the noise? Andy suggests someone try that for next year and make other Marcel projections.
(Did you enjoy reading this blog entry? Please consider buying me a hot dog.)