After only making it to one research presentation yesterday, I hit three in a row today. I was too fatigued upon waking this morning to make it to the Media Panel. Having made myself rather ill last year by pushing too hard and doing too much (at all conventions, not just SABR’s), I made the decision to go back to sleep and hope that the audio or video of the panel will be online later.
Including yesterday, here are the four research presentations (RPs) I saw:
* What About Solly Hemus? (Mark Armour)
* Analyzing Batter Performance Against Pitcher Clusters (Vince Gennaro)
* Baseball in the Age of Big Data: Why the Revolution Will Be Televised (Sean Lahman)
* Statistical Predictors of MLB Players’ Proneness to Long Hitting Streaks (Alan Reifman and Trent McCotter)
What About Solly Hemus? (Mark Armour)
In this presentation, Mark Armour looked at the stories surrounding Solly Hemus and the charges of racism leveled against him by players like Bob Gibson and Curt Flood. In the end, it seems Hemus’s greater deficiency may have been that he was a poor manager and a poor motivator who was strongly disliked by most of his players, regardless of race or ethnicity. His outdated methods of management (berating players, belittling them, and spewing sarcasm) and his outdated modes of behavior (players calling each other racial epithets, whether in jest or in bench jockeying) could not, in the context of the times, be interpreted by black players as anything BUT racism, regardless of Hemus’s actual internal climate.
Hemus was one of the last player-managers. He managed the Cardinals in 1959 and 1960, then was replaced by Johnny Keane in 1961.
Much of the debate centers on an on-field confrontation with pitcher Bennie Daniels on May 3, 1959. Like many baseball rhubarbs, it involved knockdown pitches and verbal retaliation, during which Hemus, as player-manager from the batter’s box, called Daniels a “black bastard” (by Hemus’s own admission). A brawl ensued. This incident was written about in The Long Season by Jim Brosnan, the ground-breaking baseball player memoir that predated Jim Bouton’s Ball Four by several years. Brosnan wrote that Hemus was “hated” by all his players. (Hemus, on the other hand, reportedly said, “If you think his book’s funny, you should see him pitch.”)
Armour provided many examples of Hemus’s managerial decisions and how they may not have been the best, especially in contrast to the decisions made by the man who replaced him, Johnny Keane, who got a lot more out of his ballclub. In particular, Gibson and Flood both described being treated badly by Hemus and, when treated well by Keane, achieving star-level performances. (The actual stats showed both of them on the upswing before Hemus’s dismissal, though they both credited Keane with their improvements.) Gibson felt that Hemus thought his control problems were because he was “stupid”–that the problem was a mental deficiency rather than a physical one. Is that racism or bad management? Or possibly both, with preconceived notions leading to faulty evaluation and bad decision-making? Armour emphatically stated that he did not feel qualified to answer the question “was Hemus racist?” I feel that may be too simplistic a question to ask, and that reducing the debate to labeling is never useful. What was illuminating was examining an era of history in which both the culture of baseball and race relations in the US were undergoing parallel upheavals.
After moving on from the management job, Hemus coached with the Indians and then one year in the minors before starting an oil business. In 1968 he wrote a letter to Curt Flood, later reproduced in Flood’s book, essentially saying, “I misjudged you, you’re a fine player.” In 1992 Hemus spoke with Gibson and tried to explain the Daniels incident as a method for firing up the team. (Gibson called bullshit.) In 1994 Hemus spoke to David Halberstam for his book on the 1964 season, and tried explaining that racial epithets were merely the common parlance of his baseball era, including himself being called “Jew” because of his name. (Flood and Gibson didn’t buy it.)
As historians, that is where we have to leave it. For storytellers, though, those who would make a TV movie of Hemus’s life, there are two possible arcs here. One is the optimist’s view: that Hemus had his eyes opened to the wrongness of racism and the way it blinded him to the worth of men around him, and he then tried to make amends. (Looks to me like Hemus didn’t value the worth of his white players either…) The pessimist’s view is that Hemus had his eyes opened to the fact that accusations of racism had left a black mark on his name and he tried to erase that stigma–regardless of whether he had a change of heart or whether he actually had any racist leanings. The truth is probably strictly neither, because life is not a story, and history is not fiction. Solly Hemus is still alive, and living in the Houston area, and who knows, perhaps next year if he makes it to the SABR convention, Mark Armour might get a chance to ask him about these things directly.
Analyzing Batter Performance Against Pitcher Clusters (Vince Gennaro)
Vince is the president of SABR, but before he got that position, he was already a sharp baseball analyst, an economist, and had won the award for best research presentation at the SABR convention in Cleveland. So whenever he presents, I try not to miss it.
This year he presented on a methodology of clustering pitchers by type. You’ve all seen a baseball broadcast where a hitter and a pitcher are facing each other and the announcer tells you, for example, that Derek Jeter is 4 for 10 off Roy Halladay. Do people really expect Jeter is going to hit .400 off Halladay? No. Especially when you realize that all four of those hits came in 1999. As Vince put it, “That should be inadmissible evidence.” Fourteen years later, this Jeter is not that Jeter, and this Halladay is not that Halladay.
So when trying to build matchups for bullpen use, lineup optimization, and picking pinch hitters, this single batter versus single pitcher thing is far too limited. The sample size is too small, the time frame of the full career is too long, if they’ve never faced each other you get nothing at all, and the data only looks at the outcomes of the confrontations.
What if, instead, we could say this hitter generally does this or that against this type of pitcher? The first task was to divide all major league pitchers roughly into three tiers, the worst, middle, and best, by OPS Against. Then look at how each batter fares against each tier. Some hitters, like Elvis Andrus and Derek Jeter, do better than league average against the best pitchers, average against the average pitchers, and actually worse against the worst pitchers. The result of that is a very flat graph. Chris Davis, on the other hand, is the opposite. He feasts on bad pitching and struggles against good pitching. That results in a very steep graph.
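To make the tiering idea concrete for myself, here is a minimal Python sketch. It is not Vince's actual method, just my reading of the idea: rank pitchers by OPS against, cut them into thirds, and compare what one hitter does against each tier to what the league does against the same pitchers. The function names and every number below are made up for illustration.

```python
from statistics import mean

def tier_pitchers(ops_against):
    """Split pitchers into 'best', 'middle', 'worst' thirds by OPS against.

    ops_against: dict of pitcher -> OPS allowed (lower means a better pitcher).
    """
    ranked = sorted(ops_against, key=ops_against.get)  # best pitchers first
    third = len(ranked) // 3
    tiers = {}
    for i, pitcher in enumerate(ranked):
        if i < third:
            tiers[pitcher] = "best"
        elif i < 2 * third:
            tiers[pitcher] = "middle"
        else:
            tiers[pitcher] = "worst"
    return tiers

def tier_profile(batter_ops_vs, ops_against, tiers):
    """Hitter's OPS minus the league's OPS against each pitcher, averaged per tier."""
    diffs = {"best": [], "middle": [], "worst": []}
    for pitcher, ops in batter_ops_vs.items():
        diffs[tiers[pitcher]].append(ops - ops_against[pitcher])
    return {tier: round(mean(vals), 3) for tier, vals in diffs.items() if vals}

# Hypothetical OPS allowed by six pitchers.
ops_against = {"A": 0.600, "B": 0.620, "C": 0.700, "D": 0.720, "E": 0.800, "F": 0.820}
tiers = tier_pitchers(ops_against)

# Hypothetical "flat graph" hitter: roughly the same OPS no matter who is pitching.
flat_hitter = {"A": 0.700, "B": 0.720, "C": 0.700, "D": 0.720, "E": 0.750, "F": 0.770}
profile = tier_profile(flat_hitter, ops_against, tiers)
print(profile)  # positive vs. the best tier, zero vs. the middle, negative vs. the worst
```

A "steep graph" hitter would show the reverse pattern: negative against the best tier and strongly positive against the worst.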
(An aside from my playing days in the women’s leagues: we felt this instinctively on the bench. Some of us always hit the better pitchers better but struggled against the pitchers with poor control, while others did much better against the bad pitchers and had lots of strikeouts against the good pitchers. At the time I explained it as a function of our swings. I had a very consistent swing–modeled on Jeter’s not coincidentally–and the better pitchers were more likely to be in the strike zone or near it, and that meant I could make contact and I was tough to strike out. When a pitcher was all over the place, it was harder to have a plan at the plate. Other hitters were more “bad ball hitters” but if they were faced with a tough fastball in/near the zone they were more likely to strike out.)
Vince’s model tries to take into account five major variables:
1. Pitching style (repertoire, release point, does he like to work down in the zone, what are his sequences)
2. Pitching quality (e.g., Drew Pomeranz and Clayton Kershaw have very similar styles but Kershaw is clearly better)
3. Hitting style (swing plane, do they like to take the ball deep in the zone, do they have a two strikes approach)
4. Hitter quality (over 1000 plate appearances do you discern how they adjust or perform differently)
Vince has partnered with YarcData (a Cray subsidiary) to crunch the numbers and cluster pitchers together by type. What does every batter ask about a pitcher when getting ready before the game?
* What does he throw?
* What kind of movement?
* Where’s his release point?
* What kind of style?
By breaking down these basic questions into more granular data components (e.g., horizontal/vertical movement, percentage of pitches in the zone, etc.) you can graph pitchers into clusters. (All this data can be gotten from PITCHf/x and, for batted balls, HITf/x.) He showed a graph of the starting pitchers over the last two years and there were very distinct clusters in the dots, each dot representing a specific pitcher.
Two dots were way off to the side from everyone else, but were near each other. One guess who those were? (I got it right: Tim Wakefield and RA Dickey.)
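For the curious, here is roughly what "clustering" means in code. This is a toy sketch using plain k-means, which is my stand-in and not necessarily the algorithm Vince and his partners used. The three features (average velocity, horizontal movement, zone percentage), the pitcher labels, and all the numbers are invented for illustration.

```python
from math import dist
from statistics import mean, pstdev

def standardize(rows):
    """Rescale each feature column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against a constant column
    return [tuple((x - m) / s for x, m, s in zip(row, mus, sds)) for row in rows]

def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-first starting centers."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest center, then move each center
        # to the average of the points assigned to it.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(mean(col) for col in zip(*members))
    return labels

# Hypothetical pitchers: (avg velocity mph, horizontal movement in inches, zone %).
pitchers = {
    "Power RHP 1": (95.0, -4.0, 52.0),
    "Power RHP 2": (96.0, -5.0, 50.0),
    "Finesse LHP 1": (88.0, 8.0, 47.0),
    "Finesse LHP 2": (87.0, 9.0, 46.0),
    "Knuckleballer 1": (67.0, 0.0, 44.0),
    "Knuckleballer 2": (76.0, 1.0, 45.0),
}
labels = kmeans(standardize(list(pitchers.values())), k=3)
for name, label in zip(pitchers, labels):
    print(f"cluster {label}: {name}")
```

With these made-up numbers the two knuckleballers land in a cluster of their own, which is the same kind of picture as the two lonely dots on Vince's graph.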
Now say you’re a team with this data. You’re the Yankees and you’re playing Colorado and facing Nicasio, a pitcher your guys have never faced before. You have the choice of Ichiro or Brennan Boesch in right field. You can look at how each of them fares against the cluster of pitchers around Nicasio whom they HAVE faced. (In this case, Ichiro is only in the 30th percentile, while Boesch is in the 60th percentile.)
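Here is a small sketch of how that kind of lookup might work, assuming you already know which pitchers share a cluster with the new guy. I pool each hitter's results against the cluster members he has faced and then rank that against other hitters. Batting average is just a simple stand-in for whatever performance measure the real system uses, and the pitcher labels, hitter lines, and league figures are all hypothetical.

```python
def pooled_average(hitter_lines, cluster):
    """Combine a hitter's (hits, at_bats) lines against every cluster member he has faced."""
    faced = [line for pitcher, line in hitter_lines.items() if pitcher in cluster]
    hits = sum(h for h, ab in faced)
    at_bats = sum(ab for h, ab in faced)
    return hits / at_bats if at_bats else None  # None: no history against the cluster

def percentile_rank(value, population):
    """Percent of the population at or below the given value."""
    return 100.0 * sum(1 for v in population if v <= value) / len(population)

# The unfamiliar starter sits in a cluster with P1, P2, and P3 (made-up labels).
cluster = {"P1", "P2", "P3"}

# Career (hits, at_bats) lines for two hypothetical right fielders.
hitter_a = {"P1": (3, 10), "P2": (2, 10), "P4": (9, 10)}  # P4 is outside the cluster
hitter_b = {"P2": (4, 10), "P3": (3, 10)}

# Hypothetical pooled averages of other hitters against the same cluster.
league = [0.20, 0.22, 0.24, 0.25, 0.26, 0.28, 0.30, 0.32, 0.35, 0.40]

avg_a = pooled_average(hitter_a, cluster)
avg_b = pooled_average(hitter_b, cluster)
rank_a = percentile_rank(avg_a, league)
rank_b = percentile_rank(avg_b, league)
print(avg_a, rank_a)
print(avg_b, rank_b)
```

Note how hitter A's 9-for-10 against P4 is ignored, since P4 is a different type of pitcher. That is the whole point of pooling by cluster instead of using career numbers.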
There was more, but you know, it’s tricky to even try to represent a portion of each presentation in a blog post, and you have to be here to get the whole thing. There were heatmaps of ballparks and other cool stuff included, too.
Vince, as an economist, looks at things in terms of dollars a lot. So what would the return on investment be for applying this data? He showed a calculation (I didn’t get the exact numbers) based on tweaking the lineup like this, replacing the 30th-percentile hitter with the 60th-percentile hitter once every two weeks (I think he said), as well as using it to optimize 50 pinch-hitting decisions in the season, and maybe one other thing (bullpen use? didn’t see, sorry). If the net result was an increase of 33 total runs per season, that would equal a certain number of wins. For a top team, those wins would equal $15 million. “If you think that’s way too aggressive, fine, cut it to a third, that’s still $5 million,” says Vince.
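Since I missed the exact numbers, here is a back-of-the-envelope version of the arithmetic. The one thing I am supplying myself is the conversion from runs to wins: I am assuming the common rule of thumb of about 10 runs per win, which is not a figure from the presentation. The 33 runs, the $15 million, and the "cut it to a third" all come from Vince's talk.

```python
RUNS_PER_WIN = 10.0  # ASSUMPTION: common rule of thumb, not a number from the talk

def added_wins(extra_runs, runs_per_win=RUNS_PER_WIN):
    """Convert extra runs over a season into an approximate number of wins."""
    return extra_runs / runs_per_win

def implied_dollars_per_win(total_value, wins):
    """What each win would have to be worth for the total to come out as stated."""
    return total_value / wins

wins = added_wins(33)                                # about 3.3 wins
per_win = implied_dollars_per_win(15_000_000, wins)  # about $4.5 million per win
conservative = 15_000_000 / 3                        # Vince's "cut it to a third"

print(f"{wins:.1f} wins, ${per_win:,.0f} per win, ${conservative:,.0f} conservative")
```

Under that assumption, the $15 million figure implies a win is worth roughly $4.5 million to a top team. If the real runs-per-win number in his model was different, the per-win value shifts accordingly.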
He’s still working on the system to tweak how to weight the importance, say, of hitter quality versus pitcher quality, but so far it’s a very impressive result. There was much more involved in it, including the Q&A, which I can’t easily boil down for a blog post. But basically it seems like this is the right combination of 5 factors, and it validates the outcomes.
One question from the audience was how does this help in the postseason, when you’re only going to face pitchers in the top tier? As Vince put it, the postseason is like MLB-prime, it’s the equivalent of a higher league, where the guys who pitched only 40% of a team’s regular season innings are now going to eat up 70% or more of the postseason innings. It’s something worth thinking about. Those guys who may not have as impressive overall numbers, because they don’t dominate weak pitching, but do elevate against great pitching (Jeter), may be more valuable in October than the guys who feast on weak pitching and whose performances helped get you there. (Maybe A-Rod? He wasn’t talked about in the presentation, that’s just my surmise on what I’ve heard of A-Rod hitting analysis in the past.)
Overall it was clearly the best presentation I saw at this SABR convention and was also the best of Vince’s several presentations I’ve seen. (Update: He won the research award this year, so it wasn’t just me who thought that!)
Baseball in the Age of Big Data: Why the Revolution Will Be Televised (Sean Lahman)
This was a good presentation, but it boiled down to “hey! there’s big data!” I think for some of us that meant it was just preaching to the converted and there was a lot of silent head-nodding, which is the SABR equivalent of shouting “hallelujah!” For some folks in the audience though, perhaps this was their come-to-Jesus meeting, where they finally got on the big data bandwagon? I don’t know. All my friends are stats-lovers.
He opened with the reminder that “Big data doesn’t just mean a lot of data, it means ALL the data you can gather. An infinite amount, perhaps.”
Anyway, Sean began by giving some examples of how big data is in use in the corporate world, leading to headlines like “How Target Figured Out a Teen Girl Was Pregnant Before Her Father Did.” Target figured out that the expecting mother and new mother demographic was very lucrative (Lahman didn’t say this part, but I read elsewhere that one of the reasons they’re so lucrative is that new moms develop habits that influence their family’s buying patterns for years to come). Now, of course they could look at things like purchases of maternity clothes: obvious. But through their data analytics they could identify women by other changes in their habits, like buying the vitamins that have been identified as pre-natal care.
“Netflix serves up 30 million videos a day, they keep records on when you pause, when you rewind, when you bail out. It gives them a competitive advantage over the networks who use the Nielsen ratings,” Lahman said. “One of the major findings of Netflix: there was a huge difference between what people said they wanted to watch and what they actually watched. They would put the classic movies in their queue, like Citizen Kane, but then they would actually watch Breaking Bad and the Twilight movies.”
“In sports when we work with statistics, we think of it as very orderly, with spreadsheets and ordered tables and such. But think about Google. When you do a search it grabs video, images, text, everything. We’re moving our processing power from the front end to the back end. As computing power has increased we can do billions of operations in seconds.”
He proceeded to show some graphs that demonstrated how exponentially (and I use the word in the strict sense) the amount of data has been growing. “More data has been collected in the last 5 years than we have in the entire 140 years previous in baseball. And in another 3-4 years we’ll probably be standing here telling you how rudimentary PITCHf/x was and how small the amount of data was.” FIELDf/x results in billions of data points over the course of a season.
Another place where data isn’t being mined that much, other than by Ben Lindbergh and colleagues at Baseball Prospectus, is the video archive at MLB.tv. “We as a research community have not taken advantage of the video archives at MLB TV the way we could. Subscribers can see every game back to 2010. Some people are doing this work but I feel there is a lot to be captured out of there. When the balk rule change eliminating the fake-to-third, fake-to-first move came along, everyone wrote an opinion piece about it. Ben Lindbergh at BP looked at every instance of the fake to third, fake to first on video. He looked at the evidence on whether it worked or not, instead of just offering an opinion about it.” Lahman also mentioned the pitch framing research at BP done with the video archive. The pitch framing work has become so well known that Vin Scully talked about it in the Yankees/Dodgers broadcast the other night when Hiroki Kuroda was throwing to Chris Stewart.
Lahman: “Apple has sold 90 million iPads. Players are rabid about video. One third of all households have some form of a tablet. This approach is not new. The video rooms have been around since the 1980s. What has transformed is our ability to slice and dice the video and the speed we can serve them up.” He talked about how far ahead the NFL is on using video and how they’ve always been obsessive about it, videotaping practices, and now they are able to share a digital playbook with players that shows each of them how the plays should run FROM THE POINT OF VIEW OF THEIR OWN POSITION. Smart. Baseball has had video rooms since the eighties, but now guys actually run and look at an at bat right behind the dugout every time they strike out.
“Those of you under 30, go into data science, there will always be jobs there. There is mindblowing research being done in universities and think tanks, how do we teach computers to learn and to look for things in the data that we don’t see. A human being can’t possibly look at the huge amount of data at Target.”
So the takeaways are:
* there is big data
* a lot of that data is in video and other forms, not just numbers
* computer tools to analyze data are crucial to the leaps forward that can be made
Statistical Predictors of MLB Players’ Proneness to Long Hitting Streaks (Alan Reifman and Trent McCotter)
Alan Reifman presented this while Trent sat in the front row. (Trent was also later on the winning team at trivia, the Researchers. Trivia was really fun this year. I loved how Bruce Brown set up the questions and the Jeopardy-style point boards and categories. Great job, Bruce! And congratulations Researchers!)
So, baseball fans love streaks, which appeal not only to the stat-minded but to those who love the daily drama of the streak going on. But what makes a player prone to streaks? You’d think high average would be one factor, but Ted Williams, who had a career .344 average, never had a streak longer than 23 games. Are day-to-day at bats important, so that a player like Williams, who walks frequently and records fewer at bats per game, is at a disadvantage? What about foot speed: does a guy who beats out infield hits really have a better chance at a streak than a slower hitter? Strikeouts?
A commenter at Baseball Think Factory, on reading the topic, seemed to miss the point of a lot of SABR-style studies. Sometimes things seem obvious. The whole point of researching them is to find out if “common wisdom” holds, or is contradicted by the research. If what you find ends up upholding “common wisdom,” it was still worth looking into, even if you don’t get a “wow” out of it.
The commenter wrote, “Hit for a high average, don’t walk, hit at the top of the order, team scores a lot, shouldn’t take long.”
Anyway, Reifman and McCotter were not satisfied to take an Internet commenter’s word for it, and they studied both current and former players, using baseball-reference.com back to 1915 and Trent’s “private reserve” of data on streaks prior to that. They used the “best known players” function of BBREF to filter the data, and then took hitters A-F (N=225). The lengths of each player’s three longest hitting streaks were averaged to increase the reliability and stability of this measure and to keep a single outlier from mucking things up.
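The "average of the three longest streaks" measure is easy to sketch in Python. This is my own illustration of the idea, not the researchers' code, and the game log is invented. It also simplifies things: as I understand the official scoring rules, a game in which a batter has no official at bat (all walks, say) does not end a streak, and this sketch ignores that wrinkle.

```python
def hitting_streaks(game_hits):
    """Lengths of every run of consecutive games with at least one hit.

    game_hits: hits per game, in chronological order.
    """
    streaks, current = [], 0
    for hits in game_hits:
        if hits > 0:
            current += 1
        else:
            if current:
                streaks.append(current)
            current = 0
    if current:  # a streak still alive at the end of the log
        streaks.append(current)
    return streaks

def mean_of_top_streaks(game_hits, top_n=3):
    """Average length of a player's top_n longest hitting streaks."""
    longest = sorted(hitting_streaks(game_hits), reverse=True)[:top_n]
    return sum(longest) / len(longest) if longest else 0.0

# Hypothetical 15-game log of hits per game.
game_log = [1, 2, 0, 1, 1, 1, 0, 0, 3, 1, 1, 1, 1, 0, 1]
streaks = hitting_streaks(game_log)
top3_mean = mean_of_top_streaks(game_log)
print(streaks, round(top3_mean, 2))
```

Averaging the top three instead of taking the single longest means one freak streak cannot define a player's whole "streak proneness" score.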
At first look, footspeed seemed to be negatively correlated with streak length. But then it turned out that the stolen-base data for some years is missing the caught-stealing numbers, which had skewed the results. In the end footspeed turned out not to be a significant factor at all.
Another thing that turned out NOT to be a factor was walks. Strikeouts per plate appearance also seemed at first glance to have a suppressing effect, but after regression was found to be insignificant.
In the end, players who appeared in many games, hit for better average, and were getting more at bats (hit early in the lineup or for teams who turned their lineups over often) produced the longest hitting streaks. So the commenter was sort of right.
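A toy probability model shows why average and at bats would both matter. This is not the regression from the study, just a sketch under a big simplifying assumption that I am labeling loudly: every at bat is an independent coin flip with the same chance of a hit. The .344 average is Ted Williams's career figure from above. The at-bats-per-game values are hypothetical.

```python
def p_hit_in_game(batting_avg, at_bats):
    """Chance of at least one hit in a game, ASSUMING independent at bats."""
    return 1 - (1 - batting_avg) ** at_bats

def p_streak(batting_avg, at_bats, games):
    """Chance of hitting safely in `games` straight games under the same assumption."""
    return p_hit_in_game(batting_avg, at_bats) ** games

# Same .344 hitter, but walks eat into his at bats in the first case.
few_ab = p_hit_in_game(0.344, 3)   # about 0.72
more_ab = p_hit_in_game(0.344, 4)  # about 0.81

print(round(few_ab, 3), round(more_ab, 3))
print(p_streak(0.344, 3, 30), p_streak(0.344, 4, 30))
```

Because the per-game chance gets raised to the power of the streak length, a gap that looks small for one game turns into a very large gap over 30 games. That is at least consistent with the finding that guys who get more at bats have the longer streaks.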
One questioner from the audience asked the question I wanted to ask: what if you did this study looking at players who had long strings of oh-fers in their careers? Could you correlate that with guys with long streaks to find an actual measure of “streakiness,” which implies a player has both hot and cold streaks? We’ll have to wait until a future convention for that answer, perhaps.
Coming soon, writeup of the Statistical Analysis Panel!