I have a diploma hanging on the wall behind me in my office. It declares that I have a Bachelor of Arts in English from a well-known university. One of the few things I remember from my college days is the distinction between two kinds of dictionaries. Prescriptive dictionaries tell you how things should be; descriptive dictionaries tell you how things are. For example, some dictionaries now list, as an acceptable definition of “literally,” a sense that is essentially its opposite: “in effect; in substance; very nearly; virtually.” This is because people have misused the word so often that descriptive dictionaries have recorded the misuse.
I believe that there are two kinds of stats, too, and a few years ago I made up similar names for them. But while we can all agree that descriptive dictionaries that tell me “literally” means “virtually” are dumb and wrong, the two kinds of stats are not good and bad — they are both good, but have different applications. I call them “descriptive stats” and “predictive stats.” Descriptive stats tell you what has happened; predictive stats help you guess what will happen in the future.
The distinction between the two types of stats has been on my mind again recently, specifically as it relates to the National League Cy Young Award voting. The top three in the voting are virtually set in stone: Jake Arrieta, Zack Greinke, and Clayton Kershaw will appear in some order, probably not alphabetically by last name, which is how I chose to list them here.
Realistically, while Arrieta has been outstanding this season, he is not going to win the award unless Greinke and Kershaw somehow split the vote. But the most likely scenario is that the Chicago Cubs’ ace appears in third place on most ballots and finishes in third place overall, which is probably just about where he should be. So for purposes of this discussion, I will focus only on the Los Angeles Dodgers’ one-two punch of Kershaw and Greinke.
Recently, three ESPN writers weighed in on the NL Cy Young race. Both Keith Law and Buster Olney put Kershaw ahead of Greinke, which prompted Jayson Stark to write a piece “defending” Greinke as the most deserving candidate.
The wonderful thing about this discussion is that you can make a compelling case for both pitchers. As great as Arrieta has been, it is easy to dismiss his chances of actually winning because the only statistical advantage he has over the Dodger duo is his 19 wins, which carry very little weight with the voters these days. But you can rattle off a laundry list of statistical reasons Greinke should win — most notably, his microscopic ERA — and then turn around and make an equally compelling argument for Kershaw, including his strikeouts and the dreaded “advanced statistics.”
Wins Above Replacement (WAR) is an interesting statistic for many reasons, including the fact that there are two different versions. FanGraphs has its calculation (often abbreviated as fWAR), and Baseball-Reference has its version (rWAR). There are three major differences between the two versions, but only one is a significant factor in determining WAR for pitchers. From FanGraphs’ own primer on the two versions:
Pitcher Value. fWAR uses FIP to calculate WAR for pitchers — making it defense independent — while rWAR takes a pitcher’s actual Runs Allowed and adjusts that to account for their opponents, team defense, park, and role.
Let’s look at the current WAR totals for our top three, again in alphabetical order:
[Table: WAR Totals for NL Top Three]
What we’re seeing here is the difference between predictive stats and descriptive stats. Fielding Independent Pitching (FIP) and its brother xFIP (Expected FIP) are predictive stats: they estimate how many runs a pitcher is likely to allow if he keeps pitching the way he has. Earned Run Average (ERA) and Runs Allowed are descriptive stats: they tell us how many runs a pitcher actually allowed.
Of course, nothing in life is that simple. The actual purpose of FIP is to remove defense from the discussion of a pitcher’s performance, so a groundout to shortstop and a base hit just out of the shortstop’s reach count exactly the same on the pitcher’s ledger, since only luck and the shortstop’s range (neither of which says anything about the pitcher) separate the two. But it is still a predictive stat, because what it basically says is, “A pitcher who allows X balls in play is likely to get Y outs from them,” with no regard for what actually happened. If one pitcher throws a perfect game with six strikeouts and his opponent allows three runs on six singles and a double with 11 strikeouts and one walk, the second guy will have the better FIP, but the first guy is going to win the “Big O Tires Player of the Game” on the postgame show.
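To see how FIP can rank the losing pitcher ahead, here is a minimal Python sketch of the commonly published FanGraphs FIP formula (13 × HR + 3 × (BB + HBP) − 2 × K, divided by innings pitched, plus a constant), applied to the perfect-game example. Two things are my assumptions: both pitchers are credited with nine innings, and the constant, which FanGraphs recalculates each season so that FIP lines up with league ERA, is set to a placeholder of 3.10. The constant shifts both numbers equally, so it can’t change which pitcher comes out ahead.

```python
# Minimal FIP sketch. FIP_CONSTANT is an assumed placeholder; the real
# value is recalculated each season to put FIP on the same scale as ERA.
FIP_CONSTANT = 3.10


def fip(hr: int, bb: int, hbp: int, k: int, ip: float) -> float:
    """Fielding Independent Pitching from HR, BB, HBP, K and innings pitched."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + FIP_CONSTANT


# The two pitchers from the perfect-game example, assuming both go 9 innings.
# Hits and runs are deliberately absent: FIP never looks at them.
perfect_game = fip(hr=0, bb=0, hbp=0, k=6, ip=9)
losing_pitcher = fip(hr=0, bb=1, hbp=0, k=11, ip=9)

print(f"Perfect game:   {perfect_game:.2f}")
print(f"Losing pitcher: {losing_pitcher:.2f}")
```

With these inputs the losing pitcher’s single-game FIP comes out about 0.78 lower than the perfect game’s. The six singles, the double, and the three runs never enter the calculation, which is exactly the point of a fielding-independent stat.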
The crux of Olney’s and Law’s arguments in favor of Kershaw is this: in every way other than the runs he actually allowed, Kershaw has pitched better than Greinke this season. This is reflected in the pitchers’ FIP and xFIP, as seen here:
[Table: FIP and xFIP for NL Top Three]
Kershaw has a huge lead in both FIP and xFIP, and that explains his big lead in fWAR, which, as you’ll recall, is built on FIP rather than actual runs allowed.
The crux of Stark’s rebuttal is easier to copy and paste:
Well, call me old-fashioned, but I’ve always thought that when it came time to vote, we should be voting on what ACTUALLY happened, not what theoretically should have happened according to the numbers.
Just as the thrower of the perfect game in our example above is going to win the Player of the Game, Stark believes that the pitcher who actually had the best results should win the Cy Young Award.
Except it’s not quite that simple (again). Stark makes it clear that runs allowed is his definition of “what ACTUALLY happened” when he says this:
Feel free to believe there are more incisive metrics now that help us delve deeper into the science and mathematics of pitching than we’ve ever delved before. I won’t argue you’re wrong.
But until we start deciding games based on how many runs a team SHOULD have scored, according to the data, instead of how many were ACTUALLY scored, I’m saying that it’s the real runs that matter. And real run prevention matters. And ERAs matter.
Let’s go back to our perfect game example above. What if all the hits allowed by the losing pitcher were broken-bat bloopers and the one walk was the result of the umpire missing calls on two strikes? And what if the winning pitcher got outs on two robbed home runs by his left fielder, three diving catches by his center fielder, one outstanding play by his third baseman, and a blown call at first base on an infield single that the umpire called an out? Yes, the guy who threw the perfect game probably still wins the Player of the Game due to tradition, but you could definitely make the case that the losing pitcher actually pitched better.
Predictively speaking, I would rather have a pitcher pitch the way Kershaw has than the way Greinke has, even though descriptively speaking, Greinke has gotten better results if we define “results” solely in terms of run prevention. And based on these predictive stats, the smart money is on Kershaw having a better season next year than Greinke.
The question is, what does it all mean in terms of Cy Young Award voting? We celebrate the fact that pitcher wins are no longer the main determining factor in the voting, but doesn’t ERA have many of the same flaws? Isn’t there a reason we don’t just automatically give the award to the guy with the lowest ERA? Stark is not wrong when he says that “it’s the real runs that matter. And real run prevention matters.” Where he just might be wrong, though, is in assigning all the credit or blame for run prevention to the guy throwing the pitches.
A major (if perhaps unspoken) premise of FIP and xFIP is that ERA is not purely an individual statistic. The average number of earned runs that a pitcher allows every nine innings is surely affected by the quality of the pitcher, but it is also affected by countless other variables, most notably defense and, yes, luck. So once we’ve accepted that a pitcher can’t totally control his ERA, isn’t the next logical step to say that the pitcher who did the best job at the things he can control was the best pitcher in the league?
My favorite part of this debate is that, as far as I can tell, there is no right or wrong answer. I think Keith Law and Buster Olney are absolutely right to focus on which pitcher was best at doing the things that make a pitcher great. I think Jayson Stark is absolutely right to focus on which pitcher was the most successful at keeping runs from crossing the plate. While writing this, I have changed my mind repeatedly about which pitcher I would vote for if I had a vote.
FIP and xFIP both focus on strikeouts, walks, hit batters, and home runs; the difference is that FIP uses the actual number of home runs allowed, while xFIP uses an expected number of home runs based on how many fly balls the pitcher allowed and the league-average rate of homers per fly ball.
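The only difference between the two stats is the home-run term, and a short sketch makes that plain. This is a minimal Python illustration, not FanGraphs’ actual code: the league HR/FB rate (10 percent) and the constant (3.10) are assumed placeholders, and the season line in the example is hypothetical rather than any real pitcher’s numbers.

```python
# Sketch contrasting FIP and xFIP. Both constants below are assumed
# placeholders, not official values for any season.
FIP_CONSTANT = 3.10      # the real constant is recalculated each season
LEAGUE_HR_PER_FB = 0.10  # assumed league-average share of fly balls that leave the park


def fip(hr: float, bb: int, hbp: int, k: int, ip: float) -> float:
    """FIP: weights home runs, walks plus hit batters, and strikeouts per inning."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + FIP_CONSTANT


def xfip(fb: int, bb: int, hbp: int, k: int, ip: float,
         lg_hr_per_fb: float = LEAGUE_HR_PER_FB) -> float:
    """xFIP: same formula, but actual homers are replaced by expected homers."""
    expected_hr = fb * lg_hr_per_fb  # fly balls times the league HR/FB rate
    return fip(expected_hr, bb, hbp, k, ip)


# Hypothetical season line: 10 HR allowed on 150 fly balls (about 6.7% HR/FB).
season = dict(bb=40, hbp=5, k=200, ip=200)
print(f"FIP:  {fip(hr=10, **season):.2f}")
print(f"xFIP: {xfip(fb=150, **season):.2f}")
```

Because this hypothetical pitcher gave up 10 homers on 150 fly balls, below the assumed league rate, xFIP charges him with 15 expected homers instead and comes out roughly a third of a run higher than his FIP. That is the signature of a pitcher whose fly balls have stayed in the park more often than the league average would predict.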
FIP and xFIP tell us that Greinke has been luckier this year than Kershaw. Greinke’s ERA has outperformed his FIP by 1.08, which suggests that he has had some good luck on balls in play. His xFIP is another 0.51 higher than his FIP, which suggests that Greinke has also been fortunate that relatively few of the fly balls he has allowed have left the yard. Those assumptions are borne out by the fact that Greinke’s home run per fly ball (HR/FB) rate sits at 6.8 percent, the second-lowest mark of his career and 25 percent lower than his career rate of 9.1 percent, and his .239 batting average on balls in play (BABIP) is well below his career .299 mark.
Kershaw is a different story. After last night’s game, Kershaw’s ERA, FIP, and xFIP are all nearly identical. That doesn’t mean that Kershaw has not gotten lucky this season; it just means that, to this point, his good luck and his bad luck have basically evened out. This chart, posted on Reddit, shows the progression of Kershaw’s three numbers over the course of the season:

[Chart: Kershaw’s ERA, FIP, and xFIP over the course of the season]
Kershaw was quite unlucky early in the season: after seven starts, he had allowed a .357 BABIP and a 20.8 percent HR/FB rate (compared to his career marks of .272 and 7.0 percent). Twenty-two starts later, his season numbers sit at .272 and 10.6 percent.
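The luck indicators quoted for Greinke and Kershaw boil down to comparing a season rate with a career baseline. A small Python sketch, using only the figures given above and a hypothetical helper named relative_gap, shows how large each gap is:

```python
# Compare season rates to career rates, using only figures quoted in the text.
# relative_gap is a hypothetical helper, not part of any stats library.


def relative_gap(season: float, career: float) -> float:
    """Fractional difference of a season rate from the career baseline."""
    return (season - career) / career


# (label, season value, career value)
rates = [
    ("Greinke HR/FB (%)", 6.8, 9.1),
    ("Greinke BABIP", 0.239, 0.299),
    ("Kershaw BABIP, first 7 starts", 0.357, 0.272),
    ("Kershaw HR/FB (%), first 7 starts", 20.8, 7.0),
    ("Kershaw BABIP, season", 0.272, 0.272),
    ("Kershaw HR/FB (%), season", 10.6, 7.0),
]

for label, season, career in rates:
    # Negative means below his career norm (lucky for a pitcher here).
    print(f"{label:<36} {relative_gap(season, career):+.1%}")
```

Greinke’s HR/FB comes out about 25 percent below his career rate, matching the figure in the text, and his BABIP about 20 percent below. Kershaw’s BABIP after seven starts was roughly 31 percent above his norm and has since returned to it exactly, while his HR/FB, though far down from its early peak, is still about 50 percent above his career mark. This is plain arithmetic, not a luck model; it ignores sample size, which matters a great deal after only seven starts.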
So what does it all mean? Well, first and foremost, it means that both Greinke and Kershaw (and, for that matter, Arrieta) have been outstanding this season. When it comes time to vote, a vote for Greinke and a vote for Kershaw are both correct. Kershaw has a large edge in FIP, xFIP, strikeouts, and innings pitched. Greinke has a large edge in ERA and a smaller edge in walk rate and home run rate.
If I had a vote, I would vote for Kershaw. Unless I voted for Greinke. Oh gee, don’t make me choose.