Diamond Deities: Problems with Pitching Predictors

Last week we took a look at BABIP and how a hitter’s patience at the plate can affect it. For today’s sermon, we hope to expose some problems with advanced pitching metrics, such as FIP, xFIP, SIERA, and tERA. It is important to note that, generally speaking, all of the metrics listed above are much better than traditional pitching stats for evaluating the “true talent” of a pitcher. The problem, however, is that some pitchers defy the logic behind these metrics. So while advanced pitching metrics are useful for analyzing pitchers across baseball in general, they still have limitations to keep in mind.

Primer: Why Advanced Pitching Metrics are Better

The most commonly known and accepted statistic used to evaluate pitchers is ERA. It is basic and logical. The problem is that ERA does not account for several variables. Defense, park factors, league, and what happens to balls once they are put in play all affect a pitcher’s ERA, yet traditional pitching stats do not adjust for any of them. In other words, ERA can be misleading. A pitcher who plays in a hitter-friendly park, has a bad defense behind him, or pitches in the American League (where he faces a designated hitter instead of the opposing pitcher) starts at an immediate disadvantage when it comes to ERA.

Advanced pitching metrics attempt to neutralize these factors by adjusting to the league average. As the folks at FanGraphs explain it, FIP and xFIP attempt to “measure what a player’s ERA should have looked like over a given time period, assuming that performance on balls in play and timing were league average.” In short, these metrics focus solely on the factors a pitcher can control. This approach has proven much more accurate than ERA at predicting future performance.
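To make “focusing on what the pitcher controls” concrete, here is a minimal Python sketch of ERA next to the standard FIP formula, (13×HR + 3×(BB+HBP) − 2×K) / IP + constant. The constant of 3.10 is an assumption for illustration only; in practice it is recalculated each season so that league-average FIP lines up with league-average ERA. The sample stat line is hypothetical.

```python
# Minimal sketch: ERA vs. FIP for one pitcher's season line.
# ASSUMPTION: FIP_CONSTANT = 3.10 is illustrative; the real constant is
# recomputed every season to put FIP on the league's ERA scale.
FIP_CONSTANT = 3.10


def era(earned_runs: int, innings: float) -> float:
    """Earned runs allowed per nine innings."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9 * earned_runs / innings


def fip(hr: int, bb: int, hbp: int, k: int, innings: float,
        constant: float = FIP_CONSTANT) -> float:
    """Fielding Independent Pitching: only HR, BB, HBP and K enter the
    formula, so defense and results on balls in play drop out."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + constant


if __name__ == "__main__":
    # Hypothetical season line, not any real pitcher's numbers.
    print(f"ERA: {era(earned_runs=70, innings=200.0):.2f}")
    print(f"FIP: {fip(hr=20, bb=50, hbp=5, k=200, innings=200.0):.2f}")
```

Note that `innings` here is true innings (200⅓ innings is 200.333…), not the box-score shorthand “200.1”.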

More recently, sabermetricians have attempted to improve on FIP and xFIP. SIERA (Skill-Interactive ERA), like FIP, emphasizes the factors a pitcher can control. However, SIERA doesn’t ignore balls in play; it “attempts to explain why certain pitchers are more successful at limiting hits and preventing runs” (FanGraphs). Once again, SIERA has proven to be a more reliable predictor of future performance than ERA and other traditional statistics.

Disclaimer: But wait… There’s a catch

It is important to understand these metrics in context, and to recognize that some pitchers are unique. For instance, xFIP replaces the pitcher’s actual HR/FB ratio (the percentage of fly balls that become home runs) with the league-average HR/FB ratio. In theory this makes sense, because HR/FB ratios have historically been unstable and hard to predict from year to year. More often than not, a low HR/FB ratio is mere luck and unlikely to continue. Notice I am using generalities… This is where advanced metrics run into problems. Sometimes, for whatever reason, a pitcher has the ability to limit home runs. CC Sabathia is a perfect example. For his career, Sabathia has an HR/FB ratio of 8.4%, well below the league mean (10.6%), and he has never posted an HR/FB ratio higher than league average in a single season. Thus, xFIP makes an assumption that, quite frankly, does not apply to Sabathia, so in his case it is not a useful predictor of “true talent.”
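Here is a minimal Python sketch of how that substitution plays out, using the 8.4% and 10.6% HR/FB figures above. Everything else (250 fly balls, the walk and strikeout totals, the 3.10 constant) is hypothetical and chosen only to isolate the effect of swapping in the league rate.

```python
# Minimal sketch: xFIP swaps a pitcher's own HR/FB for the league rate.
# ASSUMPTIONS: the 3.10 constant and the stat line below are hypothetical;
# 8.4% and 10.6% are the HR/FB figures quoted in the article.
LEAGUE_HR_FB = 0.106
CONSTANT = 3.10


def fip(hr: float, bb: int, hbp: int, k: int, innings: float,
        constant: float = CONSTANT) -> float:
    """FIP using the home runs the pitcher actually allowed."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + constant


def xfip(fly_balls: int, bb: int, hbp: int, k: int, innings: float,
         league_hr_fb: float = LEAGUE_HR_FB,
         constant: float = CONSTANT) -> float:
    """xFIP: same as FIP, except home runs are replaced by
    fly balls times the league-average HR/FB rate."""
    expected_hr = fly_balls * league_hr_fb
    return fip(expected_hr, bb, hbp, k, innings, constant)


if __name__ == "__main__":
    fly_balls = 250                      # hypothetical
    own_hr_fb = 0.084                    # a Sabathia-like career rate
    actual_hr = fly_balls * own_hr_fb    # about 21 home runs
    line = dict(bb=50, hbp=5, k=200, innings=200.0)
    f = fip(actual_hr, **line)
    x = xfip(fly_balls, **line)
    print(f"FIP {f:.2f}  xFIP {x:.2f}  gap {x - f:.2f}")
```

With these made-up inputs, xFIP comes out about 0.36 runs higher than FIP, purely because it assumes 10.6% of fly balls leave the park instead of 8.4%. For a pitcher who genuinely suppresses home runs year after year, that gap is exactly the part of xFIP that misjudges him.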

Likewise, the current MLB leader in ERA, Johnny Cueto, may defy the logic behind these metrics. While Cueto’s 2.03 ERA ranks best in all of baseball, his xFIP and SIERA are the worst among pitchers in the top 10 in ERA. xFIP likely undervalues Cueto for the same reason it undervalues Sabathia. Cueto’s ridiculously low 5.6% HR/FB may look “lucky,” as xFIP would suggest. However, his HR/FB ratio has improved every year since his debut in 2008. To me, that steady downward trend suggests Cueto’s ability to limit home runs is the mark of an improving pitcher, not luck.

SIERA also suggests that Cueto has been “lucky,” but for a different reason: its formula effectively penalizes him for his success on balls in play. In theory, Cueto should be giving up more hits, because his strikeout rate isn’t elite and opposing batters therefore put a lot of balls in play. The main problem here is that SIERA cannot gauge how hard balls are hit. Sure, we can measure the percentage of balls in play that are fly balls, ground balls, and line drives, but there is no accurate way to determine how many of those balls are hit “weakly.”

Précis: Pitching Performance in Perspective

The main thing to keep in mind when using predictive metrics is that there is no “one-size-fits-all” approach to analyzing performance. You should always use more than one statistic to see the broader picture. There will always be players who defy accepted norms, and some metrics will rate their performances inaccurately. That doesn’t mean advanced metrics are “wrong,” or even unreliable. Instead, these metrics should be viewed for what they are: improvements on traditional statistics that are still evolving, with room to grow.

Please follow us on Twitter @DiamondDeities