Last week on Saber-Slant, I talked about putting run numbers to a player’s offensive production and the problems of divvying up credit accordingly. Clearly, a single with a runner on second is worth more than a single with the bases empty. But how much of that value should be credited to the hitter, and how much of it belongs to the context in which he was hitting? In other words, how much of the run value of the single with a runner on comes from the base/out state? And should a hitter be given credit for any of that?
Those who consider RBI invaluable to the game believe that there are hitters who are better with runners on, but the empirical evidence shows that this just isn’t the case. Of course, there are some differences in hitting with runners on, but few if any hitters can be shown to improve significantly with runners on base. For example, here are the five players with the most RBI last season and their splits with bases empty and runners on (for context, I tacked on the 2009 MLB average as well):
Prince Fielder: .282/.364/.559 Empty; .284/.399/.529 Runners
Ryan Howard: .269/.342/.542 Empty; .290/.405/.649 Runners (this may be due to the shift employed on Howard with no one on)
Albert Pujols: .330/.406/.625 Empty; .337/.448/.631 Runners
Mark Teixeira: .283/.355/.553 Empty; .293/.398/.550 Runners
Jason Bay: .279/.369/.517 Empty; .278/.383/.514 Runners
2009 MLB: .259/.323/.417 Empty; .267/.345/.418 Runners
The difference in OBP is almost entirely the result of intentional walks rather than any inherent walk-inducing capability with runners on. Combine that with the fact that any empty/runners splits that did exist would have to be regressed to the league average (which, as you can see, is not a large split), and you get a general consensus that the vast majority of major leaguers hit just as well with runners on as with the bases empty.
But now we get back to our basic question: how can we assign credit for a hitter’s single if these singles happen in different contexts? What if we tried to strip that context away?
Run Expectancy (RE)
The single with a runner on second is more likely to score a run than the one with the bases empty. Both singles change the game state on the bases, however. In the former instance, the single likely scores the runner from second, while putting a man on first. In the latter, it only puts a runner on first from a bases empty situation. Can this information be somehow turned into a run value for us to get something tangible?
Well, if we have both base states (runners on each of the bases) along with out states (number of outs in the inning), we can. Using actual game data over a certain time period, one can find the average number of runs scored by the end of an inning from each base/out state. Here is a chart of that information for 1999-2002 MLB. This can be considered the run expectancy of each base/out state. For example, from that chart, you can see that each inning starts off (no one on, 0 outs) with 0.555 runs expected to score by the end of the inning (based on the league average runs per game in that time period). With a runner on second and no one out, you would expect to score 1.189 runs by the end of the inning.
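For those who like to see the mechanics, here is a minimal sketch of how such a chart could be built. The function name `run_expectancy`, the state encoding (a bases string plus an out count), and the six observations are all made up for illustration; a real table would come from years of play-by-play data, not a toy list like this one.

```python
from collections import defaultdict

def run_expectancy(observations):
    """Average the runs scored through the end of the inning for each base/out state.

    `observations` is an iterable of (state, runs_rest_of_inning) pairs.
    """
    totals = defaultdict(lambda: [0.0, 0])  # state -> [sum of runs, count]
    for state, runs in observations:
        totals[state][0] += runs
        totals[state][1] += 1
    return {state: total / count for state, (total, count) in totals.items()}

# Toy, invented observations: (bases, outs) and runs scored from then until the inning ended.
toy_observations = [
    (("---", 0), 0), (("---", 0), 1), (("---", 0), 0), (("---", 0), 1),
    (("-2-", 0), 2), (("-2-", 0), 1),
]

re_table = run_expectancy(toy_observations)
print(re_table)
```

The only idea here is "group by base/out state, then average"; everything else is bookkeeping.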
Now, we know that a single is going to change the base/out state accordingly. The single with no one on takes the expected runs scored from 0.555 (assuming no outs) to 0.953 runs, a change of 0.398 runs; that is the run value of that particular single. Now, what about the single with a man on second? The base/out state goes from runner on second to runner on first, which changes the run expectancy from 1.189 runs to 0.953 runs, a difference of -0.236 runs. However, a run has also scored, and that run would be reflected in our tally at the end of the inning, so we also count that, giving a total run value for that single of 0.764 runs.
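The arithmetic for those two singles can be written as a tiny function. This is just a sketch of the calculation described above; the function name `run_value` is my own label, and the three RE numbers are the ones quoted from the 1999-2002 chart.

```python
def run_value(re_start, re_end, runs_scored):
    """Run value of an event: change in run expectancy plus runs scored on the play."""
    return (re_end - re_start) + runs_scored

# Run expectancies quoted in the text (1999-2002 MLB, no outs).
RE_EMPTY = 0.555       # bases empty
RE_FIRST = 0.953       # runner on first
RE_SECOND = 1.189      # runner on second

single_bases_empty = run_value(RE_EMPTY, RE_FIRST, runs_scored=0)
single_runner_on_second = run_value(RE_SECOND, RE_FIRST, runs_scored=1)

print(round(single_bases_empty, 3))       # 0.398
print(round(single_runner_on_second, 3))  # 0.764
```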
Now, you can do this for every event that occurs in the game and get a run value for each event. Now you have all your singles, home runs, and outs tallied in runs. In fact, FanGraphs does it for you. Keep in mind that these are tallied as runs above average (average being zero, of course).
Equal Runs for Equal Play
Done, right? Not quite. Remember the whole “strip the context” thing we talked about at first? We don’t want to give more credit to the guy who got to hit his single with a runner on second or with no one out as opposed to the guy who hit with no one on or two outs. The way we resolve this is that we give them the same credit. After all, the only thing the hitter was able to affect was the single itself, not the base/out state (there are other things we could keep track of, like ballpark, pitcher, and defense, but that would get unnecessarily confusing).
What is that one value that we give out for all singles, home runs, or other events? Take the average starting RE across every single, then take the average of what those singles led to by the end of the inning (that is, the average ending RE plus the average number of runs driven in by those singles). The difference between the two is the average run value of a single, stripped entirely of base/out context. This is a value that we can assign to all singles. Do this for all events and you get what is called a set of linear weights.
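Here is a rough sketch of that averaging step. To keep it small, the "league" consists of just the two singles worked through earlier (bases empty, and runner on second, both with no outs), so the result is a toy number and not an actual linear weight. The function name `linear_weight` and the tuple layout are assumptions of mine.

```python
def linear_weight(events):
    """Average run value of one event type.

    `events` is a list of (re_start, re_end, runs_scored) tuples, one per
    occurrence of the event. Returns avg(ending RE + runs) - avg(starting RE).
    """
    n = len(events)
    avg_start = sum(re_start for re_start, _, _ in events) / n
    avg_end_plus_runs = sum(re_end + runs for _, re_end, runs in events) / n
    return avg_end_plus_runs - avg_start

# Toy sample: only the two singles from the text, not real league data.
toy_singles = [
    (0.555, 0.953, 0),  # bases empty, 0 outs
    (1.189, 0.953, 1),  # runner on second, 0 outs, runner scores
]

toy_single_weight = linear_weight(toy_singles)
print(round(toy_single_weight, 3))  # 0.581
```

Notice that this is the same as averaging the individual run values (0.398 and 0.764). With a full season of singles in the list, the same function would give the context-neutral value of a single.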
Linear weights are simple to use, because all you have to do is multiply the number of each event by the appropriate weight and add up the results to get the run value contributed by a player. Linear weights are essentially context-neutral because they assume each player sees an average proportion of base/out states, which is fair for what we set out to do. These weights are the basis for many of the current offensive metrics that you see, such as wOBA on FanGraphs and TAv on Baseball Prospectus. They achieve the goal of stripping as much context as possible away from events and giving hitters equal (run) credit for equal play.
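In code, applying linear weights is a one-liner. The weights and the stat line below are placeholders I made up purely to show the multiplication; they are not the real weights for any season, and the function name `batting_runs` is hypothetical too.

```python
def batting_runs(event_counts, weights):
    """Sum of (number of events x run weight) over every event type."""
    return sum(count * weights[event] for event, count in event_counts.items())

# Placeholder weights in runs above average per event (invented, not real values).
placeholder_weights = {"single": 0.5, "home_run": 1.4, "out": -0.3}

# Hypothetical stat line for an imaginary hitter.
hypothetical_hitter = {"single": 100, "home_run": 20, "out": 400}

total_runs = batting_runs(hypothetical_hitter, placeholder_weights)
print(round(total_runs, 1))
```

Because every single gets the same weight no matter who was on base, two hitters with identical stat lines get identical run totals, which is exactly the "equal runs for equal play" idea.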
I hope that was clear to everyone. If there are any questions, ask away in the comments; I’m always happy to answer.