ERV Differential - All You Need?
Tom Verducci recently wrote an odd column in which he argues that walks are equivalent to turnovers in the NFL. He also tries to correlate walk differential with winning percentage on a very rough basis - showing the top 5 and bottom 5 teams in walk differential and their won-loss records. Verducci explains his thought process:
"I've begun to think in recent years that the giveaway/takeaway equivalent in baseball is walks. Get more of them than you give away and chances are good you'll have a winning team, even a playoff team. For instance, of the 11 clubs last year that posted a walk differential better than plus-30, 10 of them had winning records. Six of those made the playoffs. The lone losing team to go better than plus-30? Hold on to your stein: It was Milwaukee, a great credit to the job pitching coach Mike Maddux is doing there. Only St. Louis and San Diego walked fewer batters than the Brewers last year in the NL."
I find it odd that Verducci would make that statement without getting any statistical help. It's a simple linear regression - walk differential vs. games over .500. Well, I plugged in the team walk differentials and games above or below .500 for 2003 and 2004 and got an r-squared value of .4599, meaning that, statistically speaking, 46% of the variance in games over .500 can be explained by a team's walk differential.
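For anyone who wants to try the walk-differential regression themselves, here is a minimal sketch of the r-squared calculation. It assumes "games over .500" means wins minus losses, and the five teams below are invented numbers for illustration, not the 2003-04 data. The function names are mine.

```python
# Sketch only: r-squared for a one-variable linear regression.
# All sample numbers are made up for illustration, not real team data.

def games_over_500(wins, losses):
    """Assumed definition: wins minus losses."""
    return wins - losses


def r_squared(xs, ys):
    """Squared Pearson correlation, which equals r-squared for a
    one-predictor least-squares fit with an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical walk differentials and (wins, losses) records for five teams.
walk_diffs = [120, 45, 10, -30, -95]
records = [(98, 64), (85, 77), (83, 79), (74, 88), (67, 95)]
games_over = [games_over_500(w, l) for w, l in records]

print(round(r_squared(walk_diffs, games_over), 4))
```

With the real team-season rows for 2003 and 2004 in place of the made-up lists, the same function should give the kind of figure quoted above.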
Now, Verducci's column is not the first place the importance of walk differential has been recognized. Rob Neyer's Beane Count tracks the rankings of teams in walks and home runs, suggesting that these are critical elements of a team's success. The Beane Count, however, measures relative rankings, not absolute differentials in walks and home runs. The problem is, there is no really good way to combine walks and home runs into one stat... or is there?
That's when I decided to combine walk differential, strikeout differential, and home run differential into one statistic and try to analyze them versus the success of a team. The natural means to combine those statistics is ERV - the expected run value of each event. ERV has been explained and calculated in this space before. In short, walks are worth .337 expected runs, home runs are worth 1.391 expected runs, and strikeouts are worth -.296 expected runs. So I took the walk, strikeout and home run differentials of all major league teams in 2003 and 2004 and ran a linear regression against their games over .500, and here's what I got:
The .7393 r-squared value is pretty high, suggesting that a very high percentage of a team's success can be attributed to just three statistics - K, BB and HR. These statistics, of course, are the defense-independent events, or the "Three True Outcomes" in baseball. And, in fact, most of that success can be attributed to BB and HR alone - I ran the same linear regression on the ERV value of BB and HR differential only, and got an r-squared of .6894, indicating that K differential adds little (5% or so) explanatory power on top of the other two [in fact, a linear regression of strikeout differential alone against games over .500 got an r-squared of .20 in 2003-04. When I ran the same one for 2004 only, the r-squared was about .07]. I may well be reinventing the wheel here, but it does suggest that getting hitters who walk and hit home runs, and pitchers who do not give up walks or homers, is the key to success.
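The ERV differential used in these regressions can be sketched as follows. The three run values are the ones quoted above; the sign conventions are my assumption about the natural reading (each differential is the team's own hitters minus what its pitchers allowed or recorded, so striking out more opposing batters than your own hitters strike out counts in your favor). The function name and the sample team are hypothetical.

```python
# Sketch only: combine BB, HR and K differentials into one ERV differential.
# Run values are the 2004 figures quoted in the post.
ERV_BB = 0.337    # expected runs per walk
ERV_HR = 1.391    # expected runs per home run
ERV_K = -0.296    # expected runs per strikeout (negative for the offense)


def erv_differential(bb_for, bb_against, hr_for, hr_against,
                     k_batting, k_pitching):
    """Expected-run differential from the three defense-independent events.

    Assumed conventions: *_for / k_batting are totals by the team's hitters;
    *_against / k_pitching are totals allowed or recorded by its pitchers.
    """
    bb_diff = bb_for - bb_against
    hr_diff = hr_for - hr_against
    k_diff = k_batting - k_pitching
    return ERV_BB * bb_diff + ERV_HR * hr_diff + ERV_K * k_diff


# Invented team: +50 walks, +20 homers, 100 fewer strikeouts by its hitters
# than its pitchers recorded.
example = erv_differential(bb_for=500, bb_against=450,
                           hr_for=200, hr_against=180,
                           k_batting=1000, k_pitching=1100)
print(round(example, 2))
```

Computing this number for every team-season and regressing games over .500 on it is the one-variable fit described above; dropping the strikeout term gives the BB-and-HR-only version.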
Now, there are problems with this. One is that my ERV calculation is not based on perfect (just good) data and might be off a bit, as I explained in the original ERV post. Another is that I didn't calculate ERV for 2003; I just applied the 2004 ERV values for K, BB and HR to both seasons. But, in general, it makes the point as to how much the defense-independent events contribute to a team's overall success, all other things being equal.
As with my prior posts on ERV, if you send an e-mail to firstname.lastname@example.org, I'll send you my data.