## Wednesday, December 08, 2004

### OneStat, Take Two

Through some spirited discussions with John from Washington Baseball Blog, I focused my attention away from the K/9*K/BB/(1+HR/9) formulation and towards a statistic based on expected runs saved vs. expected runs allowed based on a pitcher's pitcher-controlled stats (K/BB/HR).

In short, the formula is expressed:

Expected Runs Saved per K * K - Expected Runs Allowed per BB * BB - ExRA per HR*HR

That gets you the amount of saved runs to a team from a pitcher's defense-independent performance.

The only problem was to derive the expected run values of strikeouts, walks, and home runs. I started with the Baseball Prospectus 2004 expected run values by situation (I'd link it, but it is subscriber-only). Using that matrix, I created similar matrices for the expected runs added (or, in the case of strikeouts, subtracted) by the contribution of a marginal strikeout, walk, or home run.

A strikeout situation is easy; you just take the current value of the situation and subtract out the value of the situation one out later. For example, if a team expected to score 0.8 runs with runner on first and none out, but 0.4 runs with runner on first and one out, I calculated the value of a K in that situation as 0.4 runs saved.

Walks are easy as well; you just take the difference between the current situation after a walk and the current situation without a walk. Thus, if a team expects to score .4 runs with a man on 1st and 1 out, but expects to score .8 runs with men on first and second and 1 out, the value of the walk in that situation is 0.4 expected runs. With the bases loaded, the value of a walk is 1 run.

Homers are a little counter-intuitive. With bases empty, the value of a home run is 1 run (obviously). With runners on, it's a little different. For example, if there is a runner on 3rd and none out, the expected run value is 1.45 runs. If a batter homers, then the team gets 2 runs, but is left with a situation in which there are none on and none out - a situation with an expected run value of 0.54 runs. So the difference (1.45 - 0.54) must be subtracted out, leaving a value for that home run of 1.08 runs. The calculation is a little jarring at first, until you realize that your team is pretty much going to get that guy in anyway, so the real value you provide by hitting the homer is getting yourself around the bases.
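The three event values described above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet actually used: the table holds just the handful of cells quoted in this post (the 0.8 and 0.4 figures were illustrative to begin with), and the function names and the `"1--"` runner notation are made up for the example. Values are expressed as the change in the offense's expected runs, so a strikeout comes out negative. With the rounded inputs quoted above, the runner-on-third homer works out to 1.09 rather than 1.08, presumably a rounding difference.

```python
# Expected runs by (runners, outs). Only cells quoted in the post are filled in;
# a full table would have all 24 situations.
RUN_EXP = {
    ("1--", 0): 0.80,  # illustrative figure from the post
    ("1--", 1): 0.40,  # illustrative figure from the post
    ("12-", 1): 0.80,  # illustrative figure from the post
    ("--3", 0): 1.45,
    ("---", 0): 0.54,
}

def strikeout_value(runners, outs):
    """Change in expected runs when a strikeout adds one out."""
    # The third out ends the inning, so nothing more is expected to score.
    after = 0.0 if outs == 2 else RUN_EXP[(runners, outs + 1)]
    return after - RUN_EXP[(runners, outs)]

def after_walk(runners):
    """Runner state after a walk, plus any run forced in."""
    first, second, third = (c != "-" for c in runners)
    forced_run = 1 if (first and second and third) else 0
    new_second = second or first
    new_third = third or (first and second)
    state = "1" + ("2" if new_second else "-") + ("3" if new_third else "-")
    return state, forced_run

def walk_value(runners, outs):
    """Change in expected runs from a walk, including a forced-in run."""
    state, forced_run = after_walk(runners)
    if state == runners:  # bases loaded: same situation, one run in
        return float(forced_run)
    return forced_run + RUN_EXP[(state, outs)] - RUN_EXP[(runners, outs)]

def home_run_value(runners, outs):
    """Runs scored on the homer, adjusted for the base state it wipes out."""
    runs_scored = 1 + sum(c != "-" for c in runners)
    return runs_scored + RUN_EXP[("---", outs)] - RUN_EXP[(runners, outs)]

print(round(strikeout_value("1--", 0), 2))  # runner on first, none out
print(round(walk_value("1--", 1), 2))       # runner on first, one out
print(round(home_run_value("--3", 0), 2))   # runner on third, none out
```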

Then I weighted the K, BB and HR matrices for the relative occurrence of each cell in the real world. There were 188,539 plate appearances in MLB this year, and 103,387 (54.84%) came with no one on base. I got the plate appearances on a runner-situation basis from MLB.com, although it was not cross-referenced with out-situation. So I had to weight runner-situations by the relative occurrence of out situations. Out situations are extremely evenly weighted - 34.5% came with none out, 33.2% with one out, and 32.32% came with two outs. If someone has the weightings of each cell in the 24-situation matrix, I could refine the data further.
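The weighting step amounts to treating runner situation and out count as independent: each cell's weight is its runner-situation share times its out share. Here is a rough Python sketch of that idea. Only the bases-empty share and the three out shares come from the figures above; the other seven runner-situation shares are not reproduced in this post, so the example sticks to the bases-empty row, and the function names are invented for illustration.

```python
# Shares quoted in the post.
OUT_SHARE = {0: 0.345, 1: 0.332, 2: 0.3232}
BASES_EMPTY_SHARE = 103_387 / 188_539  # about 54.84% of plate appearances

def cell_weight(runner_share, outs):
    """Approximate share of plate appearances in one (runners, outs) cell,
    assuming runner situation and out count are independent."""
    return runner_share * OUT_SHARE[outs]

def weighted_value(event_values, weights):
    """Weighted average of per-cell event values.
    Dividing by the total weight absorbs rounding in the quoted shares."""
    total_weight = sum(weights[cell] for cell in event_values)
    weighted_sum = sum(event_values[cell] * weights[cell] for cell in event_values)
    return weighted_sum / total_weight

# Bases-empty cells only: a home run there is worth exactly 1 run at any out count.
weights = {("---", outs): cell_weight(BASES_EMPTY_SHARE, outs) for outs in OUT_SHARE}
hr_values = {cell: 1.0 for cell in weights}

print(round(weights[("---", 0)], 4))           # share of PAs with none on, none out
print(round(weighted_value(hr_values, weights), 4))
```

With the full 24-cell weights in hand, the same `weighted_value` call would replace the independence shortcut without any other change.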

The expected run values across MLB 2004 of a K, BB and HR are -.294422, +.327641, and +1.39299, respectively. I plugged these values into the formula, and created a runs saved per 9 IP figure by dividing by innings pitched and multiplying by nine. I calculated the values for all MLB pitchers in 2004 and plotted the runs saved per 9 IP against ERA. Here's the XY scatterplot I got for all pitchers with 20+ innings pitched in 2004:

It's an interesting calculation, and I'll have to noodle it further. For the time being, here are the top 10 pitchers (with 100 or more IP) in terms of runs saved per 9 IP through defense-independent pitching efforts:

| Pitcher | ERA | ERV/9 |
| --- | --- | --- |
| Randy Johnson | 2.60 | 1.68 |
| Ben Sheets | 2.70 | 1.23 |
| Johan Santana | 2.61 | 1.06 |
| Jason Schmidt | 3.20 | 0.94 |
| Scott Shields | 3.33 | 0.91 |
| Jake Peavy | 2.27 | 0.84 |
| Roger Clemens | 2.98 | 0.73 |
| Curt Schilling | 3.26 | 0.65 |
| Roy Oswalt | 3.49 | 0.63 |
| A.J. Burnett | 3.68 | 0.62 |
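For anyone who wants to reproduce the ERV/9 column, the calculation is short enough to sketch in Python. The three run values are the ones given above. The pitcher's raw line is not part of this post, so Randy Johnson's 2004 totals below (290 K, 44 BB, 18 HR in 245 2/3 IP) are supplied as an assumption from the public record, and the function name is made up for the example.

```python
# Expected run values per event, from the post (runs added to the offense).
RUNS_PER_K = -0.294422
RUNS_PER_BB = 0.327641
RUNS_PER_HR = 1.39299

def erv_per_9(k, bb, hr, ip):
    """Defense-independent runs saved per nine innings pitched."""
    # Runs saved is the negative of the runs these events add for the offense.
    runs_saved = -(RUNS_PER_K * k + RUNS_PER_BB * bb + RUNS_PER_HR * hr)
    return 9 * runs_saved / ip

# Assumed 2004 line for Randy Johnson (not given in the post).
johnson = erv_per_9(k=290, bb=44, hr=18, ip=245 + 2 / 3)
print(round(johnson, 2))  # 1.68
```

Under those assumed inputs the result agrees with the 1.68 listed for Johnson.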

At 11:54 PM,  tmk67 said...

Well done, SuperNoVa! You have calculated rigorously a figure I've tried to do back-of-the-envelope for a couple years (turns out I had been underweighting strikeouts and overrating HRs...)

I could swear that I've seen the 24-situation matrix filled in completely somewhere, but I doubt it will make a significant difference in the final results. For purity, HBP should probably be added to BB as well, but again, there will be little effect in the final results.

It'd be interesting to see if there is a correlation with dERA as well. Also, do you see a better correlation with ERA for starters than relievers? Because inherited baserunners are charged to the prior pitcher, I'd expect that the correlation between your number and ERA should start to break down for RPs.

At 12:30 AM,  SuperNoVa said...

TMK- if you send me an e-mail at natsblog@gmail.com, I'll send you every shred of data I've got.

And, you are right, it does break down for pitchers with few innings to some extent (just like ERA, K/9, etc all do). That's why my cut off was 20+ IP. You could probably group the fewer-than 20 IP pitchers by strata or something to include them in the data, but they do act weird.

At 6:46 AM,  John said...

This is basically what BP's PRAA measures (pitcher-only runs above average). There are some differences... first, PRAA is league adjusted, park adjusted, etc. which is easy to do here. Second, they baseline to the average pitcher, which is reasonable either way. Third, they attempt to be more thorough about isolating defense effects, as they are in dERA.

I think that DIPS has been widely regarded as a bit too simplistic for a while, which is why you don't actually see many DIPS stats, just the more complex formulations.

But I do think that DIPS stats like a DIPS PR or PRAA are useful, because they're far easier for the average joe to calculate mid-season.

By the way, have you accepted the merit of dERA? For something a bit easier to calculate yourself, I suggest calculating the DIPS ERA, which requires normalizing hits to an average defense.

At 8:10 AM,  SuperNoVa said...

I still can't accept that dERA is right if Odalis Perez has a .70 lower dERA than Kris Benson. It simply doesn't match reality.

And none of the values I used are park adjusted. Park adjustment is a really nasty thing, though, for several reasons: (1) pitcher-dependent stats, such as home run rates, don't always match park factors; (2) I have a hard time using year-by-year park factors. It's hard for me to believe that, for example, Anaheim Stadium shot up to a 107 batter park factor in 2001, despite the fact that it didn't change its dimensions at all and was back under 98 the very next year.

At 11:23 AM,  dexys_midnight said...

This is just outrageously good stuff. Well done SNV; I especially like how useful the numbers are independent of whether McCracken's theories are right or not. They are even more useful the closer his theories are to being correct, but they are useful by themselves as well.

At 7:06 AM,  John said...

That is why all stats use a 4 year average of park factor, achieving a higher level of statistical significance.

You said: "I still can't accept that dERA is right if Odalis Perez has a .70 lower dERA than Kris Benson. It simply doesn't match reality."

ERA+ says quite explicitly that park factors and league factors had a negligible difference on the two of them (which isn't much of a surprise, when you look at the park factors). That involves no data from dERA, that's from your ERA+ data. dERA says that Benson's defense has lost him .25 runs every 9 innings. That, to me, seems like a huge number. Defense is usually far less important than that to a game!

I don't know why you're so hot on Benson when compared to Perez, but all the numbers I've looked at say the same thing: Benson was clearly not as good as Perez last year.

At 7:24 AM,  SuperNoVa said...

(1) I'm not hot on Benson. I'm hot on his wife, though.

(2) "All the stats that I've looked at" - which ones are those? Are they defense-dependent?

Because the stats I've looked at, HR, BB, K (and HBP), the stats that the pitcher has direct control over, say that Benson is better because he gives up fewer walks. I don't care what the defense does behind Perez.

I just want to hear one argument based on the defense-independent performances of Perez and Benson that Perez was better. Citing a statistic (dERA) for which you have no formula, and whose weighting of home runs versus walks, strikeouts and a team's defensive performance you cannot explain, is not proof; it is conclusory and circular. Perez isn't better because his dERA is better. Unless dERA is explained and it appears to take into account the relevant defense-independent performances better, I cannot accept dERA alone as an argument.

(3) Four-year park factors make more sense. I'd still want to weight the years, but that's OK.

At 12:36 PM,  John said...

Excuse me?? Benson gives up MORE walks than Perez. In 2004:

Benson:
BF: 854, BB: 61, HR: 15, IBB: 8, K: 134, HB: 10
Perez:
BF: 787, BB: 44, HR: 26, IBB: 4, K: 128, HB: 3

So Perez gives up more home runs than Benson, but he walks fewer and strikes out more. Oh, and he hits fewer batters, too. Parks don't have much effect on BB and K rates, but they do on HRs. The Dodgers PF for HRs was 1.016, giving Perez 26.4 in a neutral park. Benson gave up 7 dingers with PIT, which adjusts to 6.13, and 8 with NYM, which adjusts to 6.43.

Really, this means that Benson has a 2 HR advantage park adjusted vs. non-park adjusted. Adjusted BF/HR for Benson: 67.97 and for Perez: 29.81.

If there's something to complain about with Perez, this is it.

I haven't looked at your stat in any detail due to a lack of time. It doesn't take into account batters faced (or innings pitched), so this is going to value starting pitchers far more highly than relievers, which is fine in terms of overall contribution, but doesn't measure "good" as well, if you think of "good" as how well the guy pitches when he does pitch. Of course, even that has its problems...

If you divide all your numbers by IP, you're basically computing defense-independent expected runs per innings pitched (or do batters faced), which I think seems more useful. Still, even DIPS 2.0 acknowledges that just isolating out defense-independent stuff isn't enough. Read the article on DIPS 2.0... and use the method to calculate things (there's an article walking you through it somewhere).

I'd much rather see defense-adjusted ERA numbers than runs allowed per defense-independent event numbers, because you're not factoring in running game, how the guy acts with runners on base, etc. E.g., does he only make mistakes on the homer when no one is on, or does he get nervous with runners on base, and miss more often, giving the batter more control (e.g., more line drives, which are somewhat defense-dependent in terms of making outs, but certainly not even close to totally)? That kind of stuff factors into ERA.

Ultimately the big difference between DIPS ERA and BP dERA is that DIPS (2.0 anyway) really estimates runs without ever looking at actual runs, and IIRC, doesn't address some of these issues. It guesses some hard statistics that it doesn't need to guess. dERA gets around the problem by scaling things like hits based on defense factors. That is, the BP methodology for this is to weight numbers as if the defense were average.

I don't have to have the formula to understand what they're intending to calculate, and I've never seen anyone besides you even hint that there might be some problems with their numbers, AND, I know that they go through a review process for their stats, and run correlations every year to evaluate their own stats. They have not been afraid to change their stats when they find variants that correlate better. And remember that dERA factors into their Runs Above Average formula and overall value comparisons to replacement players.

You're basically saying all of BP's main pitching stats are a crock of shit because you don't like them rating Perez higher than Benson overall, which seems incredibly dubious. If the park-adjusted ERAs still show the two guys over an earned run apart, knowing how little defense does matter, I couldn't imagine that defensive adjustments would change an ERA by anything close to a full run over the course of a season. For you to base your objection on dERA over an incident where that would have to be the case is absurd.

Sure, when I look at their top line, I appreciate Benson's HR rate and Perez' walk rate, and I wonder how that translates into actual value, when they do pitch. ERA is a reasonable attempt to measure that, because runs win games. Then I appreciate adjustments for a more accurate indicator.

Back to DIPS vs. BP's dERA, I think it's less desirable to only isolate a few stats, because it's clear that, while the pitcher doesn't always get credit for inducing a pop-up because it often happens anyway, some pitchers are far better at it than others. A lot more goes into making a good pitcher. This is why normalizing stats like hits to defense makes sense, and why you don't see DIPS in wide use.

At 6:19 PM,  John said...

By the way, 15% of the plate appearances against Benson ended in line drives, as opposed to 12% for Perez. Average across MLB was 13%. If someone hits a line drive, the outcome is mostly a matter of luck, as opposed to ground balls and fly balls, where skill of the defense clearly plays more of a factor. The only number I can see that makes Perez look bad in the slightest is his HR total.

At 8:57 PM,  SuperNoVa said...

John, you're right, I meant home run rate when I said walk rate. Benson has a huge advantage over Perez when it comes to home run rate - the 11 fewer home runs are well worth the 17 fewer walks (which I've said time and time again).

Percentage of line drives is an interesting stat (I don't have access to that data), but it strikes me that the % of line drives on BIP for the two men accounts for 10 more line drives off Benson. I'd trade 10 more line drives for 11 fewer homers.

I don't see how you calculate a 2 HR difference.

As for being weighted towards starters, the post clearly discusses that the total is a per-9 total. In fact, the best pitchers with 20+ innings are Brad Lidge, Gagne, etc., who have very low home run rates and very high K rates.

At 4:15 AM,  John said...

Missed the "per 9", not reading closely anymore. I'm over being helpful by subjecting myself to lots of stupid arguing.

According to THT data, Benson faced 564 batters at PIT, giving up line drives to 15% of them, and 290 in NY, giving up 14% line drives. This is 125, vs. about 94 for Perez. That's a difference of about 30. Line drives fall in for hits around 75% of the time, and the other 25% are pretty much hit to somebody.

Benson puts a ton more people on base than Perez (1.31 WHIP vs. 1.14, .320 OBP vs. .288). Perez' WHIP is very good, and Benson's is just okay. Perez was much, much better at helping his team win last year, with 17 win shares (10 above replacement) vs. 9 win shares for Benson (2 above replacement). Perez was the #18 pitcher in VORP, with a 49.7 whereas Benson posted a 15.8 to come in at #154. Guys like Andy Pettitte, who spent a big chunk of the year on the DL, out-VORPed Benson. And just imagine how much higher Perez' VORP would have been with half the home runs.

At the end of the day, these guys are worlds apart. You can't take any minor adjustment to numbers like that to get two guys in the same ballpark, especially if the adjustment is defense, because it just doesn't matter that much (plus, stuff like VORP is already defense adjusted). The only thing that Benson has going for him is fewer HRs. But Perez clearly has a much, much better overall game, which shows up in pretty much every stat out there.

If these two guys are so even, why are their ERA+s so different that no amount of defense adjusting could ever come close to closing the gap?

As for the 2 HR gap, I multiplied the # of homers each guy gave up vs. the park factor for home runs for the guy's home park. For Benson, this was done for each team separately, since the two parks have different factors.

At 8:44 PM,  SuperNoVa said...

John, I think you think ERA+ and dERA, which are useful stats, prove much more than they do. I tell you what, I will STOP comparing Odalis Perez and Kris Benson for the purposes of this post.

Compare these two lines:

Odalis Perez:

196 1/3 IP 26 HR, 44 BB, 128 K, 3.25 ERA, 127 ERA+

Pitcher B:

185 1/3 IP 28 HR, 46 BB, 141 K, _____ ERA, ____ ERA+

Who would you think is a better pitcher? They look pretty much the same, don't they? Pitcher B gave up 2 more walks and 2 more homers in 11 fewer innings, but also struck out 13 more guys in those 11 fewer innings (a wash from an expected pitcher run saved standpoint).

Let me fill in the blanks for Pitcher B...4.52 ERA and 89 ERA+. Both of those numbers are worse than Benson's.

Pitcher B is Odalis Perez in 2003. Perez had a dERA of 4.81 in 2003 and 3.74 in 2004. Same team, same defense behind him, roughly same walk rate, homerun rate and strikeout rate.

Please explain how dERA and ERA+ account for Odalis Perez's defense-independent performances in the two years. They can't - he was the same pitcher. And if he gave up more line drives in 2003, wouldn't that be a start down the path of showing that the number of line drives given up by a pitcher is not a repeatable performance from year to year?
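The "wash" between the two Perez lines can be checked with the run values from the post. What follows is a minimal Python sketch, not the original spreadsheet: the stat lines are the two quoted in this comment, the run values are the post's, and the dictionary layout and function name are invented for the example.

```python
from fractions import Fraction

# Runs saved per event, using the post's 2004 run values
# (sign flipped for K, since a strikeout takes runs away from the offense).
RUNS_SAVED = {"k": 0.294422, "bb": -0.327641, "hr": -1.39299}

def runs_saved_per_9(line):
    """Defense-independent runs saved per nine innings for one stat line."""
    total = sum(RUNS_SAVED[event] * line[event] for event in RUNS_SAVED)
    return 9 * total / float(line["ip"])

# The two lines quoted in this comment; thirds of an inning kept exact.
perez_2004 = {"ip": 196 + Fraction(1, 3), "hr": 26, "bb": 44, "k": 128}
perez_2003 = {"ip": 185 + Fraction(1, 3), "hr": 28, "bb": 46, "k": 141}

print(round(runs_saved_per_9(perez_2004), 2))  # -0.59
print(round(runs_saved_per_9(perez_2003), 2))  # -0.61
```

By this measure the two seasons come out within about 0.02 runs per nine innings of each other.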

At 10:15 PM,  John said...

Wow, you don't get this sabermetrics thing at all. I would never make a decision based on only those numbers, because I require far more data.

Look around and you'll see that he was far more consistent in 2004 than 2003... he started off brilliant, then fell apart, much like Vazquez did for the Yankees this year. From June onward, he wasn't very good at all, and even had shoulder issues. There was also talk about him tipping his pitches, leading to hits in key circumstances. He also improved his handle of the running game a bit in 2004. In 2003, no left hander gave up more steals. This year, he vastly improved his game from that perspective, cutting down attempts by nearly 50%.

These "intangibles" should be more tangible, and that is the whole damn point of advanced statistics. I can go straight to one of those and see that his 49.7 VORP in 2004 is much better than his 9.3 in 2003, or that his 12 Win Shares in 2004 blows away his 6 in 2003. I can say, "wow, what was with that?", and then more research gives the detail.

But, you can get somewhat of an indication without the advanced stats. In 2003, he threw more pitches per plate appearance and let more people on base (.311 vs. .288). And, when people did hit him, they hit him harder (.442 vs. .420), which indicates more line drives into gaps, which is not a defensive failing. His WHIP went down from good to very good, as well (1.87 to 1.14).

Now, compare Perez to the 2002 version. There, I think the 2004 version is much more comparable. In 2004, he basically showed to stat-heads that 2002 wasn't a fluke.

Listen to yourself. At the end of the day, you are pretty much arguing that you can learn enough about a player to make a meaningful comparison based on only the most cursory stats. Do you really believe that?

At 10:20 PM,  John said...

Typo'd that WHIP... it was 1.27.

At 7:29 AM,  SuperNoVa said...

John, first you say that dERA is a much better way than any other statistic to take into account defense-independent pitching performance and now you are saying "you can't look at any one stat"? You've contradicted yourself so many times in this discussion I don't know where you come out any more. I try to limit the issue to one thing - defense-independent pitching performance (the Three True Outcomes) - and you keep raising extraneous things, backpedaling and jumping around. And at no time have I ever suggested that anything I have proposed is the sole and exclusive basis to judge a pitcher at all - one of my first posts on the subject used ERA+ to measure pitching performance.

Let me get this straight - I'm the one that doesn't get this sabermetric thing and you start talking about non-quantifiable things like "Perez started out brilliant and then fell apart" ... when the truth is that he went to Colorado with a 3.23 ERA in late May 2003 and gave up 9 runs in 3 innings - enough to jack up his ERA another run. He had another bad game against the White Sox in June, one against the Cardinals in July and got bombed for 10 runs in August vs. Houston. Four bad games.

In 2004 he had two blow up games - one in Colorado (again) and at our Expos in August. The difference between the two seasons lies almost exclusively in the two bad outings versus the four bad outings.

"He let more people on base" - well, it wasn't through the walk - it was through the base hits allowed. Again, not necessarily a defense-independent proposition.

At 8:50 AM,  John said...

You're putting words in my mouth. I have never said I evaluate a player on a sole statistic. I have never said that defense-independent stats are the be-all, end-all. In fact, I don't think they are, which is why I've said that dERA is probably better than most defense-independent stats, because it is defense-adjusted instead of pulling out only the things known to be isolated and easy to calculate. The *basic* DIPS hypothesis is clearly too simple to be all that useful, which is why even DIPS 2.0 is vastly more complicated than the original.

I've not been backpedaling except when I misread, misreason or misunderstand (I'm willing to admit when I'm wrong, and I've certainly been wrong on some minor things along the way). I don't believe that I have contradicted myself, been jumping around, etc... more likely you don't understand my points due to whatever miscommunication.

Let me try to recap to demonstrate how the argument has evolved from my perspective.

My focus was, at first, to understand why you have basically said, "dERA is a bullshit stat... it clearly has some big problem with it". Your "proof", as far as I can tell, is basically that you think Benson and Perez should be considered about equal based on 2004 performance. I have tried to figure out why the hell you'd think that when every sabermetric stat that anybody relies upon to help evaluate a player says that Perez was better, including ones that take defense into account. You basically seem to be implying that all of them, from Win Shares to VORP are broken because they get wrong what is so obvious. Something so obvious, I can't see it.

So once I finally did a reasonable job communicating that, you tried to show me, as far as I can tell, that sabermetric stats aren't always accurate indicators (that they're getting the Benson and Perez comparison wrong) based on two BASIC stat lines for consecutive years from Perez.

Now, I started reading this blog because you guys seemed at least reasonably clueful. Yet, here you were basically arguing that Perez had two equal years based on two equal looking stat lines, without even attempting to look at any other metric, especially ones that have been demonstrated to correlate far better to actual performance. I expected better from you, and I called you on it. I provided hard numbers from several metrics demonstrating that Perez performed better in 2004 than he did in 2003, to show that what you were doing was too simplistic.

Since you seemed to be ignoring sabermetric evaluation of these people, I thought I'd try to look beyond the numbers for possible explanations to which you might actually be able to relate. All the info from there is condensed from public sources that seem to be in agreement (shoulder problems, running game, etc).

Looking into it, you may actually be right with respect to a two-game difference, even though a lot of analysts say otherwise (read the ESPN review of the guy, for instance). I don't know and I don't care, and it wasn't the point. The point was, ultimately, that the top-line stats don't give enough information to demonstrate your stupid hypothesis that dERA was flawed.

Reading in between the lines on which I was writing, another point is that you seem to be trying to come up with statistics to support your foregone conclusions, and you're drawing those conclusions only from the most peripheral stats.

Going back to a previous point, your style of defense-independent pitching stats miss a lot of things that make them less useful, such as the running game, and what a guy does when there are actually runners on base.

I firmly believe that stats aren't a perfect way of capturing future performance because of factors the stats can't capture like injuries, etc. And I may not be a statistician, but I do understand enough about the field to know about things like standard deviations and confidence levels quite well. I know that, as a measure of the performance they're attempting to measure, things like VORP aren't going to misjudge actual performance by a full 40.4 runs over the course of a year with any significantly non-zero probability. If I think VORP can produce results that off, then I would be forced to conclude that something is significantly wrong with VORP. And then I'd go and try to collect some real evidence, not an anecdote or two.

In my mind, whatever anecdotal evidence either you or I come up with should be only an attempt to explain stats, not to judge whether the stats are working. I have good reason to trust the stats to measure actual performance to the degree of confidence merited, and only use the other factors to look at future performance, or to try to explain why the stats don't match with expected performance, etc. You apparently don't do the same.

Feel free to go back to ignoring large parts of my argument that conveniently don't fit your ill-formed hypothesis of the day. Heck, I suspect you don't like me and just want me to go away, and I'm fine with doing that, as I have been trying to help further your understanding of things, but have ended up wasting a ton of time, because you apparently don't like to be challenged in front of your friends.

It's not like I was actively trying to discourage you from building your own stat... I was pointing out why your first one made no sense, and though it took me way too long, it looks like I finally got through on that one. This time, I've been trying to show you that it is silly to judge whether a stat is good based on your personal judgement after evaluating some statistics known not to correlate very well to overall performance. I don't think I've gotten through to you, so it's turned into a big argument, but honestly, I was trying to be helpful by pointing out that your attitude toward the ability of these stats to measure the things they're supposed to measure is way too cavalier.

Note well, I am not saying that dERA is the one true way to evaluate a pitcher for a year. I am saying that, given a big enough sample size, it is a very accurate way of measuring in a relative sense how often a pitcher was giving up runs, when adjusting to a defense-neutral situation. While I do think that is a much better overall indicator than simply isolating Ks, BBs, HBPs and HRs, I still use other tools to get a well-rounded view of a player, including VORP, WHIP, ...

Anyway, I'm happy to stop being a thorn in the footpath of your day. While I do have some interest in hearing what the other two guys involved with this site have to say, it's not like you guys as a whole have been doing any real analysis of the team recently, anyway. If there's ever any real content worth reading, maybe one of the other blogs I read will link to it.

At 9:06 AM,  John said...

Oh, one final thing. If you're going to keep arguing on the main points, then please demonstrate to me how VORP can easily be wrong about the performance of a pitcher over the course of a year by more than 20 runs in that year. You're clearly saying that it can be. Of course, if it can be, most of the BP stats are necessarily fundamentally flawed, and I urge you to prepare your "proof" and rescue the poor sabermetric community from the illusions they've been feeding upon for all these years.

But if you're just going to ignore those and try to poke holes at things that are tangents that you led me down, totally irrelevant to the discussion, then keep on going if you want... I enjoy it for the entertainment value.

At 3:41 PM,  dexys_midnight said...

ok, I just read this from the beginning to the end for the first time, and John, not to be rude, but your argument makes very little sense. Not because we "don't understand it," but because you have been defending these defense-independent stats from the beginning and it is your own contradictory arguments (SNV is EXACTLY right on this point) that make it sound like DIPS and dERA are garbage. I don't think SNV ever said that and none of us actually think it is garbage at all or we wouldn't be spending so much time on this; he just said that based on defense-independent stats like K, BB and HR, Benson's season was better than Perez's. Looking at those defense-independent stats, there can be no doubt about that, as Perez's many, many more home runs given up per batter faced overwhelm the slight K difference and the medium BB difference.

But I read your posts to three different sabermetric fans and omitted the random sentences where you just state how great DIPS and dERA are and chastise SNV, and all three guys said the exact same thing: "wow, this guy really doesn't like DIPS and dERA." You want to know why they said that? Because your entire argument that DIPS and dERA are good stats seems to be based on statistics that are NOT defense-independent such as WHIP and ERA. I'm not sure why you do this, but it is crystal clear to everyone reading this that it is indeed what you are doing.

Finally, excuse me for being harsh, but that's what you get for trivializing everyone else on this board as being not up to your standards of greatness.

At 5:09 PM,  John said...

Well, I don't think you've understood my argument, but that's probably my own fault.

First, when you said SuperNoVa made a Perez vs. Benson judgement based solely on defense independent stats, note that he did it based on his own stat (the first OneStat). He seemed to have a metric for evaluating whether a stat is even valid, and that metric is whether Perez and Benson come out equal. I quote:

"Actually, John, I'm suggesting that if dERA is measuring the difference between Benson and Perez as .70, then there's something wrong with it."

Then later he said, "I still can't accept that dERA is right if Odalis Perez has a .70 lower dERA than Kris Benson. It simply doesn't match reality."

I'm pretty sure that is SNV pretty clearly saying that he thinks dERA is garbage.

Also, at the highest level, this conversation has been for me:

1) SNV posts stat that makes no sense but seems to be in the same spirit as DIPS. I point this out.
2) SNV changes it to a stat I have no problem with. I pointed out some possible directions to go by pointing out what is different with other DIPS stats (the third post in this thread).
3) I said (in the same post), to paraphrase, "I think this is useful, even though I've some issues with basic DIPS".
4) After I thought he got to the right place there, I brought up the one issue that was still bothering me (again in the same post), whether he thought BP had a broken stat as above.

From there, read my previous post.

Note the following:

1) I have never dissed OneStat, take 2. In this very thread I compare it to established stats.
2) You quoting the good things I say but not the bad is taking my post out of context, and doesn't accurately capture what I have represented. I do think that DIPS 1.0 has problems in that it tries to measure something more granular than it actually measures. I have some reason to believe that DIPS 2.0 is probably much better, but not quite what I'd be looking for vs. dERA. I do admit I don't understand the specifics of dERA. I haven't dissed dERA, even though I'm happy to admit that it's not the be-all, end-all... it measures a specific thing.
3) My core argument for dERA is more that it has been peer reviewed. I assume, particularly since it is used to compute VORP (as someone else said somewhere on this blog), that people have used regression tests to validate what they're doing to within a particular level of confidence. What I have been doing is trying to show someone who believes that BP has somehow failed that he shouldn't be so quick to judge. I don't offer it as conclusive proof, but I think they are points that are highly suggestive that his assertions about not believing in dERA are likely in left field.
4) I'm sorry if you think I'm trivializing everyone on the board. I have only been on the offensive against SuperNoVa, not everyone else on the board. I think that your overall effort to come up with a stat meeting certain goals was a success... I switched gears and was trying to set your boy back on the path of, "maybe the BP guys didn't totally screw up on dERA".

If you feel like I've been overly harsh, realize that from my perspective, it feels like I'm spending a lot of time talking to a wall. If the guy had said, "dERA is fine, and here's why I find this other stat useful", that's fine. I have no problem with that. There are a lot of stats some people find useful that others don't.

And, if he would just acknowledge that a valid, fully adjusted, defense-adjusted ERA is unlikely to be affected by nearly 3/4 of a run over a year just on the basis of defense, I'd shut up. Or, if he'd even just admit there would be a valid reason in the performance of either Benson or Perez resulting in the difference, that'd be fine. Or, anything along those lines.

Once again, accept my apology if you feel that I was attacking you. I wasn't, and it was probably due to my own miscommunication. But I definitely got into this to help the guy along by trying to show him his assertions quoted above are likely to be cavalier.

At 12:34 AM,  SuperNoVa said...

For the benefit of our probably one or two remaining readers, I want to say that I think John's thoughts on this issue have been helpful, and were extremely valuable in deflating OneStat as a highly valuable measure (I still think it's a pretty OK one, though, taken with a grain of salt).

John, you are always welcome here. I do get frustrated when you imply that I'm stupid or sabermetrically challenged, because it's not helpful. One, if I am sabermetrically challenged, it's better to explain it to me than to call me dumb. Why not help everyone be sabermetrically learned? Two, if I am dumb, everyone else would see that and you don't need to say it.

I have tried to maintain a consistent position throughout this - trying to come up with a defense-independent measure for pitchers based on the outcomes that pitchers SOLELY control - HR, BB, K - not anything else. And I think ERV9 (what I am now calling OneStat, Take Two), is an excellent way of taking that into account. In fact, without naming names, one BP author e-mailed me indicating that he liked ERV9.

My argument as to dERA is this - the dERA formula is not published, as far as I know. However, to the extent that dERA is used as a stat to measure defense-independent pitching performance, it fails to do so, based on the Benson vs. Perez 2004, and Perez 2003 vs. Perez 2004 comparisons. I cannot and will not impugn the value of dERA (which, again, I am unaware of the exact formula for calculating), to the extent that it measures other things to calculate a pitcher's value.

At 8:20 AM,  John said...

I think what you have is good, but I'll once again refer you to the DIPS 2.0 construction, which is well documented, and I think would better meet your goals. The comparative benefit of ERA-type constructions is that they try to recognize that not every plate appearance is independent and equally valuable... there are times where a hit is fine, and times where it isn't (how well does the guy pitch from the stretch, with runners in scoring position, etc). The DIPS constructions don't use anything except the defense-independent stats you want to use, IIRC.

Since, like ERA, dERA is meant to measure pitching performance but is meant to do so after defense adjustments, you're still saying the stat doesn't do what it's supposed to do, if you don't think it handles defense right. And I still don't understand your conclusions here even based on anecdotal evidence, because, as I pointed out before, dERA adjustments move the two guys closer by a quarter run, which I think is pretty large considering how unimportant defense tends to be overall.

I would suggest that you ask your guy at BP why their dERA does such a bad job of measuring the effect of defense, since that's what it is intended to do, and see what he says.

At 8:06 PM,  SuperNoVa said...

I think we may have agreement here - there is a difference between purely defense-independent and defense-adjusted. I agree that dERA looks like it is defense-adjusted. But that doesn't mean it isn't luck adjusted. To the extent that dERA takes into account park effects and the team defensive performance (e.g., Dodgers are 2% better at turning batted balls into outs), yes it is a good measure. But it is not defense-independent, because the remaining stats incorporate defense-involved activities such as hits, etc.

So I think that dERA is valid as a defense-adjusted measure but not a defense-independent measure, which I don't even believe it purports to be. Make sense?

At 8:53 PM,  John said...

Okay, I'll agree with that. Adjusting only definitely leaves in the luck of the game. On the flip side, isolating only the skills of the pitcher you know you can measure doesn't provide a complete enough indicator of the pitcher's skill. For instance, it is clear that pitchers have some control on balls in play, which is why they work hard to get batters to do particular things when they do put the ball in play that can clearly help increase the odds of getting the guy out (if the pitcher doesn't miss). A good defense will know what the pitcher is trying to do, and that will help even more (Willie Mays, for instance, was outstanding at knowing what his pitchers were trying to do with each pitch, and communicating where to play to the rest of the outfield). It's certainly not as much as they would like, of course. But it's definitely a factor that can only be approximated with a DIPS-like approach, not measured.

Again, I suggest you look at DIPS 2.0 if you haven't already. It's really detailed, takes many different factors into account, and is truly defense independent.

At 11:24 AM,  Steve Tolkin said...

This is the data you requested. Format in fixed-width font to see the data table clearly. I got it from http://knology.net/~johnfjarvis/oera.html, "Implementing the Cover-Keilers Offensive Earned Run Average" by John F. Jarvis.

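Observed count of inning states (columns: base runners on; rows: outs)

| Outs | `---` | `1--` | `-2-` | `12-` | `--3` | `1-3` | `-23` | `123` |
|---|---|---|---|---|---|---|---|---|
| 0 | 466816 | 121719 | 36394 | 27578 | 6862 | 12045 | 6682 | 6858 |
| 1 | 333344 | 138538 | 65900 | 50260 | 22451 | 24397 | 17088 | 17449 |
| 2 | 264740 | 139007 | 78751 | 63447 | 32133 | 31637 | 18784 | 20993 |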
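
As a rough sketch of how the counts above could be turned into weights for each cell of the 24-situation matrix, the Python below divides each cell by the grand total. The names `BASE_STATES`, `COUNTS`, `state_weights`, `out_shares`, and `weighted_run_value` are made up for illustration, and the per-situation run values have to be supplied by the reader, since none are given here.

```python
# Base-runner states, in the same column order as the table above.
BASE_STATES = ["---", "1--", "-2-", "12-", "--3", "1-3", "-23", "123"]

# Observed counts of inning states, keyed by number of outs.
COUNTS = {
    0: [466816, 121719, 36394, 27578, 6862, 12045, 6682, 6858],
    1: [333344, 138538, 65900, 50260, 22451, 24397, 17088, 17449],
    2: [264740, 139007, 78751, 63447, 32133, 31637, 18784, 20993],
}


def state_weights(counts):
    """Return {(outs, base_state): share of all observed situations}."""
    total = sum(sum(row) for row in counts.values())
    return {
        (outs, base): n / total
        for outs, row in counts.items()
        for base, n in zip(BASE_STATES, row)
    }


def out_shares(counts):
    """Return {outs: share of all situations with that many outs}."""
    total = sum(sum(row) for row in counts.values())
    return {outs: sum(row) / total for outs, row in counts.items()}


def weighted_run_value(weights, values):
    """Average a per-situation run value over the 24 cells.

    `values` is a hypothetical {(outs, base_state): run value} mapping
    that the caller must provide; it is not part of the data above.
    """
    return sum(weights[key] * values[key] for key in weights)


if __name__ == "__main__":
    for outs, share in out_shares(COUNTS).items():
        print(f"{outs} out: {share:.1%}")
```

With these counts the out shares work out to roughly 34.2%, 33.4%, and 32.4% for none, one, and two outs, which is close to the even split described in the original request.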

Here is your request from December 08, 2004: "I got the plate appearances on a runner-situation basis from MLB.com, although it was not cross-referenced with out-situation. So I had to weight runner-situations by the relative occurrence of out situations. Out situations are extremely evenly weighted - 34.5% came with none out, 33.2% with one out, and 32.32% came with two outs. If someone has the weightings of each cell in the 24-situation matrix, I could refine the data further.