Tuesday, December 07, 2004

Does "OneStat" take care of extremes?

I was wondering how well OneStat captures the relative quality of great seasons, and whether the truly great seasons translate to OneStat or whether we would look at the results and say, "this can't be right." So I took a bunch of great seasons from truly great pitchers of the last decade, plus David Cone's not very good 1991 season. (I did not add HBPs to BBs, by the way.)

Here are the seasons:

1992 Greg Maddux 2.18 ERA (166 ERA+)
1993 Greg Maddux 2.36 (171)
1994 Greg Maddux 1.56 (273)
1995 Greg Maddux 1.63 (259)
1997 Greg Maddux 2.20 (191)
2001 Curt Schilling 2.98 (154)
2002 Curt Schilling 3.23 (136)
2003 Curt Schilling 2.95 (159)
2004 Curt Schilling 3.26 (150)
1997 Pedro Martinez 1.90 (221)
1999 Pedro Martinez 2.07 (245)
2000 Pedro Martinez 1.74 (285)
2001 Pedro Martinez 2.39 (189)
2002 Pedro Martinez 2.26 (196)
2003 Pedro Martinez 2.22 (212)
1991 David Cone 3.29 (111) (and a 1.19 WHIP to go along with his 14-14 record)

Ranking the above seasons by OneStat gets you some interesting results (remember that Kris Benson's and Odalis Perez's 2004 seasons scored in the high 7s):
1999 Pedro Martinez 2.07 (245); OneStat= 80.92
2000 Pedro Martinez 1.74 (285); OneStat= 61.32
2001 Pedro Martinez 2.39 (189); OneStat= 59.13
2002 Curt Schilling 3.23 (136); OneStat= 52.34
1995 Greg Maddux 1.63 (259); OneStat= 45.52
1997 Greg Maddux 2.20 (191); OneStat= 44.94
2002 Pedro Martinez 2.26 (196); OneStat= 40.62
2001 Curt Schilling 2.98 (154); OneStat= 33.6
2003 Curt Schilling 2.95 (159); OneStat= 32.97
2003 Pedro Martinez 2.22 (212); OneStat= 32.55
1997 Pedro Martinez 1.90 (221); OneStat= 32.41
1994 Greg Maddux 1.56 (273); OneStat= 29.7
2004 Curt Schilling 3.26 (150); OneStat= 24.44
1991 David Cone 3.29 (111); OneStat= 20.51
1993 Greg Maddux 2.36 (171); OneStat= 17.09
1992 Greg Maddux 2.18 (166); OneStat= 15.38

I'm not sure what this tells us exactly, but some of it is counter-intuitive. Suppose we believe the theory that the only things a pitcher controls are HRs, BBs and Ks, and we want to measure that with OneStat (which would certainly need tinkering over time; I don't think DM & SNV believe they got it right on their first shot, they are just experimenting). Does that translate into a statement that a higher OneStat does indeed mean a better pitching season? It's hard to believe that Curt Schilling's 2002 season was really that much better than any of Maddux's great years. I just put this out there for discussion as opposed to drawing too many conclusions today.


26 Comments:

At 7:00 PM, Blogger SuperNoVa said...

OK, well, let's dive into comparing Maddux's best year - 1995 - with Schilling's 2002. Here are their relative stats:

Schill02 259.3 IP 1017 TBF 218 H 29 HR 33 BB 316 K 3 HBP 3.23 ERA
OneStat= 52.3 BABIP=.297
Maddux95 209.7 IP 785 TBF 147 H 8 HR 23 BB 181 K 4 HBP 1.63 ERA
OneStat=45.5 BABIP=.244
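
For anyone who wants to check those two OneStat figures, here is a minimal sketch. It assumes the formula described elsewhere in this thread, (K/9 * K/BB) / (1 + HR/9), and reads the listed innings 259.3 and 209.7 as 259 1/3 and 209 2/3. The function name `onestat` is only illustrative.

```python
def onestat(ip, k, bb, hr):
    """OneStat as described in this thread: (K/9 * K/BB) / (1 + HR/9)."""
    k_per_9 = 9 * k / ip
    hr_per_9 = 9 * hr / ip
    return (k_per_9 * (k / bb)) / (1 + hr_per_9)

# Assumption: 259.3 IP means 259 1/3 innings and 209.7 IP means 209 2/3.
schilling_2002 = onestat(ip=259 + 1 / 3, k=316, bb=33, hr=29)
maddux_1995 = onestat(ip=209 + 2 / 3, k=181, bb=23, hr=8)

print(f"Schilling 2002: {schilling_2002:.1f}")  # 52.3
print(f"Maddux 1995:    {maddux_1995:.1f}")     # 45.5
```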

So Schilling struck out 135 more batters in 49 2/3 more innings (meaning he essentially struck out 3 batters in each extra inning he pitched over Maddux). He walked 10 more guys in those extra innings.

If Maddux had pitched the same number of innings, he would have walked 28 (5 fewer than Schilling) and struck out 224 (92 fewer than Schilling), and given up 10 homers (19 fewer than Schilling).

I'd say on the walks and strikeouts alone, Schilling had the far better season - he bought 41% more strikeouts with only 18% more walks.
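
The innings-matched comparison can be reproduced with a few lines. This is a rough sketch that simply scales Maddux's 1995 counting stats by the ratio of innings and rounds to whole numbers; the dictionary layout is an illustrative choice, and the innings are again read as 259 1/3 and 209 2/3. On the rounded line the strikeout and walk edges come out in line with the figures quoted above.

```python
schilling = {"ip": 259 + 1 / 3, "k": 316, "bb": 33, "hr": 29}
maddux = {"ip": 209 + 2 / 3, "k": 181, "bb": 23, "hr": 8}

# Stretch Maddux's 1995 counts to Schilling's 2002 innings.
scale = schilling["ip"] / maddux["ip"]
projected = {stat: round(maddux[stat] * scale) for stat in ("k", "bb", "hr")}
print(projected)  # {'k': 224, 'bb': 28, 'hr': 10}

# Schilling's edge over the projected Maddux line, in percent.
more_k = 100 * (schilling["k"] / projected["k"] - 1)
more_bb = 100 * (schilling["bb"] / projected["bb"] - 1)
print(f"{more_k:.0f}% more strikeouts, {more_bb:.0f}% more walks")
```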

Look at the BABIPs - Maddux got a lot of love from his fielders in 1995 - only a .244 BABIP.
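
Both BABIP figures can be roughly recovered from the stat lines above. The sketch below assumes balls in play are approximately TBF - BB - HBP - K - HR, which ignores sacrifice bunts and other edge cases, so treat it as an estimate and not the official calculation. The name `babip_estimate` is illustrative.

```python
def babip_estimate(tbf, h, hr, bb, k, hbp):
    """Rough BABIP: non-homer hits divided by estimated balls in play.

    Assumption: balls in play ~= TBF - BB - HBP - K - HR.
    """
    balls_in_play = tbf - bb - hbp - k - hr
    return (h - hr) / balls_in_play

schilling_2002 = babip_estimate(tbf=1017, h=218, hr=29, bb=33, k=316, hbp=3)
maddux_1995 = babip_estimate(tbf=785, h=147, hr=8, bb=23, k=181, hbp=4)
print(f"{schilling_2002:.3f} {maddux_1995:.3f}")  # 0.297 0.244
```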

Of course, the homers are a big difference, and one could argue that the OneStat underestimates HR rate in a pitcher's effectiveness. We're noodling mostly with the HR factor to figure this out. Maybe it should be 0.5 + HR/9 as a denominator. I don't know. In addition, we haven't taken into account park factors in home runs. (Although Schilling gave up 17 of his 29 homers on the road).

But I think in a lot of ways, Maddux's 1995 looks a lot better because of his low BABIP, which OneStat eliminates. And Schilling's 2002 was an awesome season, without a doubt. I'd say it was on a par with Maddux's 1995 season.

Note that Pedro's 1999-2001 seasons stand out with OneStat, which gives me some comfort.

At the end of the day, OneStat probably isn't going to measure the extreme seasons very well. It looks like anything over 30 or so is a great, great season.

 
At 9:11 PM, Blogger John said...

See my comment on the previous post, but this is all absurd. You should be quite worried when your stat doesn't map to anything quantifiable. ERA is not perfect, but it's clear what it measures... an average of how many (earned) runs a guy gives up in 9 innings. A dERA is the same thing, adjusted for defense. Other stats baseline on things like percentile vs. league average, or... but it's still clear what is being measured.

An 82 vs. a 9? Heck, replace the K^2 with the K and it all falls in a much tighter range, but you're still not measuring anything useful.

 
At 9:29 PM, Blogger DM said...

OK, John, you want quantifiable, how about this? Take BB SO HBP IBB and HR (those situations where the pitcher is acting independent of defense) and calculate a "On-Base-Percentage" from them, i.e. (BB+HBP+IBB+HR)/(BB+HBP+IBB+HR+SO). You could also do "Slugging Percentage" too with the same formula except multiply the HR by 4.
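
As a sketch of those two ratios in code: the functions follow the formula exactly as written above, so if your BB total already includes intentional walks, pass `ibb=0` to avoid counting them twice. Reading "multiply the HR by 4" as applying to the numerator only is an assumption, and the stat line is made up purely for illustration.

```python
def pitcher_obp(bb, hbp, ibb, hr, so):
    """"Pitcher OBP": bad outcomes over all defense-independent outcomes."""
    bad = bb + hbp + ibb + hr
    return bad / (bad + so)

def pitcher_slg(bb, hbp, ibb, hr, so):
    """"Pitcher slugging": same denominator, each homer counts four bases.

    Assumption: the factor of 4 applies to the numerator only.
    """
    return (bb + hbp + ibb + 4 * hr) / (bb + hbp + ibb + hr + so)

# Made-up stat line, for illustration only (not a real pitcher).
line = dict(bb=50, hbp=5, ibb=5, hr=20, so=120)
print(f"POBP {pitcher_obp(**line):.3f}, PSLG {pitcher_slg(**line):.3f}")
# POBP 0.400, PSLG 0.700
```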

Under this approach Perez has a .354 "pitcher" OBP and .747 "pitcher" slugging, while Benson has a .362 POBP and .576 PSLG (neither park adjusted). This to me gives you more information about the worth of Benson versus Perez than dERA, which seems to show Perez as the clearly better pitcher.

It seems that it is not "absurd" that you should be able to combine these stats in meaningful AND simple ways.

 
At 10:15 PM, Blogger SuperNoVa said...

John,

I don't think either DM or I were suggesting that OneStat is a replacement for all other pitching stats. DM just wanted a way to put K/9, K/BB and HR/9 all together in a single stat. We never meant it to replace ERA or ERA+ or any of the other meaningful stats.

I think ERA is, ultimately, the best expression of a pitcher's performance. But there are a lot of inputs into an ERA. And to make sure you aren't over-valuing ERA, you have to check it against other things.

I for one think OneStat works. Sure, we may have to give it a new name, but I do think it's a relatively good measure of performance independent of defense.

Moreover, John, what say you to the fact that Benson's dERA is .70 higher than Odalis Perez's dERA, even though Benson allowed 11 fewer homers vs. 17 more walks? I mean, if dERA isn't going to be able to show that Benson and Perez's performances were pretty close independent of defense, what in the hell use is it?

 
At 10:33 PM, Blogger John said...

It's not absurd to want quantifiable stats. I was pointing out that your "OneStat" wasn't quantifying anything significant.

In your comment, you've got something that makes sense... once you subtract out the ABs where the ball was in play and that play wasn't under the pitcher's control, what's the ratio of bad events to the total number of events. That's a meaningful thing to ask, at least (though I think you want to subtract out IBB not add it in twice... it is usually a manager's call, not the pitcher's, and is included in the BB stats).

But, if you notice, this doesn't factor in good events, such as strikeouts, so it's still not as useful as it could be. For example, it surely doesn't measure the overall worth of a pitcher as well as dERA does.

DIPS ERA and other DIPS stats do not isolate the defense-independent stuff as if the defense weren't there. Instead, they normalize things to the league average defense. By that, I mean it effectively calculates what would happen if the pitcher had an absolute average defense behind him. That's far more useful than subtracting out things you can't control, because the averages mean far less in an intuitive sense.

If you normalize for defense and then look at the OBPA of a pitcher, that's useful in evaluating a pitcher, and has the same intuitive relationship to a DIPS ERA that a standard OBP would have to a standard ERA (i.e., the DIPS OBP would strictly measure how often a pitcher lets runners on base with an average defense behind him, whereas DIPS ERA would measure more overall value, such as how the pitcher deals with men-on-base situations, e.g., whether he can control the running game).

This is why I would put far more weight on existing defense-adjusted ERA stats than anything you're likely to pull out of a hat, particularly without a deeper understanding of statistics.

 
At 10:47 PM, Blogger John said...

SuperNova: It's fine to suggest a stat, but what does OneStat measure? As someone who understands basic statistics, I can tell you it is measuring a complex relationship that doesn't tell us anything real-world meaningful.

If you want to determine how valuable something is as a predictor, there are statistical methods for doing that... it's called correlation. IIRC, the original DIPS ERA work was shown to correlate with the next year's ERA far, far better than the actual ERA stat. Can you show me such a statistical analysis?

If a defense independent stat says that, were they put on a level playing field all year, Benson is going to give up .70 runs more per 9 innings than will Perez, there is a sound basis for having confidence, rooted in mathematics. This indicates that, more than likely, Benson really was worse than Perez by something on the order of one run per start, IF ALL OTHER THINGS HAD BEEN EQUAL. Go figure, ERA+ is telling you that exact same fact, but in a different way (by comparing their adjusted ERA to the league average).

If you don't think it's intuitive, this just means you don't understand the exact impact of the particular defense behind these guys, the leagues they played in and the ballparks they pitched in.

 
At 10:53 PM, Blogger SuperNoVa said...

We DO need a new name for OneStat!

 
At 11:16 PM, Blogger SuperNoVa said...

Actually, John, I'm suggesting that if dERA is measuring the difference between Benson and Perez as .70, then there's something wrong with it.

Perez' home park was Dodger Stadium - batter's park factor 95/pitcher's park factor 96. Benson's home stadiums were PNC (96/96) and Shea (99/99). So Perez's dERA, all other things being equal, would be slightly HIGHER than Benson's.

Ok, let's look at defensive efficiency. The Dodgers had the highest defensive efficiency in the league (.7147)...the Mets were 6th (.6964) and the Pirates were third to last (.6871). Again, these should push Perez's dERA up versus Benson's.

So how can you explain the fact that Benson gave up 11 fewer homers vs. 17 more walks and has a dERA of .70 higher? Why doesn't dERA push Perez and Benson together?

You also said:

"If a defense independent stat says that, were they put on a level playing field all year, Benson is going to give up .70 runs more per 9 innings than will Perez, there is a sound basis for having confidence, rooted in mathematics."

That's a circular argument. A stat is not sound because it is rooted in mathematics. All stats are rooted in mathematics. The question is whether the statistic adequately explains the real world.

Listen, I'm not going to defend OneStat to the death based on only 2 days of analysis. DM isn't going to either. But if the purpose is to put K, BB, and HR into all one statistic - which DM said at the beginning - I think OneStat does that nicely. We aren't excluding everything else. But I think that the Benson/Perez example is a very good one as to the limitations of dERA.

 
At 12:08 AM, Blogger DM said...

What if we just simplify? (W+HR)/K. Kind of like WHIP, but we can call it WHRK (sounds like "work"). Based on 2004 NL pitchers, the average WHRK was 0.81, with the best being Mike Gonzalez of PIT with a 0.15 and Billy Wagner with 0.19. For starters, it's Randy Johnson (0.21) and Ben Sheets (0.22)

As for the death match between Perez and Benson, it's Perez (0.55) and Benson (0.57).

Note through all of this I have noticed that Luis Ayala on the Nats is pretty good and one to watch.

Ultimately I agree with John that I can't find too many free agent starters that would be worth the price. The one with the lowest WHRK? David Wells (0.43), whom I think John thought would be a good pickup.

 
At 12:22 AM, Blogger John said...

Even if you were right about dERA, it does not change the fact that if you can't even identify what your stat is measuring, you are in no way going to be able to rate its value quantitatively versus other statistics.

To be fair, I don't understand the dERA construction, because BP hasn't posted anything that detailed on it on their web site, and I haven't looked around beyond that. I can explain to you very well DIPS ERA, which is definitely different. However, I have enough experience with BP to know that their stats are soundly derived and peer reviewed. That provides a basis of trust that causes me to believe that the stat measures what it's supposed to measure, even though I don't necessarily understand how it is derived. I'm amused that you'd link to so many notions from their site, then just dismiss dERA out of hand just because you don't like what the numbers tell you.

That being said, I'm pretty confident that I can provide some evidence showing that the stat is probably sound. First, let's look at the ERA+ statistics you give. For Perez, you give a 127. For Benson, you give a 97. Let's convert those into park adjusted, league adjusted ERAs (call it aERA). The guys both pitched in the NL, where the lgERA was 4.30. We know that (lgERA/aERA)*100 is ERA+, therefore aERA = lgERA*100/ERA+.

Therefore, Perez posted an adjusted ERA of 3.39 and Benson a 4.43, before adjusting for defense. Once you adjust further for defense, Perez' resulting ERA (which I assume is the dERA) goes up to 3.70.
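
The ERA+ conversion is easy to check. A minimal sketch, using the relationship ERA+ = 100 * lgERA / aERA stated above and the 4.30 league ERA quoted in this comment; the function name `adjusted_era` is illustrative.

```python
def adjusted_era(league_era, era_plus):
    """Invert ERA+ = 100 * lgERA / aERA to recover the adjusted ERA."""
    return league_era * 100 / era_plus

LG_ERA = 4.30  # NL league ERA as quoted in the comment
perez = adjusted_era(LG_ERA, 127)
benson = adjusted_era(LG_ERA, 97)
print(f"Perez {perez:.2f}, Benson {benson:.2f}")  # Perez 3.39, Benson 4.43
```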

After you adjust for park factors, Perez gets closer to Benson than he is in raw ERA, as seen by ERA+, with the difference going from 1.06 to 1.04. When you adjust for defense, you said this difference goes down to .70, which means they get even closer together when you take that into account. I actually got .82; I don't know where you came up with .70 (you have to average the NY and PIT dERAs using XIP). Either way, you were right that they got closer together in ERA as we took into account more factors. What you did was overestimate the impact that those factors had. They're usually not too drastic, all things considered.

By the way, there's strong evidence that Dodger Stadium has an impact on defensive efficiency, making the defense look better than it is. And, both PNC and Shea have the opposite effect, as shown by their defensive park factors.

 
At 12:34 AM, Blogger John said...

DM,

Yes, that's a far more satisfying way of looking at the proportion of bad stuff to good stuff when the defense doesn't come into play.

dERA, to me, is a more accurate guide, because it takes the complete game into account better. It takes into account that the pitchers out there do things that cause the ball to be put in play, generally far more than they do things where the ball ends up not in play (you can measure that proportion too, of course).

 
At 12:46 AM, Blogger John said...

Actually, I like looking at the inverse of that stat you've provided:

K/(BB - IBB + HBP + HR)

It is basically, "strikeouts per (exploited) pitcher goof". Randy Johnson has a 4.08 here.

You could also factor in 4*HR. In your formulation, it basically would then measure how many total bases the pitcher gives up in goofs per strikeout. Or you could do it in terms of expected runs, particularly if you consider play-by-play data, which would be extremely interesting.
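
A short sketch of that ratio, with the home-run weight as an optional parameter so the 4*HR variant can be tried alongside it. With the weight set to 4 the result is strikeouts per total base given up on these events. The function name, the `hr_weight` parameter, and the stat line are all made up for illustration.

```python
def k_per_goof(k, bb, ibb, hbp, hr, hr_weight=1):
    """Strikeouts per pitcher "goof": K / (BB - IBB + HBP + hr_weight * HR)."""
    return k / (bb - ibb + hbp + hr_weight * hr)

# Made-up stat line, for illustration only (not a real pitcher).
line = dict(k=200, bb=45, ibb=5, hbp=5, hr=15)
print(round(k_per_goof(**line), 2))               # 3.33
print(round(k_per_goof(**line, hr_weight=4), 2))  # 1.9
```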

Realize, SN, that we are now talking quantitatively, instead of just jamming stats together randomly with operators and looking to see if the result seems to have any meaning.

 
At 9:29 AM, Blogger dexys_midnight said...

Was very surprised to log on today and find 13 comments. I'd like to make two comments ON the comments. The first is that DM, SNV & I all believe that Baseball Prospectus is an excellent site with fabulous thinkers. The three of us have all been fascinated with the "deeper" stats of baseball since we were each little kids and all subscribe to most of the Bill James/Rob Neyer/Billy Beane etc. school of thought. That being said, John, despite the fact that BP may have some statistics PhDs on its roster, I am guessing that they often come up with stats in similar ways: through open and thoughtful discussion among people--I think you are a bit harsh in your judgment of OneStat--no one here said it was the next big thing or even correct(!) It was just a way of starting discussion--which obviously it did--and an attempt to work out something new and potentially useful.
Second, as much as I am into this subject at least as much as anyone, we should be careful to keep true to the light and jovial nature of this blog. I don't want to end this conversation by any means--please keep it going. I just hesitate to turn off any non-stat-heads from frequenting our fair kingdom here. Hopefully, people will see from reading a bunch of posts here that we try to talk about a variety of subjects (with some focus on the Nats of course :-) )

 
At 9:35 AM, Blogger dexys_midnight said...

Oh, a third comment on the comments. It seems to me that any construction that makes a walk equivalent to a home run must be flawed.

 
At 9:55 AM, Blogger SuperNoVa said...

John,

First, ERA+ is already park adjusted, that's why I used it.

Second, I agree that Baseball Prospectus's stats are peer reviewed. But that doesn't mean that they always get things right. Neither I nor DM have ever said in the TWO DAYS we've noodled OneStat that we think it's right - although we've looked at real world data to suggest that it explains things pretty well as a SHORTHAND for K/9, K/BB and HR rate all rolled into one. We have not said and will not say that it's an all-encompassing statistic like Bill James has the gall to say about Win Shares.

Third, I have explained it in English, several times now. The premise is that more strikeouts per nine is better, more strikeouts to walks is better, and fewer home runs is better. You have two more-is-better stats and one fewer-is-better stat. You multiply the two more-is-better stats and divide by the fewer-is-better stat to combine them.

You could also do this by (K/9 + K/BB) / (1+HR/9). But the fact is that would give a pitcher with 14K's per nine and 7 BB/9 (think Steve Dalkowski) a 16 numerator, while a pitcher with 10 K's per nine and 3 BB/9 - a better pitcher by far - a 13 numerator.
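
The two numerators can be compared directly. A minimal sketch using the two hypothetical pitchers from the paragraph above; the function and variable names are illustrative. Adding the rates puts the wild pitcher ahead, while multiplying them puts the steadier pitcher ahead.

```python
def additive_numerator(k9, bb9):
    return k9 + k9 / bb9    # K/9 + K/BB

def multiplicative_numerator(k9, bb9):
    return k9 * (k9 / bb9)  # K/9 * K/BB, the OneStat numerator

wild = (14, 7)    # 14 K/9, 7 BB/9
steady = (10, 3)  # 10 K/9, 3 BB/9

print(additive_numerator(*wild), round(additive_numerator(*steady), 1))
# 16.0 13.3
print(multiplicative_numerator(*wild), round(multiplicative_numerator(*steady), 1))
# 28.0 33.3
```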

At no point were we "randomly jamming stats together with operators." We made assumptions about the meaning of particular statistics, and figured out what the best way was to interact them. I tried (K/9 * K/BB divided by 1+HR/9, 2+ HR/9, 4+HR/9) to see if I could get better fits to a linear regression line and an exponential regression curve (ERA is, after all, an exponential function - can't have an ERA lower than zero).

If you want to criticize, please don't point us to BP's site and suggest "it's all been done" or "they are smarter than you and think more about baseball." Who the hell was Voros McCracken before he wrote his article? (And if you haven't read Tippett's rebuttal at the Diamond Mind site, you should). Give us some analytic feedback; show us examples why OneStat doesn't explain something in the real world or why it's a poor tool for explaining something.

I did find your suggestion of K/(BB-IBB + HR + HBP) to be useful. I agree with Dexys that a HR cannot be made equivalent to a BB. But I think your equation is useful - it may even be more useful as K / (BB-IBB + 4*HR + HBP). What I'm trying to figure out is whether your formulation rewards absolute levels of strikeouts enough. Two pitchers may have the same rate stats in that equation:

Pitcher 1: 200 K 40 BB 20 HR 0 HBP
Pitcher 2: 100 K 20 BB 10 HR 0 HBP

Assuming the same number of innings pitched (say 200), is Pitcher 2 really equivalent to Pitcher 1? There's no question that he's not hurting himself through BB or HR allowed. But he's putting an awful lot of pressure on his defense relative to Pitcher 1. All other things equal, I think that Pitcher 2 gives up more runs.

Their OneStats would be: Pitcher 1 (9*5) / (1+0.9) = 23.7 and Pitcher 2 (4.5*5) / (1+0.45) = 15.5.
I think those values are relatively fair for the two pitchers.
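
Here is the same comparison as a small sketch: both pitchers get an identical K / (BB - IBB + HR + HBP) ratio, while OneStat separates them. It assumes 200 innings for each, as in the example, and zero IBB and HBP; the function names are illustrative.

```python
def onestat(ip, k, bb, hr):
    """OneStat: (K/9 * K/BB) / (1 + HR/9)."""
    return ((9 * k / ip) * (k / bb)) / (1 + 9 * hr / ip)

def k_per_bad_event(k, bb, hr, hbp=0, ibb=0):
    """K / (BB - IBB + HR + HBP)."""
    return k / (bb - ibb + hr + hbp)

IP = 200  # same innings for both pitchers, as assumed in the example
p1 = dict(k=200, bb=40, hr=20)
p2 = dict(k=100, bb=20, hr=10)

for name, p in (("Pitcher 1", p1), ("Pitcher 2", p2)):
    print(name, round(k_per_bad_event(**p), 2), round(onestat(IP, **p), 1))
# Pitcher 1 3.33 23.7
# Pitcher 2 3.33 15.5
```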

 
At 10:00 AM, Blogger John said...

I think that even if you interpret what I'm saying as harsh, which was not at all my intent, I have certainly been open and thoughtful. Pointing out that OneStat doesn't measure anything useful is not meant to denigrate the effort, but to help nudge you onto more productive paths.

I'm sure the biggest challenge for people like the guys at BP is identifying useful things to measure that tell us things we don't already know that we would find insightful. Once you do that, if you have a reasonable understanding of statistics, it's usually pretty easy to do something reasonable, though to get the highest level of accuracy, you may have to take into account lots of different variables that can be hard to isolate (particularly, in baseball, things involving defense). But you're right, in that these things are certainly peer reviewed. I've been through peer review processes on both ends for mathematical stuff (I've done cryptography work that's in several standards). So I know what it's like to read someone's criticism and be a bit embarrassed because you want to be perfect, but you didn't understand something, or overlooked something, etc. But, if you can't accept constructive criticism as objective, you're not going to develop your skills very quickly.

As for a construction that makes a walk equal to a HR being flawed: OBP and batting average treat home runs and walks equally. Does that make them flawed? It depends on what you want to use the stat for. You certainly can't use OBP to measure the overall worth of a player relative to others, since there are so many other factors, and not just power and defense, but also things like playing time. Still, OBP is a good metric that is easy to calculate, and means a very specific thing, and gives you part of the picture on a player.

The proportion of good vs. bad defense-independent events certainly falls in the same category, in that it measures something concrete that people will understand, but whether individuals find it useful in aiding their understanding of pitchers will really depend.

 
At 10:17 AM, Blogger dexys_midnight said...

Well, I assumed, based on everything we had discussed, that the purpose of OneStat (or whatever you want to use for this discussion) was to measure the effectiveness of pitchers and compare among pitchers "who is better." In that capacity, making BB=HR is indeed flawed, as would using purely OBP to compare two players be flawed (and seeing that it was flawed, people came up with OPS and other stats).

Second, and really just a quibble, but you must have meant a stat other than batting average. Batting average does not treat a walk and a HR equal. A walk doesn't even count in BA, which is probably one of the top few reasons why people started looking into deeper stats in the first place.

Oh, and in defense of my friend, I highly doubt that SNV's reactions to your criticisms come because he is "embarrassed" that his stat wasn't perfect.

 
At 10:45 AM, Blogger John said...

SuperNova:

I *said* that ERA+ was adjusted. You missed my point because I was really tired (and still am). Let's look at the two guys (sorry if this comes out poorly... tables weren't accepted in comments):

             ERA    aERA   dERA
-------------------------------
Perez        3.25   3.39   3.70
Benson       4.31   4.43   4.52
Difference   1.06   1.04   0.82

Note that aERA is just ERA+ turned back into an adjusted ERA, instead of a proportion versus the league ERA.

These numbers say that Perez was helped by the pitcher's park and the league a bit, but was helped by the backing defense a lot in comparison. And, the impact of those effects was more favorable on his ERA than Benson's, meaning that Perez was indeed closer to Benson than just the ERA would indicate, as your intuition told you. It's just not as close as you want to believe.

As for your stat, your premise is the premise of DIPS, but you don't understand enough about stats to derive something useful. Again, WHY do you multiply "more is better" stats? What does the scaling tell you that is useful? It tells you nothing. Now you've just moved to a non-linear function for no good reason whatsoever.

Take a close look, you've got the underlying premise right, but you're not combining the pieces together in a meaningful way. Sure, you can propose a function out of the ether, and see if it fits the data, but it's generally good to have a hypothesis as to why the function might actually work that's rooted in intuition, not just "I want to slap these stats together somehow and get something meaningful out".

Isolating the effects of defense to measure the overall worth of a pitcher *has* been done. That doesn't mean it couldn't be done better, etc. But there is something to be said about doing this kind of thing for a living vs. being an amateur. Everyone starts off as an amateur in every field of endeavor, and they tend to make tons of mistakes and learn from those mistakes, building expertise. I don't see how saying there's good cause for a high level of confidence in their work is either wrong, or denigrating to you, as you clearly take it.

I've given plenty of analysis of what's wrong that you aren't seeing. You've even noticed some of the artifacts yourself in the post we're commenting on. Look at my transformation of your equation, and explain to me the relationship when you think about it that way. Why have strikeouts squared in proportion to a mixture of good and bad things? Why the quadratic treatment of strikeouts? What does it show?

What is the scale here? What is the difference between a 34 and a 1, quantitatively? In ERA, it's perfectly clear, it's a multiple on expected runs per 9. With you, it's just some arbitrary number. You're attempting to measure total value, but you have no idea as to how to try to do it other than trial and error.

Even the formula I gave isn't intended to measure total value of a pitcher. If you think total bases yielded through pitching mistakes per K is a useful measure of a pitcher, that's great. It's a neat statistic if you want to isolate what happens when the ball doesn't go into play. If you want to factor in how often the ball does go in play, how the pitcher handles the running game, etc. it doesn't tell you much, but it's not designed to do so, either. We can certainly have different stats with different explicit goals that tell us different things, all interesting.

Dexys: you're perfectly right in wanting to calculate "who is better", but most stats are designed not just to calculate "who is better" (at a particular well-specified thing), but also attempt to quantify how much, often by relating it to concrete goals that have meaning to us, like number of wins, number of earned runs given up, etc.

And I was definitely only thinking of OBP... thanks for pointing that out. Maybe I'll get to sleep before the weekend...

 
At 11:53 AM, Blogger SuperNoVa said...

I think I have explained why there is scaling and why it is quadratic.

First, the goal of the damned thing is to come up with one number that combines a pitcher's strikeout rate, walk rate, and home run rate in a way that explains the differences in such rates between pitchers. Among other things dERA doesn't do that because, I assume, it takes into account things other than K rate, BB rate and home run rate. So it cannot, per se, achieve the goal we are trying to achieve. Its purpose is to explain something we aren't trying to explain.

There are lots of ways to combine the stats. Addition is one of them. One could simply take the K rate and subtract the BB rate and HR rate. A 9 k/9 rate and a 3 bb/9 rate and a 1 hr/9 rate would rate a pitcher a 5. But it would also fail to relatively rank these two pitchers:

Pitcher 1 = 12 k/9 4 bb/9 1.5 hr/9 = 6.5
Pitcher 2 = 7 k/9 1 bb/9 0.5 hr/9 = 5.5

Pitcher #2 seems, to me, to be a better pitcher than Pitcher #1.

You could also just use K/BB rate minus HR rate, e.g., a pitcher with a 3-to-1 strikeout-to-walk ratio and 1 hr/9 would rate a 2. But then you get the following pair:

Pitcher 1 4K's, 1 BB, 1 hr/9 = 3
Pitcher 2 8k's, 2 BB, 1 hr/9 = 3

It strikes me that pitcher #2 buys 4 more strikeouts with 1 walk, and that his absolute number of strikeouts makes him a better pitcher.
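
Both alternative combinations are simple enough to try out. A minimal sketch using the per-nine rates from the two pairs above; the function names are illustrative.

```python
def k_minus_bb_minus_hr(k9, bb9, hr9):
    """Purely additive combination: K/9 - BB/9 - HR/9."""
    return k9 - bb9 - hr9

def kbb_minus_hr(k9, bb9, hr9):
    """Ratio-based combination: K/BB - HR/9."""
    return k9 / bb9 - hr9

# First pair: the additive version puts the 12 K/9 pitcher ahead.
print(k_minus_bb_minus_hr(12, 4, 1.5), k_minus_bb_minus_hr(7, 1, 0.5))  # 6.5 5.5

# Second pair: the ratio version cannot tell the two pitchers apart.
print(kbb_minus_hr(4, 1, 1), kbb_minus_hr(8, 2, 1))  # 3.0 3.0
```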

The scaling is due to the fact that you need to balance absolute strikeouts (K/9) against relative strikeouts. This is one of the issues I have with K /(BB-IBB + HR + HBP) - it doesn't account for absolutely high strikeouts enough, although I think it is a very good formula.

[What if it were multiplied by K/IP? Wouldn't that balance the relative ability of a pitcher to produce strikeouts versus bad events against the volume of strikeouts? It would be saying "this is your cost of a strikeout vis-a-vis bad events...and this is your quantity of strikeouts. I will noodle this, too, although I'd like to multiply HR*4].

Yes, the multiplication of K/9 and K/BB does produce a statistic that is quadratic. But remember that ERA is a limit function anyway. As you pile on more strikeouts and rack up fewer walks, you can lower your ERA, but you can't get it below zero. And there is a slackening of the ability of an incrementally higher K rate and K/bb rate to affect ERA as ERA approaches zero (a 5k/9 rate versus 6 k/9 will get you an incremental ERA improvement greater than going from 15 k/9 to 16 k/9). So I think that a quadratic formula fits the limit function of ERA nicely if imperfectly. We could make it a pure limit function by eliminating the 1+ from the denominator, but this would dramatically skew the results towards low-HR-rate pitchers. You could walk a ton of people, strike out very few, but allow no homers and you'd be the best pitcher of all time if you calculated it that way.

The main failing of OneStat (thus far) is how it takes into account HR rate. That is something I've spent the most time noodling anyway. I am uncomfortable with it because of that, and will remain that way until I do more analysis.

Furthermore, as the K rates and K/BB rates get higher, so do the numbers and the less you can tell from incremental increases.

 
At 12:25 PM, Blogger John said...

You've maybe explained why in your rationale you've done things, but you haven't demonstrated why what you're measuring is meaningful, certainly not in any intuitive sense. Why are homers scaled by the number of walks again? Break the thing into pieces... what does HR*BB actually measure that is relevant? I know what an HR is and what a BB is, but what are you measuring when you multiply them together? In most cases in baseball, it's pretty easy to answer that question. Usually you've got a rate, and you're trying to turn that into a fixed value for either the numerator or denominator given the other value, or you're trying to assign relative value to things using weights that are reasonable metrics (the worst metric like that being the multiply in counting for SLG). Does the pain value of a HR increase linearly as the number of walks increases? What exactly does that quantify?

That is just one of the building blocks of your stat. It makes no sense that I can relate to on its own, and it certainly makes less sense when you start combining it with things like K's squared (why do Ks have quadratic importance, but home runs do not, for instance). As a result, your number is meaningless unless it happens to fit the data by magic. I see no reason why it should, compared to many other arbitrary combinations of those variables. Again, you should take the methodology where you know precisely what it is you are trying to measure every step of the way, then figure out how to actually measure it in a way that you can clearly demonstrate works. Coming up with something based on gut then hoping the data supports it is going to be a huge waste of your time.

Anyway, I'm going to stop here, because it's perfectly clear that I've utterly failed to get you to see the point. You have fun, and I'll go spend my time more productively.

 
At 12:59 PM, Blogger SuperNoVa said...

Well, this is the kind of feedback that I think advances the ball here.

I've been noodling your first effort to work with our goal of one stat to combine BB, HR, K -- K / (BB-IBB + HR + HBP). And there are some ways that I think you can use it - and I think that using expected run values of these events is important. More to come later.

 
At 1:53 PM, Blogger SuperNoVa said...

Ok, I've noodled John's thoughts some more and I think that perhaps the way to go is the expected run value route that John suggested, although I don't think K / (BB-IBB+HR+HBP) is the way to go. How much benefit does a K bring versus the cost in walks and home runs?

I think the equation would look like this:

(-1 * ERV(K) * K) MINUS (ERV(BB) * BB + ERV(HR) * HR), where ERV is the Expected Run Value of each event. With K, BB, and HR expressed per 9 innings, you can then get this to a per-inning stat by dividing by 9.

Thus, if the ERV of a strikeout is -.2 and the ERV of a walk is .25 and the ERV of a HR is 1.25, then a pitcher would be a net contributor to his team's run prevention if he had 6 k per 9, 2 walks per 9, and .5 HR/9 (e.g., 1.2 - (.5 +.625) = .075). I don't know the actual ERVs of Ks, BB and HR, but this makes some inherent sense to me. I know that work has been done to compile ERV by situation, so these measures are also influenced by situations, but it does make some sense that there is an average value of a walk, homer, and strikeout.
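The arithmetic in that example can be checked with a few lines of Python. This is only a sketch: the run values are the made-up ones from the paragraph above (not measured ERVs), and the function name `net_runs_per_9` is just a label chosen for illustration.

```python
# Placeholder run values taken from the example above.
# They are illustrative guesses, not actual expected run values.
ERV_K = -0.2    # a strikeout lowers expected runs
ERV_BB = 0.25   # a walk raises expected runs
ERV_HR = 1.25   # a home run raises expected runs


def net_runs_per_9(k9, bb9, hr9, erv_k=ERV_K, erv_bb=ERV_BB, erv_hr=ERV_HR):
    """Runs saved (positive) or cost (negative) per 9 innings,
    counting only strikeouts, walks, and home runs."""
    benefit = -1 * erv_k * k9              # runs saved by strikeouts
    cost = erv_bb * bb9 + erv_hr * hr9     # runs given up via BB and HR
    return benefit - cost


# 6 K/9, 2 BB/9, 0.5 HR/9, as in the example
example = net_runs_per_9(k9=6, bb9=2, hr9=0.5)
print(round(example, 3))  # 0.075
```

A positive result means the strikeouts are saving more runs than the walks and homers are costing, under whatever run values get plugged in.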

I THINK what this stat would do is measure the total amount of runs saved/caused by a pitcher's pitcher-controlled outcomes per 9 innings. The K's and BB's and HR's would be balanced to their relative importance.

If a pitcher is not K'ing people, walking people, or giving up gopher balls, he is leaving the game up to his defense (with the exceptions noted by Tippett). So the question is how much run value is the pitcher adding/subtracting by his pitcher-dependent contributions?

(No scaling, no nothing, John! Thanks for focusing my thoughts on this).

 
At 8:13 PM, Blogger tmk67 said...

I don't mean to intrude on this discussion, but of all the suggestions in this and the other thread, a linear (e.g., cumulative) Expected Run Value per Inning formulation makes the most theoretical sense, in my view. I've been using a simple, backhand version of it for a couple years, once I first saw the Three True Outcomes idea.

In the end, for a statistic to make "sense" (which is what John seems to be stating), it has to relate to one or both of the only two numbers in baseball that matter -- runs and outs. So, why isn't

(ExRK*K + ExRBB*BB + ExRHR*HR)/IP
(where ExRK is expected runs from a strikeout, ExRBB is expected runs from a walk, and ExRHR is expected runs from a home run)

all you really need? What am I missing?

If you want to compare "great seasons," you would not need to divide by innings if you want a season-cumulative statistic. For free agent acquisitions, it makes more sense to look at performance per-inning, and then project innings pitched in a different way.
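As a rough sketch of the cumulative versus per-inning distinction, the formula could be written like this in Python. The run values and the season line below are invented purely for illustration (they are not real ERVs or a real pitcher's totals), and `expected_runs` is a hypothetical helper name.

```python
def expected_runs(k, bb, hr, ex_r_k=-0.2, ex_r_bb=0.25, ex_r_hr=1.25, ip=None):
    """Expected runs from strikeouts, walks, and home runs.

    The default run values are placeholders, not measured values.
    With ip=None the season-cumulative total is returned;
    otherwise the total is divided by innings pitched.
    Lower (more negative) is better for the pitcher.
    """
    total = ex_r_k * k + ex_r_bb * bb + ex_r_hr * hr
    return total if ip is None else total / ip


# Invented season line, for illustration only
season = dict(k=200, bb=50, hr=20)

cumulative = expected_runs(**season)             # season total
per_inning = expected_runs(**season, ip=200.0)   # same total per inning

print(cumulative, per_inning)
```

Park effects or opponent adjustments would then show up as different values passed for `ex_r_k`, `ex_r_bb`, and `ex_r_hr`, leaving the formula itself alone.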

Where it gets complicated is the Expected Run calculation -- you could use league averages, historical averages, take into account ballpark effects, or you could normalize for the particular opponents faced (e.g., Perez has to pitch in Coors more than Benson). That's where the fun is.

 
At 8:21 PM, Blogger SuperNoVa said...

TMK, that's exactly the formula I proposed (although expressed in a slightly different way). In fact, I've calculated it for every 2004 pitcher and the correlation between it and ERA is pretty tight.

Have you seen it published anywhere?

 
At 10:45 PM, Blogger tmk67 said...

SuperNoVa,

I have not seen it published, but it amounts to a very simple specification of McCracken's Defense-Independent Pitching Stats that does not account for ball-park factors, and other component stats.

What I don't like about DIPS and BP's dERA is the attempt to translate the True Outcomes to a statistic folks are "used" to seeing -- ERA (especially dERA's calibration to 4.50). The result is hidden assumptions in the final number (such as using league-average averages for batted balls in play, etc.). Doing that simply to come up with a stat that "looks like ERA" just injects uncertainty and potential error that the user cannot control (such as the odd result that a good defense in Dodger Stadium actually causes an overstatement of dERA). DIPS and dERA are useful for answering the question, "Who had a better year in 2001?", but I find that calibration makes it harder to project for next year.

I would rather take a per-inning Defense-Independent Expected Runs Prevented (DIE-RP?) figure and then adjust that number to analyze any other thing I want to know about. For example, you can increase or decrease projected innings due to age or injury. You can adjust the ExR for park effects if you really want to (e.g., moving Jamey Wright from Coors to RFK, which will probably play like the old Vet). It also lays a nice, basic foundation for looking at minor leaguers and their potential contribution.

Don't get me wrong...I really like BP's translation of dERA into "Runs Above Average" and "Wins" and eventually Wins Above Replacement Player (WARP). It may be the best way of assigning relative value between pitchers and shortstops. But to compare Odalis Perez in RFK against Jamey Wright in RFK, I don't think you need it.

(Best I can tell, now that McCracken is working for the BoSox, he does not publish DIPS publicly anymore.)

 
