[Foreign News] Is WAR the new RBI?

Author: abc12812   2011-09-07 09:15:02
http://itsaboutthemoney.net/archives/2011/09/06/is-war-the-new-rbi/
Let’s face it, it’s the only reason we remember Tony Batista.
In 2004, in his final full season as a major-leaguer, T-Bats drove in 110
runs for the Montreal Expos, despite a putrid .272 OBP. Although he was,
arguably, the worst everyday player in the majors in ’04, he was hardly the
worst player to ever drive in 100 runs (see Ruben Sierra, 1993), nor was 110
the highest RBI total ever amassed by a replacement-level player (see Joe
Carter, 1990). However, for some reason, Tony Batista became a sabermetric
icon, our favorite cause celebre when we rage, rage against the RBI.
You’ve heard it before. RBIs are just neat round numbers and context.
Given the opportunity to hit behind a couple of on-base machines like Brad
Wilkerson and Jose Vidro, anybody could drive in 100 runs. But just because
a blind squirrel gets a nut every once in a while, that doesn’t mean he
should bat cleanup.
In the wake of T-Bats’ glorious season, the sabermetric cause was moving from
its grassroots mail-order infancy to full-blown mainstream phenomenon, buoyed
by New York Times Bestsellers, championship GMs, and senior columnists. When
a broadcaster spouted out a flurry of “traditionals” – batting average,
homers, RBI, wins, saves – to make his point, basements full of fantasy
addicts looked up from their digital almanacs and replied in unison: Bleh.
Give me OBP, give me OPS, give me IPO, give me WPA, give me K/BB; just don’t
give me RBI! If you’re going to give me RBI, Mr. McCarver, I’d rather you
gave me nothing.
And then came WAR.
The concept was ratified by the sabermetric Godfather, Bill James, who’d
created Win Shares according to a similar ideology in 2002. It was a
neoclassical economist’s wet dream, like baseball GDP: an elegant equation
which accounted for all the sport’s diverse variables and yielded a single
number roughly reducible to the oldest and most hallowed statistic of them
all, the win. Hallelujah.
Wins Above Replacement is a beautiful idea. Euclidean grace in a quantum
world. A simple answer, not only for age-old baseball conundrums like
“Mantle or DiMaggio?”, but also a formula for unprecedented comparisons like
“Rickey Henderson v. Johnny Bench” and “Roy Halladay v. Alex Rodriguez”.
There’s only one problem. It doesn’t work.
At least, not yet. Not in the fantastically straightforward way we try to
use it. The idea is so good, so clarifying – like democracy or the rational
market – that we really, really want it to work; we’re willing to suspend
our disbelief just a little while longer in the hope that it might. Because
it’d be so great to know with statistical certainty that Albert Pujols was
worth $200 Million, that we really couldn’t win that pennant without Andy
Pettitte, that Jacoby Ellsbury is definitely the AL MVP, and that Ben Zobrist
is exactly 9.3% better than Adrian Gonzalez. Darn that dream.
The cruel irony, the
I-could’ve-had-Sean-Doolittle-and-all-I-got-was-stupid-Barry-Zito irony, is that
the problem with WAR is the same as the problem with RBI. It frequently measures
context as much as performance. Especially when used to evaluate single
seasons, it doesn’t sufficiently account for the inevitable variations in
opportunity and environment.
What if Granderson played behind Ian Kennedy and Daniel Hudson?: UZR &
Flyball Rates
A few weeks back I critiqued Steve Berthiaume’s analysis of Curtis
Granderson’s defense by looking at some inconsistencies in the way Ultimate
Zone Rating (the defensive metric associated with FanGraphs’ WAR) assesses
outfielders. Mark Simon of the ESPN Stats & Info Blog followed up with a very
interesting review of specific plays which have dragged down Granderson’s
ratings in 2011. While Simon isn’t looking at UZR specifically, he
does point out that most defensive metrics do not account for positioning and
that half a dozen plays can cause sizable shifts in the aggregate numbers
when we’re dealing with less than a season’s worth of data.
I’m not the only one who’s noticed that UZR frequently yields suspicious
results in small samples, at Fenway, and when several good outfielders are
playing alongside one another. I do, however, want to expand upon my claim
that outfield UZR is substantively affected by flyball rates.
In the Granderson article I pointed out that the teams in each league which
rank highest in outfield UZR for 2011 – Boston and Arizona – also ranked #1
in their league in FB%. This remains true. However, this is obviously not
sufficient proof of correlation, for a couple of reasons. Not only is there a
high possibility of coincidence in any single example, but both the D-Backs
and Red Sox feature several outfielders traditionally regarded highly by both
sabermetricians and scouts. For anybody who’s watched them consistently, it
would be pretty hard to argue that the trio of Gerardo Parra, Chris Young,
and Justin Upton isn’t among the best in the major leagues, no matter who’s
on the mound.
So, I looked back at all teams that finished at the extremes of the flyball
scale since 2003. I do not claim that there is a perfect or, in the parlance
of economics, a “strong” correlation. That is, a team with a 35% flyball
rate wouldn’t have a dramatic disadvantage in OF UZR compared to one at 38%.
There is, however, significant evidence that pitching staffs with extreme
batted ball tendencies can dramatically affect their outfielders’ UZR numbers.
(I defined these extremes as above 40% at the high end and below 33% at
the low end.)
Average OF UZR for FB% > 40.0: 10.1
Average OF UZR for FB% < 33.0: -10.6
Of the sixteen teams at the high end of the range, five finished #1 in their
league in OF UZR. Of the 21 teams at the low-end, only five finished with a
UZR north of zero.
From these I would point to some interesting pieces of anecdotal evidence:
The 2010 Giants and their 40.7 FB% led the majors in outfield UZR by a
substantial margin (40.7 to 31.6), despite the fact that they gave more than
1100 innings to Pat Burrell and Aubrey Huff, lead-footed former DHs who
nonetheless somehow finished with positive UZRs for the season.
The 2007 Cubs had an exceptional 44.3 OF UZR in a season where they handed
most of the innings to Alfonso Soriano, Jacque Jones, and Cliff Floyd, all of
whom substantially outperformed their career numbers with some help from a
Chicago staff that sent 40.6% of batted balls in their direction.
On the other side, the ’05 Cardinals, despite featuring some premier
outfield talent in Jim Edmonds, Larry Walker, Reggie Sanders, and So Taguchi,
finished with a -6.1 OF UZR, thanks to a pitching staff that put only 29.7%
of batted balls in the air.
The difference between 30% and 40% can easily be several hundred plays, so
when you consider Simon’s point about the significance of even a handful of
mistakes in a few months of play, you can see what kind of advantage those
extra opportunities provide.
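
The bucketing and the "several hundred plays" arithmetic can be sketched roughly as below. This is only an illustration, not the method actually used for the averages in this post: the rows are just the three team-seasons quoted by name above, the function names are made up for the example, and the figure of 4,400 batted balls per team-season is an assumed round number rather than anything taken from the data.

```python
from statistics import mean

# (team-season, staff FB%, outfield UZR): only the three team-seasons
# quoted by name in the post, NOT the full 2003-2011 sample.
TEAM_SEASONS = [
    ("2010 Giants", 40.7, 40.7),
    ("2007 Cubs", 40.6, 44.3),
    ("2005 Cardinals", 29.7, -6.1),
]

HIGH_FB = 40.0  # "high end" cutoff stated in the post
LOW_FB = 33.0   # "low end" cutoff stated in the post

# Assumption for illustration only: a round number of batted balls
# allowed by one pitching staff over a full season.
ASSUMED_BATTED_BALLS = 4400


def bucket(fb_pct):
    """Label a staff 'high', 'low', or 'middle' by its flyball rate."""
    if fb_pct > HIGH_FB:
        return "high"
    if fb_pct < LOW_FB:
        return "low"
    return "middle"


def average_uzr_by_bucket(rows):
    """Average outfield UZR within each flyball-rate bucket."""
    groups = {}
    for _team, fb_pct, of_uzr in rows:
        groups.setdefault(bucket(fb_pct), []).append(of_uzr)
    return {name: round(mean(values), 1) for name, values in groups.items()}


def extra_flyballs(batted_balls, low_pct, high_pct):
    """Extra flyballs a high-FB% staff allows compared with a low-FB% one."""
    return round(batted_balls * (high_pct - low_pct) / 100)


print(average_uzr_by_bucket(TEAM_SEASONS))
print(extra_flyballs(ASSUMED_BATTED_BALLS, 30, 40))
```

Three rows prove nothing by themselves, of course; the point is only to show the shape of the comparison. Under the assumed 4,400 batted balls, a ten-point gap in FB% works out to 440 additional balls in the air for the outfielders to be graded on.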
This is not to say that UZR is useless, just that it is unreliable in
single-season increments, and that unreliability is passed on to WAR, which we
habitually use/misuse when discussing single seasons and partial seasons.
I can’t play several positions. (or “The Adam Dunn Effect”)
WAR’s move to the mainstream is deeply tied to the rising popularity of
FanGraphs. One of the first of its “unlikely results” to spark
considerable conversation was Ben Zobrist leading AL batters (and finishing
behind only Albert Pujols and Zack Greinke overall) in 2009. Zobrist had a
breakout season which was impressive by any measure, but his WAR was given a
major boost by his defense (only Franklin Gutierrez and Nyjer Morgan got a
greater advantage from fielding).
On one level, this seemed legit. Zobrist appeared at every position on the
diamond in ’09 and over the years has proven himself to be an above-average
defender at second base and in right field. Managers have long lauded the
value of versatility and lavished praise on players like Zobrist, Mark
DeRosa, and Placido Polanco, who play several key positions well and also
swing decent sticks. Zobrist’s season looked like evidence of their wisdom.
But while it isn’t much of a stretch to believe that Zobrist’s glove was
worth a couple wins to the Rays in 2009, try selling this: According to WAR,
in 2011, Carlos Lee has had as much defensive value as Troy Tulowitzki.
There are two types of utilitymen: those who are given the job because they
play many positions well and those who are given it because they play no
position well. As yet, WAR struggles to distinguish between the two. It
reads Houston’s inability to decide where Lee hurts them least as evidence
of Lee’s versatility. It suggests that Howie Kendrick’s defense at second
base has gone from average to exceptional since Mike Scioscia started giving
him more starts in left field.
UZR results get weirder the smaller the sample gets. The utility player may
log a thousand innings in total, thus suggesting his UZR is somewhat more
reliable, but what actually happens is that several hyper-unreliable samples
of a few hundred innings or less are bundled together like toxic mortgages
and rated AAA.
WAR Hates Sluggers
One of the things which advanced stats should be applauded for is the extent
to which they’ve decreased the fetishizing of the home run and raised
awareness of all-around contributions. Jonah Keri and Dave Dameshek debated
the relative merits of Willie Stargell and Tim Raines this week, largely
based on the fact they had identical career WAR totals. Dustin Pedroia has a
real shot at his second MVP, despite the fact that his “traditionals” (.309
AVG, 85 R, 18 HR, 74 RBI, 25 SB) are basically the same as Melky Cabrera’s
(.303, 83, 17, 79, 17).
However, one can’t help but notice that a cross-section of the most
intimidating hitters in the game are treated with relative disdain by the
metric. It doesn’t like them because they play first base or left field (or
DH), which aren’t scarcity positions. It doesn’t like that they are fat
and slow.
While I understand that everybody would love to have Chase Utley or Troy
Tulowitzki, a middle-of-the-order hitter who makes big contributions in the
field and on the basepaths, as well as at the plate, the fact remains:
building a lineup without a slugger (or two) is like building a mall with
seven Sunglass Huts and no department stores. A few sluggers are swift,
slender middle-infielders. Most of them aren’t. To paraphrase Reggie,
there are lots of drinks and precious few straws. If you get left without
one, no amount of Range Factor, WHIP, or baserunning acumen can save your
season. Just ask the Padres, or the Mariners.
Yet, we misuse WAR to insist that it’s better to have Ian Kinsler than
Miguel Cabrera or that Peter Bourjos is as valuable as Prince Fielder or Mark
Teixeira.
We’ve struggled to understand and statistically represent the effect hitters
have on one another. Would Nyjer Morgan be hitting .306 if he weren’t
batting directly in front of Ryan Braun and Prince Fielder? (WAR suggests,
by the way, that Morgan has been more valuable on a per game basis than
Fielder.) Morgan is taking free passes this season at only about half his
career rate. Has he become less patient? (On the other side of things,
Adrian Gonzalez’s career OPS is fifty points higher when the pitcher is
throwing from the stretch. He’s enjoyed that situation in 52% of his plate
appearances in 2011.)
While I admit the difficulty of building a model that accounts for the effect
a pairing like Braun/Fielder or Pujols/Holliday has on the rest of the
lineup, this is one area in which I find the conventional wisdom to be
irrefutable. While I applaud WAR (and other metrics) for aiding in our
appreciation of defense and baserunning, it’s beyond asinine to conclude
that Ellsbury is twice as valuable as Fielder. Too often WAR is used as a
means of comparing oranges to apples. One of the things that makes baseball
great is the diversity of the fruit basket. WAR gives incredible weight to
the scarcity of shortstops, but no weight to the scarcity of
pitcher-intimidating, strategy-altering cleanup hitters, which I see as a
form of reverse discrimination.
These are not the last of the problems. WAR evaluates catching using only
the ability to control the running game. There is abundant evidence that
certain park factors have not been sufficiently accounted for. I’m not
arguing, however, that WAR should be completely discounted. As yet, it is
probably as good a singular statistic as is widely available. But, WAR is
not a debate-ending statistic, especially for single seasons. Even WAR’s
adherents, like Dave Cameron, generally admit the margin of error is at least
15%. When we stubbornly suggest that 0.5 WAR means anything, we are grossly
exaggerating the statistic’s accuracy, even according to its creators. It
remains true that any reasoned discussion of an individual’s contributions
still requires analysis of the various components that go into WAR, as well
as several that don’t, and, as such, subjectivity reigns.
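
To make the margin-of-error point concrete, here is a rough sketch. It assumes the "15%" can be read as a relative band around each player's WAR; that reading, the function names, and the WAR values fed in are all hypothetical illustrations, not how FanGraphs or anyone else defines the error.

```python
def war_interval(war, margin=0.15):
    """Return the (low, high) band implied by a relative margin of error.

    Assumption: the margin is applied as a percentage of the WAR value.
    """
    spread = abs(war) * margin
    return (war - spread, war + spread)


def distinguishable(war_a, war_b, margin=0.15):
    """True only when the two players' bands do not overlap at all."""
    low_a, high_a = war_interval(war_a, margin)
    low_b, high_b = war_interval(war_b, margin)
    return high_a < low_b or high_b < low_a


# Hypothetical WAR values, chosen only to illustrate the comparison.
print(distinguishable(6.0, 6.5))  # a 0.5 WAR gap between two stars
print(distinguishable(3.0, 6.0))  # a 3.0 WAR gap
```

Read this way, a 6.0 and a 6.5 sit well inside each other's bands, so the half-win gap settles nothing, while a 3.0 and a 6.0 are clearly separated.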
Statistical elegance is elusive. Variables get short shrift or go
unaccounted for entirely. Results yield unintended consequences.
Misunderstood data is misrepresented and polemicized. In the words of
Tolstoy: WAR makes fools of us all.
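Author: maxspeed150 (Heard 茉夏 broke up)   2011-09-07 10:04:00
This one's not easy to translate~ It's long, and you have to know what WAR is to grasp what he's
Author: maxspeed150 (Heard 茉夏 broke up)   2011-09-07 10:05:00
trying to say
Author: justis (King Anle)   2011-09-07 10:18:00
Wins Above Replacement
Author: Kinra (Meow Angel)   2011-09-07 15:31:00
Too challenging, I don't dare translate it XDrz
Author: freesoul (No place like home)   2011-09-08 02:09:00
Great article, upvoted, but this one really is hard to translate: it's about sabermetrics and also uses
Author: freesoul (No place like home)   2011-09-08 02:11:00
a pile of allusions and metaphors. Also, to the second commenter: maxspeed150 obviously knows what WAR
Author: freesoul (No place like home)   2011-09-08 02:12:00
is! XD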
