parinella's blog: The perils of stats

Thursday, January 12, 2006

The perils of stats

I love stats, and always have. When I was 5, I would quiz my mom about stats from the backs of baseball cards. I kept track of Little League stats, wrestling records, golf scores, everything. And in ultimate, I have a record of every tournament since 1992 (and the game scores for those first few years, too). Whenever I've had the chance, I've taken individual stats from videos, and I was part of a team that recorded every pass back in 1991.

But sometimes I wonder what it's all for. There are a few problems with stats that might just overwhelm their usefulness.

1. Small sample sizes.
2. Other things being equal, ....
3. Field position.
4. False normalization.

The first one is a bane for all sports. You'll hear "so and so is 2 for 18 against this pitcher." This has almost no value in predicting the outcome of their next encounter. Even if you have 100 passes, there is still some chance that a 90% passer will outperform a 95% passer. A lot of studies have shown that the "hot hand" is mostly just a figment of the beholder's imagination.
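To put a rough number on the 100-pass claim, here is a minimal sketch. It assumes every pass is an independent trial with a fixed completion probability (a simple binomial model, which real games certainly violate), and the function names are made up for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k completions in n independent passes."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_worse_beats_better(n, p_worse, p_better):
    """Chance the lower-percentage passer completes strictly more of n passes.

    Assumption: both passers throw n passes, all independent.
    """
    total = 0.0
    cdf_better = 0.0  # running P(better passer completes fewer than k)
    for k in range(n + 1):
        total += binom_pmf(k, n, p_worse) * cdf_better
        cdf_better += binom_pmf(k, n, p_better)
    return total


if __name__ == "__main__":
    for n in (20, 100, 500):
        p = prob_worse_beats_better(n, 0.90, 0.95)
        print(f"{n} passes each: {p:.3f}")
```

Under these assumptions the chance shrinks as the number of passes grows, but it is still not negligible at 100 passes each.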

The problem with the second one is that they're usually not. 3 for 3 is better than 2 for 3, if the passes are the same length, but if the first passer completes dumps that put the disc on a trapped line while the second passer attempts upwind hucks, the second one is more useful. Over time, some of these things wash out, but there will still be outstanding differences resulting from usage patterns. The ideal solution is to compartmentalize the stats as much as possible, so that you separate hucks from dumps, O points from D points, zone from man, strong opponents from weak opponents, but then you get back into problem #1.

Field position (and wind) is just a subset. 40% scoring efficiency might be great if you're starting from your own goal line going upwind, but horrible when starting in the red zone.

False normalization results from the zero-sum aspect of the game. Baseball fielding has a similar problem, in that no matter how bad your defense, you still produce 27 outs in a game. You'll get 15 goals if you win no matter how bad your opponent or how badly you play. Sure, you'll have turnovers, just as you have errors or "defensive efficiency" (percentage of batted balls turned into outs) in baseball, but it makes it impossible to compare players on different teams. Having good teammates will actually hurt your raw stats in either ultimate or baseball (for fielding; batting stats will improve) because you'll have fewer opportunities for yourself.

What to do about this? I'm not sure. It's a bit of a problem, since we need to have a large sample from all levels in order to come up with a good model of how the game works, but no one wants to put in all the effort if they won't be able to do any actionable analysis immediately. But if nobody does anything, then all we're left with is conventional wisdom about what the smart move is in a given situation.

I'd like to see more situational analysis, measuring tradeoffs. For instance, if you have a 20% chance at pulling it OB, is it worth the risk of trying to land it within 10 yards of the sideline? How effective are set plays? What's the difference between having the disc on the line vs in the middle of the field? &c.

12 comments:

Gambler said...: Also, in your experience, how much of a problem is it to have people try and "pad" or manipulate their stats by making different choices based on what they know is recorded, regardless of the true value of that choice?; 4:40 PM
greg said...: with respect to your first problem, i just suspect that there's really no way to get around this when you consider what ultimate players want. in baseball you might say "player A hits 0.270", but in reality "player A is 3-for-4 when hitting against pitcher B in a dome". i think this example is similar to what people are trying to wring out of ultimate stats. you might say "handler A completes 95% of his passes", but in this situation he's trying to throw an upwind break-mark in mild wind and he's only 1-for-4 on those passes. should he be going for this type of throw? of course, there is some value in going for the more difficult throw, but there's also something to be said for throwing within your skills. much of this can't quite be quantified (i would guess, but i would like to be persuaded otherwise), or if it can the sample size is so small as to be insignificant. so i think aggregating all throws, and then maybe grouping players by position is an adequate solution. (then just highlight anything that proves i'm superior and chalk any evidence to the contrary as statistically insignificant); 4:52 PM
parinella said...: The only real padding I've seen or expect to see would be in games against weak opponents, where you'd bait more than is optimal and look for cutters in the end zone more than is optimal.

Seems like grouping into positions is a good idea, but also maybe not just "handler" or "receiver", but also "possession" or "aggressive" for throwing. I think the position you play is more about cutting than it is throwing, at least once everyone can throw decently, so labeling someone as handler or receiver doesn't really tell you about the usefulness of their throws.

Goals thrown gives you an idea of the latter axis, but that too may be misleading.

And I suspect that initial efforts to extract meaning out of the stats will indeed take the form of "which stats do my best players excel in"?; 7:42 PM
Schmelz said...: As with any sport, there's obviously not going to be one magical stat that determines a player's usefulness. There are many factors that influence stats, far more factors than can be accounted for.

To gain useful conclusions (without statistically significant amounts of data), it becomes more about narrowing down the important factors based on conventional wisdom. Kind of a chicken/egg thing.

Just like baseball has Onbase %, Slugging, etc, there should be multiple tools that should be able to outline a player's tendencies (leadoff hitters should have higher OBP and Batting Avg.). What would those specialized categories be in ultimate?
Maybe:
Avg # of Turnovers per touch?
Avg Distance per completed pass?
Avg Distance per Goal thrown?

Aren't numbers fun?; 6:29 AM
greg said...: as i thought about this a little more, and then re-read jim's original post, i thought about the possibility of normalizing by the team's stats. is this what you're referring to with "false normalization" jim? my thought being that maybe it would be a good idea to define a player's turnover-ness by what percentage of the team's turnovers they have. this way you can account for the conditions and opponent in some way. so if over the course of a tourney one player has 10% of the offense's touches, but only 5% of their turnovers, this player's value might be quite high. this might be an interesting way of normalizing to get the best relative value out of players on the same team. i'm probably over-simplifying.; 8:04 AM
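greg's team-share idea can be sketched in a few lines. This is only one possible reading of his comment: the function name and the counts in the example are hypothetical, chosen to match his "10% of the touches, 5% of the turnovers" case.

```python
def relative_turnover_index(player_touches, player_turnovers,
                            team_touches, team_turnovers):
    """Player's share of team turnovers divided by share of team touches.

    Below 1.0: the player turns it over less often than the team does
    overall. Above 1.0: more often. Hypothetical helper, not an
    established ultimate stat.
    """
    if team_touches <= 0 or team_turnovers <= 0 or player_touches <= 0:
        raise ValueError("touch and team turnover counts must be positive")
    touch_share = player_touches / team_touches
    turnover_share = player_turnovers / team_turnovers
    return turnover_share / touch_share


if __name__ == "__main__":
    # Made-up tournament totals: 40 of 400 touches, 2 of 40 turnovers.
    index = relative_turnover_index(40, 2, 400, 40)
    print(f"relative turnover index: {index:.2f}")
```

Algebraically this is the player's turnovers per touch divided by the team's turnovers per touch, so it only compares players on the same team under the same conditions. It does nothing for the cross-team problem, and with a handful of touches it runs straight back into peril #1.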
parinella said...: The issue with false normalization is that it's not possible to compare players on different teams (or maybe even an O team guy and a D team guy). You can look at the stats for two hitters, adjust for the park and league, and make a fair comparison as to which one is better.

Today, most people who would be interested in stats are mostly concerned about their own team and how to make them better, so they wouldn't care too much about this problem.; 8:54 AM
AJ said...: The issue with false normalization is that it's not possible to compare players on different teams (or maybe even an O team guy and a D team guy).

Does this mean I was a bit hasty in claiming to be the best player at club nationals this year based on my scoreomatic stats?

Thanks for reporting those scores Dusty!

aj; 10:19 AM
Adam said...: There are some statistics that are more natural to use than others, but what do people think of the +/- stat from hockey? Every time someone is on the field, the thing that matters is scoring. Unlike hockey, there might have to be some way of handling if you start on D or O, but it's still about whether or not someone on your 7 catches the disc in the end zone.

(Sorry if I'm sliding off to the side of the original point.); 8:33 AM
Anonymous said...: This comment is entirely off-topic, and completely pointless, yet I post it.

Jim, i thought you should know (though you surely don't care) that you, through your blog, have become the inspiration for a California Junior's ultimate team.

Whenever we need to play chill, possession offense, we yell "do it for Jim Parinella!" or something equivalent.

Jim Parinella, you are my strategist hero.; 10:39 PM
Bob Krier said...: I'm currently reading Moneyball, and one of its points seems appropriate here. The author mentions that it took a group of outsiders in the 1970s to recreate baseball stats, and to track the game completely. The teams and MLB tracked certain stats, but not a complete recording of the game. MLB recorded minimal stats and each team tracked whatever else it thought was important, if anything.

Ultimate isn't even up to 1970's baseball, as we have yet to have organized stat keeping at any level. I was just looking at my Newsletter today and noticed that in the Master's final, 5 of 26 (~19%) goals scored did not get the thrower or receiver recorded. And each team that currently keeps stats seems to keep a different set of them.

So my two thoughts are:
1) If stats are to be kept, it'll be because of the efforts of a few motivated individuals who organize it all. It will not be because of the efforts of the UPA or the teams involved.
2) It'll most likely need to be done in stages. First, getting some minimal level of stats recorded, correctly and completely. Then once that becomes common, there will be the possibility of refining it.; 4:26 PM
parinella said...: BK,
We're probably closer to the 1870s than the 1970s. According to Alan Schwarz's "The Numbers Game: Baseball's Lifelong Fascination with Statistics", the first box score was in 1858.

Baseball's real stats push probably came from Stats, Inc., which came about from Project Scoresheet, an initiative pushed by Bill James. In it, individuals were to score all aspects of every major league game using a standardized scoresheet, then submit the sheets to a central repository, which would store and analyze the data. I tried to get this started last year with the UPA's blessing, but I completely dropped the ball and did nothing.

Some level of stat-keeping is common, but the level is not. I know of several teams that record (or have recorded) every pass, even breaking down dump/short/middle/long. But that data probably gets lost to history. And stat-keepers might be reluctant to share, either because it's proprietary or because it's personal.

I think we'd stand a better chance at collecting team-level stats on a wide level, as it's not nearly as much work and doesn't reveal many secrets about a team's effectiveness.

And anonymous, thanks for making my day.

I like the concept of +/-, which might prove itself to be more useful to ultimate because of the non-linear nature of individual stats. But it is probably even more susceptible to peril #2. Take a look here for an NBA version of +/-.; 6:04 PM
Bob Krier said...: We're probably closer to the 1870s than the 1970s

That's a good point. I think that whatever initial statsheet gets used consistently will probably, in hindsight, turn out to have overlooked valuable data, like Henry Chadwick's modification of the baseball scoresheet back in the 1870s, which didn't track walks as part of the batter's record. He considered walks to have nothing to do with the talent of the hitter, but merely a function of the pitcher. But without version 1, there's no chance to modify it.

So why don't you try to get it started for this year? Were you thinking of club or college?; 11:14 PM
