Tuesday, June 21, 2005

blowouts and upsets

Are there too many of these in ultimate? Are upsets so unlikely that most games are a given? A couple of months ago, during the ever-so-boring winter, I took a look at the RRI for club teams for 2004 (the rankings are no longer available on the UPA site, since the 2005 season is under way). One nice feature of this Bradley-Terry model is that you can easily convert rankings to expected point differentials.

The next logical step is to convert point differentials into expected winning percentages. I want you to make a quick guess on the following problem. If the rankings predict a 15-12.5 victory, how often will the underdog actually win?

Baseball teams that outscore their opponents by that ratio will win about 59% of their games (96 wins in a 162 game season). NFL teams: 63% (10-6). Soccer teams: 55% (counting ties as half-wins). NBA: 94% (77-5). Ultimate: my brief, severely flawed study shows 65% for women's games and 85% for men's. An earlier brief, less-flawed study came up with an estimate of 67-75%. (The other estimates use a version of Bill James' Pythagorean Theorem, which says that the expected winning percentage is A^N/(A^N + B^N), with N=2 for baseball. Other sports' exponents: NFL 3, NBA 15, soccer 1.2, ultimate 4-6. If you had a sport where every game was 1-0, the exponent would be 1.)
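As a quick check on the arithmetic, here is a minimal Python sketch of the formula A^N/(A^N + B^N) applied to a 15-12.5 score, using the exponents quoted above. The function name and the dictionary layout are my own choices for illustration, not part of any ranking software.

```python
def pythagorean_win_pct(points_for, points_against, exponent):
    """Expected winning fraction A^N / (A^N + B^N)."""
    a = points_for ** exponent
    b = points_against ** exponent
    return a / (a + b)

# Exponents as quoted in the post; ultimate is given there as a 4-6 range,
# so both ends of the range are shown.
exponents = {
    "baseball": 2,
    "NFL": 3,
    "soccer": 1.2,
    "NBA": 15,
    "ultimate (low)": 4,
    "ultimate (high)": 6,
}

for sport, n in exponents.items():
    pct = pythagorean_win_pct(15, 12.5, n)
    print(f"{sport:16s} N={n:<4} favorite wins {pct:.0%}")
```

With exponents of 4 and 6, the formula gives roughly 67% and 75% for the favorite, which is where the 67-75% range for ultimate comes from.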

I took all the games for the top 27 ranked teams in 2004, and grouped them according to the number of goals a lower-ranked team should score against them. For instance, where the score should be anywhere between 15-14.99 and 15-14, the higher-ranked men's teams were 59-34. (Incidentally, I think the one defeat for 15-7 men's teams was a mistake in the reported score.)

One major flaw is that this method uses actual results to predict those same results. Really, the rankings should be recalculated with each game left out, one at a time, and the prediction for that game taken from the rankings that ignore it. But here are the results:

Spread   Men W   Men L   Women W   Women L
14        59      34       19         7
13        65      29       20        10
12        51       9       36        19
11        45       3       31        10
10        45       3       42         4
9         52       1       35         0
8         35       0       44         4
7         30       1       29         0
6         18       0       38         0
5         29       0       28         0
4         22       0       23         0
3         20       0       27         0
2         23       0       24         0
1         21       0       37         0
0         17       0       52         0
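To turn the win-loss counts above into winning percentages, a short Python sketch like the following works. The numbers are copied straight from the table; the variable and function names are just illustrative.

```python
# (expected underdog goals, men W, men L, women W, women L), copied from the table.
rows = [
    (14, 59, 34, 19, 7),
    (13, 65, 29, 20, 10),
    (12, 51, 9, 36, 19),
    (11, 45, 3, 31, 10),
    (10, 45, 3, 42, 4),
    (9, 52, 1, 35, 0),
    (8, 35, 0, 44, 4),
    (7, 30, 1, 29, 0),
    (6, 18, 0, 38, 0),
    (5, 29, 0, 28, 0),
    (4, 22, 0, 23, 0),
    (3, 20, 0, 27, 0),
    (2, 23, 0, 24, 0),
    (1, 21, 0, 37, 0),
    (0, 17, 0, 52, 0),
]

def win_pct(wins, losses):
    """Fraction of games won by the higher-ranked team."""
    return wins / (wins + losses)

# Map each spread to (men's favorite win fraction, women's favorite win fraction).
favorite_pct = {
    goals: (win_pct(mw, ml), win_pct(ww, wl))
    for goals, mw, ml, ww, wl in rows
}

for goals, (men, women) in favorite_pct.items():
    print(f"15-{goals:<2}  men {men:5.1%}  women {women:5.1%}")
```

The 12-goal row (51-9 for men, 36-19 for women) works out to about 85% and 65%, which matches the figures quoted earlier for a predicted 15-12.5 game.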

A couple of other interesting things: women's games are much more likely to be expected blowouts (three times as many 15-0 matchups), and slight underdogs in women's games actually had a much better chance at victory than in men's. But maybe that second part is just an artifact of the ranking system: the disparity in ability among women's teams is much greater, and that messes with the rankings, since you can't win by any more than 15-0. (The #1 women's team is expected to beat the #14 women's team 15-5, while that matchup for guys would be 15-10.3.)

Overall, about 45% of men's games and 60% of women's games were simply going through the motions to see what the final score was.


Anonymous said...

yes. there are too many blowouts in ultimate. there's got to be some way to get rid of the saturday pool play blowouts where most teams have no realistic shot at upsetting a top tier team.

it seems more and more common these days to have a tournament where a top level team (nationals quarterfinalists or better) gets 1 or 2 quality games in a weekend. the rest of them being blowouts, or really easy wins. seems almost like a waste of time for the elite level teams.

Justin R said...


How did you figure out the % teams would win if they scored at the ratio predicted by their RRI?

I would expect the standard deviations to be different for each team (some teams performing poorly due to no-shows, not showing up in force, or just having a good/bad game/day), as well as between levels within each division (lower teams are less predictable b/c one or two no-shows drop their competitiveness more than they would a top-level team's), and throughout the season (in early games, teams play with tryouts, etc.).

If the standard deviation approaches 0, the higher ranked team never loses. If it is high, the lower team wins more often, even if their ranking is significantly lower.

-Justin R

Justin R said...

Last sentence re-written:

As the standard deviation increases, the lower-ranked team wins more often, even though the difference between the rankings is the same.
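A minimal Python sketch of that point, under an assumption that is not taken from the RRI itself: treat the game outcome, measured in rating points, as normally distributed around the rating gap, with the underdog winning whenever the outcome falls below zero. The gap of 100 and the sd values below are hypothetical.

```python
import math

def underdog_win_prob(rating_gap, sd):
    """P(underdog wins), assuming the outcome is Normal(rating_gap, sd).

    rating_gap is favorite minus underdog and is assumed to be >= 0.
    This normal model is an illustrative assumption, not the RRI's own method.
    """
    if sd == 0:
        # No variation at all: the favorite never loses (a tie in rating is a coin flip).
        return 0.0 if rating_gap > 0 else 0.5
    z = -rating_gap / sd
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

gap = 100  # hypothetical rating gap between the two teams
for sd in (25, 50, 100, 200):  # hypothetical standard deviations
    print(f"sd={sd:3d}: underdog wins {underdog_win_prob(gap, sd):.1%}")
```

With the gap held fixed, the underdog's chance climbs toward 50% as the sd grows and drops to zero as the sd shrinks.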

parinella said...

A lot of the elite teams now go to elite-only tournaments as much as possible. ECC, Colorado Cup, Labor Day, NJ Invite: all of those have 6-8 teams at about the same level, round robin, then some short elimination round sequence. The Boston Invite used to be almost like this, but this year, DoG will probably have three games against teams unlikely to finish in the Top 5 in their region, followed by a quarterfinals upset loss to DC.

And it's exciting for teams to enter tournaments where they can win if they play well and won't make semis if they don't. Isn't this preferable to a tournament where your stretch goal is to lose 15-4 in the quarters?

I wonder how important it is to the sport and to the barely-Regionals- or Sectionals-level teams to be able to play teams like DoG or Pike, who probably are looking past the team anyway.

So, the easy solution is to have more tournaments that are highly-stratified, or possibly even a set of leagues akin to European soccer with relegation and promotion between divisions.

parinella said...


The RRI creates a ranking, a number like 2572, and it also predicts what the score should be between two teams. Take a look at this page, for instance.

I took the end of season rankings and predicted scores, and looked at all the actual scores to see how often upsets occurred. As I said, this is somewhat flawed because if two identically-ranked teams played each other once, the winner would then have the higher ranking and in my system the higher-ranked team would automatically be 1-0. This is probably true for DoG and BAT this year, for instance.

Re: sd's. Yep. I looked at four college teams for the recently completed seasons, and they had tournament sd's of 97, 122, 88, and 69 points. An RRI difference of 100 translates into about 15-12.4.

Edward Lee said...

"So, the easy solution is to have more tournaments that are highly-stratified, or possibly even a set of leagues akin to European soccer with relegation and promotion between divisions."

*cough cough* roster limits *cough cough*

parinella said...

Roster limits are worth a post or two of their own, but they are not going to do anything for most of these blowouts. The elite teams are already playing the bottom half of their roster a disproportionate amount in the blowout games (or even the expected 15-11 games).

But that's not what you're talking about. Yes, significantly smaller roster limits (say, 14 or 15) would make some difference. Obviously you would see weaker top teams, but the other effects are hard to predict well. I'll make a new post.

Anonymous said...

I'm curious what the numbers are like for mixed.