Tuesday, February 11, 2020

The Lake Wobegon Effect, or Why Most People Are Above Average

"Welcome to Lake Wobegon, where all the women are strong, all the men are good-looking, and all the children are above average." -- Garrison Keillor

More generally, illusory superiority is the tendency to overestimate one’s ability compared to others. For instance, it’s widely quoted that 80% of drivers think they are above average in driving skill, and we all pooh-pooh those other dumb people who erroneously think they can somehow defy simple math. However, there is a perfectly rational reason for this to be the case.

For something like height that is easy to measure and easy to compare, you would probably find that close to 50% of people think they are above average. But what about more complex skills like driving or intelligence or soccer-playing? There are multiple dimensions in each of those. Person A hasn’t been in an accident in 10 years. Person B easily maneuvers through city traffic and can merge onto a highway seamlessly. Person C would win a race around a closed course. In soccer, some players score goals, some pass the ball well, and some play defense. Who’s the best? Well, the mother of the soccer player who scores goals thinks that goal-scoring is the most important skill, while the father of the defender thinks that defense wins championships, and the grandfather of the passer loves the beautiful game and wants to see the ball passed around. Who is best? “My kid!”

Let’s say that each player can accurately assess how good everyone is in each attribute. However, each of them thinks that the attribute they are best at is twice as important as the attribute they are second-best at, and four times as important as the one they are worst at (so, a 4x-2x-x weighting). What happens now?

Here’s a simple example with three players. Each is really good at one skill, average in another, and terrible in the third, but is average if each skill is equally important. Fairly evaluating each skill (but not their importance to “soccer playing ability”), each player thinks that they are the best. Similarly, each player is considered by one of the others to be the worst.
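To make the three-player case concrete, here is a minimal sketch. The 3/2/1 skill ratings are made-up numbers standing in for the original tables (which are not reproduced here); only the 4x-2x-x weighting comes from the text above.

```python
SKILLS = ("scoring", "passing", "defense")

# Hypothetical ratings: each player is great (3) at one skill,
# average (2) at another, and terrible (1) at the third.
PLAYERS = {
    "Scorer":   {"scoring": 3, "passing": 2, "defense": 1},
    "Passer":   {"scoring": 1, "passing": 3, "defense": 2},
    "Defender": {"scoring": 2, "passing": 1, "defense": 3},
}


def self_serving_weights(ratings, weights=(4, 2, 1)):
    """Give the judge's best skill weight 4, second-best 2, worst 1."""
    ranked = sorted(SKILLS, key=lambda s: ratings[s], reverse=True)
    return dict(zip(ranked, weights))


def assess(judge):
    """Score every player using the judge's own idea of what matters."""
    w = self_serving_weights(PLAYERS[judge])
    return {name: sum(w[s] * r[s] for s in SKILLS)
            for name, r in PLAYERS.items()}


def best_and_worst(judge):
    scores = assess(judge)
    return max(scores, key=scores.get), min(scores, key=scores.get)


if __name__ == "__main__":
    equal = {name: sum(r.values()) / 3 for name, r in PLAYERS.items()}
    print("Equal weighting:", equal)
    for judge in PLAYERS:
        print(judge, "sees", assess(judge), "->", best_and_worst(judge))
```

With equal weights all three players come out identical, but under each judge's own weighting the judge is on top and a different rival lands at the bottom each time.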

[Tables: Equal assessment; Scorer’s assessment; Passer’s assessment; Defender’s assessment]

What about the broader case? I did a simple simulation for this. I have 10 000 individuals whose skills in three attributes are independent and are uniformly distributed. As you would expect, 50% of them are above average on each attribute and on the average of the three attributes. However, each weights the importance of each skill in accordance with their ability in that skill. What happens now? As it turns out, 74% of these people are now above average by their own reckoning! The median person now places themselves in the 63rd percentile.
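A simulation along these lines can be sketched in a few lines of Python. This is a reconstruction under assumptions, not the original code: it assumes the 4x-2x-x weighting from earlier, skills uniform on [0, 1], and that "above average" means a person's self-weighted score beats the population mean under that same person's weights. The function name and return values are my own.

```python
import random
from bisect import bisect_left
from itertools import permutations
from statistics import median


def simulate(n=10_000, weights=(4, 2, 1), seed=0):
    rng = random.Random(seed)
    people = [[rng.random() for _ in range(3)] for _ in range(n)]
    total = sum(weights)

    # Baseline: everyone is judged with equal weights.
    equal = [sum(p) / 3 for p in people]
    equal_mean = sum(equal) / n
    frac_equal = sum(s > equal_mean for s in equal) / n

    # Score the whole population under each of the 6 weight assignments.
    by_weights = {}
    for w in permutations(weights):
        scores = [sum(wi * x for wi, x in zip(w, p)) / total for p in people]
        by_weights[w] = (sum(scores) / n, sorted(scores))

    above = 0
    percentiles = []
    for p in people:
        # Biggest weight goes to this person's best skill, and so on down.
        order = sorted(range(3), key=lambda k: p[k], reverse=True)
        w = [0, 0, 0]
        for rank, k in enumerate(order):
            w[k] = weights[rank]
        w = tuple(w)
        own = sum(wi * x for wi, x in zip(w, p)) / total
        avg, ranked = by_weights[w]
        above += own > avg
        # Share of the population scoring strictly below this person.
        percentiles.append(100 * bisect_left(ranked, own) / n)

    return frac_equal, above / n, median(percentiles)


if __name__ == "__main__":
    frac_equal, frac_self, median_pct = simulate()
    print(f"above average, equal weights: {frac_equal:.0%}")
    print(f"above average, own weights:   {frac_self:.0%}")
    print(f"median self-assessed percentile: {median_pct:.0f}")
```

The exact percentages depend on the seed and on how "in accordance with their ability" is interpreted, so treat the output as a rough check on the figures quoted above rather than a reproduction of them.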

Now let’s add in the illusory superiority bias and see where we get. Let’s assume each person overestimates their skill by a mere 5%, so if they are truly average, they think they are in the 55th percentile. Now, 81% of the people are above average.

How about a real-life example? Who was the best hitter in the American League in 2017? I looked at three stats (batting average, home run rate, and strikeout avoidance) for all 78 qualifiers that year, and compared each player to the average qualifier in that stat. Using an equal weight for the three metrics, there were about an equal number of above-average and below-average hitters, as we would expect. But what if each qualifier got to choose which stat was most important and which was least important, as in the hypothetical example above? Then the average player was suddenly 12% better than average, and 57 of 78 (73%) were above average. Welcome to Lake Wobegon.

Friday, April 10, 2015

How my team has done in every tournament I've played in since 1992

At Nathan Wicks' request, I made a chart of all 239 tournaments I've played in the last 24 years and how my team did (1st place is at the top). It's split out by division, and Nationals and Worlds are highlighted.

Friday, January 17, 2014

Live at skydmagazine.com

I am now writing a monthly feature at skydmagazine.com. The first feature is "Why I'm Still Around". In it, I trace my interest in playing over the years, starting out as an 18-year-old who didn't have anything better to do before going away to college, then as a 20-something developing my game, as a 30-something at the top of my game, and as a 40-something waiting for death. One of the things I touch on but didn't get to develop enough is that I now get a chance, or take the opportunity, to chat more. For instance, I played with Zip for several years, but I've probably had more serious conversations with him the past two years (bar run, party at his house I crashed, goaltimate) than I did during our time as teammates. DoG Masters and Beach Worlds 2011 also led to getting to know more people, and those have led to additional opportunities like playing with Los Zodiac at Paganello, where I got to team with modern luminaries Beau Kittredge and Bart Watson. Two quotes from Bart: "Beau on sand is like the rest of you on grass" and "So, what was it like to play before there was strategy?" I (and Alex) got on that team through our connection with Greg Husak of the Condors, who (along with Steve Dugan and Mike Namkung) played with DoG Masters at Worlds in 2008. Good quote from Husak after playing with us for a few days: "So, let me get this straight. Your offense is 'this guy cuts, then that guy cuts, and if he doesn't get it, this other guy cuts'." I guess the Condors offense was a bit more formally structured. If you have any suggestions for future articles, leave them in the comments.

Monday, August 26, 2013

Stat post for The Huddle

With all the stat posts at ultiworld.com, I thought I'd post an article I wrote for a stats issue of The Huddle which never got published.


In the early 1990s, my teams (Earth Atomizer and Big Brother) recorded every pass of the season, entering them in a notebook using a shorthand notation during games, and some friends and I would compile them afterwards. Among other things, we found that forehands were thrown away about 50% more frequently than backhands, about 60% of hucks were complete (except for a certain anti-stat hothead who went 4 for 16), and 1.5% of passes were dropped. We did use that first piece of knowledge (coupled with “scouting” observations) about forehands to decide to force forehand most of the time. But what else did we gain for all that time spent?

I have come around to believe that for the time being, the concrete value of tracking individual statistics to predict or to evaluate is doomed by two things, context and sample size. We tried to make one adjustment for context, namely, separating out “tough games” from “chump games”. But then that fed into the second issue, sample size, since we now had fewer games to draw from. And was the line between “tough” and “chump” in the right place? Some of the games were tough because of bad playing conditions, others because we just played badly or exceptionally well in a game that would normally be a blowout, and still others because we had a skeleton crew. Oh, and some teams played zone and forced the handlers to pile up twice as many throws as usual (without, I hope you realize, playing twice as well). But we counted them all equally.

(I should also add that “opportunities” is something that must be accounted for when trying to analyze. But it’s not always as simple as dividing by the number of touches. In the seminal basketball analytics book “Basketball on Paper”, Dean Oliver (now in charge of stats at ESPN) highlighted that player efficiency decreases with increased usage as the players who bear the brunt of the offensive load have to make plays that are closer to the margin. Furthermore, these players will also draw the toughest defenders.)

But we could still tell who was good at completing passes, right? Well, as I like to say at my job where I analyze my company’s engineering performance, it depends. They recorded individual stats on last year’s NexGen tour. I was very excited to get this dataset, because every game was against a quality opponent, almost everyone played almost every game, and each game was a showcase and not just one of many in a long weekend. As it turns out, this dataset too suffers from some confounders, such as the first half of the tour being spent figuring out how to play together and what roles to settle into, but it was still the purest dataset I know of. For the complete tour, turnover percentage* of the players ranged from 3.4% to 12.4%. But for the most part, the guys at the higher end of the turnover range also threw a higher percentage of their passes for goals, while the low-turnover guys didn’t throw as many goals. Here’s the graph for all of them, split out by how often they touched the disc per point:

*They didn’t separate out drops from throwaways, so we’ll have to use this instead of incompletion rate.

Note also that the high touch players were in the lower left corner. I can think of two explanations for this besides them being conservative handlers. One, an in-bounds pull almost always results in an uncontested completed pass. Two, passes in general in that half of the field are typically easier to complete because the defense has to respect the threat of the long pass, and I think that handlers have a higher percentage of the touches there than they do closer to the endzone, where pass frequency is more evenly distributed. At the other end of the graph, deeps are going to be catching more of their passes near the endzone, resulting in relatively more opportunities for goal throws but also with each completion a little more difficult because of the reduced space. The risk/benefit of a few extra yards changes near the goal line as well. I wrote in “Ultimate Techniques and Tactics”, a book co-authored with Eric Zaslow and published by Human Kinetics and still available through your favorite Internet reseller, that being in the endzone instead of just on the goal line increases your chance of scoring as much as being 10 yards closer elsewhere on the field.

I once set up a simulation of an offense where the players were equally talented (i.e., had the same incompletion rate per yard of throw) but had different roles in the offense and different throw choices. The first thing I noticed is that a particular player sometimes had MVP-level tournaments and sometimes had tournaments where he would have been benched. The more important point, though, was that the players’ stat lines resembled those of real teams such as NexGen, with some players racking up the goals and turnovers while others had lots of touches but few fantasy league stats. This leads me to conclude that much of the difference between the stat lines of any two players is not a difference in effectiveness but simply a matter of taste. (Note that there are still some players who stand out, either good or bad, but you generally don’t need a calculator to know that.) Two equally-efficient players can have drastically different stat lines due not to any difference in skill or on-field decision-making but to the difference in their roles.
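The flavor of that result can be shown with a toy model. This is not the simulation described above; the per-yard miss rate, throw lengths, and goal chances are invented for illustration. The only idea taken from the text is that both roles share the same incompletion rate per yard of throw.

```python
import random

PER_YARD_MISS = 0.004  # hypothetical incompletion rate per yard, same for everyone

# Hypothetical roles: (typical throw length in yards,
#                      chance that a completed throw is a goal)
ROLES = {
    "handler": (10, 0.05),
    "deep": (35, 0.40),
}


def stat_line(role, throws=100, seed=None):
    """Simulate one player's throws and return a simple stat line."""
    rng = random.Random(seed)
    yards, goal_chance = ROLES[role]
    # Equal skill per yard, so longer throws are completed less often.
    p_complete = (1 - PER_YARD_MISS) ** yards
    turnovers = goals = 0
    for _ in range(throws):
        if rng.random() > p_complete:
            turnovers += 1
        elif rng.random() < goal_chance:
            goals += 1
    return {"throws": throws, "turnovers": turnovers, "goals": goals}


if __name__ == "__main__":
    # Same player, same skill, five different "tournaments".
    for seed in range(5):
        print("deep, tournament", seed, stat_line("deep", throws=40, seed=seed))
    print("handler, season:", stat_line("handler", throws=2000, seed=0))
    print("deep, season:   ", stat_line("deep", throws=2000, seed=0))
```

Over a 40-throw tournament the same player's line swings noticeably from run to run, and over a long season the two roles settle into very different turnover and goal rates despite identical per-yard skill.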

So what do these detailed individual stats (at this stage in our history, where we have only the stats of our own team against a wide range of opponents in vastly different environments) bring to the table? Accountability and self-awareness. Lord Kelvin wrote, “If you cannot measure it, you cannot improve it.” Simply being aware of your actual completion percentage on hucks should force you to contemplate whether you are making good choices. I remember going over each of my turnovers in a weekend (with the aid of the stat pad) and being shocked to learn how many of them were simply poor risk/reward decisions, and I was able to eliminate some of those.

Lest you think I’ve given up on stats, I haven’t. But I think the payoff for now would come from analyzing team decisions. The first priority would be to get realistic baselines for performance. I routinely see people write that five turnovers in a game is typical or that drops never happen or that hucks are completed 75% of the time. While there are certainly examples of these happening, I would guess that they aren’t the typical performance. The other area where I would like to see work is quantifying the value of particular scenarios. For instance, how much harder is it for a team to score off a deep, high pull vs a low pull vs a brick, and how consistent are good pullers at achieving good pulls? Could someone who is an otherwise bad defender still be a good D player simply by virtue of his pulls? On the offensive side, exactly how costly is it to rest one of your top players? How deadly is it to turn it over in your own half of the field? Might the Huck-‘n’-Hope offensive style actually be a reasonable strategy due to the long field left after a turnover? We might have opinions about those now, but until we measure these, we don’t know.

Friday, March 01, 2013

Future of Ultimate

I'll be attending the Ultimate Players and Coaches Conference tomorrow as a panelist for "the future of ultimate". Back in 2005, I blogged about trends in ultimate over the previous 10 years, trends over the upcoming 10 years, and trends that didn't happen. I have a feeling that the focus of tomorrow's discussion will be on the further professionalization of ultimate. This doesn't mean only whether the pro leagues will take off and how widespread they'll be, but what elite-level ultimate will look like. Will top players in 10 years still be attending Potlatch and Paganello and Poultry Days, or will it just be unacceptable for them to risk getting hurt while wearing overalls? How much will decisions be about the players on the field versus the rest of the ultimate world or the spectating world? There has already been a movement toward less freedom, but how far will it go? Will Men's split off from Women's (and both say goodbye to Mixed) if a sponsor comes calling? Those were the kinds of questions that immediately popped into my head, and I brought up the topic at skydmagazine. But then after I got into a conversation with one of the other commenters, I realized that once again I neglected 95% of the ultimate out there by thinking only about the competitive season in the US (or in those places where they could reasonably expect to challenge the top US teams). So maybe I need to think some more about summer leagues, youth camps, fun coed tournaments, and semi-competitive (i.e., playing tournaments without devoting all your spare time to the game) ultimate.

Thursday, July 07, 2011

2011 mid-year recap

Last year, of course, I had surgery in March and spent the rest of the year recovering. I managed to play in six tournaments anyway, but at only somewhere between 50% and 85%. I was still feeling a little stiff at Nationals, and I was definitely not at full strength due to the seven months of inactivity pre- and post-surgery.

But by springtime, I was back to 100%, though of course 100% ain't what it used to be. I did some sessions with a personal trainer through an online coupon, then I discovered a cardio/core group workout in town and have been going once or twice a week since then. Add in the usual basketball/softball/tournaments/other workouts and I'm actually feeling pretty strong these days (again, see above 100% comment).

Because it was free, I applied for the World Championship of Beach Ultimate team, and got picked for the Masters team. When applying, I thought that I probably wouldn't go if selected, but once the selection actually happened, I got a bit stoked about it, so I'll be heading to Italy this August.

My frisbee season kicked off at another Italian beach tournament, Paganello, which is like Spring Break but with a four-day ultimate tournament thrown in. I played again with the team known this year as Los Rabbit. We had 17 players, up from about 11 two years ago when we lost in the finals as Los Ox. (The team won last year as Los Tiger but I couldn't make it.) This time I spent the day in Milan on my way there and walked around the city. I'm always impressed by the huge churches, in this case the Duomo, which when built was supposed to be able to accommodate all 40 000 of the city's inhabitants. As always, hanging out with friends and taking part in the event's festivities are a large part of the tournament. We had cocktail hour at the seaside hotel every night, including one night where the hotel had a wine and cheese party for its guests (we assumed at first that there was a private function, but then we found out it was for us, fresh off a late game). The big tournament party as always featured lots of people wearing weird costumes to fit the theme.

This was the tournament where I felt most like a role player. I belonged on the team, and I could have played more without the team getting worse as a result, but I could have also played less without the team getting worse. PT was fairly even in pool play (we never called subs), and I was moving and playing very well. Prior to the quarterfinals, for some still undetermined reason, I completely hit the wall and felt like I was running in very thick and deep sand. I couldn't even play without feeling like I couldn't make it through the point if we turned it. (I did get a layout block early but am pretty sure it was gift-wrapped for me by the thrower.) I took myself out of the game because it was so close and we had lots of options. I recovered a bit for the semis later that day but still felt pretty crappy. Even the next day after a relatively calm Sunday night, I still felt like crap, so in some ways, my performance in the finals should rank among my career highlights, even though I only played 4 or 5 points (about half of our O points), since I had to go all-out just to play (and I distinctly remember hearing myself breathing fast while running down the field). Anyway, got my first Paganello championship. Perhaps my biggest accomplishment, though, was in making my flight back despite the Italian transportation system doing its best to thwart me. Don't believe it when you hear "at least the trains run on time."

A few weeks later was the White Mountain Open. Rain forced us to move to a multi-purpose sports facility in Quechee. But never before had I seen a combination driving range/inclined polo field. We started off the day with only 7 players and added two late in the first round. We played well enough through 1.5 games before collapsing. I had to start calling timeouts to give us some extra rest. (It didn't help me that I had done a particularly hard cardio/core workout the day before.) We got a few extra people on Sunday and that made a big difference, and we stormed back to take 9th place. At 13-13 in the finals we threw it away in their end zone, but Alex made the defensive play of the day. He ran "full speed" into an opponent and his girlish yelp of pain/fear threw off the cutter enough that he stopped his cut to see what was going on and the disc (which was in the air) hit the ground. We punched it in, then got a break to win 15-13.

Next was the GM qualifier. One of the teams bailed and blamed the USAU for their not knowing what was going on, so we played only two games. Again I had a hard cardio/core the day before so was a bit fatigued, but it didn't matter. Our whole team played a bit sloppy. We won, though, and qualified for the GM championship, which is this weekend in Ohio.

A few weeks later was the Boston Invite. The Masters RC was able to work it out with the TD that we could have a pool of Masters teams on Saturday, thus counting as a Masters tournament that will require one fewer team at fall Masters Regionals in order to avoid the anti-wildcard. We had our best day of DoG Masters in quite some time, winning all four games, including 15-10 against the Canadian team GLUM (who weren't at full strength). We played a team of Dominicans + Brodie + a couple other Americans in the 9-24 pre-quarters, jumped out to an 8-4 lead, and limped home to a 14-12 win. This put us in the 9-16 quarters against Mephisto. We were already starting to lose players and so did open subbing. We started out well, going up a break, and even had a second break but it was called back on a pick that the defender would have had no chance on, we turned it, and they didn't look back. We were then scheduled for two consolation games, but we were down to fewer than 10 people who _could_ play and nearly 1 who actually _wanted_ to play, so we discussed with the other teams and arranged it so that we didn't have to play and the teams who wanted to play could play.

And as I mentioned, this leads us to today. We are seeded 2nd in the GM tournament, with a likely semifinal matchup against Surly. Top seed and defending champ Old And In The Way is most likely not going to be as strong as last year due to having to leave Colorado this year (and the rest of us will not have to acclimate). It's always a pleasant change to go from playing against young kids who are eager to lay out into you to playing against old guys who are even more afraid of hurting themselves.

Friday, March 25, 2011

Value of a top player

I got a comment on the previous thread and posted a response but wanted to make a new post about it:
I would like to hear your statistically informed opinion on the following thought experiment: assume that there are thirteen players of roughly average (on the scale of all ultimate players) and equal ability (compared to each other). The fourteenth is a player of outstanding ability--someone widely thought to be one of the best players in the game.

They play a pickup game in which everyone is trying their best to win. What is the probability that the team with the elite player wins?

Hey, good question. I did some simulations about 15 years ago for a UPA Newsletter article. I will use the chart in there to make estimates.

(First, what is an "average" ultimate player? What is the average income between a homeless guy, Joe the Plumber, and Bill Gates? When you have such a range between high and low, "average" becomes a funny concept. I'll assume "average" is someone who would fit in nicely on a low-level regionals team.)

Two teams that score at equal rates will of course each win half the time (with a slight advantage to the team that receives in the first half, but we'll ignore that). A team that has a 5 percentage point advantage (e.g., scoring on 40% vs 35% of the times they get the disc) will win 65-75% of the time (with the bigger advantage when the percentages are at the lower end). A 10 percentage point advantage translates to winning 76-87% of the time.

With the average groups, I'll assume that teams score about 30% of the time. Top Open teams playing against top Open teams in moderate wind might be around 50%. What effect does this awesome player have?

First, I think the effect on defense will be less than on offense. He will get some poach blocks, but since there is no star on the other team for him to shut down, he won't otherwise be able to thwart their offense. Let's assume he gets 3 additional blocks but otherwise has no effect on their offensive efficiency (a player who could do that at the elite Open level would possibly be the best player in history). Previously they scored on 15 of 50 possessions in a game to 15; now it takes them 53 possessions to get those 15 goals, which is a drop from 30% to only 28.3%. To lower their scoring rate to 25%, he'd need to get 10 blocks a game.
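The block arithmetic can be checked in a few lines. The function names are mine; the inputs (15 goals, 50 possessions, 3 extra blocks, a 25% target) are the figures from the paragraph above, and the model simply treats each extra block as one more possession the opponents need to reach 15.

```python
def opponent_scoring_rate(goals=15, possessions=50, extra_blocks=0):
    """Opponents' goals per possession once extra blocks add wasted possessions."""
    return goals / (possessions + extra_blocks)


def blocks_needed(target_rate, goals=15, possessions=50):
    """Extra blocks per game required to push the opponents down to target_rate."""
    return goals / target_rate - possessions


if __name__ == "__main__":
    print(f"baseline:      {opponent_scoring_rate():.1%}")
    print(f"with 3 blocks: {opponent_scoring_rate(extra_blocks=3):.1%}")
    print(f"blocks to reach 25%: {blocks_needed(0.25):.0f}")
```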

Let's pause for a minute and consider what a superstar team would do against this team. I'd guess 15-1 or 15-2 is a fairly typical score for a game like this, though there is a question of whether they are trying their best to win, if for no other reason than they have 4 games that day (but so does the other team, and I'll guess they aren't in as good shape so would be further from peak efficiency). If they had 5 turnovers, that'd only be 75%. So, adding 7 elite players to an average team would take you from 30% up to 75%. I suspect that most of the benefits come from the first one or two, and almost nothing from 5-7. (Dennis suggested 20 years ago that the highest marginal value is provided by the second player, because that gives the first player someone to throw to). So, to get those 45 percentage points, I'll say it's 14, 14, 9, 4, 2, 1, 1 for each added player.

That puts the O efficiency at 44%, D efficiency at 28%. That means the O will score 15/34 times instead of 15/50. The other team will score 28.3% of 33 times or 9.3 goals. Set the point spread at 5.5.

Using a Pythagorean exponent of somewhere between 4 and 6, which my earlier research has suggested, that gives an expected winning percentage of 87-95%. Interpolating my table would give an estimate of about 93%.
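For anyone who wants to check the numbers, the Pythagorean estimate is winning percentage = PF^k / (PF^k + PA^k), where PF and PA are points for and against and k is the exponent. The sketch below plugs in the expected score from above (15 to 9.3) at both ends of the exponent range; the function name is my own.

```python
def pythagorean_win_pct(points_for, points_against, exponent):
    """Expected winning percentage from points scored and allowed."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


if __name__ == "__main__":
    # Expected score from the estimate above: 15 goals for, 9.3 against.
    for k in (4, 5, 6):
        print(f"exponent {k}: {pythagorean_win_pct(15, 9.3, k):.1%}")
```

The low and high exponents bracket the 87-95% range quoted above.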

Also, IIRC, a 40 point difference in RRI translated to a 1 point difference in expected score.