Tuesday, August 15, 2006

softball season, with stats

The Cougars finished up our softball season on Monday this week, getting knocked out in the first round of the B Division playoffs in the Sudbury Men’s Softball modified fast-pitch league. We gave up five runs in the last of the seventh to lose 21-20 after getting crushed 27-11 in Game 1 of the best of three.

We almost duplicated the Miracle of Castel di Sangro by getting promoted two years in a row. As longtime readers will remember, we won the C division last year to earn promotion to B. We started this year 1-3 amid a bunch of rainouts and began to fear that we would be relegated again, but won a couple of close ones, then ran off our last six games to finish 10-6. Unfortunately for us, our rivals for first place won their last game to finish in a tie and took the tiebreaker.

We averaged almost exactly two runs per inning this year with remarkably little power. We probably got outhomered by a ratio of 2 or 3 to 1 this year (gave up 5 or 6 in our last playoff game versus none for us) but got on base well. We had a line (avg/OBP/slg) of .419/.475/.569; our opponents were probably .350/.400/.600 or so. My own line was .535/.549/1.070; a late surge in walks spared me the dubious distinction of having an OBP lower than my average (sac flies count in the OBP denominator but not in at-bats). It’s Ichiro (but even more so) vs. Manny Ramirez.
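For anyone who wants to check how an OBP can dip below a batting average, here is a minimal Python sketch of the slash-line arithmetic. The function name and the counts in the example are hypothetical, not the Cougars' actual totals, and hit-by-pitch is left out on the assumption that it is not tracked.

```python
def slash_line(ab, hits, doubles, triples, homers, walks, sac_flies):
    """Return (avg, obp, slg) from counting stats.

    Assumption: hit-by-pitch is not tracked, so it is omitted from OBP.
    """
    singles = hits - doubles - triples - homers
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    avg = hits / ab
    # Sac flies appear in the OBP denominator but are not at-bats.
    obp = (hits + walks) / (ab + walks + sac_flies)
    slg = total_bases / ab
    return avg, obp, slg


# Hypothetical hitter: no walks, two sac flies.
avg, obp, slg = slash_line(ab=40, hits=20, doubles=4, triples=1,
                           homers=3, walks=0, sac_flies=2)
print(f"{avg:.3f}/{obp:.3f}/{slg:.3f}")
```

With no walks, the two sac flies add to the OBP denominator without adding to the numerator, so OBP lands below the average. A few walks are enough to push it back above.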

One of the quests of baseball statistics (and all sports statistics) is to take the individual actions and figure out how they contribute to the greater good. There are many, many, many stats that do this for major league baseball, but they generally have problems dealing with extreme cases. Do you consider what a team of that player would score, or do you insert a player into a team of league average players, or do you replace the player on his actual team with an average player? For most players, there is not much difference between these methods, but if you have a Barry Bonds, it matters a lot.

Such is the problem with evaluating the Cougars. I tried using a couple run estimators (Bill James’ Runs Created, linear weights) and they dramatically underestimated how many runs we should have scored, and I’m not sure how to go about reconciling the difference. It’s probably due to a combination of the high OBP/low power offense and to having the power concentrated in the bats of a few. About ¾ of the at-bats were taken up by guys who had 140 singles, 4 doubles, and 3 singles (.378/.440/.424). I don’t think the problem is due to plays scored as errors, since our total of at-bats minus hits is pretty close to the number of outs we have made (using innings played). Adding wild pitches would get us back about a run per game, but that’s still not nearly enough to bridge the gap.
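As a rough illustration of what a run estimator does, here is a sketch of the basic form of Runs Created, (H + BB) × TB / (AB + BB). The post does not say which version of the formula was used, so treat this as one plausible variant. The inputs below are made up for illustration and are not the team's stats.

```python
def runs_created_basic(ab, hits, walks, total_bases):
    """Basic Runs Created: (on-base events * total bases) / opportunities.

    This is the simplest published form; later versions add terms for
    steals, sacrifices, and so on, which are ignored here.
    """
    return (hits + walks) * total_bases / (ab + walks)


# Hypothetical team totals, for illustration only.
estimate = runs_created_basic(ab=100, hits=40, walks=10, total_bases=55)
print(round(estimate, 1))
```

Comparing an estimate like this against actual runs scored is the kind of check described above. A large gap suggests the formula's built-in assumptions do not fit the league.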

It’s not really going to help pick an MVP or anything like that, since small sample size and luck overwhelm many differences (a homer in a 40 at-bat season adds 100 points to slugging, for instance). But it probably could reveal something about the optimal strategies at this level. How often does a bunter need to get on base in order for it to make sense (note: I have never seen a sacrifice bunt attempt in this league, only bunts for hits)? Should I start uppercutting in order to hit more home runs (at the expense of other hits)? Should I start swinging down in order to get more singles and reached on errors (at the expense of power)? How good does a hitter have to be in order for it to make sense to walk him every time (although of course I would call that a pussy move)?

Anyway, Jim through the years:
2006: .535/.549/1.070
2005: .439/.465/.756
2004: played 2 games
2003: .536/.567/1.000
All: .508/.554/.958

I do not remember grounding out this year, and can only remember a few ground balls at all. Most of my outs came on poorly-hit fly balls or popups. About ¾ of my well-struck balls were line drives, with the others about split between ground balls and fly balls, and it seems that 80-90% of those fell in for hits. I did hit a few balls to the right side this year, including a fielder-aided home run, but those were all mistakes. Real men, if indeed they play softball, pull the ball.


Anonymous said...

wow, it's amazing what we'll read with no new frisbee posts

parinella said...

Oh yeah? Well, stay tuned then for a shot by shot recount of my last round of golf.

Tybor said...

Estimators like Runs Created won't work for your softball league because your league's run environment is vastly different from the MLB run environment from which they were derived. Bill James' Runs Created only "works" when the on-base percentage is between .300 and .400. There is limited external validity to his findings - you can't generalize them to your very high run-scoring environment.

You should look into BaseRuns as created by Dave Smyth, which will allow you to appropriately account for the effect of various offensive events in your appropriate environment.

TangoTiger has written extensively about this concept, both on his website (How Runs are Really Created) and in "The Book" by Tom Tango. He also discusses how you can appropriately calculate the impact of an individual star player in a line-up. (Obviously, Runs Created won't work here either, because the player is outside of the .300 - .400 range. Applying run estimators derived from a league to the stats of individual players is an example of the Ecological Fallacy.)
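For reference, one commonly cited basic form of BaseRuns can be sketched in a few lines of Python. The coefficients below are the ones usually quoted for the simple version and were fitted to major league data, so they are an assumption here and would likely need retuning for a softball league. The function name and the sample inputs are hypothetical.

```python
def base_runs_basic(ab, hits, walks, homers, total_bases):
    """Basic BaseRuns: runs = A * B / (B + C) + D.

    A = baserunners, B = advancement, C = outs, D = automatic runs.
    The B coefficients are the commonly quoted MLB-fitted values
    (an assumption; not tuned for any other run environment).
    """
    a = hits + walks - homers
    b = (1.4 * total_bases - 0.6 * hits - 3 * homers + 0.1 * walks) * 1.02
    c = ab - hits
    d = homers
    return a * b / (b + c) + d


# Hypothetical totals, for illustration only.
print(base_runs_basic(ab=100, hits=40, walks=10, homers=2, total_bases=55))

# Sanity check: if every hit is a homer and nobody walks,
# there are no baserunners, so runs equal home runs.
print(base_runs_basic(ab=10, hits=3, walks=0, homers=3, total_bases=12))
```

The B / (B + C) term is the fraction of baserunners who score, which is what lets the estimate stay sensible at extreme on-base rates where a fixed formula breaks down.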

parinella said...

Since you asked...

I looked at BaseRuns. The customized linear weights table on TangoTiger's site only goes up to a 10 RPG environment (per 9 innings) while my league is more of an 18 RPG environment. I tried curve-fitting to find the proper linear weights for that environment, but ended up with values that gave a team total of like -100 runs. (An out would be worth -1.05 runs, for instance.) We don't record WP/PBs, which would probably be worth about 1 RPG. Errors would also be worth 1-2 RPG, but we're still off by more than 50 runs, and that's before subtracting DPs and baserunning blunders.

I don't remember reading about that in The Book.

Julian said...

3 things:

1) Miracle of Castel di Sangro! Fantastic book, thanks for the reminder.

2) I can't believe I'm reading about your softball league. My life is so small...

C) I was gonna point you to Baseball Prospectus for some advanced metrics that might be useful, but if you know who tangotiger is, you probably don't need my help.


Tangotiger said...

Start with the basic BaseRuns equation here:

Once you understand that, try the more complicated one here:

If that still doesn't work, you need to tweak some of the coefficients so it fits to your environment.

If you can email me your league stats, I'd be interested to take a look. Also tell me how you handle reaching on errors, and how often those happen.

Tangotiger said...

I address your league here:

And, you may find this useful as well: