Saturday, March 6, 2010

Good Offense Beats Good Defense?

I suspect I’m like a lot of you in that I often have more faith than I ought to in Pomeroy’s adjusted efficiency numbers, and more specifically in the single game predictions.  I sometimes take it as gospel that, given a team’s adjusted offensive efficiency and their opponent’s adjusted defensive efficiency, and no other information, the best prediction we can make for the team’s offensive efficiency is:

Predicted Off Eff = (Team Off Eff + HFA) × (Opponent Def Eff + HFA) / Avg Eff

This is the formula used as the basis both for creating the adjusted numbers from the raw numbers and for the individual game predictions shown on kenpom.com.  In general it does a very good job of predicting game efficiencies – there’s the gospel part – but I wondered if it might break down at the extremes.  Why?  Simply because close games demand a different level of effort from a team than blowouts do.  When Kansas played Alcorn State earlier this year, the game was over before it started, and I wouldn’t have blamed a soul on either team if they didn’t give 100% effort that night.  Maybe this evens out in the end – the offense plays at 90% effort, the defense plays at 90% effort, and it cancels out.  But maybe offense is more fun, so players don’t slack off as much at that end.  Or maybe effort matters more to offensive rebounding than to defensive rebounding, so offensive efficiency suffers more than defensive.  At any rate, I wanted to check, so I did what I do – dumped thousands of data points into an Excel spreadsheet and made some pretty charts.
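For concreteness, here is that prediction as a small Python function.  This is just my rendering of the formula above (the post's actual work was done in Excel), and the argument names are mine:

```python
def predict_off_eff(team_off_eff, opp_def_eff, avg_eff, hfa=0.0):
    """Predicted offensive efficiency (points per 100 possessions) for one
    team in one game.  `hfa` is the home-floor adjustment added to each
    efficiency: positive when the offense is at home, negative on the
    road, zero on a neutral floor."""
    return (team_off_eff + hfa) * (opp_def_eff + hfa) / avg_eff

# A 115 adjusted offense against a 95 adjusted defense, neutral floor,
# with a national average efficiency of 100:
predict_off_eff(115, 95, 100)  # -> 109.25
```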

I took the offensive and defensive efficiencies of all the teams from 2009 and grouped them into fifths by their percentile rank.  For example, Utah State was ranked 17th in offense, but 158th in defense, so their offense would be in the “top 20%” group and their defense would be in the “average” group.  I then took every game from the 2009 season and calculated the predicted offensive efficiency for each team, using the formula above.  I subtracted that from the actual recorded efficiency to get a value for how much the offense over- or under-performed in that game.  Then I binned those numbers according to the offensive and defensive groups to produce the chart below.
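That grouping-and-binning step can be sketched in Python as follows.  The field names (`off_rank`, `actual_eff`, etc.) and the record layout are my assumptions, not the post's actual spreadsheet columns:

```python
from collections import defaultdict

def quintile(rank, n_teams):
    """Map a national rank (1 = best) to a fifth: 0 = top 20%, ..., 4 = bottom 20%."""
    return min(4, (rank - 1) * 5 // n_teams)

def cell_averages(games):
    """games: one record per team per game, holding that team's season-long
    national ranks, the number of D-I teams, and the actual and predicted
    offensive efficiencies for the game.  Returns the average over- or
    under-performance for each (offense group, defense group) cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g in games:
        key = (quintile(g["off_rank"], g["n_teams"]),
               quintile(g["def_rank"], g["n_teams"]))
        sums[key] += g["actual_eff"] - g["predicted_eff"]
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Utah State's 2009 profile (~344 D-I teams): 17th in offense lands in
# group 0 (top 20%), 158th in defense lands in group 2 (average).
```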

[Chart: average actual minus predicted offensive efficiency, binned by offense quality (rows) and defense quality (columns)]

To read this, look up the quality of the offense on the left, and the quality of the defense at the top – the cell where their respective row and column intersect shows how the offense performed on average, compared to what the efficiency formula predicted (in points per 100 possessions).

You can see a clear pattern – the highest values are in the upper left and lower right corners, with positive values strung out between them, while the other two corners are severely negative.  What this says is that when both the offense and the defense were very good or very bad, the offense did better than you’d expect.  On the other hand, when there was a severe mismatch – either the offense was way better than the defense, or the defense was way better than the offense – the offense did worse than expected. (Keep in mind that the quality of the offense/defense should already be taken into account by the formula).

Why would this pattern exist?  I’m sure it’s a combination of reasons, but I think the main underlying factor might be how close the game is.  Those negative corners should be where the blowouts are concentrated, while the positive areas are between more closely matched teams.  It seems natural that either A) offenses play a little sloppier once a game is lopsided, or B) once the scrubs take the court, their poor shooting and execution lowers offensive efficiency.

Hopefully soon I can take a look at the numbers from other years and see if the pattern is consistent.

Tuesday, March 2, 2010

Similarity Scores: Title Contenders, Part 2

Now that we've seen the historical comps for the current AP Top 10, let's take a look at the 2010 comps for the 6 teams that I placed in the top tiers at the end of the last post: Kansas, Syracuse, Kentucky, Duke, Villanova, and Ohio State. [Again, data is from the games through March 2, 2010.]

When I first looked at 2010 comps over at UFR, the most similar team to Kansas was Kentucky, which made for a great storyline.  Since then, however, I've updated the data, and Kentucky has dropped a touch.  They're still in the top ten, but not at the #1 spot.

2010 Kansas - 2010 Comps

SCORE  YR    TEAM
91     2010  Maryland
91     2010  Minnesota
91     2010  Baylor
90     2010  Kentucky
89     2010  Texas
89     2010  Brigham Young
89     2010  Duke
89     2010  Syracuse
88     2010  Ohio St.
88     2010  Georgetown

Maryland is, unfortunately, a team I don't know much about this year.  However, I'll get a chance to watch them tomorrow while I'm wishing I could watch Kansas State @ Kansas.  Still, a score of 90 for Kentucky is reasonably similar, and they're the most similar of the elite teams.

Similarity Scores: Title Contenders

This is part 3 (or 5, depending on whether you count the posts at UFR) of the Similarity Scores series. The post on how the scores are calculated is here, in case you missed it.  Tonight I'll be taking a look at which teams from the past are most similar to the current AP Top 10, and how those teams fared in the NCAA tournament.

[Data is from games through March 1, 2010.]

The Big 12 post ended up checking in at a Posnanskian length, which is usually a bad thing for anyone other than Joe, so I'm going to keep the commentary to a minimum this time.  Also, there will be a couple of changes to the lists themselves.  First, I'm only including teams that made the NCAA tournament.  If they weren't good enough to make it, they can't be THAT good of a comp.  And second, I'm adding two new columns to these tables: NCAA seed, and PASE (Performance Against Seed Expectation).  PASE shows to what extent each team exceeded or fell short of expectations, relative to their seed in the big dance.
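PASE is simply actual tournament wins minus the historical average wins for that seed line.  A minimal sketch, where the expected-wins values are back-solved from the seed/wins/PASE rows in this post's own tables (so treat them as approximations, and note only seeds 1-5 appear here):

```python
# Historical average NCAA tournament wins by seed line (approximate values,
# inferred from the PASE figures in the tables in this post).
EXPECTED_WINS = {1: 3.4, 2: 2.4, 3: 1.9, 4: 1.5, 5: 1.1}

def pase(actual_wins, seed):
    """Performance Against Seed Expectation: actual tournament wins minus
    the average wins for that seed line."""
    return actual_wins - EXPECTED_WINS[seed]

# 2008 Kansas: 6 wins as a 1 seed works out to a +2.6 PASE.
```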

2010 Syracuse - Historical Comps

SCORE  YR    TEAM            SEED  W's  PASE
90     2009  Syracuse        3     2     0.1
88     2005  Syracuse        4     0    -1.5
88     2008  Kansas          1     6     2.6
88     2006  Kansas          4     0    -1.5
87     2005  North Carolina  1     6     2.6
87     2007  Kansas          1     3    -0.4
87     2006  Florida         3     6     4.1
86     2007  Georgetown      2     4     1.6
86     2004  Providence      5     0    -1.1
86     2008  Georgetown      2     1    -1.4
             Average         2.6   2.8   0.5

Pretty all-or-nothing here - 3 champs, and 3 first round upsets.  But notice that the upsets are all 4/5 seeds, meaning they may have had good numbers, but they apparently didn't take care of business as well as this year's Orangemen.  Limit it to seeds 1 through 3, and you're looking at an average of 4 wins, and a +1.3 PASE.  Or, looking at just 1 seeds, where 'Cuse expects to end up this year, we see 5 wins and a +1.6 PASE.

Monday, March 1, 2010

Similarity Scores: Big 12

I introduced team similarity scores yesterday (summary at UFR, complete description here), and today I’ll be applying them to the Big 12.  If you’re here via UFR, you’ve already seen the lists for Kansas, Kansas State, and Missouri, so you can skip down to Texas.  Everybody else, dig in.

2010 KANSAS

2010 Kansas - Historical Comps

SCORE  YR    TEAM         NCAA W's
94     2007  Kansas       3
94     2005  Louisville   4
93     2008  Kansas       6
93     2004  Cincinnati   1
93     2004  Connecticut  6
92     2008  Memphis      5
92     2007  Texas A&M    2
92     2004  Gonzaga      1
92     2009  Gonzaga      2
92     2006  Florida      6

The 2004 and 2009 Gonzaga teams are on this list partly because they have great raw stats from beating up on the WCC.  Ditto for 2004 Cincy, as they were in Conference USA back then.

As you'd expect, there are a couple Kansas teams on here, as Bill Self has a distinct style that he's installed in Lawrence: great interior defense with lots of steals and blocks (see 2004 UConn, 2008 Memphis) and great shooting and good rebounding on offense (see 2006 Florida, 2005 Louisville).  What may be surprising to some is that last year's 2009 Kansas team is not in the top 10.  That's because this year's team has improved almost literally across the board.

Also, it's nice to see 3 champions and 2 other Final Four teams on there (and 2005 North Carolina was close, at 91).  You might think any team rated this highly will automatically have a bunch of great comps, but that's not quite true.  One of the closest teams in that dataset to 2010 KU, by Pythag rating, is 2005 Duke:

Similarity Scores: Method

This is a comprehensive description of how I created the similarity scores used in a series of posts at Upon Further Review. If you like what you read below, please head over to UFR for more, or see all posts labelled Similarity Scores.

The first step in creating similarity scores is deciding which statistics to use. In this case I’ve decided to use Ken Pomeroy's tempo-free stats, both because he has an easily accessible database and because I think tempo-free stats are an improvement over traditional metrics. The use of his data means a couple of things. First, I will be looking at teams from the 2003-04 season through today, because the stats go back no further. And second, I have about 40 categories to choose from, which is just too many (see the Scouting Report section here for the possible categories). I’ve narrowed it down by only using stats I could get in .CSV form, which eliminated the "Strength of Schedule" and "Personnel" sections. I also threw out the point distribution categories, because those are basically a function of other categories (3PA/FGA, FTA/FGA, shooting percentages). That leaves 27 categories, which I grouped according to the skill they relate to:

Making/Preventing Shots (8 stats):
eFG% (off/def)
2P% (off/def)
3P% (off/def)
Block% (off/def)

Preventing/Causing Turnovers (4 stats):
Turnover% (off/def)
Steal% (off/def)

Rebounding (2 stats):
Off Reb% (off/def)

Making/Preventing FT's (4 stats):
FTA/FGA (off/def)
FT% (off/def)

"Style" categories (6 stats):
3PA/FGA (off/def)
A/FGM (off/def)
Defensive Fingerprint*
Adjusted Tempo

Overall Team Strength (3 stats):
Adjusted Efficiency (off/def)
Pythagorean Rating

*[kenpom.com lists this as just a category (e.g. "Mostly Man"), but the .CSV file I have shows an actual numerical value, so I used it. For a description of what goes into the value, see Ken's site]
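As a rough illustration of how category lists like these can be turned into a 0-100 score: the exact formula lives in the linked method post, not here, so everything below is an assumption – I standardize each category by a league-wide standard deviation, take an RMS distance between the two teams, and map it onto a 0-100 scale (the `scale` constant and the dict-of-stats representation are both mine):

```python
import math

def similarity_score(team_a, team_b, league_sd, scale=10.0):
    """Hedged sketch only, not the post's actual method.  team_a/team_b map
    category names to values; league_sd maps each category to its
    standard deviation across all teams.  Identical profiles score 100,
    and the score falls as the standardized distance grows."""
    cats = list(team_a)
    dist = math.sqrt(sum(((team_a[c] - team_b[c]) / league_sd[c]) ** 2
                         for c in cats) / len(cats))
    return max(0.0, 100.0 * (1.0 - dist / scale))
```

One design note: standardizing by the league spread keeps categories on different scales (e.g. eFG% vs. adjusted tempo) from dominating the distance.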