Monday, March 1, 2010

Similarity Scores: Method

This is a comprehensive description of how I created the similarity scores used in a series of posts at Upon Further Review. If you like what you read below, please head over to UFR for more, or see all posts labelled Similarity Scores.

The first step in creating similarity scores is deciding which statistics to use. In this case I've decided to use Ken Pomeroy's tempo-free stats, both because he has an easily accessible database and because I think tempo-free stats are an improvement over traditional metrics. Using his data means a couple of things. First, I will be looking at teams from the 2003-04 season through today, because the stats go back no further. And second, I have about 40 categories to choose from, which is just too many (see the Scouting Report section here for the possible categories). I narrowed it down by only using stats I could get in .CSV form, which eliminated the "Strength of Schedule" and "Personnel" sections. I also threw out the point distribution categories, because those are basically a function of other categories (3PA/FGA, FTA/FGA, shooting percentages). That leaves the 27 categories below, grouped according to the skill they relate to (there's a quick code-style summary of the grouping right after the list):

Making/Preventing Shots (8 stats):
eFG% (off/def)
2P% (off/def)
3P% (off/def)
Block% (off/def)

Preventing/Causing Turnovers (4 stats):
Turnover% (off/def)
Steal% (off/def)

Rebounding (2 stats):
Off Reb% (off/def)

Making/Preventing FT's (4 stats):
FTA/FGA (off/def)
FT% (off/def)

"Style" categories (6 stats):
3PA/FGA (off/def)
A/FGM (off/def)
Defensive Fingerprint*
Adjusted Tempo

Overall Team Strength (3 stats):
Adjusted Efficiency (off/def)
Pythagorean Rating

*[kenpom.com lists this as just a category (e.g. "Mostly Man"), but the .CSV file I have shows an actual numerical value, so I used it. For a description of what goes into the value, see Ken's site]
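
Here's that grouping as a quick code-style summary. This is just a sketch: the column names are my own shorthand, not necessarily the exact headers in the kenpom .CSV files.

```python
# The 27 categories, grouped by the skill they relate to.
# Column names are placeholders, not the actual kenpom CSV headers.
CATEGORY_GROUPS = {
    "shots":       ["eFG%_off", "eFG%_def", "2P%_off", "2P%_def",
                    "3P%_off", "3P%_def", "Blk%_off", "Blk%_def"],
    "turnovers":   ["TO%_off", "TO%_def", "Stl%_off", "Stl%_def"],
    "rebounding":  ["OR%_off", "OR%_def"],
    "free_throws": ["FTA/FGA_off", "FTA/FGA_def", "FT%_off", "FT%_def"],
    "style":       ["3PA/FGA_off", "3PA/FGA_def", "A/FGM_off", "A/FGM_def",
                    "DefFingerprint", "AdjTempo"],
    "strength":    ["AdjOE", "AdjDE", "Pythag"],
}
```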

Next, I stole a page from Ken's book and converted all of the stats into Z-scores. This tells us how many standard deviations above or below average a team is in each category. For most offensive categories the best team is somewhere near 3.0, the worst is near -3.0, and the average is always 0.0. To calculate how "different" two teams are in a single category, I simply find the difference of their Z-scores.
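
If you want to see what that step looks like in code, here's a minimal sketch. It assumes the stats for every team-season are already loaded into a pandas DataFrame (one row per team-season, columns named as in the sketch above); the post doesn't specify whether the averages are taken per season or over the whole sample, so this version just pools everything.

```python
import pandas as pd

def to_z_scores(df: pd.DataFrame, categories: list) -> pd.DataFrame:
    """Convert each raw category into a Z-score: (value - mean) / std dev."""
    z = df.copy()
    for cat in categories:
        z[cat] = (df[cat] - df[cat].mean()) / df[cat].std()
    return z

def category_diff(z_a: float, z_b: float) -> float:
    """How 'different' two teams are in one category: the gap between Z-scores."""
    return abs(z_a - z_b)
```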

The next step is to decide how to weight these differences. The simplest method would be to just sum them all, but there are two problems with that. First, some categories are clearly less important than others. And second, when finding similar teams ("comps"), I want to focus on categories where the featured team is much better or worse than normal. To give an example, 2010 Kansas State gets to the FT line more frequently than any other team in the entire 7-year sample. When I look for teams that are similar, I want to put particular emphasis on that category, because it’s one of their defining characteristics. So, I decided to weight each category by a factor that is the product of two separate values:

[Overall Weight] = [Category Importance] x [Team Relevance]

To make Team Relevance relatively objective, I’m using the absolute value of the team's Z-score in the category in question, with one restriction: Team Relevance can be no lower than 0.5. (Without this restriction, categories where a team is average would not count at all, as their Z-score would be 0.)
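
In code form, the weighting factor is something like this (a sketch; the Category Importance values come from the list in the next section):

```python
def team_relevance(z_featured: float) -> float:
    """How extreme the featured team is in a category, floored at 0.5
    so that average categories still count a little."""
    return max(abs(z_featured), 0.5)

def overall_weight(category_importance: float, z_featured: float) -> float:
    """[Overall Weight] = [Category Importance] x [Team Relevance]."""
    return category_importance * team_relevance(z_featured)
```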

Now, Category Importance. This is where it gets waaaaay more subjective. I played around with this a lot until I finally arrived at a set of weights where the comparisons pass the sniff test AND I can semi-justify the weights themselves philosophically. Let's show that list of categories again, except this time I'll include the individual weights I've settled on, along with the total weight for each section. I'll also re-order it so the sections are in decreasing order of importance (and I've collected the same weights into a code-style lookup table after the footnotes):

"Style" categories (4.5 pts):
Defensive Fingerprint ... 0.5**
Adjusted Tempo ... 1
3PA/FGA (off) ... 1
3PA/FGA (def) ... 0.5
A/FGM (off) ... 1
A/FGM (def) ... 0.5

Overall Team Strength (3 pts):
Pythagorean Rating ... 1
Adjusted Efficiency (off) ... 1
Adjusted Efficiency (def) ... 1

Making/Preventing Shots (3 pts):
eFG% (off) ... 0.5
eFG% (def) ... 0.5
3P% (off) ... 0.5
3P% (def) ... 0.5
2P% (off) ... 0.5
2P% (def) ... 0.25
Block% (off) ... 0
Block% (def) ... 0.25***

Preventing/Causing Turnovers (2 pts):
Turnover% (off) ... 1
Turnover% (def) ... 0.5
Steal% (off) ... 0
Steal% (def) ... 0.5****

Rebounding (2 pts):
Off Reb% (off) ... 1
Off Reb% (def) ... 1

Making/Preventing FT's (1 pt):
FTA/FGA (off) ... 0.25
FTA/FGA (def) ... 0.5
FT% (off) ... 0.25
FT% (def) ... 0*****

**[roughly half of the "Defensive Fingerprint" score comes from 3PA/FGA and A/FGM, which is why the defensive versions of those are weighted less than the offensive versions.]

***[I split the weight for defensive 2P% in half and gave the other half to blocks because when I looked at the best teams in terms of defensive 2P%, those with lower Block% seemed to be teams who faced a weaker schedule and had a worse adjusted defensive efficiency. This made me think that the blocks were an important clue for which teams were actually causing low 2P%, and which were merely benefitting from bad-shooting opponents. I didn't do the same for offensive 2P% and Block% because it didn't seem to correlate as well. This is something I want to look at in more detail later.]

****[I gave credit for defensive Steal% but not offensive Steal%, simply because fans and commentators think about steals from the defense's perspective. Nobody ever says "Team A allows their opponents to steal it too much." It's always "they commit too many turnovers."]

*****[I split offensive FT weights into FTA/FGA and FT%, but didn't do the same for defense, because generally a team doesn't control how well their opponents shoot from the line. They can force the issue a bit by fouling bigs more than guards, but I suspect the effect is small.]
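
For reference, here are those same weights collected into a lookup table, keyed by the placeholder column names from the earlier sketches:

```python
# Category Importance values from the list above.
CATEGORY_IMPORTANCE = {
    # "Style" categories (4.5 pts)
    "DefFingerprint": 0.5, "AdjTempo": 1.0,
    "3PA/FGA_off": 1.0, "3PA/FGA_def": 0.5,
    "A/FGM_off": 1.0, "A/FGM_def": 0.5,
    # Overall team strength (3 pts)
    "Pythag": 1.0, "AdjOE": 1.0, "AdjDE": 1.0,
    # Making/preventing shots (3 pts)
    "eFG%_off": 0.5, "eFG%_def": 0.5, "3P%_off": 0.5, "3P%_def": 0.5,
    "2P%_off": 0.5, "2P%_def": 0.25, "Blk%_off": 0.0, "Blk%_def": 0.25,
    # Preventing/causing turnovers (2 pts)
    "TO%_off": 1.0, "TO%_def": 0.5, "Stl%_off": 0.0, "Stl%_def": 0.5,
    # Rebounding (2 pts)
    "OR%_off": 1.0, "OR%_def": 1.0,
    # Making/preventing FT's (1 pt)
    "FTA/FGA_off": 0.25, "FTA/FGA_def": 0.5, "FT%_off": 0.25, "FT%_def": 0.0,
}
```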

Now that we're past all the asterisks, here's where you guys rip me to shreds for manipulating the data. OK, so it's not quite that bad, but I'll admit that I haven't found a way to eliminate the subjectivity. I tried tying the weights to how well each stat correlated with Pythagorean rating, or with NCAA tourney wins, but then I ended up with zero weight on style-related categories like 3PA/FGA. Any suggestions for how to make this less subjective are welcome.
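
If you'd like to play with that correlation idea yourself, the rough shape of it is below. This is only an illustration of the approach, not the exact calculation I ran, and it uses the placeholder column names from above.

```python
import pandas as pd

def correlation_weights(z: pd.DataFrame, categories: list) -> dict:
    """Weight each category by how strongly its Z-score tracks Pythagorean
    rating across all team-seasons; style stats like 3PA/FGA land near 0."""
    return {cat: abs(z[cat].corr(z["Pythag"])) for cat in categories}
```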

Anyway, so now we have the weights, we have the teams, and we have the stats. The last thing I did was a simple normalization so that the similarity between a theoretical "best at everything" team and a "worst at everything" team would be 0, and the similarity of a team with itself would be 100. Since no real team is best (or worst) at everything, in practice the most dissimilar pairs usually end up in the 40s and 50s.
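
Putting the pieces together, one reasonable reading of the whole calculation looks like the sketch below. The "best/worst at everything" endpoints are hypothetical teams holding the most extreme Z-score in every category; the post doesn't spell out exactly how those endpoints interact with the Team Relevance factor, so treat this as an approximation rather than the exact formula.

```python
import pandas as pd

def raw_distance(z_featured: pd.Series, z_other: pd.Series, importance: dict) -> float:
    """Weighted sum of Z-score gaps between the featured team and another team."""
    total = 0.0
    for cat, cat_importance in importance.items():
        relevance = max(abs(z_featured[cat]), 0.5)   # Team Relevance, floored at 0.5
        total += cat_importance * relevance * abs(z_featured[cat] - z_other[cat])
    return total

def similarity(z_featured: pd.Series, z_other: pd.Series,
               z_best: pd.Series, z_worst: pd.Series, importance: dict) -> float:
    """Rescale so a team vs. itself scores 100 and the hypothetical
    'best at everything' vs. 'worst at everything' pair scores 0."""
    worst_case = raw_distance(z_best, z_worst, importance)
    return 100.0 * (1.0 - raw_distance(z_featured, z_other, importance) / worst_case)
```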
Now that you know the process, head (back) over to UFR and check out the comps for KU, KSU, and MU, or see my HTB post on the rest of the Big 12 teams. Also, use the comments here for discussion and criticism of the method, or to request a team whose comps you'd like to see. Any team from the 2003-04 season through today is possible.

[NOTE: After basically finishing my method, I found some discussion of others doing the same thing, only with fewer categories. If you like this post, you should check out these others. See here and here.]
