Saturday, December 11, 2010

Interaction Effects and Diminishing Returns


This continues what’s turned into an ongoing conversation between me and Nathan Walker (aka @bbstats) of the basketball distribution, which started with my last post, continued over at Nathan’s site, and has been supplemented on Twitter.  I started by trying to find out the effect that turnovers have had on the Michigan State offense, which I did by calculating what was essentially an opponent-adjusted version of what Nathan later more intuitively converted to:

[Pts/Possession] – [Pts/(Possessions – TO)]

This tells us how much a team’s offensive efficiency would change if their turnovers all magically disappeared.  Turnovers are a very simple case: there can be only 1 or 0 turnovers on each possession; and when there is a turnover, a team never scores on that possession.  Contrast that with rebounds: in theory, a team could gain 20 offensive rebounds in one possession, yet not score; another team could score after every single offensive rebound.  The only way I could think of to track this kind of thing is to look at play-by-play data, which can get extremely time consuming, extremely quickly.
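Here's a minimal sketch of that turnover formula in Python, without the opponent adjustment. The function name and the box-score totals are made up for illustration; they are not Michigan State's or anyone else's real numbers.

```python
def turnover_impact(points, possessions, turnovers):
    """[Pts/Possession] - [Pts/(Possessions - TO)], in points per possession.

    The result is negative: it is the efficiency a team gives up to turnovers.
    """
    actual = points / possessions
    without_turnovers = points / (possessions - turnovers)
    return actual - without_turnovers


# Hypothetical totals: same possessions and turnovers, different scoring.
strong = turnover_impact(points=78.0, possessions=65, turnovers=13)
weak = turnover_impact(points=58.5, possessions=65, turnovers=13)

print(f"strong offense: {100 * strong:+.1f} points per 100 possessions")
print(f"weak offense:   {100 * weak:+.1f} points per 100 possessions")
```

With identical turnover counts, the higher-scoring team loses more efficiency, because each possession it keeps is worth more.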

Nathan came up with another way of looking at the efficiency impact of rebounding and the other Four Factors, though.  He published an Excel spreadsheet (in this post) that uses a regression equation to ask, for example, “What would Arizona’s predicted offensive efficiency be if we changed their eFG% to the league average value of 48.5%, and what’s the difference between that value and their actual efficiency?”


This is a pretty cool setup, in that you can quickly and easily see how each of a team’s Four Factors is contributing to their success.  But because it uses a regression to come up with a single parameter for each factor, which is the same across all teams, we lose what I was trying to get at in my post about the Spartans – a turnover for MSU costs more points than a turnover for, say, Alcorn State, because even when the Braves hold onto the ball, they still average under a point per possession.
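To make the single-parameter limitation concrete, here's a rough sketch of a main-effects-only model. Everything numeric is an assumption for illustration: the coefficients are invented (they are not the values in Nathan's spreadsheet), and only the 48.5% league-average eFG% comes from the discussion above. The function names are mine as well.

```python
# Hypothetical coefficients for a main-effects-only regression (NOT fitted values).
COEF = {"intercept": 30.0, "efg": 1.6, "to": -1.3, "oreb": 0.5, "ftrate": 0.15}
# eFG% average is from the post; the other three averages are placeholders.
LEAGUE_AVG = {"efg": 48.5, "to": 20.0, "oreb": 33.0, "ftrate": 38.0}
FACTORS = ("efg", "to", "oreb", "ftrate")


def predict(team):
    """Predicted offensive efficiency (points per 100 possessions)."""
    return COEF["intercept"] + sum(COEF[f] * team[f] for f in FACTORS)


def impact_vs_league_average(actual_efficiency, team, factor):
    """Actual efficiency minus the prediction with one factor set to league average."""
    swapped = dict(team)
    swapped[factor] = LEAGUE_AVG[factor]
    return actual_efficiency - predict(swapped)


def extra_turnover_cost(team):
    """Change in predicted efficiency when TO% rises by one point."""
    worse = dict(team)
    worse["to"] += 1.0
    return predict(worse) - predict(team)


# Two hypothetical teams that differ only in shooting.
good_shooters = {"efg": 55.0, "to": 18.0, "oreb": 33.0, "ftrate": 38.0}
poor_shooters = {"efg": 44.0, "to": 18.0, "oreb": 33.0, "ftrate": 38.0}

efg_impact = impact_vs_league_average(118.0, good_shooters, "efg")
print(f"eFG% impact for the good shooters: {efg_impact:+.1f}")
print(f"cost of one more TO% point, good shooters: {extra_turnover_cost(good_shooters):+.2f}")
print(f"cost of one more TO% point, poor shooters: {extra_turnover_cost(poor_shooters):+.2f}")
```

Both teams are charged the same amount for an extra point of TO%, no matter how well they shoot, which is exactly the limitation described above.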

To try to address this, I used Nathan’s same idea, but I included interaction effects in my regression.  That’s a statistics term which means that besides using eFG%, TO%, OReb%, and FTRate in my equation, I also used combined terms like [OReb% x TO%] or [eFG% x eFG%].  This lets me account for the fact that increasing your eFG% makes your TO% more important (because every turnover is now costing you more points).  Stats nerds who want to see the details of the regression, including significance, can click this image to enlarge:


Only some interactions were statistically significant, so only some were included in the equation.  Here they are, along with what the sign (+/-) of the parameter says about the interaction:

  • eFG% x TO% … this makes sense – the better you shoot, the more you’re losing when you turn it over.
  • eFG% x FTRate … this one is negative – the better you shoot from the field, the less important it is for you to get to the FT line.
  • eFG% x eFG% … this one surprised me – it’s negative, meaning there are diminishing returns as you increase your eFG%.  I’ve got no good explanation for this.
  • TO% x FTRate … again, the more free throws you shoot, the more points you’re losing when you commit a turnover.
  • TO% x OReb% … same story – a higher OReb% means a better offense, which means you lose more when you turn the ball over.
  • OReb% x OReb% … this one is positive, meaning there is an amplification effect - increasing your OReb% by 2% is more than twice as valuable as increasing it by 1%.  I’m guessing because an offensive board often leads to another shot, which leads to another opportunity for a rebound.

Note that TO% is up there in combination with all 3 other factors.  In fact, in one version of the equation, with some defensive stats included, TO% itself was not significant, but all the interactions were.  This makes perfect sense to me, as a turnover’s value really is dependent on what a team does on the non-turnover possessions.
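As a rough illustration of what the interaction terms do, here's a stripped-down sketch with only two of the factors. The coefficients are hypothetical, chosen just to have the signs discussed above (I'm assuming a negative eFG% x TO% term, and using the negative eFG% x eFG% term); they are not the fitted values from my regression.

```python
# Hypothetical coefficients, NOT the fitted values from the regression.
COEF = {
    "intercept": 20.0,
    "efg": 2.4,
    "to": -0.5,
    "efg_x_to": -0.02,    # assumed negative: better shooting makes turnovers costlier
    "efg_x_efg": -0.005,  # negative: diminishing returns on eFG%
}


def predict(efg, to):
    """Predicted efficiency from eFG% and TO% with two interaction terms."""
    return (COEF["intercept"]
            + COEF["efg"] * efg
            + COEF["to"] * to
            + COEF["efg_x_to"] * efg * to
            + COEF["efg_x_efg"] * efg * efg)


def turnover_slope(efg):
    """Change in predicted efficiency per one-point rise in TO%, at a given eFG%."""
    return COEF["to"] + COEF["efg_x_to"] * efg


def efg_slope(efg, to):
    """Change in predicted efficiency per one-point rise in eFG% (derivative)."""
    return COEF["efg"] + COEF["efg_x_to"] * to + 2 * COEF["efg_x_efg"] * efg


print(f"cost of a TO% point at 55 eFG%: {turnover_slope(55.0):+.2f}")
print(f"cost of a TO% point at 45 eFG%: {turnover_slope(45.0):+.2f}")
print(f"value of an eFG% point at 45 eFG%: {efg_slope(45.0, 20.0):+.2f}")
print(f"value of an eFG% point at 55 eFG%: {efg_slope(55.0, 20.0):+.2f}")
```

Unlike a model with a single parameter per factor, the slope on TO% now depends on how well the team shoots, and the squared term makes each additional point of eFG% worth a little less than the one before.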


OK, now, the reveal, right?  Wait, one more thing.  I’m also calculating the impact a different way.  Instead of calculating Impact as

[team’s actual raw efficiency] – [team’s predicted efficiency when setting one factor to league average]

I’m calculating it as

[team’s predicted efficiency using correct four factors values] – [team’s predicted efficiency when setting one factor to league average]

The reason for this is that the predicted efficiencies when using the correct four factors values can be off by up to ~5.5 points.  That error gets added into each factor’s impact when calculating it the first way, while it gets cancelled out in the second way.  I’m listing the error in a 5th column in the spreadsheet.  At first I was tempted to label it “Intangibles Impact,” but I imagine it’s mostly just a result of the random distribution of events in a game (e.g. whether a missed FT is the front end of a 1-and-1).
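A quick sketch of the difference between the two calculations, using made-up numbers rather than any real team's values (the function name and every figure below are hypothetical):

```python
def impacts(actual, predicted, predicted_with_avg):
    """Compare the two Impact definitions for every factor.

    actual             -- team's actual raw efficiency
    predicted          -- predicted efficiency using the team's own four factors
    predicted_with_avg -- factor name -> prediction with that factor at league average
    """
    error = actual - predicted
    first_way = {f: actual - p for f, p in predicted_with_avg.items()}
    second_way = {f: predicted - p for f, p in predicted_with_avg.items()}
    return first_way, second_way, error


# Hypothetical team: the model under-predicts its efficiency by 4 points.
first, second, error = impacts(
    actual=112.0,
    predicted=108.0,
    predicted_with_avg={"eFG%": 101.0, "TO%": 106.5, "OReb%": 108.5, "FTRate": 107.5},
)

for factor in first:
    print(f"{factor:7s} first way: {first[factor]:+5.1f}   second way: {second[factor]:+5.1f}")
print(f"error column: {error:+.1f}")
```

Every factor's impact under the first calculation is larger than under the second by exactly the model's error for that team, so the second way reports the error once, in its own column, instead of folding it into all four factors.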


Here’s a new version of Nathan’s Four Factor impact spreadsheet, using the new equation that contains the interaction effects, and using my alternate way of calculating Impact:

Excel file Four_Factors_Impact_v2.xlsx

For those of you who skipped down to this part, this shows how much of a team’s raw offensive efficiency is due to them excelling at or doing poorly in one of the Four Factors.  Here’s a one-line example showing that Kansas’s great offense is almost entirely due to their high eFG%, with a bit of help from TO%.  They’re pretty close to average in rebounding and getting to the free throw line:


Anyway, take a look, pick it apart, and leave some constructive criticism, because the only thing here I’m sure of is that it isn’t perfect.


  1. I don't know why I didn't check this before, but I am now realizing that these interaction terms don't change the values a whole bunch. They do bump some of them a whole point in some cases, but I'm not sure if that effect is worth the trouble.

  2. David, interesting. I think the four factors on each side are mostly orthogonal (i.e. not collinear), so I wouldn't expect to see much interaction. FWIW, I just did some regression analysis on the FF for the NBA. You might be interested:

  3. TheCity2 - Thanks for pointing me to your site. You did a nice job presenting your concepts in a way that makes it seem relevant and understandable to somebody who's not already familiar with the subject.