Great Expectations, Part III
Posted by Neil Paine on November 4, 2008
That's right, it's all expectations all the time here. To refresh everyone's memories, on Friday we looked at a very simple way to set up preseason expectations for each team using a linear regression model with the previous season's SRS. Then, yesterday we took that same dataset and added team minute-weighted age as a variable, which helped to (marginally) improve the model's fit.
Today, we're going to work on something a commenter named Mountain suggested last week, which is the incorporation of month-by-month performance as a predictor of improvement/decline in the following season. The idea here is pretty simple: we would expect a "breakout" team in year Y (the year we're setting expectations for) to have at least given some hint of their future greatness in the previous season (year Y-1). As the assumption goes, young, improving teams should theoretically get better as the season goes on; consequently, teams that play their best in the second half (specifically the months of March/April) may be good candidates to make "the leap" in the following season.
Is this true, though? Well, just like we did when testing the significance of SRS and age in year Y-1, we can build a regression model that uses month-by-month performance as variables and see if it improves our ability to create accurate expectations for year Y. And that's exactly what I did: I took every NBA season since 1962-63 and found each team's average margin of victory/defeat for each month of the season. Out of necessity, I lumped October games in with those of November, and grouped March & April into one combined "month" (teams didn't start playing regular-season games in April until 1975). Likewise, results from the lockout-shortened 1999 season were thrown out completely, as no games were played until February. Then I regressed wins in season Y on average age and the month-by-month point differentials in season Y-1. The resulting equation:
Wins_Y = 41 - (0.592 * agaa_Y-1) + (0.479 * oct/nov_Y-1) + (0.165 * dec_Y-1) + (0.297 * jan_Y-1) + (0.360 * feb_Y-1) + (0.564 * mar/apr_Y-1)
That's an interesting result, to say the least. Remember, the monthly variables all represent point differentials on the same scale, so we can gauge the relative importance of each term simply by comparing the coefficients -- and it appears Mountain was on to something: a team's performance in the final two months of the season is fairly important in predicting its wins the following season. I had feared that the "tanking" phenomenon we've witnessed in recent seasons would dampen the impact of late-season games (and it may well have done so), but their predictive power comes through nonetheless. Also of interest: performance in October/November was the second-most important factor in predicting success the next season, perhaps because each roster is relatively intact in the early going -- this year's Blazers notwithstanding, in-season injuries don't typically take a huge toll that early in the year.
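If you're curious how a regression like this could be set up at home, here's a minimal sketch using Python's statsmodels. The file and column names (team_seasons.csv, agaa, wins_next, and so on) are my own stand-ins, not the actual dataset used above:

```python
import pandas as pd
import statsmodels.api as sm

# One row per team-season: wins in year Y, plus minute-weighted age
# and average monthly point differentials from year Y-1.
# File and column names are hypothetical stand-ins.
df = pd.read_csv("team_seasons.csv")

predictors = ["agaa", "oct_nov", "dec", "jan", "feb", "mar_apr"]
X = sm.add_constant(df[predictors])   # intercept (the ~41 in the equation)
y = df["wins_next"]

results = sm.OLS(y, X).fit()
print(results.params)     # coefficients, e.g. ~0.564 for mar_apr
print(results.rsquared)   # ~0.4428 on the sample described above
print(results.pvalues)    # each term's statistical significance
```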
So that's the good news: Mountain's intuitive hypothesis about month-by-month performance looks to be true in terms of predicting wins the next season. The bad news, though, is that including this data doesn't really improve on Monday's SRS-plus-age model. The r² on that model, if you recall, was 0.4443, while the r² of the regression we just performed is 0.4428 -- a better predictor than the previous year's SRS alone, but not quite as good as the model that incorporates SRS and average age.
Why did this happen, since there is clearly a relationship between performance late in the year and success the following season? Mainly because we had to use unadjusted point differential and not SRS. Calculating simple ratings on a month-by-month basis is of questionable value because of the small sample sizes involved, so we had to use average point differential by month as variables -- and while that is a better metric than straight W%, it doesn't take into account opponent strength like the SRS does, and therefore doesn't describe a team's "true talent" as well as the SRS.
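For completeness, here's a rough sketch of how those unadjusted monthly margins could be computed from a raw game log -- the column names are assumptions on my part, with October lumped in with November and March/April combined, as described above:

```python
import pandas as pd

# One row per team-game; column names here are hypothetical.
games = pd.read_csv("game_log.csv", parse_dates=["date"])
games["margin"] = games["pts"] - games["opp_pts"]

# Mirror the groupings used in the regression: October goes in
# with November, and March/April form one combined "month".
month_group = games["date"].dt.month.map({
    10: "oct_nov", 11: "oct_nov", 12: "dec",
    1: "jan", 2: "feb", 3: "mar_apr", 4: "mar_apr",
})

# Average (unadjusted) point differential per team-season per month
monthly = (games.groupby(["season", "team", month_group])["margin"]
                .mean()
                .unstack())
print(monthly.head())
```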
Even so, here are the biggest overachievers since 1963-64 by the month-by-month regression:
Year  Team  agaa_Y-1  oct/nov_Y-1  dec_Y-1  jan_Y-1  feb_Y-1  mar/apr_Y-1  Wins  xWins  Diff
1998  SAS       2.22        -7.07    -5.00    -6.00   -11.21        -8.96  56.0   24.6  31.4
1980  BOS       1.23        -5.86     2.31    -7.53    -0.75        -9.48  61.0   30.0  31.0
2008  BOS      -3.07         0.14    -3.67    -5.50    -5.83        -2.76  66.0   37.0  29.0
1990  SAS      -1.27        -3.46    -6.50    -4.60   -15.38        -7.11  56.0   28.1  27.9
1972  LAL       1.65         5.10     3.59     2.88     5.94        -4.83  69.0   43.3  25.7
2005  PHO      -2.56        -0.07    -5.47    -4.71    -2.58        -4.86  62.0   36.5  25.5
1970  MIL      -0.17        -8.14    -6.37    -3.79    -0.47        -5.23  56.0   31.9  24.1
1996  CHI       0.92         1.92     6.00     4.60    -0.46         8.38  72.0   48.3  23.7
1989  PHO       0.65        -4.30    -2.40    -7.13    -3.14        -4.89  55.0   32.1  22.9
2002  NJN      -0.21        -1.67    -7.87    -6.47    -2.58        -5.65  52.0   33.0  19.0
And the biggest underachievers:
Year  Team  agaa_Y-1  oct/nov_Y-1  dec_Y-1  jan_Y-1  feb_Y-1  mar/apr_Y-1  Wins  xWins   Diff
1965  SFW      -0.89         4.00     3.40     6.44     5.78         6.20  17.4   51.5  -34.1
1997  SAS       1.87         6.00     7.60     5.00     5.57         6.78  20.0   51.3  -31.3
1999  CHI       4.01         3.00     9.07     5.81     8.38         8.96  21.3   51.4  -30.0
1983  HOU       2.14        -5.53    -0.08     0.92     6.86        -0.64  14.0   39.5  -25.5
2007  MEM       1.56         4.13     3.71     1.33    -1.77         7.68  22.0   46.8  -24.8
1973  PHI       1.63        -3.26    -5.50    -0.36    -4.79        -4.40   9.0   33.3  -24.3
1985  NYK       0.86         3.18     2.64     8.75     3.44         2.78  24.0   47.9  -23.9
2008  MIA       2.74        -6.80    -3.20     0.40     4.17         0.76  15.0   37.6  -22.6
1991  DEN       2.40         5.14     3.80    -1.54     0.62        -0.04  20.0   42.4  -22.4
1998  LAC      -2.18        -2.63    -4.50    -2.42     1.67        -2.82  17.0   38.6  -21.6
We're pretty much reshuffling the same teams we saw on our earlier lists at this point, although the 1997-98 Clippers join the ranks of the disappointments (who would have thought Loy Vaught, Bo Outlaw, and Malik Sealy were so important?). And as always, here are the model's expectations for this season:
Year  Team  agaa_Y-1  oct/nov_Y-1  dec_Y-1  jan_Y-1  feb_Y-1  mar/apr_Y-1  xWins
2009  BOS       1.15        13.73    13.86     5.67     5.38        11.44   59.3
2009  UTA      -1.46         8.82    -0.63     8.46     4.62        11.04   56.4
2009  LAL      -0.09         4.13     5.14     6.93    12.53         7.48   54.7
2009  ORL      -0.08         6.94    -0.20     2.93     3.77        10.73   52.6
2009  DET       1.49         5.64    14.65     0.47     9.50         6.54   52.5
2009  NOH       0.28         3.47     3.93    12.50     0.83         5.36   50.2
2009  HOU       1.48         2.00    -0.57     4.71    12.62         5.33   49.9
2009  DAL       2.12         4.94     1.73     8.14     1.64         5.65   48.6
2009  SAS       4.64         8.82     4.42     0.00     8.45         3.73   48.4
2009  PHO       2.49         5.38     5.93     6.56    -0.36         5.75   48.1
2009  TOR      -0.46         4.44    -0.81     7.38     7.92        -0.44   48.1
2009  DEN       1.74         3.82     2.77    -0.33     5.77         5.54   47.4
2009  GSW      -1.27         0.73     4.00     0.87     4.60         1.72   45.6
2009  PHI      -1.50        -2.80     1.00    -4.40     4.69         2.83   42.7
2009  CLE       0.33        -3.24    -4.36     5.29    -0.21         0.70   40.4
2009  WAS       0.46        -0.63     4.77    -0.20    -4.43        -0.58   39.2
2009  POR      -2.70        -5.31     5.53     2.50    -5.29        -1.70   38.8
2009  ATL      -2.61        -2.13     1.31    -3.60    -3.79        -1.08   38.7
2009  IND      -0.11        -1.76     0.40    -4.57    -4.38         1.30   38.1
2009  SAC       0.50        -4.60    -2.71     0.40    -2.57        -2.00   36.1
2009  CHI      -1.04        -8.31    -0.44    -1.38    -1.77        -3.88   34.3
2009  CHA      -0.45        -4.07    -6.07    -1.29   -14.67        -0.54   32.3
2009  NJN       0.58        -6.80    -3.13    -7.80    -0.15        -6.13   31.1
2009  MEM      -2.32        -3.00    -6.93    -3.44   -11.67        -6.88   30.7
2009  MIN      -2.13        -7.57    -9.81    -6.33    -4.92        -5.56   30.2
2009  MIL      -0.97        -2.79    -9.38    -7.71    -4.82        -8.00   30.1
2009  NYK      -1.09        -8.40    -8.00    -2.38    -4.62        -8.58   29.1
2009  LAC       1.78        -5.29    -4.67    -3.08    -5.64       -12.81   26.5
2009  MIA       0.62        -3.67    -6.81   -10.86    -7.08       -12.32   25.0
2009  OKC      -1.39        -8.59    -3.29   -10.67    -6.83       -11.83   24.9
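And if you'd like to double-check the model's arithmetic, plugging Boston's year Y-1 numbers from the table above into the regression equation reproduces their 59.3 expected wins (I'm using the published, rounded coefficients here, so tiny discrepancies are possible on other rows):

```python
# Published (rounded) regression coefficients from the equation above
coefs = {"agaa": -0.592, "oct_nov": 0.479, "dec": 0.165,
         "jan": 0.297, "feb": 0.360, "mar_apr": 0.564}

# Boston's 2007-08 (year Y-1) age and monthly point differentials
bos = {"agaa": 1.15, "oct_nov": 13.73, "dec": 13.86,
       "jan": 5.67, "feb": 5.38, "mar_apr": 11.44}

x_wins = 41 + sum(coefs[k] * bos[k] for k in coefs)
print(round(x_wins, 1))  # 59.3, matching the table
```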
And, yes, I promise that this is the last post on preseason expectations for a while. But we will probably revisit them later in the season, just to see which teams overperformed and which underperformed relative to our three regression models.
November 5th, 2008 at 3:47 am
Thanks for the substantial follow-through.
Appreciate the strong work.
November 6th, 2008 at 1:37 pm
Neil, what does the history say about the meaning of the different coefficients? Are they statistically significant? While the explanation for October/November makes sense, I can't see why December would be such a relatively poor predictor.
November 7th, 2008 at 12:23 am
They're all significant at the 5% level, but I don't really get it either... except that maybe performance during the midseason grind before the All-Star break isn't representative of players' and teams' true abilities. Baseball has its "dog days" in late July and August -- could the NBA's version be happening in December & January? Also, I think nagging injuries probably start to hit after a month or two of play. You play through them, but they definitely affect your performance.