Great Expectations, Part II
Posted by Neil Paine on November 2, 2008
On Friday, in an effort to establish preseason "expectations" for each team, we built a very simple model for projecting future performance at the team level. We included both W-L records and SRS scores from the 5 previous seasons in a linear regression, and we discovered that in both cases the only past season that's significant at the 5% level is the year directly before the one we're trying to predict (year "Y-1"). We also found that past SRS scores (which are essentially average scoring margins, but adjusted for strength of schedule) better predict future W-L than past W-L do -- which just confirms what people like John Hollinger have been saying for a long time. Finally, using our regression model, we ranked the biggest positive ('08 Celtics) and negative ('99 Bulls) surprises in NBA history, and showed what the model predicted for the current season as well.
However, the model's fit (r^2 of 0.44) still left something to be desired, and two commenters (Ben and Mountain) had some suggestions that could potentially make our set of "expectations" more accurate. Ben said:
One variable to consider adding would be the previous year’s win share weighted average age. That could be one number that might pick up the direction a team’s headed in.
And I think that's a pretty good idea. After all, if two teams (one old and one young) had similar SRS scores in a season, you would naturally expect the younger team to improve the next season and the older one to decline -- but our old model would make no distinction between the two teams. So let's add minute-weighted (not Win Share-weighted, for reasons explained here) team age into the mix as a variable.
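To make "minute-weighted" concrete, here is a minimal Python sketch of the calculation. The three-player roster is made up purely for illustration, and the function name is an assumption of this example rather than anything from the site's actual code.

```python
def minute_weighted_age(players):
    """Average age weighted by minutes played.

    `players` is a list of (age, minutes) pairs; the name and shape of
    this input are illustrative assumptions.
    """
    total_minutes = sum(minutes for _, minutes in players)
    if total_minutes == 0:
        raise ValueError("roster has no minutes played")
    return sum(age * minutes for age, minutes in players) / total_minutes


# Hypothetical roster (NOT real data): one veteran who plays a lot,
# two younger players who play less.
roster = [(34, 3000), (24, 2000), (22, 1000)]

weighted = minute_weighted_age(roster)
simple = sum(age for age, _ in roster) / len(roster)

print(f"minute-weighted age: {weighted:.2f}")  # 28.67
print(f"simple average age:  {simple:.2f}")    # 26.67
```

The weighted figure comes out two years older than the simple average because the veteran logs half of the minutes, which is the kind of distinction a plain roster average would miss.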
Then Mountain, always a valuable source for fresh ideas, had this to say:
I wonder what you’d find if you weighted previous year performance by month in a way that gave somewhat greater weight to later months.
I like this thought too, because intuitively you would expect a young/"up-and-coming" team to improve from month to month, while an old team might clue us in to an impending collapse if their performance declined over the course of the season.
Anyway, over the next two posts, let's put both of these suggestions to the test and see if they help us build a better system. The model I'll create today is one that basically extends our model from Friday -- instead of just using SRS from the previous season (SRS_Y-1) as our lone variable, we'll also add the previous year's minute-weighted age minus the league's average age (agaa_Y-1) to the equation. Regressing wins in year Y on those two variables for every season since 1963, when the NBA started tracking split-season stats for traded players, we get this equation:
wins_Y = 41 + (1.956 * SRS_Y-1) - (0.549 * agaa_Y-1)
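As a quick sanity check, the equation can be applied directly in a few lines of Python. The coefficients are the ones in the equation above, and the sample rows are taken from the tables in this post; the function and variable names are just illustrative choices.

```python
# Coefficients from the fitted equation in this post.
INTERCEPT = 41.0
SRS_COEF = 1.956
AGE_COEF = -0.549


def expected_wins(srs_prev, agaa_prev):
    """Expected wins in year Y from year Y-1 SRS and age above average."""
    return INTERCEPT + SRS_COEF * srs_prev + AGE_COEF * agaa_prev


# (year, team, srs_Y-1, agaa_Y-1, actual wins) copied from the post's tables.
rows = [
    (1998, "SAS", -7.926, 2.22, 56.0),
    (2008, "BOS", -3.706, -3.07, 66.0),
]

for year, team, srs, agaa, wins in rows:
    xwins = expected_wins(srs, agaa)
    print(f"{year} {team}: xWins={xwins:.1f}, Diff={wins - xwins:+.1f}")
# 1998 SAS: xWins=24.3, Diff=+31.7
# 2008 BOS: xWins=35.4, Diff=+30.6
```

Note that the age term pulls in the direction you'd expect: the '08 Celtics get about 1.7 extra expected wins for having been a young team the year before, while the '98 Spurs lose about 1.2 for having been an old one.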
Both of these variables are significant at the 5% level, so Ben's intuition was correct: adding age as a variable does in fact improve the model's predictive power. Unfortunately, it doesn't improve it by much -- the r^2 for our original equation was 0.4404, while the r^2 for this new model is 0.4443. No matter, though; here are the new Top 10 surprise teams:
Year  Team  srs_Y-1  agaa_Y-1  xWins  Wins   Diff
1998  SAS    -7.926      2.22   24.3  56.0   31.7
2008  BOS    -3.706     -3.07   35.4  66.0   30.6
1980  BOS    -4.775      1.23   31.0  61.0   30.0
1990  SAS    -7.450     -1.27   27.1  56.0   28.9
2005  PHO    -2.941     -2.56   36.7  62.0   25.3
1970  MIL    -5.067     -0.17   31.2  56.0   24.8
1989  PHO    -4.801      0.65   31.3  55.0   23.7
1996  CHI     4.311      0.92   48.9  72.0   23.1
1972  LAL     3.264      1.65   46.5  69.0   22.5
2002  NJN    -5.303     -0.21   30.7  52.0   21.3
And our new biggest disappointments:
Year  Team  srs_Y-1  agaa_Y-1  xWins  Wins   Diff
1965  SFW     4.390     -0.89   50.1  17.4  -32.7
1997  SAS     5.975      1.87   51.7  20.0  -31.7
1999  CHI     7.244      4.01   53.0  21.3  -31.6
2007  MEM     3.738      1.56   47.5  22.0  -25.5
1983  HOU    -0.393      2.14   39.1  14.0  -25.1
1973  PHI    -3.441      1.63   33.4   9.0  -24.4
1985  NYK     3.789      0.86   47.9  24.0  -23.9
1991  DEN     1.562      2.40   42.7  20.0  -22.7
2008  MIA    -1.209      2.74   37.1  15.0  -22.1
1998  TOR    -2.555     -2.54   37.4  16.0  -21.4
So the age variable does make a difference, but we're seeing basically the same teams in both lists, albeit in a slightly different order. And the two newcomers, the 2002 Nets and the 1998 Raptors, fit the existing schemata we laid out on Friday -- New Jersey added a superstar in Jason Kidd, while all of the Raps' best players from Y-1 simultaneously had bad years. For curiosity's sake, this is how our new model sets the expectations for the 2008-09 season:
Year  Team  srs_Y-1  agaa_Y-1  xWins
2009  BOS     9.307      1.15   58.6
2009  LAL     7.344     -0.09   55.4
2009  UTA     6.867     -1.46   55.2
2009  DET     6.671      1.49   53.2
2009  NOH     5.464      0.28   51.5
2009  ORL     4.788     -0.08   50.4
2009  PHO     5.138      2.49   49.7
2009  HOU     4.835      1.48   49.6
2009  DAL     4.702      2.12   49.0
2009  SAS     5.104      4.64   48.4
2009  DEN     3.739      1.74   47.4
2009  GSW     2.381     -1.27   46.4
2009  TOR     2.469     -0.46   46.1
2009  PHI     0.188     -1.50   42.2
2009  POR    -0.520     -2.70   41.5
2009  CLE    -0.525      0.33   39.8
2009  WAS    -0.605      0.46   39.6
2009  ATL    -2.228     -2.61   38.1
2009  IND    -1.864     -0.11   37.4
2009  SAC    -1.854      0.50   37.1
2009  CHI    -3.191     -1.04   35.3
2009  CHA    -4.484     -0.45   32.5
2009  MEM    -5.752     -2.32   31.0
2009  NJN    -5.146      0.58   30.6
2009  MIN    -6.254     -2.13   29.9
2009  NYK    -6.543     -1.09   28.8
2009  MIL    -6.912     -0.97   28.0
2009  LAC    -6.561      1.78   27.2
2009  SEA    -8.037     -1.39   26.0
2009  MIA    -8.530      0.62   24.0
Tomorrow, we'll depart from the SRS for a bit, but we'll still use average point margin and age as variables, and we'll also try to incorporate Mountain's idea about month-by-month performance into our model. Stay tuned!
November 3rd, 2008 at 12:05 pm
Wow, I didn't expect it to improve r^2 by much, but thought it might do better than that. :) I don't suppose quadratic or cubic terms would make much of a difference. Also, the effect with minutes is so small, I don't imagine a performance weighted average would make much of an improvement either. Nonetheless, I like the effect on the 2009 predictions - it does what you'd want with Phoenix, Dallas, and San Antonio.