Comparing Projection Systems
Posted by Justin Kubatko on January 9, 2009
Back in October, Basketball Prospectus' Kevin Pelton introduced the SCHOENE projection system. Kevin's system is quite detailed, so rather than summarize it here I would encourage you to read his article. Around the same time Kevin released SCHOENE, I came up with the Simple Projection System, or SPS. The SPS is very simple (hence the name), and is described in full at the link above. Since we are nearing the midpoint of the 2008-09 season, I thought it would be interesting to see how these two new projection systems are faring.
In order to assess the accuracy of the projections I decided to compare a player's actual Hollinger game score per 36 minutes (GS/36) to his projected GS/36. The player pool included all players who (a) had played at least 250 minutes through January 7, 2009 and (b) had a projection in both systems. I should note that while SCHOENE provides projections for rookies, the SPS does not.
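For anyone who wants to follow along, here is a minimal R sketch of how such a pool could be assembled. The data frame name (players) and its column names are placeholders I'm assuming for illustration, not the actual data I used; the game score formula itself is Hollinger's standard one.

    # Hollinger's game score, scaled to a per-36-minute rate
    game_score <- function(d) {
      with(d, PTS + 0.4 * FG - 0.7 * FGA - 0.4 * (FTA - FT) +
              0.7 * ORB + 0.3 * DRB + STL + 0.7 * AST +
              0.7 * BLK - 0.4 * PF - TOV)
    }

    players$GS36 <- game_score(players) / players$MP * 36

    # keep players with 250+ minutes and a projection from both systems
    pool <- subset(players, MP >= 250 & !is.na(schoene) & !is.na(sps))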
Let's start by looking at some summary statistics for the two systems:
              SCHOENE       SPS    Sample
    N             258       258       258
    Mean        11.26     10.26     10.74
    Std Dev      2.98      2.37      3.38
While the SCHOENE projections tend to be a bit high and the SPS projections tend to be a bit low, the two systems missed the sample mean by roughly the same amount. Because the SPS makes heavy use of regression to the mean, the standard deviation of the SPS predictions is smaller than the standard deviation of the SCHOENE projections.
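Those summary figures are just column counts, means, and standard deviations; in R, using the same hypothetical pool data frame sketched above:

    stats <- pool[, c("schoene", "sps", "GS36")]
    sapply(stats, length)  # N for each column
    sapply(stats, mean)    # means of the projections and the sample
    sapply(stats, sd)      # standard deviations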
Scatterplots of actual versus projected results for both systems suggested a linear relationship (as one would expect), so I calculated the correlation coefficients between the actual results and the projections:
            SCHOENE       SPS
    Corr      0.796     0.817
The correlation between actual GS/36 and SPS GS/36 is a bit stronger than the correlation between actual GS/36 and SCHOENE GS/36, but statistically speaking the difference is not significant.
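In R these correlations are just two calls to cor(), again on the assumed pool data frame:

    cor(pool$GS36, pool$schoene)  # roughly 0.796
    cor(pool$GS36, pool$sps)      # roughly 0.817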
Next I fit linear regression models using actual GS/36 as the response variable and projected GS/36 as the explanatory variable (I omitted the intercept). Two items are of particular interest here: (1) the parameter estimates for projected GS/36 (they should be close to 1) and (2) the residual standard errors.
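A sketch of how these no-intercept fits might be run in R, under the same assumptions as the earlier snippet (the tables below look like standard summary() output):

    fit_schoene <- lm(GS36 ~ 0 + schoene, data = pool)
    fit_sps     <- lm(GS36 ~ 0 + sps,     data = pool)
    summary(fit_schoene)
    summary(fit_sps)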
Here are the results for SCHOENE:
             Estimate Std. Error t value Pr(>|t|)
    schoene   0.94988    0.01096   86.64   <2e-16

    Residual standard error: 2.052 on 257 degrees of freedom
and for SPS:
         Estimate Std. Error t value Pr(>|t|)
    sps   1.06018    0.01172   90.43   <2e-16

    Residual standard error: 1.969 on 257 degrees of freedom
Both parameter estimates are relatively close to -- but statistically different from -- 1, and SPS has a slightly smaller residual standard error. The residual standard error isn't exactly the right measure of accuracy here, though, because the regression rescales the projections: if I multiplied every projection by ten, the fitted coefficient would simply shrink to compensate and the residual standard error would not change. In other words, the projections would all be way off, but the reported errors wouldn't budge. If I forget the regression and just compute the root mean square error (RMSE) of the raw projections for each system, I get the following:
            SCHOENE       SPS
    RMSE      2.130     2.063
SPS is a slight winner here, although once again the difference is not particularly meaningful.
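To be explicit, the RMSE above is computed on the raw projections, with no recalibration by a regression; a quick sketch, same assumed data frame:

    rmse <- function(projected, actual) sqrt(mean((projected - actual)^2))
    rmse(pool$schoene, pool$GS36)  # roughly 2.130
    rmse(pool$sps, pool$GS36)      # roughly 2.063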
I also thought it would be interesting to look into combining these two projections to form a single projection. Nate Silver did something clever in his 2007 Hitter Projection Roundup on Baseball Prospectus: he built a regression model with actual OPS as the response variable and projected OPS from various systems as the explanatory variables. The goal was to measure which systems provide the best information, the logic being that systems that are both accurate and unique will carry the most weight. I'm going to do something similar here. In my regression model, actual GS/36 will be the response variable and SCHOENE GS/36 and SPS GS/36 will be the explanatory variables (no intercept will be fit). I found the results to be a bit surprising:
             Estimate Std. Error t value Pr(>|t|)
    schoene   0.37582    0.09023   4.165 4.26e-05
    sps       0.64398    0.10057   6.403 7.23e-10

    Residual standard error: 1.909 on 256 degrees of freedom
This suggests that if you wanted to "blend" these projections to create a single projection, you would use roughly 5 parts SPS and 3 parts SCHOENE. The fact that SPS carries more weight surprised me, as it's only slightly more accurate than SCHOENE and it's not unique: SPS is just a weighted average of past results with regression to the mean and a basic age adjustment thrown in, things that should be a part of any projection system.
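For completeness, here is how blend weights like those fall out of the two-variable fit once the coefficients are normalized to sum to one (same hypothetical data frame as before):

    blend <- lm(GS36 ~ 0 + schoene + sps, data = pool)
    coef(blend) / sum(coef(blend))  # roughly 0.37 SCHOENE, 0.63 SPS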
I was also interested in the biggest misses for both systems. Here are the five biggest over-projections by SCHOENE:
    Player              SCHOENE      SPS   Actual  Diff SCH  Diff SPS
    Ricky Davis            9.27     9.18     1.34     -7.93     -7.84
    Amare Stoudemire      22.62    17.39    16.54     -6.08     -0.86
    Brent Barry           10.86     8.01     5.32     -5.54     -2.68
    Andrew Bynum          17.83    12.32    13.28     -4.55      0.96
    Earl Watson           11.12     9.76     6.72     -4.40     -3.04
Both systems were way off for Ricky Davis, who has been horrible so far this year. SCHOENE was overly optimistic about both Amare Stoudemire and Andrew Bynum, while SPS was much closer to the mark. Let's look at the five biggest over-projections by SPS:
    Player              SCHOENE      SPS   Actual  Diff SCH  Diff SPS
    Ricky Davis            9.27     9.18     1.34     -7.93     -7.84
    Elton Brand           14.28    15.05    11.07     -3.21     -3.99
    Chuck Hayes            7.29     8.11     4.14     -3.15     -3.97
    Glen Davis             8.79     9.46     5.78     -3.01     -3.68
    Matt Carroll           7.90     8.69     5.12     -2.78     -3.57
The errors using both systems were very similar for these players. Now we'll move on to the biggest under-projections. First SCHOENE:
    Player              SCHOENE      SPS   Actual  Diff SCH  Diff SPS
    Shaquille O'Neal      11.21     9.83    16.89      5.69      7.06
    Dwyane Wade           15.80    17.25    21.13      5.33      3.89
    Nene Hilario           9.20    12.08    14.53      5.33      2.45
    Matt Bonner            7.27     9.10    11.72      4.45      2.62
    Devin Harris          14.17    11.85    18.30      4.13      6.44
Both systems came up with under-projections for all of these players. Now SPS:
    Player              SCHOENE      SPS   Actual  Diff SCH  Diff SPS
    Shaquille O'Neal      11.21     9.83    16.89      5.69      7.06
    Devin Harris          14.17    11.85    18.30      4.13      6.44
    Danny Granger         13.32    11.14    17.17      3.85      6.02
    Marcus Camby          11.27    10.21    15.21      3.94      5.00
    Zydrunas Ilgauskas    11.84    10.23    14.90      3.06      4.67
Once again, both systems came up with under-projections for all of these players.
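The miss tables are nothing more than the pool sorted by signed error (actual minus projected); a sketch of how the SCHOENE lists could be pulled, using the same assumed data frame:

    pool$diff_sch <- pool$GS36 - pool$schoene  # negative = over-projection
    pool$diff_sps <- pool$GS36 - pool$sps

    head(pool[order(pool$diff_sch), ], 5)   # biggest SCHOENE over-projections
    head(pool[order(-pool$diff_sch), ], 5)  # biggest SCHOENE under-projections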
I also looked at the projections for the top 10 players in actual GS/36. I expected SCHOENE to outdo SPS here, as SCHOENE comes up with a wider distribution of projected values, while SPS's are more clustered around the mean. Sure enough, SCHOENE did better:
    Player              SCHOENE      SPS   Actual  Diff SCH  Diff SPS
    LeBron James          22.32    18.62    22.93      0.62      4.31
    Chris Paul            20.86    17.11    21.41      0.55      4.31
    Dwyane Wade           15.80    17.25    21.13      5.33      3.89
    Kobe Bryant           18.70    16.89    19.63      0.93      2.74
    Carlos Boozer         16.80    14.23    18.60      1.80      4.37
    Dwight Howard         17.28    13.87    18.33      1.04      4.46
    Devin Harris          14.17    11.85    18.30      4.13      6.44
    Dirk Nowitzki         16.36    15.80    18.10      1.74      2.30
    Brandon Roy           15.55    12.93    17.52      1.96      4.59
    Tim Duncan            16.99    13.56    17.40      0.42      3.84
SCHOENE was better in 9 of the 10 cases, and in most cases much closer to the actual GS/36 than SPS. This suggests to me that one or more of the following could be true: (a) SCHOENE has a built-in advantage when it comes to projecting "star" players; (b) SPS has too much regression to the mean; or (c) the SPS age adjustment needs to be tweaked. Please understand, I'm not trying to make excuses: I actually hope the answer is (a).
To wrap all of this up, I compared the absolute errors for all 258 players. The system with the smaller absolute error for each player was awarded one point, which produced the following result:
    SCHOENE   115
    SPS       143
SPS was closer in 55.4% of all cases, with the caveat that SCHOENE, as shown above, does a better job projecting the top players.
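The point totals come from a straight player-by-player comparison of absolute errors; in R, under the same assumptions as the earlier sketches:

    err_sch <- abs(pool$GS36 - pool$schoene)
    err_sps <- abs(pool$GS36 - pool$sps)
    sum(err_sch < err_sps)  # points for SCHOENE (115)
    sum(err_sps < err_sch)  # points for SPS (143)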
Given all of the information in this post, I would have to declare SPS the mid-season champion, although it's very close, and there is still half a season to go. Hopefully I'll have time to revisit this issue at the end of the regular season.