Comparing Projection Systems
Posted by Justin Kubatko on January 9, 2009
Back in October, Basketball Prospectus' Kevin Pelton introduced the SCHOENE projection system. Kevin's system is quite detailed, so rather than summarize it here I would encourage you to read his article. Around the same time Kevin released SCHOENE, I came up with the Simple Projection System, or SPS. The SPS is very simple (hence the name), and is described in full at the link above. Since we are nearing the midpoint of the 2008-09 season, I thought it would be interesting to see how these two new projection systems are faring.
In order to assess the accuracy of the projections I decided to compare a player's actual Hollinger game score per 36 minutes (GS/36) to his projected GS/36. The player pool included all players who (a) had played at least 250 minutes through January 7, 2009 and (b) had a projection in both systems. I should note that while SCHOENE provides projections for rookies, the SPS does not.
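For anyone who wants to follow along, here is a minimal R sketch of how such a pool could be assembled. The data frame name (players) and its column names are placeholders I'm assuming for illustration, not the actual data I used; the game score formula itself is Hollinger's standard one.

    # Hollinger's game score, scaled to a per-36-minute rate
    game_score <- function(d) {
      with(d, PTS + 0.4 * FG - 0.7 * FGA - 0.4 * (FTA - FT) +
              0.7 * ORB + 0.3 * DRB + STL + 0.7 * AST +
              0.7 * BLK - 0.4 * PF - TOV)
    }

    players$GS36 <- game_score(players) / players$MP * 36

    # keep players with 250+ minutes and a projection from both systems
    pool <- subset(players, MP >= 250 & !is.na(schoene) & !is.na(sps))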
Let's start by looking at some summary statistics for the two systems:
              SCHOENE       SPS    Sample
    N             258       258       258
    Mean        11.26     10.26     10.74
    Std Dev      2.98      2.37      3.38
While the SCHOENE projections tend to be a bit high and the SPS projections tend to be a bit low, the two systems missed the sample mean by roughly the same amount. Because the SPS makes heavy use of regression to the mean, the standard deviation of the SPS predictions is smaller than the standard deviation of the SCHOENE projections.
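Those summary figures are just column counts, means, and standard deviations; in R, using the same hypothetical pool data frame sketched above:

    stats <- pool[, c("schoene", "sps", "GS36")]
    sapply(stats, length)  # N for each column
    sapply(stats, mean)    # means of the projections and the sample
    sapply(stats, sd)      # standard deviations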
Scatterplots of actual versus projected results for both systems suggested a linear relationship (as one would expect), so I calculated the correlation coefficients between the actual results and the projections:
            SCHOENE       SPS
    Corr      0.796     0.817
The correlation between actual GS/36 and SPS GS/36 is a bit stronger than the correlation between actual GS/36 and SCHOENE GS/36, but statistically speaking the difference is not significant.
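In R these correlations are just two calls to cor(), again on the assumed pool data frame:

    cor(pool$GS36, pool$schoene)  # roughly 0.796
    cor(pool$GS36, pool$sps)      # roughly 0.817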
Next I fit linear regression models using actual GS/36 as the response variable and projected GS/36 as the explanatory variable (I omitted the intercept). Two items are of particular interest here: (1) the parameter estimates for projected GS/36 (they should be close to 1) and (2) the residual standard errors.
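A sketch of how these no-intercept fits might be run in R, under the same assumptions as the earlier snippet (the tables below look like standard summary() output):

    fit_schoene <- lm(GS36 ~ 0 + schoene, data = pool)
    fit_sps     <- lm(GS36 ~ 0 + sps,     data = pool)
    summary(fit_schoene)
    summary(fit_sps)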
Here are the results for SCHOENE:
             Estimate Std. Error t value Pr(>|t|)
    schoene   0.94988    0.01096   86.64   <2e-16

    Residual standard error: 2.052 on 257 degrees of freedom
and for SPS:
         Estimate Std. Error t value Pr(>|t|)
    sps   1.06018    0.01172   90.43   <2e-16

    Residual standard error: 1.969 on 257 degrees of freedom
Both parameter estimates are relatively close to -- but statistically different from -- 1, and SPS has a slightly smaller residual standard error. The residual standard error isn't exactly the right measure of accuracy here, though, because the regression rescales the projections: if I multiplied every projection by ten, the fitted coefficient would simply shrink to compensate and the residual standard error would not change. In other words, the projections would all be way off, but the reported errors wouldn't budge. If I forget the regression and just compute the root mean square error (RMSE) of the raw projections for each system, I get the following:
            SCHOENE       SPS
    RMSE      2.130     2.063
SPS is a slight winner here, although once again the difference is not particularly meaningful.
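To be explicit, the RMSE above is computed on the raw projections, with no recalibration by a regression; a quick sketch, same assumed data frame:

    rmse <- function(projected, actual) sqrt(mean((projected - actual)^2))
    rmse(pool$schoene, pool$GS36)  # roughly 2.130
    rmse(pool$sps, pool$GS36)      # roughly 2.063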
I also thought it would be interesting to look into combining these two projections to form a single projection. Nate Silver did something clever in his 2007 Hitter Projection Roundup on Baseball Prospectus: he built a regression model with actual OPS as the response variable and projected OPS from various systems as the explanatory variables. The goal was to measure which systems provide the best information, the logic being that systems that are both accurate and unique will carry the most weight. I'm going to do something similar here. In my regression model, actual GS/36 will be the response variable and SCHOENE GS/36 and SPS GS/36 will be the explanatory variables (no intercept will be fit). I found the results to be a bit surprising:
             Estimate Std. Error t value Pr(>|t|)
    schoene   0.37582    0.09023   4.165 4.26e-05
    sps       0.64398    0.10057   6.403 7.23e-10

    Residual standard error: 1.909 on 256 degrees of freedom
This suggests that if you wanted to "blend" these projections to create a single projection, you would use roughly 5 parts SPS and 3 parts SCHOENE. The fact that SPS carries more weight surprised me, as it's only slightly more accurate than SCHOENE and it's not unique: SPS is just a weighted average of past results with regression to the mean and a basic age adjustment thrown in, things that should be a part of any projection system.
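For completeness, here is how blend weights like those fall out of the two-variable fit once the coefficients are normalized to sum to one (same hypothetical data frame as before):

    blend <- lm(GS36 ~ 0 + schoene + sps, data = pool)
    coef(blend) / sum(coef(blend))  # roughly 0.37 SCHOENE, 0.63 SPS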
I was also interested in the biggest misses for both systems. Here are the five biggest over-projections by SCHOENE:
    Player              SCHOENE      SPS   Actual  Diff SCH  Diff SPS
    Ricky Davis            9.27     9.18     1.34     -7.93     -7.84
    Amare Stoudemire      22.62    17.39    16.54     -6.08     -0.86
    Brent Barry           10.86     8.01     5.32     -5.54     -2.68
    Andrew Bynum          17.83    12.32    13.28     -4.55      0.96
    Earl Watson           11.12     9.76     6.72     -4.40     -3.04
Both systems were way off for Ricky Davis, who has been horrible so far this year. SCHOENE was overly optimistic about both Amare Stoudemire and Andrew Bynum, while SPS was much closer to the mark. Let's look at the five biggest over-projections by SPS:
    Player              SCHOENE      SPS   Actual  Diff SCH  Diff SPS
    Ricky Davis            9.27     9.18     1.34     -7.93     -7.84
    Elton Brand           14.28    15.05    11.07     -3.21     -3.99
    Chuck Hayes            7.29     8.11     4.14     -3.15     -3.97
    Glen Davis             8.79     9.46     5.78     -3.01     -3.68
    Matt Carroll           7.90     8.69     5.12     -2.78     -3.57
The errors using both systems were very similar for these players. Now we'll move on to the biggest under-projections. First SCHOENE:
    Player              SCHOENE      SPS   Actual  Diff SCH  Diff SPS
    Shaquille O'Neal      11.21     9.83    16.89      5.69      7.06
    Dwyane Wade           15.80    17.25    21.13      5.33      3.89
    Nene Hilario           9.20    12.08    14.53      5.33      2.45
    Matt Bonner            7.27     9.10    11.72      4.45      2.62
    Devin Harris          14.17    11.85    18.30      4.13      6.44
Both systems came up with under-projections for all of these players. Now SPS:
    Player              SCHOENE      SPS   Actual  Diff SCH  Diff SPS
    Shaquille O'Neal      11.21     9.83    16.89      5.69      7.06
    Devin Harris          14.17    11.85    18.30      4.13      6.44
    Danny Granger         13.32    11.14    17.17      3.85      6.02
    Marcus Camby          11.27    10.21    15.21      3.94      5.00
    Zydrunas Ilgauskas    11.84    10.23    14.90      3.06      4.67
Once again, both systems came up with under-projections for all of these players.
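The miss tables are nothing more than the pool sorted by signed error (actual minus projected); a sketch of how the SCHOENE lists could be pulled, using the same assumed data frame:

    pool$diff_sch <- pool$GS36 - pool$schoene  # negative = over-projection
    pool$diff_sps <- pool$GS36 - pool$sps

    head(pool[order(pool$diff_sch), ], 5)   # biggest SCHOENE over-projections
    head(pool[order(-pool$diff_sch), ], 5)  # biggest SCHOENE under-projections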
I also looked at the projections for the top 10 players in actual GS/36. I expected SCHOENE to outdo SPS here, as SCHOENE comes up with a wider distribution of projected values, while SPS's are more clustered around the mean. Sure enough, SCHOENE did better:
    Player              SCHOENE      SPS   Actual  Diff SCH  Diff SPS
    LeBron James          22.32    18.62    22.93      0.62      4.31
    Chris Paul            20.86    17.11    21.41      0.55      4.31
    Dwyane Wade           15.80    17.25    21.13      5.33      3.89
    Kobe Bryant           18.70    16.89    19.63      0.93      2.74
    Carlos Boozer         16.80    14.23    18.60      1.80      4.37
    Dwight Howard         17.28    13.87    18.33      1.04      4.46
    Devin Harris          14.17    11.85    18.30      4.13      6.44
    Dirk Nowitzki         16.36    15.80    18.10      1.74      2.30
    Brandon Roy           15.55    12.93    17.52      1.96      4.59
    Tim Duncan            16.99    13.56    17.40      0.42      3.84
SCHOENE was better in 9 of the 10 cases, and in most cases much closer to the actual GS/36 than SPS. This suggests to me that one or more of the following could be true: (a) SCHOENE has a built-in advantage when it comes to projecting "star" players; (b) SPS has too much regression to the mean; or (c) the SPS age adjustment needs to be tweaked. Please understand, I'm not trying to make excuses: I actually hope the answer is (a).
To wrap all of this up, I compared the absolute errors for all 258 players. The system with the smaller absolute error for each player was awarded one point, which produced the following result:
    SCHOENE   115
    SPS       143
SPS was closer in 55.4% of all cases, with the caveat that SCHOENE, as shown above, does a better job projecting the top players.
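The point totals come from a straight player-by-player comparison of absolute errors; in R, under the same assumptions as the earlier sketches:

    err_sch <- abs(pool$GS36 - pool$schoene)
    err_sps <- abs(pool$GS36 - pool$sps)
    sum(err_sch < err_sps)  # points for SCHOENE (115)
    sum(err_sps < err_sch)  # points for SPS (143)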
Given all of the information in this post, I would have to declare SPS the mid-season champion, although it's very close, and there is still half a season to go. Hopefully I'll have time to revisit this issue at the end of the regular season.