Alternatives to the BBR Rankings: The Maximum Likelihood Method
Posted by Neil Paine on January 13, 2010
If you didn't catch it the first time around (since I know all of you read the PFR Blog now that I occasionally post over there), I highly recommend that you check out this series of posts that Doug Drinen wrote about various computer ranking systems and the methods behind them:
A very simple ranking system (If you ever wondered where the ubiquitous SRS comes from, this is it)
Another ranking system
Another rating system: maximum likelihood
That last link is the topic I wanted to talk about today.
Every Friday in the BBR Rankings, I combine a pure won-lost rating with a strength-of-schedule component that factors in the point margin of each game. I use this blend because I think it's fair -- it rewards teams for wins and doesn't give undue credit for blowouts, while still acknowledging that the best indicator of a team's "true" strength is its margin of victory or defeat. Logically and mathematically, however, this method is not exactly the most rigorous one in the world. Because it combines the two elements in a somewhat arbitrary fashion, the aim of the rating is not crystal clear -- it's certainly not predictive (nor is it intended to be), but it isn't purely retrodictive either, because it mixes in elements of predictiveness.
Obviously I'm still going to post them every week, but I also wanted to show you an alternative method that is as purely retrodictive as a rating can be. It's called "maximum likelihood", and it seeks the set of team ratings under which the season's past results were most probable.
Think about the way the season has progressed so far, starting with last night's game between Orlando and Sacramento. The Magic beat the Kings, which gives any rating system a data point to work with: it implies that Orlando is better than Sacramento. All else being equal, then, the system would want a set of ratings that ranks Orlando ahead of Sacramento. However, all else is not equal -- Orlando has also lost this season to Indiana, Washington, Utah, & Oklahoma City, all of whom Sacramento has beaten. Because no set of ratings can be perfectly consistent with every result, the computer instead looks for the ratings under which the observed results were as probable as possible. It does this by assigning each game's outcome a probability, then multiplying those probabilities together for the entire season, producing the likelihood that, given a certain set of ratings, the season would have played out exactly the way it did in real life. In essence, we try different combinations of ratings until we maximize that likelihood, hence the name of the method.
If you want the math: for each game, assume the probability of the home team winning is p(hW) = exp(rH - rA + HC)/(exp(rH - rA + HC) + 1), where rH = the home team's rating, rA = the away team's rating, and HC = a home-court advantage term. When the home team wins in real life, the "likelihood" of the result is p(hW); when the road team wins, it's (1 - p(hW)). Instead of multiplying all of those probabilities together (a product that quickly becomes vanishingly small), you can take the natural logarithm of each game's probability and sum the logs over the entire season. The set of ratings that maximizes that sum is the set that best retrodicts the past. (If you have Excel, you can use the Solver tool to do this, telling it to maximize the sum of the natural logs by changing the team ratings and the home-court term while keeping the sum of all ratings equal to zero; a code sketch of the same procedure appears below the table.) This season, you get these ratings from the maximum likelihood method:
Team | Rating | W | L | WPct |
---|---|---|---|---|
CLE | 1.32637 | 30 | 10 | 0.750 |
LAL | 1.22496 | 29 | 9 | 0.763 |
BOS | 1.07509 | 26 | 10 | 0.722 |
DAL | 0.98798 | 25 | 12 | 0.676 |
ATL | 0.81961 | 24 | 13 | 0.649 |
ORL | 0.81689 | 26 | 12 | 0.684 |
PHO | 0.80430 | 24 | 14 | 0.632 |
HOU | 0.63461 | 21 | 17 | 0.553 |
DEN | 0.50077 | 24 | 14 | 0.632 |
SAS | 0.46512 | 23 | 13 | 0.639 |
POR | 0.44561 | 23 | 16 | 0.590 |
OKC | 0.40487 | 21 | 16 | 0.568 |
UTA | 0.29150 | 21 | 17 | 0.553 |
NOH | 0.21510 | 19 | 17 | 0.528 |
MEM | 0.17557 | 19 | 18 | 0.514 |
TOR | 0.11373 | 19 | 20 | 0.487 |
MIA | 0.08994 | 18 | 18 | 0.500 |
LAC | -0.12693 | 17 | 19 | 0.472 |
CHA | -0.17451 | 17 | 19 | 0.472 |
CHI | -0.31822 | 16 | 20 | 0.444 |
MIL | -0.42008 | 15 | 20 | 0.429 |
SAC | -0.42290 | 15 | 22 | 0.405 |
NYK | -0.55804 | 15 | 22 | 0.405 |
DET | -0.69794 | 12 | 25 | 0.324 |
GSW | -0.78356 | 11 | 25 | 0.306 |
PHI | -0.81500 | 12 | 25 | 0.324 |
WAS | -0.83306 | 12 | 24 | 0.333 |
IND | -0.96396 | 12 | 25 | 0.324 |
MIN | -1.55545 | 8 | 31 | 0.205 |
NJN | -2.72235 | 3 | 34 | 0.081 |
HCA | 0.61325 | | | |
(Note: No, they don't all add to zero, but the solver will find the solution that both maximizes the sum of the natural logs and gets the average as close as possible to zero.)
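If Solver isn't your thing, here is a minimal sketch of the same optimization in Python. The scipy dependency and the placeholder schedule are my own additions, not part of the original workflow; each toy fixture is split 1-1 just so the toy optimum stays finite, and a real run would feed in every game played so far.

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder schedule: (home, away, home_team_won). A real run would use
# every game played so far.
games = [("ORL", "SAC", True), ("ORL", "SAC", False),
         ("SAC", "IND", True), ("SAC", "IND", False),
         ("IND", "ORL", True), ("IND", "ORL", False)]
teams = sorted({t for g in games for t in g[:2]})
idx = {t: i for i, t in enumerate(teams)}

def neg_log_likelihood(params):
    """One rating per team, plus the home-court (HC) term last."""
    ratings, hc = params[:-1], params[-1]
    ll = 0.0
    for home, away, home_won in games:
        # p(hW) = exp(rH - rA + HC) / (exp(rH - rA + HC) + 1)
        p = 1.0 / (1.0 + np.exp(-(ratings[idx[home]] - ratings[idx[away]] + hc)))
        ll += np.log(p if home_won else 1.0 - p)
    return -ll  # minimizing the negative log-likelihood = maximizing the likelihood

# Hold the team ratings (not the HC term) to a sum of zero, as in the Solver setup
constraint = {"type": "eq", "fun": lambda params: np.sum(params[:-1])}
result = minimize(neg_log_likelihood, np.zeros(len(teams) + 1),
                  constraints=[constraint])

for team in teams:
    print(f"{team}: {result.x[idx[team]]:+.5f}")
print(f"HCA: {result.x[-1]:+.5f}")
```

Fed the full schedule to date, this should land on the same ratings as the table above, up to solver tolerance.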
So these are the ratings that best "retrodict" the past. They are concerned only with past wins and losses (even the SOS adjustment and the HCA term are based purely on W-L), the polar opposite of the SRS, which deals only in point differential and is chiefly interested in predicting future outcomes. And the BBR Rankings, I suppose, are a hybrid of the two. As always, the approach that's best depends on the philosophical goal you're trying to achieve with the rankings.
January 13th, 2010 at 3:24 pm
Yay for MLE :)
It might have just been easier to say that this is a logistic regression, but I like getting people to think about the idea of maximum likelihood.
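For the curious, here is a minimal sketch of that equivalence; the statsmodels dependency and the toy schedule are illustrative assumptions, not anything from the post. Each game becomes a row of a design matrix with +1 in the home team's column, -1 in the away team's, plus a constant column whose coefficient is the home-court term.

```python
import numpy as np
import statsmodels.api as sm

# Toy schedule among 3 teams: (home_idx, away_idx, home_won). Fixtures are
# split 1-1 so the fit stays finite; real data gives non-trivial coefficients.
games = [(0, 1, 1), (0, 1, 0), (1, 2, 1), (1, 2, 0), (2, 0, 1), (2, 0, 0)]
n_teams = 3

X = np.zeros((len(games), n_teams + 1))
y = np.array([g[2] for g in games], dtype=float)
for row, (h, a, _) in enumerate(games):
    X[row, h], X[row, a] = 1.0, -1.0  # encodes the rating difference rH - rA
    X[row, -1] = 1.0                  # constant column -> home-court term

# Team columns sum to zero in every row, so drop team 0's column to pin
# its rating at zero and make the design full-rank.
fit = sm.Logit(y, X[:, 1:]).fit(disp=0)
print(fit.params)  # ratings of teams 1..n-1 (relative to team 0), then HC
```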
January 14th, 2010 at 1:39 am
Neil, I know this isn't the appropriate place to post this question, but I'm on the run. I'll get around to reading & responding to this blog tonight or tomorrow.
I've noticed an increase, especially on BBR's blog, in measuring a player's greatness/talent by Win Shares. Win Shares, to my understanding, is incredibly team-reliant. Wouldn't it make more sense, perhaps, to measure a player's individual ability and impact by WS%? Adding a WS% column to BBR's advanced stats, IMO, would be beneficial. If Jordan's team won 50 games while James's won 66, yet both have comparable WS, then the disparity should be made readily available in an alternative form of WS (in this case, WS%).
Cheers.
Unless, of course, my understanding of WS is completely wrong.
January 14th, 2010 at 2:44 am
I love the use of MLE here. I think it would be interesting to use MLE with margin of victory instead of the binary win/loss. I doubt the results would be significantly different, but it would be interesting nonetheless.
Keep up the good work.
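A hedged sketch of that margin-of-victory variant: model each game's margin as normally distributed around rH - rA + HC. With a fixed variance, maximizing that likelihood reduces to minimizing squared error, which puts you in SRS territory. The toy margins below are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Invented toy games: (home_idx, away_idx, home_margin_of_victory)
games = [(0, 1, 7), (1, 2, -3), (2, 0, 2)]
n_teams = 3

def neg_log_likelihood(params):
    """Margins modeled as normal around rH - rA + HC; with a fixed variance,
    maximizing the likelihood is the same as minimizing squared error."""
    ratings, hc = params[:-1], params[-1]
    residuals = [m - (ratings[h] - ratings[a] + hc) for h, a, m in games]
    return np.sum(np.square(residuals))

constraint = {"type": "eq", "fun": lambda p: np.sum(p[:-1])}
result = minimize(neg_log_likelihood, np.zeros(n_teams + 1),
                  constraints=[constraint])
print(result.x)  # three ratings (summing to zero) and a home-court term
```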
January 14th, 2010 at 9:17 am
Ryan wrote:
I've noticed an increase, especially on BBR's blog, of measuring a player's greatness/talent by Win Shares. Win Shares, to my understanding, is incredibly team-reliant.
This is a common misperception. Two players (playing on different teams) with the same playing time, same number of possessions, same offensive rating, and same defensive rating will have the same number of Win Shares. Now, defensive rating does have a team component, but all in all I would not call Win Shares "incredibly team-reliant". More details are available here.
January 14th, 2010 at 11:02 am
According to this page:
http://www.basketball-reference.com/players/t/turkohe01.html
Hedo Turkoglu, from '03 in Sac, to '04 in SA, to '05 in Orl, had his DRtg go from 102 to 94 to 110; his DWS from 1.5 to 4.5 to 1.0.
His DWS per 484 minutes (.50 = avg) moved from .62 to 1.05 to .28 over that span, while his OWS/484 rose steadily from .47 to .52 to .60.
From last year to this year, his DWS/484 has gone from .76 to .04 (Orl to Tor).
This is "incredibly team-reliant".
Meanwhile, what about a column for WS per X minutes?
January 14th, 2010 at 12:22 pm
Mike, I don't really see how you can properly factor in defense as half of the game (as WS does) and not have fluctuations like that when a player moves from a good defensive team to a bad one. How does eWins handle defense? If it doesn't make a team adjustment at some point, it's essentially saying, "100% of a player's defensive ability is described by his blocks, steals, and DReb." I don't know how Justin feels, but I'd rather be "wrong" on a few players by adjusting the defensive component of the stat for the team's defensive rating than claim that blocks, steals, and DReb describe 100% of a player's defensive contribution.
January 15th, 2010 at 7:56 am
"How does eWins handle defense? If it doesn't make a team adjustment at some point, it's essentially saying, '100% of a player's defensive ability is described by his blocks, steals, and DReb.'"
eWins makes team adjustments all along the process. Points and assists are scaled to opponent points, rebounds to opp. reb.
Monta Ellis is averaging 26 PPG, but for a team that allows 112 PPG. So those 26 points are only about 89% (100/112) as valuable a contribution as they'd be for an average team. For Hou he might be expected to avg 23; for Cha, perhaps 21.5.
eWins doesn't bother to distinguish between offense and defense per se. Rather, productivity is scaled to "rest of the league" performance in the games a player appears in: vs GSW, in Ellis' case.
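To make that scaling concrete, here is a toy version of the adjustment described in this comment; the 100-point league baseline and the function name are illustrative assumptions, not eWins' actual internals.

```python
LEAGUE_BASELINE = 100.0  # assumed league-average points per game

def opponent_adjusted(raw_per_game, context_points_allowed):
    """Scale a raw per-game figure by the scoring context it was produced in.

    26 PPG for a team that allows 112 PPG counts for (100/112), roughly
    89%, of what the same output would mean in an average-scoring context.
    """
    return raw_per_game * (LEAGUE_BASELINE / context_points_allowed)

print(round(opponent_adjusted(26.0, 112.0), 1))  # Ellis' 26 PPG -> ~23.2
```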
April 7th, 2010 at 7:31 pm
How would one go about translating the power ratings given above to points? Expressing power ratings as points or wins or what-have-you... something real... seems to be more understandable than using a dimensionless number.
The idea I had was to find the league-wide HCA, and then translate the power ratings into points based on that. So if the HCA league-wide is 3 points, then, for example, Cleveland's rating is +6.49 points.
Also, I know that Neil said the ratings don't add to zero exactly, but if the sum is as close to zero as it is (0.00002), it's not worth worrying about.
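A quick sketch of that proposed conversion, using the numbers above. The 3-point home-court value is the commenter's assumption, and since these ratings live on a log-odds scale, a linear translation into points is only a rough heuristic.

```python
ASSUMED_HCA_POINTS = 3.0  # assumed league-wide home-court edge, in points
HCA_RATING = 0.61325      # home-court term from the table above

def rating_to_points(rating):
    """Rescale a dimensionless maximum-likelihood rating into points."""
    return rating * (ASSUMED_HCA_POINTS / HCA_RATING)

print(round(rating_to_points(1.32637), 2))  # CLE's 1.32637 -> ~6.49 points
```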