Lost in Translation
Posted by Neil Paine on August 10, 2009
Over the past year, I've dabbled a bit in the realm of what I like to call "translating" stats -- that is to say, the process of taking a player's numbers out of one context and plopping them down in another context. Now, this doesn't necessarily mean the usual "what would Player W from Year X have averaged had he switched places with Player Y in Year Z?" strain of time-travel fantasizing, but more like, "given that Player A's averages were worth B wins in Year C, what would he have had to average in Year D to create the exact same number of wins?" The difference is a nuance, a shade of meaning, but still very important, because typically we're in the business of making value judgments in the latter sense, and we leave the former to the alternate-history crowd.
This time, though, I'll make an exception. Why? Because it's fun, and it's the offseason, and besides, what else is there to talk about in August of a non-Olympic year (aside from the dregs of the UFAs and what could be an impending NBA financial crisis)? But just the same, apply these translations at your own risk.
What's the method? Well, in the past we simply adjusted for pace, then league. Trouble is, pace adjustments assume a linear increase across the board -- 20% more possessions equals 20% more rebounding chances equals 20% more turnovers... you get the idea. And worse yet, we can't calculate pace accurately for seasons prior to 1973-74, due to a galling lack of statistical tracking on the NBA's part, so we often end up having to make assumptions and estimations that tend to make everyone uncomfortable. So what's the solution, then? Well, Bill James once suggested that instead of getting hung up on park factors, why don't we worry about the player's actual context -- his team's stats and those of his opponents?
As James writes in the Historical Baseball Abstract:
"My original thinking was that by using team data, rather than league data, we would be approximating the effect of park illusions. If the park in which Del Pratt played in 1916 tended to diminish offense, then Pratt's team would tend to score or allow fewer runs than the league average; if it was a hitter's park, they would tend to score and allow more. So I would just place Del Pratt for 1916 in the context not of the American League, but in the context of the St. Louis Browns."
"But after I had been doing this for a few hundred players, I realized that it was not only an acceptable substitute [for park factors], but actually a preferable alternative. Why? Because the team is the only thing that is truly relevant to the player."
[...]
"...The construct of the 'league', in fact, has nothing at all to do with the value of what the player has accomplished. The 'league' is simply some other teams playing some other games that are utterly unconnected with the activities of this particular player."
"Though those games may be important to him, they should not be used to inform the evaluation of what he does. If Tony Pena is in St. Louis driving in three runs in a game in an attempt to help St. Louis win the pennant, he might be vitally interested in the score of a game in Chicago -- but whether the score of that game is 1-0 or 16-12 has nothing at all to do with the won/loss impact of Tony Pena's performance in St. Louis."
All of which is to say, I'm not going to deal with the league at all when making these translations. What if, instead, each team had their own FG/48 min. environment, their own TRB/48 environment, and so on? Then we could simply adjust each category by its own individual context (for instance, the 1973 Celtics' rebounding environment, or the 1971 Phoenix Suns' free-throw attempting environment), and not have to worry about pace or possessions at all.
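To make the idea concrete, here is a minimal sketch of a per-category translation. It assumes the adjustment is a plain ratio of the two environments, which the post doesn't actually spell out, so treat the function name `translate_stat` and the numbers in the example as hypothetical placeholders rather than values from the spreadsheet.

```python
def translate_stat(player_per48, source_env_per48, target_env_per48):
    """Scale a per-48 stat from its own team context into a target context.

    Assumption (not confirmed by the post): the translation is a simple
    ratio of environments, target / source, applied one category at a time.
    """
    if source_env_per48 <= 0:
        raise ValueError("source environment must be positive")
    return player_per48 * (target_env_per48 / source_env_per48)


# Hypothetical numbers: a player grabbing 10.0 TRB/48 in a context where
# 8.0 TRB/48 are typical, moved to a context where 10.0 TRB/48 are typical.
print(translate_stat(10.0, 8.0, 10.0))
```

Because each category gets its own ratio, no pace or possession estimate ever enters the calculation.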
The only catch? If we want to include all of NBA history, we have to pretend the 3-point shot never existed, especially if we're translating backwards to a season before the bonus sphere was introduced in 1980. Such is life. Still, let's put ourselves in a time machine and travel back 40 years...
The year is 1969, and the Boston Celtics are on their way to their 11th NBA championship, capping a run of 10 in 11 years. Wes Unseld is the league's MVP, while Elvin Hayes leads the league in scoring with 28.4 PPG. Wilt Chamberlain snags an NBA-best 21.1 rebounds per game, and Oscar Robertson paces the Association in assists with 9.8 a night. But what if we transported our modern players from 2009 back to 1969, dropping them into a neutral environment across the board? The new leaders might look like this:
Player           PPG
--------------------
Dwyane Wade     35.9
LeBron James    34.0
Dirk Nowitzki   30.0
Kobe Bryant     29.7
Kevin Durant    28.5
Elvin Hayes     28.4
Chris Paul      28.1
Al Jefferson    27.4
Tony Parker     27.4
Chris Bosh      27.4
--------------------

Player           RPG
--------------------
W. Chamberlain  21.1
Nate Thurmond   19.7
Bill Russell    19.3
Jerry Lucas     18.4
Dwight Howard   18.3
Wes Unseld      18.2
Elvin Hayes     17.1
Troy Murphy     15.2
David Lee       15.1
Al Jefferson    15.0
--------------------

Player           APG
--------------------
Chris Paul      12.8
Deron Williams  11.0
Steve Nash      10.0
Oscar Robertson  9.8
Jason Kidd       9.6
Rajon Rondo      9.1
Jose Calderon    9.0
Dwyane Wade      8.5
LeBron James     8.5
Lenny Wilkens    8.2
--------------------
Obviously, once again this is more of a fun, frivolous exercise than anything else, but it is informative insofar as it tells us more about the ways the game has changed over the past half-century. You can even download the Excel spreadsheet here, and feel free to play around with different contexts, seeing how the player stats change when you copy stats from the "Leagues" tab and paste into "RefLg".
August 10th, 2009 at 5:08 am
oh, I thought the article was about text translations ;)
(I like the "lost in translation" issue)
August 10th, 2009 at 10:34 am
Neil - How do you create a "neutral environment across the board"?
August 10th, 2009 at 12:34 pm
It would just be a theoretical environment in which an average number of FG, FGA, FT, FTA, TRB, AST, & PF were accumulated per 48 min. by the team and the opponent combined. (For instance, in 1969 those numbers were 8.68, 19.70, 4.97, 6.96, 11.32, 4.60, and 5.07, respectively. By contrast, in 2009 they were 7.37, 16.07, 3.79, 4.91, 2.19, 8.20, 4.17, and 4.18.)
August 10th, 2009 at 12:35 pm
I should also note that it's 48 "player-minutes" (or 5 * team minutes).
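For anyone who wants the "player-minutes" bookkeeping spelled out, here is a rough sketch. It assumes five players on the floor at all times, and it reads "team and opponent combined" as pooling both sides' totals over both sides' player-minutes; that reading, the function name, and the sample totals are all assumptions for illustration.

```python
def environment_per_48(team_total, opp_total, team_minutes, players_on_floor=5):
    """Average amount of a stat per 48 player-minutes in a team's games.

    team_minutes is clock time played by the team (48 for one regulation
    game); each side logs players_on_floor player-minutes per team minute.
    """
    pooled_player_minutes = 2 * players_on_floor * team_minutes
    return (team_total + opp_total) * 48.0 / pooled_player_minutes


# Hypothetical single regulation game: 40 FG for the team, 44 for the opponent.
print(environment_per_48(40, 44, team_minutes=48))
```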
August 10th, 2009 at 4:16 pm
Why do you think it is that the '09 players seem to dominate the scoring and assists so thoroughly but the rebounds still belong to the '69 bigs?
Also, strictly out of fan-geek curiosity, could you average the numbers in multiple seasons to create a more general neutral environment? Would the usefulness of the data hold up when using cumulative league and player seasons? It would be interesting if we could take a long (say 50 season span) and use that to equalize any player seasons (or ranges of player seasons) for one to one comparison. If so that might be sort of a golden goose - sort of an opposite approach to all the metrics that try to measure players from disparate eras against each other by breaking down the box-score stats by estimated possessions.
August 10th, 2009 at 10:53 pm
Jason, it's a little thing called FG%.
August 11th, 2009 at 9:09 am
Dave, that's certainly true, and I'm asking a question that doesn't belong on this blog because it's not really numbers-related (if someone can find a way to answer it with numbers, I'll send him a batch of cookies). WHY did they miss so many shots in the '60s? I'd like to say pace, but the pace in the '80s was pretty high as well, and the shooting percentages then were through the roof.
Was it poor shot selection? Were star player usages lower back then, meaning the best scoring options weren't taking a high enough percentage of the shots? Was the game so physical that making a shot was a more difficult proposition? Were the skills or strategies to score over defenses still in a developmental stage and not reliable yet (that might account for why teams tried to shoot early in the clock)? My memory only goes back to the '80s (the late 80's at that), and when I talk to older guys, they never really have an answer for me. In fact they're usually pretty shocked to find out the FG% disparity exists at all.
August 11th, 2009 at 10:55 am
"in the past we simply adjusted for pace, then league... why don’t we worry about the player’s actual context — his team’s stats and those of his opponents?"
Yay, progress.
August 11th, 2009 at 11:04 am
In a neutral environment, Wilt gets the same 21.1 RPG, E gets his 28.4 PPG, and Oscar gets identical 9.8 APG? But in 1969, they didn't get these numbers in equivalent environments.
August 11th, 2009 at 12:39 pm
I recognize that, Mike, I was just inserting their actual numbers as an illustration -- a point of comparison for the 2009 guys we threw in the "time machine". As for the "yay, progress" remark, A) thanks for the unnecessary snark, and B) I take it to mean this is also how you "normalize" stats across eras?
August 11th, 2009 at 4:07 pm
Yes to B, for starters.
I'd also suppose a guy who gets 25 PPG with a high TS% would get more shots (and more points) than the guy who scores 25 with lower TS%, when put into the same environment.
How do you account for unknown opponent rebounds in 1969?
What about fewer assists (per made FG) in the '60s? Any adjustment for that?
August 11th, 2009 at 4:48 pm
I find it very hard to believe that 7 guys in 1969 could have averaged 9 or more APG.
August 11th, 2009 at 7:18 pm
In 1969, teams averaged 43.7 FG per game, of which 53% were assisted.
In 2009, of 37.1 FG/G, 56.5% were assisted.
Assuming .565 of 43.7 FG are assisted, that's another 17.8% added to everyone's 2009 assist rates. So a 7.7 APG player from 2009 could be a 9 APG guy in 1969. Assuming, perhaps, that he took his scorekeeper along on the time machine.
There were 7 players over 7.7 APG this season.
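The arithmetic in the comment above can be checked with a few lines. This is only a sketch of that back-of-the-envelope reasoning, using the FG totals and the 56.5% assisted share quoted there; the variable names are made up, and applying the 2009 assisted share to 1969 field goals is the comment's assumption, not an established fact.

```python
FG_PER_GAME_1969 = 43.7
FG_PER_GAME_2009 = 37.1
ASSISTED_SHARE_2009 = 0.565  # share of made FG that were assisted in 2009

# Assisted baskets per team game if the 2009 share applied in both seasons.
assists_1969 = FG_PER_GAME_1969 * ASSISTED_SHARE_2009
assists_2009 = FG_PER_GAME_2009 * ASSISTED_SHARE_2009

# Relative increase in assist opportunities going from 2009 to 1969.
boost = assists_1969 / assists_2009 - 1

# A 7.7 APG player from 2009, scaled by that increase.
translated_apg = 7.7 * (1 + boost)

print(f"boost: {boost:.1%}, translated APG: {translated_apg:.2f}")
```

Since the assisted share is held constant, it cancels out and the boost is just the ratio of made field goals, 43.7 / 37.1.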