Thoughts On the 2010 MIT Sloan Sports Analytics Conference
Posted by Neil Paine on March 8, 2010
On Saturday, I had the distinct honor and privilege of attending the MIT Sloan Sports Analytics Conference in Boston, where a virtual "who's who" of basketball analysts was on hand to listen to panelists who ranged from Daryl Morey, Mark Cuban, and Kevin Pritchard to John Hollinger, Dean Oliver, and even Bill Simmons. It was a great experience and a thrill to meet many of my fellow APBRmetricians, so here are some of my hoops-related impressions from the conference:
- Cuban was definitely the star of the event, which I'm told was nothing new. He was featured in the most high-profile panel discussion, "What Geeks Don't Get: The Limits of Moneyball," and mentioned how he wished the NBA would officially track events like deflections and tipped balls so that his employees didn't have to waste time and money doing it themselves. This is something most of us have wished for, and it's not the first time Cuban has mentioned the need for it, but it was still nice to hear that he's pushing the powers that be for some action in that area.
- When asked if taking the tracking responsibility away from teams and giving everyone the same information would erode a possible competitive advantage for the teams that were the best at tracking, Cuban didn't really see that as a downside. His view was that those events are basically facts, so it's what you do to analyze them after the fact that gives you the competitive advantage, not the act of tracking itself. Plus, an official league-wide database of new stats would make things consistent across teams in terms of the definition of a "deflection", for instance, so it would be easier to look at the numbers for personnel decisions involving players to acquire from other teams. From Cuban's perspective, there wasn't really any downside to having the NBA track those events.
- Plus-minus was a recurring topic, and as Dean Oliver was quick to point out, from a media perspective the distinction needs to be clearer between "Plus-Minus" (which is a raw fact about what happened on the floor) and "Adjusted Plus-Minus", which is a statistical manipulation of the raw +/- data. Cuban was excited about what the latter meant in terms of lineup evaluation, saying that he could tell which teams used advanced stats and which didn't simply by looking at who their most frequent lineups were -- some teams put out combinations where Cuban said he understood why the coaches thought it would be good for the team, but the +/- numbers showed that the team was getting hammered (i.e., never outscoring the opposition) when that group was on the floor. As we get the sense more and more that APM at the player level is plagued by considerable noise, I have a feeling that lineup +/- is going to be emphasized more, especially since you don't have to parse out who did what when you're looking at the full 5-man unit.
- Simmons was pretty stat-friendly on the whole, which impressed me but didn't really surprise me from what I've read in the past. He said his biggest issue with stats was that it was difficult for the everyday fan (i.e., Simmons' dad or even B.S. himself) to understand what the numbers meant and relate that to what happens on the court/field. I agree, and sometimes people like me become blind to the fact that much of the tacit knowledge we take for granted isn't a given for the majority of fans simply because they don't have a statistical background. So he's right that it falls to us to make some of this stuff more relatable to the average fan, someone who knows a lot more about basketball than he/she knows about linear regression.
- I also liked Simmons' comments about certain moves being made because of chemistry, which is one of the "next frontiers" for analytics. James Harden was his example -- he felt like OKC drafted Harden over Tyreke Evans in part because Harden had good character/intangibles, and Evans had question marks in that area. Plus, and this is my opinion, part of the chemistry of that decision is the realization that Evans is a high-usage player and OKC already has one of those who ranks among the NBA's best. So evaluating players based on team fit is one of those things we're still working on; just because Evans is the better player and has the higher PER, etc., he might not have been as good for OKC's specific situation as Harden is.
- There was an interesting discussion about whether clutch performance was something to be considered when making personnel decisions, and Morey & Cuban were actually split on the matter. Cuban said one of the motivations behind the Jason Kidd trade (which, incidentally, Simmons had to concede has worked pretty well for Dallas after he ripped them at the time) was the idea that Kidd's Winval (or whatever they use now) was higher in clutch situations than in the rest of the game. Morey said he couldn't trust clutch stats enough to base decisions on them, presumably because the sample sizes are so small that randomness appears to rule even if players do have differing levels of clutch skill.
- John Huizinga and Sandy Weil presented during lunch on the value of blocked shots ("from Tim Duncan to Dwight Howard", so named because the former's blocks are so much more valuable than the latter's). I found this very interesting because anecdotally, we've all heard about Bill Russell tipping the ball to teammates instead of swatting it out of bounds, and Huizinga/Weil found significant differences along those lines. They calculated expected points based on the initial conditions of a possession for both the blocked team and the blocker's team after the block, and evaluated each shot-blocker based on how often his blocks saved points on D and led to points on offense. Duncan almost never goaltends and tips a lot of balls to teammates, making his blocks highly valuable (Theo Ratliff's blocks were also extremely valuable); Howard goaltends a lot and swats the ball out of bounds, making his net points per block significantly lower than those of someone like Duncan.
- Dean Oliver once recounted a story about how the Sonics noticed a statistical hole in the Spurs' shot defense from midrange and attacked it in the playoffs, giving San Antonio a tougher series than people expected. Well, apparently this kind of analysis is highly prevalent now across many of the stat-oriented teams: Cuban and Pritchard talked about how Juwan Howard's game-winning shot vs. the Mavs on January 30 was actually influenced by the same mesh of scouting and stat analysis. From the numbers, Dallas knew Howard's chances of making a midrange jumper were extremely low, so they consciously decided to concede that shot to him and take away Portland's more high-percentage plays. The fact that it didn't work was the exception that proved the rule for Cuban, who lamented that it was "the only 15-footer he’s hit this year." (Pritchard then jokingly corrected Cuban by pointing out that Howard actually has hit two 15-footers.)
- I didn't see the paper being presented, but Sandy Weil told me later about a study being done on "swallowing the whistle" by officials -- in other words, omission bias. As it turns out, there's some pretty strong evidence that officials tighten up and become conservative under pressure as well, often choosing to abstain from making a key (and often correct) game-changing call so that the outcome won't rest on their shoulders. In baseball you see this with ball-strike counts at the extremes: on 3-0, the strike zone becomes huge because the umpire wants to give the pitcher a chance to climb out of the hole, and vice-versa on 0-2 counts, where the zone becomes very small. In basketball, the rate at which officials call discretionary fouls and plays on loose balls declines as the game goes on, bottoming out in the final 5 minutes, where referees are hesitant to make any call because a bad split-second decision could mean the difference in the game. Ironically, by "letting them play", refs are actually contributing more (and more adversely) to the outcome than if they called the game normally, but the incentive for them is that the fans/media generally won't single any of them out for a call they didn't make (and should have) like they would for a bad call they did make.
- Across all sports, two items stood out to me as recurring themes: psychological/personality evaluations, and predicting injuries. There's still so much we don't know about both areas, and analytics can try to help sort out the mess that both are currently in. 49ers Vice President Paraag Marathe quipped that to judge future football players, we make them play track & field (the combine), which would be like judging future baseball players on ping-pong... But the panelists did concede that data like combine numbers could actually be used to predict the durability of a player's body, if only we knew what to track and how to put it all together. The baseball panel was concerned with pitcher injuries (who isn't?), and Dan Duquette talked about a medical team that analyzed the key points in a pitcher's throwing motion to determine how likely he was to get injured based on previous pitchers with similar deliveries. Basketball doesn't have anything quite so scientific, but there still could be applications on the analytics front about maximizing practice time and workout regimens to reduce the risk of injury.
- Psych evals are interesting for basketball because personality is important in a game where all 5 people have to work together for a common goal. Kevin Kelley is a high school football coach who famously decided never to punt, but in the coaching panel he talked about how football coaches have the luxury of putting the ball in the hands of their best decision-maker (the QB) and letting him choose the final outcome of every play. In basketball, though, he said it was too fluid, too constantly changing, and too dependent on teamwork to take a similar approach. Certainly football takes a lot of teamwork as well, but everyone has a rigidly-defined role, while in basketball the roles change on the fly, and every player has to eventually make a decision for the team. Most panelists wanted to use psych testing to find the players who could handle those moments best, as well as the players who had true "makeup" (courage, unselfishness, mental toughness, and leadership) rather than simply those who said "yes, sir" and "no, sir" to scouts.
- Avery Johnson was entertaining as a member of the coaching panel, and he talked about some of his interactions with Wayne Winston and the Winval system. In 2005, the numbers said Jon Barry in a small lineup was having a field day against Dallas' bigger units, so Avery went small to counteract that, and Dallas came back from down 2-0 to win the series. Then, in the infamous Dallas-Golden State series from 2007, Avery was told that Golden State killed Dallas when Erick Dampier was at C, so for Game 1 he tried a new lineup with Dirk Nowitzki at the 5... and lost. Avery felt like the move hurt the Mavs' confidence, so he went back to his old lineup for the rest of the series. They won Game 2 that way, but eventually lost the series; to this day Avery says going small was the right move, but he was swayed by the loss and other factors. If they had stuck to the original plan, they may very well have ended up winning the series.
- Another paper presentation I missed but heard about later was Brian Skinner's work on "The Price of Anarchy". The idea is that teams will actually do best when their best player doesn't completely dominate the offense and his teammates shoot more. This goes with the concept of skill curves, which we've discussed at length before, but the Nash Equilibrium (named after John, not Steve) is actually a much lower level of usage for the star than we think. As it turns out, superstars like Ray Allen (and presumably Kobe Bryant, LeBron James, etc.) should be taking nowhere near 30% of the shots when on the floor -- instead, they should basically shoot at the same rate as their teammates, or 20%! This corresponds to other real-world scenarios like traffic modeling, where the most efficient road would cease to be so if everyone used it; or as Skinner puts it, "the global optimum solution is the one where the roads are used equally, even though they are not of equal quality." It's a fascinating study, and you can read the whole paper here.
- Joe Sill won the prize for best non-academic paper for his groundbreaking Regularized Adjusted Plus-Minus work, which means you should check out the research at his site if you haven't already.
- The Dallas guys threw Gerald Green and, presumably, Josh Howard (Kevin Pelton suggested it could also have been Marquis Daniels) under the bus during the various panels. In the basketball panel, Green was singled out by Cuban as a hugely talented player who, regrettably, didn't understand the game well enough to put his tools to good use. Then, in the coaching panel, Johnson told a story about a (nameless) young player who had an amazing work ethic early in his Mavs career, coming to practice early, staying late, and always hanging out in Avery's office during his free time, asking about how he could improve as a player... But sadly, after the player was rewarded for his hard work with his first big contract, he suddenly stopped working as hard, and was nowhere to be found in the coach's office. Avery declined to name the player, but the speculation was that it was either Josh Howard (re-signed to a 4-year, $40M contract extension before 2006-07) or Marquis Daniels (re-signed to a 6-year, $38.6M contract before 2004-05).
- Some of the biggest excitement in the Basketball Analytics panel came when Synergy (and video analysis in general) came up. Mike Zarren of the Celtics said that he could now use Synergy to call up in seconds all of a player's possessions on a certain side of the court with a certain play called against a certain defense, which in the past they wouldn't even have been able to have a video guy do because it would have been far too time-consuming. John Hollinger and Dean Oliver felt like video-scouting was one of the major directions analytics would go in the future, simply because of the technological advances Zarren mentioned.
- Oh, and did I mention that Cuban is a major investor in Synergy? Pritchard lamented this, and confessed his paranoia that Cuban was using the software to spy on the other teams who used it! They also talked about developing a scaled-down public version of Synergy, which would be totally awesome.
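The traffic analogy from the "Price of Anarchy" item above can be sketched with a toy two-road model. This is a textbook-style illustration, not Skinner's actual numbers: the function name, the travel times, and the assumption that the shortcut's time equals the share of drivers on it are all made up for the example. In basketball terms, the shortcut plays the role of the star's shot, which gets less efficient the more it is used.

```python
def average_travel_time(share_on_shortcut, highway_time=1.0):
    """Average trip time for a toy two-road network (hypothetical model).

    The shortcut's travel time equals the share of drivers using it;
    the highway always takes highway_time no matter how busy it is.
    """
    x = share_on_shortcut
    return x * x + (1.0 - x) * highway_time

# Selfish outcome: the shortcut is never slower than the highway,
# so every driver takes it and it ends up exactly as slow as the highway.
nash_share = 1.0
nash_time = average_travel_time(nash_share)

# Coordinated outcome: search for the split that minimizes the average trip.
candidate_shares = [i / 100 for i in range(101)]
best_share = min(candidate_shares, key=average_travel_time)
best_time = average_travel_time(best_share)

# Ratio of the selfish average cost to the best achievable average cost.
price_of_anarchy = nash_time / best_time

print(f"selfish split:     {nash_share:.2f} on shortcut, avg time {nash_time:.2f}")
print(f"coordinated split: {best_share:.2f} on shortcut, avg time {best_time:.2f}")
print(f"price of anarchy:  {price_of_anarchy:.3f}")
```

In this toy setup the best split sends half the drivers down each road, which echoes the line about the roads being "used equally, even though they are not of equal quality."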
If I can read more of my hand-scrawled notes, I'll post more impressions later...
March 8th, 2010 at 5:29 pm
Great stuff, Neil.
March 8th, 2010 at 5:38 pm
Indeed, quite fascinating. In your opinion, Neil, who is the best coach / GM at interpreting and using stats?
March 8th, 2010 at 5:40 pm
Fantastic overview, Neil.
March 8th, 2010 at 6:11 pm
Thanks, guys. I think Morey is definitely the best GM when it comes to stats, but Pritchard impresses me every time I hear him talk. Here's a guy who played the game at the highest level, but he's naturally curious and he's not afraid to take a chance in the name of better information. In fact, he even said he liked stats when he played, and liked them even more when he was coaching and scouting. So I think he's really sharp. I also like Sam Presti, who does a good job of combining traditional scouting and analytics. Ferry has some great stats guys and it feels like his team was built around LeBron the way an APBRmetrician would have put it together, so I'm sure there's some impact there as well. The panelists said that more than half of the franchises in the league have some kind of stats team, so there's a good chance the list of great stats GMs will grow in the next few years.
March 8th, 2010 at 6:34 pm
Thanks for all the great info, Neil! Could you give a general feeling of the crowd / presenters' reactions to some of the presentations? I would think the "Price of Anarchy" paper in particular would generate a lot of surprise - I'm certainly surprised by the 20-20-20-20-20 results!
March 9th, 2010 at 12:50 am
I wanted to comment on the 'Value of a blocked shot' piece that I read on ESPN.
Though I only saw summary commentary I think the whole idea put forward is quite flawed because it seems to assume a normal distribution of expected value and doesn't factor in any diminished expected value caused by the defender prior to them blocking a shot.
For example, if I were playing in the NBA and had the choice of driving to the rim with Jermaine O'Neal's old legs trying to block my shot or D12's 40-inch vert and 300-pound frame trying to block my shot, I am choosing to drive against Jermaine every time. As such, the higher frequency of layups attempted against Jermaine would make it more likely that he gets opportunities to block layups.
Further, the fact that D12 is able to close out on NBA guards and block shots speaks volumes to his shotblocking ability. The fact that Jermaine rarely blocks jump shots could just as easily be an indication of his inability to get in a position to block jump shots as opposed to any extra value he creates. Tim Duncan would now be suffering from a similar phenomenon given how much he has slowed in the last two years compared to his youth. Also, given it is more difficult to initiate a fast break blocking a jump shot than a layup due to the trajectory of the ball prior to being blocked, the fact that one block leads to a fast break and another doesn't seems like a faulty measure of a shotblocker's value as Jermaine and TD physically could not block some of the shots D12 does.
In general you should expect a Center's value according to this measure to increase with age as they become less mobile and therefore more likely to be unable to move far from the rim on defence. As such this measure appears fundamentally flawed in assessing the value of a shotblocker on face value.
If there is more info on this article and their previous stuff on 'the hot hand' I would be interested to read it more thoroughly if you are able to post it online or email it around.
March 9th, 2010 at 12:54 am
As far as the "Price of Anarchy," here is an excellent article about the subject. There are great charts, analogies, and math. Highly recommended.
http://gravityandlevity.wordpress.com/2009/05/28/braesss-paradox-and-the-ewing-theory/
March 9th, 2010 at 1:37 am
I haven't read the whole Anarchy paper yet, but I have trouble believing that the expected values will be better with 20-20-20-20-20. Just a theoretical example, say that Steve Nash and Grant Hill have equally viable shots for the whole game. Nash has a higher FG% and EFG% than Hill, so the expected value of points with Hill taking as many shots as Nash is less than that if Nash took more shots than Hill. Thus, letting Nash take a portion of Hill's shots would give the team more points as long as he takes the right shots.
March 9th, 2010 at 3:17 am
I yearn to be a member of the who's who club. One day...
March 9th, 2010 at 10:26 am
Being a member of the 'lucky thousand', I have to say this recap is a good one. I spent all of my time in the research paper room before lunch (when 'price of anarchy' and 'adjusted plus-minus' were presented), and noticed the following:
-at LEAST Cuban and Morey were in the room for the 'adj +/-' presentation. There was an entire row of experts, possibly a dozen of them, all clustered together in the middle of the audience. Afterwards, during the question-and-answer part of the program, Mr. Cuban made note of HOW his team uses adjusted +/-, and how extremely specific and situational the data needs to be in order for statistical analysis to TRULY have merit for predictive purposes. I didn't have time to get direct quotes, but he made general mention of the following points:
1) The data is only worth noting if you have a large enough sample (he mentioned pulling the trigger on the Evan Eschmeyer signing BECAUSE his metrics showed so much promise for the young Northwestern alumnus, and noted that they NEVER jump to conclusions based on paltry data samples because of it).
2) The data needs to be specific in order to be important. (E.g.: Stats in the 2nd night of a back-to-back; stats in the 4th game in 5 days; stats on the last game of a road trip; stats in the 4th quarter, against a double-team, when down by 10 or more points; (he continued on for 7 or 8 more examples, which truly showed how ahead of the pack the Mavs' 'statistical war chest' has become). -how do you like the parenthetical-in-a-parenthetical? That's how I ROLL!)
On that note, I have spoken too much.
-Dresch
March 9th, 2010 at 6:41 pm
Good writeup Neil. I attended for a third time and enjoyed the event, as always.
If I understand the anarchy presentation correctly, I think it has some validity and application, but it fails to explain "The Ewing Theory".
First of all, and this may be beside the point, the Knicks very rarely did better without Ewing. By looking at the Knicks' record in Ewing's BR splits, it is apparent that they won more frequently when he played. I estimate that this method shows he added at least 7 wins per 82 games over his Knicks career. I can expand on this conclusion if people want to see it. In fact, if I do this for other superstars, this usually ends up being a decent proxy for their expected value using statistics, etc. A similar method is used to approximate value in Basketball on Paper.
To me there appears to be mild usefulness for applying this to skill curves. The problem lies with how the team determines who shoots what shot. I feel teams should look for the easiest shot first and the bailout shot last. Thus, the last shot that should be passed up is an open layup, then an open three (depending on the player of course), etc. If the team can set up a good strategy that basically looks for this progression, then you can clearly envision the effects of the "skill curve". In Ray Allen's example, the first shot would be successful at 75% (using the given .75 - .62x curve provided with x being usage%), the next would be less than that, and his hardest shots when he is taking 60% of his team's shots would have virtually no chance of success. (Keep in mind that there is limited applicability to estimate success beyond the observed data points used to determine the yield curve. There are few observable situations where a player takes more than 60% of his team's shots.) By this method, which assumes a linear decrease in eFG%, at the point where Allen's next shot is expected to succeed less than 50% of the time, he has a 62.5% cumulative eFG%, which corresponds to the optimal eFG% quoted in the "Price of Anarchy" presentation. If I understand the assumption of traffic theory, it is assumed that when Ray Allen's eFG% is 50%, every shot he takes has a 50% success rate.
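The arithmetic in the comment above can be checked with a short sketch. It rests on one reading of the comment, which is an assumption: that .75 - .62x is Allen's cumulative (average) eFG% at usage x, so the success rate of his next, hardest shot is the derivative of x * (.75 - .62x), i.e. .75 - 1.24x. The function names are made up for illustration.

```python
INTERCEPT = 0.75  # eFG% of the very first (easiest) shot, from the quoted curve
SLOPE = 0.62      # drop in cumulative eFG% per unit of usage, from the quoted curve

def cumulative_efg(usage):
    """Average eFG% over all shots taken at this usage (assumed reading of the curve)."""
    return INTERCEPT - SLOPE * usage

def marginal_efg(usage):
    """eFG% of the next shot: derivative of usage * cumulative_efg(usage)."""
    return INTERCEPT - 2.0 * SLOPE * usage

def usage_where_marginal_equals(target):
    """Usage at which the next shot's eFG% falls to the target value."""
    return (INTERCEPT - target) / (2.0 * SLOPE)

break_even_usage = usage_where_marginal_equals(0.50)

print(f"next-shot eFG% at 0% usage:  {marginal_efg(0.0):.3f}")
print(f"next-shot eFG% at 60% usage: {marginal_efg(0.6):.3f}")
print(f"usage where next shot hits 50%: {break_even_usage:.3f}")
print(f"cumulative eFG% at that usage:  {cumulative_efg(break_even_usage):.3f}")
```

Under this reading the next shot falls to 50% at roughly 20% usage, where the cumulative eFG% is 62.5%, and the shots taken near 60% usage succeed less than 1% of the time, which matches the figures in the comment.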
In reality, teams employ a combination of the two. A team will often look for a good shot type such as a close shot or open three and give the ball to their best shot creator, or bailout guy as the shot clock winds down. This is why eFG%'s decrease with the shot clock. However, teams will often have their best player shoot difficult shots to keep the defense honest and to suck the defense in and open things up on other possessions.
The best use for this study would be analyzing shot types. I always thought that being able to shoot a 15 footer was very valuable in keeping the defense honest, even though it is the poorest shot in terms of expected points produced. I always wondered what the optimal 2 point shooting usage would be in relation to its actual success. Shooting this more might suck the defense in and open up easier 3's and close shots for others. This could shed some light on the value of keeping defenses honest.
March 9th, 2010 at 10:11 pm
Nice comments, Scott.
March 14th, 2010 at 11:29 am
Thank you very much Neil for this post.
November 10th, 2011 at 7:47 pm
Hi there. You're a formidable writer with unique talent and imaginative thoughts. This is excellent work. I'm considering starting my own site. I'd like to ask if it is demanding to run your own website? I certainly enjoy commenting. Merci.