Virginia vs. Michigan State: A Statistical Analysis

The upcoming game between Virginia and Michigan State is intriguing on several levels. First, this is a battle between two contrasting programs. Long-established national power Michigan State, with legendary coach Tom Izzo, fields a team of sharpshooters who face off against a Virginia team that has slowly built its way back to national prominence through the grind-it-out style of basketball coach Tony Bennett emphasizes. Second, this game is the only Sweet Sixteen matchup where the Vegas betting line favors the lower seed. In other words, Michigan State is considered the favorite. Last but not least, especially with regard to this piece, various statistical models rate Virginia anywhere from a moderate favorite to the slightest of underdogs. These points raise three questions: How do these teams match up? Which current analytic model is most accurate? And how can we improve upon these existing models to come up with the best possible estimate of each team’s probability of advancing to the East Region final?

Starting with the “simplest” of models, Ken Pomeroy’s formula estimates that Virginia is a 63% favorite to advance to the Elite Eight. His model uses tempo-adjusted offensive and defensive efficiencies to assess team strength using Pythagorean Expectation. Essentially, he is calculating the expected winning percentage of one team against an average Division-I team. However, when two teams match up, it’s highly unlikely that either of them is the average D-I team, so he compares the two by using what’s called the log5 formula, which calculates the expected winning percentage for one team over the other. This is how his formula derives that Virginia should beat Michigan State 63% of the time. His model can also account for home-court advantage by adjusting the home team’s tempo-adjusted offensive and defensive efficiencies. However, since this game will be played at a neutral venue, we’ll skip past his adjustments for home-court advantage. Pomeroy’s method also has an advantage over purely margin-of-victory-based models such as Georgia Tech’s Bayesian LRMC model or Raymond Cheong’s rankings, which do not account for the tempo of play. In these models, a 30-point win is counted the same regardless of whether this margin of victory was achieved in a 50-possession-per-team game or an 80-possession-per-team game. Clearly, achieving such a large margin of victory in a 50-possession game shows that the winning team was extremely efficient, reaching this margin with significantly fewer chances to do so than in an 80-possession game. Pomeroy’s model does predict a margin of victory, but it does so from the tempo-adjusted efficiencies described above.
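To make the two steps concrete, here is a minimal sketch of Pythagorean Expectation followed by log5. The efficiencies are made-up numbers, and the exponent of 10.25 is an assumption (the value Pomeroy actually uses is not stated in this piece), so treat the output as illustrative only.

```python
def pythagorean_expectation(off_eff, def_eff, exponent=10.25):
    """Expected win pct against an average D-I team.

    off_eff / def_eff are adjusted points scored / allowed per 100 possessions.
    The default exponent is an ASSUMED value, not one taken from the article.
    """
    return off_eff**exponent / (off_eff**exponent + def_eff**exponent)


def log5(p_a, p_b):
    """Probability that team A beats team B, given each team's
    expected win pct against an average opponent."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


# Hypothetical efficiencies, NOT real team data.
team_a = pythagorean_expectation(115.0, 92.0)
team_b = pythagorean_expectation(118.0, 97.0)
print(f"Team A vs. average: {team_a:.3f}")
print(f"Team B vs. average: {team_b:.3f}")
print(f"Team A over Team B: {log5(team_a, team_b):.3f}")
```

A quick sanity check on log5: if Team B is exactly average (0.5), the formula simply returns Team A's own expected winning percentage, which is what the Pythagorean step is supposed to mean.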

However, as useful as this model is, Pomeroy himself acknowledges that it cannot account for several factors. These include injuries or suspensions, on-court matchups, and other intangibles whose effects on a particular game are hard to quantify (player experience, depth, coaching, officiating, etc.). We will focus on the two factors that can be adjusted for: injuries or suspensions, and matchups.

Nate Silver’s model and ESPN’s BPI both account for injuries or suspensions. Silver’s missing-player adjustment uses a concept called win shares, which is roughly equivalent to measuring the impact a player’s absence has on the point differential per game. The example he gives is that Brandon Davies’ suspension in 2011 hurt BYU by about 1.7 points per game. ESPN’s missing-player adjustment de-weights games where one or both teams are missing key players and makes the adjustments on a minutes-per-game basis. This is a good adjustment because it is tempo independent: whether a game has 80 possessions or 50 possessions per team, a player who missed half the game missed approximately half of the possessions. One additional item to point out is that Silver’s calculations for the 2013-2014 season now include ESPN’s BPI as one-seventh of his base power rankings, and the base power rankings are then adjusted for missing players. Since BPI already contains its own missing-player adjustment, Silver’s method effectively applies a full missing-player adjustment plus another one-seventh of one. To avoid that double counting, we choose to use ESPN’s BPI, which accounts for tempo and gives a specific weight to each game.
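ESPN does not publish the exact weights, so the following is only a sketch of what minutes-based de-weighting could look like, not BPI's actual formula. Both function names and the weighting rule are hypothetical. The one hard number is 200, the total player-minutes a team has in a 40-minute regulation game (5 players on the floor).

```python
def game_weight(missing_minutes_a, missing_minutes_b, team_minutes=200.0):
    """HYPOTHETICAL weight in [0, 1] for one game.

    missing_minutes_* is the combined minutes-per-game average of the
    players each team was missing. The weight is the product of the shares
    of usual minutes that each team still had available.
    """
    share_a = max(0.0, 1.0 - missing_minutes_a / team_minutes)
    share_b = max(0.0, 1.0 - missing_minutes_b / team_minutes)
    return share_a * share_b


def weighted_efficiency(games):
    """Weighted mean of per-game efficiency.

    games is a list of (efficiency, weight) pairs.
    """
    total_weight = sum(weight for _, weight in games)
    return sum(eff * weight for eff, weight in games) / total_weight


# Made-up season: two full-strength games and one played without a
# 30-minute-per-game starter.
season = [
    (112.0, game_weight(0, 0)),
    (108.0, game_weight(0, 0)),
    (95.0, game_weight(30, 0)),
]
print(f"De-weighted offensive efficiency: {weighted_efficiency(season):.2f}")
```

Because the weight depends only on minutes, it behaves the same in a slow game and a fast one, which is the tempo-independence property described above.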

The second adjustment we explore here is for the type of matchup this game presents. There are no rating methods (to my knowledge) that account for the nuances of matchups. However, there is some recent work in this area as ESPN, in conjunction with Liz Bouzarth, John Harris and Kevin Hutson of Furman University, has led the way in matchup-based analysis. They apply their model simply to identify potential NCAA tournament upsets in what they call their “Giant Killers” model. They use a technique called cluster analysis to group similar teams together and identify which groups of significantly lower-seeded teams have the potential to upset much higher-seeded teams (the seed differential for their model to apply must be at least 5).

We build on their ideas and use cluster analysis to group all 351 NCAA Division-I teams into 8 distinct groups based on their style of play. We then analyze all the win/loss results from the 2013-2014 season and compare each group’s actual winning percentage against every other group with the expected winning percentage calculated from Pomeroy’s ratings, after adjusting those ratings with BPI’s tempo-free missing-player de-weighting. In simpler terms, if Group A was expected to beat Group B at a 57% clip (according to Pomeroy and adjusted for missing players), but Group A actually beat Group B 65% of the time, we might conclude, depending on the sample size, that this is a significant difference and that Group A outperforms expectations against Group B because of how the two styles match up. We then apply this to the Virginia/Michigan State game to make one final adjustment to Pomeroy’s winning percentage, giving us two adjustments: one for missing players and one for the matchup.
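The bookkeeping behind that comparison can be sketched in a few lines. The function name and the game records below are invented for illustration; in practice each record would carry the two teams' group labels, the pre-game win probability (for example, a log5 estimate after the missing-player adjustment), and the actual result.

```python
from collections import defaultdict


def matchup_table(games):
    """Aggregate expected and actual win rates for each ordered group pair.

    games is an iterable of (group_a, group_b, p_a_wins, a_won) tuples,
    where p_a_wins is the pre-game probability that team A wins and
    a_won is True if team A actually won.
    """
    totals = defaultdict(lambda: {"games": 0, "expected": 0.0, "actual": 0})
    for group_a, group_b, p_a_wins, a_won in games:
        cell = totals[(group_a, group_b)]
        cell["games"] += 1
        cell["expected"] += p_a_wins
        cell["actual"] += int(a_won)
    return {
        pair: {
            "games": cell["games"],
            "expected_rate": cell["expected"] / cell["games"],
            "actual_rate": cell["actual"] / cell["games"],
        }
        for pair, cell in totals.items()
    }


# Made-up games, purely for illustration.
games = [
    ("A", "B", 0.60, True),
    ("A", "B", 0.55, True),
    ("A", "B", 0.56, False),
    ("A", "B", 0.57, True),
]
table = matchup_table(games)
print(table[("A", "B")])
```

With only four games, a gap between the expected and actual rates means nothing; the comparison only becomes interesting once each cell holds enough games to support a significance test.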

The missing player adjustment is fairly straightforward. Using BPI’s weightings, we see Virginia’s results are deflated by missing players only slightly as the Wahoos and their opponents played mostly full strength all year. The Cavaliers’ adjusted offensive efficiency increases by 0.1% while their defensive efficiency decreases (they become less efficient defensively) by 0.3%. However, Michigan State was struck by injuries for half the year, so their offensive and defensive efficiencies are boosted by about 0.6% and 0.1%, respectively. Recalculating the projected winning percentage with these additions, Virginia drops from a 63.0% favorite over the Spartans down to a 60.2% favorite. In other words, even accounting for Michigan State’s injury troubles throughout the year, Virginia is still expected to produce overall better results against the rest of NCAA Division-I.

However, we have only adjusted for missing players up to this point. We still need to adjust for the on-court matchup. According to the results of my cluster analysis, there are eight distinct styles (or “types”) of teams. Each type has similar features that distinguish it from the other types. This produces 64 possible matchup combinations (each of the eight types can face any of the eight types, including its own). UVa is a “Type 6” team. These teams are very efficient defensively, forcing an extremely low effective field goal percentage and a very high rate of turnovers. They tend to be good rebounding teams on both ends, but are only slightly above average in their own effective field goal percentage. Michigan State is a “Type 7” team. These teams are extremely efficient shooters, usually win the turnover battle, and are great on the defensive boards. Type 6 and Type 7 teams are very good and quite similar overall, with Type 6 teams better defensively and Type 7 teams better offensively, especially at shooting. These two types also have the two highest average rankings in Pomeroy’s ratings adjusted for missing players.
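For readers unfamiliar with cluster analysis, here is a bare-bones sketch using k-means. This is an assumption on two fronts: the piece does not say which clustering algorithm was used, and the feature values below are invented. The idea is simply that teams with similar statistical profiles end up with the same label.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means on a list of equal-length numeric tuples.

    Returns (labels, centroids). Illustrative only; a real analysis would
    standardize the features and try several random starts.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each team to its nearest centroid.
        for i, point in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum(
                    (x - y) ** 2 for x, y in zip(point, centroids[c])
                ),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# HYPOTHETICAL standardized features per team:
# (effective FG% allowed, turnovers forced, own effective FG%)
teams = [
    (-1.2, 1.1, 0.2), (-1.0, 0.9, 0.1), (-1.1, 1.3, 0.3),  # defense-first
    (0.1, 0.2, 1.4), (0.0, 0.3, 1.2), (0.2, 0.1, 1.5),     # shooting-first
]
labels, centroids = kmeans(teams, k=2)
print(labels)
```

The actual analysis uses eight groups across 351 teams; the toy data here has only two obvious profiles so the grouping is easy to eyeball.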

According to the missing-player-adjusted winning expectations, Type 6 teams tend to be slightly higher rated than Type 7 teams as a whole. For games played in the 2013-2014 season between a Type 6 team and a Type 7 team, Type 6 teams were predicted to win 64.6% of the time using Pomeroy’s ratings adjusted for missing players. In reality, Type 6 teams won only 51.7% of those games, a win rate about 20% lower in relative terms than expected (51.7 / 64.6 is roughly 0.80). Based on the sample size, the p-value associated with this difference is <0.01, and we can indeed conclude that Type 7 teams pose a particularly tough matchup for Type 6 teams. In other words, Type 6 teams tend to dominate inferior teams, but struggle more than expected against other top-tier teams that are more efficient offensively and less efficient defensively. As a result, we scale UVa’s expected win probability of 60.2% down by that same 20%, giving the ’Hoos a 48.1% chance of winning the game against Michigan State.
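The arithmetic in this step can be reproduced roughly as follows. Two caveats: the number of Type 6 vs. Type 7 games is not reported here, so `n_games` is a purely hypothetical placeholder, and the test below treats every game as having the same 64.6% expected win probability, which is a simplification (each game really has its own pre-game probability). It is a sketch of one reasonable test, not necessarily the one used for the p-value quoted above.

```python
from math import comb


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


expected_rate = 0.646  # Type 6 over Type 7, expected (from the article)
actual_rate = 0.517    # Type 6 over Type 7, observed (from the article)
n_games = 300          # HYPOTHETICAL sample size, not given in the article
wins = round(actual_rate * n_games)

# One-sided test: how likely is a win total this low if the
# expected rate were the true rate?
p_value = binom_cdf(wins, n_games, expected_rate)

# Matchup adjustment: scale the missing-player-adjusted probability
# by the ratio of actual to expected win rate.
ratio = actual_rate / expected_rate
adjusted = 0.602 * ratio

print(f"one-sided p-value: {p_value:.2g}")
print(f"matchup ratio: {ratio:.3f}")
print(f"adjusted Virginia win probability: {adjusted:.3f}")
```

With these rounded inputs the adjusted probability lands within about a tenth of a percentage point of the 48.1% quoted above; the small gap presumably comes from rounding in the published figures.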

When compared to other analytic models, this model actually produces the least favorable chance for a Cavalier victory. Converted to a tempo-adjusted margin, it corresponds to Michigan State winning by just over 1 point on average. Currently, the Vegas consensus is Michigan State favored by 2 points, with 67% of the money coming down on the Spartans. Since even this projected margin is smaller than the betting line, bettors seem to be underestimating the Cavaliers’ chances of victory.

The next entry discusses the intangibles for each team, the impact of which is hard to quantify statistically, and delves into what Virginia needs to do to turn the tables in their favor.

