Tuesday, 18 August 2015

xG Hexagonal Maps

First popularized by Kirk Goldsberry and then introduced to hockey via War-On-Ice's Hextally plots, hexagonal plots are a great tool for visualizing sports. I have created my own versions below in the form of apps. Two quick caveats: my current 2014/2015 data seems to have some bugs in it, so take those seasons with a grain of salt, and the individual attempts map also seems to be buggy for reasons currently unknown. I am working to fix both of those issues, but keep them in mind.

Here are some of the features of my xG Hexagonal Maps:

  • If you are unfamiliar with xG (Expected Goals) you can read my post detailing the methodology here. Simply, it provides the probability of any given shot resulting in a goal. 
    • A slight change between this xG and the one from that post is that these numbers now also include missed shots
  • The size of each hexagon is the frequency of shots from that specific location. The larger the hex, the more often a player shoots from that location
  • Each hex is coloured by the efficiency (xGe) of a player/team/goalie from that specific location.
    • Efficiency here is measured as the difference between how many goals we expected them to score from that location (their xG) and how many they actually scored from that location.
    • A Blue Hex means that their xG was greater than their actual G, implying that they may have under-performed.
    • A Red Hex means that their xG was less than their actual G, implying that they may have over-performed.
  • Danger Zones are denoted by the light-pink and light-purple lines, high/medium/low. 
  • Not every red hex means a player over-performed and not every blue hex means a player under-performed. If you play in front of Henrik Lundqvist, your On-Ice Against xG is probably always going to be higher than your actual goals against.
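The size and colour rules in the list above can be sketched in a few lines of Python. This is only an illustration of the logic as described, not the code behind the apps: the `Hex` class, the function names and the "neutral" colour for an exact tie are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Hex:
    """One hexagon on the map (hypothetical structure for illustration)."""
    shots: int   # attempts taken from this location (drives hex size)
    xg: float    # summed expected goals for those attempts
    goals: int   # goals actually scored from this location


def hex_colour(h: Hex) -> str:
    """Red when actual goals exceed xG, blue when xG exceeds actual goals."""
    if h.goals > h.xg:
        return "red"
    if h.goals < h.xg:
        return "blue"
    return "neutral"  # assumption: the post does not say how ties are drawn


def hex_size(h: Hex, max_shots: int) -> float:
    """Scale the hexagon by shot frequency relative to the busiest hex."""
    return h.shots / max_shots if max_shots else 0.0
```

For example, a hex with 10 shots, 1.2 xG and 3 goals would be drawn red, while 10 shots, 1.2 xG and 0 goals would be drawn blue.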
The links to all the different maps are posted below. Please let me know if you have any thoughts, questions, concerns or suggestions, or if you find any more bugs. You can comment below or reach me via email here: DTMAboutHeart@gmail.com or via Twitter here: @DTMAboutHeart

Team Attempts Map

https://dtmaboutheart.shinyapps.io/app-1

Goalie Map

https://dtmaboutheart.shinyapps.io/Tendy

Player On-Ice Attempts Map

https://dtmaboutheart.shinyapps.io/PuckOn

Player Individual Attempts Map

https://dtmaboutheart.shinyapps.io/Single

Friday, 7 August 2015

Team Rankings and Projections

Note: I have since named this projection system MONDO for no reason other than it is also my friend's name.

Projecting hockey isn't an easy task. Just ask SAP, NHL.com's partner in arms, contracted to help this multi-billion dollar industry tackle concepts developed over the past decade by hobbyists. SAP claimed to have developed a model that boasted 85% accuracy, never mind the fact that this is basically impossible to achieve. These past playoffs SAP finished with a record of 9-6 (or 10-5 if you are allowed to change your picks once the series has ended). I will not dive into the world of subjective rankings and predictions either; every outlet on the globe that covers hockey will probably share their gut feelings for next season. Starting at the player level and building up to the team level, I wanted to objectively quantify an individual's impact on their team and then be able to project how each team will perform in the future.

Most of the models we see in hockey tend to operate only at the team level, which isn't all that bad. However, in building my model (based off of the basketball version created by 538) I wanted to build something that was truly founded at the player level and malleable to changing circumstances. My model will adapt to injuries, trades and lineup adjustments. That goes both for players and goalies.


Methodology 

Players

The player projections here are based off of Corsi Plus-Minus (CPM). If you don't know what CPM is you would be best off reading about it here. The most basic definition of CPM is that it reflects the impact a given player has on their team's Corsi when said player is on the ice independent of the strength of that player's teammates. 

The projection system used here is what is known as a Marcel Projection system, originally derived by Tom Tango. It involves three basic components. The first step is weighting past seasons based on recency. Here we use a 5/4/3 method, meaning that if we are projecting the 2015/2016 season we would weight 2014/2015 stats 41.67%, 2013/2014 stats 33.33% and 2012/2013 stats 25%. Simple enough. Using three years' worth of data helps ensure we don't put too much stock into one extremely good or bad season. Giving recent seasons more weight helps us to account for players that might be trending up or down.
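As a quick check on those percentages, here is a minimal Python sketch of the 5/4/3 weighting. The function name and the example values are made up for illustration, and the sketch ignores playing time within each season, which a full Marcel system would also account for.

```python
# 5/4/3 recency weights, most recent season first.
WEIGHTS = (5, 4, 3)

# Share of the projection that each season contributes: 5/12, 4/12, 3/12.
SHARES = [w / sum(WEIGHTS) for w in WEIGHTS]


def weighted_rate(season_values):
    """Recency-weighted average of three seasons, most recent season first."""
    if len(season_values) != len(WEIGHTS):
        raise ValueError("expected one value per weighted season")
    return sum(w * v for w, v in zip(WEIGHTS, season_values)) / sum(WEIGHTS)


# Hypothetical CPM values for 2014/2015, 2013/2014 and 2012/2013.
example = weighted_rate([3.0, 0.0, 0.0])  # only the latest season is non-zero
```

With only the most recent season non-zero, the weighted value is 5/12 of that season's number, which is where the roughly 41.67% figure comes from.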

The second step is to apply a regression to the mean based on games played over the past 3 seasons (once again weighted by recency). Reminder: regression doesn't always mean getting worse. Regression to the mean here implies that we are pulling a player's numbers closer to league average (0 for CPM stats). Players who didn't have a single game in a season are given 10 games at league average play for those missing seasons. Also, if a player did not meet a certain threshold of weighted games played over the past 3 seasons, instead of pulling their numbers towards zero, their numbers are pulled towards -1.5, which is about replacement level. That may sound like a ton of regression but in actuality most of it is just to handle outlier players who would otherwise have unrealistic numbers. The logic basically goes: the less experience a player has at the NHL level, the more cautious we should be about their abilities. Rookies and other players who did not play a single NHL game last season are given projected values of 0 (league average).
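One common way to express this kind of regression is to blend the weighted number with a target value, trusting the player's own number more as weighted games played grow. The sketch below assumes that form. The post does not give the regression strength or the games-played threshold, so `REGRESSION_GAMES` and `LOW_SAMPLE_THRESHOLD` are placeholder values chosen purely for illustration, and the 10-game fill-in for missing seasons is left out.

```python
WEIGHTS = (5, 4, 3)            # recency weights, most recent season first
LEAGUE_AVERAGE = 0.0           # from the post: league average CPM is 0
REPLACEMENT_LEVEL = -1.5       # from the post: about replacement level
REGRESSION_GAMES = 60.0        # hypothetical: not stated in the post
LOW_SAMPLE_THRESHOLD = 40.0    # hypothetical: not stated in the post


def weighted_games(games_by_season):
    """Recency-weighted games played over the past three seasons."""
    return sum(w * g for w, g in zip(WEIGHTS, games_by_season)) / sum(WEIGHTS)


def regress(weighted_cpm, games_by_season):
    """Pull a weighted CPM towards league average or replacement level."""
    games = weighted_games(games_by_season)
    # Low-sample players are regressed towards replacement level instead.
    target = REPLACEMENT_LEVEL if games < LOW_SAMPLE_THRESHOLD else LEAGUE_AVERAGE
    reliability = games / (games + REGRESSION_GAMES)
    return reliability * weighted_cpm + (1 - reliability) * target
```

Under these placeholder settings, a player with three full seasons keeps a good share of their weighted CPM, while a player with a handful of games ends up close to -1.5 no matter how good that short sample looked.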

Finally, we apply an aging curve. I wanted this aging effect to be present yet not overbearing. I designated peak age to be 26 with a 4% increase up until then and 2% decrease after that. If over 26, Age Adjust = (Age - 26) * 0.002. If under 26, Age Adjust = (Age - 26) * 0.004. Not crazy, but it serves its purpose.

Now that we have our projected OCPM (Offense), DCPM (Defence) and CPM (Overall) values, we have to convert them into goals. Based on a recommendation from Steven Burtch, I looked at each team's shot distributions by danger zone. Using these shot distributions and the average Corsi Shooting% from each zone we can derive an expected On-Ice Shooting% for the players on each team. I then used the same process, but looking at Corsi against, to create an expected On-Ice Sv% for players.

I then used the Marcel system (from above) but adapted for a player's on-ice shooting percentage. This produces an expected On-Ice Corsi Sh% multiplier for each player (ex. when Tyler Seguin is on the ice, you should take his team's Sh% and multiply it by 1.18). Forwards can have an impact on their On-Ice Shooting%; defencemen cannot. Therefore, when converting a forward's OCPM into goals we incorporate this multiplier, while defencemen simply receive the expected team average Shooting% (calculated via the method above). Neither forwards nor defencemen can consistently influence their On-Ice Sv%, so each player is assigned the team expected Sv%.

The last piece is to convert our CPM from rate stats to counting stats. Simply, we need to factor in playing time. This is where lineup construction comes in handy. I assign each player the league average time on ice based on their position (Forward vs. Defense) and spot in the depth chart (1st vs. 3rd line). First liners will have more of an impact than fourth liners but it is important to remember that what happens when your lesser players are on the ice still counts. Combining all of these factors leaves us with our expected offensive, defensive and overall values:

Goals For Above Average = Projected OCPM * Projected Time on Ice * Expected Team CSh% * On-Ice CSh% Multiplier

Goals Against Above Average = Projected DCPM * Projected Time on Ice * Expected Team CSv%

 Total Goals Above Average = Goals For Above Average + Goals Against Above Average 
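The three formulas above translate directly into code. The Python sketch below is a literal transcription with made-up function and parameter names. The post does not state the units, so the sketch simply assumes the projected time on ice is expressed in whatever unit the CPM rates are measured per. Defencemen use a multiplier of 1.0, since they receive the plain team average shooting percentage.

```python
def goals_for_above_average(ocpm, time_on_ice, team_csh, multiplier=1.0):
    """Projected OCPM * Projected TOI * Expected Team CSh% * On-Ice multiplier."""
    return ocpm * time_on_ice * team_csh * multiplier


def goals_against_above_average(dcpm, time_on_ice, team_csv):
    """Projected DCPM * Projected TOI * Expected Team CSv%."""
    return dcpm * time_on_ice * team_csv


def total_goals_above_average(gfaa, gaaa):
    """Sum of the offensive and defensive components."""
    return gfaa + gaaa


# Hypothetical forward: every input value here is invented for illustration.
gfaa = goals_for_above_average(ocpm=2.0, time_on_ice=10.0,
                               team_csh=0.05, multiplier=1.18)
gaaa = goals_against_above_average(dcpm=0.5, time_on_ice=10.0, team_csv=0.95)
tgaa = total_goals_above_average(gfaa, gaaa)
```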

Goalies

Goalies are voodoo, I know. Thankfully, we can project them the same way we did with players. Steps one and three are identical so I won't go over them again. Step two is the same in principle but I will clear up some of the details.

Goalies with little to no experience in a season are given 10 games at league average play. Also, if a goalie did not meet a certain threshold of weighted games played over the past 3 seasons instead of pulling their numbers towards league average (~0.923), their numbers are pulled towards replacement level (~0.910). If a goalie hasn't played any NHL games in the past 3 seasons they are given a rating of replacement level. 

I then split goalies into either starters or back-ups. Starters are assumed to play 57.5 of their team's games, while back-ups will play the other 24.5. This is the average playing time split in the NHL. Each goalie is always assumed to face the league average even-strength shots against per game (~23.2). Our final number for each goalie's impact is:


Goals Saved Above Average = (Projected Sv% - League Average Sv%) * Projected GP * League Average Shots Against
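Here is that formula as a small Python sketch, using the league averages and playing-time split quoted above as defaults. The function name and the example save percentage of 0.930 are made up for illustration.

```python
LEAGUE_AVG_SV = 0.923             # approximate league average Sv% from the post
LEAGUE_AVG_SHOTS_AGAINST = 23.2   # even-strength shots against per game
STARTER_GP = 57.5                 # games assumed for a starter
BACKUP_GP = 24.5                  # games assumed for a back-up


def goals_saved_above_average(projected_sv, games_played,
                              league_sv=LEAGUE_AVG_SV,
                              shots_against=LEAGUE_AVG_SHOTS_AGAINST):
    """(Projected Sv% - League Average Sv%) * Projected GP * Shots Against."""
    return (projected_sv - league_sv) * games_played * shots_against


# Hypothetical starter projected at a 0.930 save percentage.
starter_gsaa = goals_saved_above_average(0.930, STARTER_GP)
```

For that hypothetical starter the result works out to roughly 9.3 goals saved above average, while a replacement-level back-up (0.910 over 24.5 games) comes out negative.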


Season Simulation

Now that we have our player and goalie ratings, we simply add up the impact of all 18 players and 2 goalies for each team. This gives us Team Goals Above Average; a simple way to think about it is as a team's projected goal differential. Using the pythagorean expectation (developed by Bill James) we can project a team's winning percentage based on how many goals we expect them to score and allow, using this formula:

Win % = (goals scored ^ 1.8) / [(goals scored ^ 1.8) + (goals allowed ^ 1.8)]
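The pythagorean expectation is a one-liner in code. A minimal Python sketch, with a made-up function name and the 1.8 exponent from the formula above as the default:

```python
def pythagorean_win_pct(goals_scored, goals_allowed, exponent=1.8):
    """Expected winning percentage from goals scored and goals allowed."""
    scored = goals_scored ** exponent
    allowed = goals_allowed ** exponent
    return scored / (scored + allowed)


# Hypothetical team expected to score 220 goals and allow 200.
win_pct = pythagorean_win_pct(220, 200)
```

A team that scores exactly as many goals as it allows comes out at 0.5, and swapping a team's goals scored and allowed gives the complementary percentage.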

Assuming that a team playing on home ice has an inherent 55% chance of winning just by the nature of playing at home, we can use this formula (also created by Tom Tango) to predict the odds of a home team winning a given game:


Home Team Win Probability = [(Home Team Win%) * (1 - Away Team Win%) * 0.55] / ([(Home Team Win%) * (1 - Away Team Win%) * 0.55] + [(1 - Home Team Win%) * (Away Team Win%) * (1 - 0.55)])

I am sorry if that looks convoluted and hard to follow but we are really just plugging in both teams' ratings. Using that formula we can now, for any combination of two teams, figure out the odds of either team winning. We assume that every game has an equally likely chance of going to overtime; low event teams are more likely to go to overtime but it's not a substantial difference. Winning or losing an overtime/shootout is equally likely for each team.
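If the home-ice formula is easier to read as code, here is a minimal Python sketch of it. The function and variable names are made up for illustration, and the sketch does not guard against the degenerate case where both terms are zero (for example two teams with a 0% win rate).

```python
def home_win_probability(home_win_pct, away_win_pct, home_edge=0.55):
    """Odds of the home team winning, given both teams' winning percentages."""
    home_term = home_win_pct * (1 - away_win_pct) * home_edge
    away_term = (1 - home_win_pct) * away_win_pct * (1 - home_edge)
    return home_term / (home_term + away_term)


# Two perfectly equal teams: the home side should win 55% of the time.
even_matchup = home_win_probability(0.5, 0.5)

# Hypothetical 0.600 team hosting a 0.400 team.
favourite_at_home = home_win_probability(0.6, 0.4)
```

With two equal teams the formula collapses to the 55% home-ice edge, which is a handy sanity check.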

Hockey is a game of probabilities. If Team A is favoured 60% to 40% over Team B, it is important to keep in mind that 4 times out of 10 the lesser team will win that game. To simulate this randomness, for any given game we will roll an imaginary dice. This is a basic example but hopefully can help explain the process:


This shows the outcomes if we had two perfectly equal teams facing each other. Of course we aren't dealing with perfectly equal teams in real life. So when we apply this to actual games the numbers will not be as simple. 

Using the actual 2015-2016 schedule and the above formulas we can derive the win/loss/OT probabilities for every game. Then, by generating a random number (rolling our imaginary dice), we can find out the result of each game. Do that for every game and congrats, you just used the Monte Carlo method to simulate an entire NHL season. Just like a single dice roll, simply doing this simulation once could generate some weird results. This is why we need to simulate the season 10,000 times to smooth out those weird results (10,000 times might be a tad excessive but it doesn't hurt).
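To make the process concrete, here is a small Python sketch of a season simulation on a toy two-team schedule. It is not the actual model code. The overtime rate `OT_PROB` is a placeholder (the post does not give the number), and splitting each game into "overtime coin flip" versus "regulation decided by the home win probability" is an assumption about how the pieces fit together. Points follow the usual NHL scheme of 2 for a win and 1 for an overtime or shootout loss.

```python
import random

HOME_EDGE = 0.55   # home-ice advantage from the post
OT_PROB = 0.24     # hypothetical: same overtime chance for every game


def home_win_probability(home_wp, away_wp, home_edge=HOME_EDGE):
    """Home-ice adjusted odds of the home team winning."""
    home_term = home_wp * (1 - away_wp) * home_edge
    away_term = (1 - home_wp) * away_wp * (1 - home_edge)
    return home_term / (home_term + away_term)


def simulate_season(schedule, win_pct, rng):
    """Play every (home, away) game once and return points per team."""
    points = {team: 0 for team in win_pct}
    for home, away in schedule:
        went_to_ot = rng.random() < OT_PROB
        if went_to_ot:
            home_won = rng.random() < 0.5   # overtime/shootout is a coin flip
        else:
            p_home = home_win_probability(win_pct[home], win_pct[away])
            home_won = rng.random() < p_home
        winner, loser = (home, away) if home_won else (away, home)
        points[winner] += 2
        if went_to_ot:
            points[loser] += 1
    return points


def expected_points(schedule, win_pct, n_sims=10_000, seed=0):
    """Average points per team over many simulated seasons."""
    rng = random.Random(seed)
    totals = {team: 0.0 for team in win_pct}
    for _ in range(n_sims):
        for team, pts in simulate_season(schedule, win_pct, rng).items():
            totals[team] += pts
    return {team: total / n_sims for team, total in totals.items()}


# Toy example: two hypothetical teams that each host the other 41 times.
toy_win_pct = {"A": 0.6, "B": 0.4}
toy_schedule = [("A", "B")] * 41 + [("B", "A")] * 41
```

Calling `expected_points(toy_schedule, toy_win_pct)` returns the average point total for each team. Swapping in the real schedule and every team's projected winning percentage gives the full-season version.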


Rankings and Projections

Here are the results below. I also divided the results up to show you the splits between a team's forwards and defence. To get Team Goals, either add For + Against + Goalies or Forwards + Defense + Goalies.




Final Notes

  • I will do my best to update these rankings/projections with every trade and injury
  • Once a majority of the 2015-2016 season has been played and I have new CPM data, I will update the model accordingly
  • During the season, the expected point totals will be divided into:
    • Total points gained so far
    • Total expected points for the remainder of the season
    • Total expected points at season's end
  • Despite my best efforts, some players are still really screwy. Most notably Michael Bournival for Montreal, who gives them one of the best 4th lines in the league almost single-handedly
  • No, substituting career AHL player A for career AHL player B will not change your team's projections
  • Looking at the methodology behind the model, you can make your own changes as you see fit. Think your better goalie will play 70 games this season (despite the fact that over the past two seasons only 2 goalies have played that many)? Then you can go ahead and add a few goals to your team's projection
  • This isn't the world's most sophisticated model, and it is not supposed to be
  • Below is the data used in the model.
    • Role - Position in the lineup
      • 1 - 2 - 3 - 4: Forwards
      • 10 - 11 - 12: Defense
      • 1 - 2: Goalies
    • GFAA - Goals For Above Average
    • GAAA - Goals Against Above Average
    • TGAA - Total Goals Above Average
    • GSAA - Goals Saved Above Average
Please let me know if you have any thoughts, questions, concerns or suggestions. You can comment below or reach me via email here: DTMAboutHeart@gmail.com or via Twitter here: @DTMAboutHeart