Friday, 31 October 2014

Puckerings archive: Factors Affecting NHL Attendance (29 Oct 2002)

What follows is a post from my old hockey analysis site puckerings.com (later hockeythink.com). It is reproduced here for posterity; bear in mind this writing is over a decade old and I may not even agree with it myself anymore. This post was originally published on October 29, 2002.
 

Factors Affecting NHL Attendance
Copyright Iain Fyffe, 2002


This paper builds upon the work of Wiedecke, who examined factors affecting NHL attendance using a multiple linear regression model. A summary of this work follows.

Data from the 1997/98 NHL season were used, giving 26 data observations. The dependent variable used was the percentage of capacity (called "Attendance Capacity"). That is, if a team averaged 15,000 fans in an arena with a capacity of 15,500, the team had an Attendance Capacity of 97% (15,000 divided by 15,500). The independent variables used were standings points, goals scored, and penalty minutes (which are all self-explanatory), and location (explained below).

Location for each team was assigned a value of 1, 2 or 3 based upon the team's geographic location. A value of 1 was assigned to the northernmost teams (Calgary, Edmonton, Montreal, Ottawa, Toronto and Vancouver). A value of 2 was assigned to Boston, Buffalo, Chicago, Colorado, Detroit, New Jersey, New York Islanders, New York Rangers, Philadelphia, Pittsburgh, and St. Louis. A value of 3 was assigned to the southernmost teams (Anaheim, Carolina, Dallas, Florida, Los Angeles, Phoenix, San Jose, Tampa Bay, and Washington).

I extend Wiedecke's model in three ways:

(1) by incorporating a larger data set;
(2) by redefining the dependent variable; and
(3) by introducing a new independent variable.

 
Rather than using only the 1997/98 season, I will use data from 1995/96, 1996/97, 1997/98, 1998/99, 1999/2000, 2000/01 and 2001/02, giving 193 data observations.

 
I will use average attendance as the dependent variable, rather than percentage of capacity. By using the percentage, a team which fills 14,800 of 15,000 seats (98.7%) is considered superior to a team which fills 19,700 of 20,000 seats (98.5%). This does not reflect reality well, as the second team draws a full 33% more fans.
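The comparison above can be checked with a few lines of Python. This is only a sketch using the two arenas from the example; the function name `capacity_pct` is an illustrative choice, not something from Wiedecke's thesis.

```python
# Compare the two candidate dependent variables using the figures in the text.

def capacity_pct(avg_attendance: float, capacity: float) -> float:
    """Percentage of arena capacity filled (Wiedecke's "Attendance Capacity")."""
    return 100.0 * avg_attendance / capacity

small = (14_800, 15_000)   # (average attendance, capacity)
large = (19_700, 20_000)

small_pct = capacity_pct(*small)
large_pct = capacity_pct(*large)

# How many more fans the second team draws, relative to the first.
extra_fans_pct = 100.0 * (large[0] - small[0]) / small[0]

print(f"percentage of capacity: {small_pct:.1f}% vs {large_pct:.1f}%")
print(f"second team draws {extra_fans_pct:.0f}% more fans")
```

The percentage measure ranks the smaller crowd slightly higher, while the raw attendance figures show the larger arena drawing about a third more fans.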

 
The independent variable added is Novelty. A value of 5 is assigned to a team in its first year in the league (after either an expansion or franchise relocation), and this is reduced by one for each subsequent year in the league until it reaches 0. The purpose is to determine if new teams get an attendance boost simply by being new, as is often postulated. The four independent variables used by Wiedecke are also used.
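As a minimal sketch, the Novelty rule can be written as a small function. The name `novelty` and the convention of counting seasons from 1 are assumptions made for illustration.

```python
def novelty(season_in_league: int) -> int:
    """Novelty value for a team's Nth season in its current market.

    season_in_league is 1 for the first season after an expansion or
    relocation (an assumed counting convention). The value starts at 5
    and drops by one each season until it reaches 0.
    """
    if season_in_league < 1:
        raise ValueError("season_in_league must be 1 or greater")
    return max(0, 5 - (season_in_league - 1))

# Novelty values for a team's first seven seasons.
print([novelty(n) for n in range(1, 8)])
```

An established team, which has been in the league for more than five seasons, always has a Novelty of 0.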
 

Variable Correlations
 
A variable correlation analysis is performed to examine the data for possible cross-correlation effects. Only one pair of variables, goals and standings points, has a significant correlation (positive 0.64). Therefore if both goals and points are found to be significant, care must be taken in their interpretation due to cross-correlation. Other pairs with less-significant correlations are attendance and points (positive 0.39), attendance and location (negative 0.31), and location and novelty (positive 0.30).

 
The following table indicates the coefficients of correlation for all variables used: attendance (ATT), points in standings (PTS), goals scored (GF), penalty minutes (PIM), location (LOC) and novelty (NOV).

 
 ATT  PTS  GF  PIM  LOC  NOV
 ATT  -  .39  .25  -.04  -.31  -.17
 PTS  .39  -  .64  -.28  -.17  -.19
 GF  .25  .64  -  .10  -.22  -.17
 PIM  -.04  -.28  .10  -  .04  -.01
 LOC  -.31  -.17  -.22  .04  -  .30
 NOV  -.17  -.19  -.17  -.01  .30  -
 

Results of the Model
 

The results of the multiple linear regression model are as follows.
 Constant (y-intercept)  13,326
 Standard error of estimate  2,071
 R-squared  0.223
 Variable  Coefficient  St. error  t-stat
 PTS  61.08  13.56  4.50
 GF  -6.90  7.16  -0.96
 PIM  0.80  0.61  1.31
 LOC  -778.93  211.43  -3.68
 NOV  -47.92  119.85  -0.40
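Each t-statistic in the table is the coefficient divided by its standard error, so the reported values can be re-derived as a check. The sketch below does that and flags variables with an absolute t-statistic above 2, a rough rule of thumb for significance at about the 5% level that is assumed here rather than taken from the original analysis.

```python
# Reported (coefficient, standard error) pairs from the regression table.
results = {
    "PTS": (61.08, 13.56),
    "GF": (-6.90, 7.16),
    "PIM": (0.80, 0.61),
    "LOC": (-778.93, 211.43),
    "NOV": (-47.92, 119.85),
}

# t-statistic = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se) in results.items()}

# Rule-of-thumb threshold (an assumption, not from the original post).
THRESHOLD = 2.0
significant = [name for name, t in t_stats.items() if abs(t) > THRESHOLD]

for name, t in t_stats.items():
    print(f"{name:>3}  t = {t:6.2f}")
print("significant:", significant)
```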
 

Discussion of Results
 
The t-statistics of GF, PIM and NOV indicate there is little evidence that they affect attendance in any significant way. On the other hand, there is very strong evidence that PTS and LOC significantly affect attendance. These findings agree with Wiedecke.

 
Overall, the model is not extremely useful; the R-squared figure indicates only 22.3% of the variability in attendance is explained by the model. This may indicate there are other independent variables that should be considered.
The correlation between the two significant independent variables (PTS and LOC) is -0.17, indicating there is no significant cross-correlation effect.

 
Interpretation

 
According to the model, having a good team is the most significant factor affecting attendance. Ceteris paribus, each additional standings point increases attendance by 61 fans per game. A 90-point team therefore has a 610-fan advantage in average attendance over an 80-point team.

 
The location coefficient indicates that the further south a team is, the worse its attendance is. All else being equal, a team in the southern US averages 1,558 fewer fans per game than a team in Canada. This is significant because the NHL's recent strategy has been to put as many teams in the southern US as possible, either through expansion or franchise relocations (including moving teams from Canada to the southern US). The results of this model suggest that this strategy is seriously flawed. In this case, analysis agrees with common sense: why are markets where there are hockey fans ignored in favour of markets where there are no hockey fans? At least the most recent expansion was more logical, and didn't put any more teams in the Sun Belt.
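The two ceteris paribus comparisons can be reproduced from the fitted equation. In this sketch the coefficients are the ones reported in the results; the baseline goals and penalty-minute totals are hypothetical placeholders that cancel out of both differences.

```python
# Fitted model: attendance = intercept + sum(coefficient * variable)
INTERCEPT = 13_326
COEF = {"PTS": 61.08, "GF": -6.90, "PIM": 0.80, "LOC": -778.93, "NOV": -47.92}

def predicted_attendance(pts, gf, pim, loc, nov):
    """Predicted average attendance per game under the reported model."""
    team = {"PTS": pts, "GF": gf, "PIM": pim, "LOC": loc, "NOV": nov}
    return INTERCEPT + sum(COEF[k] * team[k] for k in COEF)

# Hypothetical baseline values, held fixed in every comparison below.
base = dict(gf=220, pim=1400, nov=0)

# 90-point team versus 80-point team, same location.
points_gap = (predicted_attendance(pts=90, loc=2, **base)
              - predicted_attendance(pts=80, loc=2, **base))

# Southern US (location 3) versus Canada (location 1), same points.
location_gap = (predicted_attendance(pts=85, loc=3, **base)
                - predicted_attendance(pts=85, loc=1, **base))

print(f"10 extra points: {points_gap:+.0f} fans per game")
print(f"south versus north: {location_gap:+.0f} fans per game")
```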
 

Reference
 

Wiedecke, Jennifer. 1999. Factors Affecting Attendance in the National Hockey League: A Multiple Regression Model. Master's thesis, University of North Carolina, Chapel Hill.

Friday, 24 October 2014

Puckerings archive: Win-Things Theory (18 Oct 2002)

What follows is a post from my old hockey analysis site puckerings.com (later hockeythink.com). It is reproduced here for posterity; bear in mind this writing is over a decade old and I may not even agree with it myself anymore. This post was originally published on October 18, 2002.
 

Theory: Win-Things
Copyright Iain Fyffe, 2002


The most common perspective put forward on win theory can be summarized as follows:

Before a game begins, each participating team has a 50% chance to win (a .500 expected winning percentage), ceteris paribus. As the game progresses, and as each team does things that affect their chances of winning or chances of losing, the expected winning percentage of each team changes. For instance, if a team scores a goal after 5 minutes of play, their percentage may change to .550, and the opponent's would therefore be .450, since the percentages necessarily sum to one.

At the crux of this theory lie two ideas: (1) before a game begins, a team's winning percentage is .500, and (2) a team does two types of things that affect its chances of winning: good things (which we'll call "win-things") and bad things (which we'll call "loss-things".)

As a team, you have no significant control over what your opponents do. Therefore, at least from an analytical perspective, you can assume they will do an average number of things to win. At the beginning of a game, you have not yet done anything to win, and have no guarantee that you will do so. Therefore, your expected winning percentage before a game is not .500, but .000.

Teams try to win games, they do not try to lose them. Therefore a loss-thing is merely a failed attempt at a win-thing. Just as darkness is merely the absence of light, loss-things are merely the absence of win-things. Therefore win-things are what matters, and this is why I refer to this theory as Win-Things Theory.
 
The idea that you cannot control your opponent's actions is carried throughout the theory. For instance, in the traditional theory, scoring a goal is a very good thing (i.e., it has a high Win-Things value). Under Win-Things Theory, whether or not a shot actually produces a goal is irrelevant to the shooting side. The Win-Things were produced by the shot itself, with a higher-quality shot producing more Win-Things. Conversely, the opponent's Win-Things on the play depend on whether or not the shot is stopped. Stopping the shot produces Win-Things about equal to the Win-Things resulting for the other side by taking the shot. Not stopping the shot produces no Win-Things (it does not produce Loss-Things).

 
It should be noted that the .000 beginning expected winning percentage applies only to one-team analysis. In two-team analysis, where the actions of both teams are considered, the expected percentage would depend on the Win-Things each team has accumulated. But generally speaking, one-team analysis is more useful in analyzing what contributes to winning, by assuming opponents to be average in all regards.

 
Traditional theory focusses much attention on expected winning percentage. Win-Things Theory does not. The point is not to get your expected winning percentage up; the point is to accumulate more Win-Things than your opponents. Since you cannot control how many Win-Things your opponents accumulate, the best way to ensure this is to accumulate as many Win-Things as possible.

 
This theory supports Bill James' Win Shares system for baseball, which I have adapted into the Point Allocation method for hockey. Win Shares has been criticized for not considering "Loss Shares". Using this new theory, Loss Shares are irrelevant, and the criticism is therefore invalid. Opportunity should still be considered, but fortunately in hockey games are timed, while in baseball the opportunities vary greatly from game-to-game, based on a multitude of factors.

Friday, 17 October 2014

Puckerings archive: Shots and Save Percentage (18 Oct 2002)

What follows is a post from my old hockey analysis site puckerings.com (later hockeythink.com). It is reproduced here for posterity; bear in mind this writing is over a decade old and I may not even agree with it myself anymore. This post was originally published on October 18, 2002.
 

Theory: Shots and Save Percentage
Copyright Iain Fyffe, 2002


In my investigation into the validity of Goaltender Perseverance, I looked into the relationship between the number of shots a goaltender faces per game and his save percentage. I found that, as the number of shots per game increases, save percentage does not decrease, on average, as the fundamental assumption of Perseverance argues. In fact, there is some evidence of a positive relationship; that is, as shots increase, save percentage increases.

This evidence was met with an "it doesn't make sense" reaction from those I presented it to. Well, common sense is often dead wrong. To explain this phenomenon, I present the following theory.

For simplicity, I will discuss only two types of shots: easy and tough (referring to the goaltender's perspective). There are in actuality many varying degrees of toughness of shots, but these two will suffice for our purposes.

Easy shots are largely discretionary. They are shots that result from situations where a player could choose to shoot, or choose another play. They are of lower quality than tough shots, because they are usually taken from a greater distance than tough shots, or less favourable circumstances.

Since easy shots are discretionary, there must be a reason that teams do not simply shoot every time, in order to maximize their goals scored. The reason could be twofold: you give up the opportunity to make a pass, which could result in a higher-quality shot, and the shot is more likely to produce a turnover, allowing a possible scoring chance for the opposition. Therefore, it is not always wise to take the shot rather than another play.
 
Save percentages on tough shots are low, and save percentages on easy shots are high. And since easy shots are primarily responsible for variation in shots faced by a goaltender (since the number of tough shots faced is relatively consistent), save percentage will increase as shots faced increases.

 
For example, let's say that the average tough shots faced per game is 5, and the save percentage on such shots is .800. This is the same for every goaltender. Any difference in shots faced is due to easy shots, which we'll say have a save percentage of .900.

 
A goaltender facing 25 shots will therefore face 20 easy shots (25 less 5). Goals against on tough shots is 1.0 (5 less .800 times 5), on easy shots 2.0 (20 less .900 times 20). 3 goals against on 25 shots is an .880 save percentage.

 
A goaltender facing 35 shots will have the same 1.0 goals against on tough shots, but will have 3.0 on easy shots (30 less .900 times 30). 4 goals against on 35 shots is an .886 save percentage. The goaltender facing more shots on average has a higher save percentage.
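The worked example can be expressed as a short calculation. This is a sketch of the two-shot-type model only, using the illustrative figures from the text (5 tough shots at .800, all other shots easy at .900); the function name is a hypothetical choice.

```python
TOUGH_SHOTS = 5     # tough shots per game, assumed constant for every goaltender
TOUGH_SV = 0.800    # save percentage on tough shots
EASY_SV = 0.900     # save percentage on easy shots

def blended_save_pct(total_shots: int) -> float:
    """Overall save percentage when all shots beyond the tough ones are easy."""
    easy_shots = total_shots - TOUGH_SHOTS
    goals_against = TOUGH_SHOTS * (1 - TOUGH_SV) + easy_shots * (1 - EASY_SV)
    return 1 - goals_against / total_shots

for shots in (25, 35):
    print(f"{shots} shots: {blended_save_pct(shots):.3f}")
```

Under these assumptions the blended figure rises toward .900 as shots increase, because the fixed block of tough shots makes up a shrinking share of the total.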

 
That is my theory of how save percentage can increase as shots increase. Unfortunately, this theory cannot be tested using information that is currently available. The NHL does track certain shot data (type, location) for shots that produce a goal, but not for shots that do not produce a goal. If this information were recorded for all shots, it could be used to test this theory.

Friday, 10 October 2014

Puckerings archive: The Cost of a Penalty (18 Oct 2002)

What follows is a post from my old hockey analysis site puckerings.com (later hockeythink.com). It is reproduced here for posterity; bear in mind this writing is over a decade old and I may not even agree with it myself anymore. This post was originally published on October 18, 2002.
 

Theory: the Cost of a Penalty
Copyright Iain Fyffe, 2002


The value of odd-man play is often debated. In the mass media, much ado is made about the power-play (and, to a lesser extent, penalty-killing), calling it a key to success. Others, such as Klein and Reif, downplay its importance, noting that even-strength play is better for predicting success. 

This essay takes a conceptual approach to this problem. What, in theory, is the importance of odd-man situations? To examine this question, I will examine a theoretical team, one which is average in all respects.

This team plays in three types of situations: even-strength (ES), power-play (PP) and short-handed (SH). Examining each of these situations reveals the answer we are looking for.

Even-strength: The team is completely average. Therefore, they will score exactly as many ES goals (ESGF) as they allow (ESGA). Thus, their expected net goal differential per minute of ES time (ESMIN) is calculated as follows:

( ESGF - ESGA ) / ESMIN

Which, for reasons discussed above, is zero. 

Power-play: On the PP, a team scores about three times as often as at ES, while goals against are cut in half. PP time (PPMIN) produces a net goal differential as follows, using 1998/99 figures:

( PPGF - SHGA ) / PPMIN
= ( 1533 - 220 ) / 16326 ... minutes figure is estimated
= 0.08

Short-handed: Since PP time for one team is SH time for another, SH situations produce the converse of PP, or -0.08 goals per minute.
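The per-minute rates can be reproduced directly from the 1998/99 totals. As a rough illustration, the sketch also prices a two-minute minor by assuming the full two minutes are served; real power plays end early when a goal is scored, so treat that figure as an upper-end approximation rather than a result from this essay.

```python
# League totals for 1998/99 as given in the text (minutes figure is estimated).
PPGF = 1533     # power-play goals for
SHGA = 220      # short-handed goals against
PPMIN = 16326   # estimated power-play minutes

pp_rate = (PPGF - SHGA) / PPMIN   # net goals per minute on the power play
sh_rate = -pp_rate                # short-handed time is the mirror image
es_rate = 0.0                     # average team breaks even at even strength

# Assumption: the penalty is served for its full two minutes.
minor_cost = 2 * pp_rate

print(f"PP: {pp_rate:+.2f}  SH: {sh_rate:+.2f}  ES: {es_rate:+.2f} goals per minute")
print(f"approximate cost of a full two-minute minor: {minor_cost:.2f} goals")
```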

Taking this all together, an average team will have a winning record if they can obtain more PP opportunities than they give. That's badly phrased, since a team with a winning record cannot be average, but you know what I mean. This is most easily accomplished by taking as few penalties as possible, since you have rather limited control over your opponent's actions.

From this perspective, odd-man situations are extremely important, as they decide games. The team taking fewer non-coincident penalties should win, on average.

If this perspective is valid, then we should be able to predict success based upon PP opportunities for and against. I tested the coefficient of correlation between net PP opportunities and standings points for a selection of recent seasons:

 1990/91  0.11
 1991/92  0.26
 1994/95  0.02
 1995/96  -0.02
 1998/99  0.63
 1999/00  0.23
 average  0.21
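A quick check of the average, and of how much it depends on the one strong season, can be sketched as follows. The correlations are the reported values; recomputing the mean without 1998/99 is an added illustration, not part of the original analysis.

```python
# Correlation between net PP opportunities and standings points, by season.
correlations = {
    "1990/91": 0.11,
    "1991/92": 0.26,
    "1994/95": 0.02,
    "1995/96": -0.02,
    "1998/99": 0.63,
    "1999/00": 0.23,
}

mean_all = sum(correlations.values()) / len(correlations)

# Same average with the strongest season left out.
others = [r for season, r in correlations.items() if season != "1998/99"]
mean_without_9899 = sum(others) / len(others)

print(f"mean of all six seasons: {mean_all:.3f}")
print(f"mean excluding 1998/99:  {mean_without_9899:.3f}")
```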

The correlations provide, on average, some support for the theory. They are generally positive, but not that strong (aside from 1998/99, which is very strong). But remember, we are not considering the quality of the teams, unless you consider taking few penalties to be a quality (which you should.) So there is some evidence that this theory is valid.

Friday, 3 October 2014

Puckerings archive: Greatest Teams of All Time (09 Oct 2002)

What follows is a post from my old hockey analysis site puckerings.com (later hockeythink.com). It is reproduced here for posterity; bear in mind this writing is over a decade old and I may not even agree with it myself anymore. This post was originally published on October 9, 2002.
 

The Greatest Teams of All Time
Copyright Iain Fyffe, 2002


The most thorough discussion of teams possibly deserving nomination as the greatest of all time is in Klein and Reif's Hockey Compendium. They base their conclusion that the 1929/30 Bruins are the greatest of all time on the team's .875 winning percentage, which is the highest of any team playing the minimum number of games.

There are, of course, two problems with basing the analysis solely on winning percentage. For one, an artificial games limit has to be introduced, to keep those 8-0-0 Montreal Victorias of 1898 and 10-0-0 Montreal Wanderers of 1907 from dominating the list. If we could avoid artificial restrictions like these, we could improve the analysis substantially. As it stands, these teams have no chance of being considered, no matter how great they may have been.

In addition, using winning percentage alone ignores the league context. That is, how good are the other teams in the league? Are there a few weak sisters to beat up on, or is parity the order of the day? Obviously, the greater the parity in the league as a whole, the more difficult it is to run up a high winning percentage. You don't get those cheap points; you have to fight for each win.
Therefore the analysis should be based on the degree by which a team dominates the competition, and the range of quality of said competition. One method to do this is explained below, by way of example. 

Let's examine the top two teams by Klein and Reif's analysis. The Boston Bruins of 1929/30 played in a league where the standard deviation of winning percentage was .188, which is fairly high for the era. Boston's winning percentage of .875 is .375 higher than the average (which is .500), or 1.99 standard deviations above the mean (.375 divided by .188). This is called a z-score, and this is what I will base my analysis on. It encompasses both how far above the competition a team was, and how much variation in quality there was between teams. Boston's Winning Percentage Z-Score (WPZS) is therefore 1.99, which is very impressive, but as we'll see, not the best of all time.

The 1943/44 Montreal Canadiens, rated #2 by Klein and Reif, had an .830 winning percentage in a league that had a standard deviation of winning percentage of .215 (high due to the disparity in talent caused by the war). There was less parity in this league-year than in 1929/30. Montreal's WPZS is 1.53, which while quite high is nowhere near the best of all time.
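The WPZS calculation for these two teams can be sketched in a few lines. The league mean is taken as .500, as stated above, and the standard deviations are the ones quoted in the text; the function name `wpzs` is an illustrative choice.

```python
LEAGUE_MEAN = 0.500  # average winning percentage, as stated in the text

def wpzs(win_pct: float, league_sd: float) -> float:
    """Winning Percentage Z-Score: standard deviations above the league mean."""
    return (win_pct - LEAGUE_MEAN) / league_sd

boston_1930 = wpzs(0.875, 0.188)
montreal_1944 = wpzs(0.830, 0.215)

print(f"1929/30 Boston Bruins:      {boston_1930:.2f}")
print(f"1943/44 Montreal Canadiens: {montreal_1944:.2f}")
```

Boston's edge comes from the denominator: the two winning percentages are close, but the 1929/30 league had noticeably less spread.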

This means that, relatively speaking, Montreal had a greater benefit of weaker teams to play against than Boston did. By analyzing teams in this way, we account for the quality of the league and remove the need for any arbitrary restrictions. Below is the list of the top 48 teams of all time (all those with a WPZS of 1.50 or greater), from among the NHL and its predecessors, as well as the PCHA and WCHL/WHL, and the WHA. 

The surprises start at the very top. The greatest team of all time, by this analysis, is the 1995/96 Detroit Red Wings. Their .799 winning percentage had them #7 on Klein and Reif's list. But the standard deviation that year was a mere .116, quite low for the era. Other than Detroit, the best winning percentage was .634. 19 of the 26 teams were between .400 and .600. Parity was the rule, yet Detroit was able to completely dominate the league. Their 2.58 WPZS is far and away the best of all time.

The next two spots come from two teams from the same season. The epic battle between Calgary and Montreal in 1988/89 is revealed to be of truly historic proportions. Other than these two teams, no team had a winning percentage of greater than .575, or less than .381. The parity this year was amazing; the standard deviation was only .100. Calgary's percentage was .731; Montreal's was .719. While both teams miss Klein and Reif's top 20, they're #2 and #3 here. Never have there been two teams which stood further above the rest of the league.

Spot #4 is the 1976/77 Canadiens. Montreal's 1970's dynasty also makes appearances at #9, #17, #20, and #27. That's a hell of a decade, and it's no surprise that it shows up here.

Two more recent Red Wings sides take the 5 and 7 spots, with the Dallas South Stars' outstanding 1998/99 campaign sandwiched in between. The great Bruins of 1929/30, ranked #1 by Klein and Reif, finally appear at #8.

If I were to ask you which Flyers team was the best in their history, I doubt you would answer "the 1979/80 edition, of course!" But here they are in a tie for 9th with the best the Oilers have to offer, the 1985/86 team. Another 1980's Flyers squad (1984/85) appears at #22, well above their best of the 1970's (1973/74), which comes in at a tie for #40. 80's Oilers teams also appear at #18, #36, #42, and #45. Not quite the 1970's Canadiens, but not bad.

The highest-ranked team of the pre-NHL era turns out to be the 1912/13 Quebec Bulldogs. In a league where the five other teams had records ranging from 10-10-0 to 7-13-0, Quebec went 16-4-0 to dominate the field.

The Houston Aeros were the WHA's greatest team, no surprise, claiming spots 13, 34, and 38. No other WHA club appears on the list.

Montreal's other great dynasty shows up a few times as well. 1958/59 is #18, 1955/56 is #25, 1957/58 is #28, and 1959/60 is #46. This is probably less impressive than the 1980's Oilers, but more than the Islanders teams which show up at #14, #23, and #42.

The Bruins of the early 70's don't show as well as you might expect, because they played in an expansion era. They appear "only" at #16, #24 and #32. The original Senators also appear thrice, at #25, #34 and #40, the last two from their pre-NHL days.

Finally we have the two perfect clubs mentioned before. Because these teams played in eras notable for their lack of parity, their 1.000 winning percentages are knocked down quite a bit on this list. The 1898 Victorias stand in a tie at #36, while the Wanderers show at #38. These teams (as well as the 1910/11 Senators at #40) were completely blocked out of Klein and Reif's list due to the artificial games restriction. Here, they get a fair shot.

The complete list follows:

 Rank  Team  Year  League  WPct  WPZS
 1.  Detroit Red Wings  1995/96  NHL  .799  2.58
 2.  Calgary Flames  1988/89  NHL  .731  2.31
 3.  Montreal Canadiens  1988/89  NHL  .719  2.19
 4.  Montreal Canadiens  1976/77  NHL  .825  2.18
 5.  Detroit Red Wings  1994/95  NHL  .729  2.08
 6.  Dallas Stars  1998/99  NHL  .695  2.05
 7.  Detroit Red Wings  2001/02  NHL  .707  2.02
 8.  Boston Bruins  1929/30  NHL  .875  1.99
 9.  Montreal Canadiens  1977/78  NHL  .806  1.97
 9.  Philadelphia Flyers  1979/80  NHL  .725  1.97
 9.  Edmonton Oilers  1985/86  NHL  .744  1.97
 12.  Quebec Bulldogs  1912/13  NHA  .800  1.94
 13.  Houston Aeros  1976/77  WHA  .663  1.93
 14.  New York Islanders  1981/82  NHL  .738  1.92
 15.  Boston Bruins  1938/39  NHL  .771  1.86
 16.  Boston Bruins  1970/71  NHL  .776  1.85
 17.  Montreal Canadiens  1972/73  NHL  .769  1.81
 18.  Montreal Canadiens  1958/59  NHL  .650  1.79
 18.  Edmonton Oilers  1983/84  NHL  .744  1.79
 20.  Montreal Canadiens  1975/76  NHL  .794  1.78
 21.  Colorado Avalanche  2000/01  NHL  .720  1.77
 22.  Philadelphia Flyers  1984/85  NHL  .706  1.73
 23.  New York Islanders  1978/79  NHL  .725  1.72
 24.  Boston Bruins  1971/72  NHL  .763  1.70
 25.  Ottawa Senators  1926/27  NHL  .727  1.69
 25.  Montreal Canadiens  1955/56  NHL  .714  1.69
 27.  Montreal Canadiens  1978/79  NHL  .719  1.67
 28.  Montreal Canadiens  1957/58  NHL  .686  1.65
 28.  Buffalo Sabres  1979/80  NHL  .688  1.65
 30.  St.Louis Blues  1999/2000  NHL  .695  1.62
 31.  Quebec Nordiques  1994/95  NHL  .677  1.61
 32.  Montreal Canadiens  1915/16  NHA  .688  1.58
 32.  Boston Bruins  1973/74  NHL  .724  1.58
 34.  Ottawa Senators  1916/17  NHA  .750  1.57
 34.  Houston Aeros  1974/75  WHA  .679  1.57
 36.  Montreal Victorias  1897/98  AHAC  1.000  1.56
 36.  Edmonton Oilers  1981/82  NHL  .694  1.56
 38.  Montreal Wanderers  1906/07  ECAHA  1.000  1.55
 38.  Houston Aeros  1973/74  WHA  .647  1.55
 40.  Ottawa Senators  1910/11  NHA  .812  1.54
 40.  Philadelphia Flyers  1973/74  NHL  .718  1.54
 42.  Montreal Canadiens  1943/44  NHL  .830  1.53
 42.  Edmonton Oilers  1984/85  NHL  .613  1.53
 42.  New York Islanders  1980/81  NHL  .688  1.53
 45.  Edmonton Oilers  1984/85  NHL  .681  1.52
 46.  Montreal Canadiens  1944/45  NHL  .800  1.50
 46.  Montreal Canadiens  1959/60  NHL  .657  1.50
 46.  Montreal Canadiens  1968/69  NHL  .678  1.50
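The rank column uses what is usually called standard competition ranking: tied teams share a rank, and the following rank skips ahead by the number of teams tied. A minimal sketch of that rule, applied to the first twelve WPZS values from the list, is below; the function name is a hypothetical choice.

```python
def competition_ranks(scores):
    """Ranks for scores already sorted in descending order.

    Tied scores share the same rank, and the next distinct score takes
    its position in the list (1, 2, 2, 4 style).
    """
    ranks = []
    for i, score in enumerate(scores):
        if i > 0 and score == scores[i - 1]:
            ranks.append(ranks[-1])   # tie: reuse the previous rank
        else:
            ranks.append(i + 1)       # otherwise rank equals list position
    return ranks

# WPZS values of the first twelve teams in the list above.
top_twelve = [2.58, 2.31, 2.19, 2.18, 2.08, 2.05, 2.02, 1.99, 1.97, 1.97, 1.97, 1.94]
print(competition_ranks(top_twelve))
```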

For those interested in this sort of thing, here is the distribution of the top 48 seasons of all time: Montreal Canadiens 14; Boston Bruins and Edmonton Oilers, 5; Detroit Red Wings, Houston Aeros, New York Islanders, Ottawa Senators (first edition) and Philadelphia Flyers, 3; Quebec Nordiques/Colorado Avalanche 2; Buffalo Sabres, Calgary Flames, Dallas Stars, Montreal Victorias, Montreal Wanderers, Quebec Bulldogs, St.Louis Blues 1. Notably, half of the Original Six teams (Rangers, Chicago, and Toronto) fail to take a single spot, while the Habs have 29% of the top 48 to themselves.

Friday, 26 September 2014

Puckerings archive: Goal and Assist Z-Scores (04 Jul 2002)

What follows is a post from my old hockey analysis site puckerings.com (later hockeythink.com). It is reproduced here for posterity; bear in mind this writing is over a decade old and I may not even agree with it myself anymore. This post was originally published on July 4, 2002.


Goal and Assist Z-scores
Copyright Iain Fyffe, 2002


Methods have been developed in the past to identify dominant single-season performances. For example, some years ago I developed something I called goal-scoring dominance, which was calculated as the leading player's goals-per-game average divided by the second-leading player's goals-per-game average. A similar calculation was made for assists. I later discovered that Klein and Reif had developed the very same method years before, calling it Quality of Victory.

But this method suffers from a serious flaw. What if two players have outstanding seasons? The Quality of Victory formula will show that no one performed in a dominant manner, because the second-leading player's average is so high. This is not fair, nor is it accurate.

Goal z-scores (GZ) and assist z-scores (AZ) were designed to resolve this problem. It was hoped that they would not create any new problems; unfortunately this is not the case (more on this later). What we do is compare a player's performance to two things: the average individual player performance that year (in terms of goals per game or assists per game), and the degree of variation in individual player performance that year. Standard deviation is a way to measure this variability. For instance, the sets {1,2,3,4,5} and {0,1,3,5,6} have the same mean (3.0), but the second set has more variation, and therefore a higher standard deviation (2.5, compared to 1.6 for the first set). A z-score is simply the number of standard deviations an observation is above the mean (or below the mean in the case of a negative z-score). So, if we have a set of numbers whose mean is 5 and whose standard deviation is 3, then an observation of 8 would have a z-score of 1 ((8 - 5)/3). It's that simple.
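The two example sets and the z-score formula can be checked with Python's standard library. One detail worth noting: the quoted figures of 1.6 and 2.5 match the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes. Whether the full study used the sample or population form is not stated, so that choice is an assumption here.

```python
import statistics

set_a = [1, 2, 3, 4, 5]
set_b = [0, 1, 3, 5, 6]

mean_a, mean_b = statistics.mean(set_a), statistics.mean(set_b)
sd_a, sd_b = statistics.stdev(set_a), statistics.stdev(set_b)  # sample SD (n - 1)

def z_score(observation: float, mean: float, sd: float) -> float:
    """Number of standard deviations an observation lies above the mean."""
    return (observation - mean) / sd

print(f"means: {mean_a} and {mean_b}")
print(f"standard deviations: {sd_a:.1f} and {sd_b:.1f}")
print(f"z-score of 8 with mean 5 and SD 3: {z_score(8, 5, 3):.1f}")
```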

In a normal distribution of events, about two-thirds of all observations will fall within one standard deviation of the mean (i.e., have a z-score between -1 and 1). 95% of observations will be within two standard deviations (z-scores between -2 and 2), and almost all will be within three standard deviations (z-scores between -3 and 3). Using z-scores we can determine how outstanding an individual performance was. For instance, only an outstanding season would produce a z-score of 3 or more.
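These coverage figures follow from the normal distribution itself, and can be computed with the error function in the standard library. The sketch assumes an exactly normal distribution, which real goal-scoring data will only approximate.

```python
import math

def coverage(k: float) -> float:
    """Share of a normal distribution with a z-score between -k and k."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage(k):.1%}")

# Share of observations expected to reach a z-score of 3 or more (upper tail only).
upper_tail_3 = (1 - coverage(3)) / 2
print(f"z-score of 3 or more: about 1 in {1 / upper_tail_3:.0f}")
```

Under that assumption a z-score of 3 is a rare event, which puts the goal z-scores above 5 reported later in this post well outside what a normal distribution would produce.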

That was the set-up. As it turns out, the results of this study are not that interesting; but what the results indicate may be of interest. The problem with the z-scores is that the top seasons of all time are dominated by recent players. For instance, in the top 40 GZ seasons, we have 5 from the 2000's (in only three years), 17 from the 1990's, 10 from the 1980's, four from the 1970's, and two each from the 1960's (Bobby Hull) and the 1930's (Charlie Conacher). So really it shows only the best of recent seasons. The assist results were predictable; Gretzky has the top 10 almost to himself, with Lemieux following. The top goal results are interesting enough to note (minimum 20 games played):

 Rank/Player  Year  GP  GZS
 1. Brett Hull  1991  78  5.95
 2. Wayne Gretzky  1984  74  5.86
 3. Mario Lemieux  1993  60  5.82
 4. Cam Neely  1994  49  5.80
 T5. Mario Lemieux  1989  76  5.64
 T5. Mario Lemieux  1996  70  5.64

So Brett Hull's 1991 campaign, while technically falling short of Gretzky's goal record, is actually more impressive than any of Gretzky's goal-scoring seasons by this analysis. But the real king of the list is Lemieux. In addition to spots 3, 5, and 6, he holds down numbers 11, 17, and 30 on the top 40 (as well as #41). Gretzky has #2, 12, 34 and 38. No contest.

But as I said, the results aren't overly interesting, because they are dominated by recent players. But the fact that recent players dominate is in itself interesting. It indicates that modern players are able to dominate the average players by a larger degree than older players. The cause of this is unclear, as it can be affected by the performance of the top players, as well as what constitutes an "average" player. But it's interesting because it's the exact opposite of what has happened in baseball, where the degree of domination by the top players has decreased over time, rather than increased. Food for thought.

Tuesday, 23 September 2014

Hall of Fame Standards for the Challenge Era

Today we're going to wrap up our look at the Inductinator, which is a system I devised to determine implicit standards for Hall of Fame player selections. Well, not quite wrap up, since I should make some comment about European and female players, which I will at some point. But let's stick to the things we can mostly explain for now. In Hockey Abstract 2014, I discuss at some length the results from 1930 to the present, both to shed light on history and to make predictions of future inductions. I've already covered the period 1912 to 1929 here on Hockey Historysis. And now we look at the years up to 1911.

This time period, which I'll call the Challenge Era, calls for a somewhat different approach than more recent times. There are no individual awards or All-Star teams to draw information from. Player career statistics are all but useless, in large part because careers were so much shorter in the 19th century, so that comparisons between early professional players in the oughts and senior players before 1900 are not terribly informative.

As it turns out, we don't really need all that, because we have the Stanley Cup. Take note that of all the Hall of Fame players who played before 1911, none of them began their careers before the 1892/93 season, the first that the Stanley Cup was awarded. The best pre-Stanley Cup players such as Tom Paton, Allan Cameron, James Stewart and Jack Campbell have not been honoured. For the 25 Hall of Famers from this era, approximately 60% of their total Inductinator scores is made up of Stanley Cup-related exploits.

Of the 26 Hall of Famers from this era (see below), only five did not win a Stanley Cup championship. So we start by giving skaters 10 points for each Cup championship, and goaltenders 25 per title. Captains of Stanley Cup teams get extra points; players who captained one such team (such as Graham Drinkwater and Tommy Phillips) get 10 points, and each additional Cup captaincy earns a whopping 70 points. Mike Grant, Dickie Boon and Bruce Stuart were captains twice each, while Harvey Pulford was captain three times, which is enough by itself to get him over the minimum score of 100 that the Inductinator requires to see a player as a Hall-of-Famer.
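To make the arithmetic concrete, here is a minimal Python sketch of just the two rules described above. The function names and the THRESHOLD constant are my own labels for illustration, and the sketch covers only championships and captaincies, not the rest of the system.

```python
# Sketch of the Cup championship and captaincy rules described above.
# Function names are hypothetical; only the point values come from the text.

THRESHOLD = 100  # minimum score to be seen as a Hall-of-Famer


def championship_points(titles: int, is_goaltender: bool = False) -> int:
    """10 points per Cup championship for skaters, 25 for goaltenders."""
    per_title = 25 if is_goaltender else 10
    return titles * per_title


def captaincy_points(cup_captaincies: int) -> int:
    """10 points for the first Cup captaincy, 70 for each additional one."""
    if cup_captaincies <= 0:
        return 0
    return 10 + 70 * (cup_captaincies - 1)


# One captaincy (Drinkwater, Phillips), two (Grant, Boon, Bruce Stuart),
# and three (Pulford).
for n in (1, 2, 3):
    print(n, captaincy_points(n), captaincy_points(n) >= THRESHOLD)
```

Three captaincies work out to 10 + 70 + 70 = 150, which is why Pulford clears the threshold on captaincies alone, while two captaincies (80) do not.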

I should say at this point that for this era, the number of Cup championships a player has is not as straightforward as it is for later players. During the challenge era, there were often multiple Cup series played in a single season. The current champion could be called upon by the trustees to defend their title several times in the same season, even sometimes in the middle of a season. In 1908, for example, the Montreal Wanderers had to defend against challenges from the Ottawa Victorias in January, and both the Winnipeg Maple Leafs and Toronto HC in March. For purposes of the Inductinator, we do not count a successful defence of an existing title to be a Stanley Cup championship; it's only when a new champion results from a series that it's counted. The Wanderers don't get credit for three Cup championships for 1908, they get one.

Players winning the Cup with multiple teams get a bonus of 40 points. While this may not seem too sensible, there's no other way to explain how Tom Hooper is in the Hall of Fame. Bruce Stuart, Tommy Phillips and Fred Scanlan also get these points, but they'd have enough points otherwise to still meet the implicit standards. Cecil Blachford also won Cups with two teams, but these points aren't enough to get him to 100. Which is good, since he's not in the Hall of Fame.

Games played, and especially goals scored, in Stanley Cup matches contribute a lot of points to the Challenge Era Inductinator. Players earn points if they participated in 10 or more Cup matches, and goaltenders earn more per game (since there's so little else to go on for them). A player who did not play in a single Stanley Cup match suffers a penalty of 20 points; otherwise, there would be no way to explain how Herb Jordan is not in the Hall.

But in terms of Challenge Era players being recognized by the Hall of Fame, it seems nothing is as important as scoring goals in Stanley Cup matches. Of the Inductinator scores for the Hall of Famers, a full 26% is earned by Stanley Cup goals alone. Every single player from this era that scored at least 14 goals in Stanley Cup matches is in the Hall of Fame. Fred Whitcroft, who did not play very much top-level hockey but scored 14 goals in eight Cup matches, is in. He gets 100 points for his Stanley Cup goals. He has to, since he did nothing else of note in his hockey career, and we want to explain his induction. Frank McGee scored 41 goals in Cup games, and that explains why he's on the top of the list below.

But wait, you might be aware that Frank "Pud" Glass won a bunch of Stanley Cups with the Wanderers, including one as captain, and scored 13 goals in those games. So how do we explain his exclusion from the Hall? Simple; we consider goals per game as well. Glass took 11 games to score his goals (1.18 per game), while Whitcroft (for example) scored 14 in eight (1.75 per game). Since Glass scored at a subpar rate (for a Hall-of-Famer, anyway), his total goals aren't valued as highly.
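The rate comparison is simple division, sketched here in Python. The helper name is my own, and the sketch shows only the goals-per-game figures quoted above, not how the Inductinator converts those rates into points.

```python
# Goals per Stanley Cup game for the two players compared above.
# The helper name is hypothetical; the goal and game counts are from the text.

def goals_per_game(goals: int, games: int) -> float:
    return goals / games


glass = goals_per_game(13, 11)      # Pud Glass
whitcroft = goals_per_game(14, 8)   # Fred Whitcroft

print(f"Glass: {glass:.2f} per game")          # Glass: 1.18 per game
print(f"Whitcroft: {whitcroft:.2f} per game")  # Whitcroft: 1.75 per game
```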

Other points are earned for having reasonably lengthy senior careers (important for Hod Stuart and Blair Russel), for playing with one team for at least nine years (again, important for Blair Russel), and for finishing in the top four in goals for a top-level league, or the top two in goals for a lower-level league. Russell Bowie makes out like a bandit in this last category, earning 380 of his 409 points here. He led a top-tier league in goals five times, was second four times and third once. No one else comes close to that level of production in the Challenge Era.

Finally, we get to the more arbitrary stuff. Tragic deaths are treated favourably by the Hall of Fame. George Richardson was killed in WWI, and although this was after his playing career was done it seems he was more fondly remembered because of it, since otherwise we would not be able to explain his induction in this analysis. Hod Stuart's death was all the more noteworthy, as he died in mid-career and as a Stanley Cup champion. Almost all of his Inductinator score (80 out of 102) is derived from this.

All of this so far can be used to explain 22 of the 26 Hall-of-Famers from this era. We're left with Graham Drinkwater, Billy Gilmour, Jack Ruttan and Oliver Seibert.

With Gilmour, one suspects that the true reason he was inducted is that at McGill he played with Frank Patrick, whose brother Lester was of course extremely influential at the time as a member of the selection committee. His Cup wins and goals give him 50 points, so we need another 50. We can attribute that to personal connections and give up, or we can look for something else he might have been famous for. Well, he is one of the very few sets of three brothers who each won a Stanley Cup, and he did it with his brothers (Dave and Suddy) on the same team. So, we can give him 40 points for that feat, and an extra 10 for winning the most Cups amongst his set of brothers. It's not the worst thing to recognize such a thing, I suppose, if in fact that's what was being recognized by the committee.

Drinkwater is 40 points short. The only thing I could find to set him apart was the fact that he was one of the three original Allan Cup trustees in 1909, well after his playing career was over. If I were on the committee, I wouldn't assign any player value to this, and maybe they didn't. But it's the only thing I can think of to get him over 100 points. The two other trustees were Dr. H.B. Yates and Sir Edward Clouston. Clouston might also be eligible to collect these points. He never played for the Stanley Cup, of course, since he was 44 years old by the time that mug was first awarded. But Clouston was one of the "Original 18", the 18 men who played in the hockey match at the Victoria Skating Rink in Montreal on March 3, 1875. Clouston played with James Creighton's side, who won that match two goals to one.

Just to make sure we're not being completely arbitrary here, we should also check the original Stanley Cup trustees, to see if they would be put over 100 with this bonus. The two original Stanley Cup trustees were John Sweetland and Philip Ross. Sweetland played no high-level hockey that I'm aware of, but Ross did. He played for McGill in 1879, and later in Ottawa for the famous Rideau Rebels in 1890 and the Ottawa Generals (later the Senators) in 1891. But he never played for the Stanley Cup, so even if we gave him the same 40 points we give Drinkwater, he still wouldn't be over 100 on the Inductinator scale, so we're safe.

The induction of Jack Ruttan, I'll tell you right now, cannot be explained by the Inductinator. He played five seasons of senior hockey in Manitoba in the early 1910s, and won the Allan Cup in 1913. He was very well-regarded as a player in Manitoba, but his accomplishments do not outshine dozens of other players who are nowhere near the Hall. He's a complete and total question mark. I can't explain him, not even close.

Finally, Oliver Seibert. He was certainly a good player. He gains 45 points for leading a lesser league (Western Ontario League) in goals, but loses 20 for never having scored a Stanley Cup goal, for a total of 25. We need another 75 points. Now, there is something that sets Seibert apart from other players, which I suppose we can assign a value of 75 points, although doing so is incredibly silly. That thing is this: he has a son (in Earl Seibert) who is a Hall of Fame-calibre player. Oliver was inducted in 1961, and Earl in 1963. We can technically use this to get the elder Seibert over 100 points, though I feel a bit dirty doing so. I suppose such a thing would increase a player's fame, since that's a pretty vague term. I did check other players as well, to make sure such a bonus would not put any non-Hall-of-Famers over 100. The closest is goaltender Bert Lindsay, who played after the Challenge Era. Being Ted Lindsay's father is not enough to get him over the threshold, so we can award this bonus to Seibert without producing undesirable results.
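The two ad hoc cases that do reach the threshold, Gilmour and Seibert, can be tallied as follows. This is only a sketch in Python: the component labels are my own shorthand, and the point values are the ones quoted in the paragraphs above.

```python
# Tally of the ad hoc bonuses described above for Gilmour and Seibert.
# Component labels are shorthand for illustration; values are from the text.

THRESHOLD = 100

components = {
    "Billy Gilmour": {
        "Cup wins and goals": 50,
        "three brothers winning the Cup together": 40,
        "most Cups among his brothers": 10,
    },
    "Oliver Seibert": {
        "led a lesser league in goals": 45,
        "never scored a Stanley Cup goal": -20,
        "father of a Hall of Fame-calibre player": 75,
    },
}

totals = {name: sum(parts.values()) for name, parts in components.items()}

for name, score in totals.items():
    print(f"{name}: {score} (meets threshold: {score >= THRESHOLD})")
```

Both players land on exactly 100, which shows how closely these particular bonuses are fitted to the threshold.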

Player              Pos  HoF?  Score
Frank McGee         F    yes   416
Russell Bowie       F    yes   409
Bruce Stuart        F    yes   350
Tommy Phillips      F    yes   332
Harvey Pulford      D    yes   272
Marty Walsh         F    yes   250
Harry Westwick      F    yes   231
Alf Smith           F    yes   226
Harry Trihey        F    yes   224
Dan Bain            F    yes   172
Riley Hern          G    yes   167
Fred Scanlan        F    yes   159
Mike Grant          D    yes   122
Tom Hooper          D    yes   105
Graham Drinkwater   D    yes   102
Hod Stuart          D    yes   102
George Richardson   F    yes   101
Bouse Hutton        G    yes   100
Blair Russel        F    yes   100
Dickie Boon         D    yes   100
Billy McGimsie      F    yes   100
Fred Whitcroft      F    yes   100
Art Farrell         F    yes   100
Oliver Seibert      F    yes   100
Billy Gilmour       F    yes   100
Bill Nicholson      G    no    99
Herb Jordan         F    no    99
Pud Glass           F    no    97
Archie Hooper       F    no    96
Billy Breen         F    no    96
Lorne Campbell      F    no    92
Cecil Blachford     F    no    90
Fred Higginbotham   D    no    90
Suddy Gilmour       F    no    89
Rod Flett           D    no    87
Gordon Lewis        G    no    87
Billy Roxburgh      F    no    82
Eddie Geroux        G    no    79
Herb Birmingham     F    no    76
Art Brown           G    no    74
James McKenna       G    no    74
George McKay        F    no    74
Tony Gingras        F    no    73
Robert MacDougall   F    no    73
Oren Frood          F    no    72
Bruce Ridpath       F    no    68
Clare McKerrow      F    no    66
Ezra Dumart         F    no    65
Jack Ruttan         D    yes   0

You can see that, by these standards, there are a number of other players who could just as easily be in the Hall of Fame. Bill Nicholson, Herb Jordan, Pud Glass, Archie Hooper and Billy Breen are all extremely close, and several others are at 90 points or more as well. Would we view these players any differently today if they had a few more breaks and were elected to the Hall of Fame? Perhaps, but we really shouldn't. The Inductinator analysis reveals that some Hall of Fame selections from this early era seem almost arbitrary, so I cannot recommend putting too much weight on the honour.
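For anyone who wants to see the near-misses at a glance, the following Python sketch filters the non-inductees by score. The list is copied from the top of the "no" rows in the table above; the function name and the 90-point floor are my own choices for illustration.

```python
# Non-inductees copied from the table above (top of the "no" rows only).
# The 90-point floor and the function name are illustrative assumptions.

THRESHOLD = 100

non_inductees = [
    ("Bill Nicholson", 99),
    ("Herb Jordan", 99),
    ("Pud Glass", 97),
    ("Archie Hooper", 96),
    ("Billy Breen", 96),
    ("Lorne Campbell", 92),
    ("Cecil Blachford", 90),
    ("Fred Higginbotham", 90),
    ("Suddy Gilmour", 89),
]


def near_misses(players, floor=90, threshold=THRESHOLD):
    """Return (name, shortfall) for players scoring from floor up to threshold."""
    return [(name, threshold - score)
            for name, score in players
            if floor <= score < threshold]


for name, shortfall in near_misses(non_inductees):
    print(f"{name}: {shortfall} short of {THRESHOLD}")
```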