Friday, 26 September 2014

Puckerings archive: Goal and Assist Z-Scores (04 Jul 2002)

What follows is a post from my old hockey analysis site puckerings.com (later hockeythink.com). It is reproduced here for posterity; bear in mind this writing is over a decade old and I may not even agree with it myself anymore. This post was originally published on July 4, 2002.


Goal and Assist Z-scores
Copyright Iain Fyffe, 2002


Methods have been developed in the past to identify dominant single-season performances. For example, some years ago I developed something I called goal-scoring dominance, which was calculated as the leading player's goals-per-game average divided by the second-leading player's goals-per-game average. A similar calculation was made for assists. I later discovered that Klein and Reif had developed the very same method years before, calling it Quality of Victory.

But this method suffers from a serious flaw. What if two players have outstanding seasons? The Quality of Victory formula will show that no one performed in a dominant manner, because the second-leading player's average is so high. This is not fair, nor is it accurate.
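The flaw is easy to see with a small sketch. The function name `quality_of_victory` and the per-game rates below are made up for illustration; they are not taken from any real season.

```python
def quality_of_victory(per_game_rates):
    """Leader's per-game rate divided by the runner-up's per-game rate."""
    ranked = sorted(per_game_rates, reverse=True)
    if len(ranked) < 2 or ranked[1] <= 0:
        raise ValueError("need at least two players and a positive runner-up rate")
    return ranked[0] / ranked[1]


# Hypothetical goals-per-game rates, invented for this example only.
one_standout = [1.00, 0.60, 0.55, 0.50, 0.45]
two_standouts = [1.00, 0.95, 0.55, 0.50, 0.45]

qov_one = quality_of_victory(one_standout)
qov_two = quality_of_victory(two_standouts)

print(round(qov_one, 2))  # 1.67
print(round(qov_two, 2))  # 1.05
```

The leader's rate is identical in both made-up seasons, yet the ratio drops from about 1.67 to about 1.05 simply because a second player also had a great year.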

Goal z-scores (GZ) and assist z-scores (AZ) were designed to resolve this problem. It was hoped that they would not create any new problems; unfortunately, that is not the case (more on this later). What we do is compare a player's performance to two things: the average individual player performance that year (in terms of goals per game or assists per game), and the degree of variation in individual player performance that year. Standard deviation is a way to measure this variability. For instance, the sets {1,2,3,4,5} and {0,1,3,5,6} have the same mean (3.0), but the second set has more variation, and therefore a higher standard deviation (2.5, compared to 1.6 for the first set). A z-score is simply the number of standard deviations an observation is above the mean (or below the mean in the case of a negative z-score). So, if we have a set of numbers whose mean is 5 and whose standard deviation is 3, then an observation of 8 would have a z-score of 1 ((8 - 5)/3). It's that simple.
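The arithmetic can be checked with a minimal sketch using Python's standard library. It assumes the sample standard deviation (`statistics.stdev`), which gives roughly 1.6 and 2.5 for the two example sets; the helper name `z_score` is chosen here for illustration.

```python
from statistics import mean, stdev

set_a = [1, 2, 3, 4, 5]
set_b = [0, 1, 3, 5, 6]

# Both example sets have a mean of 3.
mean_a = mean(set_a)
mean_b = mean(set_b)

# Sample standard deviation (n - 1 in the denominator); this is an assumption
# about how the spread was measured.
sd_a = stdev(set_a)
sd_b = stdev(set_b)

print(mean_a, mean_b)                  # 3 3
print(round(sd_a, 2), round(sd_b, 2))  # 1.58 2.55


def z_score(observation, mu, sigma):
    """Number of standard deviations an observation lies above the mean."""
    return (observation - mu) / sigma


# The worked example from the text: mean 5, standard deviation 3, observation 8.
example_z = z_score(8, 5, 3)
print(example_z)  # 1.0
```

For a GZ, the observation would be one player's goals per game, and the mean and standard deviation would be taken over all qualifying players in that season.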

In a normal distribution of events, about two-thirds of all observations will fall within one standard deviation of the mean (i.e., have a z-score between -1 and 1). About 95% of observations will be within two standard deviations (z-scores between -2 and 2), and almost all will be within three standard deviations (z-scores between -3 and 3). Using z-scores we can determine how outstanding an individual performance was. For instance, only an outstanding season would produce a z-score of 3 or more.
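These proportions can be reproduced with a short sketch using `statistics.NormalDist` (Python 3.8 or later). The helper name `share_within` is made up here, and the sketch says nothing about whether per-game scoring rates actually follow a normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))

# Share of observations with a z-score of 3 or more (upper tail only).
upper_tail_3 = 1.0 - standard_normal.cdf(3)
print(round(upper_tail_3, 5))
```

Under the normal assumption, a z-score of 3 or more should turn up in only about one observation per several hundred, which is why it marks an outstanding season.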

That was the set-up. As it turns out, the results of this study are not that interesting, but what the results indicate may be of interest. The problem with the z-scores is that the top seasons of all time are dominated by recent players. For instance, in the top 40 GZ seasons, we have five from the 2000's (in only three years), 17 from the 1990's, 10 from the 1980's, four from the 1970's, and two each from the 1960's (Bobby Hull) and the 1930's (Charlie Conacher). So really it shows only the best of recent seasons. The assist results were predictable; Gretzky has the top 10 almost to himself, with Lemieux following. The top goal results are interesting enough to note (minimum 20 games played):

 Rank  Player         Year  GP  GZS
 1.    Brett Hull     1991  78  5.95
 2.    Wayne Gretzky  1984  74  5.86
 3.    Mario Lemieux  1993  60  5.82
 4.    Cam Neely      1994  49  5.80
 T5.   Mario Lemieux  1989  76  5.64
 T5.   Mario Lemieux  1996  70  5.64

So Brett Hull's 1991 campaign, while technically falling short of Gretzky's goal record, is actually more impressive than any of Gretzky's goal-scoring seasons by this analysis. But the real king of the list is Lemieux. In addition to spots 3, 5, and 6, he holds down numbers 11, 17, and 30 on the top 40 (as well as #41). Gretzky has #2, 12, 34 and 38. No contest.

As I said, the results aren't overly interesting, because they are dominated by recent players. But the fact that recent players dominate is in itself interesting. It indicates that modern players are able to outperform the average player by a larger margin than older players could. The cause of this is unclear, as it can be affected by the performance of the top players, as well as by what constitutes an "average" player. Still, it's interesting because it's the exact opposite of what has happened in baseball, where the degree of domination by the top players has decreased over time, rather than increased. Food for thought.
