*What follows is a post from my old hockey analysis site* **puckerings.com** *(later hockeythink.com). It is reproduced here for posterity; bear in mind this writing is over a decade old and I may not even agree with it myself anymore. This post was originally published on October 29, 2002.*

**Factors Affecting NHL Attendance**

*Copyright Iain Fyffe, 2002*

This paper builds upon the work of Wiedecke, who examined factors affecting NHL attendance using a multiple linear regression model. A summary of this work follows.

Data from the 1997/98 NHL season were used, giving 26 data observations. The dependent variable used was the percentage of capacity (called "Attendance Capacity"). That is, if a team averaged 15,000 fans in an arena with a capacity of 15,500, the team had an Attendance Capacity of 97% (15,000 divided by 15,500). The independent variables used were standings points, goals scored, and penalty minutes (which are all self-explanatory), and location (explained below).

Location for each team was assigned a value of 1, 2 or 3 based upon the team's geographic location. A value of 1 was assigned to the northernmost teams (Calgary, Edmonton, Montreal, Ottawa, Toronto and Vancouver). A value of 2 was assigned to Boston, Buffalo, Chicago, Colorado, Detroit, New Jersey, New York Islanders, New York Rangers, Philadelphia, Pittsburgh, and St. Louis. A value of 3 was assigned to the southernmost teams (Anaheim, Carolina, Dallas, Florida, Los Angeles, Phoenix, San Jose, Tampa Bay, and Washington).
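For readers who want to reproduce the coding, the location variable can be written as a simple lookup. This is only a sketch of the scheme described above; the names `LOCATION_GROUPS` and `LOCATION` are my own and do not come from Wiedecke's thesis.

```python
# Location codes as described in the text:
# 1 = northernmost teams, 2 = middle group, 3 = southernmost teams.
LOCATION_GROUPS = {
    1: ["Calgary", "Edmonton", "Montreal", "Ottawa", "Toronto", "Vancouver"],
    2: ["Boston", "Buffalo", "Chicago", "Colorado", "Detroit", "New Jersey",
        "New York Islanders", "New York Rangers", "Philadelphia",
        "Pittsburgh", "St. Louis"],
    3: ["Anaheim", "Carolina", "Dallas", "Florida", "Los Angeles", "Phoenix",
        "San Jose", "Tampa Bay", "Washington"],
}

# Invert the groups into a team -> code lookup (hypothetical helper name).
LOCATION = {
    team: code
    for code, teams in LOCATION_GROUPS.items()
    for team in teams
}

# The three groups together cover the 26 teams of the 1997/98 season.
print(len(LOCATION))
```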

This paper extends Wiedecke's model in three ways:

(1) by incorporating a larger data set;

(2) by redefining the dependent variable; and

(3) by introducing a new independent variable.

Rather than using only the 1997/98 season, I will use data from 1995/96, 1996/97, 1997/98, 1998/99, 1999/2000, 2000/01 and 2001/02, giving 193 data observations.

I will use average attendance as the dependent variable, rather than percentage of capacity. By using the percentage, a team which fills 14,800 of 15,000 seats (98.7%) is considered superior to a team which fills 19,700 of 20,000 seats (98.5%). This does not reflect reality well, as the second team draws a full 33% more fans.
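The arithmetic behind this comparison can be checked with a few lines of Python. The sketch uses only the two example arenas from the paragraph above; the function and variable names are illustrative.

```python
def percent_of_capacity(avg_attendance, capacity):
    """Wiedecke-style dependent variable: attendance as a percentage of capacity."""
    return 100.0 * avg_attendance / capacity

# (average attendance, arena capacity) for the two example teams in the text
small_arena = (14_800, 15_000)
large_arena = (19_700, 20_000)

small_pct = percent_of_capacity(*small_arena)
large_pct = percent_of_capacity(*large_arena)

# How many more fans the second team draws, as a percentage of the first team's crowd
extra_fans_pct = 100.0 * (large_arena[0] - small_arena[0]) / small_arena[0]

print(f"{small_pct:.1f}% vs {large_pct:.1f}% of capacity")
print(f"second team draws {extra_fans_pct:.0f}% more fans")
```

The percentage measure ranks the smaller arena ahead, while the raw attendance figures rank the larger one ahead by roughly a third.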

The independent variable added is Novelty. A value of 5 is assigned to a team in its first year in the league (after either an expansion or franchise relocation), and this is reduced by one for each subsequent year in the league until it reaches 0. The purpose is to determine if new teams get an attendance boost simply by being new, as is often postulated. The four independent variables used by Wiedecke are also used.
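The Novelty rule can be sketched as a small function. The name `novelty` and the convention that `seasons_in_market` equals 1 in a team's first season after expansion or relocation are assumptions made for illustration.

```python
def novelty(seasons_in_market):
    """Novelty score: 5 in the first season, dropping by one per season to a floor of 0.

    seasons_in_market is assumed to be 1 for a team's first season
    after an expansion or a franchise relocation.
    """
    if seasons_in_market < 1:
        raise ValueError("seasons_in_market must be at least 1")
    return max(0, 6 - seasons_in_market)

# First six seasons of a new team, then a long-established team
print([novelty(n) for n in range(1, 7)])
print(novelty(30))
```

Under this rule every team that has been in its market for six or more seasons scores 0, so the variable only distinguishes recent arrivals.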

**Variable Correlations**

A variable correlation analysis is performed to examine the data for possible cross-correlation effects. Only one pair of variables, goals and standings points, has a significant correlation (positive 0.64). Therefore if both goals and points are found to be significant, care must be taken in their interpretation due to cross-correlation. Other pairs with less-significant correlations are attendance and points (positive 0.39), attendance and location (negative 0.31), and location and novelty (positive 0.30).

The following table indicates the coefficients of correlation for all variables used: attendance (ATT), points in standings (PTS), goals scored (GF), penalty minutes (PIM), location (LOC) and novelty (NOV).

| | ATT | PTS | GF | PIM | LOC | NOV |
|---|---|---|---|---|---|---|
| ATT | - | .39 | .25 | -.04 | -.31 | -.17 |
| PTS | .39 | - | .64 | -.28 | -.17 | -.19 |
| GF | .25 | .64 | - | .10 | -.22 | -.17 |
| PIM | -.04 | -.28 | .10 | - | .04 | -.01 |
| LOC | -.31 | -.17 | -.22 | .04 | - | .30 |
| NOV | -.17 | -.19 | -.17 | -.01 | .30 | - |

**Results of the Model**

The results of the multiple linear regression model are as follows.

| Statistic | Value |
|---|---|
| Constant (y-intercept) | 13,326 |
| Standard error of estimate | 2,071 |
| R-squared | 0.223 |

| Variable | Coefficient | St. error | t-stat |
|---|---|---|---|
| PTS | 61.08 | 13.56 | 4.50 |
| GF | -6.90 | 7.16 | -0.96 |
| PIM | 0.80 | 0.61 | 1.31 |
| LOC | -778.93 | 211.43 | -3.68 |
| NOV | -47.92 | 119.85 | -0.40 |
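Each t-statistic is simply the coefficient divided by its standard error, so the last column can be reproduced from the other two. The sketch below does that with the reported figures. The cutoff of |t| > 2 is a common rule of thumb that I am assuming for illustration; the original analysis does not state the threshold it used.

```python
# (coefficient, standard error) pairs as reported in the results table
ESTIMATES = {
    "PTS": (61.08, 13.56),
    "GF": (-6.90, 7.16),
    "PIM": (0.80, 0.61),
    "LOC": (-778.93, 211.43),
    "NOV": (-47.92, 119.85),
}

# t-statistic = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se) in ESTIMATES.items()}

# Assumed rule of thumb: |t| > 2 suggests the effect is distinguishable from zero
ROUGH_THRESHOLD = 2.0
significant = [name for name, t in t_stats.items() if abs(t) > ROUGH_THRESHOLD]

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
print("clears the rough threshold:", significant)
```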

**Discussion of Results**

The t-statistics of GF, PIM and NOV indicate there is little evidence that they affect attendance in any significant way. On the other hand, there is very strong evidence that PTS and LOC significantly affect attendance. These findings agree with Wiedecke.

Overall, the model is not extremely useful; the R-squared figure indicates only 22.3% of the variability in attendance is explained by the model. This may indicate there are other independent variables that should be considered.

The correlation between the two significant independent variables (PTS and LOC) is -0.17, indicating there is no significant cross-correlation effect.

**Interpretation**

According to the model, having a good team is the most significant factor affecting attendance.

*Ceteris paribus*, each additional standings point increases attendance by 61 fans per game. A 90-point team therefore has a 610-fan advantage in average attendance over an 80-point team.
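To make the "all else being equal" reading concrete, the fitted equation can be written as a function and evaluated for two teams that differ in only one variable. The intercept and coefficients are the reported ones; the team profile (220 goals, 1,200 penalty minutes, location 2, novelty 0) is a hypothetical example, not a real club. The same approach applies to the location coefficient.

```python
INTERCEPT = 13_326
COEF = {"PTS": 61.08, "GF": -6.90, "PIM": 0.80, "LOC": -778.93, "NOV": -47.92}

def predicted_attendance(pts, gf, pim, loc, nov):
    """Average attendance predicted by the fitted linear model."""
    values = {"PTS": pts, "GF": gf, "PIM": pim, "LOC": loc, "NOV": nov}
    return INTERCEPT + sum(COEF[name] * values[name] for name in COEF)

# Hypothetical team profile, held fixed while one variable changes
base = dict(gf=220, pim=1200, loc=2, nov=0)

# Effect of ten extra standings points
points_gap = predicted_attendance(pts=90, **base) - predicted_attendance(pts=80, **base)

# Effect of moving from the northernmost (1) to the southernmost (3) group
north = dict(base, loc=1)
south = dict(base, loc=3)
location_gap = predicted_attendance(pts=85, **north) - predicted_attendance(pts=85, **south)

print(f"10 extra points: {points_gap:.1f} fans per game")
print(f"north vs south: {location_gap:.1f} fans per game")
```

Because the model is linear, these gaps do not depend on the hypothetical values chosen for the other variables.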

The location coefficient indicates that the further south a team is, the worse its attendance is. All else being equal, a team in the southern US averages 1,558 fewer fans per game than a team in Canada. This is significant because the NHL's recent strategy has been to put as many teams in the southern US as possible, either through expansion or franchise relocations (including moving teams from Canada to the southern US). The results of this model suggest that this strategy is seriously flawed. In this case, analysis agrees with common sense: why are markets where there *are* hockey fans ignored in favour of markets where there are *no* hockey fans? At least the most recent expansion was more logical, and didn't put any more teams in the Sun Belt.

**Reference**

Wiedecke, Jennifer. 1999. *Factors Affecting Attendance in the National Hockey League: A Multiple Regression Model.* Master's thesis, University of North Carolina, Chapel Hill.