Introduction
This paper seeks to make a contribution by exploring more closely an enigma that has frustrated economists and the proletariat alike: What are the major determinants of points scored in a college football game? Further, on average, do the conventional methods and measurements matter? And finally, what are the implications of the results?
While much empirical analysis has been done on the economics of sports in general and baseball in particular, much less has been done on football (see Moneyball for baseball and The Blind Side for football, both by Michael Lewis). This is due, in part, to cause and effect being much more difficult to assign in football than in baseball. In baseball, for example, if a player makes an error, blame is easily assigned; credit for a home run or an RBI is assigned just as easily.
In football, however, when a quarterback throws an incomplete pass, the “error” is charged to his passing percentage, while the actual cause of the incompletion could be any number of things: maybe the receiver dropped the pass, the receiver was tripped or fell, the pass was tipped at the line, or the quarterback was rushed because of an ineffective offensive line. In short, the quarterback with the highest passing percentage is not necessarily the best passing quarterback.
Willoughby (2002) found that significant differences exist between the factors that influence winning and those that influence scoring in Canadian football. For instance, interceptions matter more than fumble recoveries in determining winners and losers. One shortcoming of his analysis, as he notes, is that he is predicting only a binary outcome: a team can only win or lose. This fails to account for the margin by which a team wins or loses.
Schwarz and Barsky (1977) did an extensive analysis of the “Home Advantage” across a variety of sports. With regard to football, they found that the advantage exists in a significant way for collegiate football but not for professional football. This observation about college football is confirmed in my data below.
Gray and Gray (1997) showed that most predictive models used in organized sports betting tend to overemphasize recent performance while underweighting overall performance. One way to allow for this in the predictive model below would be to use averages taken over a longer span of games rather than only the most recent ones. For instance, in predicting the scoring of an upcoming game, it might be more useful to use the average passing yards, rushing yards, turnovers, and penalties of the entire season rather than the averages from the last two or three games.
Description of Data and Methods
Using standard multiple linear regression (estimated by OLS), I analyze the major determinants of college football scores. For the sample, I have chosen all the games played by top 25 NCAA Division I teams on Saturday, 22 November 2008 (week 13 of the regular season). In this sample, 12 games were played by a total of 24 teams. The sample is small enough to be reported in its entirety and is as follows:
TEAM | PTS | RUSH | PASS | TURNO | PENALT | H
Texas Tech | 21 | 45 | 361 | 3 | 47 | 0
Oklahoma | 65 | 299 | 326 | 1 | 96 | 1
Citadel | 19 | 103 | 214 | 3 | 0 | 0
Florida | 70 | 394 | 311 | 0 | 25 | 1
Brigham Young | 24 | 214 | 205 | 6 | 85 | 0
Utah | 48 | 108 | 307 | 0 | 51 | 1
Mich St | 18 | 35 | 287 | 2 | 51 | 0
Penn St | 49 | 138 | 419 | 0 | 38 | 1
Boise St | 41 | 70 | 414 | 4 | 50 | 0
Nevada | 34 | 144 | 241 | 0 | 25 | 1
Michigan | 7 | 111 | 87 | 2 | 15 | 0
Ohio St | 42 | 232 | 184 | 1 | 20 | 1
USAF | 10 | 150 | 11 | 0 | 23 | 0
TCU | 44 | 183 | 321 | 0 | 30 | 1
Mississippi | 31 | 201 | 307 | 1 | 51 | 0
LSU | 13 | 37 | 178 | 2 | 32 | 1
Pitt | 21 | 35 | 229 | 2 | 50 | 0
Cincinnati | 28 | 87 | 309 | 1 | 46 | 1
Oregon St | 19 | 166 | 224 | 0 | 32 | 0
Arizona | 17 | 139 | 158 | 0 | 30 | 1
NC State | 41 | 187 | 279 | 0 | 55 | 0
N Carolina | 10 | 56 | 147 | 6 | 20 | 1
FL State | 37 | 172 | 160 | 0 | 40 | 0
Maryland | 7 | 103 | 149 | 4 | 5 | 1
Here PTS = number of points scored, RUSH = number of rushing yards, PASS = number of passing yards, TURNO = number of turnovers, PENALT = number of yards penalized, and H = home game. H is a dummy variable equal to 1 for a home game and 0 otherwise. A second dummy variable A for away games cannot be included alongside H: because H + A = 1 for every observation, the two dummies would be perfectly collinear with the intercept, and OLS could not produce unique coefficient estimates. All of this gives a regression form of:
PTS=B0+B1(RUSH)+B2(PASS)+B3(TURNO)+B4(PENALT)+B5(H)
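The reason an away dummy cannot sit next to H can be checked directly. The sketch below is only an illustration, assuming NumPy is available: it takes the H column from the sample table, builds a hypothetical away dummy A = 1 - H (a column that is not part of the model in this paper), and compares the rank of the design matrix with and without it.

```python
import numpy as np

# H column from the sample table (1 = home game, 0 = away game), in table order.
H = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
              0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
A = 1 - H                  # hypothetical away dummy, for illustration only
const = np.ones_like(H)    # intercept column

X_ok = np.column_stack([const, H])        # intercept + H
X_trap = np.column_stack([const, H, A])   # intercept + H + A

rank_ok = np.linalg.matrix_rank(X_ok)
rank_trap = np.linalg.matrix_rank(X_trap)

# With A included, the rank falls short of the number of columns,
# because const = H + A holds exactly in every row.
print("intercept + H:     rank", rank_ok, "of", X_ok.shape[1], "columns")
print("intercept + H + A: rank", rank_trap, "of", X_trap.shape[1], "columns")
```

When the rank is smaller than the number of columns, the normal equations have no unique solution, which is why one of the two dummies has to be dropped.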
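It is hypothesized, from a priori reasoning and common sense, that RUSH, PASS, and H will all have positive coefficients, while TURNO and PENALT will each be negative. (An away-game dummy, were it used in place of H, would likewise be expected to carry a negative coefficient.)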
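For readers who want to estimate the regression form above themselves, the following is a minimal sketch, assuming NumPy and using the 24 observations transcribed from the sample table. It is not the software or the exact procedure used for this paper, and the variable names are my own; it simply solves the least-squares problem and reports conventional OLS standard errors.

```python
import numpy as np

# Columns: PTS, RUSH, PASS, TURNO, PENALT, H (transcribed from the sample table).
data = np.array([
    [21,  45, 361, 3, 47, 0],   # Texas Tech
    [65, 299, 326, 1, 96, 1],   # Oklahoma
    [19, 103, 214, 3,  0, 0],   # Citadel
    [70, 394, 311, 0, 25, 1],   # Florida
    [24, 214, 205, 6, 85, 0],   # Brigham Young
    [48, 108, 307, 0, 51, 1],   # Utah
    [18,  35, 287, 2, 51, 0],   # Mich St
    [49, 138, 419, 0, 38, 1],   # Penn St
    [41,  70, 414, 4, 50, 0],   # Boise St
    [34, 144, 241, 0, 25, 1],   # Nevada
    [ 7, 111,  87, 2, 15, 0],   # Michigan
    [42, 232, 184, 1, 20, 1],   # Ohio St
    [10, 150,  11, 0, 23, 0],   # USAF
    [44, 183, 321, 0, 30, 1],   # TCU
    [31, 201, 307, 1, 51, 0],   # Mississippi
    [13,  37, 178, 2, 32, 1],   # LSU
    [21,  35, 229, 2, 50, 0],   # Pitt
    [28,  87, 309, 1, 46, 1],   # Cincinnati
    [19, 166, 224, 0, 32, 0],   # Oregon St
    [17, 139, 158, 0, 30, 1],   # Arizona
    [41, 187, 279, 0, 55, 0],   # NC State
    [10,  56, 147, 6, 20, 1],   # N Carolina
    [37, 172, 160, 0, 40, 0],   # FL State
    [ 7, 103, 149, 4,  5, 1],   # Maryland
], dtype=float)

y = data[:, 0]                                        # PTS
X = np.column_stack([np.ones(len(y)), data[:, 1:]])   # intercept + 5 regressors

# Least-squares solution of y = X b + e.
beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta
n, k = X.shape
sigma2 = residuals @ residuals / (n - k)              # error variance estimate
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
r_squared = 1 - residuals @ residuals / np.sum((y - y.mean()) ** 2)

names = ["const", "RUSH", "PASS", "TURNO", "PENALT", "H"]
for name, b, se in zip(names, beta, std_err):
    print(f"{name:>6}: coef = {b:9.4f}  std err = {se:8.4f}  t = {b / se:6.2f}")
print(f"R^2 = {r_squared:.3f}  (n = {n}, k = {k})")
```

The t-ratios printed here can be compared against a t distribution with n - k = 18 degrees of freedom to judge significance at the 5% level.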
Further, as noted by the p-values reported, RUSH and PASS are both statistically significant, while TURNO, PENALT, and H are insignificant at the 5% level of significance. More specifically, the estimated coefficients tell us the following: 100 rushing yards in a game will on average add 10.9 points to a team’s final score. 100 passing yards in a game will on average add 9.1 points to a team’s final score. One turnover has the effect of subtracting 1.5 points from the final score. Receiving a 10-yard penalty has the effect of adding 0.8 points to a team’s final score. And finally, a team playing at home gains about a 5-point advantage.
Conversely, a team playing an away game will begin the game, in effect, with a 5.06-point deficit. To better illustrate the descriptive ability of each parameter, each has been graphically regressed against PTS. These graphs are found in the Appendix and confirm the results above. The intercept has no explanatory value, as it is negative, and is said to be outside the scope of the model. Intuitively, a game with no passing, rushing, turnovers, penalties, or place to play should have a zero score, as a regression through the origin would predict. The main problem with dropping the intercept is that, when the no-intercept model is estimated, the standard error of every coefficient increases.
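To make the per-unit interpretation of the coefficients concrete, here is a small arithmetic sketch. The coefficient values are the ones quoted in the text, converted to per-yard terms (10.9 points per 100 rushing yards becomes 0.109 per yard, and so on). The stat line is hypothetical and chosen only for illustration, and because the intercept is not reported in the text, only contributions relative to it are computed.

```python
# Per-unit effects implied by the figures quoted in the text:
# points per yard, per turnover, per penalty yard, and for a home game.
COEF = {"RUSH": 0.109, "PASS": 0.091, "TURNO": -1.5, "PENALT": 0.08, "H": 5.06}

def contributions(stats):
    """Return each variable's contribution to predicted points (coefficient x value)."""
    return {name: COEF[name] * value for name, value in stats.items()}

# Hypothetical stat line, not taken from the sample.
example = {"RUSH": 150, "PASS": 250, "TURNO": 2, "PENALT": 40, "H": 1}

parts = contributions(example)
total = sum(parts.values())

for name, pts in parts.items():
    print(f"{name:>6}: {pts:+.2f}")
print(f"total relative to the intercept: {total:+.2f}")
```

Since TURNO, PENALT, and H are not statistically significant in this sample, their contributions in a calculation like this should be read with more caution than those of RUSH and PASS.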
Turning to heteroskedasticity, the Breusch-Pagan test gives a calculated value of 15.78 against a critical value of 30.14, which allows us with confidence to “fail to reject the null” that the residuals are homoskedastic. The White test gives a calculated value of 4.98 against a critical value of 16.2, which again allows us to comfortably “fail to reject the null.” With both tests in agreement, heteroskedasticity does not appear to be an issue.
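The mechanics of such a test can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used for the figures above: it assumes NumPy, it implements the Koenker (studentized) form of the Breusch-Pagan statistic, n times the R-squared from regressing squared OLS residuals on the regressors, which may differ from the variant behind the quoted values, and it runs on synthetic data generated with a fixed seed. The helper names `breusch_pagan_lm` and `decide` are my own.

```python
import numpy as np

def breusch_pagan_lm(X, y):
    """Koenker's studentized Breusch-Pagan LM statistic.

    X must contain a constant column. The statistic is n * R^2 from the
    auxiliary regression of squared OLS residuals on X.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ beta) ** 2                       # squared residuals
    gamma, *_ = np.linalg.lstsq(X, u2, rcond=None)
    fitted = X @ gamma
    r2 = 1 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return len(y) * r2

def decide(calculated, critical):
    """Compare a test statistic with its critical value."""
    return "reject" if calculated > critical else "fail to reject"

# Synthetic data with constant error variance (illustration only).
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0, 1, size=n)

lm = breusch_pagan_lm(X, y)
print(f"LM statistic on synthetic data: {lm:.3f}")

# Decision rule applied to the values quoted in the text.
bp_decision = decide(15.78, 30.14)
white_decision = decide(4.98, 16.2)
print("Breusch-Pagan:", bp_decision)
print("White:", white_decision)
```

The decision rule is the same in both tests: a calculated value below the critical value means the null of homoskedastic residuals is not rejected.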
The model can be written linearly in its parameters. Assuming that the sample was indeed drawn at random, that the explanatory variables are not exactly linearly related, that the error term has an expected value of zero, and that the variance of the error term does not depend on the values of any explanatory variable, the model fulfills the requirements of the Gauss-Markov Theorem, and the OLS estimator is the “best linear unbiased estimator.”
Conclusion
What can be taken from all of this? As the results show, on average, putting rushing and passing yards on the board leads to more points, while committing a turnover is almost like giving the other team a free safety. However, on average, the data say that a 10-yard penalty will add about 0.8 points to your score.
One possible explanation for the PENALT coefficient having a positive value (other than the sample not being truly random or large enough) is that teams that draw more penalties are more aggressive, and the team that plays most aggressively usually scores more points: penalties could be correlated with effort, and effort is correlated with scoring. Finally, the best way for a team to score nearly a touchdown with zero effort is to play the game in its home stadium.
More than doubling the sample size by adding out-of-sample observations to the data set shows that the coefficients for RUSH and PASS remain virtually unchanged, while the coefficient for TURNO changes to -1.9, PENALT changes to .01, and H moves closer to 3. These estimates are probably closer to the true values, since larger samples generally yield more precise estimates; the limited sample size in this particular study is a result of time constraints.
Further research could be done, and some of it has been done, on many different aspects of football in general and on the determinants of football scoring in particular. For example: Does the type of turnover matter (interception vs. fumble)? See Willoughby (2002). Does the weight of the offensive line have an effect on scoring or winning? Does that of the defensive line? How does the average 40-yard dash time of the starting receivers affect points scored? What about the average 40-yard dash time of the starting safety?
How does the coach’s win percentage play into the analysis? Is the number of third-down conversions made in a game highly descriptive of the final score? And from Michael Lewis in The Blind Side: How is total points scored related to the time the quarterback is allowed to stand in the pocket? How is the efficiency of an offense related to a weighted score (weight, speed, strength) of the starting left tackle? The right defensive end? The entire offensive line? The entire defensive line?
All of these questions are entertaining to think about and would be immensely fun to research and write about. Is it too late to switch from Ag. to Sports economics?
Appendix
References
Gray, Phillip and Gray, Stephen. (1997). Testing Market Efficiency: Evidence from the NFL Sports Betting Market. The Journal of Finance, 52(4), 1725-1737.
Lewis, Michael. (2003). Moneyball. W. W. Norton & Company.
Lewis, Michael. (2007). The Blind Side: Evolution of a Game. W. W. Norton & Company.