Standard Error Programming Advice -- Test Data Set

Data for Testing Standard Error Estimation Programs

To test the different programs, I created a test data set. The data set contains four variables: a firm identifier (firmid), a time variable (year), the independent variable (x), and the dependent variable (y). The residual and the independent variable both contain a firm effect, but no year effect. Thus the standard errors clustered by firm are different from the OLS standard errors (and the standard errors clustered by firm and year are different from the standard errors clustered by year). I have posted this data set as a text file and as a Stata data set. The results of running the OLS regression with OLS standard errors, White standard errors, and clustered standard errors, as well as the Fama-MacBeth coefficients and standard errors, are reported below.

OLS Coefficients and Standard Errors

Variable	Coefficient	Standard Error	T-statistic
Constant	0.0297	0.0284	1.05
X	1.0348	0.0286	36.20
			R² = 0.2078

OLS Coefficients and White Standard Errors

Variable	Coefficient	Standard Error	T-statistic
Constant	0.0297	0.0284	1.05
X	1.0348	0.0284	36.44
			R² = 0.2078

OLS Coefficients and Standard Errors Clustered by Firm

Variable	Coefficient	Standard Error	T-statistic
Constant	0.0297	0.0670	0.44
X	1.0348	0.0506	20.45
			R² = 0.2078

OLS Coefficients and Standard Errors Clustered by Year

Variable	Coefficient	Standard Error	T-statistic
Constant	0.0297	0.0234	1.27
X	1.0348	0.0334	30.99
			R² = 0.2078

OLS Coefficients and Standard Errors Clustered by Firm and Year

Variable	Coefficient	Standard Error	T-statistic
Constant	0.0297	0.0651	0.46
X	1.0348	0.0536	19.32
			R² = 0.2078

Fama-MacBeth Coefficients and Standard Errors

Variable	Coefficient	Standard Error	T-statistic
Constant	0.0313	0.0234	1.34
X	1.0356	0.0333	31.06
			R² = 0.2078

Clustering in Multiple Dimensions in SAS

In SAS you can specify multiple variables in the cluster statement. For example, you could put both firm and year as the cluster variables. Using the test data set, I ran the regression in SAS and put both the firm identifier (firmid) and the time identifier (year) in the cluster statement. The SAS commands are:

    proc surveyreg data=mydata;
            cluster firmid year;
            model y = x;
    run;

The results are:

Variable	Coefficient	Standard Error	T-statistic
Constant	0.0297	0.0284	1.05
X	1.0348	0.0284	36.44
			R² = 0.2078

These are White standard errors, not standard errors clustered by both firm and time. To see this, compare these results to the results above for White standard errors and for standard errors clustered by firm and year. The reason is that when you tell SAS to cluster by firmid and year, it allows only observations with the same firmid and the same year to be correlated. Since there is only one observation for each firm-year in the sample, this assumes all residuals are uncorrelated (SAS assumes there are 5,000 clusters). In my paper, in Thompson (2006), and in Cameron, Gelbach and Miller (2006), clustering by firm and year means allowing the residuals of observations from the same firm or the same year to be correlated.
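
The point can be checked numerically. The following is a minimal Python sketch, not the code behind the tables above: it simulates a panel instead of loading the posted test data, so its numbers will not match the tables. The layout of 500 firms by 10 years (chosen only to give 5,000 firm-year observations), the helper names (ols, cluster_cov, se), and the omission of any finite-sample correction are assumptions made for illustration. It shows that clustering on the firm-year pair reproduces the White covariance matrix when each pair holds one observation. It also forms a two-way estimate by adding the firm-clustered and year-clustered matrices and subtracting the White matrix, which is the combination used in Thompson (2006) and Cameron, Gelbach and Miller (2006).

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and residuals."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    return beta, y - X @ beta


def cluster_cov(X, resid, groups):
    """Sandwich covariance with scores summed within each group.

    No finite-sample correction is applied (an assumption; packages differ).
    """
    bread = np.linalg.inv(X.T @ X)
    scores = X * resid[:, None]                 # one score row per observation
    _, idx = np.unique(groups, return_inverse=True)
    idx = idx.ravel()
    sums = np.zeros((idx.max() + 1, X.shape[1]))
    np.add.at(sums, idx, scores)                # sum scores within each cluster
    return bread @ (sums.T @ sums) @ bread


def se(V):
    """Standard errors from a covariance matrix."""
    return np.sqrt(np.diag(V))


# Simulated panel (hypothetical): firm effect in both x and the residual,
# no year effect, one observation per firm-year.
rng = np.random.default_rng(0)
n_firms, n_years = 500, 10
firm = np.repeat(np.arange(n_firms), n_years)
year = np.tile(np.arange(n_years), n_firms)
x = rng.normal(size=n_firms)[firm] + rng.normal(size=firm.size)
e = rng.normal(size=n_firms)[firm] + rng.normal(size=firm.size)
y = x + e

X = np.column_stack([np.ones_like(x), x])
beta, resid = ols(X, y)

obs_id = np.arange(firm.size)        # every observation is its own cluster
firm_year = firm * n_years + year    # one id per (firm, year) pair

V_white = cluster_cov(X, resid, obs_id)
V_pair = cluster_cov(X, resid, firm_year)   # what "cluster firmid year" amounts to
V_firm = cluster_cov(X, resid, firm)
V_year = cluster_cov(X, resid, year)
V_two_way = V_firm + V_year - V_white       # firm OR year correlation allowed

print("White SE:          ", se(V_white))
print("Firm-year pair SE: ", se(V_pair))
print("Firm-clustered SE: ", se(V_firm))
print("Two-way SE:        ", se(V_two_way))
```

Because every firm-year pair contains a single observation, the pair-clustered and White matrices coincide, while the firm-clustered and two-way standard errors are larger. Packages that apply degrees-of-freedom adjustments will report slightly different values from this unadjusted sketch.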