# Data for Testing Standard Error Estimation Programs

To test the different programs, I created a test data set. It contains four variables: a firm identifier (firmid), a time variable (year), the independent variable (x), and the dependent variable (y). The residual and the independent variable both contain a firm effect but no year effect. Thus the standard errors clustered by firm differ from the OLS standard errors (and the standard errors clustered by firm and year differ from those clustered by year). I have posted this data set as a text file and as a Stata data set. The results of running the OLS regression with OLS standard errors, White standard errors, and clustered standard errors, as well as Fama-MacBeth coefficients and standard errors, are reported below.

### OLS Coefficients and Standard Errors

| Variable | Coefficient | Standard Error | T-statistic |
|----------|-------------|----------------|-------------|
| Constant | 0.0297      | 0.0284         | 1.05        |
| X        | 1.0348      | 0.0286         | 36.20       |

R² = 0.2078

### OLS Coefficients and White Standard Errors

| Variable | Coefficient | Standard Error | T-statistic |
|----------|-------------|----------------|-------------|
| Constant | 0.0297      | 0.0284         | 1.05        |
| X        | 1.0348      | 0.0284         | 36.44       |

R² = 0.2078

### OLS Coefficients and Standard Errors Clustered by Firm

| Variable | Coefficient | Standard Error | T-statistic |
|----------|-------------|----------------|-------------|
| Constant | 0.0297      | 0.0670         | 0.44        |
| X        | 1.0348      | 0.0506         | 20.45       |

R² = 0.2078

### OLS Coefficients and Standard Errors Clustered by Year

| Variable | Coefficient | Standard Error | T-statistic |
|----------|-------------|----------------|-------------|
| Constant | 0.0297      | 0.0234         | 1.27        |
| X        | 1.0348      | 0.0334         | 30.99       |

R² = 0.2078

### OLS Coefficients and Standard Errors Clustered by Firm and Year

| Variable | Coefficient | Standard Error | T-statistic |
|----------|-------------|----------------|-------------|
| Constant | 0.0297      | 0.0651         | 0.46        |
| X        | 1.0348      | 0.0536         | 19.32       |

R² = 0.2078
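As a rough consistency check on the numbers reported here, the two-way clustered variance is commonly written as the firm-clustered variance plus the year-clustered variance minus the White variance. The sketch below applies that combination to the rounded standard errors from the tables in this section. The function name `two_way_se` is illustrative only, and the check assumes one observation per firm-year and ignores finite-sample corrections, so agreement is expected only up to rounding.

```python
import math

def two_way_se(se_firm, se_year, se_white):
    """Combine one-way results: V_two_way = V_firm + V_year - V_white.

    Works on a single coefficient's variance (a diagonal element),
    so standard errors are squared before combining.
    """
    return math.sqrt(se_firm ** 2 + se_year ** 2 - se_white ** 2)

# Inputs are the rounded standard errors reported in the tables.
slope_se = two_way_se(se_firm=0.0506, se_year=0.0334, se_white=0.0284)
const_se = two_way_se(se_firm=0.0670, se_year=0.0234, se_white=0.0284)

print(f"X:        {slope_se:.4f}")
print(f"Constant: {const_se:.4f}")
```

Both values land within rounding error of the firm-and-year clustered standard errors reported for this data set (0.0536 for X and 0.0651 for the constant).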

### Fama-MacBeth Coefficients and Standard Errors

| Variable | Coefficient | Standard Error | T-statistic |
|----------|-------------|----------------|-------------|
| Constant | 0.0313      | 0.0234         | 1.34        |
| X        | 1.0356      | 0.0333         | 31.06       |

R² = 0.2078
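For readers who want to reproduce this kind of result, here is a minimal sketch of the Fama-MacBeth procedure as it is usually described: run a separate cross-sectional regression for each year, average the yearly coefficients, and take the standard error from the time-series variation of those coefficients. The function names and the tiny toy data set are hypothetical; this is not the posted test data, and it is not necessarily how any particular program implements the estimator.

```python
import math
from collections import defaultdict

def ols_1d(xs, ys):
    """Return (intercept, slope) from regressing y on a constant and x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def fama_macbeth(rows):
    """rows: iterable of (firmid, year, x, y). Returns {name: (coef, se)}."""
    by_year = defaultdict(list)
    for _firmid, year, x, y in rows:
        by_year[year].append((x, y))
    # One cross-sectional regression per year.
    yearly = [ols_1d(*zip(*obs)) for _, obs in sorted(by_year.items())]
    t = len(yearly)
    out = {}
    for idx, name in enumerate(("const", "x")):
        betas = [b[idx] for b in yearly]
        mean = sum(betas) / t
        var = sum((b - mean) ** 2 for b in betas) / (t - 1)
        out[name] = (mean, math.sqrt(var / t))  # se = sd(betas) / sqrt(T)
    return out

# Toy panel (hypothetical): 3 firms, 3 years, slope equal to the year number.
rows = [(f, yr, float(f), float(yr * f)) for yr in (1, 2, 3) for f in (1, 2, 3)]
result = fama_macbeth(rows)
print(result)
```

In the toy panel the yearly slopes are 1, 2, and 3, so the averaged slope is 2 and its standard error is the standard deviation of the yearly slopes divided by the square root of 3.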

## Clustering in Multiple Dimensions in SAS

In SAS you can specify multiple variables in the cluster statement. For example, you could put both firm and year as the cluster variables. Using the test data set, I ran the regression in SAS and put both the firm identifier (firmid) and the time identifier (year) in the cluster statement. The SAS commands are:

proc surveyreg data=mydata;
cluster firmid year;  /* both identifiers listed in one cluster statement */
model y = x;
run;

The results are:

| Variable | Coefficient | Standard Error | T-statistic |
|----------|-------------|----------------|-------------|
| Constant | 0.0297      | 0.0284         | 1.05        |
| X        | 1.0348      | 0.0284         | 36.44       |

R² = 0.2078

These are White standard errors, not standard errors clustered by both firm and time. To see this, compare these results with the results above for White standard errors and for standard errors clustered by firm and year. The reason is that when you tell SAS to cluster by firmid and year, it allows only observations with the same firmid and the same year to be correlated. Since there is only one observation for each firm-year in the sample, this amounts to assuming that all residuals are uncorrelated (SAS treats the data as 5,000 clusters). In my paper, in Thompson (2006), and in Cameron, Gelbach and Miller (2006), clustering by firm and year means something different: the residuals of observations from the same firm or the same year are allowed to be correlated.
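The point can be illustrated with a small sketch. Below, a plain sandwich estimator sums the regression scores within each cluster label; passing the (firmid, year) pair as the label gives exactly the White result whenever every firm-year cell holds a single observation. Everything here is an assumption made for illustration: the simulated panel (50 firms, 10 years) is not the posted test data, `clustered_se` is a hypothetical helper, and no finite-sample correction is applied, so the numbers will not match SAS or Stata output digit for digit.

```python
import math
import random
from collections import defaultdict

def clustered_se(xs, ys, labels):
    """Sandwich standard errors (const, slope) for y = a + b*x.

    Scores are summed within each cluster label before forming the
    middle matrix. No finite-sample correction is applied.
    """
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X'X)^-1
    a = inv[0][0] * sy + inv[0][1] * sxy
    b = inv[1][0] * sy + inv[1][1] * sxy
    scores = defaultdict(lambda: [0.0, 0.0])
    for x, y, g in zip(xs, ys, labels):
        e = y - a - b * x
        scores[g][0] += e
        scores[g][1] += e * x
    meat = [[sum(s[i] * s[j] for s in scores.values()) for j in range(2)]
            for i in range(2)]
    # Diagonal of (X'X)^-1 * meat * (X'X)^-1.
    var = [sum(inv[k][i] * meat[i][j] * inv[j][k]
               for i in range(2) for j in range(2)) for k in range(2)]
    return math.sqrt(var[0]), math.sqrt(var[1])

# Simulated panel with a firm effect in both x and the residual.
rng = random.Random(0)
firm, year, xs, ys = [], [], [], []
for f in range(50):
    fx, fe = rng.gauss(0, 1), rng.gauss(0, 1)
    for t in range(10):
        x = fx + rng.gauss(0, 1)
        firm.append(f)
        year.append(t)
        xs.append(x)
        ys.append(x + fe + rng.gauss(0, 1))

n = len(xs)
white = clustered_se(xs, ys, list(range(n)))             # each obs its own cluster
firm_year = clustered_se(xs, ys, list(zip(firm, year)))  # intersection of firm and year
by_firm = clustered_se(xs, ys, firm)                     # clustered by firm only

print("clusters in intersection:", len(set(zip(firm, year))), "of", n, "observations")
print("firm-year equals White:", firm_year == white)
print("White:", white, "by firm:", by_firm)
```

Because the intersection produces as many clusters as observations, the firm-year result coincides with White, while clustering by firm alone gives different standard errors.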