The error term is the most important component of the classical linear regression model (CLRM). Most of the CLRM assumptions that allow econometricians to prove the desirable properties of the OLS estimators (the Gauss-Markov theorem) directly involve characteristics of the error term (or disturbance). One of these assumptions concerns the conditional variance of the error term: it must be constant (homoskedastic).
Homoskedastic error versus heteroskedastic error
The CLRM relies on the error term variance being constant. Enter the term homoskedasticity, which refers to a situation where the error has the same variance regardless of the value(s) taken by the independent variable(s). Econometricians usually express homoskedasticity as
$$\operatorname{Var}(\varepsilon_i \mid X_i) = \sigma^2$$
where $X_i$ represents a vector of values for each individual and for all the independent variables.
As you can see, when the error term is homoskedastic, the dispersion of the error remains the same over the range of observations and regardless of functional form.
In many situations, the error term doesn't have a constant variance. The result is heteroskedasticity: the variance of the error term changes in response to a change in the value(s) of the independent variable(s). Econometricians typically express heteroskedasticity as
$$\operatorname{Var}(\varepsilon_i \mid X_i) = \sigma_i^2$$
If the error term is heteroskedastic, the dispersion of the error changes over the range of observations, as shown. The heteroskedasticity patterns depicted are only a couple among many possible patterns. Any error variance that doesn’t resemble that in the previous figure is likely to be heteroskedastic.
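To make the contrast concrete, here is a minimal simulation sketch. The sample size, the range of X, and the specific pattern in which the error's standard deviation grows in proportion to X are all assumptions chosen for illustration; they represent just one of many possible heteroskedasticity patterns.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 10_000
x = rng.uniform(1.0, 10.0, size=n)  # hypothetical independent variable

# Homoskedastic errors: the same standard deviation (2.0) at every value of x.
e_homo = rng.normal(0.0, 2.0, size=n)

# Heteroskedastic errors: the standard deviation grows with x (assumed pattern).
e_hetero = rng.normal(0.0, 0.5 * x, size=n)

def variance_by_half(x, e):
    """Return the error variance in the low-x half and the high-x half."""
    cutoff = np.median(x)
    return e[x <= cutoff].var(), e[x > cutoff].var()

homo_low, homo_high = variance_by_half(x, e_homo)
hetero_low, hetero_high = variance_by_half(x, e_hetero)

print(f"homoskedastic:   low-x var = {homo_low:.2f}, high-x var = {homo_high:.2f}")
print(f"heteroskedastic: low-x var = {hetero_low:.2f}, high-x var = {hetero_high:.2f}")
```

In the homoskedastic case the two halves should show roughly the same variance, whereas in the heteroskedastic case the high-X half should be clearly more dispersed.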
If you recall that homogeneous means uniform or identical, whereas heterogeneous is defined as assorted or different, you may have an easier time remembering the concept of heteroskedasticity forever. Lucky you!
The consequences of heteroskedasticity
Heteroskedasticity violates one of the CLRM assumptions. When an assumption of the CLRM is violated, the OLS estimators may no longer be BLUE (best linear unbiased estimators).
Specifically, in the presence of heteroskedasticity, the OLS estimators may not be efficient (achieve the smallest variance). In addition, the estimated standard errors of the coefficients will be biased, which results in unreliable hypothesis tests (t-statistics). The OLS estimates, however, remain unbiased.
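Under the assumption of homoskedasticity, in a model with one independent variable
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
the variance of the estimated slope coefficient is
$$\operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
where $\sigma^2$ is the homoskedastic variance of the error and $\bar{X}$ is the sample mean of the independent variable. However, without the homoskedasticity assumption, the variance of $\hat{\beta}_1$ is
$$\operatorname{Var}(\hat{\beta}_1) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sigma_i^2}{\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right]^2}$$
where $\sigma_i^2$ is the heteroskedastic variance of the error.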
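A small numerical sketch can show how far apart the two variance expressions can be. It compares the homoskedastic expression (error variance divided by the sum of squared deviations of X) with the general expression (each squared deviation weighted by its own error variance, divided by the squared sum of squared deviations). The five X values and the assumption that each error variance equals the square of X are hypothetical, and the function names are made up for this illustration.

```python
import numpy as np

def slope_variance(x, sigma2_i):
    """General variance of the OLS slope when observation i has error variance sigma2_i."""
    dev = x - x.mean()              # X_i minus the sample mean of X
    sst_x = np.sum(dev ** 2)        # sum of squared deviations of X
    return np.sum(dev ** 2 * sigma2_i) / sst_x ** 2

def slope_variance_homoskedastic(x, sigma2):
    """Variance of the OLS slope when every error has the same variance sigma2."""
    dev = x - x.mean()
    return sigma2 / np.sum(dev ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
sigma2_i = x ** 2                          # assumed pattern: variance rises with X

var_general = slope_variance(x, sigma2_i)
# What the homoskedastic formula gives if you plug in the average error variance.
var_homo = slope_variance_homoskedastic(x, sigma2_i.mean())

print(f"general formula:       {var_general:.2f}")   # prints 1.24
print(f"homoskedastic formula: {var_homo:.2f}")      # prints 1.10
```

With these assumed numbers the homoskedastic expression understates the true variance of the slope. When every error variance is identical, the general expression collapses to the homoskedastic one.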
Therefore, if you fail to appropriately account for heteroskedasticity in its presence, you improperly calculate the variances and standard errors of the coefficients. The t-statistic for a coefficient is calculated with
$$t = \frac{\hat{\beta}_k}{se(\hat{\beta}_k)}$$
As a result, any bias in the calculation of the standard errors is passed on to your t-statistics and to your conclusions about statistical significance.
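The following Monte Carlo sketch illustrates these consequences under assumed conditions. It uses a true slope of zero, an error standard deviation proportional to the square of X, a sample size of 200, and 2,000 replications, all of which are arbitrary choices for illustration. It estimates the slope by OLS with the conventional (homoskedastic) standard error and counts how often a t-test at the 5 percent level wrongly rejects the true null hypothesis.

```python
import numpy as np

rng = np.random.default_rng(42)       # fixed seed for reproducibility
n, reps = 200, 2000                   # assumed sample size and replications
beta0, beta1 = 1.0, 0.0               # true slope is zero, so every rejection is a false positive

x = rng.uniform(1.0, 10.0, size=n)    # regressor held fixed across replications
dev = x - x.mean()
sst_x = np.sum(dev ** 2)

slopes = np.empty(reps)
conv_se = np.empty(reps)
for r in range(reps):
    # Heteroskedastic errors: standard deviation rises with x (assumed pattern).
    e = rng.normal(0.0, 0.1 * x ** 2, size=n)
    y = beta0 + beta1 * x + e

    b1 = np.sum(dev * y) / sst_x              # OLS slope
    b0 = y.mean() - b1 * x.mean()             # OLS intercept
    resid = y - b0 - b1 * x
    s2 = np.sum(resid ** 2) / (n - 2)         # usual estimate of a single error variance

    slopes[r] = b1
    conv_se[r] = np.sqrt(s2 / sst_x)          # conventional standard error of the slope

t_stats = (slopes - beta1) / conv_se
reject_rate = np.mean(np.abs(t_stats) > 1.96)

print(f"mean slope estimate:        {slopes.mean():.3f}")
print(f"empirical sd of slope:      {slopes.std():.3f}")
print(f"mean conventional se:       {conv_se.mean():.3f}")
print(f"rejection rate at 5% level: {reject_rate:.3f}")
```

Under this assumed pattern, the average slope estimate should sit close to the true value of zero (the estimator stays unbiased), while the conventional standard error should come out smaller than the actual spread of the estimates, pushing the rejection rate above the nominal 5 percent. Other heteroskedasticity patterns can bias the standard error in the opposite direction.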
Heteroskedasticity is a common problem for OLS regression estimation, especially with cross-sectional and panel data. However, you usually have no way to know in advance if it’s going to be present, and theory is rarely useful in anticipating its presence.