Applied Econometrics: When Can an Omitted Variable Invalidate a Regression?

Omitted variable bias is a fundamental regression concept that frequently arises in antitrust litigation. Every regression has omitted some variable. The relevant question is whether the omission generates bias that significantly compromises the reliability of the regression model.

The essence of regression analysis is to use variation in X (the independent variable) to explain variation in Y (the dependent variable). Intuitively, omitted variable bias occurs when the independent variable (X) included in the model picks up the effect of some other variable that has been omitted from the model, so that effects are attributed to X that should be attributed to the omitted variable. Two conditions must both hold: the omitted variable must have an effect on the dependent variable (Y), and it must be correlated with the explanatory variable (X). When an omitted variable is uncorrelated with X, omitting it generally does not bias the estimated effect of X.
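
The intuition above corresponds to the standard textbook expression for omitted variable bias. As a sketch only, assume a linear model with a single included variable X and a single omitted variable Z, and an error term u that is uncorrelated with both; the symbols below are notation introduced here for illustration, not taken from any particular case.

```latex
\begin{align*}
\text{True model:} \quad & Y = \beta_0 + \beta_1 X + \beta_2 Z + u \\
\text{Estimated model ($Z$ omitted):} \quad & Y = \alpha_0 + \alpha_1 X + e \\
\text{Relationship between $Z$ and $X$:} \quad & \delta = \frac{\operatorname{Cov}(X, Z)}{\operatorname{Var}(X)} \\
\text{What the short regression recovers:} \quad & \operatorname{plim} \hat{\alpha}_1 = \beta_1 + \beta_2 \delta
\end{align*}
```

Under these assumptions the bias term is the product of the omitted variable's effect on Y and its association with X. It vanishes if either factor is zero, and its sign is the sign of that product.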

It is easy to claim, in the abstract, that a regression has failed to account for some unspecified factor. Omitted variable bias is therefore most effective as a methodological critique when one can (1) identify a plausible candidate for the omitted variable; (2) predict the direction of the bias based on its expected correlation with X and Y; and (3) (ideally) demonstrate this effect empirically by controlling for the omitted variable, and showing that the results change substantially.

Consider an example of a horizontal price-fixing conspiracy in which the defendants allegedly entered into an agreement as of a certain date. Suppose that the plaintiffs present a regression indicating that prices increased by 30 percent on average after the start date of the alleged conspiracy, relative to beforehand. An economist for the defense might argue that the plaintiffs’ regression model suffers from omitted variable bias, because the plaintiffs’ economist neglected to control for changes in the defendants’ costs that took place around the time of the alleged conspiracy.

Note that the direction of the bias is important here, and depends on two things: how the omitted variable (cost) affects prices, and whether and how it is correlated with the challenged conduct. Higher costs would ordinarily be expected to raise prices. On that premise, if costs are positively correlated with the conduct, then the direction of the bias is positive, implying that the plaintiffs’ model has overstated the effect of the conspiracy on prices. This would imply that the plaintiffs’ regression was mistakenly attributing an observed price increase to the conspiracy, when in fact some or all of the increase was driven by higher costs. But if costs are negatively correlated with the conduct, then the direction of the bias is negative, implying that the plaintiffs’ model has understated the effect of the conspiracy on prices. That is, but for falling costs, the conspiracy would have driven prices still higher. Finally, if costs are uncorrelated with the conduct, then omitting them from the regression model does not bias the plaintiffs’ estimate.
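
The three cases can be illustrated with a small simulation. The sketch below uses entirely simulated data and hypothetical parameter values (a true conduct effect of 0.10 on log price, a cost effect of 0.50, and an assumed shift in average cost during the conduct period); the function name and all numbers are assumptions made for illustration, not figures from any case.

```python
import numpy as np

def naive_conduct_estimate(cost_shift, n=20_000, true_effect=0.10,
                           cost_effect=0.50, seed=0):
    """Estimate the conduct effect from a regression that omits cost.

    cost_shift is the (hypothetical) change in average cost during the
    conduct period: positive, negative, or zero.
    """
    rng = np.random.default_rng(seed)
    # Conduct indicator: 0 before the alleged start date, 1 afterward.
    conduct = (np.arange(n) >= n // 2).astype(float)
    # Omitted variable: cost, correlated with conduct when cost_shift != 0.
    cost = cost_shift * conduct + rng.normal(0.0, 0.2, n)
    # Simulated outcome: log price depends on both conduct and cost.
    log_price = (1.0 + true_effect * conduct + cost_effect * cost
                 + rng.normal(0.0, 0.1, n))
    # Short regression: log price on a constant and conduct only.
    X = np.column_stack([np.ones(n), conduct])
    coef, *_ = np.linalg.lstsq(X, log_price, rcond=None)
    return coef[1]

cases = [("costs rise", 0.4), ("costs fall", -0.4), ("costs unrelated", 0.0)]
for label, shift in cases:
    estimate = naive_conduct_estimate(shift)
    print(f"{label:>16}: naive estimate = {estimate:.3f} (true effect 0.100)")
```

In expectation, the naive estimate equals true_effect + cost_effect * cost_shift, so under these assumed values it overstates the true effect when costs rise with the conduct, understates it when costs fall, and is approximately unbiased when costs are unrelated to the conduct.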

The ideal solution would be to obtain cost data from the defendants, so that costs can be directly controlled for in the regression model. If, after controlling for costs, the plaintiffs’ regression still detects a positive and significant effect of the conspiracy on prices, the defense can no longer plausibly argue that omitted costs bias the plaintiffs’ estimate of the effect of the conduct (unless some other omitted variable is identified). But if the plaintiffs’ regression no longer shows a significant effect of the conspiracy once costs are included, then the plaintiffs cannot plausibly prove liability or claim damages based on their regression model. If suitable cost data are unavailable, other forms of record evidence (e.g., testimony from input suppliers) could be used to investigate the likely direction of the correlation between the omitted variable and the conduct, and thus, the likely direction of the bias.
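
The test described above, re-estimating the model with the omitted variable included, can be sketched as follows. The data are simulated and every parameter is a hypothetical assumption (costs are assumed to rise during the conduct period); `ols` and `compare_models` are illustrative helpers written for this sketch, with conventional homoskedastic standard errors.

```python
import numpy as np

def ols(y, X):
    """Ordinary least squares: coefficients and conventional standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

def compare_models(true_effect, n=20_000, seed=1):
    """Estimate the conduct effect with and without a cost control."""
    rng = np.random.default_rng(seed)
    conduct = (np.arange(n) >= n // 2).astype(float)
    # Assumption: average cost is higher during the conduct period.
    cost = 0.4 * conduct + rng.normal(0.0, 0.2, n)
    log_price = (1.0 + true_effect * conduct + 0.5 * cost
                 + rng.normal(0.0, 0.1, n))
    const = np.ones(n)
    b_short, se_short = ols(log_price, np.column_stack([const, conduct]))
    b_long, se_long = ols(log_price, np.column_stack([const, conduct, cost]))
    return {
        "naive": b_short[1],
        "naive_t": b_short[1] / se_short[1],
        "controlled": b_long[1],
        "controlled_t": b_long[1] / se_long[1],
    }

# Scenario A: the conduct truly raised prices; Scenario B: it did not.
for name, effect in [("A (true effect 0.10)", 0.10), ("B (true effect 0.00)", 0.0)]:
    r = compare_models(effect)
    print(f"Scenario {name}: naive = {r['naive']:.3f}, "
          f"controlled = {r['controlled']:.3f} (t = {r['controlled_t']:.1f})")
```

In this sketch the naive estimate is inflated in both scenarios. Once cost is included, the estimate in Scenario A falls back toward the true effect and remains clearly different from zero, while the estimate in Scenario B moves toward zero. These correspond to the two outcomes described above.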

Claims of omitted variable bias were raised by the defense in In re High-Tech Employee Antitrust Litigation. In that case, the plaintiffs alleged that top executives at some of Silicon Valley’s most prominent companies, including Apple, Google, Intel, and Adobe, conspired to restrict the recruiting and hiring of high-tech workers as a mechanism for suppressing compensation. To quantify this effect, the plaintiffs’ economist used an econometric model in which the dependent variable was real annual employee compensation, and the independent variable was a measure of the challenged conduct, calculated as the proportion of months within a given year during which a given employer was subject to one or more of the anti-solicitation agreements challenged by the plaintiffs. The results of the regression indicated that the compensation paid to class members was negatively related to the challenged conduct.
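
A conduct measure of the kind described, the proportion of months in a year during which an employer was subject to at least one agreement, could be computed along the following lines. This is a minimal sketch, not the plaintiffs’ actual code: the function name, the month-counting convention, and the agreement dates are all hypothetical and are not drawn from the case record.

```python
from datetime import date

def conduct_share(year, agreements):
    """Share of the 12 months in `year` covered by at least one agreement.

    `agreements` is a list of (start, end) date pairs. A calendar month is
    counted if it lies between the start month and the end month, inclusive
    (an assumed convention).
    """
    covered = 0
    for month in range(1, 13):
        this_month = year * 12 + month
        if any(s.year * 12 + s.month <= this_month <= e.year * 12 + e.month
               for s, e in agreements):
            covered += 1
    return covered / 12

# Hypothetical, overlapping agreement windows for a single employer.
example = [
    (date(2005, 4, 1), date(2006, 3, 31)),
    (date(2006, 1, 1), date(2007, 6, 30)),
]

for yr in range(2004, 2009):
    print(yr, conduct_share(yr, example))
```

Overlapping agreements are counted once per month, so the measure stays between 0 and 1 no matter how many agreements apply to an employer at the same time.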

The defendants’ economists argued in the abstract that the plaintiffs’ regression model might suffer from omitted variable bias. By invoking omitted variable bias, the defense was asserting that the plaintiffs’ measure of the challenged conduct was correlated with some other variable, which the plaintiffs had omitted from their model, and that it was this omitted variable that was actually causing lower compensation to be paid to class members. As the court observed in its class certification order, the defense had failed to specify what the omitted variable might be, or to explain why excluding it from the model would have biased the plaintiffs’ regression in the manner claimed by defendants.

Both points are important. A plausible omitted variable is, first and foremost, something that affects the dependent variable. In this context, defendants’ experts would have had to offer up some factor that would be expected to have a significant effect on class member compensation, yet was not already controlled for in plaintiffs’ regression model. Second, one would have to be able to plausibly claim that the omitted variable was correlated with the challenged conduct, and in the direction needed to produce the bias that defendants claimed. Without a specified omitted variable, the court was unpersuaded that the alleged omission generated a bias that significantly compromised the reliability of the plaintiffs’ regression model.

Kevin W. Caves and Hal J. Singer have worked on antitrust issues in a number of industries, including issues involving regression analyses and omitted variable bias. This article is based on a paper published in The Antitrust Source in December 2017.