All other things being equal: why is this so?


When you are working with data, you will often hear the expression "all other things being equal". It means that you are looking at the effect of one variable on another while keeping all other variables constant. For example, if you want to know the effect of age on the probability of having cancer, you will want to keep all other variables constant: otherwise, you cannot tell whether the effect you are seeing is due to age or to another variable.

For example, if you don't keep all other variables constant, you might see that the probability of having cancer increases with age, when in fact older people are simply more likely to smoke, and smoking is a risk factor for cancer. In that case, part of the apparent effect of age is really the effect of smoking.
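To make this concrete, here is a minimal sketch (assuming Python with numpy; the data, coefficients, and smoking-probability rule are entirely synthetic) where omitting the smoking variable inflates the estimated effect of age:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic data: smoking is more common among older people,
# and both age and smoking increase cancer risk.
age = rng.uniform(20, 80, n)
smoker = rng.random(n) < (age / 100)          # P(smoker) grows with age
risk = 0.01 * age + 0.30 * smoker + rng.normal(0, 0.1, n)

# OLS of risk on age alone (smoking omitted)
X_naive = np.column_stack([np.ones(n), age])
beta_naive, *_ = np.linalg.lstsq(X_naive, risk, rcond=None)

# OLS of risk on age and smoker (smoking held constant)
X_full = np.column_stack([np.ones(n), age, smoker])
beta_full, *_ = np.linalg.lstsq(X_full, risk, rcond=None)

print(beta_naive[1])  # overstated: absorbs part of the smoking effect
print(beta_full[1])   # close to the true 0.01
```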





What is the "all other things being equal" expression?


The "all other things being equal" expression is used to say that you are looking at the effect of one variable on another, while keeping all other variables constant. For example, if you want to know the effect of age on the probability of having cancer, you will want to keep all other variables constant. This is because if you don't, you will not be able to know if the effect you are seeing is due to age or to another variable.

Even if this concept might feel pretty intuitive, there is a formal theorem, called the Frisch-Waugh-Lovell theorem, that explains in more depth how it works. In this post, we will explain this expression and this theorem in detail, and walk through the proof.




Impact on interpretation of the coefficients


Once you have run a regression, you have a model that looks like this: $$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_3$$ where \(\hat y\) is the predicted value, \(x_i\) is the value of the \(i\)-th variable, and \(\hat\beta_i\) is the estimated coefficient of the \(i\)-th variable. The interpretation of the coefficients is the following: if you increase \(x_i\) by 1, then \(\hat y\) increases by \(\hat\beta_i\), all other things being equal. This interpretation only makes sense if all other variables are held constant.
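As a quick numerical check (a minimal sketch, assuming Python with numpy and arbitrary synthetic coefficients), increasing one regressor by 1 while freezing the others moves the prediction by exactly the corresponding \(\hat\beta_i\):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000

x = rng.normal(size=(n, 3))                      # x1, x2, x3
y = 2.0 + 1.5 * x[:, 0] - 0.5 * x[:, 1] + 3.0 * x[:, 2] + rng.normal(0, 0.1, n)

X = np.column_stack([np.ones(n), x])             # add intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(x1, x2, x3):
    return beta_hat @ np.array([1.0, x1, x2, x3])

# Increase x1 by 1 while holding x2 and x3 constant:
# the prediction moves by exactly beta_hat[1] (~1.5 here).
print(predict(1.0, 0.0, 0.0) - predict(0.0, 0.0, 0.0))
print(beta_hat[1])
```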




Frisch-Waugh-Lovell theorem


Main concepts

The Frisch-Waugh-Lovell (FWL) theorem allows us to reduce a multivariate regression analysis to a univariate one. The key fact behind it is that there are multiple ways to estimate the \(\beta_1\) coefficient in the following regression model: $$y = \beta_1x_1 + \beta_2x_2 + \varepsilon$$ The main idea is to use the residuals of the regression of \(x_1\) on \(x_2\) as the regressor of \(y\).

The theorem says that all of the following yield the same estimate of \(\beta_1\):

- regressing \(y\) on both \(x_1\) and \(x_2\);
- regressing \(y\) on the residuals of the regression of \(x_1\) on \(x_2\);
- regressing the residuals of the regression of \(y\) on \(x_2\) on the residuals of the regression of \(x_1\) on \(x_2\) (the version proved below).
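Before the proof, here is a minimal numerical sketch of that equivalence (assuming Python with numpy and synthetic data): the coefficient on \(x_1\) from the full regression matches the coefficient from the residual-on-residual regression.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)               # x1 correlated with x2
y = 1.5 * x1 - 2.0 * x2 + rng.normal(size=n)

def ols(X, y):
    """Return (coefficients, residuals) of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

# 1) Full regression: y on [x1, x2]
beta_full, _ = ols(np.column_stack([x1, x2]), y)

# 2) Partialling out: residualize y and x1 on x2, then regress
_, y_res = ols(x2[:, None], y)
_, x1_res = ols(x2[:, None], x1)
beta_fwl, _ = ols(x1_res[:, None], y_res)

print(beta_full[0], beta_fwl[0])  # identical up to floating-point error
```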

Proof

Before going any further, you might want to check an excellent article called Why Linear Regression is a Projection. It will give you a taste of why we can talk about linear regression in terms of projections and some linear algebra tools, which will be very useful for the following demonstration.

We assume the following regression: $$y = X\hat\beta_1 + Z\hat\beta_2 + r$$ Where:

- \(X\) and \(Z\) are two matrices of regressors;
- \(\hat\beta_1\) and \(\hat\beta_2\) are the estimated coefficients;
- \(r\) is the vector of residuals.

Once this simple regression is posed, let's recall some useful properties:

- the OLS residuals are orthogonal to the regressors: \(X^Tr = 0\);
- a projection matrix \(P = X(X^TX)^{-1}X^T\) is symmetric (\(P^T = P\)) and idempotent (\(P^2 = P\)), and so is \(I - P\);
- \(P\) leaves the columns of \(X\) unchanged (\(PX = X\)), so \((I - P)X = 0\).

In order to go from \(\hat{\beta} = (X^TX)^{-1}X^Ty\) (the OLS estimator) to the projection \(\hat{y}\), we only have to multiply it by \(X\), which gives us: $$X\hat\beta = X(X^TX)^{-1}X^Ty = P_Xy =\hat{y}$$ $$P_X = X(X^TX)^{-1}X^T$$ Now, let's say we have the two following estimations: $$y = X\hat{\beta_1} + Z\hat{\beta_2} + r_1$$ $$M_Z y = M_Z X\hat{\beta_3} + r_2$$ $$M_Z = I - P_Z = I - Z(Z^TZ)^{-1}Z^T \quad \text{(the residual maker matrix)}$$
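These properties are easy to verify numerically (a minimal sketch, assuming Python with numpy and an arbitrary random regressor matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
Z = rng.normal(size=(50, 2))                # arbitrary regressor matrix

P_Z = Z @ np.linalg.inv(Z.T @ Z) @ Z.T      # projection onto col(Z)
M_Z = np.eye(50) - P_Z                      # residual maker

print(np.allclose(M_Z, M_Z.T))       # True: symmetric
print(np.allclose(M_Z @ M_Z, M_Z))   # True: idempotent
print(np.allclose(M_Z @ Z, 0))       # True: annihilates Z
```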

What we want to prove with this theorem is: $$\hat{\beta_1} = \hat{\beta_3}$$

We re-write our first estimation by multiplying it by \(M_Z\): $$M_Z y = M_Z X\hat{\beta_1} + M_Z Z\hat{\beta_2} + M_Z r_1$$ With:

- \(M_ZZ\hat{\beta_2} = 0\), since \(M_ZZ = 0\);
- \(M_Zr_1 = r_1\), since \(r_1\) is orthogonal to \(Z\) and therefore \(P_Zr_1 = 0\).

So we now have: $$M_Zy = M_ZX\hat{\beta_1} + r_1$$

We now multiply it by \(X^T\): $$X^TM_Zy = X^TM_ZX\hat{\beta_1} + X^Tr_1$$ With:

- \(X^Tr_1 = 0\), since the residuals of the first regression are orthogonal to its regressors.

So we now have: $$X^TM_Zy = X^TM_ZX\hat{\beta_1}$$

We multiply the second estimation by \((M_ZX)^T\): $$(M_ZX)^TM_Zy = X^TM_Z^TM_Zy = X^TM_Z^T(M_Z X\hat{\beta_3} + r_2)$$ $$= X^TM_Z^TM_ZX\hat{\beta_3} + X^TM_Z^Tr_2$$ With:

- \(M_Z^TM_Z = M_Z\), since \(M_Z\) is symmetric and idempotent;
- \(X^TM_Z^Tr_2 = (M_ZX)^Tr_2 = 0\), since \(r_2\) is orthogonal to \(M_ZX\), the regressor of the second regression.

So we have: $$X^T M_Zy = X^T M_Z X \hat{\beta_3}$$

Conclusion of this proof

We proved that \(X^TM_Zy = X^TM_ZX\hat{\beta_1}\).

We also proved that \(X^TM_Zy = X^TM_ZX\hat{\beta_3}\).

Since both expressions equal \(X^TM_Zy\), we have \(X^TM_ZX\hat{\beta_1} = X^TM_ZX\hat{\beta_3}\), and since \(X^TM_ZX\) is invertible, we can conclude that \(\hat{\beta_1} = \hat{\beta_3}\).

As you have seen, the demonstration jumps around quite a bit and is not easy to follow if you don't do the calculations yourself. It's probably not necessary to master this proof, but manipulating the equations a little is a good way to improve your intuition. We hope this has helped you if you are trying to understand this theorem better.




Going further


In this post, we saw that the "all other things being equal" expression means looking at the effect of one variable on another while keeping all other variables constant, and that this is exactly what the interpretation of regression coefficients relies on. Finally, we saw that the Frisch-Waugh-Lovell theorem lets you recover the same coefficient as a regression with all the variables by running a regression on a single, residualized variable.

If you want to go further, check the Why Linear Regression is a Projection article mentioned above.

This post is the work of Joseph Barbier and Thomas Salanova. If you have any questions, feel free to contact us!