SW Chapter 12
Spring 2026
Textbook Coverage
Three threats to internal validity all produce endogeneity: the regressor \(X_i\) is correlated with the error term. That is, \(E(u_i | X_i) \neq 0\).
Solutions to endogeneity considered previously:
Today: Instrumental variables (IV)
\[\log(wage_i) = \beta_0 + \beta_1 schooling_i + \delta V_i + u_i\]
\[\log(wage_i) = \beta_0 + \beta_1 schooling_i + \delta V_i + a_i + u_i\]
Bottom Line
Neither multiple regression nor panel data convincingly addresses the ability bias in estimating returns to schooling. We need a different approach.
To understand how IV solves this, let’s visualize the problem using causal diagrams from Ch 8b.

Gray = unobserved
From Ch 8b, we know this DAG:
The backdoor path is open and Ability is unobserved.

Gold = instrument, Gray = unobserved
Add an instrument \(Z\) to the DAG. A valid instrument must satisfy:
Key Insight
IV works by isolating the variation in Education that comes only from \(Z\). Since \(Z\) has no connection to Ability and no direct effect on Earnings, this variation is “clean” — free of confounding.
Broad public policy interest in reducing cigarette consumption.
\[sales_i = \beta_0 + \beta_1 price_i + u_i\]
The identification problem: When both supply and demand shift, equilibrium data trace out neither curve — just a scatter of equilibrium points.
We need something that shifts supply without shifting demand.

If we can hold demand fixed and only observe supply shifting:

\[sales_i = 164.4 - 0.38 \cdot price_i + u_i\]
Simultaneous causality would disappear if we could randomly assign prices to the different states.
We introduced these conditions visually using DAGs. Now let’s state them formally.
An instrumental variable is an additional variable \(Z_i\) that satisfies three assumptions:
Two vs. Three Assumptions
Some textbooks (including SW) combine (2) and (3) into a single validity or exogeneity condition: \(\text{Corr}(Z, u) = 0\). Others separate them: exogeneity (Z uncorrelated with \(u\)) and exclusion (Z affects Y only through X). The practical question is the same either way: can Z affect Y through any channel other than X?
With a binary instrument \(Z_i \in \{0, 1\}\), there is a simple way to see how IV works:
First stage: How does \(Z\) move \(X\)?
\[\text{First stage} = E[X_i \mid Z_i = 1] - E[X_i \mid Z_i = 0]\]
Reduced form: How does \(Z\) move \(Y\)?
\[\text{Reduced form} = E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]\]
\[\boxed{\hat{\beta}_1^{Wald} = \frac{\text{Reduced form}}{\text{First stage}}}\]
Intuition
The reduced form tells us how the instrument changes the outcome. The first stage tells us how the instrument changes treatment. Their ratio tells us the effect of treatment induced by the instrument.
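The Wald ratio can be checked on simulated data. The sketch below is in Python (standard library only) rather than Stata so that it is self-contained. The data-generating process, the true coefficient of 2.0, and the helper name `wald_estimate` are all assumptions made for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def wald_estimate(y, x, z):
    """Wald estimator for a binary instrument: reduced form / first stage."""
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    x1 = [xi for xi, zi in zip(x, z) if zi == 1]
    x0 = [xi for xi, zi in zip(x, z) if zi == 0]
    reduced_form = mean(y1) - mean(y0)   # how Z moves Y
    first_stage = mean(x1) - mean(x0)    # how Z moves X
    return reduced_form, first_stage, reduced_form / first_stage

# Simulated data: every number here is made up for illustration.
random.seed(0)
n = 20000
true_beta1 = 2.0
z, x, y = [], [], []
for _ in range(n):
    zi = random.randint(0, 1)                      # randomly assigned binary instrument
    ability = random.gauss(0, 1)                   # unobserved confounder
    xi = 1.0 * zi + ability + random.gauss(0, 1)   # Z shifts X, and so does ability
    yi = true_beta1 * xi + 3.0 * ability + random.gauss(0, 1)
    z.append(zi)
    x.append(xi)
    y.append(yi)

rf, fs, beta_wald = wald_estimate(y, x, z)
print(f"reduced form = {rf:.3f}, first stage = {fs:.3f}, Wald = {beta_wald:.3f}")
```

Because the instrument is assigned independently of ability in this simulation, the ratio should land close to the assumed true coefficient even though ability is never observed.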
Start with the population model:
\[Y = \beta_0 + \beta_1 X + u\]
Take the covariance of both sides with \(Z\):
\[\begin{aligned}
\text{Cov}(Y, Z) &= \text{Cov}(\beta_0 + \beta_1 X + u, Z) \\
&= \text{Cov}(\beta_0, Z) + \text{Cov}(\beta_1 X, Z) + \text{Cov}(u, Z) \\
&= 0 + \beta_1 \text{Cov}(X, Z) + 0 \quad \text{by } \text{Cov}(u, Z) = 0
\end{aligned}\]
Solving for \(\beta_1\):
\[\boxed{\beta_1 = \frac{\text{Cov}(Y, Z)}{\text{Cov}(X, Z)}}\]
Key Feature of IV
Notice we never assumed \(\text{Cov}(X_i, u_i) = 0\). IV explicitly allows for endogeneity — that’s the whole point!
\[\beta_1 = \frac{\text{Cov}(Y, Z)}{\text{Cov}(X, Z)}\]
Back to our wages/schooling example. Instrument: distance from childhood home to nearest college.
\[\beta_1 = \frac{\text{Cov}(\log \text{wage}, \text{distance})}{\text{Cov}(\text{schooling}, \text{distance})}\]
Key Feature
\(\text{Cov}(X, Y)\) does not appear in the formula — we never directly compare someone’s wage to their schooling. Instead, we use only the variation in schooling driven by distance.
For a dataset with \(n\) observations, replace population covariances with sample covariances:
\[\hat{\beta}_1^{2SLS} = \frac{\widehat{\text{Cov}}(Y, Z)}{\widehat{\text{Cov}}(X, Z)} = \frac{\frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})(Z_i - \bar{Z})}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})(Z_i - \bar{Z})}\]
\[\hat{\beta}_0^{2SLS} = \bar{Y} - \hat{\beta}_1^{2SLS}\bar{X}\]
This estimator is called two-stage least squares (2SLS); the reason for the name will become apparent shortly.
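The sample formulas for \(\hat{\beta}_1^{2SLS}\) and \(\hat{\beta}_0^{2SLS}\) can be sketched in a few lines of Python and compared with the OLS slope. The simulated "schooling" data, the hypothetical instrument, and the assumed true return of 0.10 are illustrative only; the function names are not from any library.

```python
import random

def sample_cov(a, b):
    """Sample covariance with the 1/n convention used in the formula above."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / n

def iv_estimate(y, x, z):
    """Return (beta0_hat, beta1_hat) from the covariance-ratio IV formula."""
    beta1 = sample_cov(y, z) / sample_cov(x, z)
    beta0 = sum(y) / len(y) - beta1 * sum(x) / len(x)
    return beta0, beta1

def ols_slope(y, x):
    return sample_cov(y, x) / sample_cov(x, x)

# Simulated data: every number here is made up for illustration.
random.seed(1)
n = 20000
true_beta0, true_beta1 = 1.0, 0.10
x, y, z = [], [], []
for _ in range(n):
    ability = random.gauss(0, 1)   # unobserved; raises both schooling and wages
    zi = random.gauss(0, 1)        # hypothetical instrument, independent of ability
    xi = 12 - 0.8 * zi + ability + random.gauss(0, 1)
    yi = true_beta0 + true_beta1 * xi + 0.3 * ability + random.gauss(0, 0.5)
    x.append(xi)
    y.append(yi)
    z.append(zi)

b0_iv, b1_iv = iv_estimate(y, x, z)
print(f"OLS slope: {ols_slope(y, x):.3f}")
print(f"IV slope:  {b1_iv:.3f}  (assumed true value {true_beta1})")
```

In this setup ability pushes the OLS slope above the assumed true value, while the covariance ratio stays close to it, because the simulated instrument is unrelated to ability by construction.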
Stage 1: Regress \(X\) on \(Z\) (the “first stage”)
\[X_i = \pi_0 + \pi_1 Z_i + v_i\]
Form the predicted values:
\[\hat{X}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i\]
Note: \(X_i = \color{green}{\hat{X}_i} + \color{red}{\hat{v}_i}\)
Stage 2: Regress \(Y\) on \(\hat{X}\) (the “second stage”)
\[Y_i = \beta_0 + \beta_1 \hat{X}_i + u_i\]
Key Insight
The instrument isolates the variation in \(X\) that is exogenous. 2SLS uses only this “clean” variation to estimate \(\beta_1\).
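To see why the two stages reproduce the covariance-ratio formula, write the second-stage OLS slope in terms of the first stage. This is a sketch for the single-instrument case; it uses the first-stage OLS slope \(\hat{\pi}_1 = \widehat{\text{Cov}}(X, Z)/\widehat{\text{Var}}(Z)\) and the fact that \(\hat{X}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i\) is a linear function of \(Z_i\):

\[\begin{aligned}
\hat{\beta}_1^{2SLS} &= \frac{\widehat{\text{Cov}}(Y, \hat{X})}{\widehat{\text{Var}}(\hat{X})} = \frac{\hat{\pi}_1 \widehat{\text{Cov}}(Y, Z)}{\hat{\pi}_1^2 \widehat{\text{Var}}(Z)} \\
&= \frac{\widehat{\text{Cov}}(Y, Z)}{\hat{\pi}_1 \widehat{\text{Var}}(Z)} = \frac{\widehat{\text{Cov}}(Y, Z)}{\widehat{\text{Cov}}(X, Z)}
\end{aligned}\]

The last step substitutes \(\hat{\pi}_1 \widehat{\text{Var}}(Z) = \widehat{\text{Cov}}(X, Z)\), so the second-stage slope is exactly the ratio of sample covariances.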
IV uses only the variation in \(X\) driven by the instrument — that’s the whole point.
But this also means we can only observe the effect among people for whom the instrument actually affects their treatment.
Local Average Treatment Effect (LATE)
The IV estimate is local to the people whose treatment status is changed by the instrument, weighted by how much the instrument affects them. It is not the average effect for all units, or even for all treated units.
Implication: The IV estimate may not represent the average effect for everyone — or even for those actually treated.
Practical Implication
When interpreting IV results, ask: For whom does the instrument induce treatment? The IV estimate applies to that group — not necessarily to all units in the sample.
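A small simulation can make the "local" part concrete. The sketch below assumes a binary treatment, a randomly assigned binary instrument, and three equally likely types (compliers, always-takers, never-takers) with no one who does the opposite of what the instrument suggests. The effect sizes are made up; the type labels are standard terminology but do not appear elsewhere in these notes.

```python
import random

def mean(values):
    return sum(values) / len(values)

# Hypothetical treatment effects by compliance type (made-up numbers).
EFFECTS = {"complier": 1.0, "always_taker": 3.0, "never_taker": 5.0}

def draw_unit(rng):
    """Return (z, d, y) for one simulated unit."""
    kind = rng.choice(sorted(EFFECTS))
    z = rng.randint(0, 1)        # randomly assigned binary instrument
    if kind == "always_taker":
        d = 1                    # treated regardless of z
    elif kind == "never_taker":
        d = 0                    # untreated regardless of z
    else:
        d = z                    # compliers follow the instrument
    y = EFFECTS[kind] * d + rng.gauss(0, 1)
    return z, d, y

rng = random.Random(2)
units = [draw_unit(rng) for _ in range(60000)]

y1 = [y for z, d, y in units if z == 1]
y0 = [y for z, d, y in units if z == 0]
d1 = [d for z, d, y in units if z == 1]
d0 = [d for z, d, y in units if z == 0]

wald = (mean(y1) - mean(y0)) / (mean(d1) - mean(d0))
ate = mean(list(EFFECTS.values()))   # the three types are equally likely

print(f"Wald / IV estimate: {wald:.2f}")
print(f"Complier effect:    {EFFECTS['complier']:.2f}")
print(f"Population ATE:     {ate:.2f}")
```

Under these assumptions the IV estimate should sit near the complier effect rather than the population average, since only compliers change treatment status when the instrument changes.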
\[sales_i = \beta_0 + \beta_1 price_i + u_i\]
Stata’s ivregress command handles both stages automatically:
ivregress 2sls Y (X=Z), robust
2sls = two-stage least squares estimator; (X=Z) = endogenous regressor X instrumented by Z; robust = heteroskedasticity-robust standard errors.
ivregress 2sls packpc (avgprs=tax), robust
Instrumental variables (2SLS) regression          Number of obs   =         96
                                                  Wald chi2(1)    =      88.46
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.4219
                                                  Root MSE        =     19.567
------------------------------------------------------------------------------
             |               Robust
      packpc | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      avgprs |  -.4208748   .0447474    -9.41   0.000    -.5085781   -.3331715
       _cons |    169.556   7.516482    22.56   0.000      154.824    184.2881
------------------------------------------------------------------------------
Instrumented: avgprs
Instruments: tax
2SLS estimates that a 1-unit increase in price leads to a decrease of 0.42 packs per capita.
ivregress 2sls packpc (avgprs=tax), robust first
The first option displays the first-stage regression alongside the second stage.

Rule of Thumb: F > 10
A common rule of thumb is that the first-stage F-statistic should be greater than 10. If so, instruments are likely powerful (relevant). This threshold is a guide, not a bright line.
What if the first-stage F-statistic is less than 10?
\[\hat{\beta}_1^{2SLS} = \frac{\widehat{\text{Cov}}(Y, Z)}{\widehat{\text{Cov}}(X, Z)}\]
Consequences of Weak Instruments
If the first-stage F-statistic is less than 10, the 2SLS estimates may be biased and confidence intervals may have incorrect coverage. Do not trust the results.
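With a single instrument, the first-stage F-statistic is the squared t-statistic on \(\hat{\pi}_1\). The Python sketch below computes the homoskedasticity-only version by hand on simulated data. That is a simplification: the Stata commands in these notes use robust standard errors. The function names and the two first-stage slopes (0.5 and 0.02) are assumptions chosen for illustration.

```python
import random

def first_stage_f(x, z):
    """Homoskedasticity-only F for H0: pi_1 = 0 in X = pi_0 + pi_1 Z + v."""
    n = len(x)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    pi1 = sxz / szz
    pi0 = x_bar - pi1 * z_bar
    ssr = sum((xi - pi0 - pi1 * zi) ** 2 for xi, zi in zip(x, z))
    var_pi1 = (ssr / (n - 2)) / szz    # estimated variance of pi1_hat
    return pi1 ** 2 / var_pi1          # with one instrument, F equals t squared

def simulate(pi1, n=500, seed=0):
    """Draw (x, z) from a first stage with slope pi1 and unit-variance noise."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi1 * zi + rng.gauss(0, 1) for zi in z]
    return x, z

f_strong = first_stage_f(*simulate(pi1=0.5))
f_weak = first_stage_f(*simulate(pi1=0.02))
print(f"F with a strong first stage: {f_strong:.1f}")
print(f"F with a weak first stage:   {f_weak:.1f}")
```

A design like the first should clear the rule-of-thumb threshold comfortably, while a design like the second will typically fall well short of it.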
| Assumption | Testable? | How? |
|---|---|---|
| Relevance (\(\text{Corr}(Z, X) \neq 0\)) | Yes | First-stage F-test (rule of thumb: > 10) |
| Exogeneity (\(\text{Corr}(Z, u) = 0\)) | No | Must argue based on institutional knowledge |
| Exclusion (\(Z\) doesn’t directly cause \(Y\)) | No | Must argue based on theory |
Exogeneity and Exclusion Cannot Be Tested
You lack data on the omitted variable, so you cannot test whether \(Z\) is correlated with it. Both exogeneity and exclusion must be defended with reasoning — relevance is the only one with a formal test.
\[sales_i = \beta_0 + \beta_1 price_i + \beta_2 income_i + u_i\]
ivregress 2sls packpc (avgprs=tax) incomepop, robust first

Here incomepop is an included exogenous control, with tax as the excluded instrument.
Terminology
\[sales_{it} = \alpha_i + \lambda_t + \beta_1 price_{it} + \beta_2 income_{it} + \delta V_i + \omega_{it}\]
Instrument for price is sales tax (taxs) — the panel-data sales-tax variable; distinct from the per-pack excise tax used earlier.

Best Elasticity Estimate
Column 7 adds logs so \(\beta_1\) is an elasticity: when price rises 1%, sales fall 1.3%. The CI of \((-1.53, -1.00)\) lies entirely below \(-1\), so we can reject that demand is inelastic.
What Changes Across Specifications?
Adding state + year FEs and an IV for price (sales tax) moves the coefficient from OLS’s biased estimate toward a credible causal effect.
We’ve been using wages/schooling throughout. We used distance to college as one instrument. Here’s another famous approach to the same question:
Check the Three Assumptions
This instrument has been extensively debated — it illustrates how the assumptions can be challenged even when the research design seems clever.
Does quarter of birth actually predict education?

Born in Q1 → significantly less education. F-tests are strong for total years.

Mean log weekly earnings by quarter of birth (1980 Census). The sawtooth pattern mirrors the education pattern — Q1 births earn less.

OLS and TSLS estimates of returns to education, men born 1920–1929 (1970 Census, \(n = 247{,}199\)). Instruments: quarter-of-birth × year-of-birth interactions.
Puzzle: IV > OLS?
If ability bias is the main problem, we’d expect OLS to overestimate the return to schooling — so IV should be smaller than OLS. But here IV is larger. Two possible reasons:
Also note: IV standard errors are much larger than OLS — we’re using less variation, so estimates are noisier.
Core insight: When the first stage is weak, even a tiny direct effect of \(Z\) on \(Y\) biases IV more than OLS.
\[\frac{\text{plim}\,\hat\beta_{IV} - \beta}{\text{plim}\,\hat\beta_{OLS} - \beta} \;=\; \frac{\rho_{Z,\varepsilon}/\rho_{X,\varepsilon}}{\rho_{X,Z}}\]
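A worked example with made-up correlations shows how quickly the ratio grows. Suppose, purely for illustration, that \(\rho_{Z,\varepsilon} = 0.01\), \(\rho_{X,\varepsilon} = 0.20\), and \(\rho_{X,Z} = 0.02\):

\[\frac{\rho_{Z,\varepsilon}/\rho_{X,\varepsilon}}{\rho_{X,Z}} = \frac{0.01/0.20}{0.02} = \frac{0.05}{0.02} = 2.5\]

In this hypothetical case the instrument is twenty times less correlated with the error than \(X\) is, yet the inconsistency of IV is 2.5 times that of OLS, because the weak first stage in the denominator magnifies the small violation.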
Does QOB have a direct effect on wages? BJB’s quantitative argument:
BJB’s conclusion
Differences in family income at birth “would seem to account for virtually all of the association between quarter of birth and wages.”
Plus other QOB correlates (school attendance, behavioral problems, reading/writing/math, IQ) — any could open a direct \(Z \to Y\) channel.
Even a legitimate IV is biased toward OLS in finite samples. Relative to the OLS bias, the magnitude is approximately \(1/F\): inversely proportional to the first-stage F-statistic on the excluded instruments.
For AK’s QOB \(\times\) year-of-birth specification with within-year age controls (28 excluded instruments): \(F = 1.6\), partial \(R^2 = 0.014\%\). The bias is quantitatively important despite \(n = 329{,}509\).
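Plugging the reported first-stage statistic into the \(1/F\) approximation gives a rough sense of scale. This is back-of-the-envelope arithmetic under that approximation, not a figure reported by BJB:

\[\frac{\text{bias of 2SLS}}{\text{bias of OLS}} \approx \frac{1}{F} = \frac{1}{1.6} \approx 0.63\]

So with \(F = 1.6\), the 2SLS estimate would be expected to retain roughly 60% of the OLS bias.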
BJB’s smoking gun: replaced AK’s real QOB with randomly-generated “QOB” and re-ran the 2SLS.
| | Real QOB | Random noise |
|---|---|---|
| Mean 2SLS coef. on educ | 0.060 | 0.061 |
| Mean SE | 0.029 | 0.039 |
| First-stage \(F\) | 1.61 | \(\approx 1\) |
| OLS coef (for reference) | 0.063 | 0.063 |
The lesson
Even with purely random “instruments,” the 2SLS output looks reasonable — point estimate near OLS, plausible SEs. You cannot detect the problem from second-stage coefficients and standard errors alone. Only the first-stage \(F\) reveals it. Always report it.
AK’s preferred specification used up to 180 excluded instruments: QOB (3) \(\times\) year of birth (10) interactions plus QOB (3) \(\times\) state of birth (50) interactions, i.e., \(30 + 150 = 180\).
The logic (reasonable): Compulsory-attendance laws vary across states (different age cutoffs, different enforcement). So QOB \(\times\) state interactions capture real institutional variation that plain QOB misses — and should also tighten the standard errors.
Why it backfires (BJB’s critique):
Moral
You can’t instrument your way out of a weak first stage by piling on interactions. Quality of instrument variation matters more than quantity of instruments.
Recipe Card: Instrumental Variables
1. Identify the endogeneity problem
2. Propose an instrument \(Z\) and argue it is valid
3. Estimate via 2SLS
ivregress 2sls Y (X=Z) controls, robust
4. Report and check
Mistakes to avoid
Good IV practice
Use ivregress or xtivreg, never a manual two-step: the standard errors from a hand-run second stage are incorrect.
What is the effect on grades of studying an additional hour per day?
Would you expect the OLS estimator of \(\beta_1\) to be unbiased? Why or why not?
Stinebrickner and Stinebrickner (2008), “The Causal Effect of Studying on Academic Performance,” The B.E. Journal of Economic Analysis & Policy
Do you think \(Z_i\) (whether a roommate brought a video game) is a valid instrument?

OLS coefficient on study = 0.038; IV coefficient = 0.36 (much larger). Why? Selection bias — students who study more tend to be weaker students compensating with extra effort (negative selection). IV corrects this by using only the variation in study time driven by the random video-game assignment.
ECON3500 | Instrumental Variables