SW Chapter 10 (Part 2)
Spring 2026
Last class (with Rae), we covered:
Today: a specific research design built on these tools — difference-in-differences.
Question: What is the effect of a garbage incinerator on nearby housing prices?
After the incinerator was built:

\[\widehat{rprice} = 101{,}308 - 30{,}688 \cdot nearinc\]
Houses near the incinerator sell for ~$30k less. Did the incinerator cause this?
Look at the relationship before the incinerator was built:

\[\widehat{rprice} = 82{,}517 - 18{,}824 \cdot nearinc\]
The incinerator was built in a place where housing prices were already depressed!
The $30k gap reflects both the incinerator effect and pre-existing differences.
Cross-Sectional Comparison (After Only)
Compare near vs. far houses after the incinerator: \(-\$30{,}688\)
Problem: Near-incinerator houses were already cheaper. Location characteristics are confounders — they affect both proximity to the incinerator and housing prices.
Before/After Comparison (Treatment Group Only)
Compare near-incinerator houses before vs. after: \(+\$6{,}927\)
Problem: Housing prices were rising everywhere. We’re mixing the treatment effect with a common time trend.
We need a method that handles both problems simultaneously.

Subtract out the pre-existing difference:
\[\begin{aligned} \hat{\delta}_1 &= \underbrace{(-30{,}688)}_{\text{after gap}} - \underbrace{(-18{,}824)}_{\text{before gap}} \\ &= -11{,}864 \end{aligned}\]
The incinerator reduced nearby prices by ~$12k — not $30k.

The difference of differences:
| | Before | After | \(\Delta\) |
|---|---|---|---|
| Control (far) | $82,517 | $101,308 | +$18,790 |
| Treatment (near) | $63,693 | $70,619 | +$6,927 |
| Treat \(-\) Control | −$18,824 | −$30,688 | −$11,864 |
\[\begin{aligned} &\underbrace{(70{,}619 - 63{,}693)}_{\Delta \text{ treat}} - \underbrace{(101{,}308 - 82{,}517)}_{\Delta \text{ control}} \\ &\quad = {\color{red}-11{,}864} \end{aligned}\]
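The two routes to the estimate (difference of changes, or difference of gaps) can be checked with a minimal Python sketch. The four means are the rounded values from the table above; the dictionary layout and function names are illustrative choices, not part of the course materials.

```python
# Mean real house prices (dollars), rounded values from the table above.
means = {
    ("far", "before"): 82_517,
    ("far", "after"): 101_308,
    ("near", "before"): 63_693,
    ("near", "after"): 70_619,
}

def did_by_changes(m):
    """(change for treatment group) minus (change for control group)."""
    change_treat = m[("near", "after")] - m[("near", "before")]
    change_control = m[("far", "after")] - m[("far", "before")]
    return change_treat - change_control

def did_by_gaps(m):
    """(near-far gap after) minus (near-far gap before)."""
    gap_after = m[("near", "after")] - m[("far", "after")]
    gap_before = m[("near", "before")] - m[("far", "before")]
    return gap_after - gap_before

did_changes = did_by_changes(means)
did_gaps = did_by_gaps(means)
print(did_changes, did_gaps)
```

Both functions always return the same number because they rearrange the same four terms. With the rounded means the result is -11,865, one dollar off the -11,864 shown above, presumably because the slide values come from unrounded data.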

The dashed line shows where near-incinerator prices would have been if they followed the same trend as far-away prices.
The DiD estimate is the vertical gap between the observed near-incinerator price after construction and the dashed counterfactual line.

The key: use the control group’s trajectory to build the counterfactual for the treatment group.
What DiD Really Estimates
DiD doesn’t compare treatment and control groups directly. It asks:
How much did the treatment group change relative to what would have happened without treatment?
The control group’s trajectory provides the counterfactual — but only if parallel trends holds.
This is why DiD is so useful for policy evaluation:
We can capture the entire DiD logic in one regression:
\[rprice_i = \beta_0 + \delta_0 \cdot after_i + \beta_1 \cdot nearinc_i + \delta_1 \cdot (after_i \times nearinc_i) + u_i\]
| | Before (\(after = 0\)) | After (\(after = 1\)) | Change |
|---|---|---|---|
| Control (far) | \(\beta_0\) | \(\beta_0 + \delta_0\) | \(\delta_0\) |
| Treatment (near) | \(\beta_0 + \beta_1\) | \(\beta_0 + \delta_0 + \beta_1 + \delta_1\) | \(\delta_0 + \delta_1\) |
| Treat \(-\) Control | \(\beta_1\) | \(\beta_1 + \delta_1\) | \(\delta_1\) |
\(\delta_1\) — the coefficient on the interaction term — is the DiD estimate.
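To see that the interaction coefficient really reproduces the difference of differences, here is a minimal Python sketch. The eight house prices are hypothetical (in $1,000s), and the small `ols` helper is written only for this illustration; in practice a statistics package would do the fitting.

```python
from fractions import Fraction

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y], kept as exact fractions.
    A = [
        [Fraction(sum(row[i] * row[j] for row in X)) for j in range(k)]
        + [Fraction(sum(row[i] * yi for row, yi in zip(X, y)))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [float(A[i][k]) for i in range(k)]

# Hypothetical data: (after, nearinc, price in $1,000s).
houses = [
    (0, 0, 80), (0, 0, 84),
    (0, 1, 62), (0, 1, 66),
    (1, 0, 100), (1, 0, 102),
    (1, 1, 69), (1, 1, 73),
]
# Regressors: constant, after, nearinc, after x nearinc.
X = [[1, a, n, a * n] for a, n, _ in houses]
y = [p for _, _, p in houses]
b0, d0, b1, d1 = ols(X, y)

def cell_mean(a, n):
    prices = [p for aa, nn, p in houses if (aa, nn) == (a, n)]
    return sum(prices) / len(prices)

did_from_means = (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))
print(d1, did_from_means)
```

Because the regression has one parameter per group-period cell, its fitted values are the four cell means, so `d1` and `did_from_means` coincide.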
\[y_i = \underbrace{\beta_0}_{\text{baseline}} + \underbrace{\delta_0 \cdot after_i}_{\text{time trend}} + \underbrace{\beta_1 \cdot treated_i}_{\text{group difference}} + \underbrace{\delta_1 \cdot (after_i \times treated_i)}_{\text{treatment effect}} + u_i\]
| Coefficient | Meaning |
|---|---|
| \(\beta_0\) | Average outcome for control group, before |
| \(\delta_0\) | How the control group changed over time (common time trend) |
| \(\beta_1\) | Pre-existing difference between groups |
| \(\delta_1\) | The DiD estimate — causal if parallel trends holds |
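As a worked check using the incinerator example, reading the coefficients off the earlier table of group means gives the following fitted equation (values rounded to the dollar, so sums can be off by one):

\[\widehat{rprice} = 82{,}517 + 18{,}790 \cdot after - 18{,}824 \cdot nearinc - 11{,}864 \cdot (after \times nearinc)\]

For a house near the incinerator after it was built, \(82{,}517 + 18{,}790 - 18{,}824 - 11{,}864 = 70{,}619\), which matches the treatment group's after-period mean.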
Why add controls to a DiD regression?
1. Precision: Controls reduce residual variance → tighter standard errors, even if parallel trends holds unconditionally.
2. Credibility: Parallel trends may only hold conditional on covariates. If treatment and control groups differ on observables that predict trends, controlling for those variables makes parallel trends more plausible.
Adding controls is straightforward. Include \(\mathbf{X}_{it}\) in the regression:
\[y_{it} = \beta_0 + \delta_0 \cdot after_t + \beta_1 \cdot treated_i + \delta_1 \cdot (after_t \times treated_i) + \gamma' \mathbf{X}_{it} + u_{it}\]
Controls Can Matter for Identification
Unlike in an RCT, parallel trends may only hold after conditioning on observables. Omitting those variables causes omitted variable bias (OVB).
Example: If newer homes were built farther from the incinerator site during this period, controlling for house characteristics isn’t just about precision — it’s about identification.
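A stylized Python sketch of why conditioning can matter for identification. Every number here is hypothetical (in $1,000s): new homes appreciate faster than old homes, and the far group contains more new homes, so the two groups would not follow parallel trends unconditionally even without treatment.

```python
# All values are invented for illustration only.
TRUE_EFFECT = -5.0                        # assumed effect on near homes
growth = {"new": 30.0, "old": 10.0}       # price growth absent treatment, by house type
share_new = {"far": 0.8, "near": 0.2}     # share of new homes in each group

def group_change(group, treated):
    """Average before-to-after price change for a whole group."""
    s = share_new[group]
    change = s * growth["new"] + (1 - s) * growth["old"]
    return change + (TRUE_EFFECT if treated else 0.0)

# Pooled DiD ignores house type and mixes the effect with composition.
pooled_did = group_change("near", True) - group_change("far", False)

# Within-type DiD compares near and far homes of the same type.
within_did = {t: (growth[t] + TRUE_EFFECT) - growth[t] for t in growth}

print(pooled_did, within_did)
```

In this setup the pooled comparison gives about -17, while the comparison within each house type recovers the assumed effect of -5. Holding house type fixed is the simplest version of what adding a control does in the regression.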
Parallel Trends Assumption (common trends assumption in SW)
In the absence of treatment, the difference between treatment and control groups would have remained constant over time.
Equivalently: both groups would have followed parallel trajectories.
Key point: Parallel trends is a statement about what would have happened without treatment — it is about the untreated counterfactual, not about what we observe after treatment.
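One way to write this down uses potential-outcomes notation, which is an assumption here since these slides do not introduce it. Let \(Y_t(0)\) be the outcome a unit would have at time \(t\) without treatment. Parallel trends then says:

\[E[Y_{after}(0) - Y_{before}(0) \mid treated = 1] = E[Y_{after}(0) - Y_{before}(0) \mid treated = 0]\]

The left side is never observed, because treated units actually receive treatment in the after period. That is why the assumption cannot be tested directly.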
What this allows:
What this requires:

Groups follow the same trajectory before treatment. The treatment effect is the gap between the actual outcome and the counterfactual (dashed line).
The dashed line = the counterfactual: where the treatment group would have been if it followed the control group’s trend.

Groups were converging before treatment. DiD attributes the continued convergence to the treatment — overstating the true effect.
The gold dotted line = the true counterfactual (continuing convergence). The dashed line = what DiD incorrectly assumes. The difference between the two is the bias.
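The size of this bias can be worked through with a small Python sketch. The paths and the effect size are hypothetical numbers chosen so that the groups converge by 2 units per period before treatment.

```python
# Hypothetical group means by period (period 0 is the first treated period).
control = {-2: 100.0, -1: 102.0, 0: 104.0}
treat_untreated = {-2: 80.0, -1: 84.0, 0: 88.0}   # path WITHOUT treatment (converging)
TRUE_EFFECT = 3.0                                  # assumed true treatment effect

treat_observed_post = treat_untreated[0] + TRUE_EFFECT

# Standard DiD using period -1 as "before" and period 0 as "after".
control_change = control[0] - control[-1]
did = (treat_observed_post - treat_untreated[-1]) - control_change

# Dashed line: counterfactual that DiD assumes (control's trend applied to treatment).
did_counterfactual = treat_untreated[-1] + control_change
# Gold dotted line: true counterfactual (convergence continues).
true_counterfactual = treat_untreated[0]

bias = did - TRUE_EFFECT
print(did, bias, true_counterfactual - did_counterfactual)
```

Here DiD reports 5 when the assumed effect is 3. The bias of 2 equals the gap between the two counterfactual lines.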
In the incinerator example — what could threaten parallel trends?
More generally, any factor that:
We can never prove parallel trends — it’s about what would have happened without treatment.
But we can look for supporting evidence:
1. Pre-treatment trend test: Plot the outcome for both groups over time. Do they move in parallel before treatment?
2. Placebo/falsification tests:
3. Event study plot: Estimate treatment effects for each period relative to treatment. Pre-treatment coefficients should be near zero and flat.
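One common placebo check is to pretend treatment happened earlier and run DiD on pre-treatment periods only. A minimal Python sketch, using hypothetical group means for three pre-treatment periods:

```python
def did(treat_pre, treat_post, control_pre, control_post):
    """Difference-in-differences from four means."""
    return (treat_post - treat_pre) - (control_post - control_pre)

# Hypothetical pre-treatment means, oldest period first.
control = [90, 95, 100]
treated = [70, 75, 81]

# Placebo DiD for each pair of adjacent pre-treatment periods.
placebo = [
    did(treated[t], treated[t + 1], control[t], control[t + 1])
    for t in range(len(control) - 1)
]
print(placebo)
```

Placebo estimates near zero are consistent with parallel pre-trends. Large ones suggest the groups were already diverging or converging before treatment.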
An event study estimates separate effects for each period relative to treatment:
\[y_{it} = \alpha_i + \lambda_t + \sum_{k \neq -1} \delta_k \cdot D_{it}^k + u_{it}\]
where \(D_{it}^k = 1\) if unit \(i\) is \(k\) periods from treatment at time \(t\).
Think of it as DiD estimated separately for each time period — revealing when effects emerge and whether they grow, fade, or were already present before treatment.
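In the special case of two groups and a single treatment date, each event-study coefficient reduces to the treatment-control gap in period \(k\) minus the gap in the reference period \(k = -1\). The Python sketch below uses hypothetical group means. It illustrates the logic only and does not replace the fixed-effects regression.

```python
# Hypothetical group means by event time k (treatment starts at k = 0).
periods = [-3, -2, -1, 0, 1, 2]
control = {-3: 90, -2: 94, -1: 98, 0: 102, 1: 106, 2: 110}
treated = {-3: 70, -2: 74, -1: 78, 0: 77, 1: 79, 2: 82}

REF = -1  # omitted reference period, matching the k != -1 in the sum
base_gap = treated[REF] - control[REF]

# delta_k = (gap in period k) - (gap in reference period)
event_study = {
    k: (treated[k] - control[k]) - base_gap
    for k in periods
    if k != REF
}
for k, delta in event_study.items():
    print(k, delta)
```

In this example the pre-treatment coefficients are zero and the post-treatment coefficients grow in magnitude, the pattern that supports parallel trends and shows an effect building over time.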
What to look for:
Why Event Studies Matter
Event study plots have become nearly mandatory in applied DiD papers. They are the most credible way to support — though not necessarily prove — the parallel trends assumption.
Question: Does raising the minimum wage reduce employment?
Setting:
| | Before | After | Δ |
|---|---|---|---|
| NJ (treatment) | 20.44 FTE | 21.03 FTE | +0.59 |
| PA (control) | 23.33 FTE | 21.17 FTE | −2.16 |
| NJ − PA | −2.89 | −0.14 | +2.76 |
Result: Employment increased in NJ relative to PA — the opposite of the standard prediction. One of the most influential papers in labor economics.
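The table's arithmetic can be reproduced with a short Python sketch. The FTE means are the rounded values from the table; the variable names are illustrative.

```python
# Mean full-time-equivalent employment per store: (before, after).
fte = {"NJ": (20.44, 21.03), "PA": (23.33, 21.17)}

# Before-to-after change within each state.
change = {state: after - before for state, (before, after) in fte.items()}

# DiD: change in the treatment state minus change in the control state.
did = change["NJ"] - change["PA"]
print(round(change["NJ"], 2), round(change["PA"], 2), round(did, 2))
```

Working from the rounded means gives 2.75 rather than the reported 2.76. The small gap is presumably due to rounding in the published means.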
What makes this a good DiD design?
What might you be concerned about?
Option 1: Interaction regression
Option 2: Factor variable notation (preferred)
The coefficient on 1.after#1.treated is the DiD estimate \(\hat{\delta}_1\).
With true panel data (same entities over time), combine DiD with fixed effects:
Or equivalently, using reghdfe:
Don’t Forget to Cluster!
With panel data, always cluster standard errors at the entity level — we’ll cover exactly why on Tuesday.
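For intuition on the panel version with entity fixed effects: in a balanced two-period panel, differencing each entity's outcome removes its fixed effect, and the DiD estimate is the difference in average changes between groups. A minimal Python sketch with a hypothetical four-entity panel follows. It computes the point estimate only and says nothing about clustered standard errors.

```python
# Hypothetical balanced panel: (entity, treated, y_before, y_after).
panel = [
    ("a", 0, 50.0, 58.0),
    ("b", 0, 70.0, 76.0),
    ("c", 1, 40.0, 43.0),
    ("d", 1, 30.0, 35.0),
]

def mean(xs):
    return sum(xs) / len(xs)

# First-differencing: each entity's own level (its fixed effect) drops out.
changes_treated = [ya - yb for _, tr, yb, ya in panel if tr == 1]
changes_control = [ya - yb for _, tr, yb, ya in panel if tr == 0]

did_fd = mean(changes_treated) - mean(changes_control)
print(did_fd)
```

Entities "a" and "b" start at very different levels, but only their changes enter the estimate. This is the sense in which fixed effects absorb time-invariant differences across entities.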
| Ch 9 Threat | How DiD Helps | What DiD Cannot Fix |
|---|---|---|
| OVB | Eliminates time-invariant confounders by differencing | Time-varying confounders that differentially affect groups |
| Wrong functional form | Flexible — can add covariates, nonlinearities | Misspecified treatment timing or group definitions |
| Measurement error | Not addressed | Attenuation bias still applies |
| Sample selection | Not addressed | Differential attrition (people move because of treatment) |
| Simultaneous causality | Treatment timing helps | If treatment is responsive to anticipated outcomes |
Before you trust a DiD estimate, ask:
DiD combines a cross-sectional comparison with a time comparison — the causal interpretation requires parallel trends
The interaction term captures the DiD estimate (\(\hat{\delta}_1\))
Parallel trends is untestable — it’s about a counterfactual — but pre-trends and event studies provide supporting evidence
DiD addresses time-invariant OVB by differencing it out — but time-varying confounders that differentially affect groups still threaten validity
Always cluster standard errors at the entity level with panel data — more on this Tuesday
ECON3500 | Chapter 10: Difference-in-Differences