<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Assignment overview | ECON3500: Econometrics and Applications</title><link>https://econ3500s26.netlify.app/assignment/</link><atom:link href="https://econ3500s26.netlify.app/assignment/index.xml" rel="self" type="application/rss+xml"/><description>Assignment overview</description><generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><lastBuildDate>Tue, 06 Jan 2026 00:00:00 +0000</lastBuildDate><image><url>https://econ3500s26.netlify.app/media/social-image.png</url><title>Assignment overview</title><link>https://econ3500s26.netlify.app/assignment/</link></image><item><title>Problem set 5</title><link>https://econ3500s26.netlify.app/assignment/05-ps/</link><pubDate>Mon, 06 Apr 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/05-ps/</guid><description>&lt;h2 id="welcome">Welcome&lt;/h2>
&lt;p>Our final problem set, 😭 covering Chapters 10 and 12.&lt;/p>
&lt;p>See the exercises below, or you can
&lt;a href="../05-ps.pdf">download them as a pdf&lt;/a>.&lt;/p>
&lt;p>You should not need any textbook tables to complete this problem set. Any datasets, scans, articles, and formulas you need are linked directly in the questions below or included in the hints.&lt;/p>
&lt;h2 id="what-do-i-submit">What do I submit?&lt;/h2>
&lt;ul>
&lt;li>Your written-up answers to the exercise questions. If you work on paper, please scan it with a phone scanning app (like Microsoft Lens or Adobe Scan) rather than just taking a picture.&lt;/li>
&lt;li>A do-file that runs your Stata analysis.&lt;/li>
&lt;li>A log file that includes the output from running your do-file.&lt;/li>
&lt;/ul>
&lt;h2 id="exercises">Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>In 1985, neither Florida nor Georgia had laws banning open alcohol containers in vehicle passenger compartments. By 1990, Florida had passed such a law, but Georgia had not.&lt;/p>
&lt;p>a. Suppose you collect random samples of the driving-age population in both states, for 1985 and 1990. Let $arrest$ be a binary variable equal to one if a person was arrested for drunk driving during the year. Without controlling for any other factors, write down a linear probability model that allows you to test whether the open container law reduced the probability of being arrested for drunk driving. Which coefficient measures the effect of the law?&lt;/p>
&lt;p>b. Why might you want to control for other factors in the model? What might some of these factors be?&lt;/p>
&lt;p>c. Now, suppose that you can only collect data for 1985 and for 1990 at the county level for the two states. The dependent variable would be the fraction of licensed drivers arrested for drunk driving during the year. How does this data structure differ from the individual-level data described in part (a)? What econometric method would you use?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>For this exercise, use
&lt;a href="../materials/JTRAIN.dta">&lt;code>JTRAIN.dta&lt;/code>&lt;/a> to determine the effect of a job training grant on hours of job training per employee. The basic model for the three years is the following: $$\begin{split}
hrsemp_{it} &amp;amp;= \beta_0 + \delta_1 d88_t + \delta_2 d89_t + \\
&amp;amp; \beta_1 grant_{it} + \beta_2 grant_{i,t-1} + \beta_3 log(employ_{it}) + a_i + u_{it}
\end{split}$$&lt;/p>
&lt;p>a. Estimate the equation using first differencing. How many firms are used in the estimation? How many total observations would be used if each firm had data on all variables (in particular, $hrsemp$) for all three time periods?&lt;/p>
&lt;p>b. Interpret the coefficient on $grant$, and comment on its significance.&lt;/p>
&lt;p>c. Is it surprising that the lagged grant, $grant_{i,t-1}$, is insignificant? Explain.&lt;/p>
&lt;p>d. Do larger firms train their employees more or less, on average? How big are the differences in training?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use
&lt;a href="../materials/CRIME4.dta">&lt;code>CRIME4.dta&lt;/code>&lt;/a> for this exercise, and see example 13.9 in this poor-quality scanned
&lt;a href="../materials/example-13.9.pdf">upload&lt;/a>.&lt;/p>
&lt;p>a. Replicate the results in Example 13.9.&lt;/p>
&lt;p>b. Re-estimate the unobserved effects model for crime in Example 13.9, but use fixed effects rather than differencing. Are there any notable sign or magnitude changes in the coefficients? What about statistical significance?&lt;/p>
&lt;p>c. Add the logs of each wage variable in the data set and estimate the model by fixed effects. How does including these variables affect the coefficients on the criminal justice variables in part (b)?&lt;/p>
&lt;p>d. Do the wage variables in part (c) have the expected sign? Are they jointly significant?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>SW-12.6&lt;/strong> In an instrumental variable regression model with one regressor, $X_i$, and one instrument, $Z_i$, the regression of $X_i$ onto $Z_i$ has $R^2 = 0.05$ and $n = 100$. Is $Z_i$ a strong instrument?&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup> Would your answer change if $R^2 = 0.05$ and $n = 500$?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;!-- 12.9, E12.2 (Skip part e) -->
&lt;ol start="5">
&lt;li>
&lt;p>&lt;strong>SW-12.9&lt;/strong> A researcher is interested in the effect of military service on human capital. She collects data from a random sample of 4000 workers aged 40 and runs the OLS regression $Y_i = \beta_0 + \beta_1X_i + u_i$, where $Y_i$ is a worker&amp;rsquo;s annual earnings and $X_i$ is a binary variable equal to 1 if the person served in the military and is equal to 0 otherwise.&lt;/p>
&lt;p>a. Explain why the OLS estimates are likely to be unreliable. (&lt;em>Hint:&lt;/em> Which variables are omitted from the regression? Are they correlated with military service?)&lt;/p>
&lt;p>b. During the Vietnam War there was a draft in which priority for the draft was determined by a national lottery. The days of the year were randomly re-ordered 1 through 365. (Those whose birthdays were ordered first were drafted before those with birthdays ordered second, and so forth.) Explain how the lottery might be used as an instrument to estimate the effect of military service on earnings. For more about this issue, see Joshua D. Angrist&amp;rsquo;s paper &amp;ldquo;Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administration Records,&amp;rdquo; &lt;em>American Economic Review&lt;/em>, June 1990: 313–336.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;!-- E12.2 (Skip part e -->
&lt;ol start="6">
&lt;li>
&lt;p>&lt;strong>SW-E12.2&lt;/strong> Does viewing a violent movie lead to violent behavior? If so, the incidence of violent crimes, such as assault, should rise following the release of a violent movie that attracts many viewers. Alternatively, movie viewing may substitute for other activities, such as alcohol consumption, that lead to violent behavior, so that assaults should fall when more viewers are attracted to the cinema. Use the data file
&lt;a href="../materials/Movies.dta">&lt;code>Movies.dta&lt;/code>&lt;/a>, which contains data on the number of assaults and movie attendance for 516 weekends from 1995 through 2004.&lt;sup id="fnref:2">&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref">2&lt;/a>&lt;/sup> A detailed description is given
&lt;a href="../materials/movies_description.pdf">here&lt;/a>. The data set includes weekend US attendance for strongly violent movies (such as &lt;em>Hannibal&lt;/em>), mildly violent movies (such as &lt;em>Spiderman&lt;/em>), and non-violent movies (such as &lt;em>Finding Nemo&lt;/em>). The data also includes the count of the number of assaults for the same weekend in a subset of counties in the United States. Finally, the data set includes indicators for year, month, whether the weekend is a holiday, and various measures of the weather.&lt;/p>
&lt;p>a. Regress the logarithm of the number of assaults ($ln\_assaults = ln(assaults)$) on the year and month indicators. Is there evidence of seasonality in assaults? That is, do there tend to be more assaults in some months than others? Explain.&lt;/p>
&lt;p>b. Now, regress total movie attendance ($attend = attend\_v + attend\_m + attend\_n$) on the year and month indicators. Is there evidence of seasonality in movie attendance? Explain.&lt;/p>
&lt;p>c. Regress $ln\_assaults$ on $attend\_v$, $attend\_m$, $attend\_n$, the year and month indicators, and the weather and holiday control variables available in the data set.&lt;/p>
&lt;ol>
&lt;li>Based on the regression, does viewing a strongly violent movie increase or decrease assaults? By how much? Is the estimated effect statistically significant?&lt;/li>
&lt;li>Does attendance at strongly violent movies affect assaults differently than attendance at mildly violent movies? Differently than attendance at non-violent movies?&lt;/li>
&lt;li>A strongly violent blockbuster movie is released and weekend attendance at strongly violent movies increases by 6 million; meanwhile, attendance falls by 2 million for mildly violent movies and by 1 million for non-violent movies. What is the predicted effect on assaults? Construct a 95% confidence interval for the change in assaults.&lt;sup id="fnref:3">&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref">3&lt;/a>&lt;/sup>&lt;/li>
&lt;/ol>
&lt;p>d. It is difficult to control for all the variables that affect assaults and that might be correlated with movie attendance. For example, the effect of the weather on assaults and movie attendance is only crudely approximated by the weather variables in the data set. However, the data set does include a set of instruments, $pr\_attend\_v$, $pr\_attend\_m$, and $pr\_attend\_n$, that are correlated with attendance but are (arguably) uncorrelated with weekend-specific factors such as the weather that affect both assaults and movie attendance. These instruments use historical attendance patterns, not information on a particular weekend, to predict a film&amp;rsquo;s attendance in a given weekend. For example, if a film&amp;rsquo;s attendance is high in the second week of its release, then this could be used to predict that attendance was also high in the first week of its release. The details of the construction of these instruments are available in the
&lt;a href="https://eml.berkeley.edu//~sdellavi/wp/moviescrime08-08-01Forthc.pdf" target="_blank" rel="noopener">Dahl and DellaVigna paper&lt;/a>. Run the regression from part (c), including year, month, holiday, and weather controls, but now using the instruments for attendance. Use this regression to re-answer questions c(1)–c(3).&lt;/p>
&lt;!-- e. The intuition underlying the instruments in part 4 is that attendance in a given week is correlated with attendance and surrounding weeks. For each movie category, the data set includes attendance in surrounding weeks. Run the regression using the instruments $attend\_v\_f$, $attend\_m\_f$, $attend\_n\_f$, $attend\_v\_b$, $attend\_m\_b$, and $attend\_n\_b$ instead of the instruments used in part d, then use this regression to answer part c: c(1)- c(3). -->
&lt;p>e. Based on your analysis, what do you conclude about the effects of violent movies on short-run violent behavior?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;!-- AAE12.1 (skip for now) -->
&lt;section class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1" role="doc-endnote">
&lt;p>&lt;em>Hint:&lt;/em> Use the first-stage F-statistic and the usual rule of thumb that instruments with $F &amp;lt; 10$ are weak.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:2" role="doc-endnote">
&lt;p>These are aggregated versions of data provided by Gordon Dahl and Stefano DellaVigna, used in their paper,
&lt;a href="https://eml.berkeley.edu//~sdellavi/wp/moviescrime08-08-01Forthc.pdf" target="_blank" rel="noopener">&amp;ldquo;Does Movie Violence Increase Violent Crime?&amp;quot;&lt;/a>.&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:3" role="doc-endnote">
&lt;p>&lt;em>Hint:&lt;/em> Review Section 7.3 and the material surrounding Equations (8.7) and (8.8).&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/section></description></item><item><title>Problem set 4</title><link>https://econ3500s26.netlify.app/assignment/04-ps/</link><pubDate>Fri, 13 Mar 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/04-ps/</guid><description>&lt;h2 id="welcome">Welcome&lt;/h2>
&lt;p>Chapters 8 and 9 problems! Enjoy!&lt;/p>
&lt;p>See the exercises below, or you can
&lt;a href="../04-ps.pdf">download them as a pdf&lt;/a>.&lt;/p>
&lt;h2 id="what-do-i-submit">What do I submit?&lt;/h2>
&lt;ul>
&lt;li>Your written-up answers to the exercise questions. If you work on paper, please scan it with a phone scanning app (like Microsoft Lens or Adobe Scan) rather than just taking a picture.&lt;/li>
&lt;li>A do-file that runs your Stata analysis (for question 8).&lt;/li>
&lt;li>A log file that includes the output from running your do-file (for question 8).&lt;/li>
&lt;/ul>
&lt;h2 id="exercises">Exercises&lt;/h2>
&lt;!-- Wooldrdge 3.9 -->
&lt;ol>
&lt;li>
&lt;p>The following equation describes the median housing price in a community in terms of amount of pollution ($nox$ for nitrous oxide) and the average number of rooms in houses in the community ($rooms$):&lt;/p>
&lt;p>$log(price) = \beta_0 + \beta_1log(nox) + \beta_2rooms + u$&lt;/p>
&lt;p>a. What are the probable signs of $\beta_1$ and $\beta_2$? What is the interpretation of $\beta_1$? Explain.&lt;/p>
&lt;p>b. Why might $nox$ [or more precisely, $log(nox)$] and $rooms$ be negatively correlated? If this is the case, does the simple regression of $log(price)$ on $log(nox)$ produce an upward or a downward biased estimator of $\beta_1$?&lt;/p>
&lt;p>c. Using data, the following equations were estimated:&lt;/p>
&lt;p>$\widehat{log(price)} = 11.71 - 1.043 log(nox)$, $n = 506$, $R^2 = 0.264$&lt;/p>
&lt;p>$\widehat{log(price)} = 9.23 - 0.718 log(nox) + 0.306 rooms$, $n = 506$, $R^2 = 0.514$&lt;/p>
&lt;p>Is the relationship between the simple and multiple regression estimates of the elasticity of $price$ with respect to $nox$ what you would have predicted, given your answer in part (b)? Does this mean that $-0.718$ is definitely closer to the true elasticity than $-1.043$?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;!-- Stock and watson 8.4 -->
&lt;ol start="2">
&lt;li>
&lt;p>Read the box &lt;em>&amp;ldquo;The Return to Education and the Gender Gap&amp;rdquo;&lt;/em> in Section 8.3 of your textbook (Stock &amp;amp; Watson).&lt;/p>
&lt;p>a. Consider a man with 16 years of education and 2 years of experience. Use the results from column (4) of Table 8.1 and the method in Key Concept 8.1 to estimate the expected change in the logarithm of average hourly earnings (AHE) associated with an additional year of experience.&lt;/p>
&lt;p>b. Explain why your answer to (a) does not depend on the region he is from.&lt;/p>
&lt;p>c. Repeat (a), assuming 10 years of experience.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;!-- % Stock and Watson 8.6 -->
&lt;ol start="3">
&lt;li>
&lt;p>To answer this question, refer to &lt;em>Table 8.3: Nonlinear Regression Model of Test Scores&lt;/em> in your textbook:&lt;/p>
&lt;p>a. A researcher suspects that % Eligible for subsidized lunch has a nonlinear effect on test scores. In particular, he conjectures that increases in this variable from 10% to 20% have little effect on test scores but that changes from 50% to 60% have a much larger effect.&lt;/p>
&lt;p>i. Describe a nonlinear specification that can be used to model this form of nonlinearity.&lt;/p>
&lt;p>ii. How would you test whether the researcher&amp;rsquo;s conjecture was better than the linear specification in column (7) of Table 8.3?&lt;/p>
&lt;p>b. A researcher suspects that the effect of income on test scores is different in districts with small classes than in districts with large classes.&lt;/p>
&lt;p>i. Describe a nonlinear specification that can be used to model this form of nonlinearity.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;!-- &lt;!-- Stock and Watson 9.2 -->
&lt;!-- Consider the one-variable regression model $Y_i = \beta_0 = \beta_1 x_i + u_i$ and suppose it satisfies the least squares assumptions in Key Concept 4.3. Suppose $Y_i$ is measured with error, so the data are $\widetilde{Y_i} = Y_i + w_i$, where $w_i$ is the measurement error, which is i.i.d. and independent of $Y_i$ and $X_i$. Consider the population regression $Y$ using the mismeasured dependent variable, $\widetilde{Y}$. -->
&lt;!-- a. Show that $v_i = u_i + w_i$ -->
&lt;!-- b. Show that the regression $\widetilde{Y_i} = \beta_0 + \beta_1X_i + v_i$ satisfies the least squares assumptions in Key Concept 4.3. Assume that $w_i$ is independent of $Y_j$ and $X_j$ for all values of $i$ and $j$ and has a finite fourth moment. -->
&lt;!-- c. Can confidence intervals be constructed in the usual way? -->
&lt;!-- d. Evaluate these statements: “Measurement error in the X’s is a serious problem. Measurement error in Y is not.” -->
&lt;!-- 9.3, 9.5, 9.6, 9.10 (odd-numbered answers are online, but think through them carefully!) -->
&lt;!-- 2. Additional empirical exercise 9.1, -->
&lt;!-- Stock and Watson 9.3 -->
&lt;ol start="4">
&lt;li>
&lt;p>Labor economists studying the determinants of women&amp;rsquo;s earnings discovered a puzzling empirical result. Using randomly selected employed women, they regressed earnings on the women&amp;rsquo;s number of children and a set of control variables (age, education, occupation, and so forth). They found that women with more children had higher wages, controlling for these other factors. Explain how sample selection might be the cause of this result. (Hint: Notice that women who do not work outside the home are missing from the sample.) [This empirical puzzle motivated James Heckman&amp;rsquo;s research on sample selection that led to his 2000 Nobel Prize in Economics. See Heckman (1974)]&lt;/p>
&lt;/li>
&lt;li>
&lt;p>This question uses directed acyclic graphs (DAGs), which we will cover in class. You may also find it helpful to read Huntington-Klein, &lt;em>The Effect&lt;/em>,
&lt;a href="https://theeffectbook.net/ch-CausalPaths.html" target="_blank" rel="noopener">Chapter 8: Causal Paths and Closing Back Doors&lt;/a>, especially Sections 8.3–8.5.&lt;/p>
&lt;p>Consider the relationship between a woman&amp;rsquo;s number of children and her earnings from question 4.&lt;/p>
&lt;p>a. Draw a DAG that includes the following variables: Earnings, Number of Children, Decision to Work Outside the Home, and Ability/Motivation. Add arrows representing plausible causal relationships. For each arrow, write one sentence explaining why you included it.&lt;/p>
&lt;p>b. Is &amp;ldquo;Decision to Work Outside the Home&amp;rdquo; a confounder, a collider, or a mediator on the path between Number of Children and Earnings? Explain.&lt;/p>
&lt;p>c. When researchers study only employed women, they are conditioning on &amp;ldquo;Decision to Work.&amp;rdquo; Using your DAG, explain why this could produce a spurious positive relationship between number of children and earnings — even if children have no direct causal effect on earnings.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;!-- &lt;!-- Stock and Watson 9.5 -->
&lt;ol start="6">
&lt;li>
&lt;p>The demand for a commodity is given by $Q = \beta_0 + \beta_1 P + u$, where $Q$ denotes quantity, $P$ denotes price, and $u$ denotes factors other than price that determine demand. Supply for the commodity is given by $Q = \gamma_0 + \gamma_1P + v$, where $v$ denotes factors other than price that determine supply. Suppose $u$ and $v$ both have a mean of 0, have variances $\sigma^2_u$ and $\sigma^2_v$, and are mutually uncorrelated.&lt;/p>
&lt;p>a. Solve the two simultaneous equations to show how $Q$ and $P$ depend on $u$ and $v$. &lt;em>(Hint: In equilibrium, quantity supplied equals quantity demanded. Set the two equations equal and solve for $P$ in terms of $u$ and $v$. Then substitute back to find $Q$.)&lt;/em>&lt;/p>
&lt;p>b. Derive the means of $P$ and $Q$. &lt;em>(Hint: Use your answers from part (a) and the fact that $E(u) = E(v) = 0$.)&lt;/em>&lt;/p>
&lt;p>c. &lt;strong>(Optional)&lt;/strong> Derive the variance of $P$, the variance of $Q$, and the covariance between $Q$ and $P$.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;!-- d. A random sample of observations of $(Q_i, P_i)$ is collected, and $Q_i$ is regressed on $P_i$. (That is, $Q_i$ is the dependent variable, and $P_i$ is the independent variable) Suppose the sample is very large. -->
&lt;!-- i. Use your answers to (b) and (c) to derive values of the regression coefficients. [Hint: Use Equations (4.7) and (4.8).] -->
&lt;!-- ii. A researcher uses the slope of this regression as an estimate of the slope of the demand function ($\beta_1$). Is the estimated slope too large or too small? (Remember, demand curves slope down and supply curves slope up!) -->
&lt;!-- &lt;!-- &lt;!-- Stock and Watson 9.6 -->
&lt;!-- 4. Suppose that $n = 100$ i.i.d. observations for $Y_i$, $X_i$ yield the following regres-sion results: -->
&lt;!-- $\widehat{Y} = 32.1 + 66.8X$, $SER = 15.1$, $R^2 = 0.81$., where $\widehat{SE(\beta_0)} = 115.12$ and $\widehat{SE(\beta_1)}= 112.22$ -->
&lt;!-- Another researcher is interested in the same regression, but he makes an error when he enters the data into his regression program: He enters each observation twice, so he has 200 observations (with observation 1 entered twice, observation 2 entered twice, and so forth). -->
&lt;!-- a. Using these 200 observations, what results will be produced by his regression program? (Hint: Write the “incorrect” values of the sample means, variances, and covariances of Y and X as functions of the “correct” values. Use these to determine the regression statistics.) -->
&lt;!-- $\widehat{Y} =$ ____ + ____ $X$, $SER = $____, $R^2 =$ ____, -->
&lt;!-- $\widehat{SE(\beta_0)}$ = ______ and $\widehat{SE(\beta_1)}$= ______ -->
&lt;!-- b. Which (if any) of the internal validity conditions are violated? -->
&lt;!-- Stock and Watson 9.10 -->
&lt;ol start="7">
&lt;li>
&lt;p>Revisit the box &lt;em>&amp;ldquo;The Return to Education and the Gender Gap&amp;rdquo;&lt;/em> in Section 8.3 of your textbook (Stock &amp;amp; Watson). Discuss the internal and external validity of the estimated effect of education on earnings.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use the dataset
&lt;a href="https://www.princeton.edu/~mwatson/Stock-Watson_3u/Students/EE_Datasets/CollegeDistance.dta" target="_blank" rel="noopener">&lt;code>CollegeDistance.dta&lt;/code>&lt;/a> (described in Empirical Exercise AEE 4.3) to answer the following questions.&lt;/p>
&lt;p>a. Run a regression of $ED$ on $Dist$, $Female$, $Bytest$, $Tuition$, $Black$, $Hispanic$, $Incomehi$, $Ownhome$, $DadColl$, $MomColl$, $Cue80$, and $Stwmfg80$. If $Dist$ increases from 2 to 3 (that is, from 20 to 30 miles), how are years of education expected to change? If $Dist$ increases from 6 to 7 (that is, from 60 to 70 miles), how are years of education expected to change?&lt;/p>
&lt;p>b. Run a regression of $ln(ED)$ on $Dist$, $Female$, $Bytest$, $Tuition$, $Black$, $Hispanic$, $Incomehi$, $Ownhome$, $DadColl$, $MomColl$, $Cue80$, and $Stwmfg80$. If $Dist$ increases from 2 to 3 (from 20 to 30 miles), how are years of education expected to change? If $Dist$ increases from 6 to 7 (from 60 to 70 miles), how are years of education expected to change?&lt;/p>
&lt;p>c. Run a regression of $ED$ on $Dist$, $Dist^2$, $Female$, $Bytest$, $Tuition$, $Black$, $Hispanic$, $Incomehi$, $Ownhome$, $DadColl$, $MomColl$, $Cue80$, and $Stwmfg80$. If $Dist$ increases from 2 to 3 (from 20 to 30 miles), how are years of education expected to change? If $Dist$ increases from 6 to 7 (from 60 to 70 miles), how are years of education expected to change?&lt;/p>
&lt;p>d. Do you prefer the regression in (c) to the regression in (a)? Explain.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;!-- e. Consider a Hispanic female with $Tuition = \$950$, $Bytest = 58$, $Incomehi = 0$, $Ownhome = 0$, $DadColl = 1$, $MomColl = 1$, $Cue80 = 7.1$, and $Stwmfg = \$10.06$.
i. Plot the regression relation between $Dist$ and $ED$ from (a) and (c) for $Dist$ in the range of 0 to 10 (from 0 to 100 miles). Describe the similarities and differences between the estimated regression functions. Would your answer change if you plotted the regression function for a white male with the same characteristics?
ii. How does the regression function (c) behave for $Dist > 10$? How many observations are there with $Dist > 10$? -->
&lt;p>e. Add the interaction term $DadColl \times MomColl$ to the regression in (c). What does the coefficient on the interaction term measure?&lt;/p>
&lt;p>f. Mary, Jane, Alexis, and Bonnie have the same values of $Dist$, $Bytest$, $Tuition$, $Female$, $Black$, $Hispanic$, $Incomehi$, $Ownhome$, $Cue80$, and $Stwmfg80$. Neither of Mary&amp;rsquo;s parents attended college. Jane&amp;rsquo;s father attended college, but her mother did not. Alexis&amp;rsquo;s mother attended college, but her father did not. Both of Bonnie&amp;rsquo;s parents attended college. Using the regression from (e):&lt;/p>
&lt;p>i. What does the regression predict for the difference between Jane&amp;rsquo;s and Mary&amp;rsquo;s years of education?&lt;/p>
&lt;p>ii. What does the regression predict for the difference between Alexis&amp;rsquo;s and Mary&amp;rsquo;s years of education?&lt;/p>
&lt;p>iii. What does the regression predict for the difference between Bonnie&amp;rsquo;s and Mary&amp;rsquo;s years of education?&lt;/p>
&lt;p>g. Is there any evidence that the effect of $Dist$ on $ED$ depends on the family&amp;rsquo;s income?&lt;/p>
&lt;p>h. After running all these regressions (and any others that you want to run), summarize the effect of $Dist$ on years of education.&lt;/p></description></item><item><title>Problem set 3</title><link>https://econ3500s26.netlify.app/assignment/03-ps/</link><pubDate>Tue, 17 Feb 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/03-ps/</guid><description>&lt;h2 id="welcome">Welcome&lt;/h2>
&lt;p>Note that this is your last problem set before the next exam!&lt;/p>
&lt;p>See the exercises below, or you can
&lt;a href="../03-ps.pdf">download them as a pdf&lt;/a>. You can download the data file you need for question 6
&lt;a href="../materials/growth.dta">here&lt;/a>, along with information on the variable definitions
&lt;a href="https://www.princeton.edu/~mwatson/Stock-Watson_3u/Students/EE_Datasets/Growth_Description.pdf" target="_blank" rel="noopener">here&lt;/a>&lt;/p>
&lt;h2 id="what-do-i-submit">What do I submit?&lt;/h2>
&lt;ul>
&lt;li>Your written-up answers to the exercise questions. If you work on paper, please scan it with a phone scanning app (like Microsoft Lens or Adobe Scan) rather than just taking a picture.&lt;/li>
&lt;li>A do-file that runs your Stata analysis (for questions 6 and 7).&lt;/li>
&lt;li>A log file that includes the output from running your do-file (for questions 6 and 7).&lt;/li>
&lt;/ul>
&lt;h2 id="exercises">Exercises&lt;/h2>
&lt;!-- % Wooldridge 6.8 -->
&lt;ol>
&lt;li>
&lt;p>Suppose that we want to estimate the effects of alcohol consumption ($alcohol$) on college grade point average ($colGPA$). In addition to collecting information on alcohol consumption and grade point averages, we also obtain attendance information (say, percentage of lectures attended, $attend$). A standardized test score (say, $SAT$) and high school GPA ($hsGPA$) are also available.&lt;/p>
&lt;p>a. Should we include $attend$ along with alcohol as explanatory variables in a multiple regression model? What would be the interpretation of $\beta_{alcohol}$ if we did?&lt;/p>
&lt;p>b. Should $SAT$ and $hsGPA$ be included as explanatory variables? Explain.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;!-- SW 6.6 -->
&lt;ol start="2">
&lt;li>
&lt;p>A researcher plans to study the causal effect of police on crime, using data from a random sample of U.S. counties. She plans to regress the county&amp;rsquo;s crime rate on the (per capita) size of the county&amp;rsquo;s police force.&lt;/p>
&lt;p>a. Explain why this regression is likely to suffer from omitted variable bias. Which variables would you add to the regression to control for important omitted variables?&lt;/p>
&lt;p>b. Use your answer to (a) and the expression for omitted variable bias (from the slides or textbook) to determine whether the regression will likely over- or underestimate the effect of police on the crime rate. (That is, is $\hat{\beta_1}&amp;gt;\beta_1$, or that $\hat{\beta_1} &amp;lt; \beta_1$?)&lt;/p>
&lt;/li>
&lt;/ol>
&lt;!-- SW additional emirical exercise 7.2 -->
&lt;ol start="3">
&lt;li>
&lt;p>Critique each of the following proposed research plans. Your critique should explain any problems with the proposed research and describe how the research plan might be improved. Include discussion of any additional data that needs to be collected, and the appropriate statistical techniques for analyzing those data.&lt;/p>
&lt;p>a. A researcher is interested in determining whether a large aerospace firm is guilty of gender bias in setting wages. To determine potential bias, the researcher collects salary and gender information for all of the firm&amp;rsquo;s engineers. The researcher then plans to conduct a &amp;ldquo;difference in means&amp;rdquo; test to determine whether the average salary for women is significantly less than the average salary for men. &lt;!-- SW 6.7 -->&lt;/p>
&lt;p>b. A researcher is interested in determining whether time in prison has a permanent effect on a person&amp;rsquo;s wage rate. He collects data on a random sample of people who have been out of prison for at least 15 years. He collects similar data on a random sample of people who have never served time in prison. The data set includes information on each person&amp;rsquo;s current wage, education, age, ethnicity, gender, tenure (time in current job), occupation, and union status, as well as whether the person has ever been incarcerated. The researcher plans to estimate the effect of incarceration on wages by regressing wages on an indicator variable for incarceration, including in the regression the other potential determinants of wages such as education, tenure, union status, and so on.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Consider a dataset that contains information on 4700 full-time full-year workers. The highest educational achievement for each worker was either a high school diploma or a bachelor&amp;rsquo;s degree. The workers&amp;rsquo; ages ranged from 25 to 45 years. The data set also contains information on the region of the country where the person lived, marital status, and number of children. See below for variable definitions.&lt;/p>
&lt;p>a. Is the college-high school earnings difference estimated from this regression statistically significant at the 5% level? Construct a 95% confidence interval of the difference.&lt;/p>
&lt;p>b. Do there appear to be important regional differences in hourly earnings? Use an appropriate hypothesis test to explain your answer.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>Definition&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>AHE&lt;/td>
&lt;td>average hourly earnings (in 2005 dollars)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>College&lt;/td>
&lt;td>1 if college, 0 if high school&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Female&lt;/td>
&lt;td>1 if female, 0 if male&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Age&lt;/td>
&lt;td>age (in years)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Ntheast&lt;/td>
&lt;td>1 if Region = Northeast, 0 otherwise&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Midwest&lt;/td>
&lt;td>1 if Region = Midwest, 0 otherwise&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>South&lt;/td>
&lt;td>1 if Region = South, 0 otherwise&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>West&lt;/td>
&lt;td>1 if Region = West, 0 otherwise&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;img src = "../materials/sw7-3.png">
&lt;!-- SW 7.8 -->
&lt;ol start="5">
&lt;li>
&lt;p>Consider the regression results below and do the following:&lt;/p>
&lt;p>a. Construct the $R^2$ for each of the regressions.&lt;/p>
&lt;p>b. Construct the homoskedasticity-only $F$-statistic for testing $\beta_3 = \beta_4 = 0$ in the regression shown in column (5). Is the statistic significant at the 5% level?&lt;/p>
&lt;p>c. Construct a 99% confidence interval for $\beta_1$ for the regression in column (5).&lt;/p>
&lt;/li>
&lt;/ol>
&lt;img src = "../materials/sw7-1.png">
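&lt;p>For reference, a sketch of the standard textbook formulas that parts (a) through (c) rely on, assuming the table reports adjusted $R^2$ (written $\bar{R}^2$) and that $n$, $k$, and $q$ denote the sample size, the number of regressors, and the number of restrictions being tested (notation chosen here, not taken from the table):
$$ \bar{R}^2 = 1 - \frac{n-1}{n-k-1}\left(1 - R^2\right) $$
$$ F = \frac{\left(R^2_{unrestricted} - R^2_{restricted}\right)/q}{\left(1 - R^2_{unrestricted}\right)/\left(n - k_{unrestricted} - 1\right)} $$
$$ \hat\beta_1 \pm 2.58 \times SE(\hat\beta_1) $$
The first line can be rearranged to recover $R^2$, the second is the homoskedasticity-only $F$-statistic, and the third is a 99% confidence interval based on the normal critical value.&lt;/p>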
&lt;ol start="6">
&lt;li>
&lt;p>Download the dataset
&lt;a href="../materials/growth.dta">growth.dta&lt;/a>, which contains data on average growth rates from 1960 through 1995 for 65 countries, along with variables that are potentially related to growth. A detailed description of all variables is available
&lt;a href="https://www.princeton.edu/~mwatson/Stock-Watson_3u/Students/EE_Datasets/Growth_Description.pdf" target="_blank" rel="noopener">here&lt;/a>. For all questions, exclude Malta, which has an extremely high trade share.&lt;/p>
&lt;p>a. Write the population model for a regression of &lt;code>growth&lt;/code> on &lt;code>tradeshare&lt;/code>, &lt;code>yearsschool&lt;/code>, &lt;code>rev_coups&lt;/code>, &lt;code>assassinations&lt;/code>, and &lt;code>rgdp60&lt;/code>. Then estimate it using OLS with heteroskedasticity-robust standard errors.&lt;/p>
&lt;p>b. What is the value of the coefficient on &lt;code>rev_coups&lt;/code>? Interpret the value of this coefficient. Is it large or small in a real-world sense?&lt;/p>
&lt;p>c. Use the regression to predict the average annual growth rate for a country that has average values for all regressors.&lt;/p>
&lt;p>d. Test whether the political variables &lt;code>rev_coups&lt;/code> and &lt;code>assassinations&lt;/code>, taken as a group, can be omitted from the regression. What is the p-value of the F-statistic?&lt;/p>
&lt;p>e. After running your regression, pick one country in your sample. Report its actual value of &lt;code>growth&lt;/code>, its fitted (predicted) value, and its residual. In one sentence, what does that residual mean?&lt;/p>
&lt;p>f. Under what assumptions is the OLS estimator BLUE? For this regression, which of those assumptions are likely to hold, which are likely violated, and for which would you need more information? (One short sentence per assumption is enough.)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Consider the regression in Question 6: &lt;code>growth&lt;/code> on &lt;code>tradeshare&lt;/code>, &lt;code>yearsschool&lt;/code>, &lt;code>rev_coups&lt;/code>, &lt;code>assassinations&lt;/code>, and &lt;code>rgdp60&lt;/code>.&lt;/p>
&lt;p>a. Give an example of a variable that is likely to be in the error term and would &lt;em>not&lt;/em> violate the zero conditional mean assumption. Explain in one sentence.&lt;/p>
&lt;p>b. Give an example of a variable that is likely to be in the error term and &lt;em>would&lt;/em> violate the zero conditional mean assumption. Explain in one sentence.&lt;/p>
&lt;/li>
&lt;/ol></description></item><item><title>Lab 8: Instrumental variables</title><link>https://econ3500s26.netlify.app/assignment/08-lab/</link><pubDate>Tue, 14 Apr 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/08-lab/</guid><description>&lt;p>&lt;strong>
&lt;a href="../08-lab.pdf">Print-friendly pdf&lt;/a>&lt;/strong>&lt;/p>
&lt;p>It&amp;rsquo;s our final lab of the semester!&lt;/p>
&lt;h2 id="materials" class="unnumbered">Materials&lt;/h2>
&lt;ul>
&lt;li>
&lt;a href="../materials/voucher.dta">&lt;code>voucher.dta&lt;/code>&lt;/a>&lt;/li>
&lt;li>Do-file template
&lt;a href="../materials/econ3500_lab_template.do">&lt;code>econ3500_lab_template.do&lt;/code>&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="objectives" class="unnumbered">Objectives&lt;/h2>
&lt;p>Today we&amp;rsquo;re going to work with
&lt;a href="../materials/voucher.dta">&lt;code>voucher.dta&lt;/code>&lt;/a>, a dataset of student
performance from Rouse (1998). She measures the impact of private school
vouchers on student achievement.&lt;/p>
&lt;p>By the end of this lab, you should be able to complete the following
tasks in Stata:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Estimate instrumental variable specifications and interpret them.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Output regression results using &lt;code>outreg2&lt;/code>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="iv-intuition" class="unnumbered">Why instrumental variables?&lt;/h3>
&lt;p>In Labs 6 and 7, we dealt with &lt;strong>endogeneity&lt;/strong> — situations where our key independent variable is correlated with the error term, usually because of omitted variables, measurement error, or reverse causality. Fixed effects (Lab 7) solve this when the problem comes from time-invariant confounders.&lt;/p>
&lt;p>&lt;strong>Instrumental variables (IV)&lt;/strong> offer another approach: find a variable (the &amp;ldquo;instrument&amp;rdquo;) that affects $Y$ &lt;em>only through&lt;/em> $X$. This instrument provides a source of exogenous variation in $X$ that we can use to estimate the causal effect. The two requirements for a valid instrument are:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Relevance&lt;/strong>: The instrument must be correlated with the endogenous variable ($X$).&lt;/li>
&lt;li>&lt;strong>Exclusion restriction&lt;/strong>: The instrument must affect $Y$ only through $X$ (not directly).&lt;/li>
&lt;/ol>
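&lt;p>In symbols, a minimal sketch for a single regressor and a single instrument $Z$ (the notation $Z$, $\beta_0$, $\beta_1$, and $u$ is assumed here for illustration): with $Y = \beta_0 + \beta_1 X + u$, relevance is usually written as $Cov(Z, X) \neq 0$ and the exclusion restriction as $Cov(Z, u) = 0$. Taking the covariance of both sides with $Z$ gives
$$ Cov(Z, Y) = \beta_1 Cov(Z, X) + Cov(Z, u) = \beta_1 Cov(Z, X) \quad \Rightarrow \quad \beta_1 = \frac{Cov(Z, Y)}{Cov(Z, X)} $$
so the instrument identifies $\beta_1$ only when both requirements hold.&lt;/p>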
&lt;h3 id="data-context" class="unnumbered">Data context&lt;/h3>
&lt;p>The data come from an evaluation of the Milwaukee Parental Choice Program, which randomly offered school vouchers to students via a lottery. The final measure of student performance is &lt;code>mnce&lt;/code>, their math test score in 1994 (after up to four years in a private school). We also have baseline performance: their math test score in 1990 (&lt;code>mnce90&lt;/code>). The variable &lt;code>choiceyrs&lt;/code> is the number of years actually enrolled in a private school, and &lt;code>selectyrs&lt;/code> is the number of years a student was &lt;em>selected&lt;/em> (via lottery) to receive a voucher.&lt;/p>
&lt;p>The lottery creates a natural instrument: being &lt;em>selected&lt;/em> for a voucher (which is random) affects the number of years &lt;em>enrolled&lt;/em> in a private school, but shouldn&amp;rsquo;t affect test scores through any other channel.&lt;/p>
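&lt;p>In equation form, the setup can be sketched as follows (controls omitted; the symbols $\alpha$, $\beta$, $u$, and $v$ follow the rent example later in this lab and are otherwise assumptions). The first stage uses the lottery to predict enrollment, and the second stage relates scores to predicted enrollment:
$$ choiceyrs = \alpha_0 + \alpha_1 selectyrs + v $$
$$ mnce = \beta_0 + \beta_1 \widehat{choiceyrs} + u $$&lt;/p>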
&lt;h3 id="variables" class="unnumbered">Variables we&amp;rsquo;ll use&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">variable&lt;/th>
&lt;th style="text-align:left">meaning&lt;/th>
&lt;th style="text-align:left">notes&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">&lt;code>mnce&lt;/code>&lt;/td>
&lt;td style="text-align:left">math score in 1994&lt;/td>
&lt;td style="text-align:left">outcome variable&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>mnce90&lt;/code>&lt;/td>
&lt;td style="text-align:left">math score in 1990&lt;/td>
&lt;td style="text-align:left">baseline performance&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>choiceyrs&lt;/code>&lt;/td>
&lt;td style="text-align:left">years enrolled in a choice school&lt;/td>
&lt;td style="text-align:left">endogenous variable&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>selectyrs&lt;/code>&lt;/td>
&lt;td style="text-align:left">years selected to receive a voucher&lt;/td>
&lt;td style="text-align:left">instrument&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>choiceyrs1&lt;/code>–&lt;code>choiceyrs4&lt;/code>&lt;/td>
&lt;td style="text-align:left">dummies for years in choice school&lt;/td>
&lt;td style="text-align:left">used in Q9&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>selectyrs1&lt;/code>–&lt;code>selectyrs4&lt;/code>&lt;/td>
&lt;td style="text-align:left">dummies for years selected for voucher&lt;/td>
&lt;td style="text-align:left">used in Q9&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>black&lt;/code>&lt;/td>
&lt;td style="text-align:left">Black indicator&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>hispanic&lt;/code>&lt;/td>
&lt;td style="text-align:left">Hispanic indicator&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>female&lt;/code>&lt;/td>
&lt;td style="text-align:left">female indicator&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="key-commands" class="unnumbered">Key commands&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">command&lt;/th>
&lt;th style="text-align:left">description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">&lt;code>ivregress 2sls y (x = z) controls, robust&lt;/code>&lt;/td>
&lt;td style="text-align:left">IV regression using two-stage least squares&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>ivregress 2sls y (x = z) controls, robust first&lt;/code>&lt;/td>
&lt;td style="text-align:left">Same, reporting first-stage results&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>predict yhat, xb&lt;/code>&lt;/td>
&lt;td style="text-align:left">Generate predicted values from the previous regression&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>testparm varname&lt;/code>&lt;/td>
&lt;td style="text-align:left">Test whether the listed coefficient(s) equal zero (reports an F-statistic)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>outreg2 using file.xls, replace&lt;/code>&lt;/td>
&lt;td style="text-align:left">Export regression results to Excel (first column)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>outreg2 using file.xls, append&lt;/code>&lt;/td>
&lt;td style="text-align:left">Add a column to an existing results table&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="ivregress" class="unnumbered">Conducting IV regressions with &lt;code>ivregress&lt;/code>&lt;/h3>
&lt;p>General form:&lt;/p>
&lt;pre tabindex="0">&lt;code>ivregress estimator depvar [varlist1] (varlist2 = varlist_iv) [if] [in] [weight] [, options]
&lt;/code>&lt;/pre>&lt;ul>
&lt;li>&lt;code>estimator&lt;/code> is where we will type &lt;code>2sls&lt;/code>&lt;/li>
&lt;li>&lt;code>depvar&lt;/code> is your dependent variable&lt;/li>
&lt;li>You can include other explanatory variables before or after the parentheses, &lt;code>[varlist1]&lt;/code>&lt;/li>
&lt;li>In the parentheses, write your endogenous variable ($x$), then &lt;code>=&lt;/code>, then your instrument ($z$); either side can be a list of variables!&lt;/li>
&lt;li>The rest of it is just as you&amp;rsquo;re used to&lt;/li>
&lt;/ul>
&lt;p>Example:&lt;/p>
&lt;p>To estimate the following two-stage least squares equation:
$$ rent = \beta_0 + \beta_1 \widehat{hsngval} + \beta_2 pcturban + u$$
where $\widehat{hsngval}$ is predicted from the following first-stage equation
$$ hsngval = \alpha_0 + \alpha_1 faminc + \alpha_2 pcturban + v $$&lt;/p>
&lt;pre tabindex="0">&lt;code>webuse hsng2
ivregress 2sls rent (hsngval = faminc) pcturban, robust
&lt;/code>&lt;/pre>&lt;p>You can add the &lt;code>first&lt;/code> option to report the first-stage results:&lt;/p>
&lt;pre tabindex="0">&lt;code>ivregress 2sls rent (hsngval = faminc) pcturban, robust first
&lt;/code>&lt;/pre>&lt;h3 id="outreg2" class="unnumbered">Outputting your results with &lt;code>outreg2&lt;/code>&lt;/h3>
&lt;p>We are very good at reading raw Stata output. But raw Stata output has no place in our papers. How do we make it pretty? There are lots of ways, including &lt;code>putexcel&lt;/code>, which lets you create customizable Excel tables with your outputs (good for descriptive statistics), and &lt;code>estout&lt;/code>, which does the same thing but is more regression oriented.&lt;/p>
&lt;p>Personally, I like &lt;code>outreg2&lt;/code>, because it&amp;rsquo;s easy to set up and use. So that&amp;rsquo;s what we&amp;rsquo;ll use!&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
&lt;p>&lt;strong>Installation required:&lt;/strong> &lt;code>outreg2&lt;/code> is a user-created package, which means you have to install it first:&lt;/p>
&lt;pre tabindex="0">&lt;code>ssc install outreg2
&lt;/code>&lt;/pre>&lt;p>You only need to do this &lt;strong>once&lt;/strong> per computer. If you get an error that &lt;code>outreg2&lt;/code> is already installed, that&amp;rsquo;s fine — just keep going.&lt;/p>
&lt;/div>
&lt;/div>
&lt;p>You&amp;rsquo;ll run &lt;code>outreg2&lt;/code> after estimating a regression. It takes your results and saves them to a table. You can run it multiple times and generate columns of results within the same Excel sheet, which is pretty handy! The general format of &lt;code>outreg2&lt;/code> is this:&lt;/p>
&lt;pre tabindex="0">&lt;code>// You can copy and paste this into Stata, and it should work!
// Note that it will save to your working directory
sysuse auto, clear
// Specification 1
regress mpg foreign weight headroom trunk length turn displacement
outreg2 using myfile.xls, replace
// Specification 2 (add on)
regress mpg foreign weight headroom trunk length turn displacement, robust
outreg2 using myfile.xls, append
&lt;/code>&lt;/pre>&lt;p>You can customize with lots of options! (See &lt;code>help outreg2&lt;/code>, or check out
&lt;a href="https://thedatahall.com/stata-outreg2-part1/" target="_blank" rel="noopener">these resources&lt;/a>)&lt;/p>
&lt;p>What sort of things?&lt;/p>
&lt;ul>
&lt;li>Export directly to Word
&lt;ul>
&lt;li>&lt;code>outreg2 using myfile, word replace&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Add notes
&lt;ul>
&lt;li>&lt;code>outreg2 using myfile, addnote(Dummy variables not shown)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Report only some variables
&lt;ul>
&lt;li>&lt;code>outreg2 using myfile, keep(foreign weight)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Modify number of decimal places
&lt;ul>
&lt;li>&lt;code>outreg2 using myfile, dec(5)&lt;/code>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>You can use a loop to make a whole set of columns!&lt;/li>
&lt;/ul>
&lt;p>An example:&lt;/p>
&lt;pre tabindex="0">&lt;code>
sysuse auto, clear
// The first pass overwrites the file; later passes append a column
local r &amp;quot;replace&amp;quot;
forval num = 1/5 {
    regress mpg weight headroom if rep78 == `num'
    // Store the mean of mpg for this subsample to report in the table
    sum mpg if rep78 == `num'
    local mean = `r(mean)'
    outreg2 using myfile.xls, `r' keep(headroom) title(&amp;quot;Sample Graph&amp;quot;) nocons addtext(&amp;quot;Rep78&amp;quot;, `num') addstat(&amp;quot;Mean&amp;quot;, `mean') auto(2) bracket
    local r &amp;quot;append&amp;quot;
}
&lt;/code>&lt;/pre>&lt;h2 id="workflow" class="unnumbered">Workflow overview&lt;/h2>
&lt;ol>
&lt;li>Load &lt;code>voucher.dta&lt;/code> and start your log file.&lt;/li>
&lt;li>Explore the data (&lt;code>summarize&lt;/code>, &lt;code>describe&lt;/code>).&lt;/li>
&lt;li>Estimate OLS regressions (naive estimates).&lt;/li>
&lt;li>Run the first stage and check instrument relevance.&lt;/li>
&lt;li>Estimate IV models (by hand and with &lt;code>ivregress&lt;/code>).&lt;/li>
&lt;li>Compare OLS and IV results.&lt;/li>
&lt;li>Create a summary table with &lt;code>outreg2&lt;/code>.&lt;/li>
&lt;/ol>
&lt;h2 id="lab-8-worksheet" class="unnumbered">Lab 8 Worksheet&lt;/h2>
&lt;h3 id="what-do-i-submit">What do I submit?&lt;/h3>
&lt;ul>
&lt;li>Your written-up answers to exercise questions (1)–(10). This can be typed or written out then scanned (or photographed), in any reasonable format.&lt;/li>
&lt;li>The do-file you created that runs this analysis&lt;/li>
&lt;li>A log file that contains the results from this exercise.&lt;/li>
&lt;li>&lt;strong>A table with your regression results&lt;/strong> (six columns, from &lt;code>outreg2&lt;/code>). Include this with your written answers.&lt;/li>
&lt;/ul>
&lt;p>&lt;em>Use robust standard errors in all regressions.&lt;/em>&lt;/p>
&lt;h3 id="questions">Questions&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>In your do-file, start a log and open &lt;code>voucher.dta&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Summarize your data. Of the 990 students in the sample, how many were never awarded a
voucher? How many had a voucher for all four years? How many
actually attended a choice school for four years?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;div class="alert alert-note">
&lt;div>
&lt;strong>Hint:&lt;/strong> &lt;code>tab selectyrs&lt;/code> and &lt;code>tab choiceyrs&lt;/code> will show you the distribution.
&lt;/div>
&lt;/div>
&lt;ol start="3">
&lt;li>
&lt;p>Estimate the relationship between choice school attendance and math
scores by regressing math scores &lt;code>mnce&lt;/code> (dependent variable) on
number of years enrolled in a choice school &lt;code>choiceyrs&lt;/code> (independent
variable). What do you find? Is this what you expect? What happens
if you add in the variables &lt;code>black&lt;/code>, &lt;code>hispanic&lt;/code>, and &lt;code>female&lt;/code>? Write
your results in equation form.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Why might &lt;code>choiceyrs&lt;/code> be endogenous? Explain.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Now, estimate a regression of &lt;code>choiceyrs&lt;/code> (dependent variable) on
&lt;code>selectyrs&lt;/code> (independent variable), including race/ethnicity and
gender controls. Why is this a reasonable choice of an instrument?
What is the F-statistic on &lt;code>selectyrs&lt;/code>?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;div class="alert alert-note">
&lt;div>
&lt;strong>Hint:&lt;/strong> Use &lt;code>testparm selectyrs&lt;/code> after the regression to get the F-statistic. A rule of thumb is that the F-statistic should be at least 10 for the instrument to be considered strong enough.
&lt;/div>
&lt;/div>
&lt;ol start="6">
&lt;li>Based on the previous regression, use the &lt;code>predict&lt;/code> command to
generate a predicted $\widehat{choiceyrs}$. Estimate the regression
of &lt;code>mnce&lt;/code> on $\widehat{choiceyrs}$, including race/ethnicity and
gender controls. Write the estimated equation. How does your result
compare to your OLS estimate?&lt;/li>
&lt;/ol>
&lt;div class="alert alert-note">
&lt;div>
&lt;p>&lt;strong>Reminder:&lt;/strong> The &lt;code>predict&lt;/code> command generates fitted values from the most recently estimated regression. Run it immediately after the Q5 regression — before running anything else:&lt;/p>
&lt;pre tabindex="0">&lt;code>predict choiceyrs_hat, xb
&lt;/code>&lt;/pre>&lt;p>Then use &lt;code>choiceyrs_hat&lt;/code> as your independent variable in the second-stage regression.&lt;/p>
&lt;/div>
&lt;/div>
&lt;ol start="7">
&lt;li>Re-estimate a regression of &lt;code>mnce&lt;/code> (dependent variable) on
&lt;code>choiceyrs&lt;/code> (independent variable) using &lt;code>selectyrs&lt;/code> as an
instrument for &lt;code>choiceyrs&lt;/code>. This time, estimate the equation in one command line using &lt;code>ivregress 2sls&lt;/code>. How do your
results change, if at all?&lt;/li>
&lt;/ol>
&lt;div class="alert alert-note">
&lt;div>
&lt;p>&lt;strong>Example syntax:&lt;/strong>&lt;/p>
&lt;pre tabindex="0">&lt;code>ivregress 2sls mnce (choiceyrs = selectyrs) black hispanic female, robust
&lt;/code>&lt;/pre>&lt;p>&lt;strong>Important:&lt;/strong> The &lt;em>coefficients&lt;/em> from Q6 and Q7 should be the same, but the &lt;em>standard errors&lt;/em> will differ. That&amp;rsquo;s because the manual approach (Q6) doesn&amp;rsquo;t correctly account for the fact that $\widehat{choiceyrs}$ is a generated regressor. &lt;code>ivregress&lt;/code> adjusts the standard errors automatically — always use it in practice.&lt;/p>
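&lt;p>One way to see the difference, as a sketch in the notation of this lab (controls omitted for brevity): both approaches give the same $\hat\beta_1$, but they build the standard error from different residuals.
$$ \hat{u}^{manual} = mnce - \hat\beta_0 - \hat\beta_1 \widehat{choiceyrs} \qquad \text{versus} \qquad \hat{u}^{2SLS} = mnce - \hat\beta_0 - \hat\beta_1 choiceyrs $$
The 2SLS residual uses the actual $choiceyrs$, which is the residual of the structural equation; the manual second stage uses the fitted value instead, so its reported standard errors are based on the wrong residual variance.&lt;/p>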
&lt;/div>
&lt;/div>
&lt;ol start="8">
&lt;li>Repeat your IV analysis, but this time include a control for
baseline achievement by adding &lt;code>mnce90&lt;/code>. Write the results in
equation form below. Do you find these results convincing? Explain.&lt;/li>
&lt;/ol>
&lt;div class="alert alert-note">
&lt;div>
&lt;strong>Heads up:&lt;/strong> &lt;code>mnce90&lt;/code> is missing for many students — your sample will drop from 990 to about 328 observations. This is expected. Think about what it means for your results.
&lt;/div>
&lt;/div>
&lt;ol start="9">
&lt;li>
&lt;p>We can also use multiple instruments for multiple endogenous
variables. The variables &lt;code>choiceyrs1&lt;/code>, &lt;code>choiceyrs2&lt;/code>, etc. are dummy
variables for each possible number of years a student could
have been in a choice school. The variables &lt;code>selectyrs1&lt;/code>, &lt;code>selectyrs2&lt;/code>,
etc. are the corresponding dummies for the number of years selected in the
lottery.&lt;/p>
&lt;p>Here, &lt;code>choiceyrs1&lt;/code> = 1 if the student attended a choice school for exactly 1 year, &lt;code>choiceyrs2&lt;/code> = 1 for exactly 2 years, and so on. The &lt;code>selectyrs1&lt;/code>–&lt;code>selectyrs4&lt;/code> variables are defined analogously for lottery selection.&lt;/p>
&lt;p>Estimate the following equation using IV: $$\begin{split}
mnce &amp;amp;= \beta_0 + \beta_1 choiceyrs_1 + \beta_2 choiceyrs_2 + \beta_3 choiceyrs_3 + \beta_4 choiceyrs_4 \\
&amp;amp;\quad + \beta_5 black + \beta_6 hispanic + \beta_7 female + \beta_8 mnce90 + u
\end{split}$$&lt;/p>
&lt;/li>
&lt;/ol>
&lt;div class="alert alert-note">
&lt;div>
&lt;p>&lt;strong>Hint:&lt;/strong> Put all the endogenous variables on the left of the &lt;code>=&lt;/code> and all the instruments on the right:&lt;/p>
&lt;pre tabindex="0">&lt;code>ivregress 2sls mnce ///
(choiceyrs1 choiceyrs2 choiceyrs3 choiceyrs4 = ///
selectyrs1 selectyrs2 selectyrs3 selectyrs4) ///
black hispanic female mnce90, robust
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;ol start="10">
&lt;li>
&lt;p>Finally, go back through your regressions in your do-file. After
each regression (there should be six: OLS without controls, OLS with
controls, IV by hand, IV using &lt;code>ivregress&lt;/code>, IV with &lt;code>mnce90&lt;/code>, and IV
with multiple instruments), add a line of code to output the results
to a Word or Excel file using &lt;code>outreg2&lt;/code>.&lt;/p>
&lt;p>&lt;strong>Include a table with your results with your submission&lt;/strong> — there
should be six columns in one table.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;div class="alert alert-note">
&lt;div>
&lt;p>&lt;strong>Hint:&lt;/strong> Use &lt;code>replace&lt;/code> for the first regression and &lt;code>append&lt;/code> for each subsequent one:&lt;/p>
&lt;pre tabindex="0">&lt;code>regress mnce choiceyrs, robust
outreg2 using lab8_results.xls, replace

regress mnce choiceyrs black hispanic female, robust
outreg2 using lab8_results.xls, append

// ... continue for remaining regressions
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;div class="alert alert-note">
&lt;div>
&lt;p>&lt;strong>Submission checklist&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Answers to questions (1)–(10)&lt;/li>
&lt;li>Do-file with comments for each question&lt;/li>
&lt;li>Log file that matches your do-file commands&lt;/li>
&lt;li>&lt;code>outreg2&lt;/code> table (six columns)&lt;/li>
&lt;li>Make sure your do-file includes &lt;code>log close&lt;/code> at the end&lt;/li>
&lt;/ul>
&lt;/div>
&lt;/div>
&lt;p>References: Rouse, Cecilia Elena (1998), &amp;ldquo;Private School Vouchers and
Student Achievement: An Evaluation of the Milwaukee Parental Choice
Program,&amp;rdquo; &lt;em>The Quarterly Journal of Economics&lt;/em> 113(2), 553-602.&lt;/p></description></item><item><title>Research Paper: Presentation</title><link>https://econ3500s26.netlify.app/assignment/rp-07-presentation/</link><pubDate>Thu, 09 Apr 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/rp-07-presentation/</guid><description>&lt;p>&lt;strong>Due date is rolling: your slides are due the day of your assigned presentation. Because we&amp;rsquo;re running out of class, we can&amp;rsquo;t do extensions!&lt;/strong>&lt;/p>
&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>Let&amp;rsquo;s share our work! You&amp;rsquo;ll prepare and deliver a 6-8 minute presentation of your paper, with accompanying slides. This will be one presentation per paper. If you are working with a partner, it should be one set of slides, with both members contributing to the creation and delivery.&lt;/p>
&lt;p>Our objectives here are the following:&lt;/p>
&lt;ol>
&lt;li>To communicate the key elements of your paper clearly and concisely (which, in turn, will help advance your paper)&lt;/li>
&lt;li>To share and receive feedback on areas of improvement before finalizing your papers&lt;/li>
&lt;li>To share with your peers and learn what each of you has been working on!&lt;/li>
&lt;/ol>
&lt;h2 id="guidelines-for-presentation">Guidelines for presentation&lt;/h2>
&lt;p>Your presentation should cover the main elements of your paper:&lt;/p>
&lt;ol>
&lt;li>Introduction&lt;/li>
&lt;li>Research questions&lt;/li>
&lt;li>Background/motivation/related literature&lt;/li>
&lt;li>Data&lt;/li>
&lt;li>Empirical strategy&lt;/li>
&lt;li>Results&lt;/li>
&lt;li>Limitations/discussion&lt;/li>
&lt;li>Conclusion&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>It doesn&amp;rsquo;t need to be scripted, but you should have practiced, such that your presentation flows smoothly.&lt;/li>
&lt;li>Slides should serve as a guide, not a substitute for your narration, so avoid reading directly off your slides.&lt;/li>
&lt;li>It should be between 6 and 8 minutes. &lt;strong>I will cut you off at 8 minutes. I will do it.&lt;/strong>&lt;/li>
&lt;/ul>
&lt;h2 id="example">Example&lt;/h2>
&lt;div class="slides-container">
&lt;p>&lt;a href="https://econ3500s26.netlify.app/slides/rp-presentation-sample.pdf" class="btn btn-primary">📥 Download the sample slides&lt;/a>&lt;/p>
&lt;figure class="slide-thumb">
&lt;a href="https://econ3500s26.netlify.app/slides/rp-presentation-sample.pdf" target="_blank">
&lt;img src="https://econ3500s26.netlify.app/slides/rp-presentation-sample.png" alt="First slide" />
&lt;/a>
&lt;/figure>
&lt;/div>
&lt;p>Additional student examples are available on Brightspace (see &lt;strong>Gated Resources&lt;/strong>).&lt;/p>
&lt;h2 id="deliverables-and-due-dates">Deliverables and due dates&lt;/h2>
&lt;p>Your due date is the day you signed up for via Doodle poll: &lt;strong>April 28 or April 30 at 1:15pm&lt;/strong>&lt;/p>
&lt;p>&lt;strong>On Brightspace:&lt;/strong> Submit a copy of your slides &lt;strong>before&lt;/strong> your presentation.&lt;/p>
&lt;h2 id="presentation-rubric">Presentation rubric&lt;/h2>
&lt;p>
&lt;a href="../materials/RP_presentation_rubric.pdf">&lt;strong>Printer-friendly PDF&lt;/strong>&lt;/a>&lt;/p></description></item><item><title>Research Paper: Final Submission</title><link>https://econ3500s26.netlify.app/assignment/rp-08-final-submission/</link><pubDate>Thu, 09 Apr 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/rp-08-final-submission/</guid><description>&lt;h2 id="research-paper---final-submission">Research Paper - Final Submission&lt;/h2>
&lt;p>Research papers are fairly formulaic, and that&amp;rsquo;s a good thing - it helps readers know where to look for information, depending on what they want to get out of it.&lt;/p>
&lt;h2 id="what-should-i-submit">What should I submit?&lt;/h2>
&lt;p>Your paper is due at &lt;strong>11:59pm May 04&lt;/strong>. &lt;em>I can accept extensions only up to May 05&lt;/em>, as there are external grading deadlines I need to meet.&lt;/p>
&lt;p>You should submit the following (see rubric for details):&lt;/p>
&lt;ul>
&lt;li>Final paper in pdf or docx format (must include an AI attribution statement — see rubric)&lt;/li>
&lt;li>Stata do-file with all analysis you conducted&lt;/li>
&lt;li>Stata log file with results for analysis conducted in your do-file.&lt;/li>
&lt;/ul>
&lt;p>I will grade your papers following the rubric. If you would like me to share comments, you must &lt;em>opt-in&lt;/em> by filling out the
&lt;a href="https://forms.gle/kxGcNvFzzGtduGQ99" target="_blank" rel="noopener">feedback survey&lt;/a>. If you do not fill it out, you will not receive feedback!&lt;/p>
&lt;p>Review the
&lt;a href="../../bonus/research-paper-checklist">research paper checklist&lt;/a> for lots of suggestions.&lt;/p>
&lt;h2 id="rubric">Rubric&lt;/h2>
&lt;!-- Download rubric [here](../materials/RP_Rubric_F20.pdf) -->
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Total: 102 marks&lt;/th>
&lt;th>100 = Excellent&lt;/th>
&lt;th>80 = Adequate&lt;/th>
&lt;th>60 = Marginal&lt;/th>
&lt;th>40 = Poor&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Motivation/Literature (18 marks)&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Introduction&lt;/td>
&lt;td>Introduction provides complete overview of paper, motivates research question using sources&lt;/td>
&lt;td>Introduction provides some overview of paper, motivation clear with limited sources&lt;/td>
&lt;td>Introduction vague; motivation minimal&lt;/td>
&lt;td>Incomplete introduction, no motivation&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Research question&lt;/td>
&lt;td>Research question well identified, specific&lt;/td>
&lt;td>Research question stated, not specific&lt;/td>
&lt;td>Research question vague, not answerable&lt;/td>
&lt;td>Cannot identify research question in paper&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Literature&lt;/td>
&lt;td>Important literature discussed and linked to topic&lt;/td>
&lt;td>Important literature included, not linked to research question/paper&lt;/td>
&lt;td>Scattered lit. discussion, poorly linked to topic (missing or irrelevant papers)&lt;/td>
&lt;td>Sparse literature, not linked to topic&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Methodology/Analysis (30 marks)&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Data&lt;/td>
&lt;td>Clear discussion of data sources and any data cleaning; data cleaned appropriately&lt;/td>
&lt;td>Data sources referenced but incomplete discussion; some data issues overlooked&lt;/td>
&lt;td>Limited discussion of data&lt;/td>
&lt;td>No discussion of data sources or cleaning&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Empirical methods&lt;/td>
&lt;td>Methodology discussed and empirical methods applied correctly&lt;/td>
&lt;td>Methodology generally correct, with some issues overlooked&lt;/td>
&lt;td>Major errors in empirical methods&lt;/td>
&lt;td>Fundamental misunderstanding of empirical methods/no microdata used&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Discussion of results&lt;/td>
&lt;td>Results discussed and interpreted clearly&lt;/td>
&lt;td>Results discussed, but inadequate interpretation&lt;/td>
&lt;td>Results presented without interpretation&lt;/td>
&lt;td>Poor discussion of results, no interpretation&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Choice of evidence&lt;/td>
&lt;td>Presented evidence addresses research question, is well utilized&lt;/td>
&lt;td>Presented evidence related, only partially addresses research question&lt;/td>
&lt;td>Evidence related, but not directly relevant to research question&lt;/td>
&lt;td>Evidence does not address research question&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Figures and tables&lt;/td>
&lt;td>Figures and tables appropriate to analysis, easy to interpret&lt;/td>
&lt;td>Appropriate figures/tables included, difficult to interpret&lt;/td>
&lt;td>Irrelevant figures/tables included or key figures/tables missing&lt;/td>
&lt;td>Insufficient figures/tables, poorly presented&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Limitations&lt;/td>
&lt;td>Limitations discussed and minimized through analysis&lt;/td>
&lt;td>Limitations discussed, few steps to minimize&lt;/td>
&lt;td>Incomplete discussion of limitations&lt;/td>
&lt;td>No discussion of limitations&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Conclusions/interpretation (18 marks)&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Conclusions&lt;/td>
&lt;td>Clear presentation of conclusions, qualifications, consequences, and contributions&lt;/td>
&lt;td>Conclusions established, limited discussion of implications and contributions&lt;/td>
&lt;td>Fails to make clear conclusions, limited discussion of interpretation/contributions&lt;/td>
&lt;td>Cannot discern conclusions&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Critical thinking&lt;/td>
&lt;td>Demonstrates independent and critical thinking&lt;/td>
&lt;td>Demonstrates some independent and critical thinking&lt;/td>
&lt;td>Limited evidence of independent and critical thinking&lt;/td>
&lt;td>No evidence of independent and critical thinking&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Argumentation&lt;/td>
&lt;td>Assertions are qualified and well supported&lt;/td>
&lt;td>Most assertions are qualified and well supported&lt;/td>
&lt;td>Assertions are overly strong or unsupported&lt;/td>
&lt;td>Assertions made in contrast to evidence or without evidence&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Written presentation (24 marks)&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Organization&lt;/td>
&lt;td>Well organized, easy to understand&lt;/td>
&lt;td>Good organization, some parts out of place&lt;/td>
&lt;td>Unclear organization&lt;/td>
&lt;td>Disorganized, impedes understanding&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Writing style&lt;/td>
&lt;td>Clear and easy to read&lt;/td>
&lt;td>Awkward or wordy writing, clear planning&lt;/td>
&lt;td>Readable but difficult to follow&lt;/td>
&lt;td>Difficult to understand&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Grammar&lt;/td>
&lt;td>Few grammatical and typographical errors&lt;/td>
&lt;td>Some grammatical and typographical errors, but do not impede understanding&lt;/td>
&lt;td>Moderate grammatical errors/typos&lt;/td>
&lt;td>Frequent errors impede understanding&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Formatting&lt;/td>
&lt;td>Meets all formatting requirements&lt;/td>
&lt;td>Minor deviation from formatting requirements&lt;/td>
&lt;td>Exceeds page limit/major deviation from formatting requirements&lt;/td>
&lt;td>Formatting requirements completely disregarded&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Replication code (10 marks)&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Do-files and log&lt;/td>
&lt;td>Well-documented, easy to read&lt;/td>
&lt;td>Detailed documentation, somewhat confusing&lt;/td>
&lt;td>Unclear documentation&lt;/td>
&lt;td>Little to no documentation&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>AI Attribution (2 marks)&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>AI use statement&lt;/td>
&lt;td>Statement identifies all AI tools used and their specific purposes (or explicitly states no AI was used)&lt;/td>
&lt;td>—&lt;/td>
&lt;td>—&lt;/td>
&lt;td>No statement included, or statement does not identify tools and purposes&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table></description></item><item><title>Lab 7: Difference in differences</title><link>https://econ3500s26.netlify.app/assignment/07-lab/</link><pubDate>Mon, 06 Apr 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/07-lab/</guid><description>&lt;p>&lt;strong>
&lt;a href="../07-lab.pdf">Print-friendly pdf&lt;/a>&lt;/strong>&lt;/p>
&lt;h2 id="materials" class="unnumbered">Materials&lt;/h2>
&lt;ul>
&lt;li>
&lt;a href="../materials/banks.dta">&lt;code>banks.dta&lt;/code>&lt;/a>&lt;/li>
&lt;li>
&lt;a href="../materials/nsly_marijuana.dta">&lt;code>nsly_marijuana.dta&lt;/code>&lt;/a>&lt;/li>
&lt;li>Do-file template
&lt;a href="../materials/econ3500_lab_template.do">&lt;code>econ3500_lab_template.do&lt;/code>&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="objectives" class="unnumbered">Objectives&lt;/h2>
&lt;p>There are two separate parts to this lab — a set of data for working with difference-in-differences models, and another set for working with fixed-effects models.&lt;/p>
&lt;p>By the end of this lab, you should be able to complete the following
tasks in Stata:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Estimate and interpret difference-in-differences models&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Estimate panel data models using dummy variables&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Interpret panel data models&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="panel-data" class="unnumbered">What is panel data?&lt;/h3>
&lt;p>Up to now, we&amp;rsquo;ve worked with &lt;strong>cross-sectional&lt;/strong> data — one observation per person (or state, or county) at a single point in time. In this lab, we&amp;rsquo;ll work with &lt;strong>panel data&lt;/strong> (also called longitudinal data), where we observe the &lt;em>same&lt;/em> individuals or units across &lt;em>multiple&lt;/em> time periods.&lt;/p>
&lt;p>Panel data lets us control for characteristics of each unit that don&amp;rsquo;t change over time — even ones we can&amp;rsquo;t directly measure — by comparing each unit to &lt;em>itself&lt;/em> over time. This is the key idea behind &lt;strong>fixed effects&lt;/strong> models.&lt;/p>
&lt;h3 id="did-intuition" class="unnumbered">What is difference-in-differences?&lt;/h3>
&lt;p>Difference-in-differences (DiD) is a method for estimating causal effects when one group is exposed to a treatment and another is not. The idea: compare how the outcome changed over time for the &lt;strong>treatment group&lt;/strong> vs. the &lt;strong>control group&lt;/strong>. The first difference removes time-invariant characteristics of each group; the second difference removes common time trends. What&amp;rsquo;s left is the estimated treatment effect — &lt;em>if&lt;/em> the two groups would have trended the same way absent the treatment.&lt;/p>
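&lt;p>As a rough sketch of the arithmetic (the notation here is illustrative, not taken from the textbook), write $\bar{Y}$ for the average outcome in a given group and period. The DiD estimate is then:&lt;/p>
&lt;p>$\hat{\delta}_{DiD} = (\bar{Y}_{treat,post} - \bar{Y}_{treat,pre}) - (\bar{Y}_{control,post} - \bar{Y}_{control,pre})$&lt;/p>
&lt;p>With made-up numbers: if the treatment group goes from 100 to 90 and the control group goes from 80 to 78, the estimate is $(90 - 100) - (78 - 80) = -10 - (-2) = -8$.&lt;/p>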
&lt;h2 id="key-commands" class="unnumbered">Key commands &lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">command&lt;/th>
&lt;th style="text-align:right">description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">&lt;code>xtset panelvar timevar&lt;/code>&lt;/td>
&lt;td style="text-align:right">Declare your data as a panel (e.g., &lt;code>xtset id year&lt;/code>)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>xtreg y x, fe&lt;/code>&lt;/td>
&lt;td style="text-align:right">Panel regression with fixed effects on &lt;code>panelvar&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>xtreg y x, fe cluster(panelvar)&lt;/code>&lt;/td>
&lt;td style="text-align:right">Same, with clustered standard errors&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>i.varname&lt;/code>&lt;/td>
&lt;td style="text-align:right">Add fixed effects for every value of &lt;code>varname&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>xi: reg y i.varname&lt;/code>&lt;/td>
&lt;td style="text-align:right">Same as above, but works with string variables&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>areg y x, absorb(varname)&lt;/code>&lt;/td>
&lt;td style="text-align:right">Absorb fixed effects (estimated but not reported)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="xtset" class="unnumbered">Using &lt;code>xtset&lt;/code> and &lt;code>xtreg&lt;/code>&lt;/h3>
&lt;p>The &lt;code>xtset&lt;/code> command tells Stata that you have panel data. For example, if you have individual and year data, then you would enter &lt;code>xtset id year&lt;/code>, or whatever the appropriate variable names are.&lt;/p>
&lt;p>General format: &lt;code>xtset panelvar timevar&lt;/code>&lt;/p>
&lt;p>After declaring your panel with &lt;code>xtset&lt;/code>:&lt;/p>
&lt;ul>
&lt;li>Use &lt;code>xtreg&lt;/code> instead of &lt;code>regress&lt;/code> for panel regression. Everything else proceeds as normal.&lt;/li>
&lt;li>Add &lt;code>,fe&lt;/code> to estimate a fixed effects model, where the fixed effects are the &lt;code>panelvar&lt;/code> variable you declared.&lt;/li>
&lt;li>Add &lt;code>cluster(panelvar)&lt;/code> to cluster standard errors at the panel level (accounts for correlation within units over time).&lt;/li>
&lt;/ul>
&lt;p>For example: &lt;code>xtreg income education i.year, fe cluster(id)&lt;/code> regresses income on education with individual fixed effects (from &lt;code>xtset&lt;/code>) and year fixed effects (from &lt;code>i.year&lt;/code>), clustering standard errors at the individual level.&lt;/p>
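&lt;p>A minimal sketch of that sequence, assuming a dataset that contains the hypothetical variables &lt;code>id&lt;/code>, &lt;code>year&lt;/code>, &lt;code>income&lt;/code>, and &lt;code>education&lt;/code> used in the example above:&lt;/p>
&lt;pre tabindex="0">&lt;code>* Declare the panel: id identifies the unit, year the time period
xtset id year
* Summarize the panel structure (how many units, which periods)
xtdescribe
* Individual fixed effects (fe) plus year dummies, SEs clustered by id
xtreg income education i.year, fe cluster(id)
&lt;/code>&lt;/pre>&lt;p>Stata also accepts &lt;code>vce(cluster id)&lt;/code> as a longer way of writing &lt;code>cluster(id)&lt;/code>.&lt;/p>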
&lt;h3 id="other-fe" class="unnumbered">Adding other fixed effects&lt;/h3>
&lt;p>You can add fixed effects to a model more generally with the &lt;code>i.&lt;/code> prefix or &lt;code>areg&lt;/code>. A few examples:&lt;/p>
&lt;pre tabindex="0">&lt;code>xi: reg income i.educ i.bpl, robust
reg income i.educ i.bpl, robust
areg income i.educ, robust absorb(bpl)
&lt;/code>&lt;/pre>&lt;ol>
&lt;li>&lt;code>xi:&lt;/code> — this prefix is necessary for adding &lt;code>i.&lt;/code> variables if the variables are in string form. You can also use it to do fancier interactions with fixed effects, like &lt;code>xi: reg income i.educ*i.bpl, robust&lt;/code>&lt;/li>
&lt;li>You can exclude the prefix and just do &lt;code>i.var&lt;/code> to create indicator variables so long as your variable is &lt;em>numeric&lt;/em>&lt;/li>
&lt;li>You can use &lt;code>areg&lt;/code> to &amp;ldquo;absorb&amp;rdquo; a set of fixed effects: they are estimated but not reported in your output. The coefficient estimates match those from &lt;code>xtreg, fe&lt;/code> with the same fixed effects, but the reported standard errors can differ because &lt;code>areg&lt;/code> counts the absorbed fixed effects when adjusting for degrees of freedom.&lt;/li>
&lt;/ol>
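&lt;p>If a category is stored as a string, another option is to convert it to a labeled numeric variable with &lt;code>encode&lt;/code>, after which the &lt;code>i.&lt;/code> prefix works without &lt;code>xi:&lt;/code>. This is only a sketch: &lt;code>bpl_str&lt;/code> and &lt;code>bpl_num&lt;/code> are hypothetical variable names.&lt;/p>
&lt;pre tabindex="0">&lt;code>* bpl_str is a hypothetical string variable (e.g., birthplace names)
encode bpl_str, generate(bpl_num)
* bpl_num is numeric with value labels, so i. notation now works
reg income i.educ i.bpl_num, robust
&lt;/code>&lt;/pre>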
&lt;h2 id="workflow" class="unnumbered">Workflow overview&lt;/h2>
&lt;ol>
&lt;li>Load a dataset and start your log file.&lt;/li>
&lt;li>Explore the data structure (&lt;code>describe&lt;/code>, &lt;code>browse&lt;/code>, &lt;code>tab&lt;/code>).&lt;/li>
&lt;li>For Part A: Calculate the DiD estimator by hand, then estimate it as a regression.&lt;/li>
&lt;li>For Part B: Declare your panel data and estimate fixed-effects models.&lt;/li>
&lt;li>Compare results across specifications and interpret.&lt;/li>
&lt;li>Answer the worksheet questions.&lt;/li>
&lt;/ol>
&lt;h2 id="lab-7-worksheet" class="unnumbered">Lab 7 Worksheet&lt;/h2>
&lt;h3 id="what-do-i-submit">What do I submit?&lt;/h3>
&lt;ul>
&lt;li>Your written-up answers to exercise questions (1)–(18). This can be typed or written out then scanned (or photographed), in any reasonable format.&lt;/li>
&lt;li>The do-file(s) you created that run this analysis&lt;/li>
&lt;li>A log file that contains the results from this exercise.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h3 id="part-a" class="unnumbered">Part A: Difference-in-differences&lt;/h3>
&lt;p>This part looks at a simple difference-in-differences model based on Richardson and Troost (2009).&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>&lt;/p>
&lt;h4 id="data-context-a" class="unnumbered">Data context&lt;/h4>
&lt;p>Mississippi is split between two Federal Reserve Districts. During the early years of the Great Depression, each district took a different approach to bank runs. The Sixth District increased lending, while the Eighth District responded by restricting lending to threatened banks. We look at the impact of these policies on bank survival rates using difference-in-differences.&lt;/p>
&lt;p>Each row in &lt;code>banks.dta&lt;/code> represents a Federal Reserve district in a given year. The dataset is small — use &lt;code>browse&lt;/code> to see the full thing.&lt;/p>
&lt;h4 id="variables-a" class="unnumbered">Variables (Part A)&lt;/h4>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">variable&lt;/th>
&lt;th style="text-align:left">meaning&lt;/th>
&lt;th style="text-align:left">notes&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">&lt;code>district&lt;/code>&lt;/td>
&lt;td style="text-align:left">Federal Reserve district&lt;/td>
&lt;td style="text-align:left">6 or 8&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>year&lt;/code>&lt;/td>
&lt;td style="text-align:left">year&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>bib&lt;/code>&lt;/td>
&lt;td style="text-align:left">number of banks in business&lt;/td>
&lt;td style="text-align:left">outcome variable&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Tip: use &lt;code>describe&lt;/code> and &lt;code>browse&lt;/code> to confirm the variable names in your dataset.&lt;/p>
&lt;h4 id="questions">Questions&lt;/h4>
&lt;p>&lt;em>Use robust standard errors in all regressions.&lt;/em>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Start a new do-file and change directory to your working directory.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>In your do-file, start a log and open
&lt;a href="../materials/banks.dta">&lt;code>banks.dta&lt;/code>&lt;/a>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Using pencil &amp;amp; paper or electronic means of your choosing (you don&amp;rsquo;t need to do this in Stata), plot a graph of the number of banks in business, by district, by year.&lt;/p>
&lt;ul>
&lt;li>Plot number of banks in business on the y-axis and year on the x-axis.&lt;/li>
&lt;li>Include only the years 1930 and 1931.&lt;/li>
&lt;li>Draw separate lines for the numbers of banks in District 6 and District 8.&lt;/li>
&lt;li>Draw a dotted &amp;ldquo;counterfactual&amp;rdquo; line based on your understanding of the change in bank policies.&lt;/li>
&lt;li>Mark all four actual values clearly.&lt;/li>
&lt;/ul>
&lt;div class="alert alert-note">
&lt;/li>
&lt;/ol>
&lt;div>
&lt;strong>Hint:&lt;/strong> The counterfactual line shows what &lt;em>would&lt;/em> have happened to District 8 if it had followed the same trend as District 6. To draw it: start from District 8&amp;rsquo;s 1930 value and apply the same change that District 6 experienced between 1930 and 1931.
&lt;/div>
&lt;/div>
&lt;ol start="4">
&lt;li>
&lt;p>First, we&amp;rsquo;re going to calculate a difference-in-differences estimator by hand between 1930 and 1931. Using the &lt;code>browse&lt;/code> command, fill in the $x$ values in the following table:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Number of banks in business&lt;/th>
&lt;th>&lt;/th>
&lt;th>&lt;/th>
&lt;th>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>District&lt;/td>
&lt;td>1930&lt;/td>
&lt;td>1931&lt;/td>
&lt;td>1931-1930&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>District 6&lt;/td>
&lt;td>x&lt;/td>
&lt;td>x&lt;/td>
&lt;td>x&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>District 8&lt;/td>
&lt;td>x&lt;/td>
&lt;td>x&lt;/td>
&lt;td>x&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>District 8 - District 6&lt;/td>
&lt;td>x&lt;/td>
&lt;td>x&lt;/td>
&lt;td>x&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>What is the difference-in-differences estimator?&lt;/p>
&lt;div class="alert alert-note">
&lt;/li>
&lt;/ol>
&lt;div>
&lt;strong>Hint:&lt;/strong> Use &lt;code>browse&lt;/code> or &lt;code>list if year == 1930 | year == 1931&lt;/code> to see the values you need.
&lt;/div>
&lt;/div>
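&lt;p>One possible way to lay out the four values so they are easy to copy into the table, sketched with the variable names listed in the Part A variables table:&lt;/p>
&lt;pre tabindex="0">&lt;code>* Put each district's rows together, in year order
sort district year
* Show only 1930 and 1931, with a divider line between districts
list district year bib if inlist(year, 1930, 1931), sepby(district)
&lt;/code>&lt;/pre>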
&lt;ol start="5">
&lt;li>
&lt;p>Now, generate the following variables:&lt;/p>
&lt;ul>
&lt;li>&lt;code>treat&lt;/code>: a binary variable equal to 1 for District 8 and 0 otherwise&lt;/li>
&lt;li>&lt;code>post&lt;/code>: a binary variable equal to 1 for 1931 and later years, and 0 otherwise&lt;/li>
&lt;li>&lt;code>treatXpost = treat*post&lt;/code>&lt;/li>
&lt;/ul>
&lt;div class="alert alert-note">
&lt;/li>
&lt;/ol>
&lt;div>
&lt;p>&lt;strong>Hint:&lt;/strong> Use &lt;code>tab district&lt;/code> and &lt;code>tab year&lt;/code> to check the values before generating your variables. For example:&lt;/p>
&lt;pre tabindex="0">&lt;code>gen treat = district == 8
gen post = year &amp;gt;= 1931
gen treatXpost = treat * post
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;ol start="6">
&lt;li>
&lt;p>Using the above variables, estimate the impact of looser lending restrictions on the number of banks using a difference-in-differences estimator, &lt;strong>restricting the sample to 1930 and 1931&lt;/strong>. Write your estimates in equation form.&lt;/p>
&lt;div class="alert alert-note">
&lt;/li>
&lt;/ol>
&lt;div>
&lt;p>&lt;strong>Reminder:&lt;/strong> You can restrict the sample &lt;em>within&lt;/em> a regression using &lt;code>if&lt;/code> without dropping data:&lt;/p>
&lt;pre tabindex="0">&lt;code>regress bib treat post treatXpost if year == 1930 | year == 1931, robust
&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
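&lt;p>To see why this regression lines up with the by-hand calculation, it can help to write out what the model implies for each cell of the table in question 4. This is a sketch; the $\beta$ labels are just notation for the coefficients on &lt;code>treat&lt;/code>, &lt;code>post&lt;/code>, and &lt;code>treatXpost&lt;/code>.&lt;/p>
&lt;p>$bib = \beta_0 + \beta_1 treat + \beta_2 post + \beta_3 treatXpost + u$&lt;/p>
&lt;ul>
&lt;li>District 6, 1930: $\beta_0$&lt;/li>
&lt;li>District 6, 1931: $\beta_0 + \beta_2$&lt;/li>
&lt;li>District 8, 1930: $\beta_0 + \beta_1$&lt;/li>
&lt;li>District 8, 1931: $\beta_0 + \beta_1 + \beta_2 + \beta_3$&lt;/li>
&lt;/ul>
&lt;p>The change for District 8 is $\beta_2 + \beta_3$ and the change for District 6 is $\beta_2$, so the difference between the two changes is $\beta_3$, the coefficient on &lt;code>treatXpost&lt;/code>.&lt;/p>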
&lt;ol start="7">
&lt;li>
&lt;p>Now estimate the same regression (same variables), but remove the sample restriction so all years are included. What is the overall impact of looser lending restrictions on bank survival? Write your estimates in equation form.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>State clearly the assumption needed to interpret these difference-in-differences estimators as causal.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;hr>
&lt;h3 id="part-b" class="unnumbered">Part B: Fixed effects&lt;/h3>
&lt;p>Next, we&amp;rsquo;re going to look at the relationship between marijuana use and income using the National Longitudinal Survey of Youth 1997 Cohort (NLSY97).&lt;/p>
&lt;h4 id="data-context-b" class="unnumbered">Data context&lt;/h4>
&lt;p>Each row in &lt;code>nsly_marijuana.dta&lt;/code> is an individual-year observation from the NLSY97 — the same people surveyed across multiple years. This is &lt;strong>panel data&lt;/strong>: we observe the same individuals over time, which lets us control for time-invariant individual characteristics (like innate ability or family background) using fixed effects.&lt;/p>
&lt;h4 id="variables-b" class="unnumbered">Variables (Part B)&lt;/h4>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">variable&lt;/th>
&lt;th style="text-align:left">meaning&lt;/th>
&lt;th style="text-align:left">notes&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">&lt;code>id&lt;/code>&lt;/td>
&lt;td style="text-align:left">individual identifier&lt;/td>
&lt;td style="text-align:left">use with &lt;code>xtset&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>year&lt;/code>&lt;/td>
&lt;td style="text-align:left">survey year (1997–2011)&lt;/td>
&lt;td style="text-align:left">use with &lt;code>xtset&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>income&lt;/code>&lt;/td>
&lt;td style="text-align:left">total wage and salary income&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>marij&lt;/code>&lt;/td>
&lt;td style="text-align:left">used marijuana in past year&lt;/td>
&lt;td style="text-align:left">1 = yes, 0 = no&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>gender&lt;/code>&lt;/td>
&lt;td style="text-align:left">gender&lt;/td>
&lt;td style="text-align:left">1 = male, 2 = female&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>race&lt;/code>&lt;/td>
&lt;td style="text-align:left">race/ethnicity&lt;/td>
&lt;td style="text-align:left">4 categories (use &lt;code>tab race&lt;/code> to see labels)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h4 id="questions-1">Questions&lt;/h4>
&lt;ol start="9">
&lt;li>
&lt;p>Now switch to the second dataset. Open
&lt;a href="../materials/nsly_marijuana.dta">&lt;code>nsly_marijuana.dta&lt;/code>&lt;/a> in your do-file.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>If starting a new do-file, set your working directory and start a log. (You can also continue in the same do-file from Part A.)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>How many individuals are in the data? How many years are they observed?&lt;/p>
&lt;div class="alert alert-note">
&lt;/li>
&lt;/ol>
&lt;div>
&lt;strong>Hint:&lt;/strong> Try &lt;code>codebook id&lt;/code> to see the number of unique individuals, and &lt;code>tab year&lt;/code> to see which years are in the data.
&lt;/div>
&lt;/div>
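&lt;p>If you also want to know how many years each person appears in (panels are not always balanced), here is one possible sketch. The names &lt;code>n_years&lt;/code> and &lt;code>first_row&lt;/code> are made up for illustration.&lt;/p>
&lt;pre tabindex="0">&lt;code>* Number of rows (years) observed for each individual
bysort id: gen n_years = _N
* Flag one row per individual so each person is counted once
egen first_row = tag(id)
* Number of unique individuals
count if first_row
* Distribution of years observed per person
tab n_years if first_row
&lt;/code>&lt;/pre>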
&lt;ol start="12">
&lt;li>
&lt;p>Regress income on marijuana use (&lt;code>marij&lt;/code>), with no additional controls. Report your results in equation form.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Regress income on marijuana use again, but add any controls you deem important from the relatively limited selection available (use &lt;code>describe&lt;/code> to see what&amp;rsquo;s there). There is no single correct answer, so use your judgment and explain your choices. How do the results change? Report your results in equation form.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>One way to estimate fixed effects models is to use &lt;code>xtreg&lt;/code> with the &lt;code>,fe&lt;/code> option. Use &lt;code>xtset&lt;/code> to tell Stata you have panel data, then estimate a fixed-effects regression of income on marijuana use.&lt;/p>
&lt;p>Your model should include:&lt;/p>
&lt;ul>
&lt;li>Individual-level fixed effects (these come from &lt;code>xtreg ... , fe&lt;/code>)&lt;/li>
&lt;li>Year-level fixed effects (add &lt;code>i.year&lt;/code> to your regression)&lt;/li>
&lt;li>Clustered standard errors at the individual level&lt;/li>
&lt;/ul>
&lt;div class="alert alert-note">
&lt;/li>
&lt;/ol>
&lt;div>
&lt;p>&lt;strong>Step by step:&lt;/strong>&lt;/p>
&lt;pre tabindex="0">&lt;code>xtset id year
xtreg income marij i.year, fe cluster(id)
&lt;/code>&lt;/pre>&lt;p>Clustering standard errors at the &lt;code>id&lt;/code> level accounts for the fact that observations from the same person across years are not independent.&lt;/p>
&lt;/div>
&lt;/div>
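&lt;p>In equation form, the model that this command estimates can be sketched as follows (the symbols $\alpha_i$ and $\lambda_t$ are notation chosen here for illustration):&lt;/p>
&lt;p>$income_{it} = \beta_1 marij_{it} + \alpha_i + \lambda_t + u_{it}$&lt;/p>
&lt;p>Here $i$ indexes individuals and $t$ indexes years. $\alpha_i$ is a separate intercept for each individual (the fixed effect from the &lt;code>fe&lt;/code> option), and $\lambda_t$ is a separate intercept for each year (from &lt;code>i.year&lt;/code>).&lt;/p>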
&lt;ol start="15">
&lt;li>
&lt;p>What is the coefficient on &lt;code>marij&lt;/code>? What is the interpretation?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>After adding fixed effects, should you include controls for gender and race/ethnicity to reduce omitted variable bias? Why or why not?&lt;/p>
&lt;div class="alert alert-note">
&lt;/li>
&lt;/ol>
&lt;div>
&lt;strong>Think about it:&lt;/strong> What happens to a variable that &lt;em>never changes&lt;/em> within an individual when you include individual fixed effects?
&lt;/div>
&lt;/div>
&lt;ol start="17">
&lt;li>
&lt;p>How do your results in question 14 using fixed effects compare to your results in questions 12 and 13? Why do they differ?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Name one specific factor that would create omitted variable bias in the pooled OLS regressions (questions 12–13) but is controlled for by fixed effects.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;section class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1" role="doc-endnote">
&lt;p>Based on Chapter 5 of &lt;em>Mastering &amp;lsquo;Metrics&lt;/em>.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/section></description></item><item><title>Research Paper: Referee Report</title><link>https://econ3500s26.netlify.app/assignment/rp-05-referee/</link><pubDate>Wed, 01 Apr 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/rp-05-referee/</guid><description>&lt;h2 id="purpose">Purpose:&lt;/h2>
&lt;ul>
&lt;li>To practice generating constructive criticism for imperfect work&lt;/li>
&lt;li>To link what we&amp;rsquo;ve learned in class to an original analysis&lt;/li>
&lt;li>To generate useful feedback for your peers&lt;/li>
&lt;/ul>
&lt;h2 id="key-elements">Key Elements&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>Summary of the paper – to convey to the editor that you understand the paper.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Major comments&lt;/p>
&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>Big-picture changes that the author could make to improve the paper&lt;/li>
&lt;li>2-4 major comments are sufficient&lt;/li>
&lt;li>It is not enough to just criticize!&lt;/li>
&lt;/ul>
&lt;ol start="3">
&lt;li>Minor comments&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>Clarifying questions, areas that are unclear&lt;/li>
&lt;li>Small changes the author could make to improve the paper – add an additional covariate, try adjusting the specification, etc.&lt;/li>
&lt;li>Depending on paper, you could have just a few, or you could have a lot&lt;/li>
&lt;li>Don’t copy edit the paper!&lt;/li>
&lt;/ul>
&lt;h2 id="what-should-this-look-like">What should this look like?&lt;/h2>
&lt;ul>
&lt;li>All the components above
&lt;ul>
&lt;li>At least 2 major comments (substantial suggestions)&lt;/li>
&lt;li>At least 3 minor comments (minor suggestions)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Will probably be 1-2 pages
&lt;ul>
&lt;li>3 pages = overkill; &amp;lt; 1 page = dig in deeper!&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Written in a collegial tone&lt;/li>
&lt;/ul>
&lt;h2 id="submission">Submission&lt;/h2>
&lt;ul>
&lt;li>Submit your referee report as a Word or PDF document&lt;/li>
&lt;li>&lt;em>And&lt;/em> send directly to your partner&lt;/li>
&lt;/ul>
&lt;p>Note that I&amp;rsquo;m assigning referee partners at an individual level, not at a paper level. If you are working with a partner, each of you will complete a referee report, and your paper will receive two reviews.&lt;/p></description></item><item><title>Research Paper: Rough Draft</title><link>https://econ3500s26.netlify.app/assignment/rp-06-roughdraft/</link><pubDate>Wed, 01 Apr 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/rp-06-roughdraft/</guid><description>&lt;p>This is &lt;strong>optional&lt;/strong>! If you would like feedback on your rough draft, submit it to me and I&amp;rsquo;ll get back to you within a few days.&lt;/p>
&lt;p>&lt;strong>I cannot accept extensions on the rough draft.&lt;/strong> However, partial drafts are very welcome — submit whatever you have and I&amp;rsquo;ll give you feedback on it.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
Have Stata/coding questions? Struggling with framing? Rae is very available for email support — don&amp;rsquo;t hesitate to reach out to them!
&lt;/div>
&lt;/div></description></item><item><title>Lab 6: Internal validity and LPM</title><link>https://econ3500s26.netlify.app/assignment/06-lab/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/06-lab/</guid><description>&lt;p>&lt;strong>
&lt;a href="../06-lab.pdf">Print-friendly pdf&lt;/a>&lt;/strong>&lt;/p>
&lt;h2 id="materials" class="unnumbered">Materials&lt;/h2>
&lt;ul>
&lt;li>
&lt;a href="../materials/acs2024_4pct.dta">&lt;code>acs2024_4pct.dta&lt;/code>&lt;/a>&lt;/li>
&lt;li>Do-file template
&lt;a href="../materials/econ3500_lab_template.do">&lt;code>econ3500_lab_template.do&lt;/code>&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="objectives" class="unnumbered">Objectives&lt;/h2>
&lt;p>Today we&amp;rsquo;re going to work with
&lt;a href="../materials/acs2024_4pct.dta">&lt;code>acs2024_4pct.dta&lt;/code>&lt;/a>, which
contains information from the
&lt;a href="https://www.census.gov/programs-surveys/acs" target="_blank" rel="noopener">2024 American Community Survey&lt;/a>.&lt;/p>
&lt;p>By the end of this lab, you should be able to complete the following
tasks in Stata:&lt;/p>
&lt;ul>
&lt;li>Think about sample selection issues&lt;/li>
&lt;li>Estimate and interpret linear probability models&lt;/li>
&lt;li>Reason about omitted variable bias and measurement error&lt;/li>
&lt;/ul>
&lt;h3 id="data-context" class="unnumbered">Data context&lt;/h3>
&lt;p>Each row in &lt;code>acs2024_4pct.dta&lt;/code> is an individual from the 2024 ACS microdata. The file includes demographics, education, labor-force status, work hours, and earnings variables. We will restrict our sample to &lt;strong>married adults&lt;/strong> and explore the gender wage gap, labor force participation, and how sample selection affects our estimates.&lt;/p>
&lt;p>Tip: use &lt;code>describe&lt;/code>, &lt;code>codebook&lt;/code>, and &lt;code>tab ... , nolabel&lt;/code> to check labels and coding for any variables you plan to use.&lt;/p>
&lt;h3 id="variables" class="unnumbered">Variables we&amp;rsquo;ll use&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">variable&lt;/th>
&lt;th style="text-align:left">meaning&lt;/th>
&lt;th style="text-align:left">notes&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">&lt;code>incwage&lt;/code>&lt;/td>
&lt;td style="text-align:left">wage and salary income&lt;/td>
&lt;td style="text-align:left">check for topcodes (999999 = N/A)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>sex&lt;/code>&lt;/td>
&lt;td style="text-align:left">sex&lt;/td>
&lt;td style="text-align:left">1 = Male, 2 = Female&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>age&lt;/code>&lt;/td>
&lt;td style="text-align:left">age in years&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>marst&lt;/code>&lt;/td>
&lt;td style="text-align:left">marital status&lt;/td>
&lt;td style="text-align:left">1 = married spouse present, 2 = married spouse absent&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>labforce&lt;/code>&lt;/td>
&lt;td style="text-align:left">labor force status&lt;/td>
&lt;td style="text-align:left">0 = N/A, 1 = not in LF, 2 = in LF&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>uhrswork&lt;/code>&lt;/td>
&lt;td style="text-align:left">usual hours worked per week&lt;/td>
&lt;td style="text-align:left">0 = N/A (did not work last year)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>wkswork1&lt;/code>&lt;/td>
&lt;td style="text-align:left">weeks worked last year&lt;/td>
&lt;td style="text-align:left">0 = did not work&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="key-commands" class="unnumbered">Key commands &lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">command&lt;/th>
&lt;th style="text-align:right">description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">&lt;code>codebook var1&lt;/code>&lt;/td>
&lt;td style="text-align:right">Look at key details for &lt;code>var1&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>clonevar var1 = var2&lt;/code>&lt;/td>
&lt;td style="text-align:right">Make a new variable, &lt;code>var1&lt;/code>, that duplicates &lt;code>var2&lt;/code> (including labels)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>_pctile var1, per(99)&lt;/code>&lt;/td>
&lt;td style="text-align:right">Calculate the 99th percentile of &lt;code>var1&lt;/code> and store it as a returned result, &lt;code>r(r1)&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>ret list&lt;/code>&lt;/td>
&lt;td style="text-align:right">Show the results stored by the most recent command (handy!)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
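&lt;p>A short sketch of how the last two commands in the table fit together, using &lt;code>incwage&lt;/code> as the example variable. The local name &lt;code>p99&lt;/code> is made up for illustration.&lt;/p>
&lt;pre tabindex="0">&lt;code>* Compute the 99th percentile of incwage (p() is short for percentiles())
_pctile incwage, p(99)
* See what the command left behind; the percentile is in r(r1)
return list
* Copy it into a local macro so later commands can use it
local p99 = r(r1)
display "99th percentile of incwage: `p99'"
&lt;/code>&lt;/pre>&lt;p>If you run this from a do-file, run these lines together: a local macro disappears once the do-file (or the selected lines) finishes running.&lt;/p>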
&lt;h2 id="lpm" class="unnumbered">Linear Probability Models&lt;/h2>
&lt;p>What happens when our dependent variable is binary? We can use OLS anyway! Using OLS with a binary dependent variable is called a &lt;strong>linear probability model&lt;/strong> (LPM). There is plenty of debate about whether (and when) this is an okay idea, as it can produce predicted probabilities below 0 or above 1, and it violates the homoskedasticity assumption. We can address the latter by using heteroskedasticity-robust standard errors, and the general consensus &lt;em>seems&lt;/em> to be that we&amp;rsquo;re usually okay using an LPM. (Though we can do better!)&lt;/p>
&lt;p>What about interpretation? We interpret coefficients in &lt;strong>percentage points&lt;/strong> (not percents!)&lt;/p>
&lt;p>Consider the following:&lt;/p>
&lt;p>$Married_i = \beta_0 + \beta_1 age_i + \beta_2 educ_i + u_i$&lt;/p>
&lt;p>$\beta_1$ means that a 1-year increase in age is associated with a 100*$\beta_1$ &lt;strong>percentage-point change&lt;/strong> in the probability of being married. So if $\beta_1$ is 0.05, that means that being one year older is associated with a 5 percentage point increase in the likelihood of being married.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
&lt;p>&lt;strong>LPM in Stata&lt;/strong>&lt;/p>
&lt;p>A linear probability model looks exactly like a typical OLS regression — but your dependent variable is &lt;strong>binary (0/1)&lt;/strong>:&lt;/p>
&lt;pre tabindex="0">&lt;code>regress lf female, robust
&lt;/code>&lt;/pre>&lt;p>The coefficient on &lt;code>female&lt;/code> tells you the change in the &lt;strong>probability&lt;/strong> (in decimal form) of being in the labor force associated with being female. Multiply by 100 to express in percentage points.&lt;/p>
&lt;/div>
&lt;/div>
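&lt;p>To see the out-of-range predictions mentioned above, here is a minimal sketch that uses Stata&amp;rsquo;s built-in &lt;code>auto&lt;/code> data as a stand-in (an assumption; it is not the lab data). The binary outcome is &lt;code>foreign&lt;/code>, and &lt;code>phat&lt;/code> is a made-up name for the fitted probabilities:&lt;/p>
&lt;pre tabindex="0">&lt;code>sysuse auto, clear                        // built-in example data, not the lab data
regress foreign mpg weight, vce(robust)   // LPM: foreign is 0/1
predict phat                              // fitted values = predicted probabilities
summarize phat                            // compare the min and max with 0 and 1
&lt;/code>&lt;/pre>&lt;p>Any fitted value below 0 or above 1 cannot be read literally as a probability, which is the usual caution about the LPM.&lt;/p>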
&lt;p>For great slides on this (and a deeper dive), check out
&lt;a href="https://nickch-k.github.io/EconometricsSlides/Week_08/Week_08_Limited_Dependent_Variables.html" target="_blank" rel="noopener">this resource&lt;/a>!&lt;/p>
&lt;h2 id="lab-video">Lab Video&lt;/h2>
&lt;p>&lt;strong>Note that this video is from an earlier version of the lab that used 2016 data from the Current Population Survey. Details may vary, but the implementation is the same!&lt;/strong>&lt;/p>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe src="https://www.youtube.com/embed/tpYknYpmjRU" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" allowfullscreen title="YouTube Video">&lt;/iframe>
&lt;/div>
&lt;h2 id="workflow" class="unnumbered">Workflow overview&lt;/h2>
&lt;ol>
&lt;li>Load the dataset and start your log file.&lt;/li>
&lt;li>Restrict the sample (married adults only).&lt;/li>
&lt;li>Inspect and clean variables (&lt;code>codebook&lt;/code>, &lt;code>tab&lt;/code>, replace N/A codes).&lt;/li>
&lt;li>Generate binary indicators (&lt;code>female&lt;/code>, &lt;code>lf&lt;/code>).&lt;/li>
&lt;li>Run regressions, adding controls sequentially and interpreting results.&lt;/li>
&lt;li>Construct new variables (hourly wages, log wages) and analyze outliers.&lt;/li>
&lt;li>Answer the worksheet questions about internal validity throughout.&lt;/li>
&lt;/ol>
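&lt;p>The steps above can be sketched on Stata&amp;rsquo;s built-in &lt;code>nlsw88&lt;/code> data. This is only an illustration under stated assumptions: the dataset is a stand-in chosen so the snippet runs anywhere, and the log name &lt;code>lab_sketch.log&lt;/code> and the variable names &lt;code>union_member&lt;/code> and &lt;code>ln_wage&lt;/code> are made up, not the lab&amp;rsquo;s.&lt;/p>
&lt;pre tabindex="0">&lt;code>capture log close                               // close any log left open
log using &amp;quot;lab_sketch.log&amp;quot;, replace text   // step 1: start a log
sysuse nlsw88, clear                            // step 1: load data (stand-in for the ACS file)
keep if married == 1                            // step 2: restrict the sample
codebook hours                                  // step 3: inspect a variable for odd codes
generate byte union_member = (union == 1) if !missing(union)   // step 4: 0/1 indicator
regress wage union_member, vce(robust)          // step 5: baseline regression
regress wage union_member hours, vce(robust)    // step 5: add a control
generate ln_wage = ln(wage)                     // step 6: construct a new variable
regress ln_wage union_member hours, vce(robust) // step 6: re-estimate with the new outcome
log close                                       // end the log
&lt;/code>&lt;/pre>&lt;p>Watching how the coefficient on the indicator moves as controls are added is the same exercise the worksheet asks for with &lt;code>female&lt;/code>.&lt;/p>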
&lt;h2 id="lab-6-worksheet" class="unnumbered">Lab 6 Worksheet&lt;/h2>
&lt;h3 id="what-do-i-submit">What do I submit?&lt;/h3>
&lt;ul>
&lt;li>Your written-up answers to exercise questions (1)-(18). These can be typed, or handwritten and then scanned (or photographed), in any reasonable format.&lt;/li>
&lt;li>The do-file you&amp;rsquo;ve created that runs this analysis.&lt;/li>
&lt;li>A log file that contains the results from this exercise.&lt;/li>
&lt;/ul>
&lt;p>&lt;em>Use robust standard errors in all regressions.&lt;/em>&lt;/p>
&lt;p>Example:&lt;/p>
&lt;pre tabindex="0">&lt;code>regress incwage female, robust
&lt;/code>&lt;/pre>&lt;h3 id="questions">Questions&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>Open Stata, start a new do-file (or use the template). Make sure
you add code to start (and end) a log.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Open &lt;code>acs2024_4pct.dta&lt;/code> and restrict the sample to adults (age 18+) who are married (spouse present or absent). Use &lt;code>tab marst, nolabel&lt;/code> to identify the correct codes. Confirm that you have &lt;strong>59,039&lt;/strong> observations.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Check work hours (&lt;code>uhrswork&lt;/code>), weeks of work (&lt;code>wkswork1&lt;/code>), and wage income (&lt;code>incwage&lt;/code>) for any N/A codes. In this dataset, &lt;code>uhrswork == 0&lt;/code> means &amp;ldquo;did not work last year&amp;rdquo; (N/A) — replace these with missing. Also check whether &lt;code>incwage&lt;/code> has any topcode values (999999). Use the &lt;code>codebook&lt;/code> command to help (e.g. &lt;code>codebook uhrswork&lt;/code>). Ensure you have the correct means and number of observations:&lt;/p>
&lt;/li>
&lt;/ol>
&lt;pre tabindex="0">&lt;code>
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    wkswork1 |     59,039    30.70257    24.59739          0         52
    uhrswork |     37,796    39.18065    12.56177          1         99
-------------+---------------------------------------------------------
     incwage |     59,039    50505.14    84753.23          0     907000
&lt;/code>&lt;/pre>&lt;ol start="4">
&lt;li>
&lt;p>Generate a binary variable &lt;code>female&lt;/code> equal to one if &lt;code>sex == 2&lt;/code>. Estimate the impact of &lt;code>female&lt;/code> on wage income (&lt;code>incwage&lt;/code>) among your sample of married individuals. What is the interpretation of the coefficient?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>If our objective is to measure the impact of gender on wage income among married people, is sample selection bias likely to be important? Why or why not? Is measurement error likely to be important? Why or why not? If so, what is the likely impact of measurement error on your estimated coefficients?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Create a binary variable &lt;code>lf&lt;/code> equal to 1 if an individual is in the labor force (&lt;code>labforce == 2&lt;/code>), and 0 otherwise. Estimate the impact of gender on labor force status. What is the interpretation of the coefficient?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;div class="alert alert-note">
&lt;div>
&lt;strong>Reminder:&lt;/strong> This is a linear probability model! Your dependent variable (&lt;code>lf&lt;/code>) is binary, so interpret the coefficient in percentage points.
&lt;/div>
&lt;/div>
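&lt;p>As an illustration of building a 0/1 indicator and reading an LPM coefficient in percentage points, here is a minimal sketch on the built-in &lt;code>nlsw88&lt;/code> data (an assumption; it is not the ACS file, and the indicator name &lt;code>black&lt;/code> is made up for this example):&lt;/p>
&lt;pre tabindex="0">&lt;code>sysuse nlsw88, clear
tab race, nolabel                                       // find the numeric code first
generate byte black = (race == 2) if !missing(race)     // 1 if race == 2, 0 otherwise, missing stays missing
regress union black, vce(robust)                        // LPM: union is a 0/1 outcome
display &amp;quot;Percentage-point gap: &amp;quot; 100 * _b[black]   // coefficient times 100
&lt;/code>&lt;/pre>&lt;p>The &lt;code>if !missing()&lt;/code> guard matters because &lt;code>(race == 2)&lt;/code> on its own evaluates to 0 when &lt;code>race&lt;/code> is missing, which would quietly code those observations as 0.&lt;/p>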
&lt;ol start="7">
&lt;li>
&lt;p>What is the impact of being in the labor force on wage income? Based on this and the previous question, what is the implication for the direction of omitted variable bias when you estimated $incwage = \beta_0 + \beta_1 female + u$ without controlling for labor force participation status?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Re-estimate the previous regression, including a control for &lt;code>lf&lt;/code>: $incwage = \beta_0 + \beta_1 female + \beta_2 lf + u$. Was your prediction in part (7) correct?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Now, add your cleaned variable for usual hours worked to estimate $incwage = \beta_0 + \beta_1 female + \beta_2 lf + \beta_3 uhrswork + u$. What is the interpretation of each coefficient?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Why does your regression not include all 59,039 people? What type of bias might this introduce?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Is measurement error likely to be important in the previous regression, and if so, for which variables? What is the likely impact of measurement error on your estimated coefficients?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Generate a new variable &lt;code>uhrsNZ&lt;/code> that recodes all missing work hours values as zeros. You can expedite this with the &lt;code>clonevar&lt;/code> command, which retains variable labels. Re-estimate the impact of gender, labor force status and &lt;code>uhrsNZ&lt;/code> on wage income (&lt;code>incwage&lt;/code>). That is, you&amp;rsquo;re replacing &lt;code>uhrswork&lt;/code> with &lt;code>uhrsNZ&lt;/code>. What is the interpretation of &lt;em>each&lt;/em> coefficient? Why did the estimates change?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Now, re-estimate but exclude &lt;code>lf&lt;/code>: $incwage = \beta_0 + \beta_1 female + \beta_3 uhrsNZ + u$. How do your results change? Conditional on including &lt;code>female&lt;/code> and &lt;code>uhrsNZ&lt;/code>, does it make sense to include &lt;code>lf&lt;/code>?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Create a new variable that estimates log wages: &lt;code>gen l_incwage = log(incwage)&lt;/code>. Estimate the impact of gender on logged wage income, including a control for &lt;code>uhrswork&lt;/code>. How does the sample size change, and why? What is the interpretation of each coefficient?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Using the cleaned variables, calculate hourly wages: &lt;code>gen hourwage = incwage / (uhrswork * 50)&lt;/code>. We assume that people work 50 weeks in one year. What are mean hourly wages for men and women?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Estimate the impact of gender on hourly wages for those with positive hourly wages, controlling for usual hours worked (&lt;code>uhrswork&lt;/code>). Then, replace missing hourly wages with 0 for those who worked but earned no wages, and re-estimate. How does the impact of gender on earnings compare between the two regressions? Why does the sample size change?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Do outliers affect your results? Exclude observations that exceed the 99th percentile in wages based on &lt;code>incwage&lt;/code>, and re-estimate both equations from the previous question. How do your results change?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;div class="alert alert-note">
&lt;div>
&lt;p>&lt;strong>Hint:&lt;/strong>&lt;/p>
&lt;pre tabindex="0">&lt;code>_pctile incwage, per(99)
ret list
&lt;/code>&lt;/pre>&lt;p>This stores the 99th percentile value in &lt;code>r(r1)&lt;/code>, which you can use to filter observations.&lt;/p>
&lt;/div>
&lt;/div>
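&lt;p>One way to use the stored percentile as a filter, sketched on the built-in &lt;code>auto&lt;/code> data (an assumption; the variable &lt;code>price&lt;/code> and the local name &lt;code>p99&lt;/code> are placeholders). Run the lines together from a do-file so the local macro is still defined when it is used:&lt;/p>
&lt;pre tabindex="0">&lt;code>sysuse auto, clear                   // built-in data, used only for illustration
_pctile price, p(99)                 // the 99th percentile is left in r(r1)
return list                          // confirm what was stored
local p99 = r(r1)                    // copy it before another command overwrites r()
summarize price if price &amp;lt;= `p99'   // keep only observations at or below the cutoff
&lt;/code>&lt;/pre>&lt;p>Copying &lt;code>r(r1)&lt;/code> into a local first is a design choice: most estimation and summary commands replace the contents of &lt;code>r()&lt;/code>, so the value would otherwise be lost.&lt;/p>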
&lt;ol start="18">
&lt;li>Is measurement error likely to affect your dependent variable, &lt;code>hourwage&lt;/code>? Why or why not? If so, what are the implications?&lt;/li>
&lt;/ol>
&lt;div class="alert alert-note">
&lt;div>
&lt;p>&lt;strong>Submission checklist&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Answers to questions (1)-(18)&lt;/li>
&lt;li>Do-file with comments for each question&lt;/li>
&lt;li>Log file that matches your do-file commands&lt;/li>
&lt;li>Make sure your do-file includes &lt;code>log close&lt;/code> at the end&lt;/li>
&lt;/ul>
&lt;/div>
&lt;/div></description></item><item><title>Research Paper: Research Proposal</title><link>https://econ3500s26.netlify.app/assignment/rp-04-proposal/</link><pubDate>Tue, 17 Mar 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/rp-04-proposal/</guid><description>&lt;h2 id="objective" class="unnumbered">Objective&lt;/h2>
&lt;p>The goal of this submission is for you to translate your research idea and data set into the outline of a workable paper. You can think of your research proposal as a &amp;ldquo;baby paper&amp;rdquo;: a summary of what your question is, why it matters, and how you intend to answer it.&lt;/p>
&lt;p>For this assignment, &lt;strong>some basic data work is necessary.&lt;/strong> Beyond that, you may find it helpful to explore some preliminary analyses to get a better sense of which empirical specifications are feasible.&lt;/p>
&lt;h2 id="components" class="unnumbered">Components&lt;/h2>
&lt;p>While our first two assignments were fairly informal, this is a formal paper. That means that I will pay close attention to not only what ideas you present, but also &lt;em>how&lt;/em> you present them.&lt;/p>
&lt;p>Your proposal should be &lt;strong>at least 1200 words&lt;/strong> (excluding references) and include the following components:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Introduction&lt;/strong> that contains (1) a clearly stated research question (what hypotheses are you testing?) and (2) motivation: why is this important or interesting? Include at least 2 sources.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Literature review&lt;/strong>: discussion of related literature and how your paper fits in. Include at least 4 peer-reviewed sources that are &lt;em>distinct from&lt;/em> your introduction sources (i.e., at least 6 unique sources total across the introduction and literature review).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Data description&lt;/strong>: Description of data set — make sure you include the sources! You will need to have loaded and explored your data enough to produce the summary statistics table below.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Summary statistics table&lt;/strong> that includes the basic descriptive statistics that will be relevant to your analysis. Ultimately, this table will probably be the first table in your final paper. It should be &lt;strong>formatted nicely&lt;/strong> (i.e., not copied directly out of Stata) with easy-to-interpret variable names and column headers, and it will likely include the following:&lt;/p>
&lt;ol>
&lt;li>Number of observations&lt;/li>
&lt;li>Means and standard deviations of your dependent variable(s)&lt;/li>
&lt;li>Means and standard deviations of your key independent variable(s)&lt;/li>
&lt;li>If you are comparing two (or more) groups, you will want to report means separately for each group.&lt;/li>
&lt;li>Any other details that might be relevant to your data (e.g., number of states, number of years, number of households)&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Empirical specification&lt;/strong>: You &lt;strong>must&lt;/strong> include the empirical specification of the regression(s) you are estimating in equation form, along with a clear description of what each variable is.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Planned analysis&lt;/strong>: How will your results answer your research question? Make sure your assertions are qualified and well-supported.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Threats and limitations&lt;/strong>: What challenges will you face in interpreting your results? Discuss potential threats such as omitted variable bias, reverse causality, measurement error, or other violations of our assumptions, and how you might address them.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Bibliography&lt;/strong>: list all references cited in your proposal, plus any data sources.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>(Optional)&lt;/em> &lt;strong>Outline of tables&lt;/strong>: In as much detail as possible, outline the tables you plan to include (no numbers necessary)&lt;/p>
&lt;/li>
&lt;/ol>
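&lt;p>For the summary statistics table, one possible starting point in Stata is &lt;code>tabstat&lt;/code>, sketched here on the built-in &lt;code>nlsw88&lt;/code> data (a stand-in; the variables and the grouping by &lt;code>union&lt;/code> are placeholders, not a suggested topic). The output still needs to be retyped or exported into a nicely formatted table:&lt;/p>
&lt;pre tabindex="0">&lt;code>sysuse nlsw88, clear
* observations, means, and standard deviations, reported separately by group
tabstat wage hours tenure, by(union) statistics(n mean sd) columns(statistics) nototal format(%9.2f)
&lt;/code>&lt;/pre>&lt;p>Reporting the count for each variable also shows where missing values reduce the sample.&lt;/p>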
&lt;p>Your annotated bibliography will help you push forward your motivation and the literature review.&lt;/p>
&lt;p>Keep in mind that the more detail you include, the better the feedback you&amp;rsquo;ll receive! A classmate will peer review your proposal and give feedback to help you turn it into a final paper.&lt;/p>
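&lt;p>As a purely hypothetical illustration of the empirical specification component (the variables are placeholders, not a suggested topic), an equation-form specification with its variable definitions might look like this:&lt;/p>
&lt;p>$wage_i = \beta_0 + \beta_1 union_i + \beta_2 educ_i + \beta_3 exper_i + u_i$&lt;/p>
&lt;p>Here $wage_i$ is the hourly wage of person $i$, $union_i$ equals 1 if person $i$ is a union member and 0 otherwise, $educ_i$ and $exper_i$ are years of schooling and years of work experience, and $u_i$ is the error term. The coefficient of interest is $\beta_1$: the difference in average hourly wages associated with union membership, holding education and experience constant.&lt;/p>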
&lt;h2 id="rubric" class="unnumbered">Rubric&lt;/h2>
&lt;p>This assignment is worth &lt;strong>38 points&lt;/strong>. Each criterion is scored on a 5-level scale (does not meet → fully meets), with proportional points at each level.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:center">Component&lt;/th>
&lt;th style="text-align:left">Criteria&lt;/th>
&lt;th style="text-align:center">Points&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:left">&lt;strong>Introduction&lt;/strong>&lt;/td>
&lt;td style="text-align:center">&lt;strong>6&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">1&lt;/td>
&lt;td style="text-align:left">Research question is clearly stated, specific, and answerable&lt;/td>
&lt;td style="text-align:center">2&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">1&lt;/td>
&lt;td style="text-align:left">Question is well-motivated using at least 2 sources&lt;/td>
&lt;td style="text-align:center">4&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:left">&lt;strong>Literature review&lt;/strong>&lt;/td>
&lt;td style="text-align:center">&lt;strong>4&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">2&lt;/td>
&lt;td style="text-align:left">Important literature discussed and linked to research question; at least 4 peer-reviewed academic sources (distinct from introduction sources)&lt;/td>
&lt;td style="text-align:center">4&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:left">&lt;strong>Data &amp;amp; summary statistics&lt;/strong>&lt;/td>
&lt;td style="text-align:center">&lt;strong>6&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">3&lt;/td>
&lt;td style="text-align:left">Data set(s) clearly identified, appropriate, and cited&lt;/td>
&lt;td style="text-align:center">2&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">4&lt;/td>
&lt;td style="text-align:left">Summary statistics table includes key variables; clearly and accurately conveys info in a formatted table&lt;/td>
&lt;td style="text-align:center">4&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:left">&lt;strong>Empirical strategy&lt;/strong>&lt;/td>
&lt;td style="text-align:center">&lt;strong>10&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">5&lt;/td>
&lt;td style="text-align:left">Empirical specification clearly stated (equation form), variables defined&lt;/td>
&lt;td style="text-align:center">4&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">6&lt;/td>
&lt;td style="text-align:left">Empirical strategy and proposed analysis show critical thinking&lt;/td>
&lt;td style="text-align:center">4&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">6&lt;/td>
&lt;td style="text-align:left">Assertions are qualified and well-supported (e.g., avoid overstating causal claims)&lt;/td>
&lt;td style="text-align:center">2&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:left">&lt;strong>Threats &amp;amp; limitations&lt;/strong>&lt;/td>
&lt;td style="text-align:center">&lt;strong>2&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">7&lt;/td>
&lt;td style="text-align:left">Discusses potential threats to interpretation and potential solutions&lt;/td>
&lt;td style="text-align:center">2&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:left">&lt;strong>Presentation &amp;amp; formatting&lt;/strong>&lt;/td>
&lt;td style="text-align:center">&lt;strong>10&lt;/strong>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">8&lt;/td>
&lt;td style="text-align:left">Cited references included in properly formatted bibliography&lt;/td>
&lt;td style="text-align:center">2&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">—&lt;/td>
&lt;td style="text-align:left">Meets formatting requirements (length, paragraphs, etc.)&lt;/td>
&lt;td style="text-align:center">4&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">—&lt;/td>
&lt;td style="text-align:left">Writing style clear and easy to read; few grammatical or typographical errors&lt;/td>
&lt;td style="text-align:center">4&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:left">&lt;strong>Total&lt;/strong>&lt;/td>
&lt;td style="text-align:center">&lt;strong>38&lt;/strong>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="submission-requirements" class="unnumbered">Submission requirements&lt;/h2>
&lt;p>Your research proposal should be written in paragraph form (i.e. complete sentences, not bullet points), copy-edited for grammatical/spelling errors, and submitted as a Word or PDF document.&lt;/p>
&lt;h2 id="examples" class="unnumbered">Examples&lt;/h2>
&lt;p>See Brightspace for example proposals. These are not perfect, but they are all of high quality.&lt;/p></description></item><item><title>Lab 5: Merging and hypothesis tests</title><link>https://econ3500s26.netlify.app/assignment/05-lab/</link><pubDate>Mon, 09 Mar 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/05-lab/</guid><description>&lt;p>&lt;strong>
&lt;a href="../05-lab.pdf">Print-friendly pdf&lt;/a>&lt;/strong>&lt;/p>
&lt;h2 id="materials" class="unnumbered">Materials&lt;/h2>
&lt;ul>
&lt;li>
&lt;a href="../materials/acs2024_4pct.dta">&lt;code>acs2024_4pct.dta&lt;/code>&lt;/a>&lt;/li>
&lt;li>Do-file template
&lt;a href="../materials/econ3500_lab_template.do">&lt;code>econ3500_lab_template.do&lt;/code>&lt;/a>&lt;/li>
&lt;li>BLS county unemployment data
&lt;a href="../materials/laucnty24.xlsx">&lt;code>laucnty24.xlsx&lt;/code>&lt;/a> &lt;em>(or
&lt;a href="https://www.bls.gov/lau/tables.htm" target="_blank" rel="noopener">download from BLS&lt;/a>)&lt;/em>&lt;/li>
&lt;/ul>
&lt;h2 id="objectives" class="unnumbered">Objectives&lt;/h2>
&lt;p>Today we&amp;rsquo;re going to work with
&lt;a href="../materials/acs2024_4pct.dta">&lt;code>acs2024_4pct.dta&lt;/code>&lt;/a>, which
contains information from the
&lt;a href="https://www.census.gov/programs-surveys/acs" target="_blank" rel="noopener">2024 American Community Survey&lt;/a>. &lt;em>Note that this is a different version from what we have been using! It has a few more variables and also a larger sample.&lt;/em>&lt;/p>
&lt;p>We&amp;rsquo;re going to merge in county-level unemployment rates from the
&lt;a href="https://www.bls.gov/lau/tables.htm" target="_blank" rel="noopener">Bureau of Labor Statistics&lt;/a>, so that each person in the ACS data is matched to their county&amp;rsquo;s unemployment rate.&lt;/p>
&lt;p>By the end of this lab, you should be able to complete the following tasks in Stata:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Import data from Excel&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Merge data sets&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Test hypotheses for linear combinations of coefficients&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Test the general significance of a regression&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="data-context" class="unnumbered">Data context&lt;/h3>
&lt;p>Each row in &lt;code>acs2024_4pct.dta&lt;/code> is an individual from the 2024 ACS microdata. The file includes demographics, education, labor-force status, earnings, and geographic identifiers at the state and county level. The BLS county unemployment file (&lt;code>laucnty24.xlsx&lt;/code>) contains 2024 annual average labor force statistics for every U.S. county.&lt;/p>
&lt;p>We will merge the two datasets by county, matching on state and county FIPS codes.&lt;/p>
&lt;h3 id="variables" class="unnumbered">Variables we&amp;rsquo;ll use&lt;/h3>
&lt;p>&lt;strong>ACS data (&lt;code>acs2024_4pct.dta&lt;/code>)&lt;/strong>&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">variable&lt;/th>
&lt;th style="text-align:left">meaning&lt;/th>
&lt;th style="text-align:left">notes&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">&lt;code>inctot&lt;/code>&lt;/td>
&lt;td style="text-align:left">total personal income&lt;/td>
&lt;td style="text-align:left">9999999 = N/A; replace before analysis&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>educ&lt;/code>&lt;/td>
&lt;td style="text-align:left">educational attainment&lt;/td>
&lt;td style="text-align:left">numeric categories; check labels with &lt;code>tab educ, nolabel&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>labforce&lt;/code>&lt;/td>
&lt;td style="text-align:left">labor force status&lt;/td>
&lt;td style="text-align:left">2 = in labor force (check with &lt;code>tab labforce, nolabel&lt;/code>)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>age&lt;/code>&lt;/td>
&lt;td style="text-align:left">age&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>statefip&lt;/code>&lt;/td>
&lt;td style="text-align:left">state FIPS code&lt;/td>
&lt;td style="text-align:left">used for merging&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>countyfip&lt;/code>&lt;/td>
&lt;td style="text-align:left">county FIPS code&lt;/td>
&lt;td style="text-align:left">0 = not identified; used for merging&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>BLS data (&lt;code>laucnty24.xlsx&lt;/code>)&lt;/strong>&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">column&lt;/th>
&lt;th style="text-align:left">meaning&lt;/th>
&lt;th style="text-align:left">notes&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">State FIPS Code&lt;/td>
&lt;td style="text-align:left">2-digit state code&lt;/td>
&lt;td style="text-align:left">imported as string; needs &lt;code>destring&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">County FIPS Code&lt;/td>
&lt;td style="text-align:left">3-digit county code&lt;/td>
&lt;td style="text-align:left">imported as string; needs &lt;code>destring&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">County Name/State Abbreviation&lt;/td>
&lt;td style="text-align:left">county name&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Labor Force&lt;/td>
&lt;td style="text-align:left">total county labor force&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Employed&lt;/td>
&lt;td style="text-align:left">county employed count&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Unemployed&lt;/td>
&lt;td style="text-align:left">county unemployed count&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Unemployment Rate (%)&lt;/td>
&lt;td style="text-align:left">county unemployment rate&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="key-commands" class="unnumbered">Key commands&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">command&lt;/th>
&lt;th style="text-align:right">description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">&lt;strong>Importing data&lt;/strong>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>import excel using &amp;quot;file.xlsx&amp;quot;, firstrow clear&lt;/code>&lt;/td>
&lt;td style="text-align:right">Import an Excel file. &lt;code>firstrow&lt;/code> uses row 1 as variable names. &lt;code>clear&lt;/code> erases existing data.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>import excel using &amp;quot;file.xlsx&amp;quot;, cellrange(A2) firstrow clear&lt;/code>&lt;/td>
&lt;td style="text-align:right">Same, but start reading at cell A2, so row 2 supplies the variable names (useful when row 1 is a title rather than column headers).&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;strong>Identifying duplicates&lt;/strong>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>duplicates list var1 var2&lt;/code>&lt;/td>
&lt;td style="text-align:right">List any observations that are duplicates on the listed variables.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>duplicates tag var1 var2, gen(d1)&lt;/code>&lt;/td>
&lt;td style="text-align:right">Generate a new variable, &lt;code>d1&lt;/code>, that counts how many other observations share the same values of &lt;code>var1&lt;/code> and &lt;code>var2&lt;/code> (0 means the observation is unique).&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;strong>Merging datasets&lt;/strong>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>merge 1:1 var1 var2 using file2&lt;/code>&lt;/td>
&lt;td style="text-align:right">One-to-one merge on &lt;code>var1&lt;/code> and &lt;code>var2&lt;/code>. No duplicates allowed in either dataset.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>merge m:1 var1 var2 using file2&lt;/code>&lt;/td>
&lt;td style="text-align:right">Many-to-one merge on &lt;code>var1&lt;/code> and &lt;code>var2&lt;/code>. Duplicates OK in master data (like merging county data into individual data) but not in using data.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;strong>Converting between string and numeric variables&lt;/strong>&lt;/td>
&lt;td style="text-align:right">&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>destring var1, gen(newvar)&lt;/code>&lt;/td>
&lt;td style="text-align:right">Convert a string variable to numeric, saving as &lt;code>newvar&lt;/code>.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>destring var1, replace&lt;/code>&lt;/td>
&lt;td style="text-align:right">Convert a string variable to numeric, replacing the original.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>tostring var2, gen(string_var)&lt;/code>&lt;/td>
&lt;td style="text-align:right">Convert a numeric variable to string, saving as &lt;code>string_var&lt;/code>.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;strong>Statistical tests&lt;/strong>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>test var1 = var2&lt;/code>&lt;/td>
&lt;td style="text-align:right">Run after a regression. Tests whether the coefficient on &lt;code>var1&lt;/code> equals the coefficient on &lt;code>var2&lt;/code>.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>testparm var1 var2 ... &lt;/code>&lt;/td>
&lt;td style="text-align:right">Run after a regression. Tests whether the coefficients on all listed variables are jointly equal to zero.&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
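&lt;p>To show the two test commands in context, here is a minimal sketch on Stata&amp;rsquo;s built-in &lt;code>auto&lt;/code> data (an assumption; the regression is only a vehicle for the syntax and has nothing to do with the lab data):&lt;/p>
&lt;pre tabindex="0">&lt;code>sysuse auto, clear
regress price mpg weight length, vce(robust)
test mpg = weight              // H0: the coefficients on mpg and weight are equal
testparm mpg weight length     // H0: all three coefficients are zero
&lt;/code>&lt;/pre>&lt;p>Listing every regressor in &lt;code>testparm&lt;/code> tests the overall significance of the regression. Both commands use the most recent estimation results, so they must come right after the regression they refer to.&lt;/p>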
&lt;h3 id="a-note-on-temporary-files-optional">A note on temporary files (optional)&lt;/h3>
&lt;p>This exercise works by having two data sets stored on your hard drive, then running a &lt;code>merge&lt;/code> command to unite them. You might notice that the workflow feels clunky and generates extra files — open a data set, save it, open another data set, then merge in the first data set.&lt;/p>
&lt;p>You can use temporary files to speed things up! Stata saves a temporary file in a temporary folder and deletes it automatically when the do-file finishes, and you refer to it the same way we refer to local macros (&lt;code>`name'&lt;/code>). Everything has to be run together in one do-file for this to work.&lt;/p>
&lt;p>A short example (you can paste this in a do-file and run it, as it uses built-in Stata files):&lt;/p>
&lt;pre tabindex="0">&lt;code>
tempfile tempauto // Declare tempfile (needs to run before you try to save)
webuse autosize, clear
save `tempauto', replace // save to temp file
webuse autoexpense, clear
merge 1:1 make using `tempauto' // call tempfile
tab _merge // check out merge
list
&lt;/code>&lt;/pre>&lt;h2 id="workflow" class="unnumbered">Workflow overview&lt;/h2>
&lt;ol>
&lt;li>Import the BLS county unemployment data from Excel.&lt;/li>
&lt;li>Clean variables and save as a Stata data file.&lt;/li>
&lt;li>Open the ACS data and restrict the sample.&lt;/li>
&lt;li>Merge in county-level unemployment by state and county FIPS codes.&lt;/li>
&lt;li>Create education indicators and run regressions.&lt;/li>
&lt;li>Conduct hypothesis tests.&lt;/li>
&lt;/ol>
&lt;h2 id="lab-video">Lab Video&lt;/h2>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe src="https://www.youtube.com/embed/umVrYbXrpoU" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" allowfullscreen title="YouTube Video">&lt;/iframe>
&lt;/div>
&lt;h2 id="lab-5-worksheet">Lab 5 Worksheet&lt;/h2>
&lt;h3 id="what-do-i-submit">What do I submit?&lt;/h3>
&lt;ul>
&lt;li>Your written-up answers to the exercise questions. These can be typed, or handwritten and then scanned (or photographed), in any reasonable format. &lt;em>Note: Question 21 is optional.&lt;/em>&lt;/li>
&lt;li>The do-file you&amp;rsquo;ve created that runs this analysis.&lt;/li>
&lt;li>A log file that contains the results from this exercise.&lt;/li>
&lt;/ul>
&lt;h3 id="exercises">Exercises&lt;/h3>
&lt;p>&lt;strong>Part 1: Import and prepare unemployment data&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Visit
&lt;a href="https://www.bls.gov/lau/tables.htm" target="_blank" rel="noopener">https://www.bls.gov/lau/tables.htm&lt;/a> to access 2024 annual &lt;strong>county-level&lt;/strong> unemployment rates. Download the appropriate table as an Excel file.&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>&lt;/p>
&lt;p>a. Open the file in Excel or another spreadsheet program. Notice that the first row contains a title and the actual column headers start in the second row.&lt;/p>
&lt;p>b. You do not need to edit the file — we&amp;rsquo;ll handle everything in Stata.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;ol start="2">
&lt;li>
&lt;p>Open Stata and start a new do-file using the
&lt;a href="../materials/econ3500_lab_template.do">template&lt;/a>. Update the file paths and add code to start (and end) a log.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Import your unemployment Excel file into Stata. Because the first row is a title (not column headers), use the &lt;code>cellrange&lt;/code> option to start reading from row 2:&lt;/p>
&lt;pre tabindex="0">&lt;code>import excel using &amp;quot;laucnty24.xlsx&amp;quot;, cellrange(A2) firstrow clear
&lt;/code>&lt;/pre>&lt;p>Run &lt;code>describe&lt;/code> to see the variable names Stata assigned. How many observations (counties) are there?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The FIPS code variables were imported as &lt;strong>strings&lt;/strong> (text), not numbers. Convert them to numeric variables so they match the ACS data:&lt;/p>
&lt;pre tabindex="0">&lt;code>destring StateFIPSCode, gen(statefip)
destring CountyFIPSCode, gen(countyfip)
&lt;/code>&lt;/pre>&lt;p>&lt;em>(If Stata named your variables differently, check with &lt;code>describe&lt;/code> and adjust accordingly.)&lt;/em>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Check for duplicates on &lt;code>statefip&lt;/code> and &lt;code>countyfip&lt;/code>. Are there any? &lt;em>(There shouldn&amp;rsquo;t be — each county should appear exactly once.)&lt;/em>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Save your unemployment data as a Stata file:&lt;/p>
&lt;pre tabindex="0">&lt;code>save &amp;quot;unemp_2024.dta&amp;quot;, replace
&lt;/code>&lt;/pre>&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Part 2: Merge with ACS data&lt;/strong>&lt;/p>
&lt;ol start="7">
&lt;li>
&lt;p>Open &lt;code>acs2024_4pct.dta&lt;/code> and restrict the sample to adults (age 18+).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Before merging, take a look at the county identifier in the ACS data. Tabulate &lt;code>countyfip&lt;/code>. What do you notice about the value 0?&lt;sup id="fnref:2">&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref">2&lt;/a>&lt;/sup>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;ol start="9">
&lt;li>
&lt;p>Now, merge your unemployment data into the ACS by county:&lt;/p>
&lt;pre tabindex="0">&lt;code>merge m:1 statefip countyfip using &amp;quot;unemp_2024.dta&amp;quot;
&lt;/code>&lt;/pre>&lt;p>a. Why do we use &lt;code>m:1&lt;/code> (many-to-one) instead of &lt;code>1:1&lt;/code>?&lt;/p>
&lt;p>b. Tabulate the &lt;code>_merge&lt;/code> variable. What share of observations successfully merged?&lt;sup id="fnref:3">&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref">3&lt;/a>&lt;/sup>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;ol start="10">
&lt;li>Drop any unmatched observations (you can use &lt;code>drop if&lt;/code>) and drop the &lt;code>_merge&lt;/code> variable. What is the average unemployment rate for the sample? Why would this differ from an average of the county unemployment rates in your Excel file?&lt;/li>
&lt;/ol>
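&lt;p>The merge check and clean-up in the last two questions can be sketched as follows. This assumes you have just run the &lt;code>merge&lt;/code> command above. The variable name &lt;code>unemp_rate&lt;/code> is a placeholder made up for illustration; use whatever name &lt;code>describe&lt;/code> shows for the unemployment rate in your data.&lt;/p>
&lt;pre tabindex="0">&lt;code>* _merge == 3 marks observations that appear in both files
tabulate _merge

* Keep only matched observations, then drop the bookkeeping variable
drop if _merge != 3
drop _merge

* Mean across people in the merged sample (placeholder variable name)
summarize unemp_rate
&lt;/code>&lt;/pre>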
&lt;p>&lt;strong>Part 3: Education variables and regression&lt;/strong>&lt;/p>
&lt;ol start="11">
&lt;li>
&lt;p>Why can&amp;rsquo;t we use &lt;code>educ&lt;/code> directly as a linear variable in a regression?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Generate three dummy variables. These three variables should be mutually exclusive, and they should not be missing for any observations.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>lesshs&lt;/code>, a variable equal to one if a person completed &lt;em>less than&lt;/em> a high school diploma&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>hsgrad&lt;/code>, a variable equal to one if a person completed at least a high school diploma but less than a Bachelor&amp;rsquo;s degree&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>colgrad&lt;/code>, a variable equal to one if a person completed a Bachelor&amp;rsquo;s degree or higher&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>&lt;em>Note:&lt;/em> Education is coded with &lt;strong>labels,&lt;/strong> which means that it is numeric data with a description of what each number means on top. These show up as blue in the Stata browser. To see the underlying codes: &lt;code>tab educ, nolabel&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>What is the mean of &lt;code>lesshs&lt;/code>, &lt;code>hsgrad&lt;/code>, and &lt;code>colgrad&lt;/code>?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Before running a regression, check &lt;code>inctot&lt;/code> (total personal income) for N/A codes. Replace any N/A values as missing.&lt;sup id="fnref:4">&lt;a href="#fn:4" class="footnote-ref" role="doc-noteref">4&lt;/a>&lt;/sup> Then estimate a regression of total personal income on education, using the binary variables you just created. Omit &lt;code>lesshs&lt;/code>. Use robust standard errors.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Part 4: Hypothesis tests&lt;/strong>&lt;/p>
&lt;ol start="15">
&lt;li>
&lt;p>Set up a hypothesis test for whether both &lt;code>hsgrad&lt;/code> and &lt;code>colgrad&lt;/code> are jointly significant. Report the null hypothesis, alternative hypothesis, test statistic, and conclusion.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Set up a hypothesis test for whether the returns to being a high-school graduate are the same as the returns to being a college graduate. Report the null hypothesis, alternative hypothesis, test statistic, and conclusion.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Is this regression significant overall? Explain how you know.&lt;/p>
&lt;/li>
&lt;/ol>
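&lt;p>For the tests in this part, Stata&amp;rsquo;s &lt;code>test&lt;/code> command runs Wald tests on the most recently estimated model. A minimal sketch, assuming the regression from Part 3 is the model you want to test:&lt;/p>
&lt;pre tabindex="0">&lt;code>* Re-estimate so the results are in memory
regress inctot hsgrad colgrad, robust

* Joint test that both coefficients equal zero
test hsgrad colgrad

* Test that the two coefficients equal each other
test hsgrad = colgrad
&lt;/code>&lt;/pre>&lt;p>Each &lt;code>test&lt;/code> call reports an F statistic and its p-value. You still need to write out the hypotheses and the conclusion yourself.&lt;/p>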
&lt;p>&lt;strong>Part 5: Adding unemployment&lt;/strong>&lt;/p>
&lt;ol start="18">
&lt;li>
&lt;p>Now add county-level unemployment rate to the previous equation. What is the interpretation of the coefficient on unemployment? Is it statistically significant?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Estimate the same equation by regressing total personal income on the education binary variables and county-level unemployment, restricting to those who are currently in the labor force. How does this change the coefficient on unemployment?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Identify three &lt;em>state&lt;/em> or &lt;em>county-level&lt;/em> variables that are likely to cause omitted variable bias if you want to know whether unemployment affects individual income.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>(Optional)&lt;/em> For &lt;em>one&lt;/em> of the variables you listed above, find the data online, import into Stata, and merge it in. Regress total personal income on the education binary variables, county-level unemployment, and the new variable you found. Restrict your sample to those who are currently in the labor force. How does the inclusion of your new variable affect the coefficient on unemployment?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;section class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1" role="doc-endnote">
&lt;p>If you have trouble accessing the BLS website, you can use the file provided in the lab materials above.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:2" role="doc-endnote">
&lt;p>In IPUMS data, &lt;code>countyfip = 0&lt;/code> means the county is &lt;strong>not identified&lt;/strong> — the Census Bureau withholds county identifiers for small counties to protect confidentiality. These observations cannot be matched to BLS data.&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:3" role="doc-endnote">
&lt;p>Expect roughly 40–60% of observations to match. The main reason for non-matches is that many ACS respondents have &lt;code>countyfip = 0&lt;/code> (county not identified).&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:4" role="doc-endnote">
&lt;p>Use &lt;code>summarize inctot&lt;/code> to check for suspicious values. In IPUMS data, 9999999 typically means N/A.&amp;#160;&lt;a href="#fnref:4" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/section></description></item><item><title>Lab 4: Multivariate Regression</title><link>https://econ3500s26.netlify.app/assignment/04-lab/</link><pubDate>Thu, 19 Feb 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/04-lab/</guid><description>&lt;p>&lt;strong>
&lt;a href="../04-lab.pdf">Print-friendly pdf&lt;/a>&lt;/strong>&lt;/p>
&lt;h2 id="materials" class="unnumbered">Materials&lt;/h2>
&lt;ul>
&lt;li>
&lt;a href="../materials/acs2024_2pct.dta">&lt;code>acs2024_2pct.dta&lt;/code>&lt;/a>&lt;/li>
&lt;li>Do-file template
&lt;a href="../materials/econ3500_lab_template.do">&lt;code>econ3500_lab_template.do&lt;/code>&lt;/a>&lt;/li>
&lt;li>Looping exercise
&lt;a href="../materials/loop_example.do">&lt;code>loop_example.do&lt;/code>&lt;/a>&lt;/li>
&lt;/ul>
&lt;!-- - Sample from class [`lab4_sample.do`](../materials/lab4_sample.do) -->
&lt;h2 id="objectives" class="unnumbered">Objectives&lt;/h2>
&lt;p>Today we&amp;rsquo;re going to work with &lt;code>acs2024_2pct.dta&lt;/code>, which
contains information from the
&lt;a href="https://www.census.gov/programs-surveys/acs" target="_blank" rel="noopener">2024 American Community Survey&lt;/a>. We used this in Lab 02!&lt;/p>
&lt;p>By the end of this lab, you should be able to complete the following
tasks in Stata:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Estimate and interpret multiple linear regression in levels, using continuous and binary independent variables, and use heteroskedasticity-robust standard errors.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Interpret the results of multivariate linear regressions in terms of
statistical and economic significance&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Practice generating binary variables from categorical measures&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Set up basic loops&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use &lt;code>xi&lt;/code> and &lt;code>i.&lt;/code> prefix to include a lot of binary indicator variables at once.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="data-context" class="unnumbered">Data context&lt;/h3>
&lt;p>Each row in &lt;code>acs2024_2pct.dta&lt;/code> is an individual from the 2024 ACS microdata. The file includes demographics, education, labor-force status, and earnings variables. We will focus on variables like &lt;code>incwage&lt;/code>, &lt;code>educ&lt;/code>, &lt;code>labforce&lt;/code>, &lt;code>statefip&lt;/code>, &lt;code>race&lt;/code>, &lt;code>hispan&lt;/code>, and &lt;code>age&lt;/code>.&lt;/p>
&lt;p>Tip: use &lt;code>describe&lt;/code> and &lt;code>codebook&lt;/code> to check labels and coding for any variables you plan to use.&lt;/p>
&lt;h3 id="variables" class="unnumbered">Variables we&amp;rsquo;ll use&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">variable&lt;/th>
&lt;th style="text-align:left">meaning&lt;/th>
&lt;th style="text-align:left">notes&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">&lt;code>incwage&lt;/code>&lt;/td>
&lt;td style="text-align:left">wage and salary income&lt;/td>
&lt;td style="text-align:left">check labels for topcodes or missing values&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>educ&lt;/code>&lt;/td>
&lt;td style="text-align:left">educational attainment&lt;/td>
&lt;td style="text-align:left">numeric categories; check labels&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>labforce&lt;/code>&lt;/td>
&lt;td style="text-align:left">labor force status&lt;/td>
&lt;td style="text-align:left">values like 0/1/2 (check labels)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>race&lt;/code>&lt;/td>
&lt;td style="text-align:left">race code&lt;/td>
&lt;td style="text-align:left">use to build indicators&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>hispan&lt;/code>&lt;/td>
&lt;td style="text-align:left">Hispanic origin&lt;/td>
&lt;td style="text-align:left">use to build indicators&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>age&lt;/code>&lt;/td>
&lt;td style="text-align:left">age&lt;/td>
&lt;td style="text-align:left">used to construct year of birth&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>statefip&lt;/code>&lt;/td>
&lt;td style="text-align:left">state&lt;/td>
&lt;td style="text-align:left">use with &lt;code>i.statefip&lt;/code>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>👁️ Tip: &lt;code>codebook race&lt;/code> is a quick way to check variable labels for &lt;code>race&lt;/code>! 👁️&lt;/p>
&lt;h2 id="key-commands" class="unnumbered">Key commands &lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">command&lt;/th>
&lt;th style="text-align:right">description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">&lt;code>regress var1 var2...&lt;/code>&lt;/td>
&lt;td style="text-align:right">Estimate a regression, with &lt;code>var1&lt;/code> as the dependent variable and all others as the independent variable(s)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>tabulate var1, nolabel&lt;/code>&lt;/td>
&lt;td style="text-align:right">Tabulate variables &lt;em>without&lt;/em> labels&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>replace var1 = . if var1 == 999999&lt;/code>&lt;/td>
&lt;td style="text-align:right">Replace &lt;code>var1&lt;/code> as missing (using a dot) if &lt;code>var1&lt;/code> is equal to 999999. Can be replaced with any other values or logical expressions.&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="creating-binary-variables">Creating binary variables&lt;/h3>
&lt;p>Recall that there are two easy ways to make binary
variables out of categorical or continuous variables. Consider the
variable &lt;code>race&lt;/code>, where 1 = White, 2 = Black, 3 = Native American, etc.
Suppose you want to generate a binary indicator for whether a person is White.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>gen white = race == 1&lt;/code>: generates a variable equal to 1 if &lt;code>race&lt;/code>
is 1, and 0 otherwise&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>gen white = 1 if race == 1&lt;/code>: generates a variable equal to 1 if
&lt;code>race&lt;/code> is 1, and &lt;strong>missing&lt;/strong> otherwise. To complete this you need
two lines of code:&lt;br>
&lt;code>gen white = 1 if race == 1&lt;/code>&lt;br>
&lt;code>replace white = 0 if race != 1&lt;/code>&lt;/p>
&lt;/li>
&lt;/ul>
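&lt;p>One caveat: both approaches above code observations with a &lt;em>missing&lt;/em> value of &lt;code>race&lt;/code> as 0 rather than as missing. A minimal sketch of one way to keep those observations missing, assuming you want &lt;code>white&lt;/code> defined only when &lt;code>race&lt;/code> is observed:&lt;/p>
&lt;pre tabindex="0">&lt;code>* Remove any earlier version so gen does not stop with an error
capture drop white

* 1 if race is 1, 0 for other observed values, missing if race is missing
gen white = race == 1 if !missing(race)

* Check the result, showing missing values as their own row
tabulate white, missing
&lt;/code>&lt;/pre>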
&lt;h3 id="working-with-loops">Working with loops&lt;/h3>
&lt;p>Loops can help us (1) avoid errors and (2) code super fast!&lt;/p>
&lt;p>I&amp;rsquo;ve uploaded a sample as
&lt;a href="../materials/loop_example.do">&lt;code>loop_example.do&lt;/code>&lt;/a>&lt;/p>
&lt;p>Stata has two types of looping setups, using the &lt;code>forval&lt;/code> or &lt;code>foreach&lt;/code> command. The first is simpler, and the second is more versatile. Recall that you can always use &lt;code>help forval&lt;/code> or &lt;code>help foreach&lt;/code> if your code isn&amp;rsquo;t working or if you have a vision you&amp;rsquo;re not sure how to realize.&lt;/p>
&lt;h4 id="looping-with-forval">Looping with &lt;code>forval&lt;/code>&lt;/h4>
&lt;pre tabindex="0">&lt;code>forvalues lname = range {
Stata commands referring to `lname'
}
&lt;/code>&lt;/pre>&lt;p>What does each component mean?&lt;/p>
&lt;ul>
&lt;li>&lt;code>forvalues&lt;/code>: this is the command. You can abbreviate it as &lt;code>forval&lt;/code>.&lt;/li>
&lt;li>&lt;code>lname&lt;/code>: this is a variable you make up. Often, people will just use &lt;code>i&lt;/code>, because we&amp;rsquo;re just counting. It will take on the values in &lt;code>range&lt;/code> as it increments through the loop. It is a &lt;strong>local&lt;/strong> variable, meaning that you have to call it using &lt;code>`lname'&lt;/code> and not as &lt;code>lname&lt;/code> (you need those punctuation marks!), and that it is only saved as long as your do-file is running.&lt;/li>
&lt;li>&lt;code>range&lt;/code>: this is the set of values that the local variable will iterate over.&lt;/li>
&lt;li>Braces: The opening brace needs to be on the same line as the &lt;code>forval&lt;/code> command. The closing brace needs to be on its own line.&lt;/li>
&lt;/ul>
&lt;pre tabindex="0">&lt;code>forval i = 0/2{
gen labfor`i' = labforce == `i'
}
&lt;/code>&lt;/pre>&lt;p>What does this do? It creates a loop for which local variable &lt;code>`i'&lt;/code> is first 0, then 1, then 2. Within the loop, it generates &lt;code>labfor0&lt;/code>, which is equal to 1 if &lt;code>labforce&lt;/code> equals 0 (not in universe), it generates &lt;code>labfor1&lt;/code>, which is equal to 1 if &lt;code>labforce&lt;/code> equals 1 (not in labor force), and it generates &lt;code>labfor2&lt;/code>, which is equal to 1 if &lt;code>labforce&lt;/code> equals 2 (in labor force).&lt;/p>
&lt;p>Applied ACS example: create race indicators in a loop.&lt;/p>
&lt;pre tabindex="0">&lt;code>foreach r in 1 2 3 {
gen race_`r' = race == `r'
}
&lt;/code>&lt;/pre>&lt;p>Use &lt;code>tab race, nolabel&lt;/code> to see the codes you want to include.&lt;/p>
&lt;p>The choice of ranges can be done in other ways:&lt;/p>
&lt;ul>
&lt;li>&lt;code>forval i = 0/10&lt;/code>: hits every integer between 0 and 10 - 0, 1, 2, &amp;hellip; 10&lt;/li>
&lt;li>&lt;code>forval i = 1(10)100&lt;/code>: starts at 1, then increments by 10 for as long as the value stays at or below 100: 1, 11, 21, 31, &amp;hellip; 91&lt;/li>
&lt;li>&lt;code>forvalues k = 5 10 to 300&lt;/code>: starts at 5, then increments by the gap between the first two values (here 5) until 300: 5, 10, 15, &amp;hellip; 300&lt;/li>
&lt;/ul>
&lt;p>See &lt;code>help forval&lt;/code> for more options&lt;/p>
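&lt;p>To check which values a range produces before using it in real code, a quick sketch is to display them. The second loop uses a smaller upper limit than the bullet above (30 rather than 300) only to keep the output short.&lt;/p>
&lt;pre tabindex="0">&lt;code>* Start at 1 and step by 10 while staying at or below 100
forvalues i = 1(10)100 {
    display `i'
}

* The first two numbers set the step size (10 - 5 = 5)
forvalues k = 5 10 to 30 {
    display `k'
}
&lt;/code>&lt;/pre>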
&lt;h4 id="looping-with-foreach">Looping with &lt;code>foreach&lt;/code>&lt;/h4>
&lt;p>This command lets you loop through number lists (like above), but also through sets of variables, values, names, etc. You can approach it two ways:&lt;/p>
&lt;ul>
&lt;li>Do not specify the type of list, use &lt;strong>in&lt;/strong>: &lt;code>foreach lname in list&lt;/code>:&lt;/li>
&lt;li>Specify the type of list (&lt;code>listtype&lt;/code>), use &lt;strong>of&lt;/strong>: &lt;code>foreach lname of listtype list&lt;/code>&lt;/li>
&lt;/ul>
&lt;p>This is confusing until we see examples:&lt;/p>
&lt;pre tabindex="0">&lt;code>foreach x in rice wheat corn rye barley oats {
display &amp;quot;`x'&amp;quot;
}
&lt;/code>&lt;/pre>&lt;p>This will start with &lt;code>x&lt;/code> equal to the string &amp;ldquo;rice&amp;rdquo;. Then, it will run with &lt;code>x&lt;/code> equal to &amp;ldquo;wheat&amp;rdquo;, etc. Note that the list itself is not wrapped in quotes: a quoted list would be treated as a single element, and the loop would run only once.&lt;/p>
&lt;pre tabindex="0">&lt;code> foreach num of numlist 1 4/8 13(2)21 103 {
display `num'
}
&lt;/code>&lt;/pre>&lt;p>This will loop over 1, 4, 5, 6, 7, 8, 13, 15, 17, &amp;hellip;&lt;/p>
&lt;p>You can loop over variable names too!&lt;/p>
&lt;pre tabindex="0">&lt;code>foreach var of varlist inc* {
summarize `var', detail
}
&lt;/code>&lt;/pre>&lt;p>This summarizes (with detail) each variable that starts with &lt;code>inc&lt;/code>&lt;/p>
&lt;h3 id="working-with-binary-independent-variables">Working with binary independent variables&lt;/h3>
&lt;p>When you are representing a categorical variable with a set of binary variables, there is a slow way and a fast way to integrate them.&lt;/p>
&lt;ul>
&lt;li>Slow way: generate the binary variables you want, and include them. This is good when you want to be precise about your omitted category, or when you want to create complicated binary categories.&lt;/li>
&lt;/ul>
&lt;pre tabindex="0">&lt;code>gen white_nh = race == 1 &amp;amp; hispan == 0
gen black_nh = race == 2 &amp;amp; hispan == 0
gen hisp = hispan != 0
gen other = white_nh == 0 &amp;amp; black_nh == 0 &amp;amp; hisp == 0
regress incwage black_nh hisp other
&lt;/code>&lt;/pre>&lt;p>Here, white, non-Hispanic is the omitted &amp;ldquo;reference&amp;rdquo; category.&lt;/p>
&lt;ul>
&lt;li>Fast way: tell Stata to create a binary variable for each value of a categorical variable.&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup> This is good when you aren&amp;rsquo;t trying to do anything complicated and when you want to be quick - very useful if you want something like state-level dummies.&lt;/li>
&lt;/ul>
&lt;pre tabindex="0">&lt;code>regress incwage i.race
&lt;/code>&lt;/pre>&lt;p>Note that this will work only if your categorical variable is numeric. If it&amp;rsquo;s a string you&amp;rsquo;ll get an error. You can fix it by adding a &lt;code>xi:&lt;/code> prefix, like so:&lt;/p>
&lt;pre tabindex="0">&lt;code>xi: regress incwage i.race
&lt;/code>&lt;/pre>&lt;p>When we include a dummy variable for every value of a categorical variable, like above, we call those &amp;ldquo;fixed effects.&amp;rdquo; We&amp;rsquo;ll talk about these more soon.&lt;/p>
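&lt;p>A short sketch of a few variations on the &lt;code>i.&lt;/code> prefix, using variables from the table above. The choice of &lt;code>age&lt;/code> as the extra regressor and of category 2 as the base are assumptions for illustration only.&lt;/p>
&lt;pre tabindex="0">&lt;code>* Default: the lowest value of race is the omitted base category
regress incwage i.race, robust

* ib#. picks a different base category, here race == 2
regress incwage ib2.race, robust

* State dummies alongside another regressor
regress incwage age i.statefip, robust
&lt;/code>&lt;/pre>&lt;p>Changing the base category changes how each coefficient is read (relative to which group), not the fit of the model.&lt;/p>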
&lt;h2 id="reading-regression-tables-reminder">Reading regression tables (reminder!)&lt;/h2>
&lt;figure id="figure-labeled-stata-output">
&lt;a data-fancybox="" href="../regression-label.png" data-caption="Labeled Stata output">
&lt;img src="../regression-label.png" alt="" >
&lt;/a>
&lt;figcaption>
Labeled Stata output
&lt;/figcaption>
&lt;/figure>
&lt;h2 id="workflow" class="unnumbered">Workflow overview&lt;/h2>
&lt;ol>
&lt;li>Load the dataset and start your log file.&lt;/li>
&lt;li>Inspect variables and coding (&lt;code>describe&lt;/code>, &lt;code>tab&lt;/code>, &lt;code>tab ... , nolabel&lt;/code>).&lt;/li>
&lt;li>Create binary indicators needed for your analysis.&lt;/li>
&lt;li>Run baseline regressions with robust standard errors.&lt;/li>
&lt;li>Add controls or fixed effects and compare coefficients.&lt;/li>
&lt;li>Interpret results and answer the worksheet questions.&lt;/li>
&lt;/ol>
&lt;h2 id="lab-4-worksheet" class="unnumbered">Lab 4 Worksheet&lt;/h2>
&lt;h3 id="what-do-i-submit">What do I submit?&lt;/h3>
&lt;ul>
&lt;li>Your written up answers to exercise questions (1) - (17). This can be typed or written out then scanned (or photographed), in any reasonable format.&lt;/li>
&lt;li>The do-file you’ve created that runs this analysis&lt;/li>
&lt;li>A log file that contains the results from this exercise.&lt;/li>
&lt;/ul>
&lt;h3 id="questions">Questions&lt;/h3>
&lt;p>Download the do-file template and data files. Personalize the file paths so that you can run it and open your &lt;code>acs2024_2pct.dta&lt;/code> file. You can also work with a blank do-file if you&amp;rsquo;re more comfortable; just make sure you remember to include commands to start and close your log file.&lt;/p>
&lt;p>&lt;em>Use robust standard errors in all regressions&lt;/em>&lt;/p>
&lt;p>Example:&lt;/p>
&lt;pre tabindex="0">&lt;code>regress incwage educ labforce, robust
&lt;/code>&lt;/pre>&lt;ol>
&lt;li>
&lt;p>Let&amp;rsquo;s practice with loops! Download
&lt;a href="../materials/loop_example.do">loop_example.do&lt;/a> and paste the code into your sample. Run it and look at the output. In your do-file, write comments that describe what each loop is doing.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Now, go back to your &lt;code>acs2024_2pct.dta&lt;/code> file and do-file template. Adjust your do-file template so that it loads &lt;code>acs2024_2pct.dta&lt;/code> and starts a log.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Restrict your sample to individuals ages 25-54.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Create a new variable, &lt;code>birthyr&lt;/code>, equal to each individual&amp;rsquo;s year of birth: &lt;code>gen birthyr = 2024 - age&lt;/code>. Is there any potential imprecision or error in this variable?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Then, write a loop to generate a dummy variable for each possible value of birth year.&lt;sup id="fnref:2">&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref">2&lt;/a>&lt;/sup>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;!-- Write a loop to generate a dummy variable for each possible value of employment status, `empstat`. That is, you would have `_empstat1`, a binary variable for whether `empstat == 1`, `_empstat2`, a binary variable for whether `empstat == 2`, etc. (There is a faster way to do this, using `xi i.empstat`, but we're learning about loops, so just go with it.) -->
&lt;ol start="6">
&lt;li>
&lt;p>Look through the available list of data (note,
&lt;a href="https://cps.ipums.org/cps/" target="_blank" rel="noopener">IPUMS&lt;/a> has full
documentation of all variables). Based on this data, think of a
research question for your lab of the form, &amp;ldquo;What is the
relationship between ... and ...?&amp;rdquo; Pick a dependent variable
that is &lt;strong>continuous&lt;/strong>. &lt;em>(Because a later question asks you to
explore race/ethnicity controls, please do not use a race/ethnicity
variable for $X$.)&lt;/em>&lt;/p>
&lt;p>Research question:&lt;/p>
&lt;p>Dependent variable ($Y$):&lt;/p>
&lt;p>Key independent variable ($X$):&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Using the data available, write a reasonable population model,
including your key independent variable along with a set of likely
relevant independent variables (somewhere between 2 and 5 additional
variables). Before estimating your regression, you should tabulate
each variable to make sure you are interpreting it correctly.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>In words, what exactly will your estimated regression tell us?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>What do you hypothesize the answer to your research question is?
(e.g., strong positive, weak negative, none)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Before you estimate your model, make sure you don&amp;rsquo;t have any N/A
values coded. For example, if &lt;code>incwage&lt;/code> is not applicable, it is
coded as &lt;code>999999&lt;/code>. Tabulate or summarize your data to check for any
values like this. Recode any values that equal an N/A code as missing
(see the &lt;code>replace&lt;/code> command above).&lt;/p>
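&lt;p>A minimal sketch of this check for &lt;code>incwage&lt;/code>. The local name &lt;code>na_code&lt;/code> is made up for illustration, and its value is an assumption; set it to whatever N/A code your own tabulation shows.&lt;/p>
&lt;pre tabindex="0">&lt;code>* Look at the largest values for suspicious codes
summarize incwage, detail

* Assumption: swap in the N/A code you actually see in your data
local na_code 999999

* Count the affected observations, then recode them to missing
count if incwage == `na_code'
replace incwage = . if incwage == `na_code'
&lt;/code>&lt;/pre>&lt;p>Because &lt;code>na_code&lt;/code> is a local, run these lines together from your do-file rather than one at a time.&lt;/p>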
&lt;/li>
&lt;li>
&lt;p>Estimate the relationship between $X$ and $Y$ using simple linear
regression (excluding any other covariates). Write your results in
equation form and report the $R^2$. How many degrees of freedom do
you have?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Estimate the relationship between $X$ and $Y$ using multiple linear
regression (including other covariates). That is, estimate the
population model you wrote earlier. Write your results in equation
form and report the $R^2$. How many degrees of freedom do you have?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Using your multivariate linear regression from the previous step, set up a
hypothesis test for your parameter of interest, the $\beta$
associated with your key independent variable, $X$. What do you
find? What is the p-value? What is the interpretation?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Besides your key independent variable, which other variables are
statistically significant at the five-percent level?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>A lot of student research papers will look at differences in outcomes by
gender and by racial/ethnic groups. U.S. surveys like the CPS, ACS, and Census treat race and
ethnicity a little strangely, and it can take some practice to get
comfortable.&lt;/p>
&lt;p>There are two variables commonly used to identify a person&amp;rsquo;s race
and ethnicity: the &lt;code>race&lt;/code> and the &lt;code>hispan&lt;/code> variable.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>What share of the sample is White, non-Hispanic?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>What share of the sample is Hispanic/Latino?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>A common way to summarize the racial/ethnic make-up of the U.S.
is the following categories:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>White, non-Hispanic&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Black, non-Hispanic&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Hispanic/Latino&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Asian, non-Hispanic&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Other&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Make a table that shows the distribution of the population into
these five groupings.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>
&lt;p>Estimate your multiple linear regression model from
earlier, but include the race/ethnicity variables that you created
in the previous part. How
does the inclusion of these factors affect your estimates of the
relationship between $Y$ and $X$?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Now, add the &amp;ldquo;birth-year fixed effects&amp;rdquo; that you generated earlier to your regression. Because there is a set of binary 0/1 variables, one for each year of birth, they will essentially pull out any mean differences in your dependent variable at the birth-year level. So if your outcome variable is different, on average, for people born in 1971 vs. 1972, these variables will take care of it. What is the omitted birth year? How
does the inclusion of these factors affect your estimates of the
relationship between $Y$ and $X$?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="video">Video&lt;/h2>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe src="https://www.youtube.com/embed/SFp6pBFAghY" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" allowfullscreen title="YouTube Video">&lt;/iframe>
&lt;/div>
&lt;h2 id="residuals" class="unnumbered">A note on &amp;ldquo;well-behaved&amp;rdquo; residuals&lt;/h2>
&lt;p>There are three characteristics of &amp;ldquo;well-behaved&amp;rdquo; residuals:&lt;/p>
&lt;ol>
&lt;li>The residuals &amp;ldquo;bounce randomly&amp;rdquo; around the 0 line. This suggests that the assumption that the relationship is linear is reasonable.&lt;/li>
&lt;li>The residuals roughly form a &amp;ldquo;horizontal band&amp;rdquo; around the 0 line. This suggests that the variances of the error terms are equal.&lt;/li>
&lt;li>No one residual &amp;ldquo;stands out&amp;rdquo; from the basic random pattern of residuals. This suggests that there are no outliers.&lt;/li>
&lt;/ol>
&lt;p>We don&amp;rsquo;t want to overweight the importance of this, but it can be a helpful diagnostic for spotting outliers or strange patterns.&lt;/p>
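&lt;p>In Stata, one way to look at residuals is to plot them against the fitted values. A minimal sketch, using the example regression from earlier in the lab as a stand-in for your own model; the names &lt;code>yhat&lt;/code> and &lt;code>resid&lt;/code> are made up for illustration.&lt;/p>
&lt;pre tabindex="0">&lt;code>* Stand-in model: replace with your own regression
regress incwage educ labforce, robust

* Save fitted values and residuals from that regression
predict yhat, xb
predict resid, residuals

* Residuals against fitted values, with a reference line at zero
scatter resid yhat, yline(0)
&lt;/code>&lt;/pre>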
&lt;section class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1" role="doc-endnote">
&lt;p>&lt;code>i.race&lt;/code> adds a dummy for every race category and estimates effects relative to the omitted category. Don’t manually include a full set of dummies with an intercept, or you’ll run into perfect multicollinearity (the “dummy variable trap”).&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:2" role="doc-endnote">
&lt;p>There is a faster way to do this, using &lt;code>xi i.birthyr&lt;/code>, but we&amp;rsquo;re learning about loops, so just go with it.&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/section></description></item><item><title>Research Paper: Annotated Bibliography</title><link>https://econ3500s26.netlify.app/assignment/rp-03-annotated/</link><pubDate>Mon, 16 Feb 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/rp-03-annotated/</guid><description>&lt;p>&lt;strong>
&lt;a href="../RP-03-annotated.pdf">Print-friendly PDF&lt;/a>&lt;/strong>&lt;/p>
&lt;h2 id="objective" class="unnumbered">Objective&lt;/h2>
&lt;p>The goal of this submission is to help you narrow and refine your question while situating your work in the broader economics literature. This will make writing your research paper much easier as well!&lt;/p>
&lt;h2 id="whatis" class="unnumbered">What is an annotated bibliography?&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>&lt;/h2>
&lt;blockquote>
&lt;p>A bibliography is a list of sources (books, journals, Web sites, periodicals, etc.) one has used for researching a topic. Bibliographies are sometimes called &amp;ldquo;References&amp;rdquo; or &amp;ldquo;Works Cited&amp;rdquo; depending on the style format you are using. A bibliography usually just includes the bibliographic information (i.e., the author, title, publisher, etc.).&lt;/p>
&lt;/blockquote>
&lt;blockquote>
&lt;p>An annotation is a summary and/or evaluation. Therefore, an annotated bibliography includes a summary and/or evaluation of each of the sources.&lt;/p>
&lt;/blockquote>
&lt;h2 id="mission" class="unnumbered">What do I need to do&lt;/h2>
&lt;p>Pick the idea you proposed that is most promising. You may refine it based on feedback, further reflection, etc.&lt;/p>
&lt;p>Based on that idea, identify and annotate &lt;strong>six&lt;/strong> sources that are relevant to your project.&lt;/p>
&lt;ul>
&lt;li>At least &lt;strong>four&lt;/strong> must be peer-reviewed, academic journal articles.&lt;/li>
&lt;li>At least &lt;strong>two&lt;/strong> must be from the list of economic journals included below.&lt;/li>
&lt;/ul>
&lt;img src="../materials/annotate_source.png" width=500 alt="8 total, 4 peer-reviewed, 2 econ journal">
&lt;p>For each one, include the following:&lt;/p>
&lt;ol>
&lt;li>Full bibliographic information, following MLA, APA, or Chicago style.&lt;/li>
&lt;li>The annotations, written as a paragraph or as bullet points. These will include a few things:&lt;br>
a. Nature of source: peer-reviewed academic journal (what discipline), working paper, white paper (i.e., reports from major organizations), other&lt;br>
b. Key findings or arguments of the source: It&amp;rsquo;s in your interest to be quite detailed here (&lt;em>I like to use these to draw on when I write my paper&lt;/em>)&lt;br>
c. Assessment: How does it compare to other sources (do the findings support or contrast)? Is the source biased or objective? What is the goal of the source?&lt;br>
d. Reflection: Is this useful to your question? How does it help you shape your argument? How can you use this source in your project? (&lt;em>Here I will sometimes add sample sentences I will write&lt;/em>)&lt;/li>
&lt;/ol>
&lt;p>After this, write an expanded version of your idea proposal (just the one idea you&amp;rsquo;ve chosen) that states your refined research question and describes how your planned project fits into the literature you found. This will be 2-3 paragraphs.&lt;/p>
&lt;p>&lt;strong>Consult the
&lt;a href="#rubric">grading rubric&lt;/a> for additional guidance!&lt;/strong>&lt;/p>
&lt;h2 id="acceptable-economics-journals">Acceptable economics journals&lt;/h2>
&lt;p>At least two articles must come from a relevant economics journal in the &lt;strong>top 200&lt;/strong> from the following
&lt;a href="https://ideas.repec.org/top/top.journals.simple.html" target="_blank" rel="noopener">RePeC list&lt;/a>. If your topic is very specific, the &lt;strong>top 400&lt;/strong> is also acceptable, but you need to get prior approval from me.&lt;/p>
&lt;h2 id="submission-requirements" class="unnumbered">Submission requirements&lt;/h2>
&lt;p>Submit the annotated bibliography plus summary as one document on Brightspace.&lt;/p>
&lt;p>If you are working in pairs, submit one bibliography for two people.&lt;/p>
&lt;h2 id="tips" class="unnumbered">Tips&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Search for your topic using
&lt;a href="http://library.uvm.edu/research/research_databases" target="_blank" rel="noopener">EconLit&lt;/a>. If you are at home, you can select
&lt;a href="http://library.uvm.edu/" target="_blank" rel="noopener">&amp;ldquo;Connect Off Campus&amp;rdquo;&lt;/a> from the main library page.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>When determining if an article might be useful, start by focusing on the abstract only. For those that pass your abstract test, read just the introduction to see whether they will still help.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>When you&amp;rsquo;ve found an article or two that are useful, you can search forward and backward to find more!&lt;/p>
&lt;ul>
&lt;li>Check the references section to find articles that were cited in your paper&lt;/li>
&lt;li>Use
&lt;a href="https://scholar.google.com/" target="_blank" rel="noopener">Google Scholar&lt;/a> to find articles that cite your paper&lt;/li>
&lt;/ul>
&lt;img src="../materials/scholar_cited.png" width=500 alt="works cited">
&lt;/li>
&lt;li>
&lt;p>Note that working paper series are &lt;em>not&lt;/em> peer-reviewed journal articles.
&lt;a href="https://www.nber.org/papers?page=1&amp;amp;perPage=50&amp;amp;sortBy=public_date" target="_blank" rel="noopener">NBER Working Papers&lt;/a>, for example, are an excellent resource but not peer-reviewed. Most reports from large organizations are not peer-reviewed.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="rubric" class="unnumbered">Rubric&lt;/h2>
&lt;p>You will receive up to 50 points on this assignment:&lt;/p>
&lt;ul>
&lt;li>Each source is worth 5 points (30 total), with one point per element listed above.&lt;/li>
&lt;li>The idea summary is worth 10 points, with full credit granted if you present your research question, describe in words how you will answer it, and then describe how what you plan to do fits in with the literature you&amp;rsquo;ve reviewed.&lt;/li>
&lt;li>The final 10 points are for meeting the source selection criteria (4 peer-reviewed, 2 top 200 economics journals).&lt;/li>
&lt;/ul>
&lt;section class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1" role="doc-endnote">
&lt;p>Pulled from
&lt;a href="https://owl.purdue.edu/owl/general_writing/common_writing_assignments/annotated_bibliographies/index.html" target="_blank" rel="noopener">Purdue OWL&lt;/a>&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/section></description></item><item><title>Problem set 2</title><link>https://econ3500s26.netlify.app/assignment/02-ps/</link><pubDate>Thu, 05 Feb 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/02-ps/</guid><description>&lt;h2 id="welcome">Welcome&lt;/h2>
&lt;p>Make sure you submit your assignment on Brightspace by the deadline ☝️ ☝️&lt;/p>
&lt;p>See the exercises below, or you can
&lt;a href="../02-ps.pdf">download them as a pdf&lt;/a>. You can download the data file you need for questions 4 and 5
&lt;a href="../collegedistance.dta">here&lt;/a>.&lt;/p>
&lt;h2 id="what-do-i-submit">What do I submit?&lt;/h2>
&lt;ul>
&lt;li>Your written up answers to exercise questions. If you work on a piece of paper, please scan using some sort of phone software (like Microsoft Lens or Adobe Scan) rather than just taking a picture. You can also integrate your written answers into your do-file, just be clear about it.&lt;/li>
&lt;li>A do-file that runs your Stata analysis (for questions 4 and 5).&lt;/li>
&lt;li>A log file that includes the output from running your do-file (for questions 4 and 5).&lt;/li>
&lt;/ul>
&lt;h2 id="exercises">Exercises&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>The following table shows, for eight vintages of delicious wine, purchases per buyer ($y$) and the wine buyer&amp;rsquo;s rating ($x$) in a given year:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>1&lt;/th>
&lt;th>2&lt;/th>
&lt;th>3&lt;/th>
&lt;th>4&lt;/th>
&lt;th>5&lt;/th>
&lt;th>6&lt;/th>
&lt;th>7&lt;/th>
&lt;th>8&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>$x$&lt;/td>
&lt;td>3.6&lt;/td>
&lt;td>3.3&lt;/td>
&lt;td>2.8&lt;/td>
&lt;td>2.3&lt;/td>
&lt;td>2.7&lt;/td>
&lt;td>2.9&lt;/td>
&lt;td>2.0&lt;/td>
&lt;td>2.4&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>$y$&lt;/td>
&lt;td>24&lt;/td>
&lt;td>21&lt;/td>
&lt;td>22&lt;/td>
&lt;td>20&lt;/td>
&lt;td>18&lt;/td>
&lt;td>13&lt;/td>
&lt;td>9&lt;/td>
&lt;td>16&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>a. Estimate &lt;em>by hand&lt;/em> the regression of purchases per buyer on the buyer&amp;rsquo;s rating.&lt;/p>
&lt;p>b. Interpret the slope of the estimated regression line.&lt;/p>
&lt;p>c. Interpret the intercept of the estimated regression line.&lt;/p>
&lt;p>d. Use your estimated regression line to compute the predicted purchases for a wine with rating $x=2.8$. Then compute the residual for observation (3) with $x=2.8$ and $y=22$.&lt;/p>
&lt;/li>
&lt;/ol>
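&lt;p>For part (a), a hedged reminder of the standard OLS formulas for a regression with one regressor may help organize the hand calculation. Here $\bar{x}$ and $\bar{y}$ denote the sample means of the eight ratings and purchases, and the sums run over $i = 1, \dots, 8$. These are the usual textbook formulas, not a worked solution:&lt;/p>
&lt;p>$$\hat{\beta}_1 = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$&lt;/p>
&lt;p>The predicted value and residual in part (d) then follow from $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ and $\hat{u}_i = y_i - \hat{y}_i$.&lt;/p>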
&lt;!-- (SW4.2) -->
&lt;ol start="2">
&lt;li>
&lt;p>Suppose that a random sample of 200 20-year-old men is selected from a population and that these men&amp;rsquo;s height and weight are recorded. A regression of weight (measured in pounds) on height (measured in inches) yields&lt;/p>
&lt;p>$$\widehat{Weight}=-99.41 + 3.94 Height$$&lt;/p>
&lt;p>$R^2 = 0.81$; $SER = 10.2$&lt;/p>
&lt;p>a. What is the predicted weight for someone who is 72 inches tall? 66 inches tall?&lt;/p>
&lt;p>b. One 20-year-old man has a late growth spurt and grows 1.5 inches over the course of the year. What is the regression&amp;rsquo;s prediction for the increase in his weight?&lt;/p>
&lt;p>c. Suppose that you want to translate the results of this equation into centimeters and kilograms. What are the regression estimates from this new regression? Give all results, including estimated coefficients, $R^2$, and $SER$.&lt;/p>
&lt;p>d. Interpret the $R^2$ value. Does it indicate anything about whether these estimates are likely to be biased? Explain.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Consider the savings function:&lt;/p>
&lt;p>$$sav = \beta_0 + \beta_1 inc + u, u = e\sqrt{inc}$$&lt;/p>
&lt;p>where $e$ is a random variable with $E(e) = 0$ and $Var(e) = \sigma^2_e$. Assume that $e$ is independent of $inc$.&lt;/p>
&lt;p>a. Show that $E(u|inc)=0$, so that the key zero conditional mean assumption is satisfied. [Hint: If $e$ is independent of $inc$, then $E(e|inc) = E(e)$]&lt;/p>
&lt;p>b. Show that $Var(u|inc) = \sigma^2_e \cdot inc$, so that the homoskedasticity assumption is violated. In particular, the variance of $sav$ increases with $inc$. [Hint: $Var(e|inc) = Var(e)$ if $inc$ and $e$ are independent!]&lt;/p>
&lt;p>c. Why might it be reasonable to assume that the variance of savings increases with family income?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The data file
&lt;a href="../collegedistance.dta">&lt;code>collegedistance.dta&lt;/code>&lt;/a> contains data from a random sample of high school seniors interviewed in 1980 and re-interviewed in 1986.&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup> Use these data to investigate the relationship between the number of completed years of education for young adults and the distance from each student&amp;rsquo;s high school to the nearest four-year college. (Proximity to college lowers the cost of education, so that students who live closer to a four-year college should, on average, complete more years of higher education.)&lt;/p>
&lt;p>a. Run a regression of years of completed education ($ED$) on distance to the nearest college ($Dist$), where $Dist$ is measured in tens of miles. (For example, $Dist=2$ means that the distance is 20 miles.) [Hint: You can regress a dependent variable &lt;code>y&lt;/code> on an independent variable &lt;code>x&lt;/code> with the command &lt;code>regress y x&lt;/code>.] Write the equation you estimated in the form $\widehat{ED} = \hat{\beta}_0 + \hat{\beta}_1 Dist$.&lt;/p>
&lt;p>b. How does the average value of years of completed schooling change when colleges are built close to where students go to high school?&lt;/p>
&lt;p>c. Bob&amp;rsquo;s high school was 20 miles from the nearest college. Predict Bob&amp;rsquo;s years of completed education using the estimated regression. How would the prediction change if Bob lived 10 miles from the nearest college?&lt;/p>
&lt;p>d. Does distance to college explain a large fraction of the variance in educational attainment across individuals? Explain.&lt;/p>
&lt;p>e. Provide an example of a factor that might cause this model to violate the zero conditional mean assumption. Explain your reasoning.&lt;/p>
&lt;p>f. What is the value of the standard error of the regression?&lt;sup id="fnref:2">&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref">2&lt;/a>&lt;/sup> What are the units for the standard error (meters, grams, years, dollars, cents, or something else)?&lt;/p>
&lt;p>g. Is the estimated regression slope coefficient statistically significant at the 10% level? What is the p-value associated with the coefficient&amp;rsquo;s t-statistic?&lt;/p>
&lt;p>h. Construct a 90% confidence interval for the slope coefficient.&lt;/p>
&lt;p>i. Construct a 90% confidence interval for the intercept.&lt;/p>
&lt;p>j. Estimate a regression that restricts the sample to men, and calculate a 90% confidence interval for the slope. Do the same, restricting the sample to women. Does it look like the effect of distance on completed years of education is different?&lt;sup id="fnref:3">&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref">3&lt;/a>&lt;/sup>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Re-estimate the regression from 4a using heteroskedasticity robust standard errors.&lt;/p>
&lt;p>a. Report the slope estimate, robust standard error, t-statistic, p-value, and a 90% confidence interval for the slope.&lt;/p>
&lt;p>b. Compare your robust results to the results that assume homoskedasticity from 4g–4h. Do your conclusions about statistical significance at the 10% level change? Why might the standard errors differ?&lt;/p>
&lt;p>c. In practice, which set of standard errors should you report, and why?&lt;/p>
&lt;/li>
&lt;/ol>
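&lt;p>Questions 4 and 5 lean on a few options of &lt;code>regress&lt;/code>. The lines below are a minimal sketch, not a solution: &lt;code>y&lt;/code>, &lt;code>x&lt;/code>, and &lt;code>group&lt;/code> are placeholder names rather than variables in &lt;code>collegedistance.dta&lt;/code>, so run &lt;code>describe&lt;/code> first to find the real ones.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch only: y, x, and group are placeholder names, not variables in collegedistance.dta
describe                               // list the actual variable names first
regress y x, level(90)                 // homoskedasticity-only SEs, 90% confidence intervals
regress y x if group == 1, level(90)   // same regression restricted to a subsample
regress y x, robust level(90)          // heteroskedasticity-robust SEs, 90% confidence intervals
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>level(90)&lt;/code> option changes only the reported confidence intervals; the coefficients, standard errors, t-statistics, and p-values are the same as under the default 95% level.&lt;/p>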
&lt;section class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1" role="doc-endnote">
&lt;p>These data were provided by Professor Cecilia Rouse of Princeton University and were used in her paper &amp;ldquo;Democratization or Diversion? The Effect of Community Colleges on Educational Attainment,&amp;rdquo; Journal of Business and Economic Statistics, April 1995, 12(2): 217–224.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:2" role="doc-endnote">
&lt;p>There are a few ways to find it in Stata&amp;rsquo;s output. The easiest is to note that &amp;ldquo;Root MSE&amp;rdquo; is the SER: it is the square root of the mean squared error, $\sqrt{SSR/(n-2)}$ in a regression with one regressor.&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:3" role="doc-endnote">
&lt;p>Note that we cannot make claims about whether they are statistically different because the estimates come from two different samples! A hypothesis test here would be awesome, but we need to build a few more skills to do this.&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/section></description></item><item><title>Lab 3: Regression</title><link>https://econ3500s26.netlify.app/assignment/03-lab/</link><pubDate>Wed, 04 Feb 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/03-lab/</guid><description>&lt;h2 id="lab-content">Lab Content&lt;/h2>
&lt;p>&lt;strong>
&lt;a href="../03-lab.pdf">Print-friendly pdf&lt;/a>&lt;/strong>&lt;/p>
&lt;h3 id="materials" class="unnumbered">Materials&lt;/h3>
&lt;ul>
&lt;li>
&lt;a href="../materials/graduation.dta">&lt;code>graduation.dta&lt;/code>&lt;/a>&lt;/li>
&lt;li>Do-file template
&lt;a href="../materials/econ3500_lab_template.do">&lt;code>econ3500_lab_template.do&lt;/code>&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Download these and save in your lab folder (perhaps you named it something like &lt;code>econ3500/labs&lt;/code>?)&lt;/p>
&lt;p>👁️ If your do-file opens in a browser tab, you may want to instead right click and select &amp;ldquo;Save Link As&amp;rdquo; 👁️&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
&lt;p>&lt;strong>Before you start&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>Set your working directory in Stata to the folder where you saved the data and template.&lt;/li>
&lt;li>Start a log file right away: &lt;code>log using lab3.log, replace&lt;/code>&lt;/li>
&lt;li>Make sure you can open the dataset with &lt;code>use graduation.dta, clear&lt;/code>.&lt;/li>
&lt;/ol>
&lt;/div>
&lt;/div>
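&lt;p>The three setup steps above might look like this at the top of your do-file. This is a sketch under one loud assumption: the folder path is a placeholder that you must replace with the location where you actually saved the files.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Lab 3 setup (sketch): the path below is a placeholder, not a real folder
cd "path/to/econ3500/labs"
capture log close              // close any log left open by an earlier run; no error if none is open
log using lab3.log, replace    // start a plain-text log, overwriting an older copy
use graduation.dta, clear      // load the lab data
describe                       // confirm the variables loaded as expected
&lt;/code>&lt;/pre>
&lt;p>Putting &lt;code>capture log close&lt;/code> before &lt;code>log using&lt;/code> lets you re-run the whole do-file without Stata stopping because a log is already open.&lt;/p>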
&lt;h3 id="objectives" class="unnumbered">Objectives&lt;/h3>
&lt;p>By the end of this tutorial you should be able to complete the following
tasks in Stata:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Estimate and interpret a simple (two-variable) linear regression in levels, using continuous and binary variables, and use heteroskedasticity-robust standard errors.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Identify $\hat{\beta_0}$, $\hat{\beta_1}$, standard errors, $TSS$, $ESS$, $SSR$, and $R^2$ in Stata output and interpret them.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Calculate predicted values and residuals&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Create scatter plots&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Estimate a multivariate linear regression&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="key-commands" class="unnumbered">Key commands &lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">command&lt;/th>
&lt;th style="text-align:right">description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">Estimation commands&lt;/td>
&lt;td style="text-align:right">&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>regress var1 var2&lt;/code>&lt;/td>
&lt;td style="text-align:right">Estimate a regression, with &lt;code>var1&lt;/code> as the dependent variable and &lt;code>var2&lt;/code> as the independent variable(s)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>regress var1 var2, robust&lt;/code>&lt;/td>
&lt;td style="text-align:right">Estimate a regression with heteroskedasticity-robust standard errors&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>correlate var1 var2 ... varn&lt;/code>&lt;/td>
&lt;td style="text-align:right">Calculate correlation coefficients of all listed variables, from &lt;code>var1&lt;/code> to &lt;code>varn&lt;/code>.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>graph twoway scatter var1 var2&lt;/code>&lt;/td>
&lt;td style="text-align:right">make a scatter plot with &lt;code>var1&lt;/code> on the y-axis and &lt;code>var2&lt;/code> on the x-axis.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Post-estimation commands&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>&lt;/td>
&lt;td style="text-align:right">&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>predict newvar, xb&lt;/code>&lt;/td>
&lt;td style="text-align:right">Use estimated regression coefficients to predict $\widehat{y}$. It will generate &lt;code>newvar&lt;/code>&lt;sup id="fnref:2">&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref">2&lt;/a>&lt;/sup>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>predict newvar, residuals&lt;/code>&lt;/td>
&lt;td style="text-align:right">Use estimated regression coefficients to predict residuals, generating &lt;code>newvar&lt;/code>&lt;sup id="fnref:3">&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref">3&lt;/a>&lt;/sup>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Working with data, missing values&lt;/td>
&lt;td style="text-align:right">&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>count if var1 == 1&lt;/code>&lt;/td>
&lt;td style="text-align:right">count observations if the expression &lt;code>var1 == 1&lt;/code> is true&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>count if !missing(var1)&lt;/code>&lt;/td>
&lt;td style="text-align:right">count observations if &lt;code>var1&lt;/code> is not missing&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>drop if missing(var1)&lt;/code>&lt;/td>
&lt;td style="text-align:right">drop all observations where &lt;code>var1&lt;/code> is missing&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>tab var1, missing&lt;/code>&lt;/td>
&lt;td style="text-align:right">Include missing values in tabulation&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
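&lt;p>As a rough illustration of how the estimation and post-estimation commands in the table fit together, here is a short sketch. The names &lt;code>y&lt;/code> and &lt;code>x&lt;/code> are placeholders for whichever variables you are working with, and &lt;code>yhat&lt;/code> and &lt;code>uhat&lt;/code> are hypothetical names for the new variables.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch only: y and x are placeholders; yhat and uhat are made-up names for new variables
graph twoway scatter y x     // y on the vertical axis, x on the horizontal axis
correlate y x                // correlation coefficient between y and x
regress y x, robust          // OLS with heteroskedasticity-robust standard errors
predict yhat, xb             // fitted values from the regression just estimated
predict uhat, residuals      // residuals from the same regression
summarize y yhat uhat        // compare means; OLS residuals average to zero
&lt;/code>&lt;/pre>
&lt;p>Both &lt;code>predict&lt;/code> lines use the most recent regression, which is why they come directly after &lt;code>regress&lt;/code>.&lt;/p>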
&lt;h3 id="reading-regression-tables">Reading regression tables&lt;/h3>
&lt;img src="../regression-label.png" width=500 alt="labelled Stata output">
&lt;div class="alert alert-warning">
&lt;div>
&lt;p>&lt;strong>Quick reminders&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>The coefficient estimates do &lt;strong>not&lt;/strong> change when you add &lt;code>, robust&lt;/code>.&lt;/li>
&lt;li>The standard errors &lt;strong>do&lt;/strong> change when you add &lt;code>, robust&lt;/code>.&lt;/li>
&lt;li>Run &lt;code>predict&lt;/code> immediately after your regression. If you run another estimation command in between, Stata will overwrite the stored model.&lt;/li>
&lt;/ul>
&lt;/div>
&lt;/div>
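&lt;p>When reading the output, the standard identities below can serve as a check on the numbers you copy down. This is a sketch in the notation of the chart used later in this lab ($TSS$, $ESS$, $SSR$), assuming a regression with one regressor and $n$ observations:&lt;/p>
&lt;p>$$TSS = ESS + SSR, \qquad R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}, \qquad SER = \sqrt{\frac{SSR}{n-2}}$$&lt;/p>
&lt;p>In Stata&amp;rsquo;s table, the &amp;ldquo;Model&amp;rdquo;, &amp;ldquo;Residual&amp;rdquo;, and &amp;ldquo;Total&amp;rdquo; rows of the SS column correspond to $ESS$, $SSR$, and $TSS$, and &amp;ldquo;Root MSE&amp;rdquo; is the $SER$.&lt;/p>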
&lt;h2 id="lab-3-worksheet" class="unnumbered">Lab 3 Exercise&lt;/h2>
&lt;h3 id="what-do-i-submit">What do I submit?&lt;/h3>
&lt;ul>
&lt;li>Your written answers to exercise questions (1) - (13). This can be typed or written out then scanned (or photographed), in any reasonable format.&lt;/li>
&lt;li>The do-file you&amp;rsquo;ve created that runs this analysis&lt;/li>
&lt;li>A log file that contains the results from this exercise.&lt;/li>
&lt;/ul>
&lt;h3 id="questions">Questions&lt;/h3>
&lt;p>Today, we&amp;rsquo;re going to look around at the graduation data set that we discussed in class,
&lt;a href="../materials/graduation.dta">&lt;code>graduation.dta&lt;/code>&lt;/a>.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Download the do-file template and data files. Personalize the file paths so that you can run it and open your &lt;code>graduation.dta&lt;/code> file. You can also work with a blank do-file if you&amp;rsquo;re more comfortable - just make sure you remember to include commands to start and close your log file.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Take a look at &lt;code>graduation.dta&lt;/code>. How many observations are there? What is the distribution of treatment arms?&lt;sup id="fnref:4">&lt;a href="#fn:4" class="footnote-ref" role="doc-noteref">4&lt;/a>&lt;/sup>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>There are six &lt;em>continuous&lt;/em> food security variables&lt;sup id="fnref:5">&lt;a href="#fn:5" class="footnote-ref" role="doc-noteref">5&lt;/a>&lt;/sup>. You can look for them with &lt;code>lookfor fs&lt;/code>. Pick one variable and write out a population model to determine the relationship between assignment to the graduation program and food security. For the rest of this lab, I refer to the variable you chose as &lt;code>foodsecurity&lt;/code>. If that&amp;rsquo;s going to irritate you, you can rename your variable like this: &lt;code>rename fsec5 foodsecurity&lt;/code>, using the variable name that you&amp;rsquo;ve chosen in place of &lt;code>fsec5&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Tabulate your food security variable and check for missing observations. Drop any observations for which you have missing values of &lt;code>foodsecurity&lt;/code> (see above for how to do this). How many observations are remaining?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;div class="alert alert-note">
&lt;div>
&lt;strong>Hint&lt;/strong>
After you drop missing values, run &lt;code>count&lt;/code> to confirm your new sample size. Keep that number consistent for the rest of the lab.
&lt;/div>
&lt;/div>
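&lt;p>One possible sequence for checking and dropping missing values, using only commands from the table above, is sketched here. It assumes you have renamed your chosen variable to &lt;code>foodsecurity&lt;/code>; otherwise substitute your own variable name.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch: assumes your chosen variable is named foodsecurity
tab foodsecurity, missing        // tabulate, including missing values
count if missing(foodsecurity)   // how many observations are missing?
drop if missing(foodsecurity)    // remove them from the data in memory
count                            // confirm the new sample size
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>drop&lt;/code> changes only the data in memory; the &lt;code>graduation.dta&lt;/code> file on disk is untouched unless you save over it.&lt;/p>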
&lt;ol start="5">
&lt;li>
&lt;p>Make a scatter plot of the relationship between your chosen food security variable and
graduation (include this plot in your submission). Is this easy to interpret? Calculate and report
the associated correlation coefficient.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Conduct a t-test of whether the mean of &lt;code>foodsecurity&lt;/code> is different between those who did and did not receive the graduation program.&lt;sup id="fnref:6">&lt;a href="#fn:6" class="footnote-ref" role="doc-noteref">6&lt;/a>&lt;/sup>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Estimate the relationship between your chosen food security variable, &lt;code>foodsecurity&lt;/code>, and assignment to the graduation program, &lt;code>graduation&lt;/code>, using simple linear regression with conventional (homoskedasticity-only) standard errors. How does your t-statistic compare to what you found in the previous t-test? What was the impact of assignment to the graduation program on food security, based on your regression?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Re-estimate your regression, and this time adjust your standard errors to be heteroskedasticity-robust. Fill in the chart below with your estimates.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">Variable&lt;/th>
&lt;th style="text-align:right">Estimate&lt;/th>
&lt;th style="text-align:right">Variable&lt;/th>
&lt;th style="text-align:right">Estimate&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">$\hat{\beta_0}$&lt;/td>
&lt;td style="text-align:right">&lt;/td>
&lt;td style="text-align:right">$\hat{\beta_1}$&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">$R^2$&lt;/td>
&lt;td style="text-align:right">&lt;/td>
&lt;td style="text-align:right">$TSS$&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">$ESS$&lt;/td>
&lt;td style="text-align:right">&lt;/td>
&lt;td style="text-align:right">$SSR$&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">d.f.&lt;/td>
&lt;td style="text-align:right">&lt;/td>
&lt;td style="text-align:right">$SER$&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
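&lt;p>A practical note on filling in the chart: when &lt;code>, robust&lt;/code> is specified, Stata&amp;rsquo;s output leaves out the sum-of-squares table, but the underlying values are still stored after estimation. A hedged sketch of one way to retrieve them, assuming your variables are named &lt;code>foodsecurity&lt;/code> and &lt;code>graduation&lt;/code>:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch: assumes variables named foodsecurity and graduation
regress foodsecurity graduation, robust
display e(mss)            // explained (model) sum of squares, ESS
display e(rss)            // residual sum of squares, SSR
display e(mss) + e(rss)   // total sum of squares, TSS
display e(r2)             // R-squared
display e(df_r)           // residual degrees of freedom
display e(rmse)           // Root MSE, i.e. the SER
&lt;/code>&lt;/pre>
&lt;p>The sums of squares and $R^2$ do not depend on the type of standard errors, so they match what you would read from the regression without &lt;code>, robust&lt;/code>.&lt;/p>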
&lt;ol start="9">
&lt;li>
&lt;p>After that regression estimate, generate a new variable, &lt;code>predict_fs&lt;/code> equal to the predicted
value of your food security variable. Generate a second variable, &lt;code>resid_fs&lt;/code> equal to the
residual.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>What is the mean of each variable? How does the mean of &lt;code>predict_fs&lt;/code>
compare to the mean of &lt;code>foodsecurity&lt;/code> in your sample?&lt;sup id="fnref:7">&lt;a href="#fn:7" class="footnote-ref" role="doc-noteref">7&lt;/a>&lt;/sup>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Examine the predicted value of your food security variable, &lt;code>predict_fs&lt;/code>, for the &lt;em>youngest&lt;/em> person in your
sample.&lt;sup id="fnref:8">&lt;a href="#fn:8" class="footnote-ref" role="doc-noteref">8&lt;/a>&lt;/sup> What is its residual?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>When we estimate a linear regression with no independent variables, sometimes
we&amp;rsquo;ll say we are &amp;ldquo;regressing on a constant.&amp;rdquo; Regress &lt;code>foodsecurity&lt;/code>
&lt;em>only&lt;/em> on a constant. What is $\hat{\beta_0}$, and how does it
compare to the overall mean?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>For this final step, I&amp;rsquo;d like you to play around with the data. Pick &lt;strong>one&lt;/strong> continuous dependent variable and &lt;strong>one&lt;/strong> continuous &lt;em>or&lt;/em> binary independent variable.&lt;sup id="fnref:9">&lt;a href="#fn:9" class="footnote-ref" role="doc-noteref">9&lt;/a>&lt;/sup> You can look at the correlation between two variables, or you can look at the impact of one of the program dimensions (group coaching, group livelihood, etc) on a continuous outcome of interest.&lt;/p>
&lt;p>a. Write a population model you want to estimate.&lt;/p>
&lt;p>b. Estimate it using OLS, adjusting your standard errors to be heteroskedasticity-robust. Write an equation that reflects your estimated model in the form $\hat{y}=\hat{\beta_0} + \hat{\beta_1}x$, replacing $y$ and $x$ with your chosen variables and replacing $\hat{\beta_0}$ and $\hat{\beta_1}$ with your estimates.&lt;/p>
&lt;p>c. In 1-2 sentences, what do your results tell you, collectively?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;div class="alert alert-note">
&lt;div>
&lt;p>&lt;strong>Submission checklist&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Answers file (with your scatter plot and any tables you used)&lt;/li>
&lt;li>Do-file with comments for each question&lt;/li>
&lt;li>Log file that matches your do-file commands&lt;/li>
&lt;li>&lt;code>log close&lt;/code> at the end&lt;/li>
&lt;/ul>
&lt;/div>
&lt;/div>
&lt;!--
### A note on "well-behaved"" residuals {#residuals .unnumbered}
There are three characteristics of "well-behaved" residuals:
1. The residuals "bounce randomly" around the 0 line. This suggests that the assumption that the relationship is linear is reasonable.
2. The residuals roughly form a "horizontal band" around the 0 line. This suggests that the variances of the error terms are equal.
3. No one residual "stands out" from the basic random pattern of residuals. This suggests that there are no outliers.
We don't want to overweight the importance of this, but it can be a helpful diagnostic to look for outliers, strange patterns.
-->
&lt;h2 id="video-recording">Video Recording&lt;/h2>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe src="https://www.youtube.com/embed/ShcIoFJWFRQ" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" allowfullscreen title="YouTube Video">&lt;/iframe>
&lt;/div>
&lt;section class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1" role="doc-endnote">
&lt;p>Post-estimation commands must be run &lt;em>immediately&lt;/em> after a regression, while the regression results are still held in Stata&amp;rsquo;s memory (they are replaced as soon as you run another estimation command).&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:2" role="doc-endnote">
&lt;p>Here, &lt;code>newvar&lt;/code> equals $\widehat{newvar_i} = \widehat{y_i} = \widehat{\beta_0} + \widehat{\beta_1}x_i$&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:3" role="doc-endnote">
&lt;p>Here, &lt;code>newvar&lt;/code> equals $\widehat{newvar_i} = \widehat{u_i} = y_i - \left(\widehat{\beta_0} + \widehat{\beta_1}x_i\right)$&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:4" role="doc-endnote">
&lt;p>There are a few variables here, including &lt;code>treatment_arm&lt;/code>&amp;#160;&lt;a href="#fnref:4" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:5" role="doc-endnote">
&lt;p>Not &lt;code>fsec7&lt;/code>, which is categorical, or &lt;code>fsec&lt;/code> which is always equal to 1&amp;#160;&lt;a href="#fnref:5" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:6" role="doc-endnote">
&lt;p>Hint: &lt;code>ttest var1, by(var2)&lt;/code> will run a t-test of whether the mean of &lt;code>var1&lt;/code> is equal for two groups determined by &lt;code>var2&lt;/code>.&amp;#160;&lt;a href="#fnref:6" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:7" role="doc-endnote">
&lt;p>If they differ, you should make sure you have dropped all missing values of &lt;code>foodsecurity&lt;/code>! Try &lt;code>sum predict_fs foodsecurity&lt;/code> to see if the sample sizes are the same&amp;#160;&lt;a href="#fnref:7" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:8" role="doc-endnote">
&lt;p>Now is a good time to try out &lt;code>lookfor age&lt;/code>&amp;#160;&lt;a href="#fnref:8" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:9" role="doc-endnote">
&lt;p>Categorical variables that take on just a few values, like the identity of your head of household, won&amp;rsquo;t work here. You&amp;rsquo;ll need to tabulate the variables to see what you&amp;rsquo;re working with.&amp;#160;&lt;a href="#fnref:9" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/section></description></item><item><title>Lab 2: Do-files</title><link>https://econ3500s26.netlify.app/assignment/02-lab/</link><pubDate>Tue, 27 Jan 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/02-lab/</guid><description>&lt;h2 id="lab-content">Lab Content&lt;/h2>
&lt;p>&lt;strong>
&lt;a href="../02-lab.pdf">Print-friendly pdf&lt;/a>&lt;/strong>&lt;/p>
&lt;h3 id="materials" class="unnumbered">Materials&lt;/h3>
&lt;ul>
&lt;li>The data file
&lt;a href="../materials/acs2024_2pct.dta">acs2024_2pct.dta&lt;/a>&lt;/li>
&lt;li>Do-file template
&lt;a href="../materials/econ3500_lab_template.do">&lt;code>econ3500_lab_template.do&lt;/code>&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Download these and save in your lab folder (perhaps you named it something like &lt;code>econ3500/labs&lt;/code>?)&lt;/p>
&lt;p>👁️ If your do-file opens in a browser tab, you may want to instead right click and select &amp;ldquo;Save Link As&amp;rdquo; 👁️&lt;/p>
&lt;figure class="tight-figure" >
&lt;a data-fancybox="" href="https://econ3500s26.netlify.app/media/savelinkas.png" >
&lt;img src="https://econ3500s26.netlify.app/media/savelinkas.png" alt="" width="300" >
&lt;/a>
&lt;/figure>
&lt;h3 id="objectives" class="unnumbered">Objectives&lt;/h3>
&lt;p>By the end of this lab, you should be able to complete the following tasks in Stata:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Create, run, and save a do-file&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Explore variables and generate new ones&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Find help with Stata issues: find new commands, check and debug your work, etc.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>You will begin by loading the template do-file, dropping unused variables, and reporting how many variables remain. Next, we will look at sample characteristics, then exclude everyone under 23 and keep that restricted sample for the rest of the lab. Finally, we will compare income and wages across age and gender groups and construct a post-secondary education indicator to investigate how the gender wage gap interacts with educational attainment.&lt;/p>
&lt;p>Before you start typing commands, skim the dataset we will work with by opening it in Stata. We now use
&lt;a href="../materials/acs2024_2pct">&lt;code>acs2024_2pct.dta&lt;/code>&lt;/a>, a 2.5% subsample of the 2024 American Community Survey.&lt;/p>
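&lt;p>A first look at the data might go something like the sketch below. It assumes the file is saved in your current working directory; nothing here changes the data.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Sketch: assumes acs2024_2pct.dta sits in the current working directory
use acs2024_2pct.dta, clear
describe             // variable names, storage types, and labels
codebook, compact    // one-line summary of each variable
browse               // open the data in a read-only viewer
&lt;/code>&lt;/pre>
&lt;p>Because &lt;code>browse&lt;/code> opens a window, it is most useful when typed interactively rather than left in a do-file you plan to log.&lt;/p>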
&lt;h3 id="key-commands" class="unnumbered">Key commands &lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">command&lt;/th>
&lt;th style="text-align:right">description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">Viewing data&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>tab var1&lt;/code>&lt;/td>
&lt;td style="text-align:right">tabulate one variable, &lt;code>var1&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>tab var1, missing&lt;/code>&lt;/td>
&lt;td style="text-align:right">tabulate &lt;code>var1&lt;/code>, include missing values&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>tab var1, nolabel&lt;/code>&lt;/td>
&lt;td style="text-align:right">tabulate &lt;code>var1&lt;/code>, show values rather than labels (if applicable)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Summarizing data&lt;/td>
&lt;td style="text-align:right">&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>tabstat var1&lt;/code>&lt;/td>
&lt;td style="text-align:right">calculate mean of &lt;code>var1&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>tabstat var1, by(var2)&lt;/code>&lt;/td>
&lt;td style="text-align:right">calculate mean of &lt;code>var1&lt;/code> separately for each value of &lt;code>var2&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>tabstat var1, by(var2) stat(mean count p25 p50 p75)&lt;/code>&lt;/td>
&lt;td style="text-align:right">calculate mean of &lt;code>var1&lt;/code> separately for each value of &lt;code>var2&lt;/code>, with added statistics&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Changing your data&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>gen newvar = var1&lt;/code>&lt;/td>
&lt;td style="text-align:right">generate a new variable, &lt;code>newvar&lt;/code>, and set it equal to values of &lt;code> var1&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>gen newvar = 1 if var2 == [exp]&lt;/code>&lt;/td>
&lt;td style="text-align:right">generate a new variable, &lt;code>newvar&lt;/code>, and set it equal to 1 if &lt;code> var2&lt;/code> equals some expression, and missing otherwise&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>gen newvar = var2 == [exp]&lt;/code>&lt;/td>
&lt;td style="text-align:right">generate a new variable, &lt;code>newvar&lt;/code>, and set it equal to 1 if &lt;code> var2&lt;/code> equals some expression, and 0 otherwise&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>drop var1 var2 &lt;/code>&lt;/td>
&lt;td style="text-align:right">drop the variables &lt;code> var1&lt;/code> and &lt;code> var2&lt;/code>.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>drop if [exp]&lt;/code>&lt;/td>
&lt;td style="text-align:right">drop observations for which &lt;code>exp&lt;/code> is true&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>keep var1 var2&lt;/code>&lt;/td>
&lt;td style="text-align:right">drop everything but &lt;code> var1&lt;/code> and &lt;code> var2&lt;/code>.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>keep if [exp]&lt;/code>&lt;/td>
&lt;td style="text-align:right">keep observations &lt;em>only&lt;/em> if &lt;code>exp&lt;/code> is true&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Displaying your data&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>graph twoway histogram var1&lt;/code>&lt;/td>
&lt;td style="text-align:right">make a histogram of &lt;code>var1&lt;/code>. Check help files for more options&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
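&lt;p>As a rough sketch of how the commands in the table fit together, the lines below use &lt;code>auto&lt;/code>, one of the example datasets that ships with Stata, rather than the lab data. The new variable names (&lt;code>expensive&lt;/code>, &lt;code>highmpg&lt;/code>) and the cutoffs are made up for illustration.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Load Stata's built-in example data (NOT the lab dataset)
sysuse auto, clear

* Viewing data
tab foreign
tab foreign, nolabel
tab rep78, missing

* Summarizing data
tabstat price
tabstat price, by(foreign) stat(mean count p25 p50 p75)

* Changing your data
* expensive is 1 if price is above 6000 and 0 otherwise
gen expensive = price &amp;gt; 6000
* highmpg is 1 if mpg is above 25 and missing otherwise
gen highmpg = 1 if mpg &amp;gt; 25
* drop observations with a missing repair record
drop if missing(rep78)
* keep only the listed variables
keep make price mpg foreign expensive highmpg

* Displaying your data
graph twoway histogram price
&lt;/code>&lt;/pre>
&lt;p>Comparing &lt;code>tab expensive&lt;/code> with &lt;code>tab highmpg, missing&lt;/code> afterwards is a quick way to see the difference between the two &lt;code>gen&lt;/code> patterns.&lt;/p>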
&lt;p>Looking for more examples? Check out these
&lt;a href="https://geocenter.github.io/StataTraining/portfolio/01_resource/" target="_blank" rel="noopener">&lt;strong>Stata Cheat Sheets&lt;/strong>&lt;/a>&lt;/p>
&lt;p>Suppose I asked you to recreate your analysis from Lab 01. How long would it take you? If you used a do-file, you would just have to click a button, because your analysis would be replicable. We&amp;rsquo;re going to learn about the glory of do-files and a few other descriptive statistics tricks.&lt;/p>
&lt;p>The instant gratification of the Command window is tempting, but getting comfortable with do-files will save you lots of time, make collaboration easier, and reduce errors!&lt;/p>
&lt;h3 id="aside--bad-documentation-big-problems">Aside: Bad documentation, big problems&lt;/h3>
&lt;blockquote>
&lt;p>For an economist, the five most terrifying words in the English language are: I can’t replicate your results. But for economists Carmen Reinhart and Ken Rogoff of Harvard, there are seven even more terrifying ones: I think you made an Excel error.&lt;/p>
&lt;p>–
&lt;a href="https://www.theatlantic.com/business/archive/2013/04/forget-excel-this-was-reinhart-and-rogoffs-biggest-mistake/275088/" target="_blank" rel="noopener">Matthew O’Brien, The Atlantic (18 April 2013)&lt;/a>&lt;/p>
&lt;/blockquote>
&lt;p>A summary from
&lt;a href="https://theconversation.com/the-reinhart-rogoff-error-or-how-not-to-excel-at-economics-13646" target="_blank" rel="noopener">The Conversation (22 April 2013)&lt;/a>:&lt;/p>
&lt;blockquote>
&lt;p>Reinhart and Rogoff’s work showed average real economic growth slows (a 0.1% decline) when a country’s debt rises to more than 90% of gross domestic product (GDP) – and this 90% figure was employed repeatedly in political arguments over high-profile austerity measures&amp;hellip;&lt;/p>
&lt;p>The most serious was that, in their Excel spreadsheet, Reinhart and Rogoff had not selected the entire row when averaging growth figures: they omitted data from Australia, Austria, Belgium, Canada and Denmark.&lt;/p>
&lt;p>In other words, they had accidentally only included 15 of the 20 countries under analysis in their key calculation.&lt;/p>
&lt;p>When that error was corrected, the “0.1% decline” data became a 2.2% average increase in economic growth.&lt;/p>
&lt;p>So the key conclusion of a seminal paper, which has been widely quoted in political debates in North America, Europe, Australia and elsewhere, was invalid.&lt;/p>
&lt;/blockquote>
&lt;figure id="figure-excel-error-business-insiderhttpswwwbusinessinsidercomthomas-herndon-michael-ash-and-robert-pollin-on-reinhart-and-rogoff-2013-4">
&lt;a data-fancybox="" href="https://econ3500s26.netlify.app/media/reinhart-rogoff-error.png" data-caption="&amp;lt;a href=&amp;#34;https://www.businessinsider.com/thomas-herndon-michael-ash-and-robert-pollin-on-reinhart-and-rogoff-2013-4&amp;#34;&amp;gt;Excel error (Business Insider)&amp;lt;/a&amp;gt;">
&lt;img src="https://econ3500s26.netlify.app/media/reinhart-rogoff-error.png" alt="" width="400" >
&lt;/a>
&lt;figcaption>
&lt;a href="https://www.businessinsider.com/thomas-herndon-michael-ash-and-robert-pollin-on-reinhart-and-rogoff-2013-4">Excel error (Business Insider)&lt;/a>
&lt;/figcaption>
&lt;/figure>
&lt;h3 id="do-files-and-the-do-file-editor">Do-files and the do-file editor&lt;/h3>
&lt;p>You can get pretty far in Stata relying on the Command and Review windows, but we usually want a record of the commands we ran for our analysis. One thing that makes Stata different from a program like Excel is that you can create do-files, essentially small programs that will
run your analysis again and again, in exactly the same way. For
econometric analysis this is CRUCIAL.&lt;/p>
&lt;p>A do-file can be written in any text editor and then saved with the extension &lt;code>.do&lt;/code>, but we&amp;rsquo;ll use the do-file editor. You can start a new do-file by
clicking on the do-file button. Or, you can open the do-file template.&lt;/p>
&lt;p>The do-file editor is where we will write our programs, and it has some nice color coding to help us avoid mistakes. For your problem sets and
papers, you must ALWAYS submit a do-file along with your results. Some people like to practice in the Command window and then copy the
commands they&amp;rsquo;re satisfied with to the do-file, while others prefer to work entirely in the do-file. It&amp;rsquo;s your call, though the second approach
is a little less risky.&lt;/p>
&lt;h4 id="comment-comment-comment">Comment, comment, comment&lt;/h4>
&lt;p>Do-files are used to record your past work and possibly to share your
work with others. It&amp;rsquo;s important to properly &lt;strong>document&lt;/strong> your work
using comments. There are three ways to comment:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Comment the whole line with an asterisk (&lt;code>*&lt;/code>) at the start of the line&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Comment the whole line or part of a line with two forward slashes (&lt;code>//&lt;/code>)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use slash-asterisk to open (&lt;code>/*&lt;/code>) and close (&lt;code>*/&lt;/code>) a comment section&lt;/p>
&lt;/li>
&lt;/ol>
&lt;figure >
&lt;a data-fancybox="" href="https://econ3500s26.netlify.app/media/stata-comment.png" >
&lt;img src="https://econ3500s26.netlify.app/media/stata-comment.png" alt="" width="400" >
&lt;/a>
&lt;/figure>
&lt;p>The do-file editor will turn all your comments green so you don&amp;rsquo;t get
confused.&lt;/p>
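&lt;p>For reference, here is a small sketch of all three comment styles in a do-file. It uses Stata&amp;rsquo;s built-in &lt;code>auto&lt;/code> example data, which is not part of this lab. Note that &lt;code>//&lt;/code> comments are meant for do-files; they may not behave the same way if typed into the Command window.&lt;/p>
&lt;pre>&lt;code class="language-stata">* 1. An asterisk at the start of a line comments out the whole line
sysuse auto, clear

// 2. Two forward slashes can comment out a whole line...
summarize price   // ...or just the rest of a line after a command

/* 3. Slash-asterisk opens a comment section
      that can run over several lines,
      and asterisk-slash closes it */
tab foreign
&lt;/code>&lt;/pre>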
&lt;h3 id="programming-tips">Programming tips&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Put everything in a do-file!&lt;/strong> An important feature of any good
research project is that the results should be reproducible. In
Stata, the easiest way to do this is to create a text file that lists
all your commands in order, so anyone can re-run all your Stata work
on a project anytime. These text files are called do-files because they
have the extension &lt;code>.do&lt;/code> (like &lt;code>intro_exercise.do&lt;/code>). They feed commands
directly into Stata without you having to type or copy them into the
Command window.&lt;/p>
&lt;p>Imagine you&amp;rsquo;re just about done with the analysis for your research
paper. While working on the final regression, you discover that one
of your variables wasn&amp;rsquo;t cleaned properly, and you need to drop some
outliers from the data. Do you correct it and redo everything from
scratch? Could you even do that? How long would it take?&lt;/p>
&lt;p>With a set of do-files, all you have to do is correct the variable
early in the code, and re-run everything. If your code is quick, it
will take just a few minutes. Easy!&lt;/p>
&lt;p>An added bonus is that having do-files makes it very easy to fix
your typos, re-order commands, and create more complicated chains of
commands that wouldn&amp;rsquo;t work otherwise. You can now quickly reproduce
your work, correct it, adjust it, and build on it.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Log your results.&lt;/strong> Maintaining logs can help you quickly retrieve
results and serve as a record of past work in case you accidentally
overwrite commands. Logs contain the commands &lt;em>and&lt;/em> the results.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Never overwrite your original files.&lt;/strong> A good do-file structure
starts with your original, raw data, then cleans and analyzes it to
get your final results. A &amp;ldquo;master&amp;rdquo; do-file can piece all these
together.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Replicability is key.&lt;/strong> Someone else who picks up your raw
files and code should be able to reproduce your results.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Comment, comment, comment!&lt;/strong> Clear commenting is essential to help
others understand your code and to remember what you did.&lt;/p>
&lt;/li>
&lt;/ul>
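&lt;p>The tips above can be sketched as a very small do-file. This is only one possible layout, not the course template: the file names &lt;code>example_analysis.log&lt;/code> and &lt;code>auto_clean.dta&lt;/code> are hypothetical, and the data is Stata&amp;rsquo;s built-in &lt;code>auto&lt;/code> example rather than the lab dataset.&lt;/p>
&lt;pre>&lt;code class="language-stata">* example_analysis.do (hypothetical name): start from raw data, end with results

clear all
* close any log left open by an earlier run; capture ignores the error if none is open
capture log close
* record commands and results in a plain-text log, overwriting the previous run
log using example_analysis.log, replace text

* Step 1: load the original data
sysuse auto, clear

* Step 2: clean, and save under a NEW name so the original is never overwritten
drop if missing(rep78)
save auto_clean.dta, replace

* Step 3: analyze
summarize price mpg

log close
&lt;/code>&lt;/pre>
&lt;p>If a cleaning step turns out to be wrong, you would edit Step 2 and re-run the whole file; the log and the cleaned dataset are then regenerated together.&lt;/p>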
&lt;h3 id="finding-new-commands">Finding new commands&lt;/h3>
&lt;p>One of the strengths of Stata is that complicated processes can be
completed with simple commands. One of its weaknesses is that it&amp;rsquo;s not
always obvious what those specific commands are. In our problem sets and
your research paper, you will (I promise) have to calculate or estimate
something in a way we haven&amp;rsquo;t covered.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Stata help file: &lt;code>help command&lt;/code>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Search Stata documentation: &lt;code>findit keyword&lt;/code>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Google/ChatGPT the thing you are trying to do&lt;/p>
&lt;/li>
&lt;/ul>
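&lt;p>A quick illustration of the first two options, typed one line at a time. The command names and keywords below are arbitrary examples, not hints for a particular question.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Open the help file for a command whose name you already know
help tabstat

* Search by keyword when you do not know the command name
search histogram

* findit runs a broader keyword search that can also turn up user-written materials
findit histogram
&lt;/code>&lt;/pre>
&lt;p>Help files usually end with worked examples, which are often the fastest way to see a command&amp;rsquo;s syntax in action.&lt;/p>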
&lt;h2 id="lab-2" class="unnumbered">Lab Exercise 2&lt;/h2>
&lt;h3 id="what-do-i-submit">What do I submit?&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Your written-up answers to exercise questions (1) - (11). This can be typed or written out, then scanned (or photographed). If scanning, please upload as a &lt;code>.pdf&lt;/code>, not a &lt;code>.jpg&lt;/code> or &lt;code>.png&lt;/code>!&lt;/p>
&lt;ul>
&lt;li>Please put your answers in a separate file rather than your do-file. This makes it easier for us! Also, you&amp;rsquo;ll need to include at least one figure, which you cannot paste into a do-file.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>The do-file you&amp;rsquo;ve created that runs this analysis&lt;/p>
&lt;/li>
&lt;li>
&lt;p>A log file that contains the results from this exercise.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="questions">Questions&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>If you haven&amp;rsquo;t yet done so, download our dataset,
&lt;a href="../materials/acs2024_2pct.dta">acs2024_2pct.dta&lt;/a>, and the do-file template
&lt;a href="../materials/econ3500_lab_template.do">&lt;code>econ3500_lab_template.do&lt;/code>&lt;/a>. Move them to your labs folder.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Open &lt;code>econ3500_lab_template.do&lt;/code> and run it. Does it work? Probably not! Fix it until you can run the file from start to finish with no errors.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Drop some variables we don&amp;rsquo;t need right now: &lt;code>gq&lt;/code>, &lt;code>serial&lt;/code>, and &lt;code>hhwt&lt;/code>. How
many variables remain?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>What is the age distribution of the sample? Specifically, report the
mean, median, minimum, and maximum age of the sample.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Because very young workers might still be in school, drop anyone in
your sample who is less than 23 years old (maintain this sample
restriction for the rest of the lab). How many people are left in
your sample?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Generate a new variable, &lt;code>lt35&lt;/code>, that is equal to one if a person is
less than 35 years old and 0 otherwise. What is the mean of &lt;code>lt35&lt;/code> and
what is its interpretation?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Using the &lt;code>tabstat&lt;/code> command, find the average income and wages for those under age 35 and those at least age 35. How do they compare to the &lt;em>median&lt;/em> income and wages for each group?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Using the &lt;code>tabstat&lt;/code> command, find the average income and wages for men and women.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>There are several reasons why men might earn more than women. Suppose you hypothesized that men have completed more education than women, and workers with higher education levels earn more. We will test this in two ways.&lt;/p>
&lt;p>a. First, generate a variable equal to one if a person has completed at least some post-secondary education, and zero otherwise. What is the mean of this variable?&lt;/p>
&lt;p>b. What share of men have at least some post-secondary education? What about women?&lt;/p>
&lt;p>c. We can also see if gender-wage gaps are bigger for lower vs. higher-educated workers. For those without post-secondary education, what is the average wage gap? For those with post-secondary education, what is the average wage gap?&lt;/p>
&lt;p>d. Use the &lt;code>lt35&lt;/code> indicator you already created to compare the gender wage gap for younger workers (under 35) and older workers (35 and over). Does the gap appear larger in one age group? What might that tell you about experience or life-cycle effects?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Name &lt;strong>two&lt;/strong> additional reasons that may explain why men&amp;rsquo;s income is higher than women&amp;rsquo;s income on average. How would you test each one? &lt;em>You do not have to actually do this test, just describe in as much detail as possible. You can assume you have additional data beyond what is provided here.&lt;/em>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Make two histograms, one of the income distribution for men and one of the income distribution for women. Make sure the y-axis indicates the &amp;ldquo;fraction&amp;rdquo; of individuals, not the density. Copy and paste them into your responses.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="video-recording">Video Recording&lt;/h2>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe src="https://www.youtube.com/embed/GBUxGhv8DjA" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" allowfullscreen title="YouTube Video">&lt;/iframe>
&lt;/div></description></item><item><title>Problem set 1</title><link>https://econ3500s26.netlify.app/assignment/01-ps/</link><pubDate>Fri, 16 Jan 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/01-ps/</guid><description>&lt;h2 id="welcome">Welcome&lt;/h2>
&lt;p>Welcome to Problem Set 1! There are some &amp;ldquo;problem&amp;rdquo; exercises and one extended Stata exercise. You&amp;rsquo;ll need to submit two things on Brightspace: your problem set answers and a log file. If you have trouble with the Stata basics, head back to
&lt;a href="../01-lab/">Lab 1&lt;/a>.&lt;/p>
&lt;blockquote>
&lt;p>&lt;em>Tip: If, after doing these problems, you still want more practice, the odd-numbered Exercises (not Empirical Exercises) in Chapters 2 and 3 of Stock and Watson are quite useful, and solutions are available online.&lt;/em>&lt;/p>
&lt;/blockquote>
&lt;p>Note that there are superscripted numbers&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup> throughout the page that provide additional information/suggestions to help you out.&lt;/p>
&lt;h2 id="exercises">Exercises&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:right">&lt;/th>
&lt;th style="text-align:right">&lt;/th>
&lt;th style="text-align:right">&lt;/th>
&lt;th style="text-align:right">&lt;/th>
&lt;th>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:right">X&lt;/td>
&lt;td style="text-align:right">-1&lt;/td>
&lt;td style="text-align:right">0&lt;/td>
&lt;td style="text-align:right">1&lt;/td>
&lt;td style="text-align:right">4&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right">$P(X=x)$&lt;/td>
&lt;td style="text-align:right">0.25&lt;/td>
&lt;td style="text-align:right">0.30&lt;/td>
&lt;td style="text-align:right">0.40&lt;/td>
&lt;td style="text-align:right">0.05&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ol>
&lt;li>
&lt;p>Consider the above random variable, $X$, with its associated
probability distribution:&lt;/p>
&lt;p>a. &lt;em>Draw&lt;/em> the probability distribution function and the cumulative
distribution function. (That is, you should make a figure/graph!)&lt;/p>
&lt;p>b. What is the expected value of X? That is, what is $E[X]$?&lt;/p>
&lt;p>c. What is the variance of X?&lt;/p>
&lt;!-- 2.6 -->
&lt;/li>
&lt;li>
&lt;p>The following table gives the joint probability distribution between employment status and college graduation among those either employed or looking for work (unemployed) in the working-age U.S. population for 2017:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>Unemployed ($Y=0$)&lt;/th>
&lt;th>Employed ($Y=1$)&lt;/th>
&lt;th>Total&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Non-college grads ($X = 0$)&lt;/td>
&lt;td>0.026&lt;/td>
&lt;td>0.576&lt;/td>
&lt;td>0.602&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>College grads ($X = 1$)&lt;/td>
&lt;td>0.009&lt;/td>
&lt;td>0.389&lt;/td>
&lt;td>0.398&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Total&lt;/strong>&lt;/td>
&lt;td>0.035&lt;/td>
&lt;td>0.965&lt;/td>
&lt;td>1.000&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>a. Compute $E[Y]$.&lt;/p>
&lt;p>b. The unemployment rate is the fraction of the labor force that is unemployed. Show that the unemployment rate is given by $1 - E[Y]$.&lt;/p>
&lt;p>c. Calculate $E[Y|X=1]$ and $E[Y|X=0]$.&lt;/p>
&lt;p>d. Calculate the unemployment rate for college graduates and for non-college graduates.&lt;/p>
&lt;p>e. A randomly selected member of this population reports being unemployed. What is the probability that this worker is a college graduate? A non-college graduate?&lt;/p>
&lt;p>f. Are educational achievement and employment status independent? Explain.&lt;/p>
&lt;!-- 2.10 -->
&lt;/li>
&lt;li>
&lt;p>Compute the following probabilities&lt;sup id="fnref:2">&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref">2&lt;/a>&lt;/sup>:&lt;/p>
&lt;p>a. If $Y$ is distributed $N(1,4)$, find $Pr(Y \leq 3)$&lt;/p>
&lt;p>b. If $Y$ is distributed $N(3,9)$, find $Pr(Y &amp;gt;0 )$&lt;/p>
&lt;p>c. If $Y$ is distributed $N(50,25)$, find $Pr(40 \leq Y \leq 52)$&lt;/p>
&lt;p>d. If $Y$ is distributed $N(5,2)$, find $Pr(6 \leq Y \leq 8)$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>For a randomly selected county in the United States, let $X$
represent the proportion of adults over age 65 who are employed (the
elderly employment rate). Then, $X$ is restricted to a value between
zero and one. Suppose that the cumulative distribution function for
$X$ is given by $F(x) = 3x^2 - 2x^3$ for $0 \leq x \leq 1$.&lt;/p>
&lt;p>a. What is the probability that the elderly employment rate is at
least 0.5 (50%)?&lt;/p>
&lt;p>b. What is the probability that the elderly employment rate is
between 0.4 (40%) and 0.6 (60%)?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;!-- 2.18 -->
&lt;ol start="5">
&lt;li>
&lt;p>In any year, the weather can inflict storm damage to a home. From year to year, the damage is random. Let $Y$ denote the dollar value of damage in any given year. Suppose that in 95% of the years, there is no damage ($Y=0$), but that in 5% of the years, $Y = 20000$.&lt;/p>
&lt;p>a. What are the mean and standard deviation of the damage in any year?&lt;/p>
&lt;p>b. Consider an &amp;ldquo;insurance pool&amp;rdquo; of 100 people whose homes are sufficiently dispersed so that, in any year, the damage to different homes can be viewed as independently distributed random variables. Let $\bar{Y}$ denote the average damage to these 100 homes in a year. (i) What is $E[\bar{Y}]$? (ii) What is the probability that $\bar{Y}$ exceeds 2000 dollars, $Pr(\bar{Y} &amp;gt; 2000)$?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;!--3.16 -->
&lt;ol start="6">
&lt;li>
&lt;p>Grades on a standardized test are known to have a mean of 1000 for students in the United States. The test is administered to 453 randomly selected students in Florida; in this sample, the mean is 1013 and the standard deviation ($s$) is 108.&lt;/p>
&lt;p>a. Construct a 95% confidence interval for the average test score for Florida students.&lt;/p>
&lt;p>b. Is there statistically significant evidence that Florida students perform differently than other students in the United States? How do you know?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;!-- c. Another 503 students are selected at random from Florida. They are given a 3-hour preparation course before the test is administered Their average test scores is 1019 with a standard deviation of 95.
(i) First, construct a 95% confidence interval for the **change** in the average test score associated with the prep course.
(ii) Is there statistically significant evidence that the prep course helped?
-->
&lt;p>&lt;strong>For the following question, make sure you submit your log-file alongside your answers!&lt;/strong>&lt;/p>
&lt;ol start="7">
&lt;li>
&lt;p>Download
&lt;a href="../countymurders.dta">&lt;code>countymurders.dta&lt;/code>&lt;/a> to answer this question.&lt;sup id="fnref:3">&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref">3&lt;/a>&lt;/sup> The variable $murders$ is the number of murders reported in the county. The variable $execs$ is the number of executions of people sentenced to death in the given county. Most states in the United States have the death penalty, but several do not.&lt;/p>
&lt;p>a. Keep only data from the year 1996. How many counties are
there in the data set? Of these, how many have zero murders?
What percentage of counties have zero executions?&lt;/p>
&lt;p>b. What is the largest number of murders in a county? What is the
largest number of executions in a county?&lt;/p>
&lt;p>c. Compute the correlation coefficient $r$ between &lt;code>murders&lt;/code> and &lt;code>execs&lt;/code> and describe what you find.&lt;sup id="fnref:4">&lt;a href="#fn:4" class="footnote-ref" role="doc-noteref">4&lt;/a>&lt;/sup> Estimate the correlation coefficient between &lt;code>murdrate&lt;/code> and &lt;code>execrate&lt;/code>. Why do the two coefficients differ so much?&lt;/p>
&lt;p>d. What are two characteristics in the data that are highly correlated with county murder rates?&lt;sup id="fnref:5">&lt;a href="#fn:5" class="footnote-ref" role="doc-noteref">5&lt;/a>&lt;/sup> What are their correlation coefficients?&lt;/p>
&lt;p>e. What is median real per-capita income?&lt;sup id="fnref:6">&lt;a href="#fn:6" class="footnote-ref" role="doc-noteref">6&lt;/a>&lt;/sup>&lt;/p>
&lt;p>f. Generate a variable, &lt;code>highinc&lt;/code> that is equal to 1 if a county
has above-median real per capita income, and 0 otherwise. What
is $E[rpcpersinc | highinc =0]$? What is $E[rpcpersinc | highinc =1]$?&lt;/p>
&lt;p>g. Consider a two-sided hypothesis test of whether murder
rates are different between counties with high (above median) vs
low (below median) real per-capita personal income. Assume the
two samples are independent, with equal variances.
(i) First, write a null and alternative hypothesis.
(ii) Use Stata to conduct the hypothesis test. What is the relevant t-statistic?&lt;sup id="fnref:7">&lt;a href="#fn:7" class="footnote-ref" role="doc-noteref">7&lt;/a>&lt;/sup>
(iii) Can you reject the null hypothesis at the 5% level?&lt;/p>
&lt;p>h. Generate a variable, &lt;code>perc1029&lt;/code>, that is equal to the share of
the population between the ages of 10 and 29. What is the median
share of the population by county that is ages &lt;strong>10-29&lt;/strong>?&lt;/p>
&lt;p>i. Generate a variable, &lt;code>young&lt;/code>, that is equal to 1 if a county has
an above-median share of the population that is age 10-29, and 0
otherwise. What is $E[perc1029| young = 0]$? What is
$E[perc1029| young =1]$?&lt;/p>
&lt;p>j. Consider a two-sided hypothesis test of whether murder rates are
different between counties with a high (above-median) share of the
population ages &lt;strong>10-29&lt;/strong> versus a low share. Assume the two
samples are independent, with equal variances.
(i) First, write a null and alternative hypothesis.
(ii) Use Stata to conduct the hypothesis test. What is the relevant t-statistic?
(iii) Can you reject the null hypothesis at the 5% level?&lt;/p>
&lt;/li>
&lt;/ol>
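&lt;p>If the syntax mentioned in the footnote hints is unfamiliar, the sketch below shows the general shape of those commands on Stata&amp;rsquo;s built-in &lt;code>auto&lt;/code> example data. The variables here (&lt;code>price&lt;/code>, &lt;code>mpg&lt;/code>, &lt;code>foreign&lt;/code>, and so on) belong to that example dataset, not to &lt;code>countymurders.dta&lt;/code>, so you will need to adapt the names.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Built-in example data, used only to illustrate syntax
sysuse auto, clear

* Median of one variable
tabstat price, stat(p50)

* Correlation between two variables
correlate price mpg

* Wildcards: every variable whose name starts with "t" (here: trunk, turn)
correlate t*

* Wildcards: every variable with "ea" somewhere in its name (here: headroom, gear_ratio)
correlate *ea*

* Two-sample t-test of price across the two groups defined by foreign
* (by default this assumes equal variances in the two groups)
ttest price, by(foreign)
&lt;/code>&lt;/pre>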
&lt;h2 id="sources" class="unnumbered">Sources&lt;/h2>
&lt;p>
&lt;a href="../countymurders.dta">&lt;code>countymurders.dta&lt;/code>&lt;/a>&lt;/p>
&lt;p>&lt;em>Source: Compiled by J. Monroe Gamble for a Summer Research Opportunities
Program (SROP) at Michigan State University, Summer 2014. Monroe
obtained data from the U.S. Census Bureau, the FBI Uniform Crime
Reports, and the Death Penalty Information Center.&lt;/em>&lt;/p>
&lt;section class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1" role="doc-endnote">
&lt;p>See?! Neat! :)&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:2" role="doc-endnote">
&lt;p>Remember that we conventionally write $N(\mu,\sigma^2)$, so the second term is the &lt;em>variance&lt;/em>, not the standard deviation.&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:3" role="doc-endnote">
&lt;p>Remember to move this file into your
&lt;a href="../01-lab/#working-directories-important">working directory&lt;/a>!&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:4" role="doc-endnote">
&lt;p>Remember, you can use &lt;code>correlate var1 var2&lt;/code> to look at the correlation between two variables.&amp;#160;&lt;a href="#fnref:4" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:5" role="doc-endnote">
&lt;p>If you want to look at the correlation between lots of variables, you can use &lt;code>correlate var1 var2 ... var99&lt;/code>. If you want to refer to a lot of variables, an asterisk (*) can act as a &amp;ldquo;wildcard.&amp;rdquo; So if you use &lt;code>correlate var*&lt;/code>, you&amp;rsquo;ll receive a correlation matrix of every variable with a name that starts with &amp;ldquo;var.&amp;rdquo; If you use &lt;code>correlate *var*&lt;/code>, it will give you a correlation matrix of every variable with the letters &amp;ldquo;var&amp;rdquo; somewhere in the name.&amp;#160;&lt;a href="#fnref:5" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:6" role="doc-endnote">
&lt;p>&lt;code>tabstat&lt;/code> is your friend!&amp;#160;&lt;a href="#fnref:6" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:7" role="doc-endnote">
&lt;p>The help file for &lt;code>ttest&lt;/code> will be useful. Here we are conducting a two-sample t-test using groups. You will want to use the &lt;code>highinc&lt;/code> variable you generated earlier.&amp;#160;&lt;a href="#fnref:7" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/section></description></item><item><title>Lab 1: Introduction to Stata</title><link>https://econ3500s26.netlify.app/assignment/01-lab/</link><pubDate>Tue, 13 Jan 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/01-lab/</guid><description>&lt;p>
&lt;a href="../01-lab.pdf">&lt;strong>Download print-friendly version (pdf)&lt;/strong>&lt;/a>&lt;/p>
&lt;h2 id="materials" class="unnumbered">Materials&lt;/h2>
&lt;ul>
&lt;li>
&lt;a href="../materials/driving_2004.dta">&lt;code>driving_2004.dta&lt;/code>&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="objectives" class="unnumbered">Objectives&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>&lt;/h2>
&lt;p>By the end of this tutorial you should be able to complete the following
tasks in Stata:&lt;/p>
&lt;ul>
&lt;li>Identify key areas of the Stata interface&lt;/li>
&lt;li>Open a data file&lt;/li>
&lt;li>Understand what a working directory is&lt;/li>
&lt;li>Summarize and tabulate data&lt;/li>
&lt;li>Make a variable&lt;/li>
&lt;li>Create and save a log file&lt;/li>
&lt;li>Open, view, and save a data file&lt;/li>
&lt;li>Ask Stata for help&lt;/li>
&lt;/ul>
&lt;p>If you need more help, check out
&lt;a href="https://econ3500s26.netlify.app/bonus/stata-resources">Stata Resources&lt;/a>.&lt;/p>
&lt;p>&lt;em>For the hardcore R users in this class who prefer to use R throughout, you may complete this lab in R. But, it could also be fun to learn a little Stata!&lt;/em>&lt;/p>
&lt;h2 id="general-command-structure">General command structure&lt;/h2>
&lt;p>&lt;code>do {something} ... with {variable(s) x}...if {something is true..}, options&lt;/code>&lt;/p>
&lt;p>&lt;strong>Note:&lt;/strong> In this lab, you may type commands directly into the Command window. Later in the course, we will use &lt;em>do-files&lt;/em>, which allow you to save and rerun your code. For now, your log file will serve as a record of your work.&lt;/p>
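&lt;p>As an illustration of that structure, here is one command broken into its parts. It uses Stata&amp;rsquo;s built-in &lt;code>auto&lt;/code> example data (an assumption for illustration; it is not the lab dataset), where &lt;code>foreign&lt;/code> equals 1 for foreign cars.&lt;/p>
&lt;pre>&lt;code class="language-stata">* Load the built-in example data
sysuse auto, clear

* do {summarize} ... with {price} ... if {foreign == 1}, options {detail}
summarize price if foreign == 1, detail
&lt;/code>&lt;/pre>
&lt;p>Dropping the &lt;code>if&lt;/code> condition or the option still gives a valid command; only the &amp;ldquo;do something&amp;rdquo; part is always required.&lt;/p>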
&lt;h3 id="key-commands" class="unnumbered">Key commands &lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">command&lt;/th>
&lt;th style="text-align:right">description&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">&lt;code>log using logfile1.log&lt;/code>&lt;/td>
&lt;td style="text-align:right">open and log using &lt;code>logfile1.log&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>log close&lt;/code>&lt;/td>
&lt;td style="text-align:right">close log&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>use dataset.dta, clear&lt;/code>&lt;/td>
&lt;td style="text-align:right">open dataset &lt;code>dataset.dta &lt;/code>, clear out old one&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>describe var1 var2 ... &lt;/code>&lt;/td>
&lt;td style="text-align:right">charcteristics of &lt;code>var1&lt;/code>, &lt;code>var2&lt;/code>, etc.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>browse var1 var2 ... &lt;/code>&lt;/td>
&lt;td style="text-align:right">open data browser, display &lt;code>var1, var2 .. &lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>lookfor text1 &lt;/code>&lt;/td>
&lt;td style="text-align:right">search for &lt;em>text1&lt;/em> in variable names/descriptions&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>tabulate var1 &lt;/code>&lt;/td>
&lt;td style="text-align:right">make a frequency table of &lt;code>var1&lt;/code>.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>tabulate var1 var2&lt;/code>&lt;/td>
&lt;td style="text-align:right">make a cross-tabulation of &lt;code>var1&lt;/code> and &lt;code> var2&lt;/code>.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>summarize var1 &lt;/code>&lt;/td>
&lt;td style="text-align:right">descriptive statistics for &lt;code>var1&lt;/code>.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>summarize var1 , detail&lt;/code>&lt;/td>
&lt;td style="text-align:right">detailed descriptive statistics for &lt;code>var1&lt;/code>.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>gen var1 = binexp&lt;/code>&lt;/td>
&lt;td style="text-align:right">generates a variable &lt;code>var1&lt;/code> equal to 1 if binary expression true, 0 otherwise&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>replace var1 = 0 if binexp&lt;/code>&lt;/td>
&lt;td style="text-align:right">replaces &lt;code>var1&lt;/code> to be 0 if binary expression true, nothing otherwise&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;code>help command &lt;/code>&lt;/td>
&lt;td style="text-align:right">open help files for &lt;code>command&lt;/code>.&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
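&lt;p>To see several of these commands in sequence, here is a minimal sketch. It is an assumption of this example, not part of the lab, that you use &lt;code>auto&lt;/code>, the practice dataset that ships with Stata, so the variable names (&lt;code>price&lt;/code>, &lt;code>mpg&lt;/code>, &lt;code>foreign&lt;/code>, &lt;code>rep78&lt;/code>) differ from those in the lab file, and &lt;code>expensive&lt;/code> is a made-up variable name.&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-stata" data-lang="stata">* Load the example dataset that ships with Stata
sysuse auto, clear
* Characteristics of three variables
describe price mpg foreign
* Search variable names and labels for the word mileage
lookfor mileage
* One-way frequency table, then a cross-tabulation
tabulate foreign
tabulate foreign rep78
* Detailed descriptive statistics
summarize price, detail
* Indicator: 1 if price is above 6000, 0 otherwise
gen expensive = price &amp;gt; 6000
* Set the indicator to 0 for foreign cars
replace expensive = 0 if foreign == 1
&lt;/code>&lt;/pre>
&lt;p>Note that &lt;code>sysuse auto, clear&lt;/code> replaces whatever data is in memory, so reopen the lab file afterwards.&lt;/p>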
&lt;h3 id="logic-statements" class="unnumbered">Logic statements &lt;/h3>
&lt;p>These are some common logical statements&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">operation&lt;/th>
&lt;th style="text-align:center">command&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">and&lt;/td>
&lt;td style="text-align:center">&amp;amp;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">or&lt;/td>
&lt;td style="text-align:center">| &lt;br> (vertical bar, on same key as &amp;ldquo;/&amp;quot;)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">equal to&lt;/td>
&lt;td style="text-align:center">==&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">not equal to&lt;/td>
&lt;td style="text-align:center">!=&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">greater than&lt;/td>
&lt;td style="text-align:center">&amp;gt;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">less than&lt;/td>
&lt;td style="text-align:center">&amp;gt;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">greater than or equal to&lt;/td>
&lt;td style="text-align:center">&amp;gt;=&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">less than or equal to&lt;/td>
&lt;td style="text-align:center">&amp;lt;=&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;ul>
&lt;li>
&lt;p>&lt;code>tab bac10 if gdl==1 &amp;amp; sl70plus == 0&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Tabulates the variable &lt;code>bac10&lt;/code> but only if &lt;code>gdl&lt;/code> equals one &lt;em>and&lt;/em> &lt;code>sl70plus&lt;/code> equals 0&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>tab bac10 if year &amp;gt;=2000&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Tabulates the variable &lt;code>bac10&lt;/code> for the years 2000, 2001, 2002, etc.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>tab bac10 if year !=2000&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Tabulates the variable &lt;code>bac10&lt;/code> for every year &lt;em>but&lt;/em> 2000&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>tab bac10 if year &amp;lt; 2008 &amp;amp; year &amp;gt; 2005&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Tabulates the variable &lt;code>bac10&lt;/code> for 2006 and 2007&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>tab bac10 if year &amp;lt; 2008 | year &amp;gt; 2005&lt;/code>&lt;/p>
&lt;ul>
&lt;li>Tabulates the variable &lt;code>bac10&lt;/code> if &lt;code>year&lt;/code> is less than 2008 OR greater than 2005 (all years!)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Also:&lt;/strong> you can use parentheses to group terms appropriately. For example, if you want to tabulate states where the speed limit is 55 or 65 AND the blood alcohol limit is 0.10, then this is wrong:&lt;/p>
&lt;p>&lt;code>tab state if sl55 == 1 | sl65 == 1 &amp;amp; bac10 == 1&lt;/code>&lt;/p>
&lt;p>But this is correct!&lt;/p>
&lt;p>&lt;code>tab state if (sl55 == 1 | sl65 == 1) &amp;amp; bac10 == 1&lt;/code>&lt;/p>
&lt;p>Thanks, parentheses!&lt;/p>
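&lt;p>Why does the first version go wrong? Stata evaluates &lt;code>&amp;amp;&lt;/code> before &lt;code>|&lt;/code>, so without parentheses the condition is grouped as in the first command below. A small sketch, assuming the same variables as above, that uses &lt;code>count&lt;/code> to compare the two groupings:&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-stata" data-lang="stata">* How Stata reads the version without parentheses
count if sl55 == 1 | (sl65 == 1 &amp;amp; bac10 == 1)
* The grouping we actually want
count if (sl55 == 1 | sl65 == 1) &amp;amp; bac10 == 1
&lt;/code>&lt;/pre>
&lt;p>If the two counts differ, the difference is the observations with &lt;code>sl55 == 1&lt;/code> whose &lt;code>bac10&lt;/code> is not 1; the first command counts them even though they fail the blood alcohol condition.&lt;/p>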
&lt;h2 id="guided-instructions">Guided instructions&lt;/h2>
&lt;h3 id="hey-stata-its-nice-to-meet-you">Hey, Stata. It&amp;rsquo;s nice to meet you&lt;/h3>
&lt;p>Start by opening Stata. You should have a window that looks something
like this (on a PC):&lt;/p>
&lt;p>&lt;img src="https://econ3500s26.netlify.app/img/stata1.png" alt="">&lt;/p>
&lt;p>You should now have the Stata window open. There is a set of pull down
menus as well as 4 smaller windows: Review, Variables, Results, and
Command.&lt;/p>
&lt;p>&lt;img src="https://econ3500s26.netlify.app/img/stata2.png" alt="">&lt;/p>
&lt;p>Also especially helpful are the following buttons:&lt;/p>
&lt;h3 id="log-files">Log files&lt;/h3>
&lt;p>If you want to record anything that you do in a Stata session so that
you can look at results or commands later, you need to open a log file.
A log file is simply a record of all the commands you enter into Stata
and the output from those commands. The key is to make sure you have a
log file open at the beginning of a Stata session, and to close it once
you have finished, and before you close Stata.&lt;/p>
&lt;p>There are three ways you can open a log file:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Go to the &lt;strong>FILE&lt;/strong> dropdown menu, choose &lt;strong>Log&lt;/strong>, choose &lt;strong>Begin&lt;/strong>.
You should see a &amp;ldquo;Begin Logging Stata Output&amp;rdquo; dialog box. Browse to
a directory where you can store your log file and type in the
following file name in the File Name space: &lt;code>lab1.log&lt;/code>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Click on the log icon at the top of the Stata workspace (right of
the print button). When you click on the log button, the &amp;ldquo;Begin
Logging Stata Output&amp;rdquo; dialog box pops up. Name your log file as
above.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>You can open a log file by typing the following in the Stata command
window: &lt;code>log using lab1.log, replace&lt;/code>&lt;/p>
&lt;p>The &lt;code>, replace&lt;/code> is optional. If you add it as an &lt;strong>option&lt;/strong>, your
new file will overwrite any existing log file with the same name. Or, you can add &lt;code>, append&lt;/code> to
add the new output to the bottom of your old log file.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>&lt;em>Tip: Use extension .log, NOT the default .smcl. This will make it
easier for you to edit, cut and paste your log in any text editor.&lt;/em>&lt;/p>
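&lt;p>Put together, a logged session might look like the sketch below. The &lt;code>capture log close&lt;/code> line is an addition not covered above: it closes any log left open from earlier, and &lt;code>capture&lt;/code> keeps Stata from stopping with an error when no log is open. Run the final &lt;code>log close&lt;/code> only when you are finished.&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-stata" data-lang="stata">* Close any log left open from an earlier session
capture log close
* Start a plain-text log, overwriting an older lab1.log if one exists
log using lab1.log, replace
* ... your commands for the session go here ...
* Stop logging once you are done
log close
&lt;/code>&lt;/pre>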
&lt;p>Now that you have a log file open, we can start our Stata session.&lt;/p>
&lt;h3 id="working-directories-important">Working directories (important!)&lt;/h3>
&lt;p>Stata looks for data files in its &lt;em>working directory&lt;/em>. To see your current
working directory, type:&lt;/p>
&lt;p>&lt;code>pwd&lt;/code>&lt;/p>
&lt;p>If your data file is not located there, Stata will not find it. You can
change your working directory using:&lt;/p>
&lt;p>&lt;code>cd &amp;quot;/path/to/your/folder&amp;quot;&lt;/code>&lt;/p>
&lt;p>Once you run this command, Stata will make this working directory its starting point, but only for the rest of the session. The next time you open Stata, you may need to repeat the process.&lt;/p>
&lt;p>I recommend creating a folder for this class (e.g., &lt;code>econ3500/labs/&lt;/code>) and saving both your data and log files there.&lt;/p>
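&lt;p>A minimal sketch of checking and changing the working directory. The folder path is hypothetical, so substitute the location of your own class folder; &lt;code>dir&lt;/code> is an extra command, not covered above, that lists the files Stata can see.&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-stata" data-lang="stata">* Show the current working directory
pwd
* Move to the class folder (hypothetical path; use your own)
cd &amp;quot;econ3500/labs&amp;quot;
* List the files in the new working directory
dir
&lt;/code>&lt;/pre>
&lt;p>If your data file shows up in the &lt;code>dir&lt;/code> listing, Stata will be able to open it by file name alone.&lt;/p>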
&lt;h3 id="opening-data-files">Opening data files&lt;/h3>
&lt;p>Stata data files end with the extension &lt;code>.dta&lt;/code>, a format designed for
Stata. You can import text files and Excel files into Stata, and
you can export &lt;code>.dta&lt;/code> files into text files or Excel files, but we&amp;rsquo;ll
cover this later.&lt;/p>
&lt;p>There are three ways to open a data file:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Outside Stata, double click on the data file you want to open&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use the &lt;strong>FILE/OPEN&lt;/strong> drop down menu in Stata and open the data set
that you copied into your folder. Note that in
the command window, the &lt;code>use&lt;/code> command appears. We&amp;rsquo;ll use that one
later.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Type &lt;code>use filename.dta, clear&lt;/code> into the command window within Stata. The option &lt;code>, clear&lt;/code> tells Stata to remove any data currently in memory.
Stata can only hold one dataset at a time.&lt;sup id="fnref:2">&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref">2&lt;/a>&lt;/sup>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>Download
&lt;a href="../materials/driving_2004.dta">&lt;code>driving_2004.dta&lt;/code>&lt;/a> and open it. I recommend moving it to your brand new class folder first. It is a
file of driving laws, vehicle accidents, and fatalities in the United
States in 2004.&lt;/p>
&lt;p>You should now see the list of variables appear in the Variables window,
with the variable name, variable label, and some other information.&lt;/p>
&lt;h3 id="looking-at-data">Looking at data&lt;/h3>
&lt;p>Let&amp;rsquo;s take a more detailed look at the variables in the dataset.&lt;/p>
&lt;p>In the command window type: &lt;code>describe&lt;/code>&lt;/p>
&lt;p>At the top of the output you will see some overall features of the file,
including the number of variables. Below that you will see a list of
every variable, including the variable name, the &amp;ldquo;storage type&amp;rdquo; (byte,
float, int, etc.) and the variable label. If you see &lt;code>–more–&lt;/code> at the
bottom of your screen, press the space bar to continue scrolling.&lt;sup id="fnref:3">&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref">3&lt;/a>&lt;/sup>&lt;/p>
&lt;p>To learn more about the variables and the organization of the data, use
the &lt;code>browse&lt;/code> command. Type: &lt;code>browse&lt;/code> (or click on the &amp;ldquo;browse&amp;rdquo; button).&lt;/p>
&lt;p>Another approach is to add a variable list to the browse command. Type
the following:&lt;/p>
&lt;p>&lt;code>browse year sl70plus bac10 bac08 gdl&lt;/code>&lt;/p>
&lt;p>&lt;em>Note that you can also double-click the variable names in the Variables window so you
don&amp;rsquo;t have to type them all!&lt;/em>&lt;/p>
&lt;p>This command directs you to a spreadsheet inside Stata where the data
appears. This looks a lot like an Excel spreadsheet!&lt;/p>
&lt;p>Note the following:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Each observation appears on a separate row of the spreadsheet, which
represents data from a certain year and a certain state. For example,
the first row is for state 1 (Alabama) in 2004. If you move along
the row, you can see other characteristics about Alabama in 2004.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Each variable appears in a separate column, and the name of the
variable is at the column heading.&lt;/p>
&lt;/li>
&lt;/ul>
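&lt;p>The browser window does not show up in your log file. If you want a few rows recorded in the log, &lt;code>list&lt;/code> prints them in the Results window instead. A short sketch, assuming the same variables as the &lt;code>browse&lt;/code> example above:&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-stata" data-lang="stata">* Print the first five rows of selected variables in the Results window
list year sl70plus bac10 bac08 gdl in 1/5
* Browse only the observations where gdl equals 1
browse year sl70plus bac10 bac08 gdl if gdl == 1
&lt;/code>&lt;/pre>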
&lt;blockquote>
&lt;p>How many observations are there? What type of data set is this?&lt;/p>
&lt;/blockquote>
&lt;h3 id="examining-variables">Examining variables&lt;/h3>
&lt;p>Let&amp;rsquo;s look at the variables that are included in the data set. There is
an efficient way to find the names of variables you are interested in.
Suppose you are interested in a variable related to alcohol laws. Type
in:&lt;/p>
&lt;p>&lt;code>lookfor alcohol&lt;/code>&lt;/p>
&lt;p>This will give you a list of all the variables that have &amp;ldquo;alcohol&amp;rdquo; in
either their variable name or variable label. In this case, two
variables appear - &lt;code>bac10&lt;/code> and &lt;code>bac08&lt;/code>.&lt;/p>
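&lt;p>When you cross-tabulate two variables with &lt;code>tabulate&lt;/code>, you can experiment with combinations of the &lt;code>col&lt;/code>, &lt;code>row&lt;/code>,
and &lt;code>cell&lt;/code> options, and add the &lt;code>nofreq&lt;/code> option to suppress the number of
observations. Use help for details:&lt;/p>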
&lt;p>&lt;code>help tab&lt;/code>&lt;/p>
&lt;p>When you are analyzing variables, you will want to think carefully about
whether you should be looking at row percentages, column percentages, or
cell percentages.&lt;/p>
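&lt;p>As a sketch of what these options look like in practice, here are three versions of the same cross-tabulation. The choice of &lt;code>gdl&lt;/code> and &lt;code>sl70plus&lt;/code> is only for illustration.&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-stata" data-lang="stata">* Counts plus row percentages (each row sums to 100)
tabulate gdl sl70plus, row
* Column percentages only; nofreq hides the counts
tabulate gdl sl70plus, col nofreq
* Each cell as a percentage of all observations
tabulate gdl sl70plus, cell
&lt;/code>&lt;/pre>
&lt;p>Row percentages answer &amp;ldquo;among observations with a given value of &lt;code>gdl&lt;/code>, what share has each value of &lt;code>sl70plus&lt;/code>?&amp;rdquo;; column percentages answer the reverse question.&lt;/p>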
&lt;h3 id="creating-new-variables">Creating new variables&lt;/h3>
&lt;p>You can create new variables using the &lt;code>generate&lt;/code> command. For example:&lt;/p>
&lt;p>&lt;code>gen highfatal = fatal_rate &amp;gt; 1.5&lt;/code>&lt;/p>
&lt;p>This creates a variable equal to 1 when the condition is true, and 0 otherwise.&lt;/p>
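&lt;p>One caveat worth checking in your own data: Stata treats a missing numeric value as larger than any number, so an observation with a missing &lt;code>fatal_rate&lt;/code> would be coded 1 by the command above. A minimal sketch that leaves those observations missing instead (the name &lt;code>highfatal_nm&lt;/code> is hypothetical, and this lab does not say whether &lt;code>fatal_rate&lt;/code> has any missing values):&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-stata" data-lang="stata">* Code the indicator only where fatal_rate is observed
gen highfatal_nm = fatal_rate &amp;gt; 1.5 if !missing(fatal_rate)
* The missing option shows how many observations were left uncoded
tabulate highfatal_nm, missing
&lt;/code>&lt;/pre>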
&lt;p>You could create the same variable in a slightly different way:&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-stata" data-lang="stata">gen highfatal = 1 if fatal_rate &amp;gt; 1.5
replace highfatal = 0 if fatal_rate &amp;lt;= 1.5
&lt;/code>&lt;/pre>&lt;h2 id="videos">Videos&lt;/h2>
&lt;p>
&lt;a href="https://youtu.be/uze1JNmFTq8" target="_blank" rel="noopener">Lab 1 Overview&lt;/a>&lt;/p>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe src="https://www.youtube.com/embed/uze1JNmFTq8" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" allowfullscreen title="YouTube Video">&lt;/iframe>
&lt;/div>
&lt;h2 id="lab-1" class="unnumbered">Lab Exercise 1&lt;/h2>
&lt;p>&lt;em>First, work through the above steps. Then, work through the 7 questions below.&lt;/em>&lt;/p>
&lt;h3 id="what-do-i-submit">What do I submit?&lt;/h3>
&lt;ol>
&lt;li>Your written up answers to exercise questions (1) - (7). This can be typed or written out then scanned (or photographed), in any reasonable format&lt;/li>
&lt;li>A log file that contains the results from the steps prior to the exercise &lt;em>and&lt;/em> the exercise itself.&lt;/li>
&lt;/ol>
&lt;p>&lt;em>If you struggled or explored, this might get excessively long! Three choices: (1) submit it anyway, (2) open it in a text editor and manually delete the nonsense, (3) close your log file and start a new one, and this time run through your code with less backtracking. Option (1) is completely fine.&lt;/em>&lt;/p>
&lt;h3 id="questions">Questions&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>How many states have graduated drivers license laws (GDLs)? How many
states have speed limits of 70 mph or higher (including no speed
limit)?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>What percentage of states with GDLs &lt;em>and&lt;/em> with low speed limits
(&lt;em>below&lt;/em> 70 mph) have blood-alcohol limits of 0.10 (the more lenient
level)? &lt;em>Note that some states have a blood-alcohol limit in place for only a
fraction of the year. If so, count a state with a 0.10 limit in place
for part of the year as having that limit.&lt;/em>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>What is the mean fatality rate per 100 million miles across all
states? What is the standard deviation?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>What was the fatality rate (deaths per 100 million miles) in
Vermont? (Vermont is state 46)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Generate a variable $Y$ equal to one if a state has a fatality rate
per 100 million miles that is above the mean, and zero otherwise.
What is $E(Y)$?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Write a joint probability distribution table for the following two
random variables: $X$, a random variable equal to one if a state has
a speed limit of 70 or greater and zero otherwise (see &lt;code>sl70plus&lt;/code>),
and $Y$, the random variable developed in the previous part.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>Look up the command &lt;code>correlate&lt;/code> in the help files&lt;/em>: What is the
correlation coefficient between nighttime fatalities per 100,000
population and weekend accidents per 100,000 population? Why might
this correlation be so strong?&lt;/p>
&lt;/li>
&lt;/ol>
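&lt;p>If the syntax for questions 3 and 7 is unclear, here is a sketch on unrelated data. It is an assumption of this example that you use &lt;code>auto&lt;/code>, the practice dataset that ships with Stata, so the variables (&lt;code>mpg&lt;/code>, &lt;code>weight&lt;/code>) have nothing to do with the lab file.&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-stata" data-lang="stata">* Load the practice dataset (this replaces the data in memory)
sysuse auto, clear
* Mean and standard deviation of one variable
summarize mpg
* Correlation coefficient between two variables
correlate mpg weight
&lt;/code>&lt;/pre>
&lt;p>Remember to reopen &lt;code>driving_2004.dta&lt;/code> afterwards, because &lt;code>sysuse auto, clear&lt;/code> removes it from memory.&lt;/p>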
&lt;section class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1" role="doc-endnote">
&lt;p>This lab draws heavily on Anne Fitzpatrick&amp;rsquo;s (UMass-Boston) excellent materials.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:2" role="doc-endnote">
&lt;p>Yes, this is a pain.&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:3" role="doc-endnote">
&lt;p>If you are tired of dealing with the &amp;ldquo;more&amp;rdquo; issue, you can type &lt;code>set more off&lt;/code> into the command window to enable continuous scrolling for your session. If you&amp;rsquo;re just done with it, try &lt;code>set more off, perm&lt;/code> to enable continuous scrolling for this and all future sessions.&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/section></description></item><item><title>Research Paper: Idea Proposal</title><link>https://econ3500s26.netlify.app/assignment/rp-02-ideas/</link><pubDate>Sun, 11 Jan 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/rp-02-ideas/</guid><description>&lt;h2 id="the-assignment">The assignment&lt;/h2>
&lt;ol>
&lt;li>Come up with three research ideas and give me a bit of detail. It should include the following, and will likely span 2-3 paragraphs per research idea:&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>A research question&lt;/li>
&lt;li>A hypothesis/hypotheses about that research question&lt;/li>
&lt;li>Proposed data set (within the
&lt;a href="https://www.ipums.org/" target="_blank" rel="noopener">IPUMS&lt;/a> universe).&lt;/li>
&lt;li>For example: &amp;ldquo;I will use the 1990 and 2000 U.S. Census, focusing on the Northeast.&amp;rdquo; But not: &amp;ldquo;CPS.&amp;rdquo;&lt;/li>
&lt;li>Happy for you to propose a non-IPUMS research question, but you must include at least &lt;strong>one&lt;/strong> IPUMS-based question.&lt;/li>
&lt;li>A rough plan of analysis. How will you answer your research question? What key variables will be important?&lt;/li>
&lt;/ul>
&lt;p>All ideas should meet the basic criteria in our
&lt;a href="../RP-01#topic-selection">assignment overview&lt;/a> (relevant to economic theory, answerable using data, use cross-sectional or panel data)&lt;/p>
&lt;!--
2. [Set up a meeting using Calendly](https://calendly.com/emily-a-beam/15min) to meet with me to discuss your idea proposal. We need to meet within one week of the deadline (**October 7 at the latest**).
-->
&lt;h2 id="coming-up-with-an-idea">Coming up with an idea&lt;/h2>
&lt;p>For some of you, this may be &lt;strong>the best.&lt;/strong> For me, being told, &amp;ldquo;think of an idea &amp;hellip; go!&amp;rdquo; is the WORST. That&amp;rsquo;s okay!&lt;/p>
&lt;p>So here&amp;rsquo;s some advice.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Open up
&lt;a href="https://www.ipums.org/" target="_blank" rel="noopener">IPUMS&lt;/a> and start digging through variables and datasets. What looks interesting?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Read Nick Huntington-Klein&amp;rsquo;s
&lt;a href="https://theeffectbook.net/ch-ResearchQuestions.html" target="_blank" rel="noopener">excellent chapter on research questions&lt;/a>^[The
&lt;a href="https://theeffectbook.net/" target="_blank" rel="noopener">rest&lt;/a> of it is great, too!]&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Jot down every question you can think of. If you need inspiration, open up a newspaper of your choice. I highly recommend the
&lt;a href="https://www.nytimes.com/section/upshot" target="_blank" rel="noopener">NYTimes Upshot&lt;/a>, which has lots of data-driven, economics-linked questions.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;!-- 3. Browse through [data set possibilities](../../bonus/rp-datasets/) and see if they spark joy -->
&lt;ol start="4">
&lt;li>Start with some
&lt;a href="../../bonus/rp-resources#suggested-topics">suggested research ideas&lt;/a> and iterate from there.&lt;/li>
&lt;/ol>
&lt;h2 id="other-questions">Other questions?&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;em>So I have a paper from another class &amp;hellip;&lt;/em> If you&amp;rsquo;ve worked on a topic for another class, it&amp;rsquo;s fine to keep working on it here. However, it&amp;rsquo;s important that the work you do for ECON3500 be original and an additional contribution beyond the work for your other class. So same topic = okay, but same paper = not okay. Reach out if it would help to talk more.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>If I have a partner, what does this mean?&lt;/em> If working with a partner, all assignments will be submitted jointly (aside from the referee report). So just one person needs to submit.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="rubric">Rubric&lt;/h2>
&lt;p>You will be graded on four criteria for each question (8 points per idea, 24 points total):&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:right">&lt;/th>
&lt;th style="text-align:center">Does not meet&lt;/th>
&lt;th style="text-align:center">Partially meets&lt;/th>
&lt;th style="text-align:center">Fully meets&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:right">Research question is clearly stated, specific, and answerable&lt;/td>
&lt;td style="text-align:center">0&lt;/td>
&lt;td style="text-align:center">1&lt;/td>
&lt;td style="text-align:center">2&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right">Hypothesis proposed and explained&lt;/td>
&lt;td style="text-align:center">0&lt;/td>
&lt;td style="text-align:center">1&lt;/td>
&lt;td style="text-align:center">2&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right">Feasible data set(s) explicitly identified&lt;/td>
&lt;td style="text-align:center">0&lt;/td>
&lt;td style="text-align:center">1&lt;/td>
&lt;td style="text-align:center">2&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:right">Rough description of how will test hypothesis to answer research question&lt;/td>
&lt;td style="text-align:center">0&lt;/td>
&lt;td style="text-align:center">1&lt;/td>
&lt;td style="text-align:center">2&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table></description></item><item><title>Research Paper: Overview</title><link>https://econ3500s26.netlify.app/assignment/rp-01/</link><pubDate>Tue, 06 Jan 2026 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/rp-01/</guid><description>&lt;p>One main product of the course is an original research paper you&amp;rsquo;ll
produce that incorporates econometric data using the methods we&amp;rsquo;ve
learned in class.&lt;/p>
&lt;p>&lt;strong>You can work alone or in pairs. (No groups of three!)&lt;/strong>&lt;/p>
&lt;h2 id="learning-objectives" class="unnumbered">Learning objectives&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Develop clear, answerable research questions and link them to
economic theory.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Identify and apply appropriate econometric methods to answer
a research question, recognizing necessary assumptions and limitations&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Conduct and interpret original data analysis using Stata&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Strengthen written and oral communication skills&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="overview" class="unnumbered">Overview of requirements&lt;/h2>
&lt;p>All the nitty-gritty is below, but here&amp;rsquo;s a general sense of what I&amp;rsquo;ll ask you to do:&lt;/p>
&lt;ul>
&lt;li>You will write a journal-style paper in which you ask and answer an economic question, relying on regression analysis that you conduct with cross-sectional or panel data.&lt;/li>
&lt;li>You&amp;rsquo;ll apply the various econometric techniques we&amp;rsquo;ve worked on throughout the semester.&lt;/li>
&lt;/ul>
&lt;p>The assignment specifications depend on whether you are working alone or in pairs:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&amp;mdash;&lt;/th>
&lt;th>Alone&lt;/th>
&lt;th>Pairs&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Word count&lt;/td>
&lt;td>2500-4500&lt;/td>
&lt;td>3500-5500&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Tables&lt;/td>
&lt;td>3-4&lt;/td>
&lt;td>4-5&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>If you work in pairs, you will submit all assignments jointly, except for your referee report.&lt;/p>
&lt;h2 id="topic-selection" class="unnumbered">Topic selection&lt;/h2>
&lt;p>Select a research question that is interesting to you and answerable
with data that you can obtain. Your question should accomplish the
following.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>It must have clear relevance to economic theory.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>It must be answerable using data (with a sample size of at least 100, ideally much (much) higher!)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>It may not be an exact replication of previous work. It may, however, be an extension.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>It must use &lt;em>cross-sectional&lt;/em> or &lt;em>panel&lt;/em> data. There are lots of interesting time-series questions, but we will not cover these topics in ECON3500. I highly, highly recommend that you work with data from
&lt;a href="https://www.ipums.org/" target="_blank" rel="noopener">IPUMS&lt;/a>. I am open to other &lt;strong>large&lt;/strong> data sets ($n&amp;gt;100$, ideally $n&amp;gt;1000$), but these require prior approval.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>A strong paper will make a &lt;em>reasonable attempt&lt;/em> at causal identification.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Where folks get into trouble:&lt;/strong> The biggest problem I see is that people get very excited about an interesting question, but they don&amp;rsquo;t necessarily have the data they need. In order for us to be able to have reasonable standard errors and good asymptotic properties, I recommend questions that (a) have a data set you can access and (b) have at least 100 observations. I would strongly discourage you from any country or state cross-sections that don&amp;rsquo;t also have a time component. I also would be wary about studies predicting athlete performance - our methods work well, but the link to economics is often tenuous.&lt;/p>
&lt;h2 id="a-quick-note-on-genai">A quick note on GenAI&lt;/h2>
&lt;p>Now is a good time to refer to our class
&lt;a href="../../syllabus/#gen-ai-policy">Gen AI policy&lt;/a>! &lt;em>TLDR; ChatGPT can be your friend, but no more.&lt;/em>&lt;/p>
&lt;h2 id="research-paper-process" class="unnumbered">Research paper process&lt;/h2>
&lt;h3 id="research-ideas" class="unnumbered">
&lt;a href="../rp-02-ideas">Research ideas&lt;/a>&lt;/h3>
&lt;p>Prepare a set of 3 research ideas. An &amp;ldquo;idea&amp;rdquo; only needs to consist of about 2-3 paragraphs, which should include a research question, a hypothesis, a
proposed data set, and a rough plan of analysis for testing your hypothesis.&lt;/p>
&lt;h3 id="data-preparation" class="unnumbered">
&lt;a href="../rp-03-annotated">Annotated bibliography&lt;/a> &lt;/h3>
&lt;p>To make sure your question ties closely to the economic literature, you&amp;rsquo;ll prepare an annotated bibliography that identifies useful papers and summarizes them in relation to your question.&lt;/p>
&lt;!-- In research, sometimes questions drive the data we choose, and sometimes
the data we have drive the questions we ask. With only one semester to
build a paper, you may need to let the data take the lead.
You'll prepare a [``data abstract''](../data-abstract) that includes your research question, a description of your data set, and a set of summary statistics. -->
&lt;h3 id="research-proposal" class="unnumbered">
&lt;a href="../rp-04-proposal">Research proposal&lt;/a> &lt;/h3>
&lt;p>From the list of topics, choose and develop one research idea for your
&lt;a href="../rp-04-proposal">research proposal&lt;/a>. You&amp;rsquo;ll write up research proposal of at least 1200 words. This proposal should provide as much detail as possible to help me and your classmates assess your plan and provide useful feedback.&lt;/p>
&lt;h3 id="peer-review" class="unnumbered">
&lt;a href="../rp-05-referee">Peer review&lt;/a>&lt;/h3>
&lt;p>A classmate will provide a
&lt;a href="../rp-05-referee">peer review&lt;/a> of your proposal, providing
feedback to help you turn your proposal into a final paper.&lt;/p>
&lt;h3 id="rough-draft-optional" class="unnumbered">
&lt;a href="../rp-06-roughdraft">Rough draft (optional)&lt;/a> &lt;/h3>
&lt;p>You may submit one
&lt;a href="../rp-06-roughdraft">rough draft&lt;/a> to me for comments. This is optional, but
I &lt;strong>highly recommend&lt;/strong> you do it, because the early deadline can help
you stay on track, and you&amp;rsquo;ll have a chance to get an early sense of how
things are going.&lt;/p>
&lt;h3 id="presentation" class="unnumbered">Presentation&lt;/h3>
&lt;p>You&amp;rsquo;ll make a brief (6-8 minute)
&lt;a href="../rp-07-presentation">presentation&lt;/a> of your paper in the
final week of class. I will provide specifics later.&lt;/p>
&lt;h3 id="final-submission">Final submission&lt;/h3>
&lt;p>Your
&lt;a href="../rp-08-final-submission">final draft&lt;/a> will be due on May 04. Please make sure to review all the
&lt;a href="#paper-components">requirements&lt;/a> carefully!&lt;/p>
&lt;h2 id="paper-components" class="unnumbered">Paper components &lt;/h2>
&lt;p>A number of excellent guides can help you put together an effective and
interesting research paper. I&amp;rsquo;ve provided a set of
&lt;a href="https://econ3500s26.netlify.app/bonus/paper-resources">paper resources&lt;/a>.&lt;/p>
&lt;ul>
&lt;li>Your paper should include the following elements.
&lt;ul>
&lt;li>
&lt;a href="#abstract">Title and Abstract&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#introduction">Introduction&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#background">Background/literature review&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#methodology">Methodology&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#results%7d">Results&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#conclusion">Conclusion&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#references">References&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#tables">Figures &amp;amp; Tables&lt;/a>&lt;/li>
&lt;li>
&lt;a href="#replication">Replication package&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="abstract" class="unnumbered">Abstract &amp;amp; Title&lt;/h3>
&lt;p>You&amp;rsquo;ll need a title &lt;em>and an abstract&lt;/em>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Descriptive title&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Abstract that summarizes the paper and findings in &lt;strong>250 words or fewer&lt;/strong>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="introduction" class="unnumbered">Introduction&lt;/h3>
&lt;p>In an economics paper the &lt;strong>introduction stands alone!&lt;/strong>&lt;/p>
&lt;p>That is, a busy (or tired) person could read the introduction and understand what you did, what you found, and why it matters. Our papers are not mystery novels&amp;mdash;there&amp;rsquo;s no need for a plot twist on page 8!&lt;/p>
&lt;p>I recommend following this
&lt;a href="http://blogs.ubc.ca/khead/research/research-advice/formula" target="_blank" rel="noopener">introduction formula&lt;/a>, which is written for folks writing a longer academic paper, but the principles are still solid.&lt;/p>
&lt;h4 id="guidelines-and-structure">Guidelines and structure&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>Introduction reads like an academic article. Motivates, describes
what you do and what you find. (Almost like a mini-paper!)&lt;/p>
&lt;ul>
&lt;li>Reader can infer all main points of paper just from introduction&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>States your research question clearly&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Explains what economic theory says about the potential answers to
your questions, and/or defines clear hypotheses that you test&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Describes why your topic is important&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Describes what you do&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Describes what you find&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Describes how it contributes to our knowledge&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="background" class="unnumbered">Background/Literature Review&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>What you include here depends on topic. Sometimes the reader needs
to know how your question links to economic theory. Sometimes it&amp;rsquo;s
more important to know specific context first, and then to turn to
the literature. Sometimes it&amp;rsquo;s most important to summarize what the
literature already knows. Your call.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>At the back of your mind, when motivating your paper, ask &amp;ldquo;what is
the link to economics?&amp;rdquo;&lt;/p>
&lt;ul>
&lt;li>
&lt;p>If studying discrimination, what does economic theory tell us
about why discrimination exists/persists?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>If studying stock market returns, what do economic models tell
us about our ability to predict returns?&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Includes papers that have answered your research question (or
similar research questions)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Research results described in present tense (&amp;ldquo;Smith finds,&amp;rdquo; not
&amp;ldquo;Smith found&amp;rdquo;)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Papers are put in context. That is, rather than just listing paper A
and finding, paper B and finding, etc., you link each paper (or group of papers)
to its contribution (as it relates to your research question)&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="methodology" class="unnumbered">Methodology/Data&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Describe the data you use. Where did it come from? If you didn&amp;rsquo;t
create it, cite it.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>What is the unit of observation? Is it people, households, states,
etc? Make sure the unit is appropriate to your question&lt;/p>
&lt;/li>
&lt;li>
&lt;p>If you&amp;rsquo;re working with individual-level data, what is the age range you want in your sample? What years of data do you need?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>If dealing with labor force variables, do you want all people of working age, all those who are in the labor force, or all who are employed?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Describe your methodology. Are you estimating a model using OLS? If
so, say so.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Clarify whether we are looking at causal estimates or something
else. What are the estimated parameters of interest? What do they
mean?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Correct standard errors: robust? Clustered? Something else?&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h4 id="population-model">Population model&lt;/h4>
&lt;p>Write out your population model!&lt;/p>
&lt;ul>
&lt;li>
&lt;p>If you&amp;rsquo;re using Word, use equation editor. Make it look nice.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Don&amp;rsquo;t forget the error term!&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use proper equation notation ($\beta$, $u$, etc)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use appropriate subscripts ($i$, $t$, $y$, etc)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>All relevant variables explained/defined&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use descriptive variable names when possible (i.e., use $female$ for women, not $w1$)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Make sure your variables are written correctly - an equation like $wage = \alpha_0 + \alpha_1 race$ doesn&amp;rsquo;t make sense - race isn&amp;rsquo;t continuous!&lt;/p>
&lt;/li>
&lt;li>
&lt;p>If you are using a lot of categorical variables and find it awkward to write them out, you can simplify:&lt;/p>
&lt;ul>
&lt;li>Showing that you have state fixed effects:&lt;/li>
&lt;/ul>
&lt;p>$$y_{st} = \beta_0 + \beta_1 X_{st} + \beta_2 Z_t + \dots + f_s + u_{st}$$&lt;/p>
&lt;p>and then in the text, &amp;ldquo;&amp;hellip;where $f_s$ is a vector of state fixed effects&amp;rdquo;&lt;/p>
&lt;ul>
&lt;li>Including a set of occupational dummy variables
$$y_{st} = \beta_0 + \beta_1 X_{st} + \beta_2 Z_t + \dots + \sum^K_{k=1}\delta_k D_k + u_{st}$$&lt;/li>
&lt;/ul>
&lt;p>and in the text, &amp;ldquo;&amp;hellip;where $D_k$ is a dummy variable for occupation $k$, for $k = 1, \dots, K$&amp;rdquo; &lt;em>(or something in that general spirit)&lt;/em>&lt;/p>
&lt;/li>
&lt;/ul>
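&lt;p>As an illustration only, a population model that follows the checklist above might look like the one below. The variable names ($wage$, $female$, $educ$) are hypothetical placeholders, not a required specification:&lt;/p>
&lt;p>$$wage_{i} = \beta_0 + \beta_1 female_{i} + \beta_2 educ_{i} + u_{i}$$&lt;/p>
&lt;p>and then in the text, &amp;ldquo;&amp;hellip;where $wage_i$ is the hourly wage of person $i$, $female_i$ is a dummy variable equal to 1 if person $i$ is a woman, $educ_i$ is years of schooling, and $u_i$ is the error term.&amp;rdquo;&lt;/p>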
&lt;h3 id="results" class="unnumbered">Results&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>When using categorical/dummy variables, what is your omitted
category? Make sure you know and that it&amp;rsquo;s clear.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>What are the units of your measures? Is that percent or percentage points?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Discuss using a reasonable number of decimal places (usually only 1
or 2)&lt;/p>
&lt;/li>
&lt;/ul>
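&lt;p>To see why the percent vs. percentage point distinction matters, consider a made-up example in which an employment rate rises from 0.50 to 0.55:&lt;/p>
&lt;p>$$0.55 - 0.50 = 0.05 \qquad \text{and} \qquad \frac{0.55 - 0.50}{0.50} = 0.10$$&lt;/p>
&lt;p>The first is a 5 percentage point increase; the second is a 10 percent increase. Both describe the same change, so say which one you mean.&lt;/p>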
&lt;h4 id="limitations-or-discussion">Limitations or Discussion&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>Include as a separate section or integrate into results&lt;/p>
&lt;/li>
&lt;li>
&lt;p>What might prevent us from making causal interpretations about our
coefficient of interest?&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Omitted variable bias?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Reverse causality?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Measurement error?&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Are the results externally valid?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>What other considerations are important?&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="conclusion" class="unnumbered">Conclusion&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Brief summary of paper (yes, &lt;em>another&lt;/em> summary)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Limitations (summary of limitations/discussion section)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Implications for policy (if relevant)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Implications for future research&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="tables" class="unnumbered">Tables&lt;/h3>
&lt;p>You will need the following tables:&lt;/p>
&lt;ol>
&lt;li>Descriptive statistics: This will present key information about your data set that we need to understand your context. Choose relevant variables to describe, including key dependent and independent variables.&lt;/li>
&lt;li>Main regression results: This will be a table of your key specifications. You may have the results from a few regressions in the same table. It&amp;rsquo;s this table that would be the &amp;ldquo;takeaway&amp;rdquo; table&lt;/li>
&lt;li>Secondary regression results: Results that help dig deeper, consider subgroups, consider related hypotheses or outcomes, etc.&lt;/li>
&lt;/ol>
&lt;p>How you arrange regressions between (2) and (3) will depend on how you structure your argument.&lt;/p>
&lt;p>Additional tables (especially in two-person papers) will extend your analysis through other modeling approaches, other dependent variables, or additional displays of robustness.&lt;/p>
&lt;p>You may also include figures, but they do not substitute for the required tables unless the figures themselves present new results.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Please embed tables near the place where they are referenced (rather than at the end)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Tables should be properly formatted. That is, they should be made in
Excel (or LaTeX) and NEVER copied and pasted out of Stata raw output&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Variables should be described using real words. That is, &amp;ldquo;number of
children,&amp;rdquo; not &lt;code>numchld&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Tables and figures should be numbered (Table 1, Table 2, etc&amp;hellip;
Figure 1, Figure 2, etc.) and should also be given a title. Refer to
tables by their numbers in the text.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Regression tables include standard errors. Use stars to indicate
statistical significance. (The Stata package &lt;code>outreg2&lt;/code> is a big help!)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>In most contexts, about 3 places past the decimal point is right,
but it depends on the magnitudes. If you really want to be precise,
set and stick to a reasonable number of significant digits. There&amp;rsquo;s
no place for a number like 0.05403823 or 0.0000000 in your tables.&lt;/p>
&lt;/li>
&lt;/ul>
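&lt;p>As a rough sketch of how the table advice above might look in practice with &lt;code>outreg2&lt;/code>: the example below uses Stata&amp;rsquo;s built-in &lt;code>auto&lt;/code> data, and the file name &lt;code>main_results&lt;/code> and the choice of robust standard errors are placeholders rather than requirements. Check &lt;code>help outreg2&lt;/code> for the options available in your installed version.&lt;/p>
&lt;pre>&lt;code class="language-stata">* One-time install of the user-written package
ssc install outreg2

* Example data shipped with Stata (placeholder for your own data)
sysuse auto, clear

* Regression with robust standard errors
regress price mpg weight, vce(robust)

* Export coefficients and standard errors to an Excel-readable file,
* using variable labels and 3 decimal places
outreg2 using "main_results", replace excel label dec(3)
&lt;/code>&lt;/pre>
&lt;p>You would still tidy the exported table (title, notes, real-word variable names) before putting it in your paper.&lt;/p>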
&lt;h3 id="references">References&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>You&amp;rsquo;ll use outside sources in your introduction and background/literature review, at a minimum. Make sure that you have (1) at least &lt;em>5 academic sources&lt;/em> (published academic journals), and (2) at least &lt;em>8 sources total&lt;/em> (could also include working papers, newspaper articles, policy papers, etc.)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Make sure to cite your data (does not count for totals above)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Use footnotes, not endnotes&lt;/p>
&lt;/li>
&lt;li>
&lt;p>At the end of your paper, include list of references cited&lt;/p>
&lt;/li>
&lt;li>
&lt;p>You can format using APA, MLA, or Chicago style, but it must be a consistent style&lt;/p>
&lt;ul>
&lt;li>Citation Owl or Google Scholar will do it for you&lt;/li>
&lt;li>Microsoft Word&amp;rsquo;s bibliography management system can be hard to work with. Beware!&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>In-text, cite with author and year (Author, Year; Author, Year) or
(Author Year, Author Year)&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="replication" class="unnumbered">Replication package&lt;/h3>
&lt;p>You must also submit materials such that I can replicate your analysis easily. How you do this exactly is up to you, but it must include the following:&lt;/p>
&lt;ol>
&lt;li>Stata do-file that replicates your analysis
&lt;ul>
&lt;li>This should be &amp;ldquo;push-button.&amp;rdquo; That is, I should be able to load it, adjust the main file path, and run it once to replicate all your results.&lt;/li>
&lt;li>The do-file should therefore declare one file path at the beginning of the file rather than using local paths throughout.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Raw data file.&lt;/li>
&lt;li>Log file that shows your results from running your do-file&lt;/li>
&lt;li>If anything might seem confusing to me, include a Readme file to tell me what to do!&lt;/li>
&lt;/ol>
&lt;p>These do not count toward your page limit, and they should be submitted as separate files.&lt;/p>
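&lt;p>A minimal sketch of what a push-button do-file header could look like is below. The folder path and the file names (&lt;code>raw_data.dta&lt;/code>, &lt;code>analysis.log&lt;/code>) are made-up placeholders, so substitute your own:&lt;/p>
&lt;pre>&lt;code class="language-stata">* Declare the main file path once (placeholder path; the only line I should need to edit)
global root "C:/Users/yourname/econ3500_paper"

* Start a fresh log that records all output
capture log close
log using "$root/analysis.log", replace text

* Load the raw data relative to the main path
use "$root/raw_data.dta", clear

* ... data cleaning, summary statistics, and regressions go here ...

log close
&lt;/code>&lt;/pre>
&lt;p>Because every file reference builds on &lt;code>$root&lt;/code>, changing that one line is enough to run the whole analysis on another computer.&lt;/p>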
&lt;h4 id="how-to-submit">How to submit?&lt;/h4>
&lt;ul>
&lt;li>Ideally, zip your files and then upload directly to Brightspace.&lt;/li>
&lt;li>If the file is too large, use
&lt;a href="https://filetransfer.uvm.edu/" target="_blank" rel="noopener">UVM File Transfer&lt;/a>&lt;/li>
&lt;li>I&amp;rsquo;m open to other options depending on your preferences - Github repository, shared Google folder, etc.&lt;/li>
&lt;/ul>
&lt;h3 id="style">Style&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Use present tense and first-person active voice! (&amp;ldquo;I estimate a regression,&amp;rdquo; &lt;strong>NOT&lt;/strong> &amp;ldquo;A
regression is estimated&amp;rdquo;)&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Single-authored paper: first person singular, &amp;ldquo;I.&amp;rdquo; (You&amp;rsquo;re not
the queen!)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Joint-authored paper: first person plural, &amp;ldquo;we.&amp;rdquo;&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Don&amp;rsquo;t believe me? Check out any economics paper published in the
past 20 years. There&amp;rsquo;s &lt;em>some&lt;/em> variation in I vs. we, but a lot of active voice.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Divide paper into numbered, labeled sections.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>A research paper is not an essay!&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Personal opinions don&amp;rsquo;t have a place&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Sources should be primarily academic (peer-reviewed journals,
working papers, etc.), maybe some non-academic sources for
motivation only&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Clear, labeled sections&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="deadlines" class="unnumbered">Deadlines&lt;/h2>
&lt;p>See the
&lt;a href="../../schedule">course schedule&lt;/a> for deadlines. Submit materials by &lt;strong>11:59pm&lt;/strong> on
the deadline unless otherwise specified. Submit all assignments via Brightspace. (Late assignments without an extension
will be penalized 10% per day, and they may not receive detailed
feedback.)&lt;/p>
&lt;h2 id="grading" class="unnumbered">Grading&lt;/h2>
&lt;p>I will provide formal or informal grading rubrics for each component, so
you have a clear idea of how you&amp;rsquo;ll be graded. As the
&lt;a href="../../syllabus#grading">syllabus&lt;/a> shows, your total research paper score will account for 35% of your final grade.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">Category&lt;/th>
&lt;th style="text-align:center">Component&lt;/th>
&lt;th style="text-align:center">Component weight&lt;/th>
&lt;th style="text-align:center">Category weight&lt;/th>
&lt;th>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">Process&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:center">30%&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;/td>
&lt;td style="text-align:center">Research ideas&lt;/td>
&lt;td style="text-align:center">5%&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;/td>
&lt;td style="text-align:center">Annotated bibliography&lt;/td>
&lt;td style="text-align:center">5%&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;/td>
&lt;td style="text-align:center">Research proposal&lt;/td>
&lt;td style="text-align:center">10%&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;/td>
&lt;td style="text-align:center">Peer review&lt;/td>
&lt;td style="text-align:center">5%&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;/td>
&lt;td style="text-align:center">Rough draft&lt;/td>
&lt;td style="text-align:center">0%&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">&lt;/td>
&lt;td style="text-align:center">Presentation&lt;/td>
&lt;td style="text-align:center">5%&lt;/td>
&lt;td>&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Final draft&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:center">&lt;/td>
&lt;td style="text-align:center">70%&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="faq" class="unnumbered">FAQ&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;em>How does my group size affect grading?&lt;/em> The grading rubric is the
same regardless of your group size. However, I expect that in a
larger group, your analysis will go deeper, your review of the
literature will be more comprehensive, you&amp;rsquo;ll have additional
robustness or placebo tests, etc. See the page requirements for a
guide. If you have questions, feel free to talk with me in more
detail.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>Can I turn in a paper with 10 pages of text and 3 tables, or 10
pages of tables and 5 pages of text?&lt;/em> As long as you meet the word and exhibit counts, there&amp;rsquo;s no single &amp;ldquo;right&amp;rdquo; balance between text and tables. What
matters most is that your paper clearly addresses your research
question.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>My results aren&amp;rsquo;t statistically significant, should I start over?&lt;/em> NO. Remember that our goal here isn&amp;rsquo;t to find statistically significant relationships, it&amp;rsquo;s to answer questions. Let the data speak for itself about what relationships are or are not there.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>How should I format my citations and bibliography?&lt;/em> Consistently.
APA or Chicago is fine. MLA is not.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>How much data analysis do I need to do?&lt;/em> You should incorporate
data analysis to answer your research question or test your
hypotheses. You may also use data to provide some descriptive
statistics; however, that alone would not be sufficient. Exactly how
much analysis is involved will depend on the question you pose and
your approach to answering it.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>Do I have to use Stata?&lt;/em> You can use an alternative programming language like R or Python. Your analysis should not be conducted in Excel. My ability to support your programming in languages besides Stata is more limited.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>Do I have to use IPUMS data?&lt;/em> I&amp;rsquo;m open to other possibilities, but they must be approved by me first.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>What if I&amp;rsquo;ve worked on this topic for another class?&lt;/em> This can work, but first talk to me so we can figure out a plan that ensures you&amp;rsquo;re building beyond what you&amp;rsquo;re already doing.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="recommendations" class="unnumbered">Recommendations&lt;/h2>
&lt;p>See
&lt;a href="https://econ3500s26.netlify.app/bonus/rp-resources">paper resources&lt;/a> for dataset and topic suggestions.&lt;/p></description></item><item><title/><link>https://econ3500s26.netlify.app/assignment/ps3/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/ps3/</guid><description>&lt;p>Version: Spring 2018&lt;br>
EC200 Econometrics and Applications&lt;/p>
&lt;p>&lt;strong>Problem Set 3&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
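&lt;p>The following table shows, for eight vintages of select, delicious
wine, purchases per buyer ($y$) and the wine buyer&amp;rsquo;s rating ($x$) in
a given year:&lt;/p>
&lt;table>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">$x$&lt;/td>
&lt;td style="text-align:center">3.6&lt;/td>
&lt;td style="text-align:center">3.3&lt;/td>
&lt;td style="text-align:center">2.8&lt;/td>
&lt;td style="text-align:center">2.6&lt;/td>
&lt;td style="text-align:center">2.7&lt;/td>
&lt;td style="text-align:center">2.9&lt;/td>
&lt;td style="text-align:center">2.0&lt;/td>
&lt;td style="text-align:center">2.6&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">$y$&lt;/td>
&lt;td style="text-align:center">24&lt;/td>
&lt;td style="text-align:center">21&lt;/td>
&lt;td style="text-align:center">22&lt;/td>
&lt;td style="text-align:center">22&lt;/td>
&lt;td style="text-align:center">18&lt;/td>
&lt;td style="text-align:center">13&lt;/td>
&lt;td style="text-align:center">9&lt;/td>
&lt;td style="text-align:center">6&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>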
&lt;ol>
&lt;li>
&lt;p>Estimate &lt;em>by hand&lt;/em> the regression of purchases per buyer on the
buyer&amp;rsquo;s rating.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Interpret the slope of the estimated regression line.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Interpret the intercept of the estimated regression line.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>
&lt;p>(Stock and Watson 4.2) Suppose that a random sample of 200
20-year-old men is selected from a population and that these men&amp;rsquo;s
height and weight are recorded. A regression of weight (measured in
pounds) on height (measured in inches) yields&lt;/p>
&lt;p>$$\widehat{Weight}=-99.41 + 3.94 Height$$&lt;/p>
&lt;p>$R^2 = 0.81$; $SER = 10.2$&lt;/p>
&lt;ol>
&lt;li>
&lt;p>What is the predicted weight for someone who is 70 inches tall?
65 inches tall?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>One 20-year-old man has a late growth spurt and grows 1.5 inches
over the course of the year. What is the regression&amp;rsquo;s prediction
for the increase in his weight?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Suppose that you want to translate the results of this equation
into centimeters and kilograms. What are the regression
estimates from this new regression? Give all results, including
estimated coefficients, $R^2$, and SER.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Interpret the $R^2$ value. Does it indicate anything about
whether these estimates are likely to be biased? Explain.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>
&lt;p>(Stock and Watson 5.2) Suppose that a researcher, using wage data on
250 randomly selected male workers and 280 randomly selected female
workers, estimates the following OLS regression:&lt;/p>
&lt;p>$$\begin{aligned}
\widehat{Wage} = &amp;amp; 12.52 + &amp;amp; 2.12 \, Male \\
&amp;amp; (0.23) &amp;amp; (0.36)
\end{aligned}$$&lt;/p>
&lt;p>$R^2 = 0.06$; $SER = 4.2$&lt;/p>
&lt;p>where $Wage$ is measured in dollars per hour and $Male$ is a binary
variable equal to 1 if a person is male and 0 if female. Define the
wage gender gap as the difference in mean earnings between men and
women.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>What is the estimated gender gap?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Is the estimated gender gap significantly different from zero?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Construct a 95% confidence interval for the gender gap&lt;/p>
&lt;/li>
&lt;li>
&lt;p>In the sample, what is mean wage of women? Of men?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Another researcher uses these data, but regresses $Wage$ on
$Female$, a variable equal to 1 if the person is female and 0 if
the person is male. What are the regression estimates from this
regression? (Include the coefficients, $R^2$, and $SER$.)&lt;/p>
&lt;p>$$\begin{aligned}
\widehat{Wage}=&amp;amp;\_\_\_ + \_\_\_ ( Female)\end{aligned}$$&lt;/p>
&lt;p>$R^2 = \_\_\_$; $SER = \_\_\_$&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/li>
&lt;/ol></description></item><item><title/><link>https://econ3500s26.netlify.app/assignment/ps4/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/ps4/</guid><description>&lt;p>Version: Spring 2017&lt;br>
EC200 Econometrics and Applications&lt;/p>
&lt;p>&lt;strong>Problem Set 4&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Stock and Watson 6.6&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Stock and Watson 7.4&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Stock and Watson 7.8 (skip part c)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Stock and Watson, Additional Empirical Exercise 6.1&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Stock and Watson, Additional Empirical Exercise 5.3&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Suppose that average worker productivity at manufacturing firms
($avgprod$) depends on two factors: average hours of training
($avgtrain$) and average worker ability ($avgabil$)&lt;/p>
&lt;p>$$avgprod = \beta_0 + \beta_1 avgtrain + \beta_2 avgabil + u$$&lt;/p>
&lt;p>Assume this equation satisfies the Gauss-Markov assumptions. If
grants have been given to firms whose workers have less than average
ability, so that $avgtrain$ and $avgabil$ are negatively correlated,
what is the likely bias on $\widetilde{\beta_1}$ obtained from the
simple regression of $avgprod$ on $avgtrain$?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Finish and submit Lab 4.&lt;/p>
&lt;/li>
&lt;/ol></description></item><item><title/><link>https://econ3500s26.netlify.app/assignment/ps4_s18/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/ps4_s18/</guid><description>&lt;p>Version: Spring 2018&lt;br>
EC200 Econometrics and Applications&lt;/p>
&lt;p>&lt;strong>Problem Set 4&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Suppose that $(Y_i,X_i)$ satisfy the three key least squares
assumptions and, in addition, $u_i$ is $N(0,\sigma^2_u)$ and is
independent of $X_i$. A sample of size $n = 30$ yields&lt;/p>
&lt;p>$$\begin{aligned}
\widehat{Y} &amp;amp; = 43.2 + &amp;amp; 61.5 \, X \\
&amp;amp; (10.02) &amp;amp; (7.4) \\
R^2 = 0.54, &amp;amp; \quad SER = 1.52 &amp;amp;
\end{aligned}$$&lt;/p>
&lt;p>where the numbers in parentheses are the homoskedastic-only standard
errors for the regression coefficients.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Construct a 95% confidence interval for $\beta_0$.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Construct a 90% confidence interval for $\beta_1$.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Test $H_0: \beta_1=55$ against $H_1: \beta_1 \neq 55$ at the 5%
level.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Test $H_0: \beta_1=55$ against $H_1: \beta_1 &amp;gt; 55$ at the 5%
level.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Explain briefly why the test of $H_0: \beta_1=55$ against
$H_1: \beta_1 &amp;lt; 55$ is trivial. You can use a picture if it
helps make things clearer.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>
&lt;p>In the 1980s, Tennessee conducted an experiment in which
kindergarten students were randomly assigned to &amp;ldquo;regular&amp;rdquo; and
&amp;ldquo;small&amp;rdquo; classes and given standardized tests at the end of the year.
(Regular classes contained approximately 24 students, and small
classes contained approximately 15 students.)&lt;br>
Suppose that, in the population, the standardized tests have a mean
score of 925 points and a standard deviation of 75 points. Let
$SmallClass$ be a binary variable equal to 1 if the student is
assigned to a small class and equal to 0 otherwise. A regression of
$TestScore$ on $SmallClass$ yields $$\begin{aligned}
\widehat{TestScore} &amp;amp; = 918.0 + &amp;amp; 13.9 \, SmallClass \\
&amp;amp; (1.6) &amp;amp; (2.5) \\
R^2 = 0.01, &amp;amp; \quad SER = 74.6 &amp;amp;
\end{aligned}$$&lt;/p>
&lt;p>where the numbers in parentheses are the standard errors for the
regression coefficients.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Do small classes improve test scores? By how much? Is the effect
large? Explain.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Is the estimated effect of class size on test scores
statistically significant? Carry out a test at the 5% level.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Do you think that the regression errors are plausibly
homoskedastic? Explain.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>$SE(\widehat{\beta_1})$ was computed using the initial formula
for standard errors (based on equations 5.3 and 5.4 in Stock and
Watson). Would having heteroskedastic errors and using this
formula affect the validity of your hypothesis tests? What if
the errors are actually homoskedastic? Explain.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>
&lt;p>Visit the Stock and Watson webpage (here:
&lt;a href="http://wps.aw.com/aw_stock_ie_3/178/45691/11696959.cw/index.html" target="_blank" rel="noopener">http://wps.aw.com/aw_stock_ie_3/178/45691/11696959.cw/index.html&lt;/a>)
and click on the &amp;ldquo;Additional Empirical Exercises&amp;rdquo; tab. Complete
Additional Empirical Exercise 5.3 using the data set
&lt;code>CollegeDistance&lt;/code>. Note that you can download this data from the
Additional Empirical Exercises page.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Finish Lab 3 - include do-file, log-file, and answers to questions.&lt;/p>
&lt;/li>
&lt;/ol></description></item><item><title/><link>https://econ3500s26.netlify.app/assignment/ps7/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/ps7/</guid><description>&lt;p>Version: Spring 2018&lt;br>
EC200 Econometrics and Applications&lt;/p>
&lt;p>&lt;strong>Problem Set 7&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>In 1985, neither Florida nor Georgia had laws banning open alcohol
containers in vehicle passenger compartments. By 1990, Florida had
passed such a law, but Georgia had not.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Suppose you collect random samples of the driving-age population
in both states, for 1985 and 1990. Let $arrest$ be a binary
variable equal to one if a person was arrested for drunk driving
during the year. Without controlling for any other factors,
write down a linear probability model that allows you to test
whether the open container law reduced the probability of being
arrested for drunk driving. Which coefficient measures the
effect of the law?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Why might you want to control for other factors in the model?
What might some of these factors be?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Now, suppose that you can only collect data for 1985 and for
1990 at the county level for the two states. The dependent
variable would be the fraction of licensed drivers arrested for
drunk driving during the year. How does this data structure
differ from the individual-level data described in part (a)?
What econometric method would you use?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>
&lt;p>For this exercise, use &lt;code>JTRAIN.dta&lt;/code> to determine the effect of a job
training grant on hours of job training per employee. The basic
model for the three years is the following: $$\begin{split}
hrsemp_{it} &amp;amp;= \beta_0 + \delta_1 d88_t + \delta_2 d89_t + \\
&amp;amp;\quad \beta_1 grant_{it} + \beta_2 grant_{i,t-1} + \beta_3 \log(employ_{it}) + a_i + u_{it}
\end{split}$$&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Estimate the equation using first differencing. How many firms
are used in the estimation? How many total observations would be
used if each firm had data on all variables (in particular,
$hrsemp$) for all three time periods?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Interpret the coefficient on $grant$, and comment on its
significance.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Is it surprising that $grant_{i,t-1}$ is insignificant? Explain.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Do larger firms train their employees more or less, on average?
How big are the differences in training?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>
&lt;p>Use &lt;code>CRIME4.dta&lt;/code> for this exercise, and see the scanned upload for
Example 13.9.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Replicate the results in Example 13.9.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Re-estimate the unobserved effects model for crime in Example
13.9, but use fixed effects rather than differencing. Are there
any notable sign or magnitude changes in the coefficients? What
about statistical significance?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Add the logs of each wage variable in the data set and estimate
the model by fixed effects. How does including these variables
affect the coefficients on the criminal justice variables in part
(b)?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Do the wage variables in part (c) have the expected sign? Are
they jointly significant?&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>
&lt;p>Finish and submit Lab 6.&lt;/p>
&lt;/li>
&lt;/ol></description></item><item><title/><link>https://econ3500s26.netlify.app/assignment/ps8/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://econ3500s26.netlify.app/assignment/ps8/</guid><description>&lt;p>Version: Spring 2018&lt;br>
EC200 Econometrics and Applications&lt;/p>
&lt;p>&lt;strong>Problem Set 8&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>9.3, 9.5, 9.6, 9.10 (odd-numbered answers are online, but think
through them carefully!)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Additional empirical exercise 9.1, 10.1 - make sure you include a
do-file and log-file that reflects your analysis!&lt;/p>
&lt;/li>
&lt;/ol></description></item></channel></rss>