class: center, middle, inverse, title-slide

# Hypothesis Testing
## Part 1: What it Is
### Updated 2023-02-28

---

# Truth and Reality

- Our goal as econometricians is to uncover the *true data generating process*
- Unfortunately, that's impossible
- Unlike in, say, pure mathematics, we will never know the truth. No matter how far we walk, it will always be on the horizon
- All we can hope to do is move a little bit closer

---

# Truth and Reality

- Today we'll be covering hypothesis testing, which is one approach to using *reality* to get closer to the *truth*
- It works by subtraction
- We test whether certain versions of the truth are *likely or unlikely*
- And if we find that they're *unlikely*, we can reject that version of the truth, narrowing down what the actual possibilities are and getting closer and closer to the actual truth

---

# Truth and Reality

- When we're talking about *the truth* here, we're referring to the *true data generating process* (DGP)
- For example, if this is the *true DGP*:

$$ Wage_i = \beta_0 + \beta_1AdultHeight_i + \varepsilon_i $$

where `\(cor(AdultHeight_i,\varepsilon_i) = 0\)`, then...

- Person `\(i\)`'s wage is *truly determined* by a linear function of their height, plus an unrelated error term `\(\varepsilon_i\)`
- Why might someone have a high wage? Either they're tall, or they have a high error term, or both. No other way!

---

# Truth and Reality

- Our ability to *estimate* that true model depends on our ability to avoid inference and identification error
- If we assume that's the true DGP, there's no endogeneity, and the relationship between `\(Wage\)` and `\(AdultHeight\)` is a straight line, so regular ol' OLS of `\(Wage\)` on `\(AdultHeight\)` will not give us identification error
- But we also need to be careful about inference error
- When we run that regression, what does our `\(\hat{\beta}_1\)` say about the true value `\(\beta_1\)`?

---

# Terminology

Remember:

- Greek letters like `\(\beta_1\)` are the *truth*. They are part of the *true DGP*
- Modified Greek letters like `\(\hat{\beta}_1\)` are our *estimate*. They are what we *think* the truth is based on our data
- English (Latin) letters like `\(X\)` are *actual data from our sample*
- Modified English letters like `\(\bar{X}\)` are *calculations from our sample*. They're what we *do* with our data (we can also just write out the calculation; `\(\bar{X}=(1/N)\sum_iX_i\)`)
- We can say that our estimate of the truth is that calculation, e.g. `\(\hat{\mu} = \bar{X}\)`

$$ Data \rightarrow Calculation \rightarrow Estimate \xrightarrow[]{Hopefully!} Truth $$

$$ X, Y \rightarrow \frac{\sum_iX_iY_i}{\sum_iX_i^2} \rightarrow \hat{\beta}_1 \xrightarrow[]{Hopefully!} \beta_1 $$

---

# Hypothesis Testing

$$ Data \rightarrow Calculation \rightarrow Estimate \xrightarrow[]{Hopefully!} Truth $$

- We want to be able to take our `\(Data\)` and learn something about the `\(Truth\)`
- We acknowledge that there is lots of random variation in the `\(Data\)`, which means lots of random variation in the `\(Calculation\)`, which means that our `\(Estimate\)` will vary from sample to sample even though the `\(Truth\)` won't!
- Hypothesis testing *uses* that sampling variation to try to *eliminate false versions of the truth*
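To make that chain concrete before we dive in, here's a minimal R sketch. Everything in it is a made-up illustration, and the DGP has no intercept so that the `\(\sum_iX_iY_i/\sum_iX_i^2\)` calculation from the Terminology slide is the right one:

```r
library(dplyr)

# Data: one simulated sample from an assumed no-intercept DGP,
# Y = 2X + epsilon, so the true beta1 is 2
set.seed(987)
tib <- tibble(X = runif(100)) %>%
  mutate(Y = 2*X + rnorm(100))

# Calculation -> Estimate: with no intercept, OLS is sum(XY)/sum(X^2)
beta1_hat <- sum(tib$X * tib$Y) / sum(tib$X^2)
beta1_hat  # hopefully close to the truth of 2, but it varies by sample!
```

Change the seed and `beta1_hat` changes too, even though the truth doesn't. That's the sampling variation we turn to next.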
---

# Sampling Variation

- We talked last time about *sampling variation* in an estimate
- This of course comes because data varies from sample to sample, and `\(Data \rightarrow Calculation \rightarrow Estimate\)`
- In this data, the true `\(\beta_1\)` is 2: `\(Y_i = 3 + 2X_i + \varepsilon_i\)`

![Sampling distribution of the estimates across many samples](Week_03_Slides_1_Hypothesis_Testing_files/figure-html/unnamed-chunk-1-1.png)

---

# Sampling Variation

- Notice that there is plenty of variation around the true value of `\(2\)`
- Now let's imagine we don't know that and are trying to answer the question "is the truth `\(\beta_1 = 2\)`?"
- We don't have the full sampling distribution, we just have a single estimate:

```r
tib <- tibble(X = runif(100)) %>%
  mutate(Y = 3 + 2*X + rnorm(100))
model <- feols(Y ~ X, data = tib)
coef(model)
```

```
## (Intercept)           X 
##    2.798554    2.215880
```

- All we see is `\(\hat{\beta}_1 =\)` 2.22. So... is `\(\beta_1 = 2\)`?

---

# Null and Alternative Hypotheses

- The *null hypothesis* is the version of the truth we're *testing to see whether we can prove it's wrong*
- The *alternative hypothesis* is, well, every version of the truth that's not the null hypothesis
- Here, if we're trying to check whether `\(\beta_1 = 2\)`, then `\(\beta_1 = 2\)` is our null hypothesis
- And `\(\beta_1 \neq 2\)` is our alternative hypothesis
- I bring this up here because in order to do a hypothesis test, we need to think about the *sampling distribution under the null*

---

# Null Distribution

- The "null distribution" is what the sampling distribution of the estimator *would be* if our null hypothesis were true
- We can see that in the sampling distribution we have!
- `\(\beta_1 = 2\)` *is* true, and here's what the sampling distribution looks like! (although it would be smoother with more samples)

![Simulated null distribution of the estimates](Week_03_Slides_1_Hypothesis_Testing_files/figure-html/unnamed-chunk-4-1.png)

---

# Null Distribution

- So the key question that a hypothesis test asks is: *given this null distribution, how unlikely is it that we get the result we get?*
- If it's *super unlikely* that the null is true and we get our result, well...
- We definitely got our result...
- So the null must be the part that's wrong!
- That's when we *reject the null* - we find that the sampling distribution under the null *hardly ever* produces a result like ours, so that's probably the wrong sampling distribution and thus the wrong null!

---

# Null Distribution

- How does this work out with our estimate of 2.22?
- Let's stick it on the graph

![Null distribution with our estimate of 2.22 marked](Week_03_Slides_1_Hypothesis_Testing_files/figure-html/unnamed-chunk-5-1.png)

---

# Hypothesis Test

- Our test comes down to: how weird would it be to get a result this far from the "truth" or farther?
- We can figure this out by shading in the parts of the null distribution this far from the null truth or farther
- So we shade 2.22 and above, and also 2 - abs(2.22 - 2) = 1.78 and below

![Null distribution with both tails beyond 2.22 and 1.78 shaded](Week_03_Slides_1_Hypothesis_Testing_files/figure-html/unnamed-chunk-6-1.png)
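Here's a sketch of how those shaded shares could be computed: simulate a bunch of samples in which the null really is true, re-estimate `\(\hat{\beta}_1\)` in each, and count how often the estimate lands at least as far from 2 as ours did. This mirrors the idea, not necessarily the exact code behind the graphs, and the simulation settings are assumptions:

```r
library(dplyr)
library(fixest)
library(purrr)

# Many samples from the null world, where beta1 = 2 really is true
set.seed(1)
null_draws <- map_dbl(1:1000, function(i) {
  tib <- tibble(X = runif(100)) %>% mutate(Y = 3 + 2*X + rnorm(100))
  coef(feols(Y ~ X, data = tib))["X"]
})

# Two-tailed: share of null-world estimates at least as far from 2 as 2.22 is
mean(abs(null_draws - 2) >= abs(2.22 - 2))
# One-tailed: share at 2.22 or above
mean(null_draws >= 2.22)
```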
---

# Hypothesis Test

- Based on this sampling distribution, there's a 31% + 26% = 57% chance of getting something as weird as we got or weirder (or for a *one-tailed* test, a 26% chance of getting something that high or higher) if the null of `\(\beta_1 = 2\)` is true
- That's not too unlikely! So, we would *fail to reject* the null of `\(\beta_1 = 2\)`
- This doesn't mean that we conclude that `\(\beta_1 = 2\)` *is true*, it just means we can't rule it out

![Null distribution with the estimate marked and tails shaded](Week_03_Slides_1_Hypothesis_Testing_files/figure-html/unnamed-chunk-7-1.png)

---

# The Null Distribution

- Of course, we generated this null distribution by just creating a handful of random samples
- We also happen to know that if we had *infinite* samples, the sampling distribution of OLS would be a normal distribution with the mean at the true value and the standard deviation determined by `\(var(X)\)`, the variance of the residual, and the sample size. The real null distribution looks like this:

![The theoretical, normal null distribution](Week_03_Slides_1_Hypothesis_Testing_files/figure-html/unnamed-chunk-8-1.png)

---

# The Null Distribution

- So with the estimate we made from the sample we got (2.22), we can't reject a null of `\(\beta_1 = 2\)`
- Which is good!! That's the truth. We don't want to reject it!
- How about other nulls? Can we reject those?
- Can we reject a null that `\(\beta_1 = 0\)`?
- (by default, most null hypotheses are that the parameter is 0)
- Let's follow the same steps!

---

# The Null Distribution

- Now *that's* unlikely. We can reject that the true value is 0.

![Null distribution centered at 0, with our estimate of 2.22 far in the tail](Week_03_Slides_1_Hypothesis_Testing_files/figure-html/unnamed-chunk-9-1.png)

---

# Concept Checks

- What is a null distribution?
- Why would we want to look at whether our estimate is *unlikely* given the null? What does this get us?
- How is it possible to use the sampling distribution of the null when we only have one sample?

---

# p-values

- Ok, so *when* is it weird enough to reject the null?
- A *p-value* is the probability of getting a result as-weird-as-we-actually-got or weirder under the assumption that the null is true
- So in the graph below, the p-value is .56 or 56% (a touch different from the 57% above because of rounding)

![Normal null distribution with the tails beyond our estimate shaded](Week_03_Slides_1_Hypothesis_Testing_files/figure-html/unnamed-chunk-10-1.png)

---

# p-values

- The lower the p-value, the less likely it is that we got our result AND the null is true
- And since we definitely got our result, a really low p-value says we should reject the null
- How low does it need to be to reject the null?
- Well...

---

# p-values and .05

- It's common practice to decide on a *confidence level* and a corresponding `\(\alpha\)`, most commonly a 95% confidence level `\(\rightarrow \alpha = .05\)`, and reject the null if the p-value is lower than `\(\alpha\)`
- Why 95%? Completely arbitrary. Someone picked it out of thin air a hundred years ago as a just-for-instance and we still use it 🤷
- Having a hard-and-fast threshold like this is not a great idea ( `\(p=.04\)` is rejection, but `\(p=.06\)` is not?), but it's a very hard habit to break
- Key takeaway: get familiar with the concept of rejecting the null when the p-value is lower than .05, because you'll see it
- BUT don't get too hung up on black-and-white rejection in general
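Once the null distribution is treated as normal, the p-value is a one-line calculation. A sketch with our numbers; the 0.37 standard error is an input taken, rounded, from the regression table that appears later in the deck:

```r
beta1_hat <- 2.22
se_hat    <- 0.37  # rounded; from the regression table later on

# Two-sided p-value for the null that beta1 = 2
z <- (beta1_hat - 2) / se_hat
2 * pnorm(-abs(z))  # about .55 with these rounded inputs; the graph's 56% uses full precision

# The same machinery against the null that beta1 = 0
2 * pnorm(-abs((beta1_hat - 0) / se_hat))  # tiny, so we reject that null
```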
---

# Power

- When we pick an `\(\alpha\)`, like .05 or .1 or .01 or whatever, we're saying "I'm comfortable rejecting the null 5% (or 1%, etc.) of the time even when it's true"
- After all, 5% of the time you'll get a p-value below .05 even if the null is true, just because of sampling variation
- This is a "false positive"
- Why not just pick a super low `\(\alpha\)` (a super high confidence level) so we don't have false positives?

---

# Power

- Because the smaller we make our `\(\alpha\)`, the less likely we are to reject the null *in general*
- Which means we'll also *fail to reject* it if it's actually *false* (a "false negative")
- We want a low false-positive rate, but also a low false-negative rate
- One minus the false-negative rate is called "power." If we will reject the null 80% of the time when it's actually false, we have 80% power (and a 20% false-negative rate)
- As `\(\alpha\)` shrinks, false-positive rates decline, but false-negative rates increase
- Strike a balance! (a simulation sketch follows the sidenote below)

<span style="font-size: small">(minor sidenote: "false positive" and "false negative" are sometimes referred to as "Type I Error" and "Type II Error" - these are not great terms because they are hard to remember! If you encounter them, just remember that in "The Boy Who Cried Wolf" the townspeople think there's a wolf when there's not, then think there's not a wolf when there is, committing Type I and II error in that order)</span>
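A minimal power sketch, as promised. Everything here - the DGP, the sample size, the deliberately false null of 2.5 - is an assumption for illustration, and `se()` is fixest's standard-error extractor:

```r
library(dplyr)
library(fixest)
library(purrr)

# The truth is beta1 = 2, but we test the FALSE null that beta1 = 2.5.
# Power is the share of samples in which we (correctly) reject that null
set.seed(1)
pvals <- map_dbl(1:1000, function(i) {
  tib <- tibble(X = runif(100)) %>% mutate(Y = 3 + 2*X + rnorm(100))
  m <- feols(Y ~ X, data = tib)
  z <- (coef(m)["X"] - 2.5) / se(m)["X"]  # distance from the false null, in s.e.s
  2 * pnorm(-abs(z))                      # two-sided p-value, assuming normality
})

mean(pvals < .05)  # power with alpha = .05
mean(pvals < .01)  # stricter alpha -> we reject less often -> lower power
```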
---

# Concept Checks

- Say we run a two-sided test on the parameter `\(\beta_0\)` with a null hypothesis that `\(\beta_0 = 3\)` and get an estimate of `\(\hat{\beta}_0 = 3.5\)` and a p-value of `\(.07\)`. Complete the sentence: "There is a 7% chance that..."
- If someone said a study was "underpowered," what might you guess they mean by that?
- If you have chosen a confidence level of 95% and you get a p-value of `\(p = .06\)`, do you reject the null or not?

---

# Describing Uncertainty

- Let's come at this from a different angle, shall we?
- The real purpose of all of this is to avoid *inferential error*. That doesn't necessarily mean we need an up-or-down rejection of a null hypothesis
- Why don't we just describe the variation in the estimator without necessarily making a judgment about a null hypothesis?
- Instead of centering around a null hypothesis, these approaches will focus on the *estimate we have* and think about the sampling variation around that estimate

---

# Standard Errors and Confidence Intervals

- One way we can describe sampling variation is to think about the standard errors
- The standard error of an estimate is the standard deviation of that estimate's sampling distribution
- Since the sampling distribution is often normal, the mean of that distribution (our estimate) and the standard error are enough to describe what the sampling distribution looks like
- This is why standard errors are often shown on regression tables!

---

# What we just did...

- This is how we thought of it - we picked a null, figured out the null distribution (the sampling distribution of the estimator assuming the null was true), and checked if our estimate was unlikely enough that we could reject the null

![Null distribution centered at the null, with our estimate marked](Week_03_Slides_1_Hypothesis_Testing_files/figure-html/unnamed-chunk-11-1.png)

---

# Instead...

- By thinking about standard errors, we instead center the sampling distribution around our estimate. If the null is far away, we reject that null (the result should be the same)!

![Sampling distribution centered at our estimate, with the null marked](Week_03_Slides_1_Hypothesis_Testing_files/figure-html/unnamed-chunk-12-1.png)

---

# Confidence Intervals

- So what we're thinking now is not "is our estimate close to the null?" but rather "is that null close to our estimate?"
- We can go one step further and ask "*which* nulls are close to our estimate?" and figure out which nulls we can think about rejecting from there
- A *confidence interval* shows the range of nulls that would *not* be rejected by our estimate
- Everything outside that range can be rejected

---

# Confidence Intervals

- This takes the form of

$$ \hat{\beta}_1 \pm Z(s.e.) $$

- Where `\(Z\)` is the value from our distribution at the `\(1-\alpha/2\)` percentile (for a 95% confidence interval with a normal sampling distribution, that's `\(Z = 1.96\)`, the 97.5th percentile)
- and `\(s.e.\)` is the standard error of `\(\hat{\beta}_1\)`

---

# Confidence Intervals

- Thinking back to our estimate:
|             | Model 1 |
|-------------|---------|
| (Intercept) | 2.80    |
|             | (0.21)  |
| X           | 2.22    |
|             | (0.37)  |
| N           | 100     |
- Our estimate of the coefficient is 2.22, and our estimate of the standard error is 0.37
- So for a 95% confidence interval, assuming normality, we get 2.22 `\(\pm\)` 1.96 * 0.37, or [1.49, 2.94] (a code version of this calculation follows the concept checks)

---

# Confidence Intervals

- We can see this graphically as well - we should only reject nulls in those 5% tails for a 95% confidence interval

![Sampling distribution centered at 2.22 with the 95% confidence interval marked](Week_03_Slides_1_Hypothesis_Testing_files/figure-html/unnamed-chunk-14-1.png)

---

# Concept Checks

- What feature of the sampling distribution of `\(\hat{\beta}\)` does the standard error describe?
- Why do we get the same reject/don't reject result whether we center the sampling distribution around the null or around our estimate?
- We estimate `\(\hat{\beta}_1\)` and get a 95% confidence interval of [-1.3, 2.1]. Describe what this means in a sentence.
- In the above confidence interval, can we reject the null of `\(\beta_1 = 0\)`?
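Before the applied example, here's the same interval computed in R instead of by hand, assuming the `model` object from our earlier regression is still around. `se()` is from fixest, and `confint()` also works on fixest models, though it uses t rather than normal critical values, so it can differ slightly from the 1.96-based interval:

```r
library(fixest)

# 95% confidence interval by hand, assuming a normal sampling distribution
coef(model)["X"] + c(-1, 1) * 1.96 * se(model)["X"]  # roughly [1.49, 2.94]

# Or let the package do it
confint(model, level = 0.95)
```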
---

# Applied Example

Regress traffic fatality rate on the legal drinking age, showing (s.e.) and [p-values]

|             | Model 1 |
|-------------|---------|
| (Intercept) | 3.22    |
|             | (0.71)  |
|             | [0.00]  |
| mlda        | -0.06   |
|             | (0.03)  |
|             | [0.10]  |
| N           | 336     |
- Check the p-value. Can we reject `\(\beta_1 = 0\)` at the 95% confidence level?
- Using 1.96, construct the 95% confidence interval on `\(\hat{\beta}_1\)`
- Draw the sampling distribution at a null of `\(\beta_1 = 0\)`. What is its standard deviation? How much shaded area is there to the left of -0.06?
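If you want to check your answers in R, here's a sketch. The object name `fatality_model` is hypothetical (we never see the deck's code for this regression), and the table's numbers are rounded, so full-precision answers will differ a bit from what -0.06 and 0.03 give you directly:

```r
library(fixest)

b <- coef(fatality_model)["mlda"]  # about -0.06 after rounding
s <- se(fatality_model)["mlda"]    # about 0.03 after rounding; this is also the
                                   # standard deviation of the null distribution

b + c(-1, 1) * 1.96 * s  # the 95% confidence interval
pnorm(b / s)             # shaded area to the left of the estimate under the null of 0
```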