class: middle, title-slide

# Null Hypothesis Significance Tests III
## Continuous Data I
### Dennis A. V. Dittrich
### 2021

---
layout: true

<div class="my-footer"> <span><img src="img/tcb-logo.png" height="40px"></span> </div>

---
class: middle

## Independent, identically normally distributed data with known `\(\sigma\)`

---

## z-Test for a Mean `\(\mu\)`

.row[.col-7[

* i.i.d. random sample
* population data is normally distributed, therefore the sample statistic is normally distributed
* or: population data is **not** normally distributed, yet the sample statistic is asymptotically normally distributed because the sample size is large enough (rule of thumb: `\(n > 30\)`)
* the population standard deviation `\(\sigma\)` is known

The z-test for a mean `\(\mu\)` is a statistical test for a population mean. The test statistic is the sample mean `\(\bar{x}\)`. The standardized test statistic is `\(z\)`

`$$z = \frac{\bar{x}-\mu }{\sigma/\sqrt{n}}$$`

]]

---

## Example

.tip[ .row[.col-6[

In auto racing, a pit stop is where a racing vehicle stops for new tires, fuel, repairs, and other mechanical adjustments. The efficiency of a pit crew that makes these adjustments can affect the outcome of a race. A pit crew claims that its mean pit stop time (for 4 new tires and fuel) is less than 13 seconds. A random sample of 32 pit stop times has a sample mean of 12.9 seconds. Assume the population standard deviation is 0.19 second. Is there enough evidence to support the claim at `\(\alpha = 0.01\)`?

] .col-6[

Because `\(\sigma\)` is known (`\(\sigma = 0.19\)`), the sample is i.i.d. random, and `\(n = 32 \geq 30\)`, we can use the z-test.

The claim is "the mean pit stop time is less than 13 seconds." So, the null and alternative hypotheses are `\(H_0: \mu \geq 13\)` seconds, and `\(H_a: \mu < 13\)` seconds. (Claim)

`$$z= \frac{12.9-13}{0.19/\sqrt{32}} \approx -2.98$$`

It's a left-tailed test.
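The standardized test statistic can be verified directly in R (a minimal sketch using only the numbers from this example):

```r
# z = (x_bar - mu0) / (sigma / sqrt(n)) for the pit stop data
x_bar <- 12.9; mu0 <- 13; sigma <- 0.19; n <- 32
z <- (x_bar - mu0) / (sigma / sqrt(n))
round(z, 2)  # -2.98
```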
]]]

---

.row[.col-7[

<img src="08.inference.cont-1_files/figure-html/unnamed-chunk-2-1.png" width="100%" style="display: block; margin: auto;" />

] .col-5[

```r
pnorm(-2.98)
```

```
## [1] 0.001441242
```

```r
pnorm(12.9,13,0.19/sqrt(32))
```

```
## [1] 0.001454036
```

Because the P-value is less than `\(\alpha = 0.01\)`, we reject the null hypothesis. There is enough evidence at the 1% level of significance to support the claim that the mean pit stop time is less than 13 seconds.

#### Note:

A p-value gives no indication of uncertainty, whereas a CI is much more informative.

]]

---

## Example

.tip[ .row[.col-6[

According to a study of U.S. homes that use heating equipment, the mean indoor temperature at night during winter is 68.3°F. You think this information is incorrect. You randomly select 25 U.S. homes that use heating equipment in the winter and find that the mean indoor temperature at night is 67.2°F. From past studies, the population standard deviation is known to be 3.5°F and the population is normally distributed. Is there enough evidence to support your claim at `\(\alpha = 0.05\)`?

] .col-6[

Because `\(\sigma\)` is known (`\(\sigma = 3.5\)`), the sample is random, and the population is normally distributed, we can use the z-test.

The claim is "the mean indoor temperature at night during winter is not 68.3°F." So, the null and alternative hypotheses are `\(H_0: \mu = 68.3\)`°F, and `\(H_a: \mu \neq 68.3\)`°F. (Claim)

`$$z= \frac{67.2-68.3}{3.5/\sqrt{25}} \approx -1.57$$`

It's a two-tailed test.

]]]

---

.row[.col-7[

<img src="08.inference.cont-1_files/figure-html/unnamed-chunk-5-1.png" width="100%" style="display: block; margin: auto;" />

] .col-5[

```r
2*pnorm(-1.57)
```

```
## [1] 0.1164151
```

```r
pnorm(67.2,68.3,3.5/sqrt(25))+
  (1-pnorm(2*68.3-67.2,68.3,3.5/sqrt(25)))
```

```
## [1] 0.1160831
```

Because the P-value is greater than `\(\alpha = 0.05\)`, we fail to reject the null hypothesis. There is not enough evidence at the 5% level of significance to support the claim.

]]

---

## Rejection Regions and Critical Values

.row[.col-7[

The **rejection region** (critical region) is the range of values of the test statistic for which the null hypothesis is not probable.
* If a (standardized) test statistic falls in this region, the null hypothesis is rejected.
* A critical value, `\(z_0\)`, separates the rejection region from the nonrejection region.

]]

---

## Finding critical values
### in a standard normal distribution

.row[.col-7[

1. Specify the level of significance `\(\alpha\)`
2. Decide whether the test is left-, right-, or two-tailed
3. Find the critical values `\(z_0\)`

If the standardized test statistic

* is in the rejection region, then reject `\(H_0\)`.
* is not in the rejection region, then fail to reject `\(H_0\)`.

] .col-5[

**left-tailed** find the z-score that corresponds to an area of `\(\alpha\)`

**right-tailed** find the z-score that corresponds to an area of `\(1-\alpha\)`

**two-tailed** find the z-scores that correspond to `\(\frac{1}{2}\alpha\)` and `\(1-\frac{1}{2}\alpha\)`

] ]

---

## Left-tailed rejection area

.row[.col-7[

`\(\alpha=0.05\)`, `\(z_0=-1.645\)`

<img src="08.inference.cont-1_files/figure-html/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" />

] .col-5[

```r
qnorm(0.05)
```

```
## [1] -1.644854
```

] ]

---

## Right-tailed rejection area

.row[.col-7[

`\(\alpha=0.01\)`, `\(z_0=2.33\)`

<img src="08.inference.cont-1_files/figure-html/unnamed-chunk-10-1.png" width="100%" style="display: block; margin: auto;" />

] .col-5[

```r
qnorm(0.99)
```

```
## [1] 2.326348
```

] ]

---

## Two-tailed rejection area

.row[.col-7[

`\(\alpha=0.05\)`, `\(z_0=(-1.96, 1.96)\)`

<img src="08.inference.cont-1_files/figure-html/unnamed-chunk-12-1.png" width="100%" style="display: block; margin: auto;" />

] .col-5[

```r
cbind(qnorm(0.025),qnorm(1-0.025))
```

```
##           [,1]     [,2]
## [1,] -1.959964 1.959964
```

] ]

---
class: middle

## Independent, identically normally distributed data with **unknown** `\(\sigma\)`

---

## t-Test for a Mean `\(\mu\)`, with unknown `\(\sigma\)`

.row[.col-7[

* i.i.d.
random sample
* population data is normally distributed, therefore the sample statistic is normally distributed
* or: population data is **not** normally distributed, yet the sample statistic is asymptotically normally distributed because the sample size is large enough (rule of thumb: `\(n > 30\)`)
* the standard deviation `\(\sigma\)` is unknown, but `\((n-1)s^2/\sigma^2\)` follows a `\(\chi^2\)` distribution with `\(n-1\)` degrees of freedom. This assumption is met if the observations come from a normal distribution. If the sample size is large, the distribution of the sample variance has little effect on the distribution of the test statistic.

] .col-5[

The **t-test** for a mean `\(\mu\)` is a statistical test for a population mean. The test statistic is the sample mean `\(\bar{x}\)`. The standardized test statistic is `\(t\)`

`$$t = \frac{\bar{x}-\mu}{s/\sqrt{n}}$$`

]]

---

## Student's t Distribution

.row[ .col-7[

<img src="08.inference.cont-1_files/figure-html/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" />

] .col-5[

**Mean** `\(E(t) = 0\)`

**Var** `\(V(t) = v / (v - 2)\)` for `\(v > 2\)`

The Student's t distribution has more probability mass in its tails compared to the standard normal distribution to account for the additional uncertainty of an unknown variance.
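The heavier tails can be checked numerically; this quick sketch (not part of the original example code) compares tail probabilities and shows the convergence to the standard normal as the degrees of freedom grow:

```r
# tail probability P(T < -2): heavier for the t distribution
pt(-2, df = 5)      # larger than the standard normal tail
pnorm(-2)
# with many degrees of freedom, t is close to the standard normal
pt(-2, df = 1000)
```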
]]

---

## Flight Departure Delays in NYC: EDA

.row[.col-6[

```r
fli_small %>%
  group_by(season) %>%
  summarise(
    mean_delay = mean(dep_delay),
    median_delay = median(dep_delay),
    sd_delay = sd(dep_delay)
  )
```

```
## # A tibble: 2 x 4
##   season mean_delay median_delay sd_delay
##   <chr>       <dbl>        <dbl>    <dbl>
## 1 summer      15.4            -1     42.5
## 2 winter       7.59           -2     29.1
```

<img src="08.inference.cont-1_files/figure-html/unnamed-chunk-16-1.png" width="95%" style="display: block; margin: auto;" />

] .col-6[

<img src="08.inference.cont-1_files/figure-html/unnamed-chunk-17-1.png" width="95%" style="display: block; margin: auto;" />

<img src="08.inference.cont-1_files/figure-html/unnamed-chunk-18-1.png" width="95%" style="display: block; margin: auto;" />

]]

---

## R: The univariate t-test

.row[.col-9[

```r
fli_small %>%
  select(dep_delay) %>%
  t.test(mu = 10)
```

```
## 
## 	One Sample t-test
## 
## data:  .
## t = 0.89995, df = 499, p-value = 0.3686
## alternative hypothesis: true mean is not equal to 10
## 95 percent confidence interval:
##   8.258399 14.685601
## sample estimates:
## mean of x 
##    11.472
```

] .col-3[

`\(H_0: \mu =10\)`

`\(H_a: \mu \neq 10\)`

] ]

---

## Single sided test

.row[.tip.col-7[

A business student claims that, on average, an MBA student is required to prepare more than five cases per week. To examine the claim, a statistics professor asks a random sample of 10 MBA students to report the number of cases they prepare weekly. The results are exhibited here. Can the professor conclude at the 5% significance level that the claim is true or false?
```r # The Data cases <- c(2, 7, 4, 8, 9, 5, 11, 3, 7, 4) mean(cases) ``` ``` ## [1] 6 ``` ] .col-5[ ] ] --- .row[.col-6[ `\(H_0: \mu \leq 5\)` `\(H_a: \mu > 5\)` `\(\alpha = 0.05\)` .midis[ ```r t.test(cases, mu = 5, alternative = "greater" ) ``` ``` ## ## One Sample t-test ## ## data: cases ## t = 1.1028, df = 9, p-value = 0.1494 ## alternative hypothesis: true mean is greater than 5 ## 95 percent confidence interval: ## 4.337798 Inf ## sample estimates: ## mean of x ## 6 ``` ]] .col-6[ `\(H_0: \mu \geq 5\)` `\(H_a: \mu < 5\)` `\(\alpha = 0.05\)` .midis[ ```r t.test(cases, mu = 5, alternative = "less" ) ``` ``` ## ## One Sample t-test ## ## data: cases ## t = 1.1028, df = 9, p-value = 0.8506 ## alternative hypothesis: true mean is less than 5 ## 95 percent confidence interval: ## -Inf 7.662202 ## sample estimates: ## mean of x ## 6 ``` ]]] --- ## Bootstrap test for the mean .row[.col-7[ ```r # Generate the null distribution null_distn <- data.frame(cases = cases) %>% specify(response = cases) %>% hypothesize(null = "point", mu = 5) %>% generate(reps = 1000, type = "bootstrap") %>% calculate(stat = "mean") # Visualize visualize(null_distn) + shade_p_value( obs_stat = mean(cases), direction = "right" ) # Get the p-value null_distn %>% get_p_value( obs_stat = mean(cases), direction = "right" ) ``` ] .col-5[ `\(H_0: \mu \leq 5\)` `\(H_a: \mu > 5\)` `\(\alpha = 0.05\)` <img src="08.inference.cont-1_files/figure-html/unnamed-chunk-23-1.png" width="90%" style="display: block; margin: auto;" /> ``` ## # A tibble: 1 x 1 ## p_value ## <dbl> ## 1 0.14 ``` ]] --- ## Nonparametric Tests .row[.col-7[ A hypothesis test that does not require any specific conditions concerning the shape of the population or the value of any population parameters. * Usually easier to perform than parametric tests. 
* Nonparametric tests are sometimes called distribution-free tests because they are based on fewer assumptions (e.g., they do not assume that the outcome is approximately normally distributed).
* May be less efficient than parametric tests (stronger evidence is required to reject the null hypothesis).

]]

---

## Nonparametric Tests

.row[.col-7[

There are some situations when it is clear that the outcome does not follow a normal distribution. These include situations:

* when the outcome is an ordinal variable or a rank,
* when there are definite outliers or
* when the outcome has clear limits of detection.

In nonparametric tests, the hypotheses are often not about a single population parameter (e.g., `\(\mu=50\)` or `\(\mu_1=\mu_2\)`). Instead, the null hypothesis may be more general: for example, that the two populations are equal. Often this is interpreted as the two populations being equal in terms of their central tendency.

]]

---

## Nonparametric Tests

### Sign Test for a Population Median

.row[.col-6[

A nonparametric test that can be used to test a population median against a hypothesized value `\(k\)`.

**Left-tailed test** `\(H_0: median \geq k\)` and `\(H_a: median < k\)`

**Right-tailed test** `\(H_0: median \leq k\)` and `\(H_a: median > k\)`

**Two-tailed test** `\(H_0: median = k\)` and `\(H_a: median \neq k\)`

] .col-6[

To use the sign test, each entry is compared with the hypothesized median `\(k\)`.

* If the entry is below the median, a `\(-\)` sign is assigned.
* If the entry is above the median, a `\(+\)` sign is assigned.
* If the entry is equal to the median, 0 is assigned.

Compare the number of `\(+\)` and `\(-\)` signs.
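This recipe is easy to carry out by hand (a sketch of my own, reusing the `cases` data from the MBA example with `\(k = 5\)`): under `\(H_0\)` each non-tied observation falls above `\(k\)` with probability 0.5, so the sign counts feed an exact binomial test.

```r
# sign test by hand: cases data from the MBA example, k = 5
cases <- c(2, 7, 4, 8, 9, 5, 11, 3, 7, 4)
k <- 5
signs  <- sign(cases - k)      # -1, 0, or +1 for each entry
s_plus <- sum(signs > 0)       # number of + signs: 5
n_eff  <- sum(signs != 0)      # ties (zeros) are dropped: 9
# right-tailed p-value: P(X >= s_plus), X ~ Binomial(n_eff, 0.5)
pbinom(s_plus - 1, n_eff, 0.5, lower.tail = FALSE)  # 0.5
```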
]]

---

## Flight Departure Delays in NYC

.row[.col-6[

```r
fli_small %>%
  ggplot() +
  geom_bar(aes(x = dep_delay))
```

<img src="08.inference.cont-1_files/figure-html/unnamed-chunk-24-1.png" width="100%" style="display: block; margin: auto;" />

] .col-6[

```r
fli_small %>%
  summarise(
    mean_delay = mean(dep_delay),
    median_delay = median(dep_delay),
    sd_delay = sd(dep_delay)
  )
```

```
## # A tibble: 1 x 3
##   mean_delay median_delay sd_delay
##        <dbl>        <dbl>    <dbl>
## 1       11.5           -1     36.6
```

]]

---

## Nonparametric Tests

### R: The Sign-test

.row[.col-8[

```r
fli_small %>%
  pull(dep_delay) %>%
  BSDA::SIGN.test(md=0)
```

```
## 
## 	One-sample Sign-Test
## 
## data:  .
## s = 201, p-value = 0.0005028
## alternative hypothesis: true median is not equal to 0
## 95 percent confidence interval:
##  -2 -1
## sample estimates:
## median of x 
##          -1 
## 
## Achieved and Interpolated Confidence Intervals: 
## 
##                   Conf.Level L.E.pt U.E.pt
## Lower Achieved CI     0.9456     -2     -1
## Interpolated CI       0.9500     -2     -1
## Upper Achieved CI     0.9559     -2     -1
```

] .col-4[

`\(H_0: \text{median} =0\)`

`\(H_a: \text{median} \neq 0\)`

]]

---

## Nonparametric Tests

### R: The Sign-test

.row[.col-8[

```r
BSDA::SIGN.test(cases, md = 5, alternative = "less")
```

```
## 
## 	One-sample Sign-Test
## 
## data:  cases
## s = 5, p-value = 0.7461
## alternative hypothesis: true median is less than 5
## 95 percent confidence interval:
##      -Inf 8.106667
## sample estimates:
## median of x 
##           6 
## 
## Achieved and Interpolated Confidence Intervals: 
## 
##                   Conf.Level L.E.pt U.E.pt
## Lower Achieved CI     0.9453   -Inf 8.0000
## Interpolated CI       0.9500   -Inf 8.1067
## Upper Achieved CI     0.9893   -Inf 9.0000
```

] .col-4[

`\(H_0: \text{median} \geq 5\)`

`\(H_a: \text{median} < 5\)`

]]

---

## Nonparametric Tests

### R: The Sign-test

.row[.col-8[

```r
BSDA::SIGN.test(cases, md = 5, alternative = "greater")
```

```
## 
## 	One-sample Sign-Test
## 
## data:  cases
## s = 5, p-value = 0.5
## alternative hypothesis: true median is greater than 5
## 95 percent confidence interval:
##  3.893333      Inf
## sample estimates:
## median of x 
##           6 
## 
## Achieved and Interpolated Confidence Intervals: 
## 
##                   Conf.Level L.E.pt U.E.pt
## Lower Achieved CI     0.9453 4.0000    Inf
## Interpolated CI       0.9500 3.8933    Inf
## Upper Achieved CI     0.9893 3.0000    Inf
```

] .col-4[

`\(H_0: \text{median} \leq 5\)`

`\(H_a: \text{median} > 5\)`

]]

---

## Nonparametric Tests

### Wilcoxon signed rank test

.row[.col-7[

The Wilcoxon signed rank test is a non-parametric test for the one-sample location problem, e.g., to test the hypothesis that the median of a symmetrical distribution equals a given constant. The test is usually applied to the comparison of locations of two **dependent** samples. The distribution-free test is based on ranks.

`\(H_0\)`: the random variable follows a symmetric distribution around zero

`\(H_a\)`: the random variable does not follow a symmetric distribution around zero.

] .col-5[

The test statistic `\(W\)` is the sum of signed ranks of differences. For sample sizes above 20 a z-score approximation can be applied; for smaller samples the exact distribution of `\(W\)` needs to be used.

]]

---

.row[.col-8[

## Nonparametric Tests

### R: Wilcoxon signed rank test

] .col-4[

<br/>

`\(H_0: \mu = 0\)`

`\(H_a: \mu \neq 0\)`

]]

.row[.col-8[

```r
fli_small %>%
  pull(dep_delay) %>%
  wilcox.test(exact=T, conf.int=T)
```

```
## 
## 	Wilcoxon signed rank test with continuity correction
## 
## data:  .
## V = 62818, p-value = 0.0781
## alternative hypothesis: true location is not equal to 0
## 95 percent confidence interval:
##  -5.863473e-05  3.500058e+00
## sample estimates:
## (pseudo)median 
##       1.499993
```

] .col-4[

The pseudo-median, also called the "Hodges–Lehmann" statistic, estimates the location parameter for a univariate population. It may differ from the sample median; the difference between the median and pseudo-median is relatively small, though.
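The Hodges–Lehmann estimate can be computed by hand as the median of all pairwise Walsh averages `\((x_i + x_j)/2\)` with `\(i \leq j\)`. A small sketch of my own, using the `cases` sample from the earlier example rather than the flight delays:

```r
cases <- c(2, 7, 4, 8, 9, 5, 11, 3, 7, 4)
# all Walsh averages (x_i + x_j) / 2 with i <= j
walsh <- outer(cases, cases, "+") / 2
walsh <- walsh[upper.tri(walsh, diag = TRUE)]
length(walsh)  # n * (n + 1) / 2 = 55 averages
median(walsh)  # Hodges-Lehmann estimate (pseudo-median)
median(cases)  # plain sample median: 6
```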
] ] --- .row[.col-8[ ## Nonparametric Tests ### R: Wilcoxon signed rank test ] .col-4[ <br/> `\(H_0: \mu = 0\)` `\(H_a: \mu \neq 0\)` ]] .row[.col-7[ ```r (ES <- fli_small %>% pull(dep_delay) %>% effectsize::rank_biserial() ) ``` ``` ## r (rank biserial) | 95% CI ## --------------------------------- ## 0.09 | [-0.01, 0.18] ``` ```r effectsize::interpret_r(ES$r_rank_biserial) ``` ``` ## [1] "very small" ## (Rules: funder2019) ``` ] .col-5[ #### Effect size The rank-biserial correlation ranges from `\(-1\)` indicating that all values are smaller than the null hypothesis value, to `\(+1\)` indicating that all values are larger. ] ] --- ## Nonparametric Tests ### R: Bootstrap test for the median .row[.col-7[ ```r # Observed stat (x_tilde <- fli_small %>% specify(response = dep_delay) %>% calculate(stat = "median") %>% pull()) # Null distribution null_distn <- fli_small %>% specify(response = dep_delay) %>% hypothesize(null = "point", med = 0) %>% generate(reps = 30000, type = "bootstrap") %>% calculate(stat = "median") # Visualize null_distn %>% ggplot(aes(x=stat)) + geom_histogram(binwidth =0.1) + geom_vline(xintercept = c(x_tilde,-x_tilde), color="red") # p-value null_distn %>% summarise(pvalue = sum(abs(stat) >= abs(x_tilde))/n() ) ``` ] .col-5[ `\(H_0: \text{median} = 0\)` `\(H_a: \text{median} \neq 0\)` ``` ## [1] -1 ``` <img src="08.inference.cont-1_files/figure-html/unnamed-chunk-32-1.png" width="90%" style="display: block; margin: auto;" /> ``` ## # A tibble: 1 x 1 ## pvalue ## <dbl> ## 1 0.452 ``` ]] --- ## A standardized effect size measure: Cohen's d .row[.col-7[ **Original units** are the units of the dependent variable, in which the data are originally expressed. ]] .row[.col-7[ **Cohen’s d** is a standardized effect size measure, which is expressed in standard deviation units. 
`\begin{align*} d &= \frac{\text{Effect size in original units}}{\text{an appropriate standard deviation}}\\ &=\frac{ES}{SD} \end{align*}`

Cohen suggested **0.2**, **0.5**, and **0.8** as values for **small**, **medium**, and **large** effects, but recommended interpretation of d in context whenever possible.

] .col-5[

d is also referred to as the **standardized mean difference**.

If a relevant population SD is known, use it as standardizer for d. If not, use an estimate calculated from data. Always state what standardizer was used to calculate `\(d\)`.

To calculate a CI on `\(d\)` we need to assume (i) random sampling from (ii) a normal population.

] ]

---

## Interpreting Cohen's d: Context matters

.row[.col-6[

#### Psychology

In a meta-study in social psychology, about 30% of effects had `\(d < 0.2\)`, meaning small or less than small in Cohen's terms. Only around 17% had `\(d > 0.8\)`, meaning large. The average ES was about d = 0.4.

#### Education

In the context of school learning, it's reasonable to regard `\(d = 0.2\)` as small, 0.4 as medium, and 0.6 as large. In a large meta-study, average `\(d\)` was 0.40 for the influence on learning over one school year of numerous variables, including a wide range of teaching innovations.

] .col-6[

#### Medicine

Many common treatments and recommendations to change behavior to improve health are prompted by values of `\(d\)` between about 0.05 and 0.2. Taking aspirin routinely can decrease the risk of heart attack, with d = 0.03, and being fit can reduce mortality in the next 8 years by d = 0.08. These seemingly tiny effects matter: Taking aspirin could avoid hundreds of heart attacks each year in a large city.

#### Pharmacology

In pharmacology, researchers typically study effects for which `\(d\)` is greater than 5.
]] --- ## R: Cohen's d .row[.col-7[ ```r fli_small %>% summarise(C_d = mean(dep_delay)/sd(dep_delay) ) ``` ``` ## # A tibble: 1 x 1 ## C_d ## <dbl> ## 1 0.314 ``` ] ] .row[.col-7[ ```r (ES <- fli_small %>% pull(dep_delay) %>% effectsize::cohens_d(ci=0.95) ) ``` ``` ## Cohen's d | 95% CI ## ------------------------ ## 0.31 | [0.22, 0.40] ``` ] .col-4[ CI approximated assuming normal distribution ] ] .row[.col-7[ ```r effectsize::interpret_d(ES$Cohens_d) ``` ``` ## [1] "small" ## (Rules: cohen1988) ``` ]]
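
---

## R: An approximate CI for Cohen's d by hand

.row[.col-7[

A sketch of one common normal-theory approximation for the standard error of a one-sample `\(d\)`, `\(SE_d \approx \sqrt{1/n + d^2/(2n)}\)`. This formula is an assumption here, not taken from the `effectsize` source, but it reproduces the reported interval closely:

```r
n <- 500                           # fli_small sample size (df = 499 above)
d <- 0.314                         # Cohen's d computed above
se <- sqrt(1/n + d^2 / (2 * n))    # approximate standard error of d
round(d + c(-1.96, 1.96) * se, 2)  # approx. [0.22, 0.40]
```

] .col-5[

As noted above, the CI on `\(d\)` assumes random sampling from a normal population; with `\(n = 500\)` the normal approximation is serviceable even though the delay distribution itself is heavily skewed.

]]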