class: middle, title-slide

# Inference and Significance II
## Null Hypothesis Significance Tests in R
### Dennis A. V. Dittrich
### 2021

---
layout: true

<div class="my-footer">
<span><img src="img/tcb-logo.png" height="40px"></span>
</div>

---
class: middle

# Inference overview

---
## What do you want to do?

.col-7[
Estimation -> Confidence interval

Decision -> Hypothesis test

First step: Ask the following questions

- How many variables?
- What types of variables?
- What is the research question?
]

---
## Confidence intervals

.col-7[
- Bootstrap
- Bounds: cutoff values for the middle XX% of the distribution
- Interpretation: We are XX% confident that the true population parameter is in the interval.
- Definition of confidence level: XX% of random samples of size n are expected to produce confidence intervals that contain the true population parameter.
- `infer::generate(reps, type = "bootstrap")`
]

---
## Accuracy vs. precision

.col-7[
.question[
What happens to the width of the confidence interval as the confidence level increases? Why? Should we always prefer a confidence interval with a higher confidence level?
]
]

---
## Sample size and width of intervals

.row[.col-6[
<img src="08.inference-3_files/figure-html/unnamed-chunk-2-1.png" width="85%" style="display: block; margin: auto;" />
]
.col-6[
<img src="08.inference-3_files/figure-html/unnamed-chunk-3-1.png" width="85%" style="display: block; margin: auto;" />
]
]

.row[
.col-6[
<img src="08.inference-3_files/figure-html/unnamed-chunk-4-1.png" width="85%" style="display: block; margin: auto;" />
]]

---
## Equivalency of confidence and significance levels

.row[.col-7[
Two-sided alternative HT with `\(\alpha\)` `\(\rightarrow\)` `\(CL = 1 - \alpha\)`

One-sided alternative HT with `\(\alpha\)` `\(\rightarrow\)` `\(CL = 1 - (2 \times \alpha)\)`

<img src="08.inference-3_files/figure-html/unnamed-chunk-5-1.png" width="90%" style="display: block; margin: auto;" />
]]

---
## Interpretation of confidence intervals

.col-7[
.question[
Which of the following is more informative:

+ The difference in price of a gallon of milk between Whole Foods and Harris Teeter is 30 cents.
+ A gallon of milk costs 30 cents more at Whole Foods compared to Harris Teeter.
]

.question[
What does your answer tell you about the interpretation of confidence intervals for differences between two population parameters?
]
]

---
## Hypothesis testing framework

.row[.col-7[
- Start with a null hypothesis, `\(H_0\)`, that typically represents the status quo
- Set an alternative hypothesis, `\(H_A\)`, that typically represents the research question, i.e. what we’re testing for
- Conduct a hypothesis test under the assumption that the null hypothesis is true and calculate a **p-value** (probability of observed or more extreme outcome given that the null hypothesis is true)
    - if the test results suggest that the data do not provide convincing evidence for the alternative hypothesis, stick with the null hypothesis
    - if they do, then reject the null hypothesis in favor of the alternative
]]

---
## Hypothesis testing in `R`

.row[.col-6[
- Set the hypotheses.
- Calculate the observed sample statistic.
- Calculate the p-value.
- Draw a conclusion about the hypotheses in the context of the data and the research question.
]
.col-6[
In `R` you use:

* `infer::hypothesize(null = "point")` and `infer::generate(reps, type = "simulate")` or `infer::generate(reps, type = "bootstrap")`
* `infer::hypothesize(null = "independence")` and `infer::generate(reps, type = "permute")`
]]

---
class: middle

# Examples

---
## What do you want to do?

Estimation -> Confidence interval

Decision -> Hypothesis test

First step: Ask the following questions

- How many variables?
- What type(s) of variable(s)?
- What is the research question?

---
## Data: NC births

.col-7[
The dataset is in the `openintro` package.

```r
glimpse(ncbirths)
```

```
## Rows: 1,000
## Columns: 13
## $ fage           <int> NA, NA, 19, 21, NA, NA, 18, 17, NA, 20, …
## $ mage           <int> 13, 14, 15, 15, 15, 15, 15, 15, 16, 16, …
## $ mature         <fct> younger mom, younger mom, younger mom, y…
## $ weeks          <int> 39, 42, 37, 41, 39, 38, 37, 35, 38, 37, …
## $ premie         <fct> full term, full term, full term, full te…
## $ visits         <int> 10, 15, 11, 6, 9, 19, 12, 5, 9, 13, 9, 8…
## $ marital        <fct> not married, not married, not married, n…
## $ gained         <int> 38, 20, 38, 34, 27, 22, 76, 15, NA, 52, …
## $ weight         <dbl> 7.63, 7.88, 6.63, 8.00, 6.38, 5.38, 8.44…
## $ lowbirthweight <fct> not low, not low, not low, not low, not …
## $ gender         <fct> male, male, female, male, female, male, …
## $ habit          <fct> nonsmoker, nonsmoker, nonsmoker, nonsmok…
## $ whitemom       <fct> not white, not white, white, white, not …
```
]

---
## Length of gestation

<img src="08.inference-3_files/figure-html/unnamed-chunk-7-1.png" width="60%" style="display: block; margin: auto;" />

```
## # A tibble: 1 x 7
##     min  xbar   med     s    q1    q3   max
##   <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1    20  38.3    39  2.93    37    40    45
```

---
## Length of gestation

.row[.col-7[
.question[
Assuming that this sample is representative of all births in NC, we are 95%
confident that the average length of gestation for babies in NC is between ---- and ---- weeks.
]

**(1) How many variables?**

1 variable: length of gestation, `weeks`

**(2) What type(s) of variable(s)?**

Numerical

**(3) What is the research question?**

Estimate the average length of gestation and the corresponding confidence interval
]]

---
## Simulation for CI for a mean

.row[.col-7[
**Goal:** Use bootstrapping to estimate the sampling variability of the mean, i.e. the variability of means taken from the same population with the same sample size.

1. Take a bootstrap sample - a random sample taken with replacement from the original sample, of the same size as the original sample.
2. Calculate the mean of the bootstrap sample.
3. Repeat steps (1) and (2) many times to create a bootstrap distribution - a distribution of bootstrap means.
4. Calculate the bounds of the 95% confidence interval as the middle 95% of the bootstrap distribution.
]]

---
## Set a seed first

.col-7[
From the documentation of `set.seed`:

- `set.seed` uses a single integer argument to set as many seeds as are required. There is no guarantee that different values of seed will seed the RNG differently, although any exceptions would be extremely rare.
- Initially, there is no seed; a new one is created from the current time and the process ID when one is required. Hence different sessions will give different simulation results, by default.
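
A quick way to see what seeding buys us is to draw the same random sample twice. This is only a minimal sketch: the seed value `42` and the names `first_draw` and `second_draw` are arbitrary choices for illustration.

```r
# Seed the RNG, then draw 5 of the integers 1..10 without replacement
set.seed(42)
first_draw <- sample(1:10, size = 5)

# Re-seeding with the same value restarts the random number stream
set.seed(42)
second_draw <- sample(1:10, size = 5)

# Both draws are the same, so the "random" result is reproducible
identical(first_draw, second_draw)
```

For the analyses that follow, the seed is set once:
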
```r
set.seed(20180326)
```
]

---
## Computation for CI for a mean

.row[.col-6[
```r
boot_means <- ncbirths %>%
  filter(!is.na(weeks)) %>% # remove NAs
  specify(response = weeks) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

boot_means %>%
  ggplot(aes(x = stat)) +
  geom_histogram(binwidth = 0.03)
```
]
.col-6[
<img src="08.inference-3_files/figure-html/ci-mean-1.png" width="95%" style="display: block; margin: auto;" />
]]

---
## Length of gestation

.col-7[
```r
boot_means %>%
  summarise(
    lower = quantile(stat, 0.025),
    upper = quantile(stat, 0.975)
  )
```

```
## # A tibble: 1 x 2
##   lower upper
##   <dbl> <dbl>
## 1  38.2  38.5
```

Assuming that this sample is representative of all births in NC, we are 95% confident that the average length of gestation for babies in NC is between 38.2 and 38.5 weeks.
]

---
## Length of gestation, revisited

.row[
.col-6[.question[
The average length of human gestation is 280 days, or 40 weeks, from the first day of the woman's last menstrual period. Do these data provide convincing evidence that the average length of gestation for women in NC is different from 40 weeks? Use a significance level of 5%.
]

`$$H_0: \mu = 40$$`
`$$H_a: \mu \ne 40$$`
]
.col-6[
We just said, "we are 95% confident that the average length of gestation for babies in NC is between 38.2 and 38.5 weeks".

Since the null value is outside the CI, we would reject the null hypothesis in favor of the alternative.

But an alternative, more direct, way of answering this question is using a hypothesis test.
]]

---
## Simulation for HT for a mean

.row[.col-7[
**Goal:** Use bootstrapping to generate a sampling distribution under the assumption of the null hypothesis being true. Then, calculate the p-value to make a decision on the hypotheses.
]]

.row[.col-7[
1. Take a bootstrap sample - a random sample taken with replacement from the original sample, of the same size as the original sample.
2. Calculate the mean of the bootstrap sample.
3. Repeat steps (1) and (2) many times to create a bootstrap distribution - a distribution of bootstrap means.
4. Shift the bootstrap distribution to be centered at the null value by subtracting/adding the difference between the center of the bootstrap distribution and the null value to each bootstrap mean.
]
.col-5[
<ol start="5">
<li> Calculate the p-value as the proportion of simulations that yield a sample mean at least as extreme as the observed sample mean. </li></ol>
]]

---
## Computation for HT for a mean

.row[.col-6[
```r
boot_means_shifted <- ncbirths %>%
  # remove NAs
  filter(!is.na(weeks)) %>%
  specify(response = weeks) %>%
  # hypothesize step
  hypothesize(null = "point", mu = 40) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

boot_means_shifted %>%
  ggplot(aes(x = stat)) +
  geom_histogram(binwidth = 0.03) +
  geom_vline(xintercept = 38.33, color = "red") +
  geom_vline(xintercept = 40 + (40 - 38.33), color = "red")
```
]
.col-6[
<img src="08.inference-3_files/figure-html/ht-mean-1.png" width="95%" style="display: block; margin: auto;" />
]]

---
## Length of gestation

.row[.col-7[
```r
boot_means_shifted %>%
  filter(stat <= 38.33) %>%
  summarise(p_value = 2 * (n() / 1000))
```

```
## # A tibble: 1 x 1
##   p_value
##     <dbl>
## 1       0
```

Since the p-value is less than the significance level, we reject the null hypothesis. The data provide convincing evidence that the average length of gestation of births in NC is different from 40 weeks.
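
The shift-and-count recipe from the simulation steps can also be sketched in base R. This is an illustration under stated assumptions only: `x` is simulated stand-in data (not the `ncbirths` values), all object names are hypothetical, and the bootstrap means are re-centered directly, which need not match how `infer` implements the shift internally.

```r
set.seed(1)

# Hypothetical numeric sample standing in for the observed data
x <- rnorm(200, mean = 38.3, sd = 2.9)
null_value <- 40
obs_mean <- mean(x)

# Steps 1-3: resample with replacement and store the mean of each resample
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))

# Step 4: shift the bootstrap distribution so it is centered at the null value
shifted <- boot_means - mean(boot_means) + null_value

# Step 5: two-sided p-value, the share of shifted means at least as far
# from the null value as the observed mean
p_value <- mean(abs(shifted - null_value) >= abs(obs_mean - null_value))
p_value
```

Counting both tails through the absolute distance plays the same role as doubling the one-tail proportion in the `filter()` and `summarise()` code.
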
]]

---
## **infer** structure

```r
df %>%
  specify(response, explanatory) %>% # explanatory optional
  generate(reps, type) %>% # type: bootstrap, simulate, or permute
  calculate(stat)
```

.row[.col-7[
Always start with a data frame

The result is always a data frame with a variable called `stat`

- See the documentation for `calculate` to see which `stat`istics can be calculated

For hypothesis testing add a `hypothesize()` step between `specify()` and `generate()`

- `null = "point"`, and then specify the null value
- `null = "independence"`
]]

---
class: middle

# More Examples for Hypothesis Tests
## using `infer`

---
.row[.col-7[
#### Example:

Consider a manufacturer that advertises that its new hybrid car has a mean gas mileage of 50 miles per gallon. If you suspect that the mean mileage is less than 50 miles per gallon, how could you show that the advertisement is false?

#### Solution

Run an experiment and test whether the mean gas mileage is at least 50 miles per gallon.

#### Hypotheses

`\(H_0: m \geq 50\)` (claim)

`\(H_a: m < 50\)`

Note, this is a one-sided test.
]
]

---
.row[.col-7[
1. Collect a sample of `\(n\)` new hybrid cars and measure their mileage.
2. Compute the sample mean.
3. Determine the probability of observing the sample mean or something more extreme under the assumption that the null hypothesis is true.
    - We don't know the distribution of the population!
    - If we are willing to assume that the sampling distribution of the mean is approximately normally distributed because the central limit theorem holds, we still need to estimate its standard deviation
    - If we don't want to make this assumption, we can try to simulate the null distribution from the data that we collected
]]

---
.row[.col-6[
Let's assume we are able to get 12 new cars for our experiment.
<img src="08.inference-3_files/figure-html/unnamed-chunk-13-1.png" width="80%" style="display: block; margin: auto;" />

```r
df %>%
  summarise(m_mileage = mean(mileage), sd_mileage = sd(mileage))
```

```
## # A tibble: 1 x 2
##   m_mileage sd_mileage
##       <dbl>      <dbl>
## 1      48.8       1.67
```
]
.col-6[
#### Mean Estimates and Bootstrap CI

```r
ci_dist <- df %>%
  specify(response = mileage) %>%
  generate(reps = 15000, type = "bootstrap") %>%
  calculate(stat = "mean")

percentile_ci <- get_ci(ci_dist)

visualize(ci_dist) +
  shade_confidence_interval(endpoints = percentile_ci)
```

<img src="08.inference-3_files/figure-html/unnamed-chunk-15-1.png" width="80%" style="display: block; margin: auto;" />

```r
percentile_ci
```

```
## # A tibble: 1 x 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1     47.8     49.6
```
]]

---
## Hypothesis Test in R

.row[.col-7[
```r
# Compute the mean mileage (estimate)
( x_bar <- df %>%
  specify(response = mileage) %>%
  calculate(stat = "mean") )
```

```
## # A tibble: 1 x 1
##    stat
##   <dbl>
## 1  48.8
```
]
.col-5[
**Compute the test statistic**

* Here we are using the tidymodels approach to compute the mean mileage to guarantee that we are performing the exact same computation in the bootstrap samples
]]

---
## Hypothesis Test in R

.row[.col-7[
```r
# Compute the mean mileage (estimate)
( x_bar <- df %>%
  specify(response = mileage) %>%
  calculate(stat = "mean") )

# Create Null distribution with mean 50
null_distn <- df %>%
  specify(response = mileage) %>%
  hypothesize(null = "point", mu = 50) %>%
  generate(reps = 15000, type = "bootstrap") %>%
  calculate(stat = "mean")
```
]
.col-5[
Compute the test statistic

* Here we are using the tidymodels approach to compute the mean mileage to guarantee that we are performing the exact same computation in the bootstrap samples

**Create the Null distribution**

* specifying the variable of interest
* stating the hypothesis
* generating 15000 bootstrap samples
* computing the mean for each sample
]]

---
## Hypothesis Test in R

.row[.col-7[
```r
# Create Null distribution with mean 50
null_distn <- df %>%
  specify(response = mileage) %>%
  hypothesize(null = "point", mu = 50) %>%
  generate(reps = 15000, type = "bootstrap") %>%
  calculate(stat = "mean")

# Visualize Null distribution
visualize(null_distn) +
  shade_p_value(obs_stat = x_bar, direction = "left")
```

<img src="08.inference-3_files/figure-html/unnamed-chunk-18-1.png" width="60%" style="display: block; margin: auto;" />
]
.col-5[
Create the Null distribution

* specifying the variable of interest
* stating the hypothesis
* generating 15000 bootstrap samples
* computing the mean for each sample

**Visualize Null distribution**

* create a histogram of the sample means
* mark the observed statistic
* shade the area that is at least as far away from the point of the null hypothesis as the observed statistic
]]

---
## Hypothesis Test in R

.row[.col-7[
```r
# Visualize Null distribution
visualize(null_distn) +
  shade_p_value(obs_stat = x_bar, direction = "left")
```

<img src="08.inference-3_files/figure-html/unnamed-chunk-20-1.png" width="60%" style="display: block; margin: auto;" />

```r
# Compute p-value
null_distn %>%
  get_p_value(obs_stat = x_bar, direction = "left")
```

```
## # A tibble: 1 x 1
##   p_value
##     <dbl>
## 1 0.00480
```
]
.col-5[
Visualize the Null distribution

* create a histogram of the sample means
* mark the observed statistic
* shade the area that is at least as far away from the point of the null hypothesis as the observed statistic

**Compute the p-value**

* compute the size of the shaded area that represents the probability of obtaining a test statistic at least as extreme as our observed test statistic if the null hypothesis were true
]]

---
## Two independent samples

.row[.col-7[
<img src="08.inference-3_files/figure-html/unnamed-chunk-22-1.png" width="95%" style="display: block; margin: auto;" />
]
.col-5[
#### Research Question

Do the departure delays at the NYC airports differ between the seasons?
#### Hypothesis

`\(H_0: d = 0\)`

`\(H_a: d \neq 0\)`

Note, this is a two-sided test.
]]

---
## Testing whether two independent samples
### have the same mean (come from the same population)

.row[.col-7[
#### Effect size estimate and bootstrap CI

<img src="08.inference-3_files/figure-html/unnamed-chunk-23-1.png" width="60%" style="display: block; margin: auto;" />

#### Permutation test

```r
( d_hat <- fli_small %>%
  specify(dep_delay ~ season) %>%
  calculate(stat = "diff in means", order = c("summer", "winter")) )
```

```
## # A tibble: 1 x 1
##    stat
##   <dbl>
## 1  7.80
```
]
.col-5[
**Compute the test statistic**

* specify the response variable and the grouping variable
* calculate the effect size: differences in means (here: mean delays)
* define which sample mean is subtracted from the other
]
]

---
## Testing whether two independent samples
### have the same mean (come from the same population)

.row[.col-7[
```r
null_distn <- fli_small %>%
  specify(dep_delay ~ season) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in means", order = c("summer", "winter"))
```
]
.col-5[
Compute the test statistic

**Create the Null distribution**

* specify the response and the grouping variable
* state the null hypothesis: the response is independent of the other variable
* create 1000 permutation samples
* calculate the test statistic
]
]

---
## Testing whether two independent samples
### have the same mean (come from the same population)

.row[.col-7[
```r
visualize(null_distn) +
  shade_p_value(obs_stat = d_hat, direction = "two_sided")
```

<img src="08.inference-3_files/figure-html/unnamed-chunk-26-1.png" width="60%" style="display: block; margin: auto;" />

```r
null_distn %>%
  get_p_value(obs_stat = d_hat, direction = "two_sided")
```

```
## # A tibble: 1 x 1
##   p_value
##     <dbl>
## 1   0.016
```
]
.col-5[
Compute the test statistic

Create the Null distribution

**Visualize the Null distribution**

* create a histogram of the
permutation samples' effect sizes
* shade the area that is at least as far away from the point of the null hypothesis as the observed statistic (now on both sides of the simulated distribution)

**Compute the p-value**

* compute the size of the shaded area
]
]

---
## Two dependent samples

.row[.col-8[
<img src="08.inference-3_files/figure-html/unnamed-chunk-28-1.png" width="95%" style="display: block; margin: auto;" />
]
.col-4[
<img src="08.inference-3_files/figure-html/unnamed-chunk-29-1.png" width="95%" style="display: block; margin: auto;" />
]]

---
## Two dependent samples

.row[.col-8[
#### Research Question

Do the two flavors have the same number of calories?

Since the two samples are dependent, we effectively have only one sample of differences, with one observation for each brand.
]
.col-4[
#### Hypothesis

.row[.col-6[
`\(H_0: d = 0\)`
]
.col-6[
`\(H_a: d \neq 0\)`
]]

This is a two-sided test.
]]

.row[.col-7[
```r
( d_hat <- IceCream %>%
  mutate(diffC = VanillaCalories - ChocolateCalories) %>%
  specify(response = diffC) %>%
* calculate(stat = "median") )

null_distn <- IceCream %>%
  mutate(diffC = VanillaCalories - ChocolateCalories) %>%
  specify(response = diffC) %>%
* hypothesize(null = "point", med = 0) %>%
  generate(reps = 5000, type = "bootstrap") %>%
* calculate(stat = "median")

visualize(null_distn) +
  shade_p_value(obs_stat = d_hat, direction = "two_sided")

null_distn %>%
  get_p_value(obs_stat = d_hat, direction = "two_sided")
```
]
.col-5[
```
## # A tibble: 1 x 1
##    stat
##   <int>
## 1    -9
```

<img src="08.inference-3_files/figure-html/icecream-1.png" width="85%" style="display: block; margin: auto;" />

```
## # A tibble: 1 x 1
##   p_value
##     <dbl>
## 1       0
```
]]
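
---
## Two dependent samples: a base R sketch

The paired test reduces to a one-sample problem on the differences, which can be sketched without `infer`. This is a rough illustration under stated assumptions: the calorie values below are made up (they are not the `IceCream` data), all object names are hypothetical, and centering the differences at the null value is one plausible way to build a point-null bootstrap distribution, not necessarily what `infer` does internally.

```r
set.seed(2)

# Hypothetical paired measurements, one pair per brand
vanilla   <- c(260, 240, 250, 270, 230, 255, 245, 265)
chocolate <- c(270, 250, 255, 285, 240, 260, 250, 280)

# One sample of within-brand differences
diff_c <- vanilla - chocolate
obs_median <- median(diff_c)

# Center the differences so their median equals the null value 0
centered <- diff_c - median(diff_c)

# Bootstrap the median under the null hypothesis
null_medians <- replicate(5000, median(sample(centered, replace = TRUE)))

# Two-sided p-value: share of null medians at least as extreme as observed
p_value <- mean(abs(null_medians) >= abs(obs_median))
p_value
```

With only a handful of pairs the bootstrap medians take few distinct values, so the resulting p-value is coarse and should be read with caution.
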