class: middle, title-slide

# Null Hypothesis Significance Tests III
## Continuous Data II
### Dennis A. V. Dittrich
### 2021

---
layout: true

<div class="my-footer">
<span><img src="img/tcb-logo.png" height="40px"></span>
</div>

---
class: middle

# Multiple samples
## Two dependent Samples

---
## Two Sample Hypothesis Test

.row[.col-7[
Compares parameters from two populations. Sampling methods:

**Independent Samples**
The sample selected from one population is not related to the sample selected from the second population.

**Dependent Samples** (paired or matched samples)
Each member of one sample corresponds to a member of the other sample.
]]

---
## Matched Pairs

.row[.col-8[
A good way to eliminate a source of variation, and the errors in interpretation associated with it, is to use **matched pairs**.

Example: Sales data for locations with standard menu and new sandwich

<table>
 <tbody>
  <tr>
   <td style="text-align:left;"> New </td>
   <td style="text-align:right;"> 48722 </td>
   <td style="text-align:right;"> 28965 </td>
   <td style="text-align:right;"> 36581 </td>
   <td style="text-align:right;"> 40543 </td>
   <td style="text-align:right;"> 55423 </td>
   <td style="text-align:right;"> 38555 </td>
   <td style="text-align:right;"> 31778 </td>
   <td style="text-align:right;"> 45643 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Standard </td>
   <td style="text-align:right;"> 46555 </td>
   <td style="text-align:right;"> 28293 </td>
   <td style="text-align:right;"> 37453 </td>
   <td style="text-align:right;"> 38324 </td>
   <td style="text-align:right;"> 54989 </td>
   <td style="text-align:right;"> 35687 </td>
   <td style="text-align:right;"> 32000 </td>
   <td style="text-align:right;"> 43289 </td>
  </tr>
 </tbody>
</table>

<img src="08.inference.cont-2_files/figure-html/unnamed-chunk-3-1.png" width="80%" style="display: block; margin: auto;" />
]
.col-4[
The **paired design** uses a single group of participants, each of whom contributes a pair of data values, one for each of the conditions being compared.

A **within-group** independent variable gives a within-group design, meaning that all levels of the variable are seen by a single group of participants.

Equivalently, a **repeated measure** gives a repeated-measures design; for example, pretest and posttest.
]
]

---
## Effect size for matched pairs

.row[.col-6[
<img src="08.inference.cont-2_files/figure-html/unnamed-chunk-4-1.png" width="80%" style="display: block; margin: auto;" />
<img src="08.inference.cont-2_files/figure-html/unnamed-chunk-5-1.png" width="80%" style="display: block; margin: auto;" />

The **effect size** for the paired design is the mean of the differences, ** `\(M_{diff} = M_2 - M_1\)` **.

The **degrees of freedom** for the paired design is ** `\(df=N-1\)` **, with `\(N\)` being the number of observational units.
]
.col-6[
With the paired design, when the two **measures** (for example, Pretest and Posttest) are highly **correlated**, **precision is high**: if the differences are fairly consistent over participants, the CI on the mean difference is likely to be short.

Any comparison of groups must state the design (paired / within-group, or between-groups), or it can't be interpreted.
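A minimal sketch (assuming the sales data above is in a tibble `data` with columns `New` and `Standard`, as on the following slides):

```r
library(dplyr)

# high correlation between the two measures -> short CI on the mean difference
data %>%
  summarise(r = cor(New, Standard),
            M_diff = mean(New - Standard))
```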
]
]

---
## Cohen's d for the paired design

.row[.col-7[
#### Standardized effect size

`\begin{align*} d &= \frac{\text{Effect size in original units}}{\text{an appropriate standard deviation}}\\ &=\frac{ES}{SD} \end{align*}`
]]

.row[.col-7[
#### Standard deviation for paired design

`\begin{align*} s_{av}= \sqrt{\frac{s_1^2 +s_2^2}{2}} \end{align*}`
]
.col-5[
For the paired design, some recommend using `\(s_{av}\)` as the standardizer for `\(d\)`, not the standard deviation of the differences `\(s_{diff}\)`.
]
]

---
.row[.col-8[
## R: Cohen's d for paired design
]
.col-4[
<img src="08.inference.cont-2_files/figure-html/newstd_box-1.png" width="95%" style="display: block; margin: auto;" />
]]

.row[.col-8[

```r
data %>%
  summarise(C_d = mean(New-Standard)/sd(New-Standard) )
```

```
## # A tibble: 1 x 1
##     C_d
##   <dbl>
## 1 0.873
```
]
.col-4[
`\(s_{diff}\)`, the standard deviation of the differences
]]

<br/>

.row[.col-8[

```r
data %>%
  summarise(C_d = mean(New-Standard)/sqrt((var(New)+var(Standard))/2) )
```

```
## # A tibble: 1 x 1
##     C_d
##   <dbl>
## 1 0.139
```
]
.col-4[
`\(s_{av} = \sqrt{(s_1^2+s_2^2)/2}\)`
]
]

<br/>

.row[.col-8[

```r
data %>%
  summarise(C_d = mean(New-Standard)/
              sqrt((var(New)+var(Standard)-
                      2*cor(New,Standard)*sd(New)*sd(Standard) )) )
```

```
## # A tibble: 1 x 1
##     C_d
##   <dbl>
## 1 0.873
```
]
.col-4[
`\(s_{rm} = \sqrt{s_1^2+s_2^2 -2rs_1s_2}\)` for repeated measures with correlation `\(r\)`; it equals `\(s_{diff}\)`
]
]

---
.row[.col-8[
## R: Cohen's d for paired design
]
.col-4[
<img src="08.inference.cont-2_files/figure-html/newstd_box-1.png" width="95%" style="display: block; margin: auto;" />
]]

.row[.col-9[

```r
effectsize::cohens_d(New, Standard, data=data, paired=T, ci=0.95)
```

```
## Cohen's d |       95% CI
## ------------------------
## 0.87      | [0.03, 1.79]
```
]
.col-3[
#### Data in _wide_ format
]]

<br/>

.row[.col-9[

```r
effectsize::hedges_g(New, Standard, data=data, paired=T, ci=0.95)
```

```
## Hedges' g |       95% CI
## ------------------------
## 0.78      | [0.03, 1.59]
## 
## - Bias corrected using Hedges and Olkin's method.
```
]
.col-3[
small sample correction
]
]

---
### Tests for differences in central tendency for paired data

.row[.col-7[
The recommended approach is estimation: report the effect together with its uncertainty, using a confidence interval.
]]

.row[.col-7[
Any test for a single variable can be applied to the paired differences.

* The test statistic is (approximately) normally distributed
  * z-score test if `\(\sigma\)` is known
  * t-test if `\(\sigma\)` is unknown
* The test statistic is not (approximately) normally distributed
  * Sign test
  * Wilcoxon signed rank test
  * Non-parametric bootstrap
]
.col-5[
The null hypothesis states that there is no difference between the two matched measurements:

`\(H_0: d = 0\)`

`\(H_a: d \neq 0\)` or any of the one-sided alternatives.
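The sign test from the list on the left, as a minimal sketch in base R (using the sandwich data in `data`):

```r
d <- data$New - data$Standard
# under H0, positive and negative differences are equally likely
binom.test(sum(d > 0), n = sum(d != 0), p = 0.5)
```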
]]

---
.row[.col-8[
### R: t-test for paired data
]
.col-4[
<img src="08.inference.cont-2_files/figure-html/newstd_box-1.png" width="95%" style="display: block; margin: auto;" />
]]

.row[.col-8[

```r
t.test(New, Standard, data=data, paired=T, alternative="g")
```

```
## 
##  Paired t-test
## 
## data:  New and Standard
## t = 2.4704, df = 7, p-value = 0.0214
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  280.3029      Inf
## sample estimates:
## mean of the differences 
##                  1202.5
```
]
.col-4[
#### Data in _wide_ format
]
]

.row[.col-8[

```r
t.test(New-Standard, data=data, alternative="g")
```

```
## 
##  One Sample t-test
## 
## data:  New - Standard
## t = 2.4704, df = 7, p-value = 0.0214
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
##  280.3029      Inf
## sample estimates:
## mean of x 
##    1202.5
```
]
.col-4[
#### Data in _wide_ format
]]

---
class: middle

## Two independent Samples

---
## Two Sample Hypothesis Test
### with Independent Samples

.row[.col-7[
**Null hypothesis `\(H_0\)` **

* A statistical hypothesis that usually states there is no difference between the parameters of two populations.
* Always contains the symbol `\(\leq\)`, `\(=\)`, or `\(\geq\)`.

**Alternative hypothesis `\(H_a\)` **

* A statistical hypothesis that is true when `\(H_0\)` is false.
* Always contains the symbol `\(>\)`, `\(\neq\)`, or `\(<\)`.

Regardless of which hypotheses you use, you always assume there is no difference between the population means, i.e. `\(\mu_1 = \mu_2\)`.
]]

---
## Two Sample z-Test
### for the Difference Between Means

.row[.col-7[
To perform a z-test for the difference between two population means `\(\mu_1\)` and `\(\mu_2\)` when the samples are independent, the following conditions are necessary.

1. The population standard deviations are **known**.
2. The samples are randomly selected.
3. The samples are independent.
4. The populations are normally distributed, or the sample statistic is at least asymptotically normally distributed, i.e. each sample size is at least 30.
]]

---
## Two Sample z-Test
### for the Difference Between Means

.row[.col-7[
When these conditions are met, the sampling distribution for `\(\bar{x}_1 - \bar{x}_2\)`, the difference of the sample means, is a normal distribution with mean `\(\mu_1-\mu_2\)` and standard error `\(\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}\)`.

The standardized test statistic `\(z\)` is

`$$z=\frac{(\bar{x}_1-\bar{x}_2) - (\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}$$`
]]

---
## Example

.row[.tip.col-7[
A credit card watchdog group claims that there is a difference in the mean credit card debts of households in California and Florida. The results of a random survey of 250 households from each state are shown below. The two samples are independent. Assume that `\(\sigma_1 = 960\)` for California and `\(\sigma_2 = 845\)` for Florida.

Do the results support the group's claim? Use `\(\alpha = 0.05\)`.

California: `\(\bar{x}_1 = 3060\)`, `\(n_1=250\)`

Florida: `\(\bar{x}_2 = 2910\)`, `\(n_2=250\)`
]]

---
## Example: Solution

.row[.tip.col-7[
Since the population standard deviations are known, the samples are random and independent, and both sample sizes exceed 30, we can use the z-test.

The claim is "there is a difference in the mean credit card debts of households in California and Florida." So, the null and alternative hypotheses are:

`\(H_0: \mu_1 =\mu_2\)` and `\(H_a: \mu_1 \neq \mu_2\)`. (Claim)
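In R, the test statistic and the two-tailed critical value can be computed directly (a minimal sketch of the calculation on the next slide):

```r
z <- (3060 - 2910) / sqrt(960^2/250 + 845^2/250)
z             # approximately 1.85
qnorm(0.975)  # critical value for alpha = 0.05, two-tailed
```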
]]

---
## Example: Solution

.row[.tip.col-7[
Because the test is a two-tailed test and the level of significance is `\(\alpha = 0.05\)`, the critical values are `\(-z_0=-1.96\)` and `\(z_0=1.96\)`.

`$$z=\frac{(3060-2910)-0}{\sqrt{\frac{960^2}{250}+\frac{845^2}{250}}}\approx 1.85$$`

Because z is not in the rejection region, we fail to reject the null hypothesis.

```r
# p-value of the two-tailed test
2*(1-pnorm(3060-2910, 0, sqrt(960^2/250+845^2/250) ))
```

```
## [1] 0.0636722
```
]]

---
## R: Cohen's d for between-design

.row[.col-9[

```r
effectsize::cohens_d(New, Standard, data=data, paired=F, ci=0.95)
```

```
## Cohen's d |        95% CI
## -------------------------
## 0.14      | [-0.84, 1.12]
## 
## - Estimated using pooled SD.
```
]
.col-3[
equal variances
]
]

.row[.col-9[

```r
effectsize::cohens_d(New, Standard, data=data, paired=F, pooled_sd=F, ci=0.95)
```

```
## Cohen's d |        95% CI
## -------------------------
## 0.14      | [-0.84, 1.12]
## 
## - Estimated using un-pooled SD.
```
]
.col-3[
unequal variances
]
]

.row[.col-9[

```r
effectsize::hedges_g(New, Standard, data=data, paired=F, pooled_sd=F, ci=0.95)
```

```
## Hedges' g |        95% CI
## -------------------------
## 0.13      | [-0.80, 1.06]
## 
## - Estimated using un-pooled SD.
## - Bias corrected using Hedges and Olkin's method.
```
]
.col-3[
small sample correction

unequal variances
]
]

---
## Two Sample t-Test
### for the Difference Between Means

.row[.col-7[
A two-sample t-test is used to test the difference between two population means `\(\mu_1\)` and `\(\mu_2\)` when the samples are independent and `\(\sigma_1\)` and `\(\sigma_2\)` are unknown.

* The population standard deviations are unknown.
* The samples are random.
* The samples are independent.
* The populations are normally distributed, or the sample statistic is at least asymptotically normally distributed, i.e. each sample size is at least 30.
]
.col-5[
The standardized test statistic `\(t\)` is

`$$t=\frac{(\bar{x}_1-\bar{x}_2) - (\mu_1-\mu_2)}{s_{\bar{x}_1-\bar{x}_2}}$$`
]]

---
## Estimate of population standard deviation
### and standard error

.row[.col-6[
**Variances are equal**:

`$$\hat{\sigma}= \sqrt{\frac{(n_1-1)s_1^2 +(n_2-1)s_2^2}{n_1+n_2-2}}$$`

The standard error for the sampling distribution of `\(\bar{x}_1-\bar{x}_2\)` is

`$$s_{\bar{x}_1-\bar{x}_2} = \hat{\sigma}\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$`

`\(d.f. = n_1+n_2-2\)`
]
.col-6[
**Variances are not equal**:

`$$s_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$$`

The (effective) pooled degrees of freedom `\(\nu\)` is approximated by

`$$\nu\approx\frac{\left(\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}\right)^2}{ \dfrac{s_1^4}{n_1^2(n_1-1)} + \dfrac{s_2^4}{n_2^2(n_2-1)} }$$`

This approximation is better when both `\(n_i > 5\)`.
]]

---
## R: t-Test for two independent samples

.row[.col-7[

```r
t.test(dep_delay~season, data=fli_small, var.equal=T)
```

```
## 
##  Two Sample t-test
## 
## data:  dep_delay by season
## t = 2.3969, df = 498, p-value = 0.0169
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   1.406958 14.200844
## sample estimates:
## mean in group summer mean in group winter 
##            15.389558             7.585657
```
]
.col-5[
#### Data in _long_ format

**Are the standard deviations in the two samples the same?**

<img src="08.inference.cont-2_files/figure-html/unnamed-chunk-18-1.png" width="95%" style="display: block; margin: auto;" />

It is not recommended to pre-test for equal variances and then choose between the two tests.
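The Welch degrees of freedom `\(\nu\)` from the previous slide can be reproduced by hand (a minimal sketch; it should match the `df` of the Welch test below):

```r
fli_small %>%
  group_by(season) %>%
  summarise(s2 = var(dep_delay), n = n()) %>%
  summarise(nu = sum(s2/n)^2 / sum((s2/n)^2 / (n - 1)))
```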
]]

.row[.col-7[

```r
t.test(dep_delay~season, data=fli_small)
```

```
## 
##  Welch Two Sample t-test
## 
## data:  dep_delay by season
## t = 2.3934, df = 438.36, p-value = 0.01711
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   1.395562 14.212240
## sample estimates:
## mean in group summer mean in group winter 
##            15.389558             7.585657
```
]
.col-5[
Rather, Welch's t-test for unequal variances can be applied directly, without any substantial disadvantage relative to Student's t-test for equal variances.
]]

---
## Nonparametric Tests
### R: Permutation test for equal group means

.row[.col-7[

```r
# Compute test statistic: difference in means
(d_hat <- fli_small %>%
   specify(dep_delay ~ season) %>%
   calculate(stat = "diff in means",
             order = c("summer", "winter")
   ))

# Create null distribution
null_distn <- fli_small %>%
  specify(dep_delay ~ season) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in means",
            order = c("summer", "winter")
  )

# Visualize null distribution
visualize(null_distn) +
  shade_p_value(obs_stat = d_hat,
                direction = "two_sided"
  )

# Compute p-value
null_distn %>%
  get_p_value(obs_stat = d_hat,
              direction = "two_sided"
  )
```
]
.col-5[
`\(H_0: \mu_1 =\mu_2\)`

`\(H_a: \mu_1 \neq \mu_2\)`

```
## # A tibble: 1 x 1
##    stat
##   <dbl>
## 1  7.80
```

<img src="08.inference.cont-2_files/figure-html/unnamed-chunk-20-1.png" width="90%" style="display: block; margin: auto;" />

```
## # A tibble: 1 x 1
##   p_value
##     <dbl>
## 1    0.01
```
]]

---
## Nonparametric Tests
### Wilcoxon rank sum / Wilcoxon-Mann-Whitney U test

.row[.col-7[
* All observations from both groups are independent of each other,
* The responses are at least ordinal (i.e., one can at least say, of any two observations, which is the greater),
* Under the null hypothesis `\(H_0\)`, the distributions of both populations are equal.
* The alternative hypothesis `\(H_a\)` is that the distributions are not equal.

The Wilcoxon-Mann-Whitney U test is often interpreted as testing for a difference in medians; strictly, this interpretation requires that both distributions have the same shape, so that they differ only by a location shift.
]
.col-5[
The test statistic `\(U\)` is

`\begin{align*} U &= \sum_{i=1}^n \sum_{j=1}^m S(X_i,Y_j)\\ \text{with}\\ S(X,Y) &= \begin{cases} 1, \text{if } Y < X\\ 1/2, \text{if } Y = X\\ 0, \text{if } Y > X\end{cases} \end{align*}`

`\(U\)` is approximately normally distributed.
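A minimal sketch computing `\(U\)` from midranks (assuming `fli_small`; `wilcox.test()` on the next slide reports this value as `W`):

```r
x <- fli_small$dep_delay[fli_small$season == "summer"]
y <- fli_small$dep_delay[fli_small$season == "winter"]
# rank sum of the first group minus its minimum possible rank sum
sum(rank(c(x, y))[seq_along(x)]) - length(x)*(length(x) + 1)/2
```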
]
]

---
## Nonparametric Tests
### R: Wilcoxon rank sum / Wilcoxon-Mann-Whitney U test

.row[.col-9[

```r
wilcox.test(dep_delay~season, data=fli_small)
```

```
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  dep_delay by season
## W = 33674, p-value = 0.1331
## alternative hypothesis: true location shift is not equal to 0
```

<br/>

```r
wilcox.test(dep_delay~season, data=fli_small, exact=T, conf.int=T)
```

```
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  dep_delay by season
## W = 33674, p-value = 0.1331
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -6.987990e-06  2.999975e+00
## sample estimates:
## difference in location 
##               1.000013
```
]
.col-3[
#### Data in _long_ format

Note: with ties in the data, R cannot compute an exact p-value and falls back to the normal approximation even when `exact=T`.
]
]

---
## Nonparametric Tests
### R: Wilcoxon rank sum / Wilcoxon-Mann-Whitney U test

.row[.col-7[
#### The effect size in R

```r
effectsize::rank_biserial(dep_delay~season, data=fli_small, iterations=1000)
```

```
## r (rank biserial) |        95% CI
## ---------------------------------
## 0.08              | [-0.03, 0.18]
```
]
.col-5[
Values of the **rank biserial correlation** range from -1, indicating that all values of the second sample are smaller than those of the first sample, to +1, indicating that all values of the second sample are larger than those of the first sample.
]
]

---
## Nonparametric Tests
### The Kruskal-Wallis Test

.row[.col-6[
A nonparametric test that can be used to determine whether two or more independent samples were selected from populations having the same distribution.

The parametric equivalent of the Kruskal-Wallis test is the one-way analysis of variance (**ANOVA**).

The null and alternative hypotheses for the Kruskal-Wallis test are always similar to these statements:

** `\(H_0\)` ** All of the populations have the same distribution.

** `\(H_a\)` ** At least one population has a distribution that is different from the others.
]
.col-6[
Two conditions for using the Kruskal-Wallis test are:

1. Samples must be random and independent.
2. The size of each sample must be at least 5.

If these conditions are met, then the sampling distribution for the Kruskal-Wallis test is approximated by a `\(\chi^2\)` distribution with `\(k - 1\)` degrees of freedom, where `\(k\)` is the number of samples.
]]

---
## Nonparametric Tests
### The Kruskal-Wallis Test

.row[.col-6[
The test statistic is given by

`$$H=(N-1)\frac{\sum_{i=1}^g n_i(\bar{r}_{i\cdot} -\bar{r})^2}{\sum_{i=1}^g\sum_{j=1}^{n_i} (r_{ij} -\bar{r})^2}$$`

`\(H\)` is approximately `\(\chi^2\)` distributed. If some `\(n_{i}\)` values are small (less than 5), the exact probability distribution of `\(H\)` can be quite different from the `\(\chi^2\)` distribution.
]
.col-6[
where

** `\(n_{i}\)` ** is the number of observations in group `\(i\)`

** `\(r_{ij}\)` ** is the rank (among all observations) of observation `\(j\)` from group `\(i\)`

** `\(N\)` ** is the total number of observations across all groups

** `\(\bar{r}_{i\cdot} = \frac{\sum_{j=1}^{n_i}{r_{ij}}}{n_i}\)` ** is the average rank of all observations in group `\(i\)`

** `\(\bar{r} = \frac{1}{2} (N+1)\)` ** is the average of all the `\(r_{ij}\)`.
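A minimal sketch computing `\(H\)` from this definition (it should match `kruskal.test()` on the next slide):

```r
r <- rank(fli_small$dep_delay)
N <- length(r)
# group size times squared deviation of the mean group rank from the overall mean rank
num <- tapply(r, fli_small$season, function(ri) length(ri)*(mean(ri) - mean(r))^2)
(N - 1) * sum(num) / sum((r - mean(r))^2)
```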
]]

---
.row[.col-7[
## Nonparametric Tests
### R: Kruskal-Wallis Test
]
.col-5[
<img src="08.inference.cont-2_files/figure-html/unnamed-chunk-24-1.png" width="95%" style="display: block; margin: auto;" />
]]

.row[.col-8[

```r
kruskal.test(dep_delay~season, data=fli_small)
```

```
## 
##  Kruskal-Wallis rank sum test
## 
## data:  dep_delay by season
## Kruskal-Wallis chi-squared = 2.2563, df = 1, p-value = 0.1331
```
]
.col-3[
#### Data in _long_ format
]
]

<br/>

.row[.col-8[

```r
effectsize::rank_epsilon_squared(dep_delay~season, data=fli_small, iterations = 1000)
```

```
## Epsilon2 (rank) |       95% CI
## ------------------------------
## 4.52e-03        | [0.00, 0.02]
```
]
.col-4[
The **rank `\(\epsilon^2\)`** is an effect size for non-parametric tests of differences between 2 or more samples. Values range from 0 to 1; multiplied by 100, the value indicates the percentage of variance in the dependent variable explained by the independent variable.
]
]

---
class: middle

# Comparing Variances
### Tests for Homoskedasticity (homogeneous variances) and
### Equality of Multiple Means

---
## Comparing Two Variances

.row[.col-7[
### The F distribution

Let `\(s_1^2\)` and `\(s_2^2\)` represent the sample variances of two different populations. If both populations are **normal** and the population **variances** `\(\sigma_1^2\)` and `\(\sigma_2^2\)` are **equal**, then the sampling distribution of

`$$F = \frac{s_1^2}{s_2^2}$$`

is called an F-distribution.
]
]

---
## The F distribution

.row[.col-5[
The F-distribution is a family of curves each of which is determined by two types of degrees of freedom:

- The degrees of freedom corresponding to the variance in the numerator, denoted `\(d.f._N\)`
- The degrees of freedom corresponding to the variance in the denominator, denoted `\(d.f._D\)`

The F-distribution is positively skewed and therefore not symmetric.
]
.col-7[
<img src="08.inference.cont-2_files/figure-html/unnamed-chunk-26-1.png" width="90%" style="display: block; margin: auto;" />

The total area under each F-distribution curve is equal to 1. All F-values are greater than or equal to 0. For all F-distributions, the mean value of F is approximately equal to 1.
]]

---
## Finding Critical Values for the F-Distribution

.row[.col-7[
1. Specify the level of significance `\(\alpha\)`.
2. Determine the degrees of freedom for the numerator, `\(d.f._N\)`.
3. Determine the degrees of freedom for the denominator, `\(d.f._D\)`.
4. If the hypothesis test is
   * one-tailed, use the `\(\alpha\)` F-value.
   * two-tailed, use the `\(\frac{1}{2}\alpha\)` F-value.
5. Note that because the larger sample variance is placed in the numerator, F is always greater than or equal to 1, and all one-tailed tests are right-tailed tests. For two-tailed tests, you need only to find the right-tailed critical value.
]]

---
## Finding Critical Values for the F-Distribution

.row[.col-4[
Find the critical F-value for a right-tailed test when `\(\alpha = 0.10\)`, `\(d.f._N = 5\)` and `\(d.f._D = 28\)`.
]
.col-8[

```r
qf(p=0.10, df1=5, df2=28, lower.tail = F)
```

```
## [1] 2.064473
```

<img src="08.inference.cont-2_files/figure-html/unnamed-chunk-28-1.png" width="100%" style="display: block; margin: auto;" />
]]

---
## Two-Sample F-Test for Variances

.row[.col-7[
A two-sample F-test is used to compare two population variances `\(\sigma_1^2\)` and `\(\sigma_2^2\)`. To perform this test, these conditions must be met:

1. The samples must be random.
2. The samples must be independent.
3. Each population must have a normal distribution.
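When these conditions hold, `\(s_1^2/s_2^2\)` follows the F-distribution; a quick simulation sketch of this claim, using the sample sizes of the example that follows:

```r
set.seed(1)
# variance ratio of two independent samples from the same normal population
Fs <- replicate(10000, var(rnorm(10)) / var(rnorm(21)))
mean(Fs > qf(0.95, df1 = 9, df2 = 20))  # should be close to 0.05
```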
]]

.row[.col-6[
The test statistic is

`$$F = \frac{s_1^2}{s_2^2}$$`

where `\(s_1^2\)` and `\(s_2^2\)` represent the sample variances with `\(s_1^2 \geq s_2^2\)`.
]
.col-6[
The numerator has `\(d.f._N = n_1 - 1\)` degrees of freedom and the denominator has `\(d.f._D = n_2 - 1\)` degrees of freedom, where `\(n_1\)` is the size of the sample having variance `\(s^2_1\)` and `\(n_2\)` is the size of the sample having variance `\(s^2_2\)`.
]]

---
## Performing a Two-Sample F-Test

.tip.row[.col-6[
A restaurant manager is designing a system that is intended to decrease the variance of the time customers wait before their meals are served. Under the old system, a random sample of 10 customers had a variance of 400. Under the new system, a random sample of 21 customers had a variance of 256.

At `\(\alpha = 0.10\)`, is there enough evidence to convince the manager to switch to the new system? Assume both populations are normally distributed.
]
.col-6[
Because `\(400 > 256\)`, `\(s_1^2 = 400\)` and `\(s_2^2 = 256\)`. Therefore, `\(s_1^2\)` and `\(\sigma_1^2\)` represent the sample and population variances for the old system, respectively.

With the claim "the variance of the waiting times under the new system is less than the variance of the waiting times under the old system," the null and alternative hypotheses are `\(H_0: \sigma_1^2 \leq \sigma_2^2\)` and `\(H_a: \sigma_1^2 > \sigma_2^2\)`. (Claim)

The degrees of freedom are `\(df_N = n_1-1 = 10-1=9\)` and `\(df_D = n_2-1 = 21-1=20\)`.
]]

---
## Performing a Two-Sample F-Test

.tip.row[.col-6[
The critical value is

```r
qf(0.1,9,20, lower.tail = F)
```

```
## [1] 1.964853
```

The F-statistic is `\(F=\frac{400}{256}\approx 1.56\)`.

Because the F-statistic is not in the rejection region, we cannot reject the null hypothesis.
]
.col-6[
<img src="08.inference.cont-2_files/figure-html/unnamed-chunk-30-1.png" width="100%" style="display: block; margin: auto;" />

The p-value is

```r
pf(400/256,9,20,lower.tail = F)
```

```
## [1] 0.1939035
```
]]

---
.row[.col-7[
## R: The Two-Sample F-Test
### for Homogeneity of Variances
]
.col-5[
<img src="08.inference.cont-2_files/figure-html/unnamed-chunk-32-1.png" width="95%" style="display: block; margin: auto;" />
]]

.row[.col-8[

```r
var.test(New, Standard, data=data)
```

```
## 
##  F test to compare two variances
## 
## data:  New and Standard
## F = 1.076, num df = 7, denom df = 7, p-value = 0.9255
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.2154207 5.3745561
## sample estimates:
## ratio of variances 
##           1.076007
```
]
]

---
.row[
.col-7[
## R: The Two-Sample F-Test
### for Homogeneity of Variances
]
.col-5[
<img src="08.inference.cont-2_files/figure-html/unnamed-chunk-34-1.png" width="95%" style="display: block; margin: auto;" />
]]

.row[.col-8[

```r
var.test(dep_delay~season, data=fli_small)
```

```
## 
##  F test to compare two variances
## 
## data:  dep_delay by season
## F = 2.1316, num df = 248, denom df = 250, p-value = 3.638e-09
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  1.661910 2.734481
## sample estimates:
## ratio of variances 
##            2.13164
```
]
.col-4[
#### The data is in _long_ format
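The reported statistic is simply the ratio of the two group variances (minimal sketch; groups in alphabetical order, i.e. summer over winter):

```r
fli_small %>%
  group_by(season) %>%
  summarise(v = var(dep_delay)) %>%
  summarise(F = v[1] / v[2])  # about 2.13, as reported by var.test()
```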
]]

---
## One-way analysis of variance

.row[.col-7[
...is a hypothesis-testing technique that is used to compare the means of three or more populations. Analysis of variance is usually abbreviated as ANOVA.

For a one-way ANOVA test, the null and alternative hypotheses are always similar to these statements:

`\(H_0: \mu_1 = \mu_2 = \mu_3 = \ldots = \mu_k\)` (all population means are equal)

`\(H_a:\)` At least one of the means is different from the others.
]
.col-5[
Before performing a one-way ANOVA F-test, you must check that these conditions are satisfied.

- Each sample must be randomly selected from a normal, or approximately normal, population.
- The samples must be independent of each other.
- Each population must have the same variance.
]]

---
## One-way analysis of variance: F-test

.row[.col-7[
The test statistic for a one-way ANOVA F-test is the ratio of two variances: the variance between samples and the variance within samples.

`$$\text{Test statistic} = \frac{\text{Variance between samples}}{\text{Variance within samples}}$$`

The variance between samples measures the differences related to the treatment given to each sample; it is sometimes called the mean square between.

The variance within samples measures the differences related to entries within the same sample and is usually due to sampling error; it is sometimes called the mean square within.
]]

---
## One-way analysis of variance: F-test

.row[.col-7[
If the required conditions are met, the sampling distribution for the test is approximated by the F-distribution. The test statistic is

`$$F = \frac{\text{MS}_B}{\text{MS}_W}$$`

with

`$$SS_B = \sum n_i ( \bar{x}_i - \bar{x} )^2$$`

`$$SS_W = \sum ( n_i - 1 )s_i^2$$`
]
.col-5[
`$$MS_B = \frac{SS_B}{d.f._N}$$`

`$$MS_W = \frac{SS_W}{d.f._D}$$`

The degrees of freedom are

`\(d.f._N = k - 1\)` for the numerator and

`\(d.f._D = N - k\)` for the denominator,

where `\(k\)` is the number of samples and `\(N\)` is the sum of the sample sizes.
]]
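These sums of squares can be computed by hand (a minimal sketch with dplyr, using `fli_small` as on the following slides; the F value should match `aov()` there):

```r
fli_small %>%
  group_by(origin) %>%
  summarise(n_i = n(), m_i = mean(dep_delay), s2_i = var(dep_delay)) %>%
  summarise(SS_B = sum(n_i * (m_i - weighted.mean(m_i, n_i))^2),  # between
            SS_W = sum((n_i - 1) * s2_i),                         # within
            F = (SS_B / (n() - 1)) / (SS_W / (sum(n_i) - n())))   # MS_B / MS_W
```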
---
## R: Comparing multiple groups

.row[.col-6[
<img src="08.inference.cont-2_files/figure-html/unnamed-chunk-36-1.png" width="85%" style="display: block; margin: auto;" />
]
.col-6[
<img src="08.inference.cont-2_files/figure-html/unnamed-chunk-37-1.png" width="85%" style="display: block; margin: auto;" />
]
]

.row[
**Variable type: numeric**

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> skim_variable </th>
   <th style="text-align:left;"> origin </th>
   <th style="text-align:right;"> n_missing </th>
   <th style="text-align:right;"> complete_rate </th>
   <th style="text-align:right;"> mean </th>
   <th style="text-align:right;"> sd </th>
   <th style="text-align:right;"> p0 </th>
   <th style="text-align:right;"> p25 </th>
   <th style="text-align:right;"> p50 </th>
   <th style="text-align:right;"> p75 </th>
   <th style="text-align:right;"> p100 </th>
  </tr>
 </thead>
 <tbody>
  <tr>
   <td style="text-align:left;"> dep_delay </td>
   <td style="text-align:left;"> EWR </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 16.68 </td>
   <td style="text-align:right;"> 39.95 </td>
   <td style="text-align:right;"> -14 </td>
   <td style="text-align:right;"> -3 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 20.0 </td>
   <td style="text-align:right;"> 222 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> dep_delay </td>
   <td style="text-align:left;"> JFK </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 11.03 </td>
   <td style="text-align:right;"> 35.38 </td>
   <td style="text-align:right;"> -12 </td>
   <td style="text-align:right;"> -4 </td>
   <td style="text-align:right;"> -2 </td>
   <td style="text-align:right;"> 7.5 </td>
   <td style="text-align:right;"> 245 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> dep_delay </td>
   <td style="text-align:left;"> LGA </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 5.64 </td>
   <td style="text-align:right;"> 32.65 </td>
   <td style="text-align:right;"> -15 </td>
   <td style="text-align:right;"> -7 </td>
   <td style="text-align:right;"> -4 </td>
   <td style="text-align:right;"> 2.0 </td>
   <td style="text-align:right;"> 264 </td>
  </tr>
 </tbody>
</table>
]

---
## R: Comparing multiple groups - Oneway ANOVA

.row[.col-8[

```r
aov(dep_delay~origin, data=fli_small) %>% summary()
```

```
##              Df Sum Sq Mean Sq F value Pr(>F)  
## origin        2  10254    5127   3.877 0.0213 *
## Residuals   497 657242    1322                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
oneway.test(dep_delay~origin, data=fli_small, var.equal=T)
```

```
## 
##  One-way analysis of means
## 
## data:  dep_delay and origin
## F = 3.8771, num df = 2, denom df = 497, p-value = 0.02134
```
]
.col-4[
#### Data is in _long_ format
]
]

<br/>

.row[.col-8[

```r
oneway.test(dep_delay~origin, data=fli_small, var.equal=F)
```

```
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  dep_delay and origin
## F = 3.909, num df = 2.00, denom df = 330.81, p-value = 0.02099
```
]
.col-4[
**Allowing for unequal variances**, similar to Welch's t-test for means of groups with unequal variances
]
]

---
## R: Comparing multiple groups - Oneway ANOVA

.row[.col-6[
#### Effect Size

```r
aov(dep_delay~origin, data=fli_small) %>%
  effectsize::eta_squared(ci=0.90)
```

```
## Parameter | Eta2 |       90% CI
## -------------------------------
## origin    | 0.02 | [0.00, 0.04]
```

<br/>

```r
aov(dep_delay~origin, data=fli_small) %>%
  effectsize::omega_squared(ci=0.95)
```

```
## Parameter | Omega2 |       95% CI
## ---------------------------------
## origin    |   0.01 | [0.00, 0.03]
```

<br/>

```r
aov(dep_delay~origin, data=fli_small) %>%
  effectsize::epsilon_squared(ci=0.95)
```

```
## Parameter | Epsilon2 |       95% CI
## -----------------------------------
## origin    |     0.01 | [0.00, 0.03]
```
]
.col-6[
These indices represent an estimate of how much variance in the response variable is accounted for by the explanatory variable(s).

`$$\eta^2 = \frac{SS_{\text{between}}}{SS_{\text{between}}+SS_{\text{within}}}$$`

`$$\eta^2_{\text{partial}} = \frac{SS_{\text{effect}}}{SS_{\text{effect}}+SS_{\text{residual}}}$$`

When you only have one independent variable, the partial `\(\eta^2\)` is the same as `\(\eta^2\)`.

Both `\(\omega^2\)` and `\(\epsilon^2\)` are less biased estimators of the population `\(\eta^2\)`, which is especially important in small samples. Omega seems to be the more popular choice among practitioners.
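A minimal sketch reproducing the `\(\eta^2\)` point estimate from the ANOVA sums of squares:

```r
ss <- anova(aov(dep_delay ~ origin, data = fli_small))$`Sum Sq`
ss[1] / sum(ss)  # SS_between / SS_total, about 0.02 as above
```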
]
]

---
## Oneway Anova: Are the assumptions valid?

.row[.col-7[
1. Each population must have a normal distribution.
   * Testing all samples simultaneously: The difference between each observation and the respective group mean must have a normal distribution.
]]

.row[.col-6[
#### Quantile-Quantile Plot

Get the difference between each observation and the respective group mean (**residuals**)

```r
fli_resid <- aov(dep_delay~origin, data=fli_small) %>%
  resid()
```

Plot the Quantile-Quantile Plot with the normal as a reference distribution

```r
ggplot(tibble(fli_resid), aes(sample=fli_resid)) +
  stat_qq() +
  stat_qq_line()
```
]
.col-6[
<img src="08.inference.cont-2_files/figure-html/qqplot1-1.png" width="90%" style="display: block; margin: auto;" />
]
]

---
## Oneway Anova: Are the assumptions valid?

.row[.col-7[
Each population must have a normal distribution.

* Testing all samples simultaneously: The difference between each observation and the respective group mean must have a normal distribution.
]]

.row[.col-7[
#### Shapiro-Wilk test for normality

Get the difference between each observation and the respective group mean (**residuals**) and run the Shapiro-Wilk normality test

```r
aov(dep_delay~origin, data=fli_small) %>%
  resid() %>%
  shapiro.test()
```

```
## 
##  Shapiro-Wilk normality test
## 
## data:  .
## W = 0.56964, p-value < 2.2e-16
```
]
.col-5[
Note that normality tests are sensitive to sample size: small samples most often pass them, while in large samples even small deviations from normality lead to a rejection of the null hypothesis. It is therefore important to combine visual inspection and significance testing in order to make the right decision.
]]

---
## Oneway Anova: Are the assumptions valid?

.row[.col-7[
#### Normality check for each group

```r
fli_small %>%
  group_by(origin) %>%
  do(tidy(shapiro.test(.$dep_delay)))
```

```
## # A tibble: 3 x 4
## # Groups:   origin [3]
##   origin statistic  p.value method                     
##   <chr>      <dbl>    <dbl> <chr>                      
## 1 EWR        0.618 4.09e-20 Shapiro-Wilk normality test
## 2 JFK        0.535 1.27e-20 Shapiro-Wilk normality test
## 3 LGA        0.462 2.11e-21 Shapiro-Wilk normality test
```
]
.col-5[
Compute the Shapiro-Wilk test for each group level. If the data is normally distributed, the p-value should be greater than your chosen significance level `\(\alpha\)`.
]
]

---
## Oneway Anova: Are the assumptions valid?
### More robust tests for homogeneity of variance

.row[.col-7[
In the standard ANOVA F-test, each population must have the same variance. The **F-test** is known to be extremely **sensitive** to **non-normality**, so an alternative, more robust test may be advisable.
]]

.row[.col-6[
**Levene's test** is equivalent to a one-way analysis of variance (ANOVA) with the dependent variable being the absolute value of the difference between a score and the mean of the group to which the score belongs.
]
.col-6[
The **Fligner-Killeen test** is a non-parametric alternative and a better option when data are non-normally distributed or when problems related to outliers in the dataset cannot be resolved.

The test starts out like a median-centering version of Levene's test by calculating the absolute values of the residuals from the group medians. Next, all these residuals are ranked and then normalized.
]]

---
## Levene's test for homogeneity of variance

.row[.col-6[
The test statistic `\(W\)` is approximately F-distributed with `\(k-1\)` and `\(N-k\)` degrees of freedom:

`$$W={\dfrac {(N-k)}{(k-1)}}\cdot {\dfrac {\sum _{i=1}^{k}N_{i}(Z_{i\cdot }-Z_{\cdot \cdot })^{2}}{\sum _{i=1}^{k}\sum _{j=1}^{N_{i}}(Z_{ij}-Z_{i\cdot })^{2}}}$$`

with

** `\(k\)` ** is the number of different groups to which the sampled cases belong,

** `\(N_{i}\)` ** is the number of cases in the `\(i\)`th group,

** `\(N\)` ** is the total number of cases in all groups,

** `\(Y_{{ij}}\)` ** is the value of the measured variable for the `\(j\)`th case from the `\(i\)`th group,
]
.col-6[
** `\(Z_{ij}=|Y_{ij}-{\bar {Y}}_{i\cdot }|\)`, where `\({\bar {Y}}_{i\cdot }\)` ** is the mean of the `\(i\)`th group,

** `\(Z_{{i\cdot }}={\frac {1}{N_{i}}}\sum _{{j=1}}^{{N_{i}}}Z_{{ij}}\)` ** is the mean of the `\(Z_{ij}\)` for group `\(i\)`,

** `\(Z_{\cdot \cdot }={\frac {1}{N}}\sum _{i=1}^{k}\sum _{j=1}^{N_{i}}Z_{ij}\)` ** is the mean of all `\(Z_{ij}\)`.
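Since Levene's test is just an ANOVA on the `\(Z_{ij}\)`, a minimal sketch is (it should reproduce `car::leveneTest(center = "mean")` on the next slide):

```r
Z_dat <- fli_small %>%
  group_by(origin) %>%
  mutate(Z = abs(dep_delay - mean(dep_delay))) %>%  # Z_ij
  ungroup()
summary(aov(Z ~ origin, data = Z_dat))
```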
]]

---
.row[
.col-7[
## R: Levene's test for homogeneity of variance
]
.col-5[
<img src="08.inference.cont-2_files/figure-html/unnamed-chunk-48-1.png" width="95%" style="display: block; margin: auto;" />
]]

.row[.col-9[

```r
car::leveneTest(dep_delay ~ origin, data=fli_small, center="mean")
```

```
## Levene's Test for Homogeneity of Variance (center = "mean")
##        Df F value  Pr(>F)  
## group   2  4.1483 0.01634 *
##       497                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
]
.col-3[
#### The original Levene's test centered around the mean
]]

<br/>

.row[.col-9[

```r
car::leveneTest(dep_delay ~ origin, data=fli_small, center="median")
```

```
## Levene's Test for Homogeneity of Variance (center = "median")
##        Df F value Pr(>F)
## group   2   2.091 0.1246
##       497
```
]
.col-3[
#### A more robust version centered around the median
]]

---
## The Fligner-Killeen test
### for homogeneity of variance

.row[.col-6[
...is a non-parametric alternative and a better option when data are non-normally distributed or when problems related to outliers in the dataset cannot be resolved.

The test starts out like a median-centering version of Levene's test by calculating the absolute values of the residuals from the group medians. Next, all these residuals are ranked and then normalized.
]
.col-6[
The Fligner-Killeen statistic is

`$$FK=\dfrac{\sum_{j=1}^k n_j (\bar{a}_j -\bar{a})^2}{s^2}$$`

where `\(k\)` is the number of groups, `\(n_j\)` the size of the `\(j\)`th group, `\(\bar{a}_j\)` is the mean of the normalization values for the `\(j\)`th group, `\(\bar{a}\)` is the mean of all the normalization values, and `\(s^2\)` is the variance of all the normalization values.

The FK statistic is approximately `\(\chi^2\)`-distributed with `\(k-1\)` degrees of freedom.
]]

---
## R: Fligner-Killeen test
### for homogeneity of variance

.row[.col-8[

```r
fligner.test(dep_delay ~ origin, data=fli_small)
```

```
## 
##  Fligner-Killeen test of homogeneity of variances
## 
## data:  dep_delay by origin
## Fligner-Killeen:med chi-squared = 12.244, df = 2, p-value = 0.002194
```
]
.col-4[
#### Data is in _long_ format
]]

<br/>

#### How big are the differences in variance?

.row[.col-6[

```r
fli_small %>%
  group_by(origin) %>%
  summarize(var = var(dep_delay))
```

```
## # A tibble: 3 x 2
##   origin   var
##   <chr>  <dbl>
## 1 EWR    1596.
## 2 JFK    1251.
## 3 LGA    1066.
```
]
.col-6[
<img src="08.inference.cont-2_files/figure-html/unnamed-chunk-53-1.png" width="90%" style="display: block; margin: auto;" />
]
]

---
## R: Kruskal-Wallis test for more than two groups

.row[.col-8[

```r
kruskal.test(dep_delay ~ origin, data=fli_small)
```

```
## 
##  Kruskal-Wallis rank sum test
## 
## data:  dep_delay by origin
## Kruskal-Wallis chi-squared = 26.448, df = 2, p-value = 1.807e-06
```
]]

<br/>

.row[.col-8[

```r
effectsize::rank_epsilon_squared(dep_delay~origin, data=fli_small, iterations = 1000)
```

```
## Epsilon2 (rank) |       95% CI
## ------------------------------
## 0.05            | [0.02, 0.10]
```
]
.col-4[
The **rank `\(\epsilon^2\)`** is an effect size for non-parametric tests of differences between 2 or more samples (a rank-based ANOVA). Values range from 0 to 1, with larger values indicating larger differences between groups.
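The point estimate can be recovered from the Kruskal-Wallis statistic, since `\(\epsilon^2_R = H/(N-1)\)` (minimal sketch):

```r
H <- kruskal.test(dep_delay ~ origin, data = fli_small)$statistic
unname(H) / (nrow(fli_small) - 1)  # about 0.05
```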
]]

---
## R: Permutation test for multiple means

.row[.col-7[

```r
(F_stat <- fli_small %>%
   specify(dep_delay~origin) %>%
   calculate(stat = "F")
)

# Generate the null distribution
null_distn <- fli_small %>%
  specify(dep_delay~origin) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 5000, type = "permute") %>%
  calculate(stat = "F")

# Visualize
visualize(null_distn, method="both") +
  shade_p_value(
    obs_stat = F_stat,
    direction = "right"
  )

# Get the p-value
null_distn %>%
  get_p_value(
    obs_stat = F_stat,
    direction = "right"
  )
```
]
.col-5[

```
## # A tibble: 1 x 1
##    stat
##   <dbl>
## 1  3.88
```

<img src="08.inference.cont-2_files/figure-html/anova-test-1.png" width="95%" style="display: block; margin: auto;" />

```
## # A tibble: 1 x 1
##   p_value
##     <dbl>
## 1   0.019
```
]]