class: middle, title-slide # Inference for Categorical Data ## Estimation, Effect Size, and Margin of Error ### Dennis A. V. Dittrich ### 2021 --- layout: true <div class="my-footer"> <span><img src="img/tcb-logo.png" height="40px"></span> </div> --- ## Estimation: The confidence interval .row[.col-7[ A **confidence interval**, or interval estimate, provides a range of values that, with a certain level of confidence, contains the population parameter of interest. A **confidence interval** is associated with a **margin of error** that accounts for the **standard error** of the estimator and the desired confidence level of the interval. If we repeated an experiment and generated numerous samples, the fraction of calculated **confidence intervals** (which would differ for each sample) that encompass the true population parameter would tend toward the confidence level (often 95%). ] .col-5[ The **confidence interval** represents values for the population parameter for which the difference between the parameter and the observed estimate is not statistically significant at the `\(\alpha\)` level. ]] --- ### Estimate and Confidence interval for a proportion .row[.col-6[ **Frequencies** are counts of the number of data points in an interval or category. A **proportion** `\(P\)` is simply a chosen frequency X divided by N, the total number of cases. ] .col-6[ Frequencies are needed for **nominal** scaling, and can be used with **ordinal** or **interval** scaling. ]] .row[.col-6[ To calculate a confidence interval on `\(P = X/N\)`, assume the `\(N\)` events are independent and each has the same probability of being a hit. 
] .col-6[ #### Normal approximation `\begin{align*} SE = \sqrt{pq/n} \end{align*}` the `\(100(1 -\alpha)\%\)` confidence interval for the proportion in the population is calculated as `\begin{align*} p-[z_{1-\alpha/2}\times SE(p)]; p+[z_{1-\alpha/2}\times SE(p) ] \end{align*}` The number of successes and failures must be at least 5 each to yield a reasonably good approximation. ] ] --- ## R: Confidence interval for a proportion .row[.col-7[ For `\(n=10\)` and `\(X = 3\)`, we have `\(p=3/10 = 0.3\)` and `\(q= (10-3)/10 = 0.7\)`. ]] .row[.col-7[ The normal approximation (also called the **Wald** interval) for the confidence interval for the proportion is: ] .col-5[ `\begin{align*} SE &= \sqrt{\frac{0.3\times 0.7}{10}}\\ \end{align*}` ]] .row[.col-7[ ```r 3/10 - qnorm(0.975) * sqrt((3/10 * 7/10 )/ 10) ; ``` ``` ## [1] 0.01597423 ``` ```r 3/10 + qnorm(0.975) * sqrt((3/10 * 7/10 )/ 10) ``` ``` ## [1] 0.5840258 ``` ] .col-5[ The _exact_ binomial confidence interval for the **counts** is ```r qbinom(0.025, 10, 0.3); qbinom(0.975, 10, 0.3) ``` ``` ## [1] 0 ``` ``` ## [1] 6 ``` ]] --- ## R: Many different approximations ### to the confidence interval for one proportion .row[.col-8[ ```r DescTools::BinomCI(3, 10, method = c("wilson", "wald", "waldcc", "agresti-coull", "jeffreys", "modified wilson", "wilsoncc", "modified jeffreys", "clopper-pearson", "arcsine", "logit", "witting", "pratt", "midp", "lik", "blaker" )) ``` ``` ## est lwr.ci upr.ci ## wilson 0.3000000 0.10779127 0.6032219 ## wald 0.3000000 0.01597423 0.5840258 ## waldcc 0.3000000 0.00000000 0.6340258 ## agresti-coull 0.3555066 0.10333842 0.6076747 ## jeffreys 0.3000000 0.09269459 0.6058183 ## modified wilson 0.3000000 0.10779127 0.6032219 ## wilsoncc 0.3000000 0.08094782 0.6463293 ## modified jeffreys 0.3000000 0.09269459 0.6058183 ## clopper-pearson 0.3000000 0.06673951 0.6524529 ## arcsine 0.3139535 0.07897893 0.6181383 ## logit 0.3000000 0.09976832 0.6236819 ## witting 0.3000000 0.09725005 0.5481635 ## pratt 
0.3000000 0.07743751 0.6528046 ## midp 0.3000000 0.08262612 0.6199217 ## lik 0.3000000 0.08458545 0.6065389 ## blaker 0.3000000 0.08725951 0.6194129 ``` ] .col-4[ #### Which interval should we use? The Wald interval often has inadequate coverage, particularly for small n and values of p close to 0 or 1. Brown et al. (2001) recommend the **Wilson** or Jeffreys interval for small n, and the Agresti-Coull, **Wilson**, or Jeffreys interval for larger n, as these provide more reliable coverage than the alternatives. ] ] --- ## R: One categorical variable: one proportion .row[.col-6[ ```r fli_small %>% tabyl(day_hour) %>% adorn_totals(c("row")) %>% kbl() ``` <table> <thead> <tr> <th style="text-align:left;"> day_hour </th> <th style="text-align:right;"> n </th> <th style="text-align:right;"> percent </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> morning </td> <td style="text-align:right;"> 214 </td> <td style="text-align:right;"> 0.428 </td> </tr> <tr> <td style="text-align:left;"> not morning </td> <td style="text-align:right;"> 286 </td> <td style="text-align:right;"> 0.572 </td> </tr> <tr> <td style="text-align:left;"> Total </td> <td style="text-align:right;"> 500 </td> <td style="text-align:right;"> 1.000 </td> </tr> </tbody> </table> ] .col-6[ ```r fli_small %>% ggplot() + geom_bar(aes(x=day_hour)) ``` <img src="08.inference.cat-1_files/figure-html/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" /> ]] --- ### R: Obtain the Confidence Interval with Bootstrapping: .row[.col-6[ Generating numerous samples from the existing one: ```r boot <- fli_small %>% specify(response = day_hour, success = "morning") %>% generate(reps = 1000, type = "bootstrap") %>% calculate(stat = "prop") ( percentile_ci <- get_ci(boot) ) ``` ``` ## # A tibble: 1 x 2 ## lower_ci upper_ci ## <dbl> <dbl> ## 1 0.386 0.47 ``` ] .col-6[ ```r visualize(boot) + shade_confidence_interval( endpoints = percentile_ci) ``` <img
src="08.inference.cat-1_files/figure-html/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" /> ]] --- ## R: Obtain the Confidence Interval ### with direct Computation: .row[.col-6[ Since we know the data generating process is a binomial experiment we can compute theoretical confidence intervals. ] .col-6[ The confidence interval represents values for the population parameter for which the difference between the parameter and the observed estimate is not statistically significant at the `\(\alpha\)` level. ]] .row[.col-6[ #### lower bound for the number of successes ```r qbinom(p=0.025,size=500,prob=214/500) ``` ``` ## [1] 192 ``` #### upper bound for the number of successes ```r qbinom(0.975,500,214/500) ``` ``` ## [1] 236 ``` ] .col-6[ #### lower bound for the prop. of success ```r qbinom(0.025,500,214/500)/500 ``` ``` ## [1] 0.384 ``` #### upper bound for the prop. of success ```r qbinom(0.975,500,214/500)/500 ``` ``` ## [1] 0.472 ``` ```r DescTools::BinomCI(214, 500, method="wilson") ``` ``` ## est lwr.ci upr.ci ## [1,] 0.428 0.3853418 0.4717561 ``` ]] --- ## Two categorical variables with 2 levels .row[.col-7[ Using proportions can often be a good strategy for examining the relation between two classification variables. 
.pink[The contingency table] ```r fli_small %>% tabyl(day_hour,season) %>% adorn_totals(c("row", "col")) %>% adorn_percentages("all") %>% adorn_pct_formatting(rounding = "half up", digits = 0) %>% adorn_title("combined") %>% kbl() ``` <table> <thead> <tr> <th style="text-align:left;"> day_hour/season </th> <th style="text-align:left;"> summer </th> <th style="text-align:left;"> winter </th> <th style="text-align:left;"> Total </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> morning </td> <td style="text-align:left;"> 22% </td> <td style="text-align:left;"> 21% </td> <td style="text-align:left;"> 43% </td> </tr> <tr> <td style="text-align:left;"> not morning </td> <td style="text-align:left;"> 28% </td> <td style="text-align:left;"> 29% </td> <td style="text-align:left;"> 57% </td> </tr> <tr> <td style="text-align:left;"> Total </td> <td style="text-align:left;"> 49% </td> <td style="text-align:left;"> 51% </td> <td style="text-align:left;"> 100% </td> </tr> </tbody> </table> ] .col-5[ <img src="08.inference.cat-1_files/figure-html/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" /> <br/> <img src="08.inference.cat-1_files/figure-html/unnamed-chunk-15-1.png" width="100%" style="display: block; margin: auto;" /> ]] --- ### Difference between two independent proportions .row[.col-7[ The **difference** between two population proportions is estimated by `\begin{align*} D= p_1 - p_2 \end{align*}` The **standard error** of `\(p_1 - p_2\)` is `\begin{align*} SE(D)= \sqrt{ \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} } \end{align*}` ]] .row[.col-7[ The normal approximation for the **confidence interval** for the population difference in proportions is then given by `\begin{align*} [D-z_{1-\alpha/2}\times SE(D) ; D+z_{1-\alpha/2}\times SE(D)] \end{align*}` ] .col-5[This normal approximation may yield reasonably good results if each sample contains at least 30 observations. 
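
As a sketch, the estimate, standard error, and normal-approximation interval above can be wrapped in a small helper (`prop_diff_ci` is our own name, not from any package):

```r
# Wald CI for the difference of two independent proportions
# (a sketch; prop_diff_ci is a made-up helper name)
prop_diff_ci <- function(x1, n1, x2, n2, conf = 0.95) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  D  <- p1 - p2
  SE <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z  <- qnorm(1 - (1 - conf) / 2)
  c(estimate = D, lower = D - z * SE, upper = D + z * SE)
}
```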
] ] --- ### R: Difference between two independent proportions .row[.col-7[ What is the difference in the proportion of flights in the morning between the seasons? ```r (D <- 108/247 -106/253 ) ``` ``` ## [1] 0.01827463 ``` ```r (SE <- sqrt( (108/247)*(139/247)/247 + (106/253)*(147/253)/253 )) ``` ``` ## [1] 0.04425375 ``` ] .col-5[ <table> <thead> <tr> <th style="text-align:left;"> day_hour/season </th> <th style="text-align:right;"> summer </th> <th style="text-align:right;"> winter </th> <th style="text-align:right;"> Total </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> morning </td> <td style="text-align:right;"> 108 </td> <td style="text-align:right;"> 106 </td> <td style="text-align:right;"> 214 </td> </tr> <tr> <td style="text-align:left;"> not morning </td> <td style="text-align:right;"> 139 </td> <td style="text-align:right;"> 147 </td> <td style="text-align:right;"> 286 </td> </tr> <tr> <td style="text-align:left;"> Total </td> <td style="text-align:right;"> 247 </td> <td style="text-align:right;"> 253 </td> <td style="text-align:right;"> 500 </td> </tr> </tbody> </table> ]] .row[.col-7[ ```r cbind(D - qnorm(.975)* SE, D + qnorm(.975)* SE) ``` ``` ## [,1] [,2] ## [1,] -0.06846113 0.1050104 ``` ] ] --- ### R: Difference between two independent proportions ### ...the many different CI approximations .row[.col-7[ What is the difference in the proportion of flights in the morning between the seasons? 
```r DescTools::BinomDiffCI(108, 247, 106, 253, method = c( "ac", "wald", "waldcc", "score", "scorecc", "mn", "mee", "blj", "ha", "hal", "jp" )) ``` ``` ## est lwr.ci upr.ci ## ac 0.01827463 -0.06826217 0.1045485 ## wald 0.01827463 -0.06846113 0.1050104 ## waldcc 0.01827463 -0.07246170 0.1090110 ## score 0.01827463 -0.06799395 0.1042148 ## scorecc 0.01827463 -0.07078487 0.1069957 ## mn 0.01827463 -0.06835116 0.1046839 ## mee 0.01827463 -0.06826416 0.1045954 ## blj 0.01827463 -0.06853445 0.1049518 ## ha 0.01827463 -0.07065947 0.1072087 ## hal 0.01827463 -0.06825608 0.1045530 ## jp 0.01827463 -0.06826337 0.1045601 ``` ] .col-5[ <table> <thead> <tr> <th style="text-align:left;"> day_hour/season </th> <th style="text-align:right;"> summer </th> <th style="text-align:right;"> winter </th> <th style="text-align:right;"> Total </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> morning </td> <td style="text-align:right;"> 108 </td> <td style="text-align:right;"> 106 </td> <td style="text-align:right;"> 214 </td> </tr> <tr> <td style="text-align:left;"> not morning </td> <td style="text-align:right;"> 139 </td> <td style="text-align:right;"> 147 </td> <td style="text-align:right;"> 286 </td> </tr> <tr> <td style="text-align:left;"> Total </td> <td style="text-align:right;"> 247 </td> <td style="text-align:right;"> 253 </td> <td style="text-align:right;"> 500 </td> </tr> </tbody> </table> <br/> The general consensus is that the most widely taught method, method="wald", is inappropriate in many situations and should not be used.
Recommendations seem to converge around the Miettinen-Nurminen based methods (method="mn") ] ] --- ## R: Difference in proportions - Bootstrap CI .row[.col-6[ ```r fli_small %>% specify(day_hour ~ season, success = "morning") %>% calculate(stat = "diff in props", order = c("summer", "winter")) boot <- fli_small %>% specify(day_hour ~ season, success = "morning") %>% generate(reps = 1000, type = "bootstrap") %>% calculate(stat = "diff in props", order = c("summer", "winter")) ( percentile_ci <- get_ci(boot) ) visualize(boot) + shade_confidence_interval( endpoints = percentile_ci) ``` ] .col-6[ ``` ## # A tibble: 1 x 1 ## stat ## <dbl> ## 1 0.0183 ``` ``` ## # A tibble: 1 x 2 ## lower_ci upper_ci ## <dbl> <dbl> ## 1 -0.0654 0.101 ``` <img src="08.inference.cat-1_files/figure-html/unnamed-chunk-19-1.png" width="100%" style="display: block; margin: auto;" /> ]] --- ## `\(\chi^2\)`: an Alternative Approach ### to Analyse Frequency Tables .row[.col-4[ #### Contingency table ``` ## season ## day_hour summer winter ## morning 108 106 ## not morning 139 147 ``` ] .col-4[ #### Expected frequencies ``` ## season ## day_hour summer winter ## morning 105.716 108.284 ## not morning 141.284 144.716 ``` ] .col-4[ #### Diffs to expected frequencies ``` ## season ## day_hour summer winter ## morning 2.284 -2.284 ## not morning -2.284 2.284 ``` ]] .row[.col-8[ .row[.col-6[ #### `\(X^2\)` ``` ## [1] 0.1704924 ``` ] .col-6[ `\begin{align*} X^2 = \sum \frac{(n_{ij}-\mu_{ij})^2}{\mu_{ij}} \end{align*}` ]] .row[ For random sampling and for large sample sizes `\(X^2\)` has approximately a `\(\chi^2\)` distribution. It takes its minimum value of zero when all `\(n_{ij} = \mu_{ij}\)`. For a fixed sample size, greater differences `\(n_{ij} - \mu_{ij}\)` produce larger `\(X^2\)` values. For the 2x2 contingency table, `\(X^2\)` is approximately `\(\chi^2(1)\)` distributed. 
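
The statistic shown above can be checked directly in base R (a sketch: the table is rebuilt from the observed counts, and `correct = FALSE` switches off Yates' continuity correction so that the plain Pearson `\(X^2\)` is returned):

```r
# Pearson X^2 for the 2x2 table, without continuity correction
tab <- matrix(c(108, 139, 106, 147), nrow = 2,
              dimnames = list(day_hour = c("morning", "not morning"),
                              season   = c("summer", "winter")))
chisq.test(tab, correct = FALSE)$statistic
```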
The `\(\chi^2\)` approximation improves as the `\(\mu_{ij}\)` increase; `\(\mu_{ij} \geq 5\)` is usually sufficient for a decent approximation. ]] .col-4[ <br/> **`\(n_{ij}\)`** Count in cell `\(ij\)` **`\(\mu_{ij}\)`** Expected frequency in cell `\(ij\)` (`\(\mu_{ij} = n\pi_{ij}\)`) **`\(\pi_{ij}\)`** Expected probability under independence, computed from the marginal probabilities: `\(\pi_{ij} = \pi_{i}\times \pi_{j}\)` ]] --- ## R: `\(\chi^2\)`: in 2x2 Contingency Tables .row[.col-6[ ```r fli_small %>% specify(day_hour ~ season, success = "morning") %>% calculate(stat = "Chisq", order = c("summer", "winter")) boot <- fli_small %>% specify(day_hour ~ season, success = "morning") %>% generate(reps = 1000, type = "bootstrap") %>% calculate(stat = "Chisq", order = c("summer", "winter")) ( percentile_ci <- get_ci(boot) ) visualize(boot) + shade_confidence_interval( endpoints = percentile_ci) ``` ] .col-6[ ``` ## # A tibble: 1 x 1 ## stat ## <dbl> ## 1 0.104 ``` ``` ## # A tibble: 1 x 2 ## lower_ci upper_ci ## <dbl> <dbl> ## 1 0 5.09 ``` <img src="08.inference.cat-1_files/figure-html/unnamed-chunk-24-1.png" width="100%" style="display: block; margin: auto;" /> ]] --- ## Effect size in a 2x2 contingency table: `\(\phi\)` .row[.col-6[ #### The `\(\phi\)` coefficient `\begin{align*} \phi = \sqrt{\frac{\chi^2}{N}} \end{align*}` The phi coefficient, `\(\phi\)`, is a measure of strength of association in a 2x2 table. It’s a type of correlation.
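
In fact, for a 2x2 table `\(\phi\)` equals, up to sign, the Pearson correlation between the two 0/1-coded variables. A sketch check, expanding the cell counts into indicator vectors (the variable names are ours):

```r
# phi as the correlation of two dummy-coded variables
# cell order: morning/summer, morning/winter, not-morning/summer, not-morning/winter
n_cells <- c(108, 106, 139, 147)
morning <- rep(c(1, 1, 0, 0), times = n_cells)
summer  <- rep(c(1, 0, 1, 0), times = n_cells)
cor(morning, summer)
```

The absolute value of this correlation agrees with `\(\sqrt{X^2/N}\)`.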
<br/> ```r sqrt(X2/500) ``` ``` ## [1] 0.01846577 ``` <br/> ```r effectsize::chisq_to_phi(X2, 500, 2, 2) ``` ``` ## Phi | 95% CI ## ------------------- ## 0.02 | [0.00, 0.10] ``` ] .col-6[ ```r effectsize::phi(ct) ``` ``` ## Phi | 95% CI ## ------------------- ## 0.02 | [0.00, 0.10] ``` ```r boot <- boot %>% mutate(stat = sqrt(stat/500)) visualize(boot) + shade_confidence_interval( endpoints = sqrt(percentile_ci/500)) ``` <img src="08.inference.cat-1_files/figure-html/unnamed-chunk-28-1.png" width="95%" style="display: block; margin: auto;" /> ] ] --- ## Effect size in a 2x2 contingency table: Cramer's V .row[.col-6[ #### Cramer's V `\begin{align*} V &= \sqrt{\frac{\phi^2}{\min(c-1,r-1)}}\\ &= \sqrt{\frac{\chi^2/N}{\min(c-1,r-1)}} \end{align*}` with `\(c\)` and `\(r\)` the number of columns and rows, respectively. Cramer's V, is a measure of strength of association. In a 2x2 table it is the same as `\(\phi\)`, and is equal to the simple correlation between two dichotomous variables, ranging between 0 (no dependence) and 1 (perfect dependence). ] .col-6[ ```r sqrt((X2/500)/(min(2-1,2-1))) ``` ``` ## [1] 0.01846577 ``` ```r effectsize::cramers_v(ct) ``` ``` ## Cramer's V | 95% CI ## ------------------------- ## 0.02 | [0.00, 0.10] ``` ```r effectsize::phi(ct) ``` ``` ## Phi | 95% CI ## ------------------- ## 0.02 | [0.00, 0.10] ``` ] ] --- ## Proportions or `\(\chi^2\)`? .row[.col-7[ A proportions analysis has several advantages **Focus on effect size and estimation.** The starting point of a proportions analysis is two proportions and the difference between them. A figure showing these, with their CIs is informative about effect sizes and precision, and thus supports estimation thinking. **Familiarity of effect size measure.** Proportion and difference between two proportions may be more familiar, more easily represented visually in figures, and more readily interpreted than the `\(\phi\)` coefficient. 
**Fewer restrictions.** Both approaches require frequencies of separate, independent people or objects. `\(\chi^2\)` requires in addition that expected frequencies not be too small. A proportions analysis can be used with very small frequencies and is thus applicable in a wider variety of situations. ] .col-5[ A proportions analysis keeps the focus on **effect sizes** and estimation, and can be used with small expected frequencies. ] ] --- ## Relative Risk .row[ .col-7[ If the two proportions are very small, then their difference will also be small. Yet that small difference may be important. In a medical context, the ratio of the success probabilities, `\(R = p_1/p_2\)`, is called the **relative risk**: `$$\hat{R}= \frac{\dfrac{a}{m}}{\dfrac{c}{n}}=\frac{an}{cm}$$` The normal approximation to the `\(1-\alpha\)` confidence interval is `$$\exp\left[\ln(\hat{R}) \pm 2\sinh^{-1}\left( \frac{z_{1-\alpha/2}}{2} \sqrt{\frac{1}{a}+\frac{1}{c}-\frac{1}{m+1}-\frac{1}{n+1}} \right) \right]$$` ] .col-5[

| Var 2/Var 1 | Outcome 1 | Outcome 2 | Total |
|-------------|-----------|-----------|-------|
| Outcome 1   | a         | b         | m     |
| Outcome 2   | c         | d         | n     |
| Total       | r         | s         | N     |

] ] --- ## R: Relative Risk .row[.col-7[ #### `\(\hat{R}\)` ```r (108*286)/(139*214) ``` ``` ## [1] 1.038392 ``` ```r effectsize::riskratio(ct) ``` ``` ## Risk ratio | 95% CI ## ------------------------- ## 1.04 | [0.84, 1.29] ``` ] .col-5[ <table> <thead> <tr> <th style="text-align:left;"> day_hour/season </th> <th style="text-align:right;"> summer </th> <th style="text-align:right;"> winter </th> <th style="text-align:right;"> Total </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> morning </td> <td style="text-align:right;"> 108 </td> <td style="text-align:right;"> 106 </td> <td style="text-align:right;"> 214 </td> </tr> <tr> <td style="text-align:left;"> not morning </td> <td style="text-align:right;"> 139 </td> <td style="text-align:right;"> 147 </td> <td style="text-align:right;"> 286 </td> </tr>
<tr> <td style="text-align:left;"> Total </td> <td style="text-align:right;"> 247 </td> <td style="text-align:right;"> 253 </td> <td style="text-align:right;"> 500 </td> </tr> </tbody> </table> ] ] --- ## R: Relative Risk - Bootstrap .row[.col-6[ ```r fli_small %>% specify(day_hour ~ season, success = "morning") %>% calculate(stat = "ratio of props", order = c("summer", "winter")) boot <- fli_small %>% specify(day_hour ~ season, success = "morning") %>% generate(reps = 1000, type = "bootstrap") %>% calculate(stat = "ratio of props", order = c("summer", "winter")) ( percentile_ci <- get_ci(boot) ) visualize(boot) + shade_confidence_interval( endpoints = percentile_ci) ``` ] .col-6[ ``` ## # A tibble: 1 x 1 ## stat ## <dbl> ## 1 1.04 ``` ``` ## # A tibble: 1 x 2 ## lower_ci upper_ci ## <dbl> <dbl> ## 1 0.866 1.27 ``` <img src="08.inference.cat-1_files/figure-html/unnamed-chunk-35-1.png" width="100%" style="display: block; margin: auto;" /> ]] --- ## Odds Ratio .row[.col-6[ The odds of an occurrence are defined as the probability of its happening divided by the probability that it does not happen: `\(p/(1-p)\)`. ] .col-6[ For a fair coin, the probability of obtaining a head as opposed to a tail is 0.5; the odds are therefore `\(0.5/0.5 = 1\)`. ]] .row[.col-6[ An **odds ratio `\(\theta\)`** is the ratio of the odds under one set of conditions to the odds under another set of conditions. It is given by: `$$\theta = \frac{p_1/(1-p_1)}{p_2/(1-p_2)}$$` <br/><br/> The **approximate confidence interval** is: `$$\exp\left[\ln(\hat{\theta}_F) \pm 2\sinh^{-1}\left( \frac{z_{1-\alpha/2}}{2} \sqrt{\frac{1}{a+0.4}+\frac{1}{b+0.4}+\frac{1}{c+0.4}+\frac{1}{d+0.4}} \right) \right]$$` ] .col-6[ In the notation of our earlier 2x2 table: `$$\theta = \frac{ad}{bc}$$` If any of the numbers a, b, c, or d is zero, we have a problem.
Therefore we can use: `$$\theta_F = \frac{(a+0.6)(d+0.6)}{(b+0.6)(c+0.6)}$$` ] ] --- ## R: Odds Ratio .row[.col-7[ #### `\(\hat{\theta}\)` ```r ((108+0.6)*(147+0.6))/((106+0.6)*(139+0.6)) ``` ``` ## [1] 1.077143 ``` ```r effectsize::oddsratio(ct) ``` ``` ## Odds ratio | 95% CI ## ------------------------- ## 1.08 | [0.76, 1.54] ``` ```r effectsize::interpret_oddsratio(1.08) ``` ``` ## [1] "very small" ## (Rules: chen2010) ``` ] .col-5[ <table> <thead> <tr> <th style="text-align:left;"> day_hour/season </th> <th style="text-align:right;"> summer </th> <th style="text-align:right;"> winter </th> <th style="text-align:right;"> Total </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> morning </td> <td style="text-align:right;"> 108 </td> <td style="text-align:right;"> 106 </td> <td style="text-align:right;"> 214 </td> </tr> <tr> <td style="text-align:left;"> not morning </td> <td style="text-align:right;"> 139 </td> <td style="text-align:right;"> 147 </td> <td style="text-align:right;"> 286 </td> </tr> <tr> <td style="text-align:left;"> Total </td> <td style="text-align:right;"> 247 </td> <td style="text-align:right;"> 253 </td> <td style="text-align:right;"> 500 </td> </tr> </tbody> </table> ] ] --- ## R: Odds Ratio - Bootstrap .row[.col-6[ ```r fli_small %>% specify(day_hour ~ season, success = "morning") %>% calculate(stat = "odds ratio", order = c("summer", "winter")) boot <- fli_small %>% specify(day_hour ~ season, success = "morning") %>% generate(reps = 1000, type = "bootstrap") %>% calculate(stat = "odds ratio", order = c("summer", "winter")) ( percentile_ci <- get_ci(boot) ) visualize(boot) + shade_confidence_interval( endpoints = percentile_ci) ``` ] .col-6[ ``` ## # A tibble: 1 x 1 ## stat ## <dbl> ## 1 1.08 ``` ``` ## # A tibble: 1 x 2 ## lower_ci upper_ci ## <dbl> <dbl> ## 1 0.753 1.57 ``` <img src="08.inference.cat-1_files/figure-html/unnamed-chunk-40-1.png" width="100%" style="display: block; margin: auto;" /> ]]
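
---

## R: Odds Ratio - CI by Hand

The approximate interval formula from the Odds Ratio slide can be evaluated directly (a sketch using the counts from the contingency table; note that the slides add 0.6 to each cell in the estimator but 0.4 in the interval's variance term, and this sketch follows them):

```r
# corrected odds ratio and approximate CI, following the slide formulas
a <- 108; b <- 106; c <- 139; d <- 147  # R still finds the function c()
theta_F <- ((a + 0.6) * (d + 0.6)) / ((b + 0.6) * (c + 0.6))
z <- qnorm(0.975)
half_width <- 2 * asinh(z / 2 *
  sqrt(1/(a + 0.4) + 1/(b + 0.4) + 1/(c + 0.4) + 1/(d + 0.4)))
exp(log(theta_F) + c(-1, 1) * half_width)
```

The result, roughly [0.76, 1.53], is in the same range as the `effectsize::oddsratio()` and bootstrap intervals shown earlier.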