class: middle, title-slide # Inference for Categorical Data ## Estimation, Effect Size, and Margin of Error ### Dennis A. V. Dittrich ### 2021 --- layout: true <div class="my-footer"> <span><img src="img/tcb-logo.png" height="40px"></span> </div> --- ## Estimation: The confidence interval .row[.col-7[ A **confidence interval**, or interval estimate, provides a range of values that, with a certain level of confidence, contains the population parameter of interest. A **confidence interval** is associated with a **margin of error** that accounts for the **standard error** of the estimator and the desired confidence level of the interval. If we repeated an experiment and generated numerous samples, the fraction of calculated **confidence intervals** (which would differ for each sample) that encompass the true population parameter would tend toward the confidence level (often 95%). ] .col-5[ The **confidence interval** represents values for the population parameter for which the difference between the parameter and the observed estimate is not statistically significant at the `\(\alpha\)` level. ]] --- ### Estimate and Confidence interval for a proportion .row[.col-6[ **Frequencies** are counts of the number of data points in an interval or category. A **proportion** `\(P\)` is simply a chosen frequency X divided by N, the total number of cases. ] .col-6[ Frequencies are needed for **nominal** scaling, and can be used with **ordinal** or **interval** scaling. ]] .row[.col-6[ To calculate a confidence interval on `\(P = X/N\)`, assume the `\(N\)` events are independent and each has the same probability of being a hit. 
] .col-6[ #### Normal approximation `\begin{align*} SE = \sqrt{pq/n} \end{align*}` the `\(100(1 -\alpha)\%\)` confidence interval for the proportion in the population is calculated as `\begin{align*} p-[z_{1-\alpha/2}\times SE(p)]; p+[z_{1-\alpha/2}\times SE(p) ] \end{align*}` The number of successes and failures must be at least 5 each to yield a reasonably good approximation. ] ] --- ## R: Confidence interval for a proportion .row[.col-7[ For `\(n=10\)` and `\(X = 3\)`, we have `\(p=3/10 = 0.3\)` and `\(q= (10-3)/10 = 0.7\)`. ]] .row[.col-7[ The normal approximation (also called the **Wald** interval) for the confidence interval for the proportion is: ] .col-5[ `\begin{align*} SE &= \sqrt{\frac{0.3\times 0.7}{10}}\\ \end{align*}` ]] .row[.col-7[ ```r 3/10 - qnorm(0.975) * sqrt((3/10 * 7/10 )/ 10) ; ``` ``` ## [1] 0.01597423 ``` ```r 3/10 + qnorm(0.975) * sqrt((3/10 * 7/10 )/ 10) ``` ``` ## [1] 0.5840258 ``` ] .col-5[ The _exact_ binomial confidence interval for the **counts** is ```r qbinom(0.025, 10, 0.3); qbinom(0.975, 10, 0.3) ``` ``` ## [1] 0 ``` ``` ## [1] 6 ``` ]] --- ## R: Many different approximations ### to the confidence interval for one proportion .row[.col-8[ ```r DescTools::BinomCI(3, 10, method = c("wilson", "wald", "waldcc", "agresti-coull", "jeffreys", "modified wilson", "wilsoncc", "modified jeffreys", "clopper-pearson", "arcsine", "logit", "witting", "pratt", "midp", "lik", "blaker" )) ``` ``` ## est lwr.ci upr.ci ## wilson 0.3000000 0.10779127 0.6032219 ## wald 0.3000000 0.01597423 0.5840258 ## waldcc 0.3000000 0.00000000 0.6340258 ## agresti-coull 0.3555066 0.10333842 0.6076747 ## jeffreys 0.3000000 0.09269459 0.6058183 ## modified wilson 0.3000000 0.10779127 0.6032219 ## wilsoncc 0.3000000 0.08094782 0.6463293 ## modified jeffreys 0.3000000 0.09269459 0.6058183 ## clopper-pearson 0.3000000 0.06673951 0.6524529 ## arcsine 0.3139535 0.07897893 0.6181383 ## logit 0.3000000 0.09976832 0.6236819 ## witting 0.3000000 0.09725005 0.5481635 ## pratt 
0.3000000 0.07743751 0.6528046 ## midp 0.3000000 0.08262612 0.6199217 ## lik 0.3000000 0.08458545 0.6065389 ## blaker 0.3000000 0.08725951 0.6194129 ``` ] .col-4[ #### Which interval should we use? The Wald interval often has inadequate coverage, particularly for small n and values of p close to 0 or 1. Brown et al. (2001) recommend the **Wilson** or Jeffreys interval for small n, and the Agresti-Coull, **Wilson**, or Jeffreys interval for larger n, as these provide more reliable coverage than the alternatives. ] ] --- ## R: One categorical variable: one proportion .row[.col-6[ ```r fli_small %>% tabyl(day_hour) %>% adorn_totals(c("row")) %>% kbl() ``` <table> <thead> <tr> <th style="text-align:left;"> day_hour </th> <th style="text-align:right;"> n </th> <th style="text-align:right;"> percent </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> morning </td> <td style="text-align:right;"> 214 </td> <td style="text-align:right;"> 0.428 </td> </tr> <tr> <td style="text-align:left;"> not morning </td> <td style="text-align:right;"> 286 </td> <td style="text-align:right;"> 0.572 </td> </tr> <tr> <td style="text-align:left;"> Total </td> <td style="text-align:right;"> 500 </td> <td style="text-align:right;"> 1.000 </td> </tr> </tbody> </table> ] .col-6[ ```r fli_small %>% ggplot() + geom_bar(aes(x=day_hour)) ``` <img src="08.inference.cat-1_files/figure-html/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" /> ]] --- ### R: Obtain the Confidence Interval with Bootstrapping: .row[.col-6[ Generating numerous samples from the existing one: ```r boot <- fli_small %>% specify(response = day_hour, success = "morning") %>% generate(reps = 1000, type = "bootstrap") %>% calculate(stat = "prop") ( percentile_ci <- get_ci(boot) ) ``` ``` ## # A tibble: 1 x 2 ## lower_ci upper_ci ## <dbl> <dbl> ## 1 0.386 0.47 ``` ] .col-6[ ```r visualize(boot) + shade_confidence_interval( endpoints = percentile_ci) ``` <img
src="08.inference.cat-1_files/figure-html/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" /> ]] --- ## R: Obtain the Confidence Interval ### with direct Computation: .row[.col-6[ Since we know the data generating process is a binomial experiment we can compute theoretical confidence intervals. ] .col-6[ The confidence interval represents values for the population parameter for which the difference between the parameter and the observed estimate is not statistically significant at the `\(\alpha\)` level. ]] .row[.col-6[ #### lower bound for the number of successes ```r qbinom(p=0.025,size=500,prob=214/500) ``` ``` ## [1] 192 ``` #### upper bound for the number of successes ```r qbinom(0.975,500,214/500) ``` ``` ## [1] 236 ``` ] .col-6[ #### lower bound for the prop. of success ```r qbinom(0.025,500,214/500)/500 ``` ``` ## [1] 0.384 ``` #### upper bound for the prop. of success ```r qbinom(0.975,500,214/500)/500 ``` ``` ## [1] 0.472 ``` ```r DescTools::BinomCI(214, 500, method="wilson") ``` ``` ## est lwr.ci upr.ci ## [1,] 0.428 0.3853418 0.4717561 ``` ]] --- ## Two categorical variables with 2 levels .row[.col-7[ Using proportions can often be a good strategy for examining the relation between two classification variables. 
.pink[The contingency table] ```r fli_small %>% tabyl(day_hour,season) %>% adorn_totals(c("row", "col")) %>% adorn_percentages("all") %>% adorn_pct_formatting(rounding = "half up", digits = 0) %>% adorn_title("combined") %>% kbl() ``` <table> <thead> <tr> <th style="text-align:left;"> day_hour/season </th> <th style="text-align:left;"> summer </th> <th style="text-align:left;"> winter </th> <th style="text-align:left;"> Total </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> morning </td> <td style="text-align:left;"> 22% </td> <td style="text-align:left;"> 21% </td> <td style="text-align:left;"> 43% </td> </tr> <tr> <td style="text-align:left;"> not morning </td> <td style="text-align:left;"> 28% </td> <td style="text-align:left;"> 29% </td> <td style="text-align:left;"> 57% </td> </tr> <tr> <td style="text-align:left;"> Total </td> <td style="text-align:left;"> 49% </td> <td style="text-align:left;"> 51% </td> <td style="text-align:left;"> 100% </td> </tr> </tbody> </table> ] .col-5[ <img src="08.inference.cat-1_files/figure-html/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" /> <br/> <img src="08.inference.cat-1_files/figure-html/unnamed-chunk-15-1.png" width="100%" style="display: block; margin: auto;" /> ]] --- ### Difference between two independent proportions .row[.col-7[ The **difference** between two population proportions is estimated by `\begin{align*} D= p_1 - p_2 \end{align*}` The **standard error** of `\(p_1 - p_2\)` is `\begin{align*} SE(D)= \sqrt{ \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} } \end{align*}` ]] .row[.col-7[ The normal approximation for the **confidence interval** for the population difference in proportions is then given by `\begin{align*} [D-z_{1-\alpha/2}\times SE(D) ; D+z_{1-\alpha/2}\times SE(D)] \end{align*}` ] .col-5[This normal approximation may yield reasonably good results if each sample contains at least 30 observations. 
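
As a sketch, the estimate, standard error, and normal-approximation interval above can be wrapped in a small helper (`prop_diff_ci` is our own name, not from any package):

```r
# Wald CI for the difference of two independent proportions
# (a sketch; prop_diff_ci is a made-up helper name)
prop_diff_ci <- function(x1, n1, x2, n2, conf = 0.95) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  D  <- p1 - p2
  SE <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z  <- qnorm(1 - (1 - conf) / 2)
  c(estimate = D, lower = D - z * SE, upper = D + z * SE)
}
```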
] ] --- ### R: Difference between two independent proportions .row[.col-7[ What is the difference in the proportion of flights in the morning between the seasons? ```r (D <- 108/247 -106/253 ) ``` ``` ## [1] 0.01827463 ``` ```r (SE <- sqrt( (108/247)*(139/247)/247 + (106/253)*(147/253)/253 )) ``` ``` ## [1] 0.04425375 ``` ] .col-5[ <table> <thead> <tr> <th style="text-align:left;"> day_hour/season </th> <th style="text-align:right;"> summer </th> <th style="text-align:right;"> winter </th> <th style="text-align:right;"> Total </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> morning </td> <td style="text-align:right;"> 108 </td> <td style="text-align:right;"> 106 </td> <td style="text-align:right;"> 214 </td> </tr> <tr> <td style="text-align:left;"> not morning </td> <td style="text-align:right;"> 139 </td> <td style="text-align:right;"> 147 </td> <td style="text-align:right;"> 286 </td> </tr> <tr> <td style="text-align:left;"> Total </td> <td style="text-align:right;"> 247 </td> <td style="text-align:right;"> 253 </td> <td style="text-align:right;"> 500 </td> </tr> </tbody> </table> ]] .row[.col-7[ ```r cbind(D - qnorm(.975)* SE, D + qnorm(.975)* SE) ``` ``` ## [,1] [,2] ## [1,] -0.06846113 0.1050104 ``` ] ] --- ### R: Difference between two independent proportions ### ...the many different CI approximations .row[.col-7[ What is the difference in the proportion of flights in the morning between the seasons? 
```r DescTools::BinomDiffCI(108, 247, 106, 253, method = c( "ac", "wald", "waldcc", "score", "scorecc", "mn", "mee", "blj", "ha", "hal", "jp" )) ``` ``` ## est lwr.ci upr.ci ## ac 0.01827463 -0.06826217 0.1045485 ## wald 0.01827463 -0.06846113 0.1050104 ## waldcc 0.01827463 -0.07246170 0.1090110 ## score 0.01827463 -0.06799395 0.1042148 ## scorecc 0.01827463 -0.07078487 0.1069957 ## mn 0.01827463 -0.06835116 0.1046839 ## mee 0.01827463 -0.06826416 0.1045954 ## blj 0.01827463 -0.06853445 0.1049518 ## ha 0.01827463 -0.07065947 0.1072087 ## hal 0.01827463 -0.06825608 0.1045530 ## jp 0.01827463 -0.06826337 0.1045601 ``` ] .col-5[ <table> <thead> <tr> <th style="text-align:left;"> day_hour/season </th> <th style="text-align:right;"> summer </th> <th style="text-align:right;"> winter </th> <th style="text-align:right;"> Total </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> morning </td> <td style="text-align:right;"> 108 </td> <td style="text-align:right;"> 106 </td> <td style="text-align:right;"> 214 </td> </tr> <tr> <td style="text-align:left;"> not morning </td> <td style="text-align:right;"> 139 </td> <td style="text-align:right;"> 147 </td> <td style="text-align:right;"> 286 </td> </tr> <tr> <td style="text-align:left;"> Total </td> <td style="text-align:right;"> 247 </td> <td style="text-align:right;"> 253 </td> <td style="text-align:right;"> 500 </td> </tr> </tbody> </table> <br/> The general consensus is that the most widely taught method, method="wald", is inappropriate in many situations and should not be used.
Recommendations seem to converge around the Miettinen-Nurminen based methods (method="mn") ] ] --- ## R: Difference in proportions - Bootstrap CI .row[.col-6[ ```r fli_small %>% specify(day_hour ~ season, success = "morning") %>% calculate(stat = "diff in props", order = c("summer", "winter")) boot <- fli_small %>% specify(day_hour ~ season, success = "morning") %>% generate(reps = 1000, type = "bootstrap") %>% calculate(stat = "diff in props", order = c("summer", "winter")) ( percentile_ci <- get_ci(boot) ) visualize(boot) + shade_confidence_interval( endpoints = percentile_ci) ``` ] .col-6[ ``` ## # A tibble: 1 x 1 ## stat ## <dbl> ## 1 0.0183 ``` ``` ## # A tibble: 1 x 2 ## lower_ci upper_ci ## <dbl> <dbl> ## 1 -0.0654 0.101 ``` <img src="08.inference.cat-1_files/figure-html/unnamed-chunk-19-1.png" width="100%" style="display: block; margin: auto;" /> ]] --- ## `\(\chi^2\)`: an Alternative Approach ### to Analyse Frequency Tables .row[.col-4[ #### Contingency table ``` ## season ## day_hour summer winter ## morning 108 106 ## not morning 139 147 ``` ] .col-4[ #### Expected frequencies ``` ## season ## day_hour summer winter ## morning 105.716 108.284 ## not morning 141.284 144.716 ``` ] .col-4[ #### Diffs to expected frequencies ``` ## season ## day_hour summer winter ## morning 2.284 -2.284 ## not morning -2.284 2.284 ``` ]] .row[.col-8[ .row[.col-6[ #### `\(X^2\)` ``` ## [1] 0.1704924 ``` ] .col-6[ `\begin{align*} X^2 = \sum \frac{(n_{ij}-\mu_{ij})^2}{\mu_{ij}} \end{align*}` ]] .row[ For random sampling and for large sample sizes `\(X^2\)` has approximately a `\(\chi^2\)` distribution. It takes its minimum value of zero when all `\(n_{ij} = \mu_{ij}\)`. For a fixed sample size, greater differences `\(n_{ij} - \mu_{ij}\)` produce larger `\(X^2\)` values. For the 2x2 contingency table, `\(X^2\)` is approximately `\(\chi^2(1)\)` distributed. 
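
The statistic shown above can be checked directly in base R (a sketch: the table is rebuilt from the observed counts, and `correct = FALSE` switches off Yates' continuity correction so that the plain Pearson `\(X^2\)` is returned):

```r
# Pearson X^2 for the 2x2 table, without continuity correction
tab <- matrix(c(108, 139, 106, 147), nrow = 2,
              dimnames = list(day_hour = c("morning", "not morning"),
                              season   = c("summer", "winter")))
chisq.test(tab, correct = FALSE)$statistic
```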
The `\(\chi^2\)` approximation improves as the `\(\mu_{ij}\)` increase; `\(\mu_{ij} \geq 5\)` is usually sufficient for a decent approximation. ]] .col-4[ <br/> **`\(n_{ij}\)`** Count in cell `\(ij\)` **`\(\mu_{ij}\)`** Expected frequency in cell `\(ij\)` (`\(\mu_{ij} = n\pi_{ij}\)`) **`\(\pi_{ij}\)`** Expected probability under independence, computed from the marginal probabilities: `\(\pi_{ij} = \pi_{i}\times \pi_{j}\)` ]] --- ## R: `\(\chi^2\)`: in 2x2 Contingency Tables .row[.col-6[ ```r fli_small %>% specify(day_hour ~ season, success = "morning") %>% calculate(stat = "Chisq", order = c("summer", "winter")) boot <- fli_small %>% specify(day_hour ~ season, success = "morning") %>% generate(reps = 1000, type = "bootstrap") %>% calculate(stat = "Chisq", order = c("summer", "winter")) ( percentile_ci <- get_ci(boot) ) visualize(boot) + shade_confidence_interval( endpoints = percentile_ci) ``` ] .col-6[ ``` ## # A tibble: 1 x 1 ## stat ## <dbl> ## 1 0.104 ``` ``` ## # A tibble: 1 x 2 ## lower_ci upper_ci ## <dbl> <dbl> ## 1 0 5.09 ``` <img src="08.inference.cat-1_files/figure-html/unnamed-chunk-24-1.png" width="100%" style="display: block; margin: auto;" /> ]] --- ## Effect size in a 2x2 contingency table: `\(\phi\)` .row[.col-6[ #### The `\(\phi\)` coefficient `\begin{align*} \phi = \sqrt{\frac{\chi^2}{N}} \end{align*}` The phi coefficient, `\(\phi\)`, is a measure of strength of association in a 2x2 table. It’s a type of correlation.
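
In fact, for a 2x2 table `\(\phi\)` equals, up to sign, the Pearson correlation between the two 0/1-coded variables. A sketch check, expanding the cell counts into indicator vectors (the variable names are ours):

```r
# phi as the correlation of two dummy-coded variables
# cell order: morning/summer, morning/winter, not-morning/summer, not-morning/winter
n_cells <- c(108, 106, 139, 147)
morning <- rep(c(1, 1, 0, 0), times = n_cells)
summer  <- rep(c(1, 0, 1, 0), times = n_cells)
cor(morning, summer)
```

The absolute value of this correlation agrees with `\(\sqrt{X^2/N}\)`.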
<br/> ```r sqrt(X2/500) ``` ``` ## [1] 0.01846577 ``` <br/> ```r effectsize::chisq_to_phi(X2, 500, 2, 2) ``` ``` ## Phi | 95% CI ## ------------------- ## 0.02 | [0.00, 0.10] ``` ] .col-6[ ```r effectsize::phi(ct) ``` ``` ## Phi | 95% CI ## ------------------- ## 0.02 | [0.00, 0.10] ``` ```r boot <- boot %>% mutate(stat = sqrt(stat/500)) visualize(boot) + shade_confidence_interval( endpoints = sqrt(percentile_ci/500)) ``` <img src="08.inference.cat-1_files/figure-html/unnamed-chunk-28-1.png" width="95%" style="display: block; margin: auto;" /> ] ] --- ## Effect size in a 2x2 contingency table: Cramer's V .row[.col-6[ #### Cramer's V `\begin{align*} V &= \sqrt{\frac{\phi^2}{\min(c-1,r-1)}}\\ &= \sqrt{\frac{\chi^2/N}{\min(c-1,r-1)}} \end{align*}` with `\(c\)` and `\(r\)` the number of columns and rows, respectively. Cramer's V, is a measure of strength of association. In a 2x2 table it is the same as `\(\phi\)`, and is equal to the simple correlation between two dichotomous variables, ranging between 0 (no dependence) and 1 (perfect dependence). ] .col-6[ ```r sqrt((X2/500)/(min(2-1,2-1))) ``` ``` ## [1] 0.01846577 ``` ```r effectsize::cramers_v(ct) ``` ``` ## Cramer's V | 95% CI ## ------------------------- ## 0.02 | [0.00, 0.10] ``` ```r effectsize::phi(ct) ``` ``` ## Phi | 95% CI ## ------------------- ## 0.02 | [0.00, 0.10] ``` ] ] --- ## Proportions or `\(\chi^2\)`? .row[.col-7[ A proportions analysis has several advantages **Focus on effect size and estimation.** The starting point of a proportions analysis is two proportions and the difference between them. A figure showing these, with their CIs is informative about effect sizes and precision, and thus supports estimation thinking. **Familiarity of effect size measure.** Proportion and difference between two proportions may be more familiar, more easily represented visually in figures, and more readily interpreted than the `\(\phi\)` coefficient. 
**Fewer restrictions.** Both approaches require frequencies of separate, independent people or objects. `\(\chi^2\)` requires in addition that expected frequencies not be too small. A proportions analysis can be used with very small frequencies and is thus applicable in a wider variety of situations. ] .col-5[ A proportions analysis keeps the focus on **effect sizes** and estimation, and can be used with small expected frequencies. ] ] --- ## Relative Risk .row[ .col-7[ If the two proportions are very small, then their difference will also be small. Yet that small difference may be important. In a medical context, the ratio of the success probabilities, `\(R = p_1/p_2\)`, is called the **relative risk**: `$$\hat{R}= \frac{\dfrac{a}{m}}{\dfrac{c}{n}}=\frac{an}{cm}$$` The normal approximation to the `\(1-\alpha\)` confidence interval is `$$\exp\left[\ln(\hat{R}) \pm 2\sinh^{-1}\left( \frac{z_{1-\alpha/2}}{2} \sqrt{\frac{1}{a}+\frac{1}{c}-\frac{1}{m+1}-\frac{1}{n+1}} \right) \right]$$` ] .col-5[

| Var 2/Var 1 | Outcome 1 | Outcome 2 | Total |
|-------------|-----------|-----------|-------|
| Outcome 1   | a         | b         | m     |
| Outcome 2   | c         | d         | n     |
| Total       | r         | s         | N     |

] ] --- ## R: Relative Risk .row[.col-7[ #### `\(\hat{R}\)` ```r (108*286)/(139*214) ``` ``` ## [1] 1.038392 ``` ```r effectsize::riskratio(ct) ``` ``` ## Risk ratio | 95% CI ## ------------------------- ## 1.04 | [0.84, 1.29] ``` ] .col-5[ <table> <thead> <tr> <th style="text-align:left;"> day_hour/season </th> <th style="text-align:right;"> summer </th> <th style="text-align:right;"> winter </th> <th style="text-align:right;"> Total </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> morning </td> <td style="text-align:right;"> 108 </td> <td style="text-align:right;"> 106 </td> <td style="text-align:right;"> 214 </td> </tr> <tr> <td style="text-align:left;"> not morning </td> <td style="text-align:right;"> 139 </td> <td style="text-align:right;"> 147 </td> <td style="text-align:right;"> 286 </td> </tr>
<tr> <td style="text-align:left;"> Total </td> <td style="text-align:right;"> 247 </td> <td style="text-align:right;"> 253 </td> <td style="text-align:right;"> 500 </td> </tr> </tbody> </table> ] ] --- ## R: Relative Risk - Bootstrap .row[.col-6[ ```r fli_small %>% specify(day_hour ~ season, success = "morning") %>% calculate(stat = "ratio of props", order = c("summer", "winter")) boot <- fli_small %>% specify(day_hour ~ season, success = "morning") %>% generate(reps = 1000, type = "bootstrap") %>% calculate(stat = "ratio of props", order = c("summer", "winter")) ( percentile_ci <- get_ci(boot) ) visualize(boot) + shade_confidence_interval( endpoints = percentile_ci) ``` ] .col-6[ ``` ## # A tibble: 1 x 1 ## stat ## <dbl> ## 1 1.04 ``` ``` ## # A tibble: 1 x 2 ## lower_ci upper_ci ## <dbl> <dbl> ## 1 0.866 1.27 ``` <img src="08.inference.cat-1_files/figure-html/unnamed-chunk-35-1.png" width="100%" style="display: block; margin: auto;" /> ]] --- ## Odds Ratio .row[.col-6[ The odds of an occurrence are defined as the probability of its happening divided by the probability that it does not happen: `\(p/(1-p)\)`. ] .col-6[ For a fair coin, the probability of obtaining a head as opposed to a tail is 0.5; the odds are therefore `\(0.5/0.5 = 1\)`. ]] .row[.col-6[ An **odds ratio `\(\theta\)`** is the ratio of the odds under one set of conditions to the odds under another set of conditions. It is given by: `$$\theta = \frac{p_1/(1-p_1)}{p_2/(1-p_2)}$$` <br/><br/> The **approximate confidence interval** is: `$$\exp\left[\ln(\hat{\theta}_F) \pm 2\sinh^{-1}\left( \frac{z_{1-\alpha/2}}{2} \sqrt{\frac{1}{a+0.4}+\frac{1}{b+0.4}+\frac{1}{c+0.4}+\frac{1}{d+0.4}} \right) \right]$$` ] .col-6[ In the notation of our earlier 2x2 table: `$$\theta = \frac{ad}{bc}$$` If any of the numbers a, b, c, or d is zero, we have a problem.
Therefore we can use: `$$\theta_F = \frac{(a+0.6)(d+0.6)}{(b+0.6)(c+0.6)}$$` ] ] --- ## R: Odds Ratio .row[.col-7[ #### `\(\hat{\theta}\)` ```r ((108+0.6)*(147+0.6))/((106+0.6)*(139+0.6)) ``` ``` ## [1] 1.077143 ``` ```r effectsize::oddsratio(ct) ``` ``` ## Odds ratio | 95% CI ## ------------------------- ## 1.08 | [0.76, 1.54] ``` ```r effectsize::interpret_oddsratio(1.08) ``` ``` ## [1] "very small" ## (Rules: chen2010) ``` ] .col-5[ <table> <thead> <tr> <th style="text-align:left;"> day_hour/season </th> <th style="text-align:right;"> summer </th> <th style="text-align:right;"> winter </th> <th style="text-align:right;"> Total </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> morning </td> <td style="text-align:right;"> 108 </td> <td style="text-align:right;"> 106 </td> <td style="text-align:right;"> 214 </td> </tr> <tr> <td style="text-align:left;"> not morning </td> <td style="text-align:right;"> 139 </td> <td style="text-align:right;"> 147 </td> <td style="text-align:right;"> 286 </td> </tr> <tr> <td style="text-align:left;"> Total </td> <td style="text-align:right;"> 247 </td> <td style="text-align:right;"> 253 </td> <td style="text-align:right;"> 500 </td> </tr> </tbody> </table> ] ] --- ## R: Odds Ratio - Bootstrap .row[.col-6[ ```r fli_small %>% specify(day_hour ~ season, success = "morning") %>% calculate(stat = "odds ratio", order = c("summer", "winter")) boot <- fli_small %>% specify(day_hour ~ season, success = "morning") %>% generate(reps = 1000, type = "bootstrap") %>% calculate(stat = "odds ratio", order = c("summer", "winter")) ( percentile_ci <- get_ci(boot) ) visualize(boot) + shade_confidence_interval( endpoints = percentile_ci) ``` ] .col-6[ ``` ## # A tibble: 1 x 1 ## stat ## <dbl> ## 1 1.08 ``` ``` ## # A tibble: 1 x 2 ## lower_ci upper_ci ## <dbl> <dbl> ## 1 0.753 1.57 ``` <img src="08.inference.cat-1_files/figure-html/unnamed-chunk-40-1.png" width="100%" style="display: block; margin: auto;" /> ]]
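
---

## R: Odds Ratio - CI by Hand

The approximate interval formula from the Odds Ratio slide can be evaluated directly (a sketch using the counts from the contingency table; note that the slides add 0.6 to each cell in the estimator but 0.4 in the interval's variance term, and this sketch follows them):

```r
# corrected odds ratio and approximate CI, following the slide formulas
a <- 108; b <- 106; c <- 139; d <- 147  # R still finds the function c()
theta_F <- ((a + 0.6) * (d + 0.6)) / ((b + 0.6) * (c + 0.6))
z <- qnorm(0.975)
half_width <- 2 * asinh(z / 2 *
  sqrt(1/(a + 0.4) + 1/(b + 0.4) + 1/(c + 0.4) + 1/(d + 0.4)))
exp(log(theta_F) + c(-1, 1) * half_width)
```

The result, roughly [0.76, 1.53], is in the same range as the `effectsize::oddsratio()` and bootstrap intervals shown earlier.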