Probability Distributions and the Central Limit Theorem III

class: middle, title-slide

# Probability Distributions and the Central Limit Theorem III
## Sampling and the Central Limit Theorem
### Dennis A. V. Dittrich
### 2021

---

layout: true

---

## Parameter and Sample Statistics
.row[.col-7[
**Parameter**  
A measure computed from the entire population. As
long as the population does not change, the value of
the parameter will not change.

**Simple Random Sample**  
A sample selected in such a manner that each
possible sample of a given size has an equal chance
of being selected.

**Sampling distribution**  
The probability distribution of a sample statistic that
is formed when random samples of size `$n$` are
repeatedly taken from a population.

If the **sample statistic** is the sample mean, then the
distribution is the sampling distribution of sample
means.
]]

---

## Sampling Error
.row[.col-7[
The difference between a measure computed from a
sample (a statistic) and the corresponding measure
computed from the population (a parameter) is the **sampling error**.

*  The size of the sampling error depends on which sample is selected.
*  The sampling error may be positive or negative.
*  There is potentially a different `$\bar{x}$` for each sample.
]]
---

.tip[
## Example: Sampling Error
.row[.col-7[
It is known that the population mean age for employees
working at a major automaker is 44.5 years. Suppose a
random sample of 30 employees is selected and the
sample mean age for these employees is 38 years.
Calculate the sampling error.

Sampling error:  
`$\bar{x}-\mu = 38-44.5=-6.5$` years	
	
The sample mean provided an average age that is 6.5 years less than
the population mean.	
]]]

---

## The Role of Sample Size
.row[.col-7[
A real estate development company builds office
buildings. They have built a total of N = 12 buildings. The
number of square feet in each building is shown as
follows:

![](img/samp1.png)

`$\mu = 158972$` sq. feet
]]

---

## The Role of Sample Size
.row[.col-7[
A potential customer plans to select a sample of `$n = 5$`
office complexes from the 12.

Number of possible samples of size `$n = 5$`:
]]
.row[.col-7[
`$$C_5^{12}= \binom{12}{5} = \frac{12!}{5!(12-5)!} = 792$$`

These 792 samples could yield many possible sample
means and thus result in many possible sampling error
amounts.

We might be interested in the samples that
provide the smallest and largest sample means.
]
.col-5[

```r
choose(12,5)
```

```
## [1] 792
```
]]

---

## The Role of Sample Size
.row[.col-6[
The five smallest office complexes in terms of
square feet are:
![](img/samp2.png)

`\begin{align*}
\bar{x} &= 108232 \text{ sq. feet}\\
\bar{x}-\mu &= 108232-158972\\
 &= -50740
\end{align*}`
]
.col-6[
The five largest office complexes in terms of square
feet are:
![](img/samp3.png)
`\begin{align*}
\bar{x} &= 210900 \text{ sq. feet}\\
\bar{x}-\mu &= 210900-158972\\
&= 51928
\end{align*}`
]]

---

## The Role of Sample Size
.row[.col-7[
The range of sampling error for samples with `$n = 5$`:  
`$-50740$` to `$51928$`

What happens to the range of sampling error if the
sample size is reduced to `$n = 3$`? It widens:  
`$-65185$` to `$81128$`

![](img/samp4.png)

* The range of potential sampling error is smaller for larger sample sizes.
* The potential for extreme sampling error is reduced when using larger sample sizes.
]]
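
The effect of sample size on the range of sampling error can be checked by enumerating every possible sample. The sketch below uses a made-up population of 12 building sizes, because the actual values appear only in the slide image; the vector `pop` and the helper `error_range()` are assumptions for illustration.

```r
# Hypothetical population of N = 12 building sizes (sq. feet).
# These values are invented for illustration; they are NOT the slide data.
pop <- c(90000, 105000, 110000, 120000, 135000, 150000,
         160000, 175000, 190000, 205000, 220000, 250000)
mu <- mean(pop)

# Range of the sampling error over ALL possible samples of size n
error_range <- function(n) {
  xbars <- combn(pop, n, FUN = mean)  # one sample mean per possible sample
  range(xbars - mu)
}

error_range(3)  # wider range
error_range(5)  # narrower range
```

With any population of distinct values, the most extreme samples of size 3 lie further from `$\mu$` than the most extreme samples of size 5.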

---

## Sampling Distribution Simulation 
.row[.col-7[
Let `$X$` be a random sample and let `$T = h(X)$` denote some
statistic. The sampling distribution of `$T$` is its probability distribution.

```r
x <- sample(1:6, 1000, replace=TRUE)
# pop avg: 3.5
mean(x)
```

```
## [1] 3.564
```

The standard deviation of a sampling distribution is called the standard error.

```r
# pop var = 2.92, sd = 1.71
sd(x)
```

```
## [1] 1.728262
```
]
.col-5[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-5-1.png" width="98%" style="display: block; margin: auto;" />
]
]

---

## Sampling, Permutations, Combinations 
.row[.col-6[

When we draw `$k$` elements from a set of `$n$` elements, we can count the number of different outcomes:
]
.col-6[
Drawing 2 elements out of a set with 6 elements with replacement = throwing 2 dice
]]

| Sampling method | Count of outcomes | Throwing 2 dice | Count of outcomes |
|---|---|---|---:|
|Ordered sampling with replacement| `$n^k$`|Two different dice| `$6^2 =36$`|
|Ordered sampling without replacement| `$P^n_k=\frac{n!}{(n-k)!}$` |?| `$\frac{6!}{(6-2)!}=$` 30 |
|Unordered sampling without replacement| `$C^n_k=\binom{n}{k}=\frac{n!}{k!(n-k)!}$` |?| `$\binom{6}{2}=$` 15 |
|Unordered sampling with replacement| `$\binom{n+k-1}{k}$`|Two identical dice| `$\binom{6+2-1}{2}=$` 21|
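
For two dice, the four counts in the table can be verified by brute force. The following sketch lists all ordered pairs and filters them; the data frame `rolls` and its column names are illustrative, and the filtering trick only works for `$k = 2$`.

```r
n <- 6
rolls <- expand.grid(first = 1:n, second = 1:n)  # all ordered pairs of faces

nrow(rolls)                        # ordered, with replacement
sum(rolls$first != rolls$second)   # ordered, without replacement
sum(rolls$first <  rolls$second)   # unordered, without replacement
sum(rolls$first <= rolls$second)   # unordered, with replacement
```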

---

### How to compute the number of 
### Permutations and Combinations in R

.row[
.col-7[
Ordered Permutation (with replacement)

`$n^k$`
]
.col-5[

```r
n<-6; k<-2
n^k
```

```
## [1] 36
```
]
]

.row[
.col-7[
Ordered Permutation (without replacement):  
`$k$`-permutations of `$n$`

`$P^n_k=\frac{n!}{(n-k)!}$`
]
.col-5[

```r
factorial(n)/factorial(n-k)
```

```
## [1] 30
```
]
]

.row[
.col-7[
Unordered combinations (without replacement)  
`$k$`-combinations of an `$n$`-set

`$C^n_k = \frac{P^n_k}{P^k_k}  =\frac{n!}{k!(n-k)!} =\binom{n}{k}$`
]
.col-5[

```r
choose(n,k)
```

```
## [1] 15
```
]
]

.row[
.col-7[
Unordered combinations (with replacement)

`$C^{n+k-1}_k = \binom{n+k-1}{k}$`]
.col-5[

```r
choose(n+k-1,k)
```

```
## [1] 21
```
]
]

---

## Simulating throwing two dice
.row[
.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-10-1.png" width="80%" style="display: block; margin: auto;" />
]
.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-11-1.png" width="80%" style="display: block; margin: auto;" />

```r
mean(Xbar)  
```

```
## [1] 3.50485
```

```r
sd(Xbar)
```

```
## [1] 1.204884
```

Notice that the sd of the means is smaller than the sd of the population (1.71).

]]
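
The slide uses `Xbar` without showing how it was created. One plausible way to build it is sketched below; the seed and the 10,000 repetitions are assumptions, so the simulated values will not match the output above exactly.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Assumed setup: 10,000 repetitions of throwing two dice and
# recording the mean of the two faces.
Xbar <- replicate(10000, mean(sample(1:6, 2, replace = TRUE)))

mean(Xbar)                # close to the population mean 3.5
sd(Xbar)                  # close to the theoretical value below
sqrt(35 / 12) / sqrt(2)   # population sd of one die divided by sqrt(2)
```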

---

## Throwing 5, 10, or 25 dice
.row[
.col-4[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-13-1.png" width="98%" style="display: block; margin: auto;" />
]
.col-4[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-14-1.png" width="98%" style="display: block; margin: auto;" />
]
.col-4[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-15-1.png" width="98%" style="display: block; margin: auto;" />
]
]

.row[
.col-7[
Notice that the distribution of the mean becomes more concentrated the more dice you throw, i.e., the more elements are sampled.

]
]
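
The concentration can be made concrete by measuring how often the sample mean lands close to 3.5. This is a rough simulation sketch; the seed, the 5,000 repetitions, and the tolerance of 0.5 are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

share_near_mu <- sapply(c(5, 10, 25), function(n) {
  # 5,000 simulated sample means of n dice
  xbar <- replicate(5000, mean(sample(1:6, n, replace = TRUE)))
  mean(abs(xbar - 3.5) <= 0.5)   # share of means within 0.5 of 3.5
})
names(share_near_mu) <- c("n=5", "n=10", "n=25")
share_near_mu
```

The share should increase with the number of dice.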

---

## Properties of Sampling Distributions 
## of Sample Means
.row[.col-7[
1. The mean of the sample means, `$\mu_{\bar{x}}$`, is equal to the population mean `$\mu$`.
`$$\mu_{\bar{x}}=\mu$$`
2. The standard deviation of the sample means, `$\sigma_{\bar{x}}$`, is equal to the population standard deviation, `$\sigma$`, divided by the square root of the sample size, `$n$`.
`$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$`
This quantity is called the **standard error of the mean**.

The larger the sample, the smaller the standard error of the mean.
]]
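
Both properties can be checked exactly, without simulation, for the mean of two dice. The sketch below enumerates all 36 equally likely samples drawn with replacement; the variable names are illustrative.

```r
pop   <- 1:6
mu    <- mean(pop)                  # population mean, 3.5
sigma <- sqrt(mean((pop - mu)^2))   # population sd (divides by N, not N - 1)

# All 36 equally likely ordered samples of size n = 2 (with replacement)
samples <- expand.grid(pop, pop)
xbar    <- rowMeans(samples)

mu_xbar    <- mean(xbar)
sigma_xbar <- sqrt(mean((xbar - mu_xbar)^2))

c(mu_xbar, mu)                  # the two means agree
c(sigma_xbar, sigma / sqrt(2))  # the sd of the means equals sigma / sqrt(n)
```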

---
.row[.col-7[
## The Central Limit Theorem
If the population itself is normally distributed, the sampling distribution of the sample means is normally distributed for any sample size `$n$`.
]
.col-5[
![](img/clt2.png)
]]

.row[.col-7[
## The Central Limit Theorem

If samples of sufficient size (`$n \geq 30$` is often considered sufficient) are
drawn from any population with mean `$\mu$` and standard deviation `$\sigma$`, then the
sampling distribution of the sample means approximates a normal distribution. 
		
The greater the sample size, the better the approximation.	
]
.col-5[
![](img/clt1.png)
]]
---

## The Central Limit Theorem
.row[.col-7[
In either case, the sampling distribution of sample means has a mean equal to the population mean.
`$$\text{Mean of the sample means } \mu_{\bar{x}} = \mu$$`

The sampling distribution of sample means has a
variance equal to `$1/n$` times the variance of the
population and a standard deviation equal to the
population standard deviation divided by the square
root of `$n$`.
]
.col-5[
**Variance of the sample means**  
`$$\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$$`

**Standard deviation of the sample means** (Standard error of the mean)  
`$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$`
]]

---

### Sampling Distribution of the Mean of Any Population

Example: exponential distribution
.row[.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-16-1.png" width="80%" style="display: block; margin: auto;" />
]
.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-17-1.png" width="80%" style="display: block; margin: auto;" />
]]
.row[.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-18-1.png" width="80%" style="display: block; margin: auto;" />
]
.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-19-1.png" width="80%" style="display: block; margin: auto;" />
]]
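
The settings behind the figures are not shown, so the following is only a sketch under assumed values: rate 1, sample sizes 2 and 30, and 10,000 repetitions. It compares the skewness of the sample means; `skewness()` is a small helper defined here, not a base R function.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
rate <- 1    # assumed; an exponential population has mean = sd = 1 / rate

# Simple moment-based skewness (helper defined for this sketch)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

xbar_2  <- replicate(10000, mean(rexp(2,  rate)))
xbar_30 <- replicate(10000, mean(rexp(30, rate)))

c(mean(xbar_30), sd(xbar_30), (1 / rate) / sqrt(30))  # mean, sd, sigma / sqrt(n)
c(skewness(xbar_2), skewness(xbar_30))                # skewness shrinks as n grows
```

Although the population is strongly right-skewed, the means of larger samples are far more symmetric, which is what the Central Limit Theorem predicts.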

---

## Interpreting the Central Limit Theorem
.tip[
.row[.col-7[
A study analyzed the sleep habits of college students. The study found that the mean sleep time was 6.8 hours, with a standard deviation of 1.4 hours. Random samples of 100 sleep times are drawn from this population, and the mean of each sample is determined. 
]]
.row[.col-7[
![](img/clt3.png)
]
.col-5[Find the mean and standard deviation of the
sampling distribution of sample means. Then sketch a graph of
the sampling distribution.
]]
]
---

## Interpreting the Central Limit Theorem
.tip[
.row[.col-5[
`$$\mu_{\bar{x}} = \mu = 6.8$$`
`$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.4}{\sqrt{100}}= 0.14$$`

Since the sample size is greater than 30, the sampling distribution can be approximated by a normal distribution with a mean of 6.8 hours and a standard deviation of 0.14 hours.
]
.col-7[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-20-1.png" width="100%" style="display: block; margin: auto;" />
]]
]
---

## Interpreting the Central Limit Theorem
.tip[
.row[.col-7[
The training heart rates of all 20-year-old athletes are normally distributed, with a mean of 135 beats per minute and a standard deviation of 18 beats per minute. Random samples of size 4 are drawn from this population, and the mean of each sample is determined. 
]]
.row[.col-7[
![](img/clt5.png)
]
.col-5[Find the mean and standard error of the mean of the
sampling distribution. Then sketch a graph of the sampling
distribution of sample means.	
]
]
]
---

## Interpreting the Central Limit Theorem
.tip[
.row[.col-5[
`$$\mu_{\bar{x}} = \mu = 135$$`
`$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{18}{\sqrt{4}}= 9$$`

Since the population is normally distributed, the sampling distribution of the sample means is also normally distributed.
]
.col-7[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-21-1.png" width="100%" style="display: block; margin: auto;" />
]]
]
---

## Probability and the Central Limit Theorem
.row[.col-6[
To transform `$\bar{x}$` to a `$z$`-score:
`$$ z = \frac{\text{Value} - \text{Mean}}{\text{Standard Error}} = \frac{\bar{x}-\mu}{\sigma_{\bar{x}}}=\frac{\bar{x}-\mu}{\sigma / \sqrt{n} }$$`
]
.col-6[
.tip[Example  
You randomly select 50 drivers ages 16 to 19. What is the probability that the mean
distance traveled each day is between 19.4 and 22.5 miles? Assume `$\mu = 20.7$` and `$\sigma = 6.5$` miles.

From the Central Limit Theorem (sample size is greater
than 30), the sampling distribution of sample means is
approximately normal with
`$$\mu_{\bar{x}} = \mu = 20.7$$`
`$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{6.5}{\sqrt{50}} \approx 0.92$$`
]]]

---

.tip[
## Probability and the Central Limit Theorem
.row[.col-5[
The z-scores that correspond to sample means of 19.4 and 22.5 miles are
`$$z_1 =\frac{19.4 - 20.7}{6.5/\sqrt{50}} \approx -1.41$$`
`$$z_2 =\frac{22.5 - 20.7}{6.5/\sqrt{50}} \approx 1.96$$`
]
.col-7[
The probability that the mean distance driven each day by the sample of 50 people is between 19.4 and 22.5 miles is
`\begin{align*}
P(19.4 < \bar{x} < 22.5) & = P(-1.41 < z < 1.96)\\
&= P(z<1.96)-P(z< -1.41)\\
&=0.9750 - 0.0793\\
&=0.8957
\end{align*}`
]]]

---

.tip[
## Probability and the Central Limit Theorem
.row[.col-7[
Of all samples of 50 drivers ages 16 to 19, about 90% will have a mean daily driving distance between 19.4 and 22.5 miles, as shown in the graph. This implies that, assuming `$\mu=20.7$` is correct, about 10% of such sample means will lie outside the given interval.
]]
.row[.col-7[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-22-1.png" width="100%" style="display: block; margin: auto;" />
]
.col-5[

```r
(pnorm(1.96,0,1)
 -pnorm(-1.41,0,1)) %>%  
  round(3)
```

```
## [1] 0.896
```

```r
(pnorm(22.5,20.7,6.5/sqrt(50))
 -pnorm(19.4,20.7,6.5/sqrt(50))) %>%  
  round(3)
```

```
## [1] 0.896
```
]]]

---

## Finding probabilities of `$x$` and `$\bar{x}$`
.tip[
.row[.col-7[
Some college students use credit cards to pay for school-related
expenses. For this population, the amount paid is normally
distributed, with a mean of $1615 and a standard deviation of
$550.

1. What is the probability that a randomly selected college
student, who uses a credit card to pay for school-related
expenses, paid less than $1400?
2. You randomly select 25 college students who use credit
cards to pay for school-related expenses. What is the
probability that their mean amount paid is less than $1400?
3. Compare the probabilities from parts 1 and 2.
]]
]
---

## Finding probabilities of `$x$` and `$\bar{x}$`
.tip[
.row[.col-7[
What is the probability that a randomly selected college
student, who uses a credit card to pay for school-related
expenses, paid less than $1400?
	
In this case, you are asked to find the probability for a specific value (or rather range) of `$x$`: `$$P(x<1400)$$`
`$$\mu = 1615, \quad \sigma=550$$`

```r
pnorm(1400,1615,550) %>%  round(3)
```

```
## [1] 0.348
```
]]
]
---

## Finding probabilities of `$x$` and `$\bar{x}$`
.tip[
.row[.col-7[
You randomly select 25 college students who use credit
cards to pay for school-related expenses. What is the
probability that their mean amount paid is less than $1400?

In this case, you are asked to find the probability for the sample mean `$\bar{x}$`: `$$P(\bar{x}<1400)$$`
`$$\mu = 1615, \quad \sigma=550, \quad \sigma_{\bar{x}}=\frac{550}{\sqrt{25}}$$`

```r
pnorm(1400,1615,550/sqrt(25)) %>%  round(3)
```

```
## [1] 0.025
```

The probability that the mean amount paid by the 25 sampled students is less than $1400 is about 2.5%.
]]
]
---

## Finding probabilities of `$x$` and `$\bar{x}$`
.tip[
.row[.col-7[
Although there is about a 35% chance that a college
student who uses a credit card to pay for
school-related expenses will pay less than $1400,
there is only about a 2.5% chance that the mean
amount a sample of 25 college students will pay is
less than $1400.

Because there is only a 2.5% chance
that the mean amount a sample of 25 college
students will pay is less than $1400, this is an
unusual event.
]]
]

---

## Normal Approximation to a Binomial Distribution
.row[.col-7[
If `$np \geq 5$` and `$nq \geq 5$`, then the binomial random
variable `$x$` is approximately normally distributed with

*  mean `$\mu = np$`
*  standard deviation `$\sigma = \sqrt{npq}$`

where `$n$` is the number of independent trials,

`$p$` is the probability of success in a single trial, and

`$q$` is the probability of failure in a single trial.	
]]

---

## Normal Approximation to a Binomial Distribution
.row[.col-7[
Binomial distribution with `$p = 0.25$`, `$q = 1 - 0.25$`, and  
]]
.row[.col-4[
`$n = 4$`,
<img src="06.pdf-3_files/figure-html/unnamed-chunk-26-1.png" width="100%" style="display: block; margin: auto;" />
]
.col-4[
`$n = 10$`,
<img src="06.pdf-3_files/figure-html/unnamed-chunk-27-1.png" width="100%" style="display: block; margin: auto;" />
]]
.row[
.col-4[
`$n = 25$` and
<img src="06.pdf-3_files/figure-html/unnamed-chunk-28-1.png" width="100%" style="display: block; margin: auto;" />
]
.col-4[
`$n = 50$`.
<img src="06.pdf-3_files/figure-html/unnamed-chunk-29-1.png" width="100%" style="display: block; margin: auto;" />
]
.col-4[
As `$n$` increases the histogram approaches a normal curve.
]]
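
How quickly the binomial distribution approaches the normal curve can be quantified by the largest gap between the exact binomial CDF and its normal approximation. This is a sketch, not the code behind the plots; the 0.5 shift is the usual continuity correction, and `max_cdf_error` is an illustrative name.

```r
p <- 0.25
sizes <- c(4, 10, 25, 50)

max_cdf_error <- sapply(sizes, function(n) {
  x <- 0:n
  exact  <- pbinom(x, n, p)                                   # P(X <= x)
  approx <- pnorm(x + 0.5, mean = n * p, sd = sqrt(n * p * (1 - p)))
  max(abs(exact - approx))                                    # worst-case gap
})
names(max_cdf_error) <- paste0("n=", sizes)
round(max_cdf_error, 3)
```

The gap for `$n = 50$` should be clearly smaller than the gap for `$n = 4$`.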

---

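## Approximating a Binomial Distribution
.tip[
.row[.col-7[
Determine whether you can use a normal distribution to
approximate the distribution of `$x$`, the number of people
who reply yes. If you can, find the mean and standard
deviation. If you cannot, explain why.

1. In a survey of 8- to 18-year-old heavy media users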
in the United States, 47% said they get fair or poor
grades (C and below). You randomly select
forty-five 8- to 18-year-old heavy media users in
the United States and ask them whether they get
fair or poor grades.
]]
]
---

## Approximating a Binomial Distribution
.tip[
.row[.col-6[
In this binomial experiment, 
`\begin{align*}
n&=45\\
p&=0.47\\
q&=0.53\\
np&= 45\times 0.47 = 21.15\\
nq&= 45\times 0.53 = 23.85\\
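\sigma &= \sqrt{45\cdot0.47\cdot 0.53} \approx 3.348
\end{align*}`

Because `$np \geq 5$` and `$nq \geq 5$`, a normal distribution with `$\mu = 21.15$` and `$\sigma \approx 3.348$` can be used to approximate the distribution of `$x$`.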

]
.col-6[
.question[What is the probability that fewer than 20 of them respond yes?
]

Normal approximation

```r
pnorm(19.5,21.15,3.348) %>%  round(3)
```

```
## [1] 0.311
```

Exact binomial probability

```r
pbinom(19,45,0.47) %>%  round(3)
```

```
## [1] 0.312
```
]]
]

---

## Approximating a Binomial Distribution
.tip[
.row[.col-7[
Determine whether you can use a normal distribution to
approximate the distribution of `$x$`, the number of people
who reply yes. If you can, find the mean and standard
deviation. If you cannot, explain why.
	
2. In a survey of 8- to 18-year-old light media users in
the United States, 23% said they get fair or poor
grades (C and below). You randomly select twenty
8- to 18-year-old light media users in the United
States and ask them whether they get fair or poor
grades.
]]
]
---

## Approximating a Binomial Distribution
.tip[
.row[.col-6[
In this binomial experiment, 
`\begin{align*}
n&=20\\
p&=0.23& & q=0.77\\
np&= 20\times 0.23 = 4.6&\\
nq&= 20\times 0.77 = 15.4
\end{align*}`

Because `$np < 5$`, you should not use a normal distribution to approximate the distribution of `$x$`.

]
.col-6[

]]
]
---

## Correction for Continuity
.row[.col-6[
A binomial distribution is
discrete and can be represented
by a probability histogram.

To calculate exact binomial
probabilities, the binomial
formula is used for each
value of x and the results
are added.

Geometrically this corresponds
to adding the areas of bars in
the probability histogram.

When you use a continuous
normal distribution to
approximate a binomial
probability, you need to
move 0.5 unit to the left
and right of the midpoint
to include all possible
x-values in the interval
(continuity correction).

]
.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-34-1.png" width="100%" style="display: block; margin: auto;" />
]]

---

## Using a Continuity Correction
.tip[
.row[.col-7[
Use a continuity correction to convert each binomial
quantile to a normal distribution quantile.

The probability of getting between 270 and 310
successes, inclusive.

Solution:  
*  The discrete values are `$270, 271, \ldots, 310$`.
*  The corresponding interval for the continuous normal
distribution is `$269.5 < x < 310.5$`. The normal
distribution probability is `$P(269.5 < x < 310.5)$`.
]]
]

---

## Using a Normal Distribution to Approximate Binomial Probabilities
.row[.col-7[
1. Verify that the binomial distribution applies.  
  Specify `$n$`, `$p$`, and `$q$`.
2. Determine if you can use the normal distribution to approximate `$x$`, the binomial
variable.  
  Is `$np \geq 5$`? Is `$nq \geq 5$`?
3. Find the mean `$\mu$` and standard deviation `$\sigma$` for the
distribution.  
`$\mu = np$`, `$\sigma = \sqrt{npq}$`
4. Apply the appropriate continuity correction.
Shade the corresponding area under the normal curve.  
Add 0.5 to (or subtract 0.5 from) the quantile.
5. Find the probability.
]]
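
The five steps can be collected into one small function. The sketch below is one possible arrangement; `normal_approx_binom()` and its arguments are hypothetical names, not part of base R. The usage lines apply it to a binomial variable with `$n = 45$` and `$p = 0.47$`.

```r
# P(lower <= X <= upper) for X ~ Binomial(n, p) via the normal approximation.
# normal_approx_binom() is a hypothetical helper, not a base R function.
normal_approx_binom <- function(n, p, lower = 0, upper = n) {
  q <- 1 - p
  # Step 2: check that the approximation is appropriate
  if (n * p < 5 || n * q < 5) {
    stop("np and nq must both be at least 5")
  }
  # Step 3: mean and standard deviation
  mu    <- n * p
  sigma <- sqrt(n * p * q)
  # Steps 4 and 5: continuity correction (widen by 0.5 on each side),
  # then the area under the normal curve
  pnorm(upper + 0.5, mu, sigma) - pnorm(lower - 0.5, mu, sigma)
}

# Fewer than 20 successes out of 45 with p = 0.47, i.e. X <= 19
normal_approx_binom(45, 0.47, upper = 19)
pbinom(19, 45, 0.47)   # exact binomial probability for comparison
```

Stopping with an error when `$np < 5$` or `$nq < 5$` is a design choice of this sketch; returning `NA` with a warning would work equally well.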

---

## Approximating a Binomial Probability
.tip[
.row[.col-6[
In a survey of 8- to 18-year-old heavy media users in
the United States, 47% said they get fair or poor grades
(C and below). You randomly select forty-five 8- to
18-year-old heavy media users in the United States and
ask them whether they get fair or poor grades. What is
the probability that fewer than 20 of them respond yes?

`$\mu = 45\times 0.47 = 21.15$`,  
`$\sigma=\sqrt{45\times 0.47\times 0.53} \approx 3.35$`

Apply the continuity correction (note the probability is for *fewer than 20*):

`$P(x<20-0.5)=P(x<19.5)$`
]

.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-35-1.png" width="80%" style="display: block; margin: auto;" />
Normal approximation

```r
pnorm(19.5,21.15,3.35) %>%  round(3)
```

```
## [1] 0.311
```
Exact binomial

```r
pbinom(19,45,0.47) %>%  round(3)
```

```
## [1] 0.312
```
]
]
]
---

## Approximating a Binomial Probability
.tip[
.row[.col-6[
A study on aggressive driving found that 47% of drivers
say they have yelled at another driver. You randomly
select 200 drivers in the United States and ask them
whether they have yelled at another driver. What is the
probability that at least 100 drivers will say yes, they
have yelled at another driver?
	
`$\mu = 200\times 0.47 = 94$`, `$\sigma=\sqrt{200\times 0.47\times 0.53} \approx 7.06$`
	
Apply the continuity correction (note the probability is for *at least 100*):  
`$P(x\geq 100 -0.5)=P(x> 99.5)$`
]
.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-38-1.png" width="80%" style="display: block; margin: auto;" />
Normal approximation

```r
(1-pnorm(99.5,94,7.06)) %>%  round(3)
```

```
## [1] 0.218
```
Exact binomial

```r
(1-pbinom(99,200,0.47)) %>%  round(3)
```

```
## [1] 0.218
```
]
]
]
---

## Sampling Distribution of a Proportion
.row[.col-7[
The sample proportion is an estimator of a population proportion `$p$` of successes.

For the sample proportion, we count the number of successes in a sample and compute:	
`$$\hat{P}=\frac{x}{n}$$`
With `$x$` the number of successes and `$n$` the number of trials, i.e. the sample size.
]]

---

## Sampling Distribution of a Proportion
.row[.col-7[
Using the laws of expected value and variance, we can determine the mean, variance, and standard deviation of `$\hat{P}$`:

`\begin{align*}
E(\hat{P}) &= p\\
V(\hat{P}) &= \sigma^2_{\hat{P}} = \frac{p(1-p)}{n}\\
\sigma_{\hat{P}} &= \sqrt{\frac{p(1-p)}{n}}
\end{align*}`
]
.col-5[
The standard error of `$\hat{P}$` is called the standard error of the proportion.

`$\hat{P}$` is approximately normally distributed, thus sample proportions can be standardized to a standard normal distribution:
`$$Z=\frac{\hat{P}-p}{\sqrt{p(1-p)/n}}$$`
]]
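
The mean and standard error of `$\hat{P}$` can be checked with a short simulation. The values `$n = 300$` and `$p = 0.52$`, the seed, and the 10,000 repetitions are assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 300
p <- 0.52    # assumed population proportion

# 10,000 simulated sample proportions: successes divided by sample size
p_hat <- rbinom(10000, size = n, prob = p) / n

c(mean(p_hat), p)                     # E(P-hat) compared with p
c(sd(p_hat), sqrt(p * (1 - p) / n))   # sd of P-hat compared with the standard error
```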

---

## Example: Sampling Distribution of a Proportion
.tip[
.row[.col-7[
In the last election a state representative received 52% of
the votes cast.

One year after the election the representative organized a
survey that asked a random sample of 300 people
whether they would vote for him in the next election.

If we assume that his popularity has not changed, what is
the probability that more than half of the sample would
vote for him?
]]
]
---

## Sampling Distribution of a Proportion
.tip[
.row[.col-5[
The number of respondents who would vote for the representative is a binomial variable with `$n=300$` and `$p=0.52$`.

We want to determine the probability that the sample proportion is greater than 50%: `$P(\hat{P}> 0.5)$`

Exact binomial

```r
(1-pbinom(150,300,0.52)) %>% round(3)
```

```
## [1] 0.738
```
]
.col-7[
Standard Normal approximation

`\begin{align*}
\sigma_{\hat{P}}&= \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.52\times 0.48}{300}}\\
&= 0.0288\\
P(\hat{P}> 0.5) &= P\left(\frac{\hat{P}-p}{\sqrt{p(1-p)/n}} > \frac{0.50-0.52}{0.0288}\right) \\
&= P(Z > -0.69) \approx 0.76
\end{align*}`

```r
(1-pnorm(-0.02/(sqrt(0.52*0.48/300)),0,1)) %>% 
  round(3)
```

```
## [1] 0.756
```

```r
(1-pnorm((150.5/300-.52)/(sqrt(0.52*0.48/300)),0,1)) %>% 
  round(3)
```

```
## [1] 0.737
```

]]]