class: middle, title-slide

# Probability Distributions and the Central Limit Theorem III
## Sampling and the Central Limit Theorem
### Dennis A. V. Dittrich
### 2021

---
layout: true

<div class="my-footer">
<span><img src="img/tcb-logo.png" height="40px"></span>
</div>

---
## Parameter and Sample Statistics

.row[.col-7[
**Parameter**

A measure computed from the entire population. As long as the population does not change, the value of the parameter will not change.

**Simple Random Sample**

A sample selected in such a manner that each possible sample of a given size has an equal chance of being selected.

**Sampling distribution**

The probability distribution of a sample statistic that is formed when random samples of size `\(n\)` are repeatedly taken from a population. If the **sample statistic** is the sample mean, then the distribution is the sampling distribution of sample means.
]]

---
## Sampling Error

.row[.col-7[
The difference between a measure computed from a sample (a statistic) and the corresponding measure computed from the population (a parameter) is the **sampling error**.

* The size of the sampling error depends on which sample is selected.
* The sampling error may be positive or negative.
* There is potentially a different `\(\bar{x}\)` for each sample.
]]

---
.tip[
## Example: Sampling Error

.row[.col-7[
It is known that the population mean age for employees working at a major automaker is 44.5 years. Suppose a random sample of 30 employees is selected and the sample mean age for these employees is 38 years. Calculate the sampling error.

Sampling error: `\(\bar{x}-\mu = 38-44.5=-6.5\)` years

The sample mean age is 6.5 years less than the population mean age.
]]]

---
## The Role of Sample Size

.row[.col-7[
A real estate development company builds office buildings. They have built a total of `\(N = 12\)` buildings. The number of square feet in each building is shown as follows:

![](img/samp1.png)

`\(\mu = 158972\)` sq. feet
]]

---
## The Role of Sample Size

.row[.col-7[
A potential customer plans to select a sample of `\(n = 5\)` office complexes from the 12. The number of possible samples of size `\(n = 5\)` is:
]]

.row[.col-7[
`$$C_5^{12}= \binom{12}{5} = \frac{12!}{5!(12-5)!} = 792$$`

These 792 samples could yield many possible sample means and thus result in many possible sampling error amounts. We might be interested in the samples that provide the smallest and largest sample means.
]
.col-5[
```r
choose(12,5)
```

```
## [1] 792
```
]]

---
## The Role of Sample Size

.row[.col-6[
The five smallest office complexes in terms of square feet are:

![](img/samp2.png)

`\begin{align*} \bar{x} &= 108232 \text{ sq. feet}\\ \bar{x}-\mu &= 108232-158972\\ &= -50740 \end{align*}`
]
.col-6[
The five largest office complexes in terms of square feet are:

![](img/samp3.png)

`\begin{align*} \bar{x} &= 210900 \text{ sq. feet}\\ \bar{x}-\mu &= 210900-158972\\ &= 51928 \end{align*}`
]]

---
## The Role of Sample Size

.row[.col-7[
The range of sampling error for samples with `\(n = 5\)`: `\(-50740\)` to `\(51928\)`.

What happens to the range of sampling error if the sample size is reduced to `\(n = 3\)`? It widens to `\(-65185\)` to `\(81128\)`.

![](img/samp4.png)

* The range of potential sampling error is smaller for larger sample sizes.
* The potential for extreme sampling error is reduced when using larger sample sizes.
]]

---
## Sampling Distribution Simulation

.row[.col-7[
Let `\(X\)` be a random sample and let `\(T = h(X)\)` denote some statistic. The sampling distribution of `\(T\)` is its probability distribution.

```r
x <- sample(1:6, 1000, replace=TRUE) # pop avg: 3.5
mean(x)
```

```
## [1] 3.564
```

The standard deviation of a sampling distribution is called the standard error.
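
The population values quoted in the code comments (mean 3.5, variance about 2.92, standard deviation about 1.71) follow from the fact that each face of a fair die has probability 1/6. A minimal sketch that computes them exactly rather than by simulation; the variable names are only illustrative:

```r
faces  <- 1:6                        # outcomes of a fair die
prob   <- rep(1/6, 6)                # each face is equally likely
mu     <- sum(faces * prob)          # population mean
sigma2 <- sum((faces - mu)^2 * prob) # population variance (35/12)
sigma  <- sqrt(sigma2)               # population standard deviation
round(c(mean = mu, var = sigma2, sd = sigma), 2)
```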
```r
# pop var = 2.92, sd = 1.71
sd(x)
```

```
## [1] 1.728262
```

Here `x` holds 1000 single rolls, so `mean(x)` and `sd(x)` estimate the population mean (3.5) and standard deviation (1.71). For samples of size one, the standard error equals the population standard deviation.
]
.col-5[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-5-1.png" width="98%" style="display: block; margin: auto;" />
]
]

---
## Sampling, Permutations, Combinations

.row[.col-6[
When we draw `\(k\)` elements from a set of `\(n\)` elements, we can count the number of different outcomes:
]
.col-6[
Drawing 2 elements out of a set with 6 elements with replacement = throwing 2 dice
]]

| Sampling method | Count of outcomes | Throwing 2 dice | Count of outcomes |
|---|---|---|---:|
| Ordered sampling with replacement | `\(n^k\)` | Two different dice | `\(6^2 = 36\)` |
| Ordered sampling without replacement | `\(P^n_k=\frac{n!}{(n-k)!}\)` | ? | `\(\frac{6!}{(6-2)!}=\)` 30 |
| Unordered sampling without replacement | `\(C^n_k=\binom{n}{k}=\frac{n!}{k!(n-k)!}\)` | ? | `\(\binom{6}{2}=\)` 15 |
| Unordered sampling with replacement | `\(\binom{n+k-1}{k}\)` | Two identical dice | `\(\binom{6+2-1}{2}=\)` 21 |

---
### How to compute the number of
### Permutations and Combinations in R

.row[
.col-7[
Ordered Permutation (with replacement): `\(n^k\)`
]
.col-5[
```r
n <- 6; k <- 2
n^k
```

```
## [1] 36
```
]
]

.row[
.col-7[
Ordered Permutation (without replacement): `\(k\)`-permutations of `\(n\)`

`\(P^n_k=\frac{n!}{(n-k)!}\)`
]
.col-5[
```r
factorial(n)/factorial(n-k)
```

```
## [1] 30
```
]
]

.row[
.col-7[
Unordered combinations (without replacement): `\(k\)`-combinations of an `\(n\)`-set

`\(C^n_k = \frac{P^n_k}{P^k_k} =\frac{n!}{k!(n-k)!} =\binom{n}{k}\)`
]
.col-5[
```r
choose(n,k)
```

```
## [1] 15
```
]
]

.row[
.col-7[
Unordered combinations (with replacement)

`\(C^{n+k-1}_k = \binom{n+k-1}{k}\)`
]
.col-5[
```r
choose(n+k-1,k)
```

```
## [1] 21
```
]
]

---
## Simulating throwing two dice

.row[
.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-10-1.png" width="80%" style="display: block; margin: auto;" />
]
.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-11-1.png" width="80%" style="display: block; margin: auto;" />

```r
mean(Xbar)
```

```
## [1] 3.50485
```

```r
sd(Xbar)
```

```
## [1] 1.204884
```

Notice that the sd of the means (about 1.20) is smaller than the sd of the population (1.71); it is close to `\(1.71/\sqrt{2} \approx 1.21\)`.
]]

---
## Throwing 5, 10, or 25 dice

.row[
.col-4[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-13-1.png" width="98%" style="display: block; margin: auto;" />
]
.col-4[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-14-1.png" width="98%" style="display: block; margin: auto;" />
]
.col-4[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-15-1.png" width="98%" style="display: block; margin: auto;" />
]
]

.row[
.col-7[
Notice that the distribution of the mean becomes more concentrated the more dice you throw, i.e., the more elements are sampled.
]
]

---
## Properties of Sampling Distributions
## of Sample Means

.row[.col-7[
1. The mean of the sample means, `\(\mu_{\bar{x}}\)`, is equal to the population mean `\(\mu\)`.
   `$$\mu_{\bar{x}}=\mu$$`
2. The standard deviation of the sample means, `\(\sigma_{\bar{x}}\)`, is equal to the population standard deviation, `\(\sigma\)`, divided by the square root of the sample size, `\(n\)`.
   `$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$`
   This is called the **standard error of the mean**. The larger the sample, the smaller the standard error of the mean.
]]

---
.row[.col-7[
## The Central Limit Theorem

If the population itself is normally distributed, the sampling distribution of the sample means is normally distributed for any sample size `\(n\)`.
]
.col-5[
![](img/clt2.png)
]]

.row[.col-7[
## The Central Limit Theorem

If samples of sufficient size (`\(n \geq 30\)` is often considered sufficient) are drawn from any population with mean `\(\mu\)` and standard deviation `\(\sigma\)`, then the sampling distribution of the sample means approximates a normal distribution. The greater the sample size, the better the approximation.
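
A short simulation can illustrate the theorem. The sketch below assumes an exponential population with rate 1 (so `\(\mu = \sigma = 1\)`) and an arbitrary choice of 10,000 samples of size `\(n = 30\)`; it compares the mean and standard deviation of the simulated sample means with `\(\mu\)` and `\(\sigma/\sqrt{n}\)`.

```r
set.seed(1)                       # reproducible draws
n    <- 30                        # sample size
reps <- 10000                     # number of simulated samples
rate <- 1                         # exponential population: mean = sd = 1/rate

# each row is one sample of size n; rowMeans gives one sample mean per row
samples <- matrix(rexp(n * reps, rate = rate), nrow = reps)
xbar    <- rowMeans(samples)

mean(xbar)                        # close to mu = 1
sd(xbar)                          # close to sigma / sqrt(n), about 0.183
```

Plotting `hist(xbar)` should show a roughly bell-shaped histogram with only a slight right skew left over from the strongly skewed population.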
]
.col-5[
![](img/clt1.png)
]]

---
## The Central Limit Theorem

.row[.col-7[
In either case, the sampling distribution of sample means has a mean equal to the population mean.

`$$\text{Mean of the sample means } \mu_{\bar{x}} = \mu$$`

The sampling distribution of sample means has a variance equal to `\(1/n\)` times the variance of the population and a standard deviation equal to the population standard deviation divided by the square root of `\(n\)`.
]
.col-5[
**Variance of the sample means**

`$$\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$$`

**Standard deviation of the sample means** (Standard error of the mean)

`$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$`
]]

---
### Sampling Distribution of the Mean of Any Population

Example: exponential distribution

.row[.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-16-1.png" width="80%" style="display: block; margin: auto;" />
]
.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-17-1.png" width="80%" style="display: block; margin: auto;" />
]]

.row[.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-18-1.png" width="80%" style="display: block; margin: auto;" />
]
.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-19-1.png" width="80%" style="display: block; margin: auto;" />
]]

---
## Interpreting the Central Limit Theorem

.tip[
.row[.col-7[
A study analyzed the sleep habits of college students. The study found that the mean sleep time was 6.8 hours, with a standard deviation of 1.4 hours. Random samples of 100 sleep times are drawn from this population, and the mean of each sample is determined.
]]

.row[.col-7[
![](img/clt3.png)
]
.col-5[
Find the mean and standard deviation of the sampling distribution of sample means. Then sketch a graph of the sampling distribution.
]]
]

---
## Interpreting the Central Limit Theorem

.tip[
.row[.col-5[
`$$\mu_{\bar{x}} = \mu = 6.8$$`

`$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.4}{\sqrt{100}}= 0.14$$`

Since the sample size is greater than 30, the sampling distribution can be approximated by a normal distribution with a mean of 6.8 hours and a standard deviation of 0.14 hours.
]
.col-7[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-20-1.png" width="100%" style="display: block; margin: auto;" />
]]
]

---
## Interpreting the Central Limit Theorem

.tip[
.row[.col-7[
The training heart rates of all 20-year-old athletes are normally distributed, with a mean of 135 beats per minute and a standard deviation of 18 beats per minute. Random samples of size 4 are drawn from this population, and the mean of each sample is determined.
]]

.row[.col-7[
![](img/clt5.png)
]
.col-5[
Find the mean and standard error of the mean of the sampling distribution. Then sketch a graph of the sampling distribution of sample means.
]
]
]

---
## Interpreting the Central Limit Theorem

.tip[
.row[.col-5[
`$$\mu_{\bar{x}} = \mu = 135$$`

`$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{18}{\sqrt{4}}= 9$$`

Since the population is normally distributed, the sampling distribution of the sample means is also normally distributed, even though the sample size `\(n = 4\)` is small.
]
.col-7[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-21-1.png" width="100%" style="display: block; margin: auto;" />
]]
]

---
## Probability and the Central Limit Theorem

.row[.col-6[
To transform a sample mean `\(\bar{x}\)` to a `\(z\)`-score:

$$ z = \frac{\text{Value - Mean}}{\text{Standard Error}} = \frac{\bar{x}-\mu}{\sigma_{\bar{x}}}=\frac{\bar{x}-\mu}{\sigma / \sqrt{n} }$$
]
.col-6[
.tip[Example

You randomly select 50 drivers ages 16 to 19. What is the probability that the mean distance traveled each day is between 19.4 and 22.5 miles? Assume `\(\mu = 20.7\)` and `\(\sigma = 6.5\)` miles.
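
The standardization can be wrapped in a small helper; `z_xbar()` is a hypothetical name used only for this sketch. Applied to the driver example, it gives z-scores of about -1.41 and 1.96 and a probability of about 0.896.

```r
# z-score of a sample mean: (xbar - mu) / (sigma / sqrt(n))
z_xbar <- function(xbar, mu, sigma, n) {
  (xbar - mu) / (sigma / sqrt(n))
}

z1 <- z_xbar(19.4, mu = 20.7, sigma = 6.5, n = 50)  # about -1.41
z2 <- z_xbar(22.5, mu = 20.7, sigma = 6.5, n = 50)  # about  1.96
pnorm(z2) - pnorm(z1)                               # about 0.896
```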
From the Central Limit Theorem (the sample size is greater than 30), the sampling distribution of sample means is approximately normal with

`$$\mu_{\bar{x}} = \mu = 20.7$$`

`$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{6.5}{\sqrt{50}} \approx 0.92$$`
]]]

---
.tip[
## Probability and the Central Limit Theorem

.row[.col-5[
The z-scores that correspond to sample means of 19.4 and 22.5 miles are

`$$z_1 =\frac{19.4 - 20.7}{6.5/\sqrt{50}} \approx -1.41$$`

`$$z_2 =\frac{22.5 - 20.7}{6.5/\sqrt{50}} \approx 1.96$$`
]
.col-7[
The probability that the mean distance driven each day by the sample of 50 people is between 19.4 and 22.5 miles is

`\begin{align*} P(19.4 < \bar{x} < 22.5) & = P(-1.41 < z < 1.96)\\ &= P(z<1.96)-P(z< -1.41)\\ &=0.9750 - 0.0793\\ &=0.8957 \end{align*}`
]]]

---
.tip[
## Probability and the Central Limit Theorem

.row[.col-7[
Of all samples of 50 drivers ages 16 to 19, about 90% will drive a mean distance each day between 19.4 and 22.5 miles, as shown in the graph. This implies that, assuming `\(\mu=20.7\)` is correct, about 10% of such samples will lie outside the given interval.
]]

.row[.col-7[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-22-1.png" width="100%" style="display: block; margin: auto;" />
]
.col-5[
```r
(pnorm(1.96,0,1) -pnorm(-1.41,0,1)) %>% round(3)
```

```
## [1] 0.896
```

```r
(pnorm(22.5,20.7,6.5/sqrt(50)) -pnorm(19.4,20.7,6.5/sqrt(50))) %>% round(3)
```

```
## [1] 0.896
```
]]]

---
## Finding probabilities of `\(x\)` and `\(\bar{x}\)`

.tip[
.row[.col-7[
Some college students use credit cards to pay for school-related expenses. For this population, the amount paid is normally distributed, with a mean of $1615 and a standard deviation of $550.

1. What is the probability that a randomly selected college student, who uses a credit card to pay for school-related expenses, paid less than $1400?
2. You randomly select 25 college students who use credit cards to pay for school-related expenses. What is the probability that their mean amount paid is less than $1400?
3. Compare the probabilities from parts 1 and 2.
]]
]

---
## Finding probabilities of `\(x\)` and `\(\bar{x}\)`

.tip[
.row[.col-7[
What is the probability that a randomly selected college student, who uses a credit card to pay for school-related expenses, paid less than $1400?

In this case, you are asked to find the probability for a range of values of a single observation `\(x\)`:

`$$P(x<1400)$$`

`$$\mu = 1615, \quad \sigma=550$$`

```r
pnorm(1400,1615,550) %>% round(3)
```

```
## [1] 0.348
```
]]
]

---
## Finding probabilities of `\(x\)` and `\(\bar{x}\)`

.tip[
.row[.col-7[
You randomly select 25 college students who use credit cards to pay for school-related expenses. What is the probability that their mean amount paid is less than $1400?

In this case, you are asked to find the probability for the sample mean `\(\bar{x}\)`:

`$$P(\bar{x}<1400)$$`

`$$\mu = 1615, \quad \sigma=550, \quad \sigma_{\bar{x}}=\frac{550}{\sqrt{25}} = 110$$`

```r
pnorm(1400,1615,550/sqrt(25)) %>% round(3)
```

```
## [1] 0.025
```

The probability that the mean amount paid by the 25 sampled students is less than $1400 is about 2.5%.
]]
]

---
## Finding probabilities of `\(x\)` and `\(\bar{x}\)`

.tip[
.row[.col-7[
Although there is about a 35% chance that a college student who uses a credit card to pay for school-related expenses will pay less than $1400, there is only about a 2.5% chance that the mean amount paid by a sample of 25 such students is less than $1400.

Because this probability is so small, a sample mean below $1400 would be an unusual event.
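
A simulation can make the comparison concrete. The sketch assumes, as the example states, a normal population with mean $1615 and standard deviation $550; the seed and the number of repetitions are arbitrary choices. It estimates both probabilities from simulated individual amounts and simulated samples of 25.

```r
set.seed(42)                      # reproducible draws
mu   <- 1615; sigma <- 550        # population parameters from the example
n    <- 25                        # sample size for part 2
reps <- 100000                    # number of simulated students / samples

x    <- rnorm(reps, mu, sigma)                        # individual students
xbar <- rowMeans(matrix(rnorm(n * reps, mu, sigma),   # one sample of 25 per row
                        nrow = reps))

mean(x < 1400)      # close to pnorm(1400, 1615, 550), about 0.348
mean(xbar < 1400)   # close to pnorm(1400, 1615, 550 / sqrt(25)), about 0.025
```

The spread of `xbar` is much smaller than the spread of `x`, which is why the same cutoff of $1400 is far less likely for a sample mean.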
]]
]

---
## Normal Approximation to a Binomial Distribution

.row[.col-7[
If `\(np \geq 5\)` and `\(nq \geq 5\)`, then the binomial random variable `\(x\)` is approximately normally distributed with

* mean `\(\mu = np\)`
* standard deviation `\(\sigma = \sqrt{npq}\)`

where `\(n\)` is the number of independent trials, `\(p\)` is the probability of success in a single trial, and `\(q = 1 - p\)` is the probability of failure in a single trial.
]]

---
## Normal Approximation to a Binomial Distribution

.row[.col-7[
Binomial distribution with `\(p = 0.25\)`, `\(q = 1 - 0.25 = 0.75\)`, and
]]

.row[.col-4[
`\(n = 4\)`,
<img src="06.pdf-3_files/figure-html/unnamed-chunk-26-1.png" width="100%" style="display: block; margin: auto;" />
]
.col-4[
`\(n = 10\)`,
<img src="06.pdf-3_files/figure-html/unnamed-chunk-27-1.png" width="100%" style="display: block; margin: auto;" />
]]

.row[
.col-4[
`\(n = 25\)`, and
<img src="06.pdf-3_files/figure-html/unnamed-chunk-28-1.png" width="100%" style="display: block; margin: auto;" />
]
.col-4[
`\(n = 50\)`.
<img src="06.pdf-3_files/figure-html/unnamed-chunk-29-1.png" width="100%" style="display: block; margin: auto;" />
]
.col-4[
As `\(n\)` increases, the histogram approaches a normal curve.
]]

---
## Approximating a Binomial Distribution

.tip[
.row[.col-7[
Determine whether you can use a normal distribution to approximate the distribution of `\(x\)`, the number of people who reply yes. If you can, find the mean and standard deviation. If you cannot, explain why.

1. In a survey of 8- to 18-year-old heavy media users in the United States, 47% said they get fair or poor grades (C and below). You randomly select forty-five 8- to 18-year-old heavy media users in the United States and ask them whether they get fair or poor grades.
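
The check asked for here can be sketched as a small function. `normal_ok()` is a hypothetical helper for illustration only; it reports whether both `\(np \geq 5\)` and `\(nq \geq 5\)` hold, together with `\(\mu = np\)` and `\(\sigma = \sqrt{npq}\)`.

```r
# Check the rule of thumb np >= 5 and nq >= 5 and report mean and sd
normal_ok <- function(n, p) {
  q <- 1 - p
  list(ok    = n * p >= 5 && n * q >= 5,
       mu    = n * p,
       sigma = sqrt(n * p * q))
}

normal_ok(45, 0.47)   # heavy media users: ok is TRUE, mu = 21.15, sigma about 3.35
normal_ok(20, 0.23)   # light media users: ok is FALSE, since np = 4.6 < 5
```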
]]
]

---
## Approximating a Binomial Distribution

.tip[
.row[.col-6[
In this binomial experiment,

`\begin{align*} n&=45\\ p&=0.47\\ q&=0.53\\ np&= 45\times 0.47 = 21.15\\ nq&= 45\times 0.53 = 23.85\\ \sigma &= \sqrt{45\cdot 0.47\cdot 0.53} \approx 3.348 \end{align*}`

Because `\(np \geq 5\)` and `\(nq \geq 5\)`, you can use a normal distribution with `\(\mu = 21.15\)` and `\(\sigma \approx 3.35\)` to approximate the distribution of `\(x\)`.

<img src="06.pdf-3_files/figure-html/unnamed-chunk-30-1.png" width="80%" style="display: block; margin: auto;" />
]
.col-6[
.question[What is the probability that fewer than 20 of them respond yes?]

Normal approximation

```r
pnorm(19.5,21.15,3.348) %>% round(3)
```

```
## [1] 0.311
```

Exact binomial probability

```r
pbinom(19,45,0.47) %>% round(3)
```

```
## [1] 0.312
```
]]
]

---
## Approximating a Binomial Distribution

.tip[
.row[.col-7[
Determine whether you can use a normal distribution to approximate the distribution of `\(x\)`, the number of people who reply yes. If you can, find the mean and standard deviation. If you cannot, explain why.

2. In a survey of 8- to 18-year-old light media users in the United States, 23% said they get fair or poor grades (C and below). You randomly select twenty 8- to 18-year-old light media users in the United States and ask them whether they get fair or poor grades.
]]
]

---
## Approximating a Binomial Distribution

.tip[
.row[.col-6[
In this binomial experiment,

`\begin{align*} n&=20\\ p&=0.23\\ q&=0.77\\ np&= 20\times 0.23 = 4.6\\ nq&= 20\times 0.77 = 15.4 \end{align*}`

Because `\(np < 5\)`, you should not use a normal distribution to approximate the distribution of `\(x\)`.
]
.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-33-1.png" width="80%" style="display: block; margin: auto;" />
]]
]

---
## Correction for Continuity

.row[.col-6[
A binomial distribution is discrete and can be represented by a probability histogram. To calculate exact binomial probabilities, the binomial formula is used for each value of `\(x\)` and the results are added. Geometrically, this corresponds to adding the areas of the bars in the probability histogram.
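
In R, the bar-adding view can be checked directly: summing the individual binomial probabilities from `dbinom()` gives the same cumulative probability as `pbinom()`. The numbers (45 trials, `\(p = 0.47\)`, fewer than 20 successes) are borrowed from the heavy media example.

```r
n <- 45; p <- 0.47

bars  <- dbinom(0:19, size = n, prob = p)  # bar heights (= areas) for x = 0, ..., 19
exact <- sum(bars)                         # P(x <= 19) as a sum of bar areas
exact
pbinom(19, size = n, prob = p)             # same value from the cumulative function
```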
When you use a continuous normal distribution to approximate a binomial probability, you need to move 0.5 unit to the left and right of the midpoint to include all possible `\(x\)`-values in the interval. This adjustment is called the continuity correction.
]
.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-34-1.png" width="100%" style="display: block; margin: auto;" />
]]

---
## Using a Continuity Correction

.tip[
.row[.col-7[
Use a continuity correction to convert each binomial quantile to a normal distribution quantile.

The probability of getting between 270 and 310 successes, inclusive.

Solution:

* The discrete values are `\(270, 271, \ldots, 310\)`.
* The corresponding interval for the continuous normal distribution is `\(269.5 < x < 310.5\)`. The normal distribution probability is `\(P(269.5 < x < 310.5)\)`.
]]
]

---
## Using a Normal Distribution to Approximate Binomial Probabilities

.row[.col-7[
1. Verify that the binomial distribution applies. Specify `\(n\)`, `\(p\)`, and `\(q\)`.
2. Determine if you can use the normal distribution to approximate `\(x\)`, the binomial variable. Is `\(np \geq 5\)`? Is `\(nq \geq 5\)`?
3. Find the mean `\(\mu\)` and standard deviation `\(\sigma\)` for the distribution: `\(\mu = np\)`, `\(\sigma = \sqrt{npq}\)`.
4. Apply the appropriate continuity correction: add 0.5 to (or subtract 0.5 from) the quantile. Shade the corresponding area under the normal curve.
5. Find the probability.
]]

---
## Approximating a Binomial Probability

.tip[
.row[.col-6[
In a survey of 8- to 18-year-old heavy media users in the United States, 47% said they get fair or poor grades (C and below). You randomly select forty-five 8- to 18-year-old heavy media users in the United States and ask them whether they get fair or poor grades. What is the probability that fewer than 20 of them respond yes?
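
The five steps can be collected into one function. This is only a sketch: `approx_binom()` and its `type` argument are hypothetical names, and it covers just the two cases that appear in these examples, *fewer than k* and *at least k*.

```r
# Normal approximation to a binomial probability with continuity correction
approx_binom <- function(n, p, k, type = c("fewer", "atleast")) {
  type  <- match.arg(type)
  q     <- 1 - p
  stopifnot(n * p >= 5, n * q >= 5)        # step 2: is the approximation ok?
  mu    <- n * p                           # step 3: mean
  sigma <- sqrt(n * p * q)                 #         and standard deviation
  # step 4: continuity correction; x < k becomes x < k - 0.5,
  #         and x >= k becomes x > k - 0.5
  if (type == "fewer") {
    pnorm(k - 0.5, mu, sigma)                      # step 5: P(x < k)
  } else {
    pnorm(k - 0.5, mu, sigma, lower.tail = FALSE)  # step 5: P(x >= k)
  }
}

approx_binom(45, 0.47, 20, "fewer")     # about 0.311; exact: pbinom(19, 45, 0.47)
approx_binom(200, 0.47, 100, "atleast") # about 0.218; exact: 1 - pbinom(99, 200, 0.47)
```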
`\(\mu = 45\times 0.47 = 21.15\)`, `\(\sigma=\sqrt{45\times 0.47\times 0.53} \approx 3.35\)`

Apply the continuity correction (note the probability is for *fewer than 20*, i.e., at most 19): `\(P(x<20) = P(x \leq 19) \approx P(x<19.5)\)`
]
.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-35-1.png" width="80%" style="display: block; margin: auto;" />

Normal approximation

```r
pnorm(19.5,21.15,3.35) %>% round(3)
```

```
## [1] 0.311
```

Exact binomial

```r
pbinom(19,45,0.47) %>% round(3)
```

```
## [1] 0.312
```
]
]
]

---
## Approximating a Binomial Probability

.tip[
.row[.col-6[
A study on aggressive driving found that 47% of drivers say they have yelled at another driver. You randomly select 200 drivers in the United States and ask them whether they have yelled at another driver. What is the probability that at least 100 drivers will say yes, they have yelled at another driver?

`\(\mu = 200\times 0.47 = 94\)`, `\(\sigma=\sqrt{200\times 0.47\times 0.53} \approx 7.06\)`

Apply the continuity correction (note the probability is for *at least 100*): `\(P(x\geq 100) \approx P(x > 99.5)\)`
]
.col-6[
<img src="06.pdf-3_files/figure-html/unnamed-chunk-38-1.png" width="80%" style="display: block; margin: auto;" />

Normal approximation

```r
# parentheses needed: %>% binds more tightly than the minus sign
(1 - pnorm(99.5,94,7.06)) %>% round(3)
```

```
## [1] 0.218
```

Exact binomial

```r
(1 - pbinom(99,200,0.47)) %>% round(3)
```

```
## [1] 0.218
```
]
]
]

---
## Sampling Distribution of a Proportion

.row[.col-7[
The sample proportion is an estimator of a population proportion `\(p\)` of successes. For the sample proportion, we count the number of successes in a sample and compute

`$$\hat{P}=\frac{x}{n}$$`

where `\(x\)` is the number of successes and `\(n\)` is the number of trials, i.e., the sample size.
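
A simulation sketch shows what the sampling distribution of `\(\hat{P}\)` looks like. It borrows `\(n = 300\)` and `\(p = 0.52\)` from the election example; the seed and the number of repetitions are arbitrary. Each simulated sample is summarized by its proportion of successes.

```r
set.seed(7)                          # reproducible draws
n    <- 300; p <- 0.52               # sample size and population proportion
reps <- 100000                       # number of simulated samples

p_hat <- rbinom(reps, size = n, prob = p) / n   # one sample proportion per sample

mean(p_hat)                          # close to p = 0.52
sd(p_hat)                            # close to sqrt(p * (1 - p) / n), about 0.0288
```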
]]

---
## Sampling Distribution of a Proportion

.row[.col-7[
Using the laws of expected value and variance, we can determine the mean, variance, and standard deviation of `\(\hat{P}\)`:

`\begin{align*} E(\hat{P}) &= p\\ V(\hat{P}) &= \sigma^2_{\hat{P}} = \frac{p(1-p)}{n}\\ \sigma_{\hat{P}} &= \sqrt{\frac{p(1-p)}{n}} \end{align*}`
]
.col-5[
The standard deviation of `\(\hat{P}\)` is called the standard error of the proportion.

When `\(np \geq 5\)` and `\(n(1-p) \geq 5\)`, `\(\hat{P}\)` is approximately normally distributed, so sample proportions can be standardized to a standard normal distribution:

`$$Z=\frac{\hat{P}-p}{\sqrt{p(1-p)/n}}$$`
]]

---
## Example: Sampling Distribution of a Proportion

.tip[
.row[.col-7[
In the last election a state representative received 52% of the votes cast. One year after the election the representative organized a survey that asked a random sample of 300 people whether they would vote for him in the next election. If we assume that his popularity has not changed, what is the probability that more than half of the sample would vote for him?
]]
]

---
## Sampling Distribution of a Proportion

.tip[
.row[.col-5[
The number of respondents who would vote for the representative is a binomial variable with `\(n=300\)` and `\(p=0.52\)`. We want to determine the probability that the sample proportion is greater than 50%: `\(P(\hat{P}> 0.5)\)`

Exact binomial

```r
(1 - pbinom(150,300,0.52)) %>% round(3)
```

```
## [1] 0.738
```
]
.col-7[
Standard normal approximation

`\begin{align*} \sigma_{\hat{P}}&= \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.52\times 0.48}{300}}\\ &= 0.0288\\ P(\hat{P}> 0.5) &= P\left(\frac{\hat{P}-p}{\sqrt{p(1-p)/n}} > \frac{0.50-0.52}{0.0288}\right) \\ &= P(Z > -0.69) \approx 0.755 \end{align*}`

```r
(1 - pnorm(-0.02/(sqrt(0.52*0.48/300)),0,1)) %>% round(3)
```

```
## [1] 0.756
```

With a continuity correction (`\(\hat{P} > 0.5\)` means at least 151 successes, so we use `\(x > 150.5\)`), the approximation moves closer to the exact value:

```r
(1 - pnorm((150.5/300-.52)/(sqrt(0.52*0.48/300)),0,1)) %>% round(3)
```

```
## [1] 0.737
```
]]]
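
As a final cross-check, a Monte Carlo sketch (seed and number of repetitions are arbitrary) estimates `\(P(\hat{P} > 0.5)\)` directly by simulating many surveys of 300 voters under the assumption `\(p = 0.52\)`, and compares the estimate with the exact binomial value.

```r
set.seed(2021)                                          # reproducible draws
p_hat <- rbinom(100000, size = 300, prob = 0.52) / 300  # simulated sample proportions

mean(p_hat > 0.5)                              # simulated P(P-hat > 0.5)
pbinom(150, 300, 0.52, lower.tail = FALSE)     # exact P(x > 150) for comparison
```

Using `lower.tail = FALSE` computes the upper tail directly, which avoids the operator-precedence pitfall of writing `1 - ... %>% round(3)`.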