class: middle, title-slide

# Experiment and Survey Design
## Randomization and Measurement
### Dennis A. V. Dittrich
### 2021

---
layout: true

<div class="my-footer"> <span><img src="img/tcb-logo.png" height="40px"></span> </div>

---

## Inference from Sample to Population

.row[.col-6[
**Statistical inference** is a set of techniques for drawing conclusions from a sample to the population the sample came from.

**Estimation** is our favored approach to statistical inference.
]
.col[
A **population** is a usually very large set of people (or other units of observation) about which we are interested in drawing conclusions.

A **sample** is a set of people (or other units of observation) selected from a population.

A **descriptive statistic** is a summary number, such as the sample mean, that tells us about a set of data.

An **inferential statistic**, such as a CI, is calculated from sample data and tells us about the underlying population.
]]

---

## Random Sampling

.row[.col-5[
A **random sample** is chosen from the population such that all population members have the same probability of being chosen, and all sample members are chosen independently.

The observations in a sample are **independent and identically distributed** (iid) if

* each observation comes from the same probability distribution
* all observations are mutually independent
]
.col-7[
Using a **random sample** is usually the best way to obtain a sample likely to be **representative** of the population.

The assumption of iid is important for the Central Limit Theorem. The generalization of **exchangeable random variables** is often sufficient and more easily met.

* A sequence of random variables that are iid is exchangeable.
* An infinite sequence of exchangeable random variables is conditionally independent and identically distributed, given the underlying distributional form. An exchangeable sequence need not itself be unconditionally iid.
This equivalence does not hold for finite sequences of exchangeable random variables.
]]

---

## Sampling

.row[.col-6[
A **convenience sample** is a practically achievable sample from the population.

A **statistical model** is a set of assumptions. Random sampling is one assumption in the statistical model we use for the calculation of inferential statistics.
]
.col-6[
If random sampling is in doubt, we need to judge a sample to be reasonably representative of a population before using inferential statistics to reach a conclusion about that population.
]]

---

## Comparisons

.row[.col-7[
The **independent variable** (IV) is the variable whose values are chosen or manipulated by the researcher. **Levels** are the different values taken by the independent variable, for example, Pen and Laptop. Levels are also called **conditions** or **treatments**.

A **control condition** is the condition that provides a baseline or starting point for a comparison. A control group is a group that experiences the control condition.

The **dependent variable** (DV) is the variable that’s measured in the study and provides the data to be analyzed.
]]

---

## Experimental and Non-Experimental Research

.row[.col-7[
**Experimental research** uses random assignment of participants to groups or conditions, that is, to the different levels of the IV that is being manipulated. It can justify a causal conclusion.

Making a manipulation means we are taking an experimental approach:

* Randomly assigning participants to a control and a treatment group

**Non-experimental research** uses pre-existing groups, not formed by random assignment or manipulation of the IV. It cannot justify a causal conclusion, because there could easily be confounds.

Observing pre-existing groups that differ in one dimension is a non-experimental approach:

* They may differ in other dimensions as well
]
.col-5[
If two groups or conditions differ on only one factor, then that factor is the likely cause of any observed difference between the two.
**Random assignment** of participants to groups or conditions gives the best grounds for a conclusion of causality.

A **confound** is an unwanted difference between groups, which is likely to limit the conclusions we can draw from a study.

Data analysis is often the same for an experimental and non-experimental study, but conclusions usually must be quite different: Only an experimental study can justify a causal conclusion.
]
]

---

## Random Representative Samples

.row[.col-7[
An experiment can go astray when we **confound** competing sources of variation.

If we don’t block our population (deliberately drawing samples from each important subgroup), we run the risk of obtaining a sample in which members of an important subgroup are absent or underrepresented.
]]

---

## Treatment Allocation

.row[.col-7[
If the members of a sample taken from a stratum are to be exposed to differing test conditions or treatments, then we must make sure that treatment allocation is random. The only safe system is one in which the assignment is made on the basis of random numbers.
]]

---

## Choosing a Random Sample
### Ensuring Your Observations Are Independent

.row[.col-7[
Independence of the observations is essential to most statistical procedures. Any kind of dependence, even if only partial, can make the analysis suspect.

Any group or cluster of individuals who live, work, study, or pray together may fail to be representative for any or all of the following reasons:

- Shared exposure to the same physical or social environment
- Self-selection in belonging to the group
- Sharing of behaviors, ideas, or diseases among members of the group.
]]

---

## What is wrong?

.row[.col-7[
.question[
1. Donald routinely tested new drugs for toxicity by injecting them in mice. In each case, he’d take five animals from a cage and inject them with the drug. To be on the safe side, he’d take the next five animals from the cage, inject them with a saline solution and use them for comparison purposes.
2. Reasoning, correctly, that he’d find more students home at dinner-time, Tom brought a set of survey forms back to his fraternity house and interviewed his frat brothers one by one at the dinner table.
3. Contrary to what one would expect from the advances in medical care, there were 2.1 million deaths from all causes in the U.S. in 1985, compared to 1.7 million in 1960.
]]]

---

## What Are You Going to Measure?

.row[.col-7[
**direct measurement**
- yields more accurate data

**surrogate response**
- is less costly or less time-consuming to measure than the actual variable of interest
- think of the canary in the coal mine

.tip[
After you formulate your hypothesis and all of the associated alternatives for your experiment, decide on the variables you will measure. List possible experimental findings along with the conclusions you would draw and the actions you would take for each possible outcome.
]
]]

---

## Measurement

.row[.col-7[
A **construct** is the underlying characteristic we wish to study. Anxiety, well-being, and confidence are examples of constructs.

A measure **operationalizes** a construct if it provides a practical way of measuring that construct. For instance, the SAT score operationalizes student aptitude.
]]

.row[.col-7[
Two basic features of a good measure are **reliability** and **validity**.

The **reliability** of a measure is its repeatability, the extent to which we get the same or a similar value if we measure again.

The **validity** of a measure is the extent to which it measures what it is designed to measure.
]
.col-5[
.question[
Is the number of words in a term paper a reliable measure? Is it a valid measure of the academic quality of the paper? Suggest a better measure of the academic quality of a paper.
]
]]

---

## Measurement

.row[.col-7[
A measure has **nominal** or **categorical** scaling if it comprises category labels, with no sense of order or quantity. For example, ice cream flavors.
]
.col-5[
With nominal data, all we can do is record the **frequency** of cases in each category.
]]

.row[.col-7[
A measure has **ordinal** scaling if it gives information about order, meaning that increasing numbers represent increasing (or decreasing) amounts of whatever is being measured. For example, a ranking of sports teams is an ordinal measure.
]
.col-5[
With ordinal data, we can arrange values in order, but can’t calculate a mean.
]]

.row[.col-7[
A measure has **interval** scaling if we are willing to assume that all unit intervals anywhere on the scale are equivalent. For example, birth year and longitude are interval measures.
]
.col-5[
With interval data, we can calculate means, but not ratios or percentages.
]]

.row[.col-7[
A measure has **ratio** scaling if it has a meaningful zero, as well as distance. For example, length, mass, and the time taken to complete a task are all ratio measures.
]
.col-5[
With a ratio measure we can calculate means, ratios, and percentages.
]]

---

## Measurement

.row[.col-7[
.question[
A grade point average is calculated by translating A = 4, B = 3, etc. What assumption is necessary to justify that calculation? To what extent is it reasonable? Explain.
]
]]

---

## **Selection** can be problematic for research

.row[
.col-6[
**Publication bias** is the selection of which studies to make available according to the results they obtain. Typically, studies finding large or striking results are more likely to be published. Studies that do not achieve statistical significance are less likely to be published—the **file drawer effect**.

<br/>

**Selection of what to report** about a study is the second problematic type of selection. Fully detailed reporting is needed for us to have the full story, and also for close replication to be feasible.
]
.col-6[
Has it been replicated?

A **close replication** uses a new sample, but otherwise is as similar as possible to the original study.
It’s also called an **exact** or **literal** replication.

A more **distant** replication is deliberately somewhat different from the original study. It’s also called a **modified** or **conceptual** replication.
]
]

.row[
.col-5[
]]

---

## **Selection** can be problematic for research

.row[
.col-6[
**Exploratory** or **post hoc** analysis (“post hoc” is Latin for “after the fact”) is not specified in advance. It risks merely telling us about sampling variability, but may provide valuable hints for further investigation.
]
.col-6[
**Planned analysis** is specified in advance and provides the best basis for conclusions.

A **data analysis plan** states, in advance of carrying out a study, the researcher’s predictions and full details of the intended data analysis.
]]

.row[
.col-6[
**Cherry picking**, or **capitalizing on chance**, is the choice to focus on one among a number of results because it is the largest or most interesting, when it may be merely a random fluctuation.

Researchers feel enormous pressure to achieve p < .05 so their results have a chance of publication in a good journal, which is the key to obtaining a faculty job, tenure, and funding.
]
.col-6[
A result predicted in advance is much more convincing than a result selected afterwards, which may be cherry picked.

<br/>

**Preregistration** is the lodging in advance, at a secure website and with a date stamp, of a fully detailed research plan, including a data analysis plan.
]
]

---

## Questionable Research Practices

.row[.col-7[
Choices made after seeing the data are questionable research practices, and **p-hacking** is using these to achieve p < .05. The risk of **p-hacking** emphasizes the importance of **preregistration** of a full research plan, including a data analysis plan.

**Lack of replication**. Once a result has reached statistical significance and been published, it is regarded as established.
There is little incentive to conduct replications, and replication studies are difficult to get published. Therefore, (too) few replications are carried out.
]]

---

## Experiments and Surveys

.row[.col-7[
Data analysis starts with the design of the experiment.

**Hawthorne effect**

Do inexpensive improvements increase workers’ productivity? Different wall colors, motivational posters, ... everything increased productivity!

But it was actually... the mere act of paying attention to a person that modifies their behavior.

Think about placebo effects. We need good controls to compare our treatments to.
]]

---

## Designing an Experiment or Survey

.row[.col-7[
Before you collect a single observation:

1. Set forth your **objectives** and the use you plan to make of your research.
2. Define the **population(s)** to which you will apply the results of your analysis. `\(\rightarrow\)` Sample from the right population.
3. List all possible **sources of variation**.
]]

.row[.col-7[
<ol start=4>
<li> Decide how you will cope with each source. Describe what you will <strong>measure</strong> and how you will measure it. Define the <strong>experimental unit</strong> and all endpoints. </li></ol>
]
.col-5[
#### The Experimental Unit

What is an independent observation?

.question[
Suppose we are testing the effect of a topical ointment on pink eye. Is each eye a separate experimental unit or is each patient?
]
]
]

---

## Designing an Experiment or Survey

.row[.col-7[
<ol start=5>
<li>Formulate your <strong>hypothesis</strong> and all of the associated alternatives. Define your endpoints. List possible experimental findings along with the conclusions you would draw and the actions you would take for each of the possible results.</li>
<li> Describe in detail how you intend to draw a <strong>representative random sample</strong> from the population.</li>
<li> Describe how you will ensure the <strong>independence of your observations</strong>.
Think about what might go wrong, think about what might reduce the chances of that problem happening, and what you might do about it if it does.</li></ol>
]
.col-5[
#### Formulate Your Hypotheses

* Numeric form
* Meaningful alternative
* Possible to gather the required data
]
]

---

## Coping with Variation

.row[.col-7[
1. Controlling. Make the environment for the study as uniform and homogeneous as possible.
2. Blocking. Stratify the population into subgroups; compare only within subgroups.
3. Measuring. Use covariates when blocking is not feasible.
4. Randomizing. Randomly assign subjects to treatment within each block or subgroup.
]]

---

## Matched Pairs

.row[.col-7[
A good way to eliminate a source of variation, and the errors in interpretation associated with it, is through the use of matched pairs.

Example: Sales data for locations with the standard menu and the new sandwich

|         |     A|     B|     C|     D|     E|     F|     G|     H|
|:--------|-----:|-----:|-----:|-----:|-----:|-----:|-----:|-----:|
|New      | 48722| 28965| 36581| 40543| 55423| 38555| 31778| 45643|
|Standard | 46555| 28293| 37453| 38324| 54989| 35687| 32000| 43289|

<img src="09.experiments_files/figure-html/unnamed-chunk-3-1.png" width="60%" style="display: block; margin: auto;" />
]]

---

.row[.col-7[

```r
t.test(New, Standard, paired = FALSE, alternative = "greater")
```

```
## 
## 	Welch Two Sample t-test
## 
## data:  New and Standard
## t = 0.27766, df = 13.981, p-value = 0.3927
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -6426.242       Inf
## sample estimates:
## mean of x mean of y 
##  40776.25  39573.75
```
]
.col-5[
**Welch two-sample t-test**, ignoring the pairing
]]

<br/>

.row[.col-7[

```r
t.test(New, Standard, paired = TRUE, alternative = "greater")
```

```
## 
## 	Paired t-test
## 
## data:  New and Standard
## t = 2.4704, df = 7, p-value = 0.0214
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  280.3029      Inf
## sample estimates:
## mean of the differences 
##                  1202.5
```
]
.col-5[
**T-test for
paired data**
]]

---
class: middle

# Precision for Planning

---

## How large a sample?

.row[.col-7[
The larger your sample, the more precise your estimates will be, and the more likely you will be able to detect small effect sizes.
]]

.row[.col-7[
**The margin of error** is determined by

1. The true value of the parameter being tested.
2. The variation of the observations.
3. The confidence level.

**Power of a test** is additionally determined by

4. The significance level (corresponds to the confidence level for the margin of error).
5. The relative costs of the observations and of the losses associated with making Type I and Type II errors.
6. The method used for testing.
]
.col-5[
**Target MoE** is the **precision** we want our study to achieve.

* When planning research, consider what MoE a particular `\(N\)` is likely to give.
* Use past research and perhaps the expected effect size to guide your choice of target MoE.
]
]

---

## Precision for planning
### Accuracy in parameter estimation

.row[.col-7[
...bases the choice of `\(N\)` on the MoE the study is likely to give. Consider `\(N\)` for various values of your target MoE.

**Assurance** is the probability, expressed as a percentage, that a study obtains a MoE no larger than the target MoE.

For precision for planning, use a **standardized effect size measure**, e.g. Cohen’s d, assumed to be in units of the population SD, `\(\sigma\)`.

For the paired design, precision is greatly influenced by the population correlation between the two measures. High correlation gives high precision, meaning a short CI.
]
.col-5[
The relation between MoE and `\(N\)` should be the same for a study already completed and one yet to be run: a MoE that’s very different from what we’d expect might signal an error in our planning, doubt about our statistical model, or that something else strange is going on that we should investigate.
]
]

---

## Accuracy in parameter estimation: A single mean

.row[.col-7[
Suppose the population mean is believed to be `\(\mu = 20\)`, and the population standard deviation is believed to be `\(\sigma = 2\)`; thus the population standardized mean is believed to be `\(d = 10\)`.

To determine the necessary sample size for a study so that the full width of the 95% interval obtained in the study will be no wider than 2:
]]

.row[.col-6[
`$$MoE = z_{1-\alpha/2} \times \frac{\sigma}{\sqrt{N}}$$`

`$$\frac{MoE}{z_{1-\alpha/2}\times\sigma} = \frac{1}{\sqrt{N}}$$`

`$$N = \left(\frac{z_{1-\alpha/2}\times \sigma}{MoE}\right)^2$$`
]
.col-6[
The MoE is half the length of the confidence interval, therefore:

`$$N = \left(\frac{1.96\times 2}{1}\right)^2 \approx 16$$`
]
]

---

## Accuracy with assurance
### in parameter estimation: A single mean

.row[.col-7[
...but this gives us a confidence interval of length at most 2 only about half of the time! The sample variance is also a random variable with a sampling distribution:

`$$\frac{(N-1)\times s^2}{\sigma^2} \sim \chi^2(N-1) \rightarrow s^2 \sim \frac{\chi^2(N-1)\times \sigma^2}{N-1}$$`

To determine the necessary sample size for a study so that the full width of the 95% interval obtained in the study will be, with 90% **assurance**, no wider than 2:
]
.col-5[
**Assurance** is the probability, expressed as a percentage, that a study obtains a MoE no larger than the target MoE.
]
]

.row[.col-4[
`$$N = \left(\frac{z_{1-\alpha/2}\times \tilde{\sigma}_{90}}{MoE}\right)^2$$`
]
.col-4[
with

`$$\tilde{\sigma}_{90}^2 = \frac{\chi^2_{0.90}(N-1)\times\sigma^2}{N-1}$$`
]
.col-4[
Notice that `\(N\)` is on both sides of the equation.
]
]

---

## Accuracy with assurance
### in parameter estimation: A single mean

.row[.col-7[
To determine the necessary sample size for a study so that the full width of the 95% interval obtained in the study will be, with 90% **assurance**, no wider than 2:
]]

.row[.col-3[
`\(N = \left(\dfrac{1.96\times \tilde{\sigma}_{90}}{1}\right)^2\)`
]
.col-5[
with `\(\tilde{\sigma}_{90}^2 = \dfrac{\chi^2_{0.90}(N-1)\times 2^2}{N-1}\)`
]
.col-4[
`$$\Rightarrow N \approx 24$$`
]
]

.row[.col-7[
`$$MoE = z_{1-\alpha/2} \times \frac{\sigma}{\sqrt{N}} = 1.96\times\frac{2}{\sqrt{24}} \approx 0.800$$`

`$$MoE_{0.90} = 1.96\times\frac{\tilde{\sigma}_{0.90}}{\sqrt{24}} \approx 0.990$$`
]
.col-5[
Note that we need more observations to ensure, with 90% probability, that the confidence interval does not exceed a length of 2, compared to the case of having a confidence interval of length 2 on average.
]
]

---

## Planning Using Statistical Power

.row[.col-7[
**Statistical power** is the probability that a study will find `\(p < \alpha\)` IF an effect of a stated size exists. It’s the probability of rejecting `\(H_0\)` when `\(H_a\)` is true.

**Target effect size** is the value of `\(\delta\)` specified by `\(H_a\)`. It’s the population effect size for which we calculate statistical power.

Power for **independent groups** depends on

* `\(\alpha\)`,
* target `\(\delta\)`, and
* `\(N\)`.

All must be stated for a value of power to make sense.
]
.col-5[
**Power** is the probability of rejecting `\(H_0\)` IF the target effect size, `\(\delta\)`, is true.

Type II error rate `\(=\beta\)`

Power `\(= 1-\beta\)`
]]

---

## Strategies for Using Power for Planning

.row[.col-7[
* Target `\(\delta\)` is very influential. Larger `\(\delta\)` gives higher power.
* Type I error rate `\(\alpha\)` is very influential: lower `\(\alpha\)` means it’s harder to reject `\(H_0\)`, so more misses (Type II errors) and lower power.
* Group size `\(N\)` is influential. Larger `\(N\)` gives higher power.
* For the paired design, `\(\rho\)` is very influential. Larger `\(\rho\)` gives higher power. Put another way, larger `\(\rho\)` requires smaller `\(N\)` for the same power.
* The paired design, when it’s applicable, can give high power.
]]

---

### Comparing Power and Precision for Planning

| | Precision for planning | Power for planning |
|---|---|---|
| The general aim | Choose N, considering costs and benefits | Same |
| The measure | Standardized effect size, e.g. Cohen’s d, assumed to be in units of population SD | Same |
| The setting | Estimation of effect size | NHST, statistical significance |
| The focus | MoE, precision of result <br/> Small MoE is good | Power, the probability that `\(p < \alpha\)`, IF target `\(\delta\)` is true — high power is good. |
| The specific aim | Find N that gives target MoE,<br/> (a) on average, or (b) with assurance | Find N to give chosen power, for stated `\(\alpha\)` and target `\(\delta\)` (and `\(\rho\)` for paired design) |
| Build intuitions about how… | `\(N\)` varies with target MoE (and `\(\rho\)` for paired), on average, and with assurance | Power varies with `\(\alpha\)`, target `\(\delta\)`, and `\(N\)` (and `\(\rho\)` for paired) |

.row[.col-8[
One general lesson: the required `\(N\)` is often impractically and disappointingly large.
]]

---

## R: Known Distribution - A binomial experiment

.row[.col-7[
You want to sponsor a politician if he can win: sponsoring adds 5% to his existing support. If it looks like only 40% of the voters favor the candidate, we won’t give him a dime. If 46% or more of the voters already favor him, we’ll pay to saturate the airwaves with his promises.

Hypothesis: support is at most 40%.

Type 1 error set at 5%: if p = 0.40, then the probability of rejecting the hypothesis that p = 0.40 should be no greater than 5%.

Type 2 error set at 10%: if p = 0.46, then the probability of rejecting the hypothesis that p = 0.40 should be at least 90%.
]
.col-5[
Calculate the 95th percentile of the binomial distribution with 10 trials and p = 0.4:

```r
qbinom(.95, 10, .4)
```

```
## [1] 7
```

We need 7 successes to reject the null hypothesis.
]]

---

## R: Known Distribution - A binomial experiment

.row[.col-6[
Type 1 error: calculate the probability of observing more than 6 successes for a binomial distribution with 10 trials and p = 0.4:

```r
1 - pbinom(6, 10, .4)
```

```
## [1] 0.05476188
```

```r
pbinom(6, 10, .4, FALSE)
```

```
## [1] 0.05476188
```

About 5.5%: good.

Type 2 error: calculate the probability of observing 6 or fewer successes for a binomial distribution with 10 trials and p = 0.46:

```r
pbinom(6, 10, .46)
```

```
## [1] 0.8859388
```

We want 10%, not 88%.
]
.col-6[
Larger sample size, n = 400. Number of successes needed to reject the null:

```r
qbinom(.95, 400, .4)
```

```
## [1] 176
```

Type 1 error:

```r
pbinom(175, 400, .4, FALSE)
```

```
## [1] 0.05734915
```

Type 2 error:

```r
pbinom(175, 400, .46)
```

```
## [1] 0.1970185
```

Still too big!
]]

---

## R: Power calculation
### for differences in proportions

.row[.col-7[

```r
library(pwr)
# ?pwr.p.test
```

```r
(p.out <- pwr.p.test(h = ES.h(p1 = 0.46, p2 = 0.40),
                     sig.level = 0.05, power = 0.90,
                     alternative = "greater"))
```

```
## 
##      proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.1212723
##               n = 582.2981
##       sig.level = 0.05
##           power = 0.9
##     alternative = greater
```
]]

---

.row[.col-7[
<img src="09.experiments_files/figure-html/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" />

We need a sample size of `\(n=583\)`.
]
.col-5[
Number of successes needed to reject the null hypothesis with a sample size of 583:

```r
qbinom(.95, 583, .4)
```

```
## [1] 253
```

Type 1 error:

```r
pbinom(252, 583, .4, FALSE)
```

```
## [1] 0.05182839
```

Type 2 error:

```r
pbinom(252, 583, .46)
```

```
## [1] 0.0961366
```
]]

---

## Another binomial experiment

.row[.col-7[
A friend of ours has a “lucky” coin that seems to come up heads every time he flips it.
We examine the coin and verify that only one side is marked tails.

How many times should we flip the coin to test the hypothesis that it is fair so that

1. the probability of making a Type I error is no greater than 10% and
2. we have a probability of 80% of detecting a weighted coin that will come up heads 70 times out of one hundred on average?
]]

---

.row[.col-7[

```r
pwr.p.test(h = ES.h(p1 = 0.70, p2 = 0.50),
           sig.level = 0.10, power = 0.80,
           alternative = "greater")
```

```
## 
##      proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.4115168
##               n = 26.61923
##       sig.level = 0.1
##           power = 0.8
##     alternative = greater
```
]]

.row[.col-6[

```r
qbinom(.9, 27, .5)
```

```
## [1] 17
```

Type 1 error:

```r
pbinom(17, 27, .5, FALSE)
```

```
## [1] 0.06103906
```

Type 2 error:

```r
pbinom(17, 27, .7)
```

```
## [1] 0.2724475
```
]
.col-6[
Add one more observation to get closer to the target:

```r
qbinom(.9, 28, .5)
```

```
## [1] 17
```

Type 1 error:

```r
pbinom(17, 28, .5, FALSE)
```

```
## [1] 0.09246667
```

Type 2 error:

```r
pbinom(17, 28, .7)
```

```
## [1] 0.1913274
```
]]

---

## Almost Normal Data

.row[.col-7[
The variance of the mean of `\(n\)` observations each with variance `\(\sigma^2\)` is `\(\sigma^2/n\)`.

The 95th percentile of a standard normal N(0,1) distribution is

```r
qnorm(.95)
```

```
## [1] 1.644854
```
]
.col-5[
<img src="09.experiments_files/figure-html/unnamed-chunk-26-1.png" width="100%" style="display: block; margin: auto;" />
]]

.row[.col-7[
If the true mean is actually one standard deviation larger than 0, the probability of observing a value larger than 1.644854 is given by

```r
1 - pnorm(1.644854, 1)
```

```
## [1] 0.2595109
```
]
.col-5[
<img src="09.experiments_files/figure-html/unnamed-chunk-28-1.png" width="100%" style="display: block; margin: auto;" />
]]

---

## Sample size calculation
### for standard normal data

.row[.col-7[
If each of `\(n\)` independent observations is normally distributed as N(0,1), then their mean
is distributed as `\(N(0, 1/n)\)`, i.e., with standard deviation `\(1/\sqrt{n}\)`.

Detecting a difference of 1 now becomes a task of detecting a difference of `\(\sqrt{n}\)` standard deviation units.
]]

.row[.col-7[
We will have power of 90% provided pnorm(1.644, `\(\sqrt{n}\)`) = 0.1 = 1 - 0.9
]
.col-5[

```r
qnorm(.1)
```

```
## [1] -1.281552
```
]]

.row[.col-7[
We require a sample size `\(n\)` such that `\(1.644854 - \sqrt{n} = -1.281552\)`
]
.col-5[

```r
(1.644854 + 1.281552)^2
```

```
## [1] 8.563852
```

That is a sample size of `\(n=9\)`.
]]

.row[.col-7[
Type 2 error with a sample size of 9 and a test statistic of 1.64:
]
.col-5[

```r
pnorm(1.644854, sqrt(9))
```

```
## [1] 0.08768552
```
]]

---

## Sample size and power calculation
### for standard normal data

.row[.col-7[

```r
pwr.norm.test(d = 1, sig.level = 0.05, power = 0.9,
              alternative = "greater")
```

```
## 
##      Mean power calculation for normal distribution with known variance 
## 
##               d = 1
##               n = 8.563847
##       sig.level = 0.05
##           power = 0.9
##     alternative = greater
```
]
.col-5[
#### Sample size calculation

* given effect size, type I error, and power
]
]

<br/>

.row[.col-7[

```r
pwr.norm.test(n = 9, sig.level = 0.05, power = 0.9,
              alternative = "greater")
```

```
## 
##      Mean power calculation for normal distribution with known variance 
## 
##               d = 0.9754722
##               n = 9
##       sig.level = 0.05
##           power = 0.9
##     alternative = greater
```
]
.col-5[
#### Smallest detectable effect size

* given sample size, type I error, and power
]
]

---

## Power calculation for experiments
### with unknown variance

.row[.col-7[
For a fixed absolute effect size we need to know the population variance or an estimate.
]]

.row[.col-7[
For power and sample size calculations based on the t-test we can use:
]
.col-5[

```r
?pwr.t.test
```
]]

<br/>

.row[.col-7[

```r
pwr.t.test(d = 0.5, sig.level = 0.05, power = .9,
           type = "one.sample")
```

```
## 
##      One-sample t test power calculation 
## 
##               n = 43.99548
##               d = 0.5
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
```
]
.col-5[
#### One sample test against a fixed value

* the effect size is given in standard deviation units (called Cohen's d)
]
]

---

## Power calculation for experiments
### with unknown variance

.row[.col-7[

```r
pwr.t.test(d = 0.5, sig.level = 0.05, power = .9)
```

```
## 
##      Two-sample t test power calculation 
## 
##               n = 85.03128
##               d = 0.5
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
```
]
.col-5[
#### Test for difference between two groups
]
]
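
---

## R: Checking Power by Simulation

.row[.col-7[
A Monte Carlo sketch to sanity-check the two-sample calculation above. The simulation count `n_sim` and the group size of 86 (85.03 rounded up) are our illustrative choices here, not part of the `pwr` package:

```r
# Simulate many two-group experiments with a true effect of d = 0.5 SD
# and n = 86 per group; the share of t-tests rejecting H0 at the 5%
# level estimates the power, which should be close to 0.90.
set.seed(1)
n_sim <- 10000
p_values <- replicate(n_sim, {
  control   <- rnorm(86, mean = 0,   sd = 1)  # control group
  treatment <- rnorm(86, mean = 0.5, sd = 1)  # treatment group, d = 0.5
  t.test(control, treatment)$p.value
})
mean(p_values < 0.05)  # estimated power, approximately 0.9
```

The simulated rejection rate should land close to the 90% power promised by `pwr.t.test`, up to Monte Carlo error.
]]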