# Introduction to Statistics and Data Science

Please remember to ask questions if you get stuck on any of the assignments.

The submission deadline for homework 7 and all following homework assignments is May 23rd.

**Subject to change**

## Course Description

This course introduces the theory and practice of describing and analysing data and testing hypotheses, and shows how to use the free statistical computing software R for your data analyses.

Topics discussed include, but are not limited to, combinatorics, probability, random variables, density and distribution functions, sampling from a population, sample properties and descriptive summary statistics, and inferring population characteristics from a random sample.

## Course Objectives

The course will enable students to describe and analyse data using simple statistical methods and to interpret and report the results of their data analyses. Students will learn how to use the statistical computing software R, which is widely used in academia and industry.

## Course Materials

### Required:

The course does not follow a single textbook. There will be mandatory weekly reading assignments from different sources, as indicated under the topics below.

### Recommended Reading List

#### Data Analysis

- Agresti, A., 2019. An Introduction to Categorical Data Analysis, 3rd ed. Wiley.
- Bilder, C.R. and Loughin, T.M., 2015. Analysis of Categorical Data with R. Wiley.
- Çetinkaya-Rundel, M. and Hardin, J., 2021. Introduction to Modern Statistics. https://openintro-ims.netlify.app/
- Chihara, L.M. and Hesterberg, T.C., 2019. Mathematical Statistics with Resampling and R, 2nd ed. Wiley.
- Cumming, G. and Calin-Jageman, R., 2016. Introduction to the New Statistics. Routledge.
- Devlin, T.D. et al., 2018. Seeing Theory. https://seeing-theory.brown.edu/index.html
- Downey, A., 2016. There is still only one test. http://allendowney.blogspot.com/2016/06/there-is-still-only-one-test.html
- Gandrud, Ch., 2020. Reproducible Research with R and RStudio, 3rd edition. CRC Press.
- Good, P.I., 2013. Introduction to Statistics Through Resampling Methods and R, 2nd ed. Wiley.
- Good, P.I. and Hardin, J.W., 2012. Common Errors in Statistics (and How to Avoid Them). Wiley.
- Heiss, A., 2019. Half a dozen frequentist and Bayesian ways to measure the difference in means in two groups. https://github.com/andrewheiss/diff-means-half-dozen-ways
- Ismay, C. and Kennedy, P. C., 2019. Getting used to R, RStudio, and R Markdown. https://ismayc.github.io/rbasics-book/index.html
- Keller, G., 2018. Statistics for Management and Economics, 11th ed. Cengage.
- Levine, D.M. and Stephan, D.F., 2010. Even You Can Learn Statistics, 2nd ed. Pearson.
- Reinhart, A., 2015. Statistics Done Wrong. No Starch Press. https://www.statisticsdonewrong.com/
- Selvamuthu, D. and Das, D., 2018. Introduction to Statistical Methods, Design of Experiments and Statistical Quality Control. Springer.
- Tukey, J.W., 1977. Exploratory data analysis. Addison-Wesley.
- Upton, G.J.G., 2017. Categorical Data Analysis by Example. Wiley.
- Urdan, T.C., 2011. Statistics in plain English. Routledge.
- De Veaux, R.D., Velleman, P.F. and Bock, D.E., 2018. Intro stats, 5th ed. Boston: Pearson.
- Wickham, H. and Grolemund, G., 2016. R for data science: import, tidy, transform, visualize, and model data. O’Reilly Media, Inc. https://r4ds.had.co.nz/

#### Data Visualization & Communication

- Chang, W., 2021. R Graphics Cookbook, 2nd edition. O’Reilly Media, Inc. https://r-graphics.org/
- Cleveland, W.S., 1993. Visualizing Data. Hobart Press.
- Cleveland, W.S., 1994. The elements of graphing data. Hobart Press.
- Few, S., 2009. Now you see it. Analytics Press.
- Few, S., 2012. Show me the numbers, 2nd ed. Analytics Press.
- Few, S., 2015. Signal – Understanding What Matters in a World of Noise. Analytics Press.
- Harris, R.L., 1999. Information Graphics. Oxford University Press.
- Healy, K., 2018. Data Visualization: A Practical Introduction. Princeton University Press. http://socviz.co/
- Healy, K., 2018. The Plain Person’s Guide to Plain Text Social Science. http://plain-text.co
- Kirk, A., 2019. Data Visualisation, 2nd edition. SAGE.
- Knaflic, C.N., 2015. Storytelling with Data. Wiley.
- Knaflic, C.N., 2020. Storytelling with Data - let’s practice! Wiley.
- Miller, J.E., 2015. The Chicago Guide to Writing about Numbers, 2nd ed. Chicago University Press.
- Robbins, N.B., 2005. Creating More Effective Graphs. Wiley.
- Schwabish, J., 2021. Better Data Visualizations. Columbia University Press.
- Turabian, K.L., 2018. A Manual for Writers of Research Papers, Theses, and Dissertations, 9th ed. University of Chicago Press.
- Wainer, H., 2005. Graphic Discovery – A Trout in the Milk and Other Visual Adventures. Princeton University Press.
- Wainer, H., 2009. Picturing the Uncertain World. Princeton University Press.
- Wickham, H., Navarro, D., and Pedersen, T.L., 2022. ggplot2: Elegant Graphics for Data Analysis. 3rd edition. Springer. https://ggplot2-book.org/index.html
- Wilke, C.O., 2019. Fundamentals of Data Visualization. O’Reilly Media, Inc. https://clauswilke.com/dataviz/ https://github.com/clauswilke/dataviz
- Xie, Y., 2015. Dynamic Documents with R and knitr. CRC Press. https://github.com/yihui/knitr-book/tree/master/markdown
- Xie, Y., Allaire, J.J. and Grolemund, G., 2018. R markdown: The definitive guide. CRC Press. https://bookdown.org/yihui/rmarkdown/
- Yau, N., 2013. Data Points – Visualization that means something. Wiley.

## Course Requirements:

Students must read the weekly reading assignments *before* each session. Students need access to a computer with the statistical computing software R. To get this access, students should create an account at https://rstudio.cloud for an in-browser version of RStudio and R. Alternatively, R is available free of charge from www.r-project.org. The graphical front end RStudio is available in a free version, too. R and RStudio are installed in the college’s computer lab.

Homework needs to be submitted electronically before class. See below for problem sets and deadlines.

## Covid Pandemic: In-Person on Campus Policy

This semester Touro College Berlin will offer hybrid learning, which means students can come to campus or study online.

If you want to participate in a session in-person on campus, you will have to notify me by 12pm (noon) the day prior to the respective class session. You can decide for each week separately how you want to participate in the class.

If no student notifies me by 12pm (noon) the day prior to a class session, I will not be on campus and will join the class session from my home office.

The first session will be remote only, via Zoom.

## Instructor Information:

Prof. Dr. Dennis A. V. Dittrich

dennis.dittrich@touroberlin.de

https://economicscience.net

Twitter: @davdittrich

You can always contact me via email or Twitter. Appointments for meetings can be arranged through my webpage at: https://economicscience.net/appointments/

I have also set up a Discord server with a channel for our class. You can join it here!

Updated information, links to the literature, additional materials, etc. can be found on my webpage as well.

## Grading Guidelines:

Grading Component | Weight |
---|---|
Problem Sets | 65% |
Quiz | 10% |
Data Analysis Project: Report | 25% |

If the term paper (data analysis project report) is better than the weighted average of the above listed grading components, the course grade is determined only by the term paper.
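The rule above can be written as a single formula. This is only a sketch, assuming that all components are scored on a common scale where a higher number is better (for example, percentage points). The symbols $S$ (problem sets), $Q$ (quiz), $R$ (project report), and $G$ (course grade) are introduced here for illustration and are not part of the official grading policy.

```latex
\[
G = \max\bigl(0.65\,S + 0.10\,Q + 0.25\,R,\; R\bigr)
\]
% Hypothetical scores, for illustration only:
% S = 70, Q = 80, R = 90:
%   weighted average = 45.5 + 8 + 22.5 = 76 < 90, so G = 90 (report only).
% S = 90, Q = 80, R = 70:
%   weighted average = 58.5 + 8 + 17.5 = 84 > 70, so G = 84 (weighted average).
```

Under this reading, a strong report can only raise the course grade, never lower it.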

## Workload

A typical 3 US credits / 5 ECTS course requires 150 hours of your time. The table below identifies how I expect those 150 hours will be allocated. While you do not receive direct marks for reading, reading will affect your ability to participate in class discussions and activities and your ability to succeed in the assessments and therefore your final grade.

Activity | Time |
---|---|
Reading (3 hours / week) | 45 hours |
Lectures (2 hours / week) | 30 hours |
Class Time (3 hours / week) | 45 hours |
Problem Sets (1 hour / week) | 15 hours |
Preparation and Review (1 hour / week) | 15 hours |

Starting with the second week, our sessions are exercise sessions where we apply the material you have read before each session with the help of R/RStudio.

The lectures for this class are pre-recorded and complementary to the reading assignments. That means they do not just reiterate the content of your reading assignments but build on them and introduce you to the software that we will use during the exercises.

Successful participation in this course requires that you first read the assigned materials, review and build upon your readings with the help of the pre-recorded lectures and lecture slides, and participate in the exercise sessions. Without these three steps it is unlikely that you will be able to excel in the weekly problem sets.

## Weekly Topics and Reading Assignments

Slides and pre-recorded lectures for each session are here. (Click this link!) They are required materials that you need to review before each of the corresponding exercise sessions.

### Session 1:

Introduction to course and to data science and statistics

Required reading:

Wickham, H. and Grolemund, G., 2016. R for data science: import, tidy, transform, visualize, and model data. Chapter 2.

Ismay, C. and Kennedy, P. C., 2018. Getting used to R, RStudio, and R Markdown. Chapters 2 & 3.

Join the TCB-stats workspace on RStudio.cloud.

### Session 2:

Reproducible Research with R, RStudio, and R Markdown

Required reading:

Tierney, N, 2020. RMarkdown for Scientists

Ismay, C. and Kennedy, P. C., 2018. Getting used to R, RStudio, and R Markdown. Chapter 4.

Further recommendations:

Healy, K., 2018. Data Visualization: A Practical Introduction. Princeton University Press. Chapter 2.

Healy, K., 2018. The Plain Person’s Guide to Plain Text Social Science.

RStudio. RMarkdown

As a reference:

Gandrud, Ch., 2020. Reproducible Research with R and RStudio, 3rd edition. CRC Press.

Xie, Y., 2015. Dynamic Documents with R and knitr. CRC Press.

Xie, Y., Allaire, J.J. and Grolemund, G., 2018. R markdown: The definitive guide. CRC Press.

### Session 3:

Exploratory Data Analysis and Data Visualization

Required reading:

Wickham, H. and Grolemund, G., 2016. R for data science: import, tidy, transform, visualize, and model data. Chapter 3.

Çetinkaya-Rundel, M. and Hardin, J., 2021. Introduction to Modern Statistics. Exploratory data analysis: Chapters 4 and 5.

Further recommendations:

Healy, K., 2018. Data Visualization: A Practical Introduction. Princeton University Press. Chapters 1 and 3.

De Veaux, R.D., Velleman, P.F. and Bock, D.E., 2018. Intro stats. Boston: Pearson. Chapters 2-4.

Chihara, L.M. and Hesterberg, T.C., 2019. Mathematical Statistics with Resampling and R, 2nd ed. Wiley. Chapter 2.

Michonneau, F. and Fournier, A., 2019. Data Analysis and Visualization in R for Ecologists. Section “Visualizing data”.

And the other sources from the section Data Visualization & Communication of the recommended reading list.

As a reference:

Holtz, Y., 2022. The R graph gallery.

Wickham, H., Navarro, D., and Pedersen, T.L., 2022. ggplot2: Elegant Graphics for Data Analysis. 3rd edition. Springer.

### Session 4:

Working with Data, R, and RStudio

Required reading:

Wickham, H. and Grolemund, G., 2016. R for data science: import, tidy, transform, visualize, and model data. Online chapters 4–7, and 9–15.

Further recommendations:

Michonneau, F. and Fournier, A., 2020. Data Analysis and Visualization in R for Ecologists. Sections “Before we start”, “Intro to R”, “Starting with data”, and “Manipulating data”.

### Session 5:

Working with Data & Exploratory Data Analysis

Required reading:

De Veaux, R.D., Velleman, P.F. and Bock, D.E., 2018. Intro stats. Boston: Pearson. Chapters 2-4.

Franconeri, Steven L., Lace M. Padilla, Priti Shah, Jeffrey M. Zacks, and Jessica Hullman. 2021. “The Science of Visual Data Communication: What Works.” Psychological Science in the Public Interest 22 (3): 110–61. https://doi.org/10.1177/15291006211051956.

Further recommendations:

Videos: One Chart at a Time

Schwabish, J., 2021. Better Data Visualizations. Columbia University Press.

Rost, L.C., 2020. How to pick more beautiful colors for your data visualizations. Datawrapper Blog

Rost, L.C., 2021. Which color scale to use when visualizing data. Datawrapper Blog

Healy, K., 2018. Data Visualization: A Practical Introduction. Princeton University Press. Chapters 4, 5, and 8.

Chihara, L.M. and Hesterberg, T.C., 2019. Mathematical Statistics with Resampling and R, 2nd ed. Wiley. Chapter 2.

As a reference:

Chang, W., 2021. R Graphics Cookbook, 2nd edition. O’Reilly.

Holtz, Y., 2022. The R graph gallery.

Wickham, H., Navarro, D., and Pedersen, T.L., 2022. ggplot2: Elegant Graphics for Data Analysis. 3rd edition. Springer.

Wickham, H. and Grolemund, G., 2016. R for data science: import, tidy, transform, visualize, and model data. Part I Explore.

Wilke, C.O., 2019. Fundamentals of Data Visualization. O’Reilly Media, Inc.

Miller, J.E., 2015. The Chicago Guide to Writing about Numbers, 2nd ed. Chicago University Press.

### Session 6:

Introduction to Probability

Required reading:

Keller, G., 2018. Statistics for Management and Economics, 11th ed. Cengage. Chapter 6.

https://seeing-theory.brown.edu/basic-probability/index.html

https://seeing-theory.brown.edu/compound-probability/index.html

Further recommendations:

Levine, D.M. and Stephan, D.F., 2010. Even You Can Learn Statistics, 2nd ed. Pearson. Chapter 4.

Good, P.I., 2013. Introduction to Statistics Through Resampling Methods and R, 2nd ed. Wiley. Chapter 2.

### Session 7:

Probability Distributions and the Central Limit Theorem

Required reading:

Keller, G., 2018. Statistics for Management and Economics, 11th ed. Cengage. Chapters 7-8.

https://seeing-theory.brown.edu/probability-distributions/index.html

Further recommendations:

Urdan, T.C., 2011. Statistics in plain English. Routledge. Chapter 4.

Levine, D.M. and Stephan, D.F., 2010. Even You Can Learn Statistics, 2nd ed. Pearson. Chapter 5.

Good, P.I., 2013. Introduction to Statistics Through Resampling Methods and R, 2nd ed. Wiley. Chapter 3.

Chihara, L.M. and Hesterberg, T.C., 2019. Mathematical Statistics with Resampling and R, 2nd ed. Wiley. Chapter 4.

### Session 8:

Probability Distributions and the Central Limit Theorem

Required reading:

Keller, G., 2018. Statistics for Management and Economics, 11th ed. Cengage. Chapters 8-9.

### Session 9:

Inference: Estimation of Population Parameters and their Margin of Error - Continuous Data

Required reading:

Cumming, G. and Calin-Jageman, R., 2016. Introduction to the New Statistics. Routledge. Chapter 5.

Further recommendations:

Keller, G., 2018. Statistics for Management and Economics, 11th ed. Cengage. Chapter 10.

Good, P.I., 2013. Introduction to Statistics Through Resampling Methods and R, 2nd ed. Wiley. Chapter 4.

Levine, D.M. and Stephan, D.F., 2010. Even You Can Learn Statistics, 2nd ed. Pearson. Chapter 6.

Urdan, T.C., 2011. Statistics in plain English. Routledge. Chapter 7.

Chihara, L.M. and Hesterberg, T.C., 2019. Mathematical Statistics with Resampling and R, 2nd ed. Wiley. Chapters 5 and 7.

### Session 10:

Inference: Null Hypothesis Tests and Significance

Required reading:

Downey, A., 2016. There is still only one test.

https://www.rstudio.com/resources/rstudioconf-2018/infer-a-package-for-tidy-statistical-inference/

Levine, D.M. and Stephan, D.F., 2010. Even You Can Learn Statistics, 2nd ed. Pearson. Chapter 7.

Bray, A., et al., 2020. infer: Tidy Statistical Inference. R package version 0.5.4.

Further recommendations:

Cumming, G. and Calin-Jageman, R., 2016. Introduction to the New Statistics. Routledge. Chapter 6.

Chihara, L.M. and Hesterberg, T.C., 2019. Mathematical Statistics with Resampling and R, 2nd ed. Wiley. Chapter 3.

Good, P.I., 2013. Introduction to Statistics Through Resampling Methods and R, 2nd ed. Wiley. Chapter 5.

Urdan, T.C., 2011. Statistics in plain English. Routledge. Chapter 7.

Keller, G., 2018. Statistics for Management and Economics, 11th ed. Cengage. Chapter 11.

### Session 11:

Inference: Null Hypothesis Tests and Significance - Continuous Data - One Sample

Required reading:

Urdan, T.C., 2011. Statistics in plain English. Routledge. Chapter 9.

Further recommendations:

Levine, D.M. and Stephan, D.F., 2010. Even You Can Learn Statistics, 2nd ed. Pearson. Chapter 8.

Good, P.I., 2013. Introduction to Statistics Through Resampling Methods and R, 2nd ed. Wiley. Chapter 5.

Chihara, L.M. and Hesterberg, T.C., 2019. Mathematical Statistics with Resampling and R, 2nd ed. Wiley. Chapter 8.

Keller, G., 2018. Statistics for Management and Economics, 11th ed. Cengage. Chapter 12.

### Session 12:

Inference: Null Hypothesis Tests and Significance - Continuous Data - Multiple Samples

Required reading:

Urdan, T.C., 2011. Statistics in plain English. Routledge. Chapters 10 and 11.

Heiss, A., 2019. Half a dozen frequentist and Bayesian ways to measure the difference in means in two groups.

Further recommendations:

Levine, D.M. and Stephan, D.F., 2010. Even You Can Learn Statistics, 2nd ed. Pearson. Chapters 8 and 9.

Good, P.I., 2013. Introduction to Statistics Through Resampling Methods and R, 2nd ed. Wiley. Chapter 5.

Chihara, L.M. and Hesterberg, T.C., 2019. Mathematical Statistics with Resampling and R, 2nd ed. Wiley. Chapter 12.

Keller, G., 2018. Statistics for Management and Economics, 11th ed. Cengage. Chapters 13 and 14.

### Session 13:

Inference: Categorical Data - Estimation of Population Parameters and their Margin of Error, NHST

Required reading:

Cumming, G. and Calin-Jageman, R., 2016. Introduction to the New Statistics. Routledge. Chapter 13.

Agresti, A., 2019. An Introduction to Categorical Data Analysis, 3rd ed. Wiley. Chapter 2 and Section 8.1.

Further recommendations:

Upton, G.J.G., 2017. Categorical Data Analysis by Example. Wiley. Chapters 2-4.

Chihara, L.M. and Hesterberg, T.C., 2019. Mathematical Statistics with Resampling and R, 2nd ed. Wiley. Chapters 8 & 10.

Bilder, C.R. and Loughin, T.M., 2015. Analysis of Categorical Data with R. Wiley. Chapter 1.

### Session 14:

Experiment and Survey Design: Randomization and Measurement

Required reading:

Good, P.I., 2013. Introduction to Statistics Through Resampling Methods and R, 2nd ed. Wiley. Chapter 6.

Ford, C., 2018. Getting started with the pwr package

Further recommendations:

Cumming, G. and Calin-Jageman, R., 2016. Introduction to the New Statistics. Routledge. Chapters 2 and 10.

Boddy, R. and Smith, G., 2010. Effective Experimentation. Wiley.

### Session 15:

No final exam.

Topics and reading assignments are subject to change.

## Homework Problems

You will find the homework problems and other material for download in this pCloud folder. [Click Here!]

Upload your homework solutions to this pCloud folder. [Click Here!]

### RMarkdown for homework problems

You have to use RMarkdown with PDF output for your homework solutions. See the RStudio lessons on RMarkdown and R Markdown: The Definitive Guide for an introduction and reference to RMarkdown, respectively. See also Xie, Y., 2015. Dynamic Documents with R and knitr.
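As a starting point, an R Markdown homework file might begin with a header like the sketch below. It assumes the standard `pdf_document` output format from the rmarkdown package. The title and author values are placeholders, not something the course prescribes.

```yaml
---
title: "Homework 1"      # placeholder, replace with the actual assignment
author: "Your Name"      # placeholder
output: pdf_document     # knit the document to PDF
---
```

Knitting to PDF requires a LaTeX installation on the machine that does the knitting. The tinytex R package is one common way to get one.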

If you do not have a pCloud account yet: You can get one for free! You do not need an account to access the above folders.