# Introduction to Statistics and Data Science for Management

## Modification of schedule

While classes are held online and we do not meet on campus, we will meet only on Tuesdays at 15:30 online; there will be no session on Mondays. On Fridays at 11:30 I will offer an extra online session in which we will discuss additional examples and more exercises. The Friday session is completely voluntary.

The next topics for the voluntary Friday session are:

- April 3: Tidy data / Exploratory Data Analysis
- April 17 (tentative): Probabilities and probability distributions

## Instructions for online class sessions

Please join me on Zoom through this link (Click here!) at our regular class time. To save everyone's bandwidth and thus improve the quality of the video conference, I recommend keeping your camera off and sharing your screen only when necessary.

In the pCloud folder you will find my PDF slides. We will also often use rstudio.cloud, so please be logged in to your account.

Please check your college email regularly for updates. I may ask you to participate in Doodle polls if you want to join extra tutorials, or in Tricider polls to quickly collect topics to discuss in more detail or just for general brainstorming.

## Course Description

This course introduces the theory and practice of describing and analysing data and of testing hypotheses, and shows how to use the free statistical computing software R for your data analyses.

Topics discussed include, but are not limited to, combinatorics, probability, random variables, density and distribution functions, sampling from a population, sample properties and descriptive summary statistics, and inferring population characteristics from a random sample.

## Course Objectives

The course will enable students to describe and analyse data using simple statistical methods and to interpret and report the results of their data analyses. Students will learn how to use R, a statistical computing software widely used in academia and industry.

## Course Materials

### Required:

The course does not follow a single textbook. Instead, there will be mandatory weekly reading assignments from different sources, as indicated under the topics below.

### Recommended Reading list

#### Data Analysis

- Agresti, A., 2019. An Introduction to Categorical Data Analysis, 3rd ed. Wiley.
- Bilder, C.R. and Loughin, T.M., 2015. Analysis of Categorical Data with R. Wiley.
- Chihara, L.M. and Hesterberg, T.C., 2019. Mathematical Statistics with Resampling and R, 2nd ed. Wiley.
- Devlin, T.D. et al., 2018. Seeing Theory. https://seeing-theory.brown.edu/index.html
- Downey, A., 2016. There is still only one test. http://allendowney.blogspot.com/2016/06/there-is-still-only-one-test.html
- Good, P.I., 2013. Introduction to Statistics Through Resampling Methods and R, 2nd ed. Wiley.
- Good, P.I. and Hardin, J.W., 2012. Common Errors in Statistics (and How to Avoid Them). Wiley.
- Heiss, A., 2019. Half a dozen frequentist and Bayesian ways to measure the difference in means in two groups. https://github.com/andrewheiss/diff-means-half-dozen-ways
- Ismay, C. and Kennedy, P. C., 2019. Getting used to R, RStudio, and R Markdown. https://ismayc.github.io/rbasics-book/index.html
- Jaggia, S. and Kelly, A., 2020. Essentials of Business Statistics, 2nd ed. McGraw-Hill.
- Keller, G., 2018. Statistics for Management and Economics, 11th ed. Cengage.
- Levine, D.M. and Stephan, D.F., 2010. Even You Can Learn Statistics, 2nd ed. Pearson.
- Reinhart, A., 2015. Statistics Done Wrong. No Starch Press. https://www.statisticsdonewrong.com/
- Render, B. et al., 2018. Quantitative Analysis for Management, 13th ed. Pearson.
- Selvamuthu, D. and Das, D., 2018. Introduction to Statistical Methods, Design of Experiments and Statistical Quality Control. Springer.
- Stanton, J.M., 2017. Reasoning with Data: An Introduction to Traditional and Bayesian Statistics Using R. Guilford Publications.
- Torfs, P. and Brauer, C., 2018. A (very) short introduction to R. https://github.com/ClaudiaBrauer/A-very-short-introduction-to-R/blob/master/documents/A%20(very)%20short%20introduction%20to%20R.pdf
- Tukey, J.W., 1977. Exploratory data analysis. Addison-Wesley.
- Upton, G.J.G., 2017. Categorical Data Analysis by Example. Wiley.
- Urdan, T.C., 2011. Statistics in plain English. Routledge.
- De Veaux, R.D., Velleman, P.F. and Bock, D.E., 2018. Intro stats, 5th ed. Boston: Pearson.
- Wickham, H. and Grolemund, G., 2016. R for data science: import, tidy, transform, visualize, and model data. O'Reilly Media, Inc. https://r4ds.had.co.nz/

#### Data Visualization & Communication

- Cleveland, W.S., 1993. Visualizing Data. Hobart Press.
- Cleveland, W.S., 1994. The elements of graphing data. Hobart Press.
- Few, S., 2009. Now you see it. Analytics Press.
- Few, S., 2012. Show me the numbers, 2nd ed. Analytics Press.
- Few, S., 2015. Signal – Understanding What Matters in a World of Noise. Analytics Press.
- Harris, R.L., 1999. Information Graphics. Oxford University Press.
- Healy, K., 2018. Data Visualization: A Practical Introduction. Princeton University Press. http://socviz.co/
- Healy, K., 2018. The Plain Person's Guide to Plain Text Social Science. http://plain-text.co
- Knaflic, C.N., 2015. Storytelling with Data. Wiley.
- Miller, J.E., 2015. The Chicago Guide to Writing about Numbers, 2nd ed. Chicago University Press.
- Robbins, N.B., 2005. Creating More Effective Graphs. Wiley.
- Tufte, E.R., 1990. Envisioning Information, 2nd ed. Cheshire, CT: Graphics Press.
- Tufte, E.R., 2001. The Visual Display of Quantitative Information, 2nd ed. Cheshire, CT: Graphics Press.
- Tufte, E.R., 1997. Visual Explanations. Cheshire, CT: Graphics Press.
- Tufte, E.R., 2006. Beautiful evidence. Cheshire, CT: Graphics Press.
- Turabian, K.L., 2018. A Manual for Writers of Research Papers, Theses, and Dissertations, 9th ed. University of Chicago Press.
- Wainer, H., 2005. Graphic Discovery – A Trout in the Milk and Other Visual Adventures. Princeton University Press.
- Wainer, H., 2009. Picturing the Uncertain World. Princeton University Press.
- Wickham, H., 2016. ggplot2: Elegant Graphics for Data Analysis. Springer. https://ggplot2.tidyverse.org/
- Yau, N., 2013. Data Points – Visualization that means something. Wiley.

## Course Requirements:

Students must read the weekly reading assignments *before* each session.
Students need access to a computer with the statistical computing software R. To get this access, students should create an account at https://rstudio.cloud for an in-browser version of RStudio and R. Alternatively, R is available free of charge from www.r-project.org. The graphical frontend RStudio is also available in a free version. R and RStudio are installed in the college's computer lab.

Homework needs to be submitted electronically before class. See below for problem sets and deadlines.

The course is taught in two sections: one meets on Mondays at 12:15, the other on Tuesdays at 15:30. Students should attend the meetings of only one section each week.

## Instructor Information:

Prof. Dr. Dennis A. V. Dittrich

dennis.dittrich@touroberlin.de

http://economicscience.net

You can always reach me via email. For meetings in my office, appointments can be arranged through my webpage at: http://economicscience.net/content/book-appointment.

Updated information, links to the literature, additional materials, etc. can be found on my webpage.

## Grading Guidelines:

Grading Component | Weight |
---|---|
Problem Sets | 50% |
Data Analysis Project: Report | 40% |
Data Analysis Project: Presentation | 10% |
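
As a rough illustration of how these weights combine, the sketch below assumes that all three components are scored on the same scale and that the final score is their weighted average. The syllabus states only the weights, so this aggregation rule is an assumption, and the symbols $P$, $R$, $T$ and the example scores are hypothetical.

```latex
% Assumed weighted average; P = problem sets, R = project report,
% T = project presentation, all on the same (hypothetical) 0-100 scale.
\[
  \text{Final score} = 0.50\,P + 0.40\,R + 0.10\,T
\]
% Hypothetical example with P = 80, R = 90, T = 70:
\[
  0.50 \cdot 80 + 0.40 \cdot 90 + 0.10 \cdot 70 = 40 + 36 + 7 = 83
\]
```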

## Workload

A typical 3 US credits / 5 ECTS course requires 150 hours of your time. The table below identifies how I expect those 150 hours will be allocated. While you do not receive direct marks for reading, reading will affect your ability to participate in class discussions and activities and your ability to succeed in the assessments and therefore your final grade.

Activity | Time |
---|---|
Class Time (3 hours / week) | 45 hours |
Reading (4 hours / week) | 60 hours |
Problem Sets (2 hours / week) | 30 hours |
Preparation and Review (1 hour / week) | 15 hours |
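
The totals in the table follow from the weekly figures if the course runs for 15 weeks. That number is inferred from the 15 sessions listed below and from 45 hours of class time at 3 hours per week; it is not stated explicitly. A short check of the arithmetic:

```latex
% Weekly hours: class 3 + reading 4 + problem sets 2 + preparation/review 1.
% The 15-week duration is an assumption inferred from the session list.
\[
  15 \times (3 + 4 + 2 + 1) = 45 + 60 + 30 + 15 = 150 \text{ hours}
\]
```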

## Weekly Topics and Reading Assignments

### Session 1: 10.02. / 11.02.

Introduction to course and to data science and statistics

Jaggia, S. and Kelly, A., 2020. Essentials of Business Statistics, 2nd ed. McGraw-Hill. Chapter 1.

Join the tcb-stats workspace on RStudio.cloud.

### Session 2: 17.02. / 18.02.

Working with Data, R, and RStudio

Michonneau, F. and Fournier, A., 2019. Data Analysis and Visualization in R for Ecologists. Sections “Before we start”, “Intro to R”, “Starting with data”, and “Manipulating data”.

Further recommendations:

Ismay, C. and Kennedy, P. C., 2018. Getting used to R, RStudio, and R Markdown. Chapters 2, 3, 5.

Torfs, P. and Brauer, C., 2018. A (very) short introduction to R.

Wickham, H. and Grolemund, G., 2016. R for data science: import, tidy, transform, visualize, and model data. Part II Wrangle.

### Session 3: 24.02. / 25.02.

Working with Data, continued

Michonneau, F. and Fournier, A., 2019. Data Analysis and Visualization in R for Ecologists. Section “Manipulating data”.

Exploratory Data Analysis and Data Visualization

Wickham, H. and Grolemund, G., 2016. R for data science: import, tidy, transform, visualize, and model data. Part I Explore.

Further recommendations:

De Veaux, R.D., Velleman, P.F. and Bock, D.E., 2018. Intro stats. Boston: Pearson. Chapters 2-4.

Chihara, L.M. and Hesterberg, T.C., 2019. Mathematical Statistics with Resampling and R, 2nd ed. Wiley. Chapter 2.

Michonneau, F. and Fournier, A., 2019. Data Analysis and Visualization in R for Ecologists. Section “Visualizing data”.

Wickham, H., 2016. ggplot2: Elegant Graphics for Data Analysis. Springer.

Healy, K., 2018. Data Visualization: A Practical Introduction. Princeton University Press. Chapters 1-5.

Tukey, J.W., 1977. Exploratory data analysis. Addison-Wesley.

and the other sources from the recommended reading list section Data Visualization & Communication.

### Session 4: 02.03. / 03.03.

Exploratory Data Analysis and Data Visualization

De Veaux, R.D., Velleman, P.F. and Bock, D.E., 2018. Intro stats. Boston: Pearson. Chapters 2-4.

Further recommendations:

Wickham, H. and Grolemund, G., 2016. R for data science: import, tidy, transform, visualize, and model data. Part I Explore.

Chihara, L.M. and Hesterberg, T.C., 2019. Mathematical Statistics with Resampling and R, 2nd ed. Wiley. Chapter 2.

### Session 5: 09.03. / 10.03.

Introduction to Probability

Keller, G., 2018. Statistics for Management and Economics, 11th ed. Cengage. Chapter 6.

https://seeing-theory.brown.edu/basic-probability/index.html

https://seeing-theory.brown.edu/compound-probability/index.html

Further recommendations:

Levine, D.M. and Stephan, D.F., 2010. Even You Can Learn Statistics, 2nd ed. Pearson. Chapter 4.

Good, P.I., 2013. Introduction to Statistics Through Resampling Methods and R, 2nd ed. Wiley. Chapter 2.

### Session 6: 16.03. / 17.03.

Inference and Significance

Downey, A., 2016. There is still only one test.

https://www.rstudio.com/resources/videos/infer-a-package-for-tidy-statistical-inference/

Levine, D.M. and Stephan, D.F., 2010. Even You Can Learn Statistics, 2nd ed. Pearson. Chapter 7.

Bray, A. et al., 2019. infer: Tidy Statistical Inference. R package version 0.5.1.

Further recommendations:

Stanton, J.M., 2017. Reasoning with Data: An Introduction to Traditional and Bayesian Statistics Using R. Guilford Publications. Chapter 5.

Chihara, L.M. and Hesterberg, T.C., 2019. Mathematical Statistics with Resampling and R, 2nd ed. Wiley. Chapter 3.

Good, P.I., 2013. Introduction to Statistics Through Resampling Methods and R, 2nd ed. Wiley. Chapter 5.

Urdan, T.C., 2011. Statistics in plain English. Routledge. Chapter 7.

Keller, G., 2018. Statistics for Management and Economics, 11th ed. Cengage. Chapter 11.

### Session 7: 23.03. / 24.03.

Probability Distributions and the Central Limit Theorem

Keller, G., 2018. Statistics for Management and Economics, 11th ed. Cengage. Chapters 7-8.

https://seeing-theory.brown.edu/probability-distributions/index.html

Further recommendations:

Urdan, T.C., 2011. Statistics in plain English. Routledge. Chapter 4.

Levine, D.M. and Stephan, D.F., 2010. Even You Can Learn Statistics, 2nd ed. Pearson. Chapters 5 & 6.

Good, P.I., 2013. Introduction to Statistics Through Resampling Methods and R, 2nd ed. Wiley. Chapters 3 & 4.

Chihara, L.M. and Hesterberg, T.C., 2019. Mathematical Statistics with Resampling and R, 2nd ed. Wiley. Chapter 4.

### Session 8: ~~30.03.~~ / 31.03.

Probability Distributions and the Central Limit Theorem

Keller, G., 2018. Statistics for Management and Economics, 11th ed. Cengage. Chapters 8-9.

### Session 9: ~~06.04.~~ / 21.04.

Inference for Proportions

Agresti, A., 2019. An Introduction to Categorical Data Analysis, 3rd ed. Wiley. Chapters 2 & 8.

Further recommendations:

Upton, G.J.G., 2017. Categorical Data Analysis by Example. Wiley. Chapters 2-4.

Bilder, C.R. and Loughin, T.M., 2015. Analysis of Categorical Data with R. Wiley. Chapter 1.

Chihara, L.M. and Hesterberg, T.C., 2019. Mathematical Statistics with Resampling and R, 2nd ed. Wiley. Chapters 8 & 10.

### Session 10: ~~20.04.~~ / 28.04.

Inference for Comparing Means

Heiss, A., 2019. Half a dozen frequentist and Bayesian ways to measure the difference in means in two groups.

Urdan, T.C., 2011. Statistics in plain English. Routledge. Chapters 9-11.

Further recommendations:

Stanton, J.M., 2017. Reasoning with Data: An Introduction to Traditional and Bayesian Statistics Using R. Guilford Publications. Chapters 5 & 6.

Levine, D.M. and Stephan, D.F., 2010. Even You Can Learn Statistics, 2nd ed. Pearson. Chapters 8 & 9.

Good, P.I., 2013. Introduction to Statistics Through Resampling Methods and R, 2nd ed. Wiley. Chapter 5.

Chihara, L.M. and Hesterberg, T.C., 2019. Mathematical Statistics with Resampling and R, 2nd ed. Wiley. Chapters 8 & 12.

### Session 11: 04.05. / 05.05.

Designing Experiments

Good, P.I., 2013. Introduction to Statistics Through Resampling Methods and R, 2nd ed. Wiley. Chapter 6.

Ford, C., 2018. Getting started with the pwr package.

Further recommendation:

Boddy, R. and Smith, G., 2010. Effective Experimentation. Wiley.

### Session 12: 11.05. / 12.05.

Statistical Quality Control

Render, B. et al., 2018. Quantitative Analysis for Management, 13th ed. Pearson. Chapter 15.

Further recommendation:

Selvamuthu, D. and Das, D., 2018. Introduction to Statistical Methods, Design of Experiments and Statistical Quality Control. Springer. Chapter 10.

**Deadline** for the approval of your data analysis project.

### Session 13: 18.05. / 19.05.

Writing about Numbers

Miller, J.E., 2015. The Chicago Guide to Writing about Numbers, 2nd ed. Chicago University Press.

Further recommendation:

Healy, K., 2018. The Plain Person's Guide to Plain Text Social Science.

### Session 14: 25.05. / 02.06.

Review: Statistics Done Wrong

Further recommendation:

Reinhart, A., 2015. Statistics Done Wrong. No Starch Press.

Good, P.I. and Hardin, J.W., 2012. Common Errors in Statistics (and How to Avoid Them). Wiley.

### Session 15: 08.06. / 09.06.

Data Analysis Project: Presentations

Freely available data sets:

https://datasetsearch.research.google.com/

https://www.kaggle.com/datasets

Topics and reading assignments are subject to changes.

## Homework Problems

You will find the homework problems and other material for download in this pCloud folder. [Click Here!]

Upload your homework solutions to this pCloud folder. [Click Here!]

### RMarkdown for homework problems

You are encouraged to use RMarkdown with PDF output for your homework solutions. See the RStudio lesson on RMarkdown and R Markdown: The Definitive Guide for an introduction and reference to RMarkdown, respectively.

#### Submission deadlines for homework

Homework # | due at noon on |
---|---|
1 | 17.02. |
2 | 24.02. |
3 | 02.03. |
4 | 09.03. |
5 | 16.03. |
6 | 23.03. |
7 | 30.03. |
8 | 21.04. |
9 | 28.04. |
10 | 05.05. |
11 | 12.05. |
12 | 19.05. |
13 | 26.05. |

Note: As long as our classes are held online, the submission deadline for homework assignments is postponed to Tuesdays at noon. Should we return to classes on campus, the deadline will move back to Mondays.

If you do not have a pCloud account yet, you can get one for free. You do not need an account to access the above folders.