class: middle, title-slide

# Probability

### Dennis A. V. Dittrich

### 2021

---
layout: true

<div class="my-footer">
<span><img src="img/tcb-logo.png" height="40px"></span>
</div>

---

## Random Experiment

.row[.col-7[
A **random experiment** is an action or process that leads to one of several possible outcomes.

A **sample space** of a random experiment is a list of all possible outcomes of the experiment. The outcomes must be exhaustive and mutually exclusive.

An **event** is a collection or set of one or more simple events (**outcomes**) in a sample space.
]]

---

## Probability

.row[.col-7[
**classical**

- assigning probabilities based on equally likely events

**relative frequency**

- assigning probabilities based on experimentation or historical data

**subjective**

- assigning probabilities based on one's (subjective) judgment
- subjective probability is the degree of belief that we hold in the occurrence of an event
]]

---

## Probability: classical

.row[.col-7[
If an experiment has n equally likely outcomes, this method assigns a probability of 1/n to each outcome. It is necessary to determine the number of possible outcomes.

**Experiment** Rolling a die

**Outcomes** {1, 2, 3, 4, 5, 6}

**Probabilities** Each sample point has a 1/6 chance of occurring.
]]

---

## Probability: frequencies

.row[.col-7[
|Items sold|0|1|2|3|4|
|---|-:|-:|-:|-:|-:|
|Number of days|1|2|10|12|5|
|Relative frequency|0.03|0.07|0.33|0.40|0.17|

<br/>

There is a 40% chance that 3 items will be sold on any given day.
]]

---

## Interpreting Probability

.row[.col-7[
No matter which method is used to assign probabilities, all are interpreted using the relative frequency approach.

For a lottery game where 6 numbers of 49 are picked, the classical approach would predict the probability of any particular number being picked on a single draw as

`$$1/49=2.04\%.$$`

We interpret this to mean that in the long run each number will be picked 2.04% of the time.
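
The long-run reading can be checked with a small simulation. The sketch below is not part of the original slides: it assumes Python's standard `random` module, and the helper name `relative_frequency`, the target number 7, and the numbers of draws are arbitrary choices made for illustration. It repeats a single draw from 1 to 49 many times and reports how often one particular number comes up.

```python
import random

def relative_frequency(target, n_draws, seed=0):
    """Share of single draws from 1..49 that equal `target` (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed only so that the run is repeatable
    hits = sum(1 for _ in range(n_draws) if rng.randint(1, 49) == target)
    return hits / n_draws

classical = 1 / 49
for n in (1_000, 10_000, 100_000):
    # The relative frequency should move closer to 1/49 as n grows
    print(n, round(relative_frequency(7, n), 4), "vs classical", round(classical, 4))
```

With few draws the relative frequency can be noticeably off; it is only the long-run value that is expected to settle near 2.04%.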
]]

---

.your-turn[.row[.col-7[
## Exercise

- The weather forecaster reports that the probability of rain tomorrow is 10%.
    1. Which approach was used to arrive at this number?
    2. How do you interpret the probability?

- The sample space of the toss of a fair die is S = {1, 2, 3, 4, 5, 6}. If the die is balanced, each simple event has the same probability. Find the probability of the following events.
    1. An even number
    2. A number less than or equal to 4
    3. A number greater than or equal to 5
]]]

---

.your-turn[.row[.col-7[
## Exercise

Shoppers can pay for their purchases with cash, a credit card, or a debit card. Suppose that the proprietor of a shop determines that 60% of her customers use a credit card, 30% pay with cash, and the rest use a debit card.

1. Determine the sample space for this experiment.
2. Assign probabilities to the simple events.
3. Which method did you use in part 2?
]]]

---

## Joint, Marginal, Conditional Probability...

.row[.col-7[
We study methods to determine probabilities of events that result from combining other events in various ways.

There are several types of combinations and relationships between events:

- Complement event
- Intersection of events
- Union of events
- Mutually exclusive events
- Dependent and independent events
]]

---

## Complement of an Event

.row[.col-7[
The **complement** of event A is defined to be the event consisting of all sample points that are “not in A”.

The complement of `\(A\)` is denoted by `\(A^c\)`.

`$$P(A) + P(A^c) = 1$$`

### Example

Look at all the possible tosses of 2 dice: {(1,1), (1,2), ..., (6,6)}.

Let A = tosses totaling 7: A = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}

P(Total = 7) + P(Total `\(\neq\)` 7) = 1
]]

---

## Intersection of Events A and B

.row[.col-7[
The **intersection** of events A and B is the event that occurs when both A and B occur. It is denoted as `\(A \land B\)`.

The probability of the intersection is called the joint probability: `\(P(A \land B)\)`.
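
As a hedged illustration that is not part of the original slides, complements and joint probabilities for two dice can be computed by listing the 36 equally likely outcomes. The helper `prob` and the event names are made up for this sketch.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of tossing two dice
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favourable outcomes divided by all outcomes."""
    return Fraction(len(event), len(sample_space))

total_is_7 = {s for s in sample_space if sum(s) == 7}
first_is_1 = {s for s in sample_space if s[0] == 1}
second_is_5 = {s for s in sample_space if s[1] == 5}

complement = sample_space - total_is_7   # all sample points "not in A"
both = first_is_1 & second_is_5          # intersection of two events

print(prob(total_is_7) + prob(complement))  # complement rule
print(both, prob(both))                     # joint probability
```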
### Example

Let A = tosses where the first toss is 1:

`\(A = \{(1,1), (1,2), (1,3), (1,4), (1,5), (1,6)\}\)`

and B = tosses where the second toss is 5:

`\(B = \{(1,5), (2,5), (3,5), (4,5), (5,5), (6,5)\}\)`

The intersection is {(1,5)}. The joint probability of A and B is the probability of the intersection of A and B, i.e. `\(P(A \land B) = 1/36\)`.
]]

---

## Union of Events A and B

.row[.col-7[
The **union** of events A and B is the event that occurs when either A or B or both occur. It is denoted as `\(A \lor B\)`.

### Example

Let A = tosses where the first toss is 1:

`\(A = \{(1,1), (1,2), (1,3), (1,4), (1,5), (1,6)\}\)`

and B = tosses where the second toss is 5:

`\(B = \{(1,5), (2,5), (3,5), (4,5), (5,5), (6,5)\}\)`

The union of A and B is:

`\(A\lor B = \{(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,5),\\ (3,5), (4,5), (5,5), (6,5)\}\)`
]]

---

## Mutually Exclusive Events

.row[.col-7[
When two events are **mutually exclusive** (that is, the two events cannot occur together), their joint probability is 0.

### Example

A = tosses totaling 7, and B = tosses totaling 11.
`\(P(A\land B) =0\)`
]]

---

## Venn and Euler Diagrams

<img src="05.probability_files/figure-html/unnamed-chunk-2-1.png" width="60%" style="display: block; margin: auto;" />

A: 1, 2, 3, 4, 5, 6

B: 2, 4, 6, 8

D: 7

---

## Contingency Table .pink[showing relative frequencies]

.row[.col-7[
| |Mutual fund outperforms the market |Mutual fund doesn’t outperform the market|
|---|-:|-:|
|Top 20 MBA program| 0.11| 0.29|
|Not top 20 MBA program| 0.06| 0.54|
]]

---

## Contingency Table .pink[showing relative frequencies]

.row[.col-8[
Alternatively, we can introduce shorthand notation to represent the events:

`\(A_1\)` = Fund manager graduated from a top-20 MBA program

`\(A_2\)` = Fund manager did not graduate from a top-20 MBA program

`\(B_1\)` = Fund outperforms the market

`\(B_2\)` = Fund does not outperform the market
]
.col-4[
| | `\(B_1\)` | `\(B_2\)`|
|---|-:|-:|
| `\(A_1\)` | 0.11| 0.29|
| `\(A_2\)` | 0.06| 0.54|
]]

---

## Marginal Probability

.row[.col-7[
**Marginal probabilities** are computed by adding across rows and down columns; that is, they are calculated in the margins of the table.

| | `\(B_1\)` | `\(B_2\)`| `\(P(A_i)\)`|
|---|-:|-:|-:|
| `\(A_1\)` | 0.11| 0.29 | 0.40|
| `\(A_2\)` | 0.06| 0.54 | 0.60|
| `\(P(B_i)\)` | 0.17 | 0.83| 1.00|
]]

---

## Conditional Probability

.row[.col-7[
**Conditional probability** is used to determine how two events are related; that is, we can determine the probability of one event given the occurrence of another related event.

The probability of event A given event B is

`$$P(A \mid B) = \frac{P(A \land B)}{P(B)}$$`

Note how “A given B” and “B given A” are related:

`$$P(B \mid A) = \frac{P(A \land B)}{P(A)}$$`
]]

---

.row[ .col-7[
.question[
What is the probability that a fund will outperform the market given that the manager graduated from a top-20 MBA program?]
]]

.row[ .col-7[
`\(A_1\)` = Fund manager graduated from a top-20 MBA program

`\(B_1\)` = Fund outperforms the market

| | `\(B_1\)` | `\(B_2\)`| `\(P(A_i)\)`|
|---|-:|-:|-:|
| `\(A_1\)` | 0.11| 0.29 | 0.40|
| `\(A_2\)` | 0.06| 0.54 | 0.60|
| `\(P(B_i)\)` | 0.17 | 0.83| 1.00|

<br/>

`\(\Rightarrow\)` we want to know: “what is `\(P(B_1 \mid A_1)\)`?”

`$$P(B_1 \mid A_1) = \frac{P(A_1 \land B_1)}{P(A_1)} = \frac{0.11}{0.40}=0.275$$`
]]

---

## Independent Events

.row[.col-7[
One of the objectives of calculating conditional probability is to determine whether two events are related.

In particular, we would like to know whether they are independent, that is, whether the probability of one event is unaffected by the occurrence of the other event.

Two events A and B are said to be independent if

`$$P(A \mid B) = P(A)$$`

or

`$$P(B \mid A) = P(B)$$`
]]

---

.row[.col-7[
| | `\(B_1\)` | `\(B_2\)`| `\(P(A_i)\)`|
|---|-:|-:|-:|
| `\(A_1\)` | 0.11| 0.29 | 0.40|
| `\(A_2\)` | 0.06| 0.54 | 0.60|
| `\(P(B_i)\)` | 0.17 | 0.83| 1.00|

`$$P(B_1 \mid A_1) = \frac{P(A_1 \land B_1)}{P(A_1)} = \frac{0.11}{0.40}=0.275$$`

The marginal probability for `\(B_1\)` is: `\(P(B_1) = 0.17\)`.

Since `\(P(B_1 \mid A_1) \neq P(B_1)\)`, `\(B_1\)` and `\(A_1\)` are not independent events.

Stated another way, they are dependent. That is, the probability of one event (`\(B_1\)`) is affected by the occurrence of the other event (`\(A_1\)`).
]]

---

.your-turn[
.row[.col-7[
## Exercise

Calculate the marginal probabilities from the following table of joint probabilities.

| | `\(A_1\)` | `\(A_2\)` |
|---|---:|---:|
| `\(B_1\)` | .4 | .3 |
| `\(B_2\)` | .2 | .1 |

<br/>

1. Determine `\(P(A_1 \mid B_1)\)`.
2. Determine `\(P(A_2 \mid B_1)\)`.
3. Did your answers to parts 1 and 2 sum to 1? Is this a coincidence? Explain.
4. Are the events independent? Explain.
]]]

---

.your-turn[
.row[.col-7[
## Exercise

| | `\(A_1\)` | `\(A_2\)` |
|---|---:|---:|
| `\(B_1\)` | .20 | .60 |
| `\(B_2\)` | .05 | .15 |

<br/>

Are the events independent? Explain.
]]]

---

## Rules for probability computations

.row[.col-7[
**Complement Rule**

`\(P(A^c) = 1 - P(A)\)` for any event `\(A\)`.

**Multiplication Rule**

The joint probability of any two events `\(A\)` and `\(B\)` is

`$$P(A \land B) = P(B)P(A \mid B) = P(A)P(B \mid A)$$`

**Multiplication Rule for Independent Events**

The joint probability of any two independent events `\(A\)` and `\(B\)` is

`$$P(A \land B) = P(A)P(B)$$`

**Addition Rule**

The probability that event `\(A\)`, or event `\(B\)`, or both occur is

`$$P(A \lor B) = P(A) + P(B) - P(A \land B)$$`

Don't count the joint event twice!
]]

---

## Probability Trees

.row[.col-7[
A graduate statistics course has seven male and three female students. The professor wants to select two students at random to help her conduct a research project. What is the probability that the two students chosen are female?

Let A = the first student chosen is female and B = the second student chosen is female. We want to answer the question: what is `\(P(A \land B)\)`?
]]

---
class: inverse

## Probability Trees

`$$P(A \land B) = P(A)P(B|A) = (3/10)(2/9) = 6/90 \approx 0.067$$`

<img src="05.probability_files/figure-html/unnamed-chunk-3-1.png" width="75%" style="display: block; margin: auto;" />

---

.your-turn[
.row[.col-7[
## Exercise

Approximately 10% of people are left-handed. If two people are selected at random, what is the probability of the following events?

1. Both are right-handed.
2. Both are left-handed.
3. One is right-handed and the other is left-handed.
4. At least one is right-handed.
]]]

---
class: inverse

## a. Both are right-handed

`$$P(R,r)= P(R)\cdot P(r|R)= 0.9 \cdot 0.9 = 0.81$$`

<img src="05.probability_files/figure-html/unnamed-chunk-4-1.png" width="75%" style="display: block; margin: auto;" />

---
class: inverse

## b. Both are left-handed

`$$P(L,l) = P(L)\cdot P(l|L)= 0.1 \cdot 0.1 = 0.01$$`

<img src="05.probability_files/figure-html/unnamed-chunk-5-1.png" width="75%" style="display: block; margin: auto;" />

---
class: inverse

## c. One is right-handed and the other is left-handed

`$$P(R,l \lor L,r) = P(R)\cdot P(l|R) + P(L)\cdot P(r|L)\\ = 0.9\cdot 0.1 + 0.1\cdot 0.9 = 0.18$$`

<img src="05.probability_files/figure-html/unnamed-chunk-6-1.png" width="75%" style="display: block; margin: auto;" />

---
class: inverse

## d. At least one is right-handed

`$$P(L,r)+P(R,l)+P(R,r) = 1-P(L,l) = 1-0.01=0.99$$`

<img src="05.probability_files/figure-html/unnamed-chunk-7-1.png" width="75%" style="display: block; margin: auto;" />

---

.your-turn[
## Exercise

.row[.col-7[
A telemarketer calls people and tries to sell them a subscription to a daily newspaper. On 20% of her calls, there is no answer or the line is busy. She sells subscriptions on 5% of the remaining calls. For what proportion of calls does she make a sale?
]]]

---
class: inverse

<img src="05.probability_files/figure-html/unnamed-chunk-8-1.png" width="75%" style="display: block; margin: auto;" />

---

.your-turn[
## Exercise

.row[.col-7[
A financial analyst has determined that there is a 22% probability that a mutual fund will outperform the market over a 1-year period provided that it outperformed the market the previous year. If only 15% of mutual funds outperform the market during any year, what is the probability that a mutual fund will outperform the market 2 years in a row?
]]]

---
class: inverse

<img src="05.probability_files/figure-html/unnamed-chunk-9-1.png" width="75%" style="display: block; margin: auto;" />

---

## Bayes’ Law

.row[.col-7[
In its most basic form, if we know `\(P(B | A)\)`, we can apply Bayes’ Law to determine `\(P(A | B)\)`.
]]

### Pay €500 for MBA prep?

.row[.col-6[
The GMAT is a requirement for applicants to MBA programmes. There are a variety of preparatory courses designed to help improve GMAT scores, which range from 200 to 800.

Suppose that a survey of MBA students reveals that among GMAT scorers above 650, 52% took a preparatory course, whereas among GMAT scorers of less than 650 only 23% took a preparatory course.
]
.col-6[
An applicant to an MBA programme has determined that he needs a score of more than 650 to get into a certain MBA programme, but he feels that his probability of getting that high a score is quite low: 10%.

He is considering taking a preparatory course that costs €500. He is willing to do so only if his probability of achieving 650 or more doubles.

What should he do?
]]

---

.row[.col-7[
Let `\(A\)` = GMAT score of 650 or more, hence `\(A^c\)` = GMAT score less than 650.

Our student has determined the probability of getting greater than 650 (without any prep course) as 10%, that is: `\(P(A) = 0.10\)`

It follows that `\(P(A^c) = 1 - 0.10 = 0.90\)`

Let `\(B\)` represent the event “take the prep course” and thus, `\(B^c\)` is “do not take the prep course”.

From our survey information, we’re told that among GMAT scorers above 650, 52% took a preparatory course, that is: `\(P(B | A) = 0.52\)`

(Probability of finding a student who took the prep course given that they scored above 650...)

But our student wants to know `\(P(A | B)\)`, that is, what is the probability of getting more than 650 given that a prep course is taken?

If this probability is at least 20%, he will spend €500 on the prep course.
]]

---

.row[.col-7[
Among GMAT scorers of less than 650 only 23% took a preparatory course. That is: `\(P(B | A^c) = 0.23\)`

(Probability of finding a student who took the prep course given that he or she scored less than 650...)

The conditional probabilities are `\(P(B | A) = 0.52\)` and `\(P(B | A^c) = 0.23\)`

Again using the complement rule, we find the following conditional probabilities:

`\(P(B^c | A) = 1 - 0.52 = 0.48\)` and `\(P(B^c | A^c) = 1 - 0.23 = 0.77\)`

Recall `\(P(A \mid B) = \frac{P(A \land B)}{P(B)}\)`

Unfortunately, we don’t know `\(P(A \land B)\)` and we don’t know `\(P(B)\)`.
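
Both missing quantities can be rebuilt from the prior `\(P(A)\)` and the two survey percentages using the multiplication rule. A minimal sketch of that arithmetic, with variable names chosen only for this illustration (they do not appear in the original slides):

```python
# Known quantities from the example
p_a = 0.10              # P(A): score of 650 or more
p_b_given_a = 0.52      # P(B | A): took the course, given a high score
p_b_given_not_a = 0.23  # P(B | A^c): took the course, given a lower score

# Multiplication rule gives the two joint probabilities
p_a_and_b = p_a * p_b_given_a
p_not_a_and_b = (1 - p_a) * p_b_given_not_a

# B occurs either together with A or together with A^c
p_b = p_a_and_b + p_not_a_and_b

# Definition of conditional probability
p_a_given_b = p_a_and_b / p_b
print(round(p_a_and_b, 3), round(p_b, 3), round(p_a_given_b, 3))
```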
]]

---
class: inverse

.row[ .col-7[
In order to go from `\(P(B | A) = 0.52\)` to `\(P(A | B) = ?\)` we need to apply Bayes’ Law.

`$$P(A\land B) = P(A)\cdot P(B|A) = 0.10 \cdot 0.52 = 0.052$$`
]]

.row[
<img src="05.probability_files/figure-html/unnamed-chunk-10-1.png" width="60%" style="display: block; margin: auto;" />
]

---
class: inverse

`$$P(B) = P(A\land B) + P(A^c\land B) = 0.052 + 0.90 \cdot 0.23 = 0.259$$`

<img src="05.probability_files/figure-html/unnamed-chunk-11-1.png" width="60%" style="display: block; margin: auto;" />

---

.row[.col-7[
`$$P(A\land B) = P(A)\cdot P(B|A) = 0.052$$`

`$$P(B) = P(A\land B) + P(A^c\land B) = 0.259$$`

Thus,

`$$P(A \mid B) = \frac{P(A \land B)}{P(B)} = \frac{0.052}{0.259} \approx 0.201$$`

The probability of scoring 650 or better slightly more than doubles, from 10% to 20.1%, when the prep course is taken.
]]

---

## Bayesian Terminology

.row[.col-7[
**prior probabilities**

The probabilities `\(P(A)\)` and `\(P(A^c)\)` are called prior probabilities because they are determined prior to the decision about taking the preparatory course.

**posterior probabilities**

The conditional probability `\(P(A | B)\)` is called a posterior probability (or revised probability), because the prior probability is revised after the decision about taking the preparatory course.

**likelihood probabilities**

The probabilities `\(P(B|\cdot)\)` and `\(P(B^c|\cdot)\)` are called likelihood probabilities.
]]

---

.your-turn[
## Exercise

.row[.col-7[
A foreman for an injection-molding firm admits that on 10% of his shifts, he forgets to shut off the injection machine on his line. This causes the machine to overheat, increasing the probability that a defective molding will be produced during the early morning run from 2% to 20%.

1. What proportion of moldings from the early morning run is defective?
2. The plant manager randomly selects a molding from the early morning run and discovers it is defective. What is the probability that the foreman forgot to shut off the machine the previous night?
]]]

---

.your-turn[
## Exercise

.row[.col-7[
Three airlines serve a small town in Ohio. Airline A has 50% of all the scheduled flights, airline B has 30%, and airline C has the remaining 20%. Their on-time rates are 80%, 65%, and 40%, respectively. A plane has just left on time. What is the probability that it was airline A?
]]]

---

.row[.col-9[
![](img/monty1-small.png)
]
.col-3[
## Monty Hall
]
]

---

.row[.col-9[
![](img/monty2-small.png)
]
.col-3[
## Monty Hall
]]
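
---

## Monty Hall: a simulation sketch

The two slides above show the game only as images. As a hedged sketch, and assuming the usual rules (three doors, one prize, the host always opens an unchosen door without the prize and then offers a switch), a simulation can estimate the chance of winning when staying versus switching. The function name `play`, the seed, and the number of trials are illustrative choices, not part of the original slides.

```python
import random

def play(switch, rng):
    """One round under the assumed standard rules; returns True on a win."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a door that is neither the contestant's pick nor the prize
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the one remaining closed door
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

rng = random.Random(1)  # fixed seed only so that the run is repeatable
trials = 100_000
stay = sum(play(False, rng) for _ in range(trials)) / trials
swap = sum(play(True, rng) for _ in range(trials)) / trials
print("stay:", round(stay, 3), "switch:", round(swap, 3))
```

Under these assumed rules the estimates should land near 1/3 for staying and 2/3 for switching, which is what conditioning on the host's choice predicts.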