Probability Distributions and the Central Limit Theorem II

class: middle, title-slide

# Probability Distributions and the Central Limit Theorem II
## Continuous Probability Distributions
### Dennis A. V. Dittrich
### 2021

---

layout: true

---

## Continuous Random Variable

.row[.col-6[
Unlike a discrete random variable, a **continuous random variable** is one that can assume an uncountable number of values.
*  We cannot list the possible values because there are uncountably many of them.
*  Because the values form a continuum, the probability of each individual value is 0.
*  We can determine the probability of a range of values only.
]
.col-6[
*  With a discrete random variable like the outcome of a die roll, it is meaningful to talk about `$P(X=5).$`
*  In a continuous setting, the probability that the random variable of interest, say task length, is exactly 5 minutes is zero: `$P(X=5) = 0.$`
*  It is, however, meaningful to talk about `$P(X \leq 5).$`
]]

---

## Probability Density Functions
.row[.col-7[
A function `$f(x)$` is called a **probability density function**
over the range `$a \leq x \leq b$` if it meets the following
requirements:

1.  `$f(x) \geq 0$` for all `$x$` between `$a$` and `$b$`, and
2.  The total area under the curve between `$a$` and `$b$`  
  is `$1$`: `$$\int_a^b f(x)dx=1$$`
]]
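
---

.row[.col-7[
Both requirements can be checked numerically. The sketch below is a minimal illustration in base `R`; the helper `is_density()` and the two candidate functions are hypothetical names introduced here, and the grid size and tolerance are arbitrary choices.

```r
# Hypothetical helper (an illustration, not an R built-in): checks both
# requirements of a probability density function on the range [a, b].
is_density <- function(f, a, b, tol = 1e-6) {
  grid <- seq(a, b, length.out = 1001)              # points at which f(x) >= 0 is tested
  nonnegative <- all(f(grid) >= 0)
  area <- integrate(f, lower = a, upper = b)$value  # total area under the curve
  nonnegative && abs(area - 1) < tol
}

f_valid   <- function(x) 2 * x   # area on [0, 1] is 1
f_invalid <- function(x) x       # area on [0, 1] is 1/2

check_valid   <- is_density(f_valid, 0, 1)     # TRUE
check_invalid <- is_density(f_invalid, 0, 1)   # FALSE
```

Only the first candidate meets both requirements; the second is non-negative, but its total area is `$0.5$`.
]]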

---

.row[.col-6[
## Uniform Distribution
The uniform probability distribution (sometimes
called the rectangular probability distribution) is described by the function:
`$$f(x) = \frac{1}{b-a}, \text{ where } a\leq x\leq b$$`

The height of the probability density function is the same for all values of `$x$` between `$a$` and `$b$`.
]
.col-6[
![](img/u1.png)

![](img/u2.png)
]]

---
.tip[
## Example
.row[.col-7[
The amount of petrol sold daily at a service station is
uniformly distributed with a minimum of 2,000 litres and a
maximum of 5,000 litres.

Find the probability that daily sales will fall between 2,500
and 3,000 litres.

Algebraically: what is `$P(2500 \leq X \leq 3000)$`?

`\begin{align*}
P(2500 \leq X \leq 3000) &= \int_{2500}^{3000} \frac{1}{5000-2000}dx\\
 &= \frac{3000 - 2500}{3000}\\
 &\approx 0.167
\end{align*}`
]]
]
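---

.row[.col-7[
The same probability can be obtained in `R`. This is a minimal sketch using the base function `punif()` (the cumulative probability function of the uniform distribution); the variable names are chosen here for illustration.

```r
a <- 2000; b <- 5000   # minimum and maximum daily sales in litres

# P(2500 <= X <= 3000) as a difference of two cumulative probabilities
p_interval <- punif(3000, min = a, max = b) - punif(2500, min = a, max = b)

# The same probability as the area of a rectangle: width times height
p_rectangle <- (3000 - 2500) * (1 / (b - a))

round(p_interval, 3)   # 0.167
```

Both routes give `$1/6$`, which matches the integral above.
]]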
---
.tip[
## Example
.row[.col-7[
The amount of petrol sold daily at a service station is
uniformly distributed with a minimum of 2,000 litres and a
maximum of 5,000 litres.

What is the probability that the service station will sell at least
4,000 litres?

Algebraically: what is `$P(X \geq 4000)$`?
`\begin{align*}
P(X \geq 4000) &=\int_{4000}^{5000} \frac{1}{5000-2000}dx \\
&= \frac{5000-4000}{3000} = \frac{1}{3}
\end{align*}`
There is a one-in-three chance the petrol station will sell
at least 4,000 litres on any given day.
]]
]
---

.tip[
## Example
.row[.col-7[
The amount of petrol sold daily at a service station is uniformly
distributed with a minimum of 2,000 litres and a maximum of
5,000 litres.
	
What is the probability that the station will sell exactly 2,500
litres?
	
Algebraically: what is `$P(X = 2500)$`?

`\begin{align*}
P(X = 2500) &=\int_{2500}^{2500} \frac{1}{5000-2000}dx\\
&= \frac{2500-2500}{3000}\\
&= 0
\end{align*}`

The probability that the petrol station will sell exactly
2,500 litres is zero.
]]
]
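---

.row[.col-7[
The upper-tail probability and the probability of a single value can be checked the same way. This is a sketch with the base functions `punif()` and `dunif()`; the variable names are illustrative.

```r
a <- 2000; b <- 5000   # minimum and maximum daily sales in litres

# P(X >= 4000): upper-tail area, requested with lower.tail = FALSE
p_at_least_4000 <- punif(4000, min = a, max = b, lower.tail = FALSE)

# P(X = 2500): an interval of zero width has zero area
p_exactly_2500 <- punif(2500, min = a, max = b) - punif(2500, min = a, max = b)

# The density at 2500 is positive, but a density value is not a probability
density_at_2500 <- dunif(2500, min = a, max = b)   # 1/3000
```

The density `$f(2500) = 1/3000$` is the height of the rectangle, not the probability of selling exactly 2,500 litres.
]]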
---

## Uniform Distribution
.row[.col-7[
**Mean and Expected Value**
`$$E(X)=\mu = \frac{a+b}{2}$$`

**Standard Deviation**
`$$\sigma = \sqrt{\frac{(b-a)^2}{12}}$$`
]]
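
---

.row[.col-7[
As a worked check, the two formulas can be applied to the petrol example (`$a = 2000$`, `$b = 5000$`) and compared with a numerical integration of the definitions of `$E(X)$` and `$Var(X)$`. This is a sketch under those assumed values; `integrate()` and `dunif()` are base `R` functions, the other names are illustrative.

```r
a <- 2000; b <- 5000
f <- function(x) dunif(x, min = a, max = b)   # uniform density on [a, b]

mu    <- (a + b) / 2            # 3500
sigma <- sqrt((b - a)^2 / 12)   # about 866.03

# Numerical cross-check from the definitions of mean and variance
mu_num    <- integrate(function(x) x * f(x), a, b)$value
sigma_num <- sqrt(integrate(function(x) (x - mu)^2 * f(x), a, b)$value)
```

The closed-form and numerical results agree up to integration error.
]]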

---

## The Normal Distribution
.row[.col-7[
The normal distribution is the most important of all
probability distributions. The probability density function
of a normal random variable is given by:
`$$f(x)= \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \text{ with } -\infty < x < \infty$$`
]]
.row[
.col-7[
<img src="06.pdf-2_files/figure-html/rplot-normal-1.png" width="90%" style="display: block; margin: auto;" />
]
.col-5[
The graph of a normal distribution is called the  normal curve.
]
]
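
---

.row[.col-7[
To make the formula concrete, the density can be written out by hand and compared with `R`'s built-in `dnorm()`. This is a minimal sketch; `normal_pdf()` is a hypothetical helper name, and the parameter values are arbitrary.

```r
# Normal density written exactly as in the formula above
normal_pdf <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-0.5 * ((x - mu) / sigma)^2)
}

x <- seq(-4, 4, by = 0.5)

# Compare with the built-in density for two parameter choices
match_standard <- isTRUE(all.equal(normal_pdf(x, 0, 1), dnorm(x, 0, 1)))
match_shifted  <- isTRUE(all.equal(normal_pdf(x, 10, 5), dnorm(x, 10, 5)))
```

Both comparisons return `TRUE`, so the hand-written formula and `dnorm()` describe the same curve.
]]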

---

## Properties of a Normal Distribution
.row[.col-7[
1. The mean, median, and mode are equal.
2. The normal curve is bell-shaped and is symmetric about the mean.
3. The normal curve approaches, but never touches, the x-axis as it extends farther and farther away from the mean.
4. The total area under the normal curve is equal to one.
]
.col-5[
<img src="06.pdf-2_files/figure-html/rplot-normal-1.png" width="90%" style="display: block; margin: auto;" />
]]

---

## Properties of a Normal Distribution
.row[.col-7[
<ol start="5"><li>Between `$\mu - \sigma$` and `$\mu + \sigma$` (in the center of the curve),
 the graph curves downward. The graph curves upward to the left of `$\mu - \sigma$` and to the right of `$\mu + \sigma$`. The curve therefore has inflection points at `$\mu - \sigma$` and `$\mu + \sigma$`.
   * **Inflection Point**: A point where the curve changes concavity (from concave up to concave down, or concave down to concave up).
   *  A curve that is concave **u**p looks like a **u**-shape.
   *  A curve that is concave dow**n** looks like an **n**-shape.
</li></ol>
]
.col-5[
<img src="06.pdf-2_files/figure-html/rplot-normal-1.png" width="90%" style="display: block; margin: auto;" />
]]
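
---

.row[.col-7[
The change in concavity at `$\mu \pm \sigma$` can be seen numerically for the standard normal curve (`$\mu = 0$`, `$\sigma = 1$`). The sketch below approximates the second derivative with a central finite difference; `second_derivative()` is a hypothetical helper and the step size `h` is an arbitrary choice.

```r
# Central finite-difference approximation of the second derivative of f at x
second_derivative <- function(f, x, h = 1e-3) {
  (f(x + h) - 2 * f(x) + f(x - h)) / h^2
}

# Curvature of the standard normal density at -2, -1, 0, 1, 2
curv <- second_derivative(dnorm, c(-2, -1, 0, 1, 2))

# Positive: concave up; negative: concave down; zero: inflection point
curv_sign <- sign(round(curv, 4))   # 1 0 -1 0 1
```

The sign is positive outside `$\pm 1$`, negative at the center, and zero (up to approximation error) at `$\pm 1$`.
]]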

---

.row[.col-6[
The normal distribution is fully defined by two parameters:
its standard deviation `$\sigma$` and mean `$\mu$`.

*  The normal distribution is bell shaped and symmetrical about the mean `$\mu$`.
*  Normal distributions range from minus infinity to plus infinity.
]
.col-6[
*  A normal distribution can have any mean and any positive standard deviation.
*  The mean gives the location of the line of symmetry.
*  The standard deviation describes the spread of the data.
]]

.row[.col-10[
![](img/normal2.png)
]]

---

## Understanding Mean and SD
.row[.col-7[
Which curve has the greater mean?

![](img/normal3.png)

Curve A has the greater mean.

The line of symmetry of curve A occurs at `$x = 15$`. The line of symmetry of curve B occurs at `$x = 12$`.
]]

---

## Understanding Mean and SD
.row[.col-7[
Which curve has the greater standard deviation?
	
![](img/normal3.png)
	
Curve B has the greater standard deviation.

Curve B is more spread out than curve A.
]]

---

## Normal Distribution
Increasing the mean shifts the curve to the right.

---

## Normal Distribution
Increasing the standard deviation *flattens* the curve...
	
<img src="06.pdf-2_files/figure-html/unnamed-chunk-3-1.png" width="60%" style="display: block; margin: auto;" />

---

## Interpreting Graphs of Normal Distributions
.row[.col-7[
The scaled test scores for New York State Grade 4
Common Core Mathematics Test are normally
distributed. The normal curve shown below represents
this distribution.

What is the mean test score? Estimate
the standard deviation of this normal distribution.
]]
.col-9[
![](img/normal6.png)
]

---

## Interpreting Graphs of Normal Distributions
.col-9[
![](img/normal6a.png)
]
.row[.col-7[
The scaled test scores for the New York State Grade 4
Common Core Mathematics Test are normally
distributed with a mean of about 305 and a standard
deviation of about 40.
]]

---

## Standard Normal Distribution
.row[.col-7[
A normal distribution whose mean is zero and standard
deviation is one is called the standard normal distribution.

`$$f(z)= \frac{1}{1\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{z-0}{1}\right)^2} \text{ with } -\infty < z < \infty$$`

Any normal distribution can be  converted to a standard normal distribution with simple
algebra: Any x-value can be transformed into a z-score (of a standard normal distribution) by
using the formula
`$$z=\frac{\text{Value}-\text{Mean}}{\text{Standard deviation}}= \frac{x-\mu}{\sigma}$$`
]]
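
---

.row[.col-7[
As an illustration, the transformation can be applied to the test-score distribution with a mean of about 305 and a standard deviation of about 40. The sketch below is base `R`; `z_score()` is a hypothetical helper and the three scores are made-up values.

```r
mu <- 305; sigma <- 40   # approximate mean and standard deviation of the test scores

# z-score: distance from the mean, measured in standard deviations
z_score <- function(x, mu, sigma) (x - mu) / sigma

scores <- c(225, 305, 365)        # hypothetical scores chosen for illustration
z <- z_score(scores, mu, sigma)   # -2.0  0.0  1.5

# Standardising does not change cumulative probabilities
same_area <- isTRUE(all.equal(pnorm(scores, mu, sigma), pnorm(z, 0, 1)))
```

A score of 365 lies 1.5 standard deviations above the mean, and the area to its left is the same on both scales.
]]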

---

## Properties of the Standard Normal Distribution
.row[.col-6[
1. The cumulative area is close to 0 for z-scores close to `$z = -3.49$`.
2. The cumulative area increases as the z-scores increase.
3. The cumulative area for `$z = 0$` is `$0.5$`.
4. The cumulative area is close to 1 for z-scores close to `$z = 3.49$`.
]
.col-6[
<img src="06.pdf-2_files/figure-html/rplot-normal-1.png" width="95%" style="display: block; margin: auto;" />
]]
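
---

.row[.col-7[
These four properties can be checked with `pnorm()`, which returns the cumulative area to the left of a z-score. This is a minimal sketch; the intermediate z-scores `-1` and `1` are added here for illustration.

```r
z <- c(-3.49, -1, 0, 1, 3.49)

# Cumulative area to the left of each z-score
# (mean 0 and standard deviation 1 are the defaults of pnorm)
area <- pnorm(z)

round(area, 4)                    # 0.0002 0.1587 0.5000 0.8413 0.9998
increasing <- all(diff(area) > 0) # TRUE: area grows with z
```
]]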

---

.tip[
## Example
.row[.col-7[
Suppose that at another petrol station the daily demand
for regular petrol is normally distributed with a mean of
1,000 litres and a standard deviation of 100 litres.

The station manager has just opened the station for
business and notes that there are exactly 1,100 litres of
regular petrol in storage.

The next delivery is scheduled later today at the close of
business. The manager would like to know the probability
that he will have enough regular petrol to satisfy today’s
demands.
]]]

---

.tip[
.row[.col-7[
The demand is normally distributed with mean `$\mu = 1000$`
and standard deviation `$\sigma = 100$`. We want to find the
probability
 `$P(X < 1100)$`

Graphically we want to calculate:

![](img/normal7.png)
]]]

---

.tip[
.row[.col-7[
The first step is to standardize `$X$`. However, if we perform any
operations on `$X$`, we must perform the same operations on
1,100. Thus
`$$P(X < 1100) =
P\left(\frac{X-\mu}{\sigma} < \frac{1100-1000}{100}\right)  = P(Z < 1.00)$$`
]]
.row[.col-7[
![](img/normal7a.png)
]
.col-5[
The values of Z specify the location of the corresponding
value of X.

A value of `$Z = 1$` corresponds to a value of X that is 1
standard deviation above the mean.

Notice that the mean of Z, which is `$0$`, corresponds
to the mean of X.
]]]

---

## Finding Areas Under the Standard Normal Curve
.row[.col-7[
1. Sketch the standard normal curve and shade the appropriate area under the curve.
2. Find the area by following the directions for each case shown.
  1. To find the area to the left of z, find the area that corresponds to z in the Standard Normal Table.
  2. To find the area to the right of z, use the Standard Normal Table to find the area that corresponds to z. Then subtract the area from 1.
  3. To find the area between two z-scores, find the area corresponding to each z-score in the
Standard Normal Table. Then subtract the smaller area from the larger area.
]]

---

![](img/normal8.png)

---

## The Standard Normal Distribution
.row[.col-7[
If we know the mean and standard deviation of a
normally distributed random variable, we can always
transform the probability statement about X into a
probability statement about Z.

Consequently, we need only one table, the standard normal probability table, to obtain probabilities for any normal distribution.

This was a major benefit before we had access to electronic computers and had to use printed probability distribution tables...
]]

---

## Density and cumulative probability for the normal distribution
.row[.col-7[

```r
dnorm(x,m,s)
```

The normal density function in `R` gives the density of the normal distribution with mean `$m$` and standard deviation `$s$` at `$x$`.

```r
pnorm(x,m,s)
```

The normal cumulative probability function gives the cumulative probability of observing at most `$x$` if `$X$` is normally distributed with mean `$m$` and standard deviation `$s$`.
]]

---
.row[.col-7[
Find the area under the standard normal curve to the left
of `$z = -0.99$`.

```r
pnorm(-0.99,0,1) %>% 
  round(digits=4)
```

```
## [1] 0.1611
```

<br/>
Find the area under the standard normal curve to the
right of `$z = 1.06$`.

```r
(1-pnorm(1.06,0,1)) %>% 
  round(digits=4)
```

```
## [1] 0.1446
```

<br/>
Find the area under the standard normal curve between
`$z = -1.5$` and `$z = 1.25$`.

```r
(pnorm(1.25,0,1)-pnorm(-1.5,0,1)) %>% 
  round(digits=4)
```

```
## [1] 0.8275
```
]]

---
.tip[
## Example
.row[.col-5[
Suppose that at another petrol station the daily demand
for regular petrol is normally distributed with a mean of
1,000 litres and a standard deviation of 100 litres.
	
The station manager has just opened the station for
business and notes that there are exactly 1,100 litres of
regular petrol in storage.
]
.col-7[
The next delivery is scheduled later today at the close of
business. The manager would like to know the probability
that he will have enough regular petrol to satisfy today’s
demands.

The demand is normally distributed with mean `$\mu = 1000$`
and standard deviation `$\sigma = 100$`. We want to find 
`\begin{align}
P(X < 1100) &= P\left(\frac{X-\mu}{\sigma} < \frac{1100-1000}{100}\right)\\
&= P(Z < 1.00)
\end{align}`
]]]

---
.tip[
`$$P(X < 1100) =
P\left(\frac{X-\mu}{\sigma} < \frac{1100-1000}{100}\right)  = P(Z < 1.00)$$`

.row[.col-7[
Using the standard normal...

```r
pnorm(1,0,1) %>% round(digits=4)
```

```
## [1] 0.8413
```

<br/>
...or directly the normal distribution given in the problem

```r
pnorm(1100,1000,100) %>% round(digits=4)
```

```
## [1] 0.8413
```

]]]

---

## Applications in Finance: Measuring Risk
.row[.col-7[
Consider an investment whose return is normally
distributed with a mean of 10% and a standard deviation
of 5%.

1. Determine the probability of losing money.
2. Find the probability of losing money when the standard deviation is equal to 10%.
]]

---

## Measuring Risk
.row[.col-7[
The investment loses money when the return is negative. Thus, we wish to determine:
`$P(X < 0)$`

```r
pnorm(0,10,5) %>% round(digits=4)
```

```
## [1] 0.0228
```

The probability of losing money is `$p=0.0228$`.

<br/>
If we increase the standard deviation to 10% the probability of suffering a loss becomes:

```r
pnorm(0,10,10) %>% round(digits=4)
```

```
## [1] 0.1587
```
]]
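---

.row[.col-7[
Because `pnorm()` is vectorised, the effect of the standard deviation on the probability of a loss can be computed in one call. This is a sketch in base `R`; the third value, 15%, is a hypothetical addition and not part of the exercise.

```r
sds <- c(5, 10, 15)   # standard deviations in percent; 15 is a hypothetical extra value

# P(X < 0) for a mean return of 10% and each standard deviation
p_loss <- pnorm(0, mean = 10, sd = sds)

# Probability of a loss for the two cases in the exercise
round(p_loss[1:2], 4)   # 0.0228 0.1587
```

With the mean held at 10%, a larger standard deviation puts more area below zero, so the probability of a loss rises.
]]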
---

## Finding Values Given a Probability
.row[.col-7[
Often, we’re asked to find some value of `$Z$` for a given
probability, i.e. given an area (`$A$`) under the curve, what is
the corresponding value of `$z$` (`$z_A$`) on the horizontal axis
that gives us this area? That is:
`$$P(Z>z_A) =A$$`
]
.col-5[
![](img/normal9.png)
]]
.row[.col-7[
What value of `$z$` corresponds to an area under the curve
 of 2.5%? That is, what is `$z_{0.025}$`?

`$$(1-A)=(1-0.025) = 0.975$$`
]]
.row[.col-7[
If you do a *reverse look-up* in the standard normal probability table for `$p=0.975$`,
you will get the corresponding `$z_A = 1.96$`.
]
.col-5[

```r
qnorm(0.975,0,1) %>% 
  round(digits=4)
```

```
## [1] 1.96
```

]]
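---

.row[.col-7[
Since `$z_A$` is defined through the upper-tail area `$P(Z > z_A) = A$`, it can also be requested directly with the `lower.tail = FALSE` argument of `qnorm()`. This is a sketch in base `R`; the variable names are illustrative.

```r
A <- 0.025   # upper-tail area

# Two equivalent ways to obtain z_A
z_upper <- qnorm(A, lower.tail = FALSE)   # quantile with area A to its right
z_lower <- qnorm(1 - A)                   # quantile with area 1 - A to its left

# Round trip: the area to the right of z_A is A again
area_back <- pnorm(z_upper, lower.tail = FALSE)

round(z_upper, 2)   # 1.96
```
]]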
---

## Finding Values for a Given Probability
.row[.col-7[

```r
qnorm(p,m,s)
```
In `R`, the normal quantile function gives the value `$x$` with `$P(X \leq x) = p$` for a normal distribution with mean `$m$` and standard deviation `$s$`.

Similar functions exist for the other distributions. Their names always follow the pattern `q<name of the distribution>()`, e.g. `qbinom()`, `qpois()`, `qgeom()`, with corresponding parameters describing the distribution.
]]
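
---

.row[.col-7[
As a usage sketch, `qnorm()` with a mean and standard deviation answers the question "which value is not exceeded with a given probability?". The example assumes the demand distribution from the petrol example (mean 1,000 litres, standard deviation 100 litres); the 95% target is a hypothetical choice, not part of the original problem.

```r
mu <- 1000; sigma <- 100   # daily demand for regular petrol in litres

# Stock that covers the daily demand with probability 0.95 (hypothetical target)
stock_95 <- qnorm(0.95, mu, sigma)   # about 1164.5 litres

# qnorm() inverts pnorm(): the cumulative probability at stock_95 is 0.95 again,
# and the quantile of P(X < 1100) is 1100 again
p_back <- pnorm(stock_95, mu, sigma)
x_back <- qnorm(pnorm(1100, mu, sigma), mu, sigma)
```
]]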

---

## The Exponential Distribution
.row[.col-7[
...has this probability density function

`$$f(x) = \lambda e^{-\lambda x} \text{ for } x\geq 0$$`
Often used to model the time that elapses between two occurrences of an event.

*  The time between “hits” on a webpage
*  The time between arrivals of customers at a bank drive-in teller window
*  The time between failures of an electronic component
*  The time between phone calls 
*  The time between parts arriving at an assembly station
]
.col-5[
**Mean (time between events) and standard deviation**
`$$\mu=\sigma = \frac{1}{\lambda}$$`
]]
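
---

.row[.col-7[
The identity `$\mu = \sigma = 1/\lambda$` can be checked by integrating the density numerically. This is a sketch in base `R`, assuming an illustrative rate of `$\lambda = 0.05$`; `integrate()` returns an approximation, so the results agree only up to a small numerical error.

```r
lambda <- 0.05                                  # illustrative rate (events per unit of time)
f <- function(x) dexp(x, rate = lambda)         # exponential density

mu_formula <- 1 / lambda                        # 20

# Mean and standard deviation from their definitions as integrals over [0, Inf)
mu_num    <- integrate(function(x) x * f(x), 0, Inf)$value
sigma_num <- sqrt(integrate(function(x) (x - mu_formula)^2 * f(x), 0, Inf)$value)
```

Both `mu_num` and `sigma_num` come out close to 20, the value of `$1/\lambda$`.
]]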

---

## The Exponential Distribution
.row[.col-7[
The exponential distribution depends upon the value of `$\lambda$`.

Smaller values of `$\lambda$` *flatten* the curve:
]]

---

## Exponential Distribution
.row[.col-7[
If `$X$` is an exponential random variable, then we can
calculate probabilities by:
`\begin{align*}
P(X>x) &= e^{-\lambda x}\\
P(X<x) &= 1-e^{-\lambda x}\\
P(x_1 < X < x_2) &= P(X<x_2)-P(X<x_1)\\
&= e^{-\lambda x_1} - e^{-\lambda x_2}
\end{align*}`

The density, cumulative probability, and quantile function in `R` for the exponential distribution are

```r
dexp(x,r)
pexp(x,r)
qexp(p,r)
```
with `$x$` being the quantile, `$p$` the probability, and `$r$` the rate `${\lambda}$`. 
]]
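
---

.row[.col-7[
The closed-form expressions and `pexp()` give the same probabilities. The sketch below compares them in base `R` for hypothetical values `$\lambda = 2$`, `$x_1 = 0.5$` and `$x_2 = 1$`; the variable names are illustrative.

```r
lambda <- 2; x1 <- 0.5; x2 <- 1   # hypothetical rate and interval bounds

# P(X > x1): closed form and upper tail of pexp()
upper_formula <- exp(-lambda * x1)
upper_pexp    <- pexp(x1, rate = lambda, lower.tail = FALSE)

# P(x1 < X < x2) = P(X < x2) - P(X < x1)
between_formula <- exp(-lambda * x1) - exp(-lambda * x2)
between_pexp    <- pexp(x2, rate = lambda) - pexp(x1, rate = lambda)
```

The larger bound `$x_2$` enters with the cumulative probability that is subtracted from, so the result is positive.
]]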

---

.tip[
## Example: Exponential Distribution
.row[.col-7[
The lifetime of an alkaline battery (measured in hours) is
exponentially distributed with `$\lambda = 0.05$`.

What is the probability a battery will last between 10 and 15 hours?
`\begin{align*}
P(10\leq X\leq 15) &= e^{-0.05(10)} - e^{-0.05(15)}\\ 
&\approx 0.60653 - 0.47237 \approx 0.1342
\end{align*}`

```r
(pexp(15,0.05) - pexp(10,0.05)) %>% 
  round(digits=4)
```

```
## [1] 0.1342
```
There is about a 13% chance a battery will last between 10 and 15 hours.
]]]

---

## Other Continuous Distributions
.row[.col-7[
Three other important continuous distributions which will
be used later are:
]]
.row[.col-6[
**Student t Distribution**
* ...that we will use instead of the normal distribution when we do not know the standard deviation of the distribution and therefore have to estimate it from the data first.

**F Distribution**
* ...that we will use for a so-called analysis of variance.

]
.col-6[
**Chi-Squared Distribution**
* ...that we will use for goodness-of-fit tests of an observed distribution against a theoretical one, for tests of independence between two classification criteria for qualitative data, and for confidence intervals for the standard deviation of a normally distributed population based on a sample standard deviation.

]]