**Distribution of Data**

**In this lesson, you’re expected to:**

– understand the concepts of variance and standard deviation

– learn the basics of probability

– explore how to solve problems using the Frequentist and Bayesian approaches

**What is the distribution of our data?**

In the previous lesson, we explained how to find the center of a dataset. So once we have information about the center of our sample (characterized either by the mean, median or mode), what’s next?

To further understand our data, we need to know how this data is distributed.

*Are our observations narrowly clustered around the average? Or are they very dispersed?*

**The Variance**

As the mean is not a sufficient descriptor of the data, we can describe the data better by combining it with the variance.

The variance *measures the amount of “spread” among the different values of a random variable X* in a sample or population.

It is computed as the average squared deviation from the mean:

σ² = (1/n) Σᵢ (xᵢ − μ)²

where **μ represents the mean** and the xᵢ are the individual observations.

**The Standard Deviation**

Variance is a good measure of spread; however, it’s not easy to interpret.

**Computing Standard Deviation**

That result is not straightforward to interpret because the mean and variance are in different units: if the mean is in dollars, the variance is in squared dollars, and it is difficult for us to think in square dollars.

Therefore, we compute the square root of the variance, σ = √σ², which is called the *standard deviation*. It is expressed in the same units as the data.
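These definitions are easy to sketch in code. A minimal Python example, computing the mean, variance, and standard deviation by hand (the sample data is made up for illustration):

```python
# Population mean, variance, and standard deviation from the definitions above.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample

n = len(data)
mean = sum(data) / n                               # mu, the mean
variance = sum((x - mean) ** 2 for x in data) / n  # average squared deviation
std_dev = variance ** 0.5                          # sigma, back in the data's units

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```

Note that the standard deviation (2.0) is directly comparable to the data values, while the variance (4.0) is in squared units.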

**Standard Deviation**

https://statistics.laerd.com/statistical-guides/measures-of-spread-standard-deviation.php

**Measuring Asymmetry**

**Skewness** *is a statistic that measures the asymmetry of a set of n data samples xᵢ:*

g₁ = [(1/n) Σᵢ (xᵢ − μ)³] / σ³

The numerator is the mean cubed deviation and the denominator is the standard deviation cubed.

Skewness can be affected by outliers! A simpler alternative is to look at the relationship between mean (μ) and median.

*Pearson’s median skewness coefficient* is a more robust alternative:

skewness = 3 (μ − median) / σ
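Both skewness measures can be computed directly from their definitions. A small Python sketch with a made-up right-skewed sample:

```python
import statistics

data = [1, 2, 2, 3, 3, 3, 4, 10]  # hypothetical sample; the outlier 10 pulls the mean up

n = len(data)
mean = sum(data) / n
sigma = (sum((x - mean) ** 2 for x in data) / n) ** 0.5

# Skewness g1: mean cubed deviation divided by the standard deviation cubed
g1 = (sum((x - mean) ** 3 for x in data) / n) / sigma ** 3

# Pearson's median skewness coefficient: 3 * (mean - median) / sigma
median = statistics.median(data)
pearson = 3 * (mean - median) / sigma

print(g1 > 0, pearson > 0)  # True True -- both detect the right skew
```

Since the mean (3.5) exceeds the median (3), both statistics come out positive, indicating a right-skewed distribution.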

**Data Distributions**

Summarizing the data into its average and a measure of its dispersion is not necessarily fully informative and can be dangerous, since very different data can be described by the same statistics. Hence, basic statistics and the conclusions derived must be validated by inspecting the data.

To this end, we usually visualize the data distribution, which describes how frequently each value appears.

The most common representation of a distribution is a **histogram** *(which we saw in Lesson 3)*, which is a graph that shows the frequency of each value.

It is common to represent the probability mass function or PMF. This function is obtained by normalizing the histogram, by dividing each frequency by the total number of observations (n).
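The normalization step is simple to show in code. A minimal Python sketch, using a hypothetical sample of die rolls:

```python
from collections import Counter

# Normalizing a histogram into a probability mass function (PMF):
# divide each frequency by the total number of observations n.
rolls = [1, 2, 2, 3, 3, 3, 4, 4, 6, 6]  # hypothetical sample of die rolls

counts = Counter(rolls)  # the histogram: value -> frequency
n = len(rolls)
pmf = {value: freq / n for value, freq in counts.items()}

print(pmf[3])  # 0.3 -- the value 3 appeared in 3 of 10 observations
```

By construction, the PMF values sum to 1, so each entry can be read as a probability.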

**Basics of Probability**

**Differences between the Frequentist paradigm and the Bayesian paradigm**

The main assumptions of the *Frequentist* paradigm are the following:

• Data are a repeatable random sample – there is a frequency.

• Underlying assumptions remain constant during this repeatable process.

• Parameters are fixed.

The main assumptions of the *Bayesian* paradigm are the following:

• Data are observed from a realized sample.

• Parameters are unknown and described probabilistically.

• Data are fixed.

**Frequentist Point of View**

*Probability* is the measure of the chance that an event will occur as a result of an experiment or trial.

In other words, the probability of an event is the number of ways the event can occur divided by the total number of possible outcomes.

Thus, it is a real value between 0 and 1 that is intended to be a measure corresponding to the idea that some things are more likely than others.

For example, in the case of a six-sided die, each roll is called a trial. If we want to compute the probability of obtaining a 2, each trial in which the die shows a 2 is called a *success*, while the other trials are called *failures*. If we perform n identical trials and observe s successes, the probability of a success is s/n.

**Binomial Probability Distribution**

The binomial probability distribution is a *discrete* distribution that represents the probability of getting **k** successes in **n trials** of a yes/no experiment, where each trial succeeds with probability p.

The probability is given by the following formula:

P(X = k) = [n! / (k! (n − k)!)] pᵏ (1 − p)ⁿ⁻ᵏ

The **factorial** of a non-negative integer n, denoted by n!, is the product of all positive integers less than or equal to n. For example, 5! = 5 × 4 × 3 × 2 × 1 = 120.

**Probability Mass Function**

The probability of *k* successes in *n* trials is given by the probability mass function depicted below.

Enlarged version: http://bit.ly/2mvUCno
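The binomial PMF is straightforward to compute. A Python sketch, reusing the six-sided die from earlier with a hypothetical number of trials:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n yes/no trials,
    each succeeding with probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of rolling exactly two 2s in ten rolls of a fair die (p = 1/6)
print(round(binomial_pmf(2, 10, 1 / 6), 4))  # 0.2907
```

Summing the PMF over all k from 0 to n gives 1, as expected of a probability distribution.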

**Monte Carlo Experiments**

Monte Carlo experiments are a broad class of *computational algorithms* that rely on repeated random sampling to obtain numerical results.

Typically, one runs simulations many times over in order to solve problems that might be deterministic in principle.
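A classic illustration is estimating π by random sampling; a minimal Python sketch (not from the lesson):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Monte Carlo estimate of pi: sample random points in the unit square
# and count the fraction that land inside the quarter circle of radius 1.
# That fraction approaches pi/4 as the number of samples grows.
n = 100_000
inside = sum(1 for _ in range(n)
             if random.random() ** 2 + random.random() ** 2 <= 1.0)
pi_estimate = 4 * inside / n
print(pi_estimate)  # close to 3.14159 for large n
```

The answer (π) is deterministic, yet we approximate it by repeated random sampling, which is exactly the Monte Carlo idea.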

**Continuous Distributions**

*1) Exponential Distribution*

In probability theory and statistics, the exponential distribution describes the time between events in a *Poisson* process.

In a Poisson process, events occur continuously and independently at a constant average rate. Many real world examples follow this distribution. For example, the time for the next bus or taxi to arrive.
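We can check this behaviour by simulation. A Python sketch using the standard library's exponential sampler, with a hypothetical arrival rate:

```python
import random

random.seed(1)  # fixed seed for reproducibility

# Waiting times in a Poisson process with an average rate of 4 arrivals
# per hour (a hypothetical choice) are exponentially distributed with
# mean 1/rate hours between arrivals.
rate = 4.0
waits = [random.expovariate(rate) for _ in range(100_000)]

average_wait = sum(waits) / len(waits)
print(average_wait)  # close to 1/rate = 0.25
```

The simulated average waiting time converges to 1/rate, the mean of the exponential distribution.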

*2) Normal Distribution*

The *normal or Gaussian distribution* is probably the most used and best-known continuous probability distribution.

The curve of this distribution is perfectly symmetrical around the mean of the distribution.

As a result, the distribution mean is identical to the two alternative measures of central tendency: the median and the mode.

*Around 68% of the observations will likely fall within one standard deviation of the mean.*

Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known.
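The 68% figure can be verified empirically. A short Python sketch drawing Gaussian samples:

```python
import random

random.seed(2)  # fixed seed for reproducibility

# Empirical check of the 68% rule: for normally distributed data,
# roughly 68% of observations fall within one standard deviation of the mean.
mu, sigma = 0.0, 1.0
samples = [random.gauss(mu, sigma) for _ in range(100_000)]

within_one_sd = sum(1 for x in samples if abs(x - mu) <= sigma) / len(samples)
print(within_one_sd)  # about 0.6827
```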

**Statistical Distributions**

http://onlinestatbook.com/2/introduction/distributions.html

**Bayesian Approach**

*Who will win the next Formula 1 World Championship?*

In this case, we cannot use the Frequentist approach, as we cannot conduct identical trials. Each year the scenario is very different: rules vary, the circuits may not be the same, and even the drivers may be different or may move from one team to another.

Bayesian theory defines probability as the degree of belief that an event will occur, **based on prior knowledge of conditions that might be related to the event**.

Bayes’ theorem relates these beliefs:

P(A|B) = P(B|A) × P(A) / P(B)

where:

• P(A) is the probability of observing A

• P(B) is the probability of observing B

• P(B|A) is the probability of observing event B given that A occurs

• P(A|B) is the probability of observing event A given that B occurs

For example, event A can be that the person contacting our company will sign up for a new account, and event B, that he contacted the company by phone. Since the prior observed probability of signing up is higher for customer calling than for those visiting the office, the fact that he called can change our degree of belief on whether he will sign up.
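The numbers below are invented purely to illustrate the update; a Python sketch of Bayes’ theorem on this example:

```python
# Bayes' theorem on the sign-up example, with made-up illustrative numbers:
# A = "contact signs up", B = "contact came by phone".
p_a = 0.20          # prior probability of signing up (assumed)
p_b_given_a = 0.60  # of those who sign up, fraction who called (assumed)
p_b = 0.40          # overall fraction of contacts made by phone (assumed)

# P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 2))  # 0.3 -- the call raises our belief from 0.20 to 0.30
```

Observing the phone call (event B) updates the prior degree of belief of 0.20 to a posterior of 0.30.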

**The Monty Hall Problem**

You’ve made it to the final round of a game show, and have to pick between 3 doors, one of which has a car behind it! You make your choice, and then the host decides to show you one of the wrong answers. He then offers you the chance to switch doors. Should you do it?

**The best strategy is to change your choice!**

Our intuition may tell us that once Monty opens the door, it makes no difference whether we change, but as you saw in the video, this is wrong!

The truth is that if you stick to your choice, the probability of winning is ⅓, the same as in your first try. However, if you switch to another option, the chances increase to ⅔.

*Let’s use Bayesian theory to solve this problem.*

In the beginning, there are different hypotheses. The chances that the car is behind door A, B, or C are the same and equal to ⅓.

=> P(A) = P(B) = P(C) = ⅓

Suppose we choose B and Monty opens one of the other two doors (C in this case). What is the probability of success if we stick to our first decision?

P(car behind B | Monty opens C) = P(opens C | B) × P(B) / P(opens C)
= (½ × ⅓) / (1 × ⅓ + ½ × ⅓ + 0 × ⅓) = (⅙) / (½) = ⅓

*Let’s understand the denominator.*

We initially chose B, so if the car is behind B, Monty will show us a goat behind A half the time. If the car is behind C, Monty never shows us a goat behind C. Finally, if the car is behind A, Monty shows us a goat behind C every time.

*What are the probabilities if we decide to switch?*

As we have seen, the probability that the car is behind B is ⅓, and since Monty has already shown a goat behind C, the probabilities for B and A must sum to 1.

Therefore, the probability that the car is behind A = 1 – ⅓ = ⅔.
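The two strategies can also be compared by simulation, in the Monte Carlo spirit of the earlier section. A minimal Python sketch (not part of the lesson):

```python
import random

random.seed(3)  # fixed seed for reproducibility

def monty_hall(switch, trials=100_000):
    """Simulate the game and return the fraction of wins."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)   # door hiding the car
        pick = random.randrange(3)  # contestant's initial choice
        # Monty opens a door that is neither the pick nor the car.
        # (When pick == car he has two options; taking the first one
        # does not change the win rates, since switching then loses anyway.)
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

stick_rate = monty_hall(switch=False)
switch_rate = monty_hall(switch=True)
print(stick_rate, switch_rate)  # about 1/3 vs about 2/3
```

The simulation agrees with the Bayesian calculation: sticking wins about a third of the time, switching about two thirds.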

**Bayes Theorem**

http://blogs.sas.com/content/sastraining/2011/01/31/the-bayes-theorem-explained-to-an-above-average-squirrel/