Distribution of Data

In this lesson, you’re expected to:
– understand the concepts of variance and standard deviation
– learn the basics of probability
– explore how to solve problems using the Frequentist and Bayesian approaches

What is the distribution of our data?

In the previous lesson, we explained how to find the center of a dataset. So once we have information about the center of our sample (characterized either by the mean, median or mode), what’s next?

To further understand our data, we need to know how this data is distributed.

Are our observations narrowly clustered around the average? Or are they very dispersed?

The Variance

As the mean is not a sufficient descriptor of data, we can better describe the data if we combine it with the variance.

The variance measures the amount of “spread” among the values of a random variable X in a sample or population.
It is computed as follows:

$$ \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 $$

where n is the total number of observations, x_i is the i-th observation, and μ represents the mean.

The Standard Deviation

Variance is a good measure of spread; however, it is not easy to interpret.

Computing Standard Deviation

The variance is not straightforward to interpret because it is not in the same units as the mean. If the mean is in dollars, the variance is in squared dollars, and it is difficult to reason about squared dollars.

Therefore, we compute the square root of the variance, σ, which is called the standard deviation.
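
As a minimal sketch of these definitions, the following Python code computes the mean, the variance (dividing by n, as in the population formula) and the standard deviation. The function names and the list of prices are hypothetical and chosen only for illustration.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def variance(values):
    """Average squared deviation from the mean (divides by n)."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def standard_deviation(values):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(values))


# Hypothetical prices in dollars, used only as an example.
prices = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(mean(prices))                # 5.0 (dollars)
print(variance(prices))            # 4.0 (squared dollars)
print(standard_deviation(prices))  # 2.0 (dollars)
```

Note that the standard deviation comes out in dollars, like the mean, which is what makes it easier to interpret than the variance.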

[Optional] Standard Deviation
Measuring Asymmetry
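Skewness is a statistic that measures the asymmetry of a set of n data samples x_i:

$$ g_1 = \frac{m_3}{m_2^{3/2}}, \qquad m_2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2, \qquad m_3 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^3 $$

The numerator is the mean cubed deviation, m_3, and the denominator is the mean squared deviation (or variance), m_2, raised to the power 3/2.

Negative skewness indicates that the distribution “skews left” (it extends farther to the left than to the right).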

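Skewness can be affected by outliers! A simpler alternative is to look at the relationship between the mean (μ) and the median.

Pearson’s median skewness coefficient is a more robust alternative:

$$ g_p = \frac{3(\mu - \mu_{1/2})}{\sigma} $$

where μ_{1/2} is the median and σ is the standard deviation.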
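
Both measures can be sketched in a few lines of Python. This sketch assumes the usual definitions: sample skewness as the mean cubed deviation divided by the variance raised to the power 3/2, and Pearson's median skewness as three times the difference between mean and median, divided by the standard deviation. The function names and the data are hypothetical.

```python
import math
import statistics


def central_moment(values, k):
    """Mean of the k-th power of the deviations from the mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** k for x in values) / len(values)


def skewness(values):
    """Sample skewness: m3 / m2^(3/2)."""
    m2 = central_moment(values, 2)
    m3 = central_moment(values, 3)
    return m3 / m2 ** 1.5


def pearson_median_skewness(values):
    """Pearson's median skewness: 3 * (mean - median) / standard deviation."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(central_moment(values, 2))
    return 3 * (mu - statistics.median(values)) / sigma


# Hypothetical incomes with one large outlier on the right.
incomes = [20, 22, 23, 25, 26, 28, 30, 120]

print(skewness(incomes))                 # positive: the data skews right
print(pearson_median_skewness(incomes))  # also positive
```

For a perfectly symmetric sample such as [1, 2, 3, 4, 5], both functions return 0.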

Data Distributions

Summarizing the data into its average and a measure of its dispersion is not necessarily fully informative and can be dangerous, since very different data can be described by the same statistics. Hence, basic statistics and the conclusions derived must be validated by inspecting the data.

To this end, we usually visualize the data distribution, which describes how frequently each value appears.

The most common representation of a distribution is a histogram (which we saw in Lesson 3), a graph that shows the frequency of each value.

It is also common to represent the probability mass function, or PMF. This function is obtained by normalizing the histogram, that is, by dividing each frequency by the total number of observations (n).
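
The normalization step can be sketched as follows. The helper name pmf and the sample of die rolls are hypothetical; the histogram is built with Counter from the Python standard library.

```python
from collections import Counter


def pmf(values):
    """Map each distinct value to its relative frequency (count / n)."""
    n = len(values)
    return {value: count / n for value, count in Counter(values).items()}


# Hypothetical sample of eight die rolls.
rolls = [1, 2, 2, 3, 3, 3, 6, 6]

histogram = Counter(rolls)  # raw frequencies, e.g. the value 3 appears 3 times
probs = pmf(rolls)          # normalized frequencies

print(histogram[3])         # 3
print(probs[3])             # 0.375
print(sum(probs.values()))  # 1.0
```

After normalization the values sum to 1, which is what lets us read them as probabilities.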

Basics of Probability
Differences between the Frequentist paradigm and the Bayesian paradigm

The main assumptions of the Frequentist paradigm are the following:

• Data are a repeatable random sample – there is a frequency.
• Underlying assumptions remain constant during this repeatable process.
• Parameters are fixed.

On the other hand, the assumptions of the Bayesian paradigm are the following:

• Data are observed from a realized sample.
• Parameters are unknown and described probabilistically.
• Data are fixed.

Frequentist Point of View

Probability is the measure of the chance that an event will occur as a result of an experiment or trial.

In other words, when all outcomes are equally likely, the probability of an event is the number of ways the event can occur divided by the total number of possible outcomes.

Thus, it is a real value between 0 and 1 that is intended to be a measure corresponding to the idea that some things are more likely than others.

For example, in the case of a six-sided die, each roll is called a trial. If we want to compute the probability of obtaining a 2, each roll that shows a 2 is called a success, while all other rolls are called failures. If we perform n identical trials and observe s successes, the probability of a success is estimated by s/n, and the estimate becomes more reliable as n grows.
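
The die example can be sketched as a small simulation. The function name, the number of trials and the fixed seed are assumptions made only so that the example is reproducible.

```python
import random


def estimate_probability(target, n_trials, seed=0):
    """Roll a fair six-sided die n_trials times and return s / n,
    where s is the number of rolls that show `target`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    successes = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == target)
    return successes / n_trials


for n in (100, 10_000, 100_000):
    print(n, estimate_probability(2, n))  # approaches 1/6 as n grows
```

With few trials the estimate can be far from 1/6; with many trials it settles close to it.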

Binomial Probability Distribution

The binomial probability distribution is a discrete distribution that represents the probability of getting exactly k successes in n independent trials of a yes/no experiment, where each trial succeeds with probability p.

The probability is given by the following formula:

$$ P(k; n, p) = \binom{n}{k} p^k (1 - p)^{n - k}, \qquad \binom{n}{k} = \frac{n!}{k!\,(n - k)!} $$

* In mathematics, the factorial of a non-negative integer n, denoted by n!, is the product of all positive integers less than or equal to n. For example, 5! = 5 × 4 × 3 × 2 × 1 = 120.
Probability Mass Function
The probability of getting exactly k successes in n trials is given by the probability mass function depicted below.

Enlarged version: http://bit.ly/2mvUCno
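
A minimal sketch of the binomial probability mass function, assuming the standard formula C(n, k) · p^k · (1 − p)^(n − k). The function name binomial_pmf is hypothetical; math.comb requires Python 3.8 or later.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Probability of exactly 2 heads in 4 flips of a fair coin.
print(binomial_pmf(2, 4, 0.5))  # 0.375

# The probabilities over all possible k add up to 1.
print(sum(binomial_pmf(k, 4, 0.5) for k in range(5)))  # 1.0
```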

Monte Carlo Experiments

Monte Carlo experiments are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results.

Typically, one runs simulations many times over in order to solve problems that might be deterministic in principle.
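
As an illustrative sketch, the following Monte Carlo experiment estimates the probability of getting exactly 2 heads in 4 flips of a fair coin. The exact answer is known (6/16 = 0.375), so we can check how close the simulation gets. The function name, the number of experiments and the seed are assumptions made for this example.

```python
import random


def estimate_exactly_k_heads(k, n_flips, n_experiments, seed=0):
    """Repeat the experiment 'flip a fair coin n_flips times' and return
    the fraction of experiments that produced exactly k heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_experiments):
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if heads == k:
            hits += 1
    return hits / n_experiments


print(estimate_exactly_k_heads(2, 4, 100_000))  # close to 0.375
```

The same pattern (simulate, count, divide) applies to problems where no closed-form answer is available.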

Continuous Distributions

1) Exponential Distribution

In probability theory and statistics, the exponential distribution describes the time between events in a Poisson process.

In a Poisson process, events occur continuously and independently at a constant average rate. Many real-world examples follow this distribution, for example the time until the next bus or taxi arrives.

Enlarged version: http://bit.ly/2mMsxu1
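
The waiting-time interpretation can be sketched with random.expovariate from the Python standard library. The rate of 0.2 buses per minute (one bus every 5 minutes on average) is a hypothetical value chosen for illustration.

```python
import math
import random


def simulate_waiting_times(rate, n_samples, seed=0):
    """Draw n_samples waiting times from an exponential distribution
    with the given average rate (events per unit of time)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.expovariate(rate) for _ in range(n_samples)]


rate = 0.2  # hypothetical: 0.2 buses per minute
waits = simulate_waiting_times(rate, 100_000)

average_wait = sum(waits) / len(waits)
long_waits = sum(1 for w in waits if w > 5) / len(waits)

print(average_wait)                 # close to 1 / rate = 5 minutes
print(long_waits, math.exp(-1))     # both close to 0.37
```

The second line compares the simulated share of waits longer than 5 minutes with the theoretical value exp(−rate · 5).
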
2) Normal Distribution

The normal or Gaussian distribution is probably the most widely used and best-known continuous probability distribution.

The curve of this distribution is perfectly symmetrical around the mean of the distribution.

As a result, the distribution mean is identical to the two alternative measures of central tendency: the median and the mode.

Enlarged version: http://bit.ly/2nvoceK
Another important feature of this distribution is that it provides the basis for specifying the number of observations that should fall within select portions of the curve.

Around 68% of the observations fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three.

Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known.
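
The 68% figure can be checked with a short simulation. This is a sketch under assumed parameters: the mean of 100 and standard deviation of 15 are hypothetical, and the exact value is computed from the error function, math.erf.

```python
import math
import random


def fraction_within_one_sd(mu, sigma, n_samples, seed=0):
    """Draw n_samples normal values and return the fraction that lies
    within one standard deviation of the mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    inside = sum(
        1 for _ in range(n_samples) if abs(rng.gauss(mu, sigma) - mu) <= sigma
    )
    return inside / n_samples


simulated = fraction_within_one_sd(100, 15, 100_000)
exact = math.erf(1 / math.sqrt(2))  # theoretical fraction within one sigma

print(simulated)  # close to 0.68
print(exact)      # about 0.6827
```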

[Optional] Statistical Distributions
Bayesian Approach
Who will win the next Formula 1 World Championship? 

In this case, we cannot use the Frequentist approach because we cannot conduct identical trials. Each year the scenario is very different: rules vary, circuits may not be the same, and even the drivers may be different or may move from one team to another.

Bayesian theory defines probability as the degree of belief that an event will occur.

Bayes’ theorem describes the probability of an event occurring, given that another event has occurred. It is based on prior knowledge of conditions that might be related to the event:

$$ P(A|B) = \frac{P(B|A)\,P(A)}{P(B)} $$

where:
• P(A|B) is the probability of observing event A given that B occurs
• P(A) is the probability of observing A
• P(B) is the probability of observing B
• P(B|A) is the probability of observing event B given that A occurs
We can interpret the theorem by assuming that A is the hypothesis and B is the new evidence that modifies our belief in A.

For example, event A can be that a person contacting our company will sign up for a new account, and event B that the person contacted the company by phone. Since the previously observed probability of signing up is higher for customers who call than for those who visit the office, the fact that the person called can change our degree of belief about whether they will sign up.
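
To make the sign-up example concrete, here is a minimal sketch of the update P(A|B) = P(B|A) · P(A) / P(B). All the numbers are hypothetical: we assume that 10% of contacts sign up, that 60% of those who sign up contacted us by phone, and that 30% of those who do not sign up contacted us by phone.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) using Bayes' theorem.

    P(B) is obtained from the law of total probability:
    P(B) = P(B|A) * P(A) + P(B|not A) * P(not A).
    """
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b


# Hypothetical values, for illustration only.
p_signup = 0.10                 # P(A): prior belief that a contact signs up
p_phone_given_signup = 0.60     # P(B|A)
p_phone_given_no_signup = 0.30  # P(B|not A)

p_signup_given_phone = posterior(
    p_signup, p_phone_given_signup, p_phone_given_no_signup
)
print(p_signup_given_phone)  # about 0.18, up from the prior of 0.10
```

Under these assumed numbers, learning that the person called raises our degree of belief that they will sign up from 10% to roughly 18%.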

The Monty Hall Problem

You’ve made it to the final round of a game show, and have to pick between 3 doors, one of which has a car behind it! You make your choice, and then the host decides to show you one of the wrong answers. He then offers you the chance to switch doors. Should you do it?
Watch this 3-minute video to learn more: https://www.youtube.com/watch?v=9vRUxbzJZ9Y
The best strategy is to change your choice!

Our intuition may tell us that once Monty opens the door, it makes no difference whether you switch, but as you saw in the video, this is wrong!

The truth is that if you stick to your choice, the probability of winning is ⅓, the same as in your first try. However, if you switch to another option, the chances increase to ⅔.

Let’s use Bayesian theory to solve this problem. 

In the beginning, there are different hypotheses. The chances that the car is behind door A, B, or C are the same and equal to ⅓.
=> P(A) = P(B) = P(C) = ⅓

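Suppose we choose B and Monty opens one of the other two doors (C in this case). What is the probability of success if we stick to our first decision? Writing “opens C” for the new evidence, Bayes’ theorem gives:

$$ P(B \mid \text{opens C}) = \frac{P(\text{opens C} \mid B)\,P(B)}{P(\text{opens C})} = \frac{\frac{1}{2} \cdot \frac{1}{3}}{\frac{1}{2}} = \frac{1}{3} $$

Let’s understand the denominator.

We initially chose B, so if the car is behind B, Monty will show us a goat behind C half the time. If the car is behind C, Monty never shows us a goat behind C. Finally, if the car is behind A, Monty shows us a goat behind C every time. Hence:

$$ P(\text{opens C}) = \frac{1}{3} \cdot \frac{1}{2} + \frac{1}{3} \cdot 0 + \frac{1}{3} \cdot 1 = \frac{1}{2} $$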

What are the probabilities if we decide to switch?
As we have seen, the probability that the car is behind B is ⅓. Since Monty has shown that the car is not behind C, the probabilities for the two remaining doors, A and B, must sum to 1.

Therefore, the probability that the car is behind A = 1 – ⅓ = ⅔.
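
The reasoning can be checked in two ways, sketched below: an exact Bayesian update for the case where we pick B and Monty opens C, and a simulation of many games. The function names, the number of games and the seed are assumptions made for this example. After Monty opens C, the only door left to switch to is A.

```python
import random
from fractions import Fraction


def posterior_after_monty_opens_c():
    """Exact posterior for the car's location, given that we picked B
    and Monty opened C."""
    prior = {door: Fraction(1, 3) for door in "ABC"}
    # P(Monty opens C | car is behind the given door), with our pick fixed at B.
    likelihood = {"A": Fraction(1), "B": Fraction(1, 2), "C": Fraction(0)}
    evidence = sum(prior[d] * likelihood[d] for d in "ABC")  # P(opens C)
    return {d: prior[d] * likelihood[d] / evidence for d in "ABC"}


def simulate(n_games, switch, seed=0):
    """Play n_games and return the fraction won with the given strategy."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    doors = ["A", "B", "C"]
    wins = 0
    for _ in range(n_games):
        car = rng.choice(doors)
        pick = rng.choice(doors)
        # Monty opens a door that is neither our pick nor the car.
        opened = rng.choice([d for d in doors if d != pick and d != car])
        if switch:
            pick = next(d for d in doors if d != pick and d != opened)
        wins += pick == car
    return wins / n_games


print(posterior_after_monty_opens_c())  # A: 2/3, B: 1/3, C: 0
print(simulate(100_000, switch=False))  # close to 1/3
print(simulate(100_000, switch=True))   # close to 2/3
```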
