**Confidence Intervals & Hypothesis Testing**

**In this lesson, you’re expected to:**

– learn what confidence intervals are and how to identify them

– understand how to use hypothesis testing with confidence intervals

*Let’s begin this lesson by revising the standard error and the Central Limit Theorem, as they will be useful to understand confidence intervals.*

**Standard Error of the Mean**

*The standard error (SE) of a sample mean is the estimated standard deviation of the error in the process by which it was generated.*

In other words, it is the *standard deviation of the sampling distribution* of the sample statistic.

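The formula to calculate the standard error of the mean is SE = s / √n, where s is the sample standard deviation and n is the number of observations in the sample.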
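As a minimal sketch in Python, the standard error of the mean can be computed as the sample standard deviation divided by the square root of the sample size. The function name `standard_error` and the call counts below are made up for illustration.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(sample) / math.sqrt(n)


# Hypothetical daily call counts, invented for this example.
daily_calls = [480, 510, 495, 530, 470, 505, 520, 490]

se = standard_error(daily_calls)
print(f"mean = {statistics.mean(daily_calls):.1f}, SE = {se:.2f}")
```

Note that a larger sample gives a smaller standard error, because the same spread is divided by a larger square root.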

**Central Limit Theorem**

*If a sample consists of at least 30 independent observations and the distribution of the underlying population data is not strongly skewed, then the distribution of the sample means is well approximated by a normal distribution.*
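The theorem can be checked with a small simulation. The sketch below assumes a population that is uniform on [0, 1] (clearly not normal, but not strongly skewed), draws many samples of 30 observations, and looks at how the sample means are distributed. The population, the seed, and the helper name `sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def sample_means(draw, sample_size, n_samples):
    """Draw n_samples samples of size sample_size and return their means."""
    return [
        statistics.mean(draw() for _ in range(sample_size))
        for _ in range(n_samples)
    ]


# Assumed population: uniform on [0, 1], with mean 0.5 and sd of about 0.2887.
means = sample_means(random.random, sample_size=30, n_samples=2000)

center = statistics.mean(means)
spread = statistics.stdev(means)

# For a normal distribution, roughly 95% of values lie within 2 sd of the center.
within_2sd = sum(abs(m - center) <= 2 * spread for m in means) / len(means)

print(f"mean of sample means: {center:.3f}")
print(f"sd of sample means:   {spread:.3f} (theory: {0.2887 / 30 ** 0.5:.3f})")
print(f"share within 2 sd:    {within_2sd:.1%}")
```

If the normal approximation holds, the share within 2 standard deviations should come out close to 95%, and the spread of the sample means should be close to the population standard deviation divided by √30.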

**Understanding the Central Limit Theorem**

http://www.qualitydigest.com/inside/twitter-ed/understanding-central-limit-theorem.html#

**What is a Confidence Interval?**

*Informally, we can say that a confidence interval is* **a range of numbers in which the true value of a parameter is likely to fall**.

Confidence intervals complement an estimation, as an estimation alone does not provide information about how much the parameter varies from observation to observation.

Hence, instead of just estimating the value of a parameter, the next step is to provide a plausible range of values for the parameter. And this is what a confidence interval provides.

*For example, imagine we want to know how many customers we expect to call customer service daily.*

The first step would be to *make an estimation of the parameter* (the number of customers calling). For this purpose, we can **compute the mean**. *Let’s say we have a mean of 500 customers calling daily.*

Next, we need to know *how much this quantity varies* from day to day. The number of telephone operators that the manager will hire would be very different if he expects peaks of around 1000 calls than if he only expects maximum values of 600. Thus, we need to **calculate confidence intervals**. *Once we have calculated them, we will know that, for example, we can expect between 300 and 700 calls per day.*

When the point estimate follows a **normal distribution**, *its standard error represents the standard deviation associated with the estimate.*

In this case, we know that *roughly 95% of the time, the estimation will fall within 2 standard errors of the parameter*.

In other words, there is a 95% chance that the sample mean falls within μ ± 2 SE.

Hence, in such a case, we can say that we are 95% confident that we have captured the true parameter.
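The "estimate ± 2 SE" rule can be sketched in a few lines of Python. The call counts are simulated (30 hypothetical days centered on 500 calls), and the function name `approx_95_interval` is an assumption made for this example.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the hypothetical data is reproducible


def approx_95_interval(sample):
    """Return (low, high) = sample mean minus/plus 2 standard errors."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 2 * se, mean + 2 * se


# Hypothetical data: 30 days of call counts, simulated around 500 calls.
calls = [round(random.gauss(500, 100)) for _ in range(30)]

low, high = approx_95_interval(calls)
print(f"sample mean: {statistics.mean(calls):.1f}")
print(f"approximate 95% interval for the mean: [{low:.1f}, {high:.1f}]")
```

The interval describes the uncertainty about the mean number of daily calls, so it narrows as more days are observed.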

**What does being 95% confident mean?**

Suppose we take lots of samples to estimate the mean of a population and build a confidence interval from each sample.

Then, 95% of those intervals will contain the actual value of the mean.
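This interpretation can be tried out with a simulation. The sketch below assumes a normal population with a known mean of 500 and standard deviation of 100 (values chosen only for illustration), builds an interval of 2 standard errors around each sample mean, and counts how often the interval contains the true mean.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

TRUE_MEAN, TRUE_SD = 500, 100      # assumed population values
SAMPLE_SIZE, N_SAMPLES = 50, 1000  # assumed simulation settings


def interval_covers_true_mean():
    """Draw one sample and report whether mean +/- 2 SE contains TRUE_MEAN."""
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(SAMPLE_SIZE)
    return mean - 2 * se <= TRUE_MEAN <= mean + 2 * se


hits = sum(interval_covers_true_mean() for _ in range(N_SAMPLES))
coverage = hits / N_SAMPLES
print(f"{coverage:.1%} of the {N_SAMPLES} intervals contain the true mean")
```

The share should land near 95%, though it will not be exactly 95% in any finite simulation.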

**Sampling Distributions and Confidence Intervals**

**Identify Confidence Intervals**

**Creating more Confidence Intervals**

*Remember: why did we choose 2 standard errors to build the confidence interval?*

The decision was based on the general guideline that around 95% of the time, observations fall within two standard deviations of the mean.

However, *for the normal distribution* we can be more accurate, as the more precise value is **1.96** *rather than 2.*

CI (95%) = Estimate ± (1.96 × SE)

*With the previous formula, we are limited to being 95% confident.*

*What if the manager of the company wants to be 99% confident about the result?*

**How to Build Confidence Intervals**

In general, if the point estimate follows the normal model, we can build a confidence interval of any confidence level for the estimated parameter with the following formula:

**Estimate ± (z × SE)**

The value of z can be obtained from the Z table:

http://www.stat.ufl.edu/~athienit/Tables/Ztable.pdf
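As an alternative to reading the table, the value of z can be computed. The sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8). The helper names `z_value` and `confidence_interval`, and the example figures of 500 calls with a standard error of 10, are assumptions for illustration.

```python
from statistics import NormalDist


def z_value(confidence):
    """Two-sided critical value of the standard normal for a confidence level."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Leave (1 - confidence) / 2 of the probability in each tail.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def confidence_interval(estimate, se, confidence=0.95):
    """Return (low, high) = estimate minus/plus z * SE."""
    z = z_value(confidence)
    return estimate - z * se, estimate + z * se


for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(500, 10, level)
    print(f"{level:.0%}: z = {z_value(level):.3f}, CI = [{low:.1f}, {high:.1f}]")
```

A higher confidence level gives a larger z and therefore a wider interval: being more confident comes at the cost of a less precise range.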

Hence, for a *99% confidence interval*, the value of z corresponds to **2.58**.

*How do we interpret the 99% confidence interval?*

If you repeatedly draw samples and compute a 99% confidence interval from each one, the true value of the population parameter will fall within the interval, Estimate ± (2.58 × SE), in 99% of the cases.

Green dots represent the *average number of sales*, and the lines represent the confidence intervals. Red lines are used to highlight those confidence intervals that do not contain the actual mean of the number of sales. The black horizontal line shows this value.

*Does the figure make sense? Does the number of samples for which the confidence interval does not include the actual value seem logical?*

We are **95% sure that the confidence intervals will contain the actual value of the mean**. *Hence, we would expect that if we take infinite samples, for 5% of them, the confidence interval would not contain the right value, while the remaining 95% of the samples will.*

In the figure, we can see that only 3 out of 100 confidence intervals do not contain the actual value. These three cases are highlighted in red. This represents 3%, which is close to the expected 5%. Thus, the result is logical.
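To judge how close 3 misses out of 100 is to the expected 5%, we can treat the number of missed intervals as a binomial count. This sketch assumes the 100 samples are independent and that each 95% interval misses the true value with probability 0.05. The variable and function names are chosen for illustration.

```python
import math

n_intervals = 100  # number of confidence intervals in the figure
miss_rate = 0.05   # a 95% interval misses the true value 5% of the time

expected_misses = n_intervals * miss_rate
sd_misses = math.sqrt(n_intervals * miss_rate * (1 - miss_rate))


def prob_exactly(k):
    """Binomial probability of exactly k intervals missing the true value."""
    return (
        math.comb(n_intervals, k)
        * miss_rate ** k
        * (1 - miss_rate) ** (n_intervals - k)
    )


prob_3_or_fewer = sum(prob_exactly(k) for k in range(4))

print(f"expected misses: {expected_misses:.1f} (sd {sd_misses:.2f})")
print(f"probability of 3 or fewer misses: {prob_3_or_fewer:.3f}")
```

Under these assumptions, 3 misses lies within one standard deviation of the expected 5, so the figure is consistent with 95% intervals.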

**Hypothesis Testing**

**Hypothesis Testing with Confidence Intervals**

A bank is performing an analysis of their customers who are closing their account. In particular, the manager is interested in knowing whether those leaving the bank have on average more or less money in their account than those who stay.

*How do we proceed?*

To answer this question, we need hypothesis testing.

**What is a hypothesis test?**

1) **Separate our clients into two populations**. *The first one includes all the account holders who closed their account last month, while the second group is composed of those who kept their account open.*

2) *Compute the mean for each group* and compare the means of both groups. The result is that those leaving had on average $32,000 and those who stayed had only $30,000.

So there is a difference in means, but **is the observed effect statistically significant?**

*So now we have two hypotheses:*

• H0: The mean amount of money in the accounts of those leaving and those staying is the same. This is the *null hypothesis*, which states that the observed effect is due to chance or a bias in the sample.

• H1: The mean amount of money in the accounts of those leaving is higher than the mean amount of money in the accounts of those who stay. H1 is known as the *alternative hypothesis*, which is what you believe or hope to prove to be true.

**The null and alternative hypotheses**

We can use **confidence intervals** *to test whether our hypothesis H1 is supported and whether we can reject the null hypothesis or not.*

To this end, we compute the confidence interval of the mean estimation for both populations.
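A minimal sketch of this step in Python, under stated assumptions: the account balances are simulated, not real bank data, and the helper names `mean_confidence_interval` and `can_reject_null` are invented for this example. The decision rule used here is the simple one of checking whether the mean of one group lies inside the confidence interval of the other. If it does, the null hypothesis is not rejected.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(7)  # fixed seed so the hypothetical data is reproducible


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) = sample mean minus/plus z * SE."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * se, mean + z * se


def can_reject_null(interval, reference_mean):
    """Reject H0 only if the reference mean lies outside the interval."""
    low, high = interval
    return not (low <= reference_mean <= high)


# Hypothetical balances (in dollars) for the two groups of clients.
leaving = [random.gauss(32_000, 6_000) for _ in range(40)]
staying = [random.gauss(30_000, 6_000) for _ in range(400)]

ci_leaving = mean_confidence_interval(leaving)
mean_staying = statistics.mean(staying)

print(f"95% CI for those leaving: [{ci_leaving[0]:,.0f}, {ci_leaving[1]:,.0f}]")
print(f"mean for those staying:   {mean_staying:,.0f}")
print("reject H0" if can_reject_null(ci_leaving, mean_staying) else "cannot reject H0")
```

With a small group of leavers the interval is wide, so even a visible difference in means may not be enough to reject the null hypothesis.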

Suppose the *95% confidence interval* of the money of those leaving the bank is **[$29,000, $33,000]**. *Since the mean of those staying is $30,000 and this value is in the range of probable values, we* **cannot reject the null hypothesis**.

**Understanding Hypothesis Tests**