Relationships Between Variables

In this lesson, you’re expected to:
– learn how to interpret the relationship between two variables
– understand the concepts of correlation and covariance
– find out which measure of association to use in different situations
Relationships between Variables

Are two variables related, and if so, how?

This is a question that arises in business quite often. It seems reasonable to expect that the number of sales of a product is positively related to advertising expenditure.

But did this really happen in the last campaign? How strong was the relationship? Can we quantify it? Were sales more closely related to the budget spent on TV or on newspapers?

Measures of Association

These questions are not easy to answer, and a bivariate scatter plot alone cannot answer them: it can suggest the direction of a relationship, but it does not quantify its strength.

Hence, we need measures of association that help us answer these questions.

Covariance and correlation are the two most common measures of association between numerical variables.

Covariance and correlation
Both measures indicate whether two variables are linearly related and, if they are, the sign (direction) of the relationship.

A positive correlation or covariance means that the two variables tend to increase and decrease together. This can happen in marketing campaigns: when advertising expenditure increases, the number of sales also increases.

By contrast, a negative correlation or covariance tells us that when one variable increases, the other tends to decrease.

For example, in the car industry, horsepower and miles per gallon (mpg) are usually negatively related, as more powerful cars tend to consume more fuel.

Finally, if two variables are not linearly related, both measures will give a value close to zero.

Covariance 

Covariance is a measure of the tendency of two variables to vary together.

The formula to compute the sample covariance is as follows:

$$S_{xy} = \frac{\sum_{i=1}^{n}(X_i - \bar{x})(Y_i - \bar{y})}{n - 1}$$

where:

Sxy = the sample covariance between variables X and Y
x̅ = the sample mean for X
ȳ = the sample mean for Y
n = the number of elements in each sample (the X and Y values come in n pairs)
Xi = a single element in the sample for X
Yi = a single element in the sample for Y
Limitations of Covariance

The main problem with covariance is that it is difficult to interpret: its units are the product of the units of the two variables (for example, dollars × units sold), and it can take any positive or negative value.

So it is easy to tell whether two variables are positively or negatively related, but very difficult to quantify the strength of the relationship.

Correlation

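One way to overcome this lack of interpretability is to divide each deviation from the mean by the corresponding standard deviation, which yields standard scores (z-scores), and then average the products of the standard scores, dividing by n − 1 as in the sample covariance.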

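After some basic algebra, the correlation coefficient can be written as follows:

$$r_{xy} = \frac{1}{n - 1}\sum_{i=1}^{n}\left(\frac{X_i - \bar{x}}{s_x}\right)\left(\frac{Y_i - \bar{y}}{s_y}\right) = \frac{S_{xy}}{s_x\, s_y}$$

where s_x and s_y are the sample standard deviations of X and Y, and the other symbols are as defined for the covariance.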

Correlation Coefficient

The main advantage of correlation is that its value ranges from -1 to 1.

• -1 represents a perfect negative linear relationship
• 0 represents an absence of linear correlation
• 1 represents a perfect positive linear relationship

Interpreting the Correlation Coefficient

• Exactly –1: A perfect downhill (negative) linear relationship
• –0.7: A strong downhill (negative) linear relationship
• –0.5: A moderate downhill (negative) linear relationship
• –0.3: A weak downhill (negative) linear relationship
• 0: No linear relationship
• +0.3: A weak uphill (positive) linear relationship
• +0.5: A moderate uphill (positive) linear relationship
• +0.7: A strong uphill (positive) linear relationship
• Exactly +1: A perfect uphill (positive) linear relationship

[Optional] Correlation Does NOT Imply Causation
Watch this 3-minute video to learn more: https://www.youtube.com/watch?v=FJcUU0GXsms

Comparing Correlation & Covariance

As we have seen earlier, the main advantage of correlation is that it has a fixed range, from -1 to +1, which helps us interpret and quantify the strength of the relationship between the variables.

By contrast, covariance has no lower or upper limits.

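Unlike covariance, correlation is not affected by the units in which X and Y are measured.

For example, suppose that a sample of tuna is drawn from the catches of two different fishing boats, and the covariance between the weights of the tuna caught by the two boats is computed.

The value of the covariance differs depending on whether the weights are expressed in kilograms or in pounds; the correlation, however, is the same in both cases. Converting both variables from kilograms to pounds multiplies every deviation from the mean by the same factor (about 2.2046), so the covariance is multiplied by that factor squared. The correlation divides by both standard deviations, which are rescaled by the same factor, so the factor cancels out.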

[Optional] Comparing Correlation & Covariance
Watch this 2-minute video to learn more: https://www.youtube.com/watch?v=eRlzmCrdTWw