Relationships Between Variables
– learn how to interpret the relationship between two variables
– understand the concepts of correlation and covariance
– find out which measure of association to use in different situations
Are two variables related and how?
This is a question that arises in business quite often. It seems reasonable that the number of sales of a product is positively related to advertising expenditures.
But did this really happen in the last campaign? How strong was this relation? Can we quantify it? Were sales more related to the budget spent on TV or Newspapers?
These questions are not easy to answer, and the answers certainly cannot be quantified from a bivariate scatter plot alone.
Hence, we need measures of association that help us answer these questions.
Covariance and correlation are the two most common measures of association between numerical variables.
Hence, a positive correlation or covariance means that both variables increase and decrease together. This is typical of marketing campaigns: when advertising expenditure increases, the number of sales tends to increase as well.
By contrast, a negative measure of correlation or covariance tells us that when one variable increases the other decreases.
For example, in the car industry, horsepower and miles per gallon (mpg) are usually negatively related, as more powerful cars tend to consume more fuel.
Finally, if two variables are not related, both measures will give a value close to zero.
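The three cases above can be illustrated with a small simulation. The data here are entirely hypothetical, generated only so the sign of the correlation is visible; the variable names (ad_spend, sales, horsepower, mpg) echo the examples in the text, not real datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Positive relation: sales grow with advertising spend (plus noise)
ad_spend = rng.uniform(10, 100, 200)
sales = 5 * ad_spend + rng.normal(0, 30, 200)

# Negative relation: fuel economy drops as horsepower grows (plus noise)
horsepower = rng.uniform(80, 400, 200)
mpg = 50 - 0.08 * horsepower + rng.normal(0, 2, 200)

# No relation: independent noise
noise = rng.normal(0, 1, 200)

print(np.corrcoef(ad_spend, sales)[0, 1])    # clearly positive
print(np.corrcoef(horsepower, mpg)[0, 1])    # clearly negative
print(np.corrcoef(ad_spend, noise)[0, 1])    # close to zero
```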
Covariance is a measure of the tendency of two variables to vary together.
The formula to compute the sample covariance is as follows:

cov(X, Y) = Σᵢ (Xᵢ − x̅)(Yᵢ − ȳ) / (n − 1)

where:
x̅ = the sample mean for X
ȳ = the sample mean for Y
n = the number of elements in both samples
Xᵢ = a single element in the sample for X
Yᵢ = a single element in the sample for Y
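The formula translates directly into a few lines of code. This is a minimal sketch using made-up numbers; the result is checked against NumPy's np.cov, which uses the same n − 1 denominator by default.

```python
import numpy as np

# Illustrative paired samples (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 7.0])

def sample_cov(x, y):
    """Sample covariance: sum of deviation products divided by n - 1."""
    n = len(x)
    return np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

print(sample_cov(x, y))          # same value as np.cov(x, y)[0, 1]
```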
The main problem with covariance is that it is a measure that’s difficult to interpret (units are products of units!) and can take any positive or negative number.
So it is easy to know whether two variables are positively or negatively related but very difficult to quantify the strength of such a relation.
One way to fix the lack of interpretability of covariance is to divide each deviation by the corresponding standard deviation, which yields standard scores, and then compute the product of the standard scores.
Hence, after some basic maths, correlation can be written as follows:

r = Σᵢ (Xᵢ − x̅)(Yᵢ − ȳ) / √( Σᵢ (Xᵢ − x̅)² · Σᵢ (Yᵢ − ȳ)² )

that is, the covariance divided by the product of the two standard deviations.
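Correlation is just as easy to compute by hand. This sketch reuses the same hypothetical numbers as before and verifies the result against np.corrcoef.

```python
import numpy as np

# Same illustrative paired samples (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 7.0])

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

print(pearson_r(x, y))           # same value as np.corrcoef(x, y)[0, 1]
```

Note that the (n − 1) factors in the covariance and in the standard deviations cancel, so the formula works with raw deviation sums.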
The main advantage of correlation is that its value ranges from -1 to 1.
• -1 represents the strongest negative correlation
• 0 represents an absence of correlation
• 1 implies a perfect positive relation
A finer-grained rule of thumb for interpreting intermediate values:
• Exactly –1: A perfect downhill (negative) linear relationship
• –0.7: A strong downhill (negative) linear relationship
• –0.5: A moderate downhill (negative) relationship
• –0.3: A weak downhill (negative) linear relationship
• 0: No linear relationship
• +0.3: A weak uphill (positive) linear relationship
• +0.5: A moderate uphill (positive) relationship
• +0.7: A strong uphill (positive) linear relationship
• Exactly +1: A perfect uphill (positive) linear relationship
By contrast, covariance has no lower or upper limits.
For example, suppose that a sample of tuna is chosen from the catch of two different fishing boats. The covariance between the weights of the tuna caught by the two boats is computed.
The value of the covariance is different if the weights are expressed in kilograms or in pounds; however, the correlation is the same whether weights are expressed in kilograms or pounds.
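The tuna example can be checked numerically. The weights below are randomly generated, not real catch data, but they make the point: converting kilograms to pounds rescales the covariance by the conversion factor squared, while the correlation is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical paired tuna weights from two boats, in kilograms
boat_a_kg = rng.normal(50, 8, 100)
boat_b_kg = 0.6 * boat_a_kg + rng.normal(0, 3, 100)

KG_TO_LB = 2.20462
boat_a_lb = boat_a_kg * KG_TO_LB
boat_b_lb = boat_b_kg * KG_TO_LB

cov_kg = np.cov(boat_a_kg, boat_b_kg)[0, 1]
cov_lb = np.cov(boat_a_lb, boat_b_lb)[0, 1]
r_kg = np.corrcoef(boat_a_kg, boat_b_kg)[0, 1]
r_lb = np.corrcoef(boat_a_lb, boat_b_lb)[0, 1]

print(cov_lb / cov_kg)   # ≈ KG_TO_LB**2: covariance depends on units
print(r_kg, r_lb)        # equal: correlation is unit-free
```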