In statistics, correlation (often measured as a correlation coefficient) indicates the strength and direction of a linear relationship between two random variables.

The data can be represented by the ordered pairs (x, y) where x is the independent, or explanatory, variable and y is the dependent, or response, variable.

A scatter plot can be used to determine whether a linear (straight line) correlation exists between two variables. In a scatter plot, the ordered pairs (x, y) are graphed as points in a coordinate plane.

The following scatter plots show several types of correlations.


Interpreting correlation using a scatter plot can be subjective. A more precise way to measure the type and strength of a linear correlation between two variables is to calculate the correlation coefficient.

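The correlation coefficient is a measure of the strength and the direction of a linear relationship between two variables. The symbol r represents the sample correlation coefficient. The formula for r is

$$ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} $$

where n is the number of pairs of data.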

The population correlation coefficient is represented by ρ (the lowercase Greek letter rho).

The range of the correlation coefficient is –1 to 1.

If x and y have a strong positive linear correlation, r is close to 1.

If x and y have a strong negative linear correlation, r is close to –1.

If there is no linear correlation or a weak linear correlation, r is close to 0.


Example. Calculate the correlation coefficient for the advertising expenditures and company sales data. What can you conclude?

Advertising expenses (1000s of $), x | Company sales (1000s of $), y
2.4 | 225
1.6 | 184
2.0 | 220
2.6 | 240
1.4 | 180
1.6 | 184
2.0 | 186
2.2 | 215

Let’s look at the scatter plot.


Guidelines.

1. Find the sum of the x values.

2. Find the sum of the y values.

3. Multiply each x value by its corresponding y value and find the sum.

4. Square each x value and find the sum.

5. Square each y value and find the sum.

6. Use these five sums to calculate the correlation coefficient.
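The six steps above can be sketched in Python. This is a minimal illustration, assuming the usual computational form of the formula, r = (n∑xy − ∑x∑y) / (√(n∑x² − (∑x)²) · √(n∑y² − (∑y)²)). The function name `correlation_from_sums` is hypothetical, and the data are the advertising and sales values from the example.

```python
from math import sqrt


def correlation_from_sums(xs, ys):
    """Sample correlation coefficient r, following the six guideline steps."""
    n = len(xs)                                # number of (x, y) pairs
    sum_x = sum(xs)                            # step 1: sum of the x values
    sum_y = sum(ys)                            # step 2: sum of the y values
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # step 3: sum of the products xy
    sum_x2 = sum(x * x for x in xs)            # step 4: sum of the squares of x
    sum_y2 = sum(y * y for y in ys)            # step 5: sum of the squares of y
    # Step 6: combine the five sums (assumed computational formula for r).
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Advertising expenses and company sales, both in 1000s of dollars.
advertising = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
sales = [225, 184, 220, 240, 180, 184, 186, 215]

r = correlation_from_sums(advertising, sales)
print(round(r, 3))
```

The sketch does no input checking: it expects two lists of equal length, and it would divide by zero if all x values (or all y values) were identical.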


Advertising expenses (1000s of $), x | Company sales (1000s of $), y | xy | x² | y²
2.4 | 225 | 540 | 5.76 | 50625
1.6 | 184 | 294.4 | 2.56 | 33856
2.0 | 220 | 440 | 4 | 48400
2.6 | 240 | 624 | 6.76 | 57600
1.4 | 180 | 252 | 1.96 | 32400
1.6 | 184 | 294.4 | 2.56 | 33856
2.0 | 186 | 372 | 4 | 34596
2.2 | 215 | 473 | 4.84 | 46225
∑x = 15.8 | ∑y = 1634 | ∑xy = 3289.8 | ∑x² = 32.44 | ∑y² = 337558

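Using these sums and n = 8, the correlation coefficient is

$$ r = \frac{8(3289.8) - (15.8)(1634)}{\sqrt{8(32.44) - 15.8^2}\,\sqrt{8(337558) - 1634^2}} = \frac{501.2}{\sqrt{9.88}\,\sqrt{30508}} \approx 0.913 $$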

Because r is close to 1, there is a strong positive linear correlation. As the amount of spending on advertising increases, the company sales also increase.


Find the sums.

These are your five sums. Use them with n = 8 to determine the correlation coefficient.


If your calculator is like mine, you may first need to turn on a feature. I did not realize this until I saw that r was missing from the linear regression output, where it should appear below a and b.

Here is what we need to do: turn on diagnostics by selecting DiagnosticOn from the CATALOG. Now the calculator displays r when we run the regression on our lists.


Example. Calculate the correlation coefficient for the income level and donating percent data. What can you conclude?

Income level (in 1000s of $), x | Donating percent, y
42 | 9
48 | 10
50 | 8
59 | 5
65 | 6
72 | 3

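Find the five sums and use the correlation coefficient formula.

∑x = 336 | ∑y = 41 | ∑xy = 2159 | ∑x² = 19458 | ∑y² = 315

Using these sums and n = 6, the correlation coefficient is

$$ r = \frac{6(2159) - (336)(41)}{\sqrt{6(19458) - 336^2}\,\sqrt{6(315) - 41^2}} = \frac{-822}{\sqrt{3852}\,\sqrt{209}} \approx -0.916 $$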

Use the Linear Regression feature to check. Nice work.

Because r is close to –1, there is a strong negative linear correlation. As the level of income rises, the percentage of donating decreases.