This is “The Linear Correlation Coefficient”, section 10.2 from the book Beginning Statistics (v. 1.0).
Figure 10.3 "Linear Relationships of Varying Strengths" illustrates linear relationships between two variables x and y of varying strengths. It is visually apparent that in the situation in panel (a), x could serve as a useful predictor of y, it would be less useful in the situation illustrated in panel (b), and in the situation of panel (c) the linear relationship is so weak as to be practically nonexistent. The linear correlation coefficient is a number computed directly from the data that measures the strength of the linear relationship between the two variables x and y.
Figure 10.3 Linear Relationships of Varying Strengths
The linear correlation coefficient for a collection of n pairs $\left(x,y\right)$ of numbers in a sample is the number r given by the formula
$$r=\frac{SS_{xy}}{\sqrt{SS_{xx}\cdot SS_{yy}}}$$
where
$$SS_{xx}=\Sigma x^{2}-\frac{1}{n}\left(\Sigma x\right)^{2},\quad SS_{xy}=\Sigma xy-\frac{1}{n}\left(\Sigma x\right)\left(\Sigma y\right),\quad SS_{yy}=\Sigma y^{2}-\frac{1}{n}\left(\Sigma y\right)^{2}$$
The linear correlation coefficient has the following properties, illustrated in Figure 10.4 "Linear Correlation Coefficient R":
The value of r lies between −1 and 1, inclusive.
The sign of r indicates the direction of the linear relationship: if $r<0$ then y tends to decrease as x increases, and if $r>0$ then y tends to increase as x increases.
The size of $|r|$ indicates the strength of the linear relationship: if $|r|$ is near 1 the relationship is strong, and if $|r|$ is near 0 it is weak.
Figure 10.4 Linear Correlation Coefficient R
Pay particular attention to panel (f) in Figure 10.4 "Linear Correlation Coefficient R". It shows a perfectly deterministic relationship between x and y, but $r=0$ because the relationship is not linear. (In this particular case the points lie on the top half of a circle.)
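The formula for r translates almost line by line into code. The following is a minimal Python sketch, not taken from the book: the function name `linear_correlation` and the sample points are illustrative assumptions. It computes r from the three SS quantities in the definition, then applies it to seven hypothetical points on the top half of a circle of radius 5, a situation like the one in panel (f), where y is completely determined by x and yet r comes out as zero.

```python
import math

def linear_correlation(xs, ys):
    """Return r = SS_xy / sqrt(SS_xx * SS_yy) for paired sample values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    if ss_xx == 0 or ss_yy == 0:
        raise ValueError("r is undefined when all x or all y values are equal")
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Hypothetical points on the top half of the circle x^2 + y^2 = 25,
# chosen symmetric about x = 0 (they are not data from the book).
circle_x = [-5, -4, -3, 0, 3, 4, 5]
circle_y = [0, 3, 4, 5, 4, 3, 0]

r_circle = linear_correlation(circle_x, circle_y)
print(r_circle)  # prints 0.0
```

The points were chosen with integer coordinates so that the cancellation in $SS_{xy}$ is exact; with other points on the semicircle, floating-point rounding may leave a value that is very close to, but not exactly, zero.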
Compute the linear correlation coefficient for the height and weight pairs plotted in Figure 10.2 "Plot of Height and Weight Pairs".
Solution:
Even for small data sets like this one, the computations are too long to do completely by hand. In actual practice the data are entered into a calculator or computer and a statistics program is used. In order to clarify the meaning of the formulas we will display the data and related quantities in tabular form. For each $\left(x,y\right)$ pair we compute three numbers: $x^{2}$, $xy$, and $y^{2}$, as shown in the table provided. In the last line of the table we have the sum of the numbers in each column. Using them we compute:
 | x | y | $x^{2}$ | $xy$ | $y^{2}$ |
---|---|---|---|---|---|
 | 68 | 151 | 4624 | 10268 | 22801 |
 | 69 | 146 | 4761 | 10074 | 21316 |
 | 70 | 157 | 4900 | 10990 | 24649 |
 | 70 | 164 | 4900 | 11480 | 26896 |
 | 71 | 171 | 5041 | 12141 | 29241 |
 | 72 | 160 | 5184 | 11520 | 25600 |
 | 72 | 163 | 5184 | 11736 | 26569 |
 | 72 | 180 | 5184 | 12960 | 32400 |
 | 73 | 170 | 5329 | 12410 | 28900 |
 | 73 | 175 | 5329 | 12775 | 30625 |
 | 74 | 178 | 5476 | 13172 | 31684 |
 | 75 | 188 | 5625 | 14100 | 35344 |
Σ | 859 | 2003 | 61537 | 143626 | 336025 |
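$$SS_{xx}=61537-\frac{1}{12}{\left(859\right)}^{2}\approx 46.917,\quad SS_{xy}=143626-\frac{1}{12}\left(859\right)\left(2003\right)\approx 244.583,\quad SS_{yy}=336025-\frac{1}{12}{\left(2003\right)}^{2}\approx 1690.917$$
so that
$$r=\frac{244.583}{\sqrt{\left(46.917\right)\left(1690.917\right)}}\approx 0.868$$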
The number $r=0.868$ quantifies what is visually apparent from Figure 10.2 "Plot of Height and Weight Pairs": weight tends to increase linearly with height (r is positive) and although the relationship is not perfect, it is reasonably strong (r is near 1).
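The tabular computation can be checked with a few lines of code. The sketch below is illustrative only, and its variable names are assumptions rather than anything from the text. It rebuilds the five column sums from the twelve height and weight pairs in the table, then evaluates the three SS quantities and r exactly as the formulas prescribe.

```python
import math

# The twelve (x, y) = (height, weight) pairs from the table in the example.
heights = [68, 69, 70, 70, 71, 72, 72, 72, 73, 73, 74, 75]
weights = [151, 146, 157, 164, 171, 160, 163, 180, 170, 175, 178, 188]

n = len(heights)

# Column sums, matching the last row of the table.
sum_x = sum(heights)
sum_y = sum(weights)
sum_x2 = sum(x * x for x in heights)
sum_xy = sum(x * y for x, y in zip(heights, weights))
sum_y2 = sum(y * y for y in weights)

# The three SS quantities from the definition of r.
ss_xx = sum_x2 - sum_x ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n
ss_yy = sum_y2 - sum_y ** 2 / n

r = ss_xy / math.sqrt(ss_xx * ss_yy)

print(sum_x, sum_y, sum_x2, sum_xy, sum_y2)  # prints 859 2003 61537 143626 336025
print(round(r, 3))                           # prints 0.868
```

Working from the raw pairs rather than from rounded intermediate values avoids accumulating rounding error, which matters because each SS quantity is a difference of two large, nearly equal numbers.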
With the exception of the exercises at the end of Section 10.3 "Modelling Linear Relationships with Randomness Present", the first Basic exercise in each of the following sections through Section 10.7 "Estimation and Prediction" uses the data from the first exercise here, the second Basic exercise uses the data from the second exercise here, and so on, and similarly for the Application exercises. Save your computations done on these exercises so that you do not need to repeat them later.
For the sample data
$$\begin{array}{cccccc}x& 0& 1& 3& 5& 8\\ y& 2& 4& 6& 5& 9\end{array}$$
For the sample data
$$\begin{array}{cccccc}x& 0& 2& 3& 6& 9\\ y& 0& 3& 3& 4& 8\end{array}$$
For the sample data
$$\begin{array}{rrrrrr}x& 1& 3& 4& 6& 8\\ y& 4& 1& 3& -1& 0\end{array}$$
For the sample data
$$\begin{array}{rrrrrr}x& 1& 2& 4& 7& 9\\ y& 5& 5& 6& -3& 0\end{array}$$
For the sample data
$$\begin{array}{cccccc}x& 1& 1& 3& 4& 5\\ y& 2& 1& 5& 3& 4\end{array}$$
For the sample data
$$\begin{array}{rrrrrr}x& 1& 3& 5& 5& 8\\ y& 5& -2& 2& -1& -3\end{array}$$
Compute the linear correlation coefficient for the sample data summarized by the following information:
$$\begin{array}{lll}n=5& \Sigma x=25& \Sigma x^{2}=165\\ \Sigma y=24& \Sigma y^{2}=134& \Sigma xy=144\\ & 1\le x\le 9& \end{array}$$
Compute the linear correlation coefficient for the sample data summarized by the following information:
$$\begin{array}{lll}n=5& \Sigma x=31& \Sigma x^{2}=253\\ \Sigma y=18& \Sigma y^{2}=90& \Sigma xy=148\\ & 2\le x\le 12& \end{array}$$
Compute the linear correlation coefficient for the sample data summarized by the following information:
$$\begin{array}{ccc}n=10& \Sigma x=0& \Sigma x^{2}=60\\ \Sigma y=24& \Sigma y^{2}=234& \Sigma xy=-87\\ & -4\le x\le 4& \end{array}$$
Compute the linear correlation coefficient for the sample data summarized by the following information:
$$\begin{array}{ccc}n=10& \Sigma x=-3& \Sigma x^{2}=263\\ \Sigma y=55& \Sigma y^{2}=917& \Sigma xy=-355\\ & -10\le x\le 10& \end{array}$$
The age x in months and vocabulary y were measured for six children, with the results shown in the table.
$$\begin{array}{ccccccc}x& 13& 14& 15& 16& 16& 18\\ y& 8& 10& 15& 20& 27& 30\end{array}$$
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
The curb weight x in hundreds of pounds and braking distance y in feet, at 50 miles per hour on dry pavement, were measured for five vehicles, with the results shown in the table.
$$\begin{array}{cccccc}x& 25& 27.5& 32.5& 35& 45\\ y& 105& 125& 140& 140& 150\end{array}$$
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
The age x and resting heart rate y were measured for ten men, with the results shown in the table.
$$\begin{array}{cccccc}x& 20& 23& 30& 37& 35\\ y& 72& 71& 73& 74& 74\end{array}$$ $$\begin{array}{cccccc}x& 45& 51& 55& 60& 63\\ y& 73& 72& 79& 75& 77\end{array}$$
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
The wind speed x in miles per hour and wave height y in feet were measured under various conditions on an enclosed deep water sea, with the results shown in the table.
$$\begin{array}{cccccc}x& 0& 0& 2& 7& 7\\ y& 2.0& 0.0& 0.3& 0.7& 3.3\end{array}$$ $$\begin{array}{cccccc}x& 9& 13& 20& 22& 31\\ y& 4.9& 4.9& 3.0& 6.9& 5.9\end{array}$$
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
The advertising expenditure x and sales y in thousands of dollars for a small retail business in its first eight years in operation are shown in the table.
$$\begin{array}{ccccc}x& 1.4& 1.6& 1.6& 2.0\\ y& 180& 184& 190& 220\end{array}$$ $$\begin{array}{ccccc}x& 2.0& 2.2& 2.4& 2.6\\ y& 186& 215& 205& 240\end{array}$$
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
The height x at age 2 and the height y at age 20, both in inches, for ten women are tabulated in the table.
$$\begin{array}{cccccc}x& 31.3& 31.7& 32.5& 33.5& 34.4\\ y& 60.7& 61.0& 63.1& 64.2& 65.9\end{array}$$ $$\begin{array}{cccccc}x& 35.2& 35.8& 32.7& 33.6& 34.8\\ y& 68.2& 67.6& 62.3& 64.9& 66.8\end{array}$$
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
The course average x just before a final exam and the score y on the final exam were recorded for 15 randomly selected students in a large physics class, with the results shown in the table.
$$\begin{array}{cccccc}x& 69.3& 87.7& 50.5& 51.9& 82.7\\ y& 56& 89& 55& 49& 61\end{array}$$ $$\begin{array}{cccccc}x& 70.5& 72.4& 91.7& 83.3& 86.5\\ y& 66& 72& 83& 73& 82\end{array}$$ $$\begin{array}{cccccc}x& 79.3& 78.5& 75.7& 52.3& 62.2\\ y& 92& 80& 64& 18& 76\end{array}$$
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
The table shows the acres x of corn planted and acres y of corn harvested, in millions of acres, in a particular country in ten successive years.
$$\begin{array}{cccccc}x& 75.7& 78.9& 78.6& 80.9& 81.8\\ y& 68.8& 69.3& 70.9& 73.6& 75.1\end{array}$$ $$\begin{array}{cccccc}x& 78.3& 93.5& 85.9& 86.4& 88.2\\ y& 70.6& 86.5& 78.6& 79.5& 81.4\end{array}$$
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
Fifty male subjects drank a measured amount x (in ounces) of a medication and the concentration y (in percent) in their blood of the active ingredient was measured 30 minutes later. The sample data are summarized by the following information.
$$\begin{array}{lll}n=50& \Sigma x=112.5& \Sigma y=4.83\\ & \Sigma xy=15.255& 0\le x\le 4.5\\ & \Sigma x^{2}=356.25& \Sigma y^{2}=0.667\end{array}$$
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
In an effort to produce a formula for estimating the age of large free-standing oak trees non-invasively, the girth x (in inches) five feet off the ground of 15 such trees of known age y (in years) was measured. The sample data are summarized by the following information.
$$\begin{array}{lll}n=15& \Sigma x=3368& \Sigma y=6496\\ & \Sigma xy=1{,}933{,}219& \Sigma x^{2}=917{,}780\\ & \Sigma y^{2}=4{,}260{,}666& 74\le x\le 395\end{array}$$
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
Construction standards specify the strength of concrete 28 days after it is poured. For 30 samples of various types of concrete the strength x after 3 days and the strength y after 28 days (both in hundreds of pounds per square inch) were measured. The sample data are summarized by the following information.
$$\begin{array}{lll}n=30& \Sigma x=501.6& \Sigma y=1338.8\\ & \Sigma xy=23{,}246.55& \Sigma x^{2}=8724.74\\ & \Sigma y^{2}=61{,}980.14& 11\le x\le 22\end{array}$$
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
Power-generating facilities use forecasts of temperature to forecast energy demand. The average temperature x (degrees Fahrenheit) and the day’s energy demand y (million watt-hours) were recorded on 40 randomly selected winter days in the region served by a power company. The sample data are summarized by the following information.
$$\begin{array}{lll}n=40& \Sigma x=2000& \Sigma y=2969\\ & \Sigma xy=143{,}042& \Sigma x^{2}=101{,}340\\ & \Sigma y^{2}=243{,}027& 40\le x\le 60\end{array}$$
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
In each case state whether you expect the two variables x and y indicated to have positive, negative, or zero correlation.
In each case state whether you expect the two variables x and y indicated to have positive, negative, or zero correlation.
Changing the units of measurement on two variables x and y should not change the linear correlation coefficient. Moreover, most changes of units amount to simply multiplying each measurement by a constant (for example, 1 foot = 12 inches). Multiply each x value in the table in Exercise 1 by two and compute the linear correlation coefficient for the new data set. Compare the new value of r to the one for the original data.
Refer to the previous exercise. Multiply each x value in the table in Exercise 2 by two, multiply each y value by three, and compute the linear correlation coefficient for the new data set. Compare the new value of r to the one for the original data.
Reversing the roles of x and y in the data set of Exercise 1 produces the data set
$$\begin{array}{cccccc}x& 2& 4& 6& 5& 9\\ y& 0& 1& 3& 5& 8\end{array}$$
Compute the linear correlation coefficient of the new set of data and compare it to what you got in Exercise 1.
In the context of the previous problem, look at the formula for r and see if you can tell why what you observed there must be true for every data set.
Large Data Set 1 lists the SAT scores and GPAs of 1,000 students. Compute the linear correlation coefficient r. Compare its value to your comments on the appearance and strength of any linear trend in the scatter diagram that you constructed in the first large data set problem for Section 10.1 "Linear Relationships Between Variables".
http://www.gone.2012books.lardbucket.org/sites/all/files/data1.xls
Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs). Compute the linear correlation coefficient r. Compare its value to your comments on the appearance and strength of any linear trend in the scatter diagram that you constructed in the second large data set problem for Section 10.1 "Linear Relationships Between Variables".
http://www.gone.2012books.lardbucket.org/sites/all/files/data12.xls
Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather clock at 60 auctions. Compute the linear correlation coefficient r. Compare its value to your comments on the appearance and strength of any linear trend in the scatter diagram that you constructed in the third large data set problem for Section 10.1 "Linear Relationships Between Variables".
http://www.gone.2012books.lardbucket.org/sites/all/files/data13.xls
$r=0.921$
$r=-0.794$
$r=0.707$
0.875
−0.846
0.948
0.709
0.832
0.751
0.965
0.992
same value
same value
$r=0.4601$
$r=0.9002$