Covariance and Correlation
http://ci.columbia.edu/ci/premba_test/c0331/s7/s7_5.html
Covariance and correlation describe how two variables are related.
- Variables are positively related if they move in the same direction.
- Variables are inversely related if they move in opposite directions.
Both covariance and correlation indicate whether variables are positively or inversely related. Correlation also tells you the degree to which the variables tend to move together.
You are probably already familiar with statements about covariance and correlation that appear in the news almost daily. For example, you might hear that as economic growth increases, stock market returns tend to increase as well. These variables are said to be positively related because they move in the same direction. You may also hear that as world oil production increases, gasoline prices fall. These variables are said to be negatively, or inversely, related because they move in opposite directions.
The relationship between two variables can be illustrated in a graph. In the examples below, the graph on the left illustrates how the positive relationship between economic growth and market returns might appear. The graph indicates that as economic growth increases, stock market returns also increase. The graph on the right is an example of how the inverse relationship between oil production and gasoline prices might appear. It illustrates that as oil production increases, gas prices fall.
To determine the actual relationships of these variables, you would use the formulas for covariance and correlation.
Covariance
Covariance indicates how two variables are related. A positive covariance means the variables are positively related, while a negative covariance means the variables are inversely related. The formula for calculating covariance of sample data is shown below.
$$cov(x,y)=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}$$
- $$x$$ = the independent variable
- $$y$$ = the dependent variable
- $$n$$ = number of data points in the sample
- $$\bar{x}$$ = the mean of the independent variable $$x$$
- $$\bar{y}$$ = the mean of the dependent variable $$y$$
To understand how covariance is used, consider the table below, which describes the rate of economic growth ($$x_i$$) and the rate of return on the S&P 500 ($$y_i$$).
Using the covariance formula, you can determine whether economic growth and S&P 500 returns have a positive or inverse relationship. Before you compute the covariance, calculate the mean of $$x$$ and $$y$$. (The Summary Measures topic of the Discrete Probability Distributions section explains the mean formula in detail.)
$$\bar{x}=\frac{\sum_{i=1}^{n}x_i}{n}$$
$$\bar{x}=\frac{2.1+2.5+4.0+3.6}{4} = 3.05 \approx 3.1$$
$$\bar{y}=\frac{\sum_{i=1}^{n}y_i}{n}$$
$$\bar{y}=\frac{8+12+14+10}{4} = 11$$
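As a quick check of the two means, here is a minimal Python sketch using the standard library's `statistics.mean`. The variable names `growth` and `returns` are illustrative choices, not part of the original material. The exact mean of $$x$$ is 3.05, which the worked example rounds to 3.1.

```python
from statistics import mean

# Data from the worked example: economic growth (%) and S&P 500 returns (%).
growth = [2.1, 2.5, 4.0, 3.6]
returns = [8, 12, 14, 10]

x_bar = mean(growth)
y_bar = mean(returns)

print(f"mean of x = {x_bar:.2f}")  # 3.05, rounded to 3.1 in the text
print(f"mean of y = {y_bar:.2f}")  # 11.00
```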
Now you can identify the variables for the covariance formula as follows.
- x = $$2.1, 2.5, 4.0$$, and $$3.6$$ (economic growth)
- y = $$8, 12, 14$$, and $$10$$ (S&P 500 returns)
- $$\bar{x}$$ = $$3.1$$
- $$\bar{y}$$ = $$11$$
Substitute these values into the covariance formula to determine the relationship between economic growth and S&P 500 returns.
$$cov(x,y)=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1} =\frac{(2.1-3.1)(8-11)+\cdots}{4-1}=\frac{3+(-0.6)+2.7+(-0.5)}{3}=1.53$$
The covariance between the returns of the S&P 500 and economic growth is 1.53. Since the covariance is positive, the variables are positively related; they move together in the same direction.
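The substitution above can be sketched in a few lines of Python. The function name `sample_covariance` is a hypothetical helper written for this illustration; it follows the sample covariance formula directly (sum of products of deviations, divided by $$n-1$$).

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Data from the worked example.
growth = [2.1, 2.5, 4.0, 3.6]   # economic growth (%)
returns = [8, 12, 14, 10]       # S&P 500 returns (%)

cov_xy = sample_covariance(growth, returns)
print(f"cov(x, y) = {cov_xy:.2f}")  # 1.53
```

The result is the same whether the mean of $$x$$ is taken as 3.05 or rounded to 3.1, because the deviations $$y_i-\bar{y}$$ sum to zero, so a constant shift in $$\bar{x}$$ cancels out of the numerator.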
Correlation
Correlation is another way to determine how two variables are related. In addition to telling you whether variables are positively or inversely related, correlation also tells you the degree to which the variables tend to move together.
Covariance is expressed in the product of the units of the two variables, so its size depends on how each variable is measured. It can tell you whether the variables move in the same or in opposite directions, but its magnitude alone cannot tell you how closely they move together, because it is not expressed on a standard scale. To measure the degree to which variables move together, you must use correlation.
Correlation standardizes the measure of interdependence between two variables and, consequently, tells you how closely the two variables move. The correlation measurement, called a correlation coefficient, will always take on a value between $$-1$$ and $$1$$, inclusive:
- If the correlation coefficient is one, the variables have a perfect positive correlation. This means that if one variable moves a given amount, the second moves proportionally in the same direction. A positive correlation coefficient less than one indicates a less than perfect positive correlation, with the strength of the correlation growing as the number approaches one.
- If the correlation coefficient is zero, no linear relationship exists between the variables. If one variable moves, you can make no linear prediction about the movement of the other variable; they are uncorrelated.
- If the correlation coefficient is $$-1$$, the variables are perfectly negatively correlated (or inversely correlated) and move in opposition to each other. If one variable increases, the other variable decreases proportionally. A negative correlation coefficient greater than $$-1$$ indicates a less than perfect negative correlation, with the strength of the correlation growing as the number approaches $$-1$$.
To calculate the correlation coefficient for two variables, you would use the correlation formula, shown below.
$$r(x,y) = \frac{cov(x,y)}{s_x s_y}$$
- $$r(x,y)$$ = correlation of the variables $$x$$ and $$y$$
- $$cov(x,y)$$ = covariance of the variables $$x$$ and $$y$$
- $$s_x$$ = sample standard deviation of the random variable $$x$$
- $$s_y$$ = sample standard deviation of the random variable $$y$$
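To see why an exact straight-line relationship gives a coefficient of exactly $$1$$ or $$-1$$, consider the following short derivation. It assumes, purely for illustration, that every data point satisfies $$y_i = a x_i + b$$ with $$a \neq 0$$; the constants $$a$$ and $$b$$ are not part of the original example.

$$\bar{y} = a\bar{x} + b, \qquad y_i - \bar{y} = a(x_i - \bar{x})$$
$$cov(x,y) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})\,a\,(x_i-\bar{x})}{n-1} = a\,s_x^2, \qquad s_y = |a|\,s_x$$
$$r(x,y) = \frac{a\,s_x^2}{s_x\,|a|\,s_x} = \frac{a}{|a|} = \pm 1$$

The coefficient is $$1$$ when the slope $$a$$ is positive and $$-1$$ when it is negative, regardless of how steep the line is.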
Earlier in this discussion, you saw how the covariance of S&P 500 returns and economic growth was calculated using data from the following table. Now consider how their correlation is measured.
Economic Growth % ($$x_i$$) | S&P 500 Returns % ($$y_i$$) |
---|---|
2.1 | 8 |
2.5 | 12 |
4.0 | 14 |
3.6 | 10 |
To calculate correlation, you must know the covariance for the two variables and the standard deviations of each variable. From the earlier example, you know that the covariance of S&P 500 returns and economic growth was calculated to be 1.53. Now you need to determine the standard deviation of each of the variables. You would calculate the standard deviation of the S&P 500 returns and the economic growth from the above example as follows. (For a more detailed explanation of calculating standard deviation, refer to the Summary Measures topic of the Discrete Probability Distributions section of the course.)
$$s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$$
$$s_x=\sqrt{\frac{(2.1-3.1)^2+(2.5-3.1)^2+\cdots}{4-1}} = \sqrt{\frac{1+0.36+0.81+0.25}{3}}=0.90$$
$$s_y=\sqrt{\frac{(8-11)^2+(12-11)^2+\cdots}{4-1}} = \sqrt{\frac{9+1+9+1}{3}}=2.58$$
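These two standard deviations can be checked with Python's standard library. This is a minimal sketch: `statistics.stdev` computes the sample standard deviation (dividing by $$n-1$$), which matches the formula above, and the variable names are illustrative.

```python
from statistics import stdev

# Data from the worked example.
growth = [2.1, 2.5, 4.0, 3.6]   # economic growth (%)
returns = [8, 12, 14, 10]       # S&P 500 returns (%)

s_x = stdev(growth)    # sample standard deviation (n - 1 in the denominator)
s_y = stdev(returns)

print(f"s_x = {s_x:.2f}")  # 0.90
print(f"s_y = {s_y:.2f}")  # 2.58
```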
Using the information from above, you know that
- $$cov(x,y)=1.53$$
- $$s_x = 0.90$$
- $$s_y = 2.58$$
Now you can calculate the correlation coefficient by substituting the numbers above into the correlation formula, as shown below.
$$r(x,y) = \frac{cov(x,y)}{s_x s_y}=\frac{1.53}{0.90 \times 2.58} = 0.66$$
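The same coefficient can be computed end to end without intermediate rounding. The sketch below uses a hypothetical helper, `pearson_r`, written for this illustration. It relies on the fact that the $$n-1$$ factors in the covariance and in the two standard deviations cancel, so only the sums of deviations are needed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient: cov(x, y) / (s_x * s_y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    # The (n - 1) divisors cancel between numerator and denominator.
    return sxy / sqrt(sxx * syy)


growth = [2.1, 2.5, 4.0, 3.6]   # economic growth (%)
returns = [8, 12, 14, 10]       # S&P 500 returns (%)

r = pearson_r(growth, returns)
print(f"r = {r:.2f}")  # 0.66
```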
A correlation coefficient of 0.66 tells you two important things:
- Because the correlation coefficient is a positive number, returns on the S&P 500 and economic growth are positively related.
- Because 0.66 is well above zero and closer to 1 than to 0, the positive correlation between returns on the S&P 500 and economic growth is moderately strong.
Both covariance and correlation identified that the variables are positively related. By standardizing measures, correlation is also able to measure the degree to which the variables tend to move together.
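One way to see what standardizing buys you is to change the units of one variable. The sketch below re-expresses economic growth as a decimal fraction instead of a percentage. That rescaling is an assumption made for illustration, as are the helper names `cov` and `corr`. The covariance shrinks by the same factor of 100, while the correlation coefficient is unchanged.

```python
from statistics import mean, stdev


def cov(xs, ys):
    """Sample covariance (n - 1 in the denominator)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def corr(xs, ys):
    """Sample correlation: covariance divided by both standard deviations."""
    return cov(xs, ys) / (stdev(xs) * stdev(ys))


growth_pct = [2.1, 2.5, 4.0, 3.6]            # economic growth in percent
returns_pct = [8, 12, 14, 10]                # S&P 500 returns in percent
growth_frac = [g / 100 for g in growth_pct]  # same growth as a fraction

cov_pct = cov(growth_pct, returns_pct)
cov_frac = cov(growth_frac, returns_pct)
corr_pct = corr(growth_pct, returns_pct)
corr_frac = corr(growth_frac, returns_pct)

print(f"covariance:  {cov_pct:.4f} vs {cov_frac:.4f}")   # differs by a factor of 100
print(f"correlation: {corr_pct:.4f} vs {corr_frac:.4f}")  # same value
```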
In business, covariance and correlation are used frequently to analyze market returns for anything from an individual stock to a market composite. In addition, marketing executives use covariance and correlation to understand the interdependence between consumer behavior and the consumption of their products.
- If there is a positive relationship between the scores of job incumbents on a job knowledge test and actual job performance, which of the following graphs would most likely be an accurate representation of this situation?
- In each of the graphs, are job performance and test performance shown to be positively related, inversely related, or unrelated?
- Given the following return information, what is the covariance between the return of Stock A and the return of the market index?
- Using the table and your calculations from above, calculate the correlation of Stock A's returns and the return of the market index.