Multicollinearity and Imprecise Estimation
In the context of a linear regression model containing several explanatory variables, the precision of estimation of linear parametric functions is analysed in terms of the latent roots and vectors of X'X, where X is the matrix of values of the explanatory variables. This analysis provides a practical method for detecting multicollinearity, and it is demonstrated that it is also useful in solving problems of the optimum choice of new values of the explanatory variables.
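As a minimal sketch of the detection idea, the latent roots and vectors of X'X can be computed directly; the following is illustrative only (the simulated data, the numpy routines, and the near-dependent third column are assumptions for the example, not the paper's):

    import numpy as np

    # Illustrative design matrix: the third column is almost the sum of the
    # first two, i.e. a "near linear relationship" among explanatory variables.
    rng = np.random.default_rng(0)
    Z = rng.standard_normal((50, 2))
    X = np.column_stack([Z, Z[:, 0] + Z[:, 1] + 1e-6 * rng.standard_normal(50)])

    # Latent roots (eigenvalues) and vectors of X'X, in ascending order.
    roots, vectors = np.linalg.eigh(X.T @ X)

    print(roots)          # a latent root near zero signals multicollinearity
    print(vectors[:, 0])  # its vector gives the coefficients of the near relation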
"Multicollinearity" is a term used in econometrics to denote the presence of linear
relationships or "near linear relationships" among explanatory (independent, concomitant)
variables in linear regression; see, for instance, Johnston (1963), Malinvaud
(1966). The problems created by this state of affairs are well known and are illustrated
clearly by a very simple example. Consider the single-variable linear regression model
\[
y_i = \alpha + \beta x_i + e_i, \qquad i = 1, 2, \ldots, n, \tag{1.1}
\]
with the usual assumptions: known $x_i$'s, unknown $\alpha$ and $\beta$, and uncorrelated errors $e_i$ with common variance. There are two "explanatory" variables here, the constant
"variable" which is identically 1 and the variable x. The values of these two
explanatory variables occurring in the model (1.1) are linearly related if and only if
$x_1 = x_2 = \cdots = x_n$, the common value being $\bar{x}$, say. A scatter diagram makes it quite clear that when this is the case and $\bar{x} \neq 0$, there is no hope of estimating $\alpha$ and $\beta$ separately. On the other hand, $\alpha + \beta\bar{x}$ can be estimated perfectly well, by $\bar{y}$ in fact.
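To spell out the degenerate case (a routine verification, not reproduced from the paper): when every $x_i$ equals $\bar{x}$, the model (1.1) reduces to
\[
E(y_i) = \alpha + \beta\bar{x}, \qquad i = 1, 2, \ldots, n,
\]
so the data are informative only about the single combination $\alpha + \beta\bar{x}$: any pair $(\alpha, \beta)$ on the line $\alpha + \beta\bar{x} = c$ fits equally well, while $\bar{y}$ estimates $\alpha + \beta\bar{x}$ unbiasedly with variance $\sigma^2/n$.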
Referring still to this simple example, it is clear that if the $x_i$'s are not all equal but are very closely grouped around their mean $\bar{x}$, then $\alpha$ and $\beta$ can be estimated separately, but estimates of both of them will be very imprecise (unless $\bar{x}$ is near zero, in which case $\alpha$, but not $\beta$, may be fairly precisely estimated). It remains true, of course, that $\alpha + \beta\bar{x}$ can be estimated with relative precision.
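The standard least-squares variances, stated here for reference (textbook facts rather than results of this paper), make both statements quantitative:
\[
\operatorname{var}(\hat{\beta}) = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2}, \qquad
\operatorname{var}(\hat{\alpha}) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2} \right), \qquad
\operatorname{var}(\hat{\alpha} + \hat{\beta}\bar{x}) = \frac{\sigma^2}{n}.
\]
When the $x_i$'s are tightly grouped, $\sum_i (x_i - \bar{x})^2$ is small and the first two variances are inflated, whereas the third does not involve that sum at all; and when $\bar{x}$ is near zero the term $\bar{x}^2 / \sum_i (x_i - \bar{x})^2$ is negligible, so $\hat{\alpha}$, but not $\hat{\beta}$, remains fairly precise.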