Thursday, September 22, 2005

There Are Statistics & Statistics Without Understanding

Bryan Caplan gets it right:
"R.J. Rummel's critique of a Cato study set off a big bloggers' debate about the value of think tanks. The following passage in Rummel's critique got my attention:

This correlation is meaningful for the kind of regression analysis Gartzke did, but he apparently doesn't know it. A problem in regression analysis is multicollinearity, which is to say moderate or high correlations among the independent variables. If two independent variables are highly correlated they are no longer statistically independent, and the first one entered into the regression, in this case economic freedom, steals that part of the correlation it has with democracy from the dependent variable. Thus, economic freedom is highly significant, while democracy is not. If Gartzke had done two bivariate regressions on his MID data, one with economic freedom and other with democracy as the independent variables, he surely would have found democracy highly significant.

This reminds me of one of the best few pages I've ever read in a textbook. The book: Arthur Goldberger's A Course in Econometrics. The subject: Multicollinearity and micronumerosity.

Goldberger's main point: People who use statistics often talk as if multicollinearity (high correlations between independent variables) biases results. But it doesn't. Multicollinearity leads to big standard errors, but if your independent variables are highly correlated, they SHOULD be big! Intuitively, big standard err"
In my view, when you do empirical estimations or when you evaluate the estimations of others, it is very important to understand what happens to your estimates when your data do not meet the statistical assumptions on which those estimates are based. Regression analysis assumes that no "explanatory" variable is an exact linear combination of the others, and it works best when the explanatory variables are not strongly correlated with one another. Often the data on your explanatory variables are highly correlated, and this gives rise to the problem of multicollinearity. As Professor Caplan explains, multicollinearity makes the estimated standard errors on the coefficients large. They are not biased; they are large because the data cannot cleanly separate the effects of the correlated variables. The estimated coefficient is itself unbiased, but a hypothesis test for its significance has little power, so it will tend to say that an explanatory variable is not significant when it in fact matters. I don't think it is accurate to suggest that one explanatory variable takes on a significance from another explanatory variable.
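To make this concrete, here is a minimal simulation sketch. It is not Gartzke's data or model: the two regressors, the true coefficients (both 1), the sample size, and the function names `ols`, `draw_sample`, and `simulate` are all assumptions chosen for illustration. It repeatedly fits a regression with two explanatory variables, once when they are uncorrelated and once when their correlation is 0.95, and then checks whether the order of the columns changes anything.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and their conventional standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])  # residual variance
    return beta, np.sqrt(sigma2 * np.diag(XtX_inv))

def draw_sample(rng, rho, n):
    """Draw x1, x2 with population correlation rho, and y = x1 + x2 + noise."""
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
    y = x1 + x2 + rng.normal(size=n)
    return x1, x2, y

def simulate(rho, n=200, reps=2000, seed=0):
    """Return (mean estimate, spread of estimates, mean reported SE) for x1."""
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    se = np.empty(reps)
    for r in range(reps):
        x1, x2, y = draw_sample(rng, rho, n)
        X = np.column_stack([np.ones(n), x1, x2])  # intercept, x1, x2
        beta, beta_se = ols(X, y)
        est[r], se[r] = beta[1], beta_se[1]
    return est.mean(), est.std(ddof=1), se.mean()

for rho in (0.0, 0.95):
    mean_b1, sd_b1, mean_se = simulate(rho)
    print(f"rho={rho:.2f}  mean b1={mean_b1:.3f}  "
          f"sd of b1={sd_b1:.3f}  mean reported se={mean_se:.3f}")

# Does the order in which the variables enter the regression matter?
rng = np.random.default_rng(1)
x1, x2, y = draw_sample(rng, 0.95, 200)
ones = np.ones(200)
b_a, se_a = ols(np.column_stack([ones, x1, x2]), y)  # x1 entered first
b_b, se_b = ols(np.column_stack([ones, x2, x1]), y)  # x2 entered first
print("same coefficient on x1:", np.allclose(b_a[1], b_b[2]))
print("same standard error on x1:", np.allclose(se_a[1], se_b[2]))
```

Under these assumptions the average estimate should stay close to the true value of 1 in both cases, the spread of the estimates should be several times larger when the correlation is 0.95, and the average reported standard error should track that spread. In an ordinary multiple regression fit this way, swapping the column order should leave the coefficient and standard error on x1 unchanged.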

Further, long, long ago, I learned about statistics and econometrics, and one thing I learned was not to say things like "highly significant." You pick your significance level for your hypothesis test, and either you reject the null hypothesis or you don't. The margin by which you make this decision has no meaning. One reason to take the position that the margin has no meaning is that you know all the ways in which your hypothesis test could go wrong, e.g. multicollinearity.
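The decision rule I have in mind can be sketched in a few lines. This assumes a two-sided test at the 5% level using the large-sample normal critical value of 1.96; the function name `reject_null` and the example t-statistics are made up for illustration.

```python
def reject_null(t_stat, critical_value=1.96):
    """Two-sided test: reject when |t| exceeds the critical value fixed in advance."""
    return abs(t_stat) > critical_value

# The rule returns only a decision, never a degree of significance.
for t in (1.5, 2.0, 20.0):
    print(t, "reject" if reject_null(t) else "fail to reject")
```

A t-statistic of 2.0 and one of 20.0 lead to the same decision here, which is the point: once the level is chosen, the test answers yes or no.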
