How to select predictors for logistic regression?
I am using logistic regression in my assignments and the correlation matrix shows that only two of the four independent variables are significantly correlated with the outcome.
I am wondering whether I should enter only these two variables in the regression analysis or all four variables.
- OPML, Lv 7, 9 years ago, Favorite Answer
The answer depends on theory, and also on whether you are using frequentist or Bayesian statistics.
If you are data mining rather than applying theory, then the answer is "it doesn't matter." The reason is that there is no way to show anything is specifically true or false.
Now as to Frequentist versus Bayesian models.
The simplest solution in Bayesian statistics is to use the Bayes factor to compare the two models. If one model is a significant improvement over the other, then you should choose that one. If not, then you should choose the most theoretically sound model, that is, the one that best follows economic theory. This is because Bayesian methods assess the probability that the parameters (or in this case the models) are true given the data.
In Frequentist statistics the answer is not so simple. Frequentist tests ask whether parameters are NOT zero, or NOT at some specified value. They do so by computing how probable data at least as extreme as yours would be if the parameters were not significant, that is, if they were zero.
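As a rough illustration of the Bayes factor comparison: an exact Bayes factor needs priors and marginal likelihoods, which are not given here. The sketch below instead uses the common BIC approximation, BF ≈ exp((BIC_reduced - BIC_full) / 2). That is a swapped-in shortcut, not a full Bayesian analysis, and the function name and the BIC values are hypothetical.

```python
import math

def approx_bayes_factor(bic_full, bic_reduced):
    """Approximate Bayes factor for the full model against the reduced model.

    Uses the BIC (Schwarz) approximation: BF ~= exp((BIC_reduced - BIC_full) / 2).
    Values above 1 favor the full model; values below 1 favor the reduced model.
    """
    return math.exp((bic_reduced - bic_full) / 2.0)

# Hypothetical BIC values, for illustration only (not from real data)
bf = approx_bayes_factor(bic_full=140.0, bic_reduced=136.0)
print(f"Approximate Bayes factor (full vs. reduced): {bf:.3f}")
```

A value well below 1 here would point toward the two-variable model. A value near 1 means the data do not separate the models, and theory should decide.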
This gives you a few choices. First, if the signs of the significant slopes and the intercept do not change from positive to negative, or vice versa, when you remove the two non-significant variables, then it is okay to remove them.
Second, you could perform a likelihood ratio test to see whether either model is "significant." If neither is, then there is no information, and as long as the rule above is met it doesn't matter which one you use.
Third, you could compare the models using the Akaike Information Criterion or the Bayesian Information Criterion.
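The sign check can be sketched in a few lines, assuming you have already fitted both models and copied their coefficients into dictionaries. The function name, variable names, and coefficient values below are hypothetical.

```python
def sign(value):
    """Return 1 for positive, -1 for negative, 0 for zero."""
    return (value > 0) - (value < 0)

def signs_stable(full_coefs, reduced_coefs):
    """True if every term kept in the reduced model has the same sign
    as the matching term in the full model."""
    return all(
        sign(reduced_value) == sign(full_coefs[name])
        for name, reduced_value in reduced_coefs.items()
    )

# Hypothetical fitted coefficients, for illustration only
full = {"intercept": -1.2, "x1": 0.8, "x2": -0.5, "x3": 0.1, "x4": -0.05}
reduced = {"intercept": -1.0, "x1": 0.9, "x2": -0.4}

print(signs_stable(full, reduced))
```

If this prints False, dropping the two variables changed the direction of an effect you care about, so removing them is not safe under this rule.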
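A minimal sketch of the likelihood ratio test for this case, assuming the two models are nested and differ by exactly two parameters (the two dropped variables). With 2 degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed. The function name and the log-likelihood values are hypothetical; take the real ones from your software's output.

```python
import math

def likelihood_ratio_test_2df(loglik_full, loglik_reduced):
    """Likelihood ratio test for nested models differing by two parameters.

    Returns (statistic, p_value). Only valid for 2 degrees of freedom.
    """
    # The statistic cannot be negative for nested models; clamp tiny
    # negative values that come from numerical noise.
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # Chi-square survival function with 2 degrees of freedom: exp(-x / 2)
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value

# Hypothetical log-likelihoods, for illustration only
stat, p = likelihood_ratio_test_2df(loglik_full=-60.0, loglik_reduced=-61.5)
print(f"LR statistic = {stat:.2f}, p-value = {p:.4f}")
```

A large p-value suggests the two extra variables do not improve the fit by more than chance would.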
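Both criteria can be computed by hand from each model's log-likelihood, using the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k counts the estimated parameters (including the intercept) and n is the sample size. The sketch below assumes hypothetical log-likelihoods and a hypothetical sample of 100 observations.

```python
import math

def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2.0 * n_params - 2.0 * loglik

def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * loglik

# Hypothetical values, for illustration only
n_obs = 100
models = {
    "full (4 predictors)": {"loglik": -60.0, "n_params": 5},
    "reduced (2 predictors)": {"loglik": -61.5, "n_params": 3},
}

for name, m in models.items():
    print(
        f"{name}: AIC = {aic(m['loglik'], m['n_params']):.2f}, "
        f"BIC = {bic(m['loglik'], m['n_params'], n_obs):.2f}"
    )
```

The model with the lower value is preferred. BIC penalizes extra parameters more heavily than AIC once n is larger than about 7.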
Finally, you could do all three.
The problem you run into with Frequentist tests is that "significance" does not, in and of itself, mean a relationship exists, and likewise the absence of significance does not, in and of itself, mean a relationship does not exist.