Alternative prior distributions for variable selection with very many more variables than observations
Griffin, Jim E. and Brown, Philip J. (2005) Alternative prior distributions for variable selection with very many more variables than observations. Working Paper. University of Warwick, Centre for Research in Statistical Methodology, Coventry.
WRAP_Griffin_05-10w.pdf - Published Version
Official URL: http://www2.warwick.ac.uk/fac/sci/statistics/crism...
The problem of variable selection in regression and the generalised linear model is addressed. We adopt a Bayesian approach with priors for the regression coefficients that are scale mixtures of normal distributions and embody a high prior probability of proximity to zero. By seeking modal estimates we generalise the lasso. Properties of the priors and their resultant posteriors are explored in the context of the linear and generalised linear model, especially when there are more variables than observations. We develop EM algorithms that embrace the need to explore the multiple modes of the non-log-concave posterior distributions. Finally, we apply the technique to microarray data, using a probit model to find the genetic predictors of osteo- versus rheumatoid arthritis.

Keywords: Bayesian modal analysis, Variable selection in regression, Scale mixtures of normals, Improper Jeffreys prior, lasso, Penalised likelihood, EM algorithm, Multiple modes, More variables than observations, Singular value decomposition, Latent variables, Probit regression.
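The connection the abstract draws between scale-mixture priors, modal estimation, and the lasso can be illustrated with a minimal sketch. Under a Laplace prior (a scale mixture of normals with exponential mixing), the MAP estimate of the regression coefficients is the lasso solution, and it can be found by an EM algorithm in which the E-step replaces each latent inverse prior variance by its conditional expectation and the M-step is a weighted ridge fit. The function below is an illustrative implementation of that generic EM scheme, not the paper's own algorithm; the function name, the fixed noise variance, and the thresholding tolerance are choices made here for the example.

```python
import numpy as np

def lasso_map_em(X, y, lam=1.0, n_iter=100, eps=1e-8):
    """MAP estimate under a Laplace prior via EM (illustrative sketch).

    The Laplace prior p(beta_j) = (lam/2) exp(-lam|beta_j|) is a scale
    mixture of normals: beta_j | tau_j^2 ~ N(0, tau_j^2) with exponential
    mixing on tau_j^2.  Treating the tau_j^2 as missing data, the E-step
    gives E[1/tau_j^2 | beta_j] = lam / |beta_j|, and the M-step solves a
    ridge regression with coefficient-specific penalties.
    """
    n, p = X.shape
    # Start from the least-squares (minimum-norm) solution.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        # E-step: expected inverse prior variances given current beta.
        w = lam / np.maximum(np.abs(beta), eps)
        # M-step: weighted ridge update (unit noise variance assumed).
        beta = np.linalg.solve(X.T @ X + np.diag(w), X.T @ y)
    # Coefficients absorbed at zero stay there; clean them up numerically.
    beta[np.abs(beta) < 1e-6] = 0.0
    return beta
```

Because the penalty weight `lam / |beta_j|` grows without bound as a coefficient shrinks, small coefficients are driven to exactly zero — the absorbing-state behaviour that makes modal estimates under such priors perform variable selection, and that motivates the paper's concern with multiple modes of the resulting non-log-concave posterior.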
Item Type: Working or Discussion Paper (Working Paper)
Subjects: Q Science > QA Mathematics
Divisions: Faculty of Science > Statistics
Library of Congress Subject Headings (LCSH): Regression analysis; Mixture distributions (Probability theory)
Series Name: Working papers
Publisher: University of Warwick, Centre for Research in Statistical Methodology
Place of Publication: Coventry
Number of Pages: 34
Status: Not Peer Reviewed
Access rights to Published version: Open Access
Funder: Commonwealth Scientific and Industrial Research Organisation (Australia) (CSIRO)