Bayesian adaptive lassos with non-convex penalization
Griffin, Jim E. and Brown, Philip J., 1944- (2007) Bayesian adaptive lassos with non-convex penalization. Working Paper. Coventry: University of Warwick. Centre for Research in Statistical Methodology. (Working papers).
WRAP_Griffin_07-2wv2.pdf - Published Version
Official URL: http://www2.warwick.ac.uk/fac/sci/statistics/crism...
The lasso (Tibshirani, 1996) has sparked interest in the use of penalization of the log-likelihood for variable selection, as well as shrinkage. Recently, there have been attempts to propose penalty functions which improve upon the lasso's properties for variable selection and prediction, such as SCAD (Fan and Li, 2001) and the adaptive lasso (Zou, 2006). We adopt the Bayesian interpretation of the lasso as the maximum a posteriori (MAP) estimate of the regression coefficients, which have been given independent, double exponential prior distributions. Generalizing this prior provides a family of adaptive lasso penalty functions, which includes the quasi-Cauchy distribution (Johnstone and Silverman, 2005) as a special case. The properties of this approach are explored. We are particularly interested in the case of more variables than observations, which is of characteristic importance for data arising in chemometrics, genomics and proteomics, to name but three. Our methodology can give rise to multiple modes of the posterior distribution, and we show how this may occur even with the convex lasso. These multiple modes do no more than reflect the indeterminacy of the model. We give fast algorithms and suggest a strategy of using a set of perfectly fitting random starting values to explore different regions of the parameter space with substantial posterior support. Simulations show that our procedure provides significant improvements on a range of established procedures, and we provide an example from chemometrics.
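The MAP interpretation mentioned in the abstract can be written out as a short sketch. The notation here is an assumption for illustration and is not taken from the paper: a Gaussian linear model with response y, design matrix X, noise variance σ², and a common prior rate λ for every coefficient.

```latex
% Assumed notation (not from the abstract): y in R^n, X in R^{n x p},
% noise variance sigma^2, double exponential prior rate lambda > 0.
\begin{align*}
y \mid \beta &\sim \mathrm{N}(X\beta, \sigma^2 I_n), \qquad
p(\beta_j) = \frac{\lambda}{2}\exp(-\lambda|\beta_j|), \quad j = 1,\dots,p, \\
-\log p(\beta \mid y) &= \frac{1}{2\sigma^2}\|y - X\beta\|_2^2
  + \lambda\sum_{j=1}^{p}|\beta_j| + \text{const}, \\
\hat\beta_{\mathrm{MAP}} &= \arg\min_{\beta}\Big\{\|y - X\beta\|_2^2
  + 2\sigma^2\lambda\sum_{j=1}^{p}|\beta_j|\Big\}.
\end{align*}
```

Under this reading, any other independent prior p(β_j) contributes the penalty −log p(β_j) in place of λ|β_j|, which is the sense in which generalizing the prior yields other penalty functions.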
|Item Type:||Working or Discussion Paper (Working Paper)|
|Subjects:||Q Science > QA Mathematics|
|Divisions:||Faculty of Science > Statistics|
|Library of Congress Subject Headings (LCSH):||Regression analysis|
|Series Name:||Working papers|
|Publisher:||University of Warwick. Centre for Research in Statistical Methodology|
|Place of Publication:||Coventry|
|Number of Pages:||30|
|Status:||Not Peer Reviewed|
|Access rights to Published version:||Open Access|
|References:||Abramowitz, M. and Stegun, I. A. (Eds.) (1964): “Handbook of Mathematical Functions with Formulas, Graphs and Mathematical Tables,” New York: Dover. Bae, K. and Mallick, B. K. (2004): “Gene selection using two-level hierarchical Bayesian model,” Bioinformatics, 20, 3423-3430. Berger, J. O. (1985): “Statistical Decision Theory and Bayesian Analysis,” Berlin: Springer. Bernardo, J. M. and Smith, A. F. M. (1994): “Bayesian Theory,” Chichester: Wiley. Bibby, B. M. and Sorensen, M. (2003): “Hyperbolic Processes in Finance,” in Handbook of Heavy Tailed Distributions in Finance, S. Rachev (ed.), Elsevier Science, 211-248. Breiman, L. (1996): “Heuristics of instability and stabilization in model selection,” Annals of Statistics, 24, 2350-2383. Brown, P. J., Vannucci, M. and Fearn, T. (1998): “Multivariate Bayesian variable selection and prediction,” Journal of the Royal Statistical Society B, 60, 627-641. Brown, P. J., Fearn, T. and Vannucci, M. (2001): “Bayesian wavelet regression on curves with application to a spectroscopic calibration problem,” Journal of the American Statistical Association, 96, 398-408. Brown, P. J., Vannucci, M. and Fearn, T. (2002): “Bayes model averaging with selection of regressors,” Journal of the Royal Statistical Society B, 64, 519-536. Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977): “Maximum-likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society B, 39, 1-38. Fan, J. and Li, R. Z. (2001): “Variable selection via nonconcave penalized likelihood and its oracle properties,” Journal of the American Statistical Association, 96, 1348-1360. Figueiredo, M. A. T. and Jain, A. K. (2001): “Bayesian learning of sparse classifiers,” Proceedings IEEE Computer Society Conference in Computer Vision and Pattern Recognition, Vol 1, 35-41. Figueiredo, M. A. T. (2003): “Adaptive sparseness for supervised learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, 1150-1159.
George, E. I. and McCulloch, R. E. (1997): “Approaches for Bayesian variable selection,” Statistica Sinica, 7, 339-373. Gradshteyn, I. S. and Ryzhik, I. M. (1980): “Tables of Integrals, Series and Products: Corrected and Enlarged Edition,” (A. Jeffrey, Ed.) New York: Academic Press. Jeffreys, H. (1939/1961): “Theory of Probability,” 3rd Edition 1961, Oxford: Clarendon Press. Johnstone, I. M. and Silverman, B. W. (2005): “Empirical Bayes selection of wavelet thresholds,” Annals of Statistics, 33, 1700-1752. Kiiveri, H. (2003): “A Bayesian approach to variable selection when the number of variables is very large,” in Goldstein, D. R. (Ed), “Science and Statistics: Festschrift for Terry Speed,” Institute of Mathematical Statistics Lecture Notes-Monograph Series, Vol 40, 127-143. Li, B. and Goel, P. K. (2006): “Regularized optimization in statistical learning: A Bayesian perspective,” Statistica Sinica, 16, 411-424. Mallick, B. K., Ghosh, D. and Ghosh, M. (2005): “Bayesian classification of tumours by using gene expression data,” Journal of the Royal Statistical Society B, 67, 219-234. Meinshausen, N. and Bühlmann, P. (2006): “High dimensional graphs and variable selection with the lasso,” Annals of Statistics, 34, 1436-1462. Meng, X. L. and van Dyk, D. A. (1997): “The EM algorithm – an old folk song sung to a fast new tune (with discussion),” Journal of the Royal Statistical Society B, 59, 511-567. Mitchell, T. J. and Beauchamp, J. J. (1988): “Bayesian variable selection in linear regression (with Discussion),” Journal of the American Statistical Association, 83, 1023-1036. Osborne, B. G., Fearn, T., Miller, A. R. and Douglas, S. (1984): “Application of near infrared reflectance spectroscopy to compositional analysis of biscuits and biscuit doughs,” Journal of the Science of Food and Agriculture, 35, 99-105.
Osborne, M. R., Presnell, B. and Turlach, B. A. (1998): “Knot selection for regression splines via the LASSO,” in Dimension Reduction, Computational Complexity, and Information, Proceedings of the 30th Symposium on the Interface, Interface 98 (Editor S. Weisberg), Interface Foundation of North America, 44-49. Rosset, S., Zhu, J. and Hastie, T. (2004): “Boosting as a Regularized Path to a Maximum Margin Classifier,” Journal of Machine Learning Research, 5, 941-973. ter Braak, C. J. F. (2006): “Bayesian sigmoid shrinkage with improper variance priors and an application to wavelet denoising,” Computational Statistics and Data Analysis, 51, 1232-1242. Tibshirani, R. (1996): “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society B, 58, 267-288. Vidakovic, B. (1998): “Wavelet-Based Nonparametric Bayes Methods,” in Practical Nonparametric and Semiparametric Bayesian Statistics, D. Dey, P. Müller and D. Sinha (eds.), New York: Springer-Verlag, 133-156. West, M. (2003): “Bayesian Factor regression models in the large p, small n paradigm,” in Bernardo, J. M. et al (Eds), “Bayesian Statistics 7,” 733-742, Oxford: Clarendon Press. West, M. (1987): “On scale mixtures of normal distributions,” Biometrika, 74, 646-648. Zhang, S. and Jin, J. (1996): “Computation of Special Functions,” New York: Wiley. Zou, H. (2006): “The adaptive lasso and its oracle properties,” Journal of the American Statistical Association, 101, 1418-1429. Zou, H. and Hastie, T. (2005): “Regularization and variable selection via the elastic net,” Journal of the Royal Statistical Society B, 67, 301-320.|