Cross-validation prior choice in Bayesian probit regression with many covariates
Lamnisos, Demetris, Griffin, Jim E. and Steel, Mark F. J. (2012) Cross-validation prior choice in Bayesian probit regression with many covariates. Statistics and Computing, Vol.22 (No.2). pp. 359-373. ISSN 0960-3174
PDF: WRAP_Steel_150911-cvpriorchoice_rev.pdf - Accepted Version (542Kb)
Official URL: http://dx.doi.org/10.1007/s11222-011-9228-1
Abstract
This paper examines prior choice in probit regression through a predictive cross-validation criterion. In particular, we focus on situations where the number of potential covariates is far larger than the number of observations, such as in gene expression data. Cross-validation avoids the tendency of such models to fit perfectly. We choose the scale parameter c in the standard variable selection prior as the minimizer of the log predictive score. Naive evaluation of the log predictive score requires substantial computational effort, and we investigate computationally cheaper methods using importance sampling. We find that K-fold importance densities perform best, in combination with either mixing over different values of c or with integrating over c through an auxiliary distribution.
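The idea of scoring candidate values of c by a cross-validated log predictive score, estimated by importance sampling instead of refitting, can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the leave-one-out case with the full-data posterior as the importance density (the abstract reports that K-fold importance densities perform best), it assumes observations are conditionally independent given the coefficients, and it assumes posterior draws of the coefficients for each c are already available from a sampler that is not shown. All function names are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_lik(y_i, x_i, beta):
    """p(y_i | x_i, beta) under a probit link; y_i is 0 or 1."""
    eta = sum(x * b for x, b in zip(x_i, beta))
    # Use Phi(-eta) rather than 1 - Phi(eta) to limit cancellation error.
    return norm_cdf(eta) if y_i == 1 else norm_cdf(-eta)


def loo_predictive_is(y_i, x_i, draws):
    """Importance-sampling estimate of p(y_i | y_{-i}).

    With draws from the full-data posterior, the weights are
    proportional to 1 / p(y_i | beta), so the estimate reduces to the
    harmonic mean of the per-draw likelihoods. It fails if a
    likelihood underflows to zero.
    """
    inverse_liks = [1.0 / probit_lik(y_i, x_i, beta) for beta in draws]
    return len(draws) / sum(inverse_liks)


def log_predictive_score(y, X, draws):
    """Average negative log predictive probability; smaller is better."""
    total = sum(
        math.log(loo_predictive_is(y_i, x_i, draws))
        for y_i, x_i in zip(y, X)
    )
    return -total / len(y)


def choose_c(y, X, draws_by_c):
    """Return the c whose draws give the smallest log predictive score.

    draws_by_c maps each candidate c to posterior draws obtained under
    that c (assumed to be supplied by the caller).
    """
    return min(
        draws_by_c,
        key=lambda c: log_predictive_score(y, X, draws_by_c[c]),
    )
```

A call such as `choose_c(y, X, {1.0: draws_a, 10.0: draws_b})` would then return whichever candidate scores lower. The harmonic-mean weights can have very high variance when a single observation is influential, which is one reason to prefer importance densities built on held-out folds.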
| Item Type: | Journal Article |
|---|---|
| Subjects: | Q Science > QA Mathematics |
| Divisions: | Faculty of Science > Statistics |
| Library of Congress Subject Headings (LCSH): | Probits, Bayesian statistical decision theory, Regression analysis, Gene expression -- Data processing |
| Journal or Publication Title: | Statistics and Computing |
| Publisher: | Springer |
| ISSN: | 0960-3174 |
| Date: | March 2012 |
| Volume: | Vol.22 |
| Number: | No.2 |
| Page Range: | pp. 359-373 |
| Identification Number: | 10.1007/s11222-011-9228-1 |
| Status: | Peer Reviewed |
| Publication Status: | Published |
| Access rights to Published version: | Restricted or Subscription Access |
| URI: | http://wrap.warwick.ac.uk/id/eprint/37681 |