Transdimensional sampling algorithms for Bayesian variable selection in classification problems with many more variables than observations
Lamnisos, Demetris, Griffin, Jim E. and Steel, Mark F. J. (2008) Transdimensional sampling algorithms for Bayesian variable selection in classification problems with many more variables than observations. Working Paper. Coventry: University of Warwick. Centre for Research in Statistical Methodology. (Working papers).
WRAP_Lamnisos_08-08w.pdf - Published Version
Official URL: http://www2.warwick.ac.uk/fac/sci/statistics/crism...
One flexible technique for model search in probit regression is Markov chain Monte Carlo methodology that simultaneously explores the model and parameter space. The reversible jump sampler is designed to achieve this simultaneous exploration. Standard samplers, such as those based on MC³, often have low model acceptance probabilities when there are many more regressors than observations. Simple changes to the form of the proposal lead to much higher acceptance rates. However, high acceptance rates are often associated with poor mixing of chains. This suggests defining a more general model proposal that allows us to propose models "further" from our current model. We design such a proposal, which can be tuned to achieve a suitable acceptance rate for good mixing (rather like the tuning of a random walk proposal in fixed-dimension problems). The effectiveness of this proposal is linked to the form of the marginalisation scheme used when updating the model, and we propose a new efficient implementation of the automatic generic transdimensional algorithm of Green (2003), which uses our preferred marginalisation. The efficiency of these methods is compared with that of several previously proposed samplers on some gene expression data sets. The samplers considered are: the data augmentation method of Holmes and Held (2006), the automatic generic transdimensional algorithm of Green (2003) and the efficient jump proposal methods of Brooks et al. (2003). Finally, the results of these applications lead us to propose guidelines for choosing between samplers.
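The idea of a tunable model proposal that can reach models "further" from the current one can be illustrated with a minimal Python sketch. This is not the authors' proposal or their marginalisation scheme: it assumes a simple symmetric move that flips a random number of inclusion indicators, and the names `propose_flips`, `run_sampler`, `max_flips`, `flip_prob` and the toy score `toy_log_post` are all hypothetical. Because the move is symmetric, the Metropolis-Hastings acceptance ratio reduces to the ratio of (unnormalised) model posteriors.

```python
import numpy as np

def propose_flips(gamma, max_flips, flip_prob, rng):
    """Flip k distinct inclusion indicators, k = 1 + Binomial(max_flips - 1, flip_prob).

    The move depends only on which positions change, so it is symmetric.
    Larger max_flips or flip_prob propose models further from the current one.
    """
    k = 1 + rng.binomial(max_flips - 1, flip_prob)
    idx = rng.choice(gamma.size, size=k, replace=False)
    candidate = gamma.copy()
    candidate[idx] = ~candidate[idx]
    return candidate

def run_sampler(log_post, p, n_iter, max_flips, flip_prob, seed=0):
    """Metropolis-Hastings over inclusion vectors; returns final model and acceptance rate."""
    rng = np.random.default_rng(seed)
    gamma = np.zeros(p, dtype=bool)   # start from the empty model
    current = log_post(gamma)
    accepted = 0
    for _ in range(n_iter):
        candidate = propose_flips(gamma, max_flips, flip_prob, rng)
        candidate_lp = log_post(candidate)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < candidate_lp - current:
            gamma, current = candidate, candidate_lp
            accepted += 1
    return gamma, accepted / n_iter

def toy_log_post(gamma):
    """Toy stand-in for a log model posterior (assumption, not a probit marginal likelihood)."""
    signal = np.array([0, 1, 2])      # hypothetical "relevant" variables
    return 3.0 * gamma[signal].sum() - 2.0 * gamma.sum()

if __name__ == "__main__":
    for max_flips in (1, 4, 8):
        _, rate = run_sampler(toy_log_post, p=200, n_iter=5000,
                              max_flips=max_flips, flip_prob=0.5)
        print(f"max_flips={max_flips}: acceptance rate {rate:.3f}")
```

Running the script for several values of `max_flips` shows how the tuning parameters trade the size of a proposed move against the acceptance rate, in the same spirit as tuning the step size of a random walk proposal.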
|Item Type:||Working or Discussion Paper (Working Paper)|
|Subjects:||Q Science > QA Mathematics|
|Divisions:||Faculty of Science > Statistics|
|Library of Congress Subject Headings (LCSH):||Sampling (Statistics), Probits|
|Series Name:||Working papers|
|Publisher:||University of Warwick. Centre for Research in Statistical Methodology|
|Place of Publication:||Coventry|
|Number of Pages:||30|
|Status:||Not Peer Reviewed|
|Access rights to Published version:||Open Access|
|Version or Related Resource:||Lamnisos, Demetris, et al. (2009). Transdimensional sampling algorithms for Bayesian variable selection in classification problems with many more variables than observations. Journal of Computational and Graphical Statistics, 18(3), pp. 592-612. http://wrap.warwick.ac.uk/id/eprint/17215|
|References:||Albert, J. and S. Chib (1993): "Bayesian analysis of binary and polychotomous response data," Journal of the American Statistical Association, 88, 669-679.
Alon, U., N. Barkai and D. A. Notterman (1999): "Broad patterns of gene expression revealed by clustering analysis of tumour and normal colon tissues probed by oligonucleotide arrays," Proceedings of the National Academy of Sciences of the United States of America, 96, 6745-6750.
Armstrong, S. A., J. E. Staunton and L. B. Silverman (2002): "MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia," Nature Genetics, 30, 41-47.
Atchade, Y. F. and J. S. Rosenthal (2005): "On adaptive Markov chain Monte Carlo algorithms," Bernoulli, 5, 815-828.
Brooks, S. P., P. Giudici and G. O. Roberts (2003): "Efficient construction of reversible jump Markov chain Monte Carlo proposal distributions," Journal of the Royal Statistical Society B, 65, 3-55.
Brown, P. J., M. Vannucci and T. Fearn (1998a): "Multivariate Bayesian variable selection and prediction," Journal of the Royal Statistical Society B, 60, 627-641.
Brown, P. J., M. Vannucci and T. Fearn (1998b): "Bayesian wavelength selection in multicomponent analysis," Journal of Chemometrics, 12, 173-182.
Chipman, H., E. I. George and R. E. McCulloch (2001): "The practical implementation of Bayesian model selection," in Model Selection, ed. P. Lahiri, Hayward, CA: IMS, 67-134.
Denison, D. G. T., C. C. Holmes, B. K. Mallick and A. F. M. Smith (2002): Bayesian Methods for Nonlinear Classification and Regression, Chichester: John Wiley & Sons.
Dudoit, S., J. Fridlyand and T. P. Speed (2002): "Comparison of discrimination methods for the classification of tumors using gene expression data," Journal of the American Statistical Association, 97, 77-87.
Fernandez, C., E. Ley and M. F. J. Steel (2001): "Benchmark priors for Bayesian model averaging," Journal of Econometrics, 100, 381-427.
Gamerman, D. (1997): "Sampling from the posterior distribution in generalized linear mixed models," Statistics and Computing, 7, 57-68.
Geweke, J. (1991): "Efficient simulation from the multivariate normal and Student-t distributions subject to linear constraints and the evaluation of constraint probabilities," Computing Science and Statistics: Proceedings of the Twenty-Third Symposium on the Interface, 571-578. Alexandria, Virginia: American Statistical Association.
Geyer, C. J. (1992): "Practical Markov chain Monte Carlo," Statistical Science, 7, 473-511.
Golub, T. R., D. K. Slonim, P. Tamayo and C. Huard (1999): "Molecular classification of cancer: class discovery and class prediction by gene expression monitoring," Science, 531-537.
Green, P. J. (1995): "Reversible jump Markov chain Monte Carlo computation and Bayesian model determination," Biometrika, 82, 711-732.
Green, P. J. (2003): "Trans-dimensional Markov chain Monte Carlo," in Highly Structured Stochastic Systems, eds. P. J. Green, N. L. Hjort and S. Richardson, Oxford, U.K.: Oxford University Press, 179-198.
Hans, C., A. Dobra and M. West (2007): "Shotgun stochastic search for 'large p' regression," Journal of the American Statistical Association, 102, 507-516.
Holmes, C. C. and L. Held (2006): "Bayesian auxiliary variable models for binary and multinomial regression," Bayesian Analysis, 1, 145-168.
Lee, K. E., N. Sha, R. Dougherty, M. Vannucci and B. K. Mallick (2003): "Gene selection: A Bayesian variable selection approach," Bioinformatics, 19, 90-97.
Ley, E. and M. F. J. Steel (2007): "On the effect of prior assumptions in Bayesian Model Averaging with applications to growth regression," Journal of Applied Econometrics, forthcoming.
Madigan, D. and J. York (1995): "Bayesian graphical models for discrete data," International Statistical Review, 63, 215-232.
Mitchell, T. J. and J. J. Beauchamp (1988): "Bayesian variable selection in linear regression," Journal of the American Statistical Association, 83, 1023-1032.
Nguyen, D. V. and D. M. Rocke (2002): "Tumor classification by partial least squares using microarray gene expression data," Bioinformatics, 18, 39-50.
Raftery, A. E., D. Madigan and J. A. Hoeting (1997): "Bayesian model averaging for linear regression models," Journal of the American Statistical Association, 92, 179-191.
Roberts, G. O. and J. S. Rosenthal (2001): "Optimal scaling of various Metropolis-Hastings algorithms," Statistical Science, 16, 351-367.
Sha, N., M. Vannucci, P. J. Brown, M. Trower and G. Amphlett (2003): "Gene selection in arthritis classification with large-scale microarray expression profiles," Comparative and Functional Genomics, 4, 171-181.
Sha, N., M. Vannucci, M. G. Tadesse, P. J. Brown, I. Dragoni, N. Davies, T. C. Roberts, A. Contestabile, M. Salmon, C. Buckley and F. Falciani (2004): "Bayesian variable selection in multinomial probit models to identify molecular signatures of disease stage," Biometrics, 60, 812-819.
Singh, D., P. G. Febbo and K. Ross (2002): "Gene expression correlates of clinical prostate cancer behaviour," Cancer Cell, 1, 203-209.
Sisson, S. (2005): "Transdimensional Markov chains: A decade of progress and future perspectives," Journal of the American Statistical Association, 100, 1077-1089.
Yeung, K. Y., R. E. Bumgarner and A. E. Raftery (2005): "Bayesian model averaging: development of an improved multi-class gene selection and classification tool for microarray data," Bioinformatics, 21, 2394-2402.|