Clustering with proportional scaling
Smith, J. Q., 1953-, Anderson, Paul E. and Liverani, Silvia (2008) Clustering with proportional scaling. Working Paper. Coventry: University of Warwick. Centre for Research in Statistical Methodology. (Working papers).
WRAP_Smith_08-04w.pdf - Published Version
Official URL: http://www2.warwick.ac.uk/fac/sci/statistics/crism...
Conjugacy assumptions are often used in Bayesian selection over a partition because they allow the otherwise unfeasibly large model space to be searched very quickly. The implications of such models can be analysed algebraically. In this paper we use the explicit forms of the associated Bayes factors to demonstrate that such methods can be unstable under common settings of the associated hyperparameters. We then prove that the regions of instability can be removed by setting the hyperparameters in an unconventional way. Under this family of assignments we prove that model selection is determined by an implicit separation measure: a function of the hyperparameters and the sufficient statistics of clusters in a given partition. We show that this family of separation measures has desirable properties. The proposed methodology is illustrated through the selection of clusters of longitudinal gene expression profiles.
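The abstract's point about fast search rests on one property of conjugate models: the marginal likelihood of each cluster, and therefore the Bayes factor between any two partitions, is available in closed form. The sketch below illustrates this under stated assumptions. It uses a generic univariate Normal model with a Normal-Inverse-Gamma prior (mean m0, precision scale kappa0, shape a0, rate b0), not the paper's longitudinal regression model or its proposed hyperparameter setting. The names `log_marginal`, `log_bf_split` and `agglomerate`, the default hyperparameters and the toy data are all hypothetical. The partition prior is taken as uniform. The `kappa0` loop shows one familiar kind of hyperparameter sensitivity, the Bartlett-Lindley effect: as the prior on cluster means becomes vague, the Bayes factor is pushed towards merging whatever the data say. It is not necessarily the instability the paper analyses.

```python
import math
from itertools import combinations


def log_marginal(ys, m0=0.0, kappa0=1.0, a0=1.0, b0=1.0):
    """Closed-form log p(ys) for one cluster under a Normal-Inverse-Gamma prior:
    mu | s2 ~ N(m0, s2 / kappa0), s2 ~ InvGamma(a0, b0), y_i | mu, s2 ~ N(mu, s2).
    Hyperparameter names and defaults are illustrative assumptions."""
    n = len(ys)
    ybar = sum(ys) / n
    ss = sum((y - ybar) ** 2 for y in ys)
    kappa_n = kappa0 + n
    a_n = a0 + n / 2
    b_n = b0 + 0.5 * ss + kappa0 * n * (ybar - m0) ** 2 / (2 * kappa_n)
    return (-0.5 * n * math.log(2 * math.pi)
            + 0.5 * math.log(kappa0 / kappa_n)
            + a0 * math.log(b0) - a_n * math.log(b_n)
            + math.lgamma(a_n) - math.lgamma(a0))


def log_bf_split(c1, c2, **prior):
    """log Bayes factor of 'c1 and c2 are separate clusters' versus 'merged',
    assuming equal prior probability for the two partitions."""
    return (log_marginal(c1, **prior) + log_marginal(c2, **prior)
            - log_marginal(c1 + c2, **prior))


def agglomerate(data, **prior):
    """Greedy agglomerative search: start from singletons and repeatedly apply
    the merge with the most negative log_bf_split (largest gain in the log
    marginal likelihood of the partition); stop when every merge would lower it."""
    clusters = [[y] for y in data]
    while len(clusters) > 1:
        scores = {(i, j): log_bf_split(clusters[i], clusters[j], **prior)
                  for i, j in combinations(range(len(clusters)), 2)}
        (i, j), best = min(scores.items(), key=lambda kv: kv[1])
        if best > 0:
            break  # no remaining merge increases the score
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    a = [0.1, -0.2, 0.05, 0.15]   # made-up group near 0
    b = [5.0, 5.2, 4.9, 5.1]      # made-up group near 5
    # Shrinking kappa0 makes the prior on cluster means vaguer; the log Bayes
    # factor for keeping these well-separated groups apart eventually turns negative.
    for kappa0 in (1.0, 1e-10, 1e-30):
        print(kappa0, log_bf_split(a, b, kappa0=kappa0))
    print(agglomerate(a + b))
```

With the default hyperparameters the greedy search recovers the two made-up groups, while at kappa0 = 1e-30 the same two groups would be merged. Each candidate merge is scored in closed form, which is why conjugate models make searches over partitions cheap; a greedy agglomerative pass is only one simple way to use those scores.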
|Item Type:||Working or Discussion Paper (Working Paper)|
|Subjects:||Q Science > QA Mathematics|
|Divisions:||Faculty of Science > Statistics; Faculty of Science > Centre for Systems Biology|
|Library of Congress Subject Headings (LCSH):||Cluster analysis|
|Series Name:||Working papers|
|Publisher:||University of Warwick. Centre for Research in Statistical Methodology|
|Place of Publication:||Coventry|
|Number of Pages:||26|
|Status:||Not Peer Reviewed|
|Access rights to Published version:||Open Access|