The geometry of independence tree models with hidden variables
Zwiernik, Piotr and Smith, J. Q., 1953- (2010) The geometry of independence tree models with hidden variables. Working Paper. Coventry: University of Warwick. Centre for Research in Statistical Methodology. (Working papers).

PDF:  WRAP_Zwiernik_1003w.pdf (Published Version, 602Kb)
Official URL: http://www2.warwick.ac.uk/fac/sci/statistics/crism...
Abstract
In this paper we investigate the geometry of undirected discrete graphical models on trees when all the variables in the system are binary, the leaves represent the observable variables, and the inner nodes are unobserved. We obtain a full geometric description of these models, given by polynomial equations and inequalities. We also give exact formulas for their parameters in terms of the marginal probability distribution over the observed variables. Our analysis is based on combinatorial results that generalize the notion of cumulants and introduces a novel use of Möbius functions on partially ordered sets. The geometric structure we obtain links to the notion of a tree metric considered in phylogenetic analysis and to some interesting determinantal formulas involving hyperdeterminants of 2 × 2 × 2 tables as defined in [19].
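As background for the cumulant and Möbius-function language in the abstract, the sketch below computes the classical joint cumulant of binary variables by Möbius inversion on the lattice of set partitions. It illustrates only the standard notion that the paper generalizes, not the authors' own construction; the function names (`set_partitions`, `moment`, `joint_cumulant`) and the dictionary representation of the joint distribution are assumptions made for this illustration.

```python
from itertools import product
from math import factorial, prod


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new singleton block.
        yield [[first]] + partition


def moment(p, block):
    """E[prod_{i in block} X_i] for 0/1 variables with joint pmf `p`.

    `p` maps binary tuples to probabilities (hypothetical representation).
    For 0/1 variables this moment is P(X_i = 1 for all i in block).
    """
    return sum(prob for x, prob in p.items() if all(x[i] == 1 for i in block))


def joint_cumulant(p, n):
    """Classical joint cumulant of X_0, ..., X_{n-1}.

    Sums over all set partitions; a partition with k blocks gets the
    Moebius-function weight (-1)^(k-1) * (k-1)! of the partition lattice.
    """
    total = 0.0
    for partition in set_partitions(list(range(n))):
        k = len(partition)
        weight = (-1) ** (k - 1) * factorial(k - 1)
        total += weight * prod(moment(p, block) for block in partition)
    return total


# Three independent binary variables with P(X_i = 1) = q[i].
q = (0.2, 0.5, 0.7)
independent = {
    x: prod(q[i] if x[i] else 1 - q[i] for i in range(3))
    for x in product((0, 1), repeat=3)
}

# Two perfectly correlated fair binary variables.
correlated = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
```

For two variables the joint cumulant reduces to the covariance, and it vanishes whenever the variables split into mutually independent groups, which is the property that makes cumulant-type coordinates convenient for independence models.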
Item Type:  Working or Discussion Paper (Working Paper) 

Subjects:  Q Science > QA Mathematics 
Divisions:  Faculty of Science > Statistics 
Library of Congress Subject Headings (LCSH):  Trees (Graph theory), Multivariate analysis -- Graphic methods 
Series Name:  Working papers 
Publisher:  University of Warwick. Centre for Research in Statistical Methodology 
Place of Publication:  Coventry 
Date:  2010 
Volume:  Vol.2010 
Number:  No.3 
Number of Pages:  26 
Status:  Not Peer Reviewed 
Access rights to Published version:  Open Access 
References:
[1] E. S. Allman and J. A. Rhodes, Phylogenetic ideals and varieties for the general Markov model, Adv. in Appl. Math., 40 (2008), pp. 127–148.
[2] V. Auvray, P. Geurts, and L. Wehenkel, A Semi-Algebraic Description of Discrete Naive Bayes Models with Two Hidden Classes, in Proc. Ninth International Symposium on Artificial Intelligence and Mathematics, Fort Lauderdale, Florida, Jan 2006.
[3] S. Basu, R. Pollack, and M. Roy, Algorithms in Real Algebraic Geometry, Springer, 2003.
[4] P. Buneman, A note on the metric properties of trees, J. Combinatorial Theory Ser. B, 17 (1974), pp. 48–50.
[5] M. Casanellas and J. Fernández-Sánchez, Performance of a New Invariants Method on Homogeneous and Nonhomogeneous Quartet Trees, Molecular Biology and Evolution, 24 (2007), p. 288.
[6] J. Cavender and J. Felsenstein, Invariants of phylogenies in a simple case with discrete states, Journal of Classification, 4 (1987), pp. 57–71.
[7] J. A. Cavender, Letter to the editor, Molecular Phylogenetics and Evolution, 8 (1997), pp. 443–444.
[8] J. Chang, Full reconstruction of Markov models on evolutionary trees: Identifiability and consistency, Mathematical Biosciences, 137 (1996), pp. 51–73.
[9] D. A. Cox, J. B. Little, and D. O'Shea, Ideals, Varieties, and Algorithms, Springer-Verlag, NY, 3rd ed., 2007.
[10] D. R. Cox and N. Wermuth, A note on the quadratic exponential binary distribution, Biometrika, 81 (1994), pp. 403–408.
[11] M. Drton and S. Sullivant, Algebraic Statistical Models, Statistica Sinica, 17 (2007), pp. 1273–1297.
[12] N. Eriksson, Using invariants for phylogenetic tree construction, vol. 149 of The IMA Volumes in Mathematics and its Applications, Springer, 2007, pp. 89–108.
[13] N. Eriksson, K. Ranestad, B. Sturmfels, and S. Sullivant, Phylogenetic algebraic geometry, in Projective varieties with unexpected properties, Walter de Gruyter GmbH & Co. KG, Berlin, 2005, pp. 237–255.
[14] W. Feller, An Introduction to Probability Theory and Applications, vol. 2, John Wiley & Sons, New York, second ed., 1971.
[15] L. Garcia, M. Stillman, and B. Sturmfels, Algebraic geometry of Bayesian networks, J. Symbolic Comput., 39 (2005), pp. 331–355.
[16] D. Geiger, D. Heckerman, H. King, and C. Meek, Stratified exponential families: graphical models and model selection, Ann. Statist., 29 (2001), pp. 505–529.
[17] D. Geiger and C. Meek, Graphical models and exponential families, in Proceedings of Fourteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, Madison, WI, August 1998, pp. 156–165.
[18] D. Geiger, C. Meek, and B. Sturmfels, On the toric algebra of graphical models, Annals of Statistics, 34 (2006), pp. 1463–1492.
[19] I. Gelfand, M. Kapranov, and A. Zelevinsky, Discriminants, Resultants, and Multidimensional Determinants, Birkhäuser, 1994.
[20] Z. Gilula, Singular value decomposition of probability matrices: Probabilistic aspects of latent dichotomous variables, Biometrika, 66 (1979), pp. 339–344.
[21] J. Lake, A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony, 1987.
[22] S. L. Lauritzen, Graphical models, vol. 17 of Oxford Statistical Science Series, The Clarendon Press Oxford University Press, New York, 1996. Oxford Science Publications.
[23] P. Lazarsfeld and N. Henry, Latent structure analysis, Houghton Mifflin, New York, 1968.
[24] F. Matsen, Fourier transform inequalities for phylogenetic trees, Computational Biology and Bioinformatics, IEEE/ACM Transactions on, 6 (2009), pp. 89–95.
[25] P. McCullagh, Tensor methods in statistics, Monographs on Statistics and Applied Probability, Chapman & Hall, London, 1987.
[26] J. Pearl and M. Tarsi, Structuring causal trees, J. Complexity, 2 (1986), pp. 60–77. Complexity of approximately solved problems (Morningside Heights, N.Y., 1985).
[27] G. Pistone and H. P. Wynn, Cumulant varieties, Journal of Symbolic Computation, 41 (2006), pp. 210–221.
[28] G. Rota, On the foundations of combinatorial theory I. Theory of Möbius Functions, Probability Theory and Related Fields, 2 (1964), pp. 340–368.
[29] G.-C. Rota and J. Shen, On the combinatorics of cumulants, J. Combin. Theory Ser. A, 91 (2000), pp. 283–304. In memory of Gian-Carlo Rota.
[30] D. Rusakov and D. Geiger, Asymptotic model selection for naive Bayesian networks, J. Mach. Learn. Res., 6 (2005), pp. 1–35 (electronic).
[31] C. Semple and M. Steel, Phylogenetics, vol. 24 of Oxford Lecture Series in Mathematics and its Applications, Oxford University Press, Oxford, 2003.
[32] R. Settimi and J. Q. Smith, Geometry, moments and conditional independence trees with hidden variables, Ann. Statist., 28 (2000), pp. 1179–1205.
[33] R. Speicher, Free probability theory and non-crossing partitions, Sem. Lothar. Combin., 39 (1997), Art. B39c, 38 pp. (electronic).
[34] D. J. Spiegelhalter, A. P. Dawid, S. L. Lauritzen, and R. G. Cowell, Bayesian analysis in expert systems, Statist. Sci., 8 (1993), pp. 219–283. With comments and a rejoinder by the authors.
[35] R. P. Stanley, Enumerative combinatorics. Volume I, no. 49 in Cambridge Studies in Advanced Mathematics, Cambridge University Press, 2002.
[36] M. Steel and B. Faller, Markovian log-supermodularity, and its applications in phylogenetics, Applied Mathematics Letters, (2009).
[37] B. Streitberg, Lancaster interactions revisited, Ann. Statist., 18 (1990), pp. 1878–1885.
[38] B. Sturmfels, Solving systems of polynomial equations, vol. 97 of CBMS Regional Conference Series in Mathematics, Published for the Conference Board of the Mathematical Sciences, Washington, DC, 2002.
[39] B. Sturmfels and S. Sullivant, Toric Ideals of Phylogenetic Invariants, Journal of Computational Biology, 12 (2005), pp. 204–228.
[40] S. Sullivant, Algebraic geometry of Gaussian Bayesian networks, Advances in Applied Mathematics, 40 (2008), pp. 482–513.
URI:  http://wrap.warwick.ac.uk/id/eprint/35067 