Waiting time distribution of generalized later patterns
Martin, Donald E. K. and Aston, John A. D. (2008) Waiting time distribution of generalized later patterns. Computational Statistics & Data Analysis, Volume 52 (Number 11). pp. 4879-4890. ISSN 0167-9473
Full text not available from this repository.
Official URL: http://dx.doi.org/10.1016/j.csda.2008.04.019
In this paper the concept of later waiting time distributions for patterns in multi-state trials is generalized to cover a collection of compound patterns that must all be counted pattern-specific numbers of times, and a practical method is given to compute the generalized distribution. The solution given applies to overlapping counting and two types of non-overlapping counting, and the underlying sequences are assumed to be Markovian of a general order. Patterns are allowed to be weighted so that an occurrence is counted multiple times, and patterns may be completely included in longer patterns. Probabilities are computed through an auxiliary Markov chain. As the state space associated with the auxiliary chain can be quite large if its setup is handled in a naive fashion, an algorithm is given for generating a "minimal" state space that leaves out states that can never be reached. For the case of overlapping counting, a formula that relates probabilities for intersections of events to probabilities for unions of subsets of the events is also used, so that the distribution is also computed in terms of probabilities for competing patterns. A detailed example is given to illustrate the methodology. (C) 2008 Elsevier B.V. All rights reserved.
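As a rough illustration of the auxiliary Markov chain idea described in the abstract, the following minimal sketch computes a later waiting time distribution in a deliberately simplified setting. It is not the authors' algorithm: it assumes independent, identically distributed symbols (rather than a general-order Markov sequence), simple patterns with unit weights, and overlapping counting only. The function names `step`, `reachable_states` and `later_waiting_time_cdf` are hypothetical. Each chain state pairs the most recent symbols with capped occurrence counts, and a breadth-first search builds only the states that can actually be reached.

```python
from collections import deque


def step(state, sym, patterns, quotas):
    """One transition of the auxiliary chain (overlapping counting).

    A state is (history, counts): the most recent symbols needed to detect
    pattern completions, and per-pattern counts capped at each quota.
    """
    hist, counts = state
    m = max(len(p) for p in patterns) - 1  # symbols of history to keep
    text = hist + sym
    new_counts = tuple(
        min(q, c + text.endswith(p))  # count a pattern ending at this symbol
        for p, q, c in zip(patterns, quotas, counts)
    )
    return (text[-m:] if m else "", new_counts)


def reachable_states(patterns, quotas, alphabet):
    """Breadth-first search that keeps only states reachable from the start."""
    target = tuple(quotas)
    start = ("", (0,) * len(patterns))
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if state[1] == target:  # all quotas met: absorbing, do not expand
            continue
        for sym in alphabet:
            nxt = step(state, sym, patterns, quotas)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def later_waiting_time_cdf(patterns, quotas, probs, n_max):
    """Return [P(W <= n) for n = 0..n_max], where W is the first time every
    pattern has occurred at least its quota of times.

    Assumes i.i.d. symbols with probabilities `probs` and positive quotas.
    """
    target = tuple(quotas)
    dist = {("", (0,) * len(patterns)): 1.0}  # mass on non-absorbed states
    absorbed = 0.0
    cdf = [0.0]
    for _ in range(n_max):
        nxt = {}
        for state, p in dist.items():
            for sym, q in probs.items():
                new_state = step(state, sym, patterns, quotas)
                if new_state[1] == target:
                    absorbed += p * q
                else:
                    nxt[new_state] = nxt.get(new_state, 0.0) + p * q
        dist = nxt
        cdf.append(absorbed)
    return cdf


if __name__ == "__main__":
    fair = {"0": 0.5, "1": 0.5}
    # Wait until both "0" and "1" have each appeared at least once.
    print(later_waiting_time_cdf(["0", "1"], (1, 1), fair, 5))
    # Number of reachable states when "11" must occur twice.
    print(len(reachable_states(["11"], (2,), "01")))
```

In this sketch the quota vector plays the role of the pattern-specific numbers of times each pattern must be counted, and the absorbing states are those in which every quota has been met. Extending it to higher-order Markov dependence, weighted patterns, or non-overlapping counting would require a richer state description than the one assumed here.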
|Item Type:||Journal Article|
|Subjects:||Q Science > QA Mathematics|
|Divisions:||Faculty of Science > Statistics|
|Library of Congress Subject Headings (LCSH):||Distribution (Probability theory), Bioinformatics|
|Journal or Publication Title:||Computational Statistics & Data Analysis|
|Publisher:||Elsevier Science Ltd|
|Date:||15 July 2008|
|Number of Pages:||12|
|Page Range:||pp. 4879-4890|
|Access rights to Published version:||Restricted or Subscription Access|