http://dx.doi.org/10.5351/CKSS.2005.12.1.139

Binary Segmentation Procedure for Detecting Change Points in a DNA Sequence  

Yang Tae Young (Department of Mathematics, Myongji University)
Kim Jeongjin (Department of Mathematics, Myongji University)
Publication Information
Communications for Statistical Applications and Methods / v.12, no.1, 2005, pp. 139-147
Abstract
It is of interest to locate homogeneous segments within a DNA sequence. Suppose that a DNA sequence consists of segments within which the observations follow the same residue frequency distribution and between which the distributions differ. In this setting, the change points are the end points of these segments. This article explores the use of a binary segmentation procedure to detect the change points in a DNA sequence. The change points are determined by a sequence of nested hypothesis tests, each asking whether a change point exists. At each test, we compare a no-change-point model with a single-change-point model using the Bayesian information criterion. Because each test considers at most one change point, the method circumvents the computational complexity normally faced in problems with an unknown number of change points. We illustrate the procedure by analyzing the genome of bacteriophage lambda.
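The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes independent residues with a multinomial frequency distribution per segment, counts the change-point location as one extra free parameter in the BIC penalty, and introduces a hypothetical `min_len` parameter to keep segments from becoming too short. All function names are illustrative.

```python
import math

ALPHABET = "ACGT"  # assumed residue alphabet


def prefix_counts(seq):
    """counts[i][k] = occurrences of ALPHABET[k] in seq[:i]."""
    counts = [[0] * len(ALPHABET)]
    for ch in seq:
        row = counts[-1][:]
        row[ALPHABET.index(ch)] += 1
        counts.append(row)
    return counts


def seg_loglik(counts, lo, hi):
    """Maximized multinomial log-likelihood of seq[lo:hi]."""
    n = hi - lo
    total = 0.0
    for a, b in zip(counts[lo], counts[hi]):
        c = b - a
        if c > 0:
            total += c * math.log(c / n)
    return total


def best_split(counts, lo, hi, min_len):
    """Split point in seq[lo:hi] that maximizes the two-segment log-likelihood."""
    best_t, best_ll = None, -math.inf
    for t in range(lo + min_len, hi - min_len + 1):
        ll = seg_loglik(counts, lo, t) + seg_loglik(counts, t, hi)
        if ll > best_ll:
            best_t, best_ll = t, ll
    return best_t, best_ll


def binary_segmentation(seq, min_len=5):
    """Return sorted change-point indices found by recursive BIC comparison."""
    counts = prefix_counts(seq)
    k = len(ALPHABET) - 1  # free frequencies per segment
    found = []

    def recurse(lo, hi):
        n = hi - lo
        if n < 2 * min_len:
            return
        # No-change-point model: one set of residue frequencies.
        bic0 = -2 * seg_loglik(counts, lo, hi) + k * math.log(n)
        # Single-change-point model: two sets of frequencies plus the
        # location (treating the location as a parameter is an assumption).
        t, ll1 = best_split(counts, lo, hi, min_len)
        bic1 = -2 * ll1 + (2 * k + 1) * math.log(n)
        if bic1 < bic0:
            found.append(t)
            recurse(lo, t)   # test the left segment for further changes
            recurse(t, hi)   # test the right segment for further changes

    recurse(0, len(seq))
    return sorted(found)


print(binary_segmentation("A" * 200 + "C" * 200))  # [200]
```

Each recursive call tests only one candidate change point against none, so the search never has to enumerate all possible configurations of multiple change points at once.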
Keywords
Bayesian information criterion; bacteriophage lambda; binary segmentation procedure