A Heuristic Algorithm to Find All Normalized Local Alignments Above Threshold  

Kim, Sangtae (Department of Computer Science, Korea Military Academy)
Sim, Jeong Seop (Electronics and Telecommunications Research Institute)
Park, Heejin (School of Computer Science and Engineering, Seoul National University)
Park, Kunsoo (Electronics and Telecommunications Research Institute, School of Computer Science and Engineering, Seoul National University)
Park, Hyunseok (Institute of Bioinformatics, Macrogen, Inc., Department of Computer Science, Ewha Womans University)
Seo, Jeong-Sun (Institute of Bioinformatics, Macrogen, Inc., Ilcheon Molecular Medicine Institute, Seoul National University)
Abstract
Local alignment is an important task in molecular biology: it determines whether two sequences contain similar regions. The most popular approach to local alignment is the dynamic programming algorithm of Smith and Waterman, but the alignments reported by the Smith-Waterman algorithm have some undesirable properties. A recent approach to fixing these problems is the notion of normalized scores for local alignments, introduced by Arslan, Eğecioğlu and Pevzner. In this paper we consider the problem of finding all local alignments whose normalized scores are above a given threshold, and present a fast heuristic algorithm. Our algorithm is 180-330 times faster than that of Arslan et al. for sequences of length about 120 kbp, and about 40-50 times faster for sequences of length about 30 kbp.
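To make the two notions in the abstract concrete, the following is a minimal Python sketch of plain Smith-Waterman scoring followed by a normalized score. It is not the heuristic algorithm of this paper. It assumes a linear gap penalty, and it assumes the normalized score has the form S / (|I| + |J| + L), where S is the alignment score, |I| and |J| are the lengths of the aligned substrings, and L is a positive constant; the abstract itself does not state the formula. The function names, the scoring parameters, and the values of `L` and `threshold` are hypothetical choices for illustration.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, (i0, i1), (j0, j1)) for one best local alignment.

    The aligned substrings are a[i0:i1] and b[j0:j1].
    Scoring values are illustrative assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)
    # H[i][j]: best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    # start[i][j]: cell at which the alignment ending at (i, j) begins.
    start = [[(i, j) for j in range(m + 1)] for i in range(n + 1)]
    best = (0, (0, 0), (0, 0))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (0, (i, j)),                                   # start afresh
                (H[i - 1][j - 1] + sub, start[i - 1][j - 1]),  # (mis)match
                (H[i - 1][j] + gap, start[i - 1][j]),          # gap in b
                (H[i][j - 1] + gap, start[i][j - 1]),          # gap in a
            ]
            H[i][j], start[i][j] = max(candidates, key=lambda c: c[0])
            if H[i][j] > best[0]:
                si, sj = start[i][j]
                best = (H[i][j], (si, i), (sj, j))
    return best


def normalized_score(score, len_i, len_j, L):
    """Assumed normalized score: S / (|I| + |J| + L), with L > 0."""
    return score / (len_i + len_j + L)


if __name__ == "__main__":
    score, (i0, i1), (j0, j1) = smith_waterman("TTACGTAA", "GGACGTCC")
    ns = normalized_score(score, i1 - i0, j1 - j0, L=2)
    threshold = 0.3  # hypothetical threshold
    print(score, (i0, i1), (j0, j1), ns, ns > threshold)
```

This sketch normalizes only the single best Smith-Waterman alignment. Finding all alignments whose normalized score exceeds the threshold, which is the problem addressed here, requires more than this post-processing step.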
Keywords
local alignment; dynamic programming; normalized score; fractional programming
References
1 Arslan, A.N., and Eğecioğlu, Ö. (2003). Efficient algorithms for normalized edit distances, Journal of Discrete Algorithms, Hermes Science Publications, in press
2 Sellers, P.H. (1984). Pattern recognition in genetic sequences by mismatch density, Bulletin of Mathematical Biology 46, 501-504
3 Chen, T., and Skiena, S.S. (1997). Trie-based data structures for sequence assembly, Combinatorial Pattern Matching '97, 206-223
4 Zhang, Z., Berman, P., and Miller, W. (1998). Alignments without low scoring regions, Journal of Computational Biology 5, 197-200
5 Arslan, A.N., and Eğecioğlu, Ö. (1999). An efficient uniform-cost normalized edit distance algorithm, Symposium on String Processing and Information Retrieval '99, IEEE Computer Society, 8-15
6 Green, P., Documentation for phrap, Genome Center, University of Washington, http://www.phrap.org/phrap.docs/phrap.html
7 Gusfield, D. (1997). Algorithms on Strings, Trees, and Sequences, Cambridge University Press
8 Marzal, A., and Vidal, E. (1993). Computation of normalized edit distances and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence 15, 926-932
9 Alexandrov, N.N., and Solovyev, V.V. (1998). Statistical significance of ungapped alignments, Pacific Symposium on Biocomputing '98, 463-472
10 Arslan, A.N., Eğecioğlu, Ö., and Pevzner, P. (2001). A new approach to sequence comparison: normalized sequence alignment, Bioinformatics 17, 327-337
11 Dinkelbach, W. (1967). On nonlinear fractional programming, Management Science 13, 492-498
12 Eğecioğlu, Ö., and Ibel, M. (1996). Parallel algorithms for fast computation of normalized edit distances, IEEE Symposium on Parallel and Distributed Processing '96, 496-503
13 Goad, W.B., and Kanehisa, M.I. (1982). Pattern recognition in nucleic acid sequences. I. A general method for finding local homologies and symmetries, Nucleic Acids Research 10, 247-263
14 Smith, T.F., and Waterman, M.S. (1981). Identification of common molecular subsequences, Journal of Molecular Biology 147, 195-197
15 Lipman, D., and Pearson, W. (1988). Improved tools for biological sequence comparison, Proceedings of the National Academy of Sciences 85, 2444-2448
16 Setubal, J., and Meidanis, J. (1997). Introduction to Computational Molecular Biology, PWS Publishing Company
17 Waterman, M.S. (1995). Introduction to Computational Biology, Chapman & Hall, London
18 Gotoh, O. (1982). An improved algorithm for matching biological sequences, Journal of Molecular Biology 162, 705-708
19 Zhang, Z., Berman, P., Wiehe, T., and Miller, W. (1999). Post-processing long pairwise alignments, Bioinformatics 15, 1012-1019