A Heuristic Algorithm to Find All Normalized Local Alignments Above Threshold

Kim, Sangtae; Sim, Jeong Seop; Park, Heejin; Park, Kunsoo; Park, Hyunseok; Seo, Jeong-Sun

Genomics & Informatics

Volume 1, Issue 1 / Pages 25-31 / 2003 / pISSN 1598-866X / eISSN 2234-0742

Korea Genome Organization (한국유전체학회)

Kim, Sangtae (Department of Computer Science, Korea Military Academy) ;
Sim, Jeong Seop (Electronics and Telecommunications Research Institute) ;
Park, Heejin (School of Computer Science and Engineering, Seoul National University) ;
Park, Kunsoo (Electronics and Telecommunications Research Institute, School of Computer Science and Engineering, Seoul National University) ;
Park, Hyunseok (Institute of Bioinformatics, Macrogen, Inc., Department of Computer Science, Ewha Womans University) ;
Seo, Jeong-Sun (Institute of Bioinformatics, Macrogen, Inc., Ilcheon Molecular Medicine Institute, Seoul National University)

Published : 2003.09.01

Abstract

Local alignment is an important task in molecular biology: it determines whether two sequences contain similar regions. The most popular approach to local alignment is the dynamic programming algorithm of Smith and Waterman, but the alignments reported by the Smith-Waterman algorithm have some undesirable properties. A recent approach to fixing these problems is the notion of normalized scores for local alignments, introduced by Arslan, Eğecioğlu and Pevzner. In this paper we consider the problem of finding all local alignments whose normalized scores are above a given threshold, and present a fast heuristic algorithm. Our algorithm is 180-330 times faster than Arslan et al.'s for sequences of length about 120 kbp, and about 40-50 times faster for sequences of length about 30 kbp.
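
To make the terms in the abstract concrete, the following is a minimal sketch and not the heuristic algorithm of this paper. It computes a Smith-Waterman local alignment score by dynamic programming and then a normalized score. The normalized form score / (|I| + |J| + L) for aligned substrings I and J with a constant L, the scoring parameters (match=1, mismatch=-1, linear gap=-1), and the function names are all illustrative assumptions, since the abstract does not state them.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best_score, len_in_a, len_in_b) for one best local alignment.

    Scoring values are assumptions chosen for illustration only.
    """
    n, m = len(a), len(b)
    # score[i][j]: best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # start[i][j]: prefix lengths (i0, j0) at which that alignment begins
    start = [[(i, j) for j in range(m + 1)] for i in range(n + 1)]
    best = (0, 0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + sub, start[i - 1][j - 1]),  # (mis)match
                (score[i - 1][j] + gap, start[i - 1][j]),          # gap in b
                (score[i][j - 1] + gap, start[i][j - 1]),          # gap in a
            ]
            s, origin = max(candidates, key=lambda c: c[0])
            if s <= 0:
                # Local alignment: restart from an empty alignment here
                s, origin = 0, (i, j)
            score[i][j], start[i][j] = s, origin
            if s > best[0]:
                best = (s, i - origin[0], j - origin[1])
    return best


def normalized_score(s, len_a, len_b, L):
    """Assumed normalized form: raw score divided by (|I| + |J| + L)."""
    return s / (len_a + len_b + L)


if __name__ == "__main__":
    s, la, lb = smith_waterman("ACGT", "ACGT")
    print(s, la, lb, normalized_score(s, la, lb, L=4))
```

Under this assumed definition, a long alignment with a high raw score can still have a low normalized score, which is why a threshold on the normalized score selects different alignments than the raw Smith-Waterman maximum.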

References

Alexandrov, N.N., and Solovyev, V.V. (1998). Statistical significance of ungapped alignments, Pacific Symposium on Biocomputing '98, 463-472
Arslan, A.N., and Eğecioğlu, Ö. (1999). An efficient uniform-cost normalized edit distance algorithm, Symposium on String Processing and Information Retrieval '99, IEEE Computer Society, 8-15
Arslan, A.N., and Eğecioğlu, Ö. (2003). Efficient algorithms for normalized edit distances, Journal of Discrete Algorithms, Hermes Science Publications, in press
Arslan, A.N., Eğecioğlu, Ö., and Pevzner, P. (2001). A new approach to sequence comparison: normalized sequence alignment, Bioinformatics 17, 327-337 https://doi.org/10.1093/bioinformatics/17.4.327
Chen, T., and Skiena, S.S. (1997). Trie-based data structures for sequence assembly, Combinatorial Pattern Matching '97, 206-223
Dinkelbach, W. (1967). On nonlinear fractional programming, Management Science 13, 492-498 https://doi.org/10.1287/mnsc.13.7.492
Eğecioğlu, Ö., and Ibel, M. (1996). Parallel algorithms for fast computation of normalized edit distances, IEEE Symposium on Parallel and Distributed Processing '96, 496-503
Gotoh, O. (1982). An improved algorithm for matching biological sequences, Journal of Molecular Biology 162, 705-708 https://doi.org/10.1016/0022-2836(82)90398-9
Goad, W.B., and Kanehisa, M.I. (1982). Pattern recognition in nucleic acid sequences. I. A general method for finding local homologies and symmetries, Nucleic Acids Research 10, 247-263 https://doi.org/10.1093/nar/10.1.247
Green, P. Documentation for phrap, Genome Center, University of Washington, http://www.phrap.org/phrap.docs/phrap.html
Gusfield, D. (1997). Algorithms on Strings, Trees, and Sequences, Cambridge University Press
Lipman, D., and Pearson, W. (1988). Improved tools for biological sequence comparison, Proceedings of the National Academy of Sciences 85, 2444-2448 https://doi.org/10.1073/pnas.85.8.2444
Marzal, A., and Vidal, E. (1993). Computation of normalized edit distances and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence 15, 926-932 https://doi.org/10.1109/34.232078
Sellers, P.H. (1984). Pattern recognition in genetic sequences by mismatch density, Bulletin of Mathematical Biology 46, 501-504 https://doi.org/10.1007/BF02459499
Setubal, J., and Meidanis, J. (1997). Introduction to Computational Molecular Biology, PWS Publishing Company
Smith, T.F., and Waterman, M.S. (1981). Identification of common molecular subsequences, Journal of Molecular Biology 147, 195-197 https://doi.org/10.1016/0022-2836(81)90087-5
Waterman, M.S. (1995). Introduction to Computational Biology, Chapman & Hall, London
Zhang, Z., Berman, P., and Miller, W. (1998). Alignments without low scoring regions, Journal of Computational Biology 5, 197-200 https://doi.org/10.1089/cmb.1998.5.197
Zhang, Z., Berman, P., Wiehe, T., and Miller, W. (1999). Post-processing long pairwise alignments, Bioinformatics 15, 1012-1019 https://doi.org/10.1093/bioinformatics/15.12.1012
