http://dx.doi.org/10.9723/jksiis.2012.17.6.041

Effective Biological Sequence Alignment Method using Divide Approach  

Choi, Hae-Won (Department of Computer Engineering, Kyungwoon University)
Kim, Sang-Jin (Department of Computer Engineering, Kyungwoon University)
Pi, Su-Young (Department of Computer Engineering, Catholic University of Daegu)
Publication Information
Journal of Korea Society of Industrial Information Systems / v.17, no.6, 2012, pp. 41-50
Abstract
This paper presents a new sequence alignment method based on a divide approach, which decomposes the alignment of two sequences into several sub-alignments around exact matching subsequences. In the proposed method, the exact matching subsequences are bounded, on the generalized suffix tree of the two sequences, by length thresholds such as a protein domain length of more than 7 and less than 7. Experimental results show that protein sequence pairs chosen from the PFAM database can be aligned using this method. In addition, the method reduces the running time by about 15%, and also reduces the space required, compared with the conventional dynamic programming approach. The sequences were classified with 94% accuracy.
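The divide idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the exact-match search uses Python's `difflib.SequenceMatcher` as a stand-in for the generalized suffix tree, the scoring is a plain match/mismatch/gap scheme rather than a substitution matrix, and the names `divide_align`, `needleman_wunsch`, and `MIN_ANCHOR` (set to 7 only because the abstract mentions that length) are hypothetical.

```python
from difflib import SequenceMatcher

MIN_ANCHOR = 7  # assumed minimum exact-match length that may serve as an anchor


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b by conventional dynamic programming."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def divide_align(a, b, min_anchor=MIN_ANCHOR):
    """Split the alignment at long exact matches, then align what remains."""
    # Stand-in for the generalized suffix tree: difflib finds the longest
    # exact common substring inside a given pair of index ranges.
    matcher = SequenceMatcher(None, a, b, autojunk=False)

    def solve(alo, ahi, blo, bhi):
        m = matcher.find_longest_match(alo, ahi, blo, bhi)
        if m.size < min_anchor:
            # No usable anchor: fall back to dynamic programming on this piece.
            return needleman_wunsch(a[alo:ahi], b[blo:bhi])
        left = solve(alo, m.a, blo, m.b)
        right = solve(m.a + m.size, ahi, m.b + m.size, bhi)
        anchor = a[m.a:m.a + m.size]
        return left[0] + anchor + right[0], left[1] + anchor + right[1]

    return solve(0, len(a), 0, len(b))


if __name__ == "__main__":
    top, bottom = divide_align("MKTAYIAKQRWWSFVKSHFSRQ", "MKTAYIAKQRSFVKSHFSRQ")
    print(top)
    print(bottom)
```

In this sketch, dynamic programming runs only on the regions between anchors, so its quadratic table covers those sub-regions rather than the full sequence pair. The anchors are chosen greedily, so the result is not guaranteed to match the optimal full dynamic programming alignment.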
Keywords
suffix tree; DNA sequence alignment; divide method; dynamic algorithm