Effective Biological Sequence Alignment Method using Divide Approach

Choi, Hae-Won; Kim, Sang-Jin; Pi, Su-Young

doi:10.9723/jksiis.2012.17.6.041

Journal of Korea Society of Industrial Information Systems (한국산업정보학회논문지)

Volume 17, Issue 6, Pages 41-50, 2012
pISSN 1229-3741

Korea Society of Industrial Information Systems (한국산업정보학회)

Choi, Hae-Won (Department of Computer Engineering, Kyungwoon University) ;
Kim, Sang-Jin (Department of Computer Engineering, Kyungwoon University) ;
Pi, Su-Young (Department of Computer Engineering, Catholic University of Daegu)

Received : 2012.10.05
Accepted : 2012.11.14
Published : 2012.12.31


Abstract

This paper presents a new sequence alignment method based on a divide approach, which decomposes a sequence alignment into several sub-alignments separated by exact-matching subsequences. In the proposed method, the exact-matching subsequences are located on the generalized suffix tree of the two sequences and are bounded by length, for example protein domain lengths of more than 7 and less than 7. Experimental results show that protein sequence pairs selected from the PFAM database can be aligned using this method. In addition, the method reduces the time (by about 15%) and the space required by the conventional dynamic programming approach. The sequences were also classified with 94% accuracy.
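To illustrate the divide idea, the following is a minimal Python sketch, not the authors' implementation. It rests on several assumptions. Exact matches are found with a k-mer hash index, a simple stand-in swapped in for the generalized suffix tree the paper uses. Anchors are chained greedily, longest first. Each gap between anchors is aligned with Needleman-Wunsch global dynamic programming using hypothetical unit scores rather than a substitution matrix such as BLOSUM62. The threshold `k=7` only mirrors the length 7 mentioned above. The function names `find_anchors`, `nw_align` and `divide_align` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical linear scoring; a protein aligner would normally use BLOSUM62 and affine gaps.
MATCH, MISMATCH, GAP = 1, -1, -2


def nw_align(a: str, b: str) -> Tuple[int, str, str]:
    """Needleman-Wunsch global alignment of a and b with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            score[i][j] = max(diag, score[i - 1][j] + GAP, score[i][j - 1] + GAP)
    # Traceback from the bottom-right cell to recover one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        sub = MATCH if i > 0 and j > 0 and a[i - 1] == b[j - 1] else MISMATCH
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def find_anchors(a: str, b: str, k: int) -> List[Tuple[int, int, int]]:
    """Colinear, non-overlapping exact matches (i, j, length) with length >= k.

    A k-mer hash index is used here as a simple substitute for a generalized suffix tree.
    """
    index = {}
    for j in range(len(b) - k + 1):
        index.setdefault(b[j:j + k], []).append(j)
    candidates = []
    for i in range(len(a) - k + 1):
        for j in index.get(a[i:i + k], []):
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue  # not left-maximal: already covered by a match starting earlier
            length = k
            while i + length < len(a) and j + length < len(b) and a[i + length] == b[j + length]:
                length += 1
            candidates.append((i, j, length))
    # Greedy chaining: keep the longest matches that lie entirely before or after every chosen one.
    chosen: List[Tuple[int, int, int]] = []
    for i, j, L in sorted(candidates, key=lambda c: -c[2]):
        if all((i + L <= ci and j + L <= cj) or (ci + cL <= i and cj + cL <= j)
               for ci, cj, cL in chosen):
            chosen.append((i, j, L))
    return sorted(chosen)


def divide_align(a: str, b: str, k: int = 7) -> Tuple[int, str, str]:
    """Fix exact-match anchors, then run dynamic programming only on the segments between them."""
    total, out_a, out_b, pi, pj = 0, [], [], 0, 0
    for i, j, L in find_anchors(a, b, k):
        s, ga, gb = nw_align(a[pi:i], b[pj:j])  # sub-alignment before this anchor
        total += s + L * MATCH                  # the anchor itself is an exact match
        out_a += [ga, a[i:i + L]]
        out_b += [gb, b[j:j + L]]
        pi, pj = i + L, j + L
    s, ga, gb = nw_align(a[pi:], b[pj:])        # trailing sub-alignment
    total += s
    out_a.append(ga)
    out_b.append(gb)
    return total, "".join(out_a), "".join(out_b)
```

For example, `divide_align("XXACDEFGHIKYY", "ZACDEFGHIKWW")` fixes the shared 9-residue run `ACDEFGHIK` as an anchor and fills only the two short flanks with dynamic programming, returning `(4, "XXACDEFGHIKYY", "-ZACDEFGHIKWW")`. Because each DP matrix covers only one segment, time and memory grow with the unanchored segment sizes rather than with the full product of the sequence lengths. The trade-off is that a greedily fixed anchor can make the result score below the full Needleman-Wunsch optimum.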
