Effective Biological Sequence Alignment Method using Divide Approach

  • Choi, Hae-Won (Department of Computer Engineering, Kyungwoon University)
  • Kim, Sang-Jin (Department of Computer Engineering, Kyungwoon University)
  • Pi, Su-Young (Department of Computer Engineering, Catholic University of Daegu)
  • Received : 2012.10.05
  • Accepted : 2012.11.14
  • Published : 2012.12.31

Abstract

This paper presents a new sequence alignment method based on a divide approach: the alignment of two sequences is decomposed into several sub-alignments around their exact matching subsequences. In the proposed method, the exact matching subsequences are bounded on the generalized suffix tree of the two sequences by length, for example protein domain lengths of more than 7 and of less than 7. Experimental results show that protein sequence pairs chosen from the PFAM database can be aligned using this method. In addition, the method reduces running time by about 15% and also reduces the space required, compared with the conventional dynamic programming approach. The sequences were also classified with 94% accuracy.
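The divide idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions. The anchor search uses a quadratic longest-common-substring scan as a stand-in for the generalized suffix tree. The scores `MATCH`, `MISMATCH`, and `GAP` are illustrative constants rather than a substitution matrix. The `min_anchor` threshold of 7 is borrowed from the length mentioned in the abstract and may not be how the paper applies it. All function names are hypothetical.

```python
from typing import Tuple

MATCH, MISMATCH, GAP = 1, -1, -2  # illustrative scores, not taken from the paper


def longest_common_substring(a: str, b: str) -> Tuple[int, int, int]:
    """Return (start_in_a, start_in_b, length) of the longest exact match.

    Quadratic stand-in for a generalized suffix tree lookup (assumption).
    """
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def needleman_wunsch(a: str, b: str) -> Tuple[str, str]:
    """Global alignment of a and b by conventional dynamic programming."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + GAP,
                              score[i][j - 1] + GAP)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = MATCH if i > 0 and j > 0 and a[i - 1] == b[j - 1] else MISMATCH
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def divide_align(a: str, b: str, min_anchor: int = 7) -> Tuple[str, str]:
    """Split at the longest exact match when it is long enough, else use DP."""
    ia, ib, length = longest_common_substring(a, b)
    if length < min_anchor:
        return needleman_wunsch(a, b)
    # The anchor is aligned to itself; only the flanking regions need DP.
    left_a, left_b = divide_align(a[:ia], b[:ib], min_anchor)
    right_a, right_b = divide_align(a[ia + length:], b[ib + length:], min_anchor)
    anchor = a[ia:ia + length]
    return left_a + anchor + right_a, left_b + anchor + right_b


if __name__ == "__main__":
    top, bottom = divide_align("AAHEAGAWGHEEPPP", "CCHEAGAWGHEETT")
    print(top)
    print(bottom)
```

Because each anchor is copied into the result without filling a score matrix, the dynamic programming tables are only built for the regions between anchors, which is the intuition behind the time and space savings the abstract reports. Fixing anchors greedily can, in general, yield an alignment that differs from the global optimum of plain dynamic programming.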
