Gene Sequences Clustering for the Prediction of Functional Domain

Han Sang-Il; Lee Sung-Gun; Hou Bo-Kyeng; Byun Yoon-Sup; Hwang Kyu-Suk

doi:10.5302/J.ICROS.2006.12.10.1044

Journal of Institute of Control, Robotics and Systems (제어로봇시스템학회논문지)

Volume 12 Issue 10 / Pages 1044-1049 / 2006 / 1976-5622 (pISSN) / 2233-4335 (eISSN)

Institute of Control, Robotics and Systems (제어로봇시스템학회)

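Han Sang-Il (Department of Chemical Engineering, Pusan National University);
Lee Sung-Gun (Department of Chemical Engineering, Pusan National University);
Hou Bo-Kyeng (Korea Research Institute of Bioscience and Biotechnology);
Byun Yoon-Sup (Department of Chemical Engineering, Pusan National University);
Hwang Kyu-Suk (Department of Chemical Engineering, Pusan National University)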

Published : 2006.10.01

https://doi.org/10.5302/J.ICROS.2006.12.10.1044

Abstract

Multiple sequence alignment is a method for comparing two or more DNA or protein sequences. Most multiple sequence alignment tools rely on pairwise alignment and the Smith-Waterman algorithm to generate an alignment hierarchy. Consequently, with existing multiple alignment methods, the runtime increases exponentially as the number of sequences grows. To remedy this problem, we adopted a parallel-processing suffix tree algorithm that can search for common subsequences all at once, without pairwise alignment. Among the common subsequences found, however, cross-matching subsequences that trigger inexact matching may be produced, so this paper proposes a cross-matching masking process. To identify the function of the clusters generated by suffix tree clustering, BLAST and CDD (Conserved Domain Database) searches were combined with the clustering tool. The resulting clustering and annotation tool comprises four steps: constructing the suffix tree, overlapping common subsequences, clustering gene sequences, and annotating the gene clusters by BLAST and CDD search. The system was successfully evaluated on 36 gene sequences from the pentose phosphate pathway: it produced 10 clusters, found representative common subsequences, and finally identified functional domains by searching the CDD.
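
The idea of grouping sequences by shared subsequences, rather than by pairwise alignment, can be illustrated with a small Python sketch. It is not the authors' implementation: a plain k-mer index stands in for the suffix tree, the processing is serial rather than parallel, and the cross-matching masking and the BLAST/CDD annotation steps are not modelled. The function names, the substring length k, and the 0.5 overlap threshold are assumptions chosen for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def base_clusters(seqs, k):
    """Map each length-k substring to the ids of the sequences containing it.

    Assumption: a k-mer index replaces the suffix tree. It finds the same
    shared substrings of length k, only less efficiently.
    """
    index = defaultdict(set)
    for sid, seq in seqs.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(sid)
    # Keep only substrings shared by at least two sequences.
    return {sub: ids for sub, ids in index.items() if len(ids) >= 2}


def merge_clusters(bases, threshold=0.5):
    """Merge base clusters whose member sets overlap strongly.

    Two base clusters are joined when the shared members make up more than
    `threshold` of each of them; union-find collects the connected groups.
    """
    groups = list({frozenset(ids) for ids in bases.values()})
    parent = list(range(len(groups)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(range(len(groups)), 2):
        shared = len(groups[a] & groups[b])
        if (shared / len(groups[a]) > threshold
                and shared / len(groups[b]) > threshold):
            parent[find(a)] = find(b)

    merged = defaultdict(set)
    for i, group in enumerate(groups):
        merged[find(i)] |= group
    return sorted(sorted(m) for m in merged.values())


def cluster_sequences(seqs, k=6, threshold=0.5):
    """Cluster sequence ids by shared length-k substrings.

    Sequences that share no substring with any other sequence are left out.
    """
    return merge_clusters(base_clusters(seqs, k), threshold)


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    toy = {
        "a": "ATGGCCATTG",
        "b": "TTATGGCCAA",
        "c": "CCCGGGTTTAAA",
        "d": "GGCCCGGGTTTC",
    }
    print(cluster_sequences(toy, k=6))
```

On the toy input, sequences a and b share the substring ATGGCC and sequences c and d share CCCGGG, so the sketch groups them into two clusters. Each sequence is scanned once to build the index, which is the property that removes the need for all-against-all pairwise alignment.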
