Comparison Architecture for Large Number of Genomic Sequences  

Choi, Hae-won (Department of Computer Engineering, Kyungwoon University)
Ryoo, Myung-Chun (Department of Computer Engineering, Kyungwoon University)
Park, Joon-Ho (Department of Computer Engineering, Kyungwoon University)
Abstract
A suffix tree is an efficient data structure because it exposes the detailed internal structure of the given sequences in linear time. However, building a suffix tree for a large number of sequences is difficult because of memory constraints. Comparing multi-megabase genomic sequence sets with suffix trees therefore requires redesigning the suffix tree construction algorithms. We introduce a new method for constructing a suffix tree for a large number of sequences on secondary storage. Our algorithm divides three files, in a designated sequence, into parts and stores references to the locations of edges in hash tables. In our experiments, we used 1,300,000 EST sequences (around 300 MB) to generate a suffix tree on disk.
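
The idea of keeping suffix tree edges in a hash table can be illustrated with a minimal sketch. It is not the authors' implementation: it handles a single short sequence, uses a naive quadratic construction rather than a linear-time one, and keeps the table in an in-memory Python dict that merely stands in for the on-disk hash tables described above. The names `Edge`, `build_suffix_tree`, and `contains`, and the `$` terminator, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    start: int  # index in the text where the edge label begins (inclusive)
    end: int    # index in the text where the edge label ends (exclusive)
    child: int  # id of the node this edge leads to


def build_suffix_tree(sequence, terminator="$"):
    """Naive O(n^2) suffix tree; edges live in a hash table keyed by
    (parent node id, first character of the edge label).
    Assumes `terminator` does not occur in `sequence`."""
    text = sequence + terminator
    n = len(text)
    edges = {}
    next_node = 1  # node 0 is the root
    for i in range(n):  # insert suffix text[i:]
        node, pos = 0, i
        while pos < n:
            key = (node, text[pos])
            edge = edges.get(key)
            if edge is None:
                # No edge starts with this character: add a new leaf edge.
                edges[key] = Edge(pos, n, next_node)
                next_node += 1
                break
            # Count how many characters of the edge label match the suffix.
            length = edge.end - edge.start
            k = 0
            while k < length and text[edge.start + k] == text[pos + k]:
                k += 1
            if k == length:
                # Whole label matched: continue from the child node.
                node, pos = edge.child, pos + k
                continue
            # Mismatch inside the label: split the edge at offset k.
            mid = next_node
            next_node += 1
            edges[key] = Edge(edge.start, edge.start + k, mid)
            edges[(mid, text[edge.start + k])] = Edge(edge.start + k, edge.end, edge.child)
            edges[(mid, text[pos + k])] = Edge(pos + k, n, next_node)
            next_node += 1
            break
    return edges, text


def contains(edges, text, pattern):
    """Return True if `pattern` occurs in the indexed sequence."""
    node, pos = 0, 0
    while pos < len(pattern):
        edge = edges.get((node, pattern[pos]))
        if edge is None:
            return False
        for j in range(edge.start, edge.end):
            if pos == len(pattern):
                return True
            if text[j] != pattern[pos]:
                return False
            pos += 1
        node = edge.child
    return True
```

Because each lookup touches only the entries `(node, character)` along one path, such a table could in principle be split across files and read from disk on demand, which is the property a secondary-storage design would rely on.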
Keywords
suffix tree; large data sets; sequence analysis; genomic sequences;