A Pattern Summary System Using BLAST for Sequence Analysis

  • Choi, Han-Suk (Department of Multimedia Engineering, Mokpo National University)
  • Kim, Dong-Wook (Department of Multimedia Engineering, Mokpo National University)
  • Ryu, Tae-W. (Department of Computer Science, California State University-Fullerton)
  • Published: 2006.12.31

Abstract

Pattern finding is an important task in protein and DNA sequence analysis, and alignment is a widely used technique for it. BLAST (Basic Local Alignment Search Tool) is one of the most popular bioinformatics tools for searching available DNA and protein sequence databases. For large sequence data containing many sequence patterns, BLAST can generate a very large output file. However, BLAST provides no tool to summarize or analyze the patterns and matched alignments in that output, and it lacks general, robust parsing tools for extracting the essential information. This paper presents a pattern summary system, a powerful and comprehensive tool for discovering pattern structures in large volumes of BLAST output. The system identifies clusters of patterns, extracts the cluster pattern sequences from the BLAST subject database, and displays the clusters graphically to show their distribution across the subject database.
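The identify-and-extract steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the output format or the clustering method, so the sketch assumes the 12-column tab-separated BLAST tabular output (subject ID in column 2, subject start and end in columns 9 and 10) and substitutes simple interval merging for the clustering step. The function names and the `max_gap` parameter are hypothetical.

```python
from collections import defaultdict


def parse_tabular_hits(text):
    """Yield (subject_id, start, end) from 12-column tabular BLAST output.

    Assumes the standard tab-separated column order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        sstart, send = int(fields[8]), int(fields[9])
        # Minus-strand hits report sstart > send, so normalize the interval.
        yield fields[1], min(sstart, send), max(sstart, send)


def cluster_hits(hits, max_gap=0):
    """Merge hits on the same subject that overlap or lie within max_gap.

    Returns {subject_id: [(start, end, hit_count), ...]} sorted by start.
    Interval merging is an assumed stand-in for the paper's clustering.
    """
    by_subject = defaultdict(list)
    for subject, start, end in hits:
        by_subject[subject].append((start, end))

    clusters = {}
    for subject, intervals in by_subject.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1] + max_gap:
                prev_start, prev_end, count = merged[-1]
                merged[-1] = (prev_start, max(prev_end, end), count + 1)
            else:
                merged.append((start, end, 1))
        clusters[subject] = merged
    return clusters


def extract_cluster_sequence(subject_seq, cluster):
    """Slice a cluster's region out of a subject sequence string.

    BLAST coordinates are 1-based and inclusive.
    """
    start, end, _count = cluster
    return subject_seq[start - 1:end]
```

Calling `cluster_hits(parse_tabular_hits(text), max_gap=5)` on the contents of a tabular output file returns, for each subject sequence, the merged regions together with the number of hits in each. Those start and end positions are the kind of data a graphical view of cluster distribution could be drawn from.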
