• Title/Summary/Keyword: BLAST database

A Physical Data Design and Query Routing Technique of High Performance BLAST on E-Cluster (고성능 BLAST구현을 위한 E-Cluster 기반 데이터 분할 및 질의 라우팅 기법)

  • Kim, Tae-Kyung;Cho, Wan-Sup
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.2
    • /
    • pp.139-147
    • /
    • 2009
  • BLAST (Basic Local Alignment Search Tool) is one of the best-known tools in bioinformatics. BLAST quickly compares input sequences against very large annotated sequence databases and predicts their functions. It helps biologists annotate newly found sequences with less experimental time, scope, and cost. However, as the volume of sequence data grows rapidly with advances in sequencing machines, BLAST performance has become a critical issue, and several alternatives have been tried to address it. In this paper, we propose a new PC-based cluster system (E-Cluster), a new physical data design methodology (a logical partitioning technique), and a query routing technique (intra-query routing). To verify our system, we measure response time, speedup, and efficiency for various sizes of sequences in the NR (non-redundant) database. Experimental results show that the proposed system has better speedup and efficiency (maximum 600%) than conventional approaches such as SMP machines, clusters, and grids.
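
The abstract reports speedup and efficiency but does not define them. The sketch below uses the standard textbook definitions (speedup = serial time / parallel time, efficiency = speedup / number of nodes); these definitions, the function names, and the sample timings are assumptions for illustration, not values or code from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Return the ratio of single-node response time to cluster response time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("response times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, nodes: int) -> float:
    """Return speedup divided by the number of nodes (1.0 means 100%)."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return speedup(t_serial, t_parallel) / nodes


# Hypothetical timings: 120 s on one node, 20 s on a 4-node cluster.
s = speedup(120.0, 20.0)
e = efficiency(120.0, 20.0, nodes=4)
print(f"speedup = {s:.1f}x, efficiency = {e:.0%}")
```

Under these definitions an efficiency above 100% means superlinear speedup, which can happen when each database partition becomes small enough to fit in a node's memory.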

Construction of web-based Database for Haliotis SNP (웹기반 전복류 (Haliotis) SNP 데이터베이스 구축)

  • Jeong, Ji-Eun;Lee, Jae-Bong;Kang, Se-Won;Baek, Moon-Ki;Han, Yeon-Soo;Choi, Tae-Jin;Kang, Jung-Ha;Lee, Yong-Seok
    • The Korean Journal of Malacology
    • /
    • v.26 no.2
    • /
    • pp.185-188
    • /
    • 2010
  • A web-based SNP database for the genus Haliotis was constructed on an Intel Server Platform ZSS130 with dual Xeon 3.2 GHz CPUs and a Linux-based (CentOS) operating system. Haliotis-related sequences (2,830 nucleotide sequences and 9,102 EST sequences) were downloaded through the NCBI Taxonomy Browser. To eliminate vector sequences, we performed a vector masking step using the cross_match software with a vector sequence database. In addition, poly-A tails were removed using the trimest program from the EMBOSS package. The processed sequences were clustered and assembled with the TGICL package (TIGR tools), which incorporates the CAP3 software. A web-based interface (Haliotis SNP Database, http://www.haliotis.or.kr) was developed to enable optimal use of the clustered assemblies. The Clustering Res. menu shows the contig sequences from the clustering, the alignment results, and the sequences from each cluster. The BLAST menu allows any sequence to be compared with the Haliotis-related sequences. The search menu has its own search engine, so all information in the database can be searched by gene name, accession number, and/or species name. Taken together, the web-based SNP database for Haliotis will be valuable for developing Haliotis SNPs in the future.
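
The poly-A trimming step in the abstract above can be illustrated with a minimal sketch. This is a simplified stand-in, not the EMBOSS algorithm: it removes only an exact trailing run of A's, and the function name and the `min_tail` threshold are assumptions chosen for the example.

```python
def trim_poly_a(seq: str, min_tail: int = 4) -> str:
    """Remove a trailing run of A's if it is at least min_tail bases long.

    Shorter runs are kept, since a few terminal A's may be real sequence.
    """
    stripped = seq.rstrip("Aa")
    tail_len = len(seq) - len(stripped)
    return stripped if tail_len >= min_tail else seq


# Hypothetical EST reads: the first has a 6-base tail, the second only 2.
print(trim_poly_a("ACGTACGTAAAAAA"))
print(trim_poly_a("ACGTAA"))
```

A production trimmer would also tolerate mismatches inside the tail and handle poly-T heads on reverse-strand reads.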

Construction of PANM Database (Protostome DB) for rapid annotation of NGS data in Mollusks

  • Kang, Se Won;Park, So Young;Patnaik, Bharat Bhusan;Hwang, Hee Ju;Kim, Changmu;Kim, Soonok;Lee, Jun Sang;Han, Yeon Soo;Lee, Yong Seok
    • The Korean Journal of Malacology
    • /
    • v.31 no.3
    • /
    • pp.243-247
    • /
    • 2015
  • A stand-alone BLAST server is available that provides a convenient platform for analyzing molluscan sequence information, especially EST sequences generated by traditional sequencing methods. However, the server has limitations in annotating molluscan sequences generated on next-generation sequencing (NGS) platforms because of inconsistencies in the molluscan sequences available at NCBI. We constructed a web-based interface for a new stand-alone BLAST database, called PANM-DB (Protostome DB), for the analysis of molluscan NGS data. PANM-DB includes amino acid sequences from the protostome groups Arthropoda, Nematoda, and Mollusca, downloaded from GenBank with the NCBI Taxonomy Browser. The sequences were converted into multi-FASTA format and stored in the database using the NCBI formatdb program. PANM-DB contains 6% of the NCBI nr database sequences (as of 24-06-2015), and for an input of 10,000 RNA-seq sequences, processing was 15 times faster with PANM-DB than with the NCBI nr database. PANM-DB also showed twice as many significant hits, with diverse annotation profiles, as the Mollusks DB. Hence, the construction of PANM-DB is a significant step in the annotation of molluscan sequence information obtained from NGS platforms. PANM-DB is freely downloadable from the web-based interface (Malacological Society of Korea, http://malacol.or/kr/blast) as a compressed file and can run on any compatible operating system.

Construction of a Genetic Information Database for Analysis of Oncolytic Viruses

  • Cho, Myeongji;Son, Hyeon Seok;Kim, Hayeon
    • International journal of advanced smart convergence
    • /
    • v.9 no.1
    • /
    • pp.90-97
    • /
    • 2020
  • Oncolytic viruses are characterized by their ability to selectively kill cancer cells, and thus they have potential for application as novel anticancer agents. Despite an increase in the number of studies on methodologies involving oncolytic viruses, bioinformatic studies generating useful data are lacking. We constructed a database for oncolytic virus research (the oncolytic virus database, OVDB) by integrating scattered genetic information on oncolytic viruses and proposed a systematic means of using the biological data in the database. Our database provides data on 14 oncolytic viral strains and other types of viruses for comparative analysis. We constructed the OVDB using the Basic Local Alignment Search Tool (BLAST), so it can provide genetic information on highly homologous oncolytic viruses. This study helps facilitate systematic bioinformatics research by providing valuable data for the development of oncolytic virus-based anticancer therapies.

A Study on Clustering and Identifying Gene Sequences using Suffix Tree Clustering Method and BLAST (서픽스트리 클러스터링 방법과 블라스트를 통합한 유전자 서열의 클러스터링과 기능검색에 관한 연구)

  • Han, Sang-Il;Lee, Sung-Gun;Kim, Kyung-Hoon;Lee, Ju-Yeong;Kim, Young-Han;Hwang, Kyu-Suk
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.11 no.10
    • /
    • pp.851-856
    • /
    • 2005
  • DNA and protein data from diverse species are discovered daily and deposited in public archives in each archive's established format. The database systems of these archives provide not only an easy-to-use, flexible public interface but also in silico analysis tools for unidentified sequence data. Among these tools, multiple sequence alignment [1] methods that rely on pairwise alignment and the Smith-Waterman algorithm [2] make it possible to identify unknown DNA or protein sequences, or the phylogenetic relationships among several species. However, with existing multiple alignment methods, the runtime increases exponentially as the number of sequences increases. To remedy this problem, we adopted a parallel-processing suffix tree algorithm that can search for common subsequences in one pass without pairwise alignment. Some of the common subsequences found may be cross-matching subsequences that trigger inexact matching, so this paper also proposes a cross-matching masking process. To identify the function of the clusters generated by suffix tree clustering, BLAST was combined with the clustering tool. Our clustering and annotation tool consists of the following steps: (1) construction of the suffix tree; (2) masking of cross-matching pairs; (3) clustering of gene sequences; and (4) annotation of gene clusters by BLAST search. The system was successfully evaluated with 22 gene sequences in the pyruvate pathway of bacteria, producing 7 clusters and finding representative common subsequences for each cluster.

KUGI: A Database and Search System for Korean Unigene and Pathway Information

  • Yang, Jin-Ok;Hahn, Yoon-Soo;Kim, Nam-Soon;Yu, Ung-Sik;Woo, Hyun-Goo;Chu, In-Sun;Kim, Yong-Sung;Yoo, Hyang-Sook;Kim, Sang-Soo
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2005.09a
    • /
    • pp.407-411
    • /
    • 2005
  • The KUGI (Korean UniGene Information) database contains annotation information for cDNA sequences obtained from samples of diseases prevalent in Koreans. A total of about 157,000 high-throughput 5'-EST sequences collected from cDNA libraries of stomach, liver, and some cancer tissues or established cell lines from Korean patients were clustered into about 35,000 contigs. From each cluster, a representative clone having the longest high-quality sequence or the start codon was selected. We stored the sequences of the representative clones and the clustered contigs in the KUGI database together with information obtained by running BLAST against the RefSeq, human mRNA, and UniGene databases from NCBI. We provide a web-based search engine for the KUGI database with two types of user interface: attribute-based search and sequence similarity search. Attribute-based search uses DBMS technology, while similarity search uses BLAST, which supports various similarity search options. The search system allows not only multiple queries but also various query types. The results returned are as follows: 1) information on clones and libraries; 2) accession keys, genome location, gene ontology, and pathway links to public databases; 3) links to external programs; and 4) sequence information for contigs and the 5'-ends of clones. We believe that the KUGI database and search system may provide very useful information for studies elucidating the causes of diseases that are prevalent in Koreans.

New Approach to the Analysis of Palindromic Structure in Genome Sequences

  • Kim, Seok-Won;Lee, Yong-Seok;Choi, Sang-Haeng;Chae, Sung-Hwa;Kim, Dae-Won;Park, Hong-Seog
    • Genomics & Informatics
    • /
    • v.4 no.4
    • /
    • pp.167-169
    • /
    • 2006
  • PABAP (Palindrome Analysis by BLAST Program) is an analysis system that identifies palindromic sequences in large genome sequences up to several megabases long. It uses NCBI BLAST as its search engine. Data processing, such as alignment filtering and detection of inverted repeats that satisfy user-defined parameters, is performed by manipulating the data after it is loaded into a MySQL database. PABAP outperforms publicly available palindrome search programs in that it detects large palindromes with internal spacers in bacterial genomes at a faster speed. It is a standalone application and is freely available to noncommercial users.
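
To make the notion of a palindrome with an internal spacer concrete, here is a minimal brute-force sketch. It is not PABAP's method (PABAP uses BLAST alignments and a MySQL database); it only checks exact matches, and the function names and the `arm_len` and `max_spacer` parameters are assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, arm_len: int, max_spacer: int):
    """Find exact inverted repeats: a left arm followed, after a spacer of
    0..max_spacer bases, by the reverse complement of that arm.

    Returns a list of (left_start, spacer_length, right_start) tuples.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for start in range(n - 2 * arm_len + 1):
        target = reverse_complement(seq[start:start + arm_len])
        for spacer in range(max_spacer + 1):
            right_start = start + arm_len + spacer
            if right_start + arm_len > n:
                break  # the right arm would run past the end of the sequence
            if seq[right_start:right_start + arm_len] == target:
                hits.append((start, spacer, right_start))
    return hits


# GAATTC is a perfect palindrome: GAA pairs with TTC and has no spacer.
print(find_inverted_repeats("GAATTC", arm_len=3, max_spacer=0))
# Two ACGT arms separated by a 3-base spacer.
print(find_inverted_repeats("ACGTTTTACGT", arm_len=4, max_spacer=3))
```

This scan costs time proportional to sequence length times `max_spacer` times `arm_len`, so it suits short test sequences only; an alignment-based search is a better fit for megabase genomes and inexact repeats.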

Characterization of Pseudomonas syringae pv. syringae, Causal Agent of Citrus Blast of Mandarin in Montenegro

  • Ivanovic, Zarko;Perovic, Tatjana;Popovic, Tatjana;Blagojevic, Jovana;Trkulja, Nenad;Hrncic, Snjezana
    • The Plant Pathology Journal
    • /
    • v.33 no.1
    • /
    • pp.21-33
    • /
    • 2017
  • Citrus blast, caused by the bacterium Pseudomonas syringae, is a very important disease of citrus occurring in many areas of the world, but few data are available on the genetic structure of the pathogen involved. This study therefore reports the genetic characterization of 43 P. syringae isolates obtained from plant tissue displaying citrus blast symptoms on mandarin (Citrus reticulata) in Montenegro, using multilocus sequence analysis of gyrB, rpoD, and gap1 gene sequences. Gene sequences from a collection of 54 reference pathotype strains of P. syringae from the Plant Associated and Environmental Microbes Database (PAMDB) were used to establish the genetic relationship with our isolates obtained from mandarin. Phylogenetic analyses of gyrB, rpoD, and gap1 gene sequences showed that P. syringae pv. syringae causes citrus blast in mandarin in Montenegro and belongs to genomospecies 1. The genetic homogeneity of the isolates suggested that the Montenegrin population might be clonal, which indicates a possible common source of infection. These findings may assist in further epidemiological studies of this pathogen and in determining mandarin breeding strategies for P. syringae control.

Global Approaches to Identify Genes Involved during Infection Structure Formation in Rice Blast Fungus, Magnaporthe grisea

  • Park, Woo-Bong
    • The Plant Pathology Journal
    • /
    • v.19 no.1
    • /
    • pp.34-42
    • /
    • 2003
  • The ascomycete Magnaporthe grisea is the pathogen of rice blast and is known to form specialized infection structures called appressoria for successful infection of host cells. To understand the molecular mechanism underlying the infection process, appressorium-related genes were identified through global approaches including EST sequencing, differential hybridization, and suppression subtractive hybridization. An EST database was generated from >2,000 cDNA clones randomly selected from an appressorium-stage cDNA library. A large number of ESTs showed homology to known proteins possibly involved in infection-related cellular development (attachment, germination, appressorium formation, and colonization) of the rice blast fungus. The 1,051 ESTs showing significant homology to known genes were assigned to 11 functional categories. Differential hybridization and suppression subtractive hybridization were applied to identify genes showing an appressorium-stage-specific expression pattern. A number of genes were selected as up-regulated during appressorium formation compared with the vegetative growth stage. Clones from various cDNA libraries constructed at different developmental stages were arrayed on glass slides for further expression profiling. Functional characterization of the genes identified by these global approaches may lead to a better understanding of the infection process of this devastating plant disease and to the development of novel ways to protect host plants.