• Title/Abstract/Keywords: BLAST database

129 search results; processing time 0.028 s

A Physical Data Design and Query Routing Technique of High Performance BLAST on E-Cluster

  • 김태경;조완섭
    • Journal of the Korea Society of Computer and Information / Vol. 14, No. 2 / pp.139-147 / 2009
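  • BLAST is the most widely used tool in bioinformatics. It rapidly compares an input sequence against existing sequence databases and predicts its function. Biologists can use BLAST to reduce the scope, time, and cost of experiments. However, as the volume of sequence data grows rapidly, processing time grows with it, so ways to improve performance are needed. This paper presents a PC-based cluster infrastructure (E-Cluster) for improving the performance of large-scale BLAST processing and, on this infrastructure, proposes a database partitioning technique (Logical Partitioning) and a query routing technique (Intra-Query). To evaluate the proposed system, sequences of various lengths are compared against the NR database, and the system is assessed in terms of response time, speedup, and efficiency. The experiments confirmed that the system outperforms existing SMP-, cluster-, and grid-based BLAST systems in performance and efficiency; in particular, the maximum efficiency of the proposed system was very high, at 600%.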
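
    The speedup and efficiency metrics named in the abstract above are conventionally defined as S = T_1 / T_p and E = S / p, where T_1 is the single-node response time, T_p the response time on p nodes. A minimal sketch under those standard definitions follows; the function names and the timings are hypothetical illustrations, not figures from the paper. An efficiency above 100% corresponds to superlinear speedup, which is often attributed to each database partition fitting in a node's memory, although the abstract does not state the cause.

    ```python
    def speedup(t_serial: float, t_parallel: float) -> float:
        """Speedup S = T_1 / T_p (standard definition, assumed here)."""
        if t_serial <= 0 or t_parallel <= 0:
            raise ValueError("response times must be positive")
        return t_serial / t_parallel


    def efficiency(t_serial: float, t_parallel: float, nodes: int) -> float:
        """Efficiency E = S / p, returned as a fraction (6.0 means 600%)."""
        if nodes < 1:
            raise ValueError("nodes must be at least 1")
        return speedup(t_serial, t_parallel) / nodes


    # Hypothetical timings: 1200 s on one node, 25 s on 8 nodes.
    s = speedup(1200.0, 25.0)        # 48.0
    e = efficiency(1200.0, 25.0, 8)  # 6.0, i.e. 600%
    print(f"speedup={s:.1f}, efficiency={e:.0%}")
    ```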

Construction of web-based Database for Haliotis SNP

  • 정지은;이재봉;강세원;백문기;한연수;최태진;강정하;이용석
    • The Korean Journal of Malacology / Vol. 26, No. 2 / pp.185-188 / 2010
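  • - By building this web database server, sequences matching those among Haliotis species could be extracted very quickly through a local BLAST. - Because BLAST can be run simultaneously against sequences such as repeat elements, E. coli, and vectors, the state of a cDNA or genomic DNA library, such as contamination and insert length, could be checked easily during library construction. - The Clustering Res. interface made SNP discovery easier, and experimental primers could be designed with a locally installed primer3 (Evans et al. 2001). - Building this SNP database can maximize SNP discovery and is expected to be of great help to future molecular breeding research on Haliotis.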

Construction of PANM Database (Protostome DB) for rapid annotation of NGS data in Mollusks

  • Kang, Se Won;Park, So Young;Patnaik, Bharat Bhusan;Hwang, Hee Ju;Kim, Changmu;Kim, Soonok;Lee, Jun Sang;Han, Yeon Soo;Lee, Yong Seok
    • The Korean Journal of Malacology / Vol. 31, No. 3 / pp.243-247 / 2015
  • A stand-alone BLAST server is available that provides a convenient platform for the analysis of molluscan sequence information, especially EST sequences generated by traditional sequencing methods. However, the server has limitations in annotating molluscan sequences generated on next-generation sequencing (NGS) platforms because of inconsistencies in the molluscan sequences available at NCBI. We constructed a web-based interface for a new stand-alone BLAST database, called PANM-DB (Protostome DB), for the analysis of molluscan NGS data. PANM-DB includes the amino acid sequences from the protostome groups Arthropoda, Nematoda, and Mollusca, downloaded from GenBank with the NCBI Taxonomy Browser. The sequences were converted into multi-FASTA format and stored in the database using the NCBI formatdb program. PANM-DB contains 6% of the NCBInr database sequences (as of 24-06-2015), and for an input of 10,000 RNA-seq sequences, processing was 15 times faster with PANM-DB than with the NCBInr DB. PANM-DB also showed twice as many significant hits, with diverse annotation profiles, as the Mollusks DB. Hence, the construction of PANM-DB is a significant step in the annotation of molluscan sequence information obtained from NGS platforms. PANM-DB is freely downloadable from the web-based interface (Malacological Society of Korea, http://malacol.or/kr/blast) as a compressed file system and can run on any compatible operating system.
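
    As an illustration of the database-building step described above, the sketch below assembles the command line for formatting a protein multi-FASTA file as a BLAST database. It is a sketch under assumptions, not the authors' pipeline: the helper name `build_db_command` and the file and database names are hypothetical. The flags are those of the legacy `formatdb` tool (`-i`, `-p T`, `-n`) and of its BLAST+ successor `makeblastdb` (`-in`, `-dbtype prot`, `-out`).

    ```python
    from typing import List


    def build_db_command(fasta_path: str, db_name: str, legacy: bool = True) -> List[str]:
        """Return the argument list for building a protein BLAST database.

        legacy=True  -> legacy NCBI `formatdb` (the tool named in the abstract)
        legacy=False -> BLAST+ `makeblastdb`
        The command is only constructed here, not executed.
        """
        if legacy:
            # -p T marks the input as protein; -n sets the database base name.
            return ["formatdb", "-i", fasta_path, "-p", "T", "-n", db_name]
        return ["makeblastdb", "-in", fasta_path, "-dbtype", "prot", "-out", db_name]


    # Hypothetical file and database names.
    print(" ".join(build_db_command("panm.fasta", "panm_db")))
    print(" ".join(build_db_command("panm.fasta", "panm_db", legacy=False)))
    ```

    On a machine where the corresponding tool is installed, the returned list can be passed to `subprocess.run(cmd, check=True)`.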

Construction of a Genetic Information Database for Analysis of Oncolytic Viruses

  • Cho, Myeongji;Son, Hyeon Seok;Kim, Hayeon
    • International journal of advanced smart convergence / Vol. 9, No. 1 / pp.90-97 / 2020
  • Oncolytic viruses are characterized by their ability to selectively kill cancer cells, and thus they have potential for application as novel anticancer agents. Despite an increase in the number of studies on methodologies involving oncolytic viruses, bioinformatic studies generating useful data are lacking. We constructed a database for oncolytic virus research (the oncolytic virus database, OVDB) by integrating scattered genetic information on oncolytic viruses and proposed a systematic means of using the biological data in the database. Our database provides data on 14 oncolytic viral strains and other types of viruses for comparative analysis. We constructed the OVDB using the basic local alignment search tool (BLAST), and it can therefore provide genetic information on highly homologous oncolytic viruses. This study helps facilitate systematic bioinformatics research by providing valuable data for the development of oncolytic virus-based anticancer therapies.

A Study on Clustering and Identifying Gene Sequences using Suffix Tree Clustering Method and BLAST

  • 한상일;이성근;김경훈;이주영;김영한;황규석
    • Journal of Institute of Control, Robotics and Systems / Vol. 11, No. 10 / pp.851-856 / 2005
  • DNA and protein data from diverse species are discovered daily and deposited in public archives according to each archive's established format. Database systems in the public archives provide not only an easy-to-use, flexible interface to the public, but also in silico analysis tools for unidentified sequence data. Among such tools, multiple sequence alignment [1] methods relying on pairwise alignment and the Smith-Waterman algorithm [2] make it possible to identify unknown DNA or protein sequences, or the phylogenetic relationships among several species. However, in the existing multiple alignment methods, the runtime increases exponentially as the number of sequences increases. To remedy this problem, we adopted a parallel-processing suffix tree algorithm that can search for common subsequences at one time without pairwise alignment. Cross-matching subsequences, which trigger inexact matching, may also appear among the common subsequences found, so this paper proposes a cross-matching masking process. To identify the function of the clusters generated by suffix tree clustering, BLAST was combined with the clustering tool. Our clustering and annotating tool can be summarized in the following steps: (1) construction of the suffix tree; (2) masking of cross-matching pairs; (3) clustering of gene sequences; and (4) annotation of gene clusters by BLAST search. The system was successfully evaluated with 22 gene sequences in the pyruvate pathway of bacteria, producing 7 clusters and finding representative common subsequences for each cluster.
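
    The clustering step described above (grouping sequences by common subsequences without pairwise alignment) can be illustrated with a deliberately simplified stand-in. The sketch below does not build a suffix tree and omits the masking and BLAST annotation steps; it groups sequences that share at least one exact substring of length k, using a k-mer index and union-find. The function name, the value of k, and the example sequences are all hypothetical.

    ```python
    from collections import defaultdict
    from typing import Dict, List, Set


    def cluster_by_shared_kmers(seqs: Dict[str, str], k: int = 8) -> List[Set[str]]:
        """Group sequences sharing at least one exact length-k substring.

        Simplified stand-in for suffix tree clustering: a k-mer index replaces
        the suffix tree, and union-find merges sequences into clusters.
        """
        parent = {name: name for name in seqs}

        def find(x: str) -> str:
            # Follow parent links to the root, compressing the path on the way.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        def union(a: str, b: str) -> None:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

        first_owner: Dict[str, str] = {}  # k-mer -> first sequence containing it
        for name, seq in seqs.items():
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                if kmer in first_owner:
                    union(first_owner[kmer], name)
                else:
                    first_owner[kmer] = name

        clusters: Dict[str, Set[str]] = defaultdict(set)
        for name in seqs:
            clusters[find(name)].add(name)
        return list(clusters.values())


    # Hypothetical toy input: "a" and "b" share the substring ATGCGTAC.
    example = {"a": "ATGCGTACGTTAGC", "b": "CCCATGCGTACGGG", "c": "TTTTTTTTTTTTTT"}
    print(sorted(sorted(c) for c in cluster_by_shared_kmers(example)))
    ```

    A real suffix tree would find common substrings of every length in one pass; the fixed k here trades that generality for brevity.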

KUGI: A Database and Search System for Korean Unigene and Pathway Information

  • Yang, Jin-Ok;Hahn, Yoon-Soo;Kim, Nam-Soon;Yu, Ung-Sik;Woo, Hyun-Goo;Chu, In-Sun;Kim, Yong-Sung;Yoo, Hyang-Sook;Kim, Sang-Soo
    • Proceedings of the Korean Society for Bioinformatics Conference / Korean Society for Bioinformatics and Systems Biology, BIOINFO 2005 / pp.407-411 / 2005
  • The KUGI (Korean UniGene Information) database contains annotation information for cDNA sequences obtained from samples of diseases prevalent in Koreans. A total of about 157,000 5'-EST high-throughput sequences collected from cDNA libraries of stomach, liver, and some cancer tissues or established cell lines from Korean patients were clustered into about 35,000 contigs. From each cluster, a representative clone having the longest high-quality sequence or the start codon was selected. We stored the sequences of the representative clones and the clustered contigs in the KUGI database, together with information obtained by running BLAST against the RefSeq, human mRNA, and UniGene databases from NCBI. We provide a web-based search engine for the KUGI database with two types of user interface: attribute-based search and sequence similarity search. For attribute-based search we use DBMS technology, while for similarity search we use BLAST, which supports various similarity search options. The search system allows not only multiple queries, but also various query types. The results are as follows: 1) information on clones and libraries, 2) accession keys, location on the genome, gene ontology, and pathways linked to public databases, 3) links to external programs, and 4) sequence information for contigs and the 5'-ends of clones. We believe that the KUGI database and search system may provide very useful information for studies elucidating the causes of diseases that are prevalent in Koreans.

New Approach to the Analysis of Palindromic Structure in Genome Sequences

  • Kim, Seok-Won;Lee, Yong-Seok;Choi, Sang-Haeng;Chae, Sung-Hwa;Kim, Dae-Won;Park, Hong-Seog
    • Genomics & Informatics / Vol. 4, No. 4 / pp.167-169 / 2006
  • PABAP (Palindrome Analysis by BLAST Program) is an analysis system that identifies palindromic sequences in large genome sequences up to several megabases long. It uses NCBI BLAST as its search engine; data processing, such as alignment filtration and detection of inverted repeats that satisfy user-defined parameters, is performed on the data after they are loaded into a MySQL database. PABAP outperforms publicly available palindrome search programs in that it can detect large palindromes with internal spacers in bacterial genomes at a faster speed. It is a standalone application and is freely available to noncommercial users.
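
    To make the notion of a palindrome with an internal spacer concrete, the sketch below scans a DNA string for inverted repeats: a left arm followed, after a spacer of bounded length, by the reverse complement of that arm. This is a naive direct scan for illustration only. It is not PABAP's BLAST-and-MySQL pipeline, and the function names and parameters are hypothetical.

    ```python
    from typing import List, Tuple

    _COMPLEMENT = str.maketrans("ACGT", "TGCA")


    def reverse_complement(seq: str) -> str:
        """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
        return seq.upper().translate(_COMPLEMENT)[::-1]


    def find_inverted_repeats(seq: str, arm_len: int, max_spacer: int) -> List[Tuple[int, int, int]]:
        """Return (start, arm_len, spacer_len) for every inverted repeat found.

        An inverted repeat is a left arm at `start` whose reverse complement
        occurs exactly `spacer_len` bases after the arm, with spacer_len
        between 0 and max_spacer inclusive.
        """
        seq = seq.upper()
        hits = []
        for start in range(len(seq) - 2 * arm_len + 1):
            target = reverse_complement(seq[start:start + arm_len])
            for spacer in range(max_spacer + 1):
                right = start + arm_len + spacer
                if right + arm_len > len(seq):
                    break  # right arm would run past the end of the sequence
                if seq[right:right + arm_len] == target:
                    hits.append((start, arm_len, spacer))
        return hits


    # GAATTC is a perfect palindrome with no spacer;
    # AGGTC...GACCT is an inverted repeat with a 3-base spacer.
    print(find_inverted_repeats("GAATTC", arm_len=3, max_spacer=0))
    print(find_inverted_repeats("AGGTCTTTGACCT", arm_len=5, max_spacer=5))
    ```

    The scan costs on the order of sequence length times `max_spacer` string comparisons, which is one reason a megabase-scale tool would rely on an indexed search engine instead.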

Characterization of Pseudomonas syringae pv. syringae, Causal Agent of Citrus Blast of Mandarin in Montenegro

  • Ivanovic, Zarko;Perovic, Tatjana;Popovic, Tatjana;Blagojevic, Jovana;Trkulja, Nenad;Hrncic, Snjezana
    • The Plant Pathology Journal / Vol. 33, No. 1 / pp.21-33 / 2017
  • Citrus blast, caused by the bacterium Pseudomonas syringae, is a very important disease of citrus occurring in many areas of the world, but few data exist on the genetic structure of the pathogen involved. This study therefore reports the genetic characterization of 43 P. syringae isolates obtained from plant tissue displaying citrus blast symptoms on mandarin (Citrus reticulata) in Montenegro, using multilocus sequence analysis of gyrB, rpoD, and gap1 gene sequences. Gene sequences from a collection of 54 reference pathotype strains of P. syringae from the Plant Associated and Environmental Microbes Database (PAMDB) were used to establish a genetic relationship with our isolates obtained from mandarin. Phylogenetic analyses of gyrB, rpoD, and gap1 gene sequences showed that P. syringae pv. syringae causes citrus blast in mandarin in Montenegro and belongs to genomospecies 1. The genetic homogeneity of the isolates suggested that the Montenegrin population might be clonal, which indicates a possible common source of infection. These findings may assist in further epidemiological studies of this pathogen and in determining mandarin breeding strategies for P. syringae control.

Global Approaches to Identify Genes Involved during Infection Structure Formation in Rice Blast Fungus, Magnaporthe grisea

  • Park, Woo-Bong
    • The Plant Pathology Journal / Vol. 19, No. 1 / pp.34-42 / 2003
  • The ascomycete Magnaporthe grisea is the pathogen of rice blast and is known to form specialized infection structures called appressoria for successful infection of host cells. To understand the molecular mechanism underlying the infection process, appressorium-related genes were identified through global approaches including EST sequencing, differential hybridization, and suppression subtractive hybridization. An EST database was generated from >2,000 cDNA clones randomly selected from an appressorium-stage cDNA library. A large number of ESTs showed homology to known proteins possibly involved in infection-related cellular development (attachment, germination, appressorium formation, and colonization) of the rice blast fungus. The 1051 ESTs showing significant homology to known genes were assigned to 11 functional categories. Differential hybridization and suppression subtractive hybridization were applied to identify genes showing an appressorium stage-specific expression pattern. A number of genes were selected as up-regulated during appressorium formation compared with the vegetative growth stage. Clones from various cDNA libraries constructed at different developmental stages were arrayed on glass slides for further expression profiling. Functional characterization of the genes identified by these global approaches may lead to a better understanding of the infection process of this devastating plant disease and to the development of novel ways to protect the host plant.