• Title/Summary/Keyword: BLAST database


Development of Local Animal BLAST Search System Using Bioinformatics Tools (생물정보시스템을 이용한 Local Animal BLAST Search System 구축)

  • Kim, Byeong-Woo;Lee, Geun-Woo;Kim, Hyo-Seon;No, Seung-Hui;Lee, Yun-Ho;Kim, Si-Dong;Jeon, Jin-Tae;Lee, Ji-Ung;Jo, Yong-Min;Jeong, Il-Jeong;Lee, Jeong-Gyu
    • Bioinformatics and Biosystems
    • /
    • v.1 no.2
    • /
    • pp.99-102
    • /
    • 2006
  • The Basic Local Alignment Search Tool (BLAST) is one of the most established software tools in bioinformatics research; it compares a query sequence against libraries of known sequences to investigate sequence similarity. Expressed Sequence Tags (ESTs) are single-pass sequence reads from mRNA (or cDNA); they represent the expression of a given cDNA library and provide a snapshot of the genes expressed in a given tissue and/or at a given developmental stage. ESTs are therefore valuable information for functional genomics and bioinformatics research. Although major biological database (DB) websites, including NCBI, provide BLAST services and EST data, a local DB and search system is needed for better performance and security. Here we present animal EST DBs and a local BLAST search system. The animal EST DB in NCBI GenBank was divided by animal species using a Perl script we developed, and we built a new extended DB search system for the new data (Local Animal BLAST Search System: http://bioinfo.kohost.net) on a high-capacity PC cluster for the best performance. The new local DB contains 650,046 sequences for Bos taurus (cattle), 368,120 sequences for Sus scrofa (pig), and 693,005 sequences for Gallus gallus (fowl).
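The species-splitting step described in this abstract can be illustrated with a minimal Python sketch. The authors used their own Perl script, which is not shown in the source, so this is not their implementation. The function name `split_fasta_by_species` and the assumption that each FASTA header contains the species name as plain text are hypothetical.

```python
from collections import defaultdict


def split_fasta_by_species(fasta_text, species_names):
    """Group FASTA records by the first species name found in each header.

    Assumption (not from the source): the species name appears verbatim
    in the header line. Records matching no listed species are skipped.
    """
    groups = defaultdict(list)
    header = None
    seq_lines = []

    def flush():
        # Store the record collected so far under its matching species.
        if header is None:
            return
        for name in species_names:
            if name in header:
                groups[name].append((header, "".join(seq_lines)))
                break

    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    flush()  # do not forget the last record
    return dict(groups)
```

Each species group could then be written to its own FASTA file and formatted as a separate BLAST database.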


Building of Protein 3-D Structure Database and Similarity Search System (3D 단백질 구조 데이터베이스 및 유사성 검색 시스템 구축)

  • Li, Rong-Hua;Park, Sung-Hee;Ryu, Keun-Ho
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2002.04a
    • /
    • pp.79-82
    • /
    • 2002
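  • Protein tertiary-structure information is provided by the PDB as flat files, and each flat-file entry consists of the spatial coordinates of the atoms that make up the protein's three-dimensional molecular structure, together with sequence information, experimental information, and reference information. To search the essential structure and sequence information in these flat files efficiently, the flat files must be built into a database, and a similarity search system must be built for that database. In this paper, we therefore build the flat files provided by the Protein Data Bank into a relational database based on a spatial object modeling technique, and build a protein sequence similarity search system by applying PSI-BLAST. In this way, searches on the atoms that make up a protein's three-dimensional structure, combined with sequence similarity searches on structures, can be used to build systems for protein tertiary-structure classification and structure prediction.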


Algorithm for Predicting Functionally Equivalent Proteins from BLAST and HMMER Searches

  • Yu, Dong Su;Lee, Dae-Hee;Kim, Seong Keun;Lee, Choong Hoon;Song, Ju Yeon;Kong, Eun Bae;Kim, Jihyun F.
    • Journal of Microbiology and Biotechnology
    • /
    • v.22 no.8
    • /
    • pp.1054-1058
    • /
    • 2012
  • Searching large databases for homologous proteins is a common way to predict biologically significant attributes, such as function, from protein sequences. BLAST and HMMER in particular are widely used in a variety of biological fields. However, sequence-homologous proteins determined by BLAST and proteins having the same domains predicted by HMMER are not always functionally equivalent, even when their sequences align with high similarity. Accurate assignment of functionally equivalent proteins from aligned sequences therefore remains a challenge in bioinformatics. We have developed the FEP-BH algorithm to predict functionally equivalent proteins from protein-protein pairs identified by BLAST and from protein-domain pairs predicted by HMMER. When examined against domain classes of the Pfam-A seed database, FEP-BH showed 71.53% accuracy, whereas BLAST and HMMER showed 57.72% and 36.62%, respectively. We expect that the FEP-BH algorithm will be effective in predicting functionally equivalent proteins from BLAST and HMMER outputs and will also suit biologists who want to find functionally equivalent proteins among sequence-homologous proteins.

Web-Based Computational System for Protein-Protein Interaction Inference

  • Kim, Ki-Bong
    • Journal of Information Processing Systems
    • /
    • v.8 no.3
    • /
    • pp.459-470
    • /
    • 2012
  • Recently, high-throughput technologies such as the two-hybrid system, protein chips, mass spectrometry, and phage display have produced a large amount of data on protein-protein interactions (PPIs), but so far the data have not been accurate and their quantity has been limited. Computational techniques for the prediction and validation of PPIs have therefore been developed. However, existing computational methods do not take into account the fact that a PPI actually originates from interactions between the domains that each protein contains. In this work, information on the domain modules of individual proteins is used to find protein interaction relationships. The system developed here, WASPI (Web-based Assistant System for Protein-protein interaction Inference), has been implemented to provide functional insights into protein interactions and their domains. To achieve these objectives, several preprocessing steps were taken. First, the domain module information of interacting proteins was extracted from the InterPro database, which includes protein families, domains, and functional sites, using the InterProScan program. Second, homology comparison against GO (Gene Ontology) and COG (Clusters of Orthologous Groups), with E-values of $10^{-5}$ and $10^{-3}$, respectively, was used to obtain function and annotation information for each interacting protein in a secondary PPI database in WASPI. The BLAST program was used for the homology comparison.
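The E-value filtering mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the WASPI code. It assumes the BLAST hits are available in the standard 12-column tabular format, where the E-value is the 11th column, and the function name `filter_hits` is hypothetical.

```python
def filter_hits(tabular_text, max_evalue):
    """Keep BLAST tabular hits whose E-value is at most max_evalue.

    Assumes the standard 12-column tab-separated layout:
    query, subject, %identity, length, mismatches, gap opens,
    q.start, q.end, s.start, s.end, E-value, bit score.
    """
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue <= max_evalue:
            hits.append((fields[0], fields[1], evalue))
    return hits
```

Under the thresholds quoted in the abstract, the GO comparison would call `filter_hits(text, 1e-5)` and the COG comparison `filter_hits(text, 1e-3)`.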

Identification of Viral Taxon-Specific Genes (VTSG): Application to Caliciviridae

  • Kang, Shinduck;Kim, Young-Chang
    • Genomics & Informatics
    • /
    • v.16 no.4
    • /
    • pp.23.1-23.5
    • /
    • 2018
  • Virus taxonomy was initially determined by clinical experiments based on phenotype. With the development of sequence analysis methods, genotype-based classification was also applied, and with advances in genome sequence analysis technology there is increasing demand for virus taxonomy to be extended from in vivo and in vitro to in silico. In this study, we verified the consistency of the current International Committee on Taxonomy of Viruses taxonomy using an in silico approach, aiming to identify the specific sequence for each virus. We applied this approach to norovirus in Caliciviridae, which causes 90% of gastroenteritis cases worldwide. Based on the dogma "protein structure determines its function," we hypothesized that the specific sequence can be identified from the specific structure. First, we extracted the coding regions (CDS). Second, the CDS protein sequences of each genus were annotated by a conserved domain database (CDD) search. Finally, the conserved domains of each genus in Caliciviridae were classified by RPS-BLAST with CDD. The analysis shows that the Caliciviridae share sequences that include RNA helicase. In Norovirus, the Calicivirus coat protein C-terminal and viral polyprotein N-terminal domains appear as specific domains that are not found in the other genera of Caliciviridae. If this method is used to detect specific conserved domains, they can serve as classification keywords based on protein functional structure. After the specific protein domains are determined, their sequences would be converted to gene sequences, which could be reused as one kind of viral biomarker.

Construction of BLAST Server for Mollusks (연체동물 전용 서열 블라스트 서버구축)

  • Lee, Yong-Seok;Jo, Yong-Hun;Kim, Dae-Soo;Kim, Dae-Won;Kim, Min-Young;Choi, Sang-Haeng;Yon, Jei-Oh;Byun, In-Sun;Kang, Bo-Ra;Jeong, Kye-Heon;Park, Hong-Seog
    • The Korean Journal of Malacology
    • /
    • v.20 no.2
    • /
    • pp.165-169
    • /
    • 2004
  • The BLAST server for mollusks was built on an Intel Server Platform SC-5250 with dual Xeon 2.8 GHz CPUs and the Linux operating system. After setting up the operating system and configuring the web server for CGI (Common Gateway Interface), we installed the NCBI (National Center for Biotechnology Information) WebBLAST package (http://chimp.kribb.re.kr/mollusks). To build the stand-alone BLAST, we proceeded as follows. First, we downloaded the mollusk-related genome information (mitochondrial genomes), DNA sequences, and amino acid sequences available at NCBI. Second, the data were converted into multi-FASTA format and stored as a database using the formatdb program provided by NCBI. Finally, CGI was used for the stand-alone BLAST server. In addition, we added vector, Escherichia coli, and repeat sequences to the server so that potential contamination can be checked. The primer3 program is also installed so that users can design primers. The stand-alone BLAST offers several advantages: (1) BLAST searches return only data that match nucleotide sequences directly related to mollusks; (2) contamination can be checked conveniently when cDNA or genomic libraries are made from mollusks; (3) BLAST results for mollusk sequence information are returned more quickly than from the current NCBI service.
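The database-formatting step can be sketched as a small Python helper that assembles a legacy `formatdb` command line. This is an illustration only: the abstract does not give the options the authors used, and the file name below is hypothetical. The sketch uses the legacy options `-i` (input FASTA), `-p` (T for protein, F for nucleotide), `-o` (parse sequence IDs), and `-n` (database base name), and it only builds the argument list without running anything.

```python
def formatdb_command(fasta_path, is_protein, db_name=None):
    """Build an argument list for the legacy NCBI formatdb program.

    The options chosen here are an assumption for illustration; the
    source does not state which options were actually used.
    """
    cmd = [
        "formatdb",
        "-i", fasta_path,                   # input multi-FASTA file
        "-p", "T" if is_protein else "F",   # protein (T) or nucleotide (F)
        "-o", "T",                          # parse sequence IDs and index them
    ]
    if db_name:
        cmd += ["-n", db_name]              # base name for the database files
    return cmd
```

For a hypothetical nucleotide file, `formatdb_command("mollusk_nt.fasta", False)` yields a list that could be passed to `subprocess.run` on a machine where the legacy BLAST tools are installed.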


Prediction of Protein Secondary Structure Using the Weighted Combination of Homology Information of Protein Sequences (단백질 서열의 상동 관계를 가중 조합한 단백질 이차 구조 예측)

  • Chi, Sang-mun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.9
    • /
    • pp.1816-1821
    • /
    • 2016
  • Protein secondary structure is important for studying the evolution, structure, and function of proteins, which play crucial roles in most biological processes. This paper tries to extract protein secondary structure information effectively from a large protein structure database in order to predict the secondary structure of a query protein sequence. To find more remote homologs of a query sequence in the protein database, we used PSI-BLAST, which performs gapped iterative searches and uses profiles built from homologous protein sequences of the query. The secondary structures of the homologous sequences are combined into the secondary structure prediction with weights that reflect their relative degree of similarity to the query sequence. When the homologous sequences were used together with a neural network predictor, the accuracies were higher than those of current state-of-the-art techniques, reaching a Q3 accuracy of 92.28% and a Q8 accuracy of 88.79%.
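The idea of a weighted combination can be illustrated with a simplified Python sketch. It is not the paper's method, which also involves a neural network predictor and whose exact weighting is not given here. The sketch assumes that the homologs' secondary structure strings are already aligned to the query position by position, with `-` marking gaps, and that each homolog carries a similarity weight. The function name `weighted_consensus` is hypothetical.

```python
from collections import defaultdict


def weighted_consensus(aligned_structures, weights, default="C"):
    """Pick, per position, the state with the largest summed weight.

    aligned_structures: equal-length strings over states such as H/E/C,
    with '-' where a homolog has no residue aligned to the query.
    weights: one similarity weight per homolog (an assumed input).
    """
    length = len(aligned_structures[0])
    consensus = []
    for i in range(length):
        scores = defaultdict(float)
        for structure, weight in zip(aligned_structures, weights):
            state = structure[i]
            if state != "-":
                scores[state] += weight
        # Fall back to the default state when no homolog covers position i.
        consensus.append(max(scores, key=scores.get) if scores else default)
    return "".join(consensus)
```

For example, with structures `["HHHEC", "HHEEC", "-CEE-"]` and weights `[0.9, 0.5, 0.3]`, the most similar homolog decides the third position, where helix scores 0.9 against 0.8 for strand.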

Gene Expression Profiling in Rice Infected with Rice Blast Fungus using SAGE

  • Kim, Sang-Gon;Kim, Sun-Tae;Kim, Sung-Kun;Kang, Kyu-Young
    • The Plant Pathology Journal
    • /
    • v.24 no.4
    • /
    • pp.384-391
    • /
    • 2008
  • Rice blast disease, caused by the pathogenic fungus Magnaporthe grisea, is a serious problem in rice (Oryza sativa L.) growing regions of the world. Transcript profiling in rice inoculated with the fungus was investigated using the transcriptomics technology serial analysis of gene expression (SAGE). SAGE identified short sequence tags of ten base pairs, which contain sufficient information to represent unique transcripts. We identified a total of 910 tag sequences via the GenBank database, and the resulting genes were shown to be up-regulated in all functional categories under the fungal biotic stress. Compared with the compatible interaction, the stress and defense genes in the incompatible interaction appear to be more strongly up-regulated. In particular, the thaumatin-like gene (TLP) was examined by Northern and Western blotting to determine gene and protein expression levels; both increased, and the increase arose earlier in the incompatible interaction than in the compatible interaction.

Analysis of Eukaryotic Sequence Pattern using GenScan (GenScan을 이용한 진핵생물의 서열 패턴 분석)

  • Jung, Yong-Gyu;Lim, I-Suel;Cha, Byung-Heun
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.11 no.4
    • /
    • pp.113-118
    • /
    • 2011
  • Sequence homology analysis of biological molecules involves building databases by sorting and indexing, and it demonstrates the usefulness of informatics. In this paper, the Markov models in the GenScan program are used to convert the patterns of complex eukaryotic protein sequences. Exact calculation of the minimum distance becomes infeasible because its complexity increases exponentially. A scoring matrix for amino acid substitutions is used so that substitutions between similar amino acids receive different scores, and a hidden Markov model with transition probabilities is applied. This provides a better method for analyzing homologous sequences of the translated sequences using blastp and Markov models, and for the secreted protein structure of the translated sequences.

Identification Based on Computational Analysis of rpoB Sequence of Bacillus anthracis and Closely Related Species (Bacillus anthracis와 그 유연종의 rpoB 유전자 컴퓨터 분석을 통한 동정)

  • Kim, Kyu-Kwang;Kim, Han-Bok
    • Korean Journal of Microbiology
    • /
    • v.44 no.4
    • /
    • pp.333-338
    • /
    • 2008
  • In this study, a partial rpoB gene sequence (777 bp) was analyzed computationally to identify B. anthracis and its closely related species B. cereus and B. thuringiensis. Sequence data for 17 B. anthracis strains, 9 B. cereus strains, and 7 B. thuringiensis strains were obtained by searching databases. The sequences were aligned and used for further computational analysis. B. anthracis strains were identified by in silico restriction enzyme digestion, but B. cereus and B. thuringiensis could not be separated by this method; sequencing and a BLAST search were required to distinguish the two. In actual identification tests, B. anthracis strains could be identified by PCR-RFLP, and B. cereus and B. thuringiensis strains were distinguished by BLAST search with a reliable e-value. This study thus developed a fast and accurate method for identifying the three Bacillus species, together with an identification flow chart.
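In silico restriction digestion, as mentioned in this abstract, can be sketched in a few lines of Python. The abstract does not name the enzyme used, so the EcoRI site `GAATTC` (cut after the first base) in the usage note is only an illustrative choice, and the function name `digest` is hypothetical. The sketch handles one strand of a linear sequence.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear sequence at every occurrence of a recognition site.

    cut_offset is the number of bases from the start of the site to the
    cut position. Returns the fragments in order; their lengths are what
    an RFLP pattern would compare between strains.
    """
    sequence = sequence.upper()
    site = site.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)

    fragments = []
    previous = 0
    for cut in cuts:
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return fragments
```

Calling `digest("AAGAATTCTTTTGAATTCGG", "GAATTC", 1)` gives fragments of lengths 3, 10, and 7. Strains whose sequences lack or gain a site would produce a different set of fragment lengths.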