• Title/Abstract/Keywords: Protein sequence search

114 search results (processing time: 0.02 s)

Prediction of Protein Secondary Structure Using the Weighted Combination of Homology Information of Protein Sequences

  • 지상문
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • Vol. 20, No. 9
    • /
    • pp.1816-1821
    • /
    • 2016
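  • Proteins play critical roles in most biological processes, so much research is devoted to understanding protein evolution, structure, and function, and protein secondary structure is important basic information for that research. This study aims to predict the secondary structure of an unknown protein sequence by effectively extracting secondary structure information from large-scale protein structure data. To find a broad range of sequences in the structure data that are homologous to the query sequence, we used PSI-BLAST, which builds its search profile from sequences similar to the query, allows gaps, and supports iterative searching. The secondary structures of the homologous proteins contributed to the prediction with weights determined by the strength of their homology to the query sequence. In prediction experiments that classify secondary structure into three and eight states, using homologous sequences together with a neural network achieved accuracies of 93.28% and 88.79%, respectively, an improvement over existing methods.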
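The weighting idea in the abstract above can be illustrated with a minimal sketch. It assumes each homolog has already been aligned to the query and given a weight; the function name, the weights, and the three-state strings are hypothetical, and the sketch leaves out the neural network and the actual weighting scheme used in the paper.

```python
from collections import defaultdict

def predict_secondary_structure(query_length, hits):
    """Weighted per-position vote over aligned homolog structures.

    hits: list of (weight, states), where states is a string of length
    query_length using 'H', 'E', 'C', or '-' where the homolog does not
    cover the query. The weights are illustrative stand-ins for the
    strength of homology to the query.
    """
    prediction = []
    for position in range(query_length):
        votes = defaultdict(float)
        for weight, states in hits:
            state = states[position]
            if state != "-":
                votes[state] += weight
        # Assumption: fall back to coil when no homolog covers the position.
        prediction.append(max(votes, key=votes.get) if votes else "C")
    return "".join(prediction)

# Hypothetical homologs: a strong hit covering six residues, two weaker full-length hits.
hits = [
    (0.9, "HHHHCC--"),
    (0.5, "HHEECCEE"),
    (0.3, "CCEECCEE"),
]
predicted = predict_secondary_structure(8, hits)
print(predicted)
```

The strong hit decides the positions it covers, while the two weaker hits decide the uncovered tail, which is the behavior a homology-weighted vote is meant to give.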

Molecular Identification and Expression of Myosin Light Chain in Shortspine Spurdog (Squalus mitsukurii)

  • Kim, Soo Cheol;Sumi, Kanij Rukshana;Sharker, Md Rajib;Kho, Kang Hee
    • Journal of Marine Life Science
    • /
    • Vol. 3, No. 1
    • /
    • pp.1-8
    • /
    • 2018
  • Myosin is a vital motor protein in vertebrates and invertebrates. The present study was conducted to characterize myosin in the dogfish (Squalus mitsukurii). We isolated one clone containing a 979 bp cDNA sequence, which consisted of a complete coding sequence of 453 bp and a deduced sequence of 150 amino acids from the open reading frame, with a molecular weight, isoelectric point, and aliphatic index of 16.72 kDa, 4.49, and 78.00, respectively. It contained a 428 bp 3' UTR with a single potential polyadenylation signal (AATAAA). Predicted EF-hand Ca2+-binding domains were identified at residues 6-41, 83-118, and 133-150. A BLAST search indicated that this protein is strongly similar to whale shark (Rhincodon typus) MLC3 (91% identical) and to house mouse (Mus musculus) MLC isoform 3f (81% identical). Phylogenetic analysis revealed that this protein is an MLC3 isoform-like protein. The protein also shares highly conserved regions with other myosin proteins. Homology modeling of the S. mitsukurii protein was performed using the crystal structure of Gallus gallus skeletal muscle myosin II, chosen for its high similarity. Reverse transcription-polymerase chain reaction (RT-PCR) and quantitative PCR results showed that dogfish myosin is highly expressed in muscle tissue.
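As a quick consistency check on the figures in the abstract above, the sketch below relates the coding-sequence length to the number of deduced residues. It assumes the stated 453 bp includes the stop codon; the function name is hypothetical.

```python
def residues_from_cds(cds_length_bp: int) -> int:
    """Number of amino acids encoded by a complete CDS.

    Assumes the length includes one terminal stop codon, which
    does not encode a residue.
    """
    if cds_length_bp % 3 != 0:
        raise ValueError("a complete CDS should be a multiple of 3 bp")
    return cds_length_bp // 3 - 1

dogfish_mlc_residues = residues_from_cds(453)

# Base pairs of the 979 bp cDNA not covered by the CDS or the 3' UTR.
# The abstract does not say how these remaining bases are assigned.
remaining_bp = 979 - 453 - 428

print(dogfish_mlc_residues, remaining_bp)
```

Under that assumption, 453 bp corresponds to 151 codons, one of which is the stop codon, which agrees with the 150 residues reported.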

An Approach for a Substitution Matrix Based on Protein Blocks and Physicochemical Properties of Amino Acids through PCA

  • You, Youngki;Jang, Inhwan;Lee, Kyungro;Kim, Heonjoo;Lee, Kwanhee
    • Interdisciplinary Bio Central
    • /
    • Vol. 6, No. 4
    • /
    • pp.3.1-3.10
    • /
    • 2014
  • Amino acid substitution matrices are essential tools for protein sequence analysis, homology searches in protein databases, and multiple sequence alignment. The PAM matrix was the first widely used amino acid substitution matrix, and the BLOSUM series later succeeded it. Most substitution matrices were developed from the statistical frequency of substitutions between amino acids in blocks representing protein families or groups of related proteins. However, amino acid substitution depends on the similarity of the physicochemical properties of the amino acids. In this study, a new approach was used to identify the major physicochemical properties in multiple sequence alignments. Amino acid substitution frequencies from a multiple sequence alignment database were merged with selected amino acid attributes from a physicochemical properties database. Principal component analysis of the merged data revealed the major physicochemical properties. Using factor analysis, the four principal components were interpreted as flexibility of electronic movement, polarity, negative charge, and structural flexibility. Applying these four components, BAPS was constructed and validated for accuracy. In a comparison of receiver operating characteristic ($ROC_{50}$) values, BAPS scored slightly lower than BLOSUM and PAM. However, when accuracy was evaluated by comparing multiple sequence alignment results with structural alignments for two test data sets of known three-dimensional structure from the homologous structure alignment database, BAPS performed comparably to or better than prior matrices, including PAM, Gonnet, Identity, and the Genetic code matrix.

A Simple and Fast Web Alignment Tool for Large Amount of Sequence Data

  • Lee, Yong-Seok;Oh, Jeong-Su
    • Genomics & Informatics
    • /
    • Vol. 6, No. 3
    • /
    • pp.157-159
    • /
    • 2008
  • Multiple sequence alignment (MSA) is the most important step in many biological sequence analyses, homology searches, and protein structural assignments. However, large amounts of data make it difficult for biologists to perform MSA analyses, and aligning many sequences requires much computational time. Here, we have developed a simple and fast web alignment tool for aligning, editing, and visualizing large amounts of sequence data. We used a cluster server running ClustalW-MPI, accessed through web services and the message passing interface (MPI). The tool also lets users manually edit multiple sequence alignments and download the input data and results, such as alignments and phylogenetic trees.

Human Proteome Data Analysis Protocol Obtained via the Bacterial Proteome Analysis

  • Kwon, Kyung-Hoon;Park, Gun-Wook;Kim, Jin-Young;Lee, Jeong-Hwa;Kim, Seung-Il;Yoo, Jong-Shin
    • Korean Society for Bioinformatics: Conference Proceedings
    • /
    • Korean Society for Bioinformatics and Systems Biology, BIOINFO 2005
    • /
    • pp.91-95
    • /
    • 2005
  • In the multidimensional protein identification technology of high-throughput proteomics, the sample is separated by one-dimensional gel electrophoresis and then by two-dimensional liquid chromatography before being analyzed by tandem mass spectrometry. In this study, we analyzed the proteins of Pseudomonas putida KT2440. For protein identification, the protein database was combined with its reversed-sequence database. With peptides selected at an error rate of less than 1%, the SEQUEST database search of the tandem mass spectral data identified 2,045 proteins. For each protein, we compared the molecular weight calibrated from the 1D-gel band position with the theoretical molecular weight computed from the amino acid sequence, by defining a variable MW$_{corr}$. Since the bacterial proteome is simpler than the human proteome in complexity and modifications, the proteome analysis result for Pseudomonas putida KT2440 could suggest a guideline for building a protocol to analyze human proteome data.

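Combining a protein database with its reversed sequences, as the abstract above describes, is commonly used to estimate an identification error rate from the number of reversed (decoy) hits. The sketch below shows one simple form of that idea. The error-rate formula (decoy hits divided by target hits), the `REV_` prefix, and the function names are assumptions for illustration; the abstract does not state how the authors computed their 1% error rate.

```python
def reversed_decoys(proteins):
    """Build a decoy entry for each protein by reversing its sequence."""
    return {f"REV_{name}": sequence[::-1] for name, sequence in proteins.items()}

def score_threshold_for_error_rate(matches, max_error_rate=0.01):
    """Find the lowest score cutoff whose estimated error rate is acceptable.

    matches: list of (score, is_decoy) peptide-spectrum matches.
    The error rate at a cutoff is estimated as decoys / targets among
    matches scoring at or above it (an assumed, simplified estimator).
    Returns None if no cutoff qualifies.
    """
    targets = decoys = 0
    threshold = None
    for score, is_decoy in sorted(matches, key=lambda m: m[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_error_rate:
            threshold = score
    return threshold

# Hypothetical search space and scored matches.
database = {"P1": "MKTAYIAK"}
database.update(reversed_decoys(database))

matches = [(5.0, False), (4.5, False), (4.0, False), (3.0, True), (2.5, False)]
cutoff = score_threshold_for_error_rate(matches, max_error_rate=0.2)
print(database["REV_P1"], cutoff)
```

In this toy data the single decoy hit pushes the estimated error rate above 0.2 for every cutoff below 4.0, so only the three top-scoring matches are kept.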

Structure-based Functional Discovery of Proteins: Structural Proteomics

  • Jung, Jin-Won;Lee, Weon-Tae
    • BMB Reports
    • /
    • Vol. 37, No. 1
    • /
    • pp.28-34
    • /
    • 2004
  • The discovery of biochemical and cellular functions of unannotated gene products begins with a database search of proteins with structure/sequence homologues based on known genes. Very recently, a number of frontier groups in structural biology proposed a new paradigm to predict biological functions of an unknown protein on the basis of its three-dimensional structure on a genomic scale. Structural proteomics (genomics), a research area for structure-based functional discovery, aims to complete the protein-folding universe of all gene products in a cell. It would lead us to a complete understanding of a living organism from protein structure. Two major complementary experimental techniques, X-ray crystallography and NMR spectroscopy, combined with recently developed high throughput methods have played a central role in structural proteomics research; however, an integration of these methodologies together with comparative modeling and electron microscopy would speed up the goal for completing a full dictionary of protein folding space in the near future.

Anti-Apoptosis Engineering Using a Gene of Bombyx mori

  • 김은정;박태현
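    • Korean Society for Biotechnology and Bioengineering: Conference Proceedings
    • /
    • Korean Society for Biotechnology and Bioengineering, 2002, Trends in Biotechnology (X)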
    • /
    • pp.62-65
    • /
    • 2002
  • We have previously shown that the addition of silkworm hemolymph to a culture medium increases the longevity of insect and mammalian cells by inhibiting apoptosis. This indicates that silkworm hemolymph contains a component that inhibits apoptosis. The apoptosis-inhibiting component was isolated from silkworm hemolymph and characterized in our previous study. A database search using the N-terminal amino acid sequence of this component as a template showed 95% homology with a low-molecular-weight lipoprotein of unknown function, the so-called '30K protein'. In this study, the 30K protein gene was expressed in mammalian and insect cells to confirm the apoptosis-inhibiting effect. Overexpression of the 30K protein in mammalian cells inhibited staurosporine-induced apoptosis by preventing the activation of caspase 3. Using an Autographa californica nuclear polyhedrosis virus (AcNPV) system, the 30K protein was also overexpressed in insect cells. Expression of the 30K protein increased the longevity of baculovirus-infected insect cells by inhibiting apoptosis. These results suggest that the 30K protein is a novel anti-apoptotic protein.


Proteome Data Analysis of Hairy Root of Panax ginseng: Use of Expressed Sequence Tag Data of Ginseng for the Protein Identification

  • 권경훈;김승일;김경욱;김은아;조건;김진영;김영환;양덕춘;허철구;유종신;박영목
    • Journal of Plant Biotechnology
    • /
    • Vol. 29, No. 3
    • /
    • pp.161-170
    • /
    • 2002
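  • Mass spectrometry data obtained from proteome analysis of ginseng hairy root fall into two types: mass spectra from MALDI/TOF/MS and tandem mass spectra from ESI/Q-TOF/MS. Mass spectra provide the molecular weights of the peptides produced by enzymatic digestion of a protein, while tandem mass spectra yield amino acid sequences from the molecular weights of fragments broken down to the amino acid level. Searching a peptide's amino acid sequence with BLAST retrieves similar proteins from GenBank. This identification approach is highly accurate for organisms whose complete genome sequence is known, but otherwise the analysis is difficult because similar proteins are absent from the database. In this study, we compared the mass spectra and the amino acid sequences of peptide fragments with EST (expressed sequence tag) sequences to find ESTs that match the proteome data, and used them for protein identification through BLAST searches. Amino acid sequences obtained from ESI/Q-TOF/MS are short but highly reliable, so establishing their relationship to EST sequences allowed us to supplement the information on the proteins. When the peptide amino acid sequences from ESI/Q-TOF/MS were compared with EST sequences, 90% of the amino acid sequences were found in the EST DB. Whereas searching the amino acid sequences in the NCBI nr database found proteins in 68% of cases, the search against ginseng EST sequences yielded 22% more results. For MALDI/TOF/MS mass spectra, the results of searching the nr database and the ginseng EST database agreed in only 9 of 47 cases (19%), confirming that accurate protein identification is hard to expect from mass spectra alone, without amino acid sequences from tandem mass spectrometry.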
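Comparing a peptide's amino acid sequence with nucleotide EST sequences requires translating the ESTs first. The sketch below shows one plausible way to do that with a six-frame translation and an exact substring match, assuming the standard genetic code. The function names and the toy EST are hypothetical, and the abstract does not say which comparison procedure the authors actually used.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def six_frame_translations(dna):
    """Three reading frames on each strand."""
    return [
        translate(strand[offset:])
        for strand in (dna, reverse_complement(dna))
        for offset in range(3)
    ]

def est_contains_peptide(est, peptide):
    """True if the peptide occurs exactly in any of the six frames."""
    return any(peptide in frame for frame in six_frame_translations(est.upper()))

# Hypothetical EST fragment; the peptide MAKG is encoded in the third forward frame.
est = "CCATGGCTAAAGGTTAA"
found_forward = est_contains_peptide(est, "MAKG")
found_reverse = est_contains_peptide(reverse_complement(est), "MAKG")
print(found_forward, found_reverse)
```

An exact match is a simplification: a real search would usually tolerate sequencing errors in the EST and residues that mass spectrometry cannot tell apart, such as leucine and isoleucine.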

Identification and Characterization of Bombyx mori LDH Gene through Bioinformatics Approaches

  • Zhu, Minfeng;Chen, Keping;Yao, Qin
    • International Journal of Industrial Entomology and Biomaterials
    • /
    • Vol. 15, No. 2
    • /
    • pp.137-143
    • /
    • 2007
  • Lactate dehydrogenase (LDH) is a ubiquitous enzyme that plays a significant role in the clinical diagnosis of pathologic processes. Discovery of the LDH gene (BmLDH) in B. mori may shed light on its role in the biology of Lepidoptera species and further our understanding of the function of the enzyme. In this study, we used bioinformatics tools to identify the LDH gene in B. mori. Sequence analysis showed that the BmLDH cDNA contains a 996 bp open reading frame encoding a protein of 331 amino acids, and that the gene has seven introns. Compared with hHLDH (human heart LDH), BmLDH contained the same key active sites. Domain search and protein fold recognition analyses provide compelling evidence that the deduced protein is an LDH. Using the computer program MEGA3, we searched for homologs of BmLDH among many eukaryotic species and confirmed that BmLDH was conserved in all organisms investigated. This gene has been registered in GenBank under the accession number EU000385.

Feature Selection and Classification of Protein CDS Using n-Block Substring Weighted Linear Model

  • 최성용;김진수;한승진;최준혁;임기욱;이정현
    • Journal of Korean Institute of Intelligent Systems
    • /
    • Vol. 19, No. 5
    • /
    • pp.730-736
    • /
    • 2009
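  • The importance of bioinformatics, which analyzes and processes vast amounts of genetic information, continues to grow. This paper proposes a new data mining method that predicts protein structure and function from the primary structure alone. To effectively shrink the huge search space, a problem that can arise when extracting features from protein sequences alone, we propose an n-Block substring search algorithm. We also compute weights that determine each selected substring's relevance to a domain and build a weighted linear model, showing that it extracts features of protein domains expected to be related to structure and function and is effective for classification. For each CDS (coding sequence) belonging to a domain, the score obtained from the model estimates the degree of association with that domain, and we show that classification efficiency can be further improved.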