• Title/Abstract/Keywords: Protein Sequence

Search results: 2,315 items (processing time: 0.029 s)

Protein Disorder/Order Region Classification Using EPs-TFP Mining Method

  • 이헌규;신용호
    • Journal of the Korea Industrial Information Systems Research / Vol. 17, No. 6 / pp. 59-72 / 2012
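  • A protein performs its function as a disorder region of its sequence becomes ordered through a biological reaction, so separating disorder regions from order regions in sequence data is essential for predicting a protein's tertiary structure and properties. This paper therefore proposes a sequence-based classification/prediction technique for efficient disorder/order region classification that yields results not biased toward particular protein features while also improving classification speed. The emerging-pattern-based EPs-TFP technique is a classification/prediction method that uses only essential emerging patterns, with redundant emerging patterns removed. It discovers sequence emerging patterns of disorder regions, that is, patterns that are frequent in disorder regions but relatively infrequent in order regions. In addition, to improve the performance of the proposed algorithm, the existing TFP technique based on the P-tree and T-tree concepts was extended and applied as a classification/prediction method. EPs-TFP was evaluated on DisProt 4.9 and CASP 7 data; classifying disorder/order regions gave a sensitivity of 73.6, a specificity of 69.5, and an accuracy of 74.2.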
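
The notion of an emerging pattern in the abstract above (frequent in disorder regions, relatively infrequent in order regions) can be illustrated with a minimal sketch. This is not the EPs-TFP algorithm: it uses no P-tree or T-tree and does not remove redundant patterns. The function names, the fixed k-mer length, the thresholds, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def kmer_support(sequences, k):
    """Fraction of sequences in which each k-mer occurs at least once."""
    counts = Counter()
    for seq in sequences:
        # A set ensures each k-mer is counted once per sequence.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    n = len(sequences)
    return {kmer: c / n for kmer, c in counts.items()}


def emerging_patterns(disorder, order, k=2, min_support=0.5, min_growth=2.0):
    """Return k-mers frequent in `disorder` whose growth rate
    (support in disorder / support in order) is at least `min_growth`."""
    sup_d = kmer_support(disorder, k)
    sup_o = kmer_support(order, k)
    result = {}
    for kmer, s_d in sup_d.items():
        if s_d < min_support:
            continue  # not frequent enough in the disorder class
        s_o = sup_o.get(kmer, 0.0)
        growth = float("inf") if s_o == 0 else s_d / s_o
        if growth >= min_growth:
            result[kmer] = growth
    return result


# Hypothetical toy regions, not taken from DisProt or CASP.
disorder_regions = ["SSPEKS", "PEKSSG", "GSSPEE"]
order_regions = ["LIVAFL", "VLIAPE", "AFLIVV"]
patterns = emerging_patterns(disorder_regions, order_regions)
```

A pattern absent from the order class gets an infinite growth rate here; a real classifier would also need a scoring rule that combines the matched patterns for an unseen region.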

Computational Approaches for Structural and Functional Genomics

  • Brenner, Steven E.
    • Korean Society for Bioinformatics: Conference Proceedings / Korean Society for Bioinformatics and Systems Biology, 2000 International Symposium on Bioinformatics / pp. 17-20 / 2000
  • Structural genomics aims to provide a good experimental structure or computational model of every tractable protein in a complete genome. Underlying this goal is the immense value of protein structure, especially in permitting recognition of distant evolutionary relationships for proteins whose sequence analysis has failed to find any significant homolog. A considerable fraction of the genes in all sequenced genomes have no known function, and structure determination provides a direct means of revealing homology that may be used to infer their putative molecular function. The solved structures will be similarly useful for elucidating the biochemical or biophysical role of proteins that have been previously ascribed only phenotypic functions. More generally, knowledge of an increasingly complete repertoire of protein structures will aid structure prediction methods, improve understanding of protein structure, and ultimately lend insight into molecular interactions and pathways. We use computational methods to select families whose structures cannot be predicted and which are likely to be amenable to experimental characterization. The methods employed include modern sequence analysis and clustering algorithms. A critical component is consultation of the PRESAGE database for structural genomics, which records the community's experimental work underway and computational predictions. The protein families are ranked according to several criteria, including taxonomic diversity and known functional information. Individual proteins, often homologs from hyperthermophiles, are selected from these families as targets for structure determination. The solved structures are examined for structural similarity to other proteins of known structure. Homologous proteins in sequence databases are computationally modeled to provide a resource of protein structure models complementing the experimentally solved protein structures.


Nonspecific Association of a 17 kDa Isoform of the Myelin Basic Protein with the Postsynaptic Density Fraction

  • Moon, Il-Soo
    • BMB Reports / Vol. 33, No. 3 / pp. 276-278 / 2000
  • The postsynaptic density (PSD), a large protein complex beneath the postsynaptic membrane, is notorious for its 'stickiness'. In order to understand the molecular composition of the PSD fraction, a 17 kDa protein band was isolated by electroelution from SDS gels, and its partial amino acid sequence was determined from HPLC-purified tryptic peptides of the protein. Surprisingly, the amino acid sequence was identical to that of the previously reported 17 kDa isoform of the myelin basic protein (MBP), an essential protein in CNS myelin formation. Since the protein band represented ~2% of the total proteins in the 1% n-octyl glucoside-insoluble PSD fraction, these results indicate that a significant amount of the 17 kDa isoform of MBP is tightly associated with the PSD during preparation of the PSD fraction.


Kinetic analysis of Drosophila Vnd protein containing homeodomain with its target sequence

  • Yoo, Si-Uk
    • BMB Reports / Vol. 43, No. 6 / pp. 407-412 / 2010
  • The homeodomain (HD) is a highly conserved DNA-binding domain composed of a helix-turn-helix motif. Drosophila Vnd (Ventral nervous system defective), which contains an HD, acts as a regulator that either enhances or suppresses gene expression upon binding to its target sequence. In this study, kinetic analysis of Vnd binding to DNA was performed. The results demonstrate that the DNA-binding affinity of the recombinant protein containing the HD and the NK2-specific domain (NK2-SD) was higher than that of full-length Vnd. To assess whether phosphorylation sites within the HD and NK2-SD affect the interaction of the protein with the target sequence, alanine substitutions were introduced. The results show that the S631A mutation within NK2-SD does not contribute significantly to the DNA-binding affinity. However, the S571A and T600A mutations within the HD showed lower affinity for DNA binding. In addition, DNA-binding analysis using embryonic nuclear protein also demonstrates that Vnd interacts with other nuclear proteins, suggesting that Vnd exists as a complex.

The 52 kD Protein Gene of Odontoglossum Ringspot Virus Containing RNA-Dependent RNA Polymerase Motifs and Comparisons with Other Tobamoviruses

  • Park, Won-Mok
    • Journal of Plant Biology / Vol. 38, No. 2 / pp. 129-136 / 1995
  • Complementary DNA of the genomic RNA of odontoglossum ringspot virus Cymbidium strain (ORSV-Cy) was synthesized from polyadenylated viral RNA and cloned. Selected clones containing the viral RNA-dependent RNA polymerase gene were sequenced with an automated sequencing system. The complete nucleotide sequence of the open reading frame is 1377 base pairs in length and encodes a protein of 458 amino acids with a mass of about 52,334 D. The 52 kD protein of ORSV shares four sequence motifs characteristic of viral RNA-dependent RNA polymerases. Comparison of the ORSV 52 kD protein sequence with those of five other viruses in the tobamovirus group showed 60.7 to 76.0% homology at the amino acid level and conservation of the four motifs between the viruses.
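
The figures in the abstract above can be checked with simple arithmetic. The sketch below assumes that the reported 1377 bp open reading frame includes the stop codon, which the abstract does not state; the function name is hypothetical.

```python
def protein_length_from_orf(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of `orf_bp` base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # Assumption: the stop codon is counted in the ORF but encodes no residue.
    return codons - 1 if includes_stop else codons


residues = protein_length_from_orf(1377)   # 459 codons, one of them the stop
avg_residue_mass = 52334 / residues        # average mass per residue, in daltons
```

Under that assumption the 459 codons give the 458 amino acids quoted in the abstract, and the average mass per residue comes out between 114 and 115 daltons.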


A new method to predict the protein sequence alignment quality

  • 이민호;정찬석;김동섭
    • Bioinformatics and Biosystems / Vol. 1, No. 1 / pp. 82-87 / 2006
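  • Comparative modeling is currently the most widely used method for protein structure prediction. Improving the accuracy of comparative modeling also requires accurate alignments. In the fold-recognition step of comparative modeling, choosing a template by alignment accuracy has received less attention than simply choosing the most similar template. Recently, the shift score, a measure based on the shift information between two alignments, was developed to express alignment quality. We developed a method to predict the shift score, which can serve as a first step toward more accurate structure prediction. Support vector regression (SVR) was used to predict the shift score. The alignment between a template of length n in a pre-built library and a query protein of unknown structure is converted into an (n+2)-dimensional input vector. The structural alignment was assumed to be the best alignment, and the SVR was trained to predict the shift score between the structural alignment and the profile-profile alignment of the query and template proteins. Prediction accuracy was measured with the Pearson correlation coefficient. The trained SVR achieved a Pearson correlation coefficient of 0.80 between the actual and predicted shift scores.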
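
The evaluation metric named in the abstract above, the Pearson correlation coefficient between actual and predicted shift scores, can be computed as in the following minimal sketch. The shift-score values are invented for illustration and are not data from the paper; the SVR itself is not reproduced here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical actual vs. predicted shift scores (illustrative values only).
actual = [0.91, 0.35, 0.60, 0.12, 0.78]
predicted = [0.85, 0.40, 0.52, 0.30, 0.70]
r = pearson(actual, predicted)
```

A value of `r` near 1 means the predicted scores rank and scale the alignments much as the actual scores do.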


An Approach for a Substitution Matrix Based on Protein Blocks and Physicochemical Properties of Amino Acids through PCA

  • You, Youngki;Jang, Inhwan;Lee, Kyungro;Kim, Heonjoo;Lee, Kwanhee
    • Interdisciplinary Bio Central / Vol. 6, No. 4 / pp. 3.1-3.10 / 2014
  • Amino acid substitution matrices are essential tools for protein sequence analysis, homologous sequence search in protein databases, and multiple sequence alignment. The PAM matrix was the first widely used amino acid substitution matrix. The BLOSUM series then succeeded the PAM matrix. Most substitution matrices were developed from the statistical frequency of substitution between amino acids in blocks representing groups of protein families or related proteins. However, substitution of amino acids depends on the similarity of the physicochemical properties of each amino acid. In this study, a new approach was used to obtain the major physicochemical properties in multiple sequence alignment. The frequency of amino acid substitution in a multiple sequence alignment database and selected attributes of amino acids in a physicochemical properties database were merged. The merged data revealed the major physicochemical properties through principal component analysis. Using factor analysis, the four principal components were interpreted as flexibility of electronic movement, polarity, negative charge, and structural flexibility. Applying these four components, BAPS was constructed and validated for accuracy. When comparing receiver operating characteristic ($ROC_{50}$) values, BAPS scored slightly lower than BLOSUM and PAM. However, when accuracy was evaluated by comparing multiple sequence alignment results with the structural alignment results of two test data sets with known three-dimensional structure in the homologous structure alignment database, BAPS performed comparably to or better than prior matrices, including PAM, Gonnet, Identity, and Genetic code matrices.
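
The frequency-based construction that the abstract above attributes to most earlier matrices can be sketched as a log-odds calculation over the columns of an ungapped alignment block. This is a simplified BLOSUM-style illustration (no sequence clustering, no rounding, no pseudocounts), not the BAPS method; the function name and the toy block are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_scores(block):
    """Half-bit log-odds scores from the aligned columns of an ungapped block."""
    pair_counts = Counter()
    for column in zip(*block):
        # Count every unordered residue pair within the column.
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: c / total for pair, c in pair_counts.items()}  # observed pair freq.

    # Background frequency of each residue implied by the observed pairs.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = 2 * math.log2(freq / expected)  # half-bit units
    return scores


# Hypothetical four-sequence block, three columns wide.
block = ["AAI", "AAL", "SAI", "AAI"]
scores = log_odds_scores(block)
```

Pairs never observed in the block receive no score in this sketch, which is one reason practical matrices are built from large block databases.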

Nucleotide Sequence of a Bacteriolytic Enzyme Gene from Alkalophilic Bacillus sp.

  • Jung, Myeong-Ho;Ohk, Seung-Ho;Yum, Do-Young;Kong, In-Soo;Bai, Dong-Hoon
    • Journal of Microbiology and Biotechnology / Vol. 3, No. 2 / pp. 73-77 / 1993
  • The nucleotide sequences of the Bacillus sp. bacteriolytic enzyme gene, lytP, and its flanking regions were determined. A unique open reading frame for a protein of Mw 27,000 and a putative terminator sequence were found behind a consensus ribosome binding site located 8 nt upstream from the ATG start codon. The primary amino acid sequence deduced from the nucleotide sequence revealed a putative protein of 255 amino acid residues with an Mw of 27,420. No significant homology could be found between the amino acid sequence of the Bacillus sp. bacteriolytic enzyme and those of other cell wall hydrolases.


Orphan G Protein-coupled Receptors in Post-Genome Era

  • Im, Dong-Soon
    • Pharmaceutical Society of Korea: Conference Proceedings / Pharmaceutical Society of Korea, 2002 Proceedings of the Convention of the Pharmaceutical Society of Korea Vol.2 / pp. 131-133 / 2002
  • In 'Nature', Dixon et al. reported the first cloned mammalian G-protein coupled receptor sequence (1). The DNA sequence from a hamster encodes the $\beta_2$-adrenergic receptor. In the same year, 1986, Kubo et al. published the muscarinic acetylcholine receptor sequence ($M_1$) from a rat in the same journal (2). Both groups purified the receptor proteins and identified the DNA sequences (1, 2). (omitted)


cDNA Cloning and Nucleotide Sequence Analysis of the 70 kDa Heat Shock Protein from Mouse Fibroblasts (Isolation and Characterization of a cDNA Encoding a Protein Homologous to the Mouse 70 kDa Heat Shock Protein)

  • 김창환;정선미;최준호
    • Korean Journal of Zoology / Vol. 35, No. 2 / pp. 203-210 / 1992
  • Hsp70, a 70 kDa protein, is the major protein expressed when cells are heat-shocked. A cDNA library from mouse ID13 cells was screened with the human hsp70 gene as a probe, and a positive clone was obtained. The positive clone was subcloned into pUC19 and a precise restriction map was obtained. The cDNA was sequenced by the Sanger dideoxy termination method. A single open reading frame that codes for a protein of 70 kDa was found. The DNA sequence of the cloned mouse DNA shows great homology (66-90%) with other mouse hsp70 genes and somewhat less homology (50%) with the E. coli hsp70 gene (dnaK). With the exception of one amino acid, the protein sequence deduced from the cDNA is identical to the mouse heat shock cognate protein 70 (hsc70), which is constitutively expressed at normal temperature. The result suggests that the cloned cDNA encodes a member of the hsc70 family rather than the heat-inducible family.
