• Title/Summary/Keyword: protein sequence search


Prediction of Protein Secondary Structure Using the Weighted Combination of Homology Information of Protein Sequences (단백질 서열의 상동 관계를 가중 조합한 단백질 이차 구조 예측)

  • Chi, Sang-mun
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.9 / pp.1816-1821 / 2016
  • Protein secondary structure is important for studying the evolution, structure, and function of proteins, which play crucial roles in most biological processes. This paper tries to effectively extract secondary structure information from a large protein structure database in order to predict the secondary structure of a query protein sequence. To find more remote homologs of the query sequence in the database, we used PSI-BLAST, which performs gapped iterative searches and uses profiles built from sequences homologous to the query. The secondary structures of the homologous sequences are combined, weighted by their relative degree of similarity to the query sequence, to produce the prediction. When homologous sequences were used together with a neural network predictor, the accuracies exceeded those of current state-of-the-art techniques, reaching a Q3 accuracy of 92.28% and a Q8 accuracy of 88.79%.
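
The weighted combination described in this abstract can be illustrated with a small sketch. The abstract does not give the paper's weighting formula, so everything here is an assumption: the function names `combine_secondary_structure` and `q3_accuracy` are hypothetical, the weights stand in for similarity scores that a search tool such as PSI-BLAST might supply, and the neural network predictor is not modeled at all.

```python
from collections import defaultdict


def combine_secondary_structure(query_length, homologs):
    """Weighted per-position vote over homolog secondary structures.

    homologs: list of (weight, {query_position: state}) pairs, where
    weight is a hypothetical similarity score and the dict maps aligned
    query positions (0-based) to a state such as 'H', 'E', or 'C'.
    Positions covered by no homolog are reported as '-'.
    """
    votes = [defaultdict(float) for _ in range(query_length)]
    for weight, aligned_states in homologs:
        for position, state in aligned_states.items():
            if 0 <= position < query_length:
                votes[position][state] += weight
    return "".join(max(v, key=v.get) if v else "-" for v in votes)


def q3_accuracy(predicted, observed):
    """Fraction of positions whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Illustrative data only: three made-up homologs aligned to a 5-residue query.
homologs = [
    (0.9, {0: "H", 1: "H", 2: "H"}),
    (0.4, {1: "E", 2: "E", 3: "E"}),
    (0.3, {2: "E", 3: "C"}),
]
prediction = combine_secondary_structure(5, homologs)
```

With these made-up inputs the closest homolog dominates wherever it aligns, and the weaker homologs decide only the positions it does not cover.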

A Performance Comparison of Protein Profiles for the Prediction of Protein Secondary Structures (단백질 이차 구조 예측을 위한 단백질 프로파일의 성능 비교)

  • Chi, Sang-Mun
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.1 / pp.26-32 / 2018
  • Protein secondary structures are important information for studying the evolution, structure, and function of proteins. Recently, deep learning methods have been actively applied to predict secondary structure using only protein sequence information. In these methods, the widely used input features are protein profiles derived from protein sequences. In this paper, to obtain effective protein profiles, profiles were constructed using protein sequence search methods such as PSI-BLAST and HHblits. We adjusted the similarity threshold that determines which homologous sequences are used in constructing the profile, as well as the number of profile-construction iterations that use the homologous sequence information. We used the profiles as inputs to convolutional and recurrent neural networks to predict secondary structures. The profile created by adding evolutionary information only once was effective.
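
As a rough illustration of what a protein profile is, the sketch below turns a set of already aligned homologous sequences into per-position amino acid frequencies. This is a simplification and an assumption, not the paper's procedure: PSI-BLAST and HHblits build their profiles with sequence weighting, pseudocounts, and log-odds scoring, none of which is modeled here, and the name `frequency_profile` is hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def frequency_profile(aligned_sequences):
    """Return one 20-element frequency vector per alignment column.

    aligned_sequences: equal-length strings; gaps ('-') and non-standard
    letters are ignored when counting. A column with no standard residue
    yields an all-zero vector.
    """
    length = len(aligned_sequences[0])
    if any(len(s) != length for s in aligned_sequences):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for column in range(length):
        counts = Counter(
            s[column] for s in aligned_sequences if s[column] in AMINO_ACIDS
        )
        total = sum(counts.values())
        profile.append(
            [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]
        )
    return profile


# Illustrative alignment only: three made-up sequences of length 3.
profile = frequency_profile(["ACD", "ACE", "A-E"])
```

Each row of `profile` could then serve as the feature vector for one residue position when fed to a sequence model.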

A Protein Sequence Prediction Method by Mining Sequence Data (서열 데이타마이닝을 통한 단백질 서열 예측기법)

  • Cho, Sun-I;Lee, Do-Heon;Cho, Kwang-Hwi;Won, Yong-Gwan;Kim, Byoung-Ki
    • The KIPS Transactions:PartD / v.10D no.2 / pp.261-266 / 2003
  • A protein, which is a linear polymer of amino acids, is one of the most important bio-molecules composing biological structures and regulating bio-chemical reactions. Since the characteristics and functions of proteins are determined in principle by their amino acid sequences, protein sequence determination is the starting point for studying protein function. This paper proposes a protein sequence prediction method based on data mining techniques, which can overcome the limitations of previous bio-chemical sequencing methods. After applying multiple proteases to obtain overlapping protein fragments, candidate fragment sequences are identified by comparing fragment mass values against peptide databases. We propose a method that constructs a multi-partite graph and searches for maximal paths to determine the protein sequence by assembling suitable candidate sequences. In addition, experimental results based on the SWISS-PROT database that show the validity of the proposed method are presented.

Efficient Sequence Association Rule Mining for Discovering Protein Relations (단백질 서열 연관 규칙 마이닝을 위한 효율적인 알고리즘 설계)

  • Kim, Hyun-Min;Kim, Ji-Hye;Ramakrishna, R.S.
    • Proceedings of the Korea Information Processing Society Conference / 2002.04b / pp.1183-1186 / 2002
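  • Structural genomics, the successor to the genomics that focused on exploring DNA base sequences, continues to grow rapidly alongside the completion of the human genome map through genome projects and the development of bioinformatics that exploits accumulated biological data. In the post-genome era, research on protein structure and function has drawn attention as part of the effort toward an ultimate understanding of life phenomena. Given the vast volume of biological data to come, progress in the technologies associated with structure-determination tools and with databases for managing protein information is possible only through data mining techniques that extract valuable knowledge. This paper proposes an efficient sequence association rule algorithm that applies association rule mining, a core data mining technique, with the aim of comparing patterns and finding associations among protein sequences and DNA sequences for protein structure prediction. In addition, space and time complexity are handled through a data structure called the CMS-tree, which was developed so that the algorithm can serve as a base for scalable and parallel versions.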


Location and Nucleotide Sequence of the Bombyx mori Nuclear Polyhedrosis Virus Polyhedrin Gene (누에 핵다각체병 바이러스의 다각체 단백질 유전자의 위치 탐색 및 염기서열)

  • 우수동;김현욱;박범석;강석권;양재명;정인식
    • Journal of Sericultural and Entomological Science / v.34 no.2 / pp.20-25 / 1992
  • The location of the polyhedrin gene of Bombyx mori nuclear polyhedrosis virus (BmNPV) was determined by using a cloned polyhedrin gene from the Autographa californica nuclear polyhedrosis virus (AcNPV) as a hybridization probe. The 7.4 kb PstI DNA fragment of BmNPV was cloned into the plasmid vector pUC19. A fragment containing this gene was mapped, and its entire polyhedrin reading frame was sequenced. Comparison of the nucleotide sequence of the BmNPV polyhedrin gene with that previously reported by Iatrou (1985) revealed differences at 10 bases. Comparison of the amino acid sequences of the two structural genes revealed that the coding sequence differed at position 74 (valine to isoleucine), position 76 (asparagine to serine), and position 155 (methionine to valine).


Analysis of Eukaryotic Sequence Pattern using GenScan (GenScan을 이용한 진핵생물의 서열 패턴 분석)

  • Jung, Yong-Gyu;Lim, I-Suel;Cha, Byung-Heun
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.11 no.4 / pp.113-118 / 2011
  • Sequence homology analysis of biological molecules involves building databases through sorting and indexing, and it demonstrates the usefulness of informatics. In this paper, the Markov models in the GenScan program are used to translate the patterns of complex eukaryotic protein sequences. Searching for the minimum distance by exact calculation becomes infeasible because the complexity grows exponentially. A scoring matrix is used so that substitutions between similar amino acids receive differentiated scores, and a hidden Markov model with transition probabilities is applied. Combined with homology analysis using blastp, the Markov models provide a superior method for translating sequences, and the translated sequences are applied to the structure of secreted proteins.

Association Discovery Among Protein Motifs (단백질 모티프간 연관성 탐사)

  • Lee, Hyun-Suk;Lee, Do-Heon;Choi, Deok-Jai
    • Proceedings of the Korea Information Processing Society Conference / 2002.11c / pp.1827-1830 / 2002
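  • A protein motif is a pattern commonly found across multiple protein sequences with similar functions, and it serves as a clue for predicting protein function. More than 4,000 motifs are currently registered in databases such as Prosite and Pfam in the form of regular expressions, weight matrices, and hidden Markov models. This paper applies association discovery techniques to identify groups of strongly associated motifs from the Hits data, and verifies through cross-validation that these associations occur frequently in nature. The discovered associations among protein motifs are provided on the web through a trie search technique, making inference of protein function more accessible.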
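
Association discovery of this kind is commonly expressed through support and confidence. The sketch below computes both for motif pairs from a toy protein-to-motif table. It is an assumption about the general technique, not the paper's algorithm: the function name `motif_pair_rules`, the `min_support` threshold, and the sample data are all hypothetical, and the real Hits data format is not described in the abstract.

```python
from collections import Counter
from itertools import combinations


def motif_pair_rules(hits, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) tuples.

    hits: dict mapping a protein id to the motifs found in it.
    support    = fraction of proteins containing both motifs
    confidence = proteins with both motifs / proteins with the antecedent
    """
    n_proteins = len(hits)
    single_counts = Counter()
    pair_counts = Counter()
    for motifs in hits.values():
        unique = sorted(set(motifs))
        single_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n_proteins
        if support >= min_support:
            rules.append((a, b, support, count / single_counts[a]))
            rules.append((b, a, support, count / single_counts[b]))
    return sorted(rules)


# Illustrative data only: four made-up proteins and three made-up motifs.
hits = {
    "P1": ["M1", "M2"],
    "P2": ["M1", "M2", "M3"],
    "P3": ["M1"],
    "P4": ["M2", "M3"],
}
rules = motif_pair_rules(hits, min_support=0.5)
```

In this toy table every protein containing M3 also contains M2, so the rule M3 to M2 has confidence 1.0 even though its support is only 0.5.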


Purification and Properties of Ribosome-inactivating Proteins from the Leaves of Cucurbita moschata Duchesne (호박(Cucurbita moschata Duchesne) 잎에서 리보즘불활성화 단백질의 분리 및 특성)

  • Lee, Si-Myung;Kim, Yeong-Tae;Hwang, Young-Soo;Cho, Kang-Jin
    • Applied Biological Chemistry / v.40 no.5 / pp.375-379 / 1997
  • Two ribosome-inactivating proteins, PRIP 1 and PRIP 2, have been isolated from the leaves of Cucurbita moschata Duchesne. Crude extracts were purified through ammonium sulfate precipitation and column chromatography using DE-52 cellulose, S-Sepharose, FPLC Superose 12 HR and FPLC Mono-S. The molecular weights of PRIP 1 and PRIP 2 were 31,000 and 30,500, respectively. PRIP 2 was thermostable and maintained its activity even after incubation of the protein at $50^{\circ}C$ for 30 min. In a cell-free in vitro translation system using rabbit reticulocyte lysate, protein synthesis was inhibited by the addition of PRIP 1 and PRIP 2. The $IC_{50}$ values of PRIP 1 and PRIP 2 were 0.82 nM and 0.79 nM, respectively. Comparison of the N-terminal amino acid sequences of PRIP 1 and PRIP 2 with known RIPs revealed that PRIP 1 shows sequence similarity with Luffin B from Luffa cylindrica and Trichokirin from Trichosanthes kirilowii Maximowicz, and PRIP 2 has sequence similarity with Momordin II and MAP 30 from Momordica charantia.


Developing a Protein-chip for Depigmenting Agents Screening (미백제 스크리닝용 단백질칩의 개발)

  • Kim, Eun-Ki;Kwak, Eun-Young;Han, Jung-Sun;Lee, Hyang-Bok;Shin, Jung-Hyun
    • Journal of the Society of Cosmetic Scientists of Korea / v.31 no.1 s.49 / pp.13-16 / 2005
  • To build a high-throughput screening (HTS) system for depigmenting agents using a protein chip, the effect of the oligonucleotide-inhibitor sequence on the binding of Mitf protein to the E-box of MC1R was investigated. The sequence of the oligonucleotide inhibitor affected the binding of the target DNA to Mitf, depending on the location of the sequence variation in the inhibitor nucleotide. The oligonucleotide inhibitor in which the CATGTG sequence was changed did not sufficiently inhibit the binding of the target DNA to Mitf, whereas significant inhibition was observed when the sequence outside CATGTG was changed. This result indicates that CATGTG is the crucial sequence for the binding of Mitf to the E-box, which initiates the transcription of pigmenting genes.

HPV Risk Classification Using Kernel Based Learning (Kernel 기반 학습을 이용한 HPV의 위험군 분류)

  • 정제균;오석준;장병탁
    • Proceedings of the Korean Information Science Society Conference / 2003.04c / pp.428-430 / 2003
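  • Human papillomavirus (HPV) is a small DNA virus that can cause various malignant tumors upon infection. Infection with HPV types in the high-risk group is likely to progress to cancer. This paper proposes a machine learning technique for classifying HPV. The proposed technique is based on kernel methods that can classify protein sequences effectively. Risk classification can provide important information for understanding the mechanism of infection and for developing new medical tools such as gene chips. Experimental results show that the kernel-based learning method, guided by a search for important regions, achieves strong performance.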
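
The abstract does not say which kernel was used. As one well-known example of a kernel defined directly on protein sequences, the sketch below implements a k-mer spectrum kernel, which scores two sequences by the k-mers they share. Treat it purely as an illustration of the idea: the function names and the choice of `k` are assumptions, and the paper's actual kernel and its search for important regions are not modeled.

```python
import math
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(seq_a, seq_b, k=3):
    """Inner product of the two k-mer count vectors."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


def normalized_spectrum_kernel(seq_a, seq_b, k=3):
    """Cosine-normalized kernel in [0, 1]; 0.0 if either sequence has no k-mers."""
    norm = math.sqrt(
        spectrum_kernel(seq_a, seq_a, k) * spectrum_kernel(seq_b, seq_b, k)
    )
    return spectrum_kernel(seq_a, seq_b, k) / norm if norm else 0.0


# Illustrative strings only, not real HPV protein sequences.
shared = spectrum_kernel("ABCABC", "ABCD", k=3)
```

A kernel matrix built this way could be passed to any kernel-based classifier, such as a support vector machine, to separate high-risk from low-risk types.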
