• Title/Abstract/Keywords: Protein identification


A Method for Protein Identification Based on MS/MS Using Probabilistic Graphical Models

  • 이홍란;황규백
    • Korean Institute of Information Scientists and Engineers (KIISE): Conference Proceedings / Proceedings of the KIISE Korea Computer Congress 2012, Vol.39 No.1(B) / pp.426-428 / 2012
  • To identify the proteins present in a biological sample, the sample is separated and analyzed in sequential steps: protein purification and digestion, peptide fragmentation by tandem mass spectrometry (MS/MS), peptide identification, and protein identification. Widely used methods for protein identification take probabilistic approaches, such as ProteinProphet and BaysPro. However, they do not consider how peptide identification probabilities differ with peptide length. Here, we propose a probabilistic graphical model-based approach to protein identification from MS/MS data that considers peptide identification probabilities, the number of sibling peptides, and peptide length. We compared our approach with ProteinProphet using a yeast MS/MS dataset. Our model identified 27 more proteins than ProteinProphet at a false discovery rate (FDR) of 1%, confirming the importance of peptide length information in protein identification.
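
The step from peptide-level probabilities to a protein-level score can be illustrated with the simplest combination rule: a protein is counted as present unless every one of its peptide identifications is wrong. This is only a minimal sketch under an independence assumption, not the graphical model proposed in the paper. The function name and the example probabilities are hypothetical, and the adjustments for sibling peptides and peptide length described in the abstract are left out.

```python
from math import prod


def protein_probability(peptide_probs):
    """Probability that a protein is present, treating each peptide
    identification as independent evidence (noisy-OR combination)."""
    return 1.0 - prod(1.0 - p for p in peptide_probs)


# Hypothetical peptide identification probabilities for two proteins.
evidence = {
    "protein_A": [0.9, 0.8],  # two confident peptides
    "protein_B": [0.3],       # a single weak peptide
}

scores = {name: protein_probability(ps) for name, ps in evidence.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Two moderately confident peptides reinforce each other here, while a single weak peptide leaves the protein score at that peptide's own probability. That is why per-peptide factors such as length can change which proteins pass a fixed FDR threshold.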

Protein Named Entity Identification Based on Probabilistic Features Derived from GENIA Corpus and Medical Text on the Web

  • Sumathipala, Sagara;Yamada, Koichi;Unehara, Muneyuki;Suzuki, Izumi
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 15, No. 2 / pp.111-120 / 2015
  • Protein named entity identification is an essential prerequisite for extracting information about protein-protein interactions from biomedical literature. In this paper, we explore the use of abstracts of biomedical literature in MEDLINE for protein name identification and present the results of our experiments. We present a robust and effective approach to classifying biomedical named entities into protein and non-protein classes, based on a rich set of features: orthographic, keyword, morphological, and newly introduced Protein-Score features. In experiments on the GENIA corpus using Random Forest, our procedure achieved its highest values of 92.7% precision, 91.7% recall, and 92.2% F-measure for protein identification, while significantly reducing training and testing time.
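
The three reported scores can be checked against each other. The sketch below assumes the F-measure is the balanced F1 score, that is, the harmonic mean of precision and recall; the abstract does not state which variant was used, and the function name is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
f1 = f_measure(92.7, 91.7)
print(round(f1, 1))  # 92.2
```

Under that assumption the reported precision and recall reproduce the reported F-measure.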

Informatics for protein identification by tandem mass spectrometry; Focused on two most-widely applied algorithms, Mascot and SEQUEST

  • Sohn, Chang-Ho;Jung, Jin-Woo;Kang, Gum-Yong;Kim, Kwang-Pyo
    • Bioinformatics and Biosystems / Vol. 1, No. 2 / pp.89-94 / 2006
  • Mass spectrometry (MS) is widely applied in high-throughput proteomics analysis. Large-scale proteome analysis experiments generate massive amounts of data. To search these proteomics data against protein databases, fully automated database search algorithms such as Mascot and SEQUEST are routinely employed. At present, it is critical to reduce false positives and false negatives in such analyses. In this review, we focus on automated protein identification using tandem mass spectrometry (MS/MS) spectra and on the validation of protein identifications made by the two most common automated protein identification algorithms, Mascot and SEQUEST.


Proteomic Identification of Proteins Interacting with a Dual Specificity Protein Phosphatase, VHZ

  • Kim, Jae-Hoon;Jeong, Dae-Gwin
    • Journal of Applied Biological Chemistry / Vol. 50, No. 2 / pp.58-62 / 2007
  • Identifying the substrates of dual-specificity protein phosphatases (DSPs) is essential for revealing the physiological roles of DSPs. We isolated VHZ-interacting proteins from extracts of 293T cells overexpressing a VHZ (C95S, D65A) mutant known to be a substrate-trapping mutant. Analysis of the proteins specifically bound to VHZ by 2D gel electrophoresis and mass spectrometry revealed that they included Chaperonin containing TCP1, Type II phosphatidylinositol phosphate kinase γ, Intraflagellar transport 80 homolog, and Kinesin superfamily protein 1B. These VHZ-interacting proteins indicate that VHZ is involved in many important cellular signaling pathways, such as protein folding, molecular transport, and tumor suppression.

Evaluation of the Redundancy in Decoy Database Generation for Tandem Mass Analysis

  • 이홍란;류단휘;이기욱;황규백
    • KIISE Transactions on Computing Practices / Vol. 22, No. 1 / pp.56-60 / 2016
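  • In tandem mass spectrometry, decoy databases, built by rearranging the reference protein sequences of the target database, are commonly used for reliable peptide identification. However, redundant peptides with identical sequences can exist between the target and decoy databases or within the decoy database itself, and they make protein identification more difficult. Minimizing the redundancy of the decoy database is therefore an important problem. In this paper, we investigated how pseudo-shuffling and pseudo-reversing, two widely used methods for generating decoy databases, affect the redundancy of the decoy database. Our experiments confirmed that redundancy increases with the size of the target database and with the maximum number of missed cleavage sites allowed during database generation. In addition, under identical conditions, the pseudo-reversing method always produced decoy databases with lower redundancy than pseudo-shuffling.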
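
Pseudo-reversing and the redundancy it can leave behind can be sketched in a few lines. The code below is an illustration under stated assumptions, not the paper's procedure. It digests proteins with a simple trypsin rule (cleave after K or R unless the next residue is P, with no missed cleavages), reverses each peptide while keeping its C-terminal residue in place, and measures redundancy as the fraction of distinct decoy peptides that also occur among the target peptides. That redundancy definition, the function names, and the example sequence are all hypothetical, and pseudo-shuffling is omitted.

```python
def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def pseudo_reverse(peptide):
    """Reverse a peptide while keeping its C-terminal residue in place."""
    return peptide[:-1][::-1] + peptide[-1]


def decoy_redundancy(proteins):
    """Fraction of distinct decoy peptides that also appear as target peptides."""
    targets = {p for protein in proteins for p in tryptic_peptides(protein)}
    decoys = {pseudo_reverse(p) for p in targets}
    return len(decoys & targets) / len(decoys)


# Hypothetical target sequence: "AK" and "GGR" are unchanged by pseudo-reversing.
print(tryptic_peptides("ACDKAKGGR"))    # ['ACDK', 'AK', 'GGR']
print(pseudo_reverse("ACDK"))           # DCAK
print(decoy_redundancy(["ACDKAKGGR"]))  # 2 of the 3 decoy peptides are redundant
```

Short or palindromic peptides are unchanged by the reversal, so they collide with their own targets.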

A Machine Learning Based Method for the Prediction of G Protein-Coupled Receptor-Binding PDZ Domain Proteins

  • Eo, Hae-Seok;Kim, Sungmin;Koo, Hyeyoung;Kim, Won
    • Molecules and Cells / Vol. 27, No. 6 / pp.629-634 / 2009
  • G protein-coupled receptors (GPCRs) are part of multi-protein networks called 'receptosomes'. The GPCR-interacting proteins (GIPs) in the receptosomes control the targeting, trafficking, and signaling of GPCRs. PDZ domain proteins constitute the largest protein family among the GIPs, and their predominant function is to bring signaling pathway components into close proximity by recognizing the last four C-terminal amino acids of GPCRs. We present here a machine learning-based approach for the identification of GPCR-binding PDZ domain proteins. To characterize the network of interactions between amino acid residues that contribute to the stability of the PDZ domain-ligand complex, and to encode the complex as a feature vector, we constructed amino acid contact matrices and a physicochemical distance matrix. This novel machine learning-based method displayed high performance in identifying PDZ domain-ligand interactions and allowed the identification of novel GPCR-PDZ domain protein interactions.

Optimization of Automated Suspension Trapping Digestion in Bottom-Up Proteomics via Mass Spectrometry

  • Haneul Song;Yejin Jeon;Iyun Choi;Minjoong Joo;Jong-Moon Park;Hookeun Lee
    • Mass Spectrometry Letters / Vol. 15, No. 1 / pp.62-68 / 2024
  • The Suspension Trapping (S-Trap) method has been a prominent sample preparation technique since its introduction in 2014. Its capacity to induce protein aggregation using organic solvents has significantly improved protein purification and facilitated peptide identification. However, until recently its potential for automation was limited by the lack of a suitable liquid handling system. In this study, we aimed to enhance the automation of S-Trap sample preparation by optimizing the S-Trap digestion process, incorporating triethylammonium bicarbonate (TEAB) and CaCl2. Using TEAB buffer conditions in this process led to a 12% improvement in protein identification. Additionally, by examining various incubation conditions, we streamlined the entire sample preparation workflow into a 4-hour timeline covering the reduction, alkylation, and trypsin incubation stages. This refined and expedited automated S-Trap digestion process was not only time-efficient but also improved trypsin digestion, resulting in increased protein identification.

Proteomic Studies in Plants

  • Park, Ohk-Mae K.
    • BMB Reports / Vol. 37, No. 1 / pp.133-138 / 2004
  • Proteomics is a leading technology for the high-throughput analysis of proteins on a genome-wide scale. With the completion of genome sequencing projects and the development of analytical methods for protein characterization, proteomics has become a major field of functional genomics. The initial objective of proteomics was the large-scale identification of all protein species in a cell or tissue. Its applications are now being extended to the analysis of functional aspects of proteins such as post-translational modifications, protein-protein interactions, activities, and structures. Whereas proteomics research is quite advanced in animals, yeast, and Escherichia coli, plant proteomics is only in its initial phase. Major studies in plant proteomics have addressed subcellular proteomes and protein complexes (e.g. proteins in the plasma membranes, chloroplasts, mitochondria, and nuclei). Here, several plant proteomics studies are presented, followed by recent work using multidimensional protein identification technology (MudPIT).

LC-MS/MS Analysis of Surface Layer Proteins as a Useful Method for the Identification of Lactobacilli from the Lactobacillus acidophilus Group

  • Podlesny, Marcin;Jarocki, Piotr;Komon, Elwira;Glibowska, Agnieszka;Targonski, Zdzislaw
    • Journal of Microbiology and Biotechnology / Vol. 21, No. 4 / pp.421-429 / 2011
  • For precise identification of a Lactobacillus K1 isolate, LC-MS/MS analysis of the putative surface layer protein was performed. The results obtained from LTQ-FT-ICR mass spectrometry confirmed that the analyzed protein spot is a surface layer protein originating from the species Lb. helveticus. Moreover, the identified protein has the highest similarity to the surface layer protein from Lb. helveticus R0052. To evaluate the proteomic study, multilocus sequence analysis of selected housekeeping gene sequences was performed. Combining 16S rRNA sequencing with partial sequences of the genes encoding the RNA polymerase alpha subunit (rpoA), phenylalanyl-tRNA synthetase alpha subunit (pheS), translational elongation factor Tu (tuf), and Hsp60 chaperonins (groEL) also allowed the analyzed isolate to be classified as Lb. helveticus. Further classification at the strain level was achieved by sequencing the slp gene. This gene showed 99.8% identity with the corresponding slp gene of Lb. helveticus R0052, which is in good agreement with the data obtained by nano-HPLC coupled to an LTQ-FT-ICR mass spectrometer. Finally, LC-MS/MS analysis of surface layer proteins extracted from three other Lactobacillus strains showed that the proposed method is an appropriate molecular tool for identifying S-layer-possessing lactobacilli at the species and even strain levels.

Identification and Characterization of pH-Regulated Genes in Saccharomyces cerevisiae

  • Hong, Sung-Ki;Choi, Eui-Yul
    • Journal of Microbiology / Vol. 34, No. 4 / pp.327-333 / 1996
  • Yeast, like many other microbes, encounters large variations in ambient pH in its natural environments. Microorganisms capable of growing over a wide pH range require a versatile, efficient pH homeostatic mechanism that protects intracellular processes against extremes of pH. In several organisms, fusions to the bacterial lacZ gene have been extremely useful for identifying genes expressed at different times during the life cycle or under different growth conditions. In this study, using the lacZ gene screening system, we surveyed a large number of yeast strains carrying lacZ insertions to identify genes regulated by pH. A yeast genomic library was constructed, and lacZ was inserted into it by a shuttle mutagenesis procedure. The yeast transformants were individually picked with a toothpick, replica-plated, and grown in alkaline pH medium. Among the 35,000 colonies screened, 10 candidate strains were initially identified by the β-gal assay. We finally confirmed two yeast strains carrying genes whose expression is strictly dependent on the pH of the growth medium. One of the fusions, which showed a 10-fold induction in expression level in response to alkaline pH, was selected and further characterized. The pH-regulated gene was cloned by inverse PCR, and a partial sequence of the gene was determined. Identification and characterization of the gene are currently under investigation.
