• Title/Summary/Keyword: Protein identification


A Method for Protein Identification Based on MS/MS using Probabilistic Graphical Models (확률그래프모델을 이용한 MS/MS 기반 단백질 동정 기법)

  • Li, Hong-Lan;Hwang, Kyu-Baek
    • Proceedings of the Korean Information Science Society Conference / 2012.06b / pp.426-428 / 2012
  • To identify the proteins present in a biological sample, the sample is processed in a sequence of steps: protein purification and digestion, peptide fragmentation by tandem mass spectrometry (MS/MS), peptide identification, and protein identification. Widely used methods for protein identification are based on probabilistic approaches, such as ProteinProphet and BaysPro. However, these methods do not consider how peptide identification probabilities differ with peptide length. Here, we propose a probabilistic graphical model-based approach to protein identification from MS/MS data that considers peptide identification probabilities, the number of sibling peptides, and peptide length. We compared our approach with ProteinProphet on a yeast MS/MS dataset. Our model identified 27 more proteins than ProteinProphet at a 1% false discovery rate (FDR), confirming the importance of peptide length information in protein identification.
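
As a rough illustration of the probabilistic idea behind such tools, the sketch below combines peptide identification probabilities into a protein-level probability as the chance that at least one supporting peptide is correct. It is a simplified baseline under an independence assumption, not the graphical model proposed in the paper and not ProteinProphet's full procedure; the function name `protein_probability` is hypothetical.

```python
from math import prod


def protein_probability(peptide_probs):
    """Probability that at least one peptide identification is correct.

    Assumes (for illustration only) that the peptide identifications are
    independent. Sibling-peptide counts and peptide length, which the
    abstract discusses, are deliberately left out of this baseline.
    """
    for p in peptide_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    # 1 - P(all supporting peptide identifications are wrong)
    return 1.0 - prod(1.0 - p for p in peptide_probs)


if __name__ == "__main__":
    # Two peptides map to the same protein with probabilities 0.9 and 0.5.
    print(protein_probability([0.9, 0.5]))
```

A model that accounts for peptide length would adjust each peptide probability before this combination step; how that adjustment is made is specific to the paper and is not reproduced here.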

Protein Named Entity Identification Based on Probabilistic Features Derived from GENIA Corpus and Medical Text on the Web

  • Sumathipala, Sagara;Yamada, Koichi;Unehara, Muneyuki;Suzuki, Izumi
    • International Journal of Fuzzy Logic and Intelligent Systems / v.15 no.2 / pp.111-120 / 2015
  • Protein named entity identification is one of the most fundamental prerequisites for extracting information about protein-protein interactions from biomedical literature. In this paper, we explore the use of abstracts of biomedical literature in MEDLINE for protein name identification and present the results of our experiments. We present a robust and effective approach to classifying biomedical named entities into protein and non-protein classes, based on a rich set of features: orthographic, keyword, morphological, and newly introduced Protein-Score features. In experiments on the GENIA corpus using Random Forest, our procedure achieved its highest values of 92.7% precision, 91.7% recall, and 92.2% F-measure for protein identification, while significantly reducing training and testing time.
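
The reported F-measure is consistent with the balanced F1 score, the harmonic mean of precision and recall. The short check below assumes that definition (the abstract does not state which F-measure variant was used); `f_measure` is a hypothetical helper name.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall as reported in the abstract, in percent.
    print(round(f_measure(92.7, 91.7), 1))  # 92.2
```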

Informatics for protein identification by tandem mass spectrometry; Focused on two most-widely applied algorithms, Mascot and SEQUEST

  • Sohn, Chang-Ho;Jung, Jin-Woo;Kang, Gum-Yong;Kim, Kwang-Pyo
    • Bioinformatics and Biosystems / v.1 no.2 / pp.89-94 / 2006
  • Mass spectrometry (MS) is widely applied in high-throughput proteomics analysis. Large-scale proteome analysis experiments generate massive amounts of data. To search these data against protein databases, fully automated database search algorithms such as Mascot and SEQUEST are routinely employed. Reducing false positives and false negatives in such analyses is critical. This review focuses on automated protein identification from tandem mass spectrometry (MS/MS) spectra and on validation of the protein identifications produced by the two most common automated protein identification algorithms, Mascot and SEQUEST.


Proteomic Identification of Proteins Interacting with a Dual Specificity Protein Phosphatase, VHZ

  • Kim, Jae-Hoon;Jeong, Dae-Gwin
    • Journal of Applied Biological Chemistry / v.50 no.2 / pp.58-62 / 2007
  • Identifying the substrates of dual-specificity protein phosphatases (DSPs) is essential for revealing the physiological roles of DSPs. We isolated VHZ-interacting proteins from extracts of 293T cells overexpressing a VHZ (C95S, D65A) mutant known to be a substrate-trapping mutant. Analysis of the proteins specifically bound to VHZ by 2D gel electrophoresis and mass spectrometry identified chaperonin containing TCP1, type II phosphatidylinositol phosphate kinase $\gamma$, intraflagellar transport 80 homolog, and kinesin superfamily protein 1B. These VHZ-interacting proteins suggest that VHZ is involved in many important cellular signaling pathways, such as protein folding, molecular transport, and tumor suppression.

Evaluation of the Redundancy in Decoy Database Generation for Tandem Mass Analysis (탠덤 질량 분석을 위한 디코이 데이터베이스 생성 방법의 중복성 관점에서의 성능 평가)

  • Li, Honglan;Liu, Duanhui;Lee, Kiwook;Hwang, Kyu-Baek
    • KIISE Transactions on Computing Practices / v.22 no.1 / pp.56-60 / 2016
  • Peptide identification in tandem mass spectrometry is usually done by searching the spectra against target databases consisting of reference protein sequences. To control false discovery rates for high-confidence peptide identification, spectra are also searched against decoy databases constructed by permuting the reference protein sequences. In this case, a peptide with the same sequence can appear in both the target and the decoy database, or the decoy database can contain multiple entries of the same peptide. Both kinds of redundancy complicate protein identification, so minimizing the number of such redundant peptides is important for accurate protein identification. We therefore examined two popular methods for decoy database generation: 'pseudo-shuffling' and 'pseudo-reversing'. We experimented with target databases of varying sizes and investigated the effect of the maximum number of missed cleavage sites allowed in a peptide (MC), which is one of the parameters for target and decoy database generation. In our experiments, the level of redundancy in decoy databases was proportional to the target database size and the value of MC, due to the increase in the number of short peptides (7 to 10 amino acids). Moreover, 'pseudo-reversing' always generated decoy databases with lower levels of redundancy than 'pseudo-shuffling'.
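
The two decoy strategies and the two kinds of redundancy can be sketched as follows. This is a minimal illustration, assuming the common convention that both methods permute a tryptic peptide while keeping its C-terminal residue in place, and assuming trypsin cleaves after K or R unless the next residue is P; the abstract does not spell out these details, and all function names are hypothetical.

```python
import random


def cleave(protein):
    """Split a protein after K or R, except when the next residue is P."""
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])
    return pieces


def tryptic_peptides(protein, max_missed=0):
    """Enumerate peptides with at most `max_missed` missed cleavages (MC)."""
    pieces = cleave(protein)
    peptides = []
    for i in range(len(pieces)):
        last = min(i + max_missed, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def pseudo_reverse(peptide):
    """Reverse all residues except the C-terminal one."""
    return peptide[:-1][::-1] + peptide[-1:]


def pseudo_shuffle(peptide, rng):
    """Shuffle all residues except the C-terminal one."""
    body = list(peptide[:-1])
    rng.shuffle(body)
    return "".join(body) + peptide[-1:]


def redundancy(targets, decoys):
    """Return (distinct decoys also in the target set, duplicate decoy entries)."""
    target_set = set(targets)
    shared = sum(1 for d in set(decoys) if d in target_set)
    duplicates = len(decoys) - len(set(decoys))
    return shared, duplicates


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the shuffle is repeatable
    targets = tryptic_peptides("AKGRACDKC", max_missed=1)
    reversed_decoys = [pseudo_reverse(p) for p in targets]
    shuffled_decoys = [pseudo_shuffle(p, rng) for p in targets]
    print(redundancy(targets, reversed_decoys))
    print(redundancy(targets, shuffled_decoys))
```

Short peptides such as `AK` are unchanged by either permutation, which is one way a decoy can coincide with a target sequence; this toy example is not meant to reproduce the paper's measurements.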

A Machine Learning Based Method for the Prediction of G Protein-Coupled Receptor-Binding PDZ Domain Proteins

  • Eo, Hae-Seok;Kim, Sungmin;Koo, Hyeyoung;Kim, Won
    • Molecules and Cells / v.27 no.6 / pp.629-634 / 2009
  • G protein-coupled receptors (GPCRs) are part of multi-protein networks called 'receptosomes'. The GPCR-interacting proteins (GIPs) in the receptosomes control the targeting, trafficking, and signaling of GPCRs. PDZ domain proteins constitute the largest protein family among the GIPs, and their predominant function is to bring signaling pathway components into close proximity by recognizing the last four C-terminal amino acids of GPCRs. We present a machine learning-based approach for the identification of GPCR-binding PDZ domain proteins. To characterize the network of interactions between the amino acid residues that contribute to the stability of the PDZ domain-ligand complex, and to encode the complex as a feature vector, we constructed amino acid contact matrices and a physicochemical distance matrix. This novel method showed high performance in identifying PDZ domain-ligand interactions and enabled the identification of novel GPCR-PDZ domain protein interactions.

Optimization of Automated Suspension Trapping Digestion in Bottom-Up Proteomics via Mass Spectrometry

  • Haneul Song;Yejin Jeon;Iyun Choi;Minjoong Joo;Jong-Moon Park;Hookeun Lee
    • Mass Spectrometry Letters / v.15 no.1 / pp.62-68 / 2024
  • The Suspension Trapping (S-Trap) method has been a prominent sample preparation technique since its introduction in 2014. Its capacity to induce protein aggregation using organic solvents has significantly improved protein purification and facilitated peptide identification. Until recently, however, its potential for automation was limited by the lack of a suitable liquid handling system. In this study, we aimed to enhance the automation of S-Trap sample preparation by optimizing the S-Trap digestion process, incorporating triethylammonium bicarbonate (TEAB) and CaCl2. Using TEAB buffer conditions led to a 12% improvement in protein identification. Additionally, by examining various incubation conditions, we shortened the entire sample preparation workflow to a 4-hour timeline covering the reduction, alkylation, and trypsin incubation stages. The refined, faster automated S-Trap digestion process was not only time-efficient but also improved trypsin digestion, resulting in increased protein identification.

Proteomic Studies in Plants

  • Park, Ohk-Mae K.
    • BMB Reports / v.37 no.1 / pp.133-138 / 2004
  • Proteomics is a leading technology for the high-throughput analysis of proteins on a genome-wide scale. With the completion of genome sequencing projects and the development of analytical methods for protein characterization, proteomics has become a major field of functional genomics. The initial objective of proteomics was the large-scale identification of all protein species in a cell or tissue. Its applications are now being extended to the analysis of various functional aspects of proteins, such as post-translational modifications, protein-protein interactions, activities, and structures. Whereas proteomics research is quite advanced in animals, yeast, and Escherichia coli, plant proteomics is only in its initial phase. Major plant proteomics studies have been reported on subcellular proteomes and protein complexes (e.g., proteins in the plasma membranes, chloroplasts, mitochondria, and nuclei). Here, several plant proteomics studies are presented, followed by a recent study using multidimensional protein identification technology (MudPIT).

LC-MS/MS Analysis of Surface Layer Proteins as a Useful Method for the Identification of Lactobacilli from the Lactobacillus acidophilus Group

  • Podlesny, Marcin;Jarocki, Piotr;Komon, Elwira;Glibowska, Agnieszka;Targonski, Zdzislaw
    • Journal of Microbiology and Biotechnology / v.21 no.4 / pp.421-429 / 2011
  • For precise identification of a Lactobacillus K1 isolate, LC-MS/MS analysis of the putative surface layer protein was performed. The results obtained from LTQ-FT-ICR mass spectrometry confirmed that the analyzed protein spot is a surface layer protein originating from Lb. helveticus. Moreover, the identified protein has the highest similarity to the surface layer protein from Lb. helveticus R0052. To evaluate the proteomic study, multilocus sequence analysis of selected housekeeping gene sequences was performed. Combining 16S rRNA sequencing with partial sequences of the genes encoding the RNA polymerase alpha subunit (rpoA), phenylalanyl-tRNA synthetase alpha subunit (pheS), translational elongation factor Tu (tuf), and Hsp60 chaperonins (groEL) also allowed the analyzed isolate to be classified as Lb. helveticus. Further classification at the strain level was achieved by sequencing the slp gene. This gene showed 99.8% identity with the corresponding slp gene of Lb. helveticus R0052, which is in good agreement with the data obtained by nano-HPLC coupled to an LTQ-FT-ICR mass spectrometer. Finally, LC-MS/MS analysis of surface layer proteins extracted from three other Lactobacillus strains showed that the proposed method is an appropriate molecular tool for identifying S-layer-possessing lactobacilli at the species and even strain levels.

Identification and Characterization of pH-Regulated Genes in Saccharomyces cerevisiae

  • Hong, Sung-Ki;Choi, Eui-Yul
    • Journal of Microbiology / v.34 no.4 / pp.327-333 / 1996
  • Yeast, like many other microbes, encounters large variations in ambient pH in its natural environments. Microorganisms capable of growing over a wide pH range require a versatile, efficient pH homeostatic mechanism that protects intracellular processes against extremes of pH. In several organisms, fusions to the bacterial lacZ gene have been extremely useful for identifying genes expressed at different times during the life cycle or under different growth conditions. In this study, using the lacZ gene screening system, we surveyed a large number of yeast strains with lacZ insertions to identify genes regulated by pH. A yeast genomic library was constructed, and lacZ was inserted by a shuttle mutagenesis procedure. The yeast transformants were individually picked with a toothpick, replica-plated, and grown in alkaline pH medium. Among the 35,000 colonies screened, 10 candidate strains were initially identified by the $\beta$-gal assay. We finally confirmed two yeast strains carrying genes whose expression is strictly dependent on the pH of the growth medium. One of the fusions, which showed a 10-fold induction in expression level in response to alkaline pH, was selected and further characterized. The pH-regulated gene was cloned by inverse PCR, and a partial sequence of the gene was determined. Identification and characterization of the gene are currently under investigation.
