Theoretical Peptide Mass Distribution in the Non-Redundant Protein Database of the NCBI

  • Lim Da-Jeong;Oh Hee-Seok;Kim Hee-Bal
    • Genomics & Informatics, v.4 no.2, pp.65-70, 2006
  • Peptide mass mapping matches experimentally generated peptide masses against the predicted masses of digested proteins contained in a database. Peptide mass fingerprinting uses this matching to identify proteins: the masses of a protein's constituent fragments are compared with the theoretical peptide masses generated from a protein database. It is therefore important to know the theoretical mass distribution of the database, yet few studies have reported it. We analyzed the peptide mass distribution of the NCBI non-redundant protein sequence database after digestion with 15 different enzymes. To characterize the peptide mass distribution for each digestion enzyme, a power-law distribution (Zipf's law) was applied. After simulating digestion of the protein database, a rank-frequency plot of the peptide fragments was used to fit a Zipf's law curve for every enzyme. Our data appear to fit Zipf's law with statistically significant parameter values.
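
The workflow in this abstract (simulated digestion, peptide masses, rank-frequency fit) can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes trypsin only (the paper uses 15 enzymes), no missed cleavages, monoisotopic masses binned to the nearest dalton, and an ordinary least-squares fit on log-log axes. The function names `digest_trypsin`, `peptide_mass`, `mass_histogram`, and `zipf_fit` are hypothetical.

```python
import math
import re
from collections import Counter

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def digest_trypsin(seq):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mass_histogram(proteins):
    """Count peptides per integer-dalton mass bin (binning is an assumption)."""
    bins = Counter()
    for protein in proteins:
        for pep in digest_trypsin(protein):
            if all(aa in RESIDUE_MASS for aa in pep):  # skip X, B, Z, ...
                bins[round(peptide_mass(pep))] += 1
    return bins


def zipf_fit(counts):
    """Fit frequency = C / rank**s by least squares on log-log axes.

    Needs at least two bins. Returns (s, C).
    """
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, math.exp(mean_y - slope * mean_x)
```

With a real database, `zipf_fit(mass_histogram(sequences))` would give an exponent and scale for one enzyme; a significance test on those parameters, as reported in the abstract, is not included here.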

Evaluation of the Redundancy in Decoy Database Generation for Tandem Mass Analysis

  • Li, Honglan;Liu, Duanhui;Lee, Kiwook;Hwang, Kyu-Baek
    • KIISE Transactions on Computing Practices, v.22 no.1, pp.56-60, 2016
  • Peptide identification in tandem mass spectrometry is usually done by searching the spectra against target databases consisting of reference protein sequences. To control false discovery rates for high-confidence peptide identification, spectra are also searched against decoy databases constructed by permuting the reference protein sequences. In this case, a peptide with the same sequence can appear in both the target and the decoy databases, or the same peptide can have multiple entries in the decoy database. Both cases complicate protein identification, so it is important to minimize the number of such redundant peptides for accurate protein identification. In this regard, we examined two popular methods for decoy database generation: 'pseudo-shuffling' and 'pseudo-reversing'. We experimented with target databases of varying sizes and investigated the effect of the maximum number of missed cleavage sites allowed in a peptide (MC), which is one of the parameters for target and decoy database generation. In our experiments, the level of redundancy in decoy databases was proportional to the target database size and the value of MC, due to the increase in the number of short peptides (7 to 10 AA). Moreover, 'pseudo-reversing' always generated decoy databases with lower levels of redundancy than 'pseudo-shuffling'.
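
The two kinds of redundancy described here can be counted with a short Python sketch. It rests on assumptions the abstract does not spell out: 'pseudo-reversing' is taken to mean reversing each tryptic peptide while keeping its C-terminal K or R in place, cleavage is simplified to "after every K or R", MC is fixed at 0, and only peptides of at least 7 residues are counted. The names `digest`, `pseudo_reverse`, and `redundancy` are hypothetical.

```python
import re


def digest(seq):
    """Simplified tryptic digest: cleave after every K or R (MC = 0)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+$", seq)


def pseudo_reverse(protein):
    """Reverse each peptide, keeping a C-terminal K/R fixed (assumed definition)."""
    parts = []
    for pep in digest(protein):
        if pep[-1] in "KR":
            parts.append(pep[:-1][::-1] + pep[-1])
        else:
            parts.append(pep[::-1])
    return "".join(parts)


def redundancy(targets, min_len=7):
    """Return (decoy peptides also found in target, duplicate decoy entries)."""
    target_peps = {
        p for prot in targets for p in digest(prot) if len(p) >= min_len
    }
    decoy_list = [
        p
        for prot in targets
        for p in digest(pseudo_reverse(prot))
        if len(p) >= min_len
    ]
    decoy_set = set(decoy_list)
    shared = len(decoy_set & target_peps)
    duplicates = len(decoy_list) - len(decoy_set)
    return shared, duplicates
```

For example, the peptide `ACDEDCAK` is palindromic once its terminal K is held fixed, so its decoy equals the target peptide and is counted as shared. A 'pseudo-shuffling' variant could be compared by swapping the reversal for a random permutation of the same residues.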

Proteomics Data Analysis using Representative Database

  • Kwon, Kyung-Hoon;Park, Gun-Wook;Kim, Jin-Young;Park, Young-Mok;Yoo, Jong-Shin
    • Bioinformatics and Biosystems, v.2 no.2, pp.46-51, 2007
  • In proteomics research using mass spectrometry, a protein database search provides protein information from the peptide sequences that best match the tandem mass spectra. The protein sequence database has been a powerful knowledge base for this protein identification. However, as protein sequence information accumulates, the database becomes very large, and it is now hard to consider every protein sequence in a database search because doing so consumes too much computing time. For high-throughput analysis of the proteome, a refined non-redundant database, such as the IPI human database of the European Bioinformatics Institute, has usually been used. While a non-redundant database returns search results quickly, it misses the variation among protein sequences. In this study, we examined proteomics data from the standpoint of protein similarity and used a network analysis tool to build a new analysis method. This method should save computing time in the database search while retaining the sequence variation needed to detect modified peptides.

Proteomic analysis of Korean mothers' human milk at different lactation stages; postpartum 1, 3, and 6 weeks

  • Park, Jong-Moon;Lee, Hookeun;Song, Seunghyun;Hahn, Won-Ho;Kim, Mijeong;Lee, Joohyun;Kang, Nam Mi
    • Analytical Science and Technology, v.30 no.6, pp.348-354, 2017
  • In this study, patterns of proteome expression were monitored, and specifically expressed proteins were detected, in human milk collected 1, 3, and 6 weeks after delivery. A quantitative shotgun proteomic approach was used to identify human milk proteins and reveal their relative expression levels. For each sample, two independent human milk samples from two mothers were pooled, and three replicate shotgun proteomic analyses were carried out. Casein, a highly abundant protein in human milk, was removed, and the sample was then treated with trypsin to produce a digested peptide mixture. The peptides were loaded onto a home-made reversed-phase C18 fused-silica capillary column, and the eluted peptides were analyzed with a linear ion-trap mass spectrometer. Relative quantitation of proteins was performed by the normalized spectral count method. For each sample, 81-109 non-redundant proteins were identified. The identified proteins included glycoproteins, metabolic enzymes, and chaperones, such as lactoferrin, carboxylic ester hydrolase, and clusterin. Comparative analysis of the 63 proteins that were reproducibly identified in all three replicates revealed that 25 proteins showed statistically significant differential expression. Among the differentially expressed proteins, Ig lambda-7 chain C region and tenascin decreased drastically with time after delivery.
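
The abstract does not define its normalized spectral count, so the following Python sketch shows only one common, simple variant as an assumption: each protein's spectral count divided by the total spectral count of the run, so that runs of different depth become comparable. The function name and the counts in the usage note are hypothetical.

```python
def normalize_spectral_counts(counts):
    """Divide each protein's spectral count by the run total.

    `counts` maps protein name -> number of matched spectra in one run.
    This total-count normalization is an assumed variant, not necessarily
    the one used in the study.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("run contains no matched spectra")
    return {protein: n / total for protein, n in counts.items()}
```

Applied to an illustrative run such as `{"lactoferrin": 30, "clusterin": 10}`, each value becomes that protein's share of all matched spectra; the shares from the three replicates at each time point could then be compared across the 1-, 3-, and 6-week samples.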