• Title/Summary/Keyword: Protein prediction

In Silico Functional Assessment of Sequence Variations: Predicting Phenotypic Functions of Novel Variations

  • Won, Hong-Hee;Kim, Jong-Won
    • Genomics & Informatics
    • /
    • v.6 no.4
    • /
    • pp.166-172
    • /
    • 2008
  • A multitude of protein-coding sequence variations (CVs) in the human genome have been revealed as a result of major initiatives, including the Human Variome Project, the 1000 Genomes Project, and the International Cancer Genome Consortium. This naturally has led to debate over how to accurately assess the functional consequences of CVs, because predicting the functional effects of CVs and their relevance to disease phenotypes is becoming increasingly important. This article surveys and compares variation databases and in silico prediction programs that assess the effects of CVs on protein function. We also introduce a combinatorial approach that uses machine learning algorithms to improve prediction performance.

A Study on the Prediction of Parturient Syndrome in Holstein Cows (젖소에서의 산욕기질병 발생예견에 관한 연구)

  • Youn Hwa-Young;Choi Hee-In
    • Journal of Veterinary Clinics
    • /
    • v.2 no.1
    • /
    • pp.133-141
    • /
    • 1985
  • To establish a method for predicting cows susceptible to parturient syndrome, various serum chemical parameters (calcium, phosphorus, Ca/P, magnesium, cholesterol, total protein, albumin, globulin, A/G, total lipid, non-esterified fatty acid (NEFA), and aspartate aminotransferase (AST)) were measured during late pregnancy, and their relationships with periparturient diseases were investigated during the puerperal period. The results were as follows: 1. The factors affecting the prediction of cows susceptible to parturient syndrome were calcium, magnesium, total protein, globulin, A/G ratio, and total lipid at 30 days antepartum, and the diagnosability was 70.7%. 2. In the experimental cows producing more than 21 kg of milk per day, the factors affecting the prediction of cows susceptible to parturient syndrome were calcium, NEFA, and A/G ratio at 30 days antepartum, and the diagnosability was 66.7%. 3. In the experimental cows that had calved more than 3 times, the factors affecting the prediction of cows susceptible to parturient syndrome were calcium, total protein, albumin, and NEFA at 30 days antepartum, and the diagnosability was 83.3%.

A new method to predict the protein sequence alignment quality (단백질 서열정렬 정확도 예측을 위한 새로운 방법)

  • Lee, Min-Ho;Jeong, Chan-Seok;Kim, Dong-Seop
    • Bioinformatics and Biosystems
    • /
    • v.1 no.1
    • /
    • pp.82-87
    • /
    • 2006
  • The most popular protein structure prediction method is comparative modeling. For comparative modeling to be accurate, the sequence alignment between a query protein and a template must be accurate. Although choosing the best template based on protein sequence alignments is critical for accurate fold recognition in comparative modeling, the quality of the sequence alignment is even more critical. In contrast to the considerable attention given to methods for choosing the best template, prediction of alignment accuracy has attracted little interest. Here, we develop a method for predicting the shift score, a recently proposed measure of alignment quality. We apply support vector regression (SVR) to predict the shift score. The alignment between a query protein and a template protein of length n in our own library is transformed into an input vector of length n + 2. Structural alignments are assumed to be the best alignments, and the SVR is trained to predict the shift score between the structural alignment and the profile-profile alignment of a query protein to a template protein. Performance is assessed by the Pearson correlation coefficient. The trained SVR predicts the shift score with a correlation of 0.80 between observed and predicted shift scores.
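
The evaluation step in this abstract, the Pearson correlation between observed and predicted shift scores, can be sketched as follows. This is only an illustration: the function name `pearson_r` and the example values are hypothetical, and neither the shift score itself nor the authors' SVR model is reproduced here.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


# Hypothetical shift scores, for illustration only (not data from the paper).
observed = [0.9, 0.7, 0.4, 0.2]
predicted = [0.8, 0.75, 0.5, 0.1]
print(round(pearson_r(observed, predicted), 3))
```

A value near 1 means the predicted scores rank and scale alignments much like the observed ones; the abstract reports 0.80 for the trained SVR.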

Prediction of Implicit Protein - Protein Interaction Using Optimal Associative Feature Rule (최적 연관 속성 규칙을 이용한 비명시적 단백질 상호작용의 예측)

  • Eom, Jae-Hong;Zhang, Byoung-Tak
    • Journal of KIISE:Software and Applications
    • /
    • v.33 no.4
    • /
    • pp.365-377
    • /
    • 2006
  • Proteins are known to perform biological functions by interacting with other proteins or compounds. Since protein interaction is intrinsic to most cellular processes, prediction of protein interactions is an important issue in post-genomic biology, where abundant interaction data have been produced by many research groups. In this paper, we present an associative feature mining method to predict implicit protein-protein interactions of Saccharomyces cerevisiae from public protein interaction data. We discretized continuous-valued features using a maximal interdependence-based discretization approach. We also employed the feature dimension reduction filter (FDRF) method, which is based on information theory, to select optimal informative features, to boost prediction accuracy and overall mining speed, and to overcome the dimensionality problem of conventional data mining approaches. We used an association rule discovery algorithm for associative feature and rule mining to predict protein interactions. Using the discovered associative features, we predicted implicit protein interactions that had not been observed in the training data. According to the experimental results, the proposed method achieved a prediction accuracy of about 96.5% with reduced computation time, running about 29.4% faster than the conventional method with no feature filter in association rule mining.
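
Association rule mining rests on two standard quantities, support and confidence. The sketch below computes them for a toy data set. The feature names and transactions are hypothetical, and the authors' FDRF filter and discretization step are not reproduced.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    sup_a = support(transactions, antecedent)
    if sup_a == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / sup_a


# Each transaction is the set of discretized features of one protein pair.
# Feature names are hypothetical, chosen only for illustration.
transactions = [
    frozenset({"same_localization", "shared_function", "interacts"}),
    frozenset({"same_localization", "interacts"}),
    frozenset({"shared_function"}),
    frozenset({"same_localization", "shared_function", "interacts"}),
]

print(support(transactions, {"same_localization", "interacts"}))
print(confidence(transactions, {"same_localization"}, {"interacts"}))
```

A rule such as `{same_localization} -> {interacts}` with high support and confidence could then be applied to protein pairs whose interaction has not been observed.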

Variation in Energy and Nutrient Composition of Oilseed Meals from Different Countries (수입 박류사료내 에너지 및 영양소 함량의 변이)

  • Son, Ah Reum
    • Korean Journal of Poultry Science
    • /
    • v.47 no.2
    • /
    • pp.107-114
    • /
    • 2020
  • This study was conducted to investigate the variation in nutrient composition of oilseed meals and to develop prediction equations for amino acid concentrations. Energy and nutrient contents were determined in a total of 1,380 feed ingredient samples, including copra byproducts, corn distillers dried grains with solubles, palm kernel byproducts, and soybean meal. The ingredient samples were imported to the Republic of Korea between 2006 and 2015. Data were analyzed using the MIXED procedure of SAS. The regression procedure of SAS was used to generate prediction equations for amino acid concentrations using the crude protein (CP) concentration as the independent variable. The concentrations of moisture, gross energy, CP, ether extract, crude fiber, ash, calcium, phosphorus, lysine, methionine, cysteine, and threonine in the tested oilseed meals differed (P<0.05) among producing countries. The prediction equations for amino acid concentrations (% as-is basis) in the oilseed meals are: lysine = -1.08 + 0.080 × CP (root mean square error = 0.244, R² = 0.924, and P<0.001); threonine = -0.297 + 0.044 × CP (root mean square error = 0.099, R² = 0.958, and P<0.001). In conclusion, energy and nutrient compositions of the oilseed meals vary depending on the producing country. Moreover, the crude protein concentration can be used as a suitable independent variable for estimating lysine and threonine concentrations in the oilseed meals.
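
The two prediction equations in this abstract can be applied directly. In the sketch below the coefficients come from the abstract, while the function names and the example CP value of 45% are hypothetical.

```python
def predict_lysine(cp):
    """Lysine (% as-is) from crude protein (% as-is): -1.08 + 0.080 * CP."""
    return -1.08 + 0.080 * cp


def predict_threonine(cp):
    """Threonine (% as-is) from crude protein (% as-is): -0.297 + 0.044 * CP."""
    return -0.297 + 0.044 * cp


# Hypothetical oilseed meal sample with 45% crude protein (as-is basis).
cp = 45.0
print(round(predict_lysine(cp), 3), round(predict_threonine(cp), 3))
```

The reported root mean square errors (0.244 for lysine, 0.099 for threonine) give a rough sense of the expected error of such estimates.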

Classification Protein Subcellular Locations Using n-Gram Features (단백질 서열의 n-Gram 자질을 이용한 세포내 위치 예측)

  • Kim, Jinsuk
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2007.11a
    • /
    • pp.12-16
    • /
    • 2007
  • The function of a protein is closely correlated with its subcellular location(s). Given a protein sequence, therefore, determining its subcellular location is a vitally important problem. We have developed a new prediction method for protein subcellular location(s), which is based on n-gram feature extraction and the k-nearest neighbor (kNN) classification algorithm. It assigns a protein sequence to one or more subcellular compartments based on the locations of the top k sequences that show the highest similarity weights against the input sequence. The similarity weight is a similarity measure determined by comparing n-gram features between two sequences. Currently our method extracts penta-grams as features of protein sequences, computes scores of the potential localization site(s) using the kNN algorithm, and finally presents the locations and their associated scores. We constructed a large-scale data set of protein sequences with known subcellular locations from the SWISS-PROT database. This data set contains 51,885 entries with one or more known subcellular locations. Our method shows a very high prediction precision of about 93% on this data set, and, compared with other methods, it showed comparable prediction improvement on a test collection used in previous work.
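
The pipeline described here (penta-gram extraction, similarity weighting, kNN scoring of locations) can be sketched roughly as follows. The abstract does not define the similarity weight, so Jaccard overlap of n-gram sets is used as a stand-in assumption; the sequences and labels are toy values, not SWISS-PROT entries.

```python
from collections import defaultdict


def ngrams(sequence, n=5):
    """Set of overlapping n-grams (penta-grams by default) in a sequence."""
    return {sequence[i:i + n] for i in range(len(sequence) - n + 1)}


def similarity(seq_a, seq_b, n=5):
    """Jaccard overlap of n-gram sets (an assumed stand-in similarity weight)."""
    a, b = ngrams(seq_a, n), ngrams(seq_b, n)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_locations(query, labeled, k=3, n=5):
    """Score each location by summing similarities of the top-k neighbors."""
    neighbors = sorted(
        ((similarity(query, seq, n), locations) for seq, locations in labeled),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for weight, locations in neighbors:
        for location in locations:
            scores[location] += weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy labeled sequences, for illustration only.
labeled = [
    ("MKTAYIAKQRQISFVK", ["cytoplasm"]),
    ("MKTAYIAKQRGGSGGS", ["cytoplasm", "nucleus"]),
    ("GGGGSGGGGSGGGGSG", ["membrane"]),
]
print(predict_locations("MKTAYIAKQRQISF", labeled, k=2))
```

Because a sequence can carry several labels, the output is a ranked list of locations with scores rather than a single class.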

Prediction of Protein-Protein Interaction Sites Based on 3D Surface Patches Using SVM (SVM 모델을 이용한 3차원 패치 기반 단백질 상호작용 사이트 예측기법)

  • Park, Sung-Hee;Hansen, Bjorn
    • The KIPS Transactions:PartD
    • /
    • v.19D no.1
    • /
    • pp.21-28
    • /
    • 2012
  • Prediction of protein interaction sites for monomer structures can reduce the search space for protein docking and has been regarded as very significant for predicting unknown functions of proteins from interacting proteins whose functions are known. On the other hand, the prediction of interaction sites has been limited by the difficulty of crystallizing weakly interacting complexes, which are transient and are not stable enough for obtaining experimental structures by crystallization or even NMR, even for the most important protein-protein interactions. This work reports the calculation of 3D surface patches of complex structures and their properties, and a machine learning approach that uses a support vector machine (SVM) to build a predictive model for 3D surface patches in interaction and non-interaction sites. To overcome classification problems caused by class-imbalanced data, we employed an under-sampling technique. Nine properties of the patches were calculated from amino acid compositions and secondary structure elements. With 10-fold cross-validation, the predictive model built with the SVM achieved an accuracy of 92.7% for classification of 3D patches in interaction and non-interaction sites from 147 complexes.
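
Under-sampling for class-imbalanced data can be illustrated with a minimal sketch. The abstract does not say which under-sampling variant was used, so random under-sampling to the minority class size is assumed here; the function name and the toy data are hypothetical.

```python
import random


def undersample(samples, labels, seed=0):
    """Randomly reduce every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    minority_size = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # Draw without replacement so no sample is duplicated.
        for sample in rng.sample(group, minority_size):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Toy data: 2 interaction patches (label 1) and 8 non-interaction patches (label 0).
samples = list(range(10))
labels = [1, 1] + [0] * 8
print(undersample(samples, labels))
```

The balanced set would then be passed to the classifier; the trade-off is that majority-class examples left out of the sample are never seen in training.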

Architectures of Convolutional Neural Networks for the Prediction of Protein Secondary Structures (단백질 이차 구조 예측을 위한 합성곱 신경망의 구조)

  • Chi, Sang-Mun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.22 no.5
    • /
    • pp.728-733
    • /
    • 2018
  • Deep learning has been actively studied for predicting protein secondary structure based only on the sequence information of the amino acids constituting the protein. In this paper, we compared the performance of convolutional neural networks with various architectures for predicting protein secondary structure. To find the optimal network depth for this task, we investigated performance as a function of the number of layers. We also applied the structures of GoogLeNet and ResNet, which serve as building blocks of many image classification methods. These methods extract various features from the input data and keep gradient propagation smooth during training even in deep networks. These convolutional neural network architectures were modified to suit the characteristics of protein data in order to improve performance.

Prediction of Protein Kinase Specific Phosphorylation Sites with Multiple SVMs

  • Lee, Won-Chul;Kim, Dong-Sup
    • Bioinformatics and Biosystems
    • /
    • v.2 no.1
    • /
    • pp.28-32
    • /
    • 2007
  • Protein phosphorylation is one of the important processes in cell signaling pathways. A variety of protein kinase families are involved in this process, and each kinase family phosphorylates different kinds of substrate proteins. Many methods have been developed to predict kinase-specific phosphorylated sites or different types of phosphorylated residues (serine/threonine or tyrosine). We employed support vector machines (SVMs) to predict protein kinase-specific phosphorylation sites. 10 different kinds of protein kinase families (PKA, PKC, CK2, CDK, CaM-KII, PKB, MAPK, EGFR) were considered in this study. We defined the 9 residues around a phosphorylated residue as the determining instance on which protein kinases decide whether to act. Subsets of the PSI-BLAST profile were converted to numerical vectors to represent positive or negative instances. For SVM training, we used multiple SVMs because of the unbalanced training sets. Representative negative instances were drawn multiple times to generate new training sets, each with the same positive instances as the original training set. In testing, the final decisions were made by the votes of those multiple SVMs. In general, an RBF kernel was used for the SVMs, and several parameters such as gamma and the cost factor were tested. Our approach achieved more than 90% specificity across the protein kinase families, while sensitivity averaged 60%.
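
The data handling around the multiple-SVM scheme can be sketched without the classifiers themselves. Several details are assumptions: the 9-residue window is taken to be centered on the site (4 flanking residues per side), sequence ends are padded with "X", and negatives are drawn at random because the abstract does not say how "representative" negatives were chosen. All function names are hypothetical.

```python
import random
from collections import Counter


def residue_window(sequence, site, flank=4, pad="X"):
    """Window of 2 * flank + 1 residues centered on the 0-based `site`."""
    padded = pad * flank + sequence + pad * flank
    center = site + flank
    return padded[center - flank:center + flank + 1]


def balanced_training_sets(positives, negatives, n_sets, seed=0):
    """Pair the same positives with a fresh random draw of negatives each time."""
    rng = random.Random(seed)
    size = min(len(positives), len(negatives))
    return [(list(positives), rng.sample(negatives, size)) for _ in range(n_sets)]


def majority_vote(votes):
    """Final decision from the votes of several classifiers."""
    return Counter(votes).most_common(1)[0][0]


print(residue_window("ABCDEFGHIJK", 5))
print(residue_window("ABC", 0))
sets = balanced_training_sets(["p1", "p2"], ["n1", "n2", "n3", "n4", "n5"], n_sets=3)
print(len(sets), [len(neg) for _, neg in sets])
print(majority_vote([1, 0, 1]))
```

Each balanced set would train one SVM, and `majority_vote` would combine their predictions at test time; using an odd number of classifiers avoids ties.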

AllEC: An Implementation of Application for EC Numbers Prediction based on AEC Algorithm

  • Park, Juyeon;Park, Mingyu;Han, Sora;Kim, Jeongdong;Oh, Taejin;Lee, Hyun
    • International Journal of Advanced Culture Technology
    • /
    • v.10 no.2
    • /
    • pp.201-212
    • /
    • 2022
  • With the development of sequencing technology, there is a need for technology to predict the function of protein sequences. Enzyme Commission (EC) numbers are becoming markers that distinguish the function of a sequence. In particular, many researchers are studying various deep learning-based methods for predicting the EC numbers of protein sequences. However, because these studies use different methods, it is unclear which prediction result for a given sequence is correct. To solve this problem, this paper proposes an All Enzyme Commission (AEC) algorithm. The proposed AEC algorithm executes various prediction methods and integrates their results when predicting sequences. The algorithm assigns more weight to a value when multiple methods return the same value. The largest value among the weighted final prediction results of the methods is taken as the final prediction result. Moreover, for the convenience of researchers, the proposed algorithm is provided through the AllEC web service. Researchers can use the algorithm regardless of operating system, installation, or operating environment.
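
The integration idea described here can be sketched as a weighted vote. The abstract gives no exact weighting formula, so this sketch assumes that each method returns one EC number with a score, that the weight is the number of methods returning the same EC number, and that the highest weighted score wins. The method names and scores are hypothetical, and this is not the AllEC implementation.

```python
from collections import Counter


def integrate_predictions(predictions):
    """Combine {method: (ec_number, score)} into one final EC number.

    Assumed weighting: a method's score is multiplied by the number of
    methods that returned the same EC number.
    """
    counts = Counter(ec for ec, _ in predictions.values())
    weighted = {
        method: (ec, score * counts[ec])
        for method, (ec, score) in predictions.items()
    }
    best_method = max(weighted, key=lambda m: weighted[m][1])
    return weighted[best_method][0], weighted


# Hypothetical outputs of three prediction methods for one sequence.
predictions = {
    "method_a": ("1.1.1.1", 0.6),
    "method_b": ("1.1.1.1", 0.5),
    "method_c": ("2.7.11.1", 0.9),
}
final_ec, weighted = integrate_predictions(predictions)
print(final_ec)
```

In this toy case two agreeing methods outweigh a single, more confident one, which is the behavior the abstract attributes to the duplicate-based weights.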