Graph-based modeling for protein function prediction

단백질 기능 예측을 위한 그래프 기반 모델링

  • 황두성 (Department of Computer Science, Dankook University)
  • 정재영 (Electronics and Telecommunications Research Institute)
  • Published : 2005.04.01

Abstract

Protein interaction data are widely used in proteomics as a reliable basis for predicting the functions of uncharacterized proteins. Most computational studies of protein function prediction rely on the guilt-by-association concept and use large-scale interaction maps built from known protein-protein interactions. This study compares graph-based approaches that use protein-protein interaction data, namely the neighbor-counting and $\chi^2$-statistics methods, and proposes an approach that is effective for analyzing large-scale protein interaction data. The proposed approach is also based on the protein interaction map, but additionally uses sequence similarity and heuristic knowledge to make its predictions more reliable. Test results for the proposed approach on the KDD Cup 2001 competition data are reported alongside those of the neighbor-counting and $\chi^2$-statistics methods.

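In current bioinformatics, protein interaction data are used in computational proteomics models that predict, with high reliability, the functions of proteins whose functions are unknown. Studies on protein function prediction use large-scale, simple two-dimensional protein-protein interaction maps based on the guilt-by-association concept. This paper examines the neighbor-counting and $\chi^2$-statistic prediction models, which are graph-based function prediction methods that use protein-protein interaction data, and proposes an algorithm that is effective for fast function prediction from large amounts of interaction data. The proposed algorithm is a graph-based model that uses the protein interaction map, sequence similarity, and heuristic expert knowledge. The proposed algorithm was applied to function prediction for yeast proteins, and its results were compared with the experimental results of the neighbor-counting and $\chi^2$-statistic models.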
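
The two baseline methods named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not cover the proposed approach. The toy interaction map, the protein names (`P0` to `P5`), the function labels, and the helper names `neighbor_counting` and `chi_square_scores` are all hypothetical. The sketch also assumes that only direct, annotated neighbors are counted and that expected counts come from the overall frequency of each function among annotated proteins.

```python
from collections import Counter

# Hypothetical toy interaction map (undirected, stored as adjacency sets).
graph = {
    "P0": {"P1", "P2", "P3"},
    "P1": {"P0"},
    "P2": {"P0"},
    "P3": {"P0"},
    "P4": set(),
    "P5": set(),
}

# Hypothetical known annotations; P0 is the unannotated query protein.
annotations = {
    "P1": {"transport"},
    "P2": {"transport", "metabolism"},
    "P3": {"transport"},
    "P4": {"metabolism"},
    "P5": {"metabolism"},
}


def neighbor_counting(protein, graph, annotations, top_k=3):
    """Rank functions by how often they occur among direct neighbors."""
    counts = Counter()
    for nb in graph.get(protein, ()):
        counts.update(annotations.get(nb, ()))
    return [func for func, _ in counts.most_common(top_k)]


def chi_square_scores(protein, graph, annotations):
    """Score each function by (observed - expected)^2 / expected.

    observed: number of annotated neighbors carrying the function.
    expected: number of annotated neighbors times the fraction of all
              annotated proteins that carry the function (an assumption).
    """
    annotated = [funcs for funcs in annotations.values() if funcs]
    total = len(annotated)
    freq = Counter(f for funcs in annotated for f in funcs)

    neighbors = [nb for nb in graph.get(protein, ()) if annotations.get(nb)]
    observed = Counter(f for nb in neighbors for f in annotations[nb])

    scores = {}
    for func, n_all in freq.items():
        expected = len(neighbors) * n_all / total
        if expected > 0:  # skip when the query has no annotated neighbors
            scores[func] = (observed[func] - expected) ** 2 / expected
    return scores


top_by_count = neighbor_counting("P0", graph, annotations, top_k=1)
chi2 = chi_square_scores("P0", graph, annotations)
top_by_chi2 = max(chi2, key=chi2.get)
```

On this toy map both methods rank `transport` first for `P0`. Note that the $\chi^2$ score also grows when a function is under-represented among the neighbors, so a practical version may need to check the sign of the deviation before ranking.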
