Classifying Biomedical Literature Providing Protein Function Evidence

  • Lim, Joon-Ho (SW & Contents Research Laboratory, ETRI)
  • Lee, Kyu-Chul (Department of Computer Science & Engineering, Chungnam National University)
  • Received : 2014.01.29
  • Accepted : 2015.06.09
  • Published : 2015.08.01

Abstract

Because proteins are primary elements responsible for biological and biochemical roles in living organisms, protein function is core information for biomedical studies. However, recent advances in biotechnology have caused an explosive increase in the amount of published literature, so biomedical researchers have difficulty finding the protein function information they need. This paper proposes a system that classifies biomedical literature according to whether it provides protein function evidence. Despite our best efforts, we were unable to find previous studies on this problem. To classify papers by protein function evidence, one must consider whether the main claim of a paper asserts a protein function. We therefore propose two novel features: protein and assertion. In our experiments, the classifier achieves 71.89% precision, 90.0% recall, and a 79.94% F-measure. To verify the usefulness of the proposed system, we also investigate two case-study applications: information retrieval for protein function and automatic summarization of protein function text. The results show that the proposed classification system can be applied successfully to both.
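
As a quick consistency check on the reported scores, the sketch below recomputes the F-measure from the stated precision and recall. It assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` and its `beta` parameter are illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 gives the harmonic mean (F1).

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if precision <= 0 or recall <= 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract, in percent.
reported_precision = 71.89
reported_recall = 90.0

f1 = f_measure(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}%")
```

Under that assumption the recomputed value is about 79.93%, which agrees with the reported 79.94% to within rounding of the published precision and recall.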

