Biomedical Text Mining: Research Trends in Methods for Discovering New Biomedical Knowledge

  • Published : 2015.04.16

Acknowledgement

Supported by: National Research Foundation of Korea

References

  1. MeSH, http://www.ncbi.nlm.nih.gov/mesh
  2. PubMed, http://www.ncbi.nlm.nih.gov/pubmed
  3. Hersh, W., Buckley, C., Leone, T. J., and Hickam, D., "OHSUMED: An interactive retrieval evaluation and new large test collection for research," In SIGIR'94, Springer London, pp. 192-201, Jan. 1994.
  4. GENIA, http://www.nactem.ac.uk/genia/
  5. TREC Genomics Track data, http://trec.nist.gov/data/genomics.html
  6. BioCreAtIve, http://www.biocreative.org/
  7. PennBioIE, http://curtis.ml.cmu.edu/w/courses/index.php/PennBioIE
  8. CALBC, http://www.ebi.ac.uk/Rebholz-srv/CALBC/corpora/semantic.html
  9. CRAFT, http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml
  10. Tanabe, L., Xie, N., Thom, L. H., Matten, W., and Wilbur, W. J., "GENETAG: a tagged corpus for gene/protein named entity recognition," BMC Bioinformatics, Vol. 6, Suppl. 1, S3, 2005.
  11. AnEM, http://www.nactem.ac.uk/anatomy/
  12. NCBI disease, http://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE/
  13. DDI drug, http://omictools.com/ddi-corpus-s6306.html
  14. Metabolite and enzyme, http://www.nactem.ac.uk/metabolite-corpus/
  15. Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S. J., and McClosky, D., "The Stanford CoreNLP Natural Language Processing Toolkit," In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60, 2014.
  16. Apache OpenNLP, https://opennlp.apache.org/
  17. Tsuruoka, Y., and Tsujii, J. I., "Boosting precision and recall of dictionary-based protein name recognition," In Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine-Volume 13, Association for Computational Linguistics, pp. 41-48, July 2003.
  18. Gaizauskas, R., Demetriou, G., Artymiuk, P. J., and Willett, P., "Protein structures and information extraction from biological texts: the PASTA system," Bioinformatics, Vol. 19, No. 1, pp. 135-143, 2003. https://doi.org/10.1093/bioinformatics/19.1.135
  19. Kazama, J. I., Makino, T., Ohta, Y., and Tsujii, J. I., "Tuning support vector machines for biomedical named entity recognition," In Proceedings of the ACL-02 workshop on Natural language processing in the biomedical domain-Volume 3, Association for Computational Linguistics, pp. 1-8, July 2002.
  20. Collier, N., Nobata, C., and Tsujii, J. I., "Extracting the names of genes and gene products with a hidden Markov model," In Proceedings of the 18th conference on Computational linguistics-Volume 1, Association for Computational Linguistics, pp. 201-207, July 2000.
  21. Settles, B., "Biomedical named entity recognition using conditional random fields and rich feature sets," In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, Association for Computational Linguistics, pp. 104-107, Aug. 2004.
  22. Chen, E. S., Hripcsak, G., Xu, H., Markatou, M., and Friedman, C., "Automated acquisition of disease-drug knowledge from biomedical and clinical documents: an initial study," Journal of the American Medical Informatics Association, Vol. 15, No. 1, pp. 87-98, 2008. https://doi.org/10.1197/jamia.M2401
  23. Saric, J., Jensen, L. J., Ouzounova, R., Rojas, I., and Bork, P., "Extraction of regulatory gene/protein networks from Medline," Bioinformatics, Vol. 22, No. 6, pp. 645-650, 2006. https://doi.org/10.1093/bioinformatics/bti597
  24. Hakenberg, J., Plake, C., Leser, U., Kirsch, H., and Rebholz-Schuhmann, D., "LLL'05 challenge: Genic interaction extraction-identification of language patterns based on alignment and finite state automata," In Proceedings of the 4th Learning Language in Logic workshop (LLL05), pp. 38-45, Aug. 2005.
  25. Bundschus, M., Dejori, M., Stetter, M., Tresp, V., and Kriegel, H. P., "Extraction of semantic biomedical relations from text using conditional random fields," BMC Bioinformatics, Vol. 9, No. 1, 2008.
  26. Fundel, K., Kuffner, R., and Zimmer, R., "RelEx: Relation extraction using dependency parse trees," Bioinformatics, Vol. 23, No. 3, pp. 365-371, 2007. https://doi.org/10.1093/bioinformatics/btl616
  27. Rinaldi, F., Schneider, G., Kaljurand, K., Hess, M., Andronis, C., Konstandi, O., and Persidis, A., "Mining of relations between proteins over biomedical scientific literature using a deep-linguistic approach," Artificial Intelligence in Medicine, Vol. 39, No. 2, pp. 127-136, 2007. https://doi.org/10.1016/j.artmed.2006.08.005
  28. Jonnalagadda, S., Cohen, T., Wu, S., and Gonzalez, G., "Enhancing clinical concept extraction with distributional semantics," Journal of Biomedical Informatics, Vol. 45, No. 1, pp. 129-140, 2012. https://doi.org/10.1016/j.jbi.2011.10.007
  29. Dobrokhotov, P. B., Goutte, C., Veuthey, A. L., and Gaussier, E., "Assisting medical annotation in Swiss-Prot using statistical classifiers," International Journal of Medical Informatics, Vol. 74, No. 2, pp. 317-324, 2005. https://doi.org/10.1016/j.ijmedinf.2004.04.017
  30. Wu, Y., Liu, M., Zheng, W. J., Zhao, Z., and Xu, H., "Ranking gene-drug relationships in biomedical literature using latent Dirichlet allocation," In Pacific Symposium on Biocomputing, pp. 422-433, 2012.
  31. Swanson, D. R., "Undiscovered public knowledge," The Library Quarterly, Vol. 56, No. 2, pp. 103-118, 1986a. https://doi.org/10.1086/601199
  32. Swanson, D. R., "Fish oil, Raynaud's syndrome, and undiscovered public knowledge," Perspectives in Biology and Medicine, Vol. 30, No. 1, pp. 7-18, 1986b. https://doi.org/10.1353/pbm.1986.0087
  33. DiGiacomo, A., Kremer, J. M., and Shah, D. M., "Fish oil dietary supplementation in patients with Raynaud's phenomenon: A double-blind, controlled, prospective study," American Journal of Medicine, Vol. 86, pp. 158-164, 1989.
  34. Swanson, D. R., "Migraine and magnesium: eleven neglected connections," Perspectives in Biology and Medicine, Vol. 31, No. 4, pp. 526-557, 1988. https://doi.org/10.1353/pbm.1988.0009
  35. Swanson, D. R., "A second example of mutually isolated medical literatures related by implicit, unnoticed connections," Journal of the American Society for Information Science, Vol. 40, No. 6, pp. 432-435, 1989b. https://doi.org/10.1002/(SICI)1097-4571(198911)40:6<432::AID-ASI5>3.0.CO;2-#
  36. Gallai, V., Sarchielli, P., Coata, G., Firenze, C., Morucci, P., and Abbritti, G., "Serum and salivary magnesium levels in migraine. Results in a group of juvenile patients," Headache: The Journal of Head and Face Pain, Vol. 32, No. 3, pp. 132-135, 1992. https://doi.org/10.1111/j.1526-4610.1992.hed3203132.x
  37. Swanson, D. R., "Somatomedin C and arginine: implicit connections between mutually isolated literatures," Perspectives in Biology and Medicine, Vol. 33, No. 2, p. 157, 1990a. https://doi.org/10.1353/pbm.1990.0031
  38. Swanson, D. R., and Smalheiser, N. R., "An interactive system for finding complementary literatures: a stimulus to scientific discovery," Artificial Intelligence, Vol. 91, No. 2, pp. 183-203, 1997. https://doi.org/10.1016/S0004-3702(97)00008-8
  39. Weeber, M., Klein, H., de Jong-van den Berg, L., and Vos, R., "Using concepts in literature-based discovery: Simulating Swanson's Raynaud-fish oil and migraine-magnesium discoveries," Journal of the American Society for Information Science and Technology, Vol. 52, No. 7, pp. 548-557, 2001. https://doi.org/10.1002/asi.1104
  40. Gordon, M. D., and Dumais, S., "Using latent semantic indexing for literature based discovery," Journal of the American Society for Information Science, Vol. 49, No. 8, pp. 674-685, 1998. https://doi.org/10.1002/(SICI)1097-4571(199806)49:8<674::AID-ASI2>3.0.CO;2-T
  41. Abate, F., Ficarra, E., Acquaviva, A., and Macii, E., "Improving latent semantic analysis of biomedical literature integrating UMLS Metathesaurus and biomedical pathways databases," In Biomedical Engineering Systems and Technologies, Springer Berlin Heidelberg, pp. 173-187, 2013.
  42. Swanson, D. R., Smalheiser, N. R., and Torvik, V. I., "Ranking indirect connections in literature based discovery: The role of medical subject headings," Journal of the American Society for Information Science and Technology, Vol. 57, No. 11, pp. 1427-1439, 2006. https://doi.org/10.1002/asi.20438
  43. Hristovski, D., Peterlin, B., Mitchell, J. A., and Humphrey, S. M., "Using literature-based discovery to identify disease candidate genes," International Journal of Medical Informatics, Vol. 74, No. 2, pp. 289-298, 2005. https://doi.org/10.1016/j.ijmedinf.2004.04.024
  44. Hristovski, D., Friedman, C., Rindflesch, T. C., and Peterlin, B., "Exploiting semantic relations for literature-based discovery," In AMIA Annual Symposium Proceedings, American Medical Informatics Association, pp. 349-353, 2006.
  45. Hristovski, D., Kastrin, A., Peterlin, B., and Rindflesch, T. C., "Combining semantic relations and DNA microarray data for novel hypotheses generation," In Linking Literature, Information, and Knowledge for Biology, Springer Berlin Heidelberg, pp. 53-61, 2010.
  46. Hristovski, D., Rindflesch, T., and Peterlin, B., "Using literature-based discovery to identify novel therapeutic approaches," Cardiovascular and Hematological Agents in Medicinal Chemistry, Vol. 11, No. 1, pp. 14-24, 2013. https://doi.org/10.2174/1871525711311010005
  47. Wilkowski, B., Fiszman, M., Miller, C., Hristovski, D., Arabandi, S., Rosemblat, G., and Rindflesch, T., "Discovery browsing with semantic predications and graph theory," In AMIA Annual Symposium Proceedings, 2011.
  48. Cameron, D., Bodenreider, O., Yalamanchili, H., Danh, T., Vallabhaneni, S., Thirunarayan, K., Sheth, A. P., and Rindflesch, T. C., "A graph-based recovery and decomposition of Swanson's hypothesis using semantic predications," Journal of Biomedical Informatics, Vol. 46, No. 2, pp. 238-251, 2013. https://doi.org/10.1016/j.jbi.2012.09.004
  49. Heo, G. E., and Song, M., "Inferring Undiscovered Public Knowledge by Using Text Mining-driven Graph Model," Journal of the Korean Society for Information Management, Vol. 31, No. 1, pp. 231-250, 2014. (in Korean) https://doi.org/10.3743/KOSIM.2014.31.1.231
  50. Ahn, H., Song, M., and Heo, G. E., "Inferring Undiscovered Public Knowledge by Using Text Mining Analysis and Main Path Analysis - The case of the Gene-Protein 'brings_about' Chains of Pancreatic Cancer," Korean Biblia Society for Library and Information Science, Vol. 26, No. 1, 2015. (in Korean)
  51. Ding, Y., Song, M., Han, J., Yu, Q., Yan, E., Lin, L., and Chambers, T., "Entitymetrics: Measuring the impact of entities," PLoS ONE, Vol. 8, No. 8, e71416, 2013. https://doi.org/10.1371/journal.pone.0071416
  52. Song, M., Han, N. G., Kim, Y. H., Ding, Y., and Chambers, T., "Discovering implicit entity relation with the gene-citation-gene network," PLoS ONE, Vol. 8, No. 12, e84639, 2013. https://doi.org/10.1371/journal.pone.0084639
  53. Yu, Q., Ding, Y., Song, M., Song, S., Liu, J., and Zhang, B., "Tracing database usage: Detecting main paths in database link networks," Journal of Informetrics, Vol. 9, No. 1, pp. 1-15, 2015.
  54. Lee, D., Kim, W. C., Charidimou, A., and Song, M., "A Bird's-Eye View of Alzheimer's Disease Research: Reflecting Different Perspectives of Indexers, Authors, or Citers in Mapping the Field," Journal of Alzheimer's Disease, 2015.
  55. Song, M., Kim, W. C., Lee, D., Heo, G. E., and Kang K. Y., "PKDE4J: Entity and Relation Extraction for Public Knowledge Discovery," 2015.
  56. Song, M., Heo, G. E. and Ding Y., "SemPathFinder: Semantic Path Analysis for Discovering Publicly Unknown Knowledge," 2015.
  57. Wei, C.H., Kao, H.Y., and Lu, Z., "PubTator: a web-based text mining tool for assisting biocuration," Nucleic Acids Research, Vol. 41, No. W1, pp. W518-W522, 2013. https://doi.org/10.1093/nar/gkt441
  58. Settles, B., "ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text," Bioinformatics, Vol. 21, No. 14, pp. 3191-3192, 2005. https://doi.org/10.1093/bioinformatics/bti475
  59. Leaman, R., and Gonzalez, G., "BANNER: an executable survey of advances in biomedical named entity recognition," Pacific Symposium on Biocomputing, Vol. 13, pp. 652-663, 2008.
  60. Meehan, T. F., Masci, A. M., Abdulla, A., Cowell, L. G., Blake, J. A., Mungall, C. J., and Diehl, A. D., "Logical development of the cell ontology," BMC Bioinformatics, Vol. 12, No. 1, p. 6, 2011. https://doi.org/10.1186/1471-2105-12-6
  61. Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., ... and Sherlock, G., "Gene Ontology: tool for the unification of biology," Nature Genetics, Vol. 25, No. 1, pp. 25-29, 2000. https://doi.org/10.1038/75556
  62. Law, V., Knox, C., Djoumbou, Y., Jewison, T., Guo, A. C., Liu, Y., ... and Wishart, D. S., "DrugBank 4.0: shedding new light on drug metabolism," Nucleic Acids Research, Vol. 42, No. D1, pp. D1091-D1097, 2014. https://doi.org/10.1093/nar/gkt1068
  63. Wishart, D. S., Jewison, T., Guo, A. C., Wilson, M., Knox, C., Liu, Y., ... and Scalbert, A., "HMDB 3.0-the human metabolome database in 2013," Nucleic Acids Research, gks1065, 2012.
  64. UniProt Consortium, "Activities at the universal protein resource (UniProt)," Nucleic Acids Research, Vol. 42, No. D1, pp. D191-D198, 2014. https://doi.org/10.1093/nar/gkt1140
  65. Kanehisa, M., and Goto, S., "KEGG: Kyoto Encyclopedia of Genes and Genomes," Nucleic Acids Research, Vol. 28, No. 1, pp. 27-30, 2000. https://doi.org/10.1093/nar/28.1.27
  66. Sud, M., Fahy, E., Cotter, D., Brown, A., Dennis, E. A., Glass, C. K., ... and Subramaniam, S., "LMSD: LIPID MAPS structure database," Nucleic Acids Research, Vol. 35, Suppl. 1, pp. D527-D532, 2007. https://doi.org/10.1093/nar/gkl838
  67. Horai, H., Arita, M., Kanaya, S., Nihei, Y., Ikeda, T., Suwa, K., ... and Nishioka, T., "MassBank: a public repository for sharing mass spectral data for life sciences," Journal of Mass Spectrometry, Vol. 45, No. 7, pp. 703-714, 2010. https://doi.org/10.1002/jms.1777