http://dx.doi.org/10.9708/jksci.2012.17.2.139

Probabilistic filtering for a biological knowledge discovery system with text mining and automatic inference  

Lee, Hee-Jin (Dept. of Computer Science, KAIST)
Park, Jong-C. (Dept. of Computer Science, KAIST)
Abstract
In this paper, we discuss the structure of a biological knowledge discovery system based on text mining and automatic inference. Given a set of biology documents, the system produces a new hypothesis in an integrated manner. The text mining module of the system first extracts 'event' information of predefined types from the documents. The inference module then produces a new hypothesis from the extracted results. Such an integrated system can draw on more up-to-date and more diverse information than other automatic knowledge discovery systems. For such a system to succeed, however, the precision of the text mining module is crucial, since any hypothesis built on even a single false positive is very likely to be erroneous. We therefore propose a probabilistic filtering method that removes false positives from the extraction results. The proposed method outperforms an occurrence-based baseline method.
Keywords
knowledge discovery system; text mining; automatic inference; probabilistic filtering
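
The abstract contrasts a probabilistic filter with an occurrence-based baseline but does not spell out either one. The following is a minimal sketch of how such a contrast could look, not the authors' method: it assumes each extracted event carries a confidence score in [0, 1], uses a noisy-OR combination as a stand-in for the probabilistic model, and all names (`occurrence_filter`, `noisy_or_filter`, the sample events and thresholds) are hypothetical.

```python
from collections import defaultdict

# Each extraction is (event, confidence). The event tuples and the
# confidence values below are made-up illustration data.
extractions = [
    (("Phosphorylation", "STAT3"), 0.95),
    (("Binding", "TRAF2"), 0.40),
    (("Binding", "TRAF2"), 0.30),
    (("Expression", "IL2"), 0.70),
    (("Expression", "IL2"), 0.80),
]


def occurrence_filter(extractions, min_count=2):
    """Baseline (assumed form): keep events extracted at least min_count times."""
    counts = defaultdict(int)
    for event, _confidence in extractions:
        counts[event] += 1
    return {event for event, n in counts.items() if n >= min_count}


def noisy_or_filter(extractions, threshold=0.9):
    """Assumed probabilistic filter: keep an event when the noisy-OR of its
    extraction confidences, 1 - prod(1 - p_i), reaches the threshold."""
    miss_probability = defaultdict(lambda: 1.0)
    for event, confidence in extractions:
        miss_probability[event] *= 1.0 - confidence
    return {
        event
        for event, miss in miss_probability.items()
        if 1.0 - miss >= threshold
    }


kept_by_count = occurrence_filter(extractions)
kept_by_probability = noisy_or_filter(extractions)
```

On this toy data the count-based baseline keeps the twice-extracted but low-confidence binding event and drops the single high-confidence phosphorylation event, while the noisy-OR filter does the opposite. That is the kind of precision difference a confidence-aware filter is meant to capture.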