http://dx.doi.org/10.3745/KTSDE.2016.5.3.145

TF-IDF Based Association Rule Analysis System for Medical Data  

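Park, Hosik (Department of Computer Engineering, Ajou University)
Lee, Minsu (Department of Computer Engineering, Ewha Womans University)
Hwang, Sungjin (Medical Imaging Division, Humintec Co., Ltd.)
Oh, Sangyoon (Department of Software, Ajou University)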
Publication Information
KIPS Transactions on Software and Data Engineering, Vol.5, No.3, 2016, pp.145-154
Abstract
Driven by growing interest in u-Health and advances in IT, the need to utilize medical data has increased. Previous studies that apply data mining algorithms to medical data include studies based on association rule analysis. These studies aim to discover associations between symptoms and specific diseases; in most cases, however, they do not consider infrequent terms, even though such terms can carry important information for diagnosing a disease. In this paper, we propose a new association rule mining system that uses TF-IDF weights to reflect the importance of each term, so that infrequent but important items are taken into account. In addition, the proposed system can predict candidate diagnoses from medical text records using term similarity analysis based on a medical ontology.
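The core idea, letting TF-IDF weights rescue infrequent but distinctive terms during association rule mining, can be illustrated with a minimal sketch in Python. It assumes each medical record has already been reduced to a list of normalized terms. It is not the authors' implementation: the global term weight (mean TF-IDF over the records containing the term), the weighted-support formula (mean item weight multiplied by ordinary support), and the function names `tfidf_term_weights` and `weighted_rules` are illustrative assumptions, and a brute-force itemset enumeration stands in for FP-Growth.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_term_weights(records):
    """Assumed global weight per term: its mean TF-IDF over the records containing it."""
    n = len(records)
    df = Counter(t for rec in records for t in set(rec))  # document frequency
    sums = Counter()
    for rec in records:
        for t, c in Counter(rec).items():
            sums[t] += (c / len(rec)) * math.log(n / df[t])  # tf * idf
    return {t: sums[t] / df[t] for t in df}


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for tx in transactions if itemset <= tx) / len(transactions)


def weighted_rules(records, min_wsup=0.05, min_conf=0.6, max_len=3):
    """Hypothetical TF-IDF-weighted rule miner (brute force, tiny vocabularies only)."""
    weights = tfidf_term_weights(records)
    transactions = [frozenset(rec) for rec in records]
    terms = sorted(weights)
    # Weighted support is not downward-closed, so every candidate is checked
    # instead of pruning Apriori-style; this loop stands in for FP-Growth.
    frequent = {}
    for k in range(1, max_len + 1):
        for combo in combinations(terms, k):
            itemset = frozenset(combo)
            sup = support(itemset, transactions)
            if sup == 0:
                continue
            mean_weight = sum(weights[t] for t in itemset) / k
            if mean_weight * sup >= min_wsup:
                frequent[itemset] = sup
    # Confidence stays count-based: sup(X u Y) / sup(X).
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = sup / support(lhs, transactions)
                if conf >= min_conf:
                    rules.append((set(lhs), set(itemset - lhs), round(conf, 2)))
    return rules


# Toy symptom records (illustrative data, not from the paper).
records = [
    ["fever", "cough", "fatigue"],
    ["fever", "cough", "headache"],
    ["fever", "rash", "joint_pain"],
    ["fever", "rash", "joint_pain"],
]
for lhs, rhs, conf in weighted_rules(records, min_wsup=0.08, min_conf=0.6):
    print(sorted(lhs), "->", sorted(rhs), conf)
```

With these toy records, `fever` occurs in every record, so its IDF (and hence its weight) is zero and it drops out of every rule, whereas a plain support threshold would favor it. Conversely, `fatigue -> cough` has only 25% support but survives because `fatigue` is rare and therefore heavily weighted.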
Keywords
Association Rule; Medical Data; FP-Growth; TF-IDF