http://dx.doi.org/10.5859/KAIS.2012.21.3.137

Using Ontologies for Semantic Text Mining  

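Yu, Eun-Ji (Graduate School of Business IT, Kookmin University)
Kim, Jung-Chul (Graduate School of Business IT, Kookmin University)
Lee, Choon-Youl (School of Management Information Systems, Kookmin University)
Kim, Nam-Gyu (School of Management Information Systems, Kookmin University)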
Publication Information
The Journal of Information Systems, v.21, no.3, 2012, pp. 137-161
Abstract
The increasing interest in big data analysis using various data mining techniques indicates that many commercial data mining tools now need to be equipped with fundamental text analysis modules. The most essential prerequisite for accurate analysis of text documents is an understanding of the exact semantics of each term in a document. The main difficulties in understanding these semantics stem from homonyms and synonyms, a long-standing problem in the natural language processing field. Some major text mining tools provide a thesaurus to address these problems, but a thesaurus cannot resolve complex synonym problems, and it does not address homonym problems at all. In this paper, we propose a semantic text mining methodology that uses ontologies to improve the quality of text mining results by resolving the semantic ambiguity caused by homonyms and synonyms. We evaluate the practical applicability of the proposed methodology by performing a classification analysis to predict customer churn using real transactional data and Q&A articles from the "S" online shopping mall in Korea. In the experiments, the prediction model produced by the proposed semantic text mining method outperformed the model produced by traditional text mining on prediction performance measures such as response, captured response, and lift.
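The idea of resolving synonyms and homonyms against an ontology before counting terms can be illustrated with a minimal sketch. This is not the authors' implementation: the toy ontology, the concept names, the context words, and the function `resolve_terms` are all hypothetical, and the disambiguation rule (pick the candidate concept sharing the most context words with the rest of the sentence) is a simple stand-in for whatever reasoning a real ontology would support.

```python
from collections import Counter

# Hypothetical toy ontology (an assumption for illustration only).
# Each concept lists its surface labels and words that typically co-occur with it.
ONTOLOGY = {
    "Refund": {
        "labels": {"refund", "reimbursement", "repayment"},
        "context": {"order", "cancel", "payment"},
    },
    "Fee": {
        "labels": {"charge", "fee"},
        "context": {"shipping", "payment", "card", "order"},
    },
    "BatteryCharging": {
        "labels": {"charge", "recharge"},
        "context": {"battery", "phone", "cable"},
    },
}


def resolve_terms(tokens, ontology=ONTOLOGY):
    """Map each token to an ontology concept where possible.

    Synonyms collapse to one concept because they share a label set.
    Homonyms (a token that is a label of several concepts) are resolved by
    choosing the concept whose context words overlap most with the other
    tokens; ties fall back to the first matching concept in the ontology.
    Tokens that match no label are returned unchanged.
    """
    resolved = []
    for i, token in enumerate(tokens):
        candidates = [c for c, entry in ontology.items() if token in entry["labels"]]
        if not candidates:
            resolved.append(token)
            continue
        others = set(tokens[:i] + tokens[i + 1:])
        best = max(candidates, key=lambda c: len(ontology[c]["context"] & others))
        resolved.append(best)
    return resolved


def concept_counts(documents, ontology=ONTOLOGY):
    """Count resolved concepts and terms across tokenized documents."""
    counts = Counter()
    for tokens in documents:
        counts.update(resolve_terms(tokens, ontology))
    return counts


docs = [
    ["reimbursement", "for", "shipping", "charge"],
    ["battery", "will", "not", "charge"],
    ["refund", "my", "order"],
]
counts = concept_counts(docs)
```

In this sketch "refund" and "reimbursement" are both counted under the single concept `Refund`, while the two occurrences of "charge" are counted separately as `Fee` and `BatteryCharging`. A plain thesaurus lookup could merge the synonyms but would have no basis for splitting the homonym.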
Keywords
Classification; Data Mining; Ontology; Semantic; Text Mining