http://dx.doi.org/10.9708/jksci.2021.26.08.055

A Method for Compound Noun Extraction to Improve Accuracy of Keyword Analysis of Social Big Data  

Kim, Hyeon Gyu (Div. of Computer Science and Engineering, Sahmyook University)
Abstract
Because social big data often contains new words and proper nouns, it is commonly processed with statistical morphological analysis methods, which rely on the frequency of occurrence of each word. However, these methods do not properly recognize compound nouns, which lowers the accuracy of keyword extraction. This paper presents a method for extracting compound nouns during keyword analysis of social big data. The proposed method builds a candidate set of compound nouns by combining words obtained in the morphological analysis step. It then extracts compound nouns by examining how frequently each candidate appears in a given review. Two algorithms are proposed, differing in how the candidate set is constructed, and the performance of each is expressed as a formula and compared analytically. The comparison is verified through experiments on real data collected online, and the results also show that the proposed method is suitable for real-time processing.
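The candidate-then-frequency idea in the abstract can be illustrated with a minimal sketch. It is not either of the paper's two algorithms. It assumes that the morphological analyzer returns the ordered list of nouns for one review, that candidates are formed by concatenating runs of adjacent nouns, and that a candidate is kept when its joined form occurs at least `min_count` times in the raw review text. The function names `candidate_compounds` and `extract_compounds`, and the parameters `max_len` and `min_count`, are hypothetical. The sketch also ignores sentence boundaries when pairing nouns.

```python
from typing import Dict, List


def candidate_compounds(nouns: List[str], max_len: int = 3) -> List[str]:
    """Join every run of 2..max_len adjacent nouns into one candidate string.

    Assumption (hypothetical): `nouns` is the ordered noun sequence that a
    morphological analyzer produced for one review, so a compound such as
    "배송상태" may have been split into "배송" and "상태".
    """
    candidates: List[str] = []
    for n in range(2, max_len + 1):
        for i in range(len(nouns) - n + 1):
            # Korean compound nouns are usually written without spaces,
            # so the parts are concatenated directly.
            candidates.append("".join(nouns[i:i + n]))
    return candidates


def extract_compounds(review: str, nouns: List[str],
                      min_count: int = 2, max_len: int = 3) -> Dict[str, int]:
    """Keep candidates whose joined form appears at least `min_count` times
    in the raw review text (hypothetical frequency threshold)."""
    found: Dict[str, int] = {}
    for cand in sorted(set(candidate_compounds(nouns, max_len))):
        # str.count returns the number of non-overlapping occurrences.
        freq = review.count(cand)
        if freq >= min_count:
            found[cand] = freq
    return found


if __name__ == "__main__":
    review = "배송상태 좋아요. 배송상태도 만족하고 배송 빨라요"
    nouns = ["배송", "상태", "배송", "상태", "만족", "배송"]
    print(extract_compounds(review, nouns))  # {'배송상태': 2}
```

Checking each candidate against the raw text, rather than against the token stream, is what allows the sketch to recover the joined form. Here "배송상태" survives because it appears twice unsplit, while accidental pairings such as "상태만족" never occur in the text and are discarded.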
Keywords
Big data analysis; Social reviews; Keyword extraction; Morphological analysis; Compound noun extraction