Investigations on Techniques and Applications of Text Analytics

Trends in Text Analysis Techniques and Applications

  • Kim, Namgyu (Kookmin University School of MIS) ;
  • Lee, Donghoon (Kookmin University The Graduate School of Business Information Technology) ;
  • Choi, Hochang (Kookmin University The Graduate School of Business Information Technology) ;
  • Wong, William Xiu Shun (Kookmin University The Graduate School of Business Information Technology)
  • Received : 2017.01.09
  • Accepted : 2017.02.16
  • Published : 2017.02.28

Abstract

Demand for and interest in big data analytics are increasing rapidly. The concept of big data covers not only existing structured data but also various kinds of unstructured data such as text, images, videos, and logs. Among these types of unstructured data, text has gained particular attention because it is the most representative means of describing and delivering information. Text analysis is generally performed in the following order: document collection, parsing and filtering, structuring, frequency analysis, and similarity analysis. The results of the analysis can be presented as word clouds, word networks, topic models, document classification, and sentiment analysis. Notably, there is an increasing demand to identify trending topics in the rapidly growing text data generated through various social media. Thus, research on and applications of topic modeling have been actively carried out in various fields, since topic modeling can extract the core topics from a huge number of unstructured text documents and group the documents by topic. In this paper, we review the major techniques and research trends of text analysis. We also introduce application cases in which topic modeling was used to solve problems in various fields.

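Demand for and interest in big data analysis, where the sheer volume of data itself becomes part of the problem to be solved, have recently surged. Big data is understood as a concept that covers not only conventional structured data but also unstructured data in various forms such as images, videos, and logs. Among these data types, research has been particularly active on the analysis of text, the representative means of expressing and conveying information. Text analysis is generally carried out in the order of document collection, parsing and filtering, structuring, frequency analysis, and similarity analysis, and its results appear in forms such as word clouds, word networks, topic modeling, document classification, and sentiment analysis. In particular, as demand grows for identifying major topics in the text data surging through various social media, research on and applications of topic modeling, which extracts the major topics from a vast number of unstructured text documents and groups the documents belonging to each topic, are being produced in various fields. Accordingly, this paper reviews the major techniques and research trends in text analysis and introduces research cases that used topic modeling to solve problems in various fields.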
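
The analysis order named in the abstract (parsing and filtering, structuring, frequency analysis, similarity analysis) can be illustrated with a minimal sketch. This is not the authors' method: the toy corpus `DOCS`, the `STOPWORDS` list, and all function names are hypothetical, raw term counts stand in for whatever weighting a real study would use, and cosine similarity is assumed as the similarity measure.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus and stopword list, for illustration only.
DOCS = [
    "Topic modeling extracts core topics from text documents.",
    "Text documents from social media contain trending topics.",
    "Images and videos are other kinds of unstructured data.",
]
STOPWORDS = {"from", "and", "are", "of", "the", "other"}


def parse_and_filter(doc):
    """Parsing and filtering: lowercase, keep alphabetic tokens, drop stopwords."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return [t for t in tokens if t not in STOPWORDS]


def term_frequencies(docs):
    """Structuring: represent each document as a term -> count mapping."""
    return [Counter(parse_and_filter(d)) for d in docs]


def cosine_similarity(a, b):
    """Similarity analysis: cosine of the angle between two count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


tf = term_frequencies(DOCS)

# Frequency analysis: term counts over the whole corpus.
corpus_counts = sum(tf, Counter())

if __name__ == "__main__":
    print(corpus_counts.most_common(3))
    print(cosine_similarity(tf[0], tf[1]))
    print(cosine_similarity(tf[0], tf[2]))
```

The first two documents share several terms and therefore score higher than the first and third, which share none. Note that "topic" and "topics" are counted separately here; a real pipeline would typically add stemming or morphological analysis during the parsing step.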
