http://dx.doi.org/10.17703/JCCT.2021.7.4.745

A Study on Data Cleansing Techniques for Word Cloud Analysis of Text Data  

Lee, Won-Jo (Dept. of Industrial Management Eng., Ulsan College)
Publication Information
The Journal of the Convergence on Culture Technology / v.7, no.4, 2021, pp. 745-750
Abstract
In big data visualization analysis of unstructured text, the raw data is usually large, and analysis techniques cannot be applied until it has been cleansed. Unnecessary data is therefore removed from the collected raw data in a first, heuristic cleansing pass, and stopwords are removed in a second, machine cleansing pass. Word frequencies are then calculated and visualized with the word cloud technique, key issues are extracted and turned into information, and the results are analyzed. In this study, we propose a new stopword cleansing technique that uses an external stopword set (DB) with the Python word cloud, and we identify the problems and effectiveness of this technique through practical case analysis. Based on these verification results, we present the practical utility of word cloud analysis that applies the proposed cleansing technique.
Keywords
Big Data; Text Analysis; Word Cloud; Python; Stop Words; Visualization; Data Cleansing;
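The two-pass cleansing pipeline outlined in the abstract might look roughly like the Python sketch below. It is only an illustration under stated assumptions, not the paper's implementation: the stopword set, the function names, the regular expressions, and the sample sentence are all hypothetical, and the letter filter assumes English input (Korean text would need a different character class and most likely a morphological tokenizer).

```python
import re
from collections import Counter

# Hypothetical external stopword set. In practice it might be loaded from
# a file or database that is maintained outside the analysis script.
EXTERNAL_STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for"}


def heuristic_cleanse(raw_text):
    """First pass: strip URLs, digits, and punctuation, then lowercase."""
    text = re.sub(r"https?://\S+", " ", raw_text)  # drop URLs
    # Keep ASCII letters and whitespace only (an assumption for English text).
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    return text.lower()


def remove_stopwords(tokens, stopwords):
    """Second pass: drop every token found in the external stopword set."""
    return [t for t in tokens if t not in stopwords]


def word_frequencies(raw_text, stopwords=EXTERNAL_STOPWORDS):
    """Run both cleansing passes and count the remaining vocabulary."""
    tokens = heuristic_cleanse(raw_text).split()
    return Counter(remove_stopwords(tokens, stopwords))


sample = (
    "The analysis of big data: see http://example.com "
    "for the data and the analysis in 2021."
)
freqs = word_frequencies(sample)
print(freqs.most_common())
```

The resulting frequency mapping is the kind of input a word cloud renderer expects; with the third-party `wordcloud` package, for example, it could be passed to `WordCloud().generate_from_frequencies(freqs)`. Keeping the stopword set outside the rendering step means it can be extended after inspecting a first cloud, without touching the rest of the pipeline.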