• Title/Summary/Keyword: Word Cloud Technique


A Study on Word Cloud Techniques for Analysis of Unstructured Text Data (비정형 텍스트 테이터 분석을 위한 워드클라우드 기법에 관한 연구)

  • Lee, Won-Jo
    • The Journal of the Convergence on Culture Technology / v.6 no.4 / pp.715-720 / 2020
  • In big data analysis, text data is mostly unstructured and large in volume, and it has been difficult to analyze because established analysis techniques were lacking. This study therefore examines the usefulness and problems of the big data word cloud technique, one of the text data analysis techniques, to assess its potential for commercial use. The limitations and problems of the technique are identified through a visualization analysis of the "President UN Speech" using the word cloud technique in R. In addition, an improved model that addresses these problems is proposed as an efficient way to apply the word cloud technique in practice.
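At its core, the word cloud technique maps word frequency to display size. The sketch below is a minimal Python illustration of that mapping and is not the R implementation the study used. It counts runs of word characters and linearly scales the frequencies of the top `top_n` words onto a font-size range. The linear scale and the default sizes (10 to 80) are assumptions. Real word cloud packages may scale words differently, and they also handle layout, which is omitted here.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Lowercase the text and count runs of word characters."""
    return Counter(re.findall(r"\w+", text.lower()))


def font_sizes(freqs: Counter, top_n: int = 50,
               min_size: int = 10, max_size: int = 80) -> dict:
    """Linearly map the top_n frequencies onto [min_size, max_size].

    The linear scale and default sizes are illustrative assumptions.
    """
    top = freqs.most_common(top_n)
    if not top:
        return {}
    hi, lo = top[0][1], top[-1][1]
    span = hi - lo
    sizes = {}
    for word, count in top:
        if span == 0:
            # All selected words are equally frequent: draw them at max size.
            sizes[word] = max_size
        else:
            sizes[word] = round(min_size + (count - lo) * (max_size - min_size) / span)
    return sizes
```

For example, `font_sizes(word_frequencies(text), top_n=3)` returns the three most frequent words. The most frequent is drawn at size 80, the least frequent of the three at size 10, and the middle word in between.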

A Study on Data Cleansing Techniques for Word Cloud Analysis of Text Data (텍스트 데이터 워드클라우드 분석을 위한 데이터 정제기법에 관한 연구)

  • Lee, Won-Jo
    • The Journal of the Convergence on Culture Technology / v.7 no.4 / pp.745-750 / 2021
  • In big data visualization analysis of unstructured text data, the raw data is mostly large in volume and unstructured, so analysis techniques cannot be applied until it has been cleansed. Therefore, unnecessary data is first removed from the collected raw data through a heuristic cleansing process, and stopwords are then removed through a second, machine cleansing process. Next, vocabulary frequencies are calculated and visualized with the word cloud technique, key issues are extracted and turned into information, and the results are analyzed. This study proposes a new stopword cleansing technique that uses an external stopword set (DB) with Python's word cloud, and it identifies the problems and effectiveness of this technique through a practical case analysis. Based on these verification results, it demonstrates the practical utility of word cloud analysis that applies the proposed cleansing technique.
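The two cleansing stages described above can be sketched as follows. This is a minimal Python illustration under stated assumptions and not the author's code. The noise patterns in the heuristic stage are hypothetical examples. The external stopword set is assumed to be a plain list with one word per line, where anything after `#` is treated as a comment.

```python
import re
from collections import Counter

# Stage 1 (heuristic): hypothetical noise patterns an analyst might choose
# after inspecting the raw data. Order matters: URLs and e-mail addresses
# are removed before punctuation is stripped.
NOISE_PATTERNS = [
    r"https?://\S+",  # URLs
    r"\S+@\S+",       # e-mail addresses
    r"[^\w\s]",       # punctuation and symbols
    r"\d+",           # bare numbers
]


def heuristic_cleanse(raw: str) -> str:
    """Replace noise with spaces, collapse whitespace, and lowercase."""
    text = raw
    for pattern in NOISE_PATTERNS:
        text = re.sub(pattern, " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def load_stopwords(lines) -> set:
    """Build a stopword set from an external list, one word per line.

    Assumed format: anything after '#' is a comment; blank lines are skipped.
    """
    words = set()
    for line in lines:
        word = line.split("#", 1)[0].strip().lower()
        if word:
            words.add(word)
    return words


def machine_cleanse(text: str, stopwords: set) -> list:
    """Stage 2 (machine): drop tokens that appear in the stopword set."""
    return [tok for tok in text.split() if tok not in stopwords]


def vocabulary_frequencies(tokens) -> Counter:
    """The frequencies a word cloud would size words by."""
    return Counter(tokens)
```

Because `load_stopwords` accepts any iterable of lines, an open file works directly. For example, `load_stopwords(open("stopwords.txt", encoding="utf-8"))` would read a hypothetical external stopword file. The stopword list can then be edited without touching the analysis code.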

Analyzing XR(eXtended Reality) Trends in South Korea: Opportunities and Challenges

  • Sukchang Lee
    • International Journal of Advanced Culture Technology / v.12 no.2 / pp.221-226 / 2024
  • This study used text mining, a big data analysis technique, to explore XR trends in South Korea. For this research, I used BigKinds, a big data platform, and collected data on the keyword 'XR' spanning approximately 14 years, from 2010 to 2024. The gathered data underwent a cleansing process and was analyzed in three ways: keyword trend analysis, relational analysis, and word cloud analysis. The analysis identified when XR emerged and when it was most actively discussed, and it showed XR devices and manufacturers emerging as key keywords.
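Keyword trend analysis, as described above, comes down to counting matching articles per period. The sketch below assumes, hypothetically, that the collected news data has been exported as (date, text) pairs with ISO-format dates. It does not use any BigKinds interface. The keyword is matched only when it is not adjacent to other Latin letters, so "XR" with an attached Korean particle (as in "XR을") still counts, while "XRAY" does not.

```python
import re
from collections import Counter


def keyword_trend(articles, keyword="XR"):
    """Count articles per year that mention `keyword` at least once.

    `articles` is an iterable of (date, text) pairs with dates assumed to be
    ISO strings ("YYYY-MM-DD"). The keyword must not be flanked by Latin
    letters, but may be followed by a Korean particle.
    """
    pattern = re.compile(rf"(?<![A-Za-z]){re.escape(keyword)}(?![A-Za-z])")
    per_year = Counter()
    for date, text in articles:
        if pattern.search(text):
            per_year[int(date[:4])] += 1
    # Return years in chronological order for plotting a trend line.
    return dict(sorted(per_year.items()))
```

Counting articles rather than occurrences keeps one long article from dominating a year. This is a design assumption, and counting total occurrences would be an equally valid variant.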

A Study on Unstructured text data Post-processing Methodology using Stopword Thesaurus (불용어 시소러스를 이용한 비정형 텍스트 데이터 후처리 방법론에 관한 연구)

  • Won-Jo Lee
    • The Journal of the Convergence on Culture Technology / v.9 no.6 / pp.935-940 / 2023
  • Most text data collected through web scraping for artificial intelligence and big data analysis is large and unstructured, so it must be cleansed before big data analysis. It becomes structured, analyzable data through a heuristic pre-processing refinement step followed by a post-processing machine refinement step. In this study's post-processing machine refinement step, a Korean dictionary and a stopword dictionary are used to extract the vocabulary for the frequency analysis behind a word cloud. To efficiently remove the stopwords that this step fails to remove, we propose a methodology that applies a "thesaurus" of "user-defined stopwords". The proposed "user-defined stopword thesaurus" technique is intended to address the problems of the existing "stopword dictionary" method. Using R's word cloud technique, we examine its pros and cons through a comparative case analysis and suggest the effectiveness of applying the proposed methodology in practice.
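The abstract does not specify the thesaurus format, so the following Python sketch assumes a hypothetical one. The thesaurus is a mapping in which an empty value marks a user-defined stopword to drop, and any other value names the form the token should be replaced with. Merging variants this way goes beyond what the abstract states. The second pass runs after the general stopword dictionary and catches the residual stopwords the dictionary missed.

```python
from collections import Counter


def apply_stopword_dictionary(tokens, stopword_dict):
    """First pass: remove tokens listed in the general stopword dictionary."""
    return [t for t in tokens if t not in stopword_dict]


def apply_user_thesaurus(tokens, thesaurus):
    """Second pass with a user-defined thesaurus (hypothetical format).

    thesaurus[token] == ""   -> token is a user-defined stopword and is dropped
    thesaurus[token] == form -> token is replaced by `form`
    tokens absent from the thesaurus pass through unchanged
    """
    out = []
    for t in tokens:
        replacement = thesaurus.get(t, t)
        if replacement:
            out.append(replacement)
    return out


def cleanse(tokens, stopword_dict, thesaurus):
    """Run both passes and return the word frequencies for the word cloud."""
    first = apply_stopword_dictionary(tokens, stopword_dict)
    return Counter(apply_user_thesaurus(first, thesaurus))
```

Keeping the thesaurus as data rather than code means an analyst can extend it after each case analysis without modifying the cleansing functions.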

Intelligent Wordcloud Using Text Mining (텍스트 마이닝을 이용한 지능적 워드클라우드)

  • Kim, Yeongchang;Ji, Sangsu;Park, Dongseo;Lee, Choong Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2019.05a / pp.325-326 / 2019
  • This paper proposes an intelligent word cloud that improves on the existing method, which builds a word cloud from noun frequencies obtained through text mining. We propose a method that effectively adds newly coined words and similar terms to the dictionary used to extract nouns in text mining, so that word clouds can also visually highlight other parts of speech, such as verbs. In the experiment, the KoNLP package was used to extract the frequencies of existing nouns, and 80 unsupported new words were identified by examining frequencies and added manually.

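The effect of adding newly coined words to the extraction dictionary can be illustrated without KoNLP. The Python sketch below is a deliberately simplified stand-in for morphological analysis and is an assumption, not the paper's method. Because Korean particles follow the noun, each whitespace-separated token is assigned the longest dictionary entry it starts with. Words missing from the dictionary are silently lost, which is the problem that adding new words manually addresses.

```python
from collections import Counter


def extract_nouns(text, noun_dict):
    """Extract at most one noun per whitespace token by longest-prefix match.

    Simplifying assumption: a Korean token is a noun followed by optional
    particles or endings (e.g. "빅데이터를"), so the longest dictionary entry
    that the token starts with is taken as its noun. Unmatched tokens are skipped.
    """
    nouns = []
    for token in text.split():
        for end in range(len(token), 0, -1):
            if token[:end] in noun_dict:
                nouns.append(token[:end])
                break
    return nouns


def noun_frequencies(texts, noun_dict):
    """Aggregate noun counts over several texts for a word cloud."""
    counts = Counter()
    for text in texts:
        counts.update(extract_nouns(text, noun_dict))
    return counts
```

For example, with a base dictionary that lacks "빅데이터" (big data) and "먹방" (mukbang), the text "먹방 유튜버가 빅데이터를 분석했다" yields only "유튜버" and "분석". After the two words are added to the dictionary, all four nouns are extracted.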

Malware Analysis Mechanism using the Word Cloud based on API Statistics (API 통계 기반의 워드 클라우드를 이용한 악성코드 분석 기법)

  • Yu, Sung-Tae;Oh, Soo-Hyun
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.10 / pp.7211-7218 / 2015
  • On average, tens of thousands of new malicious code samples are generated every day, and new types of malicious code surge each year. Diverse detection methods are used, including those based on signatures, API flows, and strings, but most are limited in detecting new malicious code because of bypass techniques. Much research has therefore been conducted on detecting malicious code more efficiently, and visualization is currently one of the most actively researched areas. Because visualization allows malicious code to be recognized more intuitively, it is useful for detecting and examining large numbers of samples efficiently. In this paper, we analyze the relationships between malicious code and Native API functions. By applying the word cloud with a text mining technique, we visualize the major Native APIs used by malicious code in order to assess its maliciousness. The proposed analysis method should help analysts intuitively probe the behavior of malware.
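One way to turn Native API statistics into word cloud input is sketched below in Python. The per-sample API call lists are assumed to come from a hypothetical tracing step that is not shown. Weighting each API by the fraction of samples that call it is an illustrative choice and not necessarily the statistic the paper uses. The API names in the test are real Windows Native API functions, but the traces are made up.

```python
from collections import Counter


def api_statistics(traces):
    """Summarize Native API usage across samples.

    `traces` maps a sample ID to the list of Native API calls observed for it
    (assumed input format). Returns (call_counts, sample_counts): the total
    number of calls per API, and the number of samples in which each API
    appears at least once.
    """
    call_counts = Counter()
    sample_counts = Counter()
    for calls in traces.values():
        call_counts.update(calls)
        sample_counts.update(set(calls))  # count each API once per sample
    return call_counts, sample_counts


def cloud_weights(sample_counts, n_samples):
    """Weight each API by the fraction of samples that call it (0 to 1)."""
    return {api: count / n_samples for api, count in sample_counts.items()}
```

Per-sample counts keep a single sample that calls one API in a tight loop from dominating the cloud. Raw call counts are also returned in case total volume is the statistic of interest.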

Publication Trends and Citation Impact of Tribology Research in India: A Scientometric Study

  • Rajendran, P.;Elango, B.;Manickaraj, J.
    • Journal of Information Science Theory and Practice / v.2 no.1 / pp.22-34 / 2014
  • This paper analyzes India's contribution to world tribology research during 2001-2012 based on SCOPUS records. India's global publication share, annual output, citation impact, partner countries, leading contributors, leading institutes, and highly cited papers were analyzed. In addition, a word cloud technique is used to map frequently used single words in titles. India ranks 7th, with a global publication share of 3.83% and an average annual growth rate of 25.58% during 2001-2012. The citation impact of India's contribution is 6.05 overall; it decreased from 12.74 in 2001-2006 to 4.62 in 2007-2012. Of India's total research output, 17.4% was published with international collaboration.
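The indicators in this abstract can be computed as shown below. The abstract does not give formulas, so the definitions are common scientometric ones assumed for illustration:

- the average annual growth rate (AAGR) is the arithmetic mean of the year-over-year growth rates;
- citation impact is citations per paper;
- global share is a country's percentage of world output.

The inputs in the tests are toy numbers, not the study's data.

```python
def annual_growth_rates(counts):
    """Year-over-year growth in percent for consecutive yearly counts."""
    return [(cur - prev) * 100 / prev for prev, cur in zip(counts, counts[1:])]


def average_annual_growth_rate(counts):
    """AAGR (assumed definition): arithmetic mean of the yearly growth rates."""
    rates = annual_growth_rates(counts)
    return sum(rates) / len(rates)


def citation_impact(total_citations, total_papers):
    """Citation impact (assumed definition): citations per paper."""
    return total_citations / total_papers


def global_share(country_papers, world_papers):
    """A country's share of world output, in percent."""
    return country_papers * 100 / world_papers
```

For example, yearly counts of 100, 120, and 150 give growth rates of 20% and 25%, so the AAGR is 22.5%. Note that an AAGR differs from a compound annual growth rate whenever yearly growth is uneven.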

Reexamination of Failure Type in Medical Service: Recoverable and Irrecoverable Service (의료서비스 실패유형 재조명: 복구 가능과 복구 불가능 서비스)

  • Yoon, Sung-Wook;Seo, Mi-Ok
    • The Journal of the Korea Contents Association / v.16 no.11 / pp.72-82 / 2016
  • Various studies have been conducted on medical services, but they have focused only on examining the relationships between cause and effect variables. This study therefore empirically analyzed qualitative data on medical service problems using the word cloud technique. The major results are as follows. The data reveal ten sources of medical service failure: forced treatment, excessive examination, misdiagnosis, carelessness, inexperienced service, emergency waiting, reservation problems, unkindness, process problems, and inconvenience. The major words in the irrecoverable service failure category are misdiagnosis, careless treatment, and inexperienced service, whereas those in the recoverable category are unkind attitude and negative experiences with the reservation system. People who experience a medical service problem usually respond publicly, making public protests and taking legal action against very severe problems. The study concludes with a summary, implications, and a research agenda.
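Building separate word clouds for recoverable and irrecoverable failures means grouping the coded responses before counting. This hypothetical Python sketch assumes each free-text response has already been coded with one failure source. The `CATEGORY` mapping covers only the sources the abstract assigns to a category, and every other source falls into an "uncategorized" bucket.

```python
from collections import Counter, defaultdict

# Hypothetical, partial mapping from coded failure sources to the two
# categories named in the abstract; the paper's full assignment may differ.
CATEGORY = {
    "misdiagnosis": "irrecoverable",
    "carelessness": "irrecoverable",
    "inexperienced service": "irrecoverable",
    "unkindness": "recoverable",
    "reservation problem": "recoverable",
}


def words_by_category(responses, stopwords=frozenset()):
    """Count words separately for each failure category.

    `responses` is an iterable of (source_code, free_text) pairs.
    Returns a dict mapping category -> Counter of lowercase words.
    """
    clouds = defaultdict(Counter)
    for source, text in responses:
        category = CATEGORY.get(source, "uncategorized")
        words = [w for w in text.lower().split() if w not in stopwords]
        clouds[category].update(words)
    return dict(clouds)
```

Each resulting Counter can then be passed to any word cloud renderer. This gives one cloud per category instead of a single cloud that blends both failure types.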

A Study on the Use of Stopword Corpus for Cleansing Unstructured Text Data (비정형 텍스트 데이터 정제를 위한 불용어 코퍼스의 활용에 관한 연구)

  • Lee, Won-Jo
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.891-897 / 2022
  • In big data analysis, raw text data mostly exists in various unstructured forms, so it becomes analyzable structured data only after heuristic pre-processing and computer post-processing cleansing. In this study, to apply R's word cloud, one of the text data analysis techniques, unnecessary elements are first removed from the collected raw data in pre-processing, and stopwords are removed in post-processing. A word cloud case study was then conducted, which calculates word occurrence frequencies and presents high-frequency words as key issues. The existing stopword processing method for R's word cloud technique is the "nested stopword source code" method. To address its problems, we propose using a "general stopword corpus" and a "user-defined stopword corpus" and conduct a case analysis. The advantages and disadvantages of the proposed "unstructured data cleansing process model" are comparatively verified. We also present the practical application of word cloud visualization analysis using the proposed external corpus cleansing technique.
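The main difference between nesting stopwords in the source code and using external corpora is where the stopword list lives. The Python sketch below assumes both corpora are UTF-8 text files with one stopword per line, which is a hypothetical format. It merges the general corpus with an optional user-defined corpus, so either list can be edited without changing the analysis code.

```python
from pathlib import Path


# The nested approach would instead hard-code the list inside the analysis
# script, e.g. STOPWORDS = {"the", "and"}, so every change means editing code.

def load_corpus(path) -> set:
    """Read a stopword corpus: UTF-8 text, one stopword per line (assumed format)."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def build_stopwords(general_path, user_path=None) -> set:
    """Union of the general corpus and an optional user-defined corpus."""
    stopwords = load_corpus(general_path)
    if user_path is not None:
        stopwords |= load_corpus(user_path)
    return stopwords


def remove_stopwords(tokens, stopwords):
    """Keep only tokens that are not in the merged stopword set."""
    return [tok for tok in tokens if tok not in stopwords]
```

Keeping the user-defined corpus separate from the general one makes it easy to reuse the general corpus across projects. Only the domain-specific list then has to change per analysis.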

An Analysis of Artificial Intelligence Education Research Trends Based on Topic Modeling

  • You-Jung Ko
    • Journal of the Korea Society of Computer and Information / v.29 no.2 / pp.197-209 / 2024
  • This study analyzed recent research trends in artificial intelligence (AI) education in South Korea, with the overarching objective of exploring the future direction of AI education. To this end, 697 papers on AI education published from 2016 to November 2023 and listed in the Research Information Sharing Service (RISS) were analyzed using a word cloud and Latent Dirichlet Allocation (LDA) topic modeling. The analysis identified six major topics: generative AI utilization education, AI ethics education, AI convergence education, teacher perceptions and roles in AI utilization, AI literacy development in university education, and AI-based education and research directions. Based on these findings, I proposed several suggestions, including (1) expanding the use of generative AI in various subjects, (2) establishing ethical guidelines for AI use, (3) evaluating the long-term impact of AI education, (4) enhancing teachers' ability to use AI in higher education, (5) diversifying the AI education curriculum in universities, and (6) analyzing AI research trends and developing an educational platform.
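LDA topic modeling, which the study uses alongside the word cloud, can be illustrated with a compact collapsed Gibbs sampler. The Python sketch below is a generic textbook formulation and not the study's implementation. The hyperparameters `alpha` and `beta`, the iteration count, and the seed are assumptions. Each token's topic is resampled in proportion to (n_dk + alpha)(n_kw + beta) / (n_k + V * beta), where every count excludes the token being resampled.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (vocab, topic_word), where
    topic_word[k][w] estimates the probability of vocab[w] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # current topic of every token

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove the token from the counts before resampling it.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1

    topic_word = [
        [(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
        for k in range(n_topics)
    ]
    return vocab, topic_word


def top_words(vocab, topic_word, n=3):
    """The n most probable words for each topic."""
    result = []
    for row in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda i: row[i], reverse=True)
        result.append([vocab[i] for i in ranked[:n]])
    return result
```

On a small corpus the topics returned depend on the seed. Practical analyses of hundreds of papers usually rely on a tested library, such as scikit-learn's `LatentDirichletAllocation` or gensim, rather than a hand-written sampler.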