• Title/Summary/Keyword: Word Cloud Method

Search Result 59

A Study on Word Cloud Techniques for Analysis of Unstructured Text Data (비정형 텍스트 테이터 분석을 위한 워드클라우드 기법에 관한 연구)

  • Lee, Won-Jo
    • The Journal of the Convergence on Culture Technology / v.6 no.4 / pp.715-720 / 2020
  • In big data analysis, text data is mostly unstructured and large in volume, and analysis has been difficult because analysis techniques were not well established. This study therefore examines the potential for commercialization by verifying the usefulness and the problems of applying the word cloud technique, one of the text data analysis techniques used with big data. The limitations and problems of the technique are derived through a visualization analysis of the "President UN Speech" using the word cloud technique in R. An improved model is then proposed to address these problems, offering an efficient method for applying the word cloud technique in practice.

Intelligent Wordcloud Using Text Mining (텍스트 마이닝을 이용한 지능적 워드클라우드)

  • Kim, Yeongchang;Ji, Sangsu;Park, Dongseo;Lee, Choong Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2019.05a / pp.325-326 / 2019
  • This paper proposes an intelligent word cloud that improves on the existing method, which builds a word cloud from noun frequencies obtained with text mining. We propose a method for visualizing word clouds that focus on other parts of speech, such as verbs, by effectively adding newly coined words and the like to the dictionary used to extract nouns in text mining. In the experiment, the KoNLP package was used to extract the frequency of existing nouns, and 80 new words that it did not support were added manually after examining their frequency.

Evaluation of Facilitating Factors for Cloud Service by Delphi Method (델파이 기법을 이용한 클라우드 서비스의 개념 정의와 활성화 요인 분석)

  • Suh, Jung-Han;Chang, Suk-Gwon
    • Journal of Information Technology Services / v.11 no.2 / pp.107-118 / 2012
  • As cloud computing has begun to receive great attention worldwide, it has become the most popular buzzword in recent IT magazines and journals and is heard across many services and fields. However, the notion of the cloud service remains vaguely defined relative to the attention it receives. The cloud service is generally understood as a specific service model based on cloud computing, but the four terms "cloud", "cloud computing", "cloud computing service", and "cloud service" are often used without distinguishing their notions and characteristics, which makes it difficult to define the exact nature of the cloud service. Exploring and analyzing the cloud service systematically first requires an accurate concept and scope. This study therefore first clarifies the definition with the Delphi method using an expert group, and then seeks to provide the foundation for related research, such as establishing business models or value chains and policies for activating the cloud service. For the Delphi study, 16 experts from different fields, including industry, academia, and the research sector, participated in several surveys. The results identify the following characteristics of the cloud service: Pay per use, Scalability, Internet centric Virtualization. The scope is defined as including Grid Computing, Utility Computing, Server Based Computing, and Network Computing.

A Study on Unstructured text data Post-processing Methodology using Stopword Thesaurus (불용어 시소러스를 이용한 비정형 텍스트 데이터 후처리 방법론에 관한 연구)

  • Lee, Won-Jo
    • The Journal of the Convergence on Culture Technology / v.9 no.6 / pp.935-940 / 2023
  • Most text data collected through web scraping for artificial intelligence and big data analysis is large and unstructured, so a refining process is required before analysis. The data becomes structured, analyzable data through a heuristic pre-processing refining step followed by a post-processing machine refining step. In this study, the post-processing machine refining step uses a Korean dictionary and a stopword dictionary to extract the vocabulary for the frequency analysis behind word cloud analysis. In this process, "user-defined stopwords" are used to efficiently remove stopwords that were not otherwise removed. We propose a methodology for applying a "user-defined stopword thesaurus", which complements the existing "stopword dictionary" method, and examine the pros and cons of the proposed refining method through a case analysis using R's word cloud technique. We present a comparative verification and suggest the effectiveness of applying the proposed methodology in practice.
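
The stopword-based refining step described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than the R used in the paper, works on English tokens rather than Korean, and the names `BASE_STOPWORDS`, `USER_STOPWORDS`, and `word_frequencies` are hypothetical; the paper's actual dictionaries and thesaurus are not given in the abstract.

```python
import re
from collections import Counter

# Hypothetical stand-in for a general-purpose stopword dictionary.
BASE_STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}

# Hypothetical user-defined stopword thesaurus: each head term groups
# the variants that should be removed together.
USER_STOPWORDS = {
    "data": {"data", "dataset", "datasets"},
}

def word_frequencies(text, base=BASE_STOPWORDS, user=USER_STOPWORDS):
    """Count tokens that survive both stopword filters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Flatten the thesaurus into one set of words to drop.
    user_flat = set().union(*user.values()) if user else set()
    kept = [t for t in tokens if t not in base and t not in user_flat]
    return Counter(kept)

freqs = word_frequencies("The analysis of data and the analysis of text datasets")
print(freqs.most_common(2))
```

The resulting counts are what a word cloud renderer would map to font size; the grouping of variants under a head term is one plausible reading of "thesaurus", not a confirmed detail of the paper.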

Analysis of Inauguration Address of Previous Korean Presidents Based on Network (네트워크 기반 대한민국 역대 대통령 취임사 분석)

  • Kim, Hak Yong
    • The Journal of the Korea Contents Association / v.21 no.11 / pp.11-19 / 2021
  • The presidential inaugural address is a very useful means of presenting the national vision and conveying the president's political philosophy and policy direction to the people. Analyzing the address therefore helps in understanding both the president and the presidential era. The address can be analyzed in various academic fields, but in this study it was treated purely as content and analyzed as a network. Word cloud analysis, based on the frequency of words appearing in the address, is widely used. Network-based analysis is also a useful method because it can derive the context contained in the sentences. A network of all the inaugural addresses of past presidents of the Republic of Korea was constructed and its structural factors were presented. The presidents and their political directions were derived by comparing the key words obtained from the network with those from the word cloud. The characteristics of each address were presented by constructing a network for each president's inaugural address and comparing key words with closeness centrality, a structural factor of the network. The network-based analysis of past presidential inaugural addresses is expected ultimately to serve as data for understanding and evaluating presidents.
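
Closeness centrality, the structural factor named in this abstract, can be sketched as follows. This is an illustrative Python example on a hypothetical toy word network; the abstract does not state which normalization or software the study used, so the definition here (reachable nodes divided by the sum of shortest-path distances) is an assumption.

```python
from collections import deque

def closeness_centrality(graph, node):
    """Closeness = (number of reachable nodes) / (sum of shortest-path distances)."""
    dist = {node: 0}
    queue = deque([node])
    # Breadth-first search gives shortest paths in an unweighted network.
    while queue:
        cur = queue.popleft()
        for nxt in graph[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical toy co-occurrence network: an edge means two words
# appear in the same sentence.
graph = {
    "nation": {"people", "economy", "peace"},
    "people": {"nation"},
    "economy": {"nation"},
    "peace": {"nation"},
}
scores = {word: closeness_centrality(graph, word) for word in graph}
print(scores)
```

In this toy network "nation" is one step from every other word and scores highest, which is the sense in which closeness can surface key words that pure frequency counts might miss.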

Malware Analysis Mechanism using the Word Cloud based on API Statistics (API 통계 기반의 워드 클라우드를 이용한 악성코드 분석 기법)

  • Yu, Sung-Tae;Oh, Soo-Hyun
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.10 / pp.7211-7218 / 2015
  • Tens of thousands of malicious codes are generated per day on average, and new types surge each year. Diverse methods are used to detect them, including those based on signatures, API flow, and strings, but most are limited in detecting new malicious codes because of bypass techniques. Much research has therefore been conducted on more efficient detection. Visualization is one of the most actively researched of these areas. Because it enables more intuitive recognition of malicious codes, it is useful for detecting and examining large numbers of them efficiently. In this paper, we analyze the relationships between malicious codes and Native API functions. By applying a word cloud with text mining techniques, we also visualize the major Native APIs of malicious codes to assess their maliciousness. The proposed analysis method should help in intuitively probing the behavior of malware.

Analysis of Laughter Therapy Trend Using Text Network Analysis and Topic Modeling

  • LEE, Do-Young
    • Journal of Wellbeing Management and Applied Psychology / v.5 no.4 / pp.33-37 / 2022
  • Purpose: This study aims to understand the trends and central concepts of domestic research on laughter therapy. The analysis used a total of 72 theses retrieved by searching for the keyword 'laughter therapy' from 2007 to 2021. Research design, data and methodology: This study built and analyzed a keyword co-occurrence network, analyzed the types of research through topic modeling, and examined the visualized word cloud and sociogram. The keyword data, cleaned through preprocessing, was analyzed with centrality analysis and topic modeling after 1-mode matrix conversion using the NetMiner (version 4.4) program. Results: The keywords that appeared most often over the last 14 years were laughter therapy, depression, the elderly, and stress. The five topics found in the thesis data from 2007 to 2021 were therapy, cognitive behavior, quality of life, stress, and the elderly. Conclusions: This study identified the flow and trends of domestic research topics on laughter therapy over the last 14 years, and research on laughter therapy that reflects changes over time should continue.

Fuzzy Keyword Search Method over Ciphertexts supporting Access Control

  • Mei, Zhuolin;Wu, Bin;Tian, Shengli;Ruan, Yonghui;Cui, Zongmin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.11 / pp.5671-5693 / 2017
  • With the rapid development of cloud computing, more and more data owners are motivated to outsource their data to the cloud for various benefits. Due to serious privacy concerns, sensitive data should be encrypted before being outsourced. However, encryption makes effective data utilization, such as keyword search over ciphertexts, a very challenging task. Although many searchable encryption methods have been proposed, they support only exact keyword search, so misspelled keywords in a query produce wrong matches or none. Very recently, a few methods have extended the search capability to fuzzy keyword search. Some of them may return inaccurate search results, while others need very large indexes, which inevitably lead to low search efficiency. Additionally, these fuzzy keyword search methods do not support access control. In this paper, we propose a searchable encryption method that achieves fuzzy search and access control through algorithm design and Ciphertext-Policy Attribute-based Encryption (CP-ABE). In our method, the index is small and the search results are accurate. We present a word pattern that can be used to balance search efficiency and privacy. Finally, we conduct extensive experiments and analyze the security of the proposed method.

Text Mining of Successful Casebook of Agricultural Settlement in Graduates of Korea National College of Agriculture and Fisheries - Frequency Analysis and Word Cloud of Key Words - (한국농수산대학 졸업생 영농정착 성공 사례집의 Text Mining - 주요단어의 빈도 분석 및 word cloud -)

  • Joo, J.S.;Kim, J.S.;Park, S.Y.;Song, C.Y.
    • Journal of Practical Agriculture & Fisheries Research / v.20 no.2 / pp.57-72 / 2018
  • To extract meaningful information from the excellent farming settlement cases of young farmers published by KNCAF, we studied the key words with text mining and created a word cloud for visualization. First, in the text mining results for the entire sample, 'CEO', 'corporate executive', 'think', 'self', 'start', 'mind', and 'effort' were high-frequency words among the top 50 core words. This shows that their ability to think, judge, and push ahead on their own reflects their capacity to be managers, and it expresses how they manage to achieve their dreams without giving up. The high frequency of words such as 'father' and 'parent' is due to the high ratio of parents' cooperation and succession. 'KNCAF', 'university', 'graduation', and 'study' reflect their high educational awareness, and 'organic farming' and 'eco-friendly' reflect their interest in eco-friendly agriculture. In addition, words related to the 6th industry, such as 'sales' and 'experience', represent their efforts to revitalize farming and fishing villages. Meanwhile, 'internet', 'blog', 'online', 'SNS', 'ICT', 'composite', and 'smart' were not included in the top 50. However, the fact that these words were all extracted shows that young farmers are increasingly interested in scientific and high-tech agriculture and fisheries. Next, when the top 50 key words were grouped by crop, 'facilities' was extracted as a main word in livestock, vegetables, and aquatic crops, and 'equipment' and 'machine' in food crops. 'Eco-friendly' and 'organic' appeared in vegetable crops and food crops, and 'organic' appeared in fruit crops. 'Worm', associated with eco-friendly farming methods, appeared in food crops, and 'certification', which signifies excellent agricultural and marine products, appeared only in fishery crops. 'Production', which is related to the '6th industry', appeared in all crops, 'processing' and 'distribution' appeared in fruit crops, and 'experience' appeared in vegetable crops, food crops, and fruit crops. To visualize the words extracted by text mining, we created word clouds for the entire sample and for each crop sample. As a result, we were able to judge the meaning of the excellent practices, which are unstructured text, by character size.

Research on Satisfaction Evaluation Based on Tourist Big Data

  • Guo, Hanwen;Liu, Ziyang;Jiao, Zeyu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.1 / pp.231-244 / 2022
  • With the improvement of people's living standards and the development of tourism, tourists have greater freedom in choosing destinations. As an indicator of satisfaction with scenic spots, tourist comments are therefore becoming increasingly prominent. This paper aims to compare and analyze the landscape image of the Five Great Mountains in China and provide specific strategies for their development. Tourists' online reviews of the Five Great Mountains on the Online Travel Agency (OTA) website from 2015 to 2018 are collected as research samples. Text analysis and the R language are used to analyze the content of the reviews, and the high-frequency words are displayed visually in a word cloud. In addition, the entropy weight method is used to determine the index weights, and tourist satisfaction is evaluated to understand the weaknesses of the scenic spots. The results show, first, that tourist satisfaction with the Five Great Mountains is basically consistent with their popularity. Second, the weight analysis shows that tourists pay special attention to the landscape features and environmental health of the scenic areas, so the relevant departments should focus on building landscape characteristics and improving environmental health. At the same time, the accommodation and service management of the scenic spots cannot be ignored. Finally, based on the analysis results, suggestions are made on how to improve tourist satisfaction with the Five Great Mountains.
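
The entropy weight method mentioned in this abstract can be sketched in its common textbook form. The example is in Python rather than the R used in the paper, and the function name `entropy_weights` and the score matrix are hypothetical; the paper's actual indicators and data are not given in the abstract.

```python
import math

def entropy_weights(matrix):
    """Entropy weight method for a matrix of positive scores.

    Rows are alternatives (e.g. scenic spots), columns are indicators.
    Assumes at least two rows and a positive sum in every column.
    """
    m = len(matrix)
    n = len(matrix[0])
    k = 1.0 / math.log(m)  # scales entropy into [0, 1]
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for x in column:
            p = x / total  # share of alternative i on indicator j
            if p > 0:
                entropy -= k * p * math.log(p)
        # Indicators whose values vary more carry more information.
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    return [d / s for d in divergences]

# Hypothetical scores: three scenic spots rated on two indicators.
scores = [
    [4.0, 1.0],
    [4.2, 3.0],
    [3.9, 5.0],
]
weights = entropy_weights(scores)
print(weights)
```

The first indicator barely differs between spots, so it receives a small weight, while the second, which separates the spots clearly, dominates. This is the property that lets the method assign index weights from the data rather than from expert judgment.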