• Title/Summary/Keyword: TextMining

Analysis of Trends of Critical Issues and Topics in the Service Sector: Comparing YouTube Videos and Research Publications (서비스 분야의 주요 이슈와 주제에 대한 흐름 분석: 유튜브 동영상과 학술연구 비교)

  • EuiBeom Jeong;DonHee Lee
    • Journal of Korea Society of Industrial Information Systems / v.28 no.4 / pp.59-76 / 2023
  • This study examines critical issues and topics related to services using YouTube videos and research publications. We analyzed 2,853 YouTube videos and 19,973 research papers related to services, released between 2013 and June 2023, using text mining and network analysis. In addition, the collected data were divided into pre- and post-COVID-19 pandemic periods to explore how key issues and topics regarding services have changed. These papers were analyzed sequentially through text mining and network construction procedures. The results indicate that the central themes of YouTube videos were IT, data, and solution, while academic research focused on service quality, quality, and customer satisfaction. In the ego network analysis, the key issues in YouTube video content revolved primarily around words related to the service industry but generally lacked specific industry fields, whereas academic papers explored diverse issues in various service fields. The results of this study can be used to understand changes in customer concerns in the service industry from practical and academic perspectives.
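The ego network analysis mentioned in this abstract can be illustrated with a minimal sketch. It assumes an undirected keyword co-occurrence network given as an edge list; the edges below are hypothetical and are not data from the study, and `ego_network` is an illustrative helper name rather than anything the authors describe.

```python
def ego_network(edges, ego):
    """Return the edges among `ego` and its direct neighbors (undirected graph)."""
    # Neighbors are all nodes that share an edge with the ego node.
    neighbors = {b if a == ego else a for a, b in edges if ego in (a, b)}
    nodes = neighbors | {ego}
    # Keep only the edges whose two endpoints both lie inside the ego network.
    return sorted((a, b) for a, b in edges if a in nodes and b in nodes)


# Hypothetical keyword co-occurrence edges, for illustration only.
edges = [
    ("service", "quality"),
    ("service", "customer"),
    ("quality", "customer"),
    ("customer", "satisfaction"),
    ("data", "solution"),
]

service_ego = ego_network(edges, "service")
```

Here `service_ego` keeps the three edges linking "service", "quality", and "customer", and drops edges that reach outside the immediate neighborhood of the ego keyword.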

Development of SVM-based Construction Project Document Classification Model to Derive Construction Risk (건설 리스크 도출을 위한 SVM 기반의 건설프로젝트 문서 분류 모델 개발)

  • Kang, Donguk;Cho, Mingeon;Cha, Gichun;Park, Seunghee
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.6 / pp.841-849 / 2023
  • Construction projects carry risks arising from factors such as construction delays and accidents. Despite these risks, the construction period of a project is calculated mainly by subjective judgment that relies on the supervisor's experience. In addition, unreasonably shortening construction to recover schedules delayed by construction delays and disasters leads to negative consequences such as poor construction, and delayed schedules cause economic losses because infrastructure is unavailable. Data-based scientific approaches and statistical analysis are needed to address these risks. Data collected in actual construction projects are stored as unstructured text, so pre-processing them for data-based risk management requires considerable manpower and cost; a data classification model using text mining is therefore required to produce basic data. In this study, construction project documents were collected and, using text mining, an SVM (Support Vector Machine)-based classification model was developed to generate document-based data for risk management. Through quantitative analysis in future research, the results are expected to serve as efficient and objective basic data for construction project process management, making risk management possible.
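As a rough illustration of SVM-based document classification, the sketch below trains a linear SVM on bag-of-words counts by subgradient descent on the regularized hinge loss. The abstract does not state the features, kernel, or training procedure the authors used, so all of these are assumptions; the documents, labels, and function names are hypothetical, and a production system would more likely rely on an established library.

```python
def build_vocabulary(documents):
    """Sorted list of all distinct lowercase tokens in the corpus."""
    return sorted({tok for doc in documents for tok in doc.lower().split()})


def vectorize(document, vocabulary):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    tokens = document.lower().split()
    return [float(tokens.count(term)) for term in vocabulary]


def train_linear_svm(X, y, epochs=20, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularized hinge loss (labels are +1/-1)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wj - lr * reg * wj for wj in w]
            # Hinge-loss step: only samples inside the margin move the boundary.
            if margin < 1:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(x, w, b):
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical documents: +1 = risk-related, -1 = not risk-related.
train_docs = [
    "construction delay schedule",
    "accident on site delay",
    "meeting minutes attendance",
    "attendance list meeting",
]
labels = [1, 1, -1, -1]

vocab = build_vocabulary(train_docs)
X = [vectorize(doc, vocab) for doc in train_docs]
w, b = train_linear_svm(X, labels)
```

Calling `predict(vectorize("schedule delay", vocab), w, b)` then classifies an unseen snippet with the learned weights.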

Analysis of Keywords in national river occupancy permits by region using text mining and network theory (텍스트 마이닝과 네트워크 이론을 활용한 권역별 국가하천 점용허가 키워드 분석)

  • Seong Yun Jeong
    • Smart Media Journal / v.12 no.11 / pp.185-197 / 2023
  • This study used text mining and network theory to extract information useful for applying for occupancy and for performing permit tasks from the contents of the permit register, which is currently used only for the simple purpose of recording occupancy permit information. Based on text mining, we performed normalization processes such as stopword removal and morpheme analysis, and analyzed and compared word frequencies and topic models in five regions, including Seoul, Gyeonggi, Gyeongsang, Jeolla, Chungcheong, and Gangwon. By applying four centrality measures widely used in network theory (degree, closeness, betweenness, and eigenvector), we examined keywords that occupy a central position or act as intermediaries in the network. A comprehensive analysis of word frequency, topic modeling, and network centrality showed that the 'installation' keyword was the most influential in all regions. This is believed to result from the Ministry of Environment's permit management office issuing many permits for constructing facilities or installing structures. In addition, keywords related to road facilities, flood control facilities, underground facilities, power/communication facilities, sports/park facilities, etc. were found to occupy central positions or act as intermediaries in the topic models and networks. Most keywords followed a Zipf's-law distribution, with low frequency of occurrence and low distribution ratio.
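Two of the four centrality measures named above, degree and closeness, can be sketched in a few lines; betweenness and eigenvector centrality are omitted for brevity. The sketch assumes an undirected, unweighted keyword network stored as adjacency sets, and the small graph is hypothetical rather than taken from the permit register.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def closeness_centrality(graph):
    """Reachable-node count divided by the sum of shortest-path distances."""
    result = {}
    for source in graph:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nxt in graph[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


# Hypothetical keyword network, for illustration only.
graph = {
    "installation": {"road", "bridge", "pipe"},
    "road": {"installation", "bridge"},
    "bridge": {"installation", "road"},
    "pipe": {"installation"},
}

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
```

In this toy graph "installation" is adjacent to every other keyword, so it scores highest on both measures, which mirrors the kind of central keyword the abstract describes.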

Comparative Study of User Reactions in OTT Service Platforms Using Text Mining (텍스트 마이닝을 활용한 OTT 서비스 플랫폼별 사용자 반응 비교 연구)

  • Soonchan Kwon;Jieun Kim;Beakcheol Jang
    • Journal of Internet Computing and Services / v.25 no.3 / pp.43-54 / 2024
  • This study employs text mining techniques to compare user responses across various Over-The-Top (OTT) service platforms. The primary objective of the research is to understand user satisfaction with OTT service platforms and contribute to the formulation of more effective review strategies. The key questions addressed in this study involve identifying prominent topics and keywords in user reviews of different OTT services and comprehending platform-specific user reactions. TF-IDF is utilized to extract significant words from positive and negative reviews, while BERTopic, an advanced topic modeling technique, is employed for a more nuanced and comprehensive analysis of intricate user reviews. The results from TF-IDF analysis reveal that positive app reviews exhibit a high frequency of content-related words, whereas negative reviews display a high frequency of words associated with potential issues during app usage. Through the utilization of BERTopic, we were able to extract keywords related to content diversity, app performance components, payment, and compatibility, by associating them with content attributes. This enabled us to verify that the distinguishing attributes of the platforms vary among themselves. The findings of this study offer significant insights into user behavior and preferences, which OTT service providers can leverage to improve user experience and satisfaction. We also anticipate that researchers exploring deep learning models will find our study results valuable for conducting analyses on user review text data.
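The TF-IDF weighting used to surface significant review words can be sketched as follows. This is a minimal version assuming whitespace tokenization, relative term frequency, and an unsmoothed logarithmic inverse document frequency; the abstract does not specify the exact variant, and the reviews below are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical app reviews, for illustration only.
reviews = [
    "great content and great shows",
    "the app crashes during payment",
    "the app freezes during playback",
]

review_scores = tf_idf(reviews)
top_word_first_review = max(review_scores[0], key=review_scores[0].get)
```

Words that occur in several reviews (such as "app" here) receive a lower weight than words concentrated in a single review, which is why TF-IDF tends to highlight distinguishing vocabulary.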

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.69-92 / 2015
  • The explosion of social media data has led researchers to apply text-mining techniques to analyze large-scale social media data more rigorously. Although social media text analysis algorithms have improved, previous approaches still have limitations. In sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common; some studies have added grammatical factors to the feature sets used to train classification models. The other applies semantic analysis to sentiment analysis, but it has mainly been applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm, an extension of neural network algorithms, to handle the more extensive semantic features that existing sentiment analysis has underestimated. The result of the Word2Vec algorithm is compared with the result of co-occurrence analysis to identify the difference between the two approaches. The results show that, among the related words extracted, those expressing emotion about the keyword are three times more numerous with Word2Vec than with co-occurrence analysis. The difference stems from Word2Vec's vectorization of semantic features. It is therefore possible to say that the Word2Vec algorithm can catch hidden related words that traditional analysis has not found. In addition, Part-Of-Speech (POS) tagging for Korean is used to detect adjectives as "emotional words". The emotion words extracted from the text are then converted into word vectors by the Word2Vec algorithm to find related words. Among these related words, nouns are selected because each of them may have a causal relationship with the "emotional word" in the sentence. The process of extracting these trigger factors of emotional words is named "Emotion Trigger" in this study. As a case study, the datasets were collected by searching with three keywords (professor, prosecutor, and doctor) because these keywords carry rich public emotion and opinion. Preliminary data collection was conducted to select secondary keywords for data gathering. The secondary keywords used to gather the data for the actual analysis are as follows: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate), Prosecutor (lewd behavior, sponsor). The size of the text data is about 100,000 (Professor: 25720, Doctor: 35110, Prosecutor: 43225), and the data were gathered from news, blogs, and Twitter to reflect various levels of public emotion in the analysis. Gephi (http://gephi.github.io) was used for visualization, and all programs used for text processing and analysis were written in Java. The contributions of this study are as follows. First, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches. Second, finding Emotion Triggers can detect hidden connections to public emotion that existing methods cannot detect. Finally, the approach used in this study could be generalized regardless of the type of text data. The limitation of this study is that it is hard to say that a word extracted by Emotion Trigger processing has a significant causal relationship with the emotional word in a sentence. Future work will clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing them with manually tagged relationships. Furthermore, the text data used in Emotion Trigger are from Twitter, so the data have a number of distinct features that we did not deal with in this study. These features will be considered in further study.
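The co-occurrence analysis that serves as the baseline in this abstract can be sketched as below. It assumes sentence-level co-occurrence with whitespace tokenization, which the abstract does not specify; the sentences and function names are hypothetical. The Word2Vec side of the comparison is not shown, since it requires a trained embedding model.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how many sentences each unordered word pair appears in together."""
    counts = Counter()
    for sentence in sentences:
        # Sorting makes every pair appear in a single canonical order.
        tokens = sorted(set(sentence.lower().split()))
        for a, b in combinations(tokens, 2):
            counts[(a, b)] += 1
    return counts


def related_words(counts, keyword, top_n=3):
    """Words that co-occur most often with `keyword`."""
    related = Counter()
    for (a, b), count in counts.items():
        if a == keyword:
            related[b] += count
        elif b == keyword:
            related[a] += count
    return related.most_common(top_n)


# Hypothetical sentences, for illustration only.
sentences = [
    "professor angry scandal",
    "professor scandal funds",
    "doctor calm",
]

pair_counts = cooccurrence_counts(sentences)
top_related = related_words(pair_counts, "professor", top_n=1)
```

Co-occurrence only finds words that literally share a sentence with the keyword, which is consistent with the abstract's observation that an embedding-based method can surface related words this baseline misses.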

Study on the social issue sentiment classification using text mining (텍스트마이닝을 이용한 사회 이슈 찬반 분류에 관한 연구)

  • Kang, Sun-A;Kim, Yoo Sin;Choi, Sang Hyun
    • Journal of the Korean Data and Information Science Society / v.26 no.5 / pp.1167-1173 / 2015
  • The development of information and communication technologies such as SNS, blogs, and bulletin boards has provided a variety of places where people can express their thoughts and comments and has allowed Big Data to grow; many people now reveal their opinions on social issues on SNS such as Twitter. In this study, we pre-build a sentiment dictionary about social issues and conduct a sentiment analysis with the structured dictionary to gather opinions on social issues posted on Twitter. The data used are tweets containing "bikini" and "nakkomsu". As a result of the analysis, precision is 61% and the F1-score is 74%. This study is expected to suggest a standard for dictionary construction that allows positive/negative opinions on specific social issues to be classified.
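The abstract reports precision and F1-score but not recall. Assuming the standard definition of F1 as the harmonic mean of precision $P$ and recall $R$, and assuming both figures refer to the same class and test set (the abstract does not say), the implied recall can be worked out as follows. This is a derived value, not one reported by the authors.

```latex
\begin{align*}
F_1 &= \frac{2PR}{P + R}
\quad\Longrightarrow\quad
R = \frac{F_1 \, P}{2P - F_1} \\
R &= \frac{0.74 \times 0.61}{2 \times 0.61 - 0.74}
   = \frac{0.4514}{0.48} \approx 0.94
\end{align*}
```

Under these assumptions the classifier would recover most of the relevant tweets while a sizeable share of its positive predictions would be wrong.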

Comparative Analysis of Happiness and Unhappiness using Topic Modeling: Korea, U.S., U.K., and Brazil (토픽모델링을 이용한 국가간 행복과 불행 토픽 비교 분석 : 한국, 미국, 영국, 브라질)

  • Lee, So-Hyun;Lee, Yun-Kyung;Song, Eui-ryung;Kim, Hee-Woong
    • Knowledge Management Research / v.18 no.3 / pp.101-124 / 2017
  • Recently, 'happiness' has become a major national-level issue, going beyond a personal matter. Korea in particular has increased its GDP by focusing on economic growth for decades and has achieved economic and technical development as an IT power. However, Korean people's satisfaction with life, the so-called 'happiness index', is declining every year. Although there have been continuous efforts to enhance national happiness by treating it as an essential national issue, little research has addressed it. This study derives measures to enhance happiness by extracting happiness and unhappiness factors in Korea through social network services. In particular, it aims to analyze, compare, and apply the happiness and unhappiness factors of three countries with higher happiness indexes than Korea: the US, the UK, and Brazil. To this end, postings containing keywords about happiness and unhappiness were collected from Twitter in Korea, the US, the UK, and Brazil and analyzed using topic modeling, a text mining technique. The significance of this study lies in discussing measures to increase happiness and decrease unhappiness by mining and analyzing actual public opinion about happiness and unhappiness in the four countries using topic modeling. Through this, the quality of life of Korean people could be improved by suggesting measures to enhance happiness and decrease unhappiness at the individual, family, society, and government levels.

Project Failure Main Factors Analysis using Text Mining in Audit Evaluation (감리결과에 텍스트마이닝 기법을 적용한 프로젝트 실패 주요요인 분석)

  • Jang, Kyoungae;Jang, Seong Yong;Kim, Woo-Je
    • Journal of KIISE / v.42 no.4 / pp.468-474 / 2015
  • Corporations should make efforts to recognize the importance of projects, identify their failure factors, prevent risks in advance, and raise success rates, because they need to respond quickly to rapid external changes. There are some previous studies on the success and failure factors of projects; however, most of them have limitations in terms of objectivity and quantitative analysis because they are based on data gathered through surveys, statistical sampling, and analysis. This study analyzes the failure factors of projects based on data mining to find problems with projects in audit reports, which are objective project evaluation reports. To do this, we identified the texts in the paragraphs containing suggestions for improvement. We used the superior classification algorithms NaiveBayes, SMO, and J48, which were evaluated in terms of recall and precision after performing 10-fold cross-validation. In the identified texts, the failure factors of projects were analyzed so that they could be utilized in project implementation.
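The evaluation procedure named above, 10-fold cross-validation scored by precision and recall, can be sketched independently of any particular classifier. The sketch assumes contiguous, unshuffled folds and binary labels; the abstract does not describe how folds were formed, and the function names are illustrative.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


folds = list(k_fold_indices(25, k=10))
example_scores = precision_recall([1, 1, 0, 0], [1, 0, 1, 0])
```

In practice each fold's training indices would be used to fit a classifier and its test indices to score it, with precision and recall averaged over the ten folds; shuffling or stratifying the samples before splitting is usually advisable.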

A SNS Data-driven Comparative Analysis on Changes of Attitudes toward Artificial Intelligence (SNS 데이터 분석을 기반으로 인공지능에 대한 인식 변화 비교 분석)

  • Yun, You-Dong;Yang, Yeong-Wook;Lim, Heui-Seok
    • Journal of Digital Convergence / v.14 no.12 / pp.173-182 / 2016
  • AI (Artificial Intelligence) has attracted interest as a key element of technological advancement in various fields. In Korea, internet companies are leading the development of AI business technology, and active government funding plans for AI technology have also drawn interest. Not everyone is optimistic about AI, however; positive and negative opinions coexist. Yet attempts to analyze people's opinions about AI quantitatively have been scarce. In this study, we used text mining on SNS (Social Networking Service) data to collect opinions about AI. We then performed a comparative analysis of whether people view AI positively or negatively, and a comparative analysis to identify popular key-words. The results confirmed that key-words changed and negative posts increased over time. Through these results, we were able to predict the trend regarding AI.

A Text Mining Approach to the Analysis of Key Factors for Cosmetic Plastic Surgery (텍스트마이닝을 이용한 미용성형 주요 요인에 관한 연구)

  • Lee, So-Hyun;Shon, Saeah;Kim, Hee-Woong
    • Knowledge Management Research / v.20 no.1 / pp.45-75 / 2019
  • Recently, the beauty industry in Korea, including plastic surgery and beauty care, has continued to grow every year. With increased interest in appearance driven by improved living standards and the development of media, people's perception of cosmetic plastic surgery is changing. Plastic surgery medical service is now perceived as a service that satisfies consumers' desires and as a high value-added industry with high growth potential. This study therefore aims to suggest strategies for providing medical services that satisfy customers by deriving the factors customers regard as important when seeking cosmetic plastic surgery and then analyzing the relationships among those factors. In addition to performing topic modeling on customers' comment data from social commerce related to cosmetic plastic surgery, this study conducted network analysis to visualize the relations among keywords. The derived main factors were categorized using the sub-categories of the SERVQUAL theory, and additional characteristics of plastic surgery were identified by referring to relevant previous research. Moreover, interviews with cosmetic plastic surgery specialists (plastic surgeons) and customers who had actually received plastic surgery aided the interpretation of each factor and the understanding of the relevant phenomena. The significance of this study lies in deriving and discussing the main factors that Korean cosmetic plastic surgery medical institutes should observe, by mining and analyzing the opinions of customers interested in cosmetic plastic surgery and procedures using topic modeling. In other words, the quality of cosmetic plastic surgery medical service could be improved by presenting the key factors that service suppliers could consider, together with actual strategies.