• Title/Summary/Keyword: text analytics

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.39-70 / 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to research the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately figure out customer preferences. In existing data-based survey methods, the sentiment lexicon for a particular domain is first constructed by domain experts, who usually judge the positive, neutral, or negative meanings of the frequently used words in the collected text documents. In order to research the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and stop-word removal are performed; (3) classifies the polarity (either positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the total numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) that carry positive or negative meaning toward the product. As a motivating example, given a product like the Sonata made by Hyundai Motors, customers often want to see a summary note showing what the positive points are in the 'car design' aspect as well as what the negative points are in the same aspect. They also want to gain more useful information regarding other aspects such as 'car quality', 'car performance', and 'car service.' Such information will enable customers to make good choices when they attempt to purchase brand-new vehicles. 
In addition, automobile makers will be able to figure out the preference and the positive/negative points for new models on the market. In the near future, the weak points of the models can be improved through sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings that limit its use in real applications. The main disadvantages of the existing approach are as follows: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., Hyundai Sonata) are not considered. Because the sentiment analysis ignores aspects, customers and car makers receive only a summary note containing the positive and negative ratios of the product and the top-k sentences (or phrases) with the highest sentiment scores in the entire corpus. This is not enough; the main aspects of the target product need to be considered in the sentiment analysis. (2) In general, since the same word has different meanings across different domains, a sentiment lexicon suited to each domain needs to be constructed. An efficient way to construct the sentiment lexicon per domain is required because sentiment lexicon construction is labor intensive and time consuming. To address the above problems, in this article, we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents a digest in which a few important sentences with positive and negative meanings are listed for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly. 
Furthermore, by reinforcing topic semantics, we can improve the accuracy of the product reputation mining algorithm beyond that of the existing approach. In the experiments, we collected a large number of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed the top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
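
The lexicon-based baseline described in this abstract (score each sentence, compute positive and negative ratios, pick the top-k sentences) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the lexicon, the English review sentences, and the function names are all made up for the example, and the topic and aspect steps of the proposed method are not shown.

```python
import re

# Hypothetical toy lexicon: word -> polarity (+1 positive, -1 negative).
LEXICON = {"great": 1, "comfortable": 1, "smooth": 1,
           "noisy": -1, "poor": -1, "expensive": -1}

def sentence_score(sentence, lexicon=LEXICON):
    """Sum the lexicon polarities of the words in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(lexicon.get(w, 0) for w in words)

def polarity_ratios(sentences, lexicon=LEXICON):
    """Return (positive_ratio, negative_ratio) over all sentences."""
    scores = [sentence_score(s, lexicon) for s in sentences]
    total = len(scores)
    if total == 0:
        return 0.0, 0.0
    positive = sum(1 for x in scores if x > 0)
    negative = sum(1 for x in scores if x < 0)
    return positive / total, negative / total

def top_k(sentences, k, lexicon=LEXICON):
    """Return the k most positive and the k most negative sentences."""
    ascending = sorted(sentences, key=lambda s: sentence_score(s, lexicon))
    negatives = [s for s in ascending[:k] if sentence_score(s, lexicon) < 0]
    positives = [s for s in ascending[::-1][:k] if sentence_score(s, lexicon) > 0]
    return positives, negatives

# Made-up review sentences, used only to exercise the functions.
reviews = [
    "The seats are comfortable and the ride is smooth.",
    "The engine is noisy.",
    "Great design.",
    "Delivery took two weeks.",
]
pos_ratio, neg_ratio = polarity_ratios(reviews)
best, worst = top_k(reviews, 1)
```

An aspect-aware variant would first assign each sentence to an aspect (for example, one derived from extracted topics) and then run the same functions once per aspect.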

Analyzing Comments of YouTube Video to Measure Use and Gratification Theory Using Videos of Trot Singer, Cho Myung-sub (YouTube 동영상 의견분석을 통한 사용과 충족 이론 측정 : 트로트 가수 조명섭 동영상을 중심으로)

  • Hong, Han-Kook;Leem, Byung-hak;Kim, Sam-Moon
    • The Journal of the Korea Contents Association / v.20 no.9 / pp.29-42 / 2020
  • The purpose of this study is to present a qualitative research method for extracting and analyzing the comments written by YouTube video users. To do this, we used YouTube users' feedback to measure the hedonic, social, and utilitarian gratification of use and gratification theory (UGT) through analysis and topic modeling. The measurement results show that the primary reason users watch videos of the trot singer Cho Myung-sub on the KBS Korean broadcasting channel is to obtain hedonic gratification, which appears with high frequency. In the word-document network analysis, degree centrality was high for words such as 'cheering', 'thank you', 'fighting', and 'best'. Betweenness centrality showed a pattern similar to that of degree centrality. Eigenvector centrality also shows that words such as 'love', 'heart', and 'thank you' are the most influential words in users' opinions. The results of the centrality analysis indicate that the majority of video users express 'love', 'heart', and 'thank you' for the video. This indicates that the words with high centrality are consistent with the high-frequency words of the hedonic and social gratification dimensions of UGT. The study has a methodological implication: it sheds light on the motivations for watching YouTube videos through UGT using text mining techniques that automate qualitative analysis, rather than following a survey-based structural equation model.
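
As a rough illustration of degree centrality in a word-document network, the sketch below links each word to every comment that contains it and normalizes the count by the number of comments. The comments, the whitespace tokenization, and the function name are assumptions made for this example; the study's actual data and tooling are not described here.

```python
from collections import defaultdict

def word_degree_centrality(documents):
    """Degree centrality of each word in a word-document (two-mode) network.

    A word is linked to every document that contains it. Its degree is
    divided by the number of documents, so values lie in [0, 1].
    """
    linked_docs = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for word in set(text.lower().split()):
            linked_docs[word].add(doc_id)
    n_docs = len(documents)
    if n_docs == 0:
        return {}
    return {word: len(ids) / n_docs for word, ids in linked_docs.items()}

# Made-up comments, used only to exercise the function.
comments = ["love this song", "thank you love", "best voice", "love the voice"]
centrality = word_degree_centrality(comments)

# Rank words by centrality (ties broken alphabetically).
ranked = sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0]))
```

Betweenness and eigenvector centrality need the full graph structure and are usually computed with a graph library rather than by hand.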

Comparing Corporate and Public ESG Perceptions Using Text Mining and ChatGPT Analysis: Based on Sustainability Reports and Social Media (텍스트마이닝과 ChatGPT 분석을 활용한 기업과 대중의 ESG 인식 비교: 지속가능경영보고서와 소셜미디어를 기반으로)

  • Jae-Hoon Choi;Sung-Byung Yang;Sang-Hyeak Yoon
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.347-373 / 2023
  • As the significance of ESG (Environmental, Social, and Governance) management grows in driving sustainable growth, this study examines and compares ESG trends and interrelationships from both corporate and societal viewpoints. Employing a combination of Latent Dirichlet Allocation (LDA) topic modeling and Semantic Network Analysis (SNA), we analyzed sustainability reports alongside corresponding social media datasets. Additionally, an in-depth examination of social media content was conducted using Joint Sentiment Topic Modeling (JST), further enriched by SNA. Complementing text mining analysis with the assistance of ChatGPT, this study identified 25 different ESG topics. It highlighted differences between companies aiming to avoid risks and build trust, and the general public's diverse concerns like investment options and working conditions. Key terms like 'greenwashing,' 'serious accidents,' and 'boycotts' show that many people doubt how companies handle ESG issues. The findings from this study set the foundation for a plan that serves key ESG groups, including businesses, government agencies, customers, and investors. This study also offers guidance for creating more trustworthy and effective ESG strategies, helping to direct the discussion on ESG effectiveness.

A Study on Sentiment Score of Healthcare Service Quality on the Hospital Rating (의료 서비스 리뷰의 감성 수준이 병원 평가에 미치는 영향 분석)

  • Jee-Eun Choi;Sodam Kim;Hee-Woong Kim
    • Information Systems Review / v.20 no.2 / pp.111-137 / 2018
  • Considering the increase in health insurance benefits and the growing elderly population of the baby boomer generation, health care spending in 2020 is expected to account for 20% of US GDP. As the healthcare industry develops, competition among hospitals' medical services intensifies, and hospitals' need to manage the quality of medical services increases. In addition, interest in online reviews of hospitals has increased as online reviews have become a tool to predict hospital quality. Consumers tend to refer to online reviews when choosing healthcare service providers, and they also evaluate service quality online afterward. This study aims to analyze the effect of the sentiment score of healthcare service quality on hospital rating, using Yelp hospital reviews. This study first classifies a large amount of text data collected online into the five service quality dimensions of SERVQUAL theory. The sentiment scores of reviews are then derived for each SERVQUAL dimension, and an econometric analysis is conducted to determine the sentiment score effects of the five service quality dimensions on hospital reviews. The results shed light on ways of managing online hospital reputation that can benefit managers in the healthcare and medical industry.

The Effect of Mobile Advertising Platform through Big Data Analytics: Focusing on Advertising, and Media Characteristics (빅데이터 분석을 통한 모바일 광고플랫폼의 광고효과 연구: 광고특성, 매체특성을 중심으로)

  • Bae, Seong Deok;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.37-57 / 2018
  • With the spread of smartphones, interest in mobile media as a useful medium has recently been increasing. Mobile media are regarded as having distinct advantages over existing media in that they can provide consumers with desired information anytime and anywhere and allow real-time interaction. So far, studies on mobile advertising have mostly analyzed satisfaction with, and acceptance of, mobile advertising based on surveys, focused on the factors affecting acceptance of mobile advertising messages, or verified the effect of mobile advertising on brand recall, advertising attitude, and brand attitude through experiments. Most domestic mobile advertising studies related to advertising effect and advertising attitude have been conducted through experiments and surveys, and they measured advertising effectiveness using attitude toward the advertisement, purchase intention, and similar measures. To date, few studies have used actual advertising data to examine the characteristics of the advertising platform and the relationships among the factors that influence the advertising effect. In order to explore the advertising effect of a currently commercialized mobile advertising platform, this study defined advertising characteristics and media characteristics from the perspectives of the advertiser, the advertising platform, and the publisher, and analyzed the influence of each characteristic on the advertising effect. As advertising characteristics, we considered the advertisement format (bar type or floating type) and the advertisement material (image or text). We defined the media characteristics of the advertising platform as hedonic and utilitarian. As the dependent variable, we use CTR, which is the ratio of responses (clicks) to ad exposures. 
Based on the theoretical background and an analysis of the mobile advertising business, we hypothesized that the advertising effect differs according to the advertisement format, the advertisement material, and the media characteristics. For the advertisement format, bar ads are classified as static framing and floating ads as dynamic framing, and floating ads, as highly salient dynamic framing ads, are hypothesized to draw a higher response. For the advertisement material, images, which have high salience, are hypothesized to draw a higher ad response than text. For the media characteristics, classified as utilitarian or hedonic, users of hedonic media are assumed to be more relaxed than users of utilitarian media and more likely to take in various information because they have no clear goal. In addition, image material and hedonic media are hypothesized to be highly effective in the interactions between advertisement format and advertisement material, advertisement format and media characteristics, and advertisement material and media characteristics. The regression analysis showed that the advertisement material and format, which are characteristics of mobile advertisements, and the media characteristics, separated into 'Hedonic' and 'Utilitarian', had a significant influence on the advertising effect, and interaction effects were also confirmed. For the advertisement format, the advertising effect of floating ads was higher than that of bar ads; for the advertisement material, image ads were more effective than text ads. In addition, it was confirmed that the advertising effect is higher in utilitarian media than in hedonic media. The research was carried out with big data collected from the mobile advertising platform, which made it possible to assess the advertising effect using a metric employed in practice, something previous research could not do. 
In other words, the study was conducted using CTR, a measure of advertising effectiveness used in online and mobile advertising that does not depend on attitude toward the ad, attitude toward the brand, or purchase intention. This study proposes using CTR as the dependent variable of advertising effect, based on actual data accumulated on a mobile ad platform over a long period of time. The results of this study are expected to contribute to establishing optimal advertising strategies, such as creating advertising materials and planning media suited to the advertised products, for mobile advertising.
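
The dependent variable in this abstract, CTR, is the number of clicks divided by the number of ad exposures. A minimal sketch of aggregating CTR by ad format or ad material is shown below; the log rows, field names, and function name are invented for illustration and are not the study's data or code.

```python
from collections import defaultdict

def ctr_by_group(records, key):
    """Aggregate click-through rate (clicks / impressions) per value of `key`."""
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for row in records:
        clicks[row[key]] += row["clicks"]
        impressions[row[key]] += row["impressions"]
    # Skip groups with no impressions to avoid dividing by zero.
    return {g: clicks[g] / impressions[g] for g in impressions if impressions[g] > 0}

# Made-up log rows; the numbers do not come from the study.
logs = [
    {"format": "bar", "material": "image", "impressions": 1000, "clicks": 10},
    {"format": "bar", "material": "text", "impressions": 1000, "clicks": 5},
    {"format": "floating", "material": "image", "impressions": 500, "clicks": 15},
    {"format": "floating", "material": "text", "impressions": 500, "clicks": 5},
]
ctr_format = ctr_by_group(logs, "format")
ctr_material = ctr_by_group(logs, "material")
```

Summing clicks and impressions before dividing weights each group by its traffic, which differs from averaging per-row CTR values.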

Mining Intellectual History Using Unstructured Data Analytics to Classify Thoughts for Digital Humanities (디지털 인문학에서 비정형 데이터 분석을 이용한 사조 분류 방법)

  • Seo, Hansol;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.141-166 / 2018
  • Information technology improves the efficiency of humanities research. In humanities research, information technology can be used to analyze a given topic or document automatically, facilitate connections to other ideas, and increase our understanding of intellectual history. We suggest a method to identify and automatically analyze the relationships between arguments contained in unstructured data collected from humanities writings such as books, papers, and articles. Our method, which is called history mining, reveals influential relationships between arguments and the philosophers who present them. We utilize several classification algorithms, including a deep learning method. To verify the performance of the proposed methodology, we selected a sample of philosophers associated with empiricism and rationalism and collected their related writings or articles accessible on the internet. The performance of the classification algorithms was measured by Recall, Precision, F-Score, and Elapsed Time. DNN, Random Forest, and Ensemble showed better performance than the other algorithms. Using the selected classification algorithm, we classified the writings of specific philosophers as rationalist or empiricist, and generated a history map that takes into account each philosopher's years of activity.
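
For reference, the Precision, Recall, and F-Score measures named in this abstract can be computed for one class as in the sketch below. The labels and predictions are made up, and the function name is an assumption for the example; the sketch uses the balanced F1 form of the F-Score, which the abstract does not specify.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up gold labels and classifier outputs for five documents.
y_true = ["rationalism", "rationalism", "empiricism", "empiricism", "rationalism"]
y_pred = ["rationalism", "empiricism", "empiricism", "rationalism", "rationalism"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, "rationalism")
```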

A Study on the Conceptual Changes of Extra-solar Planet in University Students Using Text-Mining Techniques (텍스트마이닝을 활용한 대학생들의 외계행성 개념 변화 연구)

  • Han, Shin;Kim, Yong-Ki;Kim, Hyoungbum
    • Journal of the Korean Society of Earth Science Education / v.13 no.3 / pp.305-316 / 2020
  • This study aimed to analyze the conception of an extra-solar planet held by university students. To this end, we developed an extra-solar planet education program and questionnaires that help identify changes between before and after the program, and then applied them to the targeted students. The results of the study are as follows. First, as to the conception of an extra-solar planet, participants understood it merely as a planet outside the solar system before the training. After the training, however, their keywords showed that they had expanded it to a planet revolving around a star outside the solar system. Second, they gave brief responses regarding exploration strategies (e.g., observing the extra-solar planet by using the Doppler effect, the transit (eclipse) phenomenon, and gravitational lensing) based on indirect experiences they encountered in the media. The responses indicated that they lacked a concept of extra-solar planet exploration methods. However, their understanding of extra-solar planet observation became concrete as they learned about the exploration of extra-solar planets. Third, they expanded their view of the importance of exoplanet observation beyond simply discovering extraterrestrial life to the creation process and research methods, including the solar system and the development of humanity. Fourth, they recognized that exoplanet education is necessary in the curriculum, since content related to extra-solar planets in the earth science curriculum could stimulate students' interest and curiosity as well as build scientific knowledge.

Proposal of Promotion Strategy of Mobile Easy Payment Service Using Topic Modeling and PEST-SWOT Analysis (모바일 간편 결제 서비스 활성화 전략 : 토픽 모델링과 PEST - SWOT 분석 방법론을 기반으로)

  • Park, Seongwoo;Kim, Sehyoung;Kang, Juyoung
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.365-385 / 2022
  • The easy payment service is a payment and remittance service that uses a simple authentication method. As online transactions have increased due to COVID-19, the use of easy payment services is increasing. At the same time, electronic financial companies such as Naver Pay, Kakao Pay, and Toss are diversifying the competitive structure of the easy payment market. Overseas fintech companies such as PayPal and Alibaba hold a dominant market share in their own countries, whereas competition is intensifying in the domestic easy payment market because no company holds a dominant share. In this study, the participants in the easy payment market were classified as electronic financial companies, mobile phone manufacturers, and financial companies, and a SWOT analysis was conducted on the representative services in each industry. The analysis examined user reviews from the Google Play Store via topic modeling, treating positive topics as strengths and negative topics as weaknesses. In addition, topic modeling was conducted by dividing news articles into political, economic, social, and technological (PEST) articles to derive the opportunities and threats to easy payment services. Through this research, we intend to confirm the service capabilities of easy payment companies and propose a service activation strategy for gaining the upper hand in the market.

Analyzing TripAdvisor application reviews to enable smart tourism : focusing on topic modeling (스마트 관광 활성화를 위한 트립어드바이저 애플리케이션 리뷰 분석 : 토픽 모델링을 중심으로)

  • YuNa Lee;MuMoungCho Han;SeonYeong Yu;MeeQi Siow;Mijin Noh;YangSok Kim
    • Smart Media Journal / v.12 no.8 / pp.9-17 / 2023
  • The development of information and communication technology and the growing development and dissemination of smart devices have changed the form of tourism, and the concept of smart tourism has emerged. In this regard, research related to smart tourism has been conducted in various fields such as policy implementation and surveys, but there is a lack of research on application reviews. This study collects TripAdvisor application review data from the Google Play Store to identify usage of the application and user satisfaction through Latent Dirichlet Allocation (LDA) topic modeling. The analysis yields four topics, two of which are positive and two of which are negative. We found that users were satisfied with the application's recommendation system, but were dissatisfied when the filters they set during search were not applied or when reviews were not published after updates of the application. We suggest that more categories be added to the application to provide users with different experiences. In addition, user satisfaction is expected to improve if problems within the application, including the filter function, are identified, the application environment is checked, and the errors occurring during application usage are resolved.

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.59-83 / 2018
  • With the increasing importance of sentiment analysis for grasping the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In the sentiment analysis of English texts by deep learning, natural language sentences included in training and test datasets are usually converted into sequences of word vectors before being entered into the deep learning models. In this case, word vectors generally refer to vector representations of words obtained by splitting a sentence on space characters. There are several ways to derive word vectors, one of which is Word2Vec, used for producing the 300-dimensional Google word vectors from about 100 billion words of Google News data. They have been widely used in studies of sentiment analysis of reviews from various fields such as restaurants, movies, laptops, and cameras. Unlike in English, morphemes play an essential role in sentiment analysis and sentence structure analysis in Korean, which is a typical agglutinative language with developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, for the word '예쁘고', the morphemes are '예쁘' (adjective) and '고' (connective ending). Reflecting the significance of Korean morphemes, it seems reasonable to adopt the morpheme as the basic unit in Korean sentiment analysis. Therefore, in this study, we use the 'morpheme vector' as the input to a deep learning model rather than the 'word vector' that is mainly used for English text. The morpheme vector refers to a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. This raises several questions. What is the desirable range of POS (Part-Of-Speech) tags when deriving morpheme vectors for improving the classification accuracy of a deep learning model? 
Is it proper to apply a typical word vector model, which relies primarily on the form of words, to Korean, which has a high homonym ratio? Will text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when deriving morpheme vectors from Korean product reviews with many grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which are likely to be encountered first when applying deep learning models to Korean texts. As a starting point, we summarized these issues as three central research questions. First, as the initial input of a deep learning model, which is more effective: morpheme vectors from grammatically correct texts of a domain other than the analysis target, or morpheme vectors from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean with regard to the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can we achieve a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? To address these research questions, we generate various types of morpheme vectors reflecting the research questions and then compare the classification accuracy through a non-static CNN (Convolutional Neural Network) model that takes in the morpheme vectors. As training and test datasets, 17,260 cosmetics product reviews from Naver Shopping are used. To derive morpheme vectors, we use data from the same domain as the target and data from another domain: about 2 million cosmetics product reviews from Naver Shopping and 520,000 items of Naver News data, arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria. 
First, they come from two types of data source: Naver News, with high grammatical correctness, and Naver Shopping's cosmetics product reviews, with low grammatical correctness. Second, they differ in the degree of data preprocessing, namely, sentence splitting only, or sentence splitting followed by additional spelling and spacing corrections. Third, they vary in the form of input fed into a word vector model: the morphemes are entered either by themselves or with their POS tags attached. The morpheme vectors further vary depending on the range of POS tags considered, the minimum frequency of morphemes included, and the random initialization range. All morpheme vectors are derived through the CBOW (Continuous Bag-Of-Words) model with a context window of 5 and a vector dimension of 300. It seems that utilizing same-domain text even with a lower degree of grammatical correctness, performing spelling and spacing corrections as well as sentence splitting, and incorporating morphemes of any POS tag, including the incomprehensible category, lead to better classification accuracy. The POS tag attachment, which is devised for the high proportion of homonyms in Korean, and the minimum frequency standard for the morpheme to be included seem not to have any definite influence on the classification accuracy.
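
To make the input side of this setup concrete, the sketch below builds CBOW training pairs (context morphemes, target morpheme) from a sentence that has already been split into morphemes, with or without POS tags attached. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the example sentence, and the tag labels (NOUN, PARTICLE, ADJ, ENDING) are placeholders rather than the output of any particular Korean tagger, and only the '예쁘' + '고' split is taken from the abstract. Training the actual 300-dimensional vectors is left to a word vector library.

```python
def cbow_pairs(tokens, window=5):
    """Return (context_tokens, target_token) pairs for a CBOW model.

    The context of a target is up to `window` tokens on each side.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # a one-token sentence has no context to learn from
            pairs.append((context, target))
    return pairs

def attach_pos(tagged_morphemes):
    """Turn (morpheme, tag) pairs into 'morpheme/tag' tokens."""
    return [f"{morpheme}/{tag}" for morpheme, tag in tagged_morphemes]

# Hypothetical pre-segmented sentence; tags are illustrative placeholders.
tagged = [("색", "NOUN"), ("이", "PARTICLE"), ("예쁘", "ADJ"),
          ("고", "ENDING"), ("좋", "ADJ"), ("다", "ENDING")]
plain = [morpheme for morpheme, _ in tagged]

pairs_plain = cbow_pairs(plain, window=2)
pairs_tagged = cbow_pairs(attach_pos(tagged), window=2)
pairs_default = cbow_pairs(plain)  # window of 5, as stated in the abstract
```

Attaching the tag makes homonyms with different parts of speech distinct tokens (for example, '고/ENDING' versus a hypothetical '고/NOUN'), which is the motivation the abstract gives for that variant.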