• Title/Summary/Keyword: latent dirichlet allocation (LDA)

A Text Mining Analysis of Attributes for Satisfaction and Effect of Consumer Ratings to Korea and China Duty Free Stores - Focusing on Chinese Tourists - (텍스트 마이닝을 통한 한국과 중국 시내면세점 만족 속성과 소비자 평점에 미치는 영향 분석 -중국인 관광객을 중심으로)

  • Yang, DaSom;Kim, Jong Uk
    • Journal of Digital Convergence / v.18 no.8 / pp.1-9 / 2020
  • This study identifies new attributes by analyzing online reviews of duty free stores in Korea and China and examines how these attributes influence the stores' star ratings (satisfaction). We collected reviews from Dazhong Dianping, the largest online review site in China. Using R, we analyzed 5,659 reviews of Korean duty free stores and 4,051 reviews of Chinese duty free stores. According to the analysis, the Sale, Food and Membership attributes had a positive effect on the star ratings of Korean duty free stores, while Sale, Product, Airport, Food and Membership had a positive effect on the star ratings of Chinese duty free stores. The study contributes to the existing literature by identifying new attributes such as food, which shows the importance of providing restaurant space for shoppers at duty free stores. Practically, this finding will help duty free industry workers better understand the impact of providing restaurant space in duty free stores.

A Topic Analysis of College Education Using Big Data of News Articles (뉴스 빅데이터를 통해 검토한 대학교육의 토픽 분석)

  • Yang, Ji-Yeon;Koo, Jeong-Ho
    • Journal of Digital Convergence / v.19 no.12 / pp.11-20 / 2021
  • This study extracts topics related to university education from newspaper articles and analyzes the characteristics of each topic and the reporting patterns of each newspaper. Nine topics were discovered using LDA. Topic 1 and Topic 3 are both related to university support projects for education, but Topic 3 focuses on local universities. Topic 2 is about university education after COVID-19, Topic 4 teaching-learning methods, Topic 5 government policies, Topic 6 the university support projects for contributing to high school education, Topic 7 the university education vision, Topic 8 internationalization, and Topic 9 the entrance exam. The Chosun Ilbo, Kyunghyang, and Hankyoreh published many articles related to lectures after COVID-19, government policies, and comments on university education. Relevant articles published since 2016 were analyzed by newspaper type and by period (before and after COVID-19), and the differences in topics were examined and discussed. These findings suggest a basic policy guideline for university education and imply that both the positive and negative effects of the media need to be considered.

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

  • Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.1-23 / 2013
  • To discover significant social issues, such as unemployment, economic crisis, and social welfare, that modern society urgently needs to solve, researchers have usually collected opinions from experts and scholars through online or offline surveys. However, this method is not always effective. Because of the expense, a large number of survey replies is seldom gathered, and in some cases it is hard to find experts who deal with a specific social issue. Thus, the sample set is often small and may be biased. Furthermore, several experts may reach totally different conclusions about the same social issue because each has a subjective point of view and a different background. In this case, it is considerably hard to figure out what the current social issues are and which of them are really important. To overcome the shortcomings of the current approach, in this paper we develop a prototype system that semi-automatically detects keywords representing social issues and problems from about 1.3 million news articles issued by about 10 major domestic presses in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting news articles and extracting their texts, (2) identifying only the news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of our proposed matching algorithm is to best match paragraphs to each topic.
Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA produces a set of topic clusters, and each topic cluster is then labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and a human annotator labels Topic1 as "Unemployment Problem". In this example, it is non-trivial to understand what happened to the unemployment problem in our society. In other words, by looking only at social keywords, we have no idea of the detailed events occurring in our society. To tackle this matter, we develop a matching algorithm that computes the probability value of a paragraph given a topic, relying on (i) topic terms and (ii) their probability values. For instance, given a set of text documents, we segment each text document into paragraphs. In the meantime, using LDA, we extract a set of topics from the text documents. Based on our matching process, each paragraph is assigned to the topic it best matches. As a result, each topic has several best matched paragraphs. Suppose, furthermore, that there is a topic (e.g., Unemployment Problem) and its best matched paragraph (e.g., Up to 300 workers lost their jobs in XXX company at Seoul). In this case, we can grasp the detailed information behind the social keyword, such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, researchers will be able to detect social issues easily and quickly.
Through this prototype system, we have detected various social issues appearing in our society and have demonstrated the effectiveness of our proposed methods through experimental results. Our proof-of-concept system is available at http://dslab.snu.ac.kr/demo.html.
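
The matching step described in this abstract (scoring a paragraph against a topic from the topic's terms and their probability values) can be illustrated roughly as follows. This is only a minimal sketch and not the authors' algorithm: it assumes a simple unigram scoring rule (average log-probability of the paragraph's tokens, with a small floor for terms outside the topic), whitespace tokenization, and a hypothetical second topic ("Housing Problem") invented for illustration. The function names are likewise hypothetical.

```python
import math

# Hypothetical topics: term -> probability. The first follows the Topic1 example
# in the abstract; the second is invented purely for illustration.
TOPICS = {
    "Unemployment Problem": {"unemployment": 0.4, "layoff": 0.3, "business": 0.3},
    "Housing Problem": {"rent": 0.5, "housing": 0.3, "loan": 0.2},
}


def paragraph_score(tokens, topic_terms, floor=1e-6):
    """Average log-probability of the tokens under one topic.

    Assumption: terms missing from the topic get a small floor probability
    so that the logarithm is defined.
    """
    if not tokens:
        return float("-inf")
    return sum(math.log(topic_terms.get(t, floor)) for t in tokens) / len(tokens)


def best_topic(paragraph, topics=TOPICS):
    """Return the label of the topic under which the paragraph scores highest."""
    tokens = paragraph.lower().split()  # naive whitespace tokenization
    return max(topics, key=lambda name: paragraph_score(tokens, topics[name]))


def best_paragraphs(paragraphs, topics=TOPICS):
    """Group paragraphs under their best matching topic."""
    matched = {name: [] for name in topics}
    for paragraph in paragraphs:
        matched[best_topic(paragraph, topics)].append(paragraph)
    return matched


if __name__ == "__main__":
    sample = [
        "Layoff hits business as unemployment rises",
        "rent and housing loan costs climb",
    ]
    for label, paragraphs in best_paragraphs(sample).items():
        print(label, "->", paragraphs)
```

A real system would use a morphological analyzer for Korean text instead of whitespace splitting, and the paper's generative matching model may weight terms quite differently.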

Research Trend Analysis of Publications in the Journal of Home Economics Education Association Using Network Text Analysis (네트워크 텍스트 분석을 이용한 한국가정과교육학회지 논문의 연구 동향 분석)

  • Lee, Yoon-Jung;Kim, Eun Jeung;Kim, Ji sun
    • Journal of Korean Home Economics Education Association / v.31 no.4 / pp.1-18 / 2019
  • The purpose of this study was to analyze research trends in home economics education using network text analysis. The 586 research articles published in the Journal of Home Economics Education Association between July 2003 and December 2018 were examined using Neckinger 4, a social network analysis software package. Frequency and centrality measures (degree centrality, closeness centrality, and betweenness centrality) were calculated for the words that appeared over the whole period, and centrality analysis and LDA (Latent Dirichlet Allocation) were conducted for the four sub-periods. The results are as follows. First, the most frequent words were parents, culture, unit, health, career, consumption, practicality, etc. Words such as parents and management scored high in degree centrality; parents and male students in closeness centrality; and male students and units in betweenness centrality. Second, when the data were divided into four periods, words such as education, family, purpose, class, middle school, and school appeared most frequently across the periods, but some words, such as 'purpose' (in periods 3 and 4) or 'process' (in period 4), were salient only in certain periods. Third, the words with high centrality were consistent regardless of the type of centrality within each period. Fourth, the topic analysis using LDA showed that curriculum, textbook, family healthiness, teaching-learning, evaluation, dietary life, appearance management, and consumption were topics that appeared consistently across all periods. The topics have become more diverse and deeper over time. New topics such as teacher training and safety appeared in later periods, possibly due to changes in the curriculum and national policy, and housing, as a less represented topic, is suggested as an area that needs further research attention.
This study is meaningful in that it allows researchers to identify the major research interests and trends in home economics education research.
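
For readers unfamiliar with the centrality measures named in this abstract, the sketch below shows how degree and closeness centrality might be computed on a word co-occurrence network. It is an illustration under stated assumptions, not the study's procedure: the graph construction (linking words that occur in the same document), the function names, and the toy keyword lists are all hypothetical, and betweenness centrality is omitted for brevity.

```python
from collections import deque
from itertools import combinations


def cooccurrence_graph(docs):
    """Build an undirected graph linking words that occur in the same document."""
    graph = {}
    for words in docs:
        unique = sorted(set(words))
        for w in unique:
            graph.setdefault(w, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(graph)
    if n < 2:
        return {w: 0.0 for w in graph}
    return {w: len(nbrs) / (n - 1) for w, nbrs in graph.items()}


def closeness_centrality(graph):
    """Reachable node count divided by the sum of shortest-path distances.

    Distances come from a breadth-first search; in a disconnected graph only
    the node's own component is considered (a simplifying assumption).
    """
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


if __name__ == "__main__":
    # Hypothetical keyword lists, one per article.
    docs = [["parents", "career"], ["parents", "health"], ["parents", "consumption"]]
    g = cooccurrence_graph(docs)
    print(degree_centrality(g))
    print(closeness_centrality(g))
```

In this toy network "parents" co-occurs with every other word, so it has the highest score on both measures.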

An Analytical Approach Using Topic Mining for Improving the Service Quality of Hotels (호텔 산업의 서비스 품질 향상을 위한 토픽 마이닝 기반 분석 방법)

  • Moon, Hyun Sil;Sung, David;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.21-41 / 2019
  • Thanks to the rapid development of information technologies, the data available on the Internet have grown rapidly. In this era of big data, many studies have attempted to offer insights and demonstrate the effects of data analysis. In the tourism and hospitality industry, many firms and studies have paid attention to online reviews on social media because of their large influence on customers. As tourism is an information-intensive industry, the effect of these information networks on social media platforms is more remarkable than that of any other type of media. However, there are some limitations to the improvements in service quality that can be made based on opinions on social media platforms. Users on social media platforms express their opinions as text, images, and so on, so the raw data sets from these reviews are unstructured. Moreover, these data sets are too big for humans to extract new information and hidden knowledge from them unaided. To use them for business intelligence and analytics applications, proper big data techniques such as Natural Language Processing and data mining are needed. This study suggests an analytical approach that directly yields insights from these reviews to improve the service quality of hotels. Our proposed approach consists of topic mining, to extract the topics contained in the reviews, and decision tree modeling, to explain the relationship between topics and ratings. Topic mining refers to a method for finding groups of words that represent the documents in a collection. Among several topic mining methods, we adopted the Latent Dirichlet Allocation (LDA) algorithm, which is considered the most universal algorithm. However, LDA alone is not enough to find insights that can improve service quality because it cannot find the relationship between topics and ratings. To overcome this limitation, we also use the Classification and Regression Tree (CART) method, which is a kind of decision tree technique.
Through the CART method, we can find which topics are related to positive or negative ratings of a hotel and visualize the results. This study therefore presents an analytical approach for improving hotel service quality from unstructured review data sets. Through experiments on four hotels in Hong Kong, we find the strengths and weaknesses of each hotel's services and suggest improvements to aid customer satisfaction. From positive reviews in particular, we find what these hotels should maintain in their service quality. For example, compared with the other hotels, one hotel has a good location and room condition, as extracted from its positive reviews. In contrast, from negative reviews we find what the hotels should change in their services. For example, one hotel should improve its room condition with regard to soundproofing. These results mean that our approach is useful in finding insights into the service quality of hotels. That is, from an enormous volume of review data, our approach can provide practical suggestions for hotel managers to improve their service quality. In the past, studies on improving service quality relied on surveys or interviews of customers. However, these methods are often costly and time consuming, and the results may be skewed by biased sampling or untrustworthy answers. The proposed approach directly obtains honest feedback from customers' online reviews and draws insights through a type of big data analysis, so it can be a more useful tool that overcomes the limitations of surveys or interviews. Moreover, our approach can easily obtain service quality information for other hotels or services in the tourism industry because it needs only open online reviews and ratings as input data. Furthermore, the performance of our approach will improve if other structured and unstructured data sources are added.
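
The idea of relating topics to ratings with a decision tree can be illustrated with a single CART-style split. The sketch below is an assumption-laden toy, not the authors' pipeline: it assumes each review is already represented by per-topic weights (for example from LDA), that ratings are binarized into positive (1) and negative (0), and that only the root split is searched, using Gini impurity. The topic names and numbers are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = positive, 0 = negative)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Find the (impurity, topic, threshold) with the lowest weighted Gini impurity.

    rows: list of dicts mapping topic name -> topic weight for one review.
    Candidate thresholds are midpoints between consecutive distinct values.
    Returns None if no split is possible.
    """
    n = len(rows)
    best = None
    for topic in sorted(rows[0]):
        values = sorted({r[topic] for r in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[topic] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[topic] > threshold]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                best = (impurity, topic, threshold)
    return best


if __name__ == "__main__":
    # Hypothetical topic weights per review and binarized ratings.
    reviews = [
        {"location": 0.8, "noise": 0.1},
        {"location": 0.7, "noise": 0.2},
        {"location": 0.2, "noise": 0.7},
        {"location": 0.1, "noise": 0.9},
    ]
    ratings = [1, 1, 0, 0]
    print(best_split(reviews, ratings))
```

A full CART model applies this search recursively to each side of the split and usually adds pruning; the single split here only shows how a topic weight can separate positive from negative reviews.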