• Title/Summary/Keyword: Review data mining

Analysis of Text Mining of Consumer's Personality Implication Words in Review of Used Transaction Application (중고거래 어플리케이션 <당근마켓> 리뷰텍스트에 나타난 소비자의 인성 함축단어 텍스트마이닝 분석)

  • Jung, Yea-Rin;Ju, Young-Ae
    • The Journal of the Korea Contents Association, v.21 no.11, pp.1-10, 2021
  • This study analyzes the use and meaning of words implying consumer personality in the review text of the used-goods trading application Danggeun Market (당근마켓). As of May 2021, review data covering the previous six months in Seoul and Gyeonggi Province were collected with the authors' web crawler; 1,368 cases were first drawn by random sampling, and 570 cases remained after preprocessing. The results are as follows. First, 48.2% of the review texts related to the personality of consumers, even though the platform is a commercial one for trading products. Second, the review text was mainly positive and formed a text network structure centered on the keyword 'gratitude'. Third, the review text implying consumer character divided into two groups: 'extrovert personality' and 'introvert personality'. The individuality of the two groups worked together on the platform. In conclusion, we suggest that consumer personality plays an important role in the platform transaction process, that it will play a role in the platform's future services, and that it should be studied from various perspectives.

Applying Keyword Analysis to Predicting Agriculture Product Price Index: The Case of the Chinese Farming Market

  • Wang, Zhi-yuan;Kwon, Ohbyung;Liu, Fan
    • Asia Pacific Journal of Business Review, v.1 no.1, pp.1-22, 2016
  • Predicting the prices of agricultural products plays a significant role in the economic life of consumers and of anyone engaged in agricultural business, and because these prices fluctuate more often than other prices, their prediction holds a great deal of research promise. For this reason, the academic literature has studied the factors influencing the prices of agricultural products and the price index. However, because these factors vary, they are difficult to predict, which makes quantitative data hard to acquire. China is one example of a country without a reliable prediction system for prices of agricultural products. Fortunately, publicly disclosed heterogeneous data can be found on the Internet, which allows factors related to the prediction of these prices to be collected effectively through text mining. Data provided online are valuable in that they reflect the opinions of the general public in real time. Accordingly, this study aims to use heterogeneous data from the Internet and suggest a model predicting the prices of agricultural products before functional analyses. Toward this end, data analyses were conducted on the Chinese agricultural products market, one of the largest markets in the world.

Latent class model for mixed variables with applications to text data (혼합모드 잠재범주모형을 통한 텍스트 자료의 분석)

  • Shin, Hyun Soo;Seo, Byungtae
    • The Korean Journal of Applied Statistics, v.32 no.6, pp.837-849, 2019
  • Latent class models (LCMs) are useful tools for drawing hidden information from categorical data. An LCM can also be interpreted as a mixture model with multinomial component distributions. In some cases, however, an available dataset may contain both categorical and count or continuous data. For such cases, we can extend the LCM to a mixture model with both multinomial and other component distributions, such as normal and Poisson distributions. In this paper, we consider an LCM for data containing both categorical and count variables in order to analyze the Drug Review dataset, which contains categorical responses and review text. From this data analysis, we show that we can obtain more specific hidden information than from an LCM using only the categorical responses.
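The mixed-mode model described in this abstract can be written down as a worked formula. The following is a sketch under assumptions that are not taken from the paper: local independence of all variables within a class, $K$ latent classes with weights $\pi_k$, $J$ categorical variables $x_j$ with $C_j$ categories and class-specific probabilities $\theta_{kjc}$, and $M$ count variables $y_m$ (for example, word counts from the review text) with class-specific Poisson rates $\lambda_{km}$. All symbols are illustrative.

```latex
f(\mathbf{x}, \mathbf{y})
  = \sum_{k=1}^{K} \pi_k
    \left[ \prod_{j=1}^{J} \prod_{c=1}^{C_j} \theta_{kjc}^{\,\mathbb{1}[x_j = c]} \right]
    \left[ \prod_{m=1}^{M} \frac{e^{-\lambda_{km}} \lambda_{km}^{\,y_m}}{y_m!} \right],
\qquad
\sum_{k=1}^{K} \pi_k = 1,
\quad
\sum_{c=1}^{C_j} \theta_{kjc} = 1 .
```

Dropping the second bracket recovers the ordinary LCM with multinomial components only, which is the comparison the abstract draws.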

Methodology for Identifying Issues of User Reviews from the Perspective of Evaluation Criteria: Focus on a Hotel Information Site (사용자 리뷰의 평가기준 별 이슈 식별 방법론: 호텔 리뷰 사이트를 중심으로)

  • Byun, Sungho;Lee, Donghoon;Kim, Namgyu
    • Journal of Intelligence and Information Systems, v.22 no.3, pp.23-43, 2016
  • As a result of the growth of Internet data and the rapid development of Internet technology, "big data" analysis has gained prominence as a major approach for evaluating and mining enormous data for various purposes. Especially, in recent years, people tend to share their experiences related to their leisure activities while also reviewing others' inputs concerning their activities. Therefore, by referring to others' leisure activity-related experiences, they are able to gather information that might guarantee them better leisure activities in the future. This phenomenon has appeared throughout many aspects of leisure activities such as movies, traveling, accommodation, and dining. Apart from blogs and social networking sites, many other websites provide a wealth of information related to leisure activities. Most of these websites provide information of each product in various formats depending on different purposes and perspectives. Generally, most of the websites provide the average ratings and detailed reviews of users who actually used products/services, and these ratings and reviews can actually support the decision of potential customers in purchasing the same products/services. However, the existing websites offering information on leisure activities only provide the rating and review based on one stage of a set of evaluation criteria. Therefore, to identify the main issue for each evaluation criterion as well as the characteristics of specific elements comprising each criterion, users have to read a large number of reviews. In particular, as most of the users search for the characteristics of the detailed elements for one or more specific evaluation criteria based on their priorities, they must spend a great deal of time and effort to obtain the desired information by reading more reviews and understanding the contents of such reviews. 
Although some websites break down the evaluation criteria and direct users to enter their reviews according to different levels of criteria, the excessive number of input sections makes the whole process inconvenient for users. Further, problems may arise if a user does not follow the instructions for the input sections or fills in the wrong input sections. Finally, treating the evaluation criteria breakdown as a realistic alternative is difficult, because identifying all the detailed criteria for each evaluation criterion is a challenging task. For example, when a review of a certain hotel is written, people tend to write only one-stage reviews for various components such as accessibility, rooms, services, or food. These may be reviews addressing the most frequently asked questions, such as the distance to the nearest subway station or the condition of the bathroom, but they still lack detailed information on these questions. In addition, if a breakdown of the evaluation criteria is provided along with various input sections, the user might fill in only the evaluation criterion for accessibility, or fill in the wrong information, such as information regarding rooms under the evaluation criterion for accessibility. Thus, the reliability of the segmented review will be greatly reduced. In this study, we propose an approach to overcome the limitations of the existing leisure activity information websites, namely, (1) the reliability of reviews for each evaluation criterion and (2) the difficulty of identifying the detailed contents that make up the evaluation criteria. In our proposed methodology, we first identify the review content and construct the lexicon for each evaluation criterion by using the terms that are frequently used for each criterion. Next, the sentences in the review documents containing the terms in the constructed lexicon are decomposed into review units, which are then reconstructed by using the evaluation criteria.
Finally, the issues of the constructed review units are derived for each evaluation criterion and the summary results are provided. Apart from the derived issues, the review units themselves are also provided. This approach therefore aims to help users save time and effort, because they read only the relevant information they need for each evaluation criterion rather than going through the entire text of the review. Our proposed methodology is based on topic modeling, which is being actively used in text analysis. The review is decomposed into sentence units rather than being treated as a single document unit. After being decomposed into individual review units, the review units are reorganized according to each evaluation criterion and then used in the subsequent analysis. In this respect, the work differs substantially from existing topic modeling-based studies. In this paper, we collected 423 reviews from hotel information websites and decomposed these reviews into 4,860 review units. We then reorganized the review units according to six different evaluation criteria. By applying these review units in our methodology, we present the analysis results and demonstrate the utility of the proposed methodology.
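The decomposition step in this abstract (split each review into sentences, then regroup the sentences under the evaluation criteria whose lexicon terms they contain) can be illustrated with a minimal Python sketch. The lexicon, the sample reviews, and the function names below are hypothetical and not taken from the paper; the paper's lexicon construction and the later topic-modeling step are not reproduced here.

```python
import re
from collections import defaultdict

# Hypothetical lexicon: evaluation criterion -> frequently used terms.
# Illustrative only; the paper builds its lexicon from the review corpus.
LEXICON = {
    "accessibility": {"subway", "station", "airport", "walk"},
    "rooms": {"room", "bed", "bathroom"},
    "service": {"staff", "reception", "check-in"},
}

def split_sentences(review):
    """Split a review into sentences on ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Lowercase word tokens; hyphens inside a word are kept."""
    return set(re.findall(r"[a-z][a-z\-]*", sentence.lower()))

def build_review_units(reviews, lexicon):
    """Group sentence-level review units under every criterion they mention."""
    units = defaultdict(list)
    for review in reviews:
        for sentence in split_sentences(review):
            tokens = tokenize(sentence)
            for criterion, terms in lexicon.items():
                if tokens & terms:  # at least one lexicon term appears
                    units[criterion].append(sentence)
    return dict(units)

reviews = [
    "The subway station is a five minute walk away. The bathroom was spotless.",
    "Staff were friendly at check-in. Breakfast was average.",
]
units = build_review_units(reviews, LEXICON)
for criterion, sentences in units.items():
    print(criterion, "->", sentences)
```

Sentences that match no lexicon term (here, the breakfast remark) are simply left out of the review units, and a sentence that mentions two criteria would appear under both.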

A Study on the Effects of Online Word-of-Mouth on Game Consumers Based on Sentimental Analysis (감성분석 기반의 게임 소비자 온라인 구전효과 연구)

  • Jung, Keun-Woong;Kim, Jong Uk
    • Journal of Digital Convergence, v.16 no.3, pp.145-156, 2018
  • Unlike in the past, when distributors sold games through retail stores, games are now sold as digital content through online distribution channels. This study analyzes the effects of eWOM (electronic Word of Mouth) on the sales volume of games sold on Steam, an online digital content distribution channel. Recently, data mining techniques based on Big Data have been studied. In this study, the emotion index of eWOM is derived by emotional analysis, a text mining technique that can analyze the emotion of each review, as one of the factors of eWOM. The emotional analysis uses Naive Bayes and SVM classifiers, and the emotion index is calculated with the SVM classifier, which showed high accuracy. Regression analysis is performed on the dependent variable, sales variation, using the emotion index, the number of reviews of each game (the size of eWOM), and the user score of each game (the rating of eWOM). The regression analysis revealed that the size of eWOM and the emotion index of eWOM influenced the dependent variable, sales variation. This study suggests the factors of eWOM that affect sales volume when Korean game companies enter overseas markets through Steam.
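As an illustration of the classification step, the sketch below trains a small multinomial Naive Bayes classifier with add-one smoothing and aggregates its predictions into a per-game index. Several things here are assumptions rather than the study's method: the abstract reports that the index was computed with the SVM classifier, whereas this sketch uses only Naive Bayes; the index is defined here, hypothetically, as the share of reviews predicted positive; and the training data are invented.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, label). Returns label counts, word counts, vocabulary."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return label_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def emotion_index(model, reviews):
    """Hypothetical index: fraction of a game's reviews predicted positive."""
    labels = [predict_nb(model, r.lower().split()) for r in reviews]
    return labels.count("pos") / len(labels)

# Invented toy training set.
train = [
    ("great fun addictive".split(), "pos"),
    ("fun story great music".split(), "pos"),
    ("boring buggy crash".split(), "neg"),
    ("buggy boring refund".split(), "neg"),
]
model = train_nb(train)
index = emotion_index(model, ["great fun game", "buggy crash again", "boring"])
print(round(index, 3))
```

An index of this kind could then enter a regression alongside the number of reviews and the user score, as the abstract describes, though the exact index definition used in the study is not given in the abstract.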

Exploratory research based on big data for Improving the revisit rate of foreign tourists and invigorating consumption (외국인 관광객 재방문율 향상과 소비 활성화를 위한 빅데이터 기반의 탐색적 연구)

  • An, Sung-Hyun;Park, Seong-Taek
    • Journal of Industrial Convergence, v.18 no.6, pp.19-25, 2020
  • Big data analytics are indispensable today in various industries and public sectors. In this study, we therefore use big data analysis, specifically the LDA analysis method, to search for ways to improve domestic tourism services. In particular, we take an exploratory approach aimed at improving tourist satisfaction, and thereby revisits and service, focusing on Seoul, which has the largest number of foreign tourists. We collected and analyzed statistical data from Seoul City and the Korea Tourism Organization, together with Internet information such as SNS, using R, and applied text mining methods including LDA. The analysis showed that one of the purposes of foreigners visiting South Korea was gastronomic tourism. We will try to derive measures to improve the quality of services centered on gastronomic tourism.

A Study on the Development and Implementation of a Data-mining Based Prototype for Hospital Bill Claim Reduction System (데이터마이닝 기법을 활용한 의료보험 진료비청구 삭감분석시스템 개발 및 구현에 관한 연구)

  • Yoo, Sang-Jin;Park, Mun-Ro
    • Information Systems Review, v.7 no.1, pp.275-295, 2005
  • Changes in the business environment caused by the globalization of the world economy and the beginning of the knowledge society have forced hospitals to equip themselves with tools for enhanced competitiveness. In other words, hospitals must simultaneously pursue three targets: acquisition of advanced medical skills and equipment, improvement of the service level for patients, and achievement of superior managerial performance. This study suggests a way to reduce the possibility of hospital bill claim reduction as one means of achieving superior managerial performance. A high reduction rate of hospital bill claims has a negative impact on the hospital's revenue stream and reliability. Thus, to stay competitive, hospitals need to devise ways to cut the reduction rate as much as possible. In this study, a prototype system was developed and implemented to check whether the reduction rate can be cut through in-depth analysis of the causes of reduction. The prototype was first developed using data mining techniques and the relation rules algorithm. Its performance was then tested using D hospital's live data.

Data-Mining in Business Performance Database Using Explanation-Based Genetic Algorithms (설명기반 유전자알고리즘을 활용한 경영성과 데이터베이스이 데이터마이닝)

  • 조성훈;정민용
    • Korean Management Science Review, v.18 no.1, pp.135-145, 2001
  • In today's dynamic management environment, there is growing recognition that information and knowledge management systems are essential for efficient and effective decision making by CEOs. To cope with this situation, we suggest a data-mining scheme as a key component of an integrated information and knowledge management system. The proposed system measures business performance by considering both VA (Value-Added), which represents the stakeholders' point of view, and EVA (Economic Value-Added), which represents the shareholders' point of view. To discover new information and knowledge, we applied improved genetic algorithms (GAs) that consider predictability, understandability (lucidity), and reasonability simultaneously. We use a linear combination model as the learning structure of the GAs. Although this model's predictability is lower than that of a non-linear model, it increases the knowledge's understandability, that is, the meaning of the induced values. Moreover, we introduce a random variable scheme based on the normal distribution for the initial chromosomes in the GAs, which we expect to increase the knowledge's reasonability, that is, its degree of acceptability to experts. The random variable scheme uses statistical correlation and determination coefficients calculated from the training data. To demonstrate the performance of the system, we conducted a case study using financial data of the Korean automobile industry over 16 years, from 1981 to 1996, taken from the KISFAS (Korea Investors Services Financial Analysis System) database.

The Product Recommender System Combining Association Rules and Classification Models: The Case of G Internet Shopping Mall (연관규칙기법과 분류모형을 결합한 상품 추천 시스템: G 인터넷 쇼핑몰의 사례)

  • Ahn, Hyun-Chul;Han, In-Goo;Kim, Kyoung-Jae
    • Information Systems Review, v.8 no.1, pp.181-201, 2006
  • As the Internet spreads, many people have become interested in e-CRM and in product recommender systems, one of its applications. Among the various approaches to recommendation, collaborative filtering and content-based approaches have been widely investigated and applied. Despite their popularity, these traditional approaches have some limitations. They require at least one purchase transaction per user. In addition, they make little use of information such as demographic data and specific personal profiles. To overcome these limitations, this study suggests a new hybrid recommendation model that uses two data mining techniques, association rules and classification, together with an intelligent agent. To validate the usefulness of the model, it was applied to a real case and a prototype web site was developed. We assessed the usefulness of the suggested recommendation model through an online survey. The survey results showed that the recommended information was generally useful to the participants.
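The association-rule half of such a hybrid model can be sketched in a few lines of Python. This is a minimal illustration covering only single-item rules of the form A -> B scored by support and confidence; the transactions, thresholds, and function name are hypothetical, and the classification model and intelligent agent described in the abstract are not shown.

```python
from collections import Counter
from itertools import permutations

def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return rules (a, b, support, confidence) for a -> b over item pairs.

    support(a -> b)    = share of baskets containing both a and b
    confidence(a -> b) = share of baskets with a that also contain b
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = set(basket)  # ignore duplicate items within one basket
        item_counts.update(items)
        pair_counts.update(permutations(sorted(items), 2))  # ordered pairs
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        confidence = count / item_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

# Invented purchase histories.
transactions = [
    ["camera", "memory card", "tripod"],
    ["camera", "memory card"],
    ["camera", "bag"],
    ["memory card", "bag"],
    ["camera", "memory card", "bag"],
]
rules = pair_rules(transactions)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Rules like these need purchase history, which is the limitation the abstract points to; a classifier trained on demographic and profile attributes is one way to produce recommendations for users who have no transactions yet.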

An Intelligent Recommendation System by Integrating the Attributes of Product and Customer in the Movie Reviews (영화 리뷰의 상품 속성과 고객 속성을 통합한 지능형 추천시스템)

  • Hong, Taeho;Hong, Junwoo;Kim, Eunmi;Kim, Minsu
    • Journal of Intelligence and Information Systems, v.28 no.2, pp.1-18, 2022
  • As digital technology converges with the e-commerce market across industries, online transactions have become more active and online usage has increased. With the recent spread of infectious diseases such as COVID-19, this market trend is accelerating, and a wide range of product information can be provided to customers online. This variety of information gives customers more opportunities but also makes decision-making difficult. A recommendation system can help customers make decisions more effectively. However, previous research on recommendation systems has been limited to quantitative data and does not reflect detailed factors of products and customers. In this study, we propose an intelligent recommendation system that quantifies the attributes of products and customers by applying text mining techniques to qualitative data from online reviews, and that integrates the existing objective indicators of total star rating, sentiment, and emotion. The proposed integrated recommendation model outperformed the recommendation model based on overall rating. We expect that new business value can be created through recommendation results that reflect detailed factors of products and customers.