• Title/Summary/Keyword: Review data mining

Perception and Appraisal of Urban Park Users Using Text Mining of Google Maps Review - Cases of Seoul Forest, Boramae Park, Olympic Park - (구글맵리뷰 텍스트마이닝을 활용한 공원 이용자의 인식 및 평가 - 서울숲, 보라매공원, 올림픽공원을 대상으로 -)

  • Lee, Ju-Kyung;Son, Yong-Hoon
    • Journal of the Korean Institute of Landscape Architecture / v.49 no.4 / pp.15-29 / 2021
  • This study aims to understand the perceptions and appraisals of urban park users through text analysis of Google review data provided by Google Maps. Google Maps Review is an online review platform that provides evaluative information about locations through social media, offering an understanding of those locations from the perspective of general reviewers and regional guides who are registered as members of Google Maps. The study examined whether Google Maps reviews are useful for extracting meaningful information about user perceptions and appraisals for park management plans. Three urban parks in Seoul, South Korea, were chosen: Seoul Forest, Boramae Park, and Olympic Park. Review data for each park were collected via web crawling using Python. Through text analysis, the keywords and network structure characteristics of each park were analyzed. Park ratings were analyzed along with the text, and the reviews of residents and foreign tourists were compared. The common keywords found in the reviews of the three parks were "walking", "bicycle", "rest" and "picnic" for activities; "family", "child" and "dogs" for companion types; and "playground" and "walking trail" for park facilities. Looking at the characteristics of each park, Seoul Forest shows many nature-based outdoor activities, while the lack of parking spaces and weekend congestion negatively affected users. Boramae Park has the appearance of a city park, with various facilities providing numerous activities, but reviewers often cited the park's complexity and negative aspects of dog-walking groups. At Olympic Park, large-scale complex facilities and cultural events were frequently mentioned, emphasizing its entertainment functions. Google Maps Review can serve as useful data for identifying park users' overall experiences and general feelings. Compared to data from other social media sites, Google Maps Review data provides ratings and supports an understanding of factors such as user satisfaction and dissatisfaction.
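The keyword and network analysis described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the sample reviews, the stopword list, and the function names are all hypothetical, and the keyword network is approximated simply by counting keyword pairs that appear in the same review.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical sample reviews; real input would be the crawled review text.
REVIEWS = [
    "Great park for walking and a picnic with family",
    "Nice walking trail, but parking is hard on weekends",
    "Family picnic by the playground, lots of dogs",
]

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "and", "the", "for", "with", "but", "is", "on", "by", "of", "lots"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def keyword_counts(reviews):
    """Count how often each keyword occurs across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(tokenize(review))
    return counts


def cooccurrence_counts(reviews):
    """Count unordered keyword pairs that appear together in one review.

    Each pair is an edge of a simple keyword network; the count is its weight.
    """
    pairs = Counter()
    for review in reviews:
        unique = sorted(set(tokenize(review)))
        pairs.update(combinations(unique, 2))
    return pairs


if __name__ == "__main__":
    print(keyword_counts(REVIEWS).most_common(3))
    print(cooccurrence_counts(REVIEWS)[("family", "picnic")])
```

Pairs are stored in alphabetical order, so a lookup such as `("family", "picnic")` must use the same ordering.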

A Study on analysis of severity-adjustment length of stay in hospital for community-acquired pneumonia (지역사회획득 폐렴 환자의 중증도 보정 재원일수 분석)

  • Kim, Yoo-Mi;Choi, Yun-Kyoung;Kang, Sung-Hong;Kim, Won-Joong
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.3 / pp.1234-1243 / 2011
  • This study was carried out to develop a severity-adjustment model for length of stay (LOS) in hospital for community-acquired pneumonia and to analyze the factors behind variation in LOS. The subjects were 5,353 community-acquired pneumonia inpatients from the Korean National Hospital Discharge In-depth Injury Survey data from 2004 through 2006. The data were analyzed using t-tests and ANOVA, and the severity-adjustment model was developed using data mining techniques. LOS differed according to gender, age, type of insurance, and type of admission, but did not differ according to whether patients died in hospital. After computing the standardized value of the difference between crude and expected length of stay, we analyzed the variation in length of stay for community-acquired pneumonia. LOS varied by region and insurance type, but not according to whether patients received care in their area of residence. The variation in length of stay after controlling for case mix or severity of illness can be explained by provider factors. These supply factors in LOS variation should be studied further with respect to individual practice style, patient management practices, and healthcare resources or environment. We expect that severity-adjustment models using administrative databases will be applied more widely to other diseases in practice.
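The abstract does not give the formula behind the "standardized value of the difference between crude and expected length of stay". One plausible reading, sketched below, is a z-score of the observed-minus-expected LOS averaged per group. The records, the grouping variable, and the function name are hypothetical, and the expected values stand in for the output of a severity-adjustment model.

```python
from statistics import mean, pstdev

# Hypothetical records: (group, observed LOS in days, model-expected LOS in days).
RECORDS = [
    ("A", 10, 8.0),
    ("A", 12, 9.0),
    ("B", 6, 8.0),
    ("B", 7, 9.0),
]


def standardized_los_difference(records):
    """Return the mean z-score of (observed - expected) LOS for each group.

    Assumption: the difference is standardized against the mean and
    population standard deviation of all differences in the data set.
    """
    diffs = [obs - exp for _, obs, exp in records]
    mu, sigma = mean(diffs), pstdev(diffs)
    if sigma == 0:
        raise ValueError("all differences are identical; cannot standardize")
    by_group = {}
    for group, obs, exp in records:
        by_group.setdefault(group, []).append(((obs - exp) - mu) / sigma)
    return {group: mean(z_scores) for group, z_scores in by_group.items()}


if __name__ == "__main__":
    for group, z in standardized_los_difference(RECORDS).items():
        print(group, round(z, 3))
```

A positive value means the group stays longer than the severity-adjusted expectation, and a negative value means it stays shorter.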

Impact of Semantic Characteristics on Perceived Helpfulness of Online Reviews (온라인 상품평의 내용적 특성이 소비자의 인지된 유용성에 미치는 영향)

  • Park, Yoon-Joo;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.29-44 / 2017
  • In Internet commerce, consumers are heavily influenced by product reviews written by other users who have already purchased the product. However, as product reviews accumulate, it takes a lot of time and effort for consumers to check the massive number of reviews individually. Moreover, carelessly written product reviews actually inconvenience consumers. Thus, many online vendors provide mechanisms to identify the reviews that customers perceive as most helpful (Cao et al. 2011; Mudambi and Schuff 2010). For example, some online services, such as Amazon.com and TripAdvisor, allow users to rate the helpfulness of each review and use this feedback to rank and re-order reviews. However, many reviews have only a few feedback votes or none at all, which makes it hard to identify their helpfulness. Feedback also takes time to accumulate, so newly written reviews do not have enough of it. For example, only 20% of the reviews in the Amazon Review Dataset (McAuley and Leskovec, 2013) have more than 5 feedback votes (Yan et al., 2014). The purpose of this study is to analyze the factors affecting the usefulness of online product reviews and to derive a forecasting model that selectively provides product reviews that can be helpful to consumers. To do this, we extracted various linguistic, psychological, and perceptual elements from product reviews using text-mining techniques and identified the determinants among these elements that affect the usefulness of product reviews. In particular, considering that the characteristics of product reviews and the determinants of usefulness can differ between apparel products (experience goods) and electronic products (search goods), the characteristics of the reviews were compared across the two product groups and the determinants were established for each. This study used 7,498 apparel product reviews and 106,962 electronic product reviews from Amazon.com. To understand a review text, we first extracted linguistic and psychological characteristics, such as word count and the levels of emotional tone and analytical thinking embedded in the text, using the widely adopted text analysis software LIWC (Linguistic Inquiry and Word Count). We then explored the descriptive statistics of the review text for each category and statistically compared their differences using t-tests. Lastly, we performed regression analysis using the data mining software RapidMiner to find the determinant factors. Comparing the review characteristics of electronic and apparel products showed that reviewers used more words and longer sentences when writing reviews of electronic products. As for content characteristics, electronic product reviews included more analytic words, carried more clout, and related more to cognitive processes (CogProc) than apparel product reviews, and also included many words expressing negative emotions (NegEmo). On the other hand, apparel product reviews were more personal and authentic and included more positive emotions (PosEmo) and perceptual processes (Percept) than electronic product reviews. Next, we analyzed the determinants of review usefulness in the two product groups. In both product groups, reviews perceived as useful had high product ratings from reviewers and contained a larger number of total words, many expressions involving perceptual processes, and fewer negative emotions. In addition, apparel product reviews with a large number of comparative expressions, a low expertise index, and concise content with fewer words per sentence were perceived as useful. Electronic product reviews that were analytical, with a high expertise index, and that contained many authentic expressions, cognitive processes, and positive emotions (PosEmo) were perceived as useful. These findings are expected to help consumers effectively identify useful product reviews in the future.

How to improve the accuracy of recommendation systems: Combining ratings and review texts sentiment scores (평점과 리뷰 텍스트 감성분석을 결합한 추천시스템 향상 방안 연구)

  • Hyun, Jiyeon;Ryu, Sangyi;Lee, Sang-Yong Tom
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.219-239 / 2019
  • As providing customized services to individuals becomes more important, research on personalized recommendation systems is constantly being carried out. Collaborative filtering is one of the most popular approaches in academia and industry. However, it has a limitation in that recommendations are mostly based on quantitative information such as users' ratings, which lowers accuracy. To address this problem, many studies have attempted to improve the performance of recommendation systems by using information beyond the quantitative ratings. A good example is the use of sentiment analysis on customer review text. Nevertheless, existing research has not directly combined the results of sentiment analysis with quantitative rating scores in the recommendation system. Therefore, this study aims to reflect the sentiments expressed in reviews in the rating scores. In other words, we propose a new algorithm that converts a user's own review into quantitative information and reflects it directly in the recommendation system. To do this, we needed to quantify users' reviews, which are originally qualitative information. In this study, the sentiment score was calculated using a sentiment analysis technique from text mining. The data consisted of movie reviews. Based on the data, a domain-specific sentiment dictionary was constructed for movie reviews. Regression analysis was used to construct the sentiment dictionary: positive and negative dictionaries were each constructed using Lasso regression, Ridge regression, and ElasticNet. The accuracy of each constructed dictionary was verified with a confusion matrix. The accuracy of the Lasso-based dictionary was 70%, that of the Ridge-based dictionary was 79%, and that of the ElasticNet ($\alpha = 0.3$) dictionary was 83%. Therefore, in this study, the sentiment score of each review was calculated using the ElasticNet-based dictionary and combined with the rating to create a new rating. We show that collaborative filtering that reflects the sentiment scores of user reviews is superior to the traditional method that considers only the existing rating. To demonstrate this, the proposed approach was applied to memory-based user-based collaborative filtering (UBCF) and item-based collaborative filtering (IBCF), and to model-based matrix factorization with SVD and SVD++. For these algorithms, the mean absolute error (MAE) and the root mean square error (RMSE) were calculated to compare a recommendation system using the combined score with one that considers ratings only. In terms of MAE, performance improved by 0.059 for UBCF, 0.0862 for IBCF, 0.1012 for SVD, and 0.188 for SVD++. In terms of RMSE, the corresponding figures were 0.0431 for UBCF, 0.0882 for IBCF, 0.1103 for SVD, and 0.1756 for SVD++. Thus, the prediction performance of ratings that reflect the proposed sentiment score is superior to that of the conventional ratings. A paired t-test was then conducted to validate the result, and it confirmed that the proposed model is the better approach. To overcome the limitation of previous research that judged users' sentiment only by quantitative rating scores, this study quantified the review text so that a user's opinion is reflected in the recommendation system in a more refined way, improving accuracy. The findings have managerial implications for recommendation system developers, who need to consider both quantitative and qualitative information. The method of constructing the combined system in this paper could be used directly by developers.
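The abstract does not state how the sentiment score and the rating are combined, so the sketch below uses a simple weighted blend purely as an assumption. The `combine` function, its `weight` parameter, and the 1-to-5 rating scale are hypothetical. The MAE and RMSE functions follow the standard definitions of the two evaluation measures named in the abstract.

```python
from math import sqrt


def combine(rating, sentiment, weight=0.5, scale=(1.0, 5.0)):
    """Blend a star rating with a sentiment score in [0, 1].

    Assumption: the sentiment score is mapped linearly onto the rating
    scale and averaged with the rating using a hypothetical weight.
    """
    low, high = scale
    sentiment_as_rating = low + sentiment * (high - low)
    return (1 - weight) * rating + weight * sentiment_as_rating


def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between actual and predicted ratings."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


if __name__ == "__main__":
    # A 4-star rating with a fully positive review text is nudged upward.
    print(combine(4.0, 1.0))
    # Toy evaluation of two predictions against the true ratings.
    print(mae([4, 2], [3, 4]), rmse([4, 2], [3, 4]))
```

Lower MAE and RMSE indicate better predictions, which is why the abstract reports the reductions achieved for each algorithm.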

The Importance of Employee's Perceptions When Conducting a Company's CSR Strategy : The Concept of 'Authenticity' (조직의 CSR 전략 이행과정에서 직원 인식 중요성 : '진정성' 개념을 바탕으로)

  • Jung, Ji-Young;Kim, Sang-Joon
    • Korean small business review / v.43 no.4 / pp.27-57 / 2021
  • How does authenticity influence the process of carrying out a company's CSR strategy? Authenticity, a condition of internal/external alignment that an employee feels in relation to an organization, refers to employees' judgment, formed through experiences such as thoughts and emotions, of how true and beneficial the organization is to them. It can also be understood as a process of meaning formation between the organization's CSR strategy and the perceptions of the employees who carry out CSR. To clarify the relationship between authenticity and CSR, we applied text mining, topic modeling, and semantic network analysis to 657 reviews of O corporation written from 2015 to 2021. The analysis identified distinctive issues and types. It shows that the issue of 'external image' is the most prominent characteristic of authenticity perception compared with other conditions. Furthermore, the types of authenticity perception evaluation are broadly divided into acceptance and rejection and, in more detail, into five categories. This study indicates that organizations should consider both external and internal conditions when establishing CSR strategies. In addition, an interactive, circular relationship between the organization and its employees is needed, in which employees' perceptions are collected and reflected. Finally, this study proposes ways to overcome problems related to this interaction.

A Study on the Countermeasures of the Korean Pharmaceutical/Bio Industry to the EU Corporate Sustainability Due Diligence Directive, by using Text Mining (텍스트 마이닝을 활용한 국내 제약·바이오 업종의 EU 공급망 실사법 대응 방안 연구)

  • Sori Kim;Joonhak Ki
    • Information Systems Review / v.26 no.1 / pp.93-117 / 2024
  • In February 2022, the EU announced a draft of the EU Corporate Sustainability Due Diligence Directive, which requires due diligence and disclosure of information on environmental and human rights risks in corporate supply chains. This study evaluated the ability of 13 Korean pharmaceutical/bio companies to respond to the EU's supply chain due diligence requirements and compared them with 13 globally leading pharmaceutical/bio companies considered strong in environmental and human rights risk management. For the comparative analysis, text mining was performed using R: basic word frequencies and co-occurring words were analyzed, and topic modeling was performed by applying Latent Dirichlet Allocation. The analysis found that, compared with the leading companies, domestic pharmaceutical and bio companies lack systems for reporting and identifying negative issues and processes for implementing supply chain due diligence, and need to advance their data management for disclosing environmental and human rights information. Accordingly, domestic pharmaceutical and bio companies need to prepare differentiated support measures to systematically identify and reduce risks in the supply chains of small and medium-sized businesses, beyond simply providing financial support. It is also desirable for the government to provide policy support by mandating Korea's own supply chain environmental and human rights due diligence system, along with support for strengthening domestic pharmaceutical and bio companies' ability to respond to due diligence, such as expert consulting and financial support.

A Study on Smartwatch review data of SNS and sentiment analytical using opinion mining (스마트워치 SNS 리뷰 데이터와 오피니언 마이닝을 통한 감성 분석 처리에 대한 연구)

  • Shin, Donghyun;Choi, YongLak
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.10a / pp.1047-1050 / 2015
  • Wearable devices, along with the IoT (Internet of Things), are considered the core of the coming generation's convergence technology. Companies are competing intensely with one another to gain an early lead in the smartwatch market. Consumers who use smartwatches express their preferences by sharing their opinions through SNS (Social Networking Services). In this study, an emotion dictionary is built that consists of attributes and emotional words related to smartwatches. Based on the emotion dictionary, SNS data are categorized by attribute through an opinion data model. Afterwards, the overall polarity and attribute polarity of the collected data are determined through natural language parsing, followed by an analysis of smartwatch reviews. This study will help determine which attributes of smartwatches should be improved to raise consumers' interest in individual smartwatches.

Development of Mining model through reproducibility assessment in Adverse drug event surveillance system (약물부작용감시시스템에서 재현성 평가를 통한 마이닝 모델 개발)

  • Lee, Young-Ho;Yoon, Young-Mi;Lee, Byung-Mun;Hwang, Hee-Joung;Kang, Un-Gu
    • Journal of the Korea Society of Computer and Information / v.14 no.3 / pp.183-192 / 2009
  • ADESS (Adverse Drug Event Surveillance System) is a system that identifies adverse drug events using adverse drug signals. This system is more effective for adverse drug surveillance than current methods such as voluntary reporting or chart review. In this study, we built a clinical data mart (CDM) for the development of ADESS. The CDM achieved data reliability by applying data quality management, and the most suitable number of clusters (n=4) was obtained through a reproducibility assessment of unsupervised learning techniques for knowledge discovery. Using this number of clusters (n=4), K-means, Kohonen, and two-step clustering models were produced, and we confirmed that the K-means algorithm yields the clustering closest to the actual adverse drug events.

Exploring the Performance of Multi-Label Feature Selection for Effective Decision-Making: Focusing on Sentiment Analysis (효과적인 의사결정을 위한 다중레이블 기반 속성선택 방법에 관한 연구: 감성 분석을 중심으로)

  • Jong Yoon Won;Kun Chang Lee
    • Information Systems Review / v.25 no.1 / pp.47-73 / 2023
  • Management decision-making based on artificial intelligence (AI) plays an important role in helping decision-makers. Business decision-making centered on AI is regarded as a driving force for corporate growth. AI based on accurate analysis techniques can support decision-makers in making high-quality decisions. This study proposes an effective decision-making method that applies multi-label feature selection. Specifically, we present CFS-BR (Correlation-based Feature Selection based on the Binary Relevance approach), which reduces data sets in high-dimensional space. Analysis of sample data and empirical data shows that CFS-BR can support efficient decision-making by selecting the best combination of meaningful attributes using the Best-First algorithm. In addition, CFS-BR achieves higher accuracy than previous multi-label feature selection methods, making it useful for increasing the effectiveness of decision-making.

Big Data News Analysis in Healthcare Using Topic Modeling and Time Series Regression Analysis (토픽모델링과 시계열 회귀분석을 활용한 헬스케어 분야의 뉴스 빅데이터 분석 연구)

  • Eun-Jung Kim;Suk-Gwon Chang;Sang-Yong Tom Lee
    • Information Systems Review / v.25 no.3 / pp.163-177 / 2023
  • This research aims to identify key initiatives and a policy approach to support the industrialization of the healthcare sector. A total of 91,873 healthcare-related news items published between 2013 and 2022 were collected. Topic modeling yielded 20 topics, and time series regression analysis identified seven significant topics: 4 hot topics (Healthcare, Biopharmaceuticals, Corporate outlook·Sales, Government·Policy) and 3 cold topics (Smart devices, Stocks·Investment, Urban development·Construction). The research findings will serve as an important data source for government institutions engaged in formulating and implementing Korea's policies.
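The abstract does not define how hot and cold topics are separated. A common convention, assumed here, is to regress each topic's yearly share on time and call a topic "hot" when the slope is significantly positive and "cold" when it is significantly negative. The yearly shares below are hypothetical, and the critical value 2.306 is the two-sided 5% t value for 8 degrees of freedom, which fits ten yearly observations.

```python
from math import copysign, inf, sqrt


def trend_t_stat(shares):
    """Return (slope, t-statistic) of an OLS line fitted to shares over time."""
    n = len(shares)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(shares) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, shares))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, shares))
    std_err = sqrt(sse / (n - 2) / sxx)
    if std_err == 0:
        # Perfect fit: the slope is either exactly zero or trivially significant.
        return slope, (copysign(inf, slope) if slope != 0 else 0.0)
    return slope, slope / std_err


def classify_topic(shares, t_crit=2.306):
    """Label a topic 'hot', 'cold', or 'neutral' from its time trend."""
    _, t_stat = trend_t_stat(shares)
    if t_stat > t_crit:
        return "hot"
    if t_stat < -t_crit:
        return "cold"
    return "neutral"


# Hypothetical yearly topic shares for ten consecutive years.
RISING = [0.02, 0.03, 0.03, 0.04, 0.05, 0.05, 0.06, 0.07, 0.07, 0.08]
FALLING = list(reversed(RISING))
FLAT = [0.05, 0.06, 0.05, 0.04, 0.05, 0.06, 0.05, 0.04, 0.05, 0.05]

if __name__ == "__main__":
    for name, series in [("rising", RISING), ("falling", FALLING), ("flat", FLAT)]:
        print(name, classify_topic(series))
```

The critical value must be changed if the number of time points differs, since the degrees of freedom are the number of observations minus two.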