• Title/Summary/Keyword: Customers'

Search Result 7,548, Processing Time 0.035 seconds

Application of Support Vector Regression for Improving the Performance of the Emotion Prediction Model (감정예측모형의 성과개선을 위한 Support Vector Regression 응용)

  • Kim, Seongjin;Ryoo, Eunchung;Jung, Min Kyu;Kim, Jae Kyeong;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.185-202
    • /
    • 2012
  • .Since the value of information has been realized in the information society, the usage and collection of information has become important. A facial expression that contains thousands of information as an artistic painting can be described in thousands of words. Followed by the idea, there has recently been a number of attempts to provide customers and companies with an intelligent service, which enables the perception of human emotions through one's facial expressions. For example, MIT Media Lab, the leading organization in this research area, has developed the human emotion prediction model, and has applied their studies to the commercial business. In the academic area, a number of the conventional methods such as Multiple Regression Analysis (MRA) or Artificial Neural Networks (ANN) have been applied to predict human emotion in prior studies. However, MRA is generally criticized because of its low prediction accuracy. This is inevitable since MRA can only explain the linear relationship between the dependent variables and the independent variable. To mitigate the limitations of MRA, some studies like Jung and Kim (2012) have used ANN as the alternative, and they reported that ANN generated more accurate prediction than the statistical methods like MRA. However, it has also been criticized due to over fitting and the difficulty of the network design (e.g. setting the number of the layers and the number of the nodes in the hidden layers). Under this background, we propose a novel model using Support Vector Regression (SVR) in order to increase the prediction accuracy. SVR is an extensive version of Support Vector Machine (SVM) designated to solve the regression problems. The model produced by SVR only depends on a subset of the training data, because the cost function for building the model ignores any training data that is close (within a threshold ${\varepsilon}$) to the model prediction. Using SVR, we tried to build a model that can measure the level of arousal and valence from the facial features. To validate the usefulness of the proposed model, we collected the data of facial reactions when providing appropriate visual stimulating contents, and extracted the features from the data. Next, the steps of the preprocessing were taken to choose statistically significant variables. In total, 297 cases were used for the experiment. As the comparative models, we also applied MRA and ANN to the same data set. For SVR, we adopted '${\varepsilon}$-insensitive loss function', and 'grid search' technique to find the optimal values of the parameters like C, d, ${\sigma}^2$, and ${\varepsilon}$. In the case of ANN, we adopted a standard three-layer backpropagation network, which has a single hidden layer. The learning rate and momentum rate of ANN were set to 10%, and we used sigmoid function as the transfer function of hidden and output nodes. We performed the experiments repeatedly by varying the number of nodes in the hidden layer to n/2, n, 3n/2, and 2n, where n is the number of the input variables. The stopping condition for ANN was set to 50,000 learning events. And, we used MAE (Mean Absolute Error) as the measure for performance comparison. From the experiment, we found that SVR achieved the highest prediction accuracy for the hold-out data set compared to MRA and ANN. Regardless of the target variables (the level of arousal, or the level of positive / negative valence), SVR showed the best performance for the hold-out data set. ANN also outperformed MRA, however, it showed the considerably lower prediction accuracy than SVR for both target variables. The findings of our research are expected to be useful to the researchers or practitioners who are willing to build the models for recognizing human emotions.

Analyzing the User Intention of Booth Recommender System in Smart Exhibition Environment (스마트 전시환경에서 부스 추천시스템의 사용자 의도에 관한 조사연구)

  • Choi, Jae Ho;Xiang, Jun-Yong;Moon, Hyun Sil;Choi, Il Young;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.153-169
    • /
    • 2012
  • Exhibitions have played a key role of effective marketing activity which directly informs services and products to current and potential customers. Through participating in exhibitions, exhibitors have got the opportunity to make face-to-face contact so that they can secure the market share and improve their corporate images. According to this economic importance of exhibitions, show organizers try to adopt a new IT technology for improving their performance, and researchers have also studied services which can improve the satisfaction of visitors through analyzing visit patterns of visitors. Especially, as smart technologies make them monitor activities of visitors in real-time, they have considered booth recommender systems which infer preference of visitors and recommender proper service to them like on-line environment. However, while there are many studies which can improve their performance in the side of new technological development, they have not considered the choice factor of visitors for booth recommender systems. That is, studies for factors which can influence the development direction and effective diffusion of these systems are insufficient. Most of prior studies for the acceptance of new technologies and the continuous intention of use have adopted Technology Acceptance Model (TAM) and Extended Technology Acceptance Model (ETAM). Booth recommender systems may not be new technology because they are similar with commercial recommender systems such as book recommender systems, in the smart exhibition environment, they can be considered new technology. However, for considering the smart exhibition environment beyond TAM, measurements for the intention of reuse should focus on how booth recommender systems can provide correct information to visitors. In this study, through literature reviews, we draw factors which can influence the satisfaction and reuse intention of visitors for booth recommender systems, and design a model to forecast adaptation of visitors for booth recommendation in the exhibition environment. For these purposes, we conduct a survey for visitors who attended DMC Culture Open in November 2011 and experienced booth recommender systems using own smart phone, and examine hypothesis by regression analysis. As a result, factors which can influence the satisfaction of visitors for booth recommender systems are the effectiveness, perceived ease of use, argument quality, serendipity, and so on. Moreover, the satisfaction for booth recommender systems has a positive relationship with the development of reuse intention. For these results, we have some insights for booth recommender systems in the smart exhibition environment. First, this study gives shape to important factors which are considered when they establish strategies which induce visitors to consistently use booth recommender systems. Recently, although show organizers try to improve their performances using new IT technologies, their visitors have not felt the satisfaction from these efforts. At this point, this study can help them to provide services which can improve the satisfaction of visitors and make them last relationship with visitors. On the other hands, this study suggests that they managers along the using time of booth recommender systems. For example, in the early stage of the adoption, they should focus on the argument quality, perceived ease of use, and serendipity, so that improve the acceptance of booth recommender systems. After these stages, they should bridge the differences between expectation and perception for booth recommender systems, and lead continuous uses of visitors. However, this study has some limitations. We only use four factors which can influence the satisfaction of visitors. Therefore, we should development our model to consider important additional factors. And the exhibition in our experiments has small number of booths so that visitors may not need to booth recommender systems. In the future study, we will conduct experiments in the exhibition environment which has a larger scale.

Resolving the 'Gray sheep' Problem Using Social Network Analysis (SNA) in Collaborative Filtering (CF) Recommender Systems (소셜 네트워크 분석 기법을 활용한 협업필터링의 특이취향 사용자(Gray Sheep) 문제 해결)

  • Kim, Minsung;Im, Il
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.137-148
    • /
    • 2014
  • Recommender system has become one of the most important technologies in e-commerce in these days. The ultimate reason to shop online, for many consumers, is to reduce the efforts for information search and purchase. Recommender system is a key technology to serve these needs. Many of the past studies about recommender systems have been devoted to developing and improving recommendation algorithms and collaborative filtering (CF) is known to be the most successful one. Despite its success, however, CF has several shortcomings such as cold-start, sparsity, gray sheep problems. In order to be able to generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users. For new users who do not have any evaluations or preference information, therefore, CF cannot come up with recommendations (Cold-star problem). As the numbers of products and customers increase, the scale of the data increases exponentially and most of the data cells are empty. This sparse dataset makes computation for recommendation extremely hard (Sparsity problem). Since CF is based on the assumption that there are groups of users sharing common preferences or tastes, CF becomes inaccurate if there are many users with rare and unique tastes (Gray sheep problem). This study proposes a new algorithm that utilizes Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We utilize 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality in SNA refers to the number of direct links to and from a node. In a network of users who are connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and they are isolated from other users. Therefore, gray sheep can be identified by calculating degree centrality of each node. We divide the dataset into two, gray sheep and others, based on the degree centrality of the users. Then, different similarity measures and recommendation methods are applied to these two datasets. More detail algorithm is as follows: Step 1: Convert the initial data which is a two-mode network (user to item) into an one-mode network (user to user). Step 2: Calculate degree centrality of each node and separate those nodes having degree centrality values lower than the pre-set threshold. The threshold value is determined by simulations such that the accuracy of CF for the remaining dataset is maximized. Step 3: Ordinary CF algorithm is applied to the remaining dataset. Step 4: Since the separated dataset consist of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them. A 'popular item' method is used to generate recommendations for these users. The F measures of the two datasets are weighted by the numbers of nodes and summed to be used as the final performance metric. In order to test performance improvement by this new algorithm, an empirical study was conducted using a publically available dataset - the MovieLens data by GroupLens research team. We used 100,000 evaluations by 943 users on 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm utilizing 'Best-N-neighbors' and 'Cosine' similarity method. The empirical results show that F measure was improved about 11% on average when the proposed algorithm was used

    . Past studies to improve CF performance typically used additional information other than users' evaluations such as demographic data. Some studies applied SNA techniques as a new similarity metric. This study is novel in that it used SNA to separate dataset. This study shows that performance of CF can be improved, without any additional information, when SNA techniques are used as proposed. This study has several theoretical and practical implications. This study empirically shows that the characteristics of dataset can affect the performance of CF recommender systems. This helps researchers understand factors affecting performance of CF. This study also opens a door for future studies in the area of applying SNA to CF to analyze characteristics of dataset. In practice, this study provides guidelines to improve performance of CF recommender systems with a simple modification.

  • Pareto Ratio and Inequality Level of Knowledge Sharing in Virtual Knowledge Collaboration: Analysis of Behaviors on Wikipedia (지식 공유의 파레토 비율 및 불평등 정도와 가상 지식 협업: 위키피디아 행위 데이터 분석)

    • Park, Hyun-Jung;Shin, Kyung-Shik
      • Journal of Intelligence and Information Systems
      • /
      • v.20 no.3
      • /
      • pp.19-43
      • /
      • 2014
    • The Pareto principle, also known as the 80-20 rule, states that roughly 80% of the effects come from 20% of the causes for many events including natural phenomena. It has been recognized as a golden rule in business with a wide application of such discovery like 20 percent of customers resulting in 80 percent of total sales. On the other hand, the Long Tail theory, pointing out that "the trivial many" produces more value than "the vital few," has gained popularity in recent times with a tremendous reduction of distribution and inventory costs through the development of ICT(Information and Communication Technology). This study started with a view to illuminating how these two primary business paradigms-Pareto principle and Long Tail theory-relates to the success of virtual knowledge collaboration. The importance of virtual knowledge collaboration is soaring in this era of globalization and virtualization transcending geographical and temporal constraints. Many previous studies on knowledge sharing have focused on the factors to affect knowledge sharing, seeking to boost individual knowledge sharing and resolve the social dilemma caused from the fact that rational individuals are likely to rather consume than contribute knowledge. Knowledge collaboration can be defined as the creation of knowledge by not only sharing knowledge, but also by transforming and integrating such knowledge. In this perspective of knowledge collaboration, the relative distribution of knowledge sharing among participants can count as much as the absolute amounts of individual knowledge sharing. In particular, whether the more contribution of the upper 20 percent of participants in knowledge sharing will enhance the efficiency of overall knowledge collaboration is an issue of interest. This study deals with the effect of this sort of knowledge sharing distribution on the efficiency of knowledge collaboration and is extended to reflect the work characteristics. All analyses were conducted based on actual data instead of self-reported questionnaire surveys. More specifically, we analyzed the collaborative behaviors of editors of 2,978 English Wikipedia featured articles, which are the best quality grade of articles in English Wikipedia. We adopted Pareto ratio, the ratio of the number of knowledge contribution of the upper 20 percent of participants to the total number of knowledge contribution made by the total participants of an article group, to examine the effect of Pareto principle. In addition, Gini coefficient, which represents the inequality of income among a group of people, was applied to reveal the effect of inequality of knowledge contribution. Hypotheses were set up based on the assumption that the higher ratio of knowledge contribution by more highly motivated participants will lead to the higher collaboration efficiency, but if the ratio gets too high, the collaboration efficiency will be exacerbated because overall informational diversity is threatened and knowledge contribution of less motivated participants is intimidated. Cox regression models were formulated for each of the focal variables-Pareto ratio and Gini coefficient-with seven control variables such as the number of editors involved in an article, the average time length between successive edits of an article, the number of sections a featured article has, etc. The dependent variable of the Cox models is the time spent from article initiation to promotion to the featured article level, indicating the efficiency of knowledge collaboration. To examine whether the effects of the focal variables vary depending on the characteristics of a group task, we classified 2,978 featured articles into two categories: Academic and Non-academic. Academic articles refer to at least one paper published at an SCI, SSCI, A&HCI, or SCIE journal. We assumed that academic articles are more complex, entail more information processing and problem solving, and thus require more skill variety and expertise. The analysis results indicate the followings; First, Pareto ratio and inequality of knowledge sharing relates in a curvilinear fashion to the collaboration efficiency in an online community, promoting it to an optimal point and undermining it thereafter. Second, the curvilinear effect of Pareto ratio and inequality of knowledge sharing on the collaboration efficiency is more sensitive with a more academic task in an online community.

    Intelligent VOC Analyzing System Using Opinion Mining (오피니언 마이닝을 이용한 지능형 VOC 분석시스템)

    • Kim, Yoosin;Jeong, Seung Ryul
      • Journal of Intelligence and Information Systems
      • /
      • v.19 no.3
      • /
      • pp.113-125
      • /
      • 2013
    • Every company wants to know customer's requirement and makes an effort to meet them. Cause that, communication between customer and company became core competition of business and that important is increasing continuously. There are several strategies to find customer's needs, but VOC (Voice of customer) is one of most powerful communication tools and VOC gathering by several channels as telephone, post, e-mail, website and so on is so meaningful. So, almost company is gathering VOC and operating VOC system. VOC is important not only to business organization but also public organization such as government, education institute, and medical center that should drive up public service quality and customer satisfaction. Accordingly, they make a VOC gathering and analyzing System and then use for making a new product and service, and upgrade. In recent years, innovations in internet and ICT have made diverse channels such as SNS, mobile, website and call-center to collect VOC data. Although a lot of VOC data is collected through diverse channel, the proper utilization is still difficult. It is because the VOC data is made of very emotional contents by voice or text of informal style and the volume of the VOC data are so big. These unstructured big data make a difficult to store and analyze for use by human. So that, the organization need to automatic collecting, storing, classifying and analyzing system for unstructured big VOC data. This study propose an intelligent VOC analyzing system based on opinion mining to classify the unstructured VOC data automatically and determine the polarity as well as the type of VOC. And then, the basis of the VOC opinion analyzing system, called domain-oriented sentiment dictionary is created and corresponding stages are presented in detail. The experiment is conducted with 4,300 VOC data collected from a medical website to measure the effectiveness of the proposed system and utilized them to develop the sensitive data dictionary by determining the special sentiment vocabulary and their polarity value in a medical domain. Through the experiment, it comes out that positive terms such as "칭찬, 친절함, 감사, 무사히, 잘해, 감동, 미소" have high positive opinion value, and negative terms such as "퉁명, 뭡니까, 말하더군요, 무시하는" have strong negative opinion. These terms are in general use and the experiment result seems to be a high probability of opinion polarity. Furthermore, the accuracy of proposed VOC classification model has been compared and the highest classification accuracy of 77.8% is conformed at threshold with -0.50 of opinion classification of VOC. Through the proposed intelligent VOC analyzing system, the real time opinion classification and response priority of VOC can be predicted. Ultimately the positive effectiveness is expected to catch the customer complains at early stage and deal with it quickly with the lower number of staff to operate the VOC system. It can be made available human resource and time of customer service part. Above all, this study is new try to automatic analyzing the unstructured VOC data using opinion mining, and shows that the system could be used as variable to classify the positive or negative polarity of VOC opinion. It is expected to suggest practical framework of the VOC analysis to diverse use and the model can be used as real VOC analyzing system if it is implemented as system. Despite experiment results and expectation, this study has several limits. First of all, the sample data is only collected from a hospital web-site. It means that the sentimental dictionary made by sample data can be lean too much towards on that hospital and web-site. Therefore, next research has to take several channels such as call-center and SNS, and other domain like government, financial company, and education institute.

    Impact of Semantic Characteristics on Perceived Helpfulness of Online Reviews (온라인 상품평의 내용적 특성이 소비자의 인지된 유용성에 미치는 영향)

    • Park, Yoon-Joo;Kim, Kyoung-jae
      • Journal of Intelligence and Information Systems
      • /
      • v.23 no.3
      • /
      • pp.29-44
      • /
      • 2017
    • In Internet commerce, consumers are heavily influenced by product reviews written by other users who have already purchased the product. However, as the product reviews accumulate, it takes a lot of time and effort for consumers to individually check the massive number of product reviews. Moreover, product reviews that are written carelessly actually inconvenience consumers. Thus many online vendors provide mechanisms to identify reviews that customers perceive as most helpful (Cao et al. 2011; Mudambi and Schuff 2010). For example, some online retailers, such as Amazon.com and TripAdvisor, allow users to rate the helpfulness of each review, and use this feedback information to rank and re-order them. However, many reviews have only a few feedbacks or no feedback at all, thus making it hard to identify their helpfulness. Also, it takes time to accumulate feedbacks, thus the newly authored reviews do not have enough ones. For example, only 20% of the reviews in Amazon Review Dataset (Mcauley and Leskovec, 2013) have more than 5 reviews (Yan et al, 2014). The purpose of this study is to analyze the factors affecting the usefulness of online product reviews and to derive a forecasting model that selectively provides product reviews that can be helpful to consumers. In order to do this, we extracted the various linguistic, psychological, and perceptual elements included in product reviews by using text-mining techniques and identifying the determinants among these elements that affect the usability of product reviews. In particular, considering that the characteristics of the product reviews and determinants of usability for apparel products (which are experiential products) and electronic products (which are search goods) can differ, the characteristics of the product reviews were compared within each product group and the determinants were established for each. This study used 7,498 apparel product reviews and 106,962 electronic product reviews from Amazon.com. In order to understand a review text, we first extract linguistic and psychological characteristics from review texts such as a word count, the level of emotional tone and analytical thinking embedded in review text using widely adopted text analysis software LIWC (Linguistic Inquiry and Word Count). After then, we explore the descriptive statistics of review text for each category and statistically compare their differences using t-test. Lastly, we regression analysis using the data mining software RapidMiner to find out determinant factors. As a result of comparing and analyzing product review characteristics of electronic products and apparel products, it was found that reviewers used more words as well as longer sentences when writing product reviews for electronic products. As for the content characteristics of the product reviews, it was found that these reviews included many analytic words, carried more clout, and related to the cognitive processes (CogProc) more so than the apparel product reviews, in addition to including many words expressing negative emotions (NegEmo). On the other hand, the apparel product reviews included more personal, authentic, positive emotions (PosEmo) and perceptual processes (Percept) compared to the electronic product reviews. Next, we analyzed the determinants toward the usefulness of the product reviews between the two product groups. As a result, it was found that product reviews with high product ratings from reviewers in both product groups that were perceived as being useful contained a larger number of total words, many expressions involving perceptual processes, and fewer negative emotions. In addition, apparel product reviews with a large number of comparative expressions, a low expertise index, and concise content with fewer words in each sentence were perceived to be useful. In the case of electronic product reviews, those that were analytical with a high expertise index, along with containing many authentic expressions, cognitive processes, and positive emotions (PosEmo) were perceived to be useful. These findings are expected to help consumers effectively identify useful product reviews in the future.

    A Study on the Regional Characteristics of Broadband Internet Termination by Coupling Type using Spatial Information based Clustering (공간정보기반 클러스터링을 이용한 초고속인터넷 결합유형별 해지의 지역별 특성연구)

    • Park, Janghyuk;Park, Sangun;Kim, Wooju
      • Journal of Intelligence and Information Systems
      • /
      • v.23 no.3
      • /
      • pp.45-67
      • /
      • 2017
    • According to the Internet Usage Research performed in 2016, the number of internet users and the internet usage have been increasing. Smartphone, compared to the computer, is taking a more dominant role as an internet access device. As the number of smart devices have been increasing, some views that the demand on high-speed internet will decrease; however, Despite the increase in smart devices, the high-speed Internet market is expected to slightly increase for a while due to the speedup of Giga Internet and the growth of the IoT market. As the broadband Internet market saturates, telecom operators are over-competing to win new customers, but if they know the cause of customer exit, it is expected to reduce marketing costs by more effective marketing. In this study, we analyzed the relationship between the cancellation rates of telecommunication products and the factors affecting them by combining the data of 3 cities, Anyang, Gunpo, and Uiwang owned by a telecommunication company with the regional data from KOSIS(Korean Statistical Information Service). Especially, we focused on the assumption that the neighboring areas affect the distribution of the cancellation rates by coupling type, so we conducted spatial cluster analysis on the 3 types of cancellation rates of each region using the spatial analysis tool, SatScan, and analyzed the various relationships between the cancellation rates and the regional data. In the analysis phase, we first summarized the characteristics of the clusters derived by combining spatial information and the cancellation data. Next, based on the results of the cluster analysis, Variance analysis, Correlation analysis, and regression analysis were used to analyze the relationship between the cancellation rates data and regional data. Based on the results of analysis, we proposed appropriate marketing methods according to the region. Unlike previous studies on regional characteristics analysis, In this study has academic differentiation in that it performs clustering based on spatial information so that the regions with similar cancellation types on adjacent regions. In addition, there have been few studies considering the regional characteristics in the previous study on the determinants of subscription to high-speed Internet services, In this study, we tried to analyze the relationship between the clusters and the regional characteristics data, assuming that there are different factors depending on the region. In this study, we tried to get more efficient marketing method considering the characteristics of each region in the new subscription and customer management in high-speed internet. As a result of analysis of variance, it was confirmed that there were significant differences in regional characteristics among the clusters, Correlation analysis shows that there is a stronger correlation the clusters than all region. and Regression analysis was used to analyze the relationship between the cancellation rate and the regional characteristics. As a result, we found that there is a difference in the cancellation rate depending on the regional characteristics, and it is possible to target differentiated marketing each region. As the biggest limitation of this study and it was difficult to obtain enough data to carry out the analyze. In particular, it is difficult to find the variables that represent the regional characteristics in the Dong unit. In other words, most of the data was disclosed to the city rather than the Dong unit, so it was limited to analyze it in detail. The data such as income, card usage information and telecommunications company policies or characteristics that could affect its cause are not available at that time. The most urgent part for a more sophisticated analysis is to obtain the Dong unit data for the regional characteristics. Direction of the next studies be target marketing based on the results. It is also meaningful to analyze the effect of marketing by comparing and analyzing the difference of results before and after target marketing. It is also effective to use clusters based on new subscription data as well as cancellation data.

    Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Mode (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

    • Park, Ho-yeon;Kim, Kyoung-jae
      • Journal of Intelligence and Information Systems
      • /
      • v.25 no.4
      • /
      • pp.141-154
      • /
      • 2019
    • Rapid growth of internet technology and social media is progressing. Data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish poor or high-quality content through text data of products, and it has proliferated during text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined data categories as positive and negative. This has been studied in various directions in terms of accuracy from simple rule-based to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active researches in natural language processing and is widely studied in text mining. When real online reviews aren't available for others, it's not only easy to openly collect information, but it also affects your business. In marketing, real-world information from customers is gathered on websites, not surveys. Depending on whether the website's posts are positive or negative, the customer response is reflected in the sales and tries to identify the information. However, many reviews on a website are not always good, and difficult to identify. The earlier studies in this research area used the reviews data of the Amazon.com shopping mal, but the research data used in the recent studies uses the data for stock market trends, blogs, news articles, weather forecasts, IMDB, and facebook etc. However, the lack of accuracy is recognized because sentiment calculations are changed according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify the polarity analysis of sentiment analysis into positive and negative categories and increase the prediction accuracy of the polarity analysis using the pretrained IMDB review data set. First, the text classification algorithm related to sentiment analysis adopts the popular machine learning algorithms such as NB (naive bayes), SVM (support vector machines), XGboost, RF (random forests), and Gradient Boost as comparative models. Second, deep learning has demonstrated discriminative features that can extract complex features of data. Representative algorithms are CNN (convolution neural networks), RNN (recurrent neural networks), LSTM (long-short term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but does not consider sequential data attributes. RNN can handle well in order because it takes into account the time information of the data, but there is a long-term dependency on memory. To solve the problem of long-term dependence, LSTM is used. For the comparison, CNN and LSTM were chosen as simple deep learning models. In addition to classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although there are many parameters for the algorithms, we examined the relationship between numerical value and precision to find the optimal combination. And, we tried to figure out how the models work well for sentiment analysis and how these models work. This study proposes integrated CNN and LSTM algorithms to extract the positive and negative features of text analysis. The reasons for mixing these two algorithms are as follows. CNN can extract features for the classification automatically by applying convolution layer and massively parallel processing. LSTM is not capable of highly parallel processing. Like faucets, the LSTM has input, output, and forget gates that can be moved and controlled at a desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can solve the CNN's long-term dependency problem. Furthermore, when LSTM is used in CNN's pooling layer, it has an end-to-end structure, so that spatial and temporal features can be designed simultaneously. In combination with CNN-LSTM, 90.33% accuracy was measured. This is slower than CNN, but faster than LSTM. The presented model was more accurate than other models. In addition, each word embedding layer can be improved when training the kernel step by step. CNN-LSTM can improve the weakness of each model, and there is an advantage of improving the learning by layer using the end-to-end structure of LSTM. Based on these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.

    Multi-Dimensional Analysis Method of Product Reviews for Market Insight (마켓 인사이트를 위한 상품 리뷰의 다차원 분석 방안)

    • Park, Jeong Hyun;Lee, Seo Ho;Lim, Gyu Jin;Yeo, Un Yeong;Kim, Jong Woo
      • Journal of Intelligence and Information Systems
      • /
      • v.26 no.2
      • /
      • pp.57-78
      • /
      • 2020
    • With the development of the Internet, consumers have had an opportunity to check product information easily through E-Commerce. Product reviews used in the process of purchasing goods are based on user experience, allowing consumers to engage as producers of information as well as refer to information. This can be a way to increase the efficiency of purchasing decisions from the perspective of consumers, and from the seller's point of view, it can help develop products and strengthen their competitiveness. However, it takes a lot of time and effort to understand the overall assessment and assessment dimensions of the products that I think are important in reading the vast amount of product reviews offered by E-Commerce for the products consumers want to compare. This is because product reviews are unstructured information and it is difficult to read sentiment of reviews and assessment dimension immediately. For example, consumers who want to purchase a laptop would like to check the assessment of comparative products at each dimension, such as performance, weight, delivery, speed, and design. Therefore, in this paper, we would like to propose a method to automatically generate multi-dimensional product assessment scores in product reviews that we would like to compare. The methods presented in this study consist largely of two phases. One is the pre-preparation phase and the second is the individual product scoring phase. In the pre-preparation phase, a dimensioned classification model and a sentiment analysis model are created based on a review of the large category product group review. By combining word embedding and association analysis, the dimensioned classification model complements the limitation that word embedding methods for finding relevance between dimensions and words in existing studies see only the distance of words in sentences. Sentiment analysis models generate CNN models by organizing learning data tagged with positives and negatives on a phrase unit for accurate polarity detection. Through this, the individual product scoring phase applies the models pre-prepared for the phrase unit review. Multi-dimensional assessment scores can be obtained by aggregating them by assessment dimension according to the proportion of reviews organized like this, which are grouped among those that are judged to describe a specific dimension for each phrase. In the experiment of this paper, approximately 260,000 reviews of the large category product group are collected to form a dimensioned classification model and a sentiment analysis model. In addition, reviews of the laptops of S and L companies selling at E-Commerce are collected and used as experimental data, respectively. The dimensioned classification model classified individual product reviews broken down into phrases into six assessment dimensions and combined the existing word embedding method with an association analysis indicating frequency between words and dimensions. As a result of combining word embedding and association analysis, the accuracy of the model increased by 13.7%. The sentiment analysis models could be seen to closely analyze the assessment when they were taught in a phrase unit rather than in sentences. As a result, it was confirmed that the accuracy was 29.4% higher than the sentence-based model. Through this study, both sellers and consumers can expect efficient decision making in purchasing and product development, given that they can make multi-dimensional comparisons of products. In addition, text reviews, which are unstructured data, were transformed into objective values such as frequency and morpheme, and they were analysed together using word embedding and association analysis to improve the objectivity aspects of more precise multi-dimensional analysis and research. This will be an attractive analysis model in terms of not only enabling more effective service deployment during the evolving E-Commerce market and fierce competition, but also satisfying both customers.

    A Study on the Eco-Cultural Assessment Indicator for Buddhist Temple Forest - Focused on Mt. Jogye Songgwang-sa Temple - (사찰림의 생태문화적 평가지표에 관한 연구 - 조계산 송광사를 중심으로 -)

    • Jang, Young-Whan;Koo, Bon-Hak
      • Journal of the Korean Institute of Traditional Landscape Architecture
      • /
      • v.37 no.2
      • /
      • pp.74-88
      • /
      • 2019
    • This study developed the Assessment Indicator evaluating eco-cultural value of temple forest in Korea and applied the developed Assessment Indicator to Songgwang-sa(also known as Seungbo-sachal), one of the Three Jewels Temple. Literature reviews and the draft of Assessment Indicator were drawn from brainstorming(including 2 forest therapy experts, 1 Buddhist monk expert, 1 landscape architect, 1 forest expert, and 6 researchers). After that, the Assessment Indicator drawn from the group of experts(the 1st in-depth interview: 32 people, the 2nd in-depth interview: 30 people) was verified and revised. The final Assessment Indicator, which was composed of 4 parts and 20 items, was developed. The results are as follows. The eco-cultural Assessment Indicator of temple forest was composed of 4 parts, which were Historical Cultural value, Ecological value, Recreatory Visitational value, and Educational Useful value, and 20 items and each item had 5 points. Historical Cultural value had 5 items and its total points were 25. Ecological value had 5 items and had total 25 points. Recreatory Visitational value had 6 items, 30 total points. Educational Useful value had 4 items, 20 total points. The total points of the eco-cultural Assessment Indicator were 100 points. As a result of applying the developed Assessment Indicator to the target place, Songgwang-sa in Mt. Jogye, Historical Cultural value of temple forest was calculated as 23 points(out of 25). Ecological value was 21 point(out of 25), Recreatory Visitational value, 22 points(out of 30), and Educational Useful value, 16 points(out of 20). The total points were 82(out of 100). Consequently, this study is meaningful based on the following 5 aspects. Firstly, this study challenged the development of the eco-cultural Assessment Indicator of temple forest for the first time. It is significant because the developed Assessment Indicator can be a useful resource for the eco-cultural value of temple forest. Secondly, the result showed that Educational Useful value and Recreatory Visitational value of forest temple were very low. Therefore, the supports for leisure, tour, education, and use of temple forest are needed from Korea Forest Service, Ministry of Environment, Cultural Heritage Administration and other government agencies since they acknowledge the temple forest as the best customers in Korea. Thirdly, the excellence or for eco-cultural value of temple forest needs to be extended in a national level. It is possible to make a Korean National Bran(e.g., the Therapy at the Temple) by blending temple stay, which is only in temples, and therapy, and is also possible to be a global tour industry. Fourthly, this study suggested legal definition about the necessary of legal definition for temple forest because there is no legal definition on temple forest in the current situation. When the definition of temple forest is legally arranaged, it would be a foundation for conserving eco-cultural value of temple forest, for organizing exclusively responsible departments in governmental institutions, and further for registering temple forest as World Natural Heritage. Lastly, the developed eco-cultural Assessment Indicators of temple forest from this study would be applied to "the 7 Sansa, Buddhist Mountain Monasteries in Korea(Sansa)" and the characteristics of each 7 temple are drawn. This study would be a basic data for temples' management and use with the eco-cultural Assessment Indicator of temple forest.


    (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.