• Title/Summary/Keyword: Business Item

Search Result 552, Processing Time 0.029 seconds

Enhancing Predictive Accuracy of Collaborative Filtering Algorithms using the Network Analysis of Trust Relationship among Users (사용자 간 신뢰관계 네트워크 분석을 활용한 협업 필터링 알고리즘의 예측 정확도 개선)

  • Choi, Seulbi;Kwahk, Kee-Young;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.113-127
    • /
    • 2016
  • Among the techniques for recommendation, collaborative filtering (CF) is commonly recognized to be the most effective for implementing recommender systems. Until now, CF has been popularly studied and adopted in both academic and real-world applications. The basic idea of CF is to create recommendation results by finding correlations between users of a recommendation system. CF system compares users based on how similar they are, and recommend products to users by using other like-minded people's results of evaluation for each product. Thus, it is very important to compute evaluation similarities among users in CF because the recommendation quality depends on it. Typical CF uses user's explicit numeric ratings of items (i.e. quantitative information) when computing the similarities among users in CF. In other words, user's numeric ratings have been a sole source of user preference information in traditional CF. However, user ratings are unable to fully reflect user's actual preferences from time to time. According to several studies, users may more actively accommodate recommendation of reliable others when purchasing goods. Thus, trust relationship can be regarded as the informative source for identifying user's preference with accuracy. Under this background, we propose a new hybrid recommender system that fuses CF and social network analysis (SNA). The proposed system adopts the recommendation algorithm that additionally reflect the result analyzed by SNA. In detail, our proposed system is based on conventional memory-based CF, but it is designed to use both user's numeric ratings and trust relationship information between users when calculating user similarities. For this, our system creates and uses not only user-item rating matrix, but also user-to-user trust network. As the methods for calculating user similarity between users, we proposed two alternatives - one is algorithm calculating the degree of similarity between users by utilizing in-degree and out-degree centrality, which are the indices representing the central location in the social network. We named these approaches as 'Trust CF - All' and 'Trust CF - Conditional'. The other alternative is the algorithm reflecting a neighbor's score higher when a target user trusts the neighbor directly or indirectly. The direct or indirect trust relationship can be identified by searching trust network of users. In this study, we call this approach 'Trust CF - Search'. To validate the applicability of the proposed system, we used experimental data provided by LibRec that crawled from the entire FilmTrust website. It consists of ratings of movies and trust relationship network indicating who to trust between users. The experimental system was implemented using Microsoft Visual Basic for Applications (VBA) and UCINET 6. To examine the effectiveness of the proposed system, we compared the performance of our proposed method with one of conventional CF system. The performances of recommender system were evaluated by using average MAE (mean absolute error). The analysis results confirmed that in case of applying without conditions the in-degree centrality index of trusted network of users(i.e. Trust CF - All), the accuracy (MAE = 0.565134) was lower than conventional CF (MAE = 0.564966). And, in case of applying the in-degree centrality index only to the users with the out-degree centrality above a certain threshold value(i.e. Trust CF - Conditional), the proposed system improved the accuracy a little (MAE = 0.564909) compared to traditional CF. However, the algorithm searching based on the trusted network of users (i.e. Trust CF - Search) was found to show the best performance (MAE = 0.564846). And the result from paired samples t-test presented that Trust CF - Search outperformed conventional CF with 10% statistical significance level. Our study sheds a light on the application of user's trust relationship network information for facilitating electronic commerce by recommending proper items to users.

Registration and Description of Public Records in Korea : A Comparative Analysis of Korean Recordskeeping System with the International Standards (한국의 기록물 둥록 및 기술에 대한 기록관리적 접근)

  • Si, Kwi-Sun
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.3 no.1
    • /
    • pp.69-92
    • /
    • 2003
  • Registration and description of records are important elements of processing which provide with the background information of production of records and business-related information. They also enable to search and use the records. In this paper, I examined the Korean registration and description system defined in the Public Records Management Act which directs the records creating agency to register records in creating offices and directs the "professional archives" to make "basic registrations" and "detailed registrations" of the records. In the analysis and comparison of two different registration and description systems with the known international standards of records and archives management, such as ISO15489 and ISAD(G), I intended to evaluate the Korean records and archives management system and suggested recommendations for the renovation of the Korean recordskeeping system. Despite we have unique office business procedures and the culture of officialdom, and despite we have developed our system based on the established business procedures and office culture, it would be preferable to adopt or follow the international standards and established best practices. After the comparative analysis, I recommended some innovations in the filed of registration and description. For instance, in the basic registration. we would better to install an item of "simple contents summary." We may also need the multiple-level description. The fonds level description and the series level description should be introduced to our archival automated management system. We need to establish a Korean standard of description adopting the rules of the ISAD(G) and ISAAR(CPF). Essential requirements for electronic records management, such as contextual and structural information, should be incorporated in the new standard. Documentation of records disposition also should be reinforced to guarantee the authenticity of records and to ensure control of the records. To implement the recommendations for the standard, we need to amend the Public Records Management Act and its Regulations and Rules. Also it is imperative to redesign the GARS integrated archival automated management system.

A Study on the Performance Evaluation of G2B Procurement Process Innovation by Using MAS: Korea G2B KONEPS Case (멀티에이전트시스템(MAS)을 이용한 G2B 조달 프로세스 혁신의 효과평가에 관한 연구 : 나라장터 G2B사례)

  • Seo, Won-Jun;Lee, Dae-Cheor;Lim, Gyoo-Gun
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.157-175
    • /
    • 2012
  • It is difficult to evaluate the performance of process innovation of e-procurement which has large scale and complex processes. The existing evaluation methods for measuring the effects of process innovation have been mainly done with statistically quantitative methods by analyzing operational data or with qualitative methods by conducting surveys and interviews. However, these methods have some limitations to evaluate the effects because the performance evaluation of e-procurement process innovation should consider the interactions among participants who are active either directly or indirectly through the processes. This study considers the e-procurement process as a complex system and develops a simulation model based on MAS(Multi-Agent System) to evaluate the effects of e-procurement process innovation. Multi-agent based simulation allows observing interaction patterns of objects in virtual world through relationship among objects and their behavioral mechanism. Agent-based simulation is suitable especially for complex business problems. In this study, we used Netlogo Version 4.1.3 as a MAS simulation tool which was developed in Northwestern University. To do this, we developed a interaction model of agents in MAS environment. We defined process agents and task agents, and assigned their behavioral characteristics. The developed simulation model was applied to G2B system (KONEPS: Korea ON-line E-Procurement System) of Public Procurement Service (PPS) in Korea and used to evaluate the innovation effects of the G2B system. KONEPS is a successfully established e-procurement system started in the year 2002. KONEPS is a representative e-Procurement system which integrates characteristics of e-commerce into government for business procurement activities. KONEPS deserves the international recognition considering the annual transaction volume of 56 billion dollars, daily exchanges of electronic documents, users consisted of 121,000 suppliers and 37,000 public organizations, and the 4.5 billion dollars of cost saving. For the simulation, we analyzed the e-procurement of process of KONEPS into eight sub processes such as 'process 1: search products and acquisition of proposal', 'process 2 : review the methods of contracts and item features', 'process 3 : a notice of bid', 'process 4 : registration and confirmation of qualification', 'process 5 : bidding', 'process 6 : a screening test', 'process 7 : contracts', and 'process 8 : invoice and payment'. For the parameter settings of the agents behavior, we collected some data from the transactional database of PPS and some information by conducting a survey. The used data for the simulation are 'participants (government organizations, local government organizations and public institutions)', 'the number of bidding per year', 'the number of total contracts', 'the number of shopping mall transactions', 'the rate of contracts between bidding and shopping mall', 'the successful bidding ratio', and the estimated time for each process. The comparison was done for the difference of time consumption between 'before the innovation (As-was)' and 'after the innovation (As-is).' The results showed that there were productivity improvements in every eight sub processes. The decrease ratio of 'average number of task processing' was 92.7% and the decrease ratio of 'average time of task processing' was 95.4% in entire processes when we use G2B system comparing to the conventional method. Also, this study found that the process innovation effect will be enhanced if the task process related to the 'contract' can be improved. This study shows the usability and possibility of using MAS in process innovation evaluation and its modeling.

Text Mining and Association Rules Analysis to a Self-Introduction Letter of Freshman at Korea National College of Agricultural and Fisheries (1) (한국농수산대학 신입생 자기소개서의 텍스트 마이닝과 연관규칙 분석 (1))

  • Joo, J.S.;Lee, S.Y.;Kim, J.S.;Shin, Y.K.;Park, N.B.
    • Journal of Practical Agriculture & Fisheries Research
    • /
    • v.22 no.1
    • /
    • pp.113-129
    • /
    • 2020
  • In this study we examined the topic analysis and correlation analysis by text mining to extract meaningful information or rules from the self introduction letter of freshman at Korea National College of Agriculture and Fisheries in 2020. The analysis items are described in items related to 'academic' and 'in-school activities' during high school. In the text mining results, the keywords of 'academic' items were 'study', 'thought', 'effort', 'problem', 'friend', and the key words of 'in-school activities' were 'activity', 'thought', 'friend', 'club', 'school' in order. As a result of the correlation analysis, the key words of 'thinking', 'studying', 'effort', and 'time' played a central role in the 'academic' item. And the key words of 'in-school activities' were 'thought', 'activity', 'school', 'time', and 'friend'. The results of frequency analysis and association analysis were visualized with word cloud and correlation graphs to make it easier to understand all the results. In the next study, TF-IDF(Term Frequency-Inverse Document Frequency) analysis using 'frequency of keywords' and 'reverse of document frequency' will be performed as a method of extracting key words from a large amount of documents.

Resolving the 'Gray sheep' Problem Using Social Network Analysis (SNA) in Collaborative Filtering (CF) Recommender Systems (소셜 네트워크 분석 기법을 활용한 협업필터링의 특이취향 사용자(Gray Sheep) 문제 해결)

  • Kim, Minsung;Im, Il
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.137-148
    • /
    • 2014
  • Recommender system has become one of the most important technologies in e-commerce in these days. The ultimate reason to shop online, for many consumers, is to reduce the efforts for information search and purchase. Recommender system is a key technology to serve these needs. Many of the past studies about recommender systems have been devoted to developing and improving recommendation algorithms and collaborative filtering (CF) is known to be the most successful one. Despite its success, however, CF has several shortcomings such as cold-start, sparsity, gray sheep problems. In order to be able to generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users. For new users who do not have any evaluations or preference information, therefore, CF cannot come up with recommendations (Cold-star problem). As the numbers of products and customers increase, the scale of the data increases exponentially and most of the data cells are empty. This sparse dataset makes computation for recommendation extremely hard (Sparsity problem). Since CF is based on the assumption that there are groups of users sharing common preferences or tastes, CF becomes inaccurate if there are many users with rare and unique tastes (Gray sheep problem). This study proposes a new algorithm that utilizes Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We utilize 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality in SNA refers to the number of direct links to and from a node. In a network of users who are connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and they are isolated from other users. Therefore, gray sheep can be identified by calculating degree centrality of each node. We divide the dataset into two, gray sheep and others, based on the degree centrality of the users. Then, different similarity measures and recommendation methods are applied to these two datasets. More detail algorithm is as follows: Step 1: Convert the initial data which is a two-mode network (user to item) into an one-mode network (user to user). Step 2: Calculate degree centrality of each node and separate those nodes having degree centrality values lower than the pre-set threshold. The threshold value is determined by simulations such that the accuracy of CF for the remaining dataset is maximized. Step 3: Ordinary CF algorithm is applied to the remaining dataset. Step 4: Since the separated dataset consist of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them. A 'popular item' method is used to generate recommendations for these users. The F measures of the two datasets are weighted by the numbers of nodes and summed to be used as the final performance metric. In order to test performance improvement by this new algorithm, an empirical study was conducted using a publically available dataset - the MovieLens data by GroupLens research team. We used 100,000 evaluations by 943 users on 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm utilizing 'Best-N-neighbors' and 'Cosine' similarity method. The empirical results show that F measure was improved about 11% on average when the proposed algorithm was used

    . Past studies to improve CF performance typically used additional information other than users' evaluations such as demographic data. Some studies applied SNA techniques as a new similarity metric. This study is novel in that it used SNA to separate dataset. This study shows that performance of CF can be improved, without any additional information, when SNA techniques are used as proposed. This study has several theoretical and practical implications. This study empirically shows that the characteristics of dataset can affect the performance of CF recommender systems. This helps researchers understand factors affecting performance of CF. This study also opens a door for future studies in the area of applying SNA to CF to analyze characteristics of dataset. In practice, this study provides guidelines to improve performance of CF recommender systems with a simple modification.

  • A Regression-Model-based Method for Combining Interestingness Measures of Association Rule Mining (연관상품 추천을 위한 회귀분석모형 기반 연관 규칙 척도 결합기법)

    • Lee, Dongwon
      • Journal of Intelligence and Information Systems
      • /
      • v.23 no.1
      • /
      • pp.127-141
      • /
      • 2017
    • Advances in Internet technologies and the proliferation of mobile devices enabled consumers to approach a wide range of goods and services, while causing an adverse effect that they have hard time reaching their congenial items even if they devote much time to searching for them. Accordingly, businesses are using the recommender systems to provide tools for consumers to find the desired items more easily. Association Rule Mining (ARM) technology is advantageous to recommender systems in that ARM provides intuitive form of a rule with interestingness measures (support, confidence, and lift) describing the relationship between items. Given an item, its relevant items can be distinguished with the help of the measures that show the strength of relationship between items. Based on the strength, the most pertinent items can be chosen among other items and exposed to a given item's web page. However, the diversity of the measures may confuse which items are more recommendable. Given two rules, for example, one rule's support and confidence may not be concurrently superior to the other rule's. Such discrepancy of the measures in distinguishing one rule's superiority from other rules may cause difficulty in selecting proper items for recommendation. In addition, in an online environment where a web page or mobile screen can provide a limited number of recommendations that attract consumer interest, the prudent selection of items to be included in the list of recommendations is very important. The exposure of items of little interest may lead consumers to ignore the recommendations. Then, such consumers will possibly not pay attention to other forms of marketing activities. Therefore, the measures should be aligned with the probability of consumer's acceptance of recommendations. For this reason, this study proposes a model-based approach to combine those measures into one unified measure that can consistently determine the ranking of recommended items. A regression model was designed to describe how well the measures (independent variables; i.e., support, confidence, and lift) explain consumer's acceptance of recommendations (dependent variables, hit rate of recommended items). The model is intuitive to understand and easy to use in that the equation consists of the commonly used measures for ARM and can be used in the estimation of hit rates. The experiment using transaction data from one of the Korea's largest online shopping malls was conducted to show that the proposed model can improve the hit rates of recommendations. From the top of the list to 13th place, recommended items in the higher rakings from the proposed model show the higher hit rates than those from the competitive model's. The result shows that the proposed model's performance is superior to the competitive model's in online recommendation environment. In a web page, consumers are provided around ten recommendations with which the proposed model outperforms. Moreover, a mobile device cannot expose many items simultaneously due to its limited screen size. Therefore, the result shows that the newly devised recommendation technique is suitable for the mobile recommender systems. While this study has been conducted to cover the cross-selling in online shopping malls that handle merchandise, the proposed method can be expected to be applied in various situations under which association rules apply. For example, this model can be applied to medical diagnostic systems that predict candidate diseases from a patient's symptoms. To increase the efficiency of the model, additional variables will need to be considered for the elaboration of the model in future studies. For example, price can be a good candidate for an explanatory variable because it has a major impact on consumer purchase decisions. If the prices of recommended items are much higher than the items in which a consumer is interested, the consumer may hesitate to accept the recommendations.

    Recommender Systems using Structural Hole and Collaborative Filtering (구조적 공백과 협업필터링을 이용한 추천시스템)

    • Kim, Mingun;Kim, Kyoung-Jae
      • Journal of Intelligence and Information Systems
      • /
      • v.20 no.4
      • /
      • pp.107-120
      • /
      • 2014
    • This study proposes a novel recommender system using the structural hole analysis to reflect qualitative and emotional information in recommendation process. Although collaborative filtering (CF) is known as the most popular recommendation algorithm, it has some limitations including scalability and sparsity problems. The scalability problem arises when the volume of users and items become quite large. It means that CF cannot scale up due to large computation time for finding neighbors from the user-item matrix as the number of users and items increases in real-world e-commerce sites. Sparsity is a common problem of most recommender systems due to the fact that users generally evaluate only a small portion of the whole items. In addition, the cold-start problem is the special case of the sparsity problem when users or items newly added to the system with no ratings at all. When the user's preference evaluation data is sparse, two users or items are unlikely to have common ratings, and finally, CF will predict ratings using a very limited number of similar users. Moreover, it may produces biased recommendations because similarity weights may be estimated using only a small portion of rating data. In this study, we suggest a novel limitation of the conventional CF. The limitation is that CF does not consider qualitative and emotional information about users in the recommendation process because it only utilizes user's preference scores of the user-item matrix. To address this novel limitation, this study proposes cluster-indexing CF model with the structural hole analysis for recommendations. In general, the structural hole means a location which connects two separate actors without any redundant connections in the network. The actor who occupies the structural hole can easily access to non-redundant, various and fresh information. Therefore, the actor who occupies the structural hole may be a important person in the focal network and he or she may be the representative person in the focal subgroup in the network. Thus, his or her characteristics may represent the general characteristics of the users in the focal subgroup. In this sense, we can distinguish friends and strangers of the focal user utilizing the structural hole analysis. This study uses the structural hole analysis to select structural holes in subgroups as an initial seeds for a cluster analysis. First, we gather data about users' preference ratings for items and their social network information. For gathering research data, we develop a data collection system. Then, we perform structural hole analysis and find structural holes of social network. Next, we use these structural holes as cluster centroids for the clustering algorithm. Finally, this study makes recommendations using CF within user's cluster, and compare the recommendation performances of comparative models. For implementing experiments of the proposed model, we composite the experimental results from two experiments. The first experiment is the structural hole analysis. For the first one, this study employs a software package for the analysis of social network data - UCINET version 6. The second one is for performing modified clustering, and CF using the result of the cluster analysis. We develop an experimental system using VBA (Visual Basic for Application) of Microsoft Excel 2007 for the second one. This study designs to analyzing clustering based on a novel similarity measure - Pearson correlation between user preference rating vectors for the modified clustering experiment. In addition, this study uses 'all-but-one' approach for the CF experiment. In order to validate the effectiveness of our proposed model, we apply three comparative types of CF models to the same dataset. The experimental results show that the proposed model outperforms the other comparative models. In especial, the proposed model significantly performs better than two comparative modes with the cluster analysis from the statistical significance test. However, the difference between the proposed model and the naive model does not have statistical significance.

    A Study on the Competencies of Automotive Professional Engineers in Korea (자동차 신제품개발 관련 차량기술사의 전문적 업무역량 분석)

    • Kim, Joo-Young;Lim, Se-Yung
      • 대한공업교육학회지
      • /
      • v.33 no.2
      • /
      • pp.192-217
      • /
      • 2008
    • This paper investigated the perceived criticalities and patterns of Korean Professional Engineer's competency regarding the working activities of automative product development, manufacturing, etc by using questionnaires responded to the survey which were applied to the automotive professors, experts and professional engineers (vocational parties) by e/mail, etc. This research investigated the following questions: First, what are the characteristic patterns, relevancy and perceived criticalities of Korean Professional Engineer's competencies? Second, What are the ranked priority of the Korean Professional Engineers' competencies? Are there any differency for each item, sub group of job, intelectual criterior of the competencies between relevancy and perceived criticalities according to the types of vocational parties, etc.? Accoring to the results; first, Professor group showed highest points among 3 groups per each item of the competencies by vocational parties Second, Chassis design group ranked top position among the 8 sub groups by vocational parties and, third, Problem Solving Knowledge ranked highest points than any others. Korean Professional Engineers are found to be positioned as key members, leaders and managers on surveying market, product planning, designing product & components, developing component parts, establishing shop with production equipment, managing quality control & material handling, organizing relevant meetings, developing human resources by training and learning, to back up finance with law matters, cooperating with concerned parties to achieve organizational goals, and to coordinate projects. etc, identifying ethical issues and business skills in order to survive and win to be competitive in various kinds of the automotive industry battle fields.

    Development of the forecasting model for import volume by item of major countries based on economic, industrial structural and cultural factors: Focusing on the cultural factors of Korea (경제적, 산업구조적, 문화적 요인을 기반으로 한 주요 국가의 한국 품목별 수입액 예측 모형 개발: 한국의, 한국에 대한 문화적 요인을 중심으로)

    • Jun, Seung-pyo;Seo, Bong-Goon;Park, Do-Hyung
      • Journal of Intelligence and Information Systems
      • /
      • v.27 no.4
      • /
      • pp.23-48
      • /
      • 2021
    • The Korean economy has achieved continuous economic growth for the past several decades thanks to the government's export strategy policy. This increase in exports is playing a leading role in driving Korea's economic growth by improving economic efficiency, creating jobs, and promoting technology development. Traditionally, the main factors affecting Korea's exports can be found from two perspectives: economic factors and industrial structural factors. First, economic factors are related to exchange rates and global economic fluctuations. The impact of the exchange rate on Korea's exports depends on the exchange rate level and exchange rate volatility. Global economic fluctuations affect global import demand, which is an absolute factor influencing Korea's exports. Second, industrial structural factors are unique characteristics that occur depending on industries or products, such as slow international division of labor, increased domestic substitution of certain imported goods by China, and changes in overseas production patterns of major export industries. Looking at the most recent studies related to global exchanges, several literatures show the importance of cultural aspects as well as economic and industrial structural factors. Therefore, this study attempted to develop a forecasting model by considering cultural factors along with economic and industrial structural factors in calculating the import volume of each country from Korea. In particular, this study approaches the influence of cultural factors on imports of Korean products from the perspective of PUSH-PULL framework. The PUSH dimension is a perspective that Korea develops and actively promotes its own brand and can be defined as the degree of interest in each country for Korean brands represented by K-POP, K-FOOD, and K-CULTURE. In addition, the PULL dimension is a perspective centered on the cultural and psychological characteristics of the people of each country. This can be defined as how much they are inclined to accept Korean Flow as each country's cultural code represented by the country's governance system, masculinity, risk avoidance, and short-term/long-term orientation. The unique feature of this study is that the proposed final prediction model can be selected based on Design Principles. The design principles we presented are as follows. 1) A model was developed to reflect interest in Korea and cultural characteristics through newly added data sources. 2) It was designed in a practical and convenient way so that the forecast value can be immediately recalled by inputting changes in economic factors, item code and country code. 3) In order to derive theoretically meaningful results, an algorithm was selected that can interpret the relationship between the input and the target variable. This study can suggest meaningful implications from the technical, economic and policy aspects, and is expected to make a meaningful contribution to the export support strategies of small and medium-sized enterprises by using the import forecasting model.

    Product Evaluation Criteria Extraction through Online Review Analysis: Using LDA and k-Nearest Neighbor Approach (온라인 리뷰 분석을 통한 상품 평가 기준 추출: LDA 및 k-최근접 이웃 접근법을 활용하여)

    • Lee, Ji Hyeon;Jung, Sang Hyung;Kim, Jun Ho;Min, Eun Joo;Yeo, Un Yeong;Kim, Jong Woo
      • Journal of Intelligence and Information Systems
      • /
      • v.26 no.1
      • /
      • pp.97-117
      • /
      • 2020
    • Product evaluation criteria is an indicator describing attributes or values of products, which enable users or manufacturers measure and understand the products. When companies analyze their products or compare them with competitors, appropriate criteria must be selected for objective evaluation. The criteria should show the features of products that consumers considered when they purchased, used and evaluated the products. However, current evaluation criteria do not reflect different consumers' opinion from product to product. Previous studies tried to used online reviews from e-commerce sites that reflect consumer opinions to extract the features and topics of products and use them as evaluation criteria. However, there is still a limit that they produce irrelevant criteria to products due to extracted or improper words are not refined. To overcome this limitation, this research suggests LDA-k-NN model which extracts possible criteria words from online reviews by using LDA and refines them with k-nearest neighbor. Proposed approach starts with preparation phase, which is constructed with 6 steps. At first, it collects review data from e-commerce websites. Most e-commerce websites classify their selling items by high-level, middle-level, and low-level categories. Review data for preparation phase are gathered from each middle-level category and collapsed later, which is to present single high-level category. Next, nouns, adjectives, adverbs, and verbs are extracted from reviews by getting part of speech information using morpheme analysis module. After preprocessing, words per each topic from review are shown with LDA and only nouns in topic words are chosen as potential words for criteria. Then, words are tagged based on possibility of criteria for each middle-level category. Next, every tagged word is vectorized by pre-trained word embedding model. Finally, k-nearest neighbor case-based approach is used to classify each word with tags. After setting up preparation phase, criteria extraction phase is conducted with low-level categories. This phase starts with crawling reviews in the corresponding low-level category. Same preprocessing as preparation phase is conducted using morpheme analysis module and LDA. Possible criteria words are extracted by getting nouns from the data and vectorized by pre-trained word embedding model. Finally, evaluation criteria are extracted by refining possible criteria words using k-nearest neighbor approach and reference proportion of each word in the words set. To evaluate the performance of the proposed model, an experiment was conducted with review on '11st', one of the biggest e-commerce companies in Korea. Review data were from 'Electronics/Digital' section, one of high-level categories in 11st. For performance evaluation of suggested model, three other models were used for comparing with the suggested model; actual criteria of 11st, a model that extracts nouns by morpheme analysis module and refines them according to word frequency, and a model that extracts nouns from LDA topics and refines them by word frequency. The performance evaluation was set to predict evaluation criteria of 10 low-level categories with the suggested model and 3 models above. Criteria words extracted from each model were combined into a single words set and it was used for survey questionnaires. In the survey, respondents chose every item they consider as appropriate criteria for each category. Each model got its score when chosen words were extracted from that model. The suggested model had higher scores than other models in 8 out of 10 low-level categories. By conducting paired t-tests on scores of each model, we confirmed that the suggested model shows better performance in 26 tests out of 30. In addition, the suggested model was the best model in terms of accuracy. This research proposes evaluation criteria extracting method that combines topic extraction using LDA and refinement with k-nearest neighbor approach. This method overcomes the limits of previous dictionary-based models and frequency-based refinement models. This study can contribute to improve review analysis for deriving business insights in e-commerce market.


    (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.