• Title/Summary/Keyword: Marketing Intelligence

Search Result 213, Processing Time 0.027 seconds

A Study on the Application of Outlier Analysis for Fraud Detection: Focused on Transactions of Auction Exception Agricultural Products (부정 탐지를 위한 이상치 분석 활용방안 연구 : 농수산 상장예외품목 거래를 대상으로)

  • Kim, Dongsung;Kim, Kitae;Kim, Jongwoo;Park, Steve
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.3
    • /
    • pp.93-108
    • /
    • 2014
  • To support business decision making, interests and efforts to analyze and use transaction data in different perspectives are increasing. Such efforts are not only limited to customer management or marketing, but also used for monitoring and detecting fraud transactions. Fraud transactions are evolving into various patterns by taking advantage of information technology. To reflect the evolution of fraud transactions, there are many efforts on fraud detection methods and advanced application systems in order to improve the accuracy and ease of fraud detection. As a case of fraud detection, this study aims to provide effective fraud detection methods for auction exception agricultural products in the largest Korean agricultural wholesale market. Auction exception products policy exists to complement auction-based trades in agricultural wholesale market. That is, most trades on agricultural products are performed by auction; however, specific products are assigned as auction exception products when total volumes of products are relatively small, the number of wholesalers is small, or there are difficulties for wholesalers to purchase the products. However, auction exception products policy makes several problems on fairness and transparency of transaction, which requires help of fraud detection. In this study, to generate fraud detection rules, real huge agricultural products trade transaction data from 2008 to 2010 in the market are analyzed, which increase more than 1 million transactions and 1 billion US dollar in transaction volume. Agricultural transaction data has unique characteristics such as frequent changes in supply volumes and turbulent time-dependent changes in price. Since this was the first trial to identify fraud transactions in this domain, there was no training data set for supervised learning. So, fraud detection rules are generated using outlier detection approach. We assume that outlier transactions have more possibility of fraud transactions than normal transactions. The outlier transactions are identified to compare daily average unit price, weekly average unit price, and quarterly average unit price of product items. Also quarterly averages unit price of product items of the specific wholesalers are used to identify outlier transactions. The reliability of generated fraud detection rules are confirmed by domain experts. To determine whether a transaction is fraudulent or not, normal distribution and normalized Z-value concept are applied. That is, a unit price of a transaction is transformed to Z-value to calculate the occurrence probability when we approximate the distribution of unit prices to normal distribution. The modified Z-value of the unit price in the transaction is used rather than using the original Z-value of it. The reason is that in the case of auction exception agricultural products, Z-values are influenced by outlier fraud transactions themselves because the number of wholesalers is small. The modified Z-values are called Self-Eliminated Z-scores because they are calculated excluding the unit price of the specific transaction which is subject to check whether it is fraud transaction or not. To show the usefulness of the proposed approach, a prototype of fraud transaction detection system is developed using Delphi. The system consists of five main menus and related submenus. First functionalities of the system is to import transaction databases. Next important functions are to set up fraud detection parameters. By changing fraud detection parameters, system users can control the number of potential fraud transactions. Execution functions provide fraud detection results which are found based on fraud detection parameters. The potential fraud transactions can be viewed on screen or exported as files. The study is an initial trial to identify fraud transactions in Auction Exception Agricultural Products. There are still many remained research topics of the issue. First, the scope of analysis data was limited due to the availability of data. It is necessary to include more data on transactions, wholesalers, and producers to detect fraud transactions more accurately. Next, we need to extend the scope of fraud transaction detection to fishery products. Also there are many possibilities to apply different data mining techniques for fraud detection. For example, time series approach is a potential technique to apply the problem. Even though outlier transactions are detected based on unit prices of transactions, however it is possible to derive fraud detection rules based on transaction volumes.

Intelligent Brand Positioning Visualization System Based on Web Search Traffic Information : Focusing on Tablet PC (웹검색 트래픽 정보를 활용한 지능형 브랜드 포지셔닝 시스템 : 태블릿 PC 사례를 중심으로)

  • Jun, Seung-Pyo;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.93-111
    • /
    • 2013
  • As Internet and information technology (IT) continues to develop and evolve, the issue of big data has emerged at the foreground of scholarly and industrial attention. Big data is generally defined as data that exceed the range that can be collected, stored, managed and analyzed by existing conventional information systems and it also refers to the new technologies designed to effectively extract values from such data. With the widespread dissemination of IT systems, continual efforts have been made in various fields of industry such as R&D, manufacturing, and finance to collect and analyze immense quantities of data in order to extract meaningful information and to use this information to solve various problems. Since IT has converged with various industries in many aspects, digital data are now being generated at a remarkably accelerating rate while developments in state-of-the-art technology have led to continual enhancements in system performance. The types of big data that are currently receiving the most attention include information available within companies, such as information on consumer characteristics, information on purchase records, logistics information and log information indicating the usage of products and services by consumers, as well as information accumulated outside companies, such as information on the web search traffic of online users, social network information, and patent information. Among these various types of big data, web searches performed by online users constitute one of the most effective and important sources of information for marketing purposes because consumers search for information on the internet in order to make efficient and rational choices. Recently, Google has provided public access to its information on the web search traffic of online users through a service named Google Trends. Research that uses this web search traffic information to analyze the information search behavior of online users is now receiving much attention in academia and in fields of industry. Studies using web search traffic information can be broadly classified into two fields. The first field consists of empirical demonstrations that show how web search information can be used to forecast social phenomena, the purchasing power of consumers, the outcomes of political elections, etc. The other field focuses on using web search traffic information to observe consumer behavior, identifying the attributes of a product that consumers regard as important or tracking changes on consumers' expectations, for example, but relatively less research has been completed in this field. In particular, to the extent of our knowledge, hardly any studies related to brands have yet attempted to use web search traffic information to analyze the factors that influence consumers' purchasing activities. This study aims to demonstrate that consumers' web search traffic information can be used to derive the relations among brands and the relations between an individual brand and product attributes. When consumers input their search words on the web, they may use a single keyword for the search, but they also often input multiple keywords to seek related information (this is referred to as simultaneous searching). A consumer performs a simultaneous search either to simultaneously compare two product brands to obtain information on their similarities and differences, or to acquire more in-depth information about a specific attribute in a specific brand. Web search traffic information shows that the quantity of simultaneous searches using certain keywords increases when the relation is closer in the consumer's mind and it will be possible to derive the relations between each of the keywords by collecting this relational data and subjecting it to network analysis. Accordingly, this study proposes a method of analyzing how brands are positioned by consumers and what relationships exist between product attributes and an individual brand, using simultaneous search traffic information. It also presents case studies demonstrating the actual application of this method, with a focus on tablets, belonging to innovative product groups.

Analysis of Football Fans' Uniform Consumption: Before and After Son Heung-Min's Transfer to Tottenham Hotspur FC (국내 프로축구 팬들의 유니폼 소비 분석: 손흥민의 토트넘 홋스퍼 FC 이적 전후 비교)

  • Choi, Yeong-Hyeon;Lee, Kyu-Hye
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.3
    • /
    • pp.91-108
    • /
    • 2020
  • Korea's famous soccer players are steadily performing well in international leagues, which led to higher interests of Korean fans in the international leagues. Reflecting the growing social phenomenon of rising interests on international leagues by Korean fans, the study examined the overall consumer perception in the consumption of uniform by domestic soccer fans and compared the changes in perception following the transfers of the players. Among others, the paper examined the consumer perception and purchase factors of soccer fans shown in social media, focusing on periods before and after the recruitment of Heung-Min Son to English Premier League's Tottenham Football Club. To this end, the EPL uniform is the collection keyword the paper utilized and collected consumer postings from domestic website and social media via Python 3.7, and analyzed them using Ucinet 6, NodeXL 1.0.1, and SPSS 25.0 programs. The results of this study can be summarized as follows. First, the uniform of the club that consistently topped the league, has been gaining attention as a popular uniform, and the players' performance, and the players' position have been identified as key factors in the purchase and search of professional football uniforms. In the case of the club, the actual ranking and whether the league won are shown to be important factors in the purchase and search of professional soccer uniforms. The club's emblem and the sponsor logo that will be attached to the uniform are also factors of interest to consumers. In addition, in the decision making process of purchase of a uniform by professional soccer fan, uniform's form, marking, authenticity, and sponsors are found to be more important than price, design, size, and logo. The official online store has emerged as a major purchasing channel, followed by gifts for friends or requests from acquaintances when someone travels to the United Kingdom. Second, a classification of key control categories through the convergence of iteration correlation analysis and Clauset-Newman-Moore clustering algorithm shows differences in the classification of individual groups, but groups that include the EPL's club and player keywords are identified as the key topics in relation to professional football uniforms. Third, between 2002 and 2006, the central theme for professional football uniforms was World Cup and English Premier League, but from 2012 to 2015, the focus has shifted to more interest of domestic and international players in the English Premier League. The subject has changed to the uniform itself from this time on. In this context, the paper can confirm that the major issues regarding the uniforms of professional soccer players have changed since Ji-Sung Park's transfer to Manchester United, and Sung-Yong Ki, Chung-Yong Lee, and Heung-Min Son's good performances in these leagues. The paper also identified that the uniforms of the clubs to which the players have transferred to are of interest. Fourth, both male and female consumers are showing increasing interest in Son's league, the English Premier League, which Tottenham FC belongs to. In particular, the increasing interest in Son has shown a tendency to increase interest in football uniforms for female consumers. This study presents a variety of researches on sports consumption and has value as a consumer study by identifying unique consumption patterns. It is meaningful in that the accuracy of the interpretation has been enhanced by using a cluster analysis via convergence of iteration correlation analysis and Clauset-Newman-Moore clustering algorithm to identify the main topics. Based on the results of this study, the clubs will be able to maximize its profits and maintain good relationships with fans by identifying key drivers of consumer awareness and purchasing for professional soccer fans and establishing an effective marketing strategy.

An Ontology Model for Public Service Export Platform (공공 서비스 수출 플랫폼을 위한 온톨로지 모형)

  • Lee, Gang-Won;Park, Sei-Kwon;Ryu, Seung-Wan;Shin, Dong-Cheon
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.149-161
    • /
    • 2014
  • The export of domestic public services to overseas markets contains many potential obstacles, stemming from different export procedures, the target services, and socio-economic environments. In order to alleviate these problems, the business incubation platform as an open business ecosystem can be a powerful instrument to support the decisions taken by participants and stakeholders. In this paper, we propose an ontology model and its implementation processes for the business incubation platform with an open and pervasive architecture to support public service exports. For the conceptual model of platform ontology, export case studies are used for requirements analysis. The conceptual model shows the basic structure, with vocabulary and its meaning, the relationship between ontologies, and key attributes. For the implementation and test of the ontology model, the logical structure is edited using Prot$\acute{e}$g$\acute{e}$ editor. The core engine of the business incubation platform is the simulator module, where the various contexts of export businesses should be captured, defined, and shared with other modules through ontologies. It is well-known that an ontology, with which concepts and their relationships are represented using a shared vocabulary, is an efficient and effective tool for organizing meta-information to develop structural frameworks in a particular domain. The proposed model consists of five ontologies derived from a requirements survey of major stakeholders and their operational scenarios: service, requirements, environment, enterprise, and county. The service ontology contains several components that can find and categorize public services through a case analysis of the public service export. Key attributes of the service ontology are composed of categories including objective, requirements, activity, and service. The objective category, which has sub-attributes including operational body (organization) and user, acts as a reference to search and classify public services. The requirements category relates to the functional needs at a particular phase of system (service) design or operation. Sub-attributes of requirements are user, application, platform, architecture, and social overhead. The activity category represents business processes during the operation and maintenance phase. The activity category also has sub-attributes including facility, software, and project unit. The service category, with sub-attributes such as target, time, and place, acts as a reference to sort and classify the public services. The requirements ontology is derived from the basic and common components of public services and target countries. The key attributes of the requirements ontology are business, technology, and constraints. Business requirements represent the needs of processes and activities for public service export; technology represents the technological requirements for the operation of public services; and constraints represent the business law, regulations, or cultural characteristics of the target country. The environment ontology is derived from case studies of target countries for public service operation. Key attributes of the environment ontology are user, requirements, and activity. A user includes stakeholders in public services, from citizens to operators and managers; the requirements attribute represents the managerial and physical needs during operation; the activity attribute represents business processes in detail. The enterprise ontology is introduced from a previous study, and its attributes are activity, organization, strategy, marketing, and time. The country ontology is derived from the demographic and geopolitical analysis of the target country, and its key attributes are economy, social infrastructure, law, regulation, customs, population, location, and development strategies. The priority list for target services for a certain country and/or the priority list for target countries for a certain public services are generated by a matching algorithm. These lists are used as input seeds to simulate the consortium partners, and government's policies and programs. In the simulation, the environmental differences between Korea and the target country can be customized through a gap analysis and work-flow optimization process. When the process gap between Korea and the target country is too large for a single corporation to cover, a consortium is considered an alternative choice, and various alternatives are derived from the capability index of enterprises. For financial packages, a mix of various foreign aid funds can be simulated during this stage. It is expected that the proposed ontology model and the business incubation platform can be used by various participants in the public service export market. It could be especially beneficial to small and medium businesses that have relatively fewer resources and experience with public service export. We also expect that the open and pervasive service architecture in a digital business ecosystem will help stakeholders find new opportunities through information sharing and collaboration on business processes.

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Mode (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.141-154
    • /
    • 2019
  • Rapid growth of internet technology and social media is progressing. Data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish poor or high-quality content through text data of products, and it has proliferated during text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined data categories as positive and negative. This has been studied in various directions in terms of accuracy from simple rule-based to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active researches in natural language processing and is widely studied in text mining. When real online reviews aren't available for others, it's not only easy to openly collect information, but it also affects your business. In marketing, real-world information from customers is gathered on websites, not surveys. Depending on whether the website's posts are positive or negative, the customer response is reflected in the sales and tries to identify the information. However, many reviews on a website are not always good, and difficult to identify. The earlier studies in this research area used the reviews data of the Amazon.com shopping mal, but the research data used in the recent studies uses the data for stock market trends, blogs, news articles, weather forecasts, IMDB, and facebook etc. However, the lack of accuracy is recognized because sentiment calculations are changed according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify the polarity analysis of sentiment analysis into positive and negative categories and increase the prediction accuracy of the polarity analysis using the pretrained IMDB review data set. First, the text classification algorithm related to sentiment analysis adopts the popular machine learning algorithms such as NB (naive bayes), SVM (support vector machines), XGboost, RF (random forests), and Gradient Boost as comparative models. Second, deep learning has demonstrated discriminative features that can extract complex features of data. Representative algorithms are CNN (convolution neural networks), RNN (recurrent neural networks), LSTM (long-short term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but does not consider sequential data attributes. RNN can handle well in order because it takes into account the time information of the data, but there is a long-term dependency on memory. To solve the problem of long-term dependence, LSTM is used. For the comparison, CNN and LSTM were chosen as simple deep learning models. In addition to classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although there are many parameters for the algorithms, we examined the relationship between numerical value and precision to find the optimal combination. And, we tried to figure out how the models work well for sentiment analysis and how these models work. This study proposes integrated CNN and LSTM algorithms to extract the positive and negative features of text analysis. The reasons for mixing these two algorithms are as follows. CNN can extract features for the classification automatically by applying convolution layer and massively parallel processing. LSTM is not capable of highly parallel processing. Like faucets, the LSTM has input, output, and forget gates that can be moved and controlled at a desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can solve the CNN's long-term dependency problem. Furthermore, when LSTM is used in CNN's pooling layer, it has an end-to-end structure, so that spatial and temporal features can be designed simultaneously. In combination with CNN-LSTM, 90.33% accuracy was measured. This is slower than CNN, but faster than LSTM. The presented model was more accurate than other models. In addition, each word embedding layer can be improved when training the kernel step by step. CNN-LSTM can improve the weakness of each model, and there is an advantage of improving the learning by layer using the end-to-end structure of LSTM. Based on these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.

Individual Thinking Style leads its Emotional Perception: Development of Web-style Design Evaluation Model and Recommendation Algorithm Depending on Consumer Regulatory Focus (사고가 시각을 바꾼다: 조절 초점에 따른 소비자 감성 기반 웹 스타일 평가 모형 및 추천 알고리즘 개발)

  • Kim, Keon-Woo;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.171-196
    • /
    • 2018
  • With the development of the web, two-way communication and evaluation became possible and marketing paradigms shifted. In order to meet the needs of consumers, web design trends are continuously responding to consumer feedback. As the web becomes more and more important, both academics and businesses are studying consumer emotions and satisfaction on the web. However, some consumer characteristics are not well considered. Demographic characteristics such as age and sex have been studied extensively, but few studies consider psychological characteristics such as regulatory focus (i.e., emotional regulation). In this study, we analyze the effect of web style on consumer emotion. Many studies analyze the relationship between the web and regulatory focus, but most concentrate on the purpose of web use, particularly motivation and information search, rather than on web style and design. The web communicates with users through visual elements. Because the human brain is influenced by all five senses, both design factors and emotional responses are important in the web environment. Therefore, in this study, we examine the relationship between consumer emotion and satisfaction and web style and design. Previous studies have considered the effects of web layout, structure, and color on emotions. In this study, however, we excluded these web components, in contrast to earlier studies, and analyzed the relationship between consumer satisfaction and emotional indexes of web-style only. To perform this analysis, we collected consumer surveys presenting 40 web style themes to 204 consumers. Each consumer evaluated four themes. The emotional adjectives evaluated by consumers were composed of 18 contrast pairs, and the upper emotional indexes were extracted through factor analysis. The emotional indexes were 'softness,' 'modernity,' 'clearness,' and 'jam.' Hypotheses were established based on the assumption that emotional indexes have different effects on consumer satisfaction. After the analysis, hypotheses 1, 2, and 3 were accepted and hypothesis 4 was rejected. While hypothesis 4 was rejected, its effect on consumer satisfaction was negative, not positive. This means that emotional indexes such as 'softness,' 'modernity,' and 'clearness' have a positive effect on consumer satisfaction. In other words, consumers prefer emotions that are soft, emotional, natural, rounded, dynamic, modern, elaborate, unique, bright, pure, and clear. 'Jam' has a negative effect on consumer satisfaction. It means, consumer prefer the emotion which is empty, plain, and simple. Regulatory focus shows differences in motivation and propensity in various domains. It is important to consider organizational behavior and decision making according to the regulatory focus tendency, and it affects not only political, cultural, ethical judgments and behavior but also broad psychological problems. Regulatory focus also differs from emotional response. Promotion focus responds more strongly to positive emotional responses. On the other hand, prevention focus has a strong response to negative emotions. Web style is a type of service, and consumer satisfaction is affected not only by cognitive evaluation but also by emotion. This emotional response depends on whether the consumer will benefit or harm himself. Therefore, it is necessary to confirm the difference of the consumer's emotional response according to the regulatory focus which is one of the characteristics and viewpoint of the consumers about the web style. After MMR analysis result, hypothesis 5.3 was accepted, and hypothesis 5.4 was rejected. But hypothesis 5.4 supported in the opposite direction to the hypothesis. After validation, we confirmed the mechanism of emotional response according to the tendency of regulatory focus. Using the results, we developed the structure of web-style recommendation system and recommend methods through regulatory focus. We classified the regulatory focus group in to three categories that promotion, grey, prevention. Then, we suggest web-style recommend method along the group. If we further develop this study, we expect that the existing regulatory focus theory can be extended not only to the motivational part but also to the emotional behavioral response according to the regulatory focus tendency. Moreover, we believe that it is possible to recommend web-style according to regulatory focus and emotional desire which consumers most prefer.

Word-of-Mouth Effect for Online Sales of K-Beauty Products: Centered on China SINA Weibo and Meipai (K-Beauty 구전효과가 온라인 매출액에 미치는 영향: 중국 SINA Weibo와 Meipai 중심으로)

  • Liu, Meina;Lim, Gyoo Gun
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.197-218
    • /
    • 2019
  • In addition to economic growth and national income increase, China is also experiencing rapid growth in consumption of cosmetics. About 67% of the total trade volume of Chinese cosmetics is made by e-commerce and especially K-Beauty products, which are Korean cosmetics are very popular. According to previous studies, 80% of consumer goods such as cosmetics are affected by the word of mouth information, searching the product information before purchase. Mostly, consumers acquire information related to cosmetics through comments made by other consumers on SNS such as SINA Weibo and Wechat, and recently they also use information about beauty related video channels. Most of the previous online word-of-mouth researches were mainly focused on media itself such as Facebook, Twitter, and blogs. However, the informational characteristics and the expression forms are also diverse. Typical types are text, picture, and video. This study focused on these types. We analyze the unstructured data of SINA Weibo, the SNS representative platform of China, and Meipai, the video platform, and analyze the impact of K-Beauty brand sales by dividing online word-of-mouth information with quantity and direction information. We analyzed about 330,000 data from Meipai, and 110,000 data from SINA Weibo and analyzed the basic properties of cosmetics. As a result of analysis, the amount of online word-of-mouth information has a positive effect on the sales of cosmetics irrespective of the type of media. However, the online videos showed higher impacts than the pictures and texts. Therefore, it is more effective for companies to carry out advertising and promotional activities in parallel with the existing SNS as well as video related information. It is understood that it is important to generate the frequency of exposure irrespective of media type. The positiveness of the video media was significant but the positiveness of the picture and text media was not significant. Due to the nature of information types, the amount of information in video media is more than that in text-oriented media, and video-related channels are emerging all over the world. In particular, China has made a number of video platforms in recent years and has enjoyed popularity among teenagers and thirties. As a result, existing SNS users are being dispersed to video media. We also analyzed the effect of online type of information on the online cosmetics sales by dividing the product type of cosmetics into basic cosmetics and color cosmetics. As a result, basic cosmetics had a positive effect on the sales according to the number of online videos and it was affected by the negative information of the videos. In the case of basic cosmetics, effects or characteristics do not appear immediately like color cosmetics, so information such as changes after use is often transmitted over a period of time. Therefore, it is important for companies to move more quickly to issues generated from video media. Color cosmetics are largely influenced by negative oral statements and sensitive to picture and text-oriented media. Information such as picture and text has the advantage and disadvantage that the process of making it can be made easier than video. Therefore, complaints and opinions are generally expressed in SNS quickly and immediately. Finally, we analyzed how product diversity affects sales according to online word of mouth information type. As a result of the analysis, it can be confirmed that when a variety of products are introduced in a video channel, they have a positive effect on online cosmetics sales. The significance of this study in the theoretical aspect is that, as in the previous studies, online sales have basically proved that K-Beauty cosmetics are also influenced by word-of-mouth. However this study focused on media types and both media have a positive impact on sales, as in previous studies, but it has been proven that video is more informative and influencing than text, depending on media abundance. In addition, according to the existing research on information direction, it is said that the negative influence has more influence, but in the basic study, the correlation is not significant, but the effect of negation in the case of color cosmetics is large. In the case of temporal fashion products such as color cosmetics, fast oral effect is influenced. In practical terms, it is expected that it will be helpful to use advertising strategies on the sales and advertising strategy of K-Beauty cosmetics in China by distinguishing basic and color cosmetics. In addition, it can be said that it recognized the importance of a video advertising strategy such as YouTube and one-person media. The results of this study can be used as basic data for analyzing the big data in understanding the Chinese cosmetics market and establishing appropriate strategies and marketing utilization of related companies.

Analysis of shopping website visit types and shopping pattern (쇼핑 웹사이트 탐색 유형과 방문 패턴 분석)

  • Choi, Kyungbin;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.85-107
    • /
    • 2019
  • Online consumers browse products belonging to a particular product line or brand for purchase, or simply leave a wide range of navigation without making purchase. The research on the behavior and purchase of online consumers has been steadily progressed, and related services and applications based on behavior data of consumers have been developed in practice. In recent years, customization strategies and recommendation systems of consumers have been utilized due to the development of big data technology, and attempts are being made to optimize users' shopping experience. However, even in such an attempt, it is very unlikely that online consumers will actually be able to visit the website and switch to the purchase stage. This is because online consumers do not just visit the website to purchase products but use and browse the websites differently according to their shopping motives and purposes. Therefore, it is important to analyze various types of visits as well as visits to purchase, which is important for understanding the behaviors of online consumers. In this study, we explored the clustering analysis of session based on click stream data of e-commerce company in order to explain diversity and complexity of search behavior of online consumers and typified search behavior. For the analysis, we converted data points of more than 8 million pages units into visit units' sessions, resulting in a total of over 500,000 website visit sessions. For each visit session, 12 characteristics such as page view, duration, search diversity, and page type concentration were extracted for clustering analysis. Considering the size of the data set, we performed the analysis using the Mini-Batch K-means algorithm, which has advantages in terms of learning speed and efficiency while maintaining the clustering performance similar to that of the clustering algorithm K-means. The most optimized number of clusters was derived from four, and the differences in session unit characteristics and purchasing rates were identified for each cluster. The online consumer visits the website several times and learns about the product and decides the purchase. In order to analyze the purchasing process over several visits of the online consumer, we constructed the visiting sequence data of the consumer based on the navigation patterns in the web site derived clustering analysis. The visit sequence data includes a series of visiting sequences until one purchase is made, and the items constituting one sequence become cluster labels derived from the foregoing. We have separately established a sequence data for consumers who have made purchases and data on visits for consumers who have only explored products without making purchases during the same period of time. And then sequential pattern mining was applied to extract frequent patterns from each sequence data. The minimum support is set to 10%, and frequent patterns consist of a sequence of cluster labels. While there are common derived patterns in both sequence data, there are also frequent patterns derived only from one side of sequence data. We found that the consumers who made purchases through the comparative analysis of the extracted frequent patterns showed the visiting pattern to decide to purchase the product repeatedly while searching for the specific product. The implication of this study is that we analyze the search type of online consumers by using large - scale click stream data and analyze the patterns of them to explain the behavior of purchasing process with data-driven point. Most studies that typology of online consumers have focused on the characteristics of the type and what factors are key in distinguishing that type. In this study, we carried out an analysis to type the behavior of online consumers, and further analyzed what order the types could be organized into one another and become a series of search patterns. In addition, online retailers will be able to try to improve their purchasing conversion through marketing strategies and recommendations for various types of visit and will be able to evaluate the effect of the strategy through changes in consumers' visit patterns.

Aspect-Based Sentiment Analysis Using BERT: Developing Aspect Category Sentiment Classification Models (BERT를 활용한 속성기반 감성분석: 속성카테고리 감성분류 모델 개발)

  • Park, Hyun-jung;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.1-25
    • /
    • 2020
  • Sentiment Analysis (SA) is a Natural Language Processing (NLP) task that analyzes the sentiments consumers or the public feel about an arbitrary object from written texts. Furthermore, Aspect-Based Sentiment Analysis (ABSA) is a fine-grained analysis of the sentiments towards each aspect of an object. Since having a more practical value in terms of business, ABSA is drawing attention from both academic and industrial organizations. When there is a review that says "The restaurant is expensive but the food is really fantastic", for example, the general SA evaluates the overall sentiment towards the 'restaurant' as 'positive', while ABSA identifies the restaurant's aspect 'price' as 'negative' and 'food' aspect as 'positive'. Thus, ABSA enables a more specific and effective marketing strategy. In order to perform ABSA, it is necessary to identify what are the aspect terms or aspect categories included in the text, and judge the sentiments towards them. Accordingly, there exist four main areas in ABSA; aspect term extraction, aspect category detection, Aspect Term Sentiment Classification (ATSC), and Aspect Category Sentiment Classification (ACSC). It is usually conducted by extracting aspect terms and then performing ATSC to analyze sentiments for the given aspect terms, or by extracting aspect categories and then performing ACSC to analyze sentiments for the given aspect category. Here, an aspect category is expressed in one or more aspect terms, or indirectly inferred by other words. In the preceding example sentence, 'price' and 'food' are both aspect categories, and the aspect category 'food' is expressed by the aspect term 'food' included in the review. If the review sentence includes 'pasta', 'steak', or 'grilled chicken special', these can all be aspect terms for the aspect category 'food'. As such, an aspect category referred to by one or more specific aspect terms is called an explicit aspect. On the other hand, the aspect category like 'price', which does not have any specific aspect terms but can be indirectly guessed with an emotional word 'expensive,' is called an implicit aspect. So far, the 'aspect category' has been used to avoid confusion about 'aspect term'. From now on, we will consider 'aspect category' and 'aspect' as the same concept and use the word 'aspect' more for convenience. And one thing to note is that ATSC analyzes the sentiment towards given aspect terms, so it deals only with explicit aspects, and ACSC treats not only explicit aspects but also implicit aspects. This study seeks to find answers to the following issues ignored in the previous studies when applying the BERT pre-trained language model to ACSC and derives superior ACSC models. First, is it more effective to reflect the output vector of tokens for aspect categories than to use only the final output vector of [CLS] token as a classification vector? Second, is there any performance difference between QA (Question Answering) and NLI (Natural Language Inference) types in the sentence-pair configuration of input data? Third, is there any performance difference according to the order of sentence including aspect category in the QA or NLI type sentence-pair configuration of input data? To achieve these research objectives, we implemented 12 ACSC models and conducted experiments on 4 English benchmark datasets. As a result, ACSC models that provide performance beyond the existing studies without expanding the training dataset were derived. In addition, it was found that it is more effective to reflect the output vector of the aspect category token than to use only the output vector for the [CLS] token as a classification vector. It was also found that QA type input generally provides better performance than NLI, and the order of the sentence with the aspect category in QA type is irrelevant with performance. There may be some differences depending on the characteristics of the dataset, but when using NLI type sentence-pair input, placing the sentence containing the aspect category second seems to provide better performance. The new methodology for designing the ACSC model used in this study could be similarly applied to other studies such as ATSC.

Target-Aspect-Sentiment Joint Detection with CNN Auxiliary Loss for Aspect-Based Sentiment Analysis (CNN 보조 손실을 이용한 차원 기반 감성 분석)

  • Jeon, Min Jin;Hwang, Ji Won;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.4
    • /
    • pp.1-22
    • /
    • 2021
  • Aspect Based Sentiment Analysis (ABSA), which analyzes sentiment based on aspects that appear in the text, is drawing attention because it can be used in various business industries. ABSA is a study that analyzes sentiment by aspects for multiple aspects that a text has. It is being studied in various forms depending on the purpose, such as analyzing all targets or just aspects and sentiments. Here, the aspect refers to the property of a target, and the target refers to the text that causes the sentiment. For example, for restaurant reviews, you could set the aspect into food taste, food price, quality of service, mood of the restaurant, etc. Also, if there is a review that says, "The pasta was delicious, but the salad was not," the words "steak" and "salad," which are directly mentioned in the sentence, become the "target." So far, in ABSA, most studies have analyzed sentiment only based on aspects or targets. However, even with the same aspects or targets, sentiment analysis may be inaccurate. Instances would be when aspects or sentiment are divided or when sentiment exists without a target. For example, sentences like, "Pizza and the salad were good, but the steak was disappointing." Although the aspect of this sentence is limited to "food," conflicting sentiments coexist. In addition, in the case of sentences such as "Shrimp was delicious, but the price was extravagant," although the target here is "shrimp," there are opposite sentiments coexisting that are dependent on the aspect. Finally, in sentences like "The food arrived too late and is cold now." there is no target (NULL), but it transmits a negative sentiment toward the aspect "service." Like this, failure to consider both aspects and targets - when sentiment or aspect is divided or when sentiment exists without a target - creates a dual dependency problem. To address this problem, this research analyzes sentiment by considering both aspects and targets (Target-Aspect-Sentiment Detection, hereby TASD). This study detected the limitations of existing research in the field of TASD: local contexts are not fully captured, and the number of epochs and batch size dramatically lowers the F1-score. The current model excels in spotting overall context and relations between each word. However, it struggles with phrases in the local context and is relatively slow when learning. Therefore, this study tries to improve the model's performance. To achieve the objective of this research, we additionally used auxiliary loss in aspect-sentiment classification by constructing CNN(Convolutional Neural Network) layers parallel to existing models. If existing models have analyzed aspect-sentiment through BERT encoding, Pooler, and Linear layers, this research added CNN layer-adaptive average pooling to existing models, and learning was progressed by adding additional loss values for aspect-sentiment to existing loss. In other words, when learning, the auxiliary loss, computed through CNN layers, allowed the local context to be captured more fitted. After learning, the model is designed to do aspect-sentiment analysis through the existing method. To evaluate the performance of this model, two datasets, SemEval-2015 task 12 and SemEval-2016 task 5, were used and the f1-score increased compared to the existing models. When the batch was 8 and epoch was 5, the difference was largest between the F1-score of existing models and this study with 29 and 45, respectively. Even when batch and epoch were adjusted, the F1-scores were higher than the existing models. It can be said that even when the batch and epoch numbers were small, they can be learned effectively compared to the existing models. Therefore, it can be useful in situations where resources are limited. Through this study, aspect-based sentiments can be more accurately analyzed. Through various uses in business, such as development or establishing marketing strategies, both consumers and sellers will be able to make efficient decisions. In addition, it is believed that the model can be fully learned and utilized by small businesses, those that do not have much data, given that they use a pre-training model and recorded a relatively high F1-score even with limited resources.