Multi-Dimensional Analysis Method of Product Reviews for Market Insight (마켓 인사이트를 위한 상품 리뷰의 다차원 분석 방안)
-
- Journal of Intelligence and Information Systems
- /
- v.26 no.2
- /
- pp.57-78
- /
- 2020
With the development of the Internet, consumers have had an opportunity to check product information easily through E-Commerce. Product reviews used in the process of purchasing goods are based on user experience, allowing consumers to engage as producers of information as well as refer to information. This can be a way to increase the efficiency of purchasing decisions from the perspective of consumers, and from the seller's point of view, it can help develop products and strengthen their competitiveness. However, it takes a lot of time and effort to understand the overall assessment and assessment dimensions of the products that I think are important in reading the vast amount of product reviews offered by E-Commerce for the products consumers want to compare. This is because product reviews are unstructured information and it is difficult to read sentiment of reviews and assessment dimension immediately. For example, consumers who want to purchase a laptop would like to check the assessment of comparative products at each dimension, such as performance, weight, delivery, speed, and design. Therefore, in this paper, we would like to propose a method to automatically generate multi-dimensional product assessment scores in product reviews that we would like to compare. The methods presented in this study consist largely of two phases. One is the pre-preparation phase and the second is the individual product scoring phase. In the pre-preparation phase, a dimensioned classification model and a sentiment analysis model are created based on a review of the large category product group review. By combining word embedding and association analysis, the dimensioned classification model complements the limitation that word embedding methods for finding relevance between dimensions and words in existing studies see only the distance of words in sentences. Sentiment analysis models generate CNN models by organizing learning data tagged with positives and negatives on a phrase unit for accurate polarity detection. Through this, the individual product scoring phase applies the models pre-prepared for the phrase unit review. Multi-dimensional assessment scores can be obtained by aggregating them by assessment dimension according to the proportion of reviews organized like this, which are grouped among those that are judged to describe a specific dimension for each phrase. In the experiment of this paper, approximately 260,000 reviews of the large category product group are collected to form a dimensioned classification model and a sentiment analysis model. In addition, reviews of the laptops of S and L companies selling at E-Commerce are collected and used as experimental data, respectively. The dimensioned classification model classified individual product reviews broken down into phrases into six assessment dimensions and combined the existing word embedding method with an association analysis indicating frequency between words and dimensions. As a result of combining word embedding and association analysis, the accuracy of the model increased by 13.7%. The sentiment analysis models could be seen to closely analyze the assessment when they were taught in a phrase unit rather than in sentences. As a result, it was confirmed that the accuracy was 29.4% higher than the sentence-based model. Through this study, both sellers and consumers can expect efficient decision making in purchasing and product development, given that they can make multi-dimensional comparisons of products. In addition, text reviews, which are unstructured data, were transformed into objective values such as frequency and morpheme, and they were analysed together using word embedding and association analysis to improve the objectivity aspects of more precise multi-dimensional analysis and research. This will be an attractive analysis model in terms of not only enabling more effective service deployment during the evolving E-Commerce market and fierce competition, but also satisfying both customers.
This study was designed to examine the performance of an aspirated radiation shield(ARS), which was made at the investigator's lab and characterized by relatively easier making and lower costs based on survey data and reports on errors in its measurements of temperature and relative humidity. The findings were summarized as follows: the ARS and the Jinju weather station made measurements and recorded the range of maximum, average, and minimum temperature at
Online consumers browse products belonging to a particular product line or brand for purchase, or simply leave a wide range of navigation without making purchase. The research on the behavior and purchase of online consumers has been steadily progressed, and related services and applications based on behavior data of consumers have been developed in practice. In recent years, customization strategies and recommendation systems of consumers have been utilized due to the development of big data technology, and attempts are being made to optimize users' shopping experience. However, even in such an attempt, it is very unlikely that online consumers will actually be able to visit the website and switch to the purchase stage. This is because online consumers do not just visit the website to purchase products but use and browse the websites differently according to their shopping motives and purposes. Therefore, it is important to analyze various types of visits as well as visits to purchase, which is important for understanding the behaviors of online consumers. In this study, we explored the clustering analysis of session based on click stream data of e-commerce company in order to explain diversity and complexity of search behavior of online consumers and typified search behavior. For the analysis, we converted data points of more than 8 million pages units into visit units' sessions, resulting in a total of over 500,000 website visit sessions. For each visit session, 12 characteristics such as page view, duration, search diversity, and page type concentration were extracted for clustering analysis. Considering the size of the data set, we performed the analysis using the Mini-Batch K-means algorithm, which has advantages in terms of learning speed and efficiency while maintaining the clustering performance similar to that of the clustering algorithm K-means. The most optimized number of clusters was derived from four, and the differences in session unit characteristics and purchasing rates were identified for each cluster. The online consumer visits the website several times and learns about the product and decides the purchase. In order to analyze the purchasing process over several visits of the online consumer, we constructed the visiting sequence data of the consumer based on the navigation patterns in the web site derived clustering analysis. The visit sequence data includes a series of visiting sequences until one purchase is made, and the items constituting one sequence become cluster labels derived from the foregoing. We have separately established a sequence data for consumers who have made purchases and data on visits for consumers who have only explored products without making purchases during the same period of time. And then sequential pattern mining was applied to extract frequent patterns from each sequence data. The minimum support is set to 10%, and frequent patterns consist of a sequence of cluster labels. While there are common derived patterns in both sequence data, there are also frequent patterns derived only from one side of sequence data. We found that the consumers who made purchases through the comparative analysis of the extracted frequent patterns showed the visiting pattern to decide to purchase the product repeatedly while searching for the specific product. The implication of this study is that we analyze the search type of online consumers by using large - scale click stream data and analyze the patterns of them to explain the behavior of purchasing process with data-driven point. Most studies that typology of online consumers have focused on the characteristics of the type and what factors are key in distinguishing that type. In this study, we carried out an analysis to type the behavior of online consumers, and further analyzed what order the types could be organized into one another and become a series of search patterns. In addition, online retailers will be able to try to improve their purchasing conversion through marketing strategies and recommendations for various types of visit and will be able to evaluate the effect of the strategy through changes in consumers' visit patterns.
Up to this day, mobile communications have evolved rapidly over the decades, mainly focusing on speed-up to meet the growing data demands of 2G to 5G. And with the start of the 5G era, efforts are being made to provide such various services to customers, as IoT, V2X, robots, artificial intelligence, augmented virtual reality, and smart cities, which are expected to change the environment of our lives and industries as a whole. In a bid to provide those services, on top of high speed data, reduced latency and reliability are critical for real-time services. Thus, 5G has paved the way for service delivery through maximum speed of 20Gbps, a delay of 1ms, and a connecting device of 106/㎢ In particular, in intelligent traffic control systems and services using various vehicle-based Vehicle to X (V2X), such as traffic control, in addition to high-speed data speed, reduction of delay and reliability for real-time services are very important. 5G communication uses high frequencies of 3.5Ghz and 28Ghz. These high-frequency waves can go with high-speed thanks to their straightness while their short wavelength and small diffraction angle limit their reach to distance and prevent them from penetrating walls, causing restrictions on their use indoors. Therefore, under existing networks it's difficult to overcome these constraints. The underlying centralized SDN also has a limited capability in offering delay-sensitive services because communication with many nodes creates overload in its processing. Basically, SDN, which means a structure that separates signals from the control plane from packets in the data plane, requires control of the delay-related tree structure available in the event of an emergency during autonomous driving. In these scenarios, the network architecture that handles in-vehicle information is a major variable of delay. Since SDNs in general centralized structures are difficult to meet the desired delay level, studies on the optimal size of SDNs for information processing should be conducted. Thus, SDNs need to be separated on a certain scale and construct a new type of network, which can efficiently respond to dynamically changing traffic and provide high-quality, flexible services. Moreover, the structure of these networks is closely related to ultra-low latency, high confidence, and hyper-connectivity and should be based on a new form of split SDN rather than an existing centralized SDN structure, even in the case of the worst condition. And in these SDN structural networks, where automobiles pass through small 5G cells very quickly, the information change cycle, round trip delay (RTD), and the data processing time of SDN are highly correlated with the delay. Of these, RDT is not a significant factor because it has sufficient speed and less than 1 ms of delay, but the information change cycle and data processing time of SDN are factors that greatly affect the delay. Especially, in an emergency of self-driving environment linked to an ITS(Intelligent Traffic System) that requires low latency and high reliability, information should be transmitted and processed very quickly. That is a case in point where delay plays a very sensitive role. In this paper, we study the SDN architecture in emergencies during autonomous driving and conduct analysis through simulation of the correlation with the cell layer in which the vehicle should request relevant information according to the information flow. For simulation: As the Data Rate of 5G is high enough, we can assume the information for neighbor vehicle support to the car without errors. Furthermore, we assumed 5G small cells within 50 ~ 250 m in cell radius, and the maximum speed of the vehicle was considered as a 30km ~ 200 km/hour in order to examine the network architecture to minimize the delay.
From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (Coronavirus-2, a fatal respiratory syndrome) have been published. The rapid increase in the number of papers related to COVID-19 is putting time and technical constraints on healthcare professionals and policy makers to quickly find important research. Therefore, in this study, we propose a method of extracting useful information from text data of extensive literature using LDA and Word2vec algorithm. Papers related to keywords to be searched were extracted from papers related to COVID-19, and detailed topics were identified. The data used the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House to respond to the COVID-19 pandemic, updated weekly. The research methods are divided into two main categories. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers including full text. For this purpose, the number of publications related to COVID-19 by year was analyzed through exploratory data analysis using a Python program, and the top 10 journals under active research were identified. LDA and Word2vec algorithm were used to derive research topics related to COVID-19, and after analyzing related words, similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, and a total of 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment' were extracted. did For each collected paper, detailed topics were analyzed using LDA and Word2vec algorithms, and a clustering method through PCA dimension reduction was applied to visualize groups of papers with similar themes using the t-SNE algorithm. A noteworthy point from the results of this study is that the topics that were not derived from the topics derived for all papers being researched in relation to COVID-19 (