• Title/Summary/Keyword: 연관마이닝

Search Result 486, Processing Time 0.027 seconds

Development of Yóukè Mining System with Yóukè's Travel Demand and Insight Based on Web Search Traffic Information (웹검색 트래픽 정보를 활용한 유커 인바운드 여행 수요 예측 모형 및 유커마이닝 시스템 개발)

  • Choi, Youji;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.155-175
    • /
    • 2017
  • As social data become into the spotlight, mainstream web search engines provide data indicate how many people searched specific keyword: Web Search Traffic data. Web search traffic information is collection of each crowd that search for specific keyword. In a various area, web search traffic can be used as one of useful variables that represent the attention of common users on specific interests. A lot of studies uses web search traffic data to nowcast or forecast social phenomenon such as epidemic prediction, consumer pattern analysis, product life cycle, financial invest modeling and so on. Also web search traffic data have begun to be applied to predict tourist inbound. Proper demand prediction is needed because tourism is high value-added industry as increasing employment and foreign exchange. Among those tourists, especially Chinese tourists: Youke is continuously growing nowadays, Youke has been largest tourist inbound of Korea tourism for many years and tourism profits per one Youke as well. It is important that research into proper demand prediction approaches of Youke in both public and private sector. Accurate tourism demands prediction is important to efficient decision making in a limited resource. This study suggests improved model that reflects latest issue of society by presented the attention from group of individual. Trip abroad is generally high-involvement activity so that potential tourists likely deep into searching for information about their own trip. Web search traffic data presents tourists' attention in the process of preparation their journey instantaneous and dynamic way. So that this study attempted select key words that potential Chinese tourists likely searched out internet. Baidu-Chinese biggest web search engine that share over 80%- provides users with accessing to web search traffic data. Qualitative interview with potential tourists helps us to understand the information search behavior before a trip and identify the keywords for this study. Selected key words of web search traffic are categorized by how much directly related to "Korean Tourism" in a three levels. Classifying categories helps to find out which keyword can explain Youke inbound demands from close one to far one as distance of category. Web search traffic data of each key words gathered by web crawler developed to crawling web search data onto Baidu Index. Using automatically gathered variable data, linear model is designed by multiple regression analysis for suitable for operational application of decision and policy making because of easiness to explanation about variables' effective relationship. After regression linear models have composed, comparing with model composed traditional variables and model additional input web search traffic data variables to traditional model has conducted by significance and R squared. after comparing performance of models, final model is composed. Final regression model has improved explanation and advantage of real-time immediacy and convenience than traditional model. Furthermore, this study demonstrates system intuitively visualized to general use -Youke Mining solution has several functions of tourist decision making including embed final regression model. Youke Mining solution has algorithm based on data science and well-designed simple interface. In the end this research suggests three significant meanings on theoretical, practical and political aspects. Theoretically, Youke Mining system and the model in this research are the first step on the Youke inbound prediction using interactive and instant variable: web search traffic information represents tourists' attention while prepare their trip. Baidu web search traffic data has more than 80% of web search engine market. Practically, Baidu data could represent attention of the potential tourists who prepare their own tour as real-time. Finally, in political way, designed Chinese tourist demands prediction model based on web search traffic can be used to tourism decision making for efficient managing of resource and optimizing opportunity for successful policy.

A Study on Analyzing Sentiments on Movie Reviews by Multi-Level Sentiment Classifier (영화 리뷰 감성분석을 위한 텍스트 마이닝 기반 감성 분류기 구축)

  • Kim, Yuyoung;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.71-89
    • /
    • 2016
  • Sentiment analysis is used for identifying emotions or sentiments embedded in the user generated data such as customer reviews from blogs, social network services, and so on. Various research fields such as computer science and business management can take advantage of this feature to analyze customer-generated opinions. In previous studies, the star rating of a review is regarded as the same as sentiment embedded in the text. However, it does not always correspond to the sentiment polarity. Due to this supposition, previous studies have some limitations in their accuracy. To solve this issue, the present study uses a supervised sentiment classification model to measure a more accurate sentiment polarity. This study aims to propose an advanced sentiment classifier and to discover the correlation between movie reviews and box-office success. The advanced sentiment classifier is based on two supervised machine learning techniques, the Support Vector Machines (SVM) and Feedforward Neural Network (FNN). The sentiment scores of the movie reviews are measured by the sentiment classifier and are analyzed by statistical correlations between movie reviews and box-office success. Movie reviews are collected along with a star-rate. The dataset used in this study consists of 1,258,538 reviews from 175 films gathered from Naver Movie website (movie.naver.com). The results show that the proposed sentiment classifier outperforms Naive Bayes (NB) classifier as its accuracy is about 6% higher than NB. Furthermore, the results indicate that there are positive correlations between the star-rate and the number of audiences, which can be regarded as the box-office success of a movie. The study also shows that there is the mild, positive correlation between the sentiment scores estimated by the classifier and the number of audiences. To verify the applicability of the sentiment scores, an independent sample t-test was conducted. For this, the movies were divided into two groups using the average of sentiment scores. The two groups are significantly different in terms of the star-rated scores.

UX Methodology Study by Data Analysis Focusing on deriving persona through customer segment classification (데이터 분석을 통한 UX 방법론 연구 고객 세그먼트 분류를 통한 페르소나 도출을 중심으로)

  • Lee, Seul-Yi;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.151-176
    • /
    • 2021
  • As the information technology industry develops, various kinds of data are being created, and it is now essential to process them and use them in the industry. Analyzing and utilizing various digital data collected online and offline is a necessary process to provide an appropriate experience for customers in the industry. In order to create new businesses, products, and services, it is essential to use customer data collected in various ways to deeply understand potential customers' needs and analyze behavior patterns to capture hidden signals of desire. However, it is true that research using data analysis and UX methodology, which should be conducted in parallel for effective service development, is being conducted separately and that there is a lack of examples of use in the industry. In thiswork, we construct a single process by applying data analysis methods and UX methodologies. This study is important in that it is highly likely to be used because it applies methodologies that are actively used in practice. We conducted a survey on the topic to identify and cluster the associations between factors to establish customer classification and target customers. The research methods are as follows. First, we first conduct a factor, regression analysis to determine the association between factors in the happiness data survey. Groups are grouped according to the survey results and identify the relationship between 34 questions of psychological stability, family life, relational satisfaction, health, economic satisfaction, work satisfaction, daily life satisfaction, and residential environment satisfaction. Second, we classify clusters based on factors affecting happiness and extract the optimal number of clusters. Based on the results, we cross-analyzed the characteristics of each cluster. Third, forservice definition, analysis was conducted by correlating with keywords related to happiness. We leverage keyword analysis of the thumb trend to derive ideas based on the interest and associations of the keyword. We also collected approximately 11,000 news articles based on the top three keywords that are highly related to happiness, then derived issues between keywords through text mining analysis in SAS, and utilized them in defining services after ideas were conceived. Fourth, based on the characteristics identified through data analysis, we selected segmentation and targetingappropriate for service discovery. To this end, the characteristics of the factors were grouped and selected into four groups, and the profile was drawn up and the main target customers were selected. Fifth, based on the characteristics of the main target customers, interviewers were selected and the In-depthinterviews were conducted to discover the causes of happiness, causes of unhappiness, and needs for services. Sixth, we derive customer behavior patterns based on segment results and detailed interviews, and specify the objectives associated with the characteristics. Seventh, a typical persona using qualitative surveys and a persona using data were produced to analyze each characteristic and pros and cons by comparing the two personas. Existing market segmentation classifies customers based on purchasing factors, and UX methodology measures users' behavior variables to establish criteria and redefine users' classification. Utilizing these segment classification methods, applying the process of producinguser classification and persona in UX methodology will be able to utilize them as more accurate customer classification schemes. The significance of this study is summarized in two ways: First, the idea of using data to create a variety of services was linked to the UX methodology used to plan IT services by applying it in the hot topic era. Second, we further enhance user classification by applying segment analysis methods that are not currently used well in UX methodologies. To provide a consistent experience in creating a single service, from large to small, it is necessary to define customers with common goals. To this end, it is necessary to derive persona and persuade various stakeholders. Under these circumstances, designing a consistent experience from beginning to end, through fast and concrete user descriptions, would be a very effective way to produce a successful service.

Multi-Dimensional Analysis Method of Product Reviews for Market Insight (마켓 인사이트를 위한 상품 리뷰의 다차원 분석 방안)

  • Park, Jeong Hyun;Lee, Seo Ho;Lim, Gyu Jin;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.57-78
    • /
    • 2020
  • With the development of the Internet, consumers have had an opportunity to check product information easily through E-Commerce. Product reviews used in the process of purchasing goods are based on user experience, allowing consumers to engage as producers of information as well as refer to information. This can be a way to increase the efficiency of purchasing decisions from the perspective of consumers, and from the seller's point of view, it can help develop products and strengthen their competitiveness. However, it takes a lot of time and effort to understand the overall assessment and assessment dimensions of the products that I think are important in reading the vast amount of product reviews offered by E-Commerce for the products consumers want to compare. This is because product reviews are unstructured information and it is difficult to read sentiment of reviews and assessment dimension immediately. For example, consumers who want to purchase a laptop would like to check the assessment of comparative products at each dimension, such as performance, weight, delivery, speed, and design. Therefore, in this paper, we would like to propose a method to automatically generate multi-dimensional product assessment scores in product reviews that we would like to compare. The methods presented in this study consist largely of two phases. One is the pre-preparation phase and the second is the individual product scoring phase. In the pre-preparation phase, a dimensioned classification model and a sentiment analysis model are created based on a review of the large category product group review. By combining word embedding and association analysis, the dimensioned classification model complements the limitation that word embedding methods for finding relevance between dimensions and words in existing studies see only the distance of words in sentences. Sentiment analysis models generate CNN models by organizing learning data tagged with positives and negatives on a phrase unit for accurate polarity detection. Through this, the individual product scoring phase applies the models pre-prepared for the phrase unit review. Multi-dimensional assessment scores can be obtained by aggregating them by assessment dimension according to the proportion of reviews organized like this, which are grouped among those that are judged to describe a specific dimension for each phrase. In the experiment of this paper, approximately 260,000 reviews of the large category product group are collected to form a dimensioned classification model and a sentiment analysis model. In addition, reviews of the laptops of S and L companies selling at E-Commerce are collected and used as experimental data, respectively. The dimensioned classification model classified individual product reviews broken down into phrases into six assessment dimensions and combined the existing word embedding method with an association analysis indicating frequency between words and dimensions. As a result of combining word embedding and association analysis, the accuracy of the model increased by 13.7%. The sentiment analysis models could be seen to closely analyze the assessment when they were taught in a phrase unit rather than in sentences. As a result, it was confirmed that the accuracy was 29.4% higher than the sentence-based model. Through this study, both sellers and consumers can expect efficient decision making in purchasing and product development, given that they can make multi-dimensional comparisons of products. In addition, text reviews, which are unstructured data, were transformed into objective values such as frequency and morpheme, and they were analysed together using word embedding and association analysis to improve the objectivity aspects of more precise multi-dimensional analysis and research. This will be an attractive analysis model in terms of not only enabling more effective service deployment during the evolving E-Commerce market and fierce competition, but also satisfying both customers.

Index-based Searching on Timestamped Event Sequences (타임스탬프를 갖는 이벤트 시퀀스의 인덱스 기반 검색)

  • 박상현;원정임;윤지희;김상욱
    • Journal of KIISE:Databases
    • /
    • v.31 no.5
    • /
    • pp.468-478
    • /
    • 2004
  • It is essential in various application areas of data mining and bioinformatics to effectively retrieve the occurrences of interesting patterns from sequence databases. For example, let's consider a network event management system that records the types and timestamp values of events occurred in a specific network component(ex. router). The typical query to find out the temporal casual relationships among the network events is as fellows: 'Find all occurrences of CiscoDCDLinkUp that are fellowed by MLMStatusUP that are subsequently followed by TCPConnectionClose, under the constraint that the interval between the first two events is not larger than 20 seconds, and the interval between the first and third events is not larger than 40 secondsTCPConnectionClose. This paper proposes an indexing method that enables to efficiently answer such a query. Unlike the previous methods that rely on inefficient sequential scan methods or data structures not easily supported by DBMSs, the proposed method uses a multi-dimensional spatial index, which is proven to be efficient both in storage and search, to find the answers quickly without false dismissals. Given a sliding window W, the input to a multi-dimensional spatial index is a n-dimensional vector whose i-th element is the interval between the first event of W and the first occurrence of the event type Ei in W. Here, n is the number of event types that can be occurred in the system of interest. The problem of‘dimensionality curse’may happen when n is large. Therefore, we use the dimension selection or event type grouping to avoid this problem. The experimental results reveal that our proposed technique can be a few orders of magnitude faster than the sequential scan and ISO-Depth index methods.hods.

International Research Trend on Mountainous Sediment-related Disasters Induced by Earthquakes (지진 유발 산지토사재해 관련 국외 연구동향 분석)

  • Lee, Sang-In;Seo, Jung-Il;Kim, Jin-Hak;Ryu, Dong-Seop;Seo, Jun-Pyo;Kim, Dong-Yeob;Lee, Chang-Woo
    • Journal of Korean Society of Forest Science
    • /
    • v.106 no.4
    • /
    • pp.431-440
    • /
    • 2017
  • The 2016 Gyeongju Earthquake ($M_L$ 5.8) (occurred on September 12, 2016) and the 2017 Pohang Earthquake ($M_L$ 5.4) (occurred on November 15, 2017) caused unprecedented damages in South Korea. It is necessary to establish basic data related to earthquake-induced mountainous sediment-related disasters over worldwide. In this study, we analyzed previous international studies on the earthquake-induced mountainous sediment-related disasters, then classified research areas according to research themes using text-mining and co-word analysis in VOSviewer program, and finally examined spatio-temporal research trends by research area. The result showed that the related-researches have been rapidly increased since 2005, which seems to be affected by recent large-scale earthquakes occurred in China, Taiwan and Japan. In addition, the research area related to mountainous sediment-related disasters induced by earthquakes was classified into four subjects: (i) mechanisms of disaster occurrence; (ii) rainfall parameters controlling disaster occurrence; (iii) prediction of potential disaster area using aerial and satellite photographs; and (iv) disaster risk mapping through the modeling of disaster occurrence. These research areas are considered to have a strong correlation with each other. On the threshold year (i.e., 2012-2013), when cumulative number of research papers was reached 50% of total research papers published since 1987, proportions per unit year of all research areas should increase. Especially, the proportion of the research areas related to prediction of potential disaster area using aerial and satellite photographs is highly increased compared to other three research areas. These trends are responsible for the rapidly increasing research papers with study sites in China, and the research papers examined in Taiwan, Japan, and the United States have also contributed to increases in all research areas. The results are could be used as basic data to present future research direction related to mountainous sediment-related disasters induced by earthquakes in South Korea.