• Title/Summary/Keyword: Big Data Clustering

Search Results: 146

A Methodology of Customer Churn Prediction based on Two-Dimensional Loyalty Segmentation (이차원 고객충성도 세그먼트 기반의 고객이탈예측 방법론)

  • Kim, Hyung Su; Hong, Seung Woo
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.111-126 / 2020
  • Most industries have recently become aware of the importance of customer lifetime value as they are exposed to competitive environments. As a result, preventing customer churn has become a more important business issue than acquiring new customers, since retaining existing customers is far more economical: the acquisition cost of a new customer is known to be five to six times the cost of retaining an existing one. Companies that effectively prevent churn and improve retention rates are known not only to increase profitability but also to improve their brand image through higher customer satisfaction. Customer churn prediction, long conducted as a sub-area of CRM research, has recently gained importance as a big-data-based performance marketing theme thanks to advances in business machine learning technology. Until now, churn prediction research has been carried out actively in sectors such as mobile telecommunications, finance, distribution, and gaming, which are highly competitive and where churn management is urgent. These studies focused on improving the performance of the churn prediction model itself, for example by comparing the performance of various models, exploring features that are effective for forecasting churn, or developing new ensemble techniques, and they were limited in practical utility because most of them treated the entire customer base as a single group when developing the predictive model. In other words, the main purpose of existing research was to improve model performance, and relatively little work has sought to improve the overall churn prediction process. In practice, customers exhibit different behavioral characteristics due to heterogeneous transaction patterns, and their churn rates differ accordingly, so it is unreasonable to treat all customers as a single group. Effective churn prediction in heterogeneous industries therefore calls for segmenting customers according to strategic criteria such as loyalty and operating an appropriate churn prediction model for each segment. Some studies have indeed subdivided customers with clustering techniques and applied a churn prediction model to each group. Although this can produce better predictions than a single model for the entire customer population, there is still room for improvement, because clustering is a mechanical, exploratory grouping technique that computes distances over input variables and does not reflect the strategic intent of the firm, such as loyalty. Assuming that successful churn management is better achieved by improving the overall process than by tuning the model itself, this study proposes a segment-based churn prediction process based on two-dimensional customer loyalty (CCP/2DL: Customer Churn Prediction based on Two-Dimensional Loyalty segmentation).
CCP/2DL is a churn prediction process that segments customers along two loyalty dimensions, quantitative and qualitative, conducts a secondary grouping of the segments according to their churn patterns, and then independently applies a separate churn prediction model to each churn pattern group. To assess the relative merit of the proposed process, its performance was compared with the two most commonly applied alternatives: the general churn prediction process and the clustering-based churn prediction process. The general churn prediction process used in this study refers to applying a single machine learning model to the entire customer group, the most common approach to churn prediction. The clustering-based churn prediction process first segments customers with clustering techniques and then builds a churn prediction model for each resulting group. In an application carried out in cooperation with a global NGO, the proposed CCP/2DL outperformed the other methodologies in predicting churn. This churn prediction process is not only effective for predicting churn but can also serve as a strategic basis for obtaining a variety of customer insights and carrying out other related performance marketing activities.
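
The following is a minimal sketch of the segment-then-predict idea summarized above, using synthetic data; the column names, the k-means segmentation, and the gradient boosting model are illustrative assumptions, not the paper's actual CCP/2DL segmentation rules or models.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "loyalty_quantitative": rng.uniform(0, 1, n),  # e.g., purchase frequency/amount (hypothetical)
    "loyalty_qualitative": rng.uniform(0, 1, n),   # e.g., engagement/attitude score (hypothetical)
    "recency_days": rng.integers(1, 365, n),
})
# Synthetic churn label loosely tied to low loyalty and long recency.
df["churn"] = (rng.uniform(0, 1, n) <
               0.2 + 0.4 * (1 - df["loyalty_quantitative"]) + 0.001 * df["recency_days"]).astype(int)

# Step 1: segment customers on the two loyalty dimensions (approximated here with k-means).
df["segment"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
    df[["loyalty_quantitative", "loyalty_qualitative"]])

# Step 2: train an independent churn model per segment instead of one global model.
for seg, part in df.groupby("segment"):
    X = part[["loyalty_quantitative", "loyalty_qualitative", "recency_days"]]
    y = part["churn"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    model = GradientBoostingClassifier().fit(X_tr, y_tr)
    print(f"segment {seg}: holdout accuracy {model.score(X_te, y_te):.3f}")
```

The general churn prediction process described in the abstract would correspond to skipping Step 1 and fitting a single model on all rows, which makes for a straightforward baseline comparison.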

Trend Analysis of Corona Virus(COVID-19) based on Social Media (소셜미디어에 나타난 코로나 바이러스(COVID-19) 인식 분석)

  • Yoon, Sanghoo; Jung, Sangyun; Kim, Young A
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.5 / pp.317-324 / 2021
  • This study deals with keywords related to COVID-19 collected from social media on domestic portal sites during its wide spread. The data were collected between January 20 and August 15, 2020, and divided into three stages: the precursor period, before COVID-19 began spreading widely (January 20 to February 17); the serious period, covering the spread in Daegu (February 18 to April 20); and the stable period, during which the number of confirmed infections decreased (through August 15). The top 50 words were extracted and clustered based on TF-IDF. The analysis showed that keywords in the precursor period corresponded to the congestion of the situation, while frequent keywords in the serious period concerned the nation and infection routes, along with instability surrounding the treatment of COVID-19. The most common keywords across all periods were infection, mask, person, occurrence, confirmation, and information. People's emotions became more positive as time went by. Cafes and blogs share text containing writers' thoughts and subjectivity via the internet, so they are the main information-sharing spaces in the non-face-to-face era brought on by COVID-19. However, since selectivity and randomness exist in information delivery, a critical view of the information produced on social media is necessary.
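
A rough sketch of the TF-IDF keyword extraction and clustering step described above, on a toy corpus; the actual posts, Korean tokenization, and period splits used in the study are not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

posts = [
    "mask shortage at the pharmacy again",
    "new confirmed case and infection route announced today",
    "information about treatment progress and quarantine rules",
]

vec = TfidfVectorizer(max_features=50)            # keep only the top-weighted terms
X = vec.fit_transform(posts)                      # documents x terms matrix

# Rank keywords by summed TF-IDF weight across the period.
weights = np.asarray(X.sum(axis=0)).ravel()
top = sorted(zip(vec.get_feature_names_out(), weights), key=lambda t: -t[1])[:10]
print(top)

# Group the term vectors (terms as rows) into keyword clusters.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X.T.toarray())
print(dict(zip(vec.get_feature_names_out(), labels)))
```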

A Semantic Text Model with Wikipedia-based Concept Space (위키피디어 기반 개념 공간을 가지는 시멘틱 텍스트 모델)

  • Kim, Han-Joon; Chang, Jae-Young
    • The Journal of Society for e-Business Studies / v.19 no.3 / pp.107-123 / 2014
  • Current text mining techniques suffer from the problem that conventional text representation models cannot express the semantic or conceptual information of textual documents written in natural language. Conventional text models, which include the vector space model, the Boolean model, the statistical model, and the tensor space model, represent documents as bags of words: they express documents only with term literals for indexing and frequency-based weights for the corresponding terms, ignoring the semantic, sequential, and structural information of terms. Most text mining techniques have been developed on the assumption that documents are represented with such 'bag-of-words' models. Confronting the big data era, however, a new paradigm of text representation is required that can analyze huge amounts of textual documents more precisely. Our text model regards the 'concept' as an independent space on a par with the 'term' and 'document' spaces used in the vector space model, and it expresses the relatedness among the three spaces. To develop the concept space, we use Wikipedia data, in which each article defines a single concept. Consequently, a document collection is represented as a 3-order tensor carrying semantic information, and the proposed model is called the text cuboid model in our paper. Through experiments using the popular 20NewsGroup document corpus, we demonstrate the superiority of the proposed text model in terms of document clustering and concept clustering.
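
An illustrative sketch of the term-concept-document "text cuboid" idea, using a made-up term-document matrix and a made-up term-concept matrix standing in for the Wikipedia-derived concept associations; the construction below is an assumption about how the three spaces can be coupled, not the paper's exact formulation.

```python
import numpy as np

terms = ["bank", "loan", "river"]
docs = ["d1", "d2"]
concepts = ["Finance", "Geography"]          # each Wikipedia article = one concept

term_doc = np.array([[2.0, 1.0],             # frequency-based weights per document
                     [1.0, 0.0],
                     [0.0, 3.0]])

term_concept = np.array([[0.8, 0.2],         # relatedness of each term to each concept
                         [0.9, 0.1],
                         [0.1, 0.9]])

# 3-order tensor: entry [t, c, d] couples a term, a concept, and a document.
cuboid = term_doc[:, None, :] * term_concept[:, :, None]
print(cuboid.shape)                          # (3 terms, 2 concepts, 2 documents)

# Concept-space document vectors: sum over terms, giving concept weights per document.
doc_concept = cuboid.sum(axis=0)
print(dict(zip(concepts, doc_concept)))      # usable as features for document clustering
```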

Innovation of technology and social changes - quantitative analysis based on patent big data (기술의 진보와 혁신, 그리고 사회변화: 특허빅데이터를 이용한 정량적 분석)

  • Kim, Yongdai; Jong, Sang Jo; Jang, Woncheol; Lee, Jongsu
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1025-1039 / 2016
  • We introduce various methods to investigate the relationship between technological innovation and social change by analyzing more than 4 million patents registered at the United States Patent and Trademark Office (USPTO) from 1985 to 2015. First, we review the history of patent law and its relation to the quantitative changes in registered patents. Second, we investigate differences in technical innovation across several countries using cluster analysis based on the numbers of patents registered in various technical sectors. Third, we introduce the PageRank algorithm, which identifies important nodes in network-type data, and apply it to find important technical sectors based on citation information between registered patents. Finally, we explain how to use canonical correlation analysis to study the relationship between technical innovation and social change.
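
A hedged sketch of the PageRank step on a toy patent citation graph; the node names below are placeholders, not real USPTO patent numbers.

```python
import networkx as nx

G = nx.DiGraph()
# An edge A -> B means "patent A cites patent B", so B receives rank from A.
G.add_edges_from([
    ("P1", "P3"), ("P2", "P3"), ("P4", "P3"),
    ("P3", "P5"), ("P4", "P5"),
])

scores = nx.pagerank(G, alpha=0.85)
for patent, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(patent, round(score, 3))
```

The abstract applies this idea at the level of technical sectors rather than individual patents, but the citation-network construction is the same.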

Analysis of Relationship between Construction Accidents and Particulate Matter using Big Data

  • Lee, Minsu; Jeong, Jaewook; Jeong, Jaemin; Lee, Jaehyun
    • International conference on construction engineering and project management / 2022.06a / pp.128-135 / 2022
  • Because construction work is conducted outdoors, construction workers are exposed to harmful environmental factors. In particular, particulate matter (PM10), particles with a diameter of 10 ㎛ or less, is one such factor: when inhaled, PM10 can have a fatal impact on human health. In contrast to the many analyses of the health impact of PM10, research on the relationship between construction accidents and PM10 is scarce. Therefore, this study conducts a relative frequency analysis to identify the correlation between construction accidents and PM10, and proposes a modified PM10 grade to estimate accident probability attributable to PM10 in the construction industry. The study proceeds in four steps: i) establishment of the database; ii) classification of the data; iii) analysis of the relative frequency of accidents in the construction industry by PM10 concentration; iv) modification of the PM10 groups to classify the impact of PM10 on accidents. In the frequency analysis, the largest number of accidents occurred around the average PM10 concentration (32 ㎍/m3). However, the relative frequency of accidents increased as the PM10 concentration increased, which suggests that higher PM10 concentrations can cause more accidents during construction. In addition, while the WHO divides PM10 concentration into six groups, the modified PM10 grade based on the relative frequency of accidents is proposed with three groups.
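
A minimal sketch of the relative-frequency idea: accidents per observation day within each PM10 concentration bin, using synthetic data in place of the study's accident and air-quality databases; the three-group thresholds below are placeholders, not the grades proposed in the paper.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = pd.DataFrame({"pm10": rng.uniform(5, 150, 1000)})            # daily PM10, ㎍/m3
days["accidents"] = rng.poisson(0.5 + days["pm10"] / 100)           # synthetic accident counts

bins = [0, 30, 80, 150]                                             # illustrative 3-group split
days["pm10_group"] = pd.cut(days["pm10"], bins, labels=["low", "moderate", "high"])

# Relative frequency: mean accidents per observation day in each PM10 group.
print(days.groupby("pm10_group", observed=True)["accidents"].mean())
```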


Big Five Personality in Discriminating the Groups by the Level of Social Sims (심리학적 도구 '5요인 성격 특성'에 의한 소셜 게임 연구: <심즈 소셜> 게임의 분석사례를 중심으로)

  • Lee, Dong-Yeop
    • Cartoon and Animation Studies / s.29 / pp.129-149 / 2012
  • The purpose of this study was to investigate clustering and the Big Five Personality domains in discriminating groups by level of school-related adjustment, as experienced by users of The Sims Social. Social games are web-based games with simple rules, played against a fictional time and space background. This paper analyzes the relationships between social networks and user behaviors through the social game The Sims Social. In general, social games are simple, fun, easy to play, popular with the public, and based on real-world personal connections. These features distinguish them from single-player video games and from MMORPGs played with many unspecified players. Social games also show a noticeable characteristic related to social learning. The objective of this research is to show that the social perspective of a game can be strengthened in a social game environment, to analyze whether it actually influences the solving of real-life problems, and thereby to suggest a direction for games as an alternative means of play and positive simulation. Data were collected by administering four questionnaires (the short version of the BFI, Satisfaction with Life, Career Decision-Making Self-Efficacy, and Depression) to 20 participants in Seoul and Daejeon. For the data analysis, both stepwise discriminant analysis and cluster analysis were employed. Neuroticism, Openness, and Conscientiousness within the Big Five Personality domains were significant variables in discriminating the groups. These findings indicate that the short version of the BFI may be useful for understanding game user behaviors. In cultural research, digital games play a significant role: games, long regarded merely as leisure activities or commercial products, are now being actively researched for their methodological and social roles and functions. Among the several meanings of digital games, one of the most notable is their critical, social participatory function. According to James Paul Gee, the most important merit of games is 'projected identity', which makes experiences from various perspectives possible [1]. In his recent autobiography, he described the gamer as an active problem solver. Gonzalo Frasca likewise suggested an alternative game development approach through 'games that convey critical messages by strengthening critical reasoning' [2]. Both provide evidence that games can be strong academic tools. Not only does the genre of social games exist in the field of media and social network games, but there are also efforts to evaluate its value positively. Through such research, we can study how games can exert a positive influence, along with a change in their general perception, which would eventually lead to spreading a healthy game culture and enabling fresh life experiences, bringing out the educative side of games and making them a social communicative tool.
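
A rough sketch of discriminating groups from Big Five scores; synthetic scores stand in for the 20 participants' BFI responses, and scikit-learn's linear discriminant analysis replaces the stepwise procedure used in the paper.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Columns: Neuroticism, Extraversion, Openness, Agreeableness, Conscientiousness
X = rng.normal(size=(20, 5))
y = np.array([0] * 10 + [1] * 10)        # placeholder adjustment-level group labels

lda = LinearDiscriminantAnalysis().fit(X, y)
print("coefficients per trait:", lda.coef_.round(2))
print("training accuracy:", lda.score(X, y))
```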

Analysis of public library book loan demand according to weather conditions using machine learning (머신러닝을 활용한 기상조건에 따른 공공도서관 도서대출 수요분석)

  • Oh, Min-Ki; Kim, Keun-Wook; Shin, Se-Young; Lee, Jin-Myeong; Jang, Won-Jun
    • Journal of Digital Convergence / v.20 no.3 / pp.41-52 / 2022
  • Although domestic public libraries achieved quantitative growth under the 1st and 2nd comprehensive library development plans, there have been some qualitative shortcomings, and various studies have been conducted to address them. Most preceding studies are limited to social and economic factors and to statistical analysis. In this study, therefore, a spatiotemporal approach was applied to quantitatively calculate the decrease in public library loan demand caused by rainfall and heat waves; areas were clustered into those where book loan demand is strongly affected by weather changes and those where it is not; and, after combining factors inside and outside the public libraries, changes in loan demand according to weather changes were analyzed. The analysis showed that the weather-related decrease differed across public libraries, depending in part on each library's characteristics and spatial location. When the temperature exceeded 35℃, the decrease in book loan demand increased significantly. The number of seats, the number of books, and floor area were derived as important internal factors, while an access ramp, a cafe, a reading room, the floating population of teenagers, and the floating population of women in their 30s and 40s were identified as important external variables. These results can contribute to establishing policies that promote the use of public libraries in consideration of seasonal weather; the limitations of the study are also discussed.
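
A sketch of grouping libraries by how sharply loan demand drops under rainfall and heat waves, with synthetic per-library drop rates in place of the study's loan records; the feature names are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
libs = pd.DataFrame({
    "rain_drop_pct": rng.uniform(0, 30, 50),   # % loan decrease on rainy days
    "heat_drop_pct": rng.uniform(0, 40, 50),   # % loan decrease on days over 35℃
})

# Two clusters: weather-sensitive libraries vs. relatively insensitive ones.
libs["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    libs[["rain_drop_pct", "heat_drop_pct"]])
print(libs.groupby("cluster").mean())
```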

Predicting Performance of Heavy Industry Firms in Korea with U.S. Trade Policy Data (미국 무역정책 변화가 국내 중공업 기업의 경영성과에 미치는 영향)

  • Park, Jinsoo; Kim, Kyoungho; Kim, Buomsoo; Suh, Jihae
    • The Journal of Society for e-Business Studies / v.22 no.4 / pp.71-101 / 2017
  • Since late 2016, protectionism has been a major trend in world trade, with Great Britain exiting the European Union and the United States electing Donald Trump as its 45th president. Consequently, there has been a huge public outcry regarding the negative prospects of heavy industry firms in Korea, which are highly dependent on international trade with Western countries including the United States. In light of this trend and these concerns, we tried to predict the business performance of Korean heavy industry firms with data on the trade policy of the United States. The United States International Trade Commission (USITC) levies countervailing duties and anti-dumping duties on firms that violate its fair-trade regulations. In this study, we analyzed past records of countervailing and anti-dumping duties. From the results of a clustering analysis, it could be concluded that the trade policy trends of the United States significantly affect the business performance of heavy industry firms in Korea. Furthermore, we attempted to quantify these effects by employing long short-term memory (LSTM), a popular neural network model that is well suited to sequential data. Our major contribution is that we empirically validated the intuitive argument and also predicted the future trend with rigorous data mining techniques. With some improvements, our results are expected to be highly relevant to designing regulations for heavy industry in Korea.
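
A hedged sketch of an LSTM over quarterly trade-policy counts (for example, anti-dumping and countervailing duty actions) predicting a firm-performance indicator; all data here is synthetic, and the feature layout and window length are assumptions rather than the study's specification.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 8, 2)).astype("float32")   # 100 samples, 8 quarters, 2 duty-count features
y = rng.normal(size=(100, 1)).astype("float32")      # next-period performance proxy

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8, 2)),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=16, verbose=0)
print(model.predict(X[:3], verbose=0))
```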

A Study on Energy Efficiency for Cluster-based Routing Protocol (클러스터 기반 라우팅 프로토콜의 에너지 효율성에 관한 연구)

  • Lee, Won-Seok; Ahn, Tae-Won; Song, ChangYoung
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.3 / pp.163-169 / 2016
  • To distribute the total energy load equitably, LEACH, a representative cluster-based routing protocol, randomly selects cluster heads every round according to a predetermined probability. Because the current energy level of the sensor nodes is not considered, however, a node with little residual energy may be elected as a cluster head and fail to sustain the heavy energy load of that role. As a result, the first node death occurs earlier and the service quality of the WSN deteriorates. We therefore propose a new routing method that selects a sub-cluster head, considering the current energy of the cluster head and the distance between the cluster heads and the base station, in order to save the cluster head's energy. Simulation results show that the first node death is delayed, more data arrive at the base station, and the service quality of the WSN improves.
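
A sketch of the standard LEACH cluster-head election threshold for reference; the residual-energy and distance weighting of the proposed sub-cluster-head scheme is only hinted at in the node data and is not reproduced from the paper.

```python
import random

def leach_threshold(p, r):
    """Standard LEACH threshold T(n) = p / (1 - p * (r mod 1/p))."""
    return p / (1 - p * (r % int(1 / p)))

def elect_cluster_heads(nodes, p, r):
    """nodes: dicts with 'id' and residual 'energy'; returns ids elected this round."""
    t = leach_threshold(p, r)
    return [n["id"] for n in nodes if random.random() < t]

nodes = [{"id": i, "energy": random.uniform(0.1, 1.0)} for i in range(20)]
print(elect_cluster_heads(nodes, p=0.1, r=3))
```

Because the election ignores each node's remaining energy, a low-energy node can still become a cluster head, which is exactly the weakness the proposed sub-cluster-head method addresses.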

Predicting stock movements based on financial news with systematic group identification (시스템적인 군집 확인과 뉴스를 이용한 주가 예측)

  • Seong, NohYoon; Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.1-17 / 2019
  • Because stock price forecasting is an important issue both academically and practically, research on stock price prediction has been conducted actively. This research is classified into work using structured data and work using unstructured data. With structured data such as historical stock prices and financial statements, past studies usually adopted technical analysis and fundamental analysis. In the big data era, the amount of information has increased rapidly, and artificial intelligence methodologies that can extract meaning by quantifying text, an unstructured data type that accounts for a large share of this information, have developed quickly. With these developments, many attempts are being made to predict stock prices from online news by applying text mining. The methodology adopted in many papers is to forecast stock prices with the news of the target companies themselves. However, according to previous research, not only news about a target company but also news about related companies can affect its stock price. Finding highly relevant companies is not easy, though, because of market-wide effects and random signals, so existing studies have identified relevant companies primarily through pre-determined international industry classification standards. Recent research shows, however, that homogeneity within Global Industry Classification Standard sectors varies, so forecasting stock prices with all sector members rather than only genuinely relevant companies can adversely affect predictive performance. To overcome this limitation, we first apply random matrix theory together with text mining for stock prediction. When the dimension of the data is large, classical limit theorems are no longer suitable because statistical efficiency is reduced, so a simple correlation analysis of the financial market does not reveal the true correlation. To solve this issue, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and find the true correlation between companies. With the true correlation, we perform cluster analysis to find relevant companies. Based on the clustering, we use a multiple kernel learning algorithm, an ensemble of support vector machines, to incorporate the effects of the target firm and its relevant firms simultaneously; each kernel predicts stock prices from features of the financial news of the target firm or its relevant firms. The results of this study are as follows. (1) Following the existing research flow, we confirmed that using news from relevant companies is an effective way to forecast stock prices. (2) Searching for relevant companies in the wrong way can lower the prediction performance of AI models. (3) The proposed approach with random matrix theory shows better performance than previous studies when cluster analysis is performed on the true correlation obtained by removing market-wide effects and random signals. The contribution of this study is as follows. First, it shows that random matrix theory, used mainly in econophysics, can be combined with artificial intelligence to produce good methodologies. This suggests that it is important not only to develop AI algorithms but also to adopt theory from physics, extending existing research that integrated artificial intelligence with complex system theory through transfer entropy. Second, this study stresses that finding the right companies in the stock market is an important issue; it is not only important to study artificial intelligence algorithms but also how to theoretically determine the input values. Third, we confirmed that firms grouped under the Global Industry Classification Standard (GICS) may have low relevance to one another, and suggested that relevance should be defined theoretically rather than taken directly from the GICS.
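
A hedged sketch of filtering a stock-return correlation matrix with the Marchenko-Pastur bound from random matrix theory, which is the kind of market-wide/noise removal the abstract describes; the returns below are synthetic and include a single artificial market factor.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 500, 50                                      # 500 trading days, 50 stocks (placeholders)
market = rng.normal(size=(T, 1))                    # common market-wide factor
returns = 0.3 * market + rng.normal(size=(T, N))    # synthetic stock returns

corr = np.corrcoef(returns, rowvar=False)
eigval, eigvec = np.linalg.eigh(corr)

# Marchenko-Pastur upper bound for the eigenvalues of a pure-noise correlation matrix.
lambda_max = (1 + np.sqrt(N / T)) ** 2

# Keep only eigenmodes above the noise bound; they carry the "true" co-movement
# that can then be fed into cluster analysis to find relevant companies.
signal = eigval > lambda_max
filtered = (eigvec[:, signal] * eigval[signal]) @ eigvec[:, signal].T
np.fill_diagonal(filtered, 1.0)
print(f"{signal.sum()} of {N} eigenvalues exceed the noise bound {lambda_max:.2f}")
```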