• Title/Summary/Keyword: Nearest neighbor method

Response Modeling for the Marketing Promotion with Weighted Case Based Reasoning Under Imbalanced Data Distribution (불균형 데이터 환경에서 변수가중치를 적용한 사례기반추론 기반의 고객반응 예측)

  • Kim, Eunmi;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.29-45 / 2015
  • Response modeling is a well-known research problem for those seeking to better predict customers' responses to marketing promotions. A response model reduces marketing cost by identifying prospective customers in a very large customer database and predicting the purchase intention of the selected customers, whereas a promotion derived from an undifferentiated marketing strategy incurs unnecessary cost. In addition, the big data environment has accelerated the development of response models with data mining techniques such as case-based reasoning (CBR), neural networks, and support vector machines. CBR is one of the major tools in business because it is known to be simple and robust when applied to response modeling. It remains an attractive technique for business data mining applications even though it has not shown high performance compared with other machine learning techniques. Thus, many studies have tried to improve CBR and apply it to business data mining using enhanced algorithms or the support of other techniques such as genetic algorithms, decision trees, and the Analytic Hierarchy Process (AHP). Ahn and Kim (2008) used logit, neural networks, and CBR to predict which customers would purchase the items promoted by a marketing department, and tried to optimize the number of neighbors k for k-nearest neighbor with a genetic algorithm in order to improve the performance of the integrated model. Hong and Park (2009) noted that an integrated approach using CBR with logit, neural networks, and Support Vector Machine (SVM) predicted customer response to marketing promotions better than the individual data mining models (logit, neural networks, and SVM). This paper presents an approach to predicting customers' responses to marketing promotions with case-based reasoning. The proposed model was developed by applying a different weight to each feature.
We fitted a logit model to a database containing promotion and purchase data for bath soap. The resulting coefficients were then used to assign different feature weights in CBR. We empirically compared the performance of the proposed weighted CBR model with neural networks and a pure CBR model, and found that the proposed weighted CBR model outperformed the pure CBR model. Imbalanced data is a common problem when building data mining models to classify real data in tasks such as bankruptcy prediction, intrusion detection, fraud detection, churn management, and response modeling. Imbalanced data means that the number of instances in one class is remarkably small or large compared with the number of instances in the other classes. A classification model such as a response model has difficulty recognizing patterns through learning because it tends to ignore the minority class while classifying the majority class correctly. Sampling is one of the most representative approaches to resolving the problem caused by an imbalanced data distribution, and sampling methods can be categorized into under-sampling and over-sampling. However, CBR is not sensitive to data distribution because, unlike machine learning algorithms, it does not learn from data. In this study, we investigated the robustness of the proposed model while changing the ratio of response customers to nonresponse customers, because in the real world the customers who respond to a promotion are always a small fraction of those who do not. We simulated the proposed model 100 times to validate its robustness with different ratios of response customers to nonresponse customers under the imbalanced data distribution. Finally, we found that the proposed CBR-based model outperformed the compared models on the imbalanced data sets.
Our study is expected to improve the performance of response models for promotion programs with CBR under imbalanced data distributions in the real world.
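The feature-weighting idea in this abstract can be illustrated with a small sketch. The abstract does not say how the logit coefficients are turned into weights, so the code below assumes the absolute value of each coefficient scales that feature's contribution to a Euclidean distance, followed by a majority vote among the k nearest cases. The function names, the toy cases, and the coefficient values are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def weighted_distance(a, b, weights):
    """Euclidean distance in which each squared difference is scaled by a feature weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def weighted_cbr_predict(cases, labels, query, coefficients, k=3):
    """Majority vote among the k stored cases nearest to the query.

    ASSUMPTION: feature weights are the absolute logit coefficients;
    the abstract does not specify the exact mapping.
    """
    weights = [abs(c) for c in coefficients]
    ranked = sorted(
        range(len(cases)),
        key=lambda i: weighted_distance(cases[i], query, weights),
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two features per customer, label 1 = responded.
cases = [[0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [0.8, 0.2]]
labels = [0, 0, 1, 1]
# Hypothetical coefficients from a previously fitted logit model.
logit_coefficients = [2.5, -0.1]

prediction = weighted_cbr_predict(cases, labels, [0.7, 0.9], logit_coefficients, k=3)
print(prediction)
```

Because the first feature carries a much larger weight here, differences in the second feature barely influence which stored cases count as neighbors.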

A Literature Review and Classification of Recommender Systems on Academic Journals (추천시스템관련 학술논문 분석 및 분류)

  • Park, Deuk-Hee;Kim, Hyea-Kyeong;Choi, Il-Young;Kim, Jae-Kyeong
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.139-152 / 2011
  • Recommender systems have become an important research field since the emergence of the first paper on collaborative filtering in the mid-1990s. In general, recommender systems are defined as supporting systems that help users find information, products, or services (such as books, movies, music, digital products, web sites, and TV programs) by aggregating and analyzing suggestions from other users (that is, reviews from various authorities) and user attributes. Although academic research on recommender systems has increased significantly over the last ten years, more research is required for it to be applicable in real-world situations, because the field is still broad and less mature than other research fields. Accordingly, the existing articles on recommender systems need to be reviewed with a view to the next generation of recommender systems. Given the nature of recommender system research, however, it is not easy to confine it to specific disciplines. We therefore reviewed all articles on recommender systems from 37 journals published from 2001 to 2010. The 37 journals were selected from the top 125 journals of the MIS Journal Rankings. The literature search was based on the descriptors "Recommender system", "Recommendation system", "Personalization system", "Collaborative filtering" and "Contents filtering". The full text of each article was reviewed to eliminate articles that were not actually related to recommender systems. Many items were excluded because conference papers, master's and doctoral dissertations, textbooks, unpublished working papers, non-English publications, and news items were unfit for our research. We classified articles by year of publication, journal, recommendation field, and data mining technique.
The recommendation fields and data mining techniques of 187 articles were reviewed and classified into eight recommendation fields (book, document, image, movie, music, shopping, TV program, and others) and eight data mining techniques (association rule, clustering, decision tree, k-nearest neighbor, link analysis, neural network, regression, and other heuristic methods). The results presented in this paper have several significant implications. First, based on previous publication rates, interest in recommender system research will grow significantly in the future. Second, 49 articles relate to movie recommendation, whereas image and TV program recommendation are addressed in only 6 articles. This result is attributable to the easy availability of the MovieLens data set, so data sets for other fields need to be prepared. Third, social network analysis has recently been used in various applications, yet studies on recommender systems using social network analysis are scarce. We expect that new recommendation approaches using social network analysis will be developed, and evaluating recommender system research that uses social network analysis will be an interesting area for further research. This study shows trends in recommender system research by examining the published literature, and provides practitioners and researchers with insight and future directions on recommender systems. We hope that this research helps anyone interested in recommender systems research to gain insight for future work.

Water Balance Projection Using Climate Change Scenarios in the Korean Peninsula (기후변화 시나리오를 활용한 미래 한반도 물수급 전망)

  • Kim, Cho-Rong;Kim, Young-Oh;Seo, Seung Beom;Choi, Su-Woong
    • Journal of Korea Water Resources Association / v.46 no.8 / pp.807-819 / 2013
  • This study proposes a new methodology for projecting future water balance under climate change by assigning a weight to each scenario instead of feeding GCM-based future streamflows directly into a water balance model. A k-nearest neighbor algorithm was employed to assign the weights, and streamflow in the non-flood period (October to the following June) was selected as the criterion for assigning them. GCM-driven precipitation was input to the TANK model to simulate future streamflow scenarios, and Quantile Mapping was applied to correct the bias between the GCM hindcast and historical data. Based on these bias-corrected streamflows, different weights were assigned to each streamflow scenario to calculate water shortage for three projection periods: 2020s (2010~2039), 2050s (2040~2069), and 2080s (2070~2099). When the proposed methodology is applied to project water shortage over the Korean Peninsula, average water shortage for the 2020s is projected to increase by 10~32% relative to the baseline period (1967~2003). In addition, as streamflow in the non-flood period gradually decreases through the 2080s, average water shortage for the 2080s is projected to increase by up to 97% (516.5 million $m^3/yr$) relative to the baseline. While existing research on climate change projects a radical increase in future water shortage, the results projected by the weighting method show a more conservative change. This study is significant in that it makes water balance projection under climate change applicable while keeping the existing framework of national water resources planning, which lessens confusion for decision-makers in the water sector.
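The bias-correction step named in this abstract can be sketched as follows. The abstract does not state which variant of Quantile Mapping was used, so this sketch assumes the simple empirical form: a model value is located within the sorted model hindcast, and the value at the same relative rank in the sorted observations is returned. All function names and the sample numbers are hypothetical.

```python
from bisect import bisect_right

def fractional_rank(sorted_sample, value):
    """Relative position of value in the sample, in [0, 1], with linear interpolation."""
    n = len(sorted_sample)
    if value <= sorted_sample[0]:
        return 0.0
    if value >= sorted_sample[-1]:
        return 1.0
    hi = bisect_right(sorted_sample, value)  # first index whose entry exceeds value
    lo = hi - 1
    frac = (value - sorted_sample[lo]) / (sorted_sample[hi] - sorted_sample[lo])
    return (lo + frac) / (n - 1)

def quantile(sorted_sample, p):
    """Value at relative position p in [0, 1], with linear interpolation."""
    pos = p * (len(sorted_sample) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_sample) - 1)
    frac = pos - lo
    return sorted_sample[lo] * (1 - frac) + sorted_sample[hi] * frac

def quantile_map(value, model_hindcast, observed):
    """Map a model value onto the observed distribution (empirical quantile mapping).

    ASSUMPTION: values outside the hindcast range are clamped to the
    observed minimum or maximum; other extrapolation rules exist.
    """
    p = fractional_rank(sorted(model_hindcast), value)
    return quantile(sorted(observed), p)

# Hypothetical samples: the model systematically gives half the observed values.
hindcast = [1.0, 2.0, 3.0, 4.0, 5.0]
observed = [2.0, 4.0, 6.0, 8.0, 10.0]
print(quantile_map(3.0, hindcast, observed))
```

The same mapping, built from the hindcast period, would then be applied to future model output under the usual assumption that the bias is stationary over time.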

A Comparison between Simulation Results of DSSAT CROPGRO-SOYBEAN at US Cornbelt using Different Gridded Weather Forecast Data (격자기상예보자료 종류에 따른 미국 콘벨트 지역 DSSAT CROPGRO-SOYBEAN 모형 구동 결과 비교)

  • Yoo, Byoung Hyun;Kim, Kwang Soo;Hur, Jina;Song, Chan-Yeong;Ahn, Joong-Bae
    • Korean Journal of Agricultural and Forest Meteorology / v.24 no.3 / pp.164-178 / 2022
  • Uncertainties in weather forecasts affect the reliability of yield prediction using crop models. The objective of this study was to compare the uncertainty in crop yield prediction caused by the use of different weather forecast data. Daily weather data were produced at 10 km spatial resolution using the Weather Research and Forecasting (WRF) model. The nearest neighbor method was used to downscale these data to a resolution of 5 km (WRF5K). The Parameter-elevation Regressions on Independent Slopes Model (PRISM) was also applied to the WRF data to produce weather data at the same resolution. WRF5K and PRISM data were used as inputs to the CROPGRO-SOYBEAN model to predict crop yield. The uncertainties of the gridded data were analyzed using cumulative growing degree days (CGDD) and cumulative solar radiation (CSRAD) during the soybean growing seasons. Degree of agreement (DOA) statistics, including the structural similarity index, were determined for the crop model outputs. Our results indicated that the DOA statistics for CGDD were correlated with those for the maturity dates predicted using WRF5K and PRISM data. Yield forecasts had small values of the DOA statistics when large spatial disagreement occurred between the maturity dates predicted using WRF5K and PRISM. These results suggest that spatial uncertainties in temperature data affect the reliability of phenology predictions and, as a result, yield predictions to a greater degree than those in solar radiation data. This merits further studies to assess the uncertainties of crop yield forecasts using a wide range of crop calendars.
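The nearest neighbor downscaling step can be illustrated with a minimal sketch. The abstract does not describe how the 10 km and 5 km grids are aligned, so the code assumes the fine grid nests exactly inside the coarse grid with an integer refinement factor of 2. Under that assumption the coarse cell whose centre is nearest to a fine cell is simply the coarse cell that contains it. The function name and the sample values are hypothetical.

```python
def nearest_neighbor_downscale(coarse, factor=2):
    """Refine a 2-D grid by copying each coarse value into factor x factor fine cells.

    ASSUMPTION: the fine grid nests exactly inside the coarse grid, so the
    nearest coarse cell centre for fine cell (i, j) is at (i // factor, j // factor).
    """
    rows, cols = len(coarse), len(coarse[0])
    fine = []
    for i in range(rows * factor):
        source_row = coarse[i // factor]
        fine.append([source_row[j // factor] for j in range(cols * factor)])
    return fine

# Hypothetical 2 x 2 field of daily mean temperature on the coarse (10 km) grid.
coarse_temperature = [
    [21.0, 22.5],
    [19.5, 20.0],
]
fine_temperature = nearest_neighbor_downscale(coarse_temperature, factor=2)
for row in fine_temperature:
    print(row)
```

This method adds no new spatial detail: every 5 km cell inherits the value of its parent 10 km cell, which is one reason a comparison against an elevation-aware product such as PRISM is informative.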