Response Modeling with Semi-Supervised Support Vector Regression

  • Kim, Dong-Il (System Engineering Team, Samsung Electronics Co., Ltd.)
  • Received: 2014.07.29
  • Reviewed: 2014.08.19
  • Published: 2014.09.30

Abstract

In this paper, I propose a response modeling method based on a Semi-Supervised Support Vector Regression (SS-SVR) algorithm. To increase the accuracy and profit of response modeling, the unlabeled data that make up most of a customer dataset are used together with the labeled data during training. The proposed SS-SVR algorithm uses batch learning to keep the training complexity low. To account for uncertainty in labeling the unlabeled data, the label distribution of each unlabeled data point is estimated, which defines the region in which its label may lie. Multiple labels are then drawn for each unlabeled data point from its estimated label distribution by oversampling, and the generated data are combined with the labeled data to construct the training dataset. The sampling rate varies with the degree of uncertainty, so that more information is generated for uncertain regions. Finally, a data selection algorithm, Expected Margin based Pattern Selection (EMPS), is employed to reduce the training complexity. Experiments on a real-world marketing dataset with various ratios of labeled data showed that the proposed response modeling method trained efficiently and improved both the accuracy and the expected profit compared with existing models.
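The label estimation and oversampling steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the distribution estimator, so a k-nearest-neighbour mean and standard deviation is swapped in as a stand-in, and the names `knn_label_distribution`, `oversample`, `base`, `scale`, and `max_samples` are hypothetical. The EMPS selection step and the final SVR training are left out.

```python
import math
import random

def knn_label_distribution(x, labeled, k=3):
    """Estimate (mean, std) of the label at x from its k nearest labeled points.

    Assumption: a k-NN estimate stands in for the paper's distribution estimator.
    """
    nearest = sorted(labeled, key=lambda p: math.dist(x, p[0]))[:k]
    ys = [y for _, y in nearest]
    mean = sum(ys) / len(ys)
    var = sum((y - mean) ** 2 for y in ys) / len(ys)
    return mean, math.sqrt(var)

def oversample(unlabeled, labeled, k=3, base=1, scale=4.0, max_samples=5, seed=0):
    """Draw several candidate labels per unlabeled point.

    The number of draws grows with the estimated std, so uncertain
    regions contribute more generated training points.
    """
    rng = random.Random(seed)
    generated = []
    for x in unlabeled:
        mean, std = knn_label_distribution(x, labeled, k)
        n = min(max_samples, base + int(scale * std))
        for _ in range(n):
            generated.append((x, rng.gauss(mean, std)))
    return generated

# Toy data: labels vary near x = 1 (uncertain) and are constant near x = 11.
labeled = [((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 2.0),
           ((10.0,), 0.0), ((11.0,), 0.0), ((12.0,), 0.0)]
unlabeled = [(1.0,), (11.0,)]

# Training set = labeled data plus the oversampled pseudo-labeled data.
train = labeled + oversample(unlabeled, labeled)
```

In this toy setup the point near x = 1 receives more generated labels than the point near x = 11, where the neighbouring labels agree. The resulting `train` list would then be reduced by a pattern selection step and passed to an SVR learner.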
