Figure 2.1. Visualization of web log data structure.
Figure 2.2. Generation of derived variables using the 'anti join' method.
Figure 3.1. Variable importance plot of the Random Forest model.
Figure 3.2. Partial dependence plots of variables with high importance.
Table 2.1. Descriptions of customer information variables.
Table 2.2. Descriptions of web log information variables.
Table 2.3. Percentage of customers by Duration category.
Table 2.4. Average score by customer group.
Table 3.1. Comparison of 10-fold CV error and test error across regression models.
Table 3.2. Comparison of 10-fold CV error and test error across classification models.
Table 3.3. Confusion matrix of the Random Forest model.