http://dx.doi.org/10.5351/KJAS.2019.32.3.391

Consumer behavior prediction using Airbnb web log data  

An, Hyoin (Department of Statistics, Ewha Womans University)
Choi, Yuri (Department of Statistics, Ewha Womans University)
Oh, Raeeun (Department of Statistics, Ewha Womans University)
Song, Jongwoo (Department of Statistics, Ewha Womans University)
Publication Information
The Korean Journal of Applied Statistics / v.32, no.3, 2019, pp. 391-404
Abstract
Customers' fixed characteristics have often been used to predict customer behavior. As customer activity has moved from offline to online, it has become possible to track customers' web logs and to collect large amounts of web log data; however, previous studies have mostly focused on organizing the log data or describing its technical characteristics. In this study, we predict the decision-making time until each customer's first reservation, using Airbnb customer data provided on the Kaggle website. The data set includes basic customer information, such as gender and age, together with web logs. We apply several methodologies to find the optimal model and compare prediction errors with and without the web log data. We consider six models, including Lasso, SVM, Random Forest, and XGBoost, to explore the effectiveness of the web log data. We choose Random Forest as the optimal model, with a misclassification rate of about 20%. In addition, we confirm that, in our study, using web log data doubles the prediction accuracy for customer behavior compared with not using it.
Keywords
web log; customer behavior prediction; machine learning; data mining
References
1 Breiman, L. (2001). Random forests, Machine Learning, 45, 5-32.
2 Chen, T. and Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794.
3 Friedman, J. (2001). Greedy function approximation: a gradient boosting machine, The Annals of Statistics, 29, 1189-1232.
4 Goel, S., Hofman, J. M., Lahaie, S., Pennock, D. M., and Watts, D. J. (2010). Predicting consumer behavior with Web search, Proceedings of the National Academy of Sciences of the United States of America, 107, 17486-17490.
5 Harford, T. (2014). Big data: are we making a big mistake?, Significance, 14-19.
6 Hearst, M. A., Dumais, S. T., Osuna, E., Platt, J., and Scholkopf, B. (1998). Support vector machines, IEEE Intelligent Systems and their Applications, 13, 18-28.
7 Cadez, I. V., Gaffney, S., and Smyth, P. (2000). A general probabilistic framework for clustering individuals and objects, Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 140-149.
8 Kim, J. K. (2002). A study of web log file analysis for internet marketing of travel agency, Journal of Tourism and Leisure Research, 13, 147-160.
9 Lazer, D., Kennedy, R., King, G., and Vespignani, A. (2014). The parable of Google Flu: traps in big data analysis, Science, 343, 1203-1205.
10 Pandagre, K. N. and Veenadhari, S. (2017). Data mining techniques with web log, International Journal of Advanced Research in Computer Science, 8, 384-386.
11 Sujatha, V. and Punithavalli (2012). Improved user navigation pattern prediction technique from web log data, Procedia Engineering, 30, 92-99.
12 Tibshirani, R. (1996). Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological), 58, 267-288.