http://dx.doi.org/10.9708/jksci.2019.24.11.051

Web access prediction based on parallel deep learning  

Togtokh, Gantur (Dept. of Information Technology, Ulaan Baatar University)
Kim, Kyung-Chang (Dept. of Computer Engineering, Hongik University)
Abstract
Due to the exponential growth of access information on the web, the need to predict web users' next access has increased. Various models, such as Markov models, deep neural networks, support vector machines, and fuzzy inference models, have been proposed for web access prediction. For deep learning based on neural network models, training on large-scale web usage data takes a very long time. To address this problem, deep neural network models are trained in parallel on a cluster of computers. In this paper, we investigated the impact of several important Spark parameters related to data partitions, shuffling, compression, and locality (the basic Spark parameters) on training a Multi-Layer Perceptron model on a Spark standalone cluster. Based on this investigation, we tuned the basic Spark parameters for training a Multi-Layer Perceptron model and applied the tuned configuration when training a Multi-Layer Perceptron model for web access prediction. Through experiments, we showed the accuracy of our proposed web access prediction model. We also showed that our tuning of the basic Spark parameters improved the training time of the Multi-Layer Perceptron model compared with the default Spark parameter configuration.
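To make the four parameter groups named in the abstract concrete, the following is a minimal sketch of how such a configuration could be written down and varied one parameter at a time. The property names are standard Apache Spark configuration keys, but which keys the authors tuned and every value shown here are assumptions for illustration, not results from the paper; the helper names `to_submit_args` and `one_at_a_time` are hypothetical.

```python
# Illustrative values only: none of these numbers are taken from the paper.
# The keys are real Spark configuration properties, grouped by the four
# categories the abstract mentions.
BASIC_SPARK_PARAMS = {
    # data partitions
    "spark.default.parallelism": "16",
    # shuffling
    "spark.shuffle.file.buffer": "64k",
    "spark.reducer.maxSizeInFlight": "96m",
    # compression
    "spark.shuffle.compress": "true",
    "spark.rdd.compress": "true",
    "spark.io.compression.codec": "lz4",
    # locality
    "spark.locality.wait": "1s",
}


def to_submit_args(params):
    """Render a parameter dict as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in params.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


def one_at_a_time(base, candidates):
    """Yield (name, value, config) where config differs from base in one key.

    This is a simple one-factor-at-a-time sweep, assumed here as one possible
    way to study the impact of each parameter separately.
    """
    for name, values in candidates.items():
        for value in values:
            config = dict(base)  # copy so the base configuration is untouched
            config[name] = value
            yield name, value, config


if __name__ == "__main__":
    sweep = {"spark.locality.wait": ["0s", "3s"]}
    for name, value, config in one_at_a_time(BASIC_SPARK_PARAMS, sweep):
        print(name, value, " ".join(to_submit_args(config)))
```

Each generated argument list could be passed to `spark-submit` before the training job, so that training time can be compared across configurations.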
Keywords
Apache Spark; Neural network; Parallel deep learning; Parameter tuning; Web access prediction;