http://dx.doi.org/10.7465/jkdi.2017.28.6.1245

Adaptive stochastic gradient method under two mixing heterogeneous models

Moon, Sang Jun (Department of Statistics, University of Seoul)
Jeon, Jong-June (Department of Statistics, University of Seoul)
Publication Information
Journal of the Korean Data and Information Science Society, v.28, no.6, 2017, pp. 1245-1255
Abstract
Online learning is the process of obtaining the solution of a given objective function as data are accumulated in real time or in batch units. The stochastic gradient descent method is one of the most widely used methods for online learning. It is not only easy to implement, but also yields a solution with good properties under the assumption that the data-generating model is homogeneous. However, the stochastic gradient method can severely mislead the online learning when this homogeneity is violated. We assume that the observations arise from two heterogeneous generating models and propose a new stochastic gradient method that mitigates the problem caused by the heterogeneity. We introduce a robust mini-batch optimization method based on statistical tests and investigate the convergence radius of the solution of the proposed method. The theoretical results are confirmed by numerical simulations.
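To make the mini-batch screening idea concrete, the following is a minimal Python sketch of a mini-batch SGD for least squares in which, before each gradient step, observations whose loss looks inconsistent with the rest of the batch (i.e., suspected to come from a second generating model) are trimmed. The screening rule used here (standardized per-batch losses compared against a normal quantile), the least-squares loss, and all names (robust_minibatch_sgd, alpha, lr) are illustrative assumptions, not the statistical test or algorithm proposed in the paper.

# Minimal sketch: robust mini-batch SGD when observations are a mixture of
# two generating models.  The trimming rule and the least-squares loss are
# assumptions made for illustration, not the paper's exact procedure.
import numpy as np
from scipy import stats


def robust_minibatch_sgd(X, y, lr=0.1, batch_size=32, n_epochs=20,
                         alpha=0.01, seed=None):
    """Least-squares SGD that trims observations whose loss is an extreme
    outlier within its mini-batch (suspected second generating model)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    z_cut = stats.norm.ppf(1 - alpha / 2)          # trimming threshold
    for _ in range(n_epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            b = order[start:start + batch_size]
            resid = X[b] @ w - y[b]                # per-observation residuals
            losses = resid ** 2                    # per-observation squared loss
            # Screen the batch: standardize losses around the batch median and
            # drop observations whose standardized loss exceeds the threshold.
            z = (losses - np.median(losses)) / (losses.std() + 1e-12)
            keep = np.abs(z) < z_cut
            if not keep.any():
                continue
            grad = 2 * X[b][keep].T @ resid[keep] / keep.sum()
            w -= lr * grad
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 2000, 5
    X = rng.normal(size=(n, p))
    w_true = np.ones(p)
    y = X @ w_true + 0.1 * rng.normal(size=n)
    # Contaminate 10% of the observations with a second generating model.
    m = rng.choice(n, size=n // 10, replace=False)
    y[m] = X[m] @ (-3 * w_true) + 0.1 * rng.normal(size=m.size)
    print(robust_minibatch_sgd(X, y, seed=1))      # compare with w_true

In the demo block, 10% of the responses are generated from a second model; with the trimming step the iterates are pulled toward the coefficients of the majority model, which is the kind of robustness to mixing that the abstract describes.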
Keywords
Mini-batch; on-line learning; robustness; stochastic gradient descent method