http://dx.doi.org/10.7469/JKSQM.2020.48.4.553

Design of Distributed Processing Framework Based on H-RTGL One-class Classifier for Big Data  

Kim, Do Gyun (Engineering Research Institute, Ajou University)
Choi, Jin Young (Department of Industrial Engineering, Ajou University)
Abstract
Purpose: The purpose of this study was to design a framework for generating a one-class classification algorithm based on hyper-rectangles (H-RTGL) in a distributed environment connected by a network.

Methods: First, we devised an H-RTGL-based one-class classifier that can be executed by distributed computing nodes, considering both model and data parallelism. We then designed supporting components that facilitate distributed execution. Finally, we validated both the effectiveness and the efficiency of the classifier obtained from the proposed framework through a numerical experiment using datasets from the UCI machine learning repository.

Results: We designed a distributed processing framework capable of H-RTGL-based one-class classification in a distributed environment consisting of physically separated computing nodes. It includes components implementing model and data parallelism, which enable distributed generation of the classifier. In the numerical experiment, a statistical test showed no significant change in classification performance, while elapsed time was reduced by applying distributed processing to datasets of considerable size.

Conclusion: Based on these results, we conclude that applying distributed processing to classifier generation can preserve classification performance while improving the efficiency of the classification algorithm. We also discuss the limitations of our work and suggest directions for future research.
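To make the idea of data parallelism for hyper-rectangle classification concrete, the following is a minimal, illustrative sketch only: it is not the authors' H-RTGL algorithm. It assumes Gaussian-style per-feature bounds (mean ± k·std, as in interval-based hyper-rectangle generation), simulates distributed nodes by splitting the target-class data into partitions, fits one hyper-rectangle per partition, and classifies a point as belonging to the target class if it falls inside any rectangle. All function names and the union-based merge rule are assumptions for illustration.

```python
# Toy sketch of a data-parallel hyper-rectangle one-class classifier.
# NOT the authors' H-RTGL method; bounds, merge rule, and names are assumed.
import random
import statistics

def fit_partition(X, k=2.0):
    """Fit one hyper-rectangle on a data partition: per-feature
    interval [mean - k*std, mean + k*std] (Gaussian-style bounds)."""
    dims = len(X[0])
    lo, hi = [], []
    for d in range(dims):
        col = [x[d] for x in X]
        mu = statistics.fmean(col)
        sd = statistics.pstdev(col)
        lo.append(mu - k * sd)
        hi.append(mu + k * sd)
    return lo, hi

def predict(model, X):
    """Target class iff the point lies inside at least one rectangle."""
    return [any(all(l <= v <= h for v, l, h in zip(x, lo, hi))
                for lo, hi in model)
            for x in X]

# Data parallelism: split the target-class training data across three
# simulated nodes, fit a rectangle on each partition independently,
# and merge the partial models as a union of rectangles.
random.seed(0)
X_train = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(300)]
partitions = [X_train[i::3] for i in range(3)]
model = [fit_partition(p) for p in partitions]

X_test = [[0.0, 0.0], [8.0, 8.0]]
print(predict(model, X_test))  # [True, False]: inlier vs. outlier
```

In a real deployment, each partition would live on a separate computing node and only the fitted interval bounds would be communicated back for merging, which is what keeps the communication cost low relative to shipping the raw data.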
Keywords
Distributed Machine Learning; H-RTGL; One-Class Classification; Model/Data Parallelism; Big Data