http://dx.doi.org/10.7472/jksii.2013.14.6.71

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment  

Kim, Myoungjin (Department of Internet and Multimedia Engineering, Konkuk University)
Han, Seungho (Department of Internet and Multimedia Engineering, Konkuk University)
Cui, Yun (Department of Internet and Multimedia Engineering, Konkuk University)
Lee, Hanku (Department of Internet and Multimedia Engineering, Konkuk University)
Publication Information
Journal of Internet Computing and Services, Vol. 14, No. 6, 2013, pp. 71-84
Abstract
Log data, which record the wide range of information generated while computer systems operate, are used in many processes, from computer system inspection and process optimization to customized optimization for users. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data generated by banks. Most of the log data generated during banking operations come from handling clients' business. Therefore, a separate log data processing system is needed to gather, store, categorize, and analyze the log data generated while processing client business. However, in existing computing environments it is difficult to provide flexible storage expansion for massive amounts of unstructured log data and to execute the many functions needed to categorize and analyze the stored data. Thus, in this study, we use cloud computing technology to build a cloud-based system for processing unstructured log data that are difficult to handle with the analysis tools and management systems of the existing computing infrastructure. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources, so that resources such as storage space and memory can be expanded when storage must be extended or when log data increase rapidly. Moreover, to overcome the processing limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for fast and reliable parallel-distributed processing of massive amounts of log data.
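The Hadoop-based analysis module relies on MapReduce-style parallel processing. The single-process Python sketch below shows only the data flow of such a job (map, shuffle, reduce) for counting log records by type; on a Hadoop cluster the framework would run these phases in parallel across nodes. The pipe-delimited record format and all function names are hypothetical, since the paper does not specify them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record format "timestamp|branch_id|log_type|message";
# the paper does not define one, so this is for illustration only.


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit (log_type, 1) for every well-formed log line."""
    for line in lines:
        parts = line.split("|", 3)
        if len(parts) == 4:  # skip malformed records instead of failing the job
            yield parts[2], 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key (done by the framework on a cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the partial counts for each log type."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_type(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = [
        "2013-06-01T09:00:01|B001|DEPOSIT|amount=100",
        "2013-06-01T09:00:02|B002|WITHDRAW|amount=50",
        "2013-06-01T09:00:03|B001|DEPOSIT|amount=20",
        "corrupted line",
    ]
    print(count_by_type(sample))  # {'DEPOSIT': 2, 'WITHDRAW': 1}
```

Because each map call looks at one line independently and each reduce call sees one key, both phases can be partitioned across nodes without coordination; only the shuffle moves data between them.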
Furthermore, because HDFS (Hadoop Distributed File System) stores the aggregated log data as replicated block units, the proposed system offers automatic restore functions that allow it to continue operating after it recovers from a malfunction. Finally, by building a distributed database on the NoSQL-based MongoDB, the proposed system processes unstructured log data effectively. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data. Further, strict schemas like those of relational databases do not allow nodes to be added easily by distributing the stored data across multiple nodes when the amount of data increases rapidly. NoSQL databases do not provide the complex computations that relational databases offer, but they can easily scale out by distributing data across additional nodes when the amount of data increases rapidly; they are non-relational databases whose structure is suited to processing unstructured data. NoSQL data models are usually classified as key-value, column-oriented, and document-oriented types. The proposed system uses MongoDB, a representative document-oriented database with a schema-free structure. MongoDB was chosen because its flexible schema makes unstructured log data easy to process, it supports flexible node expansion when the amount of data increases rapidly, and it provides an Auto-Sharding function that automatically expands storage. The proposed system consists of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module.
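In range-based sharding, MongoDB (in the versions contemporary with this paper) partitions a collection into chunks by contiguous ranges of a shard key and splits a chunk once it grows past the configured maximum chunk size. The toy model below is a minimal sketch assuming string shard keys; the class name is hypothetical, it is not MongoDB's implementation, and it omits the balancer that migrates chunks between shards. It illustrates only the split rule behind Auto-Sharding.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Chunk:
    """Covers shard keys in [min_key, next chunk's min_key); docs kept sorted by key."""
    min_key: str
    docs: List[Tuple[str, int]] = field(default_factory=list)  # (shard_key, size_bytes)

    @property
    def size(self) -> int:
        return sum(size for _, size in self.docs)


class ToyRangeShardedCollection:
    """Hypothetical toy model of range-based chunk splitting (no shards, no balancer)."""

    def __init__(self, max_chunk_bytes: int) -> None:
        self.max_chunk_bytes = max_chunk_bytes
        # One initial chunk covers the whole key space: "" sorts before every string.
        self.chunks: List[Chunk] = [Chunk(min_key="")]

    def _chunk_index(self, key: str) -> int:
        # Rightmost chunk whose min_key <= key.
        mins = [c.min_key for c in self.chunks]
        return bisect.bisect_right(mins, key) - 1

    def insert(self, key: str, size_bytes: int) -> None:
        idx = self._chunk_index(key)
        chunk = self.chunks[idx]
        bisect.insort(chunk.docs, (key, size_bytes))
        if chunk.size > self.max_chunk_bytes:
            self._split(idx)

    def _split(self, idx: int) -> None:
        chunk = self.chunks[idx]
        keys = [k for k, _ in chunk.docs]
        split_key = keys[len(keys) // 2]           # median key becomes the new boundary
        cut = bisect.bisect_left(keys, split_key)  # left side keeps keys < split_key only
        if cut == 0:
            return  # first key equals the median: an unsplittable "jumbo" chunk
        self.chunks.insert(idx + 1, Chunk(min_key=split_key, docs=chunk.docs[cut:]))
        chunk.docs = chunk.docs[:cut]


if __name__ == "__main__":
    coll = ToyRangeShardedCollection(max_chunk_bytes=1000)
    for i in range(10):
        coll.insert(f"log{i:03d}", 300)
    print([c.min_key for c in coll.chunks])  # ['', 'log002', 'log004', 'log006', 'log008']
```

A smaller maximum chunk size triggers splits more often (and, on a real cluster, more chunk migrations), which is one plausible reason insert performance varies with chunk size.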
When the log data generated throughout each bank's client business process are sent to the cloud server, the log collector module collects the data, classifies them by log type, and distributes them to the MongoDB module and the MySQL module. The log graph generator module presents the log analysis results of the MongoDB module, the Hadoop-based analysis module, and the MySQL module by analysis time and by type of aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and presented in real time by the log graph generator module. Log data aggregated per unit time are stored in the MongoDB module and plotted as graphs according to the user's analysis conditions. The aggregated log data in the MongoDB module are processed in a parallel-distributed manner by the Hadoop-based analysis module. A comparative evaluation of log data insertion and query performance against a log data processing system that uses only MySQL demonstrates the superiority of the proposed system. Moreover, an optimal chunk size is identified by evaluating MongoDB's log data insertion performance for various chunk sizes.
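The log collector's routing step can be sketched as follows: parse each record, send the types that need real-time analysis toward the MySQL module, and send the rest toward the MongoDB module, where per-unit-time aggregates feed the graphs. This is a minimal sketch under assumptions: the record format, the `REALTIME_TYPES` set, and all function names are hypothetical, and the actual database writes are left out so the example runs without a server.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Hypothetical: log types that need real-time analysis (MySQL-bound).
# The paper does not name them; these are placeholders.
REALTIME_TYPES = frozenset({"ERROR", "FRAUD_ALERT"})


@dataclass(frozen=True)
class LogRecord:
    timestamp: datetime
    branch_id: str
    log_type: str
    message: str


def parse(line: str) -> Optional[LogRecord]:
    """Parse a hypothetical 'ISO-timestamp|branch|type|message' line; None if malformed."""
    parts = line.split("|", 3)
    if len(parts) != 4:
        return None
    try:
        ts = datetime.fromisoformat(parts[0])
    except ValueError:
        return None
    return LogRecord(ts, parts[1], parts[2], parts[3])


def route(lines: Iterable[str]) -> Tuple[List[LogRecord], List[LogRecord]]:
    """Split records into a real-time (MySQL-bound) batch and a bulk (MongoDB-bound) batch."""
    realtime: List[LogRecord] = []
    bulk: List[LogRecord] = []
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue  # drop unparseable lines; a real collector might quarantine them
        (realtime if rec.log_type in REALTIME_TYPES else bulk).append(rec)
    return realtime, bulk


def per_minute_counts(records: Iterable[LogRecord]) -> Counter:
    """Count records per (minute, log_type): a per-unit-time series a graph could plot."""
    return Counter(
        (r.timestamp.replace(second=0, microsecond=0), r.log_type) for r in records
    )


if __name__ == "__main__":
    realtime, bulk = route([
        "2013-06-01T09:00:01|B001|DEPOSIT|amount=100",
        "2013-06-01T09:00:40|B002|ERROR|timeout",
        "2013-06-01T09:00:59|B001|DEPOSIT|amount=20",
    ])
    print(len(realtime), len(bulk))  # 1 2
    print(per_minute_counts(bulk))
```

Keeping routing separate from storage means the real-time path and the bulk path can be scaled or replaced independently, which matches the split between the MySQL and MongoDB modules described above.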
Keywords
Cloud Computing; NoSQL; MongoDB; Unstructured Data; Banking System