• Title/Summary/Keyword: Log 분산처리 (distributed log processing)

Messaging System Analysis for Effective Embedded Tester Log Processing (효과적인 Embedded Tester Log 처리를 위한 Messaging System 분석)

  • Nam, Ki-ahn;Kwon, Oh-young
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.05a / pp.645-648 / 2017
  • The existing embedded tester used TCP and a shared file system for log processing, with processing organized in a 1-N structure. This method wastes tester resources on exception handling. We implemented a log-processing message layer that can be distributed by means of a messaging system, and we compared transmission through the message layer with transmission through TCP and the shared file system. In the comparison, the message layer achieved higher transmission bandwidth than TCP. Its CPU usage was slightly less efficient than that of TCP, but the difference was not significant. Overall, log processing using the message layer is more efficient.

UX Analysis for Mobile Devices Using MapReduce on Distributed Data Processing Platform (MapReduce 분산 데이터처리 플랫폼에 기반한 모바일 디바이스 UX 분석)

  • Kim, Sungsook;Kim, Seonggyu
    • KIPS Transactions on Software and Data Engineering / v.2 no.9 / pp.589-594 / 2013
  • As the web's defining characteristics of openness and mind sharing become more widespread, the device log data generated by both users and developers have become increasingly complex. A log data processing mechanism that automatically produces meaningful data sets from large volumes of log records has therefore become necessary for mobile device UX (User eXperience) analysis. In this paper, we define the attributes of the log data to be analyzed so that they reflect the characteristics of a mobile device, and we collect real log data from mobile device users. Using the MapReduce programming paradigm on the Hadoop platform, we perform a mobile device UX analysis on the collected log data in a distributed processing environment. We then demonstrate the effectiveness of the proposed analysis mechanism by applying various combinations of Map and Reduce steps to produce a simple data schema from a large volume of complex log records.
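
The Map and Reduce steps mentioned in this abstract can be illustrated with a minimal single-process sketch in Python. This is not the authors' Hadoop job: the record layout (`user_id,app,event`), the sample records, and all function names are assumptions made for illustration, and a plain dictionary stands in for Hadoop's shuffle phase.

```python
from collections import defaultdict

# Hypothetical log records (layout assumed here: "user_id,app,event").
SAMPLE_LOGS = [
    "u1,camera,launch",
    "u2,camera,launch",
    "u1,browser,launch",
    "u1,camera,crash",
    "u3,browser,launch",
]


def map_app_events(line):
    """Map step: emit ((app, event), 1) for each well-formed record."""
    parts = line.strip().split(",")
    if len(parts) == 3:
        _user, app, event = parts
        yield (app, event), 1


def reduce_sum(key, values):
    """Reduce step: sum all counts emitted for one key."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Run map, group by key (the 'shuffle'), then reduce, all in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


app_event_counts = run_mapreduce(SAMPLE_LOGS, map_app_events, reduce_sum)
print(app_event_counts)
```

Chaining several such passes, each with its own mapper and reducer, is one way to reduce complex raw records to a simple schema such as `(app, event) -> count`.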

A Distributed Real-time Self-Diagnosis System for Processing Large Amounts of Log Data (대용량 로그 데이터 처리를 위한 분산 실시간 자가 진단 시스템)

  • Son, Siwoon;Kim, Dasol;Moon, Yang-Sae;Choi, Hyung-Jin
    • Database Research / v.34 no.3 / pp.58-68 / 2018
  • Distributed computing helps to store and process large data efficiently on a cluster of multiple machines. The performance of distributed computing depends heavily on the state of the servers that make up the distributed system. In this paper, we propose a self-diagnosis system that collects log data in a distributed system, detects anomalies, and visualizes the results in real time. First, we divide the self-diagnosis process into five stages: collecting, delivering, analyzing, storing, and visualizing. Next, we design a real-time self-diagnosis system that meets the goals of real-time operation, scalability, and high availability. The proposed system is based on Apache Flume, Apache Kafka, and Apache Storm, which are representative real-time distributed technologies. In addition, we use a simple but effective anomaly detection technique based on the moving average and the 3-sigma rule to minimize the delay of log data processing during self-diagnosis. The results of this paper make it possible to build a distributed real-time self-diagnosis solution that diagnoses server status in real time in a complicated distributed system.
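
A moving-average, 3-sigma check of the kind named in this abstract might look like the following Python sketch. The window size, the sample CPU-load values, and the function name are assumptions for illustration; the abstract does not give the paper's parameters or its Storm implementation.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(values, window=5, k=3.0):
    """Return indices of values lying more than k standard deviations
    from the moving average of the preceding `window` values."""
    history = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # A perfectly flat window (sigma == 0) is skipped in this sketch.
            if sigma > 0 and abs(value - mu) > k * sigma:
                anomalies.append(index)
        history.append(value)
    return anomalies


# Hypothetical CPU-load samples with one spike at index 5.
cpu_load = [50, 52, 49, 51, 50, 95, 51, 50]
anomalous_indices = detect_anomalies(cpu_load)
print(anomalous_indices)
```

Each new value costs only one pass over a fixed-size window, which is consistent with the abstract's goal of keeping processing delay low. Note that in this sketch a detected spike still enters the window and temporarily widens the band for the values that follow.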

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, the realization of flexible storage expansion functions for processing a massive amount of unstructured log data and executing a considerable number of functions to categorize and analyze the stored unstructured log data is difficult in existing computer environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for processing unstructured log data that are difficult to process using the existing computing infrastructure's analysis tools and management system. The proposed system uses the IaaS (Infrastructure as a Service) cloud environment to provide a flexible expansion of computing resources and includes the ability to flexibly expand resources such as storage space and memory under conditions such as extended storage or rapid increase in log data. Moreover, to overcome the processing limits of the existing analysis tool when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. 
Furthermore, because HDFS (Hadoop Distributed File System) stores data by replicating the blocks of the aggregated log data, the proposed system offers automatic restore functions that let the system keep operating after it recovers from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods of effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data. Further, strict schemas like those of relational databases make it difficult to add nodes when the stored data must be distributed across multiple nodes as the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases may provide, but it can easily expand the database by distributing it across nodes when the amount of data increases rapidly; it is a non-relational database with a structure appropriate for processing unstructured data. NoSQL data models are usually classified as key-value, column-oriented, and document-oriented types. Of these, MongoDB, a representative document-oriented database with a schema-free structure, is used in the proposed system. MongoDB is introduced to the proposed system because it makes it easy to process unstructured log data through a flexible schema structure, facilitates flexible node expansion when the amount of data is rapidly increasing, and provides an Auto-Sharding function that automatically expands storage. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module.
When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies the data according to the type of log data and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis of the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are processed in parallel by the Hadoop-based analysis module. A comparative evaluation of log data insertion and query performance is carried out against a log data processing system that uses only MySQL; this evaluation demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through the log data insert performance evaluation of MongoDB for various chunk sizes.

A Distributed Algorithm to Update Spanning Tree and Strongly-Connected Components (생성트리와 강결합요소의 갱신을 위한 분산 알고리즘)

  • Park, Jeong-Ho;Park, Yun-Yong;Choe, Seong-Hui
    • The Transactions of the Korea Information Processing Society / v.6 no.2 / pp.299-306 / 1999
  • This paper considers the problem of updating the spanning tree and strongly-connected components in response to a topology change of the network. It proposes a distributed algorithm that solves this problem after several processors and links are added and deleted. Its message complexity and ideal-time complexity are O(n' log n' + (n' + s + t)) and O(n' log n'), respectively, where n' is the number of processors in the network after the topology change, s is the number of added links, and t is the total number of links in the strongly connected component (of the network before the topology change) including the deleted links.

Log processing using messaging system in SSD Storage Tester (SSD Storage Tester에서 메시징 시스템을 이용한 로그 처리)

  • Nam, Ki-ahn;Kwon, Oh-young
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.8 / pp.1531-1539 / 2017
  • The existing SSD storage tester processed logs in a 1-N structure between server and client using TCP and a network file system. This method causes problems such as increased CPU usage and difficulty in exception handling. In this paper, we implement a log-processing message layer that supports asynchronous distributed processing using open-source messaging systems such as Kafka and RabbitMQ, and we compare this layer with the existing log transmission method. A log simulator was implemented to compare transmission bandwidth and CPU usage. Test results show that transmission using the message layer performs better than transmission using the existing method, and that CPU usage does not differ significantly. The message layer is also easier to implement and more efficient than the conventional method.

Distributed Algorithm for Updating Minimum-Weight Spanning Tree Problem (MST 재구성 분산 알고리즘)

  • Park, Jeong-Ho;Min, Jun-Yeong
    • The Transactions of the Korea Information Processing Society / v.1 no.2 / pp.184-193 / 1994
  • This paper considers the Updating Minimum-weight Spanning Tree Problem (UMP), that is, the problem of updating the Minimum-weight Spanning Tree (MST) in response to a topology change of the network. It proposes an algorithm that reconstructs the MST after several links are deleted and added. Its message complexity and ideal-time complexity are O(m + n log(t + f)) and O(n + n log(t + f)), respectively, where n is the number of processors in the network, t is the number of added links, f is the number of deleted links of the old MST, and m = t + n if f = 0, and m = e (i.e., the number of links in the network after the topology change) otherwise. The last part of the paper also touches on an algorithm that handles the deletion and addition of processors as well as links.

CERES: A Log-based, Interactive Web Analytics System for Backbone Networks (CERES: 백본망 로그 기반 대화형 웹 분석 시스템)

  • Suh, Ilhyun;Chung, Yon Dohn
    • KIISE Transactions on Computing Practices / v.21 no.10 / pp.651-657 / 2015
  • The amount of web traffic has increased as a result of the rapid growth in the use of web-based applications. To obtain valuable information from web logs, we need systems that support interactive, flexible, and efficient ways to analyze and handle large amounts of data. In this paper, we present CERES, a log-based, interactive web analytics system for backbone networks. Because CERES focuses on analyzing web log records generated from backbone networks, it can perform web analysis from the perspective of a network. CERES is designed for deployment in a server cluster using the Hadoop Distributed File System (HDFS) as the underlying storage. We transform web log records from backbone networks into relations, store them, and then allow users to analyze them with a SQL-like language in a flexible and interactive manner. In particular, we use the data cube technique to enable efficient statistical analysis of web logs. The system provides users with a web-based, multi-modal user interface.
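
The data cube technique mentioned in this abstract precomputes aggregates for every combination of dimensions so that later queries become lookups. The Python sketch below shows the idea on a handful of records. The dimension names, the sample records, and the `build_cube` function are assumptions for illustration, not the schema or implementation used by CERES.

```python
from collections import Counter
from itertools import combinations

# Hypothetical dimensions and web log records.
DIMENSIONS = ("domain", "status", "hour")
records = [
    {"domain": "a.example", "status": 200, "hour": 9},
    {"domain": "a.example", "status": 404, "hour": 9},
    {"domain": "b.example", "status": 200, "hour": 9},
    {"domain": "a.example", "status": 200, "hour": 10},
]


def build_cube(rows, dimensions):
    """Count rows for every subset of dimensions (one 'cuboid' per subset)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cube[dims] = Counter(tuple(row[d] for d in dims) for row in rows)
    return cube


cube = build_cube(records, DIMENSIONS)

# Queries are now dictionary lookups instead of scans over the raw log.
print(cube[()][()])                                    # total number of requests
print(cube[("domain",)][("a.example",)])               # requests per domain
print(cube[("domain", "status")][("a.example", 200)])  # requests per domain and status
```

With d dimensions the full cube holds 2^d cuboids, so practical systems usually materialize only a subset of them.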

A Security Log Analysis System using Logstash based on Apache Elasticsearch (아파치 엘라스틱서치 기반 로그스태시를 이용한 보안로그 분석시스템)

  • Lee, Bong-Hwan;Yang, Dong-Min
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.2 / pp.382-389 / 2018
  • Cyber attacks can now cause serious damage to various information systems, and log data analysis can help address this problem. A security log analysis system makes it possible to cope with security risks properly by collecting, storing, and analyzing log data. In this paper, a security log analysis system is designed and implemented to analyze security log data using Logstash with Elasticsearch, a distributed search engine, to collect and process various types of log data. Kibana, an open-source data visualization plugin for Elasticsearch, is used to generate log statistics and search reports and to visualize the results. The performance of the Elasticsearch-based security log analysis system is compared with that of an existing log analysis system that uses the Flume log collector, the Flume HDFS sink, and HBase. The experimental results show that the proposed system greatly reduces both database query processing time and log data analysis time compared to the existing Hadoop-based log analysis system.

Design and implementation of a Large-Scale Security Log Collection System based on Hadoop Ecosystem (Hadoop Ecosystem 기반 대용량 보안로그 수집 시스템 설계 및 구축)

  • Lee, Jong-Yoon;Lee, Bong-Hwan
    • Proceedings of the Korea Information Processing Society Conference / 2014.04a / pp.461-463 / 2014
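  • As network attacks have become more diverse and frequent, various security solutions have emerged to identify the types of hacking attacks. One of them, the integrated security management system, establishes security policies through the management and analysis of diverse logs in order to prepare for future attacks. However, most existing integrated security management systems rely on relational databases and cannot cope with rapidly growing data. A method for processing large volumes of log data is needed to prevent the loss of information-rich log data and to avoid system degradation. We therefore propose a Hadoop-based log collection system built on the Hadoop ecosystem, which is specialized for distributed processing, so that it can respond flexibly as data grows. Going beyond existing NoSQL log storage approaches, the system applies normalization at the log storage stage to improve processing and storage capability, providing real-time processing and storage together with high scalability.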