• Title/Summary/Keyword: Apache Flume

Search Result 4, Processing Time 0.023 seconds

A Distributed Real-time Self-Diagnosis System for Processing Large Amounts of Log Data (대용량 로그 데이터 처리를 위한 분산 실시간 자가 진단 시스템)

  • Son, Siwoon;Kim, Dasol;Moon, Yang-Sae;Choi, Hyung-Jin
    • Database Research
    • /
    • v.34 no.3
    • /
    • pp.58-68
    • /
    • 2018
  • Distributed computing helps to efficiently store and process large data on a cluster of multiple machines. The performance of distributed computing is greatly influenced depending on the state of the servers constituting the distributed system. In this paper, we propose a self-diagnosis system that collects log data in a distributed system, detects anomalies and visualizes the results in real time. First, we divide the self-diagnosis process into five stages: collecting, delivering, analyzing, storing, and visualizing stages. Next, we design a real-time self-diagnosis system that meets the goals of real-time, scalability, and high availability. The proposed system is based on Apache Flume, Apache Kafka, and Apache Storm, which are representative real-time distributed techniques. In addition, we use simple but effective moving average and 3-sigma based anomaly detection technique to minimize the delay of log data processing during the self-diagnosis process. Through the results of this paper, we can construct a distributed real-time self-diagnosis solution that can diagnose server status in real time in a complicated distributed system.

A Study on the Data Collection Methods based Hadoop Distributed Environment (하둡 분산 환경 기반의 데이터 수집 기법 연구)

  • Jin, Go-Whan
    • Journal of the Korea Convergence Society
    • /
    • v.7 no.5
    • /
    • pp.1-6
    • /
    • 2016
  • Many studies have been carried out for the development of big data utilization and analysis technology recently. There is a tendency that government agencies and companies to introduce a Hadoop of a processing platform for analyzing big data is increasing gradually. Increased interest with respect to the processing and analysis of these big data collection technology of data has become a major issue in parallel to it. However, study of the collection technology as compared to the study of data analysis techniques, it is insignificant situation. Therefore, in this paper, to build on the Hadoop cluster is a big data analysis platform, through the Apache sqoop, stylized from relational databases, to collect the data. In addition, to provide a sensor through the Apache flume, a system to collect on the basis of the data file of the Web application, the non-structured data such as log files to stream. The collection of data through these convergence would be able to utilize as a basic material of big data analysis.

A Security Log Analysis System using Logstash based on Apache Elasticsearch (아파치 엘라스틱서치 기반 로그스태시를 이용한 보안로그 분석시스템)

  • Lee, Bong-Hwan;Yang, Dong-Min
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.22 no.2
    • /
    • pp.382-389
    • /
    • 2018
  • Recently cyber attacks can cause serious damage on various information systems. Log data analysis would be able to resolve this problem. Security log analysis system allows to cope with security risk properly by collecting, storing, and analyzing log data information. In this paper, a security log analysis system is designed and implemented in order to analyze security log data using the Logstash in the Elasticsearch, a distributed search engine which enables to collect and process various types of log data. The Kibana, an open source data visualization plugin for Elasticsearch, is used to generate log statistics and search report, and visualize the results. The performance of Elasticsearch-based security log analysis system is compared to the existing log analysis system which uses the Flume log collector, Flume HDFS sink and HBase. The experimental results show that the proposed system tremendously reduces both database query processing time and log data analysis time compared to the existing Hadoop-based log analysis system.

Big Data Platform for Utilizing and Analyzing Real-Time Sensing Information in Industrial Sites (산업현장 실시간 센싱정보 활용/분석을 위한 빅데이터 플랫폼)

  • Lee, Yonghwan;Suh, Jinhyung
    • Journal of Creative Information Culture
    • /
    • v.6 no.1
    • /
    • pp.15-21
    • /
    • 2020
  • In order to utilize big data in general industrial sites, the structured big data collected from facilities, processes, and environments of industrial sites must first be processed and stored, and in the case of unstructured data, it must be stored as unstructured data or converted into structured data and stored in a database. In this paper, we study a method of collecting big data based on open IoT standards that can converge and utilize measurement information, environmental information of industrial sites to collect big data. The platform for collecting big data proposed in this paper is capable of collecting, processing, and storing big data at industrial sites to process real-time sensing information. For processing and analyzing data according to the purpose of the stored industrial, various big data technologies also can be applied.