Search | Korea Science

Anomaly Detection Technique of Log Data Using Hadoop Ecosystem (하둡 에코시스템을 활용한 로그 데이터의 이상 탐지 기법)

Son, Siwoon;Gil, Myeong-Seon;Moon, Yang-Sae
- KIISE Transactions on Computing Practices, v.23 no.2, pp.128-133, 2017
In recent years, the number of systems for analyzing large volumes of data has been increasing. Hadoop, a representative big data system, stores and processes large data sets in a distributed environment of multiple servers, where system-resource management is very important. The authors attempted to detect anomalies in rapidly changing log data collected from multiple servers using simple but efficient anomaly-detection techniques. To this end, an Apache Hive storage architecture was designed to store the log data collected from the multiple servers in the Hadoop ecosystem, and three anomaly-detection techniques were designed based on the moving-average and 3-sigma concepts. All three techniques detected the abnormal intervals correctly, and the weighted anomaly-detection technique was more precise than the basic techniques. These results suggest that simple techniques are an effective approach to detecting log-data anomalies in the Hadoop ecosystem.
https://doi.org/10.5626/KTCP.2017.23.2.128 KSCI
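
The abstract above names the moving-average and 3-sigma concepts but gives no parameters, so the following is only a minimal sketch of the general idea, not the authors' implementation. The function name `detect_anomalies`, the window size, the threshold `k`, and the sample values are all assumptions made for illustration; the paper's weighted variant is not reproduced here because the abstract does not describe its weights.

```python
from statistics import mean, pstdev


def detect_anomalies(values, window=5, k=3.0):
    """Return indices whose value deviates from the moving average of the
    preceding `window` points by more than k standard deviations.

    `window` and `k` are illustrative defaults, not values from the paper.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]   # the preceding window only
        mu = mean(history)               # moving average
        sigma = pstdev(history)          # population standard deviation
        if abs(values[i] - mu) > k * sigma:
            anomalies.append(i)
    return anomalies


# Hypothetical per-interval metric extracted from server logs.
cpu_load = [10, 11, 10, 12, 11, 10, 11, 50, 11, 10]
print(detect_anomalies(cpu_load))  # [7]
```

Excluding the current point from its own window keeps a single spike from inflating the threshold it is tested against. The spike does enter later windows, which temporarily widens the band for the points that follow it.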

Anomaly Detection of Hadoop Log Data Using Moving Average and 3-Sigma (이동 평균과 3-시그마를 이용한 하둡 로그 데이터의 이상 탐지)

Son, Siwoon;Gil, Myeong-Seon;Moon, Yang-Sae;Won, Hee-Sun
- KIPS Transactions on Software and Data Engineering, v.5 no.6, pp.283-288, 2016
In recent years, there have been many research efforts on Big Data, and many companies have developed a variety of related products. As a result, it is now possible to store and analyze large volumes of log data that were difficult to handle in traditional computing environments. To handle the large volume of log data generated rapidly on multiple servers, this paper designs a new data storage architecture for efficiently analyzing such log data through Apache Hive. We then design and implement anomaly detection methods, based on moving average and 3-sigma techniques, that identify abnormal server status from log data. We also show the effectiveness of the proposed methods by demonstrating that they identify anomalies correctly. These results indicate that the proposed anomaly detection is an effective approach for detecting anomalies in Hadoop log data.
https://doi.org/10.3745/KTSDE.2016.5.6.283 KSCI

Performance Optimization Strategies for Fully Utilizing Apache Spark (아파치 스파크 활용 극대화를 위한 성능 최적화 기법)

Myung, Rohyoung;Yu, Heonchang;Choi, Sukyong
- KIPS Transactions on Computer and Communication Systems, v.7 no.1, pp.9-18, 2018
Enhancing the performance of big data analytics in distributed environments has become an important issue because most big data applications, such as machine learning and streaming services, rely on distributed computing frameworks. Accordingly, optimizing the performance of such applications on Spark has been actively researched. Optimizing applications in a distributed environment is challenging because it requires not only optimizing the applications themselves but also tuning the configuration parameters of the distributed system. Although prior research has made considerable efforts to improve execution performance, most of it focused on only one of three performance optimization aspects: application design, system tuning, and hardware utilization. As a result, it could not orchestrate those aspects together. In this paper, we analyze and model the application processing procedure of Spark in depth. Based on the analysis, we propose performance optimization schemes for each step of the procedure: inner stage and outer stage. We also propose an appropriate partitioning mechanism by analyzing the relationship between partitioning parallelism and application performance. We applied these three performance optimization schemes to WordCount, PageRank, and K-means, which are basic big data analytics workloads, and observed a performance improvement of nearly 50% when all of the schemes were applied.
https://doi.org/10.3745/KTCCS.2018.7.1.9

Aspects of Desert Combat as Seen Through the Gulf War (걸프(Gulf)전을 통해 살펴본 사막의 전투 양상)

Heo, Seon-Mu
- Defense and Technology, no.3 s.145, pp.58-69, 1991
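As General "von Ravenstein" said during World War II, the desert is a paradise for the tactician but a nightmare for the quartermaster, a saying that captures the difficulty of desert combat. High heat and humidity in the desert greatly reduce the operational efficiency of troops, and in the Gulf War in particular, dust and sandstorms have exposed the limits of advanced equipment. AH-64 "Apache" helicopters have their engines washed every day, and M1A1 "Abrams" tanks have the air filters of their gas-turbine engines replaced every two days. Radios and similar equipment also fail frequently after prolonged use at high temperatures, and the heat has overdriven the air-purification equipment of the "Patriot" control unit, causing malfunctions.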

An Analysis of System calls for Web Server : Apache 2.0 MPM-worker (하이브리드 멀티 프로세스 멀티 스래드 방식 웹서버의 시스템 호출 오버해드 분석)

Yeom, Mi-Ryeong
- Proceedings of the Korea Information Processing Society Conference, 2003.05b, pp.1349-1352, 2003
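A web server spends most of its CPU time, $75{\sim}78\%$, in system code and spends less time in user code than one might expect. This suggests that the operating system has a large influence on web server performance. In this paper, we use the Linux Trace Toolkit to examine the behavior and role of the system calls invoked while the hybrid multi-process, multi-thread Apache web server is running, and we analyze which system code incurs large overhead.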

Design and Implementation of GIS using Servlet on the Internet (인터넷에서 서블릿을 이용한 지리정보시스템의 설계 및 구현)

김병학
- Proceedings of the Korea Institute of Convergence Signal Processing, 2001.06a, pp.49-52, 2001
This paper describes the design and implementation of a Geographic Information Retrieval System for ArcView. The system environment consists of a PC server running the Linux operating system, the Apache web server, and Oracle as the database engine. In addition, JSP (JavaServer Pages) and servlets are used to view the database and map images.

A the internet distance education system development (인터넷 원격 교육 시스템 개발)

김도원;김윤미;최성
- Proceedings of the KAIS Fall Conference, 2001.05a, pp.338-341, 2001
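This paper introduces the overall development principles of courseware design and authoring for distance education, together with the algorithms of the individual subsystems. Linux supports a wide range of protocols and can provide services on lower-specification hardware than the Windows family, which makes it suitable for school education, where budgets are tightly constrained. The development environment of the distance education system uses the Apache server as the web server, component-based JavaBeans for the applications behind the courseware design and authoring modules, PHP for web documents, and MySQL as the DBMS.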

Analysis of Multi-thread Fool Utilization Scheme on the Apache Web Server (아파치 웹 서버에서의 다중 쓰레드 풀 활용 기법 분석)

Jeon Heung Seok;Lee Seung Won;Kang Hyun Kyu
- Journal of KIISE:Computer Systems and Theory, v.32 no.1, pp.21-28, 2005
Web servers and web application servers generally adopt a multi-thread model to handle many user requests efficiently. However, the multi-thread model does not always perform better than the multi-process model, and in certain cases it can perform worse. In this paper, to trace the cause of the decreased performance of the multi-thread model, we experiment with and analyze its performance using two approaches. First, we compare the performance of the multi-process model and the multi-thread model in various application environments. Second, we observe the effects of varying the web server's dynamic directives, which are used to increase the flexibility of the web server across system environments. For the experiments, we integrated a web client simulator that we wrote with the Apache 2.0 web server. This paper presents and analyzes the results of the experiments.
KSCI

Design and Implementation of a Search Engine based on Apache Spark (아파치 스파크 기반 검색엔진의 설계 및 구현)

Park, Ki-Sung;Choi, Jae-Hyun;Kim, Jong-Bae;Park, Jae-Won
- Journal of the Korea Institute of Information and Communication Engineering, v.21 no.1, pp.17-28, 2017
Recently, research on data has been actively conducted as the value of data has grown. The web crawler, a program for collecting data, has recently attracted attention because it can be used in various fields. A web crawler can be defined as a tool that analyzes web pages and collects URLs by traversing web servers in an automated manner. For processing big data, distributed web crawlers based on Hadoop MapReduce are widely used, but they are difficult to use and have performance constraints. Apache Spark, an in-memory computing platform, is an alternative to MapReduce. A search engine, one of the main applications of a web crawler, displays the information gathered by the crawler that matches the keywords you search for. If search engines adopt a Spark-based web crawler instead of a traditional MapReduce-based web crawler, data collection is expected to be faster.
https://doi.org/10.6109/jkiice.2017.21.1.17 KSCI
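
The abstract's definition of a web crawler (analyze pages, collect URLs, traverse automatically) can be illustrated with a small single-machine sketch. This is not the paper's Spark-based design: it uses only the Python standard library, and the names `LinkCollector`, `crawl`, and `fetch`, as well as the in-memory `PAGES` site, are hypothetical and stand in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first traversal; `fetch(url)` must return the page's HTML."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        parser = LinkCollector(url)
        parser.feed(fetch(url))
        for link in parser.links:
            if link not in seen:      # never enqueue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical three-page site held in memory instead of fetched over HTTP.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a>',
}
print(crawl("http://example.test/", lambda url: PAGES.get(url, "")))
```

Passing `fetch` as a parameter keeps the traversal logic independent of how pages are retrieved, which is also the part a distributed crawler would parallelize.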

A System Design for Real-Time Monitoring of Patient Waiting Time based on Open-Source Platform (오픈소스 플랫폼 기반의 실시간 환자 대기시간 모니터링 시스템 설계)

Ryu, Wooseok
- Journal of the Korea Institute of Information and Communication Engineering, v.22 no.4, pp.575-580, 2018
This paper discusses a system for real-time monitoring of patient waiting time in hospitals based on an open-source platform. Developing a high-performance stream processing system, which analyzes and processes stream data in real time, at lower cost requires making use of open-source projects. The Hadoop ecosystem is a well-known big data processing platform consisting of numerous open-source subprojects. This paper first defines several requirements for the monitoring system and selects a few projects from the Hadoop ecosystem that are suited to meeting them. The paper then proposes a system architecture and a detailed module design using Apache Spark, Apache Kafka, and so on. The proposed system can reduce development costs by using open-source projects and by acquiring data from the legacy hospital information system. High performance and fault tolerance can also be achieved through distributed processing.
https://doi.org/10.6109/jkiice.2018.22.4.575 KSCI
