• Title/Summary/Keyword: apache


Anomaly Detection of Hadoop Log Data Using Moving Average and 3-Sigma (이동 평균과 3-시그마를 이용한 하둡 로그 데이터의 이상 탐지)

  • Son, Siwoon;Gil, Myeong-Seon;Moon, Yang-Sae;Won, Hee-Sun
    • KIPS Transactions on Software and Data Engineering / v.5 no.6 / pp.283-288 / 2016
  • In recent years, there have been many research efforts on Big Data, and many companies have developed a variety of relevant products. Accordingly, we are now able to store and analyze large volumes of log data, which were difficult to handle in traditional computing environments. To handle large volumes of log data generated rapidly by multiple servers, in this paper we design a new data storage architecture that analyzes such log data efficiently through Apache Hive. We then design and implement anomaly detection methods, based on moving average and 3-sigma techniques, which identify the abnormal status of servers from log data. We also show the effectiveness of the proposed detection methods by demonstrating that they identify anomalies correctly. These results indicate that the proposed anomaly detection is a suitable approach for detecting anomalies in Hadoop log data.
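The moving average and 3-sigma idea named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window size, the use of the preceding window as the baseline, and the sample counts are all assumptions, and the abstract's Hive storage layer is not modeled.

```python
from statistics import mean, stdev

def detect_anomalies(values, window=5, k=3.0):
    """Return indices whose value lies more than k standard deviations
    away from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)       # moving average of the preceding window
        sigma = stdev(history)   # sample standard deviation of that window
        # A perfectly flat window (sigma == 0) is skipped in this sketch.
        if sigma > 0 and abs(values[i] - mu) > k * sigma:
            anomalies.append(i)
    return anomalies

# Hypothetical per-minute error counts taken from one server's log.
counts = [10, 12, 11, 13, 12, 11, 95, 12, 10, 11]
anomalous_indices = detect_anomalies(counts)
print(anomalous_indices)
```

Note that a spike inflates the standard deviation of the windows that follow it, so points right after an anomaly are judged against a looser band.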

Utility of B-type Natriuretic Peptide in Patients with Acute Respiratory Distress Syndrome (급성호흡곤란증후군 환자에 있어서 B-type Natriuretic Peptide의 유용성)

  • Rhee, Chin Kook;Joo, Young Bin;Kim, Seok Chan;Park, Sung Hak;Lee, Sook Young;Koh, Yoon Seok;Kim, Young Kyoon
    • Tuberculosis and Respiratory Diseases / v.62 no.5 / pp.389-397 / 2007
  • Background: B-type natriuretic peptide (BNP) has been shown to be a strong predictor of mortality in a wide variety of cardiovascular syndromes. Little is known about BNP in patients with acute respiratory distress syndrome (ARDS). We studied whether BNP can predict mortality in patients with ARDS. Method: Echocardiography was performed in all patients with ARDS, and we excluded patients with a low ejection fraction (less than 50%) or any features of diastolic dysfunction. Forty-seven patients were enrolled between December 2003 and February 2006. Parameters including BNP were obtained within 24 hours of enrollment. Result: Mean BNP concentrations and APACHE II scores differed between survivors and nonsurvivors (BNP, $219.5 \pm 57.7$ pg/mL vs $492.3 \pm 88.8$ pg/mL, p=0.013; APACHE II score, $17.4 \pm 1.6$ vs $23.1 \pm 1.3$, p=0.009). With a BNP threshold of 585 pg/mL, the specificity for predicting mortality was 94%. An APACHE II threshold of 15.5 showed a sensitivity of 87%. The composite score 'APACHE II + $11 \times \log \mathrm{BNP}$' showed a sensitivity of 63% and a specificity of 82% at a threshold of 46.14. Conclusion: BNP concentrations and APACHE II scores were higher in nonsurvivors than in survivors among patients with ARDS who had a normal ejection fraction. BNP can predict mortality. Further study is needed.
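The composite score quoted in the abstract above can be written out as a short calculation. This is an illustrative sketch only, not a clinical tool: the abstract does not state the base of the logarithm or which side of the threshold counts as positive, so base 10 and "greater than" are assumptions here, and the function names are hypothetical.

```python
import math

THRESHOLD = 46.14  # threshold value reported in the abstract

def composite_score(apache_ii, bnp_pg_ml):
    # Assumption: "log" means base 10; the abstract does not say.
    return apache_ii + 11 * math.log10(bnp_pg_ml)

def exceeds_threshold(apache_ii, bnp_pg_ml):
    # Assumption: scores above the threshold are treated as positive.
    return composite_score(apache_ii, bnp_pg_ml) > THRESHOLD

# The reported group means, used purely as illustrative inputs.
survivor_mean = exceeds_threshold(17.4, 219.5)
nonsurvivor_mean = exceeds_threshold(23.1, 492.3)
print(survivor_mean, nonsurvivor_mean)
```

Under the base-10 assumption, the survivor group means fall below the threshold and the nonsurvivor group means fall above it, which is at least consistent with the direction reported in the abstract.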

Outcomes in Relation to Time of Tracheostomy in Patients with Mechanical Ventilation (기계호흡환자의 기관절개 시행 시기에 따른 결과 분석)

  • Shin, Jeong-Eun;Shin, Tae-Rim;Park, Young-Mi;Nam, Jun-Sik;Cheon, Seon-Hee;Chang, Jung-Hyun
    • Tuberculosis and Respiratory Diseases / v.47 no.3 / pp.365-373 / 1999
  • Background: Despite the widespread use of tracheostomy in the intensive care unit, the best timing of tracheostomy after endotracheal intubation under prolonged mechanical ventilation is still controversial. Early tracheostomy has the advantages of easier airway maintenance and enhanced patient mobility, but the disadvantages of nosocomial infection and tracheal stenosis. Methods: We conducted a retrospective study of 35 medical and 15 surgical ICU patients admitted to Ewha Womans University Mokdong Hospital from January 1996 to August 1998, observing APACHE III scores, the occurrence of nosocomial infections, and clinical outcomes during the 28 days after tracheostomy in early (n=25) vs. late (n=25) tracheostomy groups. We defined the 7th day after intubation as the cutoff between early and late tracheostomy. Results: There were 25 patients in each of the early and late tracheostomy groups. The mean age was $48 \pm 18$ years in the early tracheostomy group and $63 \pm 17$ years in the late tracheostomy group, so patients in the early group were younger. The median duration of intubation prior to tracheostomy was 3 days in the early group and 13 days in the late group. The organ systems responsible for the primary problem were, in decreasing order, nervous (27 cases, 54%), pulmonary (14, 28%), cardiovascular (4, 8%), gastrointestinal (4, 8%), and genitourinary (1, 2%). Prolonged ventilation was the most common reason for tracheostomy in both groups. APACHE III scores at the time of intubation and at the time of tracheostomy were slightly higher in the late tracheostomy group, but the difference was not statistically significant. Day-to-day APACHE III scores did not differ between the two groups up to the 7th day after tracheostomy. The occurrence of nosocomial infections, weaning from mechanical ventilation, and mortality showed no significant difference between the two groups during the 28 days after tracheostomy. Mortality increased as the APACHE III score up to 7 days after tracheostomy increased, but it did not increase with the timing of tracheostomy or with the number of days of ventilator use before tracheostomy. Conclusion: Early tracheostomy seems to have no benefit with respect to severity of illness, nosocomial infection, duration of ventilatory support, or mortality. This suggests that the timing of tracheostomy is best decided by clinical judgement in each case. A prospective, randomized case-control study is required to confirm these results.


Spark Framework Based on a Heterogenous Pipeline Computing with OpenCL (OpenCL을 활용한 이기종 파이프라인 컴퓨팅 기반 Spark 프레임워크)

  • Kim, Daehee;Park, Neungsoo
    • The Transactions of The Korean Institute of Electrical Engineers / v.67 no.2 / pp.270-276 / 2018
  • Apache Spark is a high-performance in-memory computing framework for big-data processing. Recently, general-purpose computing on graphics processing units (GPGPU) has been applied to the Apache Spark framework to improve its performance. Previous Spark-GPGPU frameworks focus on overcoming the implementation difficulty that results from the difference between the GPGPU computation environment and the Spark framework. In this paper, we propose a Spark framework based on heterogeneous pipeline computing with OpenCL to further improve performance. The proposed framework overlaps the Java-to-native memory copies on the CPU with CPU-GPU communications (DMA) and GPU kernel computations to hide CPU idle time. In addition, the CPU-GPU communication buffers are implemented as switching dual buffers, which shrink the mapped memory region and thereby reduce memory mapping overhead. Experimental results showed that the proposed framework was up to 2.13 times faster than the previous Spark framework using OpenCL.
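The switching dual-buffer pipeline mentioned in the abstract above can be sketched in plain Python. This is only a conceptual illustration under stated assumptions: a thread and two queues stand in for the copy stage and the GPU stage, no OpenCL or Spark API is used, and `run_pipeline` and its parameters are hypothetical names.

```python
import queue
import threading

def run_pipeline(chunks, num_buffers=2):
    """Fill one buffer while the other is being consumed (double buffering)."""
    free = queue.Queue()    # buffers ready to be refilled
    filled = queue.Queue()  # buffers ready to be processed
    for _ in range(num_buffers):
        free.put([])        # a small, fixed pool of reusable buffers
    results = []

    def producer():
        for chunk in chunks:
            buf = free.get()   # block until a buffer is free
            buf.clear()
            buf.extend(chunk)  # stands in for the Java-to-native copy
            filled.put(buf)
        filled.put(None)       # sentinel: no more data

    worker = threading.Thread(target=producer)
    worker.start()
    while True:
        buf = filled.get()
        if buf is None:
            break
        # Stands in for the GPU kernel: sum of squares of the chunk.
        results.append(sum(x * x for x in buf))
        free.put(buf)          # hand the buffer back for reuse
    worker.join()
    return results

results = run_pipeline([[1, 2], [3, 4], [5, 6]])
print(results)
```

Because only two buffers ever exist, the producer can run at most one chunk ahead of the consumer, which is the property that keeps the mapped region small in the scheme the abstract describes.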

System-Call-Level Core Affinity for Improving Network Performance (네트워크 성능향상을 위한 시스템 호출 수준 코어 친화도)

  • Uhm, Junyong;Cho, Joong-Yeon;Jin, Hyun-Wook
    • KIISE Transactions on Computing Practices / v.23 no.1 / pp.80-84 / 2017
  • Existing operating systems experience scalability issues as the number of cores increases. Network I/O performance on manycore systems is limited mainly by cache consistency costs and locking overheads. Existing methods for resolving this issue include new microkernel-like operating systems and modification of existing kernels; however, these solutions are not fully transparent to applications. In this study, we propose a library that improves network performance by separating the system call context from the user context and by applying core affinity, without any kernel or application modifications. Experimental results show that our implementation can improve the network throughput of Apache by up to 30%.

Design of InfiniBand RDMA-based Network Structure of Apache Storm (InfiniBand RDMA 기반 Apache Storm의 네트워크 구조 설계)

  • Yang, Seokwoo;Son, Siwoon;Choi, Seong-Yun;Choi, Mi-Jung;Moon, Yang-Sae
    • Proceedings of the Korea Information Processing Society Conference / 2017.11a / pp.679-681 / 2017
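  • Apache Storm is a real-time distributed parallel processing framework for large-volume data streams, and it can run many processes and threads concurrently. However, because it provides this multi-process, multi-thread environment, Storm performs many network system calls, which can overload the CPU through frequent context switches, buffer copies into the operating system, and buffer copies within the operating system. The problem becomes more severe with IPoIB (IP over InfiniBand) communication on InfiniBand, a high-performance network device: sending and receiving data that is small relative to the bandwidth InfiniBand supports causes even more frequent context switches and buffer copies. In this paper, we therefore address the CPU overload problem by presenting a design that applies InfiniBand RDMA (Remote Direct Memory Access) to Storm.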

Apache Storm based Query Filtering System for Multivariate Data Streams (다변량 데이터 스트림을 위한 아파치 스톰 기반 질의 필터링 시스템)

  • Kim, Youngkuk;Son, Siwoon;Moon, Yang-Sae
    • Proceedings of the Korea Information Processing Society Conference / 2018.10a / pp.561-564 / 2018
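  • Rapidly generated big data streams are now used in a variety of fields. Because collecting and processing all of this data is very uneconomical, a filtering step is needed to select the required data from the streams. In this paper, we build a query filtering system for data streams using Apache Storm. Storm is a real-time distributed parallel processing framework for large-volume data streams. However, Storm requires code modification, redeployment, and a restart whenever the input data structure or the algorithm changes. To solve this problem, we use Apache Kafka to separate the data collection module from the Storm processing module, which greatly increases system availability. In addition, we implement the system as a web-based client-server model so that users can access the query filtering system anytime and anywhere, and we implement a query parser that automatically analyzes queries entered through the web client, so that query filtering can be applied without modifying any program.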
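The query parser idea in the abstract above, turning a user-entered query into a filter over multivariate records without changing code, can be sketched as follows. The query syntax, field names, and sample records are assumptions made for illustration; the abstract does not specify them, and Storm and Kafka are not modeled here.

```python
import operator

# Comparison operators supported by this hypothetical query syntax.
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
}

def parse_query(query):
    """Parse 'field op value AND field op value ...' into predicates."""
    predicates = []
    for clause in query.split(" AND "):
        field, op, value = clause.split()
        predicates.append((field, OPS[op], float(value)))
    return predicates

def matches(record, predicates):
    """True when the record satisfies every predicate."""
    return all(op(record[field], value) for field, op, value in predicates)

# Hypothetical multivariate stream records.
stream = [
    {"temperature": 31.5, "humidity": 40.0},
    {"temperature": 25.0, "humidity": 55.0},
    {"temperature": 35.0, "humidity": 70.0},
]
predicates = parse_query("temperature > 30 AND humidity <= 60")
filtered = [r for r in stream if matches(r, predicates)]
print(filtered)
```

Since the predicates are built from the query string at run time, a new query changes the filter without redeploying the processing code, which is the kind of flexibility the abstract is after.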

Capturing Data from Untapped Sources using Apache Spark for Big Data Analytics (빅데이터 분석을 위해 아파치 스파크를 이용한 원시 데이터 소스에서 데이터 추출)

  • Nichie, Aaron;Koo, Heung-Seo
    • The Transactions of The Korean Institute of Electrical Engineers / v.65 no.7 / pp.1277-1282 / 2016
  • The term "Big Data" has been defined to encapsulate a broad spectrum of data sources and data formats. It is often described as unstructured data because of the variety of its formats. Even though the traditional method of structuring data in rows and columns has been reinvented as column families and key-value stores, or replaced entirely with JSON documents in document-based databases, the fact remains that data have to be reshaped to conform to a certain structure in order to be stored persistently on disk. ETL processes are key in restructuring data. However, ETL processes incur additional processing overhead and also require that data sources be maintained in predefined formats. Consequently, data in certain formats are completely ignored, because designing ETL processes to cater for all possible data formats is almost impossible. These unconsidered data sources could provide useful insights if incorporated into big data analytics. In this project, using the big data solution Apache Spark, we tapped into other sources of data stored in their raw formats, such as various text files and compressed files, and combined the data with persistently stored enterprise data in MongoDB for overall data analytics using the MongoDB Aggregation Framework and MapReduce. This differs significantly from traditional ETL systems in that it is compatible with any data format at the source.

A Human Movement Stream Processing System for Estimating Worker Locations in Shipyards

  • Duong, Dat Van Anh;Yoon, Seokhoon
    • International Journal of Internet, Broadcasting and Communication / v.13 no.4 / pp.135-142 / 2021
  • Estimating the locations of workers in a shipyard is beneficial for a variety of applications, such as selecting potential forwarders for transferring data in IoT services and quickly rescuing workers in the event of industrial disasters or accidents. In this work, we propose a human movement stream processing system for estimating worker locations in shipyards, based on Apache Spark and TensorFlow Serving. First, we use Apache Spark to process location data streams. Then, we design a worker location prediction model to estimate the locations of workers. TensorFlow Serving manages and executes the worker location prediction model. When clients send requests, Apache Spark extracts input data for the prediction model from the processed data and sends it to TensorFlow Serving to estimate the workers' locations. Worker movement data is needed to evaluate the proposed system, but no worker movement traces from shipyards are available. Therefore, we also develop a mobility model for generating workers' movements in shipyards. The proposed system is evaluated on this synthetic data. It achieves high performance and could be used for a variety of tasks in shipyards.

A Deep Learning Approach for Intrusion Detection

  • Roua Dhahbi;Farah Jemili
    • International Journal of Computer Science & Network Security / v.23 no.10 / pp.89-96 / 2023
  • Intrusion detection has been widely studied in both industry and academia, but cybersecurity analysts always want more accuracy and global threat analysis to secure their systems in cyberspace. Big data is a major challenge for intrusion detection systems, because such large volumes of data are hard to monitor and analyze with traditional techniques. Recently, deep learning has emerged as a new approach that enables the use of Big Data with a low training time and a high accuracy rate. In this paper, we propose an IDS based on cloud computing and the integration of big data and deep learning techniques to detect different attacks as early as possible. To demonstrate the efficacy of this system, we implement it within the Microsoft Azure cloud, which provides both processing power and storage capabilities, using a convolutional neural network (CNN-IDS) with the distributed computing environment Apache Spark, integrated with the Keras deep learning library. We study the performance of the model in two categories of classification (binary and multiclass) using the CSE-CIC-IDS2018 dataset. Our system performed well owing to the integration of deep learning with the Apache Spark engine.