• Title/Summary/Keyword: stream big data (스트림 빅데이터)

Squall: A Real-time Big Data Processing Framework based on TMO Model for Real-time Events and Micro-batch Processing (Squall: 실시간 이벤트와 마이크로-배치의 동시 처리 지원을 위한 TMO 모델 기반의 실시간 빅데이터 처리 프레임워크)

  • Son, Jae Gi;Kim, Jung Guk
    • Journal of KIISE
    • /
    • v.44 no.1
    • /
    • pp.84-94
    • /
    • 2017
  • Recently, the importance of velocity, one of the 5V characteristics of big data (Volume, Variety, Velocity, Veracity, and Value), has been emphasized in data processing, which has led to several studies on real-time stream processing, a technology for the quick and accurate processing and analysis of big data. In this paper, we propose the Squall framework, which uses Time-triggered Message-triggered Object (TMO) technology, a model widely used for processing real-time big data, and describe the framework and its operation on a single node. TMO is an object model that supports non-regular real-time processing triggered by certain conditions as well as regular periodic processing over a given period. The Squall framework supports both real-time event streams and micro-batch processing of big data with outstanding performance compared to Apache Storm and Spark Streaming. However, additional development is still needed to process real-time streams across multiple nodes, a capability common to most frameworks. In conclusion, the advantages of the TMO model can overcome the drawbacks of Apache Storm and Spark Streaming in real-time big data processing, and the model shows potential as a useful basis for such systems.
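The abstract's core idea, an object that combines a time-triggered (periodic) method with a message-triggered (on-demand) method, can be illustrated with a minimal sketch. This is a generic toy in Python, not the Squall or TMO API; all names are hypothetical.

```python
import threading
import time

class TMObject:
    """Toy illustration of the TMO idea: a time-triggered 'spontaneous'
    method fires on a fixed period (micro-batch side), while a
    message-triggered 'service' method runs whenever an event arrives
    (real-time side). Names are illustrative, not the Squall API."""

    def __init__(self, period_sec):
        self.period_sec = period_sec
        self.batch = []          # events gathered for the next micro-batch
        self.batches_done = []   # results of periodic (time-triggered) runs
        self._stop = threading.Event()
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._spontaneous_loop, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    # Message-triggered part: invoked once per incoming real-time event.
    def service_method(self, event):
        with self._lock:
            self.batch.append(event)

    # Time-triggered part: fires every period and drains the micro-batch.
    def _spontaneous_loop(self):
        while not self._stop.wait(self.period_sec):
            with self._lock:
                drained, self.batch = self.batch, []
            if drained:
                self.batches_done.append(sum(drained))  # stand-in analysis

tmo = TMObject(period_sec=0.05)
tmo.start()
for i in range(10):          # simulate a real-time event stream
    tmo.service_method(i)
    time.sleep(0.01)
time.sleep(0.1)              # allow at least one final periodic drain
tmo.stop()
print(sum(tmo.batches_done) + sum(tmo.batch))
```

Events accepted by the service method are never lost: they are either already aggregated by a periodic run or still waiting in the current batch.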

A Design on a Streaming Big Data Processing System (스트리밍 빅데이터 처리 시스템 설계)

  • Kim, Sungsook;Kim, GyungTae;Park, Kiejin
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2015.10a
    • /
    • pp.99-101
    • /
    • 2015
  • The large volumes of structured and unstructured stream data currently pouring out of diverse sensor devices exceed what a single conventional stream-processing system can handle. Spark, which processes large data sets in cluster memory rather than on disk, is a platform that provides strong data consistency and real-time performance despite being a distributed system. In this study, to address the memory shortages and real-time parallel-processing problems that arise when analyzing large-scale stream data, we configure the system to perform distributed processing of large data sets and real-time stream processing simultaneously using cluster memory. Through experiments, we confirm the performance difference between the conventional batch-processing approach and the proposed system.
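The processing model the abstract relies on, cutting a continuous stream into micro-batches and aggregating each batch across in-memory partitions, can be sketched without a cluster. This is a plain-Python stand-in for the Spark-style pipeline, with illustrative sizes and a toy aggregation:

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Group an unbounded record stream into fixed-size micro-batches,
    the model Spark Streaming uses: each batch is then processed as a
    small in-memory job instead of touching disk per record."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def process_batch(batch, n_partitions=4):
    # Hash-partition the batch across in-memory "workers", aggregate
    # per partition, then merge -- a stand-in for a cluster-wide reduce.
    partitions = [[] for _ in range(n_partitions)]
    for rec in batch:
        partitions[hash(rec) % n_partitions].append(rec)
    return sum(sum(p) for p in partitions)

sensor_stream = range(100)   # stand-in for a continuous sensor feed
totals = [process_batch(b) for b in micro_batches(sensor_stream, 25)]
print(totals)
```

Because each batch lives entirely in memory, the per-batch latency stays bounded by the batch interval rather than by disk I/O, which is the trade-off the paper exploits.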

A Study on InfiniBand Network Low-Latency Assure for High Speed Processing of Exponential Transaction (폭증스트림 고속 처리를 위한 InfiniBand 환경에서의 Low-Latency 보장 연구)

  • Jung, Hyedong;Hong, Jinwoo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2013.11a
    • /
    • pp.259-261
    • /
    • 2013
  • 금융 IT와 같은 분야에서는 빅데이터의 큰 특징 중 하나인 Velocity의 개선이 가장 큰 문제이다. 이는 산업의 특성상 승자가 시장을 독식하는 구조로 0.1 초라도 빠른 시스템 속도를 확보하면 시장 경쟁력이 매우 크기 때문이다. 비단 금융 IT 뿐만 아니라 다른 산업들도 최근 보다 빠른 속도의 데이터 처리에 매우 민감하게 반응하는 환경으로 변화하고 있으므로 이에 대한 솔루션이 필요하며 본 연구에서는 폭증스트림의 고속처리를 위한 Low-Latency에 대한 다양한 실험과 환경 구축을 통해 빅데이터의 Velocity 문제를 해결할 수 있는 방안을 제시한다.

Study on the Sensor Gateway for Receive the Real-Time Big Data in the IoT Environment (IoT 환경에서 실시간 빅 데이터 수신을 위한 센서 게이트웨이에 관한 연구)

  • Shin, Seung-Hyeok
    • Journal of Advanced Navigation Technology
    • /
    • v.19 no.5
    • /
    • pp.417-422
    • /
    • 2015
  • The service scale of an IoT environment is determined by the number of sensors, and as the number of sensors increases, so does the amount of data the environment generates. Previous studies have examined dynamic buffering to keep a network operating reliably under congestion, as well as the processing of stream data in connectionless network environments. In this study, we propose a sensor gateway for processing the big data of an IoT environment. To this end, we review RESTful principles for designing the sensor middleware and apply a double-buffering algorithm to process stream data efficiently. Finally, to evaluate the proposed system, we generate big-data traffic using an MJPEG stream carried over HTTP on TCP, receive the video with the open-source media player VLC, and compare throughput performance.
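The double-buffering algorithm the gateway applies can be sketched in a few lines: the receiver fills a front buffer while the processor drains the back buffer, and the two are swapped atomically, so sensor bursts do not block processing. A minimal single-threaded sketch (the paper's gateway would do the swap under a lock between receive and processing threads; names here are illustrative):

```python
class DoubleBuffer:
    """Sketch of the double-buffer idea: incoming packets go into the
    'front' buffer while the 'back' buffer is being drained; swapping
    the two hands a whole burst to the processor at once."""

    def __init__(self):
        self.front = []   # filled by the receive side
        self.back = []    # drained by the processing side

    def receive(self, packet):
        self.front.append(packet)

    def swap_and_drain(self):
        # Swap roles, then hand the processor everything received so far.
        self.front, self.back = self.back, self.front
        drained, self.back = self.back, []
        return drained

buf = DoubleBuffer()
for p in ("p1", "p2", "p3"):   # a burst from the sensors
    buf.receive(p)
out = buf.swap_and_drain()
print(out)
```

After the swap, new packets land in an empty front buffer, so receiving never waits on the (possibly slow) stream processor.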

In-memory Compression Scheme Based on Incremental Frequent Patterns for Graph Streams (그래프 스트림 처리를 위한 점진적 빈발 패턴 기반 인-메모리 압축 기법)

  • Lee, Hyeon-Byeong;Shin, Bo-Kyoung;Bok, Kyoung-Soo;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association
    • /
    • v.22 no.1
    • /
    • pp.35-46
    • /
    • 2022
  • Recently, with the development of network technologies and the active use of IoT and social network service applications, large volumes of graph stream data are being generated. In this paper, we propose a graph compression scheme for the streaming environment that applies graph mining to existing compression techniques, which have focused on compression rate and runtime: an incremental frequent-pattern-based compression technique for graph streams. Because the proposed scheme keeps only the latest reference patterns, it increases storage utilization and improves query processing time. To show the superiority of the proposed scheme, we perform various performance evaluations against the existing method in terms of compression rate and processing time; the proposed scheme is faster than a comparable existing scheme when the amount of duplicated data is large.
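The mechanism described, promoting frequent patterns to a reference table, emitting short reference ids instead of raw data, and keeping only the most recent patterns, can be sketched on a single-edge "pattern" granularity. Window size, pattern measure, and eviction policy below are illustrative choices, not the paper's parameters:

```python
from collections import Counter, OrderedDict

def compress_stream(edges, window=4, max_patterns=2, min_count=2):
    """Toy sketch of incremental frequent-pattern compression for a
    graph stream: count edges inside a sliding window, promote frequent
    edges to a reference table, and emit either a short reference id or
    the raw edge. Only the most recent `max_patterns` patterns are kept,
    mirroring the 'latest reference patterns' idea."""
    refs = OrderedDict()   # edge -> reference id, newest last
    next_id = 0
    out, recent = [], []
    for e in edges:
        recent.append(e)
        if len(recent) > window:
            recent.pop(0)
        if Counter(recent)[e] >= min_count and e not in refs:
            refs[e] = next_id
            next_id += 1
            if len(refs) > max_patterns:   # evict the oldest pattern
                refs.popitem(last=False)
        out.append(("REF", refs[e]) if e in refs else ("EDGE", e))
    return out, refs

stream = [("a", "b"), ("a", "b"), ("c", "d"), ("a", "b")]
compressed, refs = compress_stream(stream)
print(compressed)
```

Repeated edges shrink to a `("REF", id)` tuple after their second appearance, which is where the compression gain on highly duplicated streams comes from.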

Efficient Processing of an Aggregate Query Stream in MapReduce (맵리듀스에서 집계 질의 스트림의 효율적인 처리 기법)

  • Choi, Hyunjean;Lee, Ki Yong
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.3 no.2
    • /
    • pp.73-80
    • /
    • 2014
  • MapReduce is a widely used programming model for analyzing and processing Big data. Aggregate queries are one of the most common types of queries used for analyzing Big data. In this paper, we propose an efficient method for processing an aggregate query stream, where many concurrent users continuously issue different aggregate queries on the same data. Instead of processing each aggregate query separately, the proposed method processes multiple aggregate queries together in a batch by a single, optimized MapReduce job. As a result, the number of queries processed per unit time increases significantly. Through various experiments, we show that the proposed method improves the performance significantly compared to a naive method.
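The batching idea is concrete enough to sketch: tag every map output with the id of the query it belongs to, so one shuffle/reduce pass answers all concurrent aggregate queries. The snippet below simulates the map and reduce phases in plain Python; query names, record schema, and the sum-only reducer are illustrative assumptions:

```python
from collections import defaultdict

def run_batched_aggregates(records, queries):
    """Sketch of batched aggregate-query processing: instead of one
    MapReduce job per query, a single job keys each map output by
    (query id, group key), so one shuffle/reduce answers every query.
    `queries` maps a query id to (group_fn, value_fn)."""
    # Map phase: one pass over the data emits keyed values for every query.
    shuffled = defaultdict(list)
    for rec in records:
        for qid, (group_fn, value_fn) in queries.items():
            shuffled[(qid, group_fn(rec))].append(value_fn(rec))
    # Reduce phase: aggregate each (query, group) bucket.
    return {key: sum(vals) for key, vals in shuffled.items()}

sales = [{"region": "east", "amount": 10},
         {"region": "west", "amount": 5},
         {"region": "east", "amount": 7}]
queries = {
    "q1_total_by_region": (lambda r: r["region"], lambda r: r["amount"]),
    "q2_count_all":       (lambda r: "*",         lambda r: 1),
}
result = run_batched_aggregates(sales, queries)
print(result)
```

The data is scanned once regardless of how many queries are registered, which is why queries processed per unit time rise as the batch grows.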

Development of CEP-based Real Time Analysis System Using Hospital ERP System (병원 ERP시스템을 적용한 CEP 기반 실시간 분석시스템 개발)

  • Kim, Mi-Jin;Yu, Yun-Sik;Seo, Young-Woo;Jang, Jong-Wook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2015.05a
    • /
    • pp.290-293
    • /
    • 2015
  • Although an individual data point may not matter much for business, data collected in volume become a corpus in which hidden new information may be discovered, and use cases for big data analysis are steadily increasing. Among big data analysis technologies, Hadoop, the traditional approach, has long been and is still widely used for analyzing structured and unstructured big data. However, Hadoop is a batch-oriented processing system, and as data volumes grow the likelihood of response delays increases, making real-time analysis of the enormous volumes of high-speed event data in today's business and market environments difficult. In this paper, as an alternative for a rapidly changing business environment, we develop a real-time analysis system based on open-source Complex Event Processing (CEP) technology that can analyze event streams of hundreds to hundreds of thousands of events per second in real time without delay, and apply it to a hospital ERP system.
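CEP's core move, evaluating rules over events as they stream past instead of storing them first, can be illustrated with one sliding-window threshold rule. This is a generic sketch, not the hospital ERP system's rule set; event types, window, and threshold are made up:

```python
from collections import deque

def cep_threshold_rule(events, event_type, threshold, window_sec):
    """Minimal CEP-style rule: scan a time-stamped event stream once
    and fire an alert whenever at least `threshold` events of
    `event_type` fall inside a sliding time window of `window_sec`."""
    recent = deque()   # timestamps of matching events still in the window
    alerts = []
    for ts, etype in events:
        if etype != event_type:
            continue
        recent.append(ts)
        while recent and ts - recent[0] > window_sec:
            recent.popleft()            # expire events that left the window
        if len(recent) >= threshold:
            alerts.append(ts)           # rule fires at this timestamp
    return alerts

stream = [(0, "login"), (1, "error"), (2, "error"), (3, "error"), (9, "error")]
alerts = cep_threshold_rule(stream, "error", threshold=3, window_sec=5)
print(alerts)
```

Each event is touched once and only the in-window timestamps are retained, which is how a CEP engine keeps per-event latency flat even at high event rates.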


Suggestions on how to convert official documents to Machine Readable (공문서의 기계가독형(Machine Readable) 전환 방법 제언)

  • Yim, Jin Hee
    • The Korean Journal of Archival Studies
    • /
    • no.67
    • /
    • pp.99-138
    • /
    • 2021
  • In the era of big data, analyzing not only structured data but also unstructured data is emerging as an important task. Official documents produced by government agencies, as large text-based unstructured data, are also subject to big data analysis. From the perspectives of internal work efficiency, knowledge management, and records management, it is necessary to analyze official documents as big data to derive useful implications. However, since many of the official documents currently held by public institutions are not in an open format, a pre-processing step that extracts text from a bitstream is required before big data analysis. In addition, since contextual metadata is not sufficiently stored in the document files, separate efforts to secure metadata are required for high-quality analysis. In conclusion, current official documents have a low level of machine readability, which makes their big data analysis expensive.

A Study on the Data Collection Methods based Hadoop Distributed Environment (하둡 분산 환경 기반의 데이터 수집 기법 연구)

  • Jin, Go-Whan
    • Journal of the Korea Convergence Society
    • /
    • v.7 no.5
    • /
    • pp.1-6
    • /
    • 2016
  • Many studies have recently been carried out to develop big data utilization and analysis technology, and government agencies and companies are gradually adopting Hadoop as a processing platform for analyzing big data. With this growing interest in big data processing and analysis, data collection technology has become a major issue as well. However, compared with the study of data analysis techniques, the study of collection techniques has been insignificant. Therefore, in this paper we build a big data analysis platform on a Hadoop cluster and collect structured data from relational databases through Apache Sqoop. In addition, we provide a system that uses Apache Flume to collect unstructured data, such as sensor streams and the log files of web applications, on a file basis. The data gathered through this combination can serve as source material for big data analysis.

The Method for Extracting Meaningful Patterns Over the Time of Multi Blocks Stream Data (시간의 흐름과 위치 변화에 따른 멀티 블록 스트림 데이터의 의미 있는 패턴 추출 방법)

  • Cho, Kyeong-Rae;Kim, Ki-Young
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.3 no.10
    • /
    • pp.377-382
    • /
    • 2014
  • Analysis techniques for data that arrive over time from mobile and IoT environments are mainly used to extract patterns from the collected data and find meaningful information. Existing analytical methods, however, assume that data collection is complete before analysis begins, which makes it difficult to reflect changes in time-series data as time passes. In this paper, we introduce an analysis method for multi-block stream data (AM-MBSD) to analyze data streams with properties such as pattern variability, large volume, and continuity. Multi-block stream data are defined as a sequence of continuously generated data blocks, and meaningful patterns are extracted from each block using the proposed method. The extracted patterns are then examined with respect to generation time, frequency, and collection errors through experiments on time-series data.
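The block-wise view the abstract describes, treating the continuous stream as consecutive blocks and extracting a pattern per block together with when and how often it occurred, can be sketched roughly. Block boundaries and the "dominant value" pattern measure below are illustrative stand-ins, not the AM-MBSD algorithm itself:

```python
from collections import Counter

def extract_block_patterns(stream, block_size):
    """Rough sketch of a multi-block stream analysis: split the stream
    into consecutive fixed-size blocks and, for each block, record the
    dominant pattern along with its position (generation time) and
    frequency within the block."""
    results = []
    for start in range(0, len(stream), block_size):
        block = stream[start:start + block_size]
        pattern, freq = Counter(block).most_common(1)[0]
        results.append({"block_start": start, "pattern": pattern, "frequency": freq})
    return results

readings = ["A", "A", "B", "C", "C", "C", "B", "B", "A", "B", "B", "B"]
summary = extract_block_patterns(readings, block_size=4)
print(summary)
```

Because each block is summarized as it closes, the dominant pattern can be seen drifting over time (here A, then C, then B) without waiting for collection to finish, which is the limitation of whole-dataset methods the paper targets.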