• Title/Abstract/Keywords: Data Stream Processing

Search results: 445 items (processing time: 0.03 seconds)

Scalable Big Data Pipeline for Video Stream Analytics Over Commodity Hardware

  • Ayub, Umer;Ahsan, Syed M.;Qureshi, Shavez M.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 16, No. 4
    • /
    • pp.1146-1165
    • /
    • 2022
  • A huge amount of data in the form of videos and images is being produced owing to advancements in sensor technology. The use of low-performance commodity hardware, coupled with resource-heavy image processing and analysis approaches to infer and extract actionable insights from this data, poses a bottleneck for timely decision making. Current GPU-assisted and cloud-based video analysis architectures give significant performance gains, but their usage is constrained by financial considerations and extremely complex architecture-level details. In this paper we propose a data pipeline system that uses open-source tools such as Apache Spark, Kafka and OpenCV running over commodity hardware for video stream processing and image processing in a distributed environment. Experimental results show that our proposed approach eliminates the need for GPU-based hardware and cloud computing infrastructure to achieve efficient video stream processing for face detection with increased throughput, scalability and better performance.

Dynamic Load Management Method for Spatial Data Stream Processing on MapReduce Online Frameworks (맵리듀스 온라인 프레임워크에서 공간 데이터 스트림 처리를 위한 동적 부하 관리 기법)

  • Jeong, Weonil
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • Vol. 19, No. 8
    • /
    • pp.535-544
    • /
    • 2018
  • As mobile devices equipped with various sensors and high-quality wireless network communication functions become more widespread, the amount of spatio-temporal data generated by mobile devices in various service fields is rapidly increasing. For processing large volumes of real-time spatio-temporal streams, it is very difficult to apply a Hadoop-based spatial big data system, which is designed as a batch processing platform, to a real-time service for spatio-temporal data streams. This paper extends the MapReduce online framework to support real-time query processing for continuous-input, spatio-temporal data streams, and proposes a load management method that distributes overloads for efficient query processing. The proposed scheme dynamically balances load across nodes based on the inflow rate and the load factor of the input data in each spatial partition. Experiments show that efficient query processing can be supported by distributing the spatial data stream of the corresponding area to shared resources when load management is required in a specific area.
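As a rough illustration of the load management idea in this abstract, the sketch below computes a load factor per spatial partition from the tuples that arrived in one window and reassigns overloaded partitions to the least-loaded node. The uniform grid, the `capacity` and `threshold` parameters, and all function names are assumptions made for illustration; the abstract does not specify the paper's actual partitioning or balancing rules.

```python
from collections import Counter

def cell_of(x, y, cell_size=10.0):
    """Map a point to a grid cell (hypothetical uniform space partition)."""
    return (int(x // cell_size), int(y // cell_size))

def load_factors(points, capacity, cell_size=10.0):
    """Load factor per cell = tuples received in the window / assumed capacity."""
    counts = Counter(cell_of(x, y, cell_size) for x, y in points)
    return {cell: n / capacity for cell, n in counts.items()}

def overloaded_cells(factors, threshold=1.0):
    """Cells whose load factor exceeds the threshold, most loaded first."""
    return sorted((c for c, f in factors.items() if f > threshold),
                  key=lambda c: factors[c], reverse=True)

def rebalance(assignment, node_load, factors, threshold=1.0):
    """Move each overloaded cell to the currently least-loaded node."""
    new_assignment = dict(assignment)
    load = dict(node_load)
    for cell in overloaded_cells(factors, threshold):
        target = min(load, key=load.get)
        source = new_assignment[cell]
        if target != source:
            load[source] -= factors[cell]
            load[target] += factors[cell]
            new_assignment[cell] = target
    return new_assignment
```

In this sketch the inflow rate is simply the tuple count per window; a real system would likely smooth it over several windows before triggering a move.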

Frequent Items Mining based on Regression Model in Data Streams (스트림 데이터에서 회귀분석에 기반한 빈발항목 예측)

  • Lee, Uk-Hyun
    • The Journal of the Korea Contents Association
    • /
    • Vol. 9, No. 1
    • /
    • pp.147-158
    • /
    • 2009
  • Data in a stream environment is massive, continuous, and unbounded. However, stream processing tasks such as query processing or data analysis must be carried out with limited disk or memory capacity. In such environments, traditional frequent pattern discovery on a transaction database cannot be applied, because it is difficult to continuously maintain information on whether each item in a continuous stream is frequent. In this paper, we propose a method that predicts frequent items using a regression model in a continuous stream data environment. By constructing a regression model over the stream data, we can use it as a prediction model for indefinite items. We show through a variety of experiments that the proposed method can be used efficiently in a stream data environment.
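One way to picture regression-based prediction of frequent items is sketched below: each item's counts over past windows are fitted with an ordinary least-squares line, and the extrapolated count for the next window is compared with a minimum support. The choice of a simple linear model, the per-window count representation, and the names `fit_line` and `predict_frequent` are assumptions for illustration, not the model used in the paper.

```python
def fit_line(ys):
    """Ordinary least-squares fit of y = a + b*t for t = 0..n-1."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx if sxx else 0.0
    return y_mean - slope * t_mean, slope

def predict_frequent(window_counts, window_size, min_support):
    """Predict which items will be frequent in the next window.

    window_counts: {item: [count in window 0, count in window 1, ...]}
    """
    predicted = set()
    for item, counts in window_counts.items():
        intercept, slope = fit_line(counts)
        next_count = intercept + slope * len(counts)  # extrapolate one window ahead
        if next_count / window_size >= min_support:
            predicted.add(item)
    return predicted
```

Only the per-window counts are kept for each item, so memory use stays bounded regardless of how long the stream runs.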

Stream Data Analysis of the Weather on the Location using Principal Component Analysis (주성분 분석을 이용한 지역기반의 날씨의 스트림 데이터 분석)

  • Kim, Sang-Yeob;Kim, Kwang-Deuk;Bae, Kyoung-Ho;Ryu, Keun-Ho
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • Vol. 28, No. 2
    • /
    • pp.233-237
    • /
    • 2010
  • Recent advances in sensor networks and ubiquitous computing techniques allow data to be collected and analyzed in real time for decision making, overcoming limitations of time and space. In addition, analysis and prediction based on the collected data can provide useful and necessary information to users. Data collected in a sensor network environment is stream data, which is continuous, unbounded, and sequential. Because stream data is continuous, unbounded, and large in volume, it is difficult to manage. Stream data also requires a dynamic processing method because of memory constraints and access limitations. Accordingly, we analyze correlations in stream data using principal component analysis. The results of the analysis help users make decisions.

Attribute-based Approach for Multiple Continuous Queries over Data Streams (데이터 스트림 상에서 다중 연속 질의 처리를 위한 속성기반 접근 기법)

  • Lee, Hyun-Ho;Lee, Won-Suk
    • The KIPS Transactions:PartD
    • /
    • Vol. 14D, No. 5
    • /
    • pp.459-470
    • /
    • 2007
  • A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Query processing for such a data stream should also be continuous and rapid, which imposes strict time and space constraints. In most DSMSs (Data Stream Management Systems), the selection predicates of continuous queries are grouped or indexed to meet these constraints. This paper proposes a new scheme called an ASC (Attribute Selection Construct) that collectively evaluates selection predicates containing the same attribute in multiple continuous queries. An ASC contains valuable information, such as attribute usage status, partially precalculated matching results, and selectivity statistics for its multiple selection predicates. The processing order of the ASCs corresponding to the attributes of a base data stream can significantly influence the overall performance of multiple query evaluation. Consequently, a method of establishing an efficient evaluation order for multiple ASCs is also proposed. Finally, the performance of the proposed method is analyzed through a series of experiments to identify its various characteristics.
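The core idea of evaluating predicates on the same attribute together can be sketched as follows. This is a minimal illustration under assumptions: the predicate format `(attribute, op, constant)`, the helper names, and the simple early exit are hypothetical, and the sketch omits the precalculated matching results and selectivity-based ordering that the abstract attributes to an ASC.

```python
import operator
from collections import defaultdict

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq}

def build_groups(queries):
    """Group selection predicates by attribute across all queries.

    queries: {query_id: [(attribute, op, constant), ...]}
    """
    groups = defaultdict(list)
    for qid, predicates in queries.items():
        for attr, op, const in predicates:
            groups[attr].append((qid, op, const))
    return groups

def matching_queries(groups, queries, tup):
    """Visit each attribute group once and return the queries satisfied by tup."""
    alive = set(queries)
    for attr, predicates in groups.items():
        value = tup[attr]  # the attribute value is read once per group
        for qid, op, const in predicates:
            if qid in alive and not OPS[op](value, const):
                alive.discard(qid)
        if not alive:  # no query can match any more; skip remaining groups
            break
    return alive
```

Because a query is dropped as soon as one of its predicates fails, visiting the most selective attribute groups first reduces work, which is why the evaluation order matters.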

A Study on the Design and Implementation Method of Computational Object Supporting CM Stream Interface in the Distributed Environment (분산 환경에서 CM 스트림 인터페이스를 지원하는 계산 객체의 설계 및 구현 방안 연구)

  • Song, Byeong-Gwon;Jin, Myeong-Suk;Kim, Geon-Ung
    • The Transactions of the Korea Information Processing Society
    • /
    • Vol. 7, No. 6
    • /
    • pp.1785-1794
    • /
    • 2000
  • This paper presents a computational object model supporting CM (Continuous Media) stream interfaces, including the QoS (Quality of Service) required in distributed applications, and an implementation method for the proposed stream interface. A stream interface consists of a data channel and a control channel. In this paper, the communication channel supported by CORBA is used as the control channel, and various transport protocols can be used as the data channel of the stream interface. Specifications of the application QoS are also included in the stream interface specification. In the implementation, FIFO queues and timers are used to support the transmission rate, delay, and jitter control mechanisms of the stream interface.
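To show how a FIFO queue and a timer can together enforce a transmission rate and smooth jitter, the sketch below computes release times for queued frames instead of sleeping on a real clock. The function name `paced_release`, the fixed-interval pacing rule, and the use of simulated timestamps are assumptions for illustration; the abstract does not describe the actual mechanism in this detail.

```python
from collections import deque

def paced_release(arrival_times, rate_per_sec):
    """Return the time at which each queued frame is released.

    Frames leave in arrival order, and no two releases are closer
    together than 1 / rate_per_sec seconds.
    """
    interval = 1.0 / rate_per_sec
    queue = deque(arrival_times)  # FIFO: frames leave in arrival order
    releases = []
    next_slot = 0.0               # models the timer's next expiry
    while queue:
        arrived = queue.popleft()
        release = max(arrived, next_slot)  # wait for the timer if the frame is early
        releases.append(release)
        next_slot = release + interval
    return releases
```

For example, three frames arriving in a burst at time 0 with a rate of 2 frames per second are released at 0.0, 0.5, and 1.0 seconds, which turns arrival jitter into evenly spaced output.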

The Design of a Multiplexer for Multiview Image Processing

  • Kim, Do-Kyun;Lee, Yong-Joo;Koo, Gun-Seo;Lee, Yong-Surk
    • Proceedings of the IEEK Conference
    • /
    • The Institute of Electronics Engineers of Korea, 2002 ITC-CSCC -1
    • /
    • pp.682-685
    • /
    • 2002
  • In this paper, we define the necessary operations and functional blocks of a multiplexer for 3-D video systems and present our multiplexer design. We adopt ITU-T Recommendation H.222.0 to define the operations and functions of the multiplexer, and explain the data structures and details of the design for multiview image processing. The data structures of the TS (Transport Stream) and PES (Packetized Elementary Stream) in ITU-T Recommendation H.222.0 do not fit our multiview image processing system, because this recommendation is intended for a wide scope of transmission of non-telephone signals. Therefore, we modified these TS and PES stream structures. The TS is modified to DSS (3D System Stream) and the PES is modified to SPDU (DSS Program Data Unit). We constructed the multiplexer using these modified DSS and SPDU structures. The number of multiview image channels is nine, and the image class employed is MPEG-2 SD (Standard Definition) level, which requires a bandwidth of 2 to 6 Mbps. The required clock speed should be faster than 54 MHz (= 6 × 9), which is the outer interface clock speed. The inside part of the multiplexer requires a clock speed of only 1/8 of 54 MHz, since it operates on byte units. We used ALTERA Quartus II and FPGA verification for the simulation.
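The clock figures in this abstract follow from simple arithmetic, reproduced in the sketch below. It assumes that the outer interface carries one bit per clock cycle, so that megabits per second map directly to megahertz; that reading, along with the function and parameter names, is an assumption and is not stated in the abstract.

```python
def multiplexer_clocks(channels, peak_mbps, bits_per_unit=8):
    """Return (outer clock in MHz, inner clock in MHz).

    Assumes a bit-serial outer interface (1 bit per cycle) and an inner
    datapath that handles bits_per_unit bits per cycle.
    """
    outer_mhz = channels * peak_mbps        # aggregate peak bit rate
    inner_mhz = outer_mhz / bits_per_unit   # byte-wide processing needs 1/8 of that
    return outer_mhz, inner_mhz
```

With nine channels at a peak of 6 Mbps each, this gives an outer clock of 54 MHz and an inner clock of 6.75 MHz.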

Hazelcast Vs. Ignite: Opportunities for Java Programmers

  • Maxim, Bartkov;Tetiana, Katkova;S., Kruglyk Vladyslav;G., Murtaziev Ernest;V., Kotova Olha
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 2
    • /
    • pp.406-412
    • /
    • 2022
  • Storing large amounts of data has always been a big problem from the beginning of computing history. Big Data has made huge advancements in improving business processes by finding the customers' needs using prediction models based on web and social media search. The main purpose of big data stream processing frameworks is to allow programmers to directly query the continuous stream without dealing with the lower-level mechanisms. In other words, programmers write the code to process streams using these runtime libraries (also called Stream Processing Engines). This is achieved by taking large volumes of data and analyzing them using Big Data frameworks. Streaming platforms are an emerging technology that deals with continuous streams of data. There are several streaming platforms of Big Data freely available on the Internet. However, selecting the most appropriate one is not easy for programmers. In this paper, we present a detailed description of two of the state-of-the-art and most popular streaming frameworks: Apache Ignite and Hazelcast. In addition, the performance of these frameworks is compared using selected attributes. Different types of databases are used in common to store the data. To process the data in real-time continuously, data streaming technologies are developed. With the development of today's large-scale distributed applications handling tons of data, these databases are not viable. Consequently, Big Data is introduced to store, process, and analyze data at a fast speed and also to deal with big users and data growth day by day.

1H*-tree: An Improved Data Cube Structure for Multi-dimensional Analysis of Data Streams (1H*-tree: 데이터 스트림의 다차원 분석을 위한 개선된 데이터 큐브 구조)

  • XiangRui Chen;YuXiang Cheng;Yan Li;Song-Sun Shin;Dong-Wook Lee;Hae-Young Bae
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • Korea Information Processing Society, 2008 Fall Conference
    • /
    • pp.332-335
    • /
    • 2008
  • In this paper, we analyze the H-tree, which has been proposed as the basic data cube structure for multi-dimensional data stream analysis. We find that the H-tree contains many redundant nodes, and that the tree-building method can be improved to save both memory and the time used for inserting tuples. To support faster analysis of larger volumes of stream data, which is very important for stream research, we design and develop the H*-tree. Our performance study compares the proposed H*-tree with the H-tree and shows that the H*-tree saves more memory and time when inserting data stream tuples.

Causality join query processing for data stream by spatio-temporal sliding window (시공간 슬라이딩윈도우기법을 이용한 데이터스트림의 인과관계 결합질의처리방법)

  • Kwon, O-Je;Li, Ki-Joune
    • Spatial Information Research
    • /
    • Vol. 16, No. 2
    • /
    • pp.219-236
    • /
    • 2008
  • Data streams collected from sensors contain a large amount of useful information, including causality relationships. The causality join query for data streams retrieves a set of (cause, effect) pairs from streams of data. Some causality pairs may, however, be lost from the query result because of the delay between sensors and the data stream management system and because of the limited size of sliding windows. In this paper, we first investigate the spatial, temporal, and spatio-temporal aspects of the causality join query for data streams. Second, we propose several strategies for sliding window management based on these observations. The accuracy of the proposed strategies is studied through intensive experiments, and the results show that they improve the accuracy of the causality join query over data streams compared with a simple FIFO strategy.
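For orientation, the baseline that the abstract compares against can be sketched as a join over a time-based sliding window with FIFO eviction, extended with a distance check for the spatial aspect. The event format, the `window` and `max_dist` parameters, and the assumption that events arrive in timestamp order are all hypothetical; the paper's own window management strategies are not reproduced here.

```python
from collections import deque
from math import hypot

def causality_join(events, window, max_dist):
    """Join each effect with earlier causes that are close in time and space.

    events: iterable of (kind, timestamp, x, y) in arrival order,
            where kind is "cause" or "effect".
    """
    causes = deque()  # sliding window of recent causes, oldest first
    pairs = []
    for kind, ts, x, y in events:
        # FIFO eviction: drop causes that fell out of the time window.
        while causes and ts - causes[0][0] > window:
            causes.popleft()
        if kind == "cause":
            causes.append((ts, x, y))
        else:
            for cts, cx, cy in causes:
                if cts <= ts and hypot(x - cx, y - cy) <= max_dist:
                    pairs.append(((cts, cx, cy), (ts, x, y)))
    return pairs
```

An effect that is processed after its cause has already been evicted produces no pair, which is the kind of loss the abstract attributes to delay and limited window size.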