• Title/Summary/Keyword: Large Volume Data Stream

A Real-Time Stock Market Prediction Using Knowledge Accumulation (지식 누적을 이용한 실시간 주식시장 예측)

  • Kim, Jin-Hwa;Hong, Kwang-Hun;Min, Jin-Young
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.109-130 / 2011
  • One of the major problems in data mining is the size of the data, as most data sets are now very large. Streams of data are normally accumulated in data stores or databases, and transactions on the internet, on mobile devices, and in ubiquitous environments produce such streams continuously. Some data sets sit unused inside large data stores because of their size, while others are lost as soon as they are created because they are never saved. How to use such large data sets, and data on a stream, efficiently is a challenging question in data mining research. Stream data is a data set that accumulates in storage continuously from a data source, and in many cases its size grows steadily over time. Mining information from this massive data consumes too many resources, such as storage, money, and time, so storing all the stream data accumulated over time is difficult and expensive. On the other hand, if only recent or partial data is mined for information or patterns, valuable information can be lost. To avoid these problems, this study suggests a method that efficiently accumulates information or patterns over time in the form of a rule set. A rule set is mined from each data set in the stream and accumulated into a master rule set store, which also serves as the model for real-time decision making. One of the main advantages of this method is that it takes much less storage space than the traditional method, which saves the whole data set. Another advantage is that the accumulated rule set is itself used as the prediction model. Because the rule set is ready to be used for decisions at any time, user requests can be answered promptly. This makes real-time decision making possible, which is the greatest advantage of the method.
According to the theory of ensemble approaches, combining many different models can produce a prediction model with better performance. The consolidated rule set covers the entire data set, whereas the traditional sampling approach covers only part of it. This study uses stock market data, which is heterogeneous because the characteristics of the data vary over time. Stock market indexes can fluctuate whenever an event influences the market, so the variance of the values of each variable is large compared to that of a homogeneous data set. Prediction with a heterogeneous data set is naturally much more difficult than with a homogeneous one. This study tests two general mining approaches and compares their prediction performance with that of the proposed method. The first approach induces a rule set from the recent data set to predict a new data set. The second induces a rule set from all the data accumulated since the beginning every time a new data set has to be predicted. Neither of these performed as well as the accumulated rule set method. The study also reports experiments with two different prediction models. The first builds a prediction model from only the more important rule sets, and the second uses all the rule sets, assigning weights to the rules based on their performance. The second model performs better than the first. The experiments also show that the suggested method can be an efficient approach for mining information and patterns from stream data. One limitation of this study is that the method was applied only to stock market data; applying it to a more dynamic real-time stream data set is desirable.
The study has a further limitation: as the number of rules increases over time, special rules such as redundant or conflicting rules have to be managed efficiently.
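
The idea of accumulating rules instead of raw data, and predicting with all rules weighted by their performance, can be illustrated with a minimal sketch. It is not the authors' rule induction method: the single-attribute rules, the confidence-based weight, and the names `AccumulatedRules`, `accumulate`, and `predict` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


class AccumulatedRules:
    """Hypothetical master rule store: keeps rule statistics, not raw data."""

    def __init__(self):
        # condition (attribute, value) -> Counter of labels observed with it
        self.counts = defaultdict(Counter)

    def accumulate(self, batch):
        """Merge one batch from the stream into the store.

        batch is a list of (features: dict, label) pairs. After this call
        the batch itself can be discarded; only the counts are retained.
        """
        for features, label in batch:
            for condition in features.items():
                self.counts[condition][label] += 1

    def predict(self, features):
        """Weighted vote over every stored rule that matches the record."""
        votes = Counter()
        for condition in features.items():
            seen = self.counts.get(condition)
            if not seen:
                continue
            total = sum(seen.values())
            for label, n in seen.items():
                # Assumed weight: the rule's observed confidence so far.
                votes[label] += n / total
        return votes.most_common(1)[0][0] if votes else None
```

Because `predict` reads only the accumulated counts, a prediction can be served at any time without revisiting earlier batches, which mirrors the storage and response-time argument made in the abstract.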

Efficient Data Management in RFID Applications

  • Cho, Yong-Jun;Bok, Kyoung-Soo;Park, Yong-Hun;Park, Hyeong-Soon;Park, Jun-Ho;Kang, Tae-Ho;Kim, Hak-Yong;Yoo, Jae-Soo
    • International Journal of Contents / v.5 no.1 / pp.46-50 / 2009
  • Logistics is in the limelight as one of a variety of RFID applications. RFID technology is actively being applied to improve the competitiveness of companies through the integrated management of products and information. An RFID system generates a large volume of stream data, which wastes storage and leads to long processing times when the data is stored and queried. Recently, many studies have addressed the problems that arise in RFID systems. In this paper, we propose an efficient data management scheme for path queries and containment queries, which occur frequently. The proposed scheme takes into account changes in the containment of products during transport and supports the paths of changed products by representing the paths of the various containments. In addition, compression that exploits the structure of the supply chain reduces the volume of stored data. To show the superiority of our approach, we compare it with existing schemes. The experimental results show that our scheme outperforms the existing schemes in terms of storage efficiency and query processing time.

Design of Query Processing based on Profiles for Efficient Searching Events (효율적인 이벤트 검색을 위한 프로파일 기반 질의 처리 방법)

  • Kim, ChangHoon;Kim, TaeYoung;Kim, JongMin;Ban, ChaeHoon;Kim, DongHyun
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2009.10a / pp.249-252 / 2009
  • Recently, the development of various information search schemes has made it easy for users to acquire the data they need. Since such data is generated continuously, like stream data, the data that meets a user's needs must be extracted from the mass of data on the internet. In the traditional scheme, data is acquired by processing user queries after the generated data has been stored in a database. However, processing user queries over a large volume of continuous data in this way is inefficient. In this paper, we propose a query processing scheme that efficiently extracts the data meeting user requirements from a large volume of continuous data. In the proposed scheme, we present the Event-Profile Model, which defines data occurrences on the internet as events and user requirements as profiles. We also present a filtering scheme that processes the events and the profiles efficiently.
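
A rough sketch of event-profile filtering follows. The abstract does not describe the filtering algorithm, so this uses a generic counting-based match over an inverted index as a stand-in; the class `ProfileFilter` and the attribute-equality form of a profile are assumptions.

```python
from collections import Counter, defaultdict


class ProfileFilter:
    """Hypothetical filter: profiles are stored once, events stream past."""

    def __init__(self):
        self.profiles = {}             # profile id -> set of (key, value) pairs
        self.index = defaultdict(set)  # (key, value) -> ids of profiles needing it

    def register(self, profile_id, conditions):
        """Register a user's requirements as a set of attribute equalities."""
        pairs = set(conditions.items())
        self.profiles[profile_id] = pairs
        for pair in pairs:
            self.index[pair].add(profile_id)

    def match(self, event):
        """Return ids of profiles whose conditions are all met by the event."""
        hits = Counter()
        for pair in event.items():
            # Only profiles that mention this attribute value are touched.
            for profile_id in self.index.get(pair, ()):
                hits[profile_id] += 1
        return sorted(
            pid for pid, n in hits.items() if n == len(self.profiles[pid])
        )
```

Each arriving event is checked against the index rather than stored and queried later, which is the contrast with the traditional store-then-query scheme that the abstract draws.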

The Processing Method of Stream Data in the Small-size Operating System (소규모 운영체제에서의 스트림데이터 처리기법)

  • Kim, Jin-Deog
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2007.10a / pp.871-874 / 2007
  • Stream data needs efficient data management with high reliability and real-time processing. Such data is characterized by a large volume, a short report interval, and asynchronous report times. The typical queries in these systems are the current query, which searches for the latest signal value; the snapshot query, which searches for the signal value at a past time; and the historical query, which searches for the signal values from a past time to the present. This paper proposes an efficient method for managing such signals using a file-structured database on the QNX operating system. A query model that accommodates the various queries on stream data is also proposed. The proposed methods are applied to a reactive protection system to verify their usefulness. The COM (Cabinet Operator Module), based on QNX, employs a file database that adopts a delta version and a buffering method to cope with the resource limits of small storage and low computing power.
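
The three query types and the delta idea can be sketched in memory as follows. This is only an illustration under stated assumptions: the paper's file database on QNX is not reproduced, `DeltaSignalStore` is a hypothetical name, a single signal is tracked, and reports are assumed to arrive in ascending timestamp order.

```python
from bisect import bisect_right


class DeltaSignalStore:
    """Hypothetical delta store for one signal: keeps a value only on change."""

    def __init__(self):
        self.times = []   # timestamps at which the value changed, ascending
        self.values = []  # value that took effect at each of those timestamps

    def report(self, timestamp, value):
        """Record a report; repeats of the last stored value are dropped."""
        if self.values and self.values[-1] == value:
            return
        self.times.append(timestamp)
        self.values.append(value)

    def current(self):
        """Current query: the latest signal value."""
        return self.values[-1] if self.values else None

    def snapshot(self, timestamp):
        """Snapshot query: the value in effect at a past time."""
        i = bisect_right(self.times, timestamp)
        return self.values[i - 1] if i else None

    def historical(self, start):
        """Historical query: changes in effect from `start` up to now."""
        first = max(bisect_right(self.times, start) - 1, 0)
        return list(zip(self.times[first:], self.values[first:]))
```

Dropping unchanged reports is what keeps storage small when the report interval is short, at the cost of a binary search to answer snapshot and historical queries.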

Design and Implementation of Storage Manager for Real-Time Compressed Storing of Large Volume Datastream (대용량 데이터스트림 실시간 압축 저장을 위한 저장관리자 설계 및 구현)

  • Lee, Dong-Wook;Baek, Sung-Ha;Kim, Gyoung-Bae;Bae, Hae-Young
    • Journal of Korea Spatial Information System Society / v.11 no.3 / pp.31-39 / 2009
  • Requirements for processing and managing real-time datastreams in ubiquitous environments are increasing. In particular, because datastreams are unbounded, high-frequency, and real-time, a specialized storage manager for a DSMS is needed to process them. An existing DSMS such as Coral8 supports datastream processing, but it is not scalable and does not perform well on large-volume real-time datastreams, e.g. over 100 thousand per second. Oracle10g, which is generally used in the related field, supports storage and management but does not support real-time datastream processing. In this paper, we propose a specialized DSMS storage manager for real-time compressed storing at the semiconductor and LCD production facilities of Samsung Electronics, Hynix, and HP. The paper describes the architecture and major components of the proposed system and, in the experiment section, shows that it performs better than similar systems.

m-Health System for Processing of Clinical Biosignals based Android Platform (안드로이드 플랫폼 기반의 임상 바이오신호 처리를 위한 모바일 헬스 시스템)

  • Seo, Jung-Hee;Park, Hung-Bog
    • Journal of the Korea Society of Computer and Information / v.17 no.7 / pp.97-106 / 2012
  • Managing biosignal data on mobile devices causes many problems for the real-time transmission of large volumes of multimedia data and for storage devices. This paper therefore proposes an m-Health system, a mobile clinical data processing system intended to provide quick medical service. The system deploys a health system on an IP network, combines the outputs of many biosensors at remote sites, and performs integrated electronic data processing across the various biosensors. The m-Health system measures and monitors various biosignals and sends them to the data servers of remote hospitals. It is an Android-based mobile application that patients, their families, and medical staff can use anywhere and at any time. Medical staff access patient data on hospital data servers and provide feedback on diagnosis and prescription to patients or users. The video stream for patient monitoring uses a scalable transcoding technique to decide a data size appropriate for the network traffic before sending, which remarkably reduces the load on mobile systems and networks.

An Efficient RFID Business Event Detection Method Using Preprocessing Filtering Scheme (전처리 필터링을 적용한 효율적인 RFID 비즈니스 이벤트 검출 기법)

  • Rho, Jin-Seok;Bok, Kyoung-Soo;Yoo, Jae-Soo
    • Journal of KIISE:Databases / v.35 no.2 / pp.143-154 / 2008
  • RFID events form a large volume of stream data that is generated continuously. Many studies have addressed detecting business events in an RFID stream. However, the existing methods perform unnecessary operations when business events do not satisfy their minimum conditions. In this paper, to remove these unnecessary operations, we define the minimum condition of a business event and propose an efficient method that detects business events only when the minimum condition is satisfied. To check the minimum condition, we register business queries in a query index, and we detect business events using the query index and a bitmap. Various experiments show that the proposed method outperforms the existing methods.
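
One way to picture a minimum-condition check with a query index and a bitmap is sketched below. The abstract does not define the minimum condition, so this sketch assumes it means "every reader named in the business query has reported at least once"; the class `BusinessEventDetector` and the reader-based queries are hypothetical.

```python
from collections import defaultdict


class BusinessEventDetector:
    """Hypothetical pre-filter: full evaluation runs only for ready queries."""

    def __init__(self):
        self.bit_of = {}                # reader id -> bit position
        self.required = {}              # query name -> bitmask of required readers
        self.seen = {}                  # query name -> bitmask of readers seen
        self.index = defaultdict(list)  # reader id -> queries mentioning it

    def register(self, name, readers):
        """Register a business query and index it by the readers it needs."""
        mask = 0
        for reader in set(readers):
            bit = self.bit_of.setdefault(reader, len(self.bit_of))
            mask |= 1 << bit
            self.index[reader].append(name)
        self.required[name] = mask
        self.seen[name] = 0

    def on_event(self, reader):
        """Process one RFID read; return queries whose minimum condition holds."""
        ready = []
        for name in self.index.get(reader, ()):
            self.seen[name] |= 1 << self.bit_of[reader]
            if (self.seen[name] & self.required[name]) == self.required[name]:
                ready.append(name)
                self.seen[name] = 0  # start collecting for the next occurrence
        return ready
```

Reads from readers that no query mentions cost a single dictionary lookup, and a query is handed to the (more expensive) full evaluation only once its bitmap is complete.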

Design and Implementation of the Notification System using Event-Profile Filtering (이벤트-프로파일 여과를 이용한 통지시스템의 설계 및 구현)

  • Ban, Chae-Hoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.10a / pp.129-132 / 2010
  • The development of the internet allows users to obtain useful information from large amounts of data. Since such data is generated continuously, like stream data, the information that meets a user's needs must be extracted efficiently. In the traditional scheme, data is acquired by processing user queries after the generated data has been stored in a database. However, processing user queries over a large volume of continuous data in this way is inefficient. In this paper, we propose the Event-Profile Model, which defines data occurrences on the internet as events and user requirements as profiles. We also propose and implement a filtering scheme that processes the events and the profiles efficiently. We evaluate the performance of the proposed scheme, and our experiments show that it outperforms the other scheme on various data sets.

A Query Processing Technique for XML Fragment Stream using XML Labeling (XML 레이블링을 이용한 XML 조각 스트림에 대한 질의 처리 기법)

  • Lee, Sang-Wook;Kim, Jin;Kang, Hyun-Chul
    • Journal of KIISE:Databases / v.35 no.1 / pp.67-83 / 2008
  • To realize ubiquitous computing, it is essential to use the resources and computing power of mobile devices efficiently. In particular, memory efficiency, energy efficiency, and processing efficiency are required when executing the software embedded in mobile devices. This paper addresses query processing over XML data on a mobile device with limited resources. On a device with a limited amount of memory, XML stream query processing techniques need to be employed to process queries over a large volume of XML data. Recently, a technique called XFrag was proposed whereby XML data is fragmented with the hole-filler model and streamed in fragments for processing. With XFrag, query processing is possible on a mobile device with limited memory without reconstructing the XML data from its fragment stream. With the hole-filler model, however, memory efficiency is not high because additional information on holes and fillers needs to be stored. In this paper, we propose a new technique called XFLab whereby XML data is fragmented with an XML labeling scheme, which represents the structural relationships in XML data, and streamed in fragments for processing. Through implementation and experiments, we showed that XFLab outperformed XFrag in both memory usage and processing time.
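
How a labeling scheme can encode structural relationships between fragments is sketched below. The abstract does not say which labeling scheme XFLab uses, so this example assumes Dewey-style prefix labels (a tuple of child positions from the root); the function names and the `(label, name)` fragment representation are hypothetical.

```python
def is_ancestor(a, b):
    """True if the node labeled `a` is a proper ancestor of the node labeled `b`."""
    return len(a) < len(b) and b[:len(a)] == a


def is_parent(a, b):
    """True if the node labeled `a` is the parent of the node labeled `b`."""
    return len(b) == len(a) + 1 and b[:len(a)] == a


def descendants(label, fragments):
    """Names of streamed fragments that lie below `label`.

    fragments is an iterable of (label, name) pairs in arrival order.
    """
    return [name for frag_label, name in fragments if is_ancestor(label, frag_label)]
```

With labels of this kind, the relationship between two fragments follows from the labels alone, so no separate table linking holes to fillers has to be kept while fragments arrive.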

XML Fragmentation for Resource-Efficient Query Processing over XML Fragment Stream (자원 효율적인 XML 조각 스트림 질의 처리를 위한 XML 분할)

  • Kim, Jin;Kang, Hyun-Chul
    • The KIPS Transactions:PartD / v.16D no.1 / pp.27-42 / 2009
  • Realizing ubiquitous computing requires techniques for efficiently using the limited resources of clients such as mobile devices. On a mobile device with a limited amount of memory, XML stream query processing techniques should be employed to process queries over a large volume of XML data. Recently, several techniques have been proposed that split XML documents into fragments and stream them for query processing at the client. During query processing, resource usage (query processing time and memory usage) can differ greatly depending on how the source XML documents are fragmented, so an efficient fragmentation technique is needed. In this paper, we propose an XML fragmentation technique that improves the resource efficiency of query processing at the client. To this end, we first present a cost model of query processing over an XML fragment stream and then propose an algorithm for resource-efficient XML fragmentation. Through implementation and experiments, we showed that our fragmentation technique outperformed previous techniques in both processing time and memory usage. The contribution of this paper is to make query processing over XML fragment streams more feasible for practical use.