• Title/Summary/Keyword: Data Streams

Data Stream Storing Techniques for Supporting Hybrid Query (하이브리드 질의를 위한 데이터 스트림 저장 기술)

  • Shin, Jae-Jyn; You, Byeong-Seob; Eo, Sang-Hun; Lee, Dong-Wook; Bae, Hae-Young
    • Journal of Korea Multimedia Society / v.10 no.11 / pp.1384-1397 / 2007
  • This paper proposes fast storage techniques that support hybrid queries over data streams. Data stream management systems (DSMSs) have been studied for processing data streams whose arrival is bursty. To answer hybrid queries, which retrieve both currently arriving and past stream data, the data streams must be stored on disk. However, because of the high input rate of data streams and the limited memory and disk space, most research has focused on querying currently arriving data rather than stored data. The proposed techniques use a circular buffer to maximize memory utilization and to make insertion non-blocking, and they compress data on disk to maximize the amount of data the disk can hold. Experiments show that the proposed techniques store bursty insertions quickly.
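A minimal Python sketch of the two storage ideas in this abstract: a circular buffer that accepts inserts without waiting on disk, and compression of full buffer segments before they are stored. The class name, the JSON-plus-zlib encoding, and the in-memory list standing in for disk blocks are assumptions for illustration, not the authors' implementation.

```python
import json
import queue
import zlib


class CircularStreamBuffer:
    """Ring buffer whose full segments are compressed before they reach storage.

    Assumptions (not from the paper): records are JSON-serialisable, zlib is the
    compressor, and a Python list (`disk`) stands in for disk blocks.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0                # next slot to write
        self.filled = 0              # slots written since the last seal
        self.sealed = queue.Queue()  # full segments waiting for the writer
        self.disk = []               # compressed blocks ("on disk")

    def insert(self, record):
        # Insertion touches only memory, so it never waits on disk I/O.
        self.slots[self.head] = record
        self.head = (self.head + 1) % self.capacity
        self.filled += 1
        if self.filled == self.capacity:
            # Hand a copy of the full ring to the writer, then reuse the slots.
            start = self.head  # oldest record when the ring is full
            self.sealed.put(self.slots[start:] + self.slots[:start])
            self.filled = 0

    def write_pending(self):
        # Would run in a background writer thread; compresses each sealed segment.
        while not self.sealed.empty():
            payload = json.dumps(self.sealed.get()).encode("utf-8")
            self.disk.append(zlib.compress(payload))

    def past(self):
        # Stored part of a hybrid query: decompress every block, oldest first.
        out = []
        for block in self.disk:
            out.extend(json.loads(zlib.decompress(block).decode("utf-8")))
        return out

    def current(self):
        # In-memory part of a hybrid query: unsealed records, oldest first.
        start = (self.head - self.filled) % self.capacity
        return [self.slots[(start + i) % self.capacity] for i in range(self.filled)]


buf = CircularStreamBuffer(capacity=4)
for i in range(10):
    buf.insert({"id": i, "value": i * 1.5})
buf.write_pending()
ids = [r["id"] for r in buf.past()] + [r["id"] for r in buf.current()]
print(ids)
```

Separating `insert` from `write_pending` is what keeps insertion non-blocking in this sketch: a burst only fills memory slots and enqueues sealed segments, while compression and storage can run later in a separate writer.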

Extended Online Divisive Agglomerative Clustering

  • Musa, Ibrahim Musa Ishag; Lee, Dong-Gyu; Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2008.10a / pp.406-409 / 2008
  • Clustering data streams is important in many applications, such as sensor networks. Existing hierarchical methods follow a semi-fuzzy clustering approach that yields duplicate clusters. To solve this problem, we propose an extended online divisive agglomerative clustering method for data streams. It builds a tree-like, top-down hierarchy of clusters that evolves with the data streams, using a geometric time frame for snapshots. It enhances Online Divisive Agglomerative Clustering (ODAC) with a pruning strategy that avoids duplicate clusters. Its main feature is that update time and memory space are independent of the number of examples in the data streams. It can be used for clustering sensor data and for network monitoring, as well as for web click streams.
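ODAC, as we understand its original description, measures dissimilarity between streams with a correlation-based distance kept in constant-size sufficient statistics, and uses the Hoeffding bound to decide when a cluster should split. The sketch below illustrates only that constant-memory update, which is why update time and memory do not grow with the number of examples. The class and function names are hypothetical, and the paper's pruning strategy and geometric time frame are not modeled.

```python
import math


class PairStats:
    """Constant-size sufficient statistics for the correlation of two streams."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = 0.0
        self.sxx = self.syy = self.sxy = 0.0

    def update(self, x, y):
        # O(1) time and memory per example, independent of how many have arrived.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def correlation(self):
        n = self.n
        cov = self.sxy - self.sx * self.sy / n
        var_x = self.sxx - self.sx ** 2 / n
        var_y = self.syy - self.sy ** 2 / n
        if var_x <= 0 or var_y <= 0:
            return 0.0
        r = cov / math.sqrt(var_x * var_y)
        return max(-1.0, min(1.0, r))  # guard against rounding outside [-1, 1]

    def dissimilarity(self):
        # Correlation-based distance: 0 when perfectly correlated, 1 when anti-correlated.
        return math.sqrt((1.0 - self.correlation()) / 2.0)


def hoeffding_bound(value_range, delta, n):
    # With probability 1 - delta, the mean of n observations lies within this
    # bound of the true mean; a split is justified only by a larger difference.
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


same, opposite = PairStats(), PairStats()
for t in range(200):
    a = math.sin(t / 5.0)
    same.update(a, 2.0 * a + 1.0)
    opposite.update(a, -a)
print(round(same.dissimilarity(), 6), round(opposite.dissimilarity(), 6))
```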

Predictive Memory Allocation over Skewed Streams

  • Yun, Hong-Won
    • Journal of information and communication convergence engineering / v.7 no.2 / pp.199-202 / 2009
  • Adaptive memory management is a serious issue in data stream management. Data streams differ from the traditional stored relational model in several aspects: they arrive online, they are high in volume, and their data distributions are skewed. Data skew is a common property of massive data streams. We propose a predictive allocation strategy that uses predictive processing to cope with time-varying data skew. This processing includes memory usage estimation and timestamp-based indexing. Our experimental study shows that the predictive strategy reduces both the required memory space and the latency for skewed data whose distribution varies over time.
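The abstract does not say how memory usage is estimated. A minimal sketch, assuming an exponentially weighted moving average (EWMA) of per-partition arrival counts as the predictor, shows how a predicted load can be turned into memory shares that follow a shifting skew. The class name, `alpha`, and the notion of partitions are hypothetical.

```python
class PredictiveAllocator:
    """Splits a fixed memory budget across stream partitions by predicted load.

    Assumption: load is predicted with an EWMA of per-partition arrival counts;
    the abstract does not give the paper's own estimator.
    """

    def __init__(self, partitions, budget, alpha=0.5):
        self.budget = budget          # e.g. number of tuple slots available
        self.alpha = alpha            # weight of the newest observation
        self.predicted = {p: 0.0 for p in partitions}

    def observe(self, counts):
        # counts: tuples that arrived per partition during the last time unit.
        for p in self.predicted:
            seen = counts.get(p, 0)
            self.predicted[p] = self.alpha * seen + (1 - self.alpha) * self.predicted[p]

    def allocation(self):
        total = sum(self.predicted.values())
        if total == 0:
            share = self.budget // len(self.predicted)
            return {p: share for p in self.predicted}
        # Proportional shares, rounded down so the budget is never exceeded.
        return {p: int(self.budget * v / total) for p, v in self.predicted.items()}


alloc = PredictiveAllocator(["hot", "cold"], budget=1000)
alloc.observe({"hot": 90, "cold": 10})
alloc.observe({"hot": 90, "cold": 10})
print(alloc.allocation())
```

A larger `alpha` reacts faster when the skew shifts but is noisier; a smaller one smooths out short bursts.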

Finding Weighted Sequential Patterns over Data Streams via a Gap-based Weighting Approach (발생 간격 기반 가중치 부여 기법을 활용한 데이터 스트림에서 가중치 순차패턴 탐색)

  • Chang, Joong-Hyuk
    • Journal of Intelligence and Information Systems / v.16 no.3 / pp.55-75 / 2010
  • Sequential pattern mining aims to discover interesting sequential patterns in a sequence database. It is one of the essential data mining tasks and is widely used in application fields such as Web access pattern analysis, customer purchase pattern analysis, and DNA sequence analysis. General sequential pattern mining considers only the order in which data elements occur in a sequence, so it easily finds simple sequential patterns but is limited in finding the more interesting patterns needed by real-world applications. Weighted sequential pattern mining is one of the key research topics addressing this limitation: it considers not only the order of data elements but also their weights. Recently, data in many application fields has increasingly taken the form of continuous data streams rather than finite stored data sets, and the database research community has begun to focus on processing data streams. A data stream is a massive, unbounded sequence of data elements continuously generated at a rapid rate. In data stream processing, each data element should be examined at most once, and the memory used for analysis must remain bounded even though new data elements are generated continuously. Moreover, newly generated data elements should be processed as quickly as possible so that an up-to-date analysis result is available immediately on request. To satisfy these requirements, data stream processing sacrifices some accuracy in its results by allowing some error. Reflecting this change in how data is generated in real-world applications, many studies have sought to find various kinds of knowledge embedded in data streams.
They mainly focus on efficiently mining frequent itemsets and sequential patterns over data streams, which have proven useful in conventional data mining over finite data sets. Mining algorithms have also been proposed that efficiently reflect changes in data streams over time in their results. However, these studies have targeted naively interesting patterns, such as frequent patterns and simple sequential patterns, which can be found intuitively, and have paid little attention to mining novel patterns that better express the characteristics of the target data streams. Defining such novel patterns and developing methods to mine them is therefore a valuable research topic for analyzing recent data streams. This paper proposes a gap-based weighting approach for sequential patterns and a method for mining weighted sequential patterns over sequence data streams using this approach. The gap-based weight of a sequential pattern is computed from the gaps between its data elements, without any predefined weight information. Because the approach uses both the gaps between data elements and their order of occurrence to weight each sequential pattern, it helps find more interesting and useful sequential patterns. Since most computer application fields now generate data as data streams rather than as finite data sets, the proposed method focuses mainly on sequence data streams.
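The abstract does not give the weighting formula, so the sketch below uses a hypothetical one: the weight of a pattern in a sequence is 1 / (1 + average gap), where a gap is the number of elements skipped between consecutive pattern elements. It shows only the property described above, that the weight comes from the gaps themselves with no predefined weight table; it is not the author's definition or mining algorithm, and the function names are illustrative.

```python
def match_positions(sequence, pattern):
    """Earliest left-to-right positions of `pattern` as a subsequence, or None."""
    positions, start = [], 0
    for item in pattern:
        try:
            idx = sequence.index(item, start)
        except ValueError:
            return None
        positions.append(idx)
        start = idx + 1
    return positions


def gap_weight(positions):
    # HYPOTHETICAL weighting: 1.0 when elements are adjacent, shrinking toward 0
    # as the average gap between consecutive elements grows.
    if len(positions) < 2:
        return 1.0
    gaps = [b - a - 1 for a, b in zip(positions, positions[1:])]
    return 1.0 / (1.0 + sum(gaps) / len(gaps))


def weighted_support(window, pattern):
    # Average gap-based weight of `pattern` over the sequences currently in the window.
    total = 0.0
    for seq in window:
        pos = match_positions(seq, pattern)
        if pos is not None:
            total += gap_weight(pos)
    return total / len(window)


window = ["abc", "axxbc", "bca"]
print(weighted_support(window, "ab"))
```

Here "ab" counts fully in "abc", one third in "axxbc" (two skipped elements), and not at all in "bca", so tightly clustered occurrences dominate the support.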

Application of Grouping Method to select Priority Restoration Streams in Geumgang Watershed based on Analysis of Pollution Factors (하천수질 오염요소 분석을 근거로 금강수계의 우선정비 대상하천 선정을 위한 집단화 기법적용)

  • Lee, Sang Ho; Hwang, Jeong Jae
    • Journal of Korean Society of Water and Wastewater / v.27 no.5 / pp.661-669 / 2013
  • River water quality has improved greatly over the past several decades thanks to the government's extraordinary expansion of wastewater treatment capacity. This research aims to select priority restoration streams based on chronological data for tributaries of the Geumgang watershed, the main stream area in Chungchungnamdo province. BOD, phosphorus, and the percentage of sewered population in 15 branch streams were compared using grouping methods. Under category I, the group D streams, which exceeded 3.0 mg/L for BOD and 0.1 mg/L for phosphorus, were the Seuksung, Ganggyung, and Bangchuk streams. Under category II, the group D streams, which exceeded 3.0 mg/L for BOD and fell below the average sewered-population percentage of 63.5%, were the Ganggyung, Gilsan, Bangchuk, and Seuksung streams. The final selection, drawn from chronological data that exceeded the water quality standard together with a below-average sewered-population percentage, was the Seoksung, Gangeyung, and Bangchuk streams. Pollution was more serious in downstream reaches than in upstream reaches. To improve river water quality in these watersheds, sewer systems as well as wastewater treatment facilities need to be extended.

A holistic distributed clustering algorithm based on sensor network (센서 네트워크 기반의 홀리스틱 분산 클러스터링 알고리즘)

  • Chen Ping; Kee-Wook Rim; Nam Ji-Yeun; Lee KyungOh
    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.874-877 / 2008
  • Existing data processing systems support only simple queries for sensor networks. It is increasingly important to process the vast data streams in sensor networks and to deliver useful knowledge to users. In this paper, we propose a holistic distributed k-means algorithm for sensor networks. To verify its effectiveness, we compare it with a centralized k-means algorithm for processing data streams in sensor networks. Evaluation experiments show that the proposed algorithm can process vast data streams with less computation time. Because the algorithm clusters the data streams at the distributed nodes, it greatly reduces redundant data communication compared with the centralized algorithm.
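A minimal sketch of distributed k-means in which each sensor node summarizes its own points as per-cluster sums and counts, and a sink merges the summaries into new centroids. This is the standard way to distribute k-means, assumed here for illustration; the "holistic" details of the paper's algorithm are not described in the abstract, and the function names are hypothetical.

```python
def local_summaries(points, centroids):
    """Run at each sensor node: per-centroid (sum vector, count) for its own points."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Assign the point to its nearest centroid (squared Euclidean distance).
        j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def merge_and_update(summaries, centroids):
    """Run at the sink: combine node summaries into new centroids."""
    k, dim = len(centroids), len(centroids[0])
    tot_sums = [[0.0] * dim for _ in range(k)]
    tot_counts = [0] * k
    for sums, counts in summaries:
        for j in range(k):
            tot_counts[j] += counts[j]
            for d in range(dim):
                tot_sums[j][d] += sums[j][d]
    # Keep the old centroid if no node assigned any point to it.
    return [
        [s / tot_counts[j] for s in tot_sums[j]] if tot_counts[j] else list(centroids[j])
        for j in range(k)
    ]


def distributed_kmeans(node_data, centroids, rounds=10):
    for _ in range(rounds):
        summaries = [local_summaries(pts, centroids) for pts in node_data]
        centroids = merge_and_update(summaries, centroids)
    return centroids


node_data = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)], [(1.0, 0.0), (11.0, 10.0)]]
result = distributed_kmeans(node_data, [(0.0, 0.0), (10.0, 10.0)])
print(result)
```

Each node sends k × (dimensions + 1) numbers per round instead of its raw readings, which is where the communication saving over centralized k-means comes from.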

Finding high utility old itemsets in web-click streams (웹 클릭 스트림에서 고유용 과거 정보 탐색)

  • Chang, Joong-Hyuk
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.4 / pp.521-528 / 2016
  • Web-based services are widely used in many computer application fields due to the increasing use of PCs and mobile devices. Accordingly, the analysis of access logs generated in these fields has been actively researched to support personalized services, and analysis techniques based on differentiating the weights of information in access logs have been proposed. This paper outlines an analysis technique for finding high utility old itemsets in web-click streams, whose data elements are generated at a rapid rate. The technique can find interesting information that is difficult to find with conventional web-click stream analysis techniques and that can be used effectively in target marketing. It can also be adapted widely to analyzing data generated in a range of computing application fields, such as IoT environments and bioinformatics, that generate data in the form of data streams.
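In utility mining, a common definition (assumed here, not stated in the abstract) is that an itemset's utility in a transaction is the sum of quantity × unit profit over its items; for web clicks, quantity might be a click count or dwell time and unit profit a page weight. The brute-force sketch below computes that utility over a window of click sessions and keeps itemsets above a threshold. It does not model how the paper identifies "old" information, and all names and values are hypothetical.

```python
from itertools import combinations


def itemset_utility(itemset, transaction, unit_profit):
    # Utility in one transaction: sum of quantity x unit profit over the itemset,
    # or 0 if the transaction does not contain every item of the itemset.
    if not all(i in transaction for i in itemset):
        return 0
    return sum(transaction[i] * unit_profit[i] for i in itemset)


def high_utility_itemsets(window, unit_profit, min_util, max_len=2):
    # Exhaustive enumeration for illustration only; real miners prune the search.
    items = sorted({i for t in window for i in t})
    result = {}
    for size in range(1, max_len + 1):
        for itemset in combinations(items, size):
            u = sum(itemset_utility(itemset, t, unit_profit) for t in window)
            if u >= min_util:
                result[itemset] = u
    return result


unit_profit = {"home": 1, "cart": 5, "pay": 10}  # hypothetical page weights
window = [{"home": 3, "cart": 1}, {"home": 1, "pay": 1}, {"cart": 2, "pay": 1}]
print(high_utility_itemsets(window, unit_profit, min_util=15))
```

Unlike frequency, utility is not anti-monotone: ("home",) appears in two sessions but has low utility, while ("cart", "pay") appears once yet qualifies.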

An Efficient Approach for Single-Pass Mining of Web Traversal Sequences (단일 스캔을 통한 웹 방문 패턴의 탐색 기법)

  • Kim, Nak-Min; Jeong, Byeong-Soo; Ahmed, Chowdhury Farhan
    • Journal of KIISE:Databases / v.37 no.5 / pp.221-227 / 2010
  • Web access sequence mining can discover the web pages that users frequently access. Utility-based web access sequence mining handles non-binary occurrences of web pages and extracts more useful knowledge from web logs. However, the existing utility-based approach considers web access sequences from the very beginning of the web logs, so it is not suitable for mining data streams, where the volume of data is huge and unbounded. It also cannot adaptively find recent changes of knowledge in data streams. The existing approach has other limitations as well: it considers only forward references in web access sequences, suffers from a level-wise candidate generation-and-test methodology, and needs several database scans. In this paper, we propose a new approach for high utility web access sequence mining over data streams using a sliding window. Our approach can handle large-scale data and efficiently discover recently generated information from data streams, and it also overcomes the other limitations of the existing algorithm. Extensive performance analyses show that our approach is very efficient and outperforms the existing algorithm.
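The sliding-window idea can be sketched as follows: keep only the last `size` access sequences and update running per-page utilities on each arrival and eviction, so each update costs time proportional to the sequence length rather than to the stream length. This is a simplified illustration under stated assumptions: it tracks single-page utilities (for example, seconds spent on a page) rather than full sequence patterns, and the class name is hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowUtility:
    """Count-based sliding window over web access sequences.

    Assumption: each sequence is a list of (page, utility) pairs, e.g. seconds spent.
    """

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.page_utility = defaultdict(float)

    def add(self, sequence):
        self.window.append(sequence)
        for page, u in sequence:
            self.page_utility[page] += u
        if len(self.window) > self.size:
            # Evict the oldest sequence so results reflect only recent behaviour.
            old = self.window.popleft()
            for page, u in old:
                self.page_utility[page] -= u
                if self.page_utility[page] <= 1e-12:
                    del self.page_utility[page]

    def high_utility_pages(self, min_util):
        return {p: u for p, u in self.page_utility.items() if u >= min_util}


w = SlidingWindowUtility(size=2)
w.add([("A", 3), ("B", 2)])
w.add([("B", 4)])
w.add([("C", 5), ("A", 1)])  # the first sequence now leaves the window
print(w.high_utility_pages(4))
```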

Transformation of Continuous Aggregation Join Queries over Data Streams

  • Tran, Tri Minh; Lee, Byung-Suk
    • Journal of Computing Science and Engineering / v.3 no.1 / pp.27-58 / 2009
  • Aggregation join queries are an important class of queries over data streams. These queries involve both join and aggregation operations: window-based joins followed by an aggregation on the join output. All existing research addresses join query optimization and aggregation query optimization as separate problems. We observe that, by placing them within the same scope of query optimization, more efficient query execution plans become possible through more versatile query transformations. The enabling idea is to perform aggregation before the join so that join execution time may be reduced. Such query transformations have been studied in relational databases, but not in data streams. Applying them to data streams brings new challenges because tuples arrive incrementally and continuously. These challenges are addressed in this paper. Specifically, we first present a query processing model designed to facilitate query transformations and propose a query transformation rule specialized for streams. The rule is simple yet covers all possible cases of transformation. We then present a generic query processing algorithm that works with all alternative query execution plans made possible by the transformation, and develop cost formulas for these plans. Based on the processing algorithm, we validate the rule theoretically by proving the equivalence of the query execution plans. Finally, through extensive experiments, we validate the cost formulas and study the performance of the alternative query execution plans.
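The aggregation-before-join idea can be checked on a static snapshot of two window contents. For SUM(x) GROUP BY a over R(a, x) JOIN S(a, y), pre-aggregating R and counting S per key gives the same result, because every R tuple with key a joins exactly count_S(a) tuples of S. This sketch illustrates the general transformation only; it is not the paper's stream-specific rule, which must also handle the incremental arrival and expiration of tuples, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def join_then_aggregate(r, s):
    # Baseline plan: equi-join R(a, x) with S(a, y) on a, then SUM(x) GROUP BY a.
    result = defaultdict(int)
    for a_r, x in r:
        for a_s, _y in s:
            if a_r == a_s:
                result[a_r] += x
    return dict(result)


def aggregate_then_join(r, s):
    # Transformed plan: pre-aggregate each input, then join the per-key summaries.
    sum_x = defaultdict(int)
    for a, x in r:
        sum_x[a] += x
    count_s = Counter(a for a, _y in s)
    # Each R tuple with key a joins count_s[a] S tuples, so SUM(x) scales by that count.
    return {a: sx * count_s[a] for a, sx in sum_x.items() if count_s[a] > 0}


r = [(1, 10), (1, 5), (2, 7), (3, 4)]
s = [(1, "p"), (1, "q"), (2, "r")]
print(join_then_aggregate(r, s), aggregate_then_join(r, s))
```

The transformed plan joins one summary row per key instead of every matching tuple pair, which is the source of the saving when windows are large and keys repeat.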

A transcode scheduling technique to reduce early-stage delay time in playing multimedia in mobile terminals (이동 단말기에서 멀티미디어 연출시 최초 재생 지연시간을 줄이기 위한 트랜스코드 스케줄링 기법)

  • Hong, Maria; Yoon, Joon-Sung; Lim, Young-Hwan
    • The KIPS Transactions:PartB / v.10B no.6 / pp.695-704 / 2003
  • This paper proposes a new scheduling technique for playing multimedia data streams on mobile terminals. The paper first explores the characteristics of multimedia data streams; based on these characteristics, specific data streams can be selected for the transcoding process. Rather than transcoding all streams during playback, the approach uses a selection policy to choose and transcode only specific streams, which reduces the early-stage delay time more effectively. To this end, the paper proposes a stream selection policy for transcoding based on EPOB (End Point of Over Bandwidth). The policy aims to lower the bandwidth required by the multimedia streams below the network bandwidth and to minimize the early-stage delay time of multimedia streams played on mobile terminals.
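The abstract does not define the EPOB policy in detail, so the sketch below substitutes a hypothetical greedy rule to illustrate the underlying idea of transcoding only selected streams: transcode the streams with the largest bitrate savings first until the total required bandwidth fits within the network bandwidth. The function name and the bitrates are illustrative assumptions, not values from the paper.

```python
def select_streams_to_transcode(streams, network_bw):
    """Greedy sketch: transcode as few streams as needed to fit the network bandwidth.

    streams: dict name -> (original_kbps, transcoded_kbps).
    HYPOTHETICAL policy (not the paper's EPOB rule): largest savings first.
    """
    required = sum(orig for orig, _ in streams.values())
    chosen = []
    by_saving = sorted(streams.items(), key=lambda kv: kv[1][0] - kv[1][1], reverse=True)
    for name, (orig, trans) in by_saving:
        if required <= network_bw:
            break  # remaining streams play untranscoded
        required -= orig - trans
        chosen.append(name)
    return chosen, required


streams = {"video": (800, 300), "audio": (128, 64), "image": (200, 100)}
print(select_streams_to_transcode(streams, network_bw=700))
```

If every stream is transcoded and the total still exceeds the network bandwidth, the function returns all streams together with the remaining requirement, and playback would still need extra buffering.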