• Title/Summary/Keyword: stream mining

Finding Weighted Sequential Patterns over Data Streams via a Gap-based Weighting Approach (발생 간격 기반 가중치 부여 기법을 활용한 데이터 스트림에서 가중치 순차패턴 탐색)

  • Chang, Joong-Hyuk
    • Journal of Intelligence and Information Systems / v.16 no.3 / pp.55-75 / 2010
  • Sequential pattern mining aims to discover interesting sequential patterns in a sequence database. It is one of the essential data mining tasks and is widely used in application fields such as Web access pattern analysis, customer purchase pattern analysis, and DNA sequence analysis. General sequential pattern mining considers only the generation order of the data elements in a sequence, so it easily finds simple sequential patterns but is limited in finding the more interesting sequential patterns needed in real-world applications. Weighted sequential pattern mining is one of the essential research topics that addresses this limit: it considers not only the generation order of data elements but also their weights. Recently, data has increasingly taken the form of continuous data streams rather than finite stored data sets in various application fields, and the database research community has begun to focus on processing data streams. A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. In data stream processing, each data element should be examined at most once, and memory usage should remain bounded even though new data elements are continuously generated. Moreover, newly generated data elements should be processed as fast as possible so that an up-to-date analysis result is available immediately upon request. To satisfy these requirements, data stream processing sacrifices some correctness of its analysis result by allowing some error. Given these changes in the form of data generated in real-world applications, much research has been actively performed to find various kinds of knowledge embedded in data streams. This research mainly focuses on efficient mining of frequent itemsets and sequential patterns over data streams, which have proven useful in conventional data mining for finite data sets. In addition, mining algorithms have been proposed to efficiently reflect changes in data streams over time in their mining results. However, they target naively interesting patterns such as frequent patterns and simple sequential patterns, which are found intuitively, and take no interest in mining novel interesting patterns that better express the characteristics of the target data streams. Therefore, defining novel interesting patterns and developing a mining method that finds them is a valuable research topic in data stream mining, and such patterns can be used effectively to analyze recent data streams. This paper proposes a gap-based weighting approach for sequential patterns and a mining method for weighted sequential patterns over sequence data streams based on that approach. The gap-based weight of a sequential pattern is computed from the gaps between the data elements in the pattern, without any pre-defined weight information. That is, the approach uses the gaps between data elements in each sequential pattern, as well as their generation order, to obtain the weight of the pattern, which helps to find more interesting and useful sequential patterns. Because most computer application fields now generate data as data streams rather than finite data sets, the proposed method focuses mainly on sequence data streams.
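
The abstract above says a pattern's weight is derived from the gaps between its elements, but it does not give the formula. The sketch below is therefore only an illustration of the idea, not the paper's method. It assumes each matched element carries a timestamp, that a gap `g` contributes `1 / (1 + g)`, and that the weight is the mean contribution. The function name `gap_based_weight` and this scoring rule are hypothetical.

```python
from typing import Sequence


def gap_based_weight(timestamps: Sequence[float]) -> float:
    """Hypothetical gap-based weight for one occurrence of a sequential pattern.

    timestamps[i] is the arrival time of the i-th matched element.
    Assumption (not from the paper): each gap g contributes 1 / (1 + g),
    and the weight is the mean contribution, so it lies in (0, 1].
    """
    if len(timestamps) < 2:
        return 1.0  # a single element has no gaps; treat it as full weight
    # Gaps between consecutive matched elements.
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if any(g < 0 for g in gaps):
        raise ValueError("timestamps must be non-decreasing")
    return sum(1.0 / (1.0 + g) for g in gaps) / len(gaps)


tight = gap_based_weight([1, 2, 3])   # elements arrive close together
loose = gap_based_weight([1, 5, 12])  # same order, but widely spaced
print(tight, loose)
```

Under this assumed rule, two occurrences with the same element order receive different weights when their spacing differs, which is the property the abstract describes. No pre-defined per-item weight table is needed.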

Assessment of Water Pollution by the discharged water of the Abandoned Mine

  • Kim, Hee-Joung;Yang, Jae-E.;Lee, Jai-Young;Park, Beang-Kil;Choi, Sang-Il;Jun, Sang-Ho
    • Proceedings of the Korean Society of Soil and Groundwater Environment Conference / 2004.04a / pp.167-174 / 2004
  • Several metalliferous and coal mines, including Myungjin, Seojin and Okdong, located in the upper watershed of Okdong stream, have been abandoned or closed since 1988 under the mining industry promotion policy, leaving an enormous amount of mining waste without proper treatment facilities and resulting in water pollution in the downstream areas. Acid mine drainage (AMD) and waste effluents from the closed coal mines were very strongly acidic, with pH ranging from 2.7 to 4.5, and had a high level of total dissolved solids (TDS), ranging from 1,030 to 1,947 mg/L. Concentrations of heavy metals such as Fe, Cu and Cd and of anions such as sulfate were also very high in these samples. By these parameters, the AMD and effluents were considered highly polluted compared with the main stream of the Okdong river, and to be major pollutants of water and soil in the downstream area. Pollution indices of the surface water in the upper stream of the Okdong river, where AMD from the abandoned coal mines flowed into the main stream, ranged from 16.3 to 47.1. Those in the mid stream, where effluents from tailing dams and coal mines flowed into the main stream, ranged from 10.6 to 19.5, and those in the lower stream ranged from 10.6 to 14.9. These results indicate that mining wastes such as AMD and effluents from the closed mines were the major source of water pollution in the Okdong stream area.

Mining Association Rules in Multidimensional Stream Data (다차원 스트림 데이터의 연관 규칙 탐사 기법)

  • Kim, Dae-In;Park, Joon;Kim, Hong-Ki;Hwang, Bu-Hyun
    • The KIPS Transactions:PartD / v.13D no.6 s.109 / pp.765-774 / 2006
  • Association rule discovery, a technique for analyzing data stored in databases to discover potential information, has been a popular topic in stream data systems. Most previous research is concerned with single stream data, an approach that may miss information when mining multidimensional stream data. In this paper, we study techniques for discovering association rules in multidimensional stream data. We propose the AR-MS method, which reflects the characteristics of stream data by building summarization information in a single data scan, and which discovers association rules for significant rare data that appear infrequently in the database but are highly associated with a specific event. The AR-MS method can also discover the maximal frequent items of multidimensional stream data by using the summarization information. Through analysis and experiments, we show that the AR-MS method is superior to previous methods.

Design and Implementation of a USN Middleware for Context-Aware and Sensor Stream Mining

  • Jin, Cheng-Hao;Lee, Yang-Koo;Lee, Seong-Ho;Yun, Un-il;Ryu, Keun-Ho
    • Spatial Information Research / v.19 no.1 / pp.127-133 / 2011
  • Recently, with advances in sensor technology and network computing, the Ubiquitous Sensor Network (USN) has received a lot of attention from various communities. The sensor nodes distributed in a sensor network tend to continuously generate a large amount of data, which is called stream data. Sensor stream data arrives online, so it is characterized as high-speed, real-time and unbounded, and it requires fast processing to produce up-to-date results. Data streams have many application domains, such as traffic analysis, physical distribution and U-healthcare. Therefore, there is a strong need for USN middleware that processes such online stream data and provides corresponding services to diverse applications. In this paper, we propose a novel USN middleware that provides users with both context-aware services and meaningful sequential patterns. The proposed middleware focuses mainly on location-based applications that use stream location data. We also show its implementation. Using the proposed USN middleware reduces the development cost of providing context-aware services and stream sequential patterns, mainly in location-based applications.

Mining highly attention itemsets using a two-way decay mechanism in data stream mining (데이터 스트림 마이닝에서 양방향 감쇠 기법을 활용한 고관심 정보 탐색)

  • Chang, Joong-Hyuk
    • Journal of Korea Society of Industrial Information Systems / v.20 no.2 / pp.1-9 / 2015
  • Most information-differentiating techniques for data stream mining give larger weight to recently generated information than to old information. However, some old information can be important. For example, if a person was a regular customer of a retail store but has not visited the store recently, the old information in that person's shopping record can be valuable for target marketing aimed at increasing sales. In this paper, highly attention itemsets (HAI) are defined as itemsets that were generated frequently in the past but have not been generated recently. In addition, a two-way decay mechanism and a data stream mining method for finding HAI are proposed.
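
The abstract does not define the two-way decay mechanism, so the sketch below does not implement it. It only illustrates the target notion of "frequent in the past, absent recently" with a simpler stand-in: one plain count and one exponentially decayed count per item. The class name `TwoCounterTracker`, the thresholds, and the restriction to single items instead of itemsets are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List


class TwoCounterTracker:
    """Hypothetical stand-in: plain count plus recency-weighted count per item."""

    def __init__(self, decay: float = 0.9) -> None:
        if not 0.0 < decay < 1.0:
            raise ValueError("decay must be in (0, 1)")
        self.decay = decay
        self.n_total = 0        # number of transactions seen
        self.n_recent = 0.0     # decayed number of transactions seen
        self.total: Dict[Hashable, int] = defaultdict(int)
        self.recent: Dict[Hashable, float] = defaultdict(float)

    def add(self, items: Iterable[Hashable]) -> None:
        # Age every recency-weighted count by one step, then count the new items.
        for key in self.recent:
            self.recent[key] *= self.decay
        self.n_recent = self.n_recent * self.decay + 1.0
        self.n_total += 1
        for item in set(items):
            self.total[item] += 1
            self.recent[item] += 1.0

    def candidates(self, min_overall: float, max_recent: float) -> List[Hashable]:
        """Items with high overall support but low recency-weighted support."""
        return sorted(
            item
            for item, count in self.total.items()
            if count / self.n_total >= min_overall
            and self.recent[item] / self.n_recent <= max_recent
        )


tracker = TwoCounterTracker(decay=0.5)
for _ in range(10):
    tracker.add(["milk", "bread"])   # "milk" is bought regularly at first
for _ in range(10):
    tracker.add(["bread"])           # then "milk" stops appearing
print(tracker.candidates(min_overall=0.4, max_recent=0.1))
```

In this run, "milk" keeps an overall support of 0.5 while its decayed support falls close to zero, so it is reported. "bread" stays high on both measures and is not.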

Customized Digital TV System for Individuals/Communities based on Data Stream Mining (데이터 스트림 마이닝 기법을 적용한 개인/커뮤니티 맞춤형 Digital TV 시스템)

  • Shin, Se-Jung;Lee, Won-Suk
    • The KIPS Transactions:PartD / v.17D no.6 / pp.453-462 / 2010
  • The switch from analog to digital broadcast television is spreading rapidly. DTV can offer multiple programming choices, interactive capabilities and so on. Moreover, with the spread of the Internet, information exchange between communities is also increasing. These facts lead to a new TV service environment that can offer customized TV programs to individual and community users. This paper proposes a 'Customized Digital TV System for Individuals/Communities based on Data Stream Mining', which can analyze a user's TV watching behavior patterns. Because of the characteristics of the TV program data stream and the EPG (electronic program guide), data stream mining methods are employed in the proposed system. When a user is watching DTV, the proposed system can control the surrounding environment by using the user behavior profiles. Furthermore, a channel recommendation system for the smartphone environment is proposed to make wider use of the profiles.

A Real-Time Data Mining for Stream Data Sets (연속발생 데이터를 위한 실시간 데이터 마이닝 기법)

  • Kim Jinhwa;Min Jin Young
    • Journal of the Korean Operations Research and Management Science Society / v.29 no.4 / pp.41-60 / 2004
  • Stream data is a data set that accumulates continuously in data storage from a data source over time. In many cases, the size of this data set grows increasingly large over time. Mining information from this massive data takes considerable resources such as storage, memory and time. These characteristics of stream data make it difficult and expensive to use the large data accumulated over time. On the other hand, if we use only the recent data or part of the whole data to mine information or patterns, information that may be useful can be lost. To avoid this problem, we suggest a method that efficiently accumulates information over time in the form of rule sets. It takes much less storage than traditional mining methods. The accumulated rule sets are used as prediction models in the future. According to theories of ensemble approaches, a combination of many prediction models, here in the form of systematically merged rule sets, performs better than a single prediction model. This study uses a customer data set for predicting the buying power of customers based on their information. It tests the performance of the suggested method on this data set along with general prediction methods and compares their performance.
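
The idea of keeping compact rule sets instead of raw data, and combining them as an ensemble, can be sketched as follows. This is a minimal illustration under assumed simplifications, not the study's merging procedure: each chunk is reduced to "attribute value -> majority label" rules, and prediction is a plain majority vote across the stored rule sets. The names `learn_rules` and `RuleSetEnsemble` and the toy customer records are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Record = Tuple[Hashable, str]   # (attribute value, class label)
RuleSet = Dict[Hashable, str]   # attribute value -> predicted label


def learn_rules(chunk: Iterable[Record]) -> RuleSet:
    """Summarize one chunk of the stream as 'value -> majority label' rules."""
    votes: Dict[Hashable, Counter] = defaultdict(Counter)
    for value, label in chunk:
        votes[value][label] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in votes.items()}


class RuleSetEnsemble:
    """Keeps one small rule set per chunk; the raw records are not stored."""

    def __init__(self) -> None:
        self.rule_sets: List[RuleSet] = []

    def absorb(self, chunk: Iterable[Record]) -> None:
        self.rule_sets.append(learn_rules(chunk))

    def predict(self, value: Hashable, default: str = "unknown") -> str:
        # Each stored rule set that covers the value casts one vote.
        ballots = Counter(rs[value] for rs in self.rule_sets if value in rs)
        return ballots.most_common(1)[0][0] if ballots else default


ensemble = RuleSetEnsemble()
ensemble.absorb([("student", "low"), ("student", "low"), ("manager", "high")])
ensemble.absorb([("student", "high"), ("manager", "high")])
ensemble.absorb([("student", "low"), ("manager", "high"),
                 ("manager", "low"), ("manager", "high")])
print(ensemble.predict("student"), ensemble.predict("manager"))
```

Each chunk is discarded after `absorb`, so storage grows with the number of rule sets and not with the number of records. A real system would also need to merge or prune old rule sets, which this sketch leaves out.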

Mining of Frequent Structures over Streaming XML Data (스트리밍 XML 데이터의 빈발 구조 마이닝)

  • Hwang, Jeong-Hee
    • The KIPS Transactions:PartD / v.15D no.1 / pp.23-30 / 2008
  • Internet technology and XML are fundamental to research on context awareness in ubiquitous environments. XML data in the form of continuous streams is popular in network applications over the Internet, and there is also research on query processing for streaming XML data. As basic research toward efficient querying, we propose both a labeled ordered tree model representing XML and a mining method that extracts frequent structures from streaming XML data. That is, continuously arriving XML data is modeled as a stream tree called the XFP_tree, and we extract the exact frequent structures from the XFP_tree of the current window in order to mine recent data. The proposed method can serve as a basis for query processing and indexing methods for XML stream data.

A Study of Web Usage Mining for eCRM

  • Hyuncheol Kang;Jung, Byoung-Cheol
    • Communications for Statistical Applications and Methods / v.8 no.3 / pp.831-840 / 2001
  • In this study, we introduce the process of web usage mining, which has lately attracted considerable attention with the rapid spread of the World Wide Web, and explain web log data, which is the main subject of web usage mining. We also illustrate some real examples of web log data analysis and look into practical applications of web usage mining for eCRM.

Numerical Simulations of Developing Mining Pit using Quasi-Steady Model (준정류모형을 이용한 하천의 준설 웅덩이 발달 모의)

  • Choi, Sung-Uk;Choi, Seongwook
    • KSCE Journal of Civil and Environmental Engineering Research / v.32 no.1B / pp.53-57 / 2012
  • This study presents a numerical model capable of simulating the evolution of a mining pit in a stream. The model is based on the quasi-steady assumption that the flow is steady while the morphology changes with time. This assumption is valid because stream morphology changes over a long period compared with the time scale of flow change. Before the applications, numerical experiments are carried out with two total load formulas, Engelund and Hansen's (1967) and Ackers and White's (1973). It is found that Engelund and Hansen's formula best reproduces the evolution of the mining pit when compared with the simulated profiles in Parker (2004). The model is then applied to two laboratory experiments from the literature. In general, the numerical model properly simulates the evolution of a mining pit in laboratory open channels. However, the model does not reproduce the head-cutting that propagates upstream, and it underestimates the bed wave that propagates downstream after the mining pit has been refilled.
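
The abstract does not print its governing equations. For orientation only, a quasi-steady morphodynamic model of this kind is commonly built on the Exner sediment continuity equation together with a total load relation. The forms below are the usual textbook versions of the Exner equation and the Engelund and Hansen relation. The notation is an assumption and may differ from the paper's.

```latex
% Exner equation: bed elevation changes with the divergence of sediment transport
(1 - \lambda_p)\,\frac{\partial \eta}{\partial t} = -\frac{\partial q_t}{\partial x}

% Engelund and Hansen (1967) total load relation in dimensionless form
q_t^{*} = \frac{q_t}{\sqrt{R\,g\,D}\;D} = \frac{0.05}{C_f}\,\left(\tau^{*}\right)^{5/2}
```

Here \eta is the bed elevation, \lambda_p the bed porosity, q_t the volumetric total sediment transport rate per unit width, R the submerged specific gravity of the sediment, g the gravitational acceleration, D the grain size, C_f the bed friction coefficient, and \tau^{*} the Shields number. Under the quasi-steady assumption, the flow is recomputed as steady for each bed profile and only the bed elevation is advanced in time.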