• Title/Summary/Keyword: Stream Mining

Development of the Performance Benchmark Tool for Data Stream Management Systems Combined with DBMS (DBMS와 결합된 데이터스트림관리시스템을 위한 성능 평가 도구 개발)

  • Kim, Gyoung-Bae
    • Journal of the Korea Society of Computer and Information / v.15 no.8 / pp.1-11 / 2010
  • Many applications of a DSMS (Data Stream Management System) must not only process real-time stream data efficiently but also provide users with high-quality services, such as data mining and data warehousing, by combining the DSMS with a DBMS (Database Management System). In this paper, we benchmark the performance of a combined DSMS and DBMS system developed to provide such services. We use stream data from a network monitoring application and combine representative DSMSs and DBMSs in a single system for performance testing. We develop an overall performance benchmark tool, implemented in Java, for our tests. For the performance tests, we combine the DSMSs STREAM and Coral8 with the DBMSs MySQL and Oracle10g, respectively.

A Real-Time Stock Market Prediction Using Knowledge Accumulation (지식 누적을 이용한 실시간 주식시장 예측)

  • Kim, Jin-Hwa;Hong, Kwang-Hun;Min, Jin-Young
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.109-130 / 2011
  • One of the major problems in data mining is the size of the data, as most data sets are now very large. Streams of data are normally accumulated in data stores or databases. Transactions on the internet, on mobile devices, and in ubiquitous environments produce streams of data continuously. Some data sets are left unused inside large data stores because of their size. Others are lost as soon as they are created because, for various reasons, they are never saved. How to use such large data sets, and how to use streaming data efficiently, are challenging questions in data mining research. Stream data is a data set that is accumulated continuously into data storage from a data source. In many cases, the size of this data set grows increasingly large over time. Mining information from such massive data consumes too many resources, such as storage, money, and time. These characteristics make it difficult and expensive to store all the stream data accumulated over time. On the other hand, if only recent or partial data is used to mine information or patterns, valuable information may be lost. To avoid these problems, this study suggests a method that efficiently accumulates information or patterns in the form of a rule set over time. A rule set is mined from a data set in the stream, and this rule set is accumulated into a master rule set store, which also serves as a model for real-time decision making. One of the main advantages of this method is that it takes much less storage space than the traditional method, which saves the whole data set. Another advantage is that the accumulated rule set is used as a prediction model. A prompt response to user requests is possible at any time because the rule set is always ready to be used for decisions. This makes real-time decision making possible, which is the greatest advantage of this method.
Based on theories of ensemble approaches, a combination of many different models can produce a prediction model with better performance. The consolidated rule set covers the entire data set, while the traditional sampling approach covers only part of it. This study uses stock market data, which is heterogeneous because the characteristics of the data vary over time. Stock market indexes can fluctuate whenever an event influences the market. Therefore, the variance of the values of each variable is large compared to that of a homogeneous data set. Prediction with a heterogeneous data set is naturally much more difficult than with a homogeneous one, because it is harder to predict in unpredictable situations. This study tests two general mining approaches and compares their prediction performance with that of the method suggested in this study. The first approach induces a rule set from the recent data set to predict a new data set. The second induces a rule set from all the data accumulated since the beginning every time a new data set has to be predicted. We found that neither of these two performs as well as the accumulated rule set method. Furthermore, the study reports experiments with different prediction models. The first builds a prediction model using only the more important rule sets, and the second uses all the rule sets, assigning weights to the rules based on their performance. The second shows better performance than the first. The experiments also show that the suggested method can be an efficient approach for mining information and patterns from stream data. One limitation of this method is that its application here is bounded to stock market data. Applying it to a more dynamic real-time stream data set is desirable.
There is also another problem in this study. As the number of rules increases over time, special rules such as redundant or conflicting rules have to be managed efficiently.
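
The rule accumulation idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `RuleAccumulator`, the representation of a rule as a `(condition, prediction)` pair, and the accuracy-based voting weight are all assumptions made for illustration. The sketch keeps only rules and their hit counts, never the raw stream, and predicts by a weighted vote over all matching rules.

```python
from collections import defaultdict


class RuleAccumulator:
    """Hypothetical master rule set: rule -> [correct, total] counts."""

    def __init__(self):
        self.stats = defaultdict(lambda: [0, 0])

    def add_batch_rules(self, rules):
        """Merge rules mined from one batch; duplicates keep their counts."""
        for rule in rules:
            self.stats[rule]  # registers the rule with zero counts if new

    def update(self, features, actual):
        """Score every rule whose condition appears in the observed features."""
        for (condition, prediction), counts in self.stats.items():
            if condition in features:
                counts[1] += 1
                counts[0] += int(prediction == actual)

    def predict(self, features):
        """Weighted vote: each matching rule votes with its observed accuracy."""
        votes = defaultdict(float)
        for (condition, prediction), (correct, total) in self.stats.items():
            if condition in features and total > 0:
                votes[prediction] += correct / total
        return max(votes, key=votes.get) if votes else None


# Made-up example data, for illustration only.
acc = RuleAccumulator()
acc.add_batch_rules([("vol_high", "up"), ("vol_high", "down")])
for outcome in ["up", "up", "down"]:
    acc.update({"vol_high"}, outcome)
print(acc.predict({"vol_high"}))
```

Redundant rules collapse naturally here because identical pairs share one dictionary key. Conflicting rules (same condition, different prediction) are kept and resolved by the vote, which is only one of several possible policies.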

A GEOSENSOR FILTER FOR PROCESSING GEOSENSOR QUERIES ON DATA STREAMS

  • Lee, Dong-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2008.10a / pp.119-121 / 2008
  • Pattern matching is increasingly employed in research areas such as health care services, RFID-based systems, facility management, and surveillance. A geosensor filter correlates a data stream to match specific patterns in distributed environments. In this paper, we present a geosensor query language for representing declarative geosensor queries efficiently. Geosensor operators are proposed for fast query processing over spatial and temporal areas in distributed environments. We also propose a geosensor filter that matches new query predicates against incoming stream predicates. Our filter can reduce the volume of transmitted data and save sensor power. It can be used in stream data mining systems to process various data, such as location, time, and geosensor information, in real time in distributed environments.

CONTINUOUS QUERY PROCESSING IN A DATA STREAM ENVIRONMENT

  • Lee, Dong-Gyu;Lee, Bong-Jae;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2007.10a / pp.3-5 / 2007
  • In a data stream environment, many continuous queries must be processed efficiently. To process many continuous queries, a query index technique is applied whose performance is linear irrespective of the number and width of intervals. Previous approaches cannot support dynamic insertion and deletion because the intervals must be arranged in advance to construct the index. Their insertion and search performance also slows as the number and width of the inserted intervals grow. In a data stream environment, many intervals have to be inserted and searched in linear time. Therefore, we propose Hashed Multiple Lists to process continuous queries in linear time. The proposed technique shows fast linear search performance. It can be used in systems that apply sensor networks and as a preprocessing technique for spatiotemporal data mining.
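
The abstract does not describe the internals of Hashed Multiple Lists, so the following Python sketch shows only a generic, commonly used way to index interval predicates of continuous queries: hash each value to a fixed-width bucket and keep a list of overlapping query intervals per bucket. The class name `BucketedIntervalIndex` and the `bucket_width` parameter are hypothetical. Unlike the proposed technique, insertion cost in this sketch grows with interval width.

```python
from collections import defaultdict


class BucketedIntervalIndex:
    """Generic bucket-hashed interval index (not the paper's structure)."""

    def __init__(self, bucket_width):
        self.bucket_width = bucket_width
        # bucket id -> list of (low, high, query_id)
        self.buckets = defaultdict(list)

    def _bucket(self, value):
        return int(value // self.bucket_width)

    def insert(self, query_id, low, high):
        """Register a query interval in every bucket it overlaps."""
        for b in range(self._bucket(low), self._bucket(high) + 1):
            self.buckets[b].append((low, high, query_id))

    def delete(self, query_id, low, high):
        """Remove a query interval dynamically, without rebuilding the index."""
        for b in range(self._bucket(low), self._bucket(high) + 1):
            self.buckets[b] = [e for e in self.buckets[b] if e[2] != query_id]

    def search(self, value):
        """Return ids of all queries whose interval contains the stream value."""
        candidates = self.buckets.get(self._bucket(value), [])
        return [q for low, high, q in candidates if low <= value <= high]


index = BucketedIntervalIndex(bucket_width=10)
index.insert("q1", 5, 25)
index.insert("q2", 20, 22)
print(index.search(21))
```

Each incoming stream value touches a single bucket, so search cost depends on how many intervals overlap that bucket and not on the total number of registered queries.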

Evaluating the Restoration of a Stream in an Abandoned Mine Land via Biomass Calculation of Benthic Macroinvertebrates

  • Mi-Jung Bae;Hyeon-Jung Seong;Seong-Nam Ham;Eui-Jin Kim
    • Korean Journal of Ecology and Environment / v.55 no.4 / pp.415-420 / 2022
  • Continual assessment of the impact of mine-derived water, a long-lasting burden on freshwater environments, is essential. Abundance-based evaluations of benthic macroinvertebrates have been conducted to evaluate anthropogenic disturbances and devise policies to reduce their impact. In this study, the status of a stream habitat was evaluated based on the body length and biomass weight of benthic macroinvertebrates of the family Baetidae. Following the renewal of the mine water treatment plant, the abundance of Baetidae assemblages recovered to a level comparable to that of a reference site. However, relatively low values were found for both body length and biomass weight in Baetidae species inhabiting the reddened streambed area, suggesting that the habitat has not yet completely recovered despite the recovery in abundance of the Baetidae assemblages. Therefore, continuous investigation and evaluation of this disturbed stream are necessary until the growth conditions of the habitat have functionally recovered.

Design of Sensor Middleware Architecture on Multi Level Spatial DBMS with Snapshot (스냅샷을 가지는 다중 레벨 공간 DBMS를 기반으로 하는 센서 미들웨어 구조 설계)

  • Oh, Eun-Seog;Kim, Ho-Seok;Kim, Jae-Hong;Bae, Hae-Young
    • Journal of Korea Spatial Information System Society / v.8 no.1 s.16 / pp.1-16 / 2006
  • Recently, human-based computing environments, which let users concentrate on their own tasks without having to sense other changes, have been actively researched and developed. In such environments, however, middleware deletes processed stream data to reduce the processing load caused by the massive volume of information from RFID sensors. Such middleware therefore has problems when a user requests the probabilities or statistics needed for data warehousing or data mining, and when a user repeatedly requests important stream data that the middleware has already discarded. In this paper, we design a Sensor Middleware Architecture on Multi Level Spatial DBMS with Snapshot that manages repeatedly requested stream data, in order to solve the problem of reusing historical stream data in current middleware. The system uses a disk database to manage the historical stream data filtered by the middleware, for services that require historical stream information such as data mining or data warehousing. It uses a memory database to manage highly reusable data as a snapshot when stream data stored in the disk database is reused frequently by users. In addition, the system applies its memory database management policy periodically to maintain high reuse and rapid service for users. The proposed system solves the problems that current middleware has with repeated requests for stream data and with policy decision services that use historical stream data. It also offers varied and rapid data services while maintaining high reuse of the main-memory snapshot data.

Assesment of soil pollution by Abandoned Mines wastes

  • Kim Hee-Joung;Yang Jae-E.;Lee Jai-Young;Park Beang-Kil;Kong Sung-Ho;Jun Sang-Ho
    • Proceedings of the Korean Society of Soil and Groundwater Environment Conference / 2005.04a / pp.363-370 / 2005
  • Approximately 2,000 metallic mines have been abandoned in Korea. Most of them are located in the watershed that is the main source of drinking water for the Seoul metropolitan area. Untreated mining wastes remain around abandoned mines in the study area. These wastes, flowing into farmland and streams downstream of the abandoned mines, can cause water and soil pollution. Mining waste samples were collected from the recently abandoned Guedo, Manjung, and Joil mines to evaluate the potential for water pollution by mine waste. The index of geoaccumulation (Müller, 1979), the fractional composition, and the removal efficiency of some heavy metals under HCl treatment at different concentrations were analyzed. The indices of geoaccumulation of Cd, Pb, Zn, Cu, Ni, and Cr are 6, 4 to 6, 0 to 6, 4 to 5, 2, and 0, respectively. The indices for Cd, Pb, Zn, and Cu reveal that the mining wastes have a high pollution potential in the area. The most abundant fractions of heavy metals in the mining wastes were the organic fraction of Cu, the reducible fraction of Pb, and the residual fraction of Ni and Zn.
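
For readers unfamiliar with the index of geoaccumulation, it is commonly defined as I_geo = log2(C_n / (1.5 · B_n)), where C_n is the measured concentration of an element and B_n its geochemical background value. The integer values quoted in the abstract appear to be Müller's classes (0 to 6). The Python sketch below computes the index and maps it to a class. The function names and the input values are made up for illustration, and the handling of exact class boundaries is an assumption, since conventions differ between sources.

```python
import math


def geoaccumulation_index(concentration, background):
    """I_geo = log2(C_n / (1.5 * B_n)); both inputs in the same units."""
    return math.log2(concentration / (1.5 * background))


def igeo_class(igeo):
    """Map I_geo to classes 0 (unpolluted) through 6 (extremely polluted).

    Assumed boundaries: I_geo <= 0 is class 0, (0, 1] is class 1, and so on,
    with everything above 5 placed in class 6.
    """
    if igeo <= 0:
        return 0
    return min(6, math.ceil(igeo))


# Hypothetical concentrations in mg/kg, not taken from the study.
igeo = geoaccumulation_index(concentration=150.0, background=50.0)
print(igeo, igeo_class(igeo))
```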

Riparian Environment Change and Vegetation Immigration in Sandbar after Sand Mining (골채채취 후 수변환경 변화와 사주 내 식생이입)

  • Kong, Hak-Yang;Kim, Semi;Lee, Jaeyoon;Lee, Jae-An;Cho, Hyungjin
    • Journal of Korean Society on Water Environment / v.32 no.2 / pp.135-141 / 2016
  • This study investigated changes in hydrology, soil characteristics, riparian vegetation communities, and geomorphology in sandbars before and after sand mining to determine the effect of sand mining in the upstream reaches of the Guemgang and Bochungcheon streams in Korea. Sand-mining events affected the mining area. They supplied organic matter and nutrients during floods. Sediment deposition caused changes in soil texture and expansion of the vegetated area. However, riverbeds were stabilized after the disturbance. According to analyses of aerial photographs, the vegetated area expanded significantly in both dam-regulated and dam-unregulated streams after sand mining. Willow shrubs advanced into disturbed areas on average 10 years after sand mining. It took willow trees another 10.6 years to become the dominant communities. Therefore, it took a total of 20.6 years for a new riparian forest to form on a sandbar after sand mining. Our results confirmed that stream flow conditions were dependent on vegetation recruitment in dam-regulated and dam-unregulated streams. For willow recruitment in unregulated streams, calculating the water level below the dimensionless bed shear stress is important because low water level variation is a limiting factor for vegetation recruitment.

Finding Pseudo Periods over Data Streams based on Multiple Hash Functions (다중 해시함수 기반 데이터 스트림에서의 아이템 의사 주기 탐사 기법)

  • Lee, Hak-Joo;Kim, Jae-Wan;Lee, Won-Suk
    • Journal of Information Technology Services / v.16 no.1 / pp.73-82 / 2017
  • Recently, in-memory data stream processing has been actively applied to various subjects such as query processing, OLAP, and data mining (for example, frequent itemsets, association rules, and clustering). However, finding regular periodic patterns of events in an infinite data stream has received less attention. Most studies on finding periods use autocorrelation functions to find certain changes in periodic patterns, not the period itself. They also usually find periodic patterns in time-series databases, not in data streams. Literally, a period means the length of time after which some phenomenon recurs. In real applications, however, a data set evolves with tiny differences as time elapses. This kind of period is called a pseudo-period. This paper proposes a new scheme, the FPMH (Finding Periods using Multiple Hash functions) algorithm, to find such a set of pseudo-periods over a data stream based on multiple hash functions. According to the type of pseudo-period, this paper categorizes FPMH into three variants: FPMH-E, FPMH-PC, and FPMH-PP. To maximize the performance of the algorithm in the data stream environment and to keep the most recent periodic patterns in memory, we apply a decay mechanism to the FPMH algorithms. The FPMH algorithm minimizes memory usage as well as processing time with acceptable accuracy.
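
The abstract does not give the details of FPMH, so the Python sketch below is not that algorithm. It only illustrates, under stated assumptions, why several hash functions help when tracking recurrence intervals in bounded memory. Each hash row stores the last time its bucket was hit. A collision can only make a bucket look more recently hit, so the largest gap across rows is the best available estimate of an item's true inter-arrival gap. The class name `HashedGapTracker` and all of its parameters are hypothetical, and the decay mechanism and the three FPMH variants are not modelled.

```python
import hashlib


class HashedGapTracker:
    """Estimate inter-arrival gaps of items using several hashed tables."""

    def __init__(self, num_hashes=3, table_size=1024):
        self.num_hashes = num_hashes
        self.table_size = table_size
        # One fixed-size row of last-seen timestamps per hash function.
        self.last_seen = [[None] * table_size for _ in range(num_hashes)]

    def _slot(self, item, row):
        # Salting the key with the row number yields independent hash functions.
        data = f"{row}:{item}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.table_size

    def observe(self, item, timestamp):
        """Record an arrival; return the estimated gap since the last arrival."""
        gaps = []
        for row in range(self.num_hashes):
            slot = self._slot(item, row)
            previous = self.last_seen[row][slot]
            if previous is not None:
                gaps.append(timestamp - previous)
            self.last_seen[row][slot] = timestamp
        if len(gaps) < self.num_hashes:
            return None  # at least one bucket was empty, so the item is new
        return max(gaps)  # collisions only shrink gaps, so take the largest


tracker = HashedGapTracker()
for t in [0, 10, 20, 31]:
    print(t, tracker.observe("sensor-7", t))
```

Gaps that cluster around one value with small deviations, as in this made-up arrival sequence, are the kind of pattern the abstract calls a pseudo-period.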

Efficient Dynamic Weighted Frequent Pattern Mining by using a Prefix-Tree (Prefix-트리를 이용한 동적 가중치 빈발 패턴 탐색 기법)

  • Jeong, Byeong-Soo;Farhan, Ahmed
    • The KIPS Transactions:PartD / v.17D no.4 / pp.253-258 / 2010
  • Traditional frequent pattern mining assumes an equal profit/weight value for every item. Weighted Frequent Pattern (WFP) mining has become an important research issue in data mining and knowledge discovery because it considers different weights for different items. Existing algorithms in this area are based on fixed weights. In real-world scenarios, however, the price/weight/importance of a pattern may vary frequently due to unavoidable situations. Tracking these dynamic changes is essential in application areas such as retail market basket analysis and web clickstream management. In this paper, we propose a novel concept of dynamic weight and an algorithm, DWFPM (dynamic weighted frequent pattern mining). Our algorithm can handle situations where the price/weight of a pattern varies dynamically. It scans the database exactly once and is also suitable for real-time data processing. To our knowledge, this is the first research work to mine weighted frequent patterns using dynamic weights. Extensive performance analyses show that our algorithm is very efficient and scalable for WFP mining using dynamic weights.
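
To make the idea of dynamic weights concrete, the Python sketch below computes a weighted support for one pattern when item weights change from batch to batch. It is an illustration, not the DWFPM algorithm, and it does not use a prefix tree. It assumes the data arrives in batches, that each batch carries its own weight table covering every item, that a pattern's weight is the average weight of its items, and that the dynamic weighted support is the sum over batches of pattern weight times frequency. The function name and the data are made up.

```python
def dynamic_weighted_support(pattern, batches):
    """Sum over batches of (pattern weight in batch) * (frequency in batch).

    batches: list of (transactions, weights) pairs, where transactions is a
    list of item collections and weights maps each item to its weight in
    that batch.
    """
    pattern = frozenset(pattern)
    total = 0.0
    for transactions, weights in batches:
        # Count the transactions in this batch that contain the whole pattern.
        frequency = sum(1 for t in transactions if pattern <= set(t))
        # Assumed pattern weight: mean of its item weights in this batch.
        pattern_weight = sum(weights[item] for item in pattern) / len(pattern)
        total += pattern_weight * frequency
    return total


# Hypothetical batches: the weight of item "a" doubles in the second batch.
batches = [
    ([{"a", "b"}, {"a"}, {"a", "b", "c"}], {"a": 0.5, "b": 0.5, "c": 1.0}),
    ([{"a", "b"}, {"b"}], {"a": 1.0, "b": 0.5, "c": 0.25}),
]
print(dynamic_weighted_support({"a", "b"}, batches))
```

With a fixed weight table the same pattern would be scored identically in every batch. Keeping one table per batch is what lets the score follow price or importance changes over time.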