• Title/Summary/Keyword: stream mining

Finding Frequent Itemsets based on Open Data Mining in Data Streams (데이터 스트림에서 개방 데이터 마이닝 기반의 빈발항목 탐색)

  • Chang, Joong-Hyuk; Lee, Won-Suk
    • The KIPS Transactions: Part D, v.10D no.3, pp.447-458, 2003
  • The basic assumption of conventional data mining methodology is that the data set of a knowledge discovery process must be fixed and available before the process can proceed. Consequently, this assumption is valid only when the target of data mining is the static knowledge embedded in a specific data set. In addition, a conventional data mining method requires considerable computing time to produce a mining result from a large data set. For these reasons, it is almost impossible to apply such a method to a real-time analysis task over a data stream, where new transactions are continuously generated and an up-to-date mining result that includes the newly generated transactions is needed as quickly as possible. For this purpose, this paper proposes a new mining concept: open data mining in a data stream. In open data mining, whenever a new transaction is generated, the updated mining result over all transactions, including the new one, is obtained instantly. To implement this mechanism efficiently, it is necessary to incorporate both the delayed insertion of newly identified information in recent transactions and the pruning of insignificant information in the mining result of past transactions. A series of experiments is conducted to identify the various characteristics of the proposed algorithm.
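
The abstract names two mechanisms, delayed insertion of newly identified information and pruning of insignificant information, without specifying them. The following is a minimal sketch under our own assumptions, not the authors' algorithm: only itemsets of size 1 and 2 are monitored, a new 2-itemset is inserted only once both of its items are already frequent (a hypothetical delayed-insertion rule), and any monitored entry whose support drops below a pruning threshold is removed. The class `OpenMiner` and the parameters `min_sup` and `prune_sup` are hypothetical names.

```python
from itertools import combinations


class OpenMiner:
    """Hypothetical sketch: the mining result is refreshed after every transaction."""

    def __init__(self, min_sup: float, prune_sup: float):
        self.min_sup = min_sup      # support needed for an itemset to be reported
        self.prune_sup = prune_sup  # monitored entries below this support are dropped
        self.n = 0                  # number of transactions seen so far
        self.counts = {}            # itemset (frozenset) -> occurrence count

    def _sup(self, itemset: frozenset) -> float:
        return self.counts[itemset] / self.n

    def add(self, transaction: set) -> None:
        self.n += 1
        # 1-itemsets are always monitored, so they are inserted immediately.
        for item in transaction:
            key = frozenset([item])
            self.counts[key] = self.counts.get(key, 0) + 1
        # Delayed insertion (assumed rule): a new 2-itemset enters the table
        # only once both of its items are already frequent.
        for a, b in combinations(sorted(transaction), 2):
            pair = frozenset([a, b])
            if pair in self.counts:
                self.counts[pair] += 1
            elif all(self._sup(frozenset([x])) >= self.min_sup for x in (a, b)):
                self.counts[pair] = 1
        # Pruning: drop insignificant entries so the table stays small.
        for key in [k for k in self.counts if self._sup(k) < self.prune_sup]:
            del self.counts[key]

    def frequent(self) -> set:
        """Current mining result over all transactions seen so far."""
        return {k for k in self.counts if self._sup(k) >= self.min_sup}


m = OpenMiner(min_sup=0.5, prune_sup=0.2)
for t in [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"b", "d"}]:
    m.add(t)  # the result is available after each call, not only at the end
result = sorted(tuple(sorted(s)) for s in m.frequent())
print(result)
```

Because a delayed 2-itemset starts counting only when it is inserted, its count is a lower bound on its true frequency; this is the accuracy-for-memory trade-off such a design accepts.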

Ontology based Preprocessing Scheme for Mining Data Streams from Sensor Networks (센서 네트워크의 데이터 스트림 마이닝을 위한 온톨로지 기반의 전처리 기법)

  • Jung, Jason J.
    • Journal of Intelligence and Information Systems, v.15 no.3, pp.67-80, 2009
  • Sensors and sensor networks make it possible to collect environmental information from a given sensor space. To discover more useful information and knowledge, we want to apply data mining methodologies to the sensor data streams from such sensor spaces. In this paper, we present a novel data preprocessing scheme to improve the performance of data mining algorithms. In particular, ontologies are applied to represent the meanings of the sensor data. To evaluate the proposed method, we collected sensor streams for about 30 days and used them in simulations to compare the method with other approaches.

The Development of Temporal Mining Technique Considering the Event Change of State in U-Health (U-Health에서 이벤트 상태 변화를 고려한 시간 마이닝 기법 개발)

  • Kim, Jae-In; Kim, Dae-In; Hwang, Bu-Hyun
    • The KIPS Transactions: Part D, v.18D no.4, pp.215-224, 2011
  • U-Health collects patient information with various kinds of sensors. Stream data can be summarized as interval events, each spanning the interval between a start-time-point and an end-time-point. Most temporal mining techniques consider only the time point at which an event occurs and ignore changes in the state of the stream data. In this paper, we propose a temporal mining technique that considers event state changes in U-Health. Our method overcomes the restrictions of the environment by sending significant events in U-Health from the sensors to a server. We define four event states of stream data and perform temporal data mining that takes these event state changes into account. Finally, by describing cause-and-effect relations among events in temporal relation sequences, we can remove ambiguity from the discovered rules.
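
The idea of summarizing stream data as interval events can be illustrated with a short sketch. It assumes each raw reading is first discretized into a state and that a run of consecutive readings in the same state becomes one event with a start-time-point and an end-time-point. The three state labels, the thresholds, and the sample readings are hypothetical; the paper's four event states are not described in this abstract and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalEvent:
    state: str   # discretized state label (hypothetical labels)
    start: int   # start-time-point: timestamp of the first reading in the run
    end: int     # end-time-point: timestamp of the last reading in the run


def discretize(value: float, low: float, high: float) -> str:
    # Hypothetical three-level discretization of a raw sensor value.
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def to_interval_events(readings, low, high):
    """Summarize time-ordered (timestamp, value) readings into interval events.

    Consecutive readings that share a state are merged into one event, so
    only a change of state opens a new event.
    """
    events = []
    for t, v in readings:
        s = discretize(v, low, high)
        if events and events[-1].state == s:
            events[-1] = IntervalEvent(s, events[-1].start, t)  # extend the interval
        else:
            events.append(IntervalEvent(s, t, t))  # state changed: open a new interval
    return events


# Hypothetical heart-rate readings: (time in seconds, beats per minute)
hr = [(0, 72), (1, 75), (2, 110), (3, 115), (4, 80), (5, 78)]
events = to_interval_events(hr, low=50, high=100)
print(events)
```

Six readings collapse into three events here, which hints at why forwarding events rather than raw samples can ease the resource limits of a sensor environment.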

Stream-based Biomedical Classification Algorithms for Analyzing Biosignals

  • Fong, Simon; Hang, Yang; Mohammed, Sabah; Fiaidhi, Jinan
    • Journal of Information Processing Systems, v.7 no.4, pp.717-732, 2011
  • Classification in biomedical applications is an important task that predicts or classifies an outcome based on a given set of input variables, such as diagnostic tests or a patient's symptoms. Traditionally, classification algorithms have had to digest a stationary set of historical data to train a decision-tree model, and the learned model could then be used to test new samples. However, a new breed of classification, called stream-based classification, can handle continuous data streams that are ever-evolving, unbounded, and unstructured, for instance live biosignal feeds. These emerging algorithms could potentially be used for real-time classification over biosignal data streams such as EEG and ECG. This paper presents a pioneering study of the feasibility of classification algorithms for analyzing biosignals in the form of infinite data streams. First, a performance comparison is made between traditional and stream-based classification. The results show that the accuracy of traditional classification declines intermittently because the model must be re-learned as new data arrives. Second, we show by simulation that biosignal data streams can be processed with a satisfactory level of performance in terms of accuracy, memory requirement, and speed by using a collection of stream-mining algorithms called Optimized Very Fast Decision Trees. These algorithms can effectively serve as a cornerstone technology for real-time classification in future biomedical applications.
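
Very Fast Decision Trees (VFDT) decide when a leaf has seen enough stream examples to split by using the Hoeffding bound: with probability 1 - δ, the true mean of a variable with range R lies within ε = sqrt(R² ln(1/δ) / (2n)) of the mean observed over n examples. The sketch below shows only this standard split test; the optimizations in the paper's "Optimized" variant are not described in the abstract and are not modeled. The function names, the default δ, and the tie-breaking threshold value are our own illustrative choices.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean lies within
    epsilon of the mean observed over n independent examples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 n_classes: int, delta: float = 1e-7, tie: float = 0.05) -> bool:
    # Information gain lies in [0, log2(n_classes)], which gives the range R.
    eps = hoeffding_bound(math.log2(n_classes), delta, n)
    # Split when the best attribute is reliably better than the runner-up, or
    # when epsilon is so small that the two are effectively tied.
    return (best_gain - second_gain) > eps or eps < tie


# With 2 classes and 1000 examples at a leaf, epsilon is about 0.09.
print(hoeffding_bound(1.0, 1e-7, 1000))
print(should_split(0.30, 0.25, 1000, 2))   # gain gap 0.05 is not yet convincing
print(should_split(0.30, 0.25, 10000, 2))  # more examples shrink epsilon
```

Since ε shrinks as 1/sqrt(n), a leaf only has to keep sufficient statistics until enough examples arrive, which is what lets this family of learners process an unbounded stream in bounded memory.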

Hybrid Internet Business Model using Evolutionary Support Vector Regression and Web Response Survey

  • Jun, Sung-Hae
    • Proceedings of the Korean Institute of Intelligent Systems Conference, 2006.11a, pp.408-411, 2006
  • Currently, the nano economy threatens the mass economy. The nano economy is based on internet business models, and nano business models based on the internet require diverse personalized services. Many studies of personalization on the web have been conducted. Web usage mining using click stream data is one tool for building a personalization model. In this paper, we propose an internet business model that uses an evolutionary support vector machine and a web response survey as a form of web usage mining. After analyzing click stream data for web usage mining, we construct a personalized service model. In addition, we use a web response survey approach to improve customer satisfaction. The experimental results on two data sets, from KDD Cup 2000 and from our own web server, verify the performance of the proposed model.

A Sequential Pattern Mining based on Dynamic Weight in Data Stream (스트림 데이터에서 동적 가중치를 이용한 순차 패턴 탐사 기법)

  • Choi, Pilsun; Kim, Hwan; Kim, Daein; Hwang, Buhyun
    • KIPS Transactions on Software and Data Engineering, v.2 no.2, pp.137-144, 2013
  • Sequential pattern mining finds frequent patterns in a time-ordered data set. In this field, dynamic weighted sequential pattern mining is applied to computing environments that change over time, and it can be used in a variety of environments by applying changes in the dynamic weight. In this paper, we propose a new sequence data mining method that explores stream data by applying a dynamic weight. The method reduces the number of candidate patterns that must be examined by using a dynamic weight based on relative time order, and it uses a hash structure for data input and output to find frequent sequential patterns quickly. This method uses less memory and processing time than existing methods. We show the importance of dynamic weighted mining by comparing it with other weighted sequential pattern mining techniques.
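
The abstract does not give the weighting function, so the sketch below assumes one common choice, exponential time decay: every time a new sequence arrives, all stored weights are multiplied by a factor `decay`, so older sequences count less. Patterns are limited to ordered item pairs for brevity, a Python dict serves as the hash structure, and entries whose weight decays below `prune` are dropped to limit candidate patterns. The class name and both parameters are hypothetical, and this is not the authors' method.

```python
from collections import defaultdict
from itertools import combinations


class DecayingPatternCounter:
    """Hypothetical sketch of dynamically weighted sequential pattern counting."""

    def __init__(self, decay: float, prune: float = 0.01):
        self.decay = decay                  # 0 < decay <= 1; 1 means no decay
        self.prune = prune                  # weights below this are discarded
        self.weights = defaultdict(float)   # hash map: pattern -> decayed weight
        self.total = 0.0                    # decayed number of sequences seen

    def add(self, sequence: list) -> None:
        # Age every stored weight by one step instead of keeping timestamps.
        for p in list(self.weights):
            self.weights[p] *= self.decay
            if self.weights[p] < self.prune:
                del self.weights[p]         # prune candidates that have faded out
        self.total = self.total * self.decay + 1.0
        # Each distinct ordered pair (a occurs before b) in the new sequence
        # contributes weight 1.
        for pattern in set(combinations(sequence, 2)):
            self.weights[pattern] += 1.0

    def frequent(self, min_sup: float) -> dict:
        """Patterns whose weighted support is at least min_sup."""
        return {p: w / self.total for p, w in self.weights.items()
                if w / self.total >= min_sup}


stream = [["a", "b", "c"], ["b", "c"], ["b", "c"]]
decayed, plain = DecayingPatternCounter(decay=0.5), DecayingPatternCounter(decay=1.0)
for seq in stream:
    decayed.add(seq)
    plain.add(seq)
print(decayed.frequent(0.3))  # only recent behavior survives
print(plain.frequent(0.3))    # without decay, the old pattern ("a", "b") still counts
```

Aging every entry on each arrival costs time proportional to the table size; a lazier alternative stores a last-update time per pattern and applies the decay only when the entry is read.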

Evaluation of Heavy Metal Contamination in Streams within Samsanjeil and Sambong Cu Mining Area (삼산제일.삼봉 동광산 주변 수계의 중금속 오염도 평가)

  • Kim, Soon-Oh; Jung, Young-Il; Cho, Hyen-Goo
    • Journal of the Mineralogical Society of Korea, v.19 no.3 s.49, pp.171-187, 2006
  • The status of heavy metal contamination was investigated through chemical analyses of stream waters and sediments obtained from the Samsanjeil and Sambong Cu mining area in Goseong-gun, Gyeongsangnam-do. In addition, the degree and the environmental risk of heavy metal contamination in stream sediments were assessed using a pollution index (PI), based on total digestion by aqua regia, and a danger index (DI), based on fractionation of heavy metal contaminants by sequential extraction, respectively. Not only was the degree of heavy metal contamination significantly higher in the Samsanjeil area than in the Sambong area, but its environmental risk was also much more serious there. The differences in the status and level of contamination and environmental risk between the two mining areas may be attributed to the presence of a contamination source and to geology. In the Samsanjeil mining area, acid mine drainage is continuously discharged into the stream, which makes the heavy metal contamination in the stream worse than in the Sambong mining area, where no acid mine drainage is produced. In addition, the geology of the Samsanjeil mining area consists mainly of andesitic rocks that contain a small amount of calcite and have a lower pH buffering capacity for acid mine drainage, so it is likely that the heavy metal contamination cannot be naturally attenuated in the streams. In contrast, the Sambong mining area consists mainly of the pyroclastic sedimentary Goseong formation, which has a high content of carbonates, particularly calcite, and these carbonates, with their high pH buffering capacity, appear to prevent the heavy metal contamination from proceeding downstream within that area.

Mining Interesting Sequential Pattern with a Time-interval Constraint for Efficient Analyzing a Web-Click Stream (웹 클릭 스트림의 효율적 분석을 위한 시간 간격 제한을 활용한 관심 순차패턴 탐색)

  • Chang, Joong-Hyuk
    • Journal of Korea Society of Industrial Information Systems, v.16 no.2, pp.19-29, 2011
  • Owing to the development of web technologies and the increasing use of smart devices such as smartphones, various web services have recently come into wide use in many application fields. In this environment, support for personalized and intelligent web services has been actively researched, and the analysis of web-click streams generated from web usage logs is one of the essential techniques related to this topic. In this paper, to efficiently analyze a web-click stream of sequences, a sequential pattern mining technique is proposed that satisfies the basic requirements for data stream processing and finds a refined mining result. For this purpose, the concept of interesting sequential patterns with a time-interval constraint is defined, which uses not only the order of items in a sequential pattern but also their generation times. In addition, a mining method is proposed to find these interesting sequential patterns efficiently over a data stream such as a web-click stream. The proposed method can be used effectively in various application fields that generate data in the form of data streams, such as E-commerce, bioinformatics, and USN environments.
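
To make "not only the order of items but also their generation times" concrete, here is a sketch of one plausible constraint check, assuming the constraint bounds the time between consecutive matched items by a single `max_interval`; the paper's exact definition is not given in the abstract, and the function names and click labels are hypothetical. A backtracking search is used because a greedy earliest match can miss valid occurrences under a maximum-interval constraint.

```python
def contains_with_interval(sequence, pattern, max_interval):
    """True if `pattern` occurs in `sequence` in order, with consecutive matched
    items generated at most `max_interval` apart.

    `sequence` is a list of (timestamp, item) pairs sorted by timestamp.
    """
    def match(k: int, start: int, prev_time) -> bool:
        if k == len(pattern):
            return True
        for i in range(start, len(sequence)):
            t, item = sequence[i]
            if prev_time is not None and t - prev_time > max_interval:
                break  # time-sorted, so every later item is even farther away
            if item == pattern[k] and match(k + 1, i + 1, t):
                return True
        return False

    return match(0, 0, None)


def interval_support(database, pattern, max_interval):
    """Fraction of sequences in `database` containing the constrained pattern."""
    hits = sum(contains_with_interval(s, pattern, max_interval) for s in database)
    return hits / len(database)


# Hypothetical click sequence: (seconds since session start, page)
clicks = [(0, "home"), (5, "home"), (7, "cart"), (30, "pay")]
print(contains_with_interval(clicks, ["home", "cart"], 3))         # via home@5, cart@7
print(contains_with_interval(clicks, ["home", "cart", "pay"], 3))  # pay comes too late
```

A greedy matcher would pick home@0 and then reject cart@7 as too far away, wrongly reporting no match; the backtracking retries from home@5. The worst case is exponential, so a production stream miner would need the incremental data structures the paper proposes instead.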

Geochemical Enrichment and Migration of Environmental Toxic Elements in Stream Sediments and Soils from the Samkwang Au-Ag Mine Area, Korea (삼광 금-은광산 일대의 하상퇴적물과 토양내 함유된 독성원소의 지구화학적 부화와 이동)

  • Lee, Chan Hee; Lee, Byun Koo; Yoo, Bong-Cheal; Cho, Aeran
    • Economic and Environmental Geology, v.31 no.2, pp.111-125, 1998
  • Dispersion, migration and enrichment of environmental toxic elements from the Samkwang Au-Ag mine area were investigated based on major, minor and rare earth element geochemistry. The Samkwang mine area is composed mainly of Precambrian granitic gneiss. The mine was worked for gold and silver but closed in 1996. According to X-ray powder diffraction, the mineral composition of the stream sediments and soils is partly variable and consists of quartz, orthoclase, plagioclase, amphibole, muscovite, biotite and chlorite. Major element variations of the host granitic gneiss and of the stream sediments and soils of the mining and non-mining drainages indicate that $Al_2O_3$, $Fe_2O_3$, MgO, $TiO_2$, $P_2O_5$ and LOI decrease with increasing $SiO_2$. Average compositional ranges (ppm) of minor and/or environmental toxic elements within these samples are As=<2-4500, Cd=<1-24, Cu=6-117, Sb=1-29, Pb=17-1377 and Zn=32-938; sediments from the mining drainage have extremely high concentrations (As=2006, Cd=11, Cu=71, Pb=587 and Zn=481 ppm) compared with the other samples and the host granitic gneiss. Normalized to the composition of the host granitic gneiss, major elements in all samples are mostly enriched (average enrichment index=6.53), except for $SiO_2$, $Na_2O$ and $K_2O$. Rare earth elements (average enrichment index=2.34) are enriched in the sediments from the mining drainage. Relative to the host rock, minor and/or environmental toxic elements in all samples were strongly enriched (especially As, Br, Cu, Pb and Zn), except for Ba, Cr, Rb and Sr. The average enrichment index of trace elements in all samples is 15.55 (sediments of mining drainage=37.33).
Potentially toxic elements (As, Cd, Cr, Cu, Ni, Pb, and Zn) in the samples have an average enrichment index of 46.10 (sediments of mining drainage=80.20, sediments of non-mining drainage=5.35, sediments of confluent drainage=20.22, subsurface soils of mining drainage=7.97 and subsurface soils of non-mining drainage=4.15). Sediments and soils with high concentrations of toxic elements contain some pyrite, arsenopyrite, sphalerite, galena and goethite.
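
The enrichment indices above are values normalized by the composition of the host granitic gneiss. The abstract does not state the formula, so this sketch assumes the per-element index is the sample concentration divided by the host-rock concentration and that the average index is the arithmetic mean over elements. The concentrations below are toy numbers chosen for easy arithmetic, not measurements from the study.

```python
def enrichment_indices(sample: dict, host: dict) -> dict:
    """Per-element ratio of sample to host-rock concentration (assumed definition).

    Elements missing from the host rock, or with zero host concentration,
    are skipped to avoid division by zero.
    """
    return {el: sample[el] / host[el]
            for el in sample if el in host and host[el] > 0}


def average_enrichment(sample: dict, host: dict) -> float:
    """Arithmetic mean of the per-element enrichment indices (assumed definition)."""
    ratios = enrichment_indices(sample, host)
    return sum(ratios.values()) / len(ratios)


# Toy concentrations in ppm (illustrative only, not data from the study)
host_rock = {"As": 5.0, "Pb": 20.0, "Zn": 50.0}
sediment = {"As": 50.0, "Pb": 40.0, "Zn": 150.0}
print(enrichment_indices(sediment, host_rock))
print(average_enrichment(sediment, host_rock))
```

An index above 1 means the element is enriched relative to the host rock; an arithmetic mean is easily dominated by one strongly enriched element such as As, which is worth keeping in mind when comparing average indices across sample groups.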

Mining Frequent Sequential Patterns over Sequence Data Streams with a Gap-Constraint (순차 데이터 스트림에서 발생 간격 제한 조건을 활용한 빈발 순차 패턴 탐색)

  • Chang, Joong-Hyuk
    • Journal of the Korea Society of Computer and Information, v.15 no.9, pp.35-46, 2010
  • Sequential pattern mining is one of the essential data mining tasks and is widely used to analyze data generated in various application fields such as web-based applications, E-commerce, bioinformatics, and USN environments. Recently, data generated in these application fields has been taking the form of continuous data streams rather than finite stored data sets. In response to this change in the form of data, many studies have been actively conducted to find sequential patterns over data streams efficiently. However, conventional research focuses on reducing processing time and memory usage when mining sequential patterns over a target data stream, so research on mining more interesting and useful sequential patterns that effectively reflect the characteristics of the data stream has received little attention. This paper proposes a method for mining sequential patterns over data streams with a gap constraint, which can help find more interesting sequential patterns over the data streams. First, the meaning of a gap for a sequential pattern and the concept of gap-constrained sequential patterns are defined, and then a mining method for finding gap-constrained sequential patterns over a data stream is proposed.
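
The gap definitions themselves are not reproduced in the abstract, so this sketch assumes a common position-based one: the gap between two consecutive matched pattern items is the number of sequence items skipped between them, and it must lie in [`min_gap`, `max_gap`]. Containment is checked with a small dynamic program over the positions at which each pattern prefix can end, and support is computed over a window of recent sequences. The function names, parameters, and window are hypothetical.

```python
def contains_with_gap(sequence, pattern, min_gap, max_gap):
    """True if `pattern` is a subsequence of `sequence` such that the number of
    items skipped between consecutive matched items lies in [min_gap, max_gap]."""
    if not pattern:
        return True
    # ends: positions where the pattern prefix matched so far can end
    ends = {i for i, x in enumerate(sequence) if x == pattern[0]}
    for item in pattern[1:]:
        ends = {
            j for j, x in enumerate(sequence)
            if x == item and any(min_gap <= j - i - 1 <= max_gap for i in ends)
        }
        if not ends:
            return False
    return True


def gap_support(window, pattern, min_gap=0, max_gap=1):
    """Fraction of sequences in the current window containing the pattern."""
    return sum(contains_with_gap(s, pattern, min_gap, max_gap)
               for s in window) / len(window)


window = [list("ab"), list("axb"), list("axxb")]
print(gap_support(window, ["a", "b"], min_gap=0, max_gap=1))  # "axxb" skips 2 items
```

Tracking every possible end position, rather than only the earliest match, keeps the check exact: an early occurrence of the first item may violate the gap bound while a later one satisfies it.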