• 제목/요약/키워드: 스트림 데이터 마이닝

검색결과 61건 처리시간 0.026초

Mining Interesting Sequential Pattern with a Time-interval Constraint for Efficient Analyzing a Web-Click Stream (웹 클릭 스트림의 효율적 분석을 위한 시간 간격 제한을 활용한 관심 순차패턴 탐색)

  • Chang, Joong-Hyuk
    • Journal of Korea Society of Industrial Information Systems
    • /
    • 제16권2호
    • /
    • pp.19-29
    • /
    • 2011
  • Due to the development of web technologies and the increasing use of smart devices such as smart phone, in recent various web services are widely used in many application fields. In this environment, the topic of supporting personalized and intelligent web services have been actively researched, and an analysis technique on a web-click stream generated from web usage logs is one of the essential techniques related to the topic. In this paper, for efficient analyzing a web-click stream of sequences, a sequential pattern mining technique is proposed, which satisfies the basic requirements for data stream processing and finds a refined mining result. For this purpose, a concept of interesting sequential patterns with a time-interval constraint is defined, which uses not on1y the order of items in a sequential pattern but also their generation times. In addition, A mining method to find the interesting sequential patterns efficiently over a data stream such as a web-click stream is proposed. The proposed method can be effectively used to various computing application fields such as E-commerce, bio-informatics, and USN environments, which generate data as a form of data streams.

Mining Association Rule for the Abnormal Event in Data Stream Systems (데이터 스트림 시스템에서 이상 이벤트에 대한 연관 규칙 마이닝)

  • Kim, Dae-In;Park, Joon;Hwang, Bu-Hyun
    • The KIPS Transactions:PartD
    • /
    • 제14D권5호
    • /
    • pp.483-490
    • /
    • 2007
  • Recently mining techniques that analyze the data stream to discover potential information, have been widely studied. However, most of the researches based on the support are concerned with the frequent event, but ignore the infrequent event even if it is crucial. In this paper, we propose SM-AF method discovering association rules to an abnormal event. In considering the window that an abnormal event is sensed, SM-AF method can discover the association rules to the critical event, even if it is occurred infrequently. Also, SM-AF method can discover the significant rare itemsets associated with abnormal event and periodic event itemsets. Through analysis and experiments, we show that SM-AF method is superior to the previous methods of mining association rules.

In-memory Compression Scheme Based on Incremental Frequent Patterns for Graph Streams (그래프 스트림 처리를 위한 점진적 빈발 패턴 기반 인-메모리 압축 기법)

  • Lee, Hyeon-Byeong;Shin, Bo-Kyoung;Bok, Kyoung-Soo;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association
    • /
    • 제22권1호
    • /
    • pp.35-46
    • /
    • 2022
  • Recently, with the development of network technologies, as IoT and social network service applications have been actively used, a lot of graph stream data is being generated. In this paper, we propose a graph compression scheme that considers the stream graph environment by applying graph mining to the existing compression technique, which has been focused on compression rate and runtime. In this paper, we proposed Incremental frequent pattern based compression technique for graph streams. Since the proposed scheme keeps only the latest reference patterns, it increases the storage utilization and improves the query processing time. In order to show the superiority of the proposed scheme, various performance evaluations are performed in terms of compression rate and processing time compared to the existing method. The proposed scheme is faster than existing similar scheme when the number of duplicated data is large.

Data Streams classification using Local Concept-adapted IOLIN System (지역적 컨셉트 적응형 IOLIN시스템을 사용한 데이터 스트림의 분류)

  • Kim, Jae-Woo;Song, Jae-Won;Lee, Ju-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • 제13권1호
    • /
    • pp.37-44
    • /
    • 2008
  • Data stream has the tendency to change in Patterns over time. Also known as concept drift, such problem can reduce the predictive performance of a classification model CVFDT and IOLIN tried to solve the problem of a concept drift through incremental classification model updates. The local changes in patterns. however was revealed to be unable to resolve the problems of local concept drift that occurs by influencing on total classification results. In this paper, we propose adapted IOLIN system that improves system's predictive performance by detecting the local concept drift. The experimental result shows that adaptive IOLIN, the Proposed method, is about 2.8% in accuracy better than IOLIN and about 11.2% in accuracy better than CVFDT.

  • PDF

Search Method of the time sensitive frequent itemsets (시간에 따른 가변성을 고려한 상대적인 빈발항목 탐색방법)

  • Park, Tae-Su;Lee, Ju-Hong;Park, Sun
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 한국정보처리학회 2005년도 추계학술발표대회 및 정기총회
    • /
    • pp.97-100
    • /
    • 2005
  • 최근 유비쿼터스 컴퓨팅 및 인터넷 서비스에 대한 관심이 증대되면서, 대용량의 데이터에 내재되어 있는 정보를 빠른 시간 내에 처리하여 새로운 지식을 창출하려는 요구가 증가하고 있다. 데이터 마이닝 기법을 이용하여 데이터 스트림에서 빈발항목을 탐색하는 기존의 연구는 시간을 고려하지 않고 단순히 집계를 통하여 빈발항목을 탐색하기 때문에 정확성을 보장하지 못한다. 따라서 본 논문에서는 데이터 스트림에서 시간적 측면을 고려하여 상대적인 빈발항목을 탐색하기 위한 새로운 알고리즘을 제안하고자 한다. 논문에서 제안하는 알고리즘의 성능은 다양한 실험을 통해서 검증된다.

  • PDF

An Adaptive Grid-based Clustering Algorithm over Multi-dimensional Data Streams (적응적 격자기반 다차원 데이터 스트림 클러스터링 방법)

  • Park, Nam-Hun;Lee, Won-Suk
    • The KIPS Transactions:PartD
    • /
    • 제14D권7호
    • /
    • pp.733-742
    • /
    • 2007
  • A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Due to this reason, memory usage for data stream analysis should be confined finitely although new data elements are continuously generated in a data stream. To satisfy this requirement, data stream processing sacrifices the correctness of its analysis result by allowing some errors. The old distribution statistics are diminished by a predefined decay rate as time goes by, so that the effect of the obsolete information on the current result of clustering can be eliminated without maintaining any data element physically. This paper proposes a grid based clustering algorithm for a data stream. Given a set of initial grid cells, the dense range of a grid cell is recursively partitioned into a smaller cell based on the distribution statistics of data elements by a top down manner until the smallest cell, called a unit cell, is identified. Since only the distribution statistics of data elements are maintained by dynamically partitioned grid cells, the clusters of a data stream can be effectively found without maintaining the data elements physically. Furthermore, the memory usage of the proposed algorithm is adjusted adaptively to the size of confined memory space by flexibly resizing the size of a unit cell. As a result, the confined memory space can be fully utilized to generate the result of clustering as accurately as possible. The proposed algorithm is analyzed by a series of experiments to identify its various characteristics

A Technique for Detecting Companion Groups from Trajectory Data Streams (궤적 데이터 스트림에서 동반 그룹 탐색 기법)

  • Kang, Suhyun;Lee, Ki Yong
    • KIPS Transactions on Software and Data Engineering
    • /
    • 제8권12호
    • /
    • pp.473-482
    • /
    • 2019
  • There have already been studies analyzing the trajectories of objects from data streams of moving objects. Among those studies, there are also studies to discover groups of objects that move together, called companion groups. Most studies to discover companion groups use existing clustering techniques to find groups of objects close to each other. However, these clustering-based methods are often difficult to find the right companion groups because the number of clusters is unpredictable in advance or the shape or size of clusters is hard to control. In this study, we propose a new method that discovers companion groups based on the distance specified by the user. The proposed method does not apply the existing clustering techniques but periodically determines the groups of objects close to each other, by using a technique that efficiently finds the groups of objects that exist within the user-specified distance. Furthermore, unlike the existing methods that return only companion groups and their trajectories, the proposed method also returns their appearance and disappearance time. Through various experiments, we show that the proposed method can detect companion groups correctly and very efficiently.

Design of Sensor Middleware Architecture on Multi Level Spatial DBMS with Snapshot (스냅샷을 가지는 다중 레벨 공간 DBMS를 기반으로 하는 센서 미들웨어 구조 설계)

  • Oh, Eun-Seog;Kim, Ho-Seok;Kim, Jae-Hong;Bae, Hae-Young
    • Journal of Korea Spatial Information System Society
    • /
    • 제8권1호
    • /
    • pp.1-16
    • /
    • 2006
  • Recently, human based computing environment for supporting users to concentrate only user task without sensing other changes from users is being progressively researched and developed. But middleware deletes steream data processed for reducing process load of massive information from RFID sensor in this computing. So, this kind of middleware have problems when user demands probability or statistics needed for data warehousing or data mining and when user demands very important stream data repeatedly but already discarded in the middleware every former time. In this paper, we designs Sensor Middleware Architecture on Multi Level Spatial DBMS with Snapshot and manage repeatedly required stream datas to solve reusing problems of historical stream data in current middleware. This system uses disk databse that manages historical stream datas filtered in middleware for requiring services using historical stream information as data mining or data warehousing from user, and uses memory database that mamages highly reuseable data as a snapshot when stream data storaged in disk database has high reuse frequency from user. For the more, this system processes memory database management policy in a cycle to maintain high reusement and rapid service for users. Our paper system solves problems of repeated requirement of stream datas, or a policy decision service using historical stream data of current middleware. Also offers variant and rapid data services maintaining high data reusement of main memory snapshot datas.

  • PDF

An Efficient Method for Mining Frequent Patterns based on Weighted Support over Data Streams (데이터 스트림에서 가중치 지지도 기반 빈발 패턴 추출 방법)

  • Kim, Young-Hee;Kim, Won-Young;Kim, Ung-Mo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • 제10권8호
    • /
    • pp.1998-2004
    • /
    • 2009
  • Recently, due to technical developments of various storage devices and networks, the amount of data increases rapidly. The large volume of data streams poses unique space and time constraints on the data mining process. The continuous characteristic of streaming data necessitates the use of algorithms that require only one scan over the stream for knowledge discovery. Most of the researches based on the support are concerned with the frequent itemsets, but ignore the infrequent itemsets even if it is crucial. In this paper, we propose an efficient method WSFI-Mine(Weighted Support Frequent Itemsets Mine) to mine all frequent itemsets by one scan from the data stream. This method can discover the closed frequent itemsets using DCT(Data Stream Closed Pattern Tree). We compare the performance of our algorithm with DSM-FI and THUI-Mine, under different minimum supports. As results show that WSFI-Mine not only run significant faster, but also consume less memory.

Finding the time sensitive frequent itemsets based on data mining technique in data streams (데이터 스트림에서 데이터 마이닝 기법 기반의 시간을 고려한 상대적인 빈발항목 탐색)

  • Park, Tae-Su;Chun, Seok-Ju;Lee, Ju-Hong;Kang, Yun-Hee;Choi, Bum-Ghi
    • Journal of The Korean Association of Information Education
    • /
    • 제9권3호
    • /
    • pp.453-462
    • /
    • 2005
  • Recently, due to technical improvements of storage devices and networks, the amount of data increase rapidly. In addition, it is required to find the knowledge embedded in a data stream as fast as possible. Huge data in a data stream are created continuously and changed fast. Various algorithms for finding frequent itemsets in a data stream are actively proposed. Current researches do not offer appropriate method to find frequent itemsets in which flow of time is reflected but provide only frequent items using total aggregation values. In this paper we proposes a novel algorithm for finding the relative frequent itemsets according to the time in a data stream. We also propose the method to save frequent items and sub-frequent items in order to take limited memory into account and the method to update time variant frequent items. The performance of the proposed method is analyzed through a series of experiments. The proposed method can search both frequent itemsets and relative frequent itemsets only using the action patterns of the students at each time slot. Thus, our method can enhance the effectiveness of learning and make the best plan for individual learning.

  • PDF