Search | Korea Science

A Novel Approach for Mining High-Utility Sequential Patterns in Sequence Databases

Ahmed, Chowdhury Farhan;Tanbeer, Syed Khairuzzaman;Jeong, Byeong-Soo
- ETRI Journal
- /
- v.32 no.5
- /
- pp.676-686
- /
- 2010
Mining sequential patterns is an important research issue in data mining and knowledge discovery with broad applications. However, the existing sequential pattern mining approaches consider only binary frequency values of items in sequences and equal importance/significance values of distinct items. Therefore, they are not applicable to actually represent many real-world scenarios. In this paper, we propose a novel framework for mining high-utility sequential patterns for more real-life applicable information extraction from sequence databases with non-binary frequency values of items in sequences and different importance/significance values for distinct items. Moreover, for mining high-utility sequential patterns, we propose two new algorithms: UtilityLevel is a high-utility sequential pattern mining with a level-wise candidate generation approach, and UtilitySpan is a high-utility sequential pattern mining with a pattern growth approach. Extensive performance analyses show that our algorithms are very efficient and scalable for mining high-utility sequential patterns.
https://doi.org/10.4218/etrij.10.1510.0066 인용 PDF KSCI

WIS: Weighted Interesting Sequential Pattern Mining with a Similar Level of Support and/or Weight

Yun, Un-Il
- ETRI Journal
- /
- v.29 no.3
- /
- pp.336-352
- /
- 2007
Sequential pattern mining has become an essential task with broad applications. Most sequential pattern mining algorithms use a minimum support threshold to prune the combinatorial search space. This strategy provides basic pruning; however, it cannot mine correlated sequential patterns with similar support and/or weight levels. If the minimum support is low, many spurious patterns having items with different support levels are found; if the minimum support is high, meaningful sequential patterns with low support levels may be missed. We present a new algorithm, weighted interesting sequential (WIS) pattern mining based on a pattern growth method in which new measures, sequential s-confidence and w-confidence, are suggested. Using these measures, weighted interesting sequential patterns with similar levels of support and/or weight are mined. The WIS algorithm gives a balance between the measures of support and weight, and considers correlation between items within sequential patterns. A performance analysis shows that WIS is efficient and scalable in weighted sequential pattern mining.
PDF

Finding associations between genes by time-series microarray sequential patterns analysis

Nam, Ho-Jung;Lee, Do-Heon
- Proceedings of the Korean Society for Bioinformatics Conference
- /
- 2005.09a
- /
- pp.161-164
- /
- 2005
Data mining techniques can be applied to identify patterns of interest in the gene expression data. One goal in mining gene expression data is to determine how the expression of any particular gene might affect the expression of other genes. To find relationships between different genes, association rules have been applied to gene expression data set [1]. A notable limitation of association rule mining method is that only the association in a single profile experiment can be detected. It cannot be used to find rules across different condition profiles or different time point profile experiments. However, with the appearance of time-series microarray data, it became possible to analyze the temporal relationship between genes. In this paper, we analyze the time-series microarray gene expression data to extract the sequential patterns which are similar to the association rules between genes among different time points in the yeast cell cycle. The sequential patterns found in our work can catch the associations between different genes which express or repress at diverse time points. We have applied sequential pattern mining method to time-series microarray gene expression data and discovered a number of sequential patterns from two groups of genes (test, control) and more sequential patterns have been discovered from test group (same CO term group) than from the control group (different GO term group). This result can be a support for the potential of sequential patterns which is capable of catching the biologically meaningful association between genes.
PDF

Finding Weighted Sequential Patterns over Data Streams via a Gap-based Weighting Approach (발생 간격 기반 가중치 부여 기법을 활용한 데이터 스트림에서 가중치 순차패턴 탐색)

Chang, Joong-Hyuk
- Journal of Intelligence and Information Systems
- /
- v.16 no.3
- /
- pp.55-75
- /
- 2010
Sequential pattern mining aims to discover interesting sequential patterns in a sequence database, and it is one of the essential data mining tasks widely used in various application fields such as Web access pattern analysis, customer purchase pattern analysis, and DNA sequence analysis. In general sequential pattern mining, only the generation order of data element in a sequence is considered, so that it can easily find simple sequential patterns, but has a limit to find more interesting sequential patterns being widely used in real world applications. One of the essential research topics to compensate the limit is a topic of weighted sequential pattern mining. In weighted sequential pattern mining, not only the generation order of data element but also its weight is considered to get more interesting sequential patterns. In recent, data has been increasingly taking the form of continuous data streams rather than finite stored data sets in various application fields, the database research community has begun focusing its attention on processing over data streams. The data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. In data stream processing, each data element should be examined at most once to analyze the data stream, and the memory usage for data stream analysis should be restricted finitely although new data elements are continuously generated in a data stream. Moreover, newly generated data elements should be processed as fast as possible to produce the up-to-date analysis result of a data stream, so that it can be instantly utilized upon request. To satisfy these requirements, data stream processing sacrifices the correctness of its analysis result by allowing some error. Considering the changes in the form of data generated in real world application fields, many researches have been actively performed to find various kinds of knowledge embedded in data streams. They mainly focus on efficient mining of frequent itemsets and sequential patterns over data streams, which have been proven to be useful in conventional data mining for a finite data set. In addition, mining algorithms have also been proposed to efficiently reflect the changes of data streams over time into their mining results. However, they have been targeting on finding naively interesting patterns such as frequent patterns and simple sequential patterns, which are found intuitively, taking no interest in mining novel interesting patterns that express the characteristics of target data streams better. Therefore, it can be a valuable research topic in the field of mining data streams to define novel interesting patterns and develop a mining method finding the novel patterns, which will be effectively used to analyze recent data streams. This paper proposes a gap-based weighting approach for a sequential pattern and amining method of weighted sequential patterns over sequence data streams via the weighting approach. A gap-based weight of a sequential pattern can be computed from the gaps of data elements in the sequential pattern without any pre-defined weight information. That is, in the approach, the gaps of data elements in each sequential pattern as well as their generation orders are used to get the weight of the sequential pattern, therefore it can help to get more interesting and useful sequential patterns. Recently most of computer application fields generate data as a form of data streams rather than a finite data set. Considering the change of data, the proposed method is mainly focus on sequence data streams.
PDF KSCI

Mining Frequent Sequential Patterns over Sequence Data Streams with a Gap-Constraint (순차 데이터 스트림에서 발생 간격 제한 조건을 활용한 빈발 순차 패턴 탐색)

Chang, Joong-Hyuk
- Journal of the Korea Society of Computer and Information
- /
- v.15 no.9
- /
- pp.35-46
- /
- 2010
Sequential pattern mining is one of the essential data mining tasks, and it is widely used to analyze data generated in various application fields such as web-based applications, E-commerce, bioinformatics, and USN environments. Recently data generated in the application fields has been taking the form of continuous data streams rather than finite stored data sets. Considering the changes in the form of data, many researches have been actively performed to efficiently find sequential patterns over data streams. However, conventional researches focus on reducing processing time and memory usage in mining sequential patterns over a target data stream, so that a research on mining more interesting and useful sequential patterns that efficiently reflect the characteristics of the data stream has been attracting no attention. This paper proposes a mining method of sequential patterns over data streams with a gap constraint, which can help to find more interesting sequential patterns over the data streams. First, meanings of the gap for a sequential pattern and gap-constrained sequential patterns are defined, and subsequently a mining method for finding gap-constrained sequential patterns over a data stream is proposed.
https://doi.org/10.9708/jksci.2010.15.9.035 인용 PDF KSCI

Learning Multidimensional Sequential Patterns Using Hellinger Entropy Function (Hellinger 엔트로피를 이용한 다차원 연속패턴의 생성방법)

Lee, Chang-Hwan
- The KIPS Transactions:PartB
- /
- v.11B no.4
- /
- pp.477-484
- /
- 2004
The technique of sequential pattern mining means generating a set of inter-transaction patterns residing in time-dependent data. This paper proposes a new method for generating sequential patterns with the use of Hellinger measure. While the current methods are generating single dimensional sequential patterns within a single attribute, the proposed method is able to detect multi-dimensional patterns among different attributes. A number of heuristics, based on the characteristics of Hellinger measure, are proposed to reduce the computational complexity of the sequential pattern systems. Some experimental results are presented.
https://doi.org/10.3745/KIPSTB.2004.11B.4.477 인용 PDF KSCI

Test Pattern Generation for Asynchronous Sequential Circuits Operating in Fundamental Mode (기본 모드에서 동작하는 비동기 순차 회로의 시험 벡터 생성)

조경연;이재훈;민형복
- Journal of the Korean Institute of Telematics and Electronics C
- /
- v.35C no.9
- /
- pp.38-48
- /
- 1998
Generating test patterns for asynchronous sequential circuits remains to be a very difficult problem. There are few algorithms for this problem, and previous works cut feedback loops, and insert synchronous flip-flops in the feedback loops during ATPG. The conventional algorithms are similar to the algorithms for synchronous sequential circuits. This means that the conventional algorithms generate test patterns by modeling asynchronous sequential circuits as synchronous sequential circuits. So, test patterns generated by those algorithms nay not detect target faults when the test patterns are applied to the asynchronous sequential circuit under test. In this paper an algorithm is presented to generate test patterns for asynchronous sequential circuits. Test patterns generated by the algorithm can detect target faults for asynchronous sequential circuits with the minimal possibility of critical race problem and oscillation. And it is guaranteed that the test patterns generated by the algorithm will detect target faults.
PDF

Tree-based Navigation Pattern Analysis

Choi, Hyun-Jip
- Communications for Statistical Applications and Methods
- /
- v.8 no.1
- /
- pp.271-279
- /
- 2001
Sequential pattern discovery is one of main interests in web usage mining. the technique of sequential pattern discovery attempts to find inter-session patterns such that the presence of a set of items is followed by another item in a time-ordered set of server sessions. In this paper, a tree-based sequential pattern finding method is proposed in order to discover navigation patterns in server sessions. At each learning process, the suggested method learns about the navigation patterns per server session and summarized into the modified Rymon's tree.
PDF

SuffixSpan: A Formal Approach For Mining Sequential Patterns (SuffixSpan: 순차패턴 마이닝을 위한 형식적 접근방법)

Cho, Dong-Young
- The Journal of Korean Association of Computer Education
- /
- v.5 no.4
- /
- pp.53-60
- /
- 2002
Typical Apriori-like methods for mining sequential patterns have some problems such as generating of many candidate patterns and repetitive searching of a large database. And PrefixSpan constructs the prefix projected databases which are stepwise partitioned in the mining process. It can reduce the searching space to estimate the support of candidate patterns, but the construction cost of projected databases is still high. For efficient sequential pattern mining, we need to reduce the cost to generate candidate patterns and searching space for the generated ones. To solve these problems, we proposed SuffixSpan(Suffix checked Sequential Pattern mining), a new method for sequential pattern mining, and show a formal approach to our method.
PDF

Sequential Pattern Mining with Optimization Calling MapReduce Function on MapReduce Framework (맵리듀스 프레임웍 상에서 맵리듀스 함수 호출을 최적화하는 순차 패턴 마이닝 기법)

Kim, Jin-Hyun;Shim, Kyu-Seok
- The KIPS Transactions:PartD
- /
- v.18D no.2
- /
- pp.81-88
- /
- 2011
Sequential pattern mining that determines frequent patterns appearing in a given set of sequences is an important data mining problem with broad applications. For example, sequential pattern mining can find the web access patterns, customer's purchase patterns and DNA sequences related with specific disease. In this paper, we develop the sequential pattern mining algorithms using MapReduce framework. Our algorithms distribute input data to several machines and find frequent sequential patterns in parallel. With synthetic data sets, we did a comprehensive performance study with varying various parameters. Our experimental results show that linear speed up can be achieved through our algorithms with increasing the number of used machines.
https://doi.org/10.3745/KIPSTD.2011.18D.2.081 인용 PDF KSCI

Search Result 258, Processing Time 0.034 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)