• Title/Summary/Keyword: Sequential patterns


An Efficient Mining Algorithm for Generating Probabilistic Multidimensional Sequential Patterns (확률적 다차원 연속패턴의 생성을 위한 효율적인 마이닝 알고리즘)

  • Lee Chang-Hwan
    • Journal of KIISE:Software and Applications / v.32 no.2 / pp.75-84 / 2005
  • Sequential pattern mining is an important data mining problem with broad applications. While current methods generate sequential patterns within a single attribute, the proposed method is able to detect them across different attributes. By incorporating these additional attributes, the sequential patterns found are richer and more informative to the user. This paper proposes a new method for generating multi-dimensional sequential patterns using the Hellinger entropy measure. Unlike previously used methods, the proposed method can calculate the significance of each sequential pattern. Two theorems are proposed to reduce the computational complexity of the proposed system. The proposed method is tested on several synthesized purchase transaction databases.

Sequential Pattern Mining Algorithms with Quantities (정량 정보를 포함한 순차 패턴 마이닝 알고리즘)

  • Kim, Chul-Yun;Lim, Jong-Hwa;Ng Raymond T.;Shim Kyu-Seok
    • Journal of KIISE:Databases / v.33 no.5 / pp.453-462 / 2006
  • Discovering sequential patterns is an important problem for many applications. Existing algorithms find sequential patterns that include only items. However, in many business and scientific applications, quantitative attributes are also recorded in the data; existing algorithms ignore them even though they can provide useful insight to users. In this paper, we consider the problem of mining sequential patterns with quantities. We demonstrate that naive extensions to existing sequential pattern algorithms are inefficient, as they may enumerate the search space blindly. We therefore propose hash filtering and quantity sampling techniques that significantly improve the performance of the naive extensions. Experimental results confirm that, compared with the naive extensions, these schemes not only reduce execution time substantially but also scale better for sequential patterns with quantities.

Searching Sequential Patterns by Approximation Algorithm (근사 알고리즘을 이용한 순차패턴 탐색)

  • Sarlsarbold, Garawagchaa;Hwang, Young-Sup
    • Journal of the Korea Society of Computer and Information / v.14 no.5 / pp.29-36 / 2009
  • Sequential pattern mining, which discovers frequent subsequences as patterns in a sequence database, is an important data mining problem with broad applications. Since a sequential pattern in DNA sequences can be a motif, we studied how to find sequential patterns in DNA sequences. Most previously proposed mining algorithms use exact matching against a sequential pattern definition. In practice, they cannot cope with noisy environments and inaccurate data. These problems occur frequently in DNA sequences, which are biological data. We investigated an approximate matching method to deal with those cases. Our idea is based on the observation that all occurrences of a frequent pattern can be classified into groups, which we call approximated patterns. The existing PrefixSpan algorithm can successfully find sequential patterns in a long sequence. We improved the PrefixSpan algorithm to find approximate sequential patterns. The experimental results showed that the number of repeats found by the proposed method was 5 times that of PrefixSpan when the pattern length was 4.

Analysis of Conversation between Elderly Patients with Dementia and Nurses: Focusing on Structure and Sequential Patterns (치매 노인환자와 간호사의 대화 분석: 대화의 구조와 연속체 형태를 중심으로)

  • Yi, Myung-Sun
    • Journal of Korean Academy of Nursing / v.39 no.2 / pp.166-176 / 2009
  • Purpose: The purpose of the study was to identify the functional structure and patterns of dialogue sequence in conversations between elderly patients with dementia and nurses in a long-term care facility. Methods: Conversation analysis was used to analyze the data, which were collected with a video camera to capture non-verbal as well as verbal behaviors. Data collection was done during February 2005. Results: Introduction, assessment, intervention, and closing phases were identified as the functional structure. The essential parts of the conversation were the assessment and intervention phases. In the assessment phase, three sequential patterns of nurse-initiated dialogue and four sequential patterns of patient-initiated dialogue were identified. In the intervention phase, four sequential patterns were identified in nurse-initiated dialogues and three in patient-initiated dialogues. In general, "ask question", "advise", and "directive" were the utterances most frequently used by nurses in nurse-initiated dialogue, indicating nurses' domination of the conversation. At the same time, "ask back", "refute", "escape", or "false promise" were often used by nurses to discourage patients from talking when patients were raising questions or making demands. Conclusion: It is important for nurses to encourage patient-initiated dialogue to counterbalance nurse-dominated conversation, which results from the imbalance between nurses and patients in terms of knowledge and task in healthcare institutions for elders.

A Partition Mining Method of Sequential Patterns using Suffix Checking (서픽스 검사를 이용한 단계적 순차패턴 분할 탐사 방법)

  • 허용도;조동영;박두순
    • Journal of Korea Multimedia Society / v.5 no.5 / pp.590-598 / 2002
  • For efficient sequential pattern mining, we need to reduce both the cost of generating candidate patterns and the search space for the generated candidates. Although Apriori-like methods such as GSP[8] are simple, they suffer from problems such as generating many candidate patterns and repeatedly scanning a large database. PrefixSpan[2], which was proposed as an alternative to GSP, constructs prefix-projected databases that are partitioned stepwise during the mining process. It can reduce the search space needed to estimate the support of candidate patterns, but the cost of constructing the projected databases is still high. To solve these problems, we propose SuffixSpan (Suffix checked Sequential Pattern mining) as a new sequential pattern mining method. It generates small candidate pattern sets at low cost using the partition property and the suffix property, and it uses 1-prefix projected databases as the search space in order to reduce the cost of estimating the support of candidate patterns.

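The prefix-projection idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each sequence is a list of single items (no itemsets) and follows the general PrefixSpan idea only; it is not the SuffixSpan method, whose details the abstract does not give. The function name `prefixspan` and its parameters are hypothetical.

```python
from collections import defaultdict

def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for sequences of single items.

    Simplified illustration: real PrefixSpan also handles itemsets.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence_index, start_position) suffix pointers,
        # i.e. the prefix-projected database for the current prefix.
        support = defaultdict(set)
        for seq_id, start in projected:
            for item in sequences[seq_id][start:]:
                support[item].add(seq_id)
        for item, seq_ids in sorted(support.items()):
            if len(seq_ids) < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = len(seq_ids)
            # Project again: keep the suffix after the first occurrence of item.
            new_projected = []
            for seq_id, start in projected:
                seq = sequences[seq_id]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((seq_id, pos + 1))
                        break
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results

if __name__ == "__main__":
    db = [list("abc"), list("acb"), list("ab")]
    for pattern, support in prefixspan(db, min_support=2).items():
        print("".join(pattern), support)
```

Each recursive call scans only the suffixes that follow the current prefix, which is why the search space shrinks as patterns grow; the cost that remains is building those projections.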

Mining Approximate Sequential Patterns in a Large Sequence Database (대용량 순차 데이터베이스에서 근사 순차패턴 탐색)

  • Kum Hye-Chung;Chang Joong-Hyuk
    • The KIPS Transactions:PartD / v.13D no.2 s.105 / pp.199-206 / 2006
  • Sequential pattern mining is an important data mining task with broad applications. However, conventional methods may meet inherent difficulties in mining databases with long sequences and noise. They may generate a huge number of short and trivial patterns but fail to find interesting patterns shared by many sequences. In this paper, to overcome these problems, we propose the theme of approximate sequential pattern mining, roughly defined as identifying patterns approximately shared by many sequences. The proposed method works in two steps: first, cluster the target sequences by similarity; second, find consensus patterns that are similar to the sequences in each cluster directly through multiple alignment. For this purpose, a novel structure called a weighted sequence is presented to compress the alignment result, and the longest consensus pattern that represents each cluster is generated from its weighted sequence. Finally, the effectiveness of the proposed method is verified by a set of experiments.
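As a rough illustration of how an alignment result might be compressed into a weighted sequence and then turned into a consensus pattern, here is a small sketch. It assumes the sequences in one cluster are already aligned to equal length with a gap symbol, and that each position holds a single item. The names `weighted_sequence`, `consensus`, and `min_strength` are hypothetical, and the paper's actual structure and thresholds may differ.

```python
from collections import Counter

def weighted_sequence(aligned, gap="-"):
    """Compress aligned sequences into per-column item counts."""
    columns = []
    for column in zip(*aligned):
        columns.append(Counter(item for item in column if item != gap))
    return columns

def consensus(weighted, n_sequences, min_strength=0.5):
    """Keep the most common item of each column if enough sequences share it."""
    pattern = []
    for counts in weighted:
        if not counts:
            continue  # column contains only gaps
        item, count = counts.most_common(1)[0]
        if count / n_sequences >= min_strength:
            pattern.append(item)
    return pattern

if __name__ == "__main__":
    cluster = ["ab-d", "abcd", "a-cd"]
    ws = weighted_sequence(cluster)
    print(consensus(ws, len(cluster), min_strength=0.5))
    print(consensus(ws, len(cluster), min_strength=0.9))
```

Raising `min_strength` keeps only the items shared by nearly every sequence in the cluster, so the consensus pattern gets shorter but more reliable.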

Sequential Transition Patterns of Social Play by Children's Social Competence (유아의 사회적 능력에 따른 사회적 놀이의 연속적 변화 패턴)

  • Kim, Soon Jeong;Kim, Hee Jin
    • Korean Journal of Child Studies / v.28 no.1 / pp.17-35 / 2007
  • This study examined whether sequential transition patterns of social play differed by children's social competence. The social competence of sixty 5-year-old children was rated by their teachers using the Social Competence Scale (NICHD Early Child Research Network, 1996). Children's social play was observed during free play and coded by criteria developed by Robinson et al. (2003). Results showed differences in children's social play behaviors by social competence and differences in the transition patterns of children's social play level by social competence. Children with higher social competence showed a transition pattern moving toward cooperative-social interaction, whereas children with lower social competence showed a transition pattern moving backward to solitary or onlooker behavior.


Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences

  • Kang, Tae-Ho;Yoo, Jae-Soo;Kim, Hak-Yong;Lee, Byoung-Yup
    • International Journal of Contents / v.3 no.2 / pp.18-24 / 2007
  • Biological sequences such as DNA and amino acid sequences typically contain a large number of items. They have contiguous sequences that typically consist of hundreds of frequent items. In biological sequence analysis (BSA), frequent contiguous sequence search is one of the most important operations. Many studies have addressed mining sequential patterns efficiently. Most of the existing methods for mining sequential patterns are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, since the algorithm expands sequential patterns from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. In recent years, the MacosVSpan algorithm was proposed, based on the idea of the prefixSpan algorithm, to significantly reduce its recursive process. However, the algorithm is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method to mine maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the superiority of the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
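To make the notion of a maximal frequent contiguous sequence concrete, the following is a naive baseline sketch, not the fixed-length spanning tree method of the paper. It assumes sequences are Python strings, that support means the number of sequences containing a substring, and that candidate length is capped at a hypothetical `max_len`, so "maximal" here is relative to that cap.

```python
from collections import defaultdict

def frequent_contiguous(sequences, min_support, max_len):
    """Count, for every substring up to max_len, how many sequences contain it."""
    support = defaultdict(set)
    for seq_id, seq in enumerate(sequences):
        for length in range(1, max_len + 1):
            for start in range(len(seq) - length + 1):
                support[seq[start:start + length]].add(seq_id)
    return {s: len(ids) for s, ids in support.items() if len(ids) >= min_support}

def maximal_only(frequent):
    """Drop every frequent substring that is contained in a longer frequent one."""
    return {s: n for s, n in frequent.items()
            if not any(s != t and s in t for t in frequent)}

if __name__ == "__main__":
    data = ["ACGTA", "CGTT", "ACGA"]
    freq = frequent_contiguous(data, min_support=2, max_len=4)
    print(maximal_only(freq))
```

This brute-force enumeration grows quickly with sequence length, which is the kind of cost that motivates more specialized structures for long biological sequences.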

Trend-based Sequential Pattern Discovery from Time-Series Data (시계열 데이터로부터의 경향성 기반 순차패턴 탐색)

  • 오용생;이동하;남도원;이전영
    • Journal of Intelligence and Information Systems / v.7 no.1 / pp.27-45 / 2001
  • Sequential pattern discovery from time-series data has mainly been concerned with events or item sets. Recently, research has started to apply it to numerical data, such as sensor information generated by monitoring a machine's state. Numerical data rarely take exactly the same values when forming patterns. It is therefore important to extract a suitable number of pattern features that can be transformed into events or item sets and applied to sequential pattern mining tasks. The popular methods for extracting such patterns are sliding windows and clustering. The results of these methods are sensitive to the window size or clustering parameters, which forces users to run the data mining task repeatedly and reinterpret the results. This paper suggests a method for retrieving pattern features by converting numerical data into vectors described by an angle and a magnitude. The pattern features retrieved with this method make the results easy to understand and the search for sequential patterns fast. We define an inclusion relation among pattern features using the angles and magnitudes of the vectors. Using this relation, we reduce the data size and can therefore find sequential patterns faster than other methods, which use all of the data.

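One plausible reading of turning numerical data into vectors of an angle and a magnitude is sketched below. It assumes one vector per pair of consecutive samples and a fixed time step `dt`; both are assumptions, since the abstract does not say how segments are chosen. The function name `trend_vectors` is hypothetical.

```python
import math

def trend_vectors(values, dt=1.0):
    """Convert consecutive samples into (angle_in_degrees, magnitude) pairs."""
    vectors = []
    for prev, curr in zip(values, values[1:]):
        dy = curr - prev
        angle = math.degrees(math.atan2(dy, dt))  # direction of the trend
        magnitude = math.hypot(dt, dy)            # length of the segment
        vectors.append((angle, magnitude))
    return vectors

if __name__ == "__main__":
    readings = [0.0, 1.0, 1.0, 0.5]
    for angle, magnitude in trend_vectors(readings):
        print(f"angle={angle:.1f} magnitude={magnitude:.3f}")
```

Angles near zero indicate a flat trend, positive angles a rise, and negative angles a fall, so similar segments can be grouped into the same event symbol before sequential pattern mining.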

Multiple-Group Latent Transition Model for the Analysis of Sequential Patterns of Early-Onset Drinking Behaviors among U.S. Adolescents

  • Chung, Hwan
    • The Korean Journal of Applied Statistics / v.24 no.4 / pp.709-719 / 2011
  • We investigate the latent stage-sequential patterns of drinking behaviors of U.S. adolescents who have started to drink by age 14 years (seven years before the legal drinking age). A multiple-group latent transition analysis (LTA) with logistic regression is employed to identify the subsequent patterns of drinking behaviors among early-onset drinkers. A sample of 1407 early-onset adolescents from the National Longitudinal Survey of Youth (NLSY97) is analyzed using maximum-likelihood estimation. The analysis demonstrates that early-onset adolescents' drinking behaviors can be represented by four latent classes and that their prevalence and transitions are influenced by the demographic factors of gender, age, and race.