• Title/Summary/Keyword: sequential pattern analysis

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences

  • Kang, Tae-Ho;Yoo, Jae-Soo;Kim, Hak-Yong;Lee, Byoung-Yup
    • International Journal of Contents / v.3 no.2 / pp.18-24 / 2007
  • Biological sequences such as DNA and amino acid sequences typically contain a large number of items, and their frequent contiguous sequences often run to hundreds of items or more. In biological sequence analysis (BSA), searching for frequent contiguous sequences is one of the most important operations. Many studies have addressed efficient sequential pattern mining, and most existing methods are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, because it grows sequential patterns starting from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. More recently, the MacosVSpan algorithm was proposed, building on the idea of prefixSpan, to significantly reduce the recursive process. However, it is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method for mining maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
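
To make the term "maximal frequent contiguous sequence" concrete, here is a minimal brute-force sketch in Python. It is not the spanning-tree method the abstract proposes, and it would not scale to long biological sequences. The function names, the toy sequences, and the support threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def frequent_contiguous(sequences, min_support):
    """Return {substring: support} for every contiguous substring that
    occurs in at least `min_support` of the input sequences."""
    support = defaultdict(int)
    for seq in sequences:
        # Collect each distinct substring once per sequence.
        seen = set()
        for i in range(len(seq)):
            for j in range(i + 1, len(seq) + 1):
                seen.add(seq[i:j])
        for sub in seen:
            support[sub] += 1
    return {s: c for s, c in support.items() if c >= min_support}


def maximal_only(frequent):
    """Keep only the patterns that are not contained in a longer
    frequent pattern."""
    maximal = []
    # Visit longest patterns first so every possible container is seen earlier.
    for pattern in sorted(frequent, key=len, reverse=True):
        if not any(pattern in longer for longer in maximal):
            maximal.append(pattern)
    return sorted(maximal)


# Hypothetical toy data: three short DNA-like strings.
sequences = ["ACGTAC", "TACGTT", "GACGTA"]
frequent = frequent_contiguous(sequences, min_support=3)
maximal = maximal_only(frequent)
print(maximal)
```

The brute-force enumeration is quadratic in the length of each sequence, which is the kind of cost that motivates specialised methods for long inputs.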

A Methodology for Improving fitness of the Latent Growth Modeling using Association Rule Mining (연관규칙을 이용한 잠재성장모형의 개선방법론)

  • Cho, Yeong Bin;Jun, Jae-Hoon;Choi, Byungwoo
    • Journal of the Korea Convergence Society / v.10 no.2 / pp.217-225 / 2019
  • Latent Growth Modeling (LGM) is known as a typical method for analyzing longitudinal data, and it can be classified into an unconditional model and a conditional model. The growth trajectory of the unconditional LGM is commonly assumed to be linear. For the quasi-linear case, we suggest a methodology that improves model fit using sequential patterns from association rule mining. To do this, we divide the longitudinal data into quintiles, extract the periodic changes of the data in each quintile, and build sequential patterns from these periodic changes. To evaluate its effectiveness, we used the LGM module in SPSS AMOS and the Youth Panel dataset from 2001 to 2006 of the Korea Employment Information Service. Our methodology increased the fit of the model compared with the simple linear growth trajectory.
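
The quintile step can be sketched as follows. This is only one possible reading of the abstract, not the authors' procedure. It assumes that subjects are grouped by their first-wave value, that "periodic change" means the direction of change in the group mean between consecutive waves, and that changes are encoded with the hypothetical symbols U (up), D (down), and S (stable). The panel data is invented.

```python
from statistics import mean


def quintile_groups(panel):
    """Split subjects into five groups by their first-wave value.
    Any remainder goes into the last group."""
    ordered = sorted(panel, key=lambda row: row[0])
    size = len(ordered) // 5
    groups = [ordered[k * size:(k + 1) * size] for k in range(4)]
    groups.append(ordered[4 * size:])
    return groups


def change_sequence(group, tolerance=0.0):
    """Encode wave-to-wave changes of the group mean as a string of
    U (up), D (down), and S (stable) symbols."""
    wave_means = [mean(wave) for wave in zip(*group)]
    symbols = []
    for previous, current in zip(wave_means, wave_means[1:]):
        diff = current - previous
        if diff > tolerance:
            symbols.append("U")
        elif diff < -tolerance:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


# Hypothetical panel: 10 subjects (rows) observed over 4 waves (columns).
panel = [
    [1, 2, 3, 4], [2, 3, 4, 5],
    [3, 3, 3, 3], [4, 4, 4, 4],
    [5, 4, 3, 2], [6, 5, 4, 3],
    [7, 8, 8, 9], [8, 9, 9, 10],
    [9, 9, 8, 9], [10, 10, 9, 10],
]
sequences = [change_sequence(g) for g in quintile_groups(panel)]
print(sequences)
```

Each resulting string is one sequence that a sequential pattern miner could take as input. How the mined patterns feed back into the growth trajectory is not described in the abstract and is not sketched here.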

A Study on the Inference of Detailed Protocol Structure in Protocol Reverse Engineering (상세한 프로토콜 구조를 추론하는 프로토콜 리버스 엔지니어링 방법에 대한 연구)

  • Chae, Byeong-Min;Moon, Ho-Won;Goo, Young-Hoon;Shim, Kyu-Seok;Lee, Min-Seob;Kim, Myung-Sup
    • KNOM Review / v.22 no.1 / pp.42-51 / 2019
  • Internet traffic has recently been growing with the increase in the speed and capacity of the network environment, and protocol data is growing because of mobile, IoT, application, and malicious behavior. Most of these private protocols have an unknown structure. For efficient network management and security, the structure of private protocols must be analyzed. Many protocol reverse engineering methodologies have been proposed for this purpose, but each has disadvantages in application. In this paper, we propose a methodology for inferring a detailed protocol structure from network trace analysis by hierarchically combining the CSP (Contiguous Sequential Pattern) and SP (Sequential Pattern) algorithms. The proposed methodology is designed and implemented to improve on the preceding study, A2PRE. We describe performance indices for comparing methodologies and demonstrate the superiority of the proposed methodology using the HTTP and DNS protocols as examples.

Application of sequential analysis in internet shopping malls (인터넷 쇼핑몰에서의 축차분석법 활용 방안)

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.20 no.6 / pp.1009-1014 / 2009
  • The Internet has changed the daily lives of people in Korea and elsewhere in the world. It has changed the paradigms of traditional commercial activities and created immense opportunities for new business models. Recently, the internet shopping mall has attracted much attention as a means of commercial transaction. To make an internet shopping mall competitive, effective customer satisfaction services should be provided, and a dynamic analysis method for customers' purchasing patterns is needed. In this paper, we apply sequential analysis to the comparison of two kinds of sales through the analysis of customers' purchasing patterns.

SEQUENTIAL EM LEARNING FOR SUBSPACE ANALYSIS

  • Park, Seungjin
    • Proceedings of the IEEK Conference / 2002.07a / pp.698-701 / 2002
  • Subspace analysis (which includes PCA) seeks a feature subspace (which corresponds to the eigenspace) given multivariate input data, and it has been widely used in computer vision and pattern recognition. The data space typically has very high dimension, but only a few principal components need to be extracted. In this paper I present a fast sequential algorithm for subspace analysis or tracking. The useful behavior of the algorithm is confirmed by numerical experiments.

A Study on Partial Pattern Estimation for Sequential Agglomerative Hierarchical Nested Model (SAHN 모델의 부분적 패턴 추정 방법에 대한 연구)

  • Jang, Kyung-Won;Ahn, Tae-Chon
    • Proceedings of the KIEE Conference / 2005.10b / pp.143-145 / 2005
  • This paper presents an empirical study of a pattern estimation method that reveals underlying data patterns at a relatively reduced computational cost. The presented method performs crisp clustering of n given data samples by means of the sequential agglomerative hierarchical nested (SAHN) model. Conventional SAHN-based clustering requires a large computation time in the initial step of the algorithm. To address this, we modified the overall process with a partial approach. At the start of the method, we divide the given data set into several subgroups by uniform sampling, and then apply the SAHN-based method to each subgroup. The advantage of this method is that it reduces the computation time of the original process while giving similar results. The proposed method is applied to several test data sets, and simulation results with a conceptual analysis are presented.
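
The partial approach can be illustrated with a small Python sketch. It makes several assumptions that the abstract does not state: single-linkage distance, one-dimensional data, and systematic sampling (every g-th point) standing in for "uniform sampling". How the authors combine the per-subgroup results is not described, so the sketch stops at the subgroup clusterings. The names `sahn` and `partial_sahn` are hypothetical.

```python
def sahn(points, k):
    """Single-linkage agglomerative clustering of 1-D points,
    merging the two closest clusters until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return sorted(sorted(c) for c in clusters)


def partial_sahn(points, groups, k):
    """Split the data into `groups` subsets by taking every groups-th
    point, then cluster each subset independently."""
    subsets = [points[g::groups] for g in range(groups)]
    return [sahn(subset, k) for subset in subsets]


# Hypothetical data with two obvious groups, around 10 and around 50.
points = [10, 12, 9, 11, 50, 52, 49, 51]
result = partial_sahn(points, groups=2, k=2)
print(result)
```

Because the first merge step compares every pair of singleton clusters, halving the number of points per subset cuts the number of initial pairwise comparisons per subset to roughly a quarter, which is consistent with the saving the abstract describes for the initial step.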

RSP-DS: Real Time Sequential Patterns Analysis in Data Streams (RSP-DS: 데이터 스트림에서의 실시간 순차 패턴 분석)

  • Shin Jae-Jyn;Kim Ho-Seok;Kim Kyoung-Bae;Bae Hae-Young
    • Journal of Korea Multimedia Society / v.9 no.9 / pp.1118-1130 / 2006
  • Existing pattern analysis algorithms for data stream environments have focused on improving performance and using memory effectively. However, when new data streams arrive, these algorithms have to analyze the patterns again and regenerate the pattern tree. This approach requires many calculations in real situations that need real-time pattern analysis. This paper proposes a method that continuously analyzes the patterns of incoming data streams in real time. The method analyzes patterns quickly and thereafter obtains real-time patterns by updating the previously analyzed patterns. The incoming data streams are divided into several sequences using a time-based window. Information about the sequences is entered into a hash table. When the number of sequences exceeds a predefined bound, patterns are analyzed from the hash table. The patterns form a pattern tree, and new patterns created later update the pattern tree. In this way, real-time patterns are always maintained in the pattern tree. During pattern analysis, a new pattern and an existing pattern in the tree can share the same suffix. In that case, a pointer is created from the new pattern to the existing pattern. This reduces calculation time when analyzing duplicated patterns. Old patterns in the tree are easily deleted by a FIFO method. The advantage of our algorithm is demonstrated by a performance comparison with an existing method, MILE, under conditions in which patterns change continuously. We also examine how performance varies as several variables in the algorithm are changed.
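
The windowing and hash-table steps described above can be sketched in Python as follows. This is a simplified illustration, not the RSP-DS implementation. The pattern tree, suffix pointers, and FIFO deletion are omitted, a whole window is treated as one pattern, and the analysis is triggered when the number of new sequences reaches the bound. The names `window_sequences` and `SequenceTable` and the parameters `width`, `bound`, and `min_count` are assumptions.

```python
from collections import Counter


def window_sequences(events, width):
    """Group (timestamp, item) events into one sequence per
    time window of the given width."""
    windows = {}
    for timestamp, item in sorted(events):
        windows.setdefault(timestamp // width, []).append(item)
    return [tuple(windows[key]) for key in sorted(windows)]


class SequenceTable:
    """Hash table of sequence counts that reports frequent sequences
    each time `bound` new sequences have been added."""

    def __init__(self, bound, min_count):
        self.bound = bound
        self.min_count = min_count
        self.table = Counter()
        self.pending = 0

    def add(self, sequence):
        self.table[sequence] += 1
        self.pending += 1
        if self.pending < self.bound:
            return None
        # Bound reached: analyze the table and start counting again.
        self.pending = 0
        return sorted(s for s, c in self.table.items() if c >= self.min_count)


# Hypothetical stream of (timestamp, item) events.
events = [(0, "a"), (1, "b"), (5, "a"), (6, "b"),
          (10, "c"), (11, "a"), (15, "a"), (17, "b")]
sequences = window_sequences(events, width=5)

table = SequenceTable(bound=4, min_count=2)
reports = [table.add(seq) for seq in sequences]
print(reports[-1])
```

Keeping the counts in the table between reports means that later batches update earlier results instead of recomputing them, which is the behaviour the abstract contrasts with re-analysis from scratch.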

Finding associations between genes by time-series microarray sequential patterns analysis

  • Nam, Ho-Jung;Lee, Do-Heon
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.161-164 / 2005
  • Data mining techniques can be applied to identify patterns of interest in gene expression data. One goal in mining gene expression data is to determine how the expression of a particular gene might affect the expression of other genes. To find relationships between different genes, association rules have been applied to gene expression data sets [1]. A notable limitation of association rule mining is that it can detect associations only within a single profile experiment. It cannot be used to find rules across different condition profiles or across experiments at different time points. However, with the appearance of time-series microarray data, it became possible to analyze the temporal relationships between genes. In this paper, we analyze time-series microarray gene expression data to extract sequential patterns, which are similar to association rules between genes across different time points in the yeast cell cycle. The sequential patterns found in our work can capture the associations between different genes that are expressed or repressed at diverse time points. We applied a sequential pattern mining method to time-series microarray gene expression data and discovered a number of sequential patterns from two groups of genes (test and control); more sequential patterns were discovered from the test group (same GO term group) than from the control group (different GO term group). This result supports the potential of sequential patterns for capturing biologically meaningful associations between genes.

A Local Feature-Based Robust Approach for Facial Expression Recognition from Depth Video

  • Uddin, Md. Zia;Kim, Jaehyoun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.3 / pp.1390-1403 / 2016
  • Facial expression recognition (FER) plays a significant role in computer vision, pattern recognition, and image processing applications such as human-computer interaction, as it provides sufficient information about people's emotions. For video-based facial expression recognition, depth cameras can be better candidates than RGB cameras: a person's face cannot be easily identified from distance-based depth videos, so depth cameras also resolve some privacy issues that can arise with RGB faces. A good FER system relies heavily on the extraction of robust features as well as on the recognition engine. In this work, an efficient novel approach is proposed to recognize facial expressions from time-sequential depth videos. First, Local Binary Pattern (LBP) features are obtained from the time-sequential depth faces. These are then further processed by Generalized Discriminant Analysis (GDA) to make the features more robust. Finally, the LBP-GDA features are fed into Hidden Markov Models (HMMs) to train and recognize different facial expressions. The proposed depth-based approach is compared with conventional approaches such as Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA), and it outperforms them with better recognition rates.
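
For readers unfamiliar with LBP, the following Python sketch computes the basic 3x3 LBP code of a pixel and a histogram of codes over an image. It covers only the first stage of the pipeline above. The GDA and HMM stages are not shown, and the neighbour ordering (clockwise from the top-left) and the toy depth values are assumptions, not details from the paper.

```python
def lbp_code(patch):
    """8-bit Local Binary Pattern code for the centre of a 3x3 patch.
    A neighbour contributes a 1 bit when its value is >= the centre."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner.
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(order):
        if patch[row][col] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    histogram = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            histogram[lbp_code(patch)] += 1
    return histogram


# Hypothetical 3x3 block of depth values.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(patch)
histogram = lbp_histogram(patch)
print(code)
```

Because each bit depends only on whether a neighbour is at least as large as the centre, the code is unchanged by any monotonic shift of the depth values, which is one reason LBP is commonly described as a robust local feature.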

A Data Based Methodology for Estimating the Unconditional Model of the Latent Growth Modeling (잠재성장모형의 무조건적 모델 추정을 위한 데이터 기반 방법론)

  • Cho, Yeong Bin
    • Journal of Digital Convergence / v.16 no.6 / pp.85-93 / 2018
  • Latent Growth Modeling (LGM) is known as an emerging method for analyzing longitudinal data, and it can be classified into an unconditional model and a conditional model. The unconditional model requires estimated values of the intercept and slope to complete a well-fitting model. However, existing LGM lacks a structured methodology for estimating the slope when the longitudinal data follows neither a simple linear function nor a pre-defined function. This study used sequential patterns from association rule mining to calculate the slope of the unconditional model. The dataset applied is 'the Youth Panel 2001-2006' from the Korea Employment Information Service. The proposed methodology improved the fit of the model compared with the existing simple linear function and made it possible to visualize the slope estimation process.