• Title/Summary/Keyword: Sequence Mining

Search Result 164, Processing Time 0.021 seconds

A Protein Sequence Prediction Method by Mining Sequence Data (서열 데이타마이닝을 통한 단백질 서열 예측기법)

  • Cho, Sun-I;Lee, Do-Heon;Cho, Kwang-Hwi;Won, Yong-Gwan;Kim, Byoung-Ki
    • The KIPS Transactions:PartD
    • /
    • v.10D no.2
    • /
    • pp.261-266
    • /
    • 2003
  • A protein, which is a linear polymer of amino acids, is one of the most important bio-molecules composing biological structures and regulating bio-chemical reactions. Since the characteristics and functions of proteins are determined by their amino acid sequences in principle, protein sequence determination is the starting point of protein function study. This paper proposes a protein sequence prediction method based on data mining techniques, which can overcome the limitation of previous bio-chemical sequencing methods. After applying multiple proteases to acquire overlapped protein fragments, we can identify candidate fragment sequences by comparing fragment mass values with peptide databases. We propose a method to construct multi-partite graph and search maximal paths to determine the protein sequence by assembling proper candidate sequences. In addition, experimental results based on the SWISS-PROT database showing the validity of the proposed method is presented.

MOTIF BASED PROTEIN FUNCTION ANALYSIS USING DATA MINING

  • Lee, Bum-Ju;Lee, Heon-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference
    • /
    • v.2
    • /
    • pp.812-815
    • /
    • 2006
  • Proteins are essential agents for controlling, effecting and modulating cellular functions, and proteins with similar sequences have diverged from a common ancestral gene, and have similar structures and functions. Function prediction of unknown proteins remains one of the most challenging problems in bioinformatics. Recently, various computational approaches have been developed for identification of short sequences that are conserved within a family of closely related protein sequence. Protein function is often correlated with highly conserved motifs. Motif is the smallest unit of protein structure and function, and intends to make core part among protein structural and functional components. Therefore, prediction methods using data mining or machine learning have been developed. In this paper, we describe an approach for protein function prediction of motif-based models using data mining. Our work consists of three phrases. We make training and test data set and construct classifier using a training set. Also, through experiments, we evaluate our classifier with other classifiers in point of the accuracy of resulting classification.

  • PDF

Research on a Multi-level Space Vector Modulation Strategy in Non-orthogonal Three-dimensional Coordinate Systems

  • Zhang, Chuan-Jin;Wei, Rui-Peng;Tang, Yi;Wang, Ke
    • Journal of Power Electronics
    • /
    • v.17 no.5
    • /
    • pp.1160-1172
    • /
    • 2017
  • A novel space vector modulation strategy in the non-orthogonal three-dimensional coordinate system for multi-level three-phase four-wire inverters is proposed in this paper. This new non-orthogonal three-dimensional space vector modulation converts original trigonometric functions in the orthogonal three-dimensional space coordinate into simple algebraic operations, which greatly reduces the algorithm complexity of three-dimensional space vector modulation and preserves the independent control of the zero-sequence component. Experimental results have verified the correctness and effectiveness of the proposed three-dimensional space vector modulation in the new non-orthogonal three-dimensional coordinate system.

Community Structures of Arbuscular Mycorrhizal Fungi in Soils and Plant Roots Inhabiting Abandoned Mines of Korea

  • Park, Hyeok;Lee, Eun-Hwa;Ka, Kang-Hyeon;Eom, Ahn-Heum
    • Mycobiology
    • /
    • v.44 no.4
    • /
    • pp.277-282
    • /
    • 2016
  • In this study, we collected rhizosphere soils and root samples from a post-mining area and a natural forest area in Jecheon, Korea. We extracted spores of arbuscular mycorrhizal fungi (AMF) from rhizospheres, and then examined the sequences of 18S rDNA genes of the AMF from the collected roots of plants. We compared the AMF communities in the post-mining area and the natural forest area by sequence analysis of the AMF spores from soils and of the AMF clones from roots. Consequently, we confirmed that the structure of AMF communities varied between the post-mining area and the natural forest area and showed significant relationship with heavy metal contents in soils. These results suggest that heavy metal contamination by mining activity significantly affects the AMF community structure.

Mining Frequent Itemsets with Normalized Weight in Continuous Data Streams

  • Kim, Young-Hee;Kim, Won-Young;Kim, Ung-Mo
    • Journal of Information Processing Systems
    • /
    • v.6 no.1
    • /
    • pp.79-90
    • /
    • 2010
  • A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. The continuous characteristic of streaming data necessitates the use of algorithms that require only one scan over the stream for knowledge discovery. Data mining over data streams should support the flexible trade-off between processing time and mining accuracy. In many application areas, mining frequent itemsets has been suggested to find important frequent itemsets by considering the weight of itemsets. In this paper, we present an efficient algorithm WSFI (Weighted Support Frequent Itemsets)-Mine with normalized weight over data streams. Moreover, we propose a novel tree structure, called the Weighted Support FP-Tree (WSFP-Tree), that stores compressed crucial information about frequent itemsets. Empirical results show that our algorithm outperforms comparative algorithms under the windowed streaming model.

Mining Clusters of Sequence Data using Sequence Element-based Similarity Measure (시퀀스 요소 기반의 유사도를 이용한 시퀀스 데이터 클러스터링)

  • 오승준;김재련
    • Proceedings of the Korea Inteligent Information System Society Conference
    • /
    • 2004.11a
    • /
    • pp.221-229
    • /
    • 2004
  • Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web-logs. Such datasets consist of sequence data that have an inherent sequential nature. However, only a few of the existing clustering algorithms consider sequentiality. This study presents a method for clustering such sequence datasets. The similarity between sequences must be decided before clustering the sequences. This study proposes a new similarity measure to compute the similarity between two sequences using a sequence element. Two clustering algorithms using the proposed similarity measure are proposed: a hierarchical clustering algorithm and a scalable clustering algorithm that uses sampling and a k-nearest neighbor method. Using a splice dataset and synthetic datasets, we show that the quality of clusters generated by our proposed clustering algorithms is better than that of clusters produced by traditional clustering algorithms.

  • PDF

BLOCS: Block Correlation Aware Sequential Pattern Mining based Caching Algorithm for Hybrid Storages (BLOCS: 블록 상관관계를 인지하는 시퀀스 패턴 마이닝 기반 하이브리드 스토리지 캐슁 알고리즘)

  • Lee, Seongjin;Won, Youjip
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.7
    • /
    • pp.113-130
    • /
    • 2014
  • In this paper, we propose BLOCS algorithm to find sequence of data that should be saved in cache device of hybrid storage system which uses SSD as a cache device. BLOCS algorithm which uses a sequence pattern mining scheme, creates a set of frequently requested sectors with respect to requested order of sectors. To compare the performance of the proposed scheme, we introduce Distance (DIST) based scheme, Request Frequency (FREQ) based scheme, and Frequency times Size (F-S) based scheme. We measure the hit ratio and I/O latency of different caching schemes using hybrid storage caching simulator. We acquired booting workload along with ten scenarios of launching applications and use the workloads as input to the cache simulator. After experiment with booting workload, we find that BLOCS scheme gives hit ratio of 61% which is about 15% higher than the least performing DIST scheme.

Anomalous Event Detection in Traffic Video Based on Sequential Temporal Patterns of Spatial Interval Events

  • Ashok Kumar, P.M.;Vaidehi, V.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.9 no.1
    • /
    • pp.169-189
    • /
    • 2015
  • Detection of anomalous events from video streams is a challenging problem in many video surveillance applications. One such application that has received significant attention from the computer vision community is traffic video surveillance. In this paper, a Lossy Count based Sequential Temporal Pattern mining approach (LC-STP) is proposed for detecting spatio-temporal abnormal events (such as a traffic violation at junction) from sequences of video streams. The proposed approach relies mainly on spatial abstractions of each object, mining frequent temporal patterns in a sequence of video frames to form a regular temporal pattern. In order to detect each object in every frame, the input video is first pre-processed by applying Gaussian Mixture Models. After the detection of foreground objects, the tracking is carried out using block motion estimation by the three-step search method. The primitive events of the object are represented by assigning spatial and temporal symbols corresponding to their location and time information. These primitive events are analyzed to form a temporal pattern in a sequence of video frames, representing temporal relation between various object's primitive events. This is repeated for each window of sequences, and the support for temporal sequence is obtained based on LC-STP to discover regular patterns of normal events. Events deviating from these patterns are identified as anomalies. Unlike the traditional frequent item set mining methods, the proposed method generates maximal frequent patterns without candidate generation. Furthermore, experimental results show that the proposed method performs well and can detect video anomalies in real traffic video data.

Design of a Personalized Web Mining System Using a Sequence Association Rule (스퀀스 연관규칙을 이용한 개인화 웹 마이닝 설계)

  • Yun, Jong-Chan;Youn, Sung-Dae
    • Journal of Korea Multimedia Society
    • /
    • v.10 no.9
    • /
    • pp.1106-1116
    • /
    • 2007
  • Recently e-commerce trade on the web has grown rapidly in scale and complexity, just as web site designs and web servers have become more complicated. In view of these complexities, it is obviously difficult to analyse web user's data since they web users employ so many different web paths. The existing association rule investigation algorithms identify all items with a high correlation. However even though users often only want to find items in which they have interest, it is still difficult to find the rules they want out of all of the many association rules found by existing algorithms. In this paper, we propose a system linking each node with the sequence association rule, linking all routes after finding a path corresponding to a user with the association rule-one of the data mining techniques which identify user patterns in web user paths. The suggested system helps us construct individualized or customer-subdivided sites using the sequence association rule in order to harmonize the paths of web users with user characters.

  • PDF

Mining Frequent Sequential Patterns over Sequence Data Streams with a Gap-Constraint (순차 데이터 스트림에서 발생 간격 제한 조건을 활용한 빈발 순차 패턴 탐색)

  • Chang, Joong-Hyuk
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.9
    • /
    • pp.35-46
    • /
    • 2010
  • Sequential pattern mining is one of the essential data mining tasks, and it is widely used to analyze data generated in various application fields such as web-based applications, E-commerce, bioinformatics, and USN environments. Recently data generated in the application fields has been taking the form of continuous data streams rather than finite stored data sets. Considering the changes in the form of data, many researches have been actively performed to efficiently find sequential patterns over data streams. However, conventional researches focus on reducing processing time and memory usage in mining sequential patterns over a target data stream, so that a research on mining more interesting and useful sequential patterns that efficiently reflect the characteristics of the data stream has been attracting no attention. This paper proposes a mining method of sequential patterns over data streams with a gap constraint, which can help to find more interesting sequential patterns over the data streams. First, meanings of the gap for a sequential pattern and gap-constrained sequential patterns are defined, and subsequently a mining method for finding gap-constrained sequential patterns over a data stream is proposed.