• Title/Summary/Keyword: Frequent Patterns

Sequential Pattern Mining with Optimization Calling MapReduce Function on MapReduce Framework (맵리듀스 프레임웍 상에서 맵리듀스 함수 호출을 최적화하는 순차 패턴 마이닝 기법)

  • Kim, Jin-Hyun;Shim, Kyu-Seok
    • The KIPS Transactions:PartD / v.18D no.2 / pp.81-88 / 2011
  • Sequential pattern mining, which finds frequent patterns in a given set of sequences, is an important data mining problem with broad applications. For example, it can find web access patterns, customer purchase patterns, and DNA sequences associated with specific diseases. In this paper, we develop sequential pattern mining algorithms on the MapReduce framework. Our algorithms distribute the input data across several machines and find frequent sequential patterns in parallel. Using synthetic data sets, we conducted a comprehensive performance study while varying several parameters. Our experimental results show that the algorithms achieve linear speedup as the number of machines increases.
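
    The map-then-reduce idea described in this abstract can be illustrated with a minimal single-process sketch. It is not the authors' algorithm: the function names (`map_phase`, `reduce_phase`, `frequent_subsequences`), the fixed pattern length `k`, and the toy data are assumptions made for illustration, and the "machines" are simulated by an ordinary loop.

    ```python
    from collections import Counter
    from itertools import combinations


    def map_phase(sequence, k):
        """Emit each distinct length-k subsequence of one input sequence once."""
        return {
            tuple(sequence[i] for i in idx)
            for idx in combinations(range(len(sequence)), k)
        }


    def reduce_phase(emitted, min_support):
        """Sum the per-sequence emissions and keep patterns meeting min_support."""
        counts = Counter()
        for patterns in emitted:
            counts.update(patterns)
        return {p: c for p, c in counts.items() if c >= min_support}


    def frequent_subsequences(sequences, k, min_support):
        # Each call to map_phase is independent, so in a real framework the
        # sequences could be processed on different machines.
        emitted = [map_phase(s, k) for s in sequences]
        return reduce_phase(emitted, min_support)


    sequences = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"]]
    result = frequent_subsequences(sequences, k=2, min_support=2)
    print(result)
    ```

    Because each sequence contributes a pattern at most once, the counts are supports (number of sequences containing the pattern) rather than raw occurrence counts.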

Discovery of Frequent Traversal Patterns from Weighted Traversals and Performance Enhancement by Traversal Split (가중치 순회로부터 빈발 순회패턴의 탐사 및 순회분할을 통한 성능향상)

  • Lee, Seong-Dae;Park, Hyu-Chan
    • Journal of the Korea Institute of Information and Communication Engineering / v.11 no.5 / pp.940-948 / 2007
  • Many real-world problems can be modeled as a graph and traversals on that graph. For example, the structure of Web pages can be represented as a graph, and users' navigation paths through the pages can be modeled as traversals on the graph. It is interesting to discover valuable patterns, such as frequent patterns, from such traversals. In this paper, we propose an algorithm to discover frequent traversal patterns when a directed graph and weighted traversals on the graph are given. Furthermore, we propose a performance enhancement based on traversal split and verify it through experiments.

Time and Spatial Distribution of Probabilistic Typhoon Storms and Winds in Korean Peninsula (한반도에 내습한 태풍의 확률강우 및 풍속의 시공적 분포 특성)

  • 윤경덕;서승덕
    • Magazine of the Korean Society of Agricultural Engineers / v.36 no.3 / pp.122-134 / 1994
  • The objective of this study is to describe the hydrometeorological and probabilistic characteristics of the storms and winds of typhoons that passed through the Korean peninsula during the twenty-three years since 1961. The paths and intensities of the typhoons were analyzed. Fifty weather stations were selected, and rainfall and wind data during typhoon periods were collected. Rainfall data were analyzed for their patterns and probabilistic distributions, and the results were presented to describe the areal distributions of probabilistic characteristics. Wind data were also analyzed for their probabilistic distributions. The results obtained from this study can be summarized as follows: 1. The most frequent path of typhoons that passed through the Korean peninsula was type E, followed by types CWE, W, WE, and S. The most frequent typhoon intensity was type B, followed by types A, super A, and C, respectively. 2. Third-quartile typhoon rainfall patterns appear most frequently, followed by the second, first, and fourth quartiles, respectively, in Seoul, Pusan, Taegu, Kwangju and Taejon. Single typhoon rainfalls with long durations tended to show predominantly delayed-type rainfall patterns compared with single rainfalls with short durations. 3. The most frequent probabilistic distribution for typhoon rainfall events is the Pearson type-III distribution, followed by the two-parameter lognormal distribution and the Type-I extremal distribution. 4. The most frequent probability distribution model for seashore locations was the Pearson type-III distribution. The most frequent probability distribution model for inland locations was the two-parameter lognormal distribution. 5. The most frequent probabilistic distribution for typhoon wind events was the Type-I extremal distribution, followed by the two-parameter lognormal distribution and the normal distribution.

Overview of frequent pattern mining

  • Jurg Ott;Taesung Park
    • Genomics & Informatics / v.20 no.4 / pp.39.1-39.9 / 2022
  • Various methods of frequent pattern mining have been applied to genetic problems, specifically, to the combined association of two genotypes (a genotype pattern, or diplotype) at different DNA variants with disease. These methods have the ability to come up with a selection of genotype patterns that are more common in affected than unaffected individuals, and the assessment of statistical significance for these selected patterns poses some unique problems, which are briefly outlined here.

Analysis on the Tattoo Patterns used among Tattoo-related Internet Communities - Focusing on the Domestic and International Web Sites - (타투 관련 인터넷 동호회 사이트에 나타난 타투 문양 분석 - 국내.외 사이트를 중심으로 -)

  • Chung, Kyung-Hee;Lee, Mi-Sook
    • Journal of the Korean Society of Costume / v.57 no.3 s.112 / pp.1-13 / 2007
  • The purpose of this study is to analyze the kinds of tattoo patterns, and their positions on the body, found in tattoo-related internet communities and professional web sites. For this purpose, 1,892 tattoo patterns were analyzed by sex (men and women). The results were as follows. First, animal patterns (30.2%) were the most common, followed by character patterns (24.1%), geometric patterns (13.0%), natural patterns (10.3%), plant patterns (4.7%), mixed patterns (2.5%), and artificial patterns (2.2%). Among individual patterns, the dragon (10.3%) was the most common, followed by star (8.7%), tribal (8.6%), woman (7.6%), skeleton (4.9%), and letter (4.8%). Second, men's preferred pattern groups included animal patterns (30.8%), character patterns (28.3%), geometric patterns (14.6%), and natural patterns (6.0%). Among individual patterns, the dragon (13.4%) was the most frequent, followed by tribal (10.9%), woman (10.7%), and skeleton (7.1%). Women's preferred pattern groups included animal patterns (31.4%), natural patterns (17.3%), character patterns (17.2%), geometric patterns (10.5%), and plant patterns (10.0%). Among individual patterns, the star (15.3%) was the most frequent, followed by butterfly (10.5%), elf (9.2%), and dragon (9.2%). Third, the positions of tattoos on the body included the upper arm (26.6%), shoulder (10.8%), back (10.5%), wrist (10.0%), calf (7.5%), back bottom (7.0%), and breast (6.3%). Men's preferred positions included the upper arm (38.2%), wrist (13.7%), back (10.5%), calf (9.4%), and shoulder (8.0%), whereas women's preferred positions included the back bottom (17.7%), shoulder (15.5%), back (10.5%), front bottom (8.2%), and breast (7.8%).

CONSTRUCTING GENE REGULATORY NETWORK USING FREQUENT GENE EXPRESSION PATTERN MINING AND CHAIN RULES

  • Park, Hong-Kyu;Lee, Heon-Gyu;Cho, Kyung-Hwan;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.623-626 / 2006
  • Groups of genes control the functioning of a cell through complex interactions. These interacting gene groups are called Gene Regulatory Networks (GRNs). Two previous data mining approaches, clustering and classification, have been used to analyze gene expression data. While these mining tools are useful for determining the membership of genes by homology, they do not identify the regulatory relationships among genes found in the same class of molecular actions. Furthermore, we need to understand how genes relate to and regulate one another. To detect regulatory relationships among genes from time-series microarray data, we propose a novel approach using frequent pattern mining and chain rules. In this approach, we propose a method for transforming gene expression data to make it suitable for frequent pattern mining, and we detect gene expression patterns by applying the FP-growth algorithm. We then construct a gene regulatory network from the frequent gene patterns using chain rules. Finally, we validate the proposed method by showing that our experimental results are consistent with published results.

An Extended Dynamic Web Page Recommendation Algorithm Based on Mining Frequent Traversal Patterns (빈발 순회패턴 탐사에 기반한 확장된 동적 웹페이지 추천 알고리즘)

  • Lee KeunSoo;Lee Chang Hoon;Yoon Sun-Hee;Lee Sang Moon;Seo Jeong Min
    • Journal of Korea Multimedia Society / v.8 no.9 / pp.1163-1176 / 2005
  • The Web is the largest distributed information space, but an individual's capacity to read and digest content is essentially fixed. In this environment, mining traversal patterns is an important Web mining problem with many application domains, including system design and information services. Conventional traversal pattern mining systems use inter-page associations within sessions, with only a very restricted mechanism (based on vectors or matrices) for generating frequent K-Pagesets. We extend a family of novel algorithms (termed WebPR, for Web Page Recommend) that mine frequent traversal patterns and then the pagesets to recommend. We add a WebPR(A) algorithm to the WebPR family and propose a new winWebPR(T) algorithm that introduces a window concept into WebPR(T). Experiments with the two extended algorithms on two real data sets, from the LadyAsiana and KBS media server sites, clearly show that our method outperforms conventional methods.

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences (생물학적 데이터 서열들에서 빈번한 최대길이 연속 서열 마이닝)

  • Kang, Tae-Ho;Yoo, Jae-Soo
    • The KIPS Transactions:PartD / v.15D no.2 / pp.155-162 / 2008
  • Biological sequences such as DNA sequences and amino acid sequences typically contain a large number of items. They contain contiguous sequences that ordinarily consist of hundreds of frequent items. In biological sequence analysis (BSA), searching for frequent contiguous sequences is one of the most important operations. Many studies have addressed the efficient mining of sequential patterns. Most existing methods for mining sequential patterns are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, because the algorithm expands sequential patterns starting from frequent patterns of length 1, it is not suitable for biological data sets with long frequent contiguous sequences. More recently, the MacosVSpan algorithm was proposed, building on the idea of the prefixSpan algorithm, to significantly reduce its recursive process. However, the algorithm is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method for mining maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the superiority of the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
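
    To make the term "maximal frequent contiguous sequence" concrete, here is a brute-force sketch. It is not the spanning-tree method proposed in the paper, and it does not scale to long sequences; the function names (`frequent_contiguous`, `maximal_only`) and the toy DNA strings are assumptions made for illustration. Support is taken to be the number of input sequences that contain a substring.

    ```python
    def frequent_contiguous(sequences, min_support):
        """Count, for every contiguous substring, how many sequences contain it."""
        support = {}
        for seq in sequences:
            # A set ensures each sequence contributes to a substring at most once.
            substrings = {
                seq[i:j]
                for i in range(len(seq))
                for j in range(i + 1, len(seq) + 1)
            }
            for s in substrings:
                support[s] = support.get(s, 0) + 1
        return {s: c for s, c in support.items() if c >= min_support}


    def maximal_only(frequent):
        """Keep frequent substrings not contained in a longer frequent substring."""
        return {
            s: c
            for s, c in frequent.items()
            if not any(s != t and s in t for t in frequent)
        }


    dna = ["ACGTA", "CGTT", "ACGG"]
    frequent = frequent_contiguous(dna, min_support=2)
    maximal = maximal_only(frequent)
    print(sorted(maximal))
    ```

    In this toy input, nine substrings are frequent, but only `ACG` and `CGT` are maximal; every other frequent substring is contained in one of them.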

Frequently Occurred Information Extraction from a Collection of Labeled Trees (라벨 트리 데이터의 빈번하게 발생하는 정보 추출)

  • Paik, Ju-Ryon;Nam, Jung-Hyun;Ahn, Sung-Joon;Kim, Ung-Mo
    • Journal of Internet Computing and Services / v.10 no.5 / pp.65-78 / 2009
  • The most commonly adopted approach to finding valuable information in tree data is to extract frequently occurring subtree patterns. Because mining frequent tree patterns has a wide range of applications, such as XML mining, web usage mining, bioinformatics, and network multicast routing, many algorithms have recently been proposed to find such patterns. However, existing tree mining algorithms suffer from several serious pitfalls when finding frequent tree patterns in massive tree data sets. Some of the major problems are due to (1) modeling data as a hierarchical tree structure, (2) the high computational cost of candidate maintenance, (3) repeated scans of the input data set, and (4) high memory dependency. These problems stem from the fact that most of these algorithms are based on the well-known Apriori algorithm and use the anti-monotone property for candidate generation and frequency counting. To solve these problems, we adopt a pattern-growth approach rather than the Apriori approach, and we extract maximal frequent subtree patterns instead of all frequent subtree patterns. The proposed method not only removes the need to prune infrequent subtrees but also entirely eliminates the generation of candidate subtrees. Hence, it significantly improves the whole mining process.

Anomalous Event Detection in Traffic Video Based on Sequential Temporal Patterns of Spatial Interval Events

  • Ashok Kumar, P.M.;Vaidehi, V.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.1 / pp.169-189 / 2015
  • Detection of anomalous events from video streams is a challenging problem in many video surveillance applications. One such application that has received significant attention from the computer vision community is traffic video surveillance. In this paper, a Lossy Count based Sequential Temporal Pattern mining approach (LC-STP) is proposed for detecting spatio-temporal abnormal events (such as a traffic violation at a junction) from sequences of video streams. The proposed approach relies mainly on spatial abstractions of each object, mining frequent temporal patterns in a sequence of video frames to form a regular temporal pattern. To detect each object in every frame, the input video is first pre-processed by applying Gaussian Mixture Models. After the foreground objects are detected, tracking is carried out using block motion estimation by the three-step search method. The primitive events of each object are represented by assigning spatial and temporal symbols corresponding to their location and time information. These primitive events are analyzed to form a temporal pattern in a sequence of video frames, representing the temporal relations between the primitive events of various objects. This is repeated for each window of sequences, and the support for each temporal sequence is obtained based on LC-STP to discover regular patterns of normal events. Events deviating from these patterns are identified as anomalies. Unlike traditional frequent itemset mining methods, the proposed method generates maximal frequent patterns without candidate generation. Furthermore, experimental results show that the proposed method performs well and can detect video anomalies in real traffic video data.
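
    The abstract builds on lossy counting to obtain approximate supports over a stream. A minimal sketch of the general lossy counting technique for single items is given below; it is not the LC-STP method itself, and the function name `lossy_count`, the parameter `epsilon`, and the symbol stream are assumptions made for illustration.

    ```python
    import math


    def lossy_count(stream, epsilon):
        """Approximate item counts over a stream with error at most epsilon * n."""
        width = math.ceil(1 / epsilon)  # number of items per bucket
        counts = {}                     # item -> (count, maximum undercount)
        n = 0
        for item in stream:
            n += 1
            bucket = math.ceil(n / width)
            if item in counts:
                count, delta = counts[item]
                counts[item] = (count + 1, delta)
            else:
                # A new item may have been seen and pruned in earlier buckets.
                counts[item] = (1, bucket - 1)
            if n % width == 0:
                # At each bucket boundary, drop items that cannot be frequent.
                counts = {
                    key: (count, delta)
                    for key, (count, delta) in counts.items()
                    if count + delta > bucket
                }
        return counts, n


    events = ["a", "b", "a", "c", "a", "a", "d", "a", "b", "a", "a", "e"]
    counts, n = lossy_count(events, epsilon=0.25)
    print(counts, n)
    ```

    With `epsilon=0.25` the bucket width is 4, so rare symbols are pruned at every fourth item, and only the dominant symbol survives with its full count.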