• Title/Summary/Keyword: Pattern mining


Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences

  • Kang, Tae-Ho;Yoo, Jae-Soo;Kim, Hak-Yong;Lee, Byoung-Yup
    • International Journal of Contents
    • /
    • v.3 no.2
    • /
    • pp.18-24
    • /
    • 2007
  • Biological sequences such as DNA and amino acid sequences typically contain a large number of items. They have contiguous sequences that ordinarily consist of more than hundreds of frequent items. In biological sequence analysis (BSA), a frequent contiguous sequence search is one of the most important operations. Many studies have been done on mining sequential patterns efficiently. Most of the existing methods for mining sequential patterns are based on the Apriori algorithm. In particular, the PrefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, since that algorithm expands sequential patterns from length-1 frequent patterns, it is not suitable for biological datasets with long frequent contiguous sequences. More recently, the MacosVSpan algorithm was proposed, based on the idea of the PrefixSpan algorithm, to significantly reduce its recursive process. However, that algorithm is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method to mine maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the superiority of the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
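
For orientation, the short Python sketch below shows what "maximal frequent contiguous sequences" look like on toy DNA strings: it simply enumerates every substring and keeps the frequent, maximal ones. The function name, the example strings, and the support threshold are illustrative assumptions, and the brute-force enumeration only stands in for (it does not reproduce) the paper's fixed-length spanning-tree construction.

```python
from collections import defaultdict

def maximal_frequent_contiguous(sequences, min_support):
    """Brute-force baseline: maximal frequent contiguous subsequences.

    A substring is frequent if it occurs in at least `min_support` of the
    input sequences, and maximal if no frequent superstring contains it.
    """
    support = defaultdict(set)
    for sid, seq in enumerate(sequences):
        for i in range(len(seq)):
            for j in range(i + 1, len(seq) + 1):
                support[seq[i:j]].add(sid)   # record which sequences contain it

    frequent = [s for s, ids in support.items() if len(ids) >= min_support]
    # keep only substrings not contained in a longer frequent substring
    maximal = [s for s in frequent
               if not any(s != t and s in t for t in frequent)]
    return sorted(maximal, key=len, reverse=True)

if __name__ == "__main__":
    dna = ["ATCGATCGA", "TTATCGATC", "GGATCGATT"]
    print(maximal_frequent_contiguous(dna, min_support=3))   # ['ATCGAT']
```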

Influence of explosives distribution on coal fragmentation in top-coal caving mining

  • Liu, Fei;Silva, Jhon;Yang, Shengli;Lv, Huayong;Zhang, Jinwang
    • Geomechanics and Engineering
    • /
    • v.18 no.2
    • /
    • pp.111-119
    • /
    • 2019
  • Due to certain geological characteristics (high thickness, rocky properties), some underground coal mines require the use of explosives. This paper explores the fragmentation effects of different explosive decks detonated simultaneously in a single borehole, using numerical analysis. The ANSYS/LS-DYNA code was used to implement the models, which include an erosion criterion to simulate the cracks generated by the explosion. As expected, the near-borehole area was damaged by compressive stresses, while far zones and the free surface of the boundary were subjected to tensile damage. As the number of decks in the borehole increased, different changes in the fracture pattern were observed, and the superposition effects of the stress wave became evident, affecting the fragmentation results. The superposition effect is more evident at close distances to the borehole, and it attenuates as the distance to the borehole increases.

User Access Patterns Discovery based on Apriori Algorithm under Web Logs (웹 로그에서의 Apriori 알고리즘 기반 사용자 액세스 패턴 발견)

  • Ran, Cong-Lin;Joung, Suck-Tae
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.12 no.6
    • /
    • pp.681-689
    • /
    • 2019
  • Web usage pattern discovery is an advanced means of using Web log data, and it is also a specific application of data mining technology to Web log data. In education, Data Mining (DM) is the application of data mining techniques to educational data (such as university Web logs, e-learning, adaptive hypermedia, and intelligent tutoring systems), and its objective is to analyze these types of data in order to resolve educational research issues. In this paper, the Web log data of a university are used as the research object of data mining. Using database OLAP technology, the Web log data are preprocessed into a format suitable for data mining, and the processing results are stored in MSSQL. At the same time, basic data statistics and analysis are completed based on the processed Web log records. In addition, we introduce the Apriori algorithm for Web usage pattern mining and its implementation process, develop an Apriori algorithm program in a Python development environment, report its performance, and realize the mining of Web user access patterns. The results have important theoretical significance for applying the discovered patterns in the development of teaching systems. Future research will explore improvements to the Apriori algorithm in a distributed computing environment.
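
Since the abstract describes an Apriori implementation in Python over Web log sessions, a minimal, self-contained sketch of the classic Apriori level-wise search is given below. The session data, page names, and support threshold are invented for illustration and do not come from the paper.

```python
from itertools import combinations

def apriori(sessions, min_support):
    """Textbook Apriori over Web-log sessions (sets of visited pages)."""
    def support(candidates):
        # count how many sessions contain each candidate itemset
        return {c: sum(1 for s in sessions if c <= s) for c in candidates}

    candidates = {frozenset([page]) for s in sessions for page in s}
    frequent, k = {}, 1
    while candidates:
        counts = support(candidates)
        level = {c for c, n in counts.items() if n >= min_support}
        frequent.update({c: counts[c] for c in level})
        k += 1
        # join step: unions of frequent itemsets that reach size k
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # prune step: every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level
                             for sub in combinations(c, k - 1))}
    return frequent

if __name__ == "__main__":
    logs = [{"/index", "/course", "/exam"},
            {"/index", "/course"},
            {"/index", "/exam"},
            {"/course", "/exam", "/index"}]
    for pattern, count in sorted(apriori(logs, min_support=3).items(),
                                 key=lambda kv: len(kv[0])):
        print(sorted(pattern), count)
```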

Constructing Gene Regulatory Networks using Frequent Gene Expression Pattern and Chain Rules (빈발 유전자 발현 패턴과 연쇄 규칙을 이용한 유전자 조절 네트워크 구축)

  • Lee, Heon-Gyu;Ryu, Keun-Ho;Joung, Doo-Young
    • The KIPS Transactions:PartD
    • /
    • v.14D no.1 s.111
    • /
    • pp.9-20
    • /
    • 2007
  • Groups of genes control the functioning of a cell through complex interactions. Such interactions of gene groups are called Gene Regulatory Networks (GRNs). Two previous data mining approaches, clustering and classification, have been used to analyze gene expression data. Though these mining tools are useful for determining the membership of genes by homology, they do not identify the regulatory relationships among genes found in the same class of molecular actions. Furthermore, we need to understand the mechanism of how genes relate to and regulate one another. In order to detect regulatory relationships among genes from time-series microarray data, we propose a novel approach using frequent pattern mining and chain rules. In this approach, we propose a method for transforming gene expression data to make it suitable for frequent pattern mining, and gene expression patterns are detected by applying the FP-growth algorithm. Next, we construct a gene regulatory network from the frequent gene patterns using chain rules. Finally, we validate our proposed method through experimental results, which are consistent with published results.
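
As a rough illustration of turning time-series expression data into candidate regulatory edges, the hypothetical sketch below discretizes each gene's series into up/down moves and emits an edge A -> B when A's move at one time point tends to be repeated by B at the next. This is a stand-in for, not a reproduction of, the paper's FP-growth plus chain-rule pipeline; all names, data, and thresholds are assumptions.

```python
import numpy as np

def candidate_regulatory_edges(expr, genes, min_conf=0.75):
    """Hypothetical chain-rule sketch over discretized expression data.

    expr has shape (num_genes, num_timepoints). Each series is turned into
    up/down moves; an edge A -> B is emitted when A's move at time t is
    repeated by B at time t+1 with confidence >= min_conf.
    """
    moves = np.sign(np.diff(expr, axis=1))          # +1 up, -1 down, 0 flat
    edges = []
    for i, a in enumerate(genes):
        for j, b in enumerate(genes):
            if i == j:
                continue
            antecedent = moves[i, :-1] != 0          # A moved at time t
            if antecedent.sum() == 0:
                continue
            followed = moves[j, 1:] == moves[i, :-1] # B repeats A's move at t+1
            conf = (antecedent & followed).sum() / antecedent.sum()
            if conf >= min_conf:
                edges.append((a, b, round(conf, 2)))
    return edges

if __name__ == "__main__":
    genes = ["geneA", "geneB"]
    expr = np.array([[1.0, 2.0, 1.5, 2.5, 2.0],
                     [0.5, 0.8, 1.6, 1.2, 1.9]])
    print(candidate_regulatory_edges(expr, genes))   # [('geneA', 'geneB', 1.0)]
```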

Mining High Utility Sequential Patterns Using Sequence Utility Lists (시퀀스 유틸리티 리스트를 사용하여 높은 유틸리티 순차 패턴 탐사 기법)

  • Park, Jong Soo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.2
    • /
    • pp.51-62
    • /
    • 2018
  • High utility sequential pattern (HUSP) mining has been considered an important research topic in data mining. Although several algorithms have been proposed for this topic, they suffer from producing a large search space for HUSPs. A tighter utility upper bound on a sequence can prune more unpromising patterns early in the search. In this paper, we propose the sequence expected utility (SEU) as a new utility upper bound for each sequence, defined as the maximum expected utility of a sequence and all its descendant sequences. A sequence utility list for each pattern is used as a new data structure to maintain the essential information for mining HUSPs. We devise an algorithm, high sequence utility list-span (HSUL-Span), to identify HUSPs by employing SEU. Experimental results on both synthetic and real datasets from different domains show that HSUL-Span generates considerably fewer candidate patterns and outperforms other algorithms in terms of execution time.
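
The key idea in the abstract is pruning with a utility upper bound. The sketch below illustrates the general principle with the simplest possible bound, the summed utility of all sequences containing an item (akin to the classical sequence-weighted utility), rather than the paper's tighter SEU; the data layout, names, and threshold are assumptions.

```python
def sequence_utility(seq):
    """Total utility of one quantitative sequence: sum of item utilities."""
    return sum(u for itemset in seq for _, u in itemset)

def upper_bound_prune(db, candidates, min_util):
    """Illustrative upper-bound pruning for HUSP mining (not the SEU bound).

    For each single-item candidate, the bound is the summed utility of every
    database sequence containing that item; items whose bound falls below
    `min_util` cannot appear in any HUSP and are pruned before pattern growth.
    """
    promising = {}
    for item in sorted(candidates):
        bound = sum(sequence_utility(seq) for seq in db
                    if any(item == name
                           for itemset in seq for name, _ in itemset))
        if bound >= min_util:
            promising[item] = bound
    return promising

if __name__ == "__main__":
    # a sequence = list of itemsets; each itemset = list of (item, utility)
    db = [[[("a", 2), ("b", 4)], [("c", 3)]],
          [[("b", 1)], [("c", 6), ("d", 2)]],
          [[("a", 5)], [("d", 1)]]]
    print(upper_bound_prune(db, {"a", "b", "c", "d"}, min_util=16))
    # {'b': 18, 'c': 18}; 'a' and 'd' are pruned
```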

Schema Mapping Method using Frequent Pattern Mining (빈발패턴을 이용한 스키마 매핑)

  • Chai, Duck Jin;Ban, Kyeong-Jin;Kim, Eung-Kon
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.5 no.1
    • /
    • pp.93-101
    • /
    • 2010
  • Many studies are currently being conducted to resolve metadata interoperability between schema attributes. However, the accuracy of previous schema mapping studies is low because they rely only on the similarity between attributes, so they are not well suited to schema mapping tasks such as document conversion and system integration. In this paper, we propose a method that performs schema mapping interactively using frequent pattern mining. The method can produce a more accurate mapping because it uses the description element, an element of each schema defined in the metadata standard. A performance study has been conducted to evaluate the mapping accuracy of the method using metadata standards.
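
To make the role of the description element concrete, the hypothetical sketch below matches attributes of two schemas by the overlap of tokens in their descriptions (Jaccard similarity standing in for the paper's frequent-pattern matching). The attribute names, descriptions, and similarity threshold are invented for illustration.

```python
import re

def tokens(text):
    """Lower-cased word tokens from a schema description element."""
    return set(re.findall(r"[a-z]+", text.lower()))

def map_schemas(src, dst, min_sim=0.3):
    """Hypothetical description-based schema mapping sketch.

    src/dst map attribute names to their metadata descriptions; each source
    attribute is paired with the target attribute whose description shares
    the largest fraction of tokens.
    """
    mapping = {}
    for a, da in src.items():
        best, best_sim = None, min_sim
        for b, db in dst.items():
            ta, tb = tokens(da), tokens(db)
            sim = len(ta & tb) / len(ta | tb) if ta | tb else 0.0
            if sim >= best_sim:
                best, best_sim = b, sim
        if best is not None:
            mapping[a] = (best, round(best_sim, 2))
    return mapping

if __name__ == "__main__":
    src = {"author": "name of the person who created the resource",
           "pubDate": "date the resource was published"}
    dst = {"creator": "the person or organization that created the resource",
           "issued": "date of formal issuance or publication of the resource"}
    print(map_schemas(src, dst))
    # {'author': ('creator', 0.4), 'pubDate': ('issued', 0.3)}
```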

An Efficient Mining Algorithm for Generating Probabilistic Multidimensional Sequential Patterns (확률적 다차원 연속패턴의 생성을 위한 효율적인 마이닝 알고리즘)

  • Lee Chang-Hwan
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.2
    • /
    • pp.75-84
    • /
    • 2005
  • Sequential pattern mining is an important data mining problem with broad applications. While current methods generate sequential patterns within a single attribute, the proposed method is able to detect them across different attributes. By incorporating these additional attributes, the sequential patterns found are richer and more informative to the user. This paper proposes a new method for generating multidimensional sequential patterns using the Hellinger entropy measure. Unlike previously used methods, the proposed method can calculate the significance of each sequential pattern. Two theorems are proposed to reduce the computational complexity of the proposed system. The proposed method is tested on synthesized purchase transaction databases.
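
The Hellinger distance itself is standard: H(P, Q) = (1/sqrt(2)) * sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2). The sketch below uses it to score how far a pattern shifts a class distribution away from the prior, which is only an assumed reading of "significance"; the exact Hellinger-entropy weighting used in the paper is not reproduced.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 to 1)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2
                         for a, b in zip(p, q))) / math.sqrt(2)

def pattern_significance(prior, conditional):
    """Assumed illustration: larger shifts from the prior score higher."""
    return hellinger(prior, conditional)

if __name__ == "__main__":
    prior = [0.5, 0.3, 0.2]           # P(next purchase category) overall
    given_pattern = [0.1, 0.7, 0.2]   # same distribution given the pattern
    print(round(pattern_significance(prior, given_pattern), 3))   # 0.344
```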

Analysis and Evaluation of Frequent Pattern Mining Technique based on Landmark Window (랜드마크 윈도우 기반의 빈발 패턴 마이닝 기법의 분석 및 성능평가)

  • Pyun, Gwangbum;Yun, Unil
    • Journal of Internet Computing and Services
    • /
    • v.15 no.3
    • /
    • pp.101-107
    • /
    • 2014
  • With the development of online services, recent forms of databases have changed from static database structures to dynamic stream database structures. Previous data mining techniques have been used as tools for decision making, such as establishing marketing strategies and DNA analyses. However, the capability to analyze real-time data more quickly is necessary in areas of recent interest such as sensor networks, robotics, and artificial intelligence. Landmark window-based frequent pattern mining, one of the stream mining approaches, performs mining operations with respect to parts of the database or each incoming transaction, instead of all the data. In this paper, we analyze and evaluate two well-known landmark window-based frequent pattern mining algorithms, Lossy Counting and hMiner. When Lossy Counting mines frequent patterns from a set of new transactions, it performs union operations between the previous and current mining results. hMiner, a state-of-the-art algorithm based on the landmark window model, conducts mining operations whenever a new transaction occurs. Since hMiner extracts frequent patterns as soon as a new transaction is entered, we can obtain the latest mining results reflecting real-time information; for this reason, such algorithms are also called online mining approaches. We evaluate and compare the performance of the primitive algorithm, Lossy Counting, and the more recent one, hMiner. As the criteria of our performance analysis, we first consider each algorithm's total runtime and average processing time per transaction. In addition, to compare the efficiency of their storage structures, their maximum memory usage is also evaluated. Lastly, we show how stably the two algorithms conduct their mining work on databases featuring gradually increasing numbers of items. With respect to mining time and transaction processing, hMiner is faster than Lossy Counting. Since hMiner stores candidate frequent patterns in a hash structure, it can access candidate frequent patterns directly, whereas Lossy Counting stores them in a lattice structure and therefore has to search multiple nodes to access them. On the other hand, hMiner shows worse performance than Lossy Counting in terms of maximum memory usage. hMiner must keep all of the information for candidate frequent patterns in order to store them in hash buckets, while Lossy Counting reduces this information by using the lattice method. Since the storage of Lossy Counting can share items concurrently included in multiple patterns, its memory usage is more efficient than that of hMiner. However, hMiner shows better scalability than Lossy Counting for the following reasons: as the number of items increases, the number of shared items decreases, which weakens Lossy Counting's memory efficiency; furthermore, as the number of transactions grows, its pruning effect worsens. From the experimental results, we conclude that landmark window-based frequent pattern mining algorithms are suitable for real-time systems although they require a significant amount of memory. Hence, their data structures need to be made more efficient so that they can also be used in resource-constrained environments such as WSNs (wireless sensor networks).
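
For reference, the sketch below implements classic Lossy Counting over a stream of single items; the papers compared above mine itemsets, but the single-item version keeps the prune-at-bucket-boundary idea visible in a few lines. The stream contents and the error parameter epsilon are illustrative.

```python
from math import ceil

def lossy_counting(stream, epsilon):
    """Classic Lossy Counting over a stream of single items.

    Every item whose true frequency exceeds epsilon * N is retained, and no
    stored count underestimates the true count by more than epsilon * N.
    """
    width = ceil(1 / epsilon)            # bucket width
    counts, deltas = {}, {}
    bucket = 1
    for n, item in enumerate(stream, start=1):
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
            deltas[item] = bucket - 1    # maximum possible undercount so far
        if n % width == 0:               # bucket boundary: prune light entries
            for it in [i for i in counts if counts[i] + deltas[i] <= bucket]:
                del counts[it], deltas[it]
            bucket += 1
    return counts, deltas

if __name__ == "__main__":
    stream = list("ababacabadabacabae")
    counts, _ = lossy_counting(stream, epsilon=0.2)
    print(counts)
    # {'a': 9, 'b': 5, 'e': 1}: the heavy items survive every prune,
    # while 'e' only remains because it arrived after the last boundary
```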

Cause Analysis of Ship Rework Problems Based on Process Mining (프로세스 마이닝 기반의 선박 재작업 문제에 대한 원인 분석)

  • 신승훈;전정환
    • 산업혁신연구
    • /
    • v.34 no.4
    • /
    • pp.231-254
    • /
    • 2018
  • The purpose of this study is to analyze and remedy the irregular rework problems in the shipbuilding industry. We collected and refined the rework event log from the SAP (ERP) system of an outsourcing company of SPP Shipbuilding. Process discovery and analysis were performed with fuzzy mining, pattern analysis, and bottleneck analysis among the process mining methods. The rework at SPP Shipbuilding was found to consist of production rework and design-change rework, with ship owner requirements, design improvements, and immature field work identified as the causes. In addition, case analysis should be carried out in order to prevent the PB (ship painting)-2A (ship owner request)-P1 (production rework) sequence, which is the most common pattern in the rework process. First, in the shipbuilding industry, the rework process model and the analysis of the causes of rework contribute to the prevention and minimization of rework, productivity improvement, and cost reduction. Second, the process mining methodology is analyzed and guidelines for future research are presented. Finally, the study is significant in that it contributes to improved business capability by systematically presenting the process model, problems, and causes to enterprise managers.

Adaptive Frequent Pattern Algorithm using CAWFP-Tree based on RHadoop Platform (RHadoop 플랫폼기반 CAWFP-Tree를 이용한 적응 빈발 패턴 알고리즘)

  • Park, In-Kyu
    • Journal of Digital Convergence
    • /
    • v.15 no.6
    • /
    • pp.229-236
    • /
    • 2017
  • An efficient frequent pattern algorithm is essential for mining association rules as well as many other mining tasks for convergence, with applications spread over a very broad spectrum. Models for pattern mining have been proposed that use an FP-tree to store compressed information about frequent patterns. In this paper, we propose a centroid frequent pattern growth algorithm, called CAWFP-Growth, which enhances the FP-Growth algorithm by computing the centroid of weights and frequencies for the itemsets. Because the conventional constraint of maximum weighted support is not necessary to maintain the downward closure property, the algorithm is likely to reduce both the search time and the information loss of the frequent patterns. The experimental results show that the proposed algorithm achieves better performance than other algorithms, without sacrificing accuracy or increasing processing time, via the centroid of the items. A MapReduce framework model is provided to handle large amounts of data in a pseudo-distributed computing environment. In addition, the proposed algorithm still needs to be modeled in the fully distributed mode.
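
The centroid construction of CAWFP-Growth is not spelled out in the abstract, so the sketch below only illustrates the generic idea of biasing support toward important items, scoring each itemset by its support times the mean weight of its items. The transactions, weights, and scoring rule are assumptions, not the paper's definition.

```python
from itertools import combinations

def weighted_supports(transactions, weights, max_size=3):
    """Generic weighted-support sketch (assumed; not the CAWFP centroid).

    The score of an itemset is its support multiplied by the mean weight of
    its items, a common way to favor important items during mining.
    """
    items = sorted({i for t in transactions for i in t})
    scores = {}
    for k in range(1, max_size + 1):
        for itemset in combinations(items, k):
            support = sum(1 for t in transactions if set(itemset) <= t)
            if support == 0:
                continue
            mean_w = sum(weights[i] for i in itemset) / k
            scores[itemset] = support * mean_w
    return scores

if __name__ == "__main__":
    transactions = [{"tv", "hdmi"}, {"tv", "soundbar"},
                    {"tv", "hdmi", "soundbar"}]
    weights = {"tv": 1.0, "hdmi": 0.3, "soundbar": 0.6}
    top = sorted(weighted_supports(transactions, weights).items(),
                 key=lambda kv: kv[1], reverse=True)[:3]
    print(top)   # ('tv',) ranks first, then ('soundbar', 'tv'), ('hdmi', 'tv')
```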