• Title/Summary/Keyword: Closed Sequential Pattern

An Efficient Mining for Closed Frequent Sequences (효율적인 닫힌 빈발 시퀀스 마이닝)

  • Kim, Hyung-Geun; Whang, Whan-Kyu
    • Journal of Industrial Technology / v.25 no.A / pp.163-173 / 2005
  • Recent sequential pattern mining algorithms mine all frequent sequences that satisfy a minimum support threshold in a large database. However, when a frequent sequence becomes very long, such mining generates an explosive number of frequent sequences, which is prohibitively expensive in time. In this paper, we propose a novel sequential pattern mining algorithm that uses only closed frequent sequences, which form a small subset of the very large set of frequent sequences. Our algorithm extends sequences using a depth-first search strategy with effective pruning. Using a bitmap representation of the underlying database, we can obtain the closed frequent sequences considerably faster than currently reported methods.
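To make the notion of a closed frequent sequence concrete, the following is a minimal sketch, not the authors' algorithm. It assumes each database sequence is a tuple of single items, enumerates frequent sequences by brute force (suitable only for toy inputs), and keeps a sequence only if no longer frequent sequence contains it with the same support. All function names are hypothetical.

```python
from itertools import combinations


def is_subsequence(short, long):
    """Return True if `short` can be obtained from `long` by deleting items."""
    it = iter(long)
    return all(item in it for item in short)


def support(pattern, database):
    """Number of database sequences that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def frequent_sequences(database, min_support):
    """Brute-force enumeration of frequent sequences (toy inputs only)."""
    found = {}
    for seq in database:
        for length in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), length):
                pattern = tuple(seq[i] for i in idx)
                if pattern not in found:
                    found[pattern] = support(pattern, database)
    return {p: s for p, s in found.items() if s >= min_support}


def closed_only(frequent):
    """Keep sequences that have no longer super-sequence with equal support."""
    return {
        p: s
        for p, s in frequent.items()
        if not any(
            len(q) > len(p) and t == s and is_subsequence(p, q)
            for q, t in frequent.items()
        )
    }


database = [("a", "b", "c"), ("a", "b", "c"), ("a", "c")]
frequent = frequent_sequences(database, min_support=2)
closed = closed_only(frequent)
print(len(frequent), "frequent sequences,", len(closed), "closed")
```

In this toy database, seven frequent sequences reduce to two closed ones, and the supports of all seven can still be recovered from the closed set.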

Mining Frequent Closed Sequences using a Bitmap Representation (비트맵을 사용한 닫힌 빈발 시퀀스 마이닝)

  • Kim, Hyung-Geun; Whang, Whan-Kyu
    • The KIPS Transactions: Part D / v.12D no.6 s.102 / pp.807-816 / 2005
  • Sequential pattern mining finds all frequent sequences that satisfy a minimum support threshold in a large database. However, when mining long frequent sequences, or when using very low support thresholds, the performance of currently reported algorithms often degrades dramatically. In this paper, we propose a novel sequential pattern mining algorithm that uses only closed frequent sequences, which form a small subset of the very large set of frequent sequences. Our algorithm generates candidate sequences using a depth-first search strategy so that they can be pruned effectively. Using a bitmap representation of the underlying database, we can calculate supports efficiently with bit operations and prune sequences in much less time. A performance study shows that our algorithm outperforms previous algorithms.
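The idea of computing supports with bit operations during a depth-first search can be sketched as follows. This is an illustrative simplification in the spirit of bitmap-based miners, not the paper's implementation: it assumes one item per position, stores one integer bitmap per item and per sequence, and omits the closed-sequence pruning. The function names are hypothetical.

```python
def build_bitmaps(database):
    """Map each item to one integer bitmap per sequence.

    Bit i of a bitmap is set when the item occurs at position i.
    """
    items = sorted({item for seq in database for item in seq})
    bitmaps = {item: [0] * len(database) for item in items}
    for sid, seq in enumerate(database):
        for pos, item in enumerate(seq):
            bitmaps[item][sid] |= 1 << pos
    return bitmaps


def extend(pattern_bitmaps, item_bitmaps):
    """Append an item to a pattern using only bit operations."""
    out = []
    for p, i in zip(pattern_bitmaps, item_bitmaps):
        if p == 0:
            out.append(0)  # pattern absent from this sequence
            continue
        lowest = p & -p                # earliest position where the pattern ends
        after = ~((lowest << 1) - 1)   # mask of all strictly later positions
        out.append(after & i)          # item occurrences after that position
    return out


def support(bitmaps):
    """Count sequences whose bitmap has at least one bit set."""
    return sum(1 for b in bitmaps if b)


def mine(database, min_support):
    """Depth-first enumeration of frequent sequences with their supports."""
    item_bitmaps = build_bitmaps(database)
    frequent = {}

    def dfs(pattern, bitmaps):
        frequent[pattern] = support(bitmaps)
        for item, ib in item_bitmaps.items():
            candidate = extend(bitmaps, ib)
            if support(candidate) >= min_support:  # prune infrequent branches
                dfs(pattern + (item,), candidate)

    for item, ib in item_bitmaps.items():
        if support(ib) >= min_support:
            dfs((item,), ib)
    return frequent


result = mine([("a", "b", "c"), ("a", "b", "c"), ("a", "c")], min_support=2)
print(result)
```

Because an infrequent pattern cannot have a frequent extension, the search stops descending as soon as a candidate falls below the threshold.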

Defining the Boundary of Estuarine Management Zone for Estuarine Environmental Management (하구 환경관리를 위한 관리구역 경계 설정방안)

  • Lee, Kang-Hyun; Cho, Hyun-Jeong; Rho, Baik-Ho; Lee, Chang-Hee
    • The Sea: Journal of the Korean Society of Oceanography / v.17 no.4 / pp.203-224 / 2012
  • Defining an estuary and its administrative boundaries is necessary for practical management of the estuarine environment. However, most Korean estuaries lack the scientific data required to define administrative boundaries. For this reason, no systematic method for setting the boundaries has been developed so far. This study proposes adaptive and comprehensive criteria for defining the boundary of an estuary based on the available data, which include physicochemical, geographical, and topographical characteristics and regional data such as land use and socio-economic conditions. An estuary boundary is defined in a sequential manner. First, according to their estuarine circulation pattern, 463 estuaries in Korea were classified into open estuaries and closed estuaries. Then, for each estuary of either type, a water zone boundary is defined considering the physicochemical, geographical, and topographical characteristics and land use. Finally, a land zone boundary is set along the catchment. Applying the proposed criteria, we delineated 274 estuarine management zones in Korea on a trial basis and found that boundaries could be set reasonably while accounting for differences in the level of available data and the regional characteristics of each estuary.

An Interpretable Log Anomaly System Using Bayesian Probability and Closed Sequence Pattern Mining (베이지안 확률 및 폐쇄 순차패턴 마이닝 방식을 이용한 설명가능한 로그 이상탐지 시스템)

  • Yun, Jiyoung; Shin, Gun-Yoon; Kim, Dong-Wook; Kim, Sang-Soo; Han, Myung-Mook
    • Journal of Internet Computing and Services / v.22 no.2 / pp.77-87 / 2021
  • With the development of the Internet and personal computers, varied and complex attacks have begun to emerge. As attacks become more complex, signature-based detection becomes difficult, which has led to research on behavior-based log anomaly detection. Recent work uses deep learning to learn the order of log entries and shows good performance. Despite this, it does not provide any explanation for its predictions. The lack of explanation can make it difficult to find data contamination or vulnerabilities in the model itself. As a result, users lose trust in the model. To address this problem, this work proposes an explainable log anomaly detection system. In this study, log parsing is performed first. Afterward, sequential rules are extracted using Bayesian posterior probability, yielding a rule set of the form "if condition then result, posterior probability". If a sample matches the rule set, it is classified as normal; otherwise, it is an anomaly. We use the HDFS dataset for the experiments and obtain an F1 score of 92.7% on the test set.
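The abstract does not specify how the rules are built, so the following is only a rough sketch of one way an "if condition then result, probability" rule set could work. It assumes that the condition is a fixed-length window of parsed log keys, that the probability is the empirical conditional frequency of the next key in normal training data, and that a sample is normal only when every window matches a rule. The names `extract_rules`, `is_normal`, `window`, and `min_prob` are hypothetical.

```python
from collections import Counter


def extract_rules(sequences, window=2, min_prob=0.5):
    """Estimate P(next key | preceding window) from normal log-key sequences.

    Returns a dict mapping (condition, result) to its estimated probability,
    keeping only rules whose probability is at least `min_prob`.
    """
    cond_counts = Counter()
    pair_counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - window):
            cond = tuple(seq[i:i + window])
            nxt = seq[i + window]
            cond_counts[cond] += 1
            pair_counts[(cond, nxt)] += 1
    rules = {}
    for (cond, nxt), count in pair_counts.items():
        prob = count / cond_counts[cond]
        if prob >= min_prob:
            rules[(cond, nxt)] = prob
    return rules


def is_normal(seq, rules, window=2):
    """A sequence is normal when every (window, next key) pair matches a rule."""
    return all(
        (tuple(seq[i:i + window]), seq[i + window]) in rules
        for i in range(len(seq) - window)
    )


training = [
    ["open", "write", "close"],
    ["open", "write", "close"],
    ["open", "write", "write"],
]
rules = extract_rules(training)
for (cond, result), prob in rules.items():
    print(f"if {cond} then {result}, probability {prob:.2f}")
print(is_normal(["open", "write", "close"], rules))
print(is_normal(["open", "write", "write"], rules))
```

Because each decision traces back to a specific rule and its probability, a flagged sample can be explained by pointing to the window that matched no rule.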