• Title/Summary/Keyword: Workflow Clustering

A MapReduce-Based Workflow BIG-Log Clustering Technique (맵리듀스기반 워크플로우 빅-로그 클러스터링 기법)

  • Jin, Min-Hyuck;Kim, Kwanghoon Pio
    • Journal of Internet Computing and Services / v.20 no.1 / pp.87-96 / 2019
  • In this paper, we propose a MapReduce-based clustering technique that collects and classifies distributed workflow enactment event logs as a preprocessing step. We call these logs Workflow BIG-Logs because they satisfy the 5V properties of big data: Volume, Velocity, Variety, Veracity, and Value. The clustering technique is designed for the preprocessing phase of a specific workflow process mining and analysis algorithm that operates on Workflow BIG-Logs. In other words, it uses the MapReduce framework as the Workflow BIG-Log processing platform, supports the IEEE XES standard data format, and targets the preprocessing phase of the $\rho$-Algorithm, a typical workflow process mining algorithm based on structured information control nets. More precisely, Workflow BIG-Logs can be clustered according to two types of patterns, activity-based and performer-based, and we implement an activity-based clustering pattern algorithm on the MapReduce framework. Finally, we verify the proposed clustering technique through an experimental study on the workflow enactment event log dataset released by the BPI Challenges.
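
The activity-based pattern described in this abstract can be illustrated with a small in-memory sketch of the map, shuffle, and reduce steps. This is not the authors' algorithm and does not use Hadoop or the XES format; the event tuples, field order, and function names are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical event records: (case_id, activity, performer).
EVENTS = [
    ("c1", "register", "alice"),
    ("c1", "review", "bob"),
    ("c2", "register", "carol"),
    ("c2", "review", "bob"),
    ("c2", "approve", "alice"),
]

def map_event(event):
    """Map step: emit (activity, event) so events are keyed by activity."""
    _case_id, activity, _performer = event
    yield activity, event

def shuffle(pairs):
    """Shuffle step: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_cluster(activity, events):
    """Reduce step: summarize the events that share one activity."""
    return {
        "activity": activity,
        "count": len(events),
        "cases": sorted({case_id for case_id, _, _ in events}),
    }

def cluster_by_activity(events):
    pairs = (pair for event in events for pair in map_event(event))
    return {key: reduce_cluster(key, values) for key, values in shuffle(pairs).items()}

clusters = cluster_by_activity(EVENTS)
```

Keying the map output by performer instead of activity would give the performer-based pattern under the same assumptions.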

Workflow Clustering Methodology Using Structural Similarity Metrics (프로세스 유사성을 이용한 워크플로우 클러스터링)

  • Jung, Jae-Yoon;Bae, Joonsoo;Kang, Suk-Ho
    • Journal of Korean Institute of Industrial Engineers / v.33 no.1 / pp.99-109 / 2007
  • To realize process-driven management, many companies have been launching business process management systems. A business process is a collection of standardized and structured tasks that creates value for a company. Moreover, it is recognized as a significant intangible business asset for achieving competitive advantage. This research introduces a novel approach to workflow process analysis, which is becoming more significant as process-aware information systems spread widely among companies. In this paper, a methodology for workflow clustering based on process similarity is proposed. The purpose of workflow clustering is to analyze accumulated process definitions in order to assist the design of new processes and the improvement of existing ones. The proposed methodology exploits measures of structural similarity between workflow processes. The methodology was tested on synthetic process models to illustrate the implications of workflow clustering.
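
As a rough illustration of clustering process definitions by structural similarity, the sketch below represents each process as a set of directed activity transitions and groups models whose Jaccard similarity reaches a threshold. The abstract does not specify the metrics, so Jaccard similarity, the threshold, and the sample models are stand-in assumptions, not the paper's method.

```python
def jaccard(a, b):
    """Jaccard similarity of two edge sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cluster_models(models, threshold):
    """Single-linkage grouping: join models whose similarity >= threshold."""
    names = list(models)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(models[a], models[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical process models as sets of (from_activity, to_activity) edges.
MODELS = {
    "order_v1": {("receive", "check"), ("check", "ship")},
    "order_v2": {("receive", "check"), ("check", "ship"), ("ship", "invoice")},
    "hiring": {("post", "interview"), ("interview", "offer")},
}

groups = cluster_models(MODELS, threshold=0.5)
```

Here the two order processes share two of three transitions, so they fall into one group, while the hiring process stays on its own.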

Scalability Estimations of a Workcase-based Workflow Engine (워크케이스 기반 워크플로우 엔진의 초대형성 성능 평가)

  • Ahn, Hyung-Jin;Park, Min-Jae;Lee, Ki-Won;Kim, Kwang-Hoon
    • Journal of Internet Computing and Services / v.9 no.6 / pp.89-97 / 2008
  • Recently, many organizations such as companies and institutions have demanded the introduction of very large-scale workflow management systems in order to process a large number of business instances. Workflow vendors have focused on physically extending workflow engines through device-level clustering in order to provide very large-scale workflow services. Improving workflow engine performance by simply connecting computer systems physically, without considering the logical-level software architecture, wastes time and cost when building a very large-scale workflow service environment. In this paper, we propose a methodology for performance improvement based on the logical software architecture of the workflow engine. We also compare the scalability of workflow engines that use the activity-instance-based architecture with those that use our proposed workcase-based architecture. The test results show that the software architecture applied to a workflow engine affects its scalability.

Design and Implementation of a Very Large-Scale Workflow Management System (초대형 워크플로우 관리 시스템의 설계 및 구현)

  • Ahn, Hyung-Jin;Kim, Kwang-Hoon
    • Journal of Internet Computing and Services / v.10 no.6 / pp.205-217 / 2009
  • Recently, many organizations such as companies and institutions have demanded the introduction of very large-scale workflow management systems in order to process a large number of business instances. Workflow vendors have focused on physically extending workflow engines through device-level clustering in order to provide very large-scale workflow services. Improving workflow engine performance by simply connecting computer systems physically, without considering the logical-level software architecture, wastes time or cost when building a very large-scale workflow service environment. In this paper, we propose a workcase-based workflow architecture and implement a very large-scale workflow management system based on it. Through a workcase response-time evaluation of the proposed system, we show that the software architecture applied to a workflow engine affects scalability and performance.

Improving Process Mining with Trace Clustering (자취 군집화를 통한 프로세스 마이닝의 성능 개선)

  • Song, Min-Seok;Gunther, C.W.;van der Aalst, W.M.P.;Jung, Jae-Yoon
    • Journal of Korean Institute of Industrial Engineers / v.34 no.4 / pp.460-469 / 2008
  • Process mining aims to extract valuable information from process execution results (called "event logs"). Even though process mining techniques have proven to be valuable tools, the models mined from real process logs are usually too complex to interpret. The main cause of this complexity is the diversity of process logs. To address this issue, this paper proposes a trace clustering approach that splits a process log into homogeneous subsets and applies existing process mining techniques to each subset. Based on log profiles derived from a process log, the approach uses existing clustering techniques to form clusters. Our approach is implemented in the ProM framework. To illustrate it, a real-life case study is also presented.
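
The idea of profiling traces and then clustering them can be sketched as follows. This is a minimal illustration, not the ProM implementation: it assumes an activity-count profile, squared Euclidean distance, and a small deterministic k-means, and the sample traces are invented.

```python
from collections import Counter

def profile(trace, vocab):
    """Activity profile: how often each known activity occurs in the trace."""
    counts = Counter(trace)
    return [counts[activity] for activity in vocab]

def dist2(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def kmeans(points, k, iters=20):
    """Plain k-means with farthest-first initialization (deterministic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical event log: one activity sequence per case.
TRACES = [
    ["register", "review", "approve"],
    ["register", "review", "review", "approve"],
    ["complain", "investigate", "refund"],
    ["complain", "investigate", "investigate", "refund"],
]

vocab = sorted({activity for trace in TRACES for activity in trace})
labels = kmeans([profile(trace, vocab) for trace in TRACES], k=2)

# Split the log into sublogs, each of which could be mined separately.
sublogs = {}
for trace, label in zip(TRACES, labels):
    sublogs.setdefault(label, []).append(trace)
```

Each sublog is more homogeneous than the full log, which is what makes the separately mined models easier to read.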

Runtime Prediction Based on Workload-Aware Clustering (병렬 프로그램 로그 군집화 기반 작업 실행 시간 예측모형 연구)

  • Kim, Eunhye;Park, Ju-Won
    • Journal of Korean Society of Industrial and Systems Engineering / v.38 no.3 / pp.56-63 / 2015
  • Several fields of science demand large-scale workflow support that requires thousands of CPU cores or more. To support such large-scale scientific workflows, high-capacity parallel systems such as supercomputers are widely used. To increase the utilization of these systems, most schedulers use a backfilling policy: small jobs are moved ahead to fill holes in the schedule as long as large jobs are not delayed. Since backfilling requires an estimate of each job's runtime, most parallel systems use the runtime estimated by the user. However, this estimate is found to be extremely inaccurate because users overestimate their jobs. Therefore, in this paper, we propose a novel system for runtime prediction based on workload-aware clustering, with the goal of improving prediction performance. The proposed method for predicting the runtime of parallel applications consists of three main phases. First, feature selection based on factor analysis identifies the important input features. Then, the history data are clustered with a self-organizing map, followed by hierarchical clustering that finds the cluster boundaries from the weight vectors. Finally, prediction models are constructed using support vector regression on the clustered workload data. Using a separate prediction model for each clustered data pattern can reduce the error rate compared with a single model for the whole data set. In the experiments, we use workload logs from parallel systems (i.e., iPSC, LANL-CM5, SDSC-Par95, SDSC-Par96, and CTC-SP2) to evaluate the effectiveness of our approach. Compared with other techniques, experimental results show that the proposed method improves the accuracy up to 69.08%.
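
The claim that one model per cluster can beat a single model for all data can be illustrated with a deliberately simplified sketch. It does not reproduce the paper's pipeline: a fixed size rule stands in for the self-organizing map and hierarchical clustering, a per-group mean stands in for support vector regression, the job history is invented, and the error is measured on the same data used for fitting.

```python
# Hypothetical job history: (requested_cores, runtime_minutes).
JOBS = [(4, 10), (8, 12), (4, 14), (512, 100), (1024, 110), (512, 90)]

def cluster_of(job):
    """Stand-in for the clustering phase: split jobs by size (assumed rule)."""
    cores, _runtime = job
    return "large" if cores >= 64 else "small"

def fit_mean(jobs):
    """Stand-in for the regression phase: predict the mean runtime."""
    return sum(runtime for _, runtime in jobs) / len(jobs)

def mean_abs_error(jobs, predict):
    return sum(abs(job[1] - predict(job)) for job in jobs) / len(jobs)

# Single model fitted on the whole history.
global_mean = fit_mean(JOBS)

# One model per cluster.
groups = {}
for job in JOBS:
    groups.setdefault(cluster_of(job), []).append(job)
cluster_means = {name: fit_mean(members) for name, members in groups.items()}

global_mae = mean_abs_error(JOBS, lambda job: global_mean)
clustered_mae = mean_abs_error(JOBS, lambda job: cluster_means[cluster_of(job)])
```

On this toy history the per-cluster models have a much smaller mean absolute error than the single model, because small and large jobs have very different runtime distributions.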

The application of machine learning for the prognostics and health management of control element drive system

  • Oluwasegun, Adebena;Jung, Jae-Cheon
    • Nuclear Engineering and Technology / v.52 no.10 / pp.2262-2273 / 2020
  • Digital twin technology can provide significant value for the prognostics and health management (PHM) of critical plant components by improving insight into system design and operating conditions. Digital twinning of systems can be used for anomaly detection, diagnosis, and estimation of a system's remaining useful life in order to optimize operations and maintenance processes in a nuclear plant. In this regard, this study presents a conceptual framework for applying digital twin technology to the prognosis of the Control Element Drive Mechanism (CEDM), together with a data-driven approach to anomaly detection using the coil current profile. Health management of plant components can capitalize on the data and signals that are already recorded as part of the monitored parameters of the plant's instrumentation and control systems. This work focuses on developing a machine learning algorithm and workflow for analyzing the CEDM using the recorded coil current data. The workflow involves extracting features from the coil current profile and then applying both clustering and classification algorithms. This approach provides an opportunity for health monitoring in support of condition-based predictive maintenance optimization and for developing the CEDM digital twin model for improved plant safety and availability.

A Study on identifying Common/Uncommon Components and clustering Common Components through Extended Workflow Mechanism (확장된 워크플로우 메커니즘을 통한 공통/비공통 컴포넌트 식별 및 공통 컴포넌트의 클러스터링에 관한 연구)

  • Kim, Yun-Jeong;Kim, R. Young-Chul
    • Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.199-202 / 2004
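  • To address the problems of existing domain analysis for legacy systems, we propose a domain analysis method based on the extended workflow mechanism, a dynamic modeling approach. Through this domain analysis, common and uncommon process components are identified and clusters of common process components are extracted; in the final step, the objects within each component can be extracted using UML techniques. We also present a way to find frequently used or important components and component clusters by applying the proposed component weight measurement matrix.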

Workflow Task Clustering Method Considering Available Resources in Cloud Environments (클라우드 환경에서 가용 자원 활용도를 고려한 워크플로우 작업 클러스터링 기법)

  • Myung, Rohyoung;Jung, Daeyong;Chung, KwangSik;Yu, Heonchang
    • Proceedings of the Korea Information Processing Society Conference / 2015.04a / pp.160-163 / 2015
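  • Workflow management systems enable efficient workflow design and execution for processing today's applications. However, applications aimed at scientific inquiry, such as astrophysics, biology, and geology, must compute over large volumes of data, so it is difficult to finish the work in a short time with a single computing resource. To execute workflows efficiently in a cloud environment, distributed parallel processing that makes efficient use of multiple resources is essential. In general, the master node of the system distributes tasks to the remote nodes of the cluster according to the workflow designed for the application; the waiting time in the queues of the master and remote nodes and the scheduling time for the tasks assigned to the remote nodes then degrade performance. Therefore, this paper improves workflow processing speed by applying a task merging technique that considers computing resource utilization, as an optimization method for reducing the delay before tasks start executing on remote nodes in a cloud environment.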
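
One simple way to picture task merging is to bundle the tasks at each workflow level into as many jobs as there are available workers, so that fewer jobs wait in queues and need scheduling. The sketch below assumes this level-wise, round-robin scheme; the abstract does not describe the actual merging rule, and the workflow, task names, and worker count are hypothetical.

```python
def merge_tasks(tasks_by_level, available_workers):
    """Bundle the tasks of each level into at most `available_workers` jobs."""
    merged = {}
    for level, tasks in tasks_by_level.items():
        n_jobs = max(1, min(available_workers, len(tasks)))
        jobs = [[] for _ in range(n_jobs)]
        for index, task in enumerate(tasks):
            jobs[index % n_jobs].append(task)  # round-robin assignment
        merged[level] = jobs
    return merged

# Hypothetical three-level workflow: split, five parallel alignments, merge.
WORKFLOW = {
    0: ["split"],
    1: ["align_1", "align_2", "align_3", "align_4", "align_5"],
    2: ["merge"],
}

merged = merge_tasks(WORKFLOW, available_workers=2)
```

With two workers, the five parallel tasks at level 1 become two submitted jobs instead of five, which reduces the number of queueing and scheduling steps in this simplified picture.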

Patent data analysis using clique analysis in a keyword network (키워드 네트워크의 클릭 분석을 이용한 특허 데이터 분석)

  • Kim, Hyon Hee;Kim, Donggeon;Jo, Jinnam
    • Journal of the Korean Data and Information Science Society / v.27 no.5 / pp.1273-1284 / 2016
  • In this paper, we analyzed patents on machine learning using keyword network analysis and clique analysis. To construct a keyword network, important keywords were extracted based on their TF-IDF weights and their associations, and network structure analysis and clique analysis were performed. The density and clustering coefficient of the patent keyword network are low, which shows that patent keywords on machine learning are only weakly connected with each other. This is because the important patents on machine learning are mainly registered for application systems of machine learning rather than for machine learning techniques. In addition, the clique analysis showed that the keywords found by cliques in 2005 patents cover subjects such as newsmaker verification, product forecasting, virus detection, biomarkers, and workflow management, while those in 2015 patents cover subjects such as digital imaging, payment cards, calling systems, mammogram systems, and price prediction. Clique analysis can be used not only to identify specialized subjects but also to suggest search keywords in patent search systems.
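
The two network measures mentioned in this abstract, density and clustering coefficient, can be computed for a small keyword co-occurrence network as sketched below. The keyword sets are invented, keywords are linked when they appear in the same document, and the TF-IDF extraction and clique analysis steps are omitted; all of this is an illustrative assumption rather than the paper's procedure.

```python
from itertools import combinations

def build_network(docs):
    """Undirected co-occurrence network: link keywords that share a document."""
    adj = {}
    for keywords in docs:
        for keyword in keywords:
            adj.setdefault(keyword, set())
        for a, b in combinations(sorted(set(keywords)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def density(adj):
    """Fraction of possible edges that are present: 2E / (N (N - 1))."""
    n = len(adj)
    edges = sum(len(neighbors) for neighbors in adj.values()) / 2
    return 2 * edges / (n * (n - 1)) if n > 1 else 0.0

def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    return sum(local_clustering(adj, node) for node in adj) / len(adj)

# Hypothetical keyword sets, one per patent.
DOCS = [
    {"svm", "kernel", "classification"},
    {"classification", "payment"},
    {"payment", "fraud"},
]

network = build_network(DOCS)
net_density = density(network)
net_clustering = average_clustering(network)
```

In this toy network, 5 of the 10 possible edges exist, and only the keywords from the first document form a fully connected group of three, that is, a clique.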