A Web-based System for Business Process Discovery: Leveraging the SICN-Oriented Process Mining Algorithm with Django, Cytoscape, and Graphviz

  • Thanh-Hai Nguyen (Thai Nguyen University)
  • Kyoung-Sook Kim (Contents Convergence Software Research Institute, Kyonggi University)
  • Dinh-Lam Pham (Contents Convergence Software Research Institute, Kyonggi University)
  • Kwanghoon Pio Kim (Division of AI Computer Science and Engineering, Kyonggi University)
  • Received : 2024.01.04
  • Accepted : 2024.08.04
  • Published : 2024.08.31

Abstract

In this paper, we introduce a web-based system that combines the ρ(rho)-algorithm, a Structured Information Control Net (SICN)-oriented process mining algorithm, with open-source platforms, including Django, Graphviz, and Cytoscape, to facilitate the rediscovery and visualization of business process models. Our approach discovers SICN-oriented process models from the process instances recorded in IEEE XES-formatted process enactment event logs. The discovery is performed by the ρ-algorithm, and the visualization output is written to either a JSON or a DOT file, matching the input requirements of Cytoscape and Graphviz, respectively. The proposed system is built on the Django platform, which enables the creation of a user-friendly web interface. This interface offers a clear, concise, modern, and interactive visualization of the rediscovered business processes, fostering an intuitive exploration experience. The experiment conducted on our web-based process discovery system demonstrates its ability and efficiency, showing that the system is a valuable tool for discovering business process models from process event logs. Its development not only contributes to the advancement of process mining but also serves as an educational resource. Readers, students, and practitioners interested in process mining can use this system as a completely free process miner to gain hands-on experience in rediscovering and visualizing process models from event logs.

Keywords

1. Introduction

In today's Industry 4.0, organizations must continuously innovate and optimize their business processes to remain competitive in a rapidly evolving business environment. Business process discovery and mining tools have emerged as essential support systems for organizations to achieve these goals [1]. Business process miners are software systems that apply process mining techniques to extract knowledge from event logs generated by process-aware information systems [2]. Process discovery tools enable organizations to understand their current business processes better, identify areas for improvement, and make process-driven decisions [3] to improve their operations. They reveal hidden patterns and complicated relationships in the workflow log, providing more information about what is happening in the organization [4] and leading to increased transparency, reduced costs, and improved organizational performance [5].

In business process discovery, an increasing number of tools support various aspects of business operations, such as ProM, BPMN.io, Celonis, and Disco. Business process discovery engines use process mining techniques and algorithms such as the σ-algorithm [6], ρ-algorithm [7], Alpha Miner [1], Inductive Miner [8], and Heuristic Miner [9]. These tools represent process models in modelling languages such as Petri net, Business Process Model and Notation, Process Tree, or Directly-Follows Graph [4], which are among the most well-known languages for modelling business processes. Besides Petri net [10], the Information Control Net (ICN) model [11] is also a popular approach that enables the representation and analysis of business processes.

This paper proposes a web-based business process discovery system that mines process models with the ρ-algorithm, a SICN-oriented process mining algorithm [7]. The functional correctness of the algorithm has been experimentally verified and validated [12]: it can discover all the SICN-oriented process patterns, namely linear (sequential), disjunctive (selective-OR), conjunctive (parallel-AND), and repetitive (iterative-LOOP) process patterns, from process enactment event logs, and eventually builds a structured business process model by assembling all the discovered structural process patterns. We deployed this tool using Django [13], a Python web platform that is flexible and extensible for data management and for building interactive web applications. To visualize the SICN-oriented process models generated by the system, we used Graphviz [14] and Cytoscape [15], two open-source graph visualization tools. Graphviz allows us to visualize the process model as high-quality images in the SVG format. Cytoscape provides a platform to visualize the process model interactively.

The core contribution of the paper is a web application that discovers the SICN-oriented business process model from IEEE XES process enactment event logs [16]. The web-based nature of the proposed system makes it easy to access and extend, making it a robust tool for individuals or organizations seeking to improve their business operations. This paper describes the architecture, functionality, methodology, and results of the proposed web application. The rest of the paper is organized as follows: Section 2 discusses the background, namely the SICN-oriented process model and the ρ-algorithm. Section 3 presents the architecture of the web-based Business Process Discovery System. Section 4 describes the experiments and results on the dataset, assessing the system's ability to discover a process model from a process event log. Finally, Section 5 concludes this work and discusses future research directions.

2. Background

2.1 Information Control Nets

The ICN model [11] is a popular approach that enables the representation and analysis of business processes. The ICN-based process model has four types of primitive control-flow patterns, as shown in Fig. 1. In these patterns [11][17], circles with labels inside represent activities (here αA, αB, αC, and αD), and the remaining nodes are Gateway-Transitions, each consisting of an OPEN-Transition (open or split gateway activity) and the corresponding CLOSE-Transition (close or join gateway activity). In the sequential pattern, one activity follows another in sequential order, and there is no Gateway-Transition. In the conjunctive pattern, an activity is followed by a conjunctive (parallel) Gateway-Transition (AND gate), drawn as a black dot, and then by two or more activities, all of which are performed next. In the disjunctive pattern, an activity is followed by a disjunctive (decision) Gateway-Transition (OR gate), drawn as a white dot, and then by two or more activities, only one of which is performed next. In the repetitive pattern, an activity is followed by an iterative Gateway-Transition (LOOP gate), drawn as a double empty dot, and the activities inside the loop area are performed repeatedly. In the SICN process modeling methodology [7], a model is structurally well-formed when it satisfies both the matched pairing property and the proper nesting property.

Fig. 1. The control-flow primitives of the ICN model

2.2 The SICN-oriented process mining algorithm

In the field of process mining, the ρ-algorithm [7] is a SICN-oriented process mining algorithm that discovers SICN-based process models from process event logs. The ρ-algorithm consists of three distinct steps, which are as follows:

• Step 1: The first critical stage of the ρ-algorithm creates groups of Adjacent-Activity pairs arranged chronologically. This task extracts a series of pairs of adjacent activities, strictly following their chronological order, from the temporal workcases of the event log associated with process instances.

• Step 2: The Weighted Adjacent-Activity Set and the corresponding Weighted Process Pattern Graph are elaborated in the second step. This step involves methodically assembling all groups of adjacent pairs of activities arranged in chronological order. Each such group corresponds to a specific process instance. This phase creates a comprehensive set of weighted activities, expressed as a Weighted Process Pattern Graph.

• Step 3: The last step is dedicated to the comprehensive discovery of all structural process patterns, leading to the construction of the SICN model. During this step, the algorithm builds the SICN-oriented process model, synthesizing detailed information from all groups of Adjacent-Activity pairs.
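As a rough illustration of Steps 1 and 2, the following minimal sketch extracts chronologically ordered adjacent-activity pairs from each trace and merges them into a weighted edge set. It is not the authors' implementation: the function names, the toy log, and the choice of pair frequency as the weight are assumptions made for illustration, and Step 3 (detecting the gateway patterns) is not covered.

```python
from collections import Counter


def adjacent_pairs(trace):
    """Step 1 (sketch): ordered pairs of adjacent activities in one trace."""
    return list(zip(trace, trace[1:]))


def weighted_pattern_graph(traces):
    """Step 2 (sketch): merge the pairs of all traces into weighted edges.

    The weight of an edge is assumed here to be the number of times the
    pair of adjacent activities occurs across all traces.
    """
    weights = Counter()
    for trace in traces:
        weights.update(adjacent_pairs(trace))
    return weights


# Hypothetical toy log: each trace lists activity names in timestamp order.
traces = [
    ["START", "A", "C", "END"],
    ["START", "B", "D", "END"],
]
graph = weighted_pattern_graph(traces)
for (source, target), weight in sorted(graph.items()):
    print(f"{source} -> {target} (weight {weight})")
```

Keeping the graph as a mapping from (source, target) pairs to weights makes it straightforward to serialize later into either an edge list or a text-based graph description.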

3. System architecture

3.1 General architecture

Our system, illustrated in Fig. 2, consists of two main components. The first component is the back-end, which extracts information from the event log and then runs the ρ-algorithm to build a SICN-oriented process model. The second component is the front-end, which presents the process models, statistical results, and data analysis.

Fig. 2. The architecture of the web-based system.

In the first component, the application back-end is built using Django, a powerful web platform that handles requests, database operations, and server-side logic. To allow users to upload event log data as IEEE XES-formatted files, we integrated Django with a file upload library, providing safe and efficient file handling. The Python programming language is used to build a SICN-oriented process model with the ρ-algorithm and then to convert the result into JSON and DOT files, from which Cytoscape and Graphviz, respectively, generate the model as a graph.
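To make the extraction step more concrete, the sketch below reads activity names from an XES document using only the Python standard library. It is an illustrative assumption rather than the system's actual parser: the function names and the sample log are hypothetical, events are kept in document order, and only the standard concept:name attribute is read.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://www.xes-standard.org/}'."""
    return tag.rsplit("}", 1)[-1]


def read_traces(xes_text):
    """Return one list of activity names per <trace>, in document order."""
    root = ET.fromstring(xes_text)
    if local_name(root.tag) != "log":
        raise ValueError("not an XES document: root element is not <log>")
    traces = []
    for trace in root:
        if local_name(trace.tag) != "trace":
            continue  # skip extensions, globals, classifiers, and so on
        activities = []
        for event in trace:
            if local_name(event.tag) != "event":
                continue  # skip trace-level attributes such as the case id
            attrs = {child.get("key"): child.get("value") for child in event}
            activities.append(attrs.get("concept:name"))
        traces.append(activities)
    return traces


# Hypothetical two-event log used only to exercise the function.
SAMPLE = """<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="case-1"/>
    <event>
      <string key="concept:name" value="A"/>
      <date key="time:timestamp" value="2024-01-01T09:00:00+00:00"/>
    </event>
    <event>
      <string key="concept:name" value="C"/>
      <date key="time:timestamp" value="2024-01-01T09:05:00+00:00"/>
    </event>
  </trace>
</log>"""

print(read_traces(SAMPLE))  # prints [['A', 'C']]
```

A production parser would also need to handle resource and timestamp attributes and very large files, which this sketch deliberately leaves out.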

The second component, the application's user interface, is built using HTML, CSS, and JavaScript. It provides a visualization and analysis view of the discovered process model and allows users to explore the SICN-oriented process models in various ways, such as dragging and dropping nodes and highlighting specific nodes or edges. The graph is displayed in an HTML canvas element, allowing for smooth and responsive interaction. The process model is also rendered as a static image that can be downloaded in SVG format or as a single DOT language file. Our web application architecture provides a flexible and scalable solution for visualizing and analyzing process models. By leveraging Django, Graphviz, Cytoscape, HTML, CSS, and JavaScript, we created a functional and visually appealing system that allows users to explore and understand process models interactively.

3.2 Cytoscape Supported Business Process Discovery

Cytoscape is integrated into our front-end component, providing an interactive platform for users to visualize and explore the process model. To generate the graph data, we used Python to create a JSON file (the Python-generated JSON file) containing the process model information as a graph. The JSON file contains the nodes and edges of the process model, along with their associated properties, such as id, label, weight, and style. The following describes how the data is created and stored in the Python-generated JSON file.

Using JSON allows us to easily store and transmit graph data in a structured format, from which we can visualize process models through Cytoscape in our web application. The Python-generated JSON file follows a specific structure compatible with Cytoscape. The structure includes nodes, edges, weights, labels, and directions. Nodes are objects with unique identifiers, which can be numeric or alphanumeric. Each node can have a series of attributes, such as id, label, and occurrence. For example, a node object might look like the following:

    nodes: {
        "data": {
            "id": "Activity ID",
            "label": "Activity name",
            "occurrence": 100
        }
    }

Meanwhile, edges are defined as objects with unique source and destination node identifiers. Like nodes, edges can have many properties, such as id, source, target, and weight. For example, an edge object might look like this:

    edges: {
        "data": {
            "id": "edge id",
            "source": "node1",
            "target": "node2",
            "weight": 10
        }
    }

Here, weights are numeric values representing the magnitude or importance of a node or edge. Weights can be used to control the appearance of a graph, for example, by adjusting the size or color of nodes or edges based on their weights. Labels are string values that provide a human-readable name or description for a node or edge and help users understand the structure and properties of the graph. Directed edges have a source node and a destination node, and an arrow indicates the direction of the edge. Undirected edges have no direction and are represented as plain lines.
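The node and edge objects described above could be produced along the following lines. This is a minimal sketch, not the system's code: the function name, the edge id scheme, and the sample values are hypothetical, and grouping the objects into "nodes" and "edges" lists is an assumption based on the elements layout that Cytoscape.js accepts.

```python
import json


def to_cytoscape_elements(occurrences, weighted_edges):
    """Build a dict with 'nodes' and 'edges' lists of {"data": {...}} objects.

    occurrences maps an activity name to how often it occurs;
    weighted_edges maps a (source, target) pair to its weight.
    """
    nodes = [
        {"data": {"id": name, "label": name, "occurrence": count}}
        for name, count in occurrences.items()
    ]
    edges = [
        {"data": {"id": f"{source}->{target}", "source": source,
                  "target": target, "weight": weight}}
        for (source, target), weight in weighted_edges.items()
    ]
    return {"nodes": nodes, "edges": edges}


# Hypothetical two-node model.
occurrences = {"A": 1, "C": 1}
weighted_edges = {("A", "C"): 1}
elements = to_cytoscape_elements(occurrences, weighted_edges)
print(json.dumps(elements, indent=2))
```

Because every property lives under the "data" key, the front-end can map weight or occurrence to visual properties such as edge width or node size without changing the file structure.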

The Cytoscape style used in our web application includes a wide range of visualization properties to help users interpret and understand the structure and properties of the process models. We use several types of nodes and edges to represent different entities and relationships in the network.

Node types:

- Blue circle: Represents activities.

- Circle with a white background: Represents OR gates in a SICN-oriented process model.

- Circle with a black background: Represents AND gates in a SICN-oriented process model.

- Double circle with a white background: Represents LOOP gates in a SICN-oriented process model.

Edge types:

- Black edge: Represents a normal arc from the source node to the target node in the process pattern graph.

- Pink edge: When the process pattern graph contains both an arc from node a to node b and an arc from node b back to node a, we show these two arcs in pink. Such an arc could be a normal arc, but it could also be a deliberate noise [7] arc generated at Step 2 of the ρ-algorithm.

- Green edge: Represents a self-loop arc of the process pattern graph.
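The edge legend above amounts to a small classification rule, which can be sketched as follows. The function name and the sample edge set are hypothetical, and the order of the checks (self-loop first) is an assumption, since a self-loop trivially has a reverse arc.

```python
def edge_color(source, target, edges):
    """Pick a display color for the arc source -> target per the legend."""
    if source == target:
        return "green"  # self-loop arc
    if (target, source) in edges:
        return "pink"   # arcs exist in both directions between the nodes
    return "black"      # normal arc


# Hypothetical process pattern graph given as a set of (source, target) arcs.
edges = {("A", "B"), ("B", "A"), ("B", "C"), ("C", "C")}
colors = {edge: edge_color(edge[0], edge[1], edges) for edge in sorted(edges)}
for edge, color in colors.items():
    print(edge, color)
```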

3.3 Graphviz Supported Business Process Discovery

Graphviz [14] is also a popular open-source graph visualization software that visually represents the structural information of abstract graphs and networks. In our web application, we use Graphviz in the back-end component, where it generates diagrams from the input provided through the user interface.

The front-end interface allows users to enter event log data, configure display layout options, and send data to the back-end for processing. The back-end, in turn, generates the output and sends it back to the front-end for display. The Graphviz DOT file format is a text-based format that describes the layout and styling of a graph. The file consists of statements defining graph, node, and edge properties. For example, the content of a basic Graphviz DOT file looks as follows:

    digraph G {
      node [style = filled]
      "START:2" -> "OR_Open_1:2"[label="2"]
      "B:1" -> "D:1"[label="1"]
      "D:1" -> "OR_Close_1:2"[label="1"]
      "A:1" -> "C:1"[label="1"]
      "C:1" -> "OR_Close_1:2"[label="1"]
      "OR_Open_1:2" -> "B:1"[label="1"]
      "OR_Open_1:2" -> "A:1"[label="1"]
      "OR_Close_1:2" -> "END:2"[label="2"]
      "START:2"[width=0.3, shape=circle, fixedsize=true, label="", fillcolor=green]
      "OR_Open_1:2"[width=0.3, shape=circle, fixedsize=true, label="", fillcolor=white]
      "OR_Close_1:2"[width=0.3, shape=circle, fixedsize=true, label="", fillcolor=white]
      "END:2"[width=0.3, shape=circle, fixedsize=true, label="", fillcolor=teal]
    }

In the example, we define a directed graph (digraph) and specify the properties of its nodes and edges. The node statement sets the default style of the nodes (here, filled). Each edge is defined between two nodes using the arrow (->) notation, with a label carrying its weight. The final statements set the size, shape, and fill color of the START, END, and gateway nodes.
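A DOT description of this shape can be generated with plain string formatting, as in the minimal sketch below. The function names and the two-edge sample are hypothetical, and the attribute values simply mirror those in the example; the system's own generator may differ.

```python
def quote(name):
    """Wrap a DOT identifier in double quotes, escaping embedded quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(weighted_edges, special_fill):
    """Render weighted edges and specially styled nodes as DOT text.

    weighted_edges maps (source, target) to a weight used as the edge label;
    special_fill maps START, END, and gateway node names to a fill color.
    """
    lines = ["digraph G {", "  node [style = filled]"]
    for (source, target), weight in weighted_edges.items():
        lines.append(f'  {quote(source)} -> {quote(target)}[label="{weight}"]')
    for node, fill in special_fill.items():
        lines.append(
            f'  {quote(node)}[width=0.3, shape=circle, fixedsize=true, '
            f'label="", fillcolor={fill}]'
        )
    lines.append("}")
    return "\n".join(lines)


# Hypothetical fragment of the model shown above.
edges = {("START:2", "OR_Open_1:2"): 2, ("OR_Open_1:2", "A:1"): 1}
fills = {"START:2": "green", "OR_Open_1:2": "white"}
dot_text = to_dot(edges, fills)
print(dot_text)
```

Assuming Graphviz is installed, saving the text to a file such as model.dot and running `dot -Tsvg model.dot -o model.svg` renders it as an SVG image.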

The difference between Graphviz and Cytoscape here is that the former automatically renders the process graph as a static image, whereas the latter lets the user interactively drag and drop the nodes and edges of the process model in our web application.

4. Experimental Results

4.1 Dataset

To demonstrate and describe the results and evaluate the system's effectiveness, we used the publicly available Teleclaims dataset [18]. This dataset contains information about an insurance company's claims handling process and is commonly used in process mining research. The dataset contains more than 46138 events from over 3512 instances, each containing attributes such as case ID, activity name, timestamp, and resource ID.

4.2 The functions of the system

In terms of functionality, the system provides six main functions, described as follows:

• Data Importing: This function is the starting point of the system. It imports the raw input dataset in the XES format, as depicted in Fig. 3. The system verifies whether the input file conforms to the IEEE XES standard; if validation succeeds, the system uploads the file and extracts the information needed for the next stages.

Fig. 3. The data importing function of the system.

• Data Statistics and Analysis: The statistics function computes statistics from the uploaded XES file. We use data sorting, searching, and filtering functions to perform statistical measurements and generate reports on the traces, events, activities, performers, and other elements of the process enactment event log dataset. The analysis function analyzes the temporal workcases [6] after they have been imported from the XES dataset. The information extracted from the event log reveals the timing of events as well as details about activities, performers, and other relevant attributes. Aggregate functions let us analyze many different aspects of the data. For example, we can identify which activities were performed repeatedly, which may cause bottlenecks in the business process or workflow. Fig. 4 depicts the data statistics and analysis functions of the proposed system.

Fig. 4. The data statistics and analysis functions of the system.

• Graph Visualizing and Manipulating: Fig. 5 depicts the visualizing and manipulating functions of the implemented system. This function uses two powerful open-source platforms, Cytoscape and Graphviz, for graph visualization and analysis. Using these platforms, the system generates and visualizes the process pattern as a graph from the output of the ρ-algorithm. The system also lets users manipulate process models directly as graphs in a simple and convenient way. Users can move the elements of the process graph and adjust the parameters of the process model and the structured information control net model.

Fig. 5. The visualizing and manipulating functions of the system.

• Data Exporting: Finally, after performing data statistics, data analysis, graph visualization, and graph manipulation, the user can now export the information for further reports or research. For statistics and analysis, the system supports exporting data statistics and data analysis to an Excel file, as depicted in Fig. 6. For the process model, the system supports exporting to a text file in JSON and DOT format or exporting as an image file in SVG format.

Fig. 6. The exporting function of the system.
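The repeated-activity analysis mentioned under Data Statistics and Analysis can be approximated with simple counting. The sketch below is an assumption about one reasonable way to compute such figures, not the system's reporting code; the function name and the toy traces are hypothetical.

```python
from collections import Counter


def activity_statistics(traces):
    """Count events per activity and the traces in which an activity repeats.

    Returns (event_counts, repeated_in): event_counts maps an activity to its
    total number of events, and repeated_in maps an activity to the number of
    traces where it occurs more than once.
    """
    event_counts = Counter()
    repeated_in = Counter()
    for trace in traces:
        per_trace = Counter(trace)
        event_counts.update(per_trace)
        for activity, count in per_trace.items():
            if count > 1:
                repeated_in[activity] += 1
    return event_counts, repeated_in


# Hypothetical log with one activity repeated inside the first trace.
traces = [["A", "B", "B", "C"], ["A", "C"]]
event_counts, repeated_in = activity_statistics(traces)
print(dict(event_counts))
print(dict(repeated_in))
```

Activities that repeat in many traces are natural candidates for a closer look when searching for loops or bottlenecks.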

4.3 Result of discovering SICN-oriented process model

The SICN-oriented process model is the primary output of our system and represents the flow of information in a business process. Figs. 7, 8, and 9 show the results generated by our system: Fig. 7 depicts the process pattern in Cytoscape format, Fig. 8 depicts the SICN-oriented process model in Cytoscape format, and Fig. 9 depicts the SICN-oriented process model in Graphviz format. The obtained SICN-oriented process model provides a detailed view of the relationships between the activities from the control-path point of view, helping reveal some key insights into the Teleclaims process. We can easily observe the flow of the process and identify several activities that are closely related and appear to be very important to the process, such as the activity "check if sufficient information is available" at the two facilities B and S. Likewise, by observing the AND Gateway-Transition in the generated graph, we can see that the activities "initiate payment activities", "close claim", and "advise claim" are performed simultaneously in the process. Fig. 10 and Fig. 11 show the corresponding JSON and DOT files, respectively, generated by the system to represent the SICN-oriented process model discovered from the Teleclaims event log.

Fig. 7. The discovered process pattern in Cytoscape format by the system.

Fig. 8. The discovered SICN-oriented process model in Cytoscape format by the system.

Fig. 9. The discovered SICN-oriented process model in Graphviz format by the system.

Fig. 10. The content of the JSON file generated by the system.

Fig. 11. The content of the DOT file generated by the system.

5. Conclusion

In conclusion, this paper introduces a web-based business process discovery system that leverages the SICN-oriented process mining algorithm. Developed with Django, Cytoscape, and Graphviz, the system facilitates the rediscovery and visualization of business process models from event logs. Experimental results on the Teleclaims dataset demonstrate the system's usability, relevance, and simplicity. The web-based application provides a user-friendly and accessible tool for individuals and organizations seeking to enhance their understanding of business processes. The system's versatility and scalability make it a valuable resource for students and practitioners interested in process mining, offering a simple yet powerful platform for learning and exploration. Our future research directions include expanding the system's functionality to incorporate deep learning models for workload predictions and resource allocation [19][20][21]. Our ongoing development aims to further enhance the system's capabilities and contribute to advancements in process mining and business process optimization.

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT, Ministry of Science and ICT), Republic of Korea (Grant No. NRF-2022R1A2C2093002).

References

  1. W. van der Aalst, T. Weijters, and L. Maruster, "Workflow mining: discovering process models from event logs," IEEE Trans. Knowl. Data Eng., vol.16, no.9, pp.1128-1142, 2004. https://doi.org/10.1109/TKDE.2004.47
  2. B. F. van Dongen, A. K. A. de Medeiros, H. M. W. Verbeek, A. J. M. M. Weijters, and W. M. P. van der Aalst, "The ProM Framework: A New Era in Process Mining Tool Support," in Proc. of Applications and Theory of Petri Nets 2005. ICATPN 2005, Lect. Notes Comput. Sci., vol.3536, pp.444-454, 2005.
  3. W. M. P. van der Aalst, "Decision Support Based on Process Mining," Handbook on Decision Support Systems 1: Basic Themes, Springer Berlin Heidelberg, pp.637-657, 2008.
  4. Wil M.P. van der Aalst, "Process Mining: A 360 Degree Overview," Process Mining Handbook. Lecture Notes in Business Information Processing, Springer, Cham, vol.448, pp.3-34, 2022.
  5. Deloitte, "Global Process Mining Survey 2021," Global Process Mining Survey 2021. p.36, 2021. [Online]. Available: https://www2.deloitte.com/kz/en/pages/risk/articles/global-process-mining-survey-2021.html
  6. K. Kwanghoon and C. A. Ellis, "σ-Algorithm: Structured Workflow Process Mining Through Amalgamating Temporal Workcases," Advances in Knowledge Discovery and Data Mining. PAKDD 2007, Lect. Notes Comput. Sci., LNAI, vol.4426, pp.119-130, 2007.
  7. K. S. Kim, D. L. Pham, and K. P. Kim, "ρ-Algorithm: A SICN-Oriented Process Mining Framework," IEEE Access, vol.9, pp.139852-139875, 2021. https://doi.org/10.1109/ACCESS.2021.3119011
  8. S. J. J. Leemans, "Inductive visual Miner manual," pp.1-16, 2017. [Online]. Available: http://promtools.org/
  9. A. J. M. M. Weijters, W. M. P. van der Aalst, and A. K. A. De Medeiros, "Process Mining with the HeuristicsMiner Algorithm," Eindhoven : Technische Universiteit Eindhoven, vol.166, 2006.
  10. J. L. Peterson, "Petri Nets," ACM Computing Surveys (CSUR), vol.9, no.3, pp.223-252, 1977. https://doi.org/10.1145/356698.356702
  11. K. H. Kim and C. A. Ellis, "ICN-Based Workflow Model and its Advances," Handb. Res. Bus. Process Model., pp.142-171, 2009.
  12. K. S. Kim, D. L. Pham, Y. I. Park, and K. P. Kim, "Experimental verification and validation of the SICN-oriented process mining algorithm and system," J. King Saud Univ. - Comput. Inf. Sci., vol.34, no.10, pp.9793-9813, 2022.
  13. Django Software Foundation, Django web framework. [Online]. Available: https://www.djangoproject.com/
  14. Graphviz, Graph Visualization platform. [Online]. Available: https://graphviz.org/
  15. Cytoscape Consortium, Cytoscape software platform. [Online]. Available: https://cytoscape.org/
  16. "IEEE Standard for eXtensible Event Stream (XES) for Achieving Interoperability in Event Logs and Event Streams," IEEE Std 1849-2023 (Revision of IEEE Std 1849-2016), pp.1-55, Sep. 2023.
  17. M. Park and K. Kim, "Control-path Oriented Workflow Intelligence Analyses," J. Inf. Sci. Eng., vol.24, no.2, pp.343-359, 2008.
  18. "Teleclaim dataset, can be found in chapter 8.zip file download," [Online]. Available: https://processmining.org/oldversion/files/chapter 8.zip%0A
  19. M. S. Yeon, Y. K. Lee, D. L. Pham, and K. P. Kim, "Experimental Verification on Human-Centric Network-Based Resource Allocation Approaches for Process-Aware Information Systems," IEEE Access, vol.10, pp.23342-23354, 2022. https://doi.org/10.1109/ACCESS.2022.3152778
  20. A. Abid, M. F. Manzoor, M. S. Farooq, U. Farooq, and M. Hussain, "Challenges and Issues of Resource Allocation Techniques in Cloud Computing," KSII Trans. Internet Inf. Syst., vol.14, no.7, pp.2815-2839, 2020.
  21. D. L. Pham, H. Ahn, K. S. Kim, and K. P. Kim, "Process-Aware Enterprise Social Network Prediction and Experiment Using LSTM Neural Network Models," IEEE Access, vol.9, pp.57922-57940, 2021. https://doi.org/10.1109/ACCESS.2021.3071789