• Title/Summary/Keyword: Log Clustering

PROCL: A Process Log Clustering System (PROCL:프로세스 로그 클러스터링 시스템)

  • Jung, Jae-Yoon
    • The Journal of Society for e-Business Studies
    • /
    • v.13 no.2
    • /
    • pp.181-194
    • /
    • 2008
  • Process mining aims to extract useful information from the system logs of business process executions. As process-aware information systems such as BPMS, ERP, and SCM spread, research on process mining is gaining significance. In this paper, we propose a methodology for clustering process logs before process mining and present a prototype system. The proposed methodology can be used together with existing process mining algorithms to improve their performance. The process log clustering system presented in this paper, PROCL, classifies the process instances in a system log so that a process model at the appropriate level of detail can be extracted according to the users' needs. The proposed methodology was implemented on ProM, an open platform for process mining.

Compositional data analysis by the square-root transformation: Application to NBA USG% data

  • Jeseok Lee;Byungwon Kim
    • Communications for Statistical Applications and Methods
    • /
    • v.31 no.3
    • /
    • pp.349-363
    • /
    • 2024
  • Compositional data are data in which the component values sum to a constant. The sample space is therefore a simplex, which makes it impossible to directly apply statistical methods developed for the usual Euclidean vector space. A natural way to overcome this restriction is to apply a transformation that moves the sample space onto Euclidean space, and log-ratio type transformations, such as the additive log-ratio (ALR), the centered log-ratio (CLR), and the isometric log-ratio (ILR) transformations, have been used most often. However, in scenarios with sparsity, where certain components take exact zero values, these log-ratio type transformations may not be effective. In this work, we propose an alternative, the square-root transformation, which moves the original sample space onto the directional space. We compare the square-root transformation with the log-ratio type transformations through a simulation study and a real data example. In the real data example, we applied both types of transformations to USG% data obtained from the NBA and used a density-based clustering method, DBSCAN (density-based spatial clustering of applications with noise), to show the result.
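The contrast between the two families of transformations can be illustrated with a minimal sketch. It assumes the standard definitions (element-wise square root of the closed composition, and CLR as the log of each part minus the mean log); the function names and the sample shares are hypothetical and are not taken from the paper.

```python
import math

def sqrt_transform(composition):
    """Map a composition with non-negative parts onto the unit sphere."""
    total = sum(composition)
    # Close the composition to sum 1, then take element-wise square roots.
    return [math.sqrt(part / total) for part in composition]

def clr_transform(composition):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(part) for part in composition]  # fails on an exact zero
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical shares; the second composition contains an exact zero.
dense = [0.5, 0.3, 0.2]
sparse = [0.7, 0.3, 0.0]

for composition in (dense, sparse):
    point = sqrt_transform(composition)
    squared_norm = sum(value * value for value in point)
    print(point, squared_norm)  # the squared norm is 1 up to rounding

try:
    clr_transform(sparse)
    clr_failed = False
except ValueError:
    # math.log rejects zero, so CLR is undefined for the sparse composition.
    clr_failed = True
print("CLR failed on sparse composition:", clr_failed)
```

The square-root image always has unit Euclidean norm, which is why the transformed data live on a sphere (a directional space) and zeros pose no difficulty.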

Improving Process Mining with Trace Clustering (자취 군집화를 통한 프로세스 마이닝의 성능 개선)

  • Song, Min-Seok;Gunther, C.W.;van der Aalst, W.M.P.;Jung, Jae-Yoon
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.34 no.4
    • /
    • pp.460-469
    • /
    • 2008
  • Process mining aims to mine valuable information from process execution results (called "event logs"). Even though process mining techniques have proven to be a valuable tool, the models mined from real process logs are usually too complex to interpret. The main cause of this complexity is the diversity of process logs. To address this issue, this paper proposes a trace clustering approach that splits a process log into homogeneous subsets and applies existing process mining techniques to each subset. Based on log profiles derived from a process log, the approach uses existing clustering techniques to derive clusters. Our approach is implemented in the ProM framework. To illustrate it, a real-life case study is also presented.
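A minimal sketch of the idea, under stated assumptions: each trace is turned into an activity-count vector (one possible kind of log profile) and the vectors are grouped with a small k-means. The abstract does not fix the profile or the clustering technique, so both choices, along with the toy log and all function names, are illustrative only.

```python
from collections import Counter

def activity_profile(trace, activities):
    """Count how often each activity occurs in one trace."""
    counts = Counter(trace)
    return [counts[activity] for activity in activities]

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def k_means(vectors, k, iterations=10):
    """Tiny k-means; centroids start at the first k vectors (deterministic)."""
    centroids = [list(vector) for vector in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(vector, centroids[c]))
            for vector in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical event log: each trace is the activity sequence of one case.
log = [
    ["register", "check", "approve", "pay"],
    ["register", "check", "reject"],
    ["register", "check", "approve", "pay"],
    ["register", "check", "check", "reject"],
]

activities = sorted({activity for trace in log for activity in trace})
profiles = [activity_profile(trace, activities) for trace in log]
labels = k_means(profiles, k=2)

# Split the log into one sub-log per cluster; each could then be mined separately.
sublogs = {}
for trace, label in zip(log, labels):
    sublogs.setdefault(label, []).append(trace)
print(labels)
```

On this toy log the approved and rejected cases end up in separate sub-logs, so each sub-log would yield a simpler model than the combined log.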

Clustering Character Tendencies found in the User Log of a Story Database Service and Analysis of Character Types (스토리 검색 서비스의 사용자 기록에 나타난 인물 성향 군집화 및 유형 분석)

  • Kim, Myoung-Jun
    • Journal of Digital Contents Society
    • /
    • v.17 no.5
    • /
    • pp.383-390
    • /
    • 2016
  • The story database service examined here provides story synopses that match a user's query. This paper presents a classification of character types obtained by clustering the character tendencies found in the service's user log. We also present a method for visualizing the genre-action relationships of each character type, and investigate the genre-action relationships of the major character types. We found that a small number of character types can represent more than half of the character tendencies and that character types tend to be related to particular genres and actions. Given these properties, it would be desirable to provide creative writing support classified by character type.

Utilization Pattern Analysis of an Enterprise Information System using Event Log Data (로그 데이터를 이용한 기업 정보 시스템의 사용 패턴 분석)

  • Han, Kwan Hee
    • The Journal of the Korea Contents Association
    • /
    • v.22 no.10
    • /
    • pp.723-732
    • /
    • 2022
  • The success of an enterprise information system (EIS) is crucial for aligning with corporate strategies and eventually attaining corporate goals. Since system use is one of the factors in information system success, managerial efforts to measure the level of EIS utilization are vital. In this paper, the EIS utilization level is analyzed using system access log data. In particular, in addition to basic access log statistics, process sequence patterns and clusters of similar functions are identified in more detail using a process mining method. The results of this research can be used to improve existing information system design by finding real IS usage sequences and function clusters.

Utilization of Information through Web Log Analysis

  • 김석기;안정용;한경수;한범수
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2000.11a
    • /
    • pp.123-127
    • /
    • 2000
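  • The Internet is widely used as a tool for data storage and services, and in the process logs are generated that record information about visitors to a web server. These logs contain information such as the visitor's address, the referring page, and the time of the visit. Research is actively under way on applying statistical analyses of web logs, such as pattern analysis, clustering, and classification, to generate new information, for example the items that interest visitors or the associations among items, and on using that information in web design or business. This study introduces web log analysis and proposes approaches for carrying it out.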
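As a minimal sketch of how the fields mentioned above (visitor address, referring page, visit time) might be extracted, the following parses one line in the widely used Combined Log Format. The abstract does not name a log format, so the format, the pattern, and the sample line are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Combined Log Format: address, identity, user, [time], "request",
# status, size, "referrer", "user agent".
COMBINED_LOG_PATTERN = re.compile(
    r'(?P<address>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED_LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the bracketed timestamp into a timezone-aware datetime.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields

# Hypothetical log line, invented for this example.
sample = (
    '192.0.2.10 - - [10/Nov/2000:13:55:36 +0900] "GET /products.html HTTP/1.0" '
    '200 2326 "http://example.com/index.html" "Mozilla/4.08"'
)
record = parse_line(sample)
print(record["address"], record["referrer"], record["time"].isoformat())
```

Records parsed this way can be grouped by address or referring page before any pattern, clustering, or classification analysis is applied.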

Comparisons on Clustering Methods: Use of LMS Log Variables on Academic Courses

  • Jo, Il-Hyun;PARK, Yeonjeong;SONG, Jongwoo
    • Educational Technology International
    • /
    • v.18 no.2
    • /
    • pp.159-191
    • /
    • 2017
  • Academic analytics guides university decision-makers in allocating limited resources more effectively. In particular, clustering diverse academic courses by their usage patterns and levels on a Learning Management System (LMS) helps in understanding instructors' pedagogical approaches and the level of technology integration. Further, the clustering results can contribute to deciding the proper range and level of financial and technical support. However, despite the diversity of analytic methodologies, clustering analysis methods often provide different results. The purpose of this study is to present implications by using three different clustering methods: the Gaussian Mixture Model, K-Means clustering, and Hierarchical clustering. As a case, we clustered academic courses in higher education based on their LMS usage levels and patterns using these three techniques. In this study, 2,639 courses opened during the 2013 fall semester at a large private university in South Korea were analyzed with 13 observation variables that represent the characteristics of academic courses. The results show the strengths and weaknesses of each clustering method and suggest that academic leaders and university staff should examine LMS usage levels and patterns from a more elaborate perspective and take an integrated approach with different analytic methods in their strategic decisions on LMS development.

A Density-based Clustering Method

  • Ahn, Sung Mahn;Baik, Sung Wook
    • Communications for Statistical Applications and Methods
    • /
    • v.9 no.3
    • /
    • pp.715-723
    • /
    • 2002
  • This paper shows a clustering application of a density estimation method that uses the Gaussian mixture model. We define a "closeness measure" as a clustering criterion that indicates how close two given Gaussian components are. The closeness measure is defined as the ratio of log likelihood between two Gaussian components. According to simulations using artificial data, the clustering algorithm proved very powerful in that it can correctly determine clusters in complex situations, and very flexible in that it can produce clusters of different sizes based on different threshold values.

A Study on the Search Behavior of Digital Library Users: Focus on the Network Analysis of Search Log Data (디지털 도서관 이용자의 검색행태 연구 - 검색 로그 데이터의 네트워크 분석을 중심으로 -)

  • Lee, Soo-Sang;Wei, Cheng-Guang
    • Journal of Korean Library and Information Science Society
    • /
    • v.40 no.4
    • /
    • pp.139-158
    • /
    • 2009
  • This paper uses network analysis to examine a variety of attributes of users' search behavior as it appears in search access log data. The results are as follows. First, the network structure is represented according to the similarity of the queries that users entered. Second, particular searchers who occupy central positions in the network can be identified. Third, some queries are shared between an ego searcher and alter searchers. Fourth, the searchers can be divided into sub-groups through clustering analysis. The study presents a new algorithm for recommending associated searchers and search queries based on social network analysis, which is expected to be of practical use.

Personalized Product Recommendation Method for Analyzing User Behavior Using DeepFM

  • Xu, Jianqiang;Hu, Zhujiao;Zou, Junzhong
    • Journal of Information Processing Systems
    • /
    • v.17 no.2
    • /
    • pp.369-384
    • /
    • 2021
  • In a personalized product recommendation system, when the log data are large or sparse, the accuracy of the model's recommendations is greatly degraded. To solve this problem, a personalized product recommendation method is proposed that uses a deep factorization machine (DeepFM) to analyze user behavior. First, the K-means clustering algorithm is used to cluster the original log data by similarity in order to reduce the data dimension. Then, through the DeepFM parameter sharing strategy, the relationship between low- and high-order feature combinations is learned from the log data, and a click-through rate prediction model is constructed. Finally, based on the predicted click-through rate, products are recommended to users in sequence and the results are fed back. The area under the curve (AUC) and Logloss of the proposed method are 0.8834 and 0.0253, respectively, on the Criteo dataset, and 0.7836 and 0.0348 on the KDD2012 Cup dataset. Compared with other recent recommendation methods, the proposed method achieves better recommendation performance.
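For readers unfamiliar with the Logloss metric quoted above, here is a minimal sketch of the usual definition for click-through prediction, the mean binary cross-entropy. That the paper uses exactly this definition is an assumption; the labels and predicted probabilities are invented for illustration.

```python
import math

def log_loss(labels, probabilities, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for label, probability in zip(labels, probabilities):
        # Clip so that log(0) never occurs for over-confident predictions.
        p = min(max(probability, eps), 1.0 - eps)
        total += -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    return total / len(labels)

# Hypothetical click labels (1 = clicked) and predicted click-through rates.
clicks = [1, 0, 0, 1]
predicted = [0.9, 0.2, 0.1, 0.7]
print(log_loss(clicks, predicted))
```

Lower values are better: a model that always predicts 0.5 scores ln 2 (about 0.693), and the score approaches 0 as the predictions approach the true labels.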