• Title/Summary/Keyword: Log Clustering

An Application of the Clustering Threshold Gradient Descent Regularization Method for Selecting Genes in Predicting the Survival Time of Lung Carcinomas

  • Lee, Seung-Yeoun;Kim, Young-Chul
    • Genomics & Informatics / v.5 no.3 / pp.95-101 / 2007
  • In this paper, we consider the variable selection methods in the Cox model when a large number of gene expression levels are involved with survival time. Deciding which genes are associated with survival time has been a challenging problem because of the large number of genes and relatively small sample size (n<

A Form Clustering Algorithm for Web-based Application Reengineering (웹 응용 재구성을 위한 폼 클러스터링 알고리즘)

  • 최상수;박학수;이강수
    • The Journal of Society for e-Business Studies / v.8 no.2 / pp.77-98 / 2003
  • A web-based information system, the dominant type of information system, suffers from the "web crisis" in its development and maintenance. To cope with this problem, software clustering technology for web-based applications, a branch of web engineering, is strongly needed. In this paper, we propose a Form Clustering Algorithm, along with an application example, for reengineering an internal system into a web-based information system. The algorithm focuses on the page model, which is the web-specific feature among the structural models of a web-based information system. Specifically, we apply a distance matrix to the graph-form navigation model to simplify analysis, and we use web log analysis to identify core function objects that carry a heavy load. We also create a web software structure that maximizes reusability and assigns hardware effectively through a two-phase clustering step. The Form Clustering Algorithm can be used in web-based information system development and maintenance for reusable web component development and hardware assignment, respectively.
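As a rough illustration of the distance-matrix step mentioned in this abstract, the sketch below computes hop distances between forms in a navigation graph using breadth-first search. It is not the authors' algorithm: the `NAVIGATION` graph, the form names, and the choice of hop count as the distance measure are all assumptions made for the example.

```python
from collections import deque

# Hypothetical navigation model: each form maps to the forms it links to.
NAVIGATION = {
    "login": ["menu"],
    "menu": ["order", "search"],
    "search": ["order"],
    "order": ["receipt"],
    "receipt": [],
}

def hop_distances(graph, start):
    """Breadth-first search: navigation hops from start to each reachable form."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def distance_matrix(graph):
    """All-pairs hop distances; unreachable pairs are float('inf')."""
    forms = sorted(graph)
    matrix = {}
    for src in forms:
        reach = hop_distances(graph, src)
        matrix[src] = {dst: reach.get(dst, float("inf")) for dst in forms}
    return matrix

matrix = distance_matrix(NAVIGATION)
```

Forms that are few hops apart in such a matrix would be natural candidates for the same cluster; how the paper actually combines the matrix with web log load data is not stated in the abstract.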

A Study on ALTIBASE™ LOG ANALYZER method for highly scalable, high-availability (고확장성, 고가용성을 위한 ALTIBASE™ LOG ANALYZER 기법에 관한 연구)

  • Yang, Hyeong-Sik;Kim, Sun-Bae
    • Journal of Digital Convergence / v.12 no.5 / pp.1-12 / 2014
  • Recently, the need for non-stop service has been growing as mission-critical business such as Internet banking, e-payment, e-commerce, home shopping, securities trading, and petition services increases, and research on high-availability techniques such as clustering and redundancy for existing single databases is increasing accordingly. In addition to redundancy techniques, the ALTIBASE™ Log Analyzer (hereafter, ALA) provides an API based on the active log, offering scalability and communication between systems of the same model or between heterogeneous systems. In this paper, we present the design of a database system that uses ALA to satisfy high scalability, high availability, and real-time synchronization, and we evaluate the performance of ALA.

Analysis Framework using Process Mining for Block Movement Process in Shipyards (조선 산업에서 프로세스 마이닝을 이용한 블록 이동 프로세스 분석 프레임워크 개발)

  • Lee, Dongha;Bae, Hyerim
    • Journal of Korean Institute of Industrial Engineers / v.39 no.6 / pp.577-586 / 2013
  • In a shipyard, block movement is hard to predict because of the uncertainty that arises over the long period of shipbuilding operations. For this reason, block movement is rarely scheduled, whereas main operations such as assembly, outfitting, and painting are scheduled properly. Nonetheless, the high operating cost of block movement compels task managers to attempt to manage it. To resolve this dilemma, this paper proposes a new block movement analysis framework consisting of the following operations: understanding the entire process, clustering logs to obtain manageable processes, discovering the process model, and detecting exceptional processes. Among process mining technologies, the proposed framework applies fuzzy mining and trace clustering to find the main process and define process models easily. We also propose additional methodologies, including adjusting the semantic expression level of process instances to obtain an interpretable process model, defining a process model for each cluster, and detecting exceptional processes. The effectiveness of the proposed framework was verified in a case study using real-world event logs generated from the Block Process Monitoring System (BPMS).

Intrusion Detection on IoT Services using Event Network Correlation (이벤트 네트워크 상관분석을 이용한 IoT 서비스에서의 침입탐지)

  • Park, Boseok;Kim, Sangwook
    • Journal of Korea Multimedia Society / v.23 no.1 / pp.24-30 / 2020
  • As the number of internet-connected appliances and the variety of IoT services rapidly increase, it is hard to protect IT assets with traditional network security techniques. Most traditional network log analysis systems use rule-based mechanisms to reduce the raw logs, but predefined rules cannot detect new attack patterns. There is therefore a need for a mechanism that both reduces congested raw logs and detects new attack patterns. This paper suggests an enterprise security management approach for IoT services using graph and network measures. We model an event network based on a graph of interconnected logs between network devices and IoT gateways. We also suggest a network clustering algorithm that estimates the attack probability of log clusters and detects new attack patterns.

Curriculum Mining Analysis Using Clustering-Based Process Mining (군집화 기반 프로세스 마이닝을 이용한 커리큘럼 마이닝 분석)

  • Joo, Woo-Min;Choi, Jin Young
    • Journal of Korean Society of Industrial and Systems Engineering / v.38 no.4 / pp.45-55 / 2015
  • In this paper, we consider curriculum mining as an application of process mining in the domain of education. The basic objective of curriculum mining is to construct a registration pattern model from logs of registration data. However, students' subject registration patterns are highly unstructured and complicated, producing a so-called spaghetti model, because they contain many different cases and highly diverse behaviors. As a result, registration patterns are typically difficult to model and analyze. Earlier work in the literature addressed this issue by clustering on the features and behaviors of students. However, such data are generally hard to obtain because they are private and qualitative. Therefore, in this paper, we propose a new curriculum mining framework that applies K-means clustering based on subject attributes to resolve the problems caused by the unstructured process model. Specifically, we divide subject attribute data into two parts: categorical and numerical. The categorical attributes are subject name, class classification, and research field, while the numerical attributes are ABEEK goal and semester information. We quantify the categorical attributes by binarization. To choose the number of clusters for K-means clustering, we apply the elbow method using the R-squared value, which represents the proportion of variance explained by a given number of clusters. The performance of the suggested method was verified on a log of student registration data from 'A university' in terms of simplicity and fitness, which are typical performance measures for a process model obtained in process mining.
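The binarization and elbow steps described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the subject data (`fields`, `semesters`) are invented, the K-means is a plain Lloyd's iteration with random initialization, no feature scaling is applied, and R-squared is taken as one minus the ratio of within-cluster to total sum of squares.

```python
import random

def binarize(values):
    """One-hot encode a categorical column: one 0/1 indicator per category."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean(members)
    return labels

def r_squared(points, labels):
    """Share of total variance explained by the clustering."""
    grand = mean(points)
    total = sum(sq_dist(p, grand) for p in points)
    within = 0.0
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        center = mean(members)
        within += sum(sq_dist(p, center) for p in members)
    return 1.0 - within / total if total else 1.0

# Hypothetical subjects: a research field (categorical) and a semester (numerical).
fields = ["AI", "AI", "DB", "DB", "HCI", "HCI"]
semesters = [1.0, 2.0, 1.0, 2.0, 7.0, 8.0]
points = [onehot + [sem] for onehot, sem in zip(binarize(fields), semesters)]

# R-squared for each candidate k; the "elbow" is where the gain levels off.
scores = {k: r_squared(points, kmeans(points, k))
          for k in range(1, len(points) + 1)}
```

Plotting `scores` against k and picking the point where the curve flattens is the usual way to read the elbow; in practice the numerical columns would probably need scaling so they do not dominate the binarized ones.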

Design of Resource Grouping for Desktop Grid Computing and Its Application Methods to Fault-Tolerance (데스크톱 그리드 컴퓨팅을 위한 자원 그룹핑 설계 및 결함포용으로의 적용 방안)

  • Shon, Jin Gon;Gil, Joon-Min
    • Journal of Digital Contents Society / v.14 no.2 / pp.171-178 / 2013
  • Desktop grid computing is a computing paradigm that executes large-scale computing jobs using heterogeneous and volatile desktop resources. However, such a computing environment cannot guarantee the stability and reliability of task execution because desktop resources with different performance can freely join and leave task execution. Therefore, in this paper, we design a resource grouping scheme using the k-means clustering algorithm, with the aim of providing desktop grid computing with stable and reliable task execution. Moreover, we conduct resource grouping using the execution log data of actual desktop grid systems and present methods for applying the resulting desktop resource groups to fault tolerance.

Development of a Personalized Recommendation Procedure Based on Data Mining Techniques for Internet Shopping Malls (인터넷 쇼핑몰을 위한 데이터마이닝 기반 개인별 상품추천방법론의 개발)

  • Kim, Jae-Kyeong;Ahn, Do-Hyun;Cho, Yoon-Ho
    • Journal of Intelligence and Information Systems / v.9 no.3 / pp.177-191 / 2003
  • Recommender systems are a personalized information filtering technology that helps customers find the products they would like to purchase. Collaborative filtering is the most successful recommendation technology, and web usage mining and cluster analysis are widely used in the recommendation field. In this paper, we propose several hybrid collaborative filtering-based recommendation procedures to examine the effect of web usage mining and cluster analysis. Experiments with real e-commerce data show that collaborative filtering using web log data performs recommendation tasks effectively, while using cluster analysis makes them more efficient.
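To make the idea of collaborative filtering over web log data concrete, here is a minimal user-based sketch. It is not one of the hybrid procedures proposed in the paper: the `LOG` records, the event names, and the `WEIGHTS` that turn events into implicit preference scores are all assumptions chosen for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical web log records: (customer, product, event).
LOG = [
    ("u1", "p1", "purchase"), ("u1", "p2", "click"),
    ("u2", "p1", "purchase"), ("u2", "p2", "click"), ("u2", "p3", "purchase"),
    ("u3", "p4", "click"),
]
# Assumed weights for converting log events into implicit preference scores.
WEIGHTS = {"click": 1.0, "basket": 2.0, "purchase": 3.0}

def build_profiles(log):
    """Keep the strongest observed event per (customer, product) pair."""
    profiles = defaultdict(dict)
    for user, product, event in log:
        current = profiles[user].get(product, 0.0)
        profiles[user][product] = max(current, WEIGHTS[event])
    return dict(profiles)

def cosine(a, b):
    """Cosine similarity between two sparse preference profiles."""
    shared = set(a) & set(b)
    dot = sum(a[p] * b[p] for p in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(profiles, user, top_n=3):
    """Score unseen products by similarity-weighted preferences of other users."""
    scores = defaultdict(float)
    for other, items in profiles.items():
        if other == user:
            continue
        sim = cosine(profiles[user], items)
        for product, weight in items.items():
            if product not in profiles[user]:
                scores[product] += sim * weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, score in ranked[:top_n] if score > 0]

profiles = build_profiles(LOG)
```

Clustering customers first and computing similarities only within a cluster would reduce the number of comparisons, which is one plausible reading of the efficiency gain the abstract attributes to cluster analysis.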

Recommendation of Personalized Surveillance Interval of Colonoscopy via Survival Analysis (생존분석을 이용한 맞춤형 대장내시경 검진주기 추천)

  • Gu, Jayeon;Kim, Eun Sun;Kim, Seoung Bum
    • Journal of Korean Institute of Industrial Engineers / v.42 no.2 / pp.129-137 / 2016
  • A colonoscopy is important because it detects polyps in the colon that can lead to colon cancer. How often one needs to repeat a colonoscopy may depend on various factors. The main purpose of this study is to determine a personalized colonoscopy surveillance interval based on patient characteristics, including clinical information. Cluster analysis using the partitioning around medoids algorithm was conducted on 625 patients who had a medical examination at Korea University Anam Hospital, and it identified several patient subgroups. For each cluster, we then performed survival analysis, which provides the probability of having polyps as a function of the number of days until the next visit. The results of the survival analysis indicated that survival distributions differ among patient groups. We believe that the procedure proposed in this study can provide patients with personalized medical information about how often they need to repeat a colonoscopy.
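The per-cluster survival analysis can be illustrated with a Kaplan-Meier estimator. The abstract does not say which estimator the authors used, so Kaplan-Meier is an assumption here, and the cluster names and visit data below are invented. The probability of having polyps by day t corresponds to 1 minus the survival value at t.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each time with an event.

    durations: days until the next visit.
    events: True if polyps were found at that visit, False if censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        found = sum(1 for dur, ev in zip(durations, events) if dur == t and ev)
        leaving = sum(1 for dur in durations if dur == t)
        if found:
            survival *= 1.0 - found / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored cases both leave the risk set
    return curve

# Hypothetical patient clusters: (days until next visit, polyps found?).
clusters = {
    "A": ([100, 200, 200, 300], [True, True, False, True]),
    "B": ([400, 500], [False, True]),
}
curves = {name: kaplan_meier(d, e) for name, (d, e) in clusters.items()}
```

Comparing the curves across clusters shows how quickly each subgroup tends to develop polyps, which is the kind of difference a recommended surveillance interval could be based on.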

A Study of the Classification and Application of Digital Broadcast Program Type based on Machine Learning (머신러닝 기반의 디지털 방송 프로그램 유형 분류 및 활용 방안 연구)

  • Yoon, Sang-Hyeak;Lee, So-Hyun;Kim, Hee-Woong
    • Knowledge Management Research / v.20 no.3 / pp.119-137 / 2019
  • With the recent spread of digital content, more people watch TV programs as digital content on their PCs or mobile devices rather than on TVs. As media use patterns change, the genres (types) of broadcast programs change with the times and with viewers' trends. Programs that were broadcast on TV have been released as digital content, and the perception of the people watching that content has changed accordingly. For this reason, broadcast program genres (types) need to be classified anew on the basis of digital content, separately from the conventional genre classification used by broadcasting companies and relevant industries. Therefore, this study proposes a way to newly classify broadcast programs by applying machine learning to the log data of people watching the programs in online media, and a way to apply the new classification. This study is academically meaningful in that it analyzes and classifies program types on the basis of digital content. It is also meaningful in that it shows how the developed program classification algorithm can be used in relevant industries and, in particular, suggests a strategy and plan for applying it.