• Title/Summary/Keyword: 2-Step Clustering

High-performance computing for SARS-CoV-2 RNAs clustering: a data science-based genomics approach

  • Oujja, Anas;Abid, Mohamed Riduan;Boumhidi, Jaouad;Bourhnane, Safae;Mourhir, Asmaa;Merchant, Fatima;Benhaddou, Driss
    • Genomics & Informatics
    • /
    • v.19 no.4
    • /
    • pp.49.1-49.11
    • /
    • 2021
  • Genomic data now constitute one of the fastest-growing datasets in the world. By 2025, genomics is expected to become the fourth-largest source of Big Data, which calls for adequate high-performance computing (HPC) platforms to process it. Given the recent unprecedented and unpredictable mutations in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the research community urgently needs ICT tools to process SARS-CoV-2 RNA data, e.g., by classifying it (i.e., clustering), thereby helping to track virus mutations and predict future ones. In this paper, we present an HPC-based SARS-CoV-2 RNA clustering tool. We adopt a data science approach, from data collection, through analysis, to visualization. In the analysis step, we show how our clustering approach leverages HPC and the longest common subsequence (LCS) algorithm. The approach uses the Hadoop MapReduce programming paradigm and adapts the LCS algorithm to efficiently compute the length of the LCS for each pair of SARS-CoV-2 RNA sequences, which are extracted from the U.S. National Center for Biotechnology Information (NCBI) Virus repository. The computed LCS lengths are used to measure dissimilarities between RNA sequences in order to identify existing clusters. In addition, we present a comparative study of the LCS algorithm's performance under variable workloads and different numbers of Hadoop worker nodes.
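
The pairwise LCS step can be illustrated on a single machine. The sketch below uses the classic dynamic-programming recurrence for LCS length with a rolling row to save memory. The normalization `1 - LCS / max(len)` is an assumption made here to turn a length into a dissimilarity in [0, 1]; the abstract does not state the paper's exact formula. The function names are hypothetical. In a MapReduce setting, each pair could plausibly be handled by a separate map task.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Classic O(len(a) * len(b)) dynamic programming, keeping only one row
    of the table at a time (O(len(b)) memory).
    """
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1          # extend the diagonal match
            else:
                curr[j] = max(prev[j], curr[j - 1])  # drop a char from a or b
        prev = curr
    return prev[-1]


def lcs_dissimilarity(a: str, b: str) -> float:
    """Hypothetical normalization: 0.0 for identical sequences, 1.0 if nothing is shared."""
    if not a and not b:
        return 0.0
    return 1.0 - lcs_length(a, b) / max(len(a), len(b))


def pairwise_dissimilarities(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Dissimilarity for every unordered pair of sequence IDs (each pair is independent work)."""
    ids = sorted(seqs)
    return {
        (x, y): lcs_dissimilarity(seqs[x], seqs[y])
        for i, x in enumerate(ids)
        for y in ids[i + 1:]
    }
```

Because the number of pairs grows quadratically with the number of sequences, the independence of each pair is what makes distributing the work across worker nodes attractive.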

Customer Segmentation for IPTV Based on Competitive Resources under the Competition Environment among Broadcasting Media (방송 매체 간 경쟁 상황에서의 활용 자원에 기반한 IPTV 고객 세분화)

  • Suh, Bo-Mil
    • Journal of Information Technology Applications and Management
    • /
    • v.19 no.2
    • /
    • pp.97-116
    • /
    • 2012
  • Since 2008, when IPTV service entered the broadcasting market, competition among interactive broadcasting media has grown increasingly fierce. To develop a market strategy under this harsh competition, this study segmented IPTV customers based on the characteristics of interactive broadcasting media. From previous literature, this study derived five characteristics of interactive broadcasting media: ease of use, two-way communication, active control, variety of content, and economic efficiency. Two-step clustering based on these characteristics identified four customer segments. There were statistically significant differences in the five characteristics among the segments. This study profiled the customer segments and proposed competitive strategies for each segment.
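
The two-step idea is commonly associated with the SPSS TwoStep procedure. That procedure runs a pre-clustering pass to compress records into many small sub-clusters, then merges those sub-clusters hierarchically. The sketch below illustrates the idea under simplifying assumptions, and it is not the study's exact procedure:

- a flat threshold pass replaces the CF tree;
- Ward's criterion on Euclidean distance replaces the log-likelihood distance;
- the number of clusters `k` is fixed rather than chosen by BIC/AIC.

Each point would be one respondent's scores on the five characteristics. The function names and the `threshold` parameter are hypothetical.

```python
import math


def pre_cluster(points, threshold):
    """Step 1 (simplified stand-in for a CF tree): one pass over the data.

    Each point joins the nearest sub-cluster whose centroid is within `threshold`;
    otherwise it opens a new sub-cluster. Sub-clusters keep only a running sum and count.
    """
    subclusters = []  # each entry: [coordinate_sum, count]
    for p in points:
        best, best_d = None, threshold
        for sc in subclusters:
            centroid = [s / sc[1] for s in sc[0]]
            d = math.dist(p, centroid)
            if d <= best_d:
                best, best_d = sc, d
        if best is None:
            subclusters.append([list(p), 1])
        else:
            best[0] = [s + x for s, x in zip(best[0], p)]
            best[1] += 1
    return subclusters


def merge_subclusters(subclusters, k):
    """Step 2: agglomerative merging of sub-clusters with Ward's criterion until k remain."""
    clusters = [(list(s), n) for s, n in subclusters]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (si, ni), (sj, nj) = clusters[i], clusters[j]
                ci = [v / ni for v in si]
                cj = [v / nj for v in sj]
                # Ward: increase in within-cluster sum of squares if i and j merge
                cost = ni * nj / (ni + nj) * math.dist(ci, cj) ** 2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (si, ni), (sj, nj) = clusters[i], clusters[j]
        merged = ([a + b for a, b in zip(si, sj)], ni + nj)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return [[v / n for v in s] for s, n in clusters]


def two_step(points, threshold, k):
    """Label each point: pre-cluster, merge to k clusters, then assign to the nearest final centroid."""
    centroids = merge_subclusters(pre_cluster(points, threshold), k)
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]
```

The point of the first pass is that the expensive hierarchical step only ever sees the sub-clusters, not every record.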

Improved VRP & GA-TSP Model for Multi-Logistics Center (복수물류센터에 대한 VRP 및 GA-TSP의 개선모델개발)

  • Lee, Sang-Cheol;Yu, Jeong-Cheol
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.8 no.5
    • /
    • pp.1279-1288
    • /
    • 2007
  • The vehicle routing problem with time constraints is one of the most important problems in distribution and logistics. In practice, service for a customer must start and finish within a given delivery time. This study is concerned with developing a model to optimize vehicle routing for the multi-logistics-center problem, using a two-step approach with an improved genetic algorithm. In step one, a sector clustering model transforms the multi-logistics-center problem into a single-logistics-center problem, which is easier to solve. In step two, we developed a GA-TSP model with an improved genetic algorithm that searches for optimal vehicle routes under the given time constraints. Finally, we developed a network VRP computer program that implements the proposed solution using ActiveX and distributed object technology.
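
The two-step decomposition can be sketched as follows. This is a minimal illustration under several assumptions:

- customers and depots have planar coordinates and Euclidean distances;
- nearest-depot assignment stands in for the paper's sector clustering model;
- a plain genetic algorithm (truncation selection, order crossover, swap mutation) stands in for the improved GA.

Time windows and vehicle capacities are omitted, and all function names and parameter defaults are hypothetical.

```python
import math
import random


def assign_to_depots(customers, depots):
    """Step 1 stand-in: send each customer to its nearest depot (one single-depot problem each)."""
    groups = {d: [] for d in range(len(depots))}
    for c in customers:
        nearest = min(range(len(depots)), key=lambda d: math.dist(c, depots[d]))
        groups[nearest].append(c)
    return groups


def tour_length(depot, order, pts):
    """Length of the closed route depot -> pts[order[0]] -> ... -> depot."""
    path = [depot] + [pts[i] for i in order] + [depot]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def order_crossover(p1, p2, rng):
    """OX: copy a random slice from p1, fill the other positions with p2's genes in order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    fill = [g for g in p2 if g not in child]
    for i in range(n):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child


def ga_tsp(depot, pts, pop_size=30, generations=200, mutation_rate=0.2, seed=0):
    """Step 2 stand-in: plain GA over visiting orders for one depot's customers."""
    rng = random.Random(seed)
    n = len(pts)
    if n < 2:
        return list(range(n))

    def fitness(tour):
        return tour_length(depot, tour, pts)

    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 2]  # truncation selection keeps the better half
        children = []
        while len(elite) + len(children) < pop_size:
            child = order_crossover(*rng.sample(elite, 2), rng)
            if rng.random() < mutation_rate:  # swap mutation
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)
```

Each depot's group from `assign_to_depots` would be passed to `ga_tsp` separately, which is what makes the single-depot subproblems easier than the original multi-depot one.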

Design of a GIS-Based Distribution System with Service Consideration (서비스수준을 고려한 GIS기반의 차량 운송시스템)

  • 황흥석;조규성
    • Korean Management Science Review
    • /
    • v.18 no.2
    • /
    • pp.125-134
    • /
    • 2001
  • This paper is concerned with the development of a GIS-based distribution system with service consideration. The proposed model can be used for a wide range of logistics applications for planning, engineering, and operational purposes. This research addresses the formulation of the complex problems of a two-echelon logistics system, planning supply center locations and distribution jointly on a GIS basis. We propose an integrated logistics model for determining the optimal patterns of supply centers and inventory allocations (customers) with a three-step sequential approach: 1) first, a GIS-distance model and a stochastic set-covering program determine the optimal pattern of supply center locations; 2) second, optimal sector clustering is used to serve customers; 3) third, optimal vehicle route scheduling is performed on a GIS basis (GIS-VRP). In this research we developed a GUI-tree program, and the GIS-VRP provides vehicle and freight information to users in real time. We applied a set of sample examples to this model and demonstrated sample results. The examples show that the proposed model is potentially efficient and useful for solving multi-depot problems, and it can help logistics decision makers obtain the best supply schedule.
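
The location piece of the first step can be illustrated with a deterministic set-covering heuristic. This is a hedged substitute, not the paper's model: Euclidean distance stands in for the GIS-distance model, and a greedy cover replaces the stochastic set-covering program, whose formulation the abstract does not give. The function name and the `radius` service threshold are hypothetical.

```python
import math


def greedy_set_cover(candidates, customers, radius):
    """Greedy heuristic: choose candidate sites until every customer is within `radius` of one.

    Returns indices into `candidates` in the order they were chosen.
    """
    # Which customers each candidate site could serve within the service radius
    coverage = {
        s: {c for c, cust in enumerate(customers) if math.dist(site, cust) <= radius}
        for s, site in enumerate(candidates)
    }
    uncovered = set(range(len(customers)))
    chosen = []
    while uncovered:
        # Pick the site that covers the most still-uncovered customers (ties: lowest index)
        best = max(coverage, key=lambda s: len(coverage[s] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            raise ValueError("some customers cannot be covered within the given radius")
        chosen.append(best)
        uncovered -= gained
    return chosen
```

The greedy rule is a well-known approximation for set covering; an exact or stochastic formulation would instead be solved with an integer-programming model.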

Wafer bin map failure pattern recognition using hierarchical clustering (계층적 군집분석을 이용한 반도체 웨이퍼의 불량 및 불량 패턴 탐지)

  • Jeong, Joowon;Jung, Yoonsuh
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.3
    • /
    • pp.407-419
    • /
    • 2022
  • The semiconductor fabrication process is complex and time-consuming. Errors sometimes occur in the process, which result in defective dies on the wafer bin map (WBM). Faulty WBMs can be detected by finding patterns formed by the defective dies. Searching WBMs for failures manually takes a long time because of the enormous number of WBMs. In this paper, we suggest a two-step approach to discover probable patterns on WBMs. The first step separates the normal WBMs from the defective ones. We adapt hierarchical clustering for de-noising, which performs this task well when the minimum number of points and the cutting height are carefully tuned. WBMs declared faulty then move to the next step. In the second step, we classify the patterns among the defective WBMs. For this purpose, we extract features from each WBM, and a machine learning algorithm then classifies the pattern. We use a real WBM data set (WM-811K) released by Taiwan Semiconductor Manufacturing Company.
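
The abstract does not name the specific hierarchical method. The sketch below assumes single-linkage clustering on the (x, y) coordinates of defective dies, with the two tuning knobs mapped to hypothetical parameters `cut_height` and `min_points`. Cutting a single-linkage dendrogram at height h gives the same groups as the connected components of the graph that links dies at most h apart. The sketch finds those components with a union-find structure. The "faulty if any dense cluster survives" rule and the default values are illustrative assumptions.

```python
import math


def denoise_defects(defects, cut_height, min_points):
    """Single-linkage clustering cut at `cut_height`; clusters smaller than `min_points` are noise."""
    parent = list(range(len(defects)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    # Cutting a single-linkage dendrogram at height h == connected components
    # of the graph whose edges join dies at distance <= h.
    for i in range(len(defects)):
        for j in range(i + 1, len(defects)):
            if math.dist(defects[i], defects[j]) <= cut_height:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(defects)):
        clusters.setdefault(find(i), []).append(defects[i])
    return [c for c in clusters.values() if len(c) >= min_points]


def is_faulty(defects, cut_height=1.5, min_points=5):
    """Hypothetical rule: a wafer map is faulty if any dense defect cluster survives de-noising."""
    return bool(denoise_defects(defects, cut_height, min_points))
```

Isolated random defects fall into small clusters and are discarded, while spatially coherent patterns (scratches, rings, edge clusters) survive and would be passed to the feature-extraction and classification step.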

A Hierarchical Partitioning Method Using Clustering (클러스터링을 이용한 계층적 분할 방법)

  • 김충희;신현철
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.30A no.3
    • /
    • pp.139-145
    • /
    • 1993
  • Partitioning is an important step in the hierarchical design of very-large-scale integrated (VLSI) circuits. In this research, a new, effective partitioning algorithm based on a two-level hierarchy is presented. First, clusters are formed to reduce the problem size. Iterative improvement techniques have the weakness that their result depends on the initial partition; to overcome this and to produce consistently good results, cluster-level partitioning is performed several times using different sets of parameters. The best cluster-level result is then used as the initial solution for lower-level partitioning. Each partitioning run uses the gradual constraint-enforcing partitioning method. The clustering-based partitioning algorithm was applied to several benchmark examples and produced promising results, showing that it is efficient and effective.
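
The two-level flow can be sketched on a toy netlist graph. Cells are numbered `0..n-1`, each net is simplified to a two-pin edge, and the goal is a balanced bisection with a small cut. Several stand-ins are assumptions made here, not the paper's techniques:

- greedy neighbor matching replaces the paper's clustering;
- a greedy pairwise-swap pass replaces the gradual constraint-enforcing method;
- different random seeds replace the "several sets of parameters".

All function names are hypothetical.

```python
import random


def cut_size(edges, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])


def match_clusters(n, edges):
    """Level 1: greedily pair each unmatched cell with one unmatched neighbour."""
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    cluster, k = [-1] * n, 0
    for u in range(n):
        if cluster[u] == -1:
            cluster[u] = k
            for v in nbrs[u]:
                if cluster[v] == -1:
                    cluster[v] = k
                    break
            k += 1
    return cluster, k


def swap_improve(edges, side, weight, max_imbalance):
    """Greedy pairwise swaps across the cut while they lower the cut and respect balance."""
    improved = True
    while improved:
        improved = False
        best = cut_size(edges, side)
        for a in range(len(side)):
            for b in range(a + 1, len(side)):
                if side[a] == side[b]:
                    continue
                side[a], side[b] = side[b], side[a]
                load = sum(w for w, s in zip(weight, side) if s == 0)
                balanced = abs(2 * load - sum(weight)) <= max_imbalance
                if balanced and cut_size(edges, side) < best:
                    best = cut_size(edges, side)
                    improved = True
                else:
                    side[a], side[b] = side[b], side[a]  # undo the swap
    return side


def two_level_partition(n, edges, tries=5):
    cluster, k = match_clusters(n, edges)
    weight = [cluster.count(c) for c in range(k)]
    coarse_edges = [(cluster[u], cluster[v]) for u, v in edges if cluster[u] != cluster[v]]
    tol = max(weight)
    best = None
    for seed in range(tries):  # several runs stand in for "several sets of parameters"
        rng = random.Random(seed)
        side, load = [0] * k, [0, 0]
        for c in rng.sample(range(k), k):  # random order, each cluster onto the lighter side
            s = 0 if load[0] <= load[1] else 1
            side[c] = s
            load[s] += weight[c]
        side = swap_improve(coarse_edges, side, weight, tol)
        if best is None or cut_size(coarse_edges, side) < cut_size(coarse_edges, best):
            best = side
    # Project the best cluster-level result onto cells and refine at the lower level
    cell_side = [best[cluster[u]] for u in range(n)]
    return swap_improve(edges, cell_side, [1] * n, tol)
```

Running the cheap cluster-level search several times and keeping only the best result is what reduces the dependence on any single initial partition before the costlier cell-level refinement.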

An Introduction of Two-Step K-means Clustering Applied to Microarray Data (마이크로 어레이 데이터에 적용된 2단계 K-means 클러스터링의 소개)

  • Park, Dae-Hun;Kim, Yeon-Tae;Kim, Seong-Sin;Lee, Chun-Hwan
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2006.11a
    • /
    • pp.83-86
    • /
    • 2006
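  • A great deal of genetic information and its by-products have been studied by many methods. DNA microarray technology has produced large volumes of data that are difficult to analyze with conventional research methods. To make such large volumes of data tractable, this paper proposes a divided clustering scheme built on the K-means clustering algorithm. We verified the usefulness of the proposed clustering method by applying it to microarray data obtained from rice genes, and a comparison with the results of the conventional K-means clustering algorithm confirmed the superiority of the proposed algorithm.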
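
The abstract does not spell out how the data are divided. One plausible reading, sketched here as an assumption, works in two steps. First, K-means runs on separate chunks of the data. Second, a weighted K-means clusters the resulting chunk centroids, each weighted by how many points it represents, so no single K-means run has to hold all the data at once. The deterministic farthest-point initialization, the chunking by position, and all function names are illustrative choices.

```python
import math


def kmeans(points, k, weights=None, iters=50):
    """Weighted Lloyd's k-means with deterministic farthest-point initialisation."""
    k = min(k, len(points))
    weights = weights or [1.0] * len(points)
    centers = [points[0]]
    while len(centers) < k:  # next centre: the point farthest from all chosen centres
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new = []
        for c in range(k):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == c]
            if not members:
                new.append(centers[c])  # keep an empty cluster's centre where it was
                continue
            total = sum(w for _, w in members)
            new.append(tuple(sum(p[d] * w for p, w in members) / total
                             for d in range(len(points[0]))))
        if new == centers:
            break
        centers = new
    labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
    return centers, labels


def two_step_kmeans(points, k, chunk_size):
    """Step 1: k-means inside each chunk. Step 2: weighted k-means over all chunk centroids."""
    centroids, sizes = [], []
    for start in range(0, len(points), chunk_size):
        chunk = points[start:start + chunk_size]
        centers, labels = kmeans(chunk, k)
        for c, center in enumerate(centers):
            n = labels.count(c)
            if n:  # skip empty local clusters
                centroids.append(center)
                sizes.append(float(n))
    final_centers, _ = kmeans(centroids, k, weights=sizes)
    labels = [min(range(len(final_centers)), key=lambda c: math.dist(p, final_centers[c]))
              for p in points]
    return final_centers, labels
```

For microarray data, each point would be one gene's expression profile across arrays.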

Proposing the Method for Improving the Forecast Accuracy of Loan Underwriting (대출심사의 예측 정확도 향상을 위한 방법 제안)

  • Yang, Yu-Young;Park, Sang-Sung;Shin, Young-Geun;Jang, Dong-Sik
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.11 no.4
    • /
    • pp.1419-1429
    • /
    • 2010
  • The industry structure and environment of domestic banks have changed with the influx of large foreign banks and advanced financial products since the currency crisis erupted in Korea. In a competitive environment, accurate forecasts of changes and trends are essential for survival and growth. Forecasting whether to approve a customer's loan application is important because it affects the bank's profit generation and risk management. Therefore, this paper proposes a method to improve the forecast accuracy of loan underwriting. The experimental procedure is as follows. First, we select the predictor variables that significantly affect the outcome of loan underwriting by correlation analysis and a feature selection technique, and then cluster the customers with the 2-Step clustering technique based on the selected variables. Second, we find the most accurate forecasting model for each cluster by applying logistic regression (LR), neural networks (NN), and support vector machines (SVM). Finally, we compare the forecasting accuracy of the proposed method with that of the existing approach.
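
The "best model per cluster" step can be sketched generically:

1. Split the training and validation rows by cluster.
2. Fit every candidate model on each cluster's training rows.
3. Keep, for each cluster, the candidate with the highest validation accuracy.

To stay self-contained, the sketch uses two toy classifiers, a majority-class rule and 1-nearest-neighbour. These are hypothetical stand-ins for LR, NN, and SVM, not the models used in the paper. The `cluster_of` function would come from the 2-Step clustering result, and all names are illustrative.

```python
import math
from collections import Counter


def majority_model(train):
    """Stand-in classifier: always predicts the most common label in its training rows."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbour_model(train):
    """Stand-in classifier: 1-nearest-neighbour on the selected predictor variables."""
    return lambda x: min(train, key=lambda row: math.dist(row[0], x))[1]


def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)


def best_model_per_cluster(train, valid, cluster_of, builders):
    """For each cluster, fit every candidate and keep the one with the best validation accuracy."""
    chosen = {}
    for c in {cluster_of(x) for x, _ in train}:
        tr = [(x, y) for x, y in train if cluster_of(x) == c]
        # Fall back to the training rows if the cluster has no validation rows
        va = [(x, y) for x, y in valid if cluster_of(x) == c] or tr
        models = {name: build(tr) for name, build in builders.items()}
        name = max(models, key=lambda n: accuracy(models[n], va))  # ties: first builder listed
        chosen[c] = (name, models[name])
    return chosen


def predict(chosen, cluster_of, x):
    """Route an applicant to its cluster's selected model."""
    return chosen[cluster_of(x)][1](x)
```

Selecting a model per cluster lets a homogeneous segment use a simple rule while a mixed segment gets a more flexible classifier.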

MR Brain Image Segmentation Using Clustering Technique

  • Yoon, Ock-Kyung;Kim, Dong-Whee;Kim, Hyun-Soon;Park, Kil-Houm
    • Proceedings of the IEEK Conference
    • /
    • 2000.07a
    • /
    • pp.450-453
    • /
    • 2000
  • In this paper, an automated segmentation algorithm is proposed for MR brain images that uses T1-weighted, T2-weighted, and PD images complementarily. The proposed segmentation algorithm consists of three steps. In the first step, cerebrum images are extracted by applying a cerebrum mask to the three input images. In the second step, outstanding clusters that represent inner tissues of the cerebrum are chosen from among three-dimensional (3D) clusters. The 3D clusters are determined by intersecting the densely distributed parts of 2D histograms in the 3D space formed by the three optimal scale images; an optimal scale image best describes the shape of the densely distributed parts of pixels in a 2D histogram. In the final step, the cerebrum images are segmented using the fuzzy C-means (FCM) algorithm, initialized with the centroids of the outstanding clusters. Because FCM is sensitive to its initial centroids, computing these centroids accurately compensates for this weakness. In addition, the proposed algorithm with multispectral analysis yields better segmentation results than single-spectral analysis.
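
The final step can be sketched as standard fuzzy C-means seeded with externally supplied centroids. Assumptions: fuzzy exponent m = 2, Euclidean distance, and each pixel given as an intensity vector (for example (T1, T2, PD)). The 3D-histogram analysis that produces the seed centroids is not reproduced here, and the function name is hypothetical.

```python
import math


def fuzzy_c_means(points, init_centroids, m=2.0, iters=100, tol=1e-6):
    """Fuzzy C-means starting from caller-supplied centroids (e.g. the outstanding clusters')."""
    centroids = [list(c) for c in init_centroids]
    dim = len(points[0])
    u = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1))
        u = []
        for p in points:
            d = [max(math.dist(p, c), 1e-12) for c in centroids]  # avoid division by zero
            u.append([1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(len(d)))
                      for i in range(len(d))])
        # Centroid update: mean of all points weighted by membership ** m
        new = []
        for i in range(len(centroids)):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new.append([sum(wk * p[t] for wk, p in zip(w, points)) / total for t in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:
            break
    # Hard labels: the cluster with the highest membership for each point
    labels = [max(range(len(row)), key=row.__getitem__) for row in u]
    return centroids, labels
```

FCM only converges to a local optimum, so seeding it near the true tissue centroids is what makes the result stable.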

Segmentation of Multispectral Brain MRI Based on Histogram (히스토그램에 기반한 다중스펙트럼 뇌 자기공명영상의 분할)

  • 윤옥경;김동휘
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.8 no.4
    • /
    • pp.46-54
    • /
    • 2003
  • In this paper, we propose a segmentation algorithm for MR brain images that uses the histograms of T1-weighted, T2-weighted, and PD images. The segmentation algorithm consists of three steps. The first step extracts cerebrum images by applying a cerebrum mask to the three input images. In the second step, peak ranges are determined from the histogram of the cerebrum image. In the final step, the cerebrum images are segmented using a coarse-to-fine clustering technique. We compare the segmentation results and processing times for different peak ranges, and also compare the algorithm with other segmentation methods. The proposed algorithm achieved better segmentation results than the other methods.
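
The second and third steps can be sketched as follows:

1. Smooth the histogram.
2. Take its local maxima as peaks.
3. Split the intensity axis at the lowest valley between neighbouring peaks to obtain peak ranges.
4. Label pixels coarsely by range, then refine the labels by reassigning each pixel to the nearest class mean.

This is a single-channel simplification offered as an assumption; the paper works with three spectra, and its exact peak-range and refinement rules are not given in the abstract. The smoothing radius, the `min_height` threshold, and the function names are hypothetical.

```python
def smooth(hist, radius=2):
    """Moving-average smoothing so small bumps do not register as peaks."""
    out = []
    for i in range(len(hist)):
        window = hist[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def peak_ranges(hist, min_height):
    """One (low, high) intensity range per histogram peak, split at the valleys between peaks."""
    h = smooth(hist)
    peaks = [i for i in range(1, len(h) - 1)
             if h[i] >= min_height and h[i - 1] < h[i] >= h[i + 1]]
    bounds = [0]
    for p, q in zip(peaks, peaks[1:]):
        bounds.append(min(range(p, q + 1), key=h.__getitem__))  # lowest bin between two peaks
    bounds.append(len(h) - 1)
    return [(bounds[i] if i == 0 else bounds[i] + 1, bounds[i + 1]) for i in range(len(peaks))]


def coarse_to_fine(pixels, ranges, iters=10):
    """Coarse: label by peak range. Fine: iteratively reassign each pixel to the nearest class mean."""
    labels = [next((k for k, (lo, hi) in enumerate(ranges) if lo <= v <= hi), 0) for v in pixels]
    for _ in range(iters):
        means = []
        for k in range(len(ranges)):
            members = [v for v, lab in zip(pixels, labels) if lab == k]
            means.append(sum(members) / len(members) if members
                         else (ranges[k][0] + ranges[k][1]) / 2)
        new = [min(range(len(means)), key=lambda k: abs(v - means[k])) for v in pixels]
        if new == labels:
            break
        labels = new
    return labels
```

Wider or narrower peak ranges change how much work the fine pass must do, which is one way the trade-off between segmentation quality and processing time can arise.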
