• Title/Abstract/Keywords: Co-Clustering

Search results: 221 items (processing time: 0.027 s)

High-performance computing for SARS-CoV-2 RNAs clustering: a data science-based genomics approach

  • Oujja, Anas;Abid, Mohamed Riduan;Boumhidi, Jaouad;Bourhnane, Safae;Mourhir, Asmaa;Merchant, Fatima;Benhaddou, Driss
    • Genomics & Informatics / Vol. 19, No. 4 / pp.49.1-49.11 / 2021
  • Genomic data is now one of the fastest-growing datasets in the world. By 2025, it is expected to become the fourth largest source of Big Data, which calls for adequate high-performance computing (HPC) platforms for processing. With the latest unprecedented and unpredictable mutations in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the research community urgently needs ICT tools to process SARS-CoV-2 RNA data, e.g., by clustering it, and thus to assist in tracking virus mutations and predicting future ones. In this paper, we present an HPC-based SARS-CoV-2 RNA clustering tool. We adopt a data science approach, from data collection, through analysis, to visualization. In the analysis step, we present how our clustering approach leverages HPC and the longest common subsequence (LCS) algorithm. The approach uses the Hadoop MapReduce programming paradigm and adapts the LCS algorithm to efficiently compute the length of the LCS for each pair of SARS-CoV-2 RNA sequences. The sequences are extracted from the U.S. National Center for Biotechnology Information (NCBI) Virus repository. The computed LCS lengths are used to measure the dissimilarities between RNA sequences in order to identify existing clusters. In addition, we present a comparative study of the LCS algorithm's performance under variable workloads and different numbers of Hadoop worker nodes.
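
The pairwise LCS step described in this abstract can be illustrated with a minimal single-machine sketch. It is not the authors' Hadoop MapReduce implementation: the function names and the normalization `1 - LCS / max(len)` are assumptions made here for illustration, since the abstract does not state how LCS lengths are turned into dissimilarities.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence, keeping only two DP rows."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows as short as possible
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_dissimilarity(a, b):
    """Hypothetical normalization (not from the paper): 1 - LCS / longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1.0 - lcs_length(a, b) / longest


def pairwise_dissimilarities(sequences):
    """Map each unordered pair of sequence ids to its dissimilarity."""
    return {
        (i, j): lcs_dissimilarity(sequences[i], sequences[j])
        for i, j in combinations(sequences, 2)
    }
```

Each pair is independent of the others, which is presumably what makes the computation a natural fit for distribution across worker nodes.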

레이더 군집화를 위한 반복 K-means 클러스터링 알고리즘 (Repeated K-means Clustering Algorithm For Radar Sorting)

  • 박동현;서동호;백지현;이원진;장동의
    • 한국군사과학기술학회지 (Journal of the Korea Institute of Military Science and Technology) / Vol. 26, No. 5 / pp.384-391 / 2023
  • In modern electronic warfare, many radar emitters operate at once, so radar receivers receive high-density signal pulses that occur simultaneously. To analyze the radar signals more accurately and identify enemies, sorting the high-density radar signals before analysis is very important. Recently, machine learning algorithms, specifically K-means clustering, have been studied as a way to improve the accuracy of radar signal sorting. One challenge faced by these studies is that the clustering results can vary depending on how the initial points are selected and how many clusters are set. This paper introduces a repeated K-means clustering algorithm that aims to cluster all data accurately by identifying and addressing false clusters in the radar sorting problem. To verify the performance of the proposed algorithm, experiments are conducted on simulated signals produced by a signal generator.

A Clustering-Based Fault Detection Method for Steam Boiler Tube in Thermal Power Plant

  • Yu, Jungwon;Jang, Jaeyel;Yoo, Jaeyeong;Park, June Ho;Kim, Sungshin
    • Journal of Electrical Engineering and Technology / Vol. 11, No. 4 / pp.848-859 / 2016
  • System failures in thermal power plants (TPPs) can lead to serious losses because the equipment is operated under very high pressure and temperature. Therefore, it is indispensable for alarm systems to inform field workers in advance of any abnormal operating conditions in the equipment. In this paper, we propose a clustering-based fault detection method for steam boiler tubes in TPPs. For data clustering, the k-means algorithm is employed, and the number of clusters is systematically determined by the slope statistic. The clustering-based method assumes that normal data samples are close to the cluster centers and abnormal samples are far from them. After partitioning training samples collected from normal target systems, fault scores (FSs) are assigned to unseen samples according to the distances between the samples and their closest cluster centroids. Alarm signals are generated if the FSs exceed predefined threshold values. The validity of an exponentially weighted moving average for reducing false alarms is also investigated. To verify the performance, the proposed method is applied to failure cases due to boiler tube leakage. The experimental results show that the proposed method can successfully detect the abnormal conditions of the target system.
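
The scoring and alarm steps in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the centroids are assumed to come from a k-means fit on normal data, and the function names, the Euclidean distance, the smoothing factor `alpha`, and the threshold are hypothetical choices.

```python
import math


def fault_score(sample, centroids):
    """Distance from a sample to its closest cluster centroid."""
    return min(math.dist(sample, c) for c in centroids)


def ewma(values, alpha=0.3):
    """Exponentially weighted moving average, seeded with the first value."""
    smoothed = []
    for v in values:
        prev = smoothed[-1] if smoothed else v
        smoothed.append(alpha * v + (1 - alpha) * prev)
    return smoothed


def alarms(samples, centroids, threshold, alpha=0.3):
    """Flag time steps whose smoothed fault score exceeds the threshold."""
    scores = [fault_score(s, centroids) for s in samples]
    return [s > threshold for s in ewma(scores, alpha)]
```

With `alpha=0.5` and a threshold of 4, a single isolated sample at distance 5 is smoothed to 2.5 and raises no alarm, whereas a sustained run at distance 5 does. This is the sense in which smoothing can reduce false alarms.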

Clustering 및 위치정보를 활용한 WSN(Wireless Sensor Network) 성능 향상 방안 연구 (A Study for Improving WSNs(Wireless Sensor Networks) Performance using Clustering and Location Information)

  • 전진한;홍성훈
    • 한국정보통신학회:학술대회논문집 (Proceedings of the Korean Institute of Information and Communication Sciences Conference) / 한국정보통신학회 2019 Spring Conference / pp.260-263 / 2019
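  • Wireless sensor network (WSN) technology, which is applicable to services that are hard to access or that require continuous monitoring, is a field in which the need for research and development is steadily growing owing to its expanding range of applications and its efficiency. This paper analyzes existing studies that aim to increase the packet delivery rate of WSNs and extend the lifetime of sensor nodes, and then analyzes the factors that improve performance relative to those studies when clustering and location-based techniques are applied to the sensor nodes. Based on this analysis, we plan to study new techniques for improving future WSN performance in terms of packet loss rate and network lifetime.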


A new Ensemble Clustering Algorithm using a Reconstructed Mapping Coefficient

  • Cao, Tuoqia;Chang, Dongxia;Zhao, Yao
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 7 / pp.2957-2980 / 2020
  • Ensemble clustering commonly integrates multiple basic partitions to obtain a more accurate clustering result than a single partition. However, an inevitable problem is the incomplete transformation from the original space to the integrated space. In this paper, a novel ensemble clustering algorithm using a newly reconstructed mapping coefficient (ECRMC) is proposed. In the algorithm, a newly reconstructed mapping coefficient between objects and micro-clusters is designed, based on the principle of increasing information entropy, to enhance effective information. This can reduce the information loss in the transformation from micro-clusters to the original space. The correlation of the micro-clusters is then calculated using the Spearman coefficient. As a result, the revised co-association graph between objects can be built more accurately, because the supplementary information helps ensure the completeness of the whole conversion process. Experimental results demonstrate that the ECRMC clustering algorithm has high performance, effectiveness, and feasibility.
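
For orientation, the plain co-association matrix that ensemble clustering methods commonly start from can be sketched as below. This is the standard, unrevised construction (the fraction of base partitions in which two objects share a cluster), not the ECRMC mapping coefficient or the revised graph from the paper; the function name and the list-of-label-lists input format are assumptions.

```python
def co_association(partitions):
    """Fraction of base partitions in which each pair of objects shares a cluster.

    `partitions` is a list of label lists, one per base clustering,
    all of the same length (one label per object).
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]
```

For two base partitions `[0, 0, 1]` and `[0, 1, 1]`, objects 0 and 1 co-occur once, so their entry is 0.5, while objects 0 and 2 never co-occur and get 0.0.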

Spatial Region Estimation for Autonomous CoT Clustering Using Hidden Markov Model

  • Jung, Joon-young;Min, Okgee
    • ETRI Journal / Vol. 40, No. 1 / pp.122-132 / 2018
  • This paper proposes a hierarchical dual filtering (HDF) algorithm to estimate the spatial region between a Cloud of Things (CoT) gateway and an Internet of Things (IoT) device. The accuracy of the spatial region estimation is important for autonomous CoT clustering. We conduct spatial region estimation using a hidden Markov model (HMM) with a raw Bluetooth received signal strength indicator (RSSI). However, the accuracy of the region estimation on the validation data is only 53.8%. To increase the accuracy of the spatial region estimation, the HDF algorithm removes the high-frequency signals hierarchically and alters the parameters according to whether the IoT device moves. The accuracies of spatial region estimation using a raw RSSI, a Kalman filter, and HDF are compared to evaluate the effectiveness of the HDF algorithm. Over all regions, the success rates are 0.538, 0.622, and 0.75, and the root mean square errors (RMSEs) are 0.997, 0.812, and 0.5, when raw RSSI, a Kalman filter, and HDF are used, respectively. The HDF algorithm attains the best results in terms of both the success rate and the RMSE of spatial region estimation using the HMM.
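
The Kalman-filter baseline and the RMSE metric mentioned in this abstract can be sketched as follows. This is a generic scalar Kalman filter with a constant-signal model, not the HDF algorithm, whose details the abstract does not give; the function names and the noise variances `process_var` and `meas_var` are hypothetical values chosen for illustration.

```python
import math


def kalman_1d(measurements, process_var=1e-3, meas_var=4.0):
    """Smooth a scalar series (e.g. RSSI in dBm) with a constant-signal Kalman filter."""
    estimate = float(measurements[0])
    error = 1.0  # initial estimate variance (assumed)
    smoothed = []
    for z in measurements:
        error += process_var                # predict: uncertainty grows
        gain = error / (error + meas_var)   # Kalman gain
        estimate += gain * (z - estimate)   # update with the new measurement
        error *= 1 - gain
        smoothed.append(estimate)
    return smoothed


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))
```

Because each update is a convex combination of the previous estimate and the new measurement, the smoothed values stay within the range of the raw readings while damping rapid fluctuations.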

거리-도플러 클러스터링 방법을 사용한 인접한 표적들의 분리 (Separation of Adjacent Targets using Range-Doppler Clustering Method)

  • 공영주;우선걸;박성호;유성현;강연덕
    • 한국인터넷방송통신학회논문지 (The Journal of the Institute of Internet, Broadcasting and Communication) / Vol. 20, No. 2 / pp.67-73 / 2020
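  • A clustering algorithm classifies data with similar characteristics into the same group. In radar systems, it is mainly used to merge adjacent hits in the output of the CFAR algorithm into one. For adjacent targets, however, conventional clustering often detects them as a single target. This paper describes a dual clustering method for separating adjacent targets. To reduce computation time, clustering is first performed in the range direction, and the range-direction clustering result is then used to perform clustering in the Doppler direction. Because clustering is performed separately in the range and Doppler directions, the computation time changes very little even as the number of targets increases.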
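
The two-stage idea in this abstract (range first, then Doppler within each range group) can be sketched as below. The abstract does not specify the grouping rule, so this illustration assumes hits are `(range_bin, doppler_bin)` tuples and that hits belong together when their sorted bin indices differ by at most `max_gap`; both function names are hypothetical.

```python
def split_by_gap(hits, axis, max_gap=1):
    """Group hits whose sorted coordinates along `axis` differ by at most `max_gap`."""
    groups = []
    for hit in sorted(hits, key=lambda h: h[axis]):
        if groups and hit[axis] - groups[-1][-1][axis] <= max_gap:
            groups[-1].append(hit)
        else:
            groups.append([hit])
    return groups


def range_doppler_clusters(hits, max_gap=1):
    """Cluster along range (axis 0), then split each range group along Doppler (axis 1)."""
    clusters = []
    for range_group in split_by_gap(hits, axis=0, max_gap=max_gap):
        clusters.extend(split_by_gap(range_group, axis=1, max_gap=max_gap))
    return clusters
```

In this sketch, two targets that overlap in range but differ in Doppler fall into one range group and are then separated by the Doppler pass. Each pass is a sort followed by a linear scan.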

동시인용정보를 이용한 동명이인 저자의 중의성 해소 (Disambiguation of Author Names Using Co-citation)

  • 강인수
    • 정보관리연구 (Journal of Information Management) / Vol. 42, No. 3 / pp.167-186 / 2011
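  • Co-citation occurs when two different studies are cited together in a later study. This study addresses the relationship between co-citation and author disambiguation. Author disambiguation maps identical author names appearing in the literature to real-world authors. Co-citation can affect the procedure and performance of author disambiguation by gathering evidence that one person's related studies are later cited together in other studies, whether by others or by the author. This study presents a procedure for automatically collecting co-citations from Google Scholar and proposes a new clustering algorithm that efficiently combines co-citation information with existing author disambiguation features. Experiments confirmed the positive effect of co-citation on author disambiguation.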

새로운 클러스터 평가 지표 (A Novel Cluster Validation Index)

  • 서석태;손세호;이인근;정혜천;권순학
    • 한국지능시스템학회:학술대회논문집 (Proceedings of the Korean Institute of Intelligent Systems Conference) / 한국퍼지및지능시스템학회 2005 Fall Conference Proceedings, Vol. 15, No. 2 / pp.171-174 / 2005
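  • Existing cluster validation indices tend to decrease monotonically as the number of clusters increases. A new cluster validation index that remedies this shortcoming was recently proposed by one of the authors of this paper, but it suffers from over-clustering. In this paper, we propose a new cluster validation index that prevents both the monotonic decrease of the index value and over-clustering, and we demonstrate its validity through several examples.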
