• Title/Summary/Keyword: Data Clustering

An Adaptive Regional Clustering Scheme Based on Threshold-Dataset in Wireless Sensor Networks for Monitoring of Weather Conditions (기상감시 무선 센서 네트워크에 적합한 Threshold-dataset 기반 지역적 클러스터링 기법)

  • Choi, Dong-Min;Shen, Jian;Chung, Il-Yong
    • Journal of Korea Multimedia Society / v.14 no.10 / pp.1287-1302 / 2011
  • Clustering protocols are an efficient way to extend the lifetime of a wireless sensor network. However, when they are applied in an environment where the data collected by sensor nodes easily overlap, the nodes consume energy unnecessarily. Clustering techniques that use a threshold extend the network lifetime, but the accuracy of the collected data is low, so the data are hard to trust and improvement is needed. In addition, clustering protocols that use multi-hop transmission have difficulty collecting data reliably, because cluster head nodes are selected at random and links between nodes are therefore often disconnected. Accordingly, this paper proposes a cluster-formation algorithm that reduces unnecessary energy consumption and alleviates link disconnection. According to the performance analysis, nodes under the proposed method consume less energy than under the existing clustering method, transmission efficiency increases, and the overall network lifetime is prolonged by about 30%.

Runtime Prediction Based on Workload-Aware Clustering (병렬 프로그램 로그 군집화 기반 작업 실행 시간 예측모형 연구)

  • Kim, Eunhye;Park, Ju-Won
    • Journal of Korean Society of Industrial and Systems Engineering / v.38 no.3 / pp.56-63 / 2015
  • Several fields of science demand large-scale workflow support that requires thousands of CPU cores or more. To support such large-scale scientific workflows, high-capacity parallel systems such as supercomputers are widely used. To increase the utilization of these systems, most schedulers use a backfilling policy: small jobs are moved ahead to fill holes in the schedule as long as large jobs are not delayed. Since backfilling needs an estimate of each job's runtime, most parallel systems rely on the runtime estimated by the user. However, these estimates are found to be extremely inaccurate because users overestimate their jobs. Therefore, in this paper, we propose a novel system for runtime prediction based on workload-aware clustering, with the goal of improving prediction performance. The proposed method for predicting the runtime of parallel applications consists of three main phases. First, feature selection based on factor analysis identifies the important input features. Then, the history data are clustered with a self-organizing map, followed by hierarchical clustering to find the cluster boundaries from the weight vectors. Finally, prediction models are constructed using support vector regression on the clustered workload data. Using a separate prediction model for each clustered data pattern can reduce the error rate compared with a single model for the whole data. In the experiments, we use workload logs from parallel systems (i.e., iPSC, LANL-CM5, SDSC-Par95, SDSC-Par96, and CTC-SP2) to evaluate the effectiveness of our approach. Compared with other techniques, experimental results show that the proposed method improves the accuracy up to 69.08%.
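The backfilling rule mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common "EASY"-style condition (a waiting job may start early if it fits in the idle cores and either finishes before the reserved start of the job at the head of the queue or uses only cores that job will not need). The function and parameter names are hypothetical and are not taken from the paper.

```python
def can_backfill(job_cores, estimated_runtime, free_cores, now,
                 head_start, spare_cores_at_head_start):
    """Return True if a waiting job may start now without delaying the head job.

    All parameter names are hypothetical. head_start is the time reserved for
    the first job in the queue; spare_cores_at_head_start is the number of
    cores that will still be idle once that job has started.
    """
    if job_cores > free_cores:
        return False  # the job does not fit into the current hole
    finishes_before_head = now + estimated_runtime <= head_start
    uses_only_spare_cores = job_cores <= spare_cores_at_head_start
    return finishes_before_head or uses_only_spare_cores


# The same 4-core job, once with a realistic and once with an inflated estimate.
accurate = can_backfill(4, 30, free_cores=8, now=0,
                        head_start=60, spare_cores_at_head_start=0)
overestimated = can_backfill(4, 120, free_cores=8, now=0,
                             head_start=60, spare_cores_at_head_start=0)
print(accurate, overestimated)
```

With the inflated estimate the job is no longer eligible, which is one way overestimated runtimes can leave holes in the schedule unused.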

Design of Fuzzy System with Hierarchical Classifying Structures and its Application to Time Series Prediction (계층적 분류구조의 퍼지시스템 설계 및 시계열 예측 응용)

  • Bang, Young-Keun;Lee, Chul-Heui
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.5 / pp.595-602 / 2009
  • Fuzzy rules, which represent the behavior of a fuzzy system, are sensitive to the fuzzy clustering technique used to derive them. If the classification ability of the clustering technique is improved, the system can perform its task more accurately, because the quality of the fuzzy rules and parameters improves with it. Thus, this paper proposes a new hierarchically structured clustering algorithm that enhances classification ability. The proposed clustering technique consists of two clusters based on correlation and statistical characteristics between data, which allows more accurate classification. In addition, this paper uses difference data sets to reflect the patterns and regularities of the original data clearly, and constructs multiple fuzzy systems to account for the various characteristics of the differences. To verify the effectiveness of the proposed techniques, this paper applies the constructed fuzzy systems to time series prediction and performs prediction for nonlinear time series examples.

Comparison of Clustering Techniques in Flight Approach Phase using ADS-B Track Data (공항 근처 ADS-B 항적 자료에서의 클러스터링 기법 비교)

  • Jong-Chan Park;Heon Jin Park
    • The Journal of Bigdata / v.6 no.2 / pp.29-38 / 2021
  • Route deviation is a dangerous factor in aviation safety management that can lead to serious accidents. In this study, an anomaly score is calculated by classifying tracks through clustering and measuring each track's distance from its cluster center. The study extracted tracks within 100 km of the airport from ADS-B track data received over one year. The tracks were vectorized using linear interpolation of their 3D coordinates (latitude, longitude, and altitude). PCA was used to reduce the dimension to the axes that represent more than 90% of the overall data distribution, and k-means clustering, hierarchical clustering, and PAM were then applied. The number of clusters was selected using the silhouette measure. We compare the number of clusters obtained with each clustering technique and evaluate the clustering results through the silhouette measure.
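The idea of scoring a track by its distance from the cluster center can be sketched as follows. This is only an illustration under stated assumptions: it uses plain k-means (Lloyd iterations) on made-up 2D feature vectors standing in for the PCA-reduced tracks, and the names `kmeans`, `anomaly_scores`, and `tracks` are hypothetical, not the authors' code or data.

```python
import math


def nearest_center(point, centers):
    """Return (index, distance) of the cluster center closest to point."""
    dists = [math.dist(point, c) for c in centers]
    idx = min(range(len(centers)), key=dists.__getitem__)
    return idx, dists[idx]


def kmeans(points, centers, iterations=20):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            idx, _ = nearest_center(p, centers)
            groups[idx].append(p)
        # Recompute each center as the mean of its group; keep the old
        # center if a group ends up empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def anomaly_scores(points, centers):
    """Anomaly score = distance from a point to its nearest cluster center."""
    return [nearest_center(p, centers)[1] for p in points]


# Made-up feature vectors: two tight groups plus one deviating track (index 6).
tracks = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
          (2.5, 9.0)]
centers = kmeans(tracks, centers=[tracks[0], tracks[3]])
scores = anomaly_scores(tracks, centers)
outlier = max(range(len(tracks)), key=scores.__getitem__)
print(outlier, round(scores[outlier], 2))
```

The deviating track receives the largest score because it lies farthest from the center of the cluster it was assigned to.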

Symbolic Cluster Analysis for Distribution Valued Dissimilarity

  • Matsui, Yusuke;Minami, Hiroyuki;Misuta, Masahiro
    • Communications for Statistical Applications and Methods / v.21 no.3 / pp.225-234 / 2014
  • We propose a novel hierarchical clustering for distribution-valued dissimilarities. Analysis of large and complex data has attracted significant interest. Symbolic Data Analysis (SDA), proposed by Diday in the 1980s, provides a new framework for statistical analysis. In SDA, we analyze an object with internal variation, such as an interval, a histogram, or a distribution, called a symbolic object. In this study, we focus on cluster analysis for distribution-valued dissimilarities, one kind of symbolic object. Hierarchical clustering generally has two steps: a find-out step and an update step. In the find-out step, we find the nearest pair of clusters. We extend this step to distribution-valued dissimilarities by introducing a measure on their order relations. In the update step, dissimilarities between clusters are redefined as a mixture of distributions with a mixing ratio. We show an actual example of the proposed method and a simulation study.
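The two-step loop described in this abstract can be sketched for ordinary scalar dissimilarities. This is an assumption-laden analogue, not the proposed method: the dissimilarities here are plain numbers rather than distributions, and the update step mixes the two old values with a ratio given by cluster size (which amounts to average linkage). The function name and toy data are hypothetical.

```python
from itertools import combinations


def agglomerate(labels, dissim, n_clusters):
    """Merge clusters until n_clusters remain.

    dissim maps a pair (a, b) of item labels to a scalar dissimilarity.
    """
    clusters = [(label,) for label in labels]  # each cluster is a tuple of items
    d = {frozenset([(a,), (b,)]): v for (a, b), v in dissim.items()}
    while len(clusters) > n_clusters:
        # Find-out step: pick the nearest pair of clusters.
        p, q = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merged = p + q
        # Update step: mix the old dissimilarities, weighted by cluster size.
        ratio = len(p) / len(merged)
        others = [r for r in clusters if r not in (p, q)]
        for r in others:
            d[frozenset([merged, r])] = (
                ratio * d[frozenset([p, r])] + (1 - ratio) * d[frozenset([q, r])]
            )
        clusters = others + [merged]
    return sorted(tuple(sorted(c)) for c in clusters)


toy_dissim = {("a", "b"): 1, ("a", "c"): 6, ("a", "d"): 7,
              ("b", "c"): 5, ("b", "d"): 8, ("c", "d"): 2}
result = agglomerate(["a", "b", "c", "d"], toy_dissim, n_clusters=2)
print(result)
```

In the distribution-valued setting of the paper, both the comparison in the find-out step and the mixing in the update step operate on distributions instead of the scalars used here.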

Efficient Clustering and Data Transmission for Service-Centric Data Gathering in Surveillance Sensor Networks (감시정찰 센서 네트워크에서 서비스 기반 정보수집을 위한 효율적인 클러스터링 및 데이터 전송 기법)

  • Song, Woon-Seop;Jung, Woo-Sung;Seo, Youn;Ko, Young-Bae
    • Journal of the Korea Institute of Military Science and Technology / v.16 no.3 / pp.304-313 / 2013
  • Wireless sensor networks, especially those supporting surveillance services, are one of the core components of network-centric warfare (NCW), which is a key factor of victory in future battlefields. Such a tactical surveillance sensor network must be designed not only for energy efficiency but also for the real-time requirements of emergency data transmission to a control center. This paper proposes efficient clustering-based methods that support mobile sinks so that the network lifetime can be extended while emergency data are still served. We analyze the performance of the proposed scheme and compare it with existing schemes through simulation with Qualnet 5.0.

A Study of HME Model in Time-Course Microarray Data

  • Myoung, Sung-Min;Kim, Dong-Geon;Jo, Jin-Nam
    • The Korean Journal of Applied Statistics / v.25 no.3 / pp.415-422 / 2012
  • For statistical microarray data analysis, clustering analysis is a useful exploratory technique and offers the promise of simultaneously studying the variation of many genes. However, most proposed clustering methods do not rigorously handle time-course microarray data or fit a time covariate; a statistical method is therefore needed that forms clusters and represents the linear trend of each cluster for each gene. In this research, we developed a modified hierarchical mixture of experts (HME) model that clusters the data and characterizes each cluster using a linear mixed effect model. The feasibility of the proposed method is illustrated by an application to the human fibroblast data of Iyer et al. (1999).

Mitigating the ICA Attack against Rotation-Based Transformation for Privacy Preserving Clustering

  • Mohaisen, Abedelaziz;Hong, Do-Won
    • ETRI Journal / v.30 no.6 / pp.868-870 / 2008
  • The rotation-based transformation (RBT) for privacy preserving data mining is vulnerable to the independent component analysis (ICA) attack. This paper introduces a modified multiple-rotation-based transformation technique for special mining applications, mitigating the ICA attack while maintaining the advantages of the RBT.
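As background for why a rotation can be used ahead of clustering, the sketch below checks that a plain 2D rotation leaves all pairwise Euclidean distances unchanged, so distance-based clustering sees the same geometry on the masked data. It illustrates only the basic rotation idea under that assumption; it does not implement the modified multiple-rotation technique or the ICA attack discussed in the paper, and all names and values are hypothetical.

```python
import math


def rotate(points, theta):
    """Rotate 2D points counter-clockwise by theta radians about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def pairwise_distances(points):
    """Euclidean distances between all unordered pairs, in a fixed order."""
    return [math.dist(p, q)
            for i, p in enumerate(points)
            for q in points[i + 1:]]


records = [(1.0, 2.0), (4.0, 6.0), (-3.0, 0.5)]   # made-up attribute pairs
masked = rotate(records, theta=math.radians(37))   # arbitrary secret angle

same_geometry = all(
    math.isclose(a, b, rel_tol=1e-9)
    for a, b in zip(pairwise_distances(records), pairwise_distances(masked))
)
print(same_geometry)
```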

Classification of Seoul Metro Stations Based on Boarding/Alighting Patterns Using Machine Learning Clustering (기계학습 클러스터링을 이용한 승하차 패턴에 따른 서울시 지하철역 분류)

  • Min, Meekyung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.18 no.4 / pp.13-18 / 2018
  • In this study, we classify Seoul metro stations according to boarding and alighting patterns using machine learning techniques. The target data are the hourly numbers of boarding and alighting passengers for each day at 233 subway stations from 2008 to 2017, provided by the public data portal. A Gaussian mixture model (GMM) and K-means clustering are used to classify the subway stations. The distributions of passengers' boarding and alighting times are modeled with the GMM, and the K-means clustering algorithm is then applied for unsupervised learning on the data obtained from the GMM modeling. As a result, Seoul metro stations are classified into four groups according to their boarding and alighting patterns. The results of this study can serve as basic knowledge for analyzing the characteristics of Seoul subway stations from economic, social, and cultural perspectives. The method can also be applied to public data and big data in other areas that require clustering.

Improved Parameter Inference for Low-Cost 3D LiDAR-Based Object Detection on Clustering Algorithms (클러스터링 알고리즘에서 저비용 3D LiDAR 기반 객체 감지를 위한 향상된 파라미터 추론)

  • Kim, Da-hyeon;Ahn, Jun-ho
    • Journal of Internet Computing and Services / v.23 no.6 / pp.71-78 / 2022
  • This paper proposes an algorithm for 3D object detection that processes the point cloud data of a 3D LiDAR. Unlike 2D LiDAR data, 3D LiDAR data are vast and difficult to process in three dimensions. This paper introduces various studies based on 3D LiDAR and describes 3D LiDAR data processing. We propose a method of processing 3D LiDAR data using clustering techniques for object detection, and design an algorithm that fuses the result with cameras for clear and accurate 3D object detection. In addition, we study models for clustering 3D LiDAR data and the hyperparameter values appropriate to each model. When clustering 3D LiDAR data, the DBSCAN algorithm showed the most accurate results, and the hyperparameter values of DBSCAN were compared and analyzed. This study will be helpful for future object detection research using 3D LiDAR.
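To make the role of the DBSCAN hyperparameters concrete, here is a minimal pure-Python sketch of the textbook DBSCAN procedure on a made-up 3D point cloud. It is not the authors' pipeline: the function name, the points, and the chosen values of `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a core point, counting the point itself) are assumptions for illustration.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reachable)
    return labels


# Made-up point cloud: two small objects and one stray return.
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.1, 0.1, 0.0),
         (5.0, 5.0, 1.0), (5.1, 5.0, 1.0), (5.0, 5.1, 1.0),
         (10.0, 0.0, 3.0)]
labels = dbscan(cloud, eps=0.3, min_pts=3)
print(labels)
```

Raising `eps` merges nearby objects into one cluster, while raising `min_pts` turns sparse objects into noise, which is why the two values need to be tuned to the sensor's point density.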