• Title/Summary/Keyword: parallel clustering

Search Result 104, Processing Time 0.032 seconds

Construction and Performance Test of a Supercomputing PC System using PC-clustering and Parallel Virtual Machine (PC-Clustering과 병렬가상장치에 의한 수치계산용 슈퍼컴퓨팅 PC 시스템 구축과 성능 테스트)

  • Hong, Woo-Pyo;Kim, Jong-Jae;Oh, Kwang-Sik
    • Journal of the Korean Data and Information Science Society
    • /
    • v.10 no.2
    • /
    • pp.473-483
    • /
    • 1999
  • We introduce a way to construct a supercomputing capable system with some networked PCs, running the Linux operating system and computing power comparable with expensive commercial workstations, and with the Parallel Virtual Machine (PVM) software which enables one to control the total CPUs and memories of the networked PCs. By benchmarking the system using a PVM parallel program, we find that the system's parallel efficiency is close to 90 %.

  • PDF

Parallel Algorithm For Level Clustering (집단화를 위한 병렬 알고리즘의 구현)

  • Bae, Yong-Geun
    • The Transactions of the Korea Information Processing Society
    • /
    • v.2 no.2
    • /
    • pp.148-155
    • /
    • 1995
  • When we analize many amount of patterns, it is necessary for these patterns are to be clustering into several groups according to a certain evaluation function. This process, in case that there are lots of input patterns, needs a considerable amount of computations and is reqired parallel algorithm for these. To solve this problem, this paper propose parallel clustering algorithm which parallelized k-means algorithm and implemented it under the MIMD parallel computer based message passing. The result is through the experiment and performance analysis, that this parallel algorithm is appropriate in case these are many input patterns.

  • PDF

Runtime Prediction Based on Workload-Aware Clustering (병렬 프로그램 로그 군집화 기반 작업 실행 시간 예측모형 연구)

  • Kim, Eunhye;Park, Ju-Won
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.38 no.3
    • /
    • pp.56-63
    • /
    • 2015
  • Several fields of science have demanded large-scale workflow support, which requires thousands of CPU cores or more. In order to support such large-scale scientific workflows, high capacity parallel systems such as supercomputers are widely used. In order to increase the utilization of these systems, most schedulers use backfilling policy: Small jobs are moved ahead to fill in holes in the schedule when large jobs do not delay. Since an estimate of the runtime is necessary for backfilling, most parallel systems use user's estimated runtime. However, it is found to be extremely inaccurate because users overestimate their jobs. Therefore, in this paper, we propose a novel system for the runtime prediction based on workload-aware clustering with the goal of improving prediction performance. The proposed method for runtime prediction of parallel applications consists of three main phases. First, a feature selection based on factor analysis is performed to identify important input features. Then, it performs a clustering analysis of history data based on self-organizing map which is followed by hierarchical clustering for finding the clustering boundaries from the weight vectors. Finally, prediction models are constructed using support vector regression with the clustered workload data. Multiple prediction models for each clustered data pattern can reduce the error rate compared with a single model for the whole data pattern. In the experiments, we use workload logs on parallel systems (i.e., iPSC, LANL-CM5, SDSC-Par95, SDSC-Par96, and CTC-SP2) to evaluate the effectiveness of our approach. Comparing with other techniques, experimental results show that the proposed method improves the accuracy up to 69.08%.

Performance Analysis of Hierarchical Routing Protocols for Sensor Network (센서 네트워크를 위한 계층적 라우팅 프로토콜의 성능 분석)

  • Seo, Byung-Suk;Yoon, Sang-Hyun;Kim, Jong-Hyun
    • Journal of the Korea Society for Simulation
    • /
    • v.21 no.4
    • /
    • pp.47-56
    • /
    • 2012
  • In this study, we use a parallel simulator PASENS(Parallel SEnsor Network Simulator) to predict power consumption and data reception rate of the hierarchical routing protocols for sensor network - LEACH (Low-Energy Adaptive Clustering Hierarchy), TL-LEACH (Two Level Low-Energy Adaptive Clustering Hierarchy), M-LEACH (Multi hop Low-Energy Adaptive Clustering Hierarchy) and LEACH-C (LEACH-Centralized). According to simulation results, M-LEACH routing protocol shows the highest data reception rate for the wider area, since more sensor nodes are involved in the data transmission. And LEACH-C routing protocol, where the sink node considers the entire node's residual energy and location to determine the cluster head, results in the most efficient energy consumption and in the narrow area needed long life of sensor network.

A Linear Clustering Method for the Scheduling of the Directed Acyclic Graph Model with Multiprocessors Using Genetic Algorithm (다중프로세서를 갖는 유방향무환그래프 모델의 스케쥴링을 위한 유전알고리즘을 이용한 선형 클러스터링 해법)

  • Sung, Ki-Seok;Park, Jee-Hyuk
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.24 no.4
    • /
    • pp.591-600
    • /
    • 1998
  • The scheduling of parallel computing systems consists of two procedures, the assignment of tasks to each available processor and the ordering of tasks in each processor. The assignment procedure is same with a clustering. The clustering is classified into linear or nonlinear according to the precedence relationship of the tasks in each cluster. The parallel computing system can be modeled with a Directed Acyclic Graph(DAG). By the granularity theory, DAG is categorized into Coarse Grain Type(CDAG) and Fine Grain Type(FDAG). We suggest the linear clustering method for the scheduling of CDAG using the genetic algorithm. The method utilizes a properly that the optimal schedule of a CDAG is one of linear clustering. We present the computational comparisons between the suggested method for CDAG and an existing method for the general DAG including CDAG and FDAG.

  • PDF

A novel clustering method for examining and analyzing the intellectual structure of a scholarly field (지적 구조 분석을 위한 새로운 클러스터링 기법에 관한 연구)

  • Lee, Jae-Yun
    • Journal of the Korean Society for information Management
    • /
    • v.23 no.4 s.62
    • /
    • pp.215-231
    • /
    • 2006
  • Recently there are many bibliometric studies attempting to utilize Pathfinder networks(PFNets) for examining and analyzing the intellectual structure of a scholarly field. Pathfinder network scaling has many advantages over traditional multidimensional scaling, including its ability to represent local details as well as global intellectual structure. However there are some limitations in PFNets including very high time complexity. And Pathfinder network scaling cannot be combined with cluster analysis, which has been combined well with traditional multidimensional scaling method. In this paper, a new method named as Parallel Nearest Neighbor Clustering (PNNC) are proposed for complementing those weak points of PFNets. Comparing the clustering performance with traditional hierarchical agglomerative clustering methods shows that PNNC is not only a complement to PFNets but also a fast and powerful clustering method for organizing informations.

Parallel Processing of k-Means Clustering Algorithm for Unsupervised Classification of Large Satellite Images: A Hybrid Method Using Multicores and a PC-Cluster (대용량 위성영상의 무감독 분류를 위한 k-Means Clustering 알고리즘의 병렬처리: 다중코어와 PC-Cluster를 이용한 Hybrid 방식)

  • Han, Soohee;Song, Jeong Heon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.37 no.6
    • /
    • pp.445-452
    • /
    • 2019
  • In this study, parallel processing codes of k-means clustering algorithm were developed and implemented in a PC-cluster for unsupervised classification of large satellite images. We implemented intra-node code using multicores of CPU (Central Processing Unit) based on OpenMP (Open Multi-Processing), inter-nodes code using a PC-cluster based on message passing interface, and hybrid code using both. The PC-cluster consists of one master node and eight slave nodes, and each node is equipped with eight multicores. Two operating systems, Microsoft Windows and Canonical Ubuntu, were installed in the PC-cluster in turn and tested to compare parallel processing performance. Two multispectral satellite images were tested, which are a medium-capacity LANDSAT 8 OLI (Operational Land Imager) image and a high-capacity Sentinel 2A image. To evaluate the performance of parallel processing, speedup and efficiency were measured. Overall, the speedup was over N / 2 and the efficiency was over 0.5. From the comparison of the two operating systems, the Ubuntu system showed two to three times faster performance. To confirm that the results of the sequential and parallel processing coincide with the other, the center value of each band and the number of classified pixels were compared, and result images were examined by pixel to pixel comparison. It was found that care should be taken to avoid false sharing of OpenMP in intra-node implementation. To process large satellite images in a PC-cluster, code and hardware should be designed to reduce performance degradation caused by file I / O. Also, it was found that performance can differ depending on the operating system installed in a PC-cluster.

Parallel Processing of K-means Clustering Algorithm for Unsupervised Classification of Large Satellite Imagery (대용량 위성영상의 무감독 분류를 위한 K-means 군집화 알고리즘의 병렬처리)

  • Han, Soohee
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.35 no.3
    • /
    • pp.187-194
    • /
    • 2017
  • The present study introduces a method to parallelize k-means clustering algorithm for fast unsupervised classification of large satellite imagery. Known as a representative algorithm for unsupervised classification, k-means clustering is usually applied to a preprocessing step before supervised classification, but can show the evident advantages of parallel processing due to its high computational intensity and less human intervention. Parallel processing codes are developed by using multi-threading based on OpenMP. In experiments, a PC of 8 multi-core integrated CPU is involved. A 7 band and 30m resolution image from LANDSAT 8 OLI and a 8 band and 10m resolution image from Sentinel-2A are tested. Parallel processing has shown 6 time faster speed than sequential processing when using 10 classes. To check the consistency of parallel and sequential processing, centers, numbers of classified pixels of classes, classified images are mutually compared, resulting in the same results. The present study is meaningful because it has proved that performance of large satellite processing can be significantly improved by using parallel processing. And it is also revealed that it easy to implement parallel processing by using multi-threading based on OpenMP but it should be carefully designed to control the occurrence of false sharing.

Parallel Clustering Algorithm for Balancing Problem of a Two-sided Assembly Line (양측 조립라인 균형문제의 병렬군집 알고리즘)

  • Lee, Sang-Un
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.22 no.1
    • /
    • pp.95-101
    • /
    • 2022
  • The two-sided assembly line balancing problem is a kind of NP-hard problem. This problem primarily can be solved metaheuristic method. This paper suggests parallel clustering algorithm that each left and right-sided workstation assigned by operations with Ti = c* ± α < c, c* = ${\lceil}$W/m*${\rceil}$ such that M* = ${\lceil}$W/c${\rceil}$ for precedence diagram of two-sided assembly line with total complete time W and cycle time c. This clustering performs forward direction from left to right or reverse direction from right to left. For the 4 experimental data with 17 cycle times, the proposed algorithm can be obtain the minimum number of workstations m* and can be reduce the cycle time to Tmax < c then metaheuristic methods. Also, proposed clustering algorithm maximizes the line efficiency and minimizes the variance between workers operation times.

Parallel Structure Modeling of Nonlinear Process Using Clustering Method (클러스터링 기법을 이용한 비선형 공정의 병렬구조 모델링)

  • 박춘성;최재호;오성권;안태천
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 1997.10a
    • /
    • pp.383-386
    • /
    • 1997
  • In this paper, We proposed a parallel structure of the Neural Network model to nonlinear complex system. Neural Network was used as basic model which has learning ability and high tolerence level. This paper, we used Neural Network which has BP(Error Back Propagation Algorithm) model. But it sometimes has difficulty to append characteristic of input data to nonlinear system. So that, I used HCM(hard c-Means) method of clustering technique to append property of input data. Clustering Algorithms are used extensively not only to organized categorize data, but are also useful for data compression and model construction. Gas furance, a sewage treatment process are used to evaluate the performance of the proposed model and then obtained higher accuracy than other previous medels.

  • PDF