• Title/Summary/Keyword: Distributed Clustering

Decombined Distributed Parallel VQ Codebook Generation Based on MapReduce (맵리듀스를 사용한 디컴바인드 분산 VQ 코드북 생성 방법)

  • Lee, Hyunjin
    • Journal of Digital Contents Society / v.15 no.3 / pp.365-371 / 2014
  • In the era of big data, algorithms designed for existing IT environments cannot be applied directly to a distributed architecture such as Hadoop. New distributed algorithms built on a distributed framework such as MapReduce are therefore needed. Lloyd's algorithm, commonly used for vector quantization, has recently been implemented with MapReduce. In this paper, we propose a decombined distributed VQ codebook generation algorithm, based on a distributed VQ codebook generation algorithm using MapReduce, to obtain results faster. Applying the proposed algorithm to big data showed higher performance than the conventional method.
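
As a rough illustration of how one Lloyd iteration splits into map and reduce steps, the sketch below simulates both phases in a single Python process. It is not the decombined algorithm from the paper. The function names (`nearest`, `map_phase`, `reduce_phase`, `lloyd_iteration`) are hypothetical, and keeping the old codeword for an empty cell is an assumption made only to keep the example well defined.

```python
def nearest(point, codebook):
    """Index of the codeword closest to point (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, codebook[i])))


def map_phase(points, codebook):
    """Map: emit (codeword index, (vector, 1)) for every input vector."""
    for point in points:
        yield nearest(point, codebook), (point, 1)


def reduce_phase(pairs, codebook):
    """Reduce: sum vectors and counts per index, then average into new codewords."""
    sums, counts = {}, {}
    for index, (vector, count) in pairs:
        if index in sums:
            sums[index] = [s + v for s, v in zip(sums[index], vector)]
            counts[index] += count
        else:
            sums[index] = list(vector)
            counts[index] = count
    # Assumption: a codeword that attracted no vectors is left unchanged.
    return [tuple(s / counts[i] for s in sums[i]) if i in sums else tuple(codebook[i])
            for i in range(len(codebook))]


def lloyd_iteration(points, codebook):
    """One full Lloyd update expressed as map followed by reduce."""
    return reduce_phase(map_phase(points, codebook), codebook)
```

On a real cluster the map output would be shuffled by key across machines; here the generator simply feeds the reducer directly.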

Variable Clustering Management for Multiple Streaming of Distributed Mobile Service (분산 모바일 서비스의 다중 스트리밍을 위한 가변 클러스터링 관리)

  • Jeong, Taeg-Won;Lee, Chong-Deuk
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.4 / pp.485-492 / 2009
  • In the mobile service environment, patterns generated by temporal synchronization are streamed with different instance values. This paper proposes a variable clustering management method that dynamically manages multiple data streams to support flexible clustering. Unlike conventional streaming methods, the method manages synchronization effectively in a data streaming environment and manages clustered streaming based on a structural presentation level and a fitness presentation level. At the structural presentation level, the stream structure is presented using level matching and accumulation matching, and clustering is managed through dynamic segments and static segments. The performance of the proposed method was evaluated by simulation against the k-means method, the C/S server method, and the CDN method. The test results showed that the proposed method performs better than the other methods.

An Efficient Clustering Method based on Multi Centroid Set using MapReduce (맵리듀스를 이용한 다중 중심점 집합 기반의 효율적인 클러스터링 방법)

  • Kang, Sungmin;Lee, Seokjoo;Min, Jun-ki
    • KIISE Transactions on Computing Practices / v.21 no.7 / pp.494-499 / 2015
  • As the size of data increases, it becomes important to identify properties by analyzing big data. In this paper, we propose an efficient k-Means based clustering technique, called MCSK-Means (Multi Centroid Set k-Means), using the distributed parallel processing framework MapReduce. A problem with the k-Means algorithm is that the accuracy of clustering depends on randomly created initial centroids. To alleviate this problem, the MCSK-Means algorithm reduces the dependency on initial centroids by using multiple sets, each consisting of k centroids. In addition, we apply the agglomerative hierarchical clustering technique to create k centroids from the centroids in the m centroid sets that result from the clustering phase. We implemented MCSK-Means on the MapReduce framework to process big data efficiently.
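
The final merging step described above, reducing the centroids of m centroid sets to k centroids by agglomerative hierarchical clustering, can be sketched as follows. The abstract does not state which linkage the authors use, so centroid linkage with count-weighted means is an assumption here, and `merge_centroids` is a hypothetical name.

```python
def merge_centroids(centroids, k):
    """Agglomeratively merge centroids until only k remain.

    Assumption: centroid linkage; a merged centroid is the mean of its
    members, weighted by how many original centroids each side contains.
    """
    clusters = [(tuple(c), 1) for c in centroids]  # (mean vector, member count)
    while len(clusters) > k:
        best = None  # (squared distance, i, j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum((a - b) ** 2 for a, b in zip(clusters[i][0], clusters[j][0]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        merged = tuple((a * ni + b * nj) / (ni + nj) for a, b in zip(ci, cj))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append((merged, ni + nj))
    return [mean for mean, _ in clusters]
```

For example, passing the 2·k centroids produced by two independent k-means runs with the same k would return k merged centroids.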

Coordinated Cognitive Tethering in Dense Wireless Areas

  • Tabrizi, Haleh;Farhadi, Golnaz;Cioffi, John Matthew;Aldabbagh, Ghadah
    • ETRI Journal / v.38 no.2 / pp.314-325 / 2016
  • This paper examines the resource gain that can be obtained from the creation of clusters of nodes in densely populated areas. A single node within each such cluster is designated as a "hotspot"; all other nodes then communicate with a destination node, such as a base station, through such hotspots. We propose a semi-distributed algorithm, referred to as coordinated cognitive tethering (CCT), which clusters all nodes and coordinates hotspots to tether over locally available white spaces. CCT performs the following steps: (a) groups nodes based on a modified k-means clustering algorithm; (b) assigns white-space spectrum to each cluster based on a distributed graph-coloring approach to maximize spectrum reuse; and (c) allocates physical-layer resources to individual users based on local channel information. Unlike small cells (for example, femtocells and WiFi), this approach does not require any additions to existing infrastructure. Simulation results show that, in addition to providing parallel service to more users than conventional direct communication in cellular networks, CCT can increase the average battery life of devices by 30%.
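
Step (b) treats spectrum assignment as graph coloring: clusters that interfere with each other must not share a channel. The paper uses a distributed approach; the sketch below substitutes a plain centralized greedy coloring purely to show the idea. The `interference` map, the integer channel numbering, and the function name are all hypothetical.

```python
def greedy_channel_assignment(interference):
    """Assign the lowest free channel to each cluster, highest degree first.

    interference: dict mapping a cluster id to the set of cluster ids it
    interferes with (assumed symmetric). Returns dict cluster id -> channel.
    """
    assignment = {}
    order = sorted(interference, key=lambda c: len(interference[c]), reverse=True)
    for cluster in order:
        used = {assignment[n] for n in interference[cluster] if n in assignment}
        channel = 0
        while channel in used:  # pick the smallest channel no neighbor uses
            channel += 1
        assignment[cluster] = channel
    return assignment
```

Clusters that do not interfere can end up on the same channel, which is the spectrum reuse the abstract refers to.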

Improving Data Accuracy Using Proactive Correlated Fuzzy System in Wireless Sensor Networks

  • Barakkath Nisha, U;Uma Maheswari, N;Venkatesh, R;Yasir Abdullah, R
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.9 / pp.3515-3538 / 2015
  • Data accuracy can be increased by detecting and removing the incorrect data generated in wireless sensor networks (WSNs). Increasing data accuracy can also increase network lifetime. Network lifetime, or operational time, is the time during which a WSN is able to fulfill its tasks using microcontrollers with on-chip memory and radio transceivers; distributed sensor nodes send summaries of their data to their cluster heads, which gradually reduces energy consumption. This paper proposes a powerful algorithm based on a proactive fuzzy system that combines fuzzy logic with comparative correlation techniques to ensure high data accuracy by detecting incorrect data in distributed wireless sensor networks. The proposed system is implemented in two phases: the first phase partitions the input space using robust fuzzy c-means clustering, and the second phase detects incorrect data and removes it completely. Experimental results show that the combined correlated fuzzy system (CCFS) detects faulty readings with greater accuracy (99.21%) than the existing one (98.33%), along with a low false alarm rate.

Partial Discharge Diagnosis of Interface Defect by the Distribution Statistical Analysis (분포 통계 해석에 의한 계면 결함 부분방전 진단)

  • Cho, Kyung-Soon;Lee, Kang-Won;Kim, Won-Jong;Hong, Jin-Woong;Shin, Jong-Yeol
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers / v.21 no.4 / pp.348-353 / 2008
  • Most high-voltage insulation systems, such as power cable joints with hetero-interfaces, are composed of more than two different insulators to improve insulating performance. Partial discharge (PD) at these hetero-interfaces is expected to affect the total insulation performance, so it is important to study the electrical properties of these interfaces. This study describes the influence of copper and semiconductive substance defects on the $\Phi$-q-n distribution at the interface of model cable joints in order to classify the PD source. PD was sequentially detected for 600 cycles of the applied voltage. K-means cluster analysis was used to investigate the $\Phi$-q-n distribution. The skewness-kurtosis (Sk-Ku) plot derived from the K-means clustering results was defined to quantify the cluster distribution and classify distribution patterns. The Sk-Ku plot places skewness along the abscissa and kurtosis along the ordinate, which indicate the asymmetry and the sharpness of the distribution, respectively. The Sk-Ku plot confirmed that the data were distributed in the 1st, 2nd, and 3rd quadrants for the copper foreign-substance defect, but only in the 2nd quadrant for the semiconductive foreign-substance defect.
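
To make the Sk-Ku coordinates concrete, the sketch below computes skewness and kurtosis for one cluster of values and reports the quadrant of the resulting point. The abstract does not give the exact definitions, so the population-moment formulas and the use of excess kurtosis (which lets the ordinate go negative) are assumptions, as are the function names.

```python
def skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments.

    Assumptions: skewness = m3 / m2**1.5 and kurtosis = m4 / m2**2 - 3,
    where mk is the k-th central moment. Requires non-constant input.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def quadrant(skewness, kurtosis):
    """Quadrant of (skewness, kurtosis) with skewness on the abscissa.

    Returns 0 when the point lies on an axis.
    """
    if skewness > 0 and kurtosis > 0:
        return 1
    if skewness < 0 and kurtosis > 0:
        return 2
    if skewness < 0 and kurtosis < 0:
        return 3
    if skewness > 0 and kurtosis < 0:
        return 4
    return 0
```

A cluster with a long right tail and a sharp peak lands in the 1st quadrant; a left-tailed, sharply peaked one lands in the 2nd.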

Energy-Aware Self-Stabilizing Distributed Clustering Protocol for Ad Hoc Networks: the case of WSNs

  • Ba, Mandicou;Flauzac, Olivier;Haggar, Bachar Salim;Makhloufi, Rafik;Nolot, Florent;Niang, Ibrahima
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.11 / pp.2577-2596 / 2013
  • In this paper, we present an Energy-Aware Self-Stabilizing Distributed Clustering protocol based on a message-passing model for ad hoc networks. The protocol does not require any initialization: starting from an arbitrary configuration, the network converges to a stable state in finite time. Our contribution is twofold. First, we give a formal proof that stabilization is reached after at most n+2 transitions and requires at most $n \times \log(2n+\kappa+3)$ memory space, where n is the number of network nodes and $\kappa$ is the maximum number of hops in the clusters. Furthermore, using the OMNeT++ simulator, we evaluate our approach. Second, we propose an adaptation of our solution to Wireless Sensor Networks (WSNs) with energy constraints. We notably show that our protocol can easily be used to construct clusters according to multiple criteria in the election of cluster-heads (CHs), such as nodes' identity, residual energy, or degree. We compare the different election metrics by evaluating their communication cost and energy consumption. Simulation results show that, in terms of the number of exchanged messages and energy consumption, it is better to use the Highest-ID metric for electing CHs.

Performance of Distributed Clustering Protocol in Heterogeneous Wireless Sensor Networks (불균일 무선 센서네트워크에서의 분산 클러스터링 프로토콜 성능)

  • Nguyen, Quoc Kien;Jeon, Taehyun
    • Journal of Satellite, Information and Communications / v.11 no.3 / pp.123-126 / 2016
  • Energy efficiency in heterogeneous networks is considered one of the main issues when deploying a wireless sensor network. In a heterogeneous network, the random distribution of initial energy across nodes can lead to network instability. Therefore, a reasonable policy must be established to maintain fairness in energy consumption and extend the working time of each node in the network. In this paper, we evaluate the performance of the distributed clustering protocol (DCP) in a heterogeneous network under different scenarios. Simulation results are compared with those of the LEACH protocol in a heterogeneous network. In addition, the performance of the system in a heterogeneous network is compared with that in a homogeneous network to illustrate the effect of imbalance in initial energy on the lifetime of each node. The results show that DCP performs better than LEACH in both heterogeneous and homogeneous networks.

Term Clustering and Duplicate Distribution for Efficient Parallel Information Retrieval (효율적인 병렬정보검색을 위한 색인어 군집화 및 분산저장 기법)

  • 강재호;양재완;정성원;류광렬;권혁철;정상화
    • Journal of KIISE:Software and Applications / v.30 no.1_2 / pp.129-139 / 2003
  • The PC cluster architecture is considered a cost-effective alternative to existing supercomputers for realizing a high-performance information retrieval (IR) system. To implement an efficient IR system on a PC cluster, it is essential to achieve maximum parallelism by distributing the data to the local hard disks of the PCs in such a way that the disk I/O and the subsequent computation are spread as evenly as possible across all the PCs. If the terms in the inverted index file can be classified into closely related clusters, parallelism can be maximized by distributing them to the PCs in an interleaved manner. One goal of this research is to develop methods for automatically clustering terms based on the likelihood of their co-occurrence in the same query. We also propose a method for duplicate distribution of inverted index records among the PCs to achieve fault tolerance as well as dynamic load balancing. Experiments with a large corpus demonstrated the efficiency and effectiveness of our method.
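
The interleaved placement idea can be illustrated with a small sketch: terms from the same cluster are dealt round-robin to consecutive PCs, so a query that touches co-occurring terms hits several disks at once, and each index record gets a second copy on another PC. The abstract does not describe where duplicates go; placing the replica on the next PC is an assumption, and `distribute_terms` is a hypothetical name.

```python
def distribute_terms(term_clusters, num_pcs):
    """Map each term to a (primary PC, replica PC) pair.

    term_clusters: list of lists of terms, each inner list one cluster.
    Terms are dealt round-robin so neighbors in a cluster land on
    different PCs. Assumption: the replica lives on the next PC.
    Requires num_pcs >= 2 so primary and replica differ.
    """
    if num_pcs < 2:
        raise ValueError("need at least two PCs to hold a replica")
    placement = {}
    slot = 0
    for cluster in term_clusters:
        for term in cluster:
            primary = slot % num_pcs
            replica = (primary + 1) % num_pcs
            placement[term] = (primary, replica)
            slot += 1
    return placement
```

With a replica available, a query can be served from whichever of the two PCs is less loaded, which is one simple way to read the dynamic load balancing mentioned above.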