• Title/Summary/Keyword: Optimizing the Number of Clusters

Search Result 7, Processing Time 0.02 seconds

A Study on Optimizing the Number of Clusters using External Cluster Relationship Criterion (외부 군집 연관 기준 정보를 이용한 군집수 최적화)

  • Lee, Hyun-Jin;Jee, Tae-Chang
    • Journal of Digital Contents Society
    • /
    • v.12 no.3
    • /
    • pp.339-345
    • /
    • 2011
  • The k-means has been one of the popular, simple and faster clustering algorithms, but the right value of k is unknown. The value of k (the number of clusters) is a very important element because the result of clustering is different depending on it. In this paper, we present a novel algorithm based on an external cluster relationship criterion which is an evaluation metric of clustering result to determine the number of clusters dynamically. Experimental results show that our algorithm is superior to other methods in terms of the accuracy of the number of clusters.

Advance Neuro-Fuzzy Modeling Using a New Clustering Algorithm (새로운 클러스터링 알고리듬을 적용한 향상된 뉴로-퍼지 모델링)

  • 김승석;김성수;유정웅
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.53 no.7
    • /
    • pp.536-543
    • /
    • 2004
  • In this paper, we proposed a new method of modeling a neuro-fuzzy system using a hybrid clustering algorithm. The initial parameters and the number of clusters of the proposed system are optimally chosen simultaneously with respect to the process of regression, which is a unique characteristics of the proposed system. The proposed algorithm presented in this work improves the overall performance of the proposed a neuro-fuzzy system by choosing a proper number of clusters adaptively according the characteristics of given data. The process of clustering is performed by deciding on the number of classes, which yields the property of convergence of the system. In experiments, the superiority of the proposed neuro-fuzzy system is demonstrated, especially the process of optimizing parameters and clustering of learning speed.

Determining the number of Clusters in On-Line Document Clustering Algorithm (온라인 문서 군집화에서 군집 수 결정 방법)

  • Jee, Tae-Chang;Lee, Hyun-Jin;Lee, Yill-Byung
    • The KIPS Transactions:PartB
    • /
    • v.14B no.7
    • /
    • pp.513-522
    • /
    • 2007
  • Clustering is to divide given data and automatically find out the hidden meanings in the data. It analyzes data, which are difficult for people to check in detail, and then, makes several clusters consisting of data with similar characteristics. On-Line Document Clustering System, which makes a group of similar documents by use of results of the search engine, is aimed to increase the convenience of information retrieval area. Document clustering is automatically done without human interference, and the number of clusters, which affect the result of clustering, should be decided automatically too. Also, the one of the characteristics of an on-line system is guarantying fast response time. This paper proposed a method of determining the number of clusters automatically by geometrical information. The proposed method composed of two stages. In the first stage, centers of clusters are projected on the low-dimensional plane, and in the second stage, clusters are combined by use of distance of centers of clusters in the low-dimensional plane. As a result of experimenting this method with real data, it was found that clustering performance became better and the response time is suitable to on-line circumstance.

Segmentation of Color Image by Subtractive and Gravity Fuzzy C-means Clustering (차감 및 중력 fuzzy C-means 클러스터링을 이용한 칼라 영상 분할에 관한 연구)

  • Jin, Young-Goun;Kim, Tae-Gyun
    • Journal of IKEEE
    • /
    • v.1 no.1 s.1
    • /
    • pp.93-100
    • /
    • 1997
  • In general, fuzzy C-means clustering method was used on the segmentation of true color image. However, this method requires number of clusters as an input. In this study, we suggest new method that uses subtractive and gravity fuzzy C-means clustering. We get number of clusters and initial cluster centers by applying subtractive clustering on color image. After coarse segmentation of the image, we apply gravity fuzzy C-means for optimizing segmentation of the image. We show efficiency of the proposed algorithm by qualitative evaluation.

  • PDF

Optimizing the maximum reported cluster size for normal-based spatial scan statistics

  • Yoo, Haerin;Jung, Inkyung
    • Communications for Statistical Applications and Methods
    • /
    • v.25 no.4
    • /
    • pp.373-383
    • /
    • 2018
  • The spatial scan statistic is a widely used method to detect spatial clusters. The method imposes a large number of scanning windows with pre-defined shapes and varying sizes on the entire study region. The likelihood ratio test statistic comparing inside versus outside each window is then calculated and the window with the maximum value of test statistic becomes the most likely cluster. The results of cluster detection respond sensitively to the shape and the maximum size of scanning windows. The shape of scanning window has been extensively studied; however, there has been relatively little attention on the maximum scanning window size (MSWS) or maximum reported cluster size (MRCS). The Gini coefficient has recently been proposed by Han et al. (International Journal of Health Geographics, 15, 27, 2016) as a powerful tool to determine the optimal value of MRCS for the Poisson-based spatial scan statistic. In this paper, we apply the Gini coefficient to normal-based spatial scan statistics. Through a simulation study, we evaluate the performance of the proposed method. We illustrate the method using a real data example of female colorectal cancer incidence rates in South Korea for the year 2009.

Big Data Based Dynamic Flow Aggregation over 5G Network Slicing

  • Sun, Guolin;Mareri, Bruce;Liu, Guisong;Fang, Xiufen;Jiang, Wei
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.10
    • /
    • pp.4717-4737
    • /
    • 2017
  • Today, smart grids, smart homes, smart water networks, and intelligent transportation, are infrastructure systems that connect our world more than we ever thought possible and are associated with a single concept, the Internet of Things (IoT). The number of devices connected to the IoT and hence the number of traffic flow increases continuously, as well as the emergence of new applications. Although cutting-edge hardware technology can be employed to achieve a fast implementation to handle this huge data streams, there will always be a limit on size of traffic supported by a given architecture. However, recent cloud-based big data technologies fortunately offer an ideal environment to handle this issue. Moreover, the ever-increasing high volume of traffic created on demand presents great challenges for flow management. As a solution, flow aggregation decreases the number of flows needed to be processed by the network. The previous works in the literature prove that most of aggregation strategies designed for smart grids aim at optimizing system operation performance. They consider a common identifier to aggregate traffic on each device, having its independent static aggregation policy. In this paper, we propose a dynamic approach to aggregate flows based on traffic characteristics and device preferences. Our algorithm runs on a big data platform to provide an end-to-end network visibility of flows, which performs high-speed and high-volume computations to identify the clusters of similar flows and aggregate massive number of mice flows into a few meta-flows. Compared with existing solutions, our approach dynamically aggregates large number of such small flows into fewer flows, based on traffic characteristics and access node preferences. Using this approach, we alleviate the problem of processing a large amount of micro flows, and also significantly improve the accuracy of meeting the access node QoS demands. We conducted experiments, using a dataset of up to 100,000 flows, and studied the performance of our algorithm analytically. The experimental results are presented to show the promising effectiveness and scalability of our proposed approach.

Distributed data deduplication technique using similarity based clustering and multi-layer bloom filter (SDS 환경의 유사도 기반 클러스터링 및 다중 계층 블룸필터를 활용한 분산 중복제거 기법)

  • Yoon, Dabin;Kim, Deok-Hwan
    • The Journal of Korean Institute of Next Generation Computing
    • /
    • v.14 no.5
    • /
    • pp.60-70
    • /
    • 2018
  • A software defined storage (SDS) is being deployed in cloud environment to allow multiple users to virtualize physical servers, but a solution for optimizing space efficiency with limited physical resources is needed. In the conventional data deduplication system, it is difficult to deduplicate redundant data uploaded to distributed storages. In this paper, we propose a distributed deduplication method using similarity-based clustering and multi-layer bloom filter. Rabin hash is applied to determine the degree of similarity between virtual machine servers and cluster similar virtual machines. Therefore, it improves the performance compared to deduplication efficiency for individual storage nodes. In addition, a multi-layer bloom filter incorporated into the deduplication process to shorten processing time by reducing the number of the false positives. Experimental results show that the proposed method improves the deduplication ratio by 9% compared to deduplication method using IP address based clusters without any difference in processing time.