• Title/Summary/Keyword: K-means cluster


Classification of Healthy Family Indicators in Indonesia Based on a K-means Cluster Analysis

  • Herti Maryani;Anissa Rizkianti;Nailul Izza
    • Journal of Preventive Medicine and Public Health / v.57 no.3 / pp.234-241 / 2024
  • Objectives: Health development is a key element of national development. The goal of improving health development at the societal level will be readily achieved if it is directed from the smallest social unit, namely the family. This was the goal of the Healthy Indonesia Program with a Family Approach. The objective of the study was to analyze variables of family health indicators across all provinces in Indonesia to identify provincial disparities based on the status of healthy families. Methods: This study examined secondary data for 2021 from the Indonesia Health Profile, provided by the Ministry of Health of the Republic of Indonesia, and from the 2021 welfare statistics by Statistics Indonesia (BPS). From these sources, we identified 10 variables for analysis using the k-means method, a non-hierarchical method of cluster analysis. Results: The results of the cluster analysis of healthy family indicators yielded 5 clusters. In general, cluster 1 (Papua and West Papua Provinces) had the lowest average achievements for healthy family indicators, while cluster 5 (Jakarta Province) had the highest indicator scores. Conclusions: In Indonesia, disparities in healthy family indicators persist. Nutrition, maternal health, and child health are among the indicators that require government attention.
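For readers unfamiliar with the method named in this abstract, the sketch below shows plain (Lloyd's) k-means in standard-library Python. It is not the study's analysis: the function name `kmeans`, the random seeding, and the toy data are assumptions for illustration, and the abstract does not say which software or initialization the authors used.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: k random rows as initial centers
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: each center moves to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

# Toy data: two well-separated groups of three points each.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centers = kmeans(toy, k=2)
```

In a setting like the one described, each row would be one province and each column one standardized indicator; the number of clusters k has to be chosen by the analyst.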

Sensitivity Enhancement of RF Plasma Etch Endpoint Detection With K-means Cluster Analysis

  • Lee, Honyoung;Jang, Haegyu;Lee, Hak-Seung;Chae, Heeyeop
    • Proceedings of the Korean Vacuum Society Conference / 2015.08a / pp.142.2-142.2 / 2015
  • Plasma etch endpoint detection (EPD) of SiO2 and PR layers is demonstrated by plasma impedance monitoring in this work. Plasma etching is the core process for making fine-pattern devices in semiconductor fabrication, and etch endpoint detection is one of the essential FDC (Fault Detection and Classification) functions for yield management and mass production. In general, optical emission spectroscopy (OES) has been used to detect the endpoint because OES is a simple, non-invasive, real-time plasma monitoring tool. In OES, the trend of a few sensitive wavelengths is traced. However, in small-open-area etch endpoint detection (e.g., contact etch), the signal is at the boundary of the detection limit because the emission intensities of reactants and products are weak. Furthermore, the various materials covering the wafer, such as photoresist (PR), dielectric materials, and metals, complicate the analysis of OES signals. In this study, full spectra of optical emission signals were collected and the data were analyzed by a data-mining approach, modified K-means cluster analysis. The K-means cluster analysis is modified to analyze a thousand wavelength variables from OES. This technique can improve the sensitivity of EPD for small-area oxide layer etching processes with about 1.0% oxide area. It is expected to be applicable to various plasma monitoring applications, including fault detection as well as EPD.


Comparison of Initial Seeds Methods for K-Means Clustering (K-Means 클러스터링에서 초기 중심 선정 방법 비교)

  • Lee, Shinwon
    • Journal of Internet Computing and Services / v.13 no.6 / pp.1-8 / 2012
  • Clustering methods are divided into hierarchical clustering, partitioning clustering, and others. The K-Means algorithm is a partitioning clustering method and is well suited to clustering large numbers of documents quickly and easily. Its disadvantage is that randomly chosen initial centers lead to different results. A better choice is therefore to place the initial centers as far away from each other as possible. We propose a new method of selecting initial centers in K-Means clustering. This method uses triangle height to choose the initial cluster centers. As a result, the centers are distributed evenly, and the clustering is more accurate than with randomly selected initial centers. The selection itself is time-consuming, but it reduces total clustering time by minimizing the number of allocation and recalculation steps. Compared with the standard algorithm, average time consumed is reduced by 38.4%.
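The idea of placing initial centers as far apart as possible can be illustrated with a generic farthest-first (maximin) seeding sketch. This is a swapped-in, well-known heuristic, not the triangle-height method the abstract proposes, whose details are not given here; the function name `farthest_first_seeds` and the choice of the first point as the starting seed are assumptions.

```python
import math

def farthest_first_seeds(points, k):
    """Greedily pick k seeds so each new seed is far from all seeds chosen so far."""
    seeds = [points[0]]  # assumption: start from the first point; other start rules are possible
    while len(seeds) < k:
        # Pick the point whose distance to its nearest existing seed is largest.
        nxt = max(points, key=lambda p: min(math.dist(p, s) for s in seeds))
        seeds.append(nxt)
    return seeds

# Points on a line: the seeds spread to both ends first, then to the middle.
seeds = farthest_first_seeds([(0, 0), (1, 0), (10, 0), (5, 0)], k=3)
# seeds == [(0, 0), (10, 0), (5, 0)]
```

Seeds chosen this way are deterministic for a given point order, which removes the run-to-run variation that random initialization causes.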

Path based K-means Clustering for RFID Data Sets

  • Yun, Hong-Won
    • Journal of information and communication convergence engineering / v.6 no.4 / pp.434-438 / 2008
  • Massive data are produced continuously, at rates of more than several terabytes every day. Applications that handle such data need effective clustering algorithms to achieve high overall computational performance. In this paper, we propose a clustering approach that uses an ancestor as the cluster center, that is, a modified K-means algorithm based on ancestors. We present a clustering architecture and a clustering algorithm that minimize I/O and deliver excellent performance. In our experimental performance evaluation, we show that our algorithm can improve I/O speed and query processing time.

An Improved K-means Document Clustering using Concept Vectors

  • Shin, Yang-Kyu
    • Journal of the Korean Data and Information Science Society / v.14 no.4 / pp.853-861 / 2003
  • An improved K-means document clustering method is presented, in which a concept vector is manipulated for each cluster on the basis of the cosine similarity of text documents. The concept vectors are unit vectors that have been normalized on the n-dimensional sphere. Because the standard K-means method is sensitive to its initial starting conditions, our improvement focuses on the starting conditions for estimating the modes of a distribution. The improved K-means clustering algorithm was applied to a set of text documents called Classic3 to test the efficiency and correctness of the clustering results, and showed a 7% improvement in the worst case.

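The concept-vector idea in the abstract above (a unit-length cluster representative compared with documents by cosine similarity) can be sketched as follows. The helper names `normalize`, `concept_vector`, and `assign` are hypothetical, and the sketch covers only the representative and assignment steps, not the paper's improved initialization.

```python
import math

def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def concept_vector(docs):
    """Mean of the unit-normalized document vectors, renormalized to unit length."""
    units = [normalize(d) for d in docs]
    mean = [sum(col) / len(units) for col in zip(*units)]
    return normalize(mean)

def assign(doc, concepts):
    """Index of the concept vector with the highest cosine similarity to doc."""
    u = normalize(doc)
    # For unit vectors, the dot product equals the cosine similarity.
    return max(range(len(concepts)), key=lambda j: sum(a * b for a, b in zip(u, concepts[j])))

# Two toy "term-frequency" clusters in a 2-term vocabulary.
concepts = [concept_vector([[2, 0], [4, 0]]), concept_vector([[0, 3]])]
nearest = assign([5, 1], concepts)  # closest in angle to the first concept vector
```

Normalizing to the unit sphere makes the comparison depend on the direction of a document vector rather than its length, so long and short documents on the same topic are treated alike.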

Support Vector Data Description using Mean Shift Clustering (평균 이동 알고리즘 기반의 지지 벡터 영역 표현 방법)

  • Chang, Hyung-Jin;Kim, Pyo-Jae;Choi, Jung-Hwan;Choi, Jin-Young
    • Proceedings of the KIEE Conference / 2007.04a / pp.307-309 / 2007
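  • To address the scale problem of SVDD, an SVDD method using K-means clustering (KMSVDD) was previously proposed; it reduces training time by dividing the training data into sub-groups and training SVDD on each group. However, KMSVDD has two problems: by the nature of the K-means clustering algorithm, it is difficult to determine the optimal value of K, and because the clustered groups are formed randomly, the training result differs from run to run even on the same data. In addition, because the data are always divided into K elliptic clusters regardless of how they are distributed, the resulting clusters have difficulty representing the characteristics of the data distribution. To solve these problems, this paper proposes an SVDD that uses Mean Shift clustering, which first finds the modes of the data distribution and then clusters around those modes. Compared with KMSVDD, the proposed algorithm shows no large difference in training speed, yet because it trains on sub-groups clustered in a way that reflects the data distribution, the training accuracy is consistent and each cluster captures the characteristics of the data distribution. Moreover, unlike K in K-means, the bandwidth of the Mean Shift kernel can be chosen with some leeway without changing the training result. These claims are verified through simulations on a variety of data.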

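The mode-seeking step that Mean Shift clustering relies on can be sketched in one dimension with a flat kernel. This illustrates the general technique only, not the paper's SVDD pipeline; the function name `mean_shift_mode`, the flat kernel, and the stopping tolerance are assumptions.

```python
def mean_shift_mode(x, data, bandwidth, iters=100, tol=1e-6):
    """Move x to the mean of the data within `bandwidth` until it stops moving (1-D, flat kernel)."""
    for _ in range(iters):
        window = [d for d in data if abs(d - x) <= bandwidth]
        if not window:  # no data nearby: nothing to shift toward
            break
        new_x = sum(window) / len(window)
        if abs(new_x - x) < tol:  # shift is negligible: x sits on a mode
            return new_x
        x = new_x
    return x

# Two clumps of values; start points drift to the center of the nearest clump.
data = [0.8, 1.0, 1.2, 4.9, 5.0, 5.1]
low_mode = mean_shift_mode(1.5, data, bandwidth=1.0)
high_mode = mean_shift_mode(4.5, data, bandwidth=1.0)
```

Points whose trajectories end at the same mode would form one cluster, so the number of clusters follows from the data and the bandwidth rather than from a preset K.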

Analysis Process based on Modify K-means for Efficiency Improvement of Electric Power Data Pattern Detection (전력데이터 패턴 추출의 효율성 향상을 위한 변형된 K-means 기반의 분석 프로세스)

  • Jung, Se Hoon;Shin, Chang Sun;Cho, Yong Yun;Park, Jang Woo;Park, Myung Hye;Kim, Young Hyun;Lee, Seung Bae;Sim, Chun Bo
    • Journal of Korea Multimedia Society / v.20 no.12 / pp.1960-1969 / 2017
  • There has been ongoing research to identify and analyze patterns in electric power IoT data inside sensor nodes in order to support a stable power supply and efficient energy consumption. This study proposes an analysis process for electric power IoT data based on the K-means algorithm, which is an unsupervised learning technique rather than a supervised one. The conventional K-means algorithm has a couple of problems, one of which is that the number of clusters K is selected heuristically or at random. That approach is proper for the age of standardized data. This study proposes an analysis process that selects the number of clusters K automatically through principal component analysis and the space division of a normal distribution, and applies it to electric power IoT data. The performance evaluation results show that it outperformed the conventional algorithm in the cluster classification and analysis of the pitches and rolls included in the communication bodies of utility poles.

Improved Paired Cluster-Based Routing Protocol in Vehicular Ad-Hoc Networks

  • Kim, Wu Woan
    • International journal of advanced smart convergence / v.7 no.2 / pp.22-32 / 2018
  • In VANETs, the frequent movement of nodes causes dynamic changes in the network topology. A routing protocol that remains stable and responds effectively to these topology changes is therefore required. The existing cluster-based routing protocol (CBRP), a hybrid approach, suffers routing delay due to frequent re-election of the cluster header. In addition, the CBRP routing table contains only neighbor nodes one hop away. PCBRP (Paired CBRP), proposed in this paper, ties two clusters into one pair to obtain a longer radius. The pair of cluster headers then manages and operates the corresponding member nodes. In the current CBRP, when the cluster header leaves the cluster, a delay occurs while a new header is re-elected. In PCBRP, however, the other cluster header of the pair takes over the role of the departed header, which reduces the routing delay. PCBRP also reduces the delay when routing between nodes inside the paired cluster. PCBRP therefore shows improved total network delay and improved performance due to the reduced routing overhead.

Reproducibility Assessment of K-Means Clustering and Applications (K-평균 군집화의 재현성 평가 및 응용)

  • 허명회;이용구
    • The Korean Journal of Applied Statistics / v.17 no.1 / pp.135-144 / 2004
  • We propose a reproducibility (validity) assessment procedure of K-means cluster analysis by randomly partitioning the data set into three parts, of which two subsets are used for developing clustering rules and one subset for testing consistency of clustering rules. Also, as an alternative to Rand index and corrected Rand index, we propose an entropy-based consistency measure between two clustering rules, and apply it to determination of the number of clusters in K-means clustering.
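The Rand index mentioned above as the baseline consistency measure can be computed as in the sketch below: it is the fraction of point pairs on which two clusterings agree (both place the pair together, or both place it apart). The function name `rand_index` is an assumption, and the entropy-based measure the paper proposes is not reproduced because the abstract does not define it.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree (same/same or different/different)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Identical partitions with swapped label names still agree on every pair.
same = rand_index([0, 0, 1, 1], [1, 1, 0, 0])   # 1.0
# A partition that cuts across the first agrees on only 2 of the 6 pairs.
cross = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because only pair co-membership is compared, the index is unaffected by how the clusters happen to be numbered, which matters when comparing rules developed on different subsets.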

The Evaluation Measure of Text Clustering for the Variable Number of Clusters (가변적 클러스터 개수에 대한 문서군집화 평가방법)

  • Jo, Tae-Ho
    • Proceedings of the Korean Information Science Society Conference / 2006.10b / pp.233-237 / 2006
  • This study proposes an innovative measure for evaluating the performance of text clustering. When the K-means algorithm or Kohonen Networks are used for text clustering, the number of clusters is fixed in advance as a parameter, whereas with the single pass algorithm the number of clusters is not predictable. Using labeled documents, the result of text clustering with the K-means algorithm or a Kohonen Network can be evaluated by setting the number of clusters to the number of given target categories, mapping each cluster to a target category, and using the evaluation measures of text. With the single pass algorithm, however, if the number of clusters differs from the number of target categories, such measures are useless for evaluating the clustering result. This study proposes an evaluation measure of text clustering based on intra-cluster similarity and inter-cluster similarity, called CI (Clustering Index) in this article.

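The two ingredients named in the abstract above, intra-cluster and inter-cluster similarity, can be computed without reference labels, as the sketch below shows. How the paper combines them into CI is not stated in the abstract, so no combined index is given; the function names and the use of average pairwise cosine similarity are assumptions.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def intra_inter_similarity(vectors, labels):
    """Average pairwise cosine similarity within clusters and between clusters."""
    intra, inter = [], []
    for i, j in combinations(range(len(vectors)), 2):
        sim = cosine(vectors[i], vectors[j])
        (intra if labels[i] == labels[j] else inter).append(sim)
    average = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return average(intra), average(inter)

# A clean two-cluster toy case: high similarity inside clusters, none across.
intra, inter = intra_inter_similarity([[1, 0], [2, 0], [0, 1], [0, 3]], [0, 0, 1, 1])
```

Both averages are defined for any number of clusters, which is the property the abstract needs when the single pass algorithm produces a cluster count different from the number of target categories.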