Search | Korea Science

Variable Selection and Outlier Detection for Automated K-means Clustering

Kim, Sung-Soo
- Communications for Statistical Applications and Methods / v.22 no.1 / pp.55-67 / 2015
An important problem in cluster analysis is selecting the variables that define cluster structure while eliminating noisy variables that mask it; outlier detection is a further fundamental task. Here we provide an automated K-means clustering process combined with variable selection and outlier identification. The automated K-means clustering procedure consists of three processes: (i) automatically calculating the cluster number and initial cluster centers whenever a new variable is added, (ii) identifying outliers for each cluster depending on the variables used, and (iii) selecting the variables that define cluster structure in a forward manner. To select variables, we applied the VS-KM (variable-selection heuristic for K-means clustering) procedure (Brusco and Cradit, 2001). To identify outliers, we used a hybrid approach combining a clustering-based approach and a distance-based approach. Simulation results indicate that the proposed automated K-means clustering procedure is effective at selecting variables and identifying outliers. The implemented R program can be obtained at http://www.knou.ac.kr/~sskim/SVOKmeans.r.
https://doi.org/10.5351/CSAM.2015.22.1.055

Noisy Image Segmentation via Swarm-based Possibilistic C-means

Yu, Jeongmin
- Journal of the Korea Society of Computer and Information / v.23 no.12 / pp.35-41 / 2018
In this paper, we propose a swarm-based possibilistic c-means (PCM) algorithm to overcome two problems of PCM: the sensitivity of clustering performance to the initial cluster centers, and the production of coincident or close clusters. To address the former, we adopt a swarm-based global optimization method that provides optimal initial cluster centers. To address the latter, we design an adaptive thresholding model based on the optimized cluster centers that yields a preliminary clustered dataset and an un-clustered dataset. The preliminary clustered dataset serves to prevent coincident or close clusters, and the un-clustered dataset is finally clustered by PCM. In experiments, the proposed method performs better than other PCM algorithms on a simulated magnetic resonance (MR) brain image dataset corrupted by various types of noise and bias fields.
https://doi.org/10.9708/jksci.2018.23.12.035

Improved TI-FCM Clustering Algorithm in Big Data (빅데이터에서 개선된 TI-FCM 클러스터링 알고리즘)

Lee, Kwang-Kyug
- Journal of IKEEE / v.23 no.2 / pp.419-424 / 2019
The FCM algorithm finds the optimal solution through an iterative optimization technique. Its execution time varies with the initial cluster centers, the location of noise, and the location and number of dense regions. However, because this method updates the center points gradually, the initial cluster centers are shifted to one side. In this paper, we propose a TI-FCM (Triangular Inequality-Fuzzy C-Means) clustering algorithm that determines the cluster center density by maximizing the distance between clusters using the triangle inequality. The proposed method converges to the real clusters more effectively than FCM, even on large data sets. Experiments show that execution time is reduced compared to the existing FCM.
https://doi.org/10.7471/ikeee.2019.23.2.419

Cluster analysis by month for meteorological stations using a gridded data of numerical model with temperatures and precipitation (기온과 강수량의 수치모델 격자자료를 이용한 기상관측지점의 월별 군집화)

Kim, Hee-Kyung;Kim, Kwang-Sub;Lee, Jae-Won;Lee, Yung-Seop
- Journal of the Korean Data and Information Science Society / v.28 no.5 / pp.1133-1144 / 2017
Cluster analysis of meteorological data makes it possible to segment regions by their meteorological characteristics. However, observed meteorological data are not well suited to cluster analysis because the stations that record them are not uniformly located. Clustering of observed data therefore cannot properly reflect the climate characteristics of South Korea. Clustering of $5\,\mathrm{km} \times 5\,\mathrm{km}$ gridded data derived from a numerical model, on the other hand, reflects them evenly. In this study, we analyzed long-term gridded temperature and precipitation data using cluster analysis. Because climate characteristics differ from month to month, clustering was performed by month. Because the result of K-means cluster analysis is very sensitive to initial values, we took the initial values from the Ward method, a hierarchical cluster analysis method. The clusters of meteorological stations were then determined from the clustering of the gridded data. As a result, the clustering of meteorological stations provides a spatio-temporal segmentation of South Korea.
https://doi.org/10.7465/jkdi.2017.28.5.1133
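
The abstract above describes seeding K-means with the result of Ward's hierarchical method. A minimal pure-Python sketch of that idea follows; it is not the authors' implementation. The function names (`ward_centers`, `kmeans`) and the six sample points are hypothetical, and the brute-force Ward search is only practical for small inputs.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def ward_centers(points, k):
    """Merge clusters by Ward's criterion until k remain; return their centroids."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if a and b are merged.
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * sq_dist(centroid(a), centroid(b)))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [centroid(c) for c in clusters]


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm started from the given centers; returns (centers, labels)."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = centroid(members)
    return centers, labels


# Hypothetical two-variable observations, standing in for (temperature, precipitation).
points = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 8.0), (8.5, 7.5), (7.8, 8.3)]
centers, labels = kmeans(points, ward_centers(points, 2))
print(labels)
```

Because the starting centers come from a deterministic hierarchical pass rather than a random draw, repeated runs on the same data give the same partition, which is the property the abstract relies on.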

A Variable Selection Procedure for K-Means Clustering

Kim, Sung-Soo
- The Korean Journal of Applied Statistics / v.25 no.3 / pp.471-483 / 2012
One of the most important problems in cluster analysis is the selection of variables that truly define cluster structure, while eliminating noisy variables that mask such structure. Brusco and Cradit (2001) present the VS-KM (variable-selection heuristic for K-means clustering) procedure for selecting true variables for K-means clustering based on the adjusted Rand index. This procedure starts with a fixed number of clusters in K-means and adds variables sequentially based on the adjusted Rand index. This paper presents an updated procedure combining VS-KM with the automated K-means procedure provided by Kim (2009). This automated variable selection procedure for K-means clustering calculates the cluster number and initial cluster centers whenever a new variable is added and adds a variable based on the adjusted Rand index. Simulation results indicate that the proposed procedure is very effective at selecting true variables and at eliminating noisy variables. The implemented R programs can be obtained from the website "http://faculty.knou.ac.kr/sskim/nvarkm.r and vnvarkm.r".
https://doi.org/10.5351/KJAS.2012.25.3.471
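
The adjusted Rand index that the abstract above relies on measures agreement between two partitions of the same items, corrected for chance. A minimal sketch of the standard pair-counting form is given below; the function name `adjusted_rand_index` is an illustrative choice, and how VS-KM applies the index to candidate variables is not reproduced here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    # Contingency-table cells n_ij and the row/column totals a_i, b_j.
    cell_counts = Counter(zip(labels_a, labels_b))
    row_totals = Counter(labels_a)
    col_totals = Counter(labels_b)

    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_totals.values())
    sum_cols = sum(comb(c, 2) for c in col_totals.values())

    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions agree trivially
    expected = sum_rows * sum_cols / total_pairs
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:
        # Degenerate case, e.g. both partitions put every item in one cluster.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


# Same grouping under different label names: perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The index is 1 for identical partitions, close to 0 for chance-level agreement, and can be negative when two partitions agree less than chance would predict.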

Identification of Cosmic Voids as Massive Cluster Counterparts

Shim, Junsup;Park, Changbom;Kim, Juhan;Hwang, Ho Seong
- The Bulletin of The Korean Astronomical Society / v.45 no.1 / pp.48.2-48.2 / 2020
We present a new void definition that connects voids with clusters, their high-density counterparts. We use a pair of ΛCDM simulations whose initial density fields are sign-inverted versions of each other, and study the relation between the effective void volume and the corresponding cluster mass. Massive cluster halos ($M \geq 10^{13}\,M_\odot/h$) are identified in one simulation at z=0 by linking dark matter particles. The void corresponding to each cluster is defined in the other simulation as the region occupied by the member particles of the cluster. We find a universal functional form of density profiles at z=0 and 1. We also find a power-law relation between the void effective radius and the corresponding cluster mass. Based on these findings, we identify cluster-counterpart voids directly from a density field, without using the pair information, by utilizing three parameters: the smoothing scale, density threshold, and minimum core fraction. We identified voids corresponding to clusters more massive than $3 \times 10^{14}\,M_\odot/h$ at approximately the 70-74% level of completeness and reliability. Our results suggest that we can detect voids comparable to clusters of a particular mass scale.

Broadband Photometric Study of Two Open Clusters: Westerlund 1 and IC 1848

Lim, Beomdu
- The Bulletin of The Korean Astronomical Society / v.39 no.1 / pp.83.1-83.1 / 2014
Open clusters consisting of a co-spatial and coeval population with a similar chemical composition are a superb astrophysical test bed in both stellar and galactic astronomy. We introduce not only several scientific issues relating to these objects but also comprehensive studies of the two young open clusters Westerlund 1 and IC 1848, which formed in extremely different star-forming conditions. Westerlund 1 is known as the most massive starburst cluster in the Galaxy. Located in the Scutum-Centaurus spiral arm, the cluster is relatively close to the Galactic Center. Its apparent surface density is very high. On the other hand, IC 1848 is a core cluster within the large-scale star-forming region W5 lying in the Perseus arm. Unlike Westerlund 1, IC 1848, with a putatively low metallicity, exhibits a low surface density. We present the fundamental parameters of these young clusters, such as reddening, distance, and age, obtained from broadband photometric analysis. The stellar initial mass function (IMF) of the clusters is used to investigate the effects of the different star-forming conditions on star formation activity. Together with the results of previous studies of several young open clusters, our preliminary results support the possibility that star formation activity is affected by environmental factors or the initial conditions of natal clouds. In addition, we briefly discuss the age scale and spread of pre-main sequence stars to understand the formation processes of star clusters.

Sejong Open cluster Survey (SOS). VII. A Photometric Study of the Young Open Cluster IC 1590

Kim, Seulgi;Sung, Hwangyung;Bessell, Michael S.;Lim, Beomdu
- The Bulletin of The Korean Astronomical Society / v.45 no.1 / pp.50.3-50.3 / 2020
We present deep UBVIc and Hα photometry for the young open cluster IC 1590, which is at the center of the H II region NGC 281. From the Hα index, 39 Hα emission stars and 15 Hα emission candidates are selected. The reddening law toward IC 1590 is slightly abnormal ($R_{V,\mathrm{cl}}$ = 3.6 ± 0.2). The distance modulus of IC 1590 obtained from the reddening-free $(Q', Q_{V\lambda})$ diagrams is 12.4 ± 0.1 mag (d = 3.02 ± 0.14 kpc), which is consistent within the errors with the distance d = 2.91 ± 0.42 kpc from the Gaia DR2 parallax. We also determined the age and mass function of IC 1590 using stellar evolution models and PMS evolutionary tracks. The median age of the PMS stars is 2.4 ± 2.2 Myr. The initial mass function (IMF) of IC 1590 is a Salpeter-type IMF with a slope of Γ = -1.26 ± 0.14 for m > 1 M⊙ stars.
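
The distance quoted in the abstract above follows from the distance modulus through the standard relation μ = 5 log10(d / 10 pc). The short check below assumes that the quoted modulus is already corrected for extinction and uses first-order error propagation; the function names are illustrative.

```python
import math


def distance_pc(distance_modulus):
    """Distance in parsecs from a distance modulus mu = 5 * log10(d / 10 pc)."""
    return 10.0 ** ((distance_modulus + 5.0) / 5.0)


def distance_error_pc(distance_modulus, modulus_error):
    """First-order propagated error: sigma_d = d * ln(10) / 5 * sigma_mu."""
    return distance_pc(distance_modulus) * math.log(10.0) / 5.0 * modulus_error


# Values quoted in the abstract: mu = 12.4 +/- 0.1 mag.
d_kpc = distance_pc(12.4) / 1000.0
sigma_kpc = distance_error_pc(12.4, 0.1) / 1000.0
print(f"d = {d_kpc:.2f} +/- {sigma_kpc:.2f} kpc")  # prints "d = 3.02 +/- 0.14 kpc"
```

The result reproduces the d = 3.02 ± 0.14 kpc stated in the abstract.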

A Geometrical Center based Two-way Search Heuristic Algorithm for Vehicle Routing Problem with Pickups and Deliveries

Shin, Kwang-Cheol
- Journal of Information Processing Systems / v.5 no.4 / pp.237-242 / 2009
The classical vehicle routing problem (VRP) can be extended by including customers who want to send goods to the depot. This type of VRP is called the vehicle routing problem with pickups and deliveries (VRPPD). This study proposes a novel way to solve the VRPPD by introducing a two-phase heuristic routing algorithm consisting of a clustering phase, which uses the geometrical center of each cluster, and a route establishment phase, which applies a two-way search to each route after the TSP algorithm has been applied to it. Experimental results show that the suggested algorithm can generate better initial solutions for more computer-intensive meta-heuristics than other existing methods such as the giant-tour-based partitioning method or the insertion-based method.
https://doi.org/10.3745/JIPS.2009.5.4.237

Partial Discharge Data Analysis with Unsupervised Classification (무감독분류 기법에 의한 부분방전 데이터 분석)

Cho, Kyungsoon;Hong, Seonhack
- Journal of Korea Society of Digital Industry and Information Management / v.14 no.4 / pp.9-16 / 2018
This study describes partial discharge (PD) distribution analysis at the interface between XLPE (Cross-Linked PolyEthylene) and EPDM (Ethylene Propylene Diene Monomer) using unsupervised classification. The ${\phi}-q-n$ patterns were analyzed using phase-resolved partial discharge (PRPD). K-means cluster analysis forms clusters based on similarities and distances among scattered individuals and analyzes the characteristics of the formed clusters; it is a statistical method that divides multivariate data into several groups according to the similarity of their characteristics, making the data easier to navigate. It was confirmed that the phase angle of the cluster with the maximum discharge charge was concentrated around $0^{\circ}$ and $180^{\circ}$ at 30 kV, after the initial phase distribution localized around $90^{\circ}$ and $300^{\circ}$ expanded to the whole phase angle range as the voltage rose. The Euclidean distance between the center of gravity and the discharge charge in the ${\phi}-q$ cluster increased with increasing applied voltage.
https://doi.org/10.17662/ksdim.2018.14.4.009
