• Title/Summary/Keyword: k-means clustering Algorithm

An eigenspace projection clustering method for structural damage detection

  • Zhu, Jun-Hua;Yu, Ling;Yu, Li-Li
    • Structural Engineering and Mechanics / v.44 no.2 / pp.179-196 / 2012
  • An eigenspace projection clustering method is proposed for structural damage detection by combining a projection algorithm with a fuzzy clustering technique. The integrated procedure comprises data selection, data normalization, projection, damage-feature extraction, and clustering for structural damage assessment. The frequency response functions (FRFs) of the healthy and damaged structure are used as the initial data, the median values of the projections are taken as damage features, and the fuzzy c-means (FCM) algorithm is used to categorize these features. The performance of the proposed method has been validated on a three-story frame structure built and tested by Los Alamos National Laboratory, USA. Two projection algorithms, principal component analysis (PCA) and kernel principal component analysis (KPCA), are compared for their ability to extract damage features, and six distance measures used in the FCM process are studied and discussed. The results show that the appropriate distance depends on the distribution of the features. For the optimal choice of projection, the Cosine distance is recommended for PCA, while the Seuclidean (standardized Euclidean) and Cityblock distances are suitable for KPCA. PCA is recommended when large amounts of data must be processed, owing to its higher rate of correct decisions and lower computational cost.
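
The FCM categorization step can be illustrated with a minimal fuzzy c-means sketch. It is not the authors' implementation: it assumes Euclidean distance (only one of the six distances the study compares), the common fuzzifier m = 2, and prototypes initialized from randomly chosen input points; the function name `fcm` and its parameters are hypothetical.

```python
import random

def fcm(points, c, m=2.0, iters=100, seed=0, eps=1e-9):
    """Minimal fuzzy c-means using Euclidean distance (an assumption).

    Returns (centers, u) where u[i][j] is the membership of point i in
    cluster j; every row of u sums to 1.
    """
    rnd = random.Random(seed)
    dim = len(points[0])
    # Initialize the c prototypes from distinct input points
    centers = [list(p) for p in rnd.sample(points, c)]
    u = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        u = []
        for p in points:
            d = [max(eps, sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5)
                 for ctr in centers]
            u.append([1.0 / sum((d[j] / dk) ** (2.0 / (m - 1.0)) for dk in d)
                      for j in range(c)])
        # Prototype update: mean of all points weighted by u_ij ** m
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            centers[j] = [sum(wi * p[k] for wi, p in zip(w, points)) / total
                          for k in range(dim)]
    return centers, u
```

Each damage-feature vector can then be assigned to the cluster in which its membership is largest.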

A Clustering Approach for Feature Selection in Microarray Data Classification Using Random Forest

  • Aydadenta, Husna;Adiwijaya, Adiwijaya
    • Journal of Information Processing Systems / v.14 no.5 / pp.1167-1175 / 2018
  • Microarray data play an essential role in diagnosing and detecting cancer. Microarray analysis allows the examination of gene expression levels in specific cell samples, where thousands of genes can be analyzed simultaneously. However, microarray datasets have very few samples and very high dimensionality, so a dimensionality reduction step is required before classification. Dimensionality reduction can eliminate redundancy in the data, so that the features used for classification are only those highly correlated with the class. There are two types of dimensionality reduction: feature selection and feature extraction. In this paper, we use the k-means algorithm as a clustering approach to feature selection. The proposed approach groups features with the same characteristics into one cluster, so that redundancy in the microarray data is removed. The features in each cluster are ranked using the Relief algorithm, and the best-scoring feature of each cluster is selected. These selected features are then used for classification with the Random Forest algorithm. In simulations, the proposed approach achieved accuracies of 85.87%, 98.9%, and 89% on the Colon, Lung Cancer, and Prostate Tumor datasets, respectively, which is higher than using Random Forest without clustering.
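
The ranking-and-selection step can be sketched as follows. This is a minimal two-class version of the basic Relief algorithm (nearest hit and nearest miss, visiting every sample), not necessarily the variant the authors used. The cluster label of each feature is assumed to come from running k-means on the feature vectors and is passed in directly; `relief_weights` and `best_per_cluster` are hypothetical names.

```python
def relief_weights(X, y):
    """Basic two-class Relief over all samples.

    A feature gains weight when it differs on the nearest sample of the
    other class (nearest miss) and loses weight when it differs on the
    nearest sample of the same class (nearest hit).
    """
    n, p = len(X), len(X[0])
    # Feature ranges scale every per-feature difference into [0, 1]
    span = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(p)]

    def diff(a, b, f):
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(a, b, f) for f in range(p))

    w = [0.0] * p
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        h = min(hits, key=lambda j: dist(X[i], X[j]))
        m = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(p):
            w[f] += (diff(X[i], X[m], f) - diff(X[i], X[h], f)) / n
    return w


def best_per_cluster(weights, cluster_of):
    """Keep the highest-weighted feature index from each feature cluster."""
    best = {}
    for f, c in enumerate(cluster_of):
        if c not in best or weights[f] > weights[best[c]]:
            best[c] = f
    return sorted(best.values())
```

In the test data below, feature 2 duplicates feature 0; because both sit in the same cluster, only one of them is kept, which is how clustering removes redundancy before classification.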

A Personalized Music Recommendation System with a Time-weighted Clustering (Personalized Music Recommendation System Using Time Weights and a Variable K-means Technique)

  • Kim, Jae-Kwang;Yoon, Tae-Bok;Kim, Dong-Moon;Lee, Jee-Hyong
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.4 / pp.504-510 / 2009
  • Personalized, adaptive services have recently become a center of interest worldwide. However, such services are not yet widespread for music, because analyzing music information is more difficult than analyzing text. In this paper, we propose a music recommendation system that provides personalized services. The system keeps a user's listening list and analyzes it to select pieces of music that match the user's preferences. For this analysis, the system extracts properties from the sound wave of each piece and records the time when the user listens to it. Based on these properties, each piece of music is mapped to a point in the property space, and the listening time is converted into a weight for that point. By identifying and analyzing the groups the user selects frequently, the system can infer the user's taste. However, it is not easy to predict how many groups will form. To solve this problem, we apply the K-means clustering algorithm to the weighted points, modified so that the number of clusters changes dynamically. The modification limits the cluster diameter, so the algorithm can be applied effectively when the range of the data is known. The algorithm finds the center of each group, and the system recommends music similar to that group. The release date of the music is also considered: when recommending, the system selects pieces that are close to the user's preferences and were released around the same time. We performed experiments with one hundred pieces of music, and the results show that the proposed algorithm is effective.
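
The modified K-means described above can be approximated by the following sketch. The abstract does not give the exact rules, so this sketch assumes that (1) each point carries a precomputed positive weight derived from listening time, (2) a point farther than `max_radius` from every existing center opens a new cluster, and (3) each center is the weighted mean of its members. `radius_kmeans` and its parameters are hypothetical.

```python
import math

def radius_kmeans(points, weights, max_radius, iters=20):
    """Weighted k-means whose number of clusters grows on demand.

    Hypothetical rules: a point farther than max_radius from every
    current center opens a new cluster centered on that point, and each
    center is the weight-averaged position of its members. Weights are
    assumed to be positive.
    """
    centers = [list(points[0])]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step (may open new clusters for far-away points)
        for i, p in enumerate(points):
            j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            if math.dist(p, centers[j]) > max_radius:
                centers.append(list(p))
                j = len(centers) - 1
            labels[i] = j
        # Update step: weighted means; empty clusters are dropped
        new_centers, remap = [], {}
        for j in range(len(centers)):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                continue
            total = sum(weights[i] for i in members)
            remap[j] = len(new_centers)
            new_centers.append([sum(weights[i] * points[i][d] for i in members) / total
                                for d in range(len(points[0]))])
        labels = [remap[lab] for lab in labels]
        centers = new_centers
    return centers, labels
```

A heavily weighted point (for example, a frequently played song) pulls its cluster center toward itself, and the resulting centers are natural anchors for similarity-based recommendation.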

Study on Application of Neural Network for Unsupervised Training of Remote Sensing Data (A Study on Clustering Techniques for Remote Sensing Data Using Neural Networks)

  • 김광은;이태섭;채효석
    • Spatial Information Research / v.2 no.2 / pp.175-188 / 1994
  • A competitive learning network was proposed as an unsupervised training method for remote sensing data, and its performance and computational requirements were compared with those of conventional clustering techniques such as Sequential and K-Means clustering. An airborne remote sensing data set was used to study the performance of these classifiers. The proposed algorithm required slightly more computation time than the conventional techniques; however, the competitive learning network performed slightly better than the Sequential and K-Means clustering techniques.
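
A winner-take-all competitive learning network can be sketched in a few lines. This is a generic textbook form, not the authors' network: the learning rate, epoch count, and farthest-point initialization are assumptions, and `competitive_learning` is a hypothetical name.

```python
import math
import random

def competitive_learning(data, k, lr=0.1, epochs=50, seed=0):
    """Winner-take-all competitive learning.

    For each input, only the closest weight vector (the winner) is moved
    a fraction lr of the way toward that input.
    """
    rnd = random.Random(seed)
    # Farthest-point initialization: start at data[0], then repeatedly
    # add the input farthest from all weight vectors chosen so far
    weights = [list(data[0])]
    while len(weights) < k:
        far = max(data, key=lambda x: min(math.dist(x, w) for w in weights))
        weights.append(list(far))
    order = list(range(len(data)))
    for _ in range(epochs):
        rnd.shuffle(order)  # present the inputs in a new random order each epoch
        for i in order:
            x = data[i]
            win = min(range(k), key=lambda j: math.dist(x, weights[j]))
            weights[win] = [w + lr * (a - w) for w, a in zip(weights[win], x)]
    return weights
```

Unlike batch K-Means, which recomputes every center from all assigned points, this network updates one winner per input, which is one reason its cost and results can differ slightly from K-Means.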

An Optimization Approach to Data Clustering

  • Kim, Ju-Mi;Olafsson, Sigurdur
    • Proceedings of the Korean Operations and Management Science Society Conference / 2005.05a / pp.621-628 / 2005
  • Scalability of clustering algorithms is a critical issue facing the data mining community, because data clustering is a computationally intensive task. Random sampling of instances is one possible means of achieving scalability, but a pervasive problem with this approach is how to deal with the noise that sampling introduces into the evaluation of the learning algorithm. This paper develops a new optimization-based clustering approach using an algorithm specifically designed for noisy performance evaluations. Numerical results show that this algorithm can substantially reduce computation time without sacrificing solution quality.
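
The noise problem can be illustrated by estimating a clustering objective from a random sample of instances. This sketch does not reproduce the paper's optimization algorithm, which the abstract does not name; it assumes the k-means sum-of-squared-errors (SSE) objective and shows how a sampled estimate, scaled to the full data size, stands in for the exact value. `sse` and `sampled_sse` are hypothetical names.

```python
import random

def sse(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
               for p in points)

def sampled_sse(points, centers, sample_size, rnd):
    """Estimate sse(points, centers) from a random sample of instances.

    The sample total is scaled by len(points) / sample_size, which makes
    the estimate unbiased but noisy when sample_size is small.
    """
    sample = rnd.sample(points, sample_size)
    return sse(sample, centers) * len(points) / sample_size

if __name__ == "__main__":
    rnd = random.Random(1)
    pts = [(i % 10, i // 10) for i in range(100)]
    ctrs = [(2, 2), (7, 7)]
    print("exact SSE:", sse(pts, ctrs))
    # Repeated small-sample estimates scatter around the exact value
    print([sampled_sse(pts, ctrs, 10, rnd) for _ in range(5)])
```

Comparing two candidate clusterings with such estimates can rank them wrongly when their true objectives are close, which is the kind of noise an optimizer for this setting has to tolerate.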

Fast Outlier Removal for Image Registration based on Modified K-means Clustering

  • Soh, Young-Sung;Qadir, Mudasar;Kim, In-Taek
    • Journal of the Institute of Convergence Signal Processing / v.16 no.1 / pp.9-14 / 2015
  • Outlier detection and removal is a crucial step in various image processing applications such as image registration. Random Sample Consensus (RANSAC) is known as the best algorithm so far for outlier detection and removal, but it requires considerable computation time. To drastically reduce computation time while preserving comparable quality, an outlier detection and removal method based on a modified K-means is proposed. The original K-means is first applied to the matching point pairs, and then a modification step performs cluster merging and member exclusion. We applied the method to various images with highly repetitive patterns under several geometric distortions and obtained successful results. Compared with RANSAC, the proposed method runs 3 to 10 times faster.
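
One way to picture the modified K-means pipeline is to cluster the displacement vectors of the matching pairs, merge nearby clusters, keep the dominant cluster, and exclude members that deviate from it. The abstract does not specify the features, merge rule, or exclusion rule, so everything below (displacement features, `merge_dist`, `tol`, and the function names) is a hypothetical reconstruction; clustering raw displacements also assumes motion that is close to a pure translation.

```python
import math
import random

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rnd = random.Random(seed)
    centers = [list(v) for v in rnd.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(v, centers[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels

def filter_matches(matches, k=3, merge_dist=2.0, tol=3.0, seed=0):
    """Return indices of matches (x1, y1, x2, y2) judged to be inliers."""
    disp = [(x2 - x1, y2 - y1) for x1, y1, x2, y2 in matches]
    centers, labels = kmeans(disp, k, seed=seed)
    # Merge step: union clusters whose centers are closer than merge_dist
    group = list(range(k))
    for a in range(k):
        for b in range(a + 1, k):
            if math.dist(centers[a], centers[b]) < merge_dist:
                old, new = group[b], group[a]
                group = [new if g == old else g for g in group]
    merged = [group[lab] for lab in labels]
    # The dominant (largest) merged cluster is taken as the inlier motion
    dominant = max(set(merged), key=merged.count)
    inliers = [i for i, g in enumerate(merged) if g == dominant]
    # Member-exclusion step: drop members far from the mean displacement
    mean = [sum(disp[i][d] for i in inliers) / len(inliers) for d in range(2)]
    return [i for i in inliers if math.dist(disp[i], mean) <= tol]
```

The cost is a few passes of k-means over the match list instead of many random model hypotheses, which is consistent with, though not proof of, the reported speedup over RANSAC.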

Improved FCM Clustering Image Segmentation

  • Lee, Kwang-Kyug
    • Journal of IKEEE / v.24 no.1 / pp.127-131 / 2020
  • The Fuzzy C-Means (FCM) algorithm is a frequently used, representative clustering-based image segmentation method. FCM divides the image space into cluster regions with similar pixel values, which takes considerable segmentation time. Processing speed matters all the more when analyzing the varied patterns of today's web users. To address this speed problem, this paper proposes an improved FCM (IFCM) algorithm that segments the image using Otsu's threshold together with FCM. In the proposed method, the Otsu threshold, which maximizes the between-class variance, is determined and applied to FCM to segment the image. Experiments show that IFCM improves performance by shortening segmentation time compared with conventional FCM.
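
Otsu's threshold, the first stage of IFCM, can be computed from a grey-level histogram as sketched below. How IFCM then feeds the threshold into FCM is not detailed in the abstract, so the sketch stops at the threshold; it assumes 8-bit integer pixel values, and `otsu_threshold` is a hypothetical name.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximizes Otsu's between-class variance.

    Class 0 holds values <= t and class 1 holds values > t. Pixels are
    assumed to be integers in range(levels).
    """
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = sum0 = 0  # running pixel count and intensity sum of class 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t
```

A single pass over the histogram suffices, which is why an Otsu stage is cheap compared with iterating FCM over every pixel.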

Simple Compromise Strategies in Multivariate Stratification

  • Park, Inho
    • Communications for Statistical Applications and Methods / v.20 no.2 / pp.97-105 / 2013
  • Stratification is a popular technique in survey practice (among other applications) for improving the accuracy of estimators. Its full potential can be realized by effectively using auxiliary variables that are related to the survey variables. This paper focuses on the problem of stratum formation when multiple stratification variables are available. We first review a variance reduction strategy for univariate stratification. We then discuss convenient and efficient ways to extend it to multivariate situations using three methods: compromised measures of size, principal components analysis, and a K-means clustering algorithm. We also consider three types of compromising factors applied to the data when using these three methods. Finally, we compare their efficiency using data from the MU281 Swedish municipality population.
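
K-means stratification on several auxiliary variables can be sketched as follows: standardize each variable, cluster the units, and use the clusters as strata. This is only an illustration under assumptions (z-score standardization, Euclidean distance, farthest-point initialization); it does not implement the paper's compromising factors, and `kmeans_strata` and the test data are hypothetical.

```python
import math
import statistics

def standardize(rows):
    """Convert each column to z-scores so no variable dominates by scale."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def kmeans_strata(aux, n_strata, iters=50):
    """Assign each unit a stratum label by k-means on its auxiliary variables."""
    z = standardize(aux)
    # Deterministic farthest-point initialization of the stratum centers
    centers = [z[0]]
    while len(centers) < n_strata:
        centers.append(max(z, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(z)
    for _ in range(iters):
        labels = [min(range(n_strata), key=lambda j: math.dist(p, centers[j]))
                  for p in z]
        for j in range(n_strata):
            members = [p for p, lab in zip(z, labels) if lab == j]
            if members:
                centers[j] = [statistics.fmean(c) for c in zip(*members)]
    return labels
```

Standardizing first matters because variables such as population and revenue are on very different scales; without it, the largest-scale variable would dominate the distance and the strata would effectively be univariate.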

The Design of Fuzzy Controller by Means of Genetic Optimization and Estimation Algorithms

  • Oh, Sung-Kwun;Rho, Seok-Beom
    • KIEE International Transaction on Systems and Control / v.12D no.1 / pp.17-26 / 2002
  • In this paper, a new design methodology for the fuzzy controller is presented. The performance of a fuzzy controller is sensitive to the choice of scaling factors. The design procedure uses evolutionary computing (more specifically, a genetic algorithm) to adjust the scaling factors and estimation algorithms to estimate them. Tuning the scaling factors of the fuzzy controller is essential to the entire optimization process. We then estimate the scaling factors of the fuzzy controller using two types of estimation algorithms: HCM (Hard C-Means) and a neuro-fuzzy model [7]. The validity and effectiveness of the proposed estimation algorithm for the fuzzy controller are demonstrated on an inverted pendulum system.
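
The scaling-factor tuning step can be illustrated with a small real-coded genetic algorithm. The abstract gives neither the encoding, the operators, nor the inverted-pendulum cost, so this sketch assumes binary tournament selection, arithmetic crossover, Gaussian mutation, and elitism, and uses a hypothetical quadratic `toy_cost` as a stand-in for the controller's performance index.

```python
import random

def genetic_search(cost, bounds, pop_size=30, generations=60,
                   mut_sigma=0.1, seed=0):
    """Minimize cost(x) over a box with a small real-coded GA.

    Uses binary tournament selection, arithmetic crossover, Gaussian
    mutation clipped to the bounds, and elitism (the best individual is
    always copied into the next generation).
    """
    rnd = random.Random(seed)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    def tournament(pop):
        a, b = rnd.sample(pop, 2)
        return a if cost(a) < cost(b) else b

    pop = [[rnd.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = min(pop, key=cost)
    history = [cost(best)]
    for _ in range(generations):
        children = [best]  # elitism: the best never gets lost
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            alpha = rnd.random()
            child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
            child = clip([v + rnd.gauss(0.0, mut_sigma) for v in child])
            children.append(child)
        pop = children
        best = min(pop, key=cost)
        history.append(cost(best))
    return best, history

def toy_cost(s):
    """Hypothetical stand-in for a controller performance index."""
    target = (1.0, 0.5, 2.0)  # hypothetical "ideal" scaling factors
    return sum((a - b) ** 2 for a, b in zip(s, target))
```

Because of elitism, the best cost recorded in `history` never increases from one generation to the next; in the paper's setting, each cost evaluation would instead require simulating the controlled plant.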

Selection of Cluster Topic Words in Hierarchical Clustering using K-Means Algorithm

  • Lee Shin Won;Yi Sang Seon;An Dong Un;Chung Sung Jong
    • Proceedings of the IEEK Conference / 2004.08c / pp.885-889 / 2004
  • Fast, high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. Hierarchical clustering improves retrieval performance and makes results easier for users to understand. To improve clustering, we implemented a hierarchical structure with variety and readability by carefully selecting cluster topic words and determining the number of clusters dynamically. Selecting topic words is important because the hierarchical clustering structure summarizes the search results. We chose nouns as cluster topic words. This improved the quality of the topic words by 33%, as follows: for the top-level clusters, only nouns are extracted as topic words, and topic words already used are not reused for the child clusters.
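
The topic-word rule can be sketched as a top-down pass over the cluster tree. This sketch assumes per-cluster word counts and a noun lexicon are already available (for example, from a part-of-speech tagger), applies the noun filter at every level, and treats words used by any ancestor as unavailable to its descendants; `pick_topic_words` and its data layout are hypothetical.

```python
from collections import Counter

def pick_topic_words(tree, counts, nouns, node, n_words=2, used=frozenset()):
    """Assign topic words top-down through a cluster hierarchy.

    tree maps a cluster id to its child ids, counts maps a cluster id to
    a Counter of word frequencies, and nouns is the set of allowed words.
    """
    # Candidates: most frequent nouns not already used by an ancestor
    candidates = [w for w, _ in counts[node].most_common()
                  if w in nouns and w not in used]
    words = candidates[:n_words]
    result = {node: words}
    for child in tree.get(node, []):
        result.update(pick_topic_words(tree, counts, nouns, child, n_words,
                                       used | set(words)))
    return result
```

Excluding ancestor words forces each child's label to say something its parent's label does not, which is what makes the hierarchy readable as a summary.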