• Title/Abstract/Keywords: K-means method

A Non-linear Variant of Global Clustering Using Kernel Methods (커널을 이용한 전역 클러스터링의 비선형화)

  • Heo, Gyeong-Yong;Kim, Seong-Hoon;Woo, Young-Woon
    • Journal of the Korea Society of Computer and Information / v.15 no.4 / pp.11-18 / 2010
  • Fuzzy c-means (FCM) is a simple but efficient clustering algorithm using the concept of a fuzzy set that has been proved to be useful in many areas. There are, however, several well known problems with FCM, such as sensitivity to initialization, sensitivity to outliers, and limitation to convex clusters. In this paper, global fuzzy c-means (G-FCM) and kernel fuzzy c-means (K-FCM) are combined to form a non-linear variant of G-FCM, called kernel global fuzzy c-means (KG-FCM). G-FCM is a variant of FCM that uses an incremental seed selection method and is effective in alleviating sensitivity to initialization. There are several approaches to reduce the influence of noise and accommodate non-convex clusters, and K-FCM is one of them. K-FCM is used in this paper because it can easily be extended with different kernels. By combining G-FCM and K-FCM, KG-FCM can resolve the shortcomings mentioned above. The usefulness of the proposed method is demonstrated by experiments using artificial and real world data sets.
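
The baseline that the abstract above builds on can be sketched as follows. This is a minimal illustration of plain fuzzy c-means only, not of G-FCM, K-FCM, or the proposed KG-FCM; the function name `fcm`, its parameters, and the fuzzifier default `m=2.0` are assumptions made for this example and do not come from the paper.

```python
import math
import random

def fcm(points, c, m=2.0, iters=100, tol=1e-6, init=None, seed=0):
    """Plain fuzzy c-means on a list of equal-length tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Random seeding is the initialization that FCM is known to be sensitive to.
    centers = [list(v) for v in (init if init is not None else rng.sample(points, c))]
    dim = len(points[0])
    u = []
    for _ in range(iters):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2 / (m - 1))
        u = []
        for p in points:
            d = [max(math.dist(p, v), 1e-12) for v in centers]  # clamp to avoid division by zero
            u.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                      for i in range(c)])
        # Center update: mean of the points weighted by u_ki^m
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append([sum(wk * p[a] for wk, p in zip(w, points)) / total
                                for a in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # u holds the memberships computed in the final pass.
    return centers, u
```

Calling `fcm(points, 2)` with different `seed` values is a simple way to observe the dependence on initialization that incremental seed selection is meant to reduce.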

A Codebook Generation Algorithm Using a New Updating Condition (새로운 갱신조건을 적용한 부호책 생성 알고리즘)

  • 김형철;조제황
    • Journal of the Institute of Convergence Signal Processing / v.5 no.3 / pp.205-209 / 2004
  • The K-means algorithm is the most widely used codebook generation algorithm in vector quantization. In this paper, we propose a codebook generation algorithm that uses a new updating condition to enhance codebook performance. The conventional K-means algorithm uses a fixed distance weight for all training iterations, whereas the proposed method applies different weights according to an updating condition derived from the new codevectors at each training iteration. Different weights can thus be applied when generating codevectors at each iteration, which has an effect similar to that of variable weights. Experimental results show that the proposed algorithm yields better codebook performance than the K-means algorithm.
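
For orientation, the conventional K-means codebook training that this abstract takes as its baseline can be sketched as below. The abstract does not specify the new updating condition or its weights, so they are not reproduced here; the names `train_codebook` and `mean_distortion` are hypothetical and chosen only for this example.

```python
import math

def train_codebook(training, codebook, iters=20):
    """Conventional K-means codebook training for vector quantization (sketch)."""
    codebook = [list(c) for c in codebook]
    for _ in range(iters):
        # Partition step: map each training vector to its nearest codevector.
        cells = [[] for _ in codebook]
        for x in training:
            nearest = min(range(len(codebook)), key=lambda i: math.dist(x, codebook[i]))
            cells[nearest].append(x)
        # Update step: move each codevector to the centroid of its cell.
        updated = []
        for cv, cell in zip(codebook, cells):
            if cell:
                updated.append([sum(col) / len(cell) for col in zip(*cell)])
            else:
                updated.append(cv)  # empty cell: keep the old codevector
        if updated == codebook:
            break  # no codevector moved, so training has converged
        codebook = updated
    return codebook

def mean_distortion(training, codebook):
    """Average squared error between each vector and its nearest codevector."""
    return sum(min(math.dist(x, c) ** 2 for c in codebook) for x in training) / len(training)
```

Codebook performance of the kind the abstract compares is commonly measured by a distortion such as `mean_distortion`; a modified update rule would replace the plain centroid in the update step.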

Analysis of Combined Yeast Cell Cycle Data by Using the Integrated Analysis Program for DNA chip (DNA chip 통합분석 프로그램을 이용한 효모의 세포주기 유전자 발현 통합 데이터의 분석)

  • 양영렬;허철구
    • KSBB Journal / v.16 no.6 / pp.538-546 / 2001
  • An integrated data analysis program for DNA chips, comprising normalization, FDM analysis, various clustering methods, PCA, and SVD, was applied to analyze combined yeast cell cycle data. This paper compares several clustering algorithms, such as K-means, SOM, and fuzzy c-means, and their results. For further analysis, the clustering results from the integrated analysis program were used to assign functions to each cluster and for motif analysis. These results provide an integrated view of DNA chip data analysis.

Segmental Corrective Training for HMM Parameter Estimation in Speech Recognition (음성인식 시스템의 HMM 파라메터 추정을 위한 분절단위 교정 학습)

  • 김회린;이황수
    • The Journal of the Acoustical Society of Korea / v.12 no.2E / pp.5-11 / 1993
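  • This paper proposes a modified corrective training method that uses segmental information for HMM parameter estimation. The modified method corrects the HMM parameters with the segmental K-means algorithm instead of the forward-backward algorithm used in conventional corrective training. The approach exploits the fact that the segmental K-means algorithm emphasizes state-level information that shares common statistical characteristics within the speech signal. In speaker-dependent phoneme and word recognition experiments, the proposed algorithm achieved a higher recognition rate than conventional corrective training with less computation. This shows that state-level information is important in HMM corrective training.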

Topic-based Multi-document Summarization Using Non-negative Matrix Factorization and K-means (비음수 행렬 분해와 K-means를 이용한 주제기반의 다중문서요약)

  • Park, Sun;Lee, Ju-Hong
    • Journal of KIISE:Software and Applications / v.35 no.4 / pp.255-264 / 2008
  • This paper proposes a novel method that uses K-means and non-negative matrix factorization (NMF) for topic-based multi-document summarization. NMF decomposes a weighted term-by-sentence matrix into two sparse non-negative matrices: a semantic feature matrix and a semantic variable matrix. The semantic features obtained are intuitively comprehensible. Weighted similarity between the topic and the semantic features prevents meaningless sentences that merely resemble the topic from being selected. K-means clustering removes noise from the sentences so that biased semantics of the documents are not reflected in the summaries. In addition, the coherence of the document summaries can be enhanced by arranging the selected sentences in order of rank. The experimental results show that the proposed method achieves better performance than other methods.

Initialization of the K-means Clustering Algorithm Using Wavelets (Wavelet을 이용한 K-means clustering algorithm의 초기화)

  • Kim Guk-Hwan;Jang U-Jin;Lee Jun-Seok
    • Proceedings of the Korean Operations and Management Science Society Conference / 2006.05a / pp.305-312 / 2006
  • Random initialization, the approach most commonly used in the K-means clustering algorithm, has difficulty finding the global minimum. Even when the algorithm is run for many iterations, the global minimum is very hard to find, and for large data sets finding it is nearly impossible. To overcome these problems, this paper presents a method that uses wavelets to select optimal initial cluster centers. That is, it describes an initialization method that, through effective wavelet-based initialization, reaches the global optimum in only a small number of iterations. When used in clustering algorithms, this initialization method can be of great help for cluster analysis performed online in real time.

Initial Prototype Selection in Fuzzy C-Means Using Kernel Density Estimation (커널 밀도 추정을 이용한 Fuzzy C-means의 초기 원형 설정)

  • Cho, Hyun-Hak;Heo, Gyeong-Yong;Kim, Kwang-Beak
    • Proceedings of the Korean Society of Computer Information Conference / 2011.01a / pp.85-88 / 2011
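  • The Fuzzy C-Means (FCM) algorithm is one of the most widely used clustering algorithms and is applied in a variety of fields. However, FCM has several problems, one of which is the selection of initial prototypes. Because FCM converges to a local optimum, the clustering result depends on the initial prototypes. This paper proposes a method that uses kernel density estimation to improve initial prototype selection in FCM. The proposed method first performs kernel density estimation, then repeatedly places an initial cluster prototype in a high-density region and reduces the density of the region where the prototype was placed, so that the initial prototypes can be selected efficiently. Experimental results confirm that the proposed method is more efficient than the commonly used random initialization.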
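
The select-then-suppress loop described in this abstract might look roughly like the sketch below. The abstract does not give the kernel, the bandwidth, or the exact density-reduction rule, so the Gaussian kernel, the bandwidth `h`, and the subtraction step used here are all assumptions, and `kde_prototypes` is a hypothetical name.

```python
import math

def gaussian(a, b, h):
    """Gaussian kernel value between points a and b with bandwidth h (assumed kernel)."""
    return math.exp(-math.dist(a, b) ** 2 / (2.0 * h * h))

def kde_prototypes(points, c, h=1.0):
    """Pick c initial prototypes at density peaks (illustrative sketch)."""
    # Unnormalized kernel density estimate evaluated at each data point.
    density = [sum(gaussian(p, q, h) for q in points) for p in points]
    prototypes = []
    for _ in range(c):
        # Place the next prototype at the data point with the highest remaining density.
        best = max(range(len(points)), key=lambda k: density[k])
        prototypes.append(points[best])
        peak = density[best]
        # Assumed reduction rule: subtract the peak, scaled by the kernel, so that
        # the neighborhood of the chosen prototype is suppressed for later picks.
        density = [d - peak * gaussian(p, points[best], h)
                   for d, p in zip(density, points)]
    return prototypes
```

The returned points could then serve as the starting prototypes for an FCM run in place of random initialization; the bandwidth `h` controls how wide a region each pick suppresses.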

Text Detection and Binarization using Color Variance and an Improved K-means Color Clustering in Camera-captured Images (카메라 획득 영상에서의 색 분산 및 개선된 K-means 색 병합을 이용한 텍스트 영역 추출 및 이진화)

  • Song Young-Ja;Choi Yeong-Woo
    • The KIPS Transactions:PartB / v.13B no.3 s.106 / pp.205-214 / 2006
  • Text in images carries significant and detailed information about the scene, and if it can be detected and recognized automatically in real time, it can be used in various applications. In this paper, we propose a new text detection method that finds text in various camera-captured images, together with a method for segmenting text from the detected text regions. The detection method uses color variance in RGB color space as the detection feature, and the segmentation method uses an improved K-means color clustering in RGB color space. We tested the proposed methods on various document-style and natural scene images captured by digital cameras and mobile-phone cameras, and also on a portion of the ICDAR[1] contest images.

Irregular Sound Detection using the K-means Algorithm (K-means 알고리듬을 이용한 비정상 사운드 검출)

  • Chong Ui-pil;Lee Jae-yeal;Cho Sang-jin
    • Journal of the Institute of Convergence Signal Processing / v.6 no.1 / pp.23-26 / 2005
  • This paper describes an algorithm for determining the status of operating machines in power plants. In industry, it is very important to determine whether operating machines are in good condition, both to prevent machine accidents and to improve the operating efficiency of the plants. The status of running machines is analyzed in two steps. First, we extract features from the original input data. Second, we classify those features into normal or abnormal machine conditions using the wavelet transform and the input RMS vector through the K-means algorithm. In this paper, we developed an algorithm that detects faulty operation from the sound of operating machines using the K-means method.

Document Clustering Technique by K-means Algorithm and PCA (주성분 분석과 k 평균 알고리즘을 이용한 문서군집 방법)

  • Kim, Woosaeng;Kim, Sooyoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.3 / pp.625-630 / 2014
  • The amount of information is increasing rapidly with the development of the internet and computers. Since this enormous amount of information is managed in document form, it must be searched and processed efficiently. Document clustering, which groups related documents according to the similarity between them, helps to classify, search, and process large numbers of documents automatically. This paper proposes a method that uses principal component analysis to find the initial seed points when documents, represented as vectors in the feature vector space, are clustered by the K-means algorithm, in order to improve clustering performance. The experiment shows that our method performs better than the traditional K-means algorithm.