• Title/Summary/Keyword: K-평균 군집화

Search Result 173, Processing Time 0.031 seconds

A Study On Predicting Stock Prices Of Hallyu Content Companies Using Two-Stage k-Means Clustering (2단계 k-평균 군집화를 활용한 한류컨텐츠 기업 주가 예측 연구)

  • Kim, Jeong-Woo
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.7
    • /
    • pp.169-179
    • /
    • 2021
  • This study shows that the two-stage k-means clustering method can improve prediction performance by predicting the stock price, To this end, this study introduces the two-stage k-means clustering algorithm and tests the prediction performance through comparison with various machine learning techniques. It selects the cluster close to the prediction target obtained from the k-means clustering, and reapplies the k-means clustering method to the cluster to search for a cluster closer to the actual value. As a result, the predicted value of this method is shown to be closer to the actual stock price than the predicted values of other machine learning techniques. Furthermore, it shows a relatively stable predicted value despite the use of a relatively small cluster. Accordingly, this method can simultaneously improve the accuracy and stability of prediction, and it can be considered as the new clustering method useful for small data. In the future, developing the two-stage k-means clustering is required for the large-scale data application.

K-Means Clustering in the PCA Subspace using an Unified Measure (통합 측도를 사용한 주성분해석 부공간에서의 k-평균 군집화 방법)

  • Yoo, Jae-Hung
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.17 no.4
    • /
    • pp.703-708
    • /
    • 2022
  • K-means clustering is a representative clustering technique. However, there is a limitation in not being able to integrate the performance evaluation scale and the method of determining the minimum number of clusters. In this paper, a method for numerically determining the minimum number of clusters is introduced. The explained variance is presented as an integrated measure. We propose that the k-means clustering method should be performed in the subspace of the PCA in order to simultaneously satisfy the minimum number of clusters and the threshold of the explained variance. It aims to present an explanation in principle why principal component analysis and k-means clustering are sequentially performed in pattern recognition and machine learning.

Double K-Means Clustering (이중 K-평균 군집화)

  • 허명회
    • The Korean Journal of Applied Statistics
    • /
    • v.13 no.2
    • /
    • pp.343-352
    • /
    • 2000
  • In this study. the author proposes a nonhierarchical clustering method. called the "Double K-Means Clustering", which performs clustering of multivariate observations with the following algorithm: Step I: Carry out the ordinary K-means clmitering and obtain k temporary clusters with sizes $n_1$,... , $n_k$, centroids $c_$1,..., $c_k$ and pooled covariance matrix S. $\bullet$ Step II-I: Allocate the observation x, to the cluster F if it satisfies ..... where N is the total number of observations, for -i = 1, . ,N. $\bullet$ Step II-2: Update cluster sizes $n_1$,... , $n_k$, centroids $c_$1,..., $c_k$ and pooled covariance matrix S. $\bullet$ Step II-3: Repeat Steps II-I and II-2 until the change becomes negligible. The double K-means clustering is nearly "optimal" under the mixture of k multivariate normal distributions with the common covariance matrix. Also, it is nearly affine invariant, with the data-analytic implication that variable standardizations are not that required. The method is numerically demonstrated on Fisher's iris data.

  • PDF

Reproducibility Assessment of K-Means Clustering and Applications (K-평균 군집화의 재현성 평가 및 응용)

  • 허명회;이용구
    • The Korean Journal of Applied Statistics
    • /
    • v.17 no.1
    • /
    • pp.135-144
    • /
    • 2004
  • We propose a reproducibility (validity) assessment procedure of K-means cluster analysis by randomly partitioning the data set into three parts, of which two subsets are used for developing clustering rules and one subset for testing consistency of clustering rules. Also, as an alternative to Rand index and corrected Rand index, we propose an entropy-based consistency measure between two clustering rules, and apply it to determination of the number of clusters in K-means clustering.

Study on Application of Neural Network for Unsupervised Training of Remote Sensing Data (신경망을 이용한 원격탐사자료의 군집화 기법 연구)

  • 김광은;이태섭;채효석
    • Spatial Information Research
    • /
    • v.2 no.2
    • /
    • pp.175-188
    • /
    • 1994
  • A competitive learning network was proposed as unsupervised training method of remote sensing data, Its performance and computational re¬quirements were compared with conventional clustering techniques such as Se¬quential and K - Means. An airborne remote sensing data set was used to study the performance of these classifiers. The proposed algorithm required a little more computational time than the conventional techniques. However, the perform¬ance of competitive learning network algorithm was found to be slightly more than those of Sequential and K - Means clustering techniques.

  • PDF

XML Document Clustering Technique by K-means algorithm through PCA (주성분 분석의 K 평균 알고리즘을 통한 XML 문서 군집화 기법)

  • Kim, Woo-Saeng
    • The KIPS Transactions:PartD
    • /
    • v.18D no.5
    • /
    • pp.339-342
    • /
    • 2011
  • Recently, researches are studied in developing efficient techniques for accessing, querying, and storing XML documents which are frequently used in the Internet. In this paper, we propose a new method to cluster XML documents efficiently. We use a K-means algorithm with a Principal Component Analysis(PCA) to cluster XML documents after they are represented by vectors in the feature vector space by transferring them as names and levels of the elements of the corresponding trees. The experiment shows that our proposed method has a good result.

Tree-structured Clustering for Continuous Data (연속형 자료에 대한 나무형 군집화)

  • Huh Myung-Hoe;Yang Kyung-Sook
    • The Korean Journal of Applied Statistics
    • /
    • v.18 no.3
    • /
    • pp.661-671
    • /
    • 2005
  • The aim of this study is to propose a clustering method, called tree-structured clustering, by recursively partitioning continuous multivariate dat a based on overall $R^2$ criterion with a practical node-splitting decision rule. The clustering method produces easily interpretable clustering rules of tree types with the variable selection function. In numerical examples (Fisher's iris data and a Telecom case), we note several differences between tree-structured clustering and K-means clustering.

Comparison of Document Clustering Performance Using Various Dimension Reduction Methods (다양한 차원 축소 기법을 적용한 문서 군집화 성능 비교)

  • Cho, Heeryon
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2018.05a
    • /
    • pp.437-438
    • /
    • 2018
  • 문서 군집화 성능을 높이기 위한 한 방법으로 차원 축소를 적용한 문서 벡터로 군집화를 실시하는 방법이 있다. 본 발표에서는 특이값 분해(SVD), 커널 주성분 분석(Kernel PCA), Doc2Vec 등의 차원 축소 기법을, K-평균 군집화(K-means clustering), 계층적 병합 군집화(hierarchical agglomerative clustering), 스펙트럼 군집화(spectral clustering)에 적용하고, 그 성능을 비교해 본다.

News Clustering and Multi-Document Summarization for Real-time Issue Analysis (실시간 이슈 분석을 위한 뉴스 군집화 및 다중 문서 요약)

  • Yu, Hongyeon;Lee, Seungwoo;Ko, Youngjoong
    • Annual Conference on Human and Language Technology
    • /
    • 2018.10a
    • /
    • pp.132-137
    • /
    • 2018
  • 뉴스 기반의 실시간 이슈 분석을 위해서는 실시간으로 생성되는 다중 뉴스 기사 집합을 입력으로 받아 점증적으로 군집화 하고, 각 군집별 정보를 자동으로 요약하는 기술이 필요하다. 기존에는 정적인 데이터 기반의 군집화와 요약 각각에 대한 연구는 활발히 진행되고 있지만, 실시간으로 입력되는 대량의 데이터를 위한 점증적인 군집화와 요약에 대한 연구는 매우 부족하다. 따라서 본 논문에서는 실시간으로 입력되는 대량의 뉴스 기사 집합을 분석하기 위한 점증적이고 계층적인 뉴스 군집화 및 다중 문서 요약 방법을 제안한다. 평가를 위해서 2016년 10월, 11월 두 달간의 실제 데이터를 사용 하였으며, 전문 교육을 받은 연구원들이 Precision at k 기반의 정성평가를 진행하였다. 그 결과, 자동으로 생성된 12개의 군집에서 군집 성능은 평균 66% (상위계층 $l_1$: 82%, 하위계층 $l_2$: 43%), 요약 성능은 평균 92%를 얻었다.

  • PDF

연속형 자료에 대한 나무형 군집화

  • Heo, Myeong-Hui;Yang, Gyeong-Suk
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2005.05a
    • /
    • pp.49-51
    • /
    • 2005
  • 본 연구는 반복분할(recursive partitioning)에 의한 군집화 방법을 제안하고 활용 예를 제시한다. 이 방법은 나무 형태의 해석하기 쉬운 단순한 규칙을 제공하면서 동시에 변수선택기능을 제공한다.

  • PDF