• Title/Summary/Keyword: kmeans clustering

K-means Clustering Method according to Documentation Numbers (문서 수에 따른 가중치를 적용한 K-means 문서 클러스터링)

  • Cho, Cea-Sung;An, Dong-Un;Jeong, Sung-Jong;Lee, Shin-Won
    • Proceedings of the IEEK Conference / 2003.07d / pp.1557-1560 / 2003
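  • This paper clusters documents using the K-means clustering algorithm, which it classes as a hierarchical document clustering method. The conventional K-means clustering algorithm has the problem that, when the number of documents is large, too many documents are assigned to a single cluster. To mitigate this imbalance, the authors propose a method that assigns weights to documents according to the number of documents assigned to each cluster and then clusters again. The experimental results were evaluated with the F-measure, the harmonic mean of precision and recall, and showed a performance improvement of more than 9% over the conventional algorithm.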

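The evaluation measure named in the abstract above, the F-measure as the harmonic mean of precision and recall, can be sketched as follows. This is a minimal illustration rather than the authors' evaluation code; the function names and the count-based helper are assumptions made for the example.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall(true_positives: int, false_positives: int, false_negatives: int):
    """Precision and recall from raw counts (hypothetical helper for the example)."""
    retrieved = true_positives + false_positives
    relevant = true_positives + false_negatives
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall
```

For a cluster evaluated against a reference class, `precision_recall` would be given the number of documents correctly placed, wrongly placed, and missed, and `f_measure` combines the two ratios into a single score.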

RHadoop platform for K-Means clustering of big data (빅데이터 K-평균 클러스터링을 위한 RHadoop 플랫폼)

  • Shin, Ji Eun;Oh, Yoon Sik;Lim, Dong Hoon
    • Journal of the Korean Data and Information Science Society / v.27 no.3 / pp.609-619 / 2016
  • RHadoop is a collection of R packages that allow users to manage and analyze data with Hadoop. In this paper, we implement the K-Means algorithm on the MapReduce framework with RHadoop to make the clustering method applicable to large-scale data. The main idea is to introduce a combiner that operates on the map output to decrease the amount of data the reducers need to process. We showed that our K-Means algorithm using RHadoop with a combiner was faster than the regular algorithm without a combiner as the size of the data set increased. We also implemented the Elbow method with MapReduce to find the optimum number of clusters for K-Means clustering on large datasets. A comparison between our MapReduce implementation of the Elbow method and the classical kmeans() in R on small data showed similar results.
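The role of the combiner described in this abstract can be illustrated with a small simulation of one K-means iteration in the MapReduce style. This is a sketch in plain Python, not the paper's RHadoop code; all function names are hypothetical, and the "shuffle" is modelled simply as a list of records passed to a single reducer.

```python
from math import dist


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def map_phase(split, centroids):
    """Mapper: emit (cluster_id, (point, 1)) for every point in one input split."""
    return [(nearest(p, centroids), (p, 1)) for p in split]


def combine(pairs):
    """Combiner: collapse records to one (coordinate_sum, count) per cluster."""
    partial = {}
    for cid, (vec, n) in pairs:
        if cid in partial:
            total, count = partial[cid]
            partial[cid] = (tuple(a + b for a, b in zip(total, vec)), count + n)
        else:
            partial[cid] = (tuple(vec), n)
    return list(partial.items())


def reduce_phase(pairs, centroids):
    """Reducer: merge all partial sums and return the updated centroids."""
    new_centroids = list(centroids)
    for cid, (total, count) in combine(pairs):  # same associative merge
        new_centroids[cid] = tuple(x / count for x in total)
    return new_centroids


def kmeans_iteration(splits, centroids, use_combiner=True):
    """Run one iteration; also return how many records reach the reducer."""
    shuffled = []
    for split in splits:
        out = map_phase(split, centroids)
        if use_combiner:
            out = combine(out)  # shrink this mapper's output before the shuffle
        shuffled.extend(out)
    return reduce_phase(shuffled, centroids), len(shuffled)
```

Because summing coordinates and counts is associative, the centroids are the same with or without the combiner; only the number of records sent to the reducer changes, which is the saving the abstract refers to.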

Analysis of Radioactive Contamination Normal Level of Numerical Isotope using Clustering Methods (클러스터링 방법을 이용한 방사능 정상수치의 동위원소별 오염 분석)

  • Jung, Yong-Gyu;Choi, Jung-Ah;Cha, Byung-Heun
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.6 / pp.41-46 / 2014
  • As radiation-related incidents such as the Fukushima nuclear exposure incident have occurred frequently, several government agencies consistently treat normal radioactivity levels in radiation exposure as one of the greatest risk components. In this paper, the data were analyzed using the information in the data that falls beyond the range of the attributes. The clustering analysis uses the EM and SimpleKMeans algorithms. The experimental results on US radioactivity-related data depend on the method of data analysis. It can be seen that the behavior of the algorithms differs depending on the local value of the normal range. Governments need to pay attention to increasing the investigation frequency.

Statistical bioinformatics for gene expression data

  • Lee, Jae-K.
    • Proceedings of the Korean Society for Bioinformatics Conference / 2001.08a / pp.103-127 / 2001
  • Gene expression studies require statistical experimental designs and validation before laboratory confirmation. Various clustering approaches, such as hierarchical, Kmeans, and SOM, are commonly used for unsupervised learning in gene expression data. Several classification methods, such as gene voting, SVM, or discriminant analysis, are used for supervised learning, where well-defined response classification is possible. Estimating gene-condition interaction effects requires advanced, computationally intensive statistical approaches.

K-means Clustering Method according to Documentation Numbers (문서 수에 따른 가중치를 적용한 K-means 문서 클러스터링)

  • Cho, Cea-Sung;An, Dong-Un;Jeong, Sung-Jong;Lee, Shin-Won
    • Proceedings of the Korea Information Processing Society Conference / 2003.05a / pp.345-348 / 2003
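  • This paper clusters documents using the K-means clustering algorithm, which it classes as a hierarchical document clustering method. The conventional K-means clustering algorithm has the problem that, when the number of documents is large, too many documents are assigned to a single cluster. To mitigate this imbalance, the authors propose a method that assigns weights to documents according to the number of documents assigned to each cluster and then clusters again. The experimental results were evaluated with the F-measure, the harmonic mean of precision and recall, and showed a performance improvement of more than 9% over the conventional algorithm.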

Predicting Learning Achievement Using Big Data Cluster Analysis - Focusing on Longitudinal Study (빅데이터 군집 분석을 이용한 학습성취도 예측 - 종단 연구를 중심으로)

  • Ko, Sujeong
    • Journal of Digital Contents Society / v.19 no.9 / pp.1769-1778 / 2018
  • As the value of using Big Data increases, a variety of studies using big data analysis technology are being carried out in education as well as in corporations. In this paper, we propose a method to predict learning achievement using big data cluster analysis. In the proposed method, students in the Korea Children and Youth Panel Survey (KCYPS) are classified into groups with similar learning habits using the Kmeans algorithm, based on the learning habits of first-year middle school students, and group features are extracted. Next, using the extracted group features, the first-year middle school students in the test group are classified into groups with similar learning habits using cosine similarity, and then neighbors are selected and learning achievement is predicted. The proposed method demonstrates that learning habits at middle school are closely related to those at university, and it makes it possible to predict learning achievement at high school and satisfaction with university and major.
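The second step of this abstract, assigning a test student to the group with the most similar learning habits by cosine similarity, might look like the sketch below. The feature vectors, group names, and function names are assumptions for illustration; the abstract does not give the actual KCYPS features or the neighbor-selection details.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_group(student, group_features):
    """Return the name of the group whose feature vector is most similar to the student."""
    return max(
        group_features,
        key=lambda name: cosine_similarity(student, group_features[name]),
    )
```

Here `group_features` would map each K-means group to the feature vector extracted for it, for example its centroid, and `student` is one test student's learning-habit vector.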

3-D K-means clustering method considering internal chemical state variation of self-discharge of Li-ion battery (리튬 이온 배터리의 자가 방전에 따른 내부 화학적 상태를 고려한 3-D K-means Clustering 스크리닝 기법 연구)

  • Han, Dongho;Kwon, Sanguk;Kim, Seungwoo;Lim, Cheolwoo;Kim, Jonghoon
    • Proceedings of the KIPE Conference / 2019.11a / pp.150-151 / 2019
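  • As lithium-ion batteries are applied to electric vehicles and a variety of other applications, demand for retired batteries is also increasing. Parameters can be selected through electrical characteristic tests on batteries with differing internal chemical states, and it is essential to reflect the parameter changes that arise from the time gap before and after the electrical characteristic tests. By comparing the parameter values measured during the manufacturing process with the values re-measured after the characteristic tests, and reflecting this in a 3-D K-means clustering algorithm, more precise cell screening was performed.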

Nonlinear structural finite element model updating with a focus on model uncertainty

  • Mehrdad, Ebrahimi;Reza Karami, Mohammadi;Elnaz, Nobahar;Ehsan Noroozinejad, Farsangi
    • Earthquakes and Structures / v.23 no.6 / pp.549-580 / 2022
  • This paper assesses the influences of modeling assumptions and uncertainties on the performance of the non-linear finite element (FE) model updating procedure and model clustering method. The results of a shaking table test on a four-story steel moment-resisting frame are employed for both calibration and clustering of the FE models. In the first part, simple to detailed non-linear FE models of the test frame are calibrated to minimize the difference between the various data features of the models and the structure. To investigate the effect of the specified data feature, four features have been considered: the acceleration, displacement, hysteretic energy, and instantaneous features of the responses. In the last part of the work, a model-based clustering approach that groups models of a four-story frame with similar behavior is introduced to detect abnormal ones. The approach is a composition of property derivation, outlier removal based on k-nearest neighbors, and a K-means clustering approach using specified data features. The clustering results showed correlations among similar models. Moreover, the approach also helped to detect the best strategy for modeling different structural components.
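The pipeline named in this abstract, outlier removal based on k-nearest neighbors followed by K-means clustering, can be sketched in plain Python as below. The scoring rule (mean distance to the k nearest neighbors), the fixed threshold, and the caller-supplied initial centroids are assumptions for illustration; the abstract does not state how the authors chose them.

```python
from math import dist


def knn_scores(points, k):
    """Mean distance from each point to its k nearest neighbours (assumed score)."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(distances[:k]) / k)
    return scores


def remove_outliers(points, k, threshold):
    """Keep points whose k-NN score does not exceed the assumed threshold."""
    return [p for p, s in zip(points, knn_scores(points, k)) if s <= threshold]


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd iterations starting from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels
```

In this sketch each point would be the data-feature vector of one FE model; models far from all their neighbors are dropped before the remaining ones are grouped.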

Generic Document Summarization using Coherence of Sentence Cluster and Semantic Feature (문장군집의 응집도와 의미특징을 이용한 포괄적 문서요약)

  • Park, Sun;Lee, Yeonwoo;Shim, Chun Sik;Lee, Seong Ro
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.12 / pp.2607-2613 / 2012
  • The results of inherent-knowledge-based generic summarization are influenced by the composition of sentences in the document set. To resolve this problem, this paper proposes a new generic document summarization method that uses clustering of the semantic features of documents and the coherence of the document cluster. The proposed method clusters sentences using semantic features derived from NMF (non-negative matrix factorization), and it can classify document topic groups because the inherent structure of the documents is well represented by the sentence clusters. In addition, the method can improve the quality of summarization because important sentences are extracted using the coherence of the sentence clusters and cluster refinement by re-clustering. The experimental results demonstrate that applying the proposed method to generic summarization achieves better performance than other generic document summarization methods.

Beta-wave Correlation Analysis Model based on Unsupervised Machine Learning (비지도학습 머신러닝에 기반한 베타파 상관관계 분석모델)

  • Choi, Sung-Ja
    • Journal of Digital Convergence / v.17 no.3 / pp.221-226 / 2019
  • Among EEG waves, the beta wave corresponds to the stress area of human perception. The over-bandwidth of stress is extracted by analyzing the beta-wave correlation between the low and high bandwidths. We present a KMeans clustering analysis model, based on unsupervised machine learning, for analyzing and extracting the beta-wave correlation. The proposed model classifies the beta wave region into clusters of similar regions and identifies anomalous waveforms in the corresponding cluster category. The abnormal group of waveform clusters and the region leaving the normal category are discriminated from the stress risk group. Using this model, it is possible to discriminate the degree of stress of the cognitive state from the EEG waveform, and to manage and apply the cognitive state of the individual.