• Title/Summary/Keyword: kmeans 군집 (kmeans cluster)

Predicting Learning Achievement Using Big Data Cluster Analysis - Focusing on Longitudinal Study (빅데이터 군집 분석을 이용한 학습성취도 예측 - 종단 연구를 중심으로)

  • Ko, Sujeong
    • Journal of Digital Contents Society / v.19 no.9 / pp.1769-1778 / 2018
  • As the value of Big Data grows, research applying big data analysis technology is being carried out in education as well as in industry. In this paper, we propose a method for predicting learning achievement using big data cluster analysis. In the proposed method, students in the Korea Children and Youth Panel Survey (KCYPS) are grouped by the Kmeans algorithm according to their learning habits in the first year of middle school, and the features of each group are extracted. Next, using the extracted group features, first-year middle school students in the test group are assigned to groups with similar learning habits by cosine similarity; neighbors are then selected and learning achievement is predicted. The results show that learning habits in middle school are closely related to outcomes at university, and that they make it possible to predict learning achievement in high school and satisfaction with university and major.
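
The two-stage grouping described in this abstract (Kmeans on the training students, then cosine similarity to place test students) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the two-feature learning-habit vectors, the explicit `init_centroids` argument, and the helper names are all assumptions, and the neighbor selection and achievement prediction steps are left out.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's k-means; returns the final centroids (used here as group features)."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its members; keep it if the group is empty.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_by_cosine(student, centroids):
    """Index of the group whose centroid is most cosine-similar to the student."""
    return max(range(len(centroids)), key=lambda j: cosine(student, centroids[j]))


# Hypothetical training data: [self-study score, screen-time score] per student.
train = [[5, 1], [4, 1], [5, 2], [1, 5], [1, 4], [2, 5]]
centroids = kmeans(train, init_centroids=[train[0], train[3]])
for student in ([4, 2], [1, 3]):
    print(student, "-> group", assign_by_cosine(student, centroids))
```

Passing the initial centroids explicitly keeps the sketch deterministic; a real analysis would more likely use several random restarts.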

Generic Document Summarization using Coherence of Sentence Cluster and Semantic Feature (문장군집의 응집도와 의미특징을 이용한 포괄적 문서요약)

  • Park, Sun;Lee, Yeonwoo;Shim, Chun Sik;Lee, Seong Ro
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.12 / pp.2607-2613 / 2012
  • The results of generic summarization based on inherent knowledge are influenced by the composition of sentences in the document set. To resolve this problem, this paper proposes a new generic document summarization method that uses clustering of the semantic features of documents and the coherence of document clusters. The proposed method clusters sentences using semantic features derived from NMF (non-negative matrix factorization); because the sentence clusters represent the inherent structure of the documents well, the method can identify document topic groups. In addition, the method can improve the quality of summarization because important sentences are extracted using the coherence of sentence clusters and the clusters are refined by re-clustering. The experimental results demonstrate that applying the proposed method to generic summarization achieves better performance than other generic document summarization methods.

Analysis of Radioactive Contamination Normal Level of Numerical Isotope using Clustering Methods (클러스터링 방법을 이용한 방사능 정상수치의 동위원소별 오염 분석)

  • Jung, Yong-Gyu;Choi, Jung-Ah;Cha, Byung-Heun
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.6 / pp.41-46 / 2014
  • Because radioactivity-related incidents such as the Fukushima nuclear exposure incident have occurred frequently, several government agencies treat normal radioactivity levels as one of the most important risk factors in radiation exposure. In this paper, the data were analyzed using the information in records whose attribute values fall outside the normal range. The clustering analysis uses the EM and SimpleKMeans algorithms. The experimental results on US radioactivity data depend on the method of data analysis: the algorithms behave differently depending on the local value of the normal range. Governments need to pay attention and increase the frequency of investigation.

RHadoop platform for K-Means clustering of big data (빅데이터 K-평균 클러스터링을 위한 RHadoop 플랫폼)

  • Shin, Ji Eun;Oh, Yoon Sik;Lim, Dong Hoon
    • Journal of the Korean Data and Information Science Society / v.27 no.3 / pp.609-619 / 2016
  • RHadoop is a collection of R packages that allow users to manage and analyze data with Hadoop. In this paper, we implement the K-Means algorithm on the MapReduce framework with RHadoop to make the clustering method applicable to large-scale data. The main idea is to introduce a combiner that operates on the map output to decrease the amount of data the reducers need to process. We show that our K-Means algorithm using RHadoop with a combiner is faster than the regular algorithm without a combiner as the size of the data set increases. We also implement the Elbow method with MapReduce to find the optimum number of clusters for K-Means clustering on large data sets. A comparison between our MapReduce implementation of the Elbow method and the classical kmeans() in R on small data showed similar results.
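
To illustrate why a combiner reduces the data sent to reducers, the following sketch simulates one K-Means iteration in map, combine, and reduce phases. It is written in plain Python rather than R, does not use the RHadoop API, and every function name and data value in it is hypothetical; it only shows the general idea of pre-aggregating per-cluster sums and counts within each input split.

```python
def nearest(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])))


def map_phase(split, centroids):
    # Emit one (cluster_id, (vector_sum, count)) record per point in the split.
    return [(nearest(p, centroids), (list(p), 1)) for p in split]


def combine(records):
    # Merge records with the same cluster id into a single partial (sum, count).
    acc = {}
    for cid, (vec, n) in records:
        if cid in acc:
            total, count = acc[cid]
            acc[cid] = ([a + b for a, b in zip(total, vec)], count + n)
        else:
            acc[cid] = (list(vec), n)
    return list(acc.items())


def reduce_phase(records, centroids):
    # The reducer applies the same associative merge, then divides sum by count.
    merged = dict(combine(records))
    return [
        [x / merged[j][1] for x in merged[j][0]] if j in merged else list(centroids[j])
        for j in range(len(centroids))
    ]


def kmeans_step(splits, centroids, use_combiner=True):
    """One iteration; returns (new_centroids, number_of_records_sent_to_reducers)."""
    shuffled = []
    for split in splits:
        out = map_phase(split, centroids)
        shuffled.extend(combine(out) if use_combiner else out)
    return reduce_phase(shuffled, centroids), len(shuffled)


# Two hypothetical input splits and two starting centroids.
splits = [[(1.0, 1.0), (1.0, 3.0), (9.0, 9.0)],
          [(3.0, 1.0), (3.0, 3.0), (11.0, 9.0)]]
start = [[0.0, 0.0], [10.0, 10.0]]
print(kmeans_step(splits, start, use_combiner=True))
print(kmeans_step(splits, start, use_combiner=False))
```

Both calls produce the same centroids, but the combiner version passes one record per cluster per split to the reduce phase instead of one record per point, which is the saving the abstract refers to.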

A Comparative Study on Statistical Clustering Methods and Kohonen Self-Organizing Maps for Highway Characteristic Classification of National Highway (일반국도 도로특성분류를 위한 통계적 군집분석과 Kohonen Self-Organizing Maps의 비교연구)

  • Cho, Jun Han;Kim, Seong Ho
    • KSCE Journal of Civil and Environmental Engineering Research / v.29 no.3D / pp.347-356 / 2009
  • This paper describes a clustering analysis for highway classification based on traffic characteristics, as a departure from existing methodologies of highway functional classification. The research focuses on comparing the performance of clustering techniques based on total within-group errors and on deriving the optimal number of clusters. It analyzes statistical clustering methods (the hierarchical Ward's minimum-variance method and the nonhierarchical K-means method) and the Kohonen self-organizing maps clustering method for highway characteristic classification. The outcomes of the clustering techniques are compared in terms of the number of samples and the traffic characteristics of the subsets derived with the optimal number of clusters. Overall, the k-means method gives better results than the other methods when the number of clusters is less than 12, while Kohonen self-organizing maps give the best results for more than 20 clusters. The main contribution of this research is expected to be the basic road attribute information produced by the highway characteristic classification.
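
The comparison criterion named in this abstract, total within-group error, is commonly computed as the sum of squared distances from each sample to its own group centroid. The sketch below assumes that definition (the abstract does not spell it out) and uses made-up two-feature samples with two hypothetical labelings standing in for the outputs of two clustering methods.

```python
def total_within_group_error(points, labels):
    """Sum over groups of squared distances from each member to the group centroid."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    total = 0.0
    for members in groups.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((a - c) ** 2 for a, c in zip(p, centroid)) for p in members)
    return total


# Hypothetical samples (two traffic features each) and two candidate labelings.
samples = [(1.0, 2.0), (2.0, 2.0), (9.0, 8.0), (10.0, 8.0)]
labels_method_a = [0, 0, 1, 1]
labels_method_b = [0, 1, 1, 1]
print(total_within_group_error(samples, labels_method_a))
print(total_within_group_error(samples, labels_method_b))
```

For a fixed number of clusters, the labeling with the smaller total is the tighter grouping under this criterion; comparing totals across different numbers of clusters requires care because the error tends to fall as the number of clusters grows.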

Cluster Analysis of the 1000-hPa Height Field around the Korean Peninsula (한반도 주변 1000-hPa 고도장의 군집분석)

  • Jeong, Young-Kun
    • Journal of the Korean earth science society / v.33 no.4 / pp.337-349 / 2012
  • In this study, we classify the 1000 hPa geopotential height fields around the Korean peninsula through Kmeans cluster analysis and investigate the occurrence characteristics of each cluster pattern. Using pattern correlation as the similarity measure among clusters, with a cluster similarity criterion of 0.8, 11 clusters are identified as typical pressure patterns. Three of them are associated with the extension of the Siberian air mass, another three with the latitude of the longest symmetry axis of North Pacific highs, and two with a trough lying largely under the Siberian or North Pacific air mass; the remaining three, migratory high patterns that generally occur in spring and autumn, are distinguished by the direction of the longest symmetry axis of the highs. The occurrence rate of air masses affecting the Korean peninsula, estimated from the number of occurrence days of the 11 pressure patterns, is 55.4% Siberian, 29.3% North Pacific, 12.8% Yangtze River, and 2.5% Okhotsk Sea; continental air masses account for 68.2% of the total. The wintertime pressure patterns around the Korean peninsula are nearly opposite to those in summertime, each dominated by highs extending from the stationary air masses over Central Siberia and the North Pacific Ocean. The migratory highs occur largely in spring and autumn, during the transition from wintertime to summertime patterns or vice versa. Recently, the occurrence frequency of highs extending from the North Pacific has been decreasing; meanwhile, wintertime pressure patterns occur frequently in spring and autumn, the occurrence frequency of pressure patterns with a trough is increasing, and migratory highs occur in nearly all seasons.
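
As an illustration of the similarity measure mentioned in this abstract, the sketch below computes a centered pattern correlation between two gridded fields flattened to lists and checks it against the 0.8 criterion. Whether the study uses the centered form, area weighting, or a different rule for applying the threshold is not stated in the abstract, so those choices, the function names, and the sample values are assumptions.

```python
import math


def pattern_correlation(field_a, field_b):
    """Centered (Pearson) correlation between two fields on the same grid."""
    n = len(field_a)
    mean_a = sum(field_a) / n
    mean_b = sum(field_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(field_a, field_b))
    var_a = sum((a - mean_a) ** 2 for a in field_a)
    var_b = sum((b - mean_b) ** 2 for b in field_b)
    return cov / math.sqrt(var_a * var_b)


def same_pattern(field_a, field_b, criterion=0.8):
    """Treat two cluster-mean fields as one pattern if their correlation meets the criterion."""
    return pattern_correlation(field_a, field_b) >= criterion


# Hypothetical 1000 hPa heights (m) at four grid points for three cluster means.
cluster_1 = [100.0, 120.0, 140.0, 160.0]
cluster_2 = [110.0, 130.0, 150.0, 170.0]  # same spatial shape, shifted by 10 m
cluster_3 = [160.0, 140.0, 120.0, 100.0]  # reversed gradient
print(same_pattern(cluster_1, cluster_2))
print(same_pattern(cluster_1, cluster_3))
```

Because the correlation is centered, a uniform offset between two fields does not lower their similarity; only the spatial shape matters.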