• Title/Summary/Keyword: k means cluster analysis

A dimensional reduction method in cluster analysis for multidimensional data: principal component analysis and factor analysis comparison (다차원 데이터의 군집분석을 위한 차원축소 방법: 주성분분석 및 요인분석 비교)

  • Hong, Jun-Ho;Oh, Min-Ji;Cho, Yong-Been;Lee, Kyung-Hee;Cho, Wan-Sup
    • The Journal of Bigdata / v.5 no.2 / pp.135-143 / 2020
  • This paper proposes a pre-processing method and a dimension reduction method for shopping-cart analysis, where many variables are correlated, in order to segment consumer types in agri-food consumer panel data. Cluster analysis is widely used to divide observations in multivariate data into several clusters. When several variables are related, however, clustering after dimension reduction may be more effective. In this paper, food consumption survey data from 1,987 households were clustered using the K-means method, with 17 variables re-selected for the clustering. Principal component analysis and factor analysis were compared as remedies for multicollinearity and as ways to reduce dimensions before clustering. Both methods reduced the dataset to two dimensions. Principal component analysis divided the dataset into three clusters, but the differences among the clusters' characteristics were not clear. Under factor analysis, by contrast, the consumption-pattern characteristics of the clusters were well distinguished.
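As a rough illustration of the dimension-reduction step described in the abstract above, the following sketch standardizes correlated variables and projects them onto their first two principal components using power iteration with deflation. It covers only the PCA side, not factor analysis, and it is not the authors' code: the function names (`standardize`, `covariance`, `top_eigenvector`, `pca_scores`) are hypothetical, and it assumes no column is constant.

```python
import math
import random


def standardize(rows):
    """Center each column and scale it to unit sample variance.

    Assumes every column has non-zero variance.
    """
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of rows whose columns are already centered."""
    n, d = len(rows), len(rows[0])
    return [
        [sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenvector(matrix, iters=500, seed=0):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    d = len(matrix)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start vector
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the (unit-length) vector gives the eigenvalue.
    eigval = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return eigval, v


def pca_scores(rows, n_components=2):
    """Return (scores, components) for the leading principal components."""
    z = standardize(rows)
    cov = covariance(z)
    d = len(cov)
    components = []
    for _ in range(n_components):
        eigval, v = top_eigenvector(cov)
        components.append(v)
        # Deflation: remove the component just found before the next pass.
        cov = [
            [cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
            for i in range(d)
        ]
    scores = [[sum(x * c for x, c in zip(r, v)) for v in components] for r in z]
    return scores, components
```

The resulting two-column `scores` could then be passed to a K-means routine in place of the original correlated variables.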

Reproducibility Assessment of K-Means Clustering and Applications (K-평균 군집화의 재현성 평가 및 응용)

  • 허명회;이용구
    • The Korean Journal of Applied Statistics / v.17 no.1 / pp.135-144 / 2004
  • We propose a reproducibility (validity) assessment procedure for K-means cluster analysis that randomly partitions the data set into three parts: two subsets are used to develop clustering rules, and one subset is used to test the consistency of those rules. As an alternative to the Rand index and the corrected Rand index, we also propose an entropy-based consistency measure between two clustering rules and apply it to determining the number of clusters in K-means clustering.
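For reference, the two baseline measures named in the abstract above can be computed from pair counts as in the sketch below. It shows the standard Rand index and the adjusted (corrected) Rand index only; the entropy-based measure proposed in the paper is not reproduced, because the abstract does not define it. The function name `rand_indices` is hypothetical.

```python
from collections import Counter
from math import comb


def rand_indices(labels_a, labels_b):
    """Return (Rand index, adjusted Rand index) for two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        raise ValueError("need at least two items")

    # Contingency counts: items in cluster i of A and cluster j of B.
    joint = Counter(zip(labels_a, labels_b))
    same_both = sum(comb(c, 2) for c in joint.values())  # together in A and B
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    # Rand index: fraction of item pairs on which the two labelings agree
    # (together in both, or apart in both).
    agree = total_pairs + 2 * same_both - same_a - same_b
    rand = agree / total_pairs

    # Adjusted Rand index: subtract the agreement expected by chance.
    expected = same_a * same_b / total_pairs
    max_index = (same_a + same_b) / 2
    if max_index == expected:
        ari = 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    else:
        ari = (same_both - expected) / (max_index - expected)
    return rand, ari
```

Both measures depend only on the partition, not on the label names, so a clustering rule fitted on one subset can be compared with a rule fitted on another by applying both to the same test subset.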

A Study of Library Grouping using Cluster Analysis Methods (군집분석 기법을 이용한 공공도서관 그룹화에 대한 연구)

  • Kwak, Chul Wan
    • Journal of the Korean BIBLIA Society for library and Information Science / v.31 no.3 / pp.79-99 / 2020
  • The purpose of this study is to investigate cluster analysis models for grouping public libraries and to analyze their characteristics. Statistical data on public libraries from the National Library Statistics System were used, and three cluster analysis models were applied. Clustering based on library size divided the libraries largely into two clusters, with cluster sizes heavily skewed toward one of them. For grouping based on size, Ward's method of hierarchical cluster analysis and the k-means cluster analysis model were suitable. Three suggestions are presented as implications for grouping public libraries. First, it is necessary to collect library service-related data in addition to statistical data. Second, an analysis model suited to the data set being analyzed must be applied. Third, the possibility of using cluster analysis techniques in fields other than library grouping should be studied.

Categorization of Community Types Based on Childcare Resource Supply for Infants and Toddlers (영유아 자녀돌봄 자원 공급 수준에 따른 지역사회 유형화)

  • Soyoung Kim;Jaeeon Yoo
    • Human Ecology Research / v.61 no.2 / pp.233-245 / 2023
  • The aim of this study was to identify community-level childcare infrastructure for infants and toddlers and to use the data to categorize community types using K-Means cluster analysis with spatial constraints. Seven indicators of childcare resource supply were used for categorization, and the results revealed six types of community clusters. Communities in the Type 1 cluster provided sufficient parks, libraries, and kindergartens, but lacked pediatric facilities and private education institutions. This cluster comprised small cities and rural areas in Gangwon-do, Gyeongsangbuk-do, Chungcheongbuk-do, and Jeollabuk-do. The Type 2 cluster had numerous pediatric facilities and childcare centers, but lacked other childcare infrastructure. It comprised small and medium-sized cities in Gyeonggi-do and some areas in Chungcheongnam-do, Chungcheongbuk-do, and Gangwon-do bordering Gyeonggi-do. The Type 3 cluster comprised Busan, Daegu, and Gyeongsangnam-do and had insufficient childcare infrastructure as a whole. Type 4 had the largest number of childcare centers, libraries, and private education institutions and comprised Jeollabuk-do, areas near Gwangju, and Jeju-do. Type 5, consisting of Seoul, Incheon, and the southern part of Gyeonggi-do, had many pediatric facilities and certified childcare centers, but lacked other childcare infrastructure. Type 6, the rural areas and islands in Jeollanam-do, had sufficient kindergartens, but other infrastructure was insufficient. These results are expected to provide local governments with policy implications for relieving the childcare burden on residents with infants and toddlers.

DNA Marker Mining of BMS1167 Microsatellite Locus in Hanwoo Chromosome 17

  • Lee, Jea-Young;Lee, Yong-Won;Kwon, Jae-Chul
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.325-333 / 2006
  • We describe tests for detecting and locating quantitative trait loci (QTL) for traits in Hanwoo. LOD scores and a permutation test are described. From the results of a permutation test to detect QTL, we select major DNA markers of the BMS1167 microsatellite locus in Hanwoo chromosome 17 for further analysis. K-means clustering analysis applied to four traits and eight DNA markers in BMS1167 resulted in three cluster groups. We conclude that the major DNA markers of the BMS1167 microsatellite locus in Hanwoo chromosome 17 are markers 100bp, 108bp, and 110bp.

A Major DNA Marker Mining of BMS941 Microsatellite Locus in Hanwoo Chromosome 17

  • Lee, Jea-Young;Lee, Yong-Won
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.913-921 / 2005
  • We describe tests for detecting and locating quantitative trait loci (QTL) for traits in Hanwoo. LOD scores and a permutation test are described. From the results of a permutation test to detect QTL, we select major DNA markers of the BMS941 microsatellite locus in Hanwoo chromosome 17 for further analysis. K-means clustering analysis applied to four traits and eight DNA markers in BMS941 resulted in three cluster groups. We conclude that the major DNA markers of the BMS941 microsatellite locus in Hanwoo chromosome 17 are markers 80bp, 85bp, 90bp, and 105bp.

A Study of Similarity Measure Algorithms for Recommendation System about the PET Food (반려동물 사료 추천시스템을 위한 유사성 측정 알고리즘에 대한 연구)

  • Kim, Sam-Taek
    • Journal of the Korea Convergence Society / v.10 no.11 / pp.159-164 / 2019
  • Recent developments in ICT have increased interest in the care and health of pets such as dogs and cats. In this paper, cluster analysis was performed on the component data of pet food so that the results can be used in various fields of the pet industry. For the cluster analysis, similarity was assessed by analyzing the correlation between the components of 300 dog and cat foods on the market. Clustering techniques such as Hierarchical, K-Means, Partitioning Around Medoids (PAM), Density-based, and Mean-Shift are applied and analyzed. We also propose a personalized recommendation system for pets. The results of this paper can be used for personalized services such as a feed recommendation system for pets.

Multiscale Clustering and Profile Visualization of Malocclusion in Korean Orthodontic Patients: Cluster Analysis of Malocclusion

  • Jeong, Seo-Rin;Kim, Sehyun;Kim, Soo Yong;Lim, Sung-Hoon
    • International Journal of Oral Biology / v.43 no.2 / pp.101-111 / 2018
  • Understanding the classification of malocclusion is a crucial issue in orthodontics. It can also help to diagnose, treat, and understand malocclusion and to establish a standard for definite classes of patients. Principal component analysis (PCA) and k-means algorithms have been emerging as data analytic methods for cephalometric measurements because of their intuitive concepts and application potential. This study analyzed the macro- and meso-scale classification structure and feature basis vectors of 1020 orthodontic patients (415 male, 605 female; mean age, 25 years) using statistical preprocessing, PCA, random matrix theory (RMT), and k-means algorithms. RMT results indicate that 7 principal components (PCs) are significant for feature extraction. Using k-means algorithms, 3 and 6 clusters were identified, and the axes of PC1 to PC3 were determined to be significant for patient classification. The macro-scale classification corresponds to skeletal Class I, II, and III, and PC1 represents the anteroposterior discrepancy of the maxilla and mandible and mandibular position. PC2 and PC3 represent vertical pattern and maxillary position, respectively; they played significant roles in the meso-scale classification. In conclusion, the typical patient profile (TPP) of each class showed that the data-based classification corresponds with the clinical classification of orthodontic patients. This data-based study can provide insight into the development of new diagnostic classifications.

Development of Web-based Intelligent Recommender Systems using Advanced Data Mining Techniques (개선된 데이터 마이닝 기술에 의한 웹 기반 지능형 추천시스템 구축)

  • Kim Kyoung-Jae;Ahn Hyunchul
    • Journal of Information Technology Applications and Management / v.12 no.3 / pp.41-56 / 2005
  • The product recommender system is one of the most popular techniques for customer relationship management, and collaborative filtering (CF) is known as one of the most successful recommendation techniques in such systems. However, CF has limitations such as sparsity and scalability problems. This study proposes a hybrid of cluster analysis and case-based reasoning (CBR) to address these problems. CBR may relieve the sparsity problem because it recommends products using customer profile and transaction data, but it may still give rise to a scalability problem. Thus, this study uses cluster analysis to reduce the search space prior to CBR. For cluster analysis, this study employs a hybrid of genetic and K-Means algorithms to reduce the risk of converging to the local minima that typical cluster analyses can fall into. This study also develops a Web-based prototype system to test the superiority of the proposed model.
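The local-minimum issue mentioned in the abstract above arises because a single K-means run depends on its initial centroids. The sketch below uses a simpler, commonly used substitute for the paper's genetic hybrid, namely several random restarts of Lloyd's algorithm with the lowest-inertia run kept. It is not the authors' method, and the names `sq_dist`, `kmeans`, and `best_of_restarts` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm; returns (centroids, labels, inertia)."""
    # Initialize centroids as k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    # Inertia: total within-cluster sum of squared distances.
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


def best_of_restarts(points, k, n_restarts=10, seed=0):
    """Run K-means from several random starts and keep the lowest-inertia run."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2])
```

Restarts only lower the chance of ending in a poor local minimum; they do not guarantee the global optimum, which is the motivation for more elaborate search strategies such as the genetic hybrid the abstract describes.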

Analysis of Characteristics of Clusters of Middle School Students Using K-Means Cluster Analysis (K-평균 군집분석을 활용한 중학생의 군집화 및 특성 분석)

  • Jaebong, Lee
    • Journal of The Korean Association For Science Education / v.42 no.6 / pp.611-619 / 2022
  • With interest in educational data mining on the rise, this study explores the possibility of applying big data analysis to evaluation data in science education in order to give students appropriate feedback. We use the evaluation data of 2,576 students who answered 24 questions of the National Assessment of Educational Achievement, and we apply K-means cluster analysis, an unsupervised machine learning method, for clustering. The students were divided into six clusters. Middle-ranking students were spread across more clusters than upper- or lower-ranking students. According to the cluster analysis, the most important factor influencing clustering is academic achievement, and each cluster shows different characteristics in terms of content domains, subject competencies, and affective characteristics. In the lower-achieving cluster, learning motivation is the important affective factor, and among subject competencies, scientific inquiry and problem-solving competency as well as scientific communication competency have a major influence. In the content domain, achievement in motion and energy and in matter is an important factor distinguishing the clusters. These results make it possible to give students customized learning feedback based on the characteristics of each cluster. We discuss implications of these results for science education, such as possible uses of the findings, balanced learning across content domains, enhancement of subject competency, and improvement of scientific attitude.