• Title/Abstract/Keywords: k means cluster analysis

370 results (processing time: 0.021 s)

A Study on Effective Selection of University Lecture Evaluation

  • 황세명;김인택
    • Journal of Engineering Education Research / Vol. 8, No. 1 / pp. 31-45 / 2005
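  • This paper compares methods for extracting representative items so that lecture-evaluation questionnaires can be built effectively and systematically. The methods compared are factor analysis (FA), the fuzzy c-means (FCM) algorithm, and cluster analysis (CA); each is used to extract a small number of items from a large pool of candidate items of various forms. The extracted items serve as representatives of the clusters formed by the full item pool. To this end, a 120-item lecture evaluation form compiled from several questionnaires was administered to 646 students at Myongji University and three other universities. The students answered each item on a six-level scale: "strongly agree", "agree", "neutral", "disagree", "strongly disagree", and "not applicable". About 25 items were extracted by analyzing the students' response tendencies for each item. The experiments showed that factor analysis, the FCM algorithm, and cluster analysis extracted very similar sets of items.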
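
The idea of letting one item stand in for a whole cluster of items can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the item names, the response profiles, and the cluster labels are hypothetical, and "representative" is taken here to mean the item whose profile is closest (in Euclidean distance) to its cluster's mean profile.

```python
import math


def representative_items(item_profiles, labels):
    """For each cluster, return the item whose response profile lies
    closest (Euclidean distance) to the cluster's mean profile."""
    clusters = {}
    for name, lab in labels.items():
        clusters.setdefault(lab, []).append(name)
    reps = {}
    for lab, names in clusters.items():
        # Mean profile of the cluster, computed column by column.
        centroid = [sum(col) / len(names)
                    for col in zip(*(item_profiles[n] for n in names))]
        reps[lab] = min(names,
                        key=lambda n: math.dist(item_profiles[n], centroid))
    return reps


# Hypothetical data: each item is described by the mean score it received
# from four groups of respondents (1 = strongly disagree, 5 = strongly agree).
item_profiles = {
    "Q1": [4.5, 4.0, 4.2, 4.4],
    "Q2": [4.4, 4.1, 4.3, 4.3],
    "Q3": [4.0, 4.2, 4.5, 4.1],
    "Q4": [2.0, 2.5, 2.2, 2.1],
    "Q5": [2.1, 2.4, 2.3, 2.2],
    "Q6": [2.6, 2.2, 2.5, 2.6],
}
# Hypothetical cluster labels, e.g. produced by any clustering method.
labels = {"Q1": 0, "Q2": 0, "Q3": 0, "Q4": 1, "Q5": 1, "Q6": 1}

reps = representative_items(item_profiles, labels)
print(reps)
```

Keeping only the returned items reduces the questionnaire to one item per cluster, which is the kind of reduction (120 items to about 25) the abstract describes.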

A Hybrid Genetic Algorithm for K-Means Clustering

  • Jun, Sung-Hae;Han, Jin-Woo;Park, Minjae;Oh, Kyung-Whan
    • Proceedings of the Korean Institute of Intelligent Systems Conference / Korea Fuzzy Logic and Intelligent Systems Society, ISIS 2003 / pp. 330-333 / 2003
  • The initial cluster size is very important to the result of partitioning-based clustering. In the K-means algorithm, the result of the cluster analysis changes with the cluster size K. Usually, the initial cluster size is determined from prior, subjective information, which may not be optimal, so a more objective method is needed. In our research, we propose a hybrid genetic algorithm, a tree-induction-based evolutionary algorithm, for determining the optimal cluster size. The initial population of this algorithm is determined by the number of terminal nodes of the tree induction. From this decision-tree-based initial population, our optimal cluster size is generated. Our fitness function is defined as the inverse of a dissimilarity measure, and a bagging approach is used to save computational time.

Classification of Healthy Family Indicators in Indonesia Based on a K-means Cluster Analysis

  • Herti Maryani;Anissa Rizkianti;Nailul Izza
    • Journal of Preventive Medicine and Public Health / Vol. 57, No. 3 / pp. 234-241 / 2024
  • Objectives: Health development is a key element of national development. The goal of improving health development at the societal level will be readily achieved if it is directed from the smallest social unit, namely the family. This was the goal of the Healthy Indonesia Program with a Family Approach. The objective of the study was to analyze variables of family health indicators across all provinces in Indonesia to identify provincial disparities based on the status of healthy families. Methods: This study examined secondary data for 2021 from the Indonesia Health Profile, provided by the Ministry of Health of the Republic of Indonesia, and from the 2021 welfare statistics by Statistics Indonesia (BPS). From these sources, we identified 10 variables for analysis using the k-means method, a non-hierarchical method of cluster analysis. Results: The results of the cluster analysis of healthy family indicators yielded 5 clusters. In general, cluster 1 (Papua and West Papua Provinces) had the lowest average achievements for healthy family indicators, while cluster 5 (Jakarta Province) had the highest indicator scores. Conclusions: In Indonesia, disparities in healthy family indicators persist. Nutrition, maternal health, and child health are among the indicators that require government attention.
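
A minimal sketch of k-means (Lloyd's algorithm) applied to region-level indicators is given below. It is not the study's code: the two indicators and their values are hypothetical, and the z-score standardization step is an assumption added here because indicators measured on different scales would otherwise dominate the Euclidean distance.

```python
import math
import random


def standardize(rows):
    """Z-score each column (assumed preprocessing, not from the abstract)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in rows]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster went empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Hypothetical indicators per region: [coverage in %, facilities count].
data = [[90.0, 1200.0], [88.0, 1150.0], [91.0, 1250.0],
        [60.0, 400.0], [62.0, 450.0], [58.0, 380.0]]
labels, centroids = kmeans(standardize(data), k=2)
print(labels)
```

The cluster numbers returned are arbitrary identifiers; ordering clusters from lowest to highest achievement, as the abstract does, would be a separate step based on the cluster means.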

Sensitivity Enhancement of RF Plasma Etch Endpoint Detection With K-means Cluster Analysis

  • Lee, Honyoung;Jang, Haegyu;Lee, Hak-Seung;Chae, Heeyeop
    • Proceedings of the Korean Vacuum Society Conference / Korean Vacuum Society, Abstracts of the 49th Summer Annual Conference, 2015 / pp. 142.2-142.2 / 2015
  • Plasma etch endpoint detection (EPD) of SiO2 and PR layers is demonstrated by plasma impedance monitoring in this work. Plasma etching is the core process for making fine-pattern devices in semiconductor fabrication, and etch endpoint detection is an essential part of FDC (Fault Detection and Classification) for yield management and mass production. In general, optical emission spectroscopy (OES) has been used to detect the endpoint because OES is a simple, non-invasive, real-time plasma monitoring tool. In OES, the trend of a few sensitive wavelengths is traced. However, for small-open-area etch endpoint detection (e.g., contact etch), the signal is at the boundary of the detection limit because the emission intensities of the reactants and products are weak. Furthermore, the various materials covering the wafer, such as photoresist (PR), dielectric materials, and metals, complicate the analysis of OES signals. In this study, full spectra of optical emission signals were collected and analyzed with a data-mining approach, a modified K-means cluster analysis. The K-means cluster analysis is modified to handle about a thousand wavelength variables from OES. This technique can improve the sensitivity of EPD for small-area oxide layer etching processes with about 1.0% oxide area. It is expected to be applicable to various plasma monitoring tasks, including fault detection as well as EPD.

Classification and Characteristic analysis of Mountain Village Landscape Using Cluster Analysis

  • 고아랑;임정우;김성학
    • Journal of Korean Society of Rural Planning / Vol. 26, No. 1 / pp. 101-112 / 2020
  • Public awareness of mountain village landscapes has recently been increasing. This study therefore aimed to provide standards for the conservation, management, and creation of mountain village landscapes by characterizing and classifying existing ones. Data on 286 mountain villages were collected, and 19 variables, extracted from GIS spatial information and statistical data on mountain villages and selected as suitable sources on the basis of previous studies, were used for factor and cluster analysis. As a result of the factor analysis, 7 characteristics of mountain village landscapes were defined: 'Location', 'Cultivation', 'Ecology·Nature', 'Tourism', 'Residence', 'Recreation'. The K-means cluster analysis categorized the mountain village landscapes into four types: 'Residential', 'Touristic', 'General', and 'Environmentally protected'. Field assessment confirmed that the classification was appropriate, and basic guidelines for mountain village landscape management were set. The results of this study are expected to be used in future planning and implementation regarding mountain village landscapes.

Analysis Process based on Modify K-means for Efficiency Improvement of Electric Power Data Pattern Detection

  • 정세훈;신창선;조용윤;박장우;박명혜;김영현;이승배;심춘보
    • Journal of Korea Multimedia Society / Vol. 20, No. 12 / pp. 1960-1969 / 2017
  • There has been ongoing research on identifying and analyzing the patterns of electric power IoT data inside sensor nodes in order to support a stable power supply and efficient energy consumption. This study proposes an analysis process for electric power IoT data based on the K-means algorithm, which is an unsupervised rather than a supervised learning technique. The conventional K-means algorithm has a couple of problems, one of which is that the number of clusters K is selected heuristically or at random. That approach was adequate in the age of standardized data. The investigators propose an analysis process that selects the number of clusters K automatically through principal component analysis and the space division of the normal distribution, and apply it to electric power IoT data. The performance evaluation shows that it outperformed the conventional algorithm in the cluster classification and analysis of the pitch and roll values included in the communication bodies of utility poles.

Pattern Analysis of Volume of Basal Ganglia Structures in Patients with First-Episode Psychosis

  • 민세리;이태영;곽유빈;권준수
    • Korean Journal of Biological Psychiatry / Vol. 25, No. 2 / pp. 38-43 / 2018
  • Objectives: Dopamine dysregulation has been regarded as one of the core pathologies in patients with schizophrenia. Since dopamine synthesis capacity has been found to be inconsistent in patients with schizophrenia, the current classification of patients based on clinical symptoms cannot reflect the neurochemical heterogeneity of the disease. Here we performed a new subtyping of patients with first-episode psychosis (FEP) through biotype-based cluster analysis. We specifically suggested structural changes of the basal ganglia, which are deeply involved in the dopaminergic circuit, as a biotype. Methods: Forty FEP patients and 40 demographically matched healthy participants underwent 3T T1 MRI. Whole-brain parcellation was conducted, and the volumes of a total of 6 regions of the basal ganglia were extracted as features for cluster analysis. We used K-means clustering, and external validation was conducted with the Positive and Negative Syndrome Scale (PANSS). Results: K-means clustering divided the 40 FEP subjects into 2 clusters. Cluster 1 (n = 25) showed a substantial volume decrease in 4 regions of the basal ganglia compared with Cluster 2 (n = 15). Cluster 1 showed higher PANSS positive scale scores than Cluster 2 (F = 2.333, p = 0.025). Compared with healthy controls, Cluster 1 showed smaller volumes in 4 regions, whereas Cluster 2 showed larger volumes in 3 regions. Conclusions: Cluster analysis found two subgroups that differed distinctly in the volume patterns of basal ganglia structures and in positive symptom severity. The result possibly reflects the neurobiological heterogeneity of schizophrenia. Thus, the current study supports the importance of a paradigm shift toward biotype-based diagnosis, instead of phenotype-based diagnosis, for future precision psychiatry.

The Analysis of Optimal Cluster Number of Precipitation Region with Dunn Index

  • 엄명진;정창삼;남우성;정영훈;허준행
    • Proceedings of the Korea Water Resources Association Conference / Korea Water Resources Association, 2011 Conference / pp. 87-91 / 2011
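  • Precipitation is a natural phenomenon whose occurrence patterns differ greatly from region to region. In hydrology, various methods have been tried to analyze precipitation effectively and estimate probability precipitation. In Korea, probability precipitation has mainly been obtained through at-site frequency analysis, but recently the use of the regional frequency analysis proposed by Hosking and Wallis (1997) has been actively promoted. Compared with at-site frequency analysis, regional frequency analysis has several advantages, such as making better use of limited precipitation records. To apply it, however, precipitation regions must first be delineated. Several techniques exist for this, but recently clustering methods such as K-means and fuzzy c-means have mainly been applied. Because K-means and fuzzy c-means contain no built-in algorithm for determining the optimal number of clusters, that number has to be chosen arbitrarily. To overcome this drawback, this study proposes an optimal number of clusters using the Dunn index, one of the cluster validity indices. The variables used to delineate precipitation regions were monthly mean precipitation, annual mean precipitation, monthly maximum precipitation, longitude, latitude, and elevation, and the regions were delineated with K-means, PAM, and affinity propagation. The Dunn index was computed while the number of clusters was increased step by step, and the optimal number of clusters was determined from the results.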
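
The selection loop described in the abstract (increase the number of clusters, compute the Dunn index each time, keep the best) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: only K-means is shown, the farthest-point seeding is a choice made here for determinism, and the two-dimensional "station" coordinates are hypothetical. The Dunn index is taken as the smallest distance between points of different clusters divided by the largest within-cluster diameter, so larger values indicate compact, well-separated clusters.

```python
import math
from itertools import combinations


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from the seeds so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points,
                         key=lambda p: min(math.dist(p, s) for s in seeds)))
    return [list(s) for s in seeds]


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def dunn_index(points, labels):
    """Minimum between-cluster distance over maximum cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    max_diameter = max(
        (math.dist(a, b) for g in groups for a, b in combinations(g, 2)),
        default=0.0)
    min_separation = min(
        math.dist(a, b)
        for g, h in combinations(groups, 2) for a in g for b in h)
    return min_separation / max_diameter if max_diameter > 0 else math.inf


# Hypothetical stations described by two already-scaled features.
stations = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 0), (10, 1), (11, 0), (11, 1),
            (5, 10), (5, 11), (6, 10), (6, 11)]

# Sweep the number of clusters and keep the one with the largest Dunn index.
scores = {k: dunn_index(stations, kmeans(stations, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

With real station data the features (precipitation statistics, longitude, latitude, elevation) have very different units, so some form of scaling would normally precede the distance computations; that step is omitted here.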

Partial Discharge Data Analysis with Unsupervised Classification

  • 조경순;홍선학
    • Journal of the Korea Society of Digital Industry and Information Management / Vol. 14, No. 4 / pp. 9-16 / 2018
  • This study describes partial discharge (PD) distribution analysis at the interface between XLPE (cross-linked polyethylene) and EPDM (ethylene propylene diene monomer) using unsupervised classification. The $\phi$-$q$-$n$ patterns were analyzed using phase-resolved partial discharge (PRPD). K-means cluster analysis is a statistical method that forms clusters based on similarities and distances among scattered individuals and analyzes the characteristics of the resulting clusters, dividing multivariate data into several groups according to the similarity of their characteristics so that the data are easier to explore. It was confirmed that, as the voltage rose, the initial phase distribution localized around $90^{\circ}$ and $300^{\circ}$ expanded to the whole phase-angle range, and that at 30 kV the phase angle of the cluster with the maximum discharge charge was concentrated around $0^{\circ}$ and $180^{\circ}$. The Euclidean distance between the center of gravity and the discharge charge in the $\phi$-$q$ cluster increased with increasing applied voltage.

Assessing the Differences in Korean View on National Economic Policy with Factor and Cluster Analysis

  • Kim, Hee-Jae;Yun, Young-Jun
    • Journal of the Korean Data and Information Science Society / Vol. 19, No. 2 / pp. 451-461 / 2008
  • In this study, factor and cluster analyses were conducted to group the differences in Koreans' views on national economic policy, using the sample of the 2006 Korean General Social Survey (KGSS). In the 2006 KGSS, 6 items on a 5-point Likert scale ask whether, or to what extent, each respondent supports specific types of governmental economic policy. First, factor analysis converted the original 6 items into 3 composite variables that account for 81% of the total variability. As the second step of the factor analysis, factor scores were computed. Then a K-means cluster analysis based on the factor scores grouped the survey respondents into 3 clusters. In particular, cross-tabulation analysis showed that the distribution of the 3 clusters varies with the respondents' socio-demographic characteristics.
