• 제목/요약/키워드: Clustering test

검색결과 377건 처리시간 0.018초

Nonparametric analysis of income distributions among different regions based on energy distance with applications to China Health and Nutrition Survey data

  • Ma, Zhihua;Xue, Yishu;Hu, Guanyu
    • Communications for Statistical Applications and Methods
    • /
    • 제26권1호
    • /
    • pp.57-67
    • /
    • 2019
  • Income distribution is a major concern in economic theory. In regional economics, it is often of interest to compare income distributions in different regions. Traditional methods often compare the income inequality of different regions by assuming parametric forms of the income distributions, or using summary statistics like the Gini coefficient. In this paper, we propose a nonparametric procedure to test for heterogeneity in income distributions among different regions, and a K-means clustering procedure for clustering income distributions based on energy distance. In simulation studies, it is shown that the energy distance based method has competitive results with other common methods in hypothesis testing, and the energy distance based clustering method performs well in the clustering problem. The proposed approaches are applied in analyzing data from China Health and Nutrition Survey 2011. The results indicate that there are significant differences among income distributions of the 12 provinces in the dataset. After applying a 4-means clustering algorithm, we obtained the clustering results of the income distributions in the 12 provinces.

An Improved K-means Document Clustering using Concept Vectors

  • Shin, Yang-Kyu
    • Journal of the Korean Data and Information Science Society
    • /
    • 제14권4호
    • /
    • pp.853-861
    • /
    • 2003
  • An improved K-means document clustering method has been presented, where a concept vector is manipulated for each cluster on the basis of cosine similarity of text documents. The concept vectors are unit vectors that have been normalized on the n-dimensional sphere. Because the standard K-means method is sensitive to initial starting condition, our improvement focused on starting condition for estimating the modes of a distribution. The improved K-means clustering algorithm has been applied to a set of text documents, called Classic3, to test and prove efficiency and correctness of clustering result, and showed 7% improvements in its worst case.

  • PDF

지식 분류의 자동화를 위한 클러스터링 모형 연구 (Development of a Clustering Model for Automatic Knowledge Classification)

  • 정영미;이재윤
    • 정보관리학회지
    • /
    • 제18권2호
    • /
    • pp.203-230
    • /
    • 2001
  • 본 연구에서는 문헌을 기반으로 한 지식의 자동분류를 위해 최적의 클러스터링 모형을 제시하고자 하였다. 클러스터링 실험을 위해서 신문기사 실험집단과 학술논문 초록 실험집단을 구축하였고, 분류 성능 평가 척도인 WACS를 개발하였다. 분류자질로 사용한 용어의 집합은 다양한 자질 축소 기준을 적용하여 생성하였으며, 다양한 용어 가중치를 사용하였다. 유사계수 공식으로는 코사인 계수와 자카드 계수를 적용하였으며, 클러스터링 알고리즘으로는 비계층적 기법인 완전연결 기법과 계층적 기법인 K-means기법을 각각 사용하였다. 실험 결과 신문기사 원문 집단에서의 성능이 좋았으며, 완전연결 기법의 성능이 K-means 기법보다 높게 나타났다. 역문헌빈도의 적용은 완전연결 클러스터링에서는 긍정적인 효과가 나타났으나, K-means 클러스터링에서는 그렇지 못했다. 분류자질은 전체의 7.66%만 사용하였을 경우에도 성능 저하가 크지 않았으며, K-means 클러스터링에서는 오히려 성능 향상 효과가 있었다.

  • PDF

Metastasis Related Gene Exploration Using TwoStep Clustering for Medulloblastoma Microarray Data

  • Ban, Sung-Su;Park, Hee-Chang
    • 한국데이터정보과학회:학술대회논문집
    • /
    • 한국데이터정보과학회 2005년도 추계학술대회
    • /
    • pp.153-159
    • /
    • 2005
  • Microarray gene expression technology has applications that could refine diagnosis and therapeutic monitoring as well as improve disease prevention through risk assessment and early detection. Especially, microarray expression data can provide important information regarding specific genes related with metastasis through an appropriate analysis. Various methods for clustering analysis microarray data have been introduced so far. We used twostep clustering fot ascertain metastasis related gene through t-test. Through t-test between two groups for two publicly available medulloblastoma microarray data sets, we intended to find significant gene for metastasis. The paper describes the process in detail showing how the process is applied to clustering analysis and t-test for microarray datasets and how the metastasis-associated genes are explorated.

  • PDF

국부 확률을 이용한 데이터 분류에 관한 연구 (A Study on Data Clustering Method Using Local Probability)

  • 손창호;최원호;이재국
    • 제어로봇시스템학회논문지
    • /
    • 제13권1호
    • /
    • pp.46-51
    • /
    • 2007
  • In this paper, we propose a new data clustering method using local probability and hypothesis theory. To cluster the test data set we analyze the local area of the test data set using local probability distribution and decide the candidate class of the data set using mean standard deviation and variance etc. To decide each class of the test data, statistical hypothesis theory is applied to the decided candidate class of the test data set. For evaluating, the proposed classification method is compared to the conventional fuzzy c-mean method, k-means algorithm and Discriminator analysis algorithm. The simulation results show more accuracy than results of fuzzy c-mean method, k-means algorithm and Discriminator analysis algorithm.

Nonlinear structural finite element model updating with a focus on model uncertainty

  • Mehrdad, Ebrahimi;Reza Karami, Mohammadi;Elnaz, Nobahar;Ehsan Noroozinejad, Farsangi
    • Earthquakes and Structures
    • /
    • 제23권6호
    • /
    • pp.549-580
    • /
    • 2022
  • This paper assesses the influences of modeling assumptions and uncertainties on the performance of the non-linear finite element (FE) model updating procedure and model clustering method. The results of a shaking table test on a four-story steel moment-resisting frame are employed for both calibrations and clustering of the FE models. In the first part, simple to detailed non-linear FE models of the test frame is calibrated to minimize the difference between the various data features of the models and the structure. To investigate the effect of the specified data feature, four of which include the acceleration, displacement, hysteretic energy, and instantaneous features of responses, have been considered. In the last part of the work, a model-based clustering approach to group models of a four-story frame with similar behavior is introduced to detect abnormal ones. The approach is a composition of property derivation, outlier removal based on k-Nearest neighbors, and a K-means clustering approach using specified data features. The clustering results showed correlations among similar models. Moreover, it also helped to detect the best strategy for modeling different structural components.

DNA Marker Mining of BMS1167 Microsatellite Locus in Hanwoo Chromosome 17

  • Lee, Jea-Young;Lee, Yong-Won;Kwon, Jae-Chul
    • Journal of the Korean Data and Information Science Society
    • /
    • 제17권2호
    • /
    • pp.325-333
    • /
    • 2006
  • We describe tests for detecting and locating quantitative traits loci (QTL) for traits in Hanwoo. Lod scores and a permutation test have been described. From results of a permutation test to detect QTL, we select major DNA markers of BMS1167 microsatellite locus in Hanwoo chromosome 17 for further analysis. K-means clustering analysis applied to four traits and eight DNA markers in BMS1167 resulted in three cluster groups. We conclude that the major DNA markers of BMS1167 microsatellite locus in Hanwoo chromosome 17 are markers 100bp, 108bp and 110bp.

  • PDF

A Major DNA Marker Mining of BMS941 Microsatellite Locus in Hanwoo Chromosome 17

  • Lee, Jea-Young;Lee, Yong-Won
    • Journal of the Korean Data and Information Science Society
    • /
    • 제16권4호
    • /
    • pp.913-921
    • /
    • 2005
  • We describe tests for detecting and locating quantitative traits loci (QTL) for traits in Hanwoo. Lod scores and a permutation test have been described. From results of a permutation test to detect QTL, we select major DNA markers of BMS941 microsatellite locus in Hanwoo chromosome 17 for further analysis. K-means clustering analysis applied to four traits and eight DNA markers in BMS941 resulted in three cluster groups. We conclude that the major DNA markers of BMS941 microsatellite locus in Hanwoo chromosome 17 are markers 80bp, 85bp 90bp and 105bp.

  • PDF

시간단위 전력사용량 시계열 패턴의 군집 및 분류분석 (Clustering and classification to characterize daily electricity demand)

  • 박다인;윤상후
    • Journal of the Korean Data and Information Science Society
    • /
    • 제28권2호
    • /
    • pp.395-406
    • /
    • 2017
  • 전력 공급 시스템의 효율적인 운영을 위해 전력수요예측은 필수적이다. 본 연구에서는 군집분석과 분류분석을 이용하여 일 단위 시간별 전력수요량 시계열 패턴의 유형을 살펴보고자 한다. 전력거래소에서 수집된 2008년 1월 1일부터 2012년 12월 31일까지의 일 단위 시간별 전력수요량 데이터를 추세성분, 계절성분, 오차 성분으로 구성된 시계열 자료로 변환하여 사용하였다. 추세성분을 제거한 시계열 자료의 패턴을 구분하기 위한 군집 분석방법은 k-평균 군집분석 (k-means), 가우시안혼합모델 혼합 모델 군집분석 (Gaussian mixture model), 함수적 군집분석 (functional clustering)을 고려하였다. 주성분분석을 통해 24시간 자료를 2개의 요인로 축소한 후 k-평균 군집분석과 가우시안 혼합 모델, 함수적 군집분석을 수행하였다. 군집분석 결과를 토대로 2008년부터 2011년까지 총 4년간 데이터를 4가지 분류분석방법인 의사결정나무, RF (random forest), Naive bayes, SVM (support vector machine)을 통해 훈련시켜 2012년 군집을 예측하였다. 분석 결과 가우시안 혼합 분포기반 군집분석과 RF를 이용한 군집예측 결과의 성능이 가장 우수하였다.

OPTIMIZATION OF THE TEST INTERVALS OF A NUCLEAR SAFETY SYSTEM BY GENETIC ALGORITHMS, SOLUTION CLUSTERING AND FUZZY PREFERENCE ASSIGNMENT

  • Zio, E.;Bazzo, R.
    • Nuclear Engineering and Technology
    • /
    • 제42권4호
    • /
    • pp.414-425
    • /
    • 2010
  • In this paper, a procedure is developed for identifying a number of representative solutions manageable for decision-making in a multiobjective optimization problem concerning the test intervals of the components of a safety system of a nuclear power plant. Pareto Front solutions are identified by a genetic algorithm and then clustered by subtractive clustering into "families". On the basis of the decision maker's preferences, each family is then synthetically represented by a "head of the family" solution. This is done by introducing a scoring system that ranks the solutions with respect to the different objectives: a fuzzy preference assignment is employed to this purpose. Level Diagrams are then used to represent, analyze and interpret the Pareto Fronts reduced to the head-of-the-family solutions.