• Title/Summary/Keyword: 군집분

Search Result 224, Processing Time 0.047 seconds

A Big Data Analysis by Between-Cluster Information using k-Modes Clustering Algorithm (k-Modes 분할 알고리즘에 의한 군집의 상관정보 기반 빅데이터 분석)

  • Park, In-Kyoo
    • Journal of Digital Convergence
    • /
    • v.13 no.11
    • /
    • pp.157-164
    • /
    • 2015
  • This paper describes subspace clustering of categorical data for convergence and integration. Because conventional evaluation measures are designed for numerical data, they are likely to be limited on categorical data by the absence of ordering, high dimensionality, and sparse frequencies. Hence, a conditional entropy measure is proposed to evaluate how closely the attributes within each cluster cohere. We propose a new objective function that reflects the optimal clustering, so that the within-cluster dispersion is minimized and the between-cluster separation is enhanced. We performed experiments on five real-world datasets, comparing the performance of our algorithm with four algorithms, using three evaluation metrics: accuracy, F-measure, and adjusted Rand index. According to the experiments, the proposed algorithm outperforms the algorithms considered in the evaluation on the considered metrics.
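
The abstract does not give the paper's objective function, so the following is only a minimal sketch of the underlying quantity: the conditional entropy H(Y | X) between two categorical attributes, which is low when one attribute largely determines the other. The function name `conditional_entropy` and the toy attribute values are assumptions made for illustration, not taken from the paper.

```python
import math
from collections import Counter


def conditional_entropy(xs, ys):
    """Return H(Y | X) in bits for two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    marginal = Counter(xs)         # counts of x alone
    h = 0.0
    for (x, _y), count in joint.items():
        # p(x, y) * log2 p(y | x), accumulated with a minus sign
        h -= (count / n) * math.log2(count / marginal[x])
    return h


# Hypothetical attributes of the records in one cluster.
color = ["red", "red", "blue", "blue"]
shape = ["round", "round", "square", "square"]   # fully determined by color
size = ["s", "l", "s", "l"]                      # unrelated to color

print(conditional_entropy(color, shape))  # low: the attributes cohere
print(conditional_entropy(color, size))   # high: no cohesion
```

A cluster whose attribute pairs all have low conditional entropy is, in this sense, internally cohesive.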

Document Clustering using Clustering and Wikipedia (군집과 위키피디아를 이용한 문서군집)

  • Park, Sun;Lee, Seong Ho;Park, Hee Man;Kim, Won Ju;Kim, Dong Jin;Chandra, Abel;Lee, Seong Ro
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2012.10a
    • /
    • pp.392-393
    • /
    • 2012
  • This paper proposes a new document clustering method using clustering and Wikipedia. The proposed method represents the concepts of cluster topics well by means of NMF (non-negative matrix factorization). It addresses the "bag of words" problem, in which the meaningful relationships between documents and clusters are not considered, by expanding the important terms of each cluster with synonyms from Wikipedia. The experimental results demonstrate that the proposed method achieves better performance than other document clustering methods.

Feature Extraction by Line-clustering Segmentation Method (선군집분할방법에 의한 특징 추출)

  • Hwang, Jae-Ho
    • The KIPS Transactions:PartB
    • /
    • v.13B no.4 s.107
    • /
    • pp.401-408
    • /
    • 2006
  • In this paper, we propose a new class of segmentation techniques for feature extraction, based on statistical and regional classification at each vertical or horizontal line of digital image data. Data are processed and clustered line by line, unlike point or space processing. The techniques are designed to segment gray-scale sectional images with a horizontal and vertical line process that exploits the statistical and property differences between lines, and to extract features. They give efficient results when gray levels overlap and no threshold image is available; such images are also hard to segment with global or local threshold methods. Line pixels indicate the data that can be sectioned, and can be set according to cluster quality using differences in the histogram and statistical data. The total segmentation over line clusters can be obtained by adaptive extension along the horizontal axis. Each processed region has its own pixel value, which results in feature extraction. The advantages and effectiveness of the line-cluster approach are shown theoretically and demonstrated through region-segmented medical image processing of the carotid artery.

Fast VQ Codebook Design by Successively Bisectioning of Principal Axis (주축의 연속적 분할을 통한 고속 벡터 양자화 코드북 설계)

  • Kang, Dae-Seong;Seo, Seok-Bae;Kim, Dai-Jin
    • Journal of KIISE:Software and Applications
    • /
    • v.27 no.4
    • /
    • pp.422-431
    • /
    • 2000
  • This paper proposes a new codebook generation method, called PCA-based VQ, that incorporates the PCA (Principal Component Analysis) technique into VQ (Vector Quantization) codebook design. The PCA technique reduces the data dimensions by transforming input image vectors into feature vectors. The cluster of feature vectors in the transformed domain is bisected into two subclusters by an optimally chosen partitioning hyperplane. We expedite the search for the optimal partitioning hyperplane, which is the most time-consuming step, by considering that (1) the optimal partitioning hyperplane is perpendicular to the first principal axis of the feature vectors, (2) it is located at the equilibrium point of the left and right clusters' distortions, and (3) the left and right clusters' distortions can be adjusted incrementally. This principal axis bisection is performed successively on the cluster whose distortion decreases the most from before to after bisection among the existing clusters, until the total distortion of the clusters becomes as small as the desired level. Simulation results show that the proposed PCA-based VQ method is promising because its reconstruction performance is as good as that of the SOFM (Self-Organizing Feature Maps) method and its codebook generation is as fast as that of the K-means method.
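
The idea of bisecting along the first principal axis can be illustrated with a short sketch. It is not the authors' implementation and simplifies in two places, both assumptions made here: the hyperplane is placed at the cluster centroid rather than at the equilibrium point of the left and right distortions, and the cluster with the largest distortion is split next rather than the one with the largest distortion reduction. All function names are hypothetical, and the principal axis is estimated by plain power iteration.

```python
import math


def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distortion(vectors):
    """Sum of squared distances from each vector to the cluster centroid."""
    c = centroid(vectors)
    return sum((x - m) ** 2 for v in vectors for x, m in zip(v, c))


def principal_axis(vectors, iters=50):
    """Estimate the first principal axis by power iteration on the scatter matrix."""
    c = centroid(vectors)
    centered = [[x - m for x, m in zip(v, c)] for v in vectors]
    # Start from the centered vector farthest from the centroid.
    axis = max(centered, key=lambda v: sum(x * x for x in v))
    for _ in range(iters):
        nxt = [0.0] * len(c)
        for v in centered:
            p = sum(a * b for a, b in zip(v, axis))
            for j, x in enumerate(v):
                nxt[j] += p * x
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # all vectors identical: nothing to split
            break
        axis = [x / norm for x in nxt]
    return c, axis


def bisect(vectors):
    """Split by the hyperplane through the centroid, perpendicular to the axis."""
    c, axis = principal_axis(vectors)
    left, right = [], []
    for v in vectors:
        p = sum((x - m) * a for x, m, a in zip(v, c, axis))
        (left if p < 0 else right).append(v)
    return left, right


def build_codebook(vectors, size):
    clusters = [list(vectors)]
    while len(clusters) < size:
        clusters.sort(key=distortion)
        target = clusters.pop()        # simplification: largest distortion first
        left, right = bisect(target)
        if not left or not right:      # cluster cannot be split further
            clusters.append(target)
            break
        clusters += [left, right]
    return [centroid(c) for c in clusters]


data = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
codebook = build_codebook(data, 2)
print(sorted(codebook))
```

Each bisection needs only one principal-axis estimate and one pass over the cluster, which is where the speed advantage over iterative codebook training would come from.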

Enhancing Document Clustering using Important Term of Cluster and Wikipedia (군집의 중요 용어와 위키피디아를 이용한 문서군집 향상)

  • Park, Sun;Lee, Yeon-Woo;Jeong, Min-A;Lee, Seong-Ro
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.49 no.2
    • /
    • pp.45-52
    • /
    • 2012
  • This paper proposes a new method for enhancing document clustering using the important terms of each cluster and Wikipedia. The proposed method represents the concepts of cluster topics well by selecting the important terms in each cluster from the semantic features of NMF (non-negative matrix factorization). It addresses the "bag of words" problem, in which the meaningful relationships between documents and clusters are not considered, by expanding the important terms of each cluster with synonyms from Wikipedia. It also improves the quality of document clustering by using the expanded important terms to refine the initial clusters through re-clustering. The experimental results demonstrate that the proposed method achieves better performance than other document clustering methods.
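
To make the NMF step concrete, here is a minimal pure-Python sketch that factorizes a small term-document matrix V into W (term-by-feature) and H (feature-by-document) with the standard multiplicative updates for the Frobenius norm, then assigns each document to its strongest semantic feature. The toy matrix, the function names, and the argmax assignment rule are assumptions for illustration; the Wikipedia synonym expansion and re-clustering steps described in the abstract are not shown.

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frobenius_error(v, w, h):
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factorize V (terms x docs) as W @ H with non-negative W and H."""
    rng = random.Random(seed)
    n_terms, n_docs = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_terms)]
    h = [[rng.random() + 0.1 for _ in range(n_docs)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_docs)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_terms)]
    return w, h


def cluster_labels(h):
    """Assign each document (column of H) to its largest semantic feature."""
    return [max(range(len(h)), key=lambda k: h[k][j]) for j in range(len(h[0]))]


# Hypothetical term-document counts: documents 0-1 and 2-3 share no terms.
v = [[2, 2, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 3, 3],
     [0, 0, 1, 1]]
w, h = nmf(v, rank=2)
print(cluster_labels(h))
```

The columns of W play the role of the semantic features: the terms with the largest weights in a column are candidates for that cluster's important terms.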

Nonparametric clustering of functional time series electricity consumption data (전기 사용량 시계열 함수 데이터에 대한 비모수적 군집화)

  • Kim, Jaehee
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.1
    • /
    • pp.149-160
    • /
    • 2019
  • The electricity consumption time series data of 'A' University from July 2016 to June 2017 are analyzed via nonparametric functional data clustering, since the time series can be regarded as a realization of continuous functions with a dependency structure. We use the method of Bouveyron and Jacques (Advances in Data Analysis and Classification, 5, 4, 281-300, 2011), a model-based functional clustering approach with an FEM algorithm that assumes a Gaussian distribution on the functional principal components. Clusterwise analysis is provided with cluster mean functions, densities, and cluster profiles.

Effects of Organic Matter Application on Soil Microbial Community in a Newly Reclaimed Soil (신규 유기농경지 토양의 유기물 공급이 토양 미생물군집에 미치는 영향)

  • An, Nan-Hee;Ok, Jung-Hun;Cho, Jung-Lai;Shin, Jae-Hoon;Nam, Hong-Sik;Kim, Seok-Cheol
    • Korean Journal of Organic Agriculture
    • /
    • v.23 no.4
    • /
    • pp.767-779
    • /
    • 2015
  • This study examined the effects of organic matter application on soil microbial activities and diversities in a newly reclaimed soil. Soil chemical properties, microbial populations, microbial biomass, and properties of the microbial community were investigated under 4 different treatments (animal manure compost+green manure, chemical fertilizer, and without fertilizer). The experiment was conducted for 3 years, from 2012 to 2014. Most chemical properties in the animal manure compost+green manure treatment increased continually compared with the chemical fertilizer and no-fertilizer treatments. The populations of bacteria and fungi were higher in the animal manure compost+green manure treatment; however, there was no difference in actinomyces. Soil microbial biomass C content was higher in the animal manure compost+green manure treatment than in the chemical fertilizer and no-fertilizer treatments. Biolog examination showed that the catabolic diversity of bacterial communities was higher in the animal manure compost+green manure treatment. Principal component analysis of the Biolog data differentiated the organic-matter-amended soils from the NPK and control soils. These results indicate that application of animal manure compost+green manure had a beneficial effect on soil microbial properties.

A Classified Space VQ Design for Text-Independent Speaker Recognition (문맥 독립 화자인식을 위한 공간 분할 벡터 양자기 설계)

  • Lim, Dong-Chul;Lee, Hanig-Sei
    • The KIPS Transactions:PartB
    • /
    • v.10B no.6
    • /
    • pp.673-680
    • /
    • 2003
  • In this paper, we study the enhancement of VQ (Vector Quantization) design for text-independent speaker recognition. Concretely, we present a non-iterative method for building a vector quantization codebook; because learning is non-iterative, the computational complexity is drastically reduced. The proposed Classified Space VQ (CSVQ) design method for text-independent speaker recognition is generalized from the Semi-noniterative VQ design method for text-dependent speaker recognition. CSVQ contrasts with the existing design method, which uses an iterative learning algorithm for every training speaker. The characteristics of the CSVQ design are as follows. First, the proposed method performs non-iterative learning by using a Classified Space Codebook. Second, the quantization region of each speaker is equivalent to the quantization region of a Classified Space Codebook, and the quantization point of each speaker is the optimal point for the statistical distribution of that speaker within a quantization region of the Classified Space Codebook. Third, the Classified Space Codebook (CSC) is constructed through the Sample Vector Formation Method (CSVQ1, 2) and the Hyper-Lattice Formation Method (CSVQ3). In the numerical experiment, we use 12th-order mel-cepstrum feature vectors of 10 speakers and compare the proposed method with the existing method, changing the codebook size from 16 to 128 for each Classified Space Codebook. The recognition rate of the proposed method is 100% for CSVQ1, 2, which equals the recognition rate of the existing method. Therefore, the proposed CSVQ design method is a new alternative that reduces computational complexity while maintaining the recognition rate, and CSVQ with CSC can be applied to general-purpose recognition.