• Title/Summary/Keyword: over-clustering

Clustering Meta Information of K-Pop Girl Groups Using Term Frequency-inverse Document Frequency Vectorization (단어-역문서 빈도 벡터화를 통한 한국 걸그룹의 음반 메타 정보 군집화)

  • JoonSeo Hyeon;JaeHyuk Cho
    • Journal of Platform Technology / v.11 no.3 / pp.12-23 / 2023
  • In the 2020s, the K-Pop market has been dominated by girl groups over boy groups and by the fourth generation over the third. This paper presents methods and results for lyric clustering to investigate whether a generational shift among girl groups has begun. We collected meta-information for 1,469 songs by 47 groups released from 2013 to 2022, divided it into lyric information and non-lyric meta-information, and quantified each separately. The lyric information was preprocessed by applying term frequency-inverse document frequency (TF-IDF) vectorization, following previous studies, and then keeping only the top vector values. The non-lyric meta-information was preprocessed with One-Hot Encoding to reduce the bias of relying on lyrics alone and to improve the clustering results. On the preprocessed data, Spherical K-Means scored 129% higher on the Silhouette Score and 45% higher on the Calinski-Harabasz Score than Hierarchical Clustering. This paper is expected to contribute to research on the development of Korean popular songs and on the analysis and clustering of girl group lyrics.
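The lyric preprocessing step described in this abstract (TF-IDF weighting, then keeping only the top values) might look roughly like the sketch below. It assumes the plain textbook weighting tf × log(N/df), which may differ from the variant the paper uses; the toy token lists and the helper names `tfidf` and `top_k` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Weight = (count / doc length) * log(N / df),
    an assumed textbook form of TF-IDF.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def top_k(vector, k):
    """Keep only the k largest weights of one TF-IDF vector."""
    best = sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return dict(best)


# Hypothetical toy "lyrics", already tokenized.
docs = [
    ["love", "night", "love", "dance"],
    ["night", "dream", "dream", "star"],
    ["dance", "dance", "star", "love"],
]
vectors = [top_k(v, 2) for v in tfidf(docs)]
for v in vectors:
    print(v)
```

A term that occurs in every document gets weight log(1) = 0 under this weighting, so keeping only the top values mostly discards words shared across all songs.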

A Novel Cluster Validation Index (새로운 클러스터 평가 지표)

  • Seo Suk. T.;Son Seo. H.;Lee In. G.;Jeong Hye. C.;Kwon Soon. H.
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2005.11a / pp.171-174 / 2005
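  • Existing cluster validation indices tend to decrease monotonically as the number of clusters grows. A new cluster validation index that addresses this drawback was recently proposed by one of the authors of this paper, but it suffers from over-clustering. In this paper, we propose a new cluster validation index that prevents both the monotonic decrease of the index value and over-clustering, and we demonstrate its validity through several examples.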

Semantic Segmentation using Iterative Over-Segmentation and Minimum Entropy Clustering with Automatic Window Size (자동 윈도우 크기 결정 기법을 적용한 Minimum Entropy Clustering과 Iterative Over-Segmentation 기반 Semantic Segmentation)

  • Choi, Hyunguk;Song, Hyeon-Seung;Sohn, Hong-Gyoo;Jeon, Moongu
    • Proceedings of the Korea Information Processing Society Conference / 2014.04a / pp.826-829 / 2014
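  • In this study, to segment and recognize the attributes of individual regions in outdoor terrain images, aerial images, and similar imagery, we propose a technique that fuses two methods: clustering based on minimum entropy clustering and clustering by iteratively applying over-segmentation. After extracting a representative region for each cluster with these methods, we obtain matching points through texton histogram matching, using a texton dictionary built from training data and the texton model of each training sample, and determine the category of each region from those matching points. We verify the proposed technique on a terrain dataset that we built from general outdoor images collected from the internet. In the experiments, regions in the images were classified and recognized as soil, vegetation, or water terrain.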

Identification of Plastic Wastes by Using Fuzzy Radial Basis Function Neural Networks Classifier with Conditional Fuzzy C-Means Clustering

  • Roh, Seok-Beom;Oh, Sung-Kwun
    • Journal of Electrical Engineering and Technology / v.11 no.6 / pp.1872-1879 / 2016
  • Techniques for recycling and reusing plastics attract public attention, and this attention and demand drive improvements in recycling techniques. However, identification of black plastic waste still has a major problem: the spectrum obtained from near-infrared spectroscopy is unclear and contaminated by noise. To overcome this problem, we apply Raman spectroscopy to extract a clear spectrum of the plastic material. In addition, to improve the classification ability of fuzzy Radial Basis Function Neural Networks, we apply a supervised-learning-based clustering method instead of an unsupervised one. The conditional fuzzy C-Means clustering method, a supervised-learning-based clustering algorithm, is used to determine the locations of the radial basis functions. Conditional fuzzy C-Means clustering analyzes the data distribution over the input space under the supervision of auxiliary information, which is defined using a k-Nearest Neighbor approach.

EXTENDED ONLINE DIVISIVE AGGLOMERATIVE CLUSTERING

  • Musa, Ibrahim Musa Ishag;Lee, Dong-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2008.10a / pp.406-409 / 2008
  • Clustering data streams is important in many applications, such as sensor networks. Existing hierarchical methods follow a semi-fuzzy clustering that yields duplicate clusters. To solve this problem, we propose an extended online divisive agglomerative clustering method for data streams. It builds a tree-like, top-down hierarchy of clusters that evolves with the data streams, using a geometric time frame for snapshots. It enhances Online Divisive Agglomerative Clustering (ODAC) with a pruning strategy that avoids duplicate clusters. Its main feature is that update time and memory use are independent of the number of examples in the data streams. It can be used for clustering sensor data, for network monitoring, and for web click streams.

The Application of an HMM-based Clustering Method to Speaker Independent Word Recognition (HMM을 기본으로한 집단화 방법의 불특정화자 단어 인식에 응용)

  • Lim, H.;Park, S.-Y.;Park, M.-W.
    • The Journal of the Acoustical Society of Korea / v.14 no.5 / pp.5-10 / 1995
  • In this paper we present a clustering procedure based on HMMs that yields multiple statistical models able to absorb the variation among speakers who say words in different ways. The HMM-clustered models obtained with this technique are applied to speaker-independent isolated word recognition. The HMM clustering method splits off from the training set all observation sequences whose likelihood scores fall below a threshold and creates a new model from the observation sequences in the new cluster. Clustering is iterated by assigning each observation sequence to the cluster whose model gives the maximum likelihood score. If any cluster has changed since the previous iteration, the model for that cluster is re-estimated with the Baum-Welch procedure. This method is therefore more efficient than the conventional template-based clustering technique, because it integrates the clustering procedure with parameter estimation. Experimental results show that the HMM-based clustering procedure improves performance by 1.43% over the conventional template-based clustering method and by 2.08% over the single-HMM method in recognition of isolated Korean digits.

Clustering In Tied Mixture HMM Using Homogeneous Centroid Neural Network (Homogeneous Centroid Neural Network에 의한 Tied Mixture HMM의 군집화)

  • Park Dong-Chul;Kim Woo-Sung
    • The Journal of Korean Institute of Communications and Information Sciences / v.31 no.9C / pp.853-858 / 2006
  • TMHMM (Tied Mixture Hidden Markov Model) is an important approach for reducing the number of free parameters in speech recognition. However, the model suffers from degraded recognition accuracy caused by clustering error in its GPDFs (Gaussian Probability Density Functions). This paper proposes a clustering algorithm, called HCNN (Homogeneous Centroid Neural Network), to cluster acoustic feature vectors in TMHMM. The HCNN uses a heterogeneous distance measure to allocate more code vectors to heterogeneous areas, where the probability densities of different states overlap. When applied to isolated-word recognition of Korean digits, the HCNN reduces the error rate by 9.39% compared with CNN clustering and by 14.63% compared with traditional K-means clustering.

Data Clustering Method Using a Modified Gaussian Kernel Metric and Kernel PCA

  • Lee, Hansung;Yoo, Jang-Hee;Park, Daihee
    • ETRI Journal / v.36 no.3 / pp.333-342 / 2014
  • Most hyper-ellipsoidal clustering (HEC) approaches use the Mahalanobis distance as the distance metric. It has been proven that HEC cannot be realized under this condition, since the cost function of partitional clustering is then a constant. We demonstrate that HEC with a modified Gaussian kernel metric can be interpreted as a problem of finding condensed ellipsoidal clusters (with respect to the volumes and densities of the clusters), and we propose a practical HEC algorithm that efficiently handles ellipsoidal clusters of different sizes and densities. We then refine the HEC algorithm by using ellipsoids defined on the kernel feature space to deal with more complex-shaped clusters. The proposed methods significantly improve the clustering results over the K-means algorithm, the fuzzy C-means algorithm, the GMM-EM algorithm, and an HEC algorithm based on minimum-volume ellipsoids using the Mahalanobis distance.

Balancing Problem of Cross-over U-shaped Assembly Line Using Bi-directional Clustering Algorithm (양방향 군집 알고리즘을 적용한 교차혼합 U자형 조립라인 균형문제)

  • Lee, Sang-Un
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.2 / pp.89-96 / 2022
  • This paper proposes a heuristic algorithm for the single-model cross-over assembly line balancing problem, which is NP-hard. Assembly line balancing problems are mainly tackled with metaheuristic methods, and no algorithm has been proposed that finds the exact solution in polynomial time, which makes the problem very difficult to handle in practice. The proposed bi-directional clustering algorithm computes the minimum number of workers m* = ⌈W/c⌉ and the goal cycle time c* = ⌈W/m*⌉ from the given total assembly time W and cycle time c. Each workstation i = 1, 2, …, m* is then assigned a time T_i = c* ± α ≤ c using the bi-directional clustering method. On 7 experimental data sets, the bi-directional clustering algorithm achieves the same performance as other methods.
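The two ceiling formulas in this abstract can be checked with a few lines of arithmetic. The sketch below covers only m* and c*; it does not attempt the bi-directional clustering step itself, which the abstract does not specify. The function name `line_targets` and the sample values W = 42 and c = 10 are hypothetical.

```python
import math
from typing import Tuple


def line_targets(total_time: float, cycle_time: float) -> Tuple[int, int]:
    """Return (m_star, c_star) from total assembly time W and cycle time c."""
    m_star = math.ceil(total_time / cycle_time)  # minimum number of workers, ceil(W / c)
    c_star = math.ceil(total_time / m_star)      # goal cycle time, ceil(W / m*)
    return m_star, c_star


# Hypothetical instance: W = 42 time units of work, cycle time c = 10.
m_star, c_star = line_targets(total_time=42, cycle_time=10)
print(m_star, c_star)  # 5 workers, goal cycle time 9
```

Here 42/10 = 4.2 rounds up to 5 workers, and 42/5 = 8.4 rounds up to a goal cycle time of 9, which stays below the given cycle time of 10.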

A Study on Distributed Self-Reliance Wireless Sensing Mechanism for Supporting Data Transmission over Heterogeneous Wireless Networks

  • Caytiles, Ronnie D.;Park, Byungjoo
    • International Journal of Internet, Broadcasting and Communication / v.12 no.3 / pp.32-38 / 2020
  • The deployment of geographically distributed wireless sensors has greatly improved the capability to monitor structural health in social-overhead capital (SOC) public infrastructure. This paper deals with a distributed mobility management (DMM) approach for deploying wireless sensing devices in a structural health monitoring (SHM) system. It then analyzes a wireless sensing mechanism that uses a low-energy adaptive clustering hierarchy (LEACH)-based clustering algorithm for smart sensors to support seamless transmission of structural health information, which is essential for guaranteeing public safety. Clustering the smart sensors enables real-time monitoring of structural health and provides a filtering algorithm that boosts the transmission of critical information over heterogeneous wireless and mobile networks.
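For readers unfamiliar with LEACH, the sketch below shows the cluster-head election threshold from the original LEACH protocol, T(n) = P / (1 - P · (r mod 1/P)). The abstract only says the mechanism is LEACH-based, so the paper's actual variant may differ; the function name, the parameter values, and the eligibility flag are hypothetical.

```python
import random


def leach_threshold(p: float, round_number: int, eligible: bool) -> float:
    """Election threshold T(n) of the original LEACH protocol.

    p            -- desired fraction of cluster heads per round (assumed value)
    round_number -- current round r
    eligible     -- True if the node has not been a cluster head in the
                    current cycle of 1/p rounds
    """
    if not eligible:
        return 0.0
    period = round(1 / p)  # number of rounds in one election cycle
    return p / (1 - p * (round_number % period))


def becomes_cluster_head(p: float, round_number: int, eligible: bool) -> bool:
    """A node elects itself if a uniform random draw falls below T(n)."""
    return random.random() < leach_threshold(p, round_number, eligible)


# With p = 0.1 the threshold rises from 0.1 in round 0 to about 1.0 in round 9,
# so every node that is still eligible is elected by the end of the cycle.
for r in (0, 5, 9):
    print(r, leach_threshold(0.1, r, eligible=True))
```

The rising threshold is what rotates the cluster-head role across sensors and spreads the energy cost of that role.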