• Title/Summary/Keyword: clustering feature

Search Result 444, Processing Time 0.032 seconds

Reconstruction from Feature Points of Face through Fuzzy C-Means Clustering Algorithm with Gabor Wavelets (FCM 군집화 알고리즘에 의한 얼굴의 특징점에서 Gabor 웨이브렛을 이용한 복원)

  • 신영숙;이수용;이일병;정찬섭
    • Korean Journal of Cognitive Science
    • /
    • v.11 no.2
    • /
    • pp.53-58
    • /
    • 2000
  • This paper reconstructs local region of a facial expression image from extracted feature points of facial expression image using FCM(Fuzzy C-Meang) clustering algorithm with Gabor wavelets. The feature extraction in a face is two steps. In the first step, we accomplish the edge extraction of main components of face using average value of 2-D Gabor wavelets coefficient histogram of image and in the next step, extract final feature points from the extracted edge information using FCM clustering algorithm. This study presents that the principal components of facial expression images can be reconstructed with only a few feature points extracted from FCM clustering algorithm. It can also be applied to objects recognition as well as facial expressions recognition.

  • PDF

Classification Tree-Based Feature-Selective Clustering Analysis: Case of Credit Card Customer Segmentation (분류나무를 활용한 군집분석의 입력특성 선택: 신용카드 고객세분화 사례)

  • Yoon Hanseong
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.19 no.4
    • /
    • pp.1-11
    • /
    • 2023
  • Clustering analysis is used in various fields including customer segmentation and clustering methods such as k-means are actively applied in the credit card customer segmentation. In this paper, we summarized the input features selection method of k-means clustering for the case of the credit card customer segmentation problem, and evaluated its feasibility through the analysis results. By using the label values of k-means clustering results as target features of a decision tree classification, we composed a method for prioritizing input features using the information gain of the branch. It is not easy to determine effectiveness with the clustering effectiveness index, but in the case of the CH index, cluster effectiveness is improved evidently in the method presented in this paper compared to the case of randomly determining priorities. The suggested method can be used for effectiveness of actively used clustering analysis including k-means method.

A Study on the Musical Theme Clustering for Searching Note Sequences (음렬 탐색을 위한 주제소절 자동분류에 관한 연구)

  • 심지영;김태수
    • Journal of the Korean Society for information Management
    • /
    • v.19 no.3
    • /
    • pp.5-30
    • /
    • 2002
  • In this paper, classification feature is selected with focus of musical content, note sequences pattern, and measures similarity between note sequences followed by constructing clusters by similar note sequences, which is easier for users to search by showing the similar note sequences with the search result in the CBMR system. Experimental document was $\ulcorner$A Dictionary of Musical Themes$\lrcorner$, the index of theme bar focused on classical music and obtained kern-type file. Humdrum Toolkit version 1.0 was used as note sequences treat tool. The hierarchical clustering method is by stages focused on four-type similarity matrices by whether the note sequences segmentation or not and where the starting point is. For the measurement of the result, WACS standard is used in the case of being manual classification and in the case of the note sequences starling from any point in the note sequences, there is used common feature pattern distribution in the cluster obtained from the clustering result. According to the result, clustering with segmented feature unconnected with the starting point Is higher with distinct difference compared with clustering with non-segmented feature.

A Comparative Study of Feature Selection Methods for Korean Web Documents Clustering (한글 웹 문서 클러스터링 성능향상을 위한 자질선정 기법 비교 연구)

  • Kim Young-Gi
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.39 no.1
    • /
    • pp.45-58
    • /
    • 2005
  • This Paper is a comparative study of feature selection methods for Korean web documents clustering. First, we focused on how the term feature and the co-link of web documents affect clustering performance. We clustered web documents by native term feature, co-link and both, and compared the output results with the originally allocated category. And we selected term features for each category using $X^2$, Information Gain (IG), and Mutual Information (MI) from training documents, and applied these features to other experimental documents. In addition we suggested a new method named Max Feature Selection, which selects terms that have the maximum count for a category in each experimental document, and applied $X^2$ (or MI or IG) values to each term instead of term frequency of documents, and clustered them. In the results, $X^2$ shows a better performance than IG or MI, but the difference appears to be slight. But when we applied the Max Feature Selection Method, the clustering Performance improved notably. Max Feature Selection is a simple but effective means of feature space reduction and shows powerful performance for Korean web document clustering.

Use of Word Clustering to Improve Emotion Recognition from Short Text

  • Yuan, Shuai;Huang, Huan;Wu, Linjing
    • Journal of Computing Science and Engineering
    • /
    • v.10 no.4
    • /
    • pp.103-110
    • /
    • 2016
  • Emotion recognition is an important component of affective computing, and is significant in the implementation of natural and friendly human-computer interaction. An effective approach to recognizing emotion from text is based on a machine learning technique, which deals with emotion recognition as a classification problem. However, in emotion recognition, the texts involved are usually very short, leaving a very large, sparse feature space, which decreases the performance of emotion classification. This paper proposes to resolve the problem of feature sparseness, and largely improve the emotion recognition performance from short texts by doing the following: representing short texts with word cluster features, offering a novel word clustering algorithm, and using a new feature weighting scheme. Emotion classification experiments were performed with different features and weighting schemes on a publicly available dataset. The experimental results suggest that the word cluster features and the proposed weighting scheme can partly resolve problems with feature sparseness and emotion recognition performance.

Performance evaluation of principal component analysis for clustering problems

  • Kim, Jae-Hwan;Yang, Tae-Min;Kim, Jung-Tae
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.40 no.8
    • /
    • pp.726-732
    • /
    • 2016
  • Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Through the decades, many clustering techniques have been developed, including hierarchical and non-hierarchical algorithms. In gene profiling problems, because of the large number of genes and the complexity of biological networks, dimensionality reduction techniques are critical exploratory tools for clustering analysis of gene expression data. Recently, clustering analysis of applying dimensionality reduction techniques was also proposed. PCA (principal component analysis) is a popular methd of dimensionality reduction techniques for clustering problems. However, previous studies analyzed the performance of PCA for only full data sets. In this paper, to specifically and robustly evaluate the performance of PCA for clustering analysis, we exploit an improved FCBF (fast correlation-based filter) of feature selection methods for supervised clustering data sets, and employ two well-known clustering algorithms: k-means and k-medoids. Computational results from supervised data sets show that the performance of PCA is very poor for large-scale features.

Clustering based object feature matching for multi-camera system (멀티 카메라 연동을 위한 군집화 기반의 객체 특징 정합)

  • Kim, Hyun-Soo;Kim, Gyeong-Hwan
    • Proceedings of the IEEK Conference
    • /
    • 2008.06a
    • /
    • pp.915-916
    • /
    • 2008
  • We propose a clustering based object feature matching for identification of same object in multi-camera system. The method is focused on ease to system initialization and extension. Clustering is used to estimate parameters of Gaussian mixture models of objects. A similarity measure between models are determined by Kullback-Leibler divergence. This method can be applied to occlusion problem in tracking.

  • PDF

Speaker Identification Using GMM Based on Local Fuzzy PCA (국부 퍼지 클러스터링 PCA를 갖는 GMM을 이용한 화자 식별)

  • Lee, Ki-Yong
    • Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.159-166
    • /
    • 2003
  • To reduce the high dimensionality required for training of feature vectors in speaker identification, we propose an efficient GMM based on local PCA with Fuzzy clustering. The proposed method firstly partitions the data space into several disjoint clusters by fuzzy clustering, and then performs PCA using the fuzzy covariance matrix in each cluster. Finally, the GMM for speaker is obtained from the transformed feature vectors with reduced dimension in each cluster. Compared to the conventional GMM with diagonal covariance matrix, the proposed method needs less storage and shows faster result, under the same performance.

  • PDF

Microblog User Geolocation by Extracting Local Words Based on Word Clustering and Wrapper Feature Selection

  • Tian, Hechan;Liu, Fenlin;Luo, Xiangyang;Zhang, Fan;Qiao, Yaqiong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.10
    • /
    • pp.3972-3988
    • /
    • 2020
  • Existing methods always rely on statistical features to extract local words for microblog user geolocation. There are many non-local words in extracted words, which makes geolocation accuracy lower. Considering the statistical and semantic features of local words, this paper proposes a microblog user geolocation method by extracting local words based on word clustering and wrapper feature selection. First, ordinary words without positional indications are initially filtered based on statistical features. Second, a word clustering algorithm based on word vectors is proposed. The remaining semantically similar words are clustered together based on the distance of word vectors with semantic meanings. Next, a wrapper feature selection algorithm based on sequential backward subset search is proposed. The cluster subset with the best geolocation effect is selected. Words in selected cluster subset are extracted as local words. Finally, the Naive Bayes classifier is trained based on local words to geolocate the microblog user. The proposed method is validated based on two different types of microblog data - Twitter and Weibo. The results show that the proposed method outperforms existing two typical methods based on statistical features in terms of accuracy, precision, recall, and F1-score.

Detection of Moving Objects in Crowded Scenes using Trajectory Clustering via Conditional Random Fields Framework (Conditional Random Fields 구조에서 궤적군집화를 이용한 혼잡 영상의 이동 객체 검출)

  • Kim, Hyeong-Ki;Lee, Gwang-Gook;Kim, Whoi-Yul
    • Journal of Korea Multimedia Society
    • /
    • v.13 no.8
    • /
    • pp.1128-1141
    • /
    • 2010
  • This paper proposes a method of moving object detection in crowded scene using clustered trajectory. Unlike previous appearance based approaches, the proposed method employes motion information only to isolate moving objects. In the proposed method, feature points are extracted from input frames first and then feature tracking is followed to create feature trajectories. Based on an assumption that feature points originated from the same objects shows similar motion as the object moves, the proposed method detects moving objects by clustering trajectories of similar motions. For this purpose an energy function based on spatial proximity, motion coherence, and temporal continuity is defined to measure the similarity between two trajectories and the clustering is achieved by minimizing the energy function in CRFs (conditional random fields). Compared to previous methods, which are unable to separate falsely merged trajectories during the clustering process, the proposed method is able to rearrange the falsely merged trajectories during iteration because the clustering is solved my energy minimization in CRFs. Experiment results with three different crowded scenes show about 94% detection rate with 7% false alarm rate.