• Title/Abstract/Keywords: Feature clustering


Hardware Accelerated Design on Bag of Words Classification Algorithm

  • Lee, Chang-yong;Lee, Ji-yong;Lee, Yong-hwan
    • Journal of Platform Technology / Vol. 6, No. 4 / pp. 26-33 / 2018
  • This paper proposes an image retrieval algorithm for real-time processing and implements it in hardware. The method builds on BoWs (Bag of Words) classification and searches images using bit streams. The algorithm was verified with K-fold cross-validation on 49 test images: seven classes of seven images each. Both accuracy and speed were measured. Classification accuracy was 86.2% for the BoWs algorithm and 83.7% for the proposed hardware-accelerated software implementation, so BoWs was 2.5 percentage points higher. Image retrieval took 7.89 s with BoWs and 1.55 s with the proposed algorithm, making the proposed algorithm 5.09 times faster. The algorithm is divided into a software part and a hardware part. The software part is written in C: the Scale Invariant Feature Transform (SIFT) algorithm extracts feature points that are invariant to scale and rotation, and bit streams are generated from the extracted feature points. In the hardware part, the proposed image retrieval algorithm is written in Verilog HDL and designed and verified with an FPGA and Design Compiler. The generated bit streams are stored, the clustering step is performed, and a search-image database or an input-image database is generated and matched. When searching by database matching, in which each object is represented in the database, the proposed algorithm can improve user convenience and satisfaction in terms of speed.
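The speed and accuracy comparison quoted in this abstract can be reproduced with simple arithmetic. The sketch below only re-derives the two headline numbers from the figures given above; the function names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract (not re-measured here).
BOW_ACCURACY_PCT = 86.2       # BoWs baseline accuracy
PROPOSED_ACCURACY_PCT = 83.7  # proposed algorithm accuracy
BOW_SECONDS = 7.89            # BoWs retrieval time
PROPOSED_SECONDS = 1.55       # proposed algorithm retrieval time


def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new method is than the baseline."""
    return baseline_seconds / new_seconds


def accuracy_gap(baseline_pct: float, new_pct: float) -> float:
    """Return the accuracy difference in percentage points."""
    return baseline_pct - new_pct


if __name__ == "__main__":
    print(round(speedup(BOW_SECONDS, PROPOSED_SECONDS), 2))
    print(round(accuracy_gap(BOW_ACCURACY_PCT, PROPOSED_ACCURACY_PCT), 1))
```

The accuracy gap is a difference in percentage points, not a relative change, which is why it is computed by subtraction.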

A Comparison of Superpixel Characteristics based on SLIC(Simple Linear Iterative Clustering) for Color Feature Spaces (칼라특징공간별 SLIC기반 슈퍼픽셀의 특성비교)

  • Lee, Jeong Hwan
    • Journal of Korea Society of Digital Industry and Information Management / Vol. 10, No. 4 / pp. 151-160 / 2014
  • This paper compares the characteristics of superpixels generated by SLIC (simple linear iterative clustering) in several color feature spaces. Computer vision applications have come to rely increasingly on superpixels in recent years. Superpixel algorithms group pixels into perceptually meaningful atomic regions, which can replace the rigid structure of the pixel grid. A superpixel consists of pixels with similar features such as luminance, color, and texture, so superpixels are more efficient than pixels for large-scale image processing. Superpixel characteristics are generally described by uniformity, boundary precision and recall, and compactness. However, previous methods generate superpixels only in one particular color space, and superpixel characteristics themselves have been little studied. We therefore examine the characteristics of superpixels produced by SLIC, a popular method, in the Lab, Luv, LCH, HSV, YIQ, and RGB color feature spaces. Uniformity, compactness, and boundary precision and recall are measured for the comparison. The computer simulations use the Berkeley image database (BSD300), and the experimental results show that the Lab color space is superior to the others.
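For context, SLIC clusters pixels with a distance that mixes color difference and spatial difference. The sketch below shows that combined distance as it is usually stated for SLIC; the abstract does not give the formula, so the function names, the tuple layout (three color channels followed by x, y) and the compactness value m are assumptions made for illustration.

```python
from math import sqrt


def grid_interval(num_pixels: int, num_superpixels: int) -> float:
    """Approximate spacing S between initial cluster centers."""
    return sqrt(num_pixels / num_superpixels)


def slic_distance(pixel, center, S: float, m: float = 10.0) -> float:
    """Combined SLIC distance between a pixel and a cluster center.

    pixel and center are (c1, c2, c3, x, y) tuples, where c1..c3 are the
    channels of whichever color space is being compared (Lab, Luv, ...).
    m is the compactness weight (assumed value; larger favors compact shapes).
    """
    d_color = sqrt(sum((p - c) ** 2 for p, c in zip(pixel[:3], center[:3])))
    d_space = sqrt(sum((p - c) ** 2 for p, c in zip(pixel[3:], center[3:])))
    return sqrt(d_color ** 2 + (d_space / S) ** 2 * m ** 2)


if __name__ == "__main__":
    S = grid_interval(num_pixels=10_000, num_superpixels=100)
    print(slic_distance((50, 0, 0, 10, 10), (50, 0, 0, 13, 14), S))
```

Because only the color term depends on the color space, swapping Lab for Luv, HSV or RGB changes `d_color` while the spatial term stays the same, which is what makes a per-color-space comparison meaningful.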

Cluster Feature Selection using Entropy Weighting and SVD (엔트로피 가중치 및 SVD를 이용한 군집 특징 선택)

  • Lee, Young-Seok;Lee, Soo-Won
    • Journal of KIISE: Software and Applications / Vol. 29, No. 4 / pp. 248-257 / 2002
  • Clustering is a method for grouping objects with similar properties into the same cluster. SVD (Singular Value Decomposition) is known as an efficient preprocessing method for clustering because it reduces dimensionality and eliminates noise in high-dimensional, sparse data sets such as e-commerce data. However, it is hard to evaluate the worth of the original attributes because information is lost when the data set is converted by SVD. This research proposes a cluster feature selection method, called ENTROPY-SVD, that finds the important attributes for each cluster based on entropy weighting and SVD. SVD exploits the latent structure in the association of attributes with similar objects, and entropy weighting finds the highly dense attributes for each cluster. This paper also proposes a model-based collaborative filtering recommendation system built on ENTROPY-SVD, called CFS-CF, and evaluates its efficiency and utilization.

Improved FCM Algorithm using Entropy-based Weight and Intercluster (엔트로피 기반의 가중치와 분포크기를 이용한 향상된 FCM 알고리즘)

  • Kwak Hyun-Wook;Oh Jun-Taek;Sohn Young-Ho;Kim Wook-Hyun
    • Journal of the Institute of Electronics Engineers of Korea SP / Vol. 43, No. 4 / pp. 1-8 / 2006
  • This paper proposes an improved FCM (Fuzzy C-means) algorithm for gray images that uses an intercluster term and an entropy-based weight. Fuzzy clustering methods have been used extensively in image segmentation because they extract feature information of regions, and most of them are based on the FCM algorithm. However, the FCM algorithm is sensitive to noise because it does not include spatial information. In addition, it cannot correctly classify pixels according to the feature-based distributions of the clusters. To solve these problems, we applied a weight and an intercluster term to the traditional FCM algorithm. The weight is obtained from entropy information based on the number of neighboring pixels belonging to each cluster, and the membership of each pixel is assigned using information that takes the feature-based intercluster term into account. Experiments confirmed that the proposed method is more tolerant to noise than, and superior to, existing methods.
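The paper modifies the traditional FCM algorithm, and the abstract does not specify the entropy-based weight or the intercluster term. The following is therefore only a minimal sketch of the standard FCM baseline on gray levels, assuming a fuzzifier of m = 2; all function names are hypothetical.

```python
def fcm_memberships(values, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update for scalar data (e.g. gray levels).

    Returns one row per value; each row holds the membership to every center.
    eps avoids division by zero when a value coincides with a center.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in values:
        dists = [abs(x - c) + eps for c in centers]
        row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
               for d_i in dists]
        memberships.append(row)
    return memberships


def fcm_centers(values, memberships, m=2.0):
    """Standard FCM center update: membership-weighted mean of the values."""
    centers = []
    for i in range(len(memberships[0])):
        weights = [row[i] ** m for row in memberships]
        centers.append(sum(w * x for w, x in zip(weights, values)) / sum(weights))
    return centers


def fcm(values, centers, m=2.0, iterations=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iterations):
        u = fcm_memberships(values, centers, m)
        centers = fcm_centers(values, u, m)
    return centers, fcm_memberships(values, centers, m)


if __name__ == "__main__":
    gray = [10, 12, 11, 200, 205, 198]
    final_centers, u = fcm(gray, centers=[0.0, 255.0])
    print(final_centers)
```

Each membership depends only on the pixel's own gray level, with no reference to its neighbors. That is the missing spatial information the abstract identifies as the cause of noise sensitivity.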

Health Risk Management using Feature Extraction and Cluster Analysis considering Time Flow (시간흐름을 고려한 특징 추출과 군집 분석을 이용한 헬스 리스크 관리)

  • Kang, Ji-Soo;Chung, Kyungyong;Jung, Hoill
    • Journal of the Korea Convergence Society / Vol. 12, No. 1 / pp. 99-104 / 2021
  • This paper proposes a health risk management method that uses feature extraction and cluster analysis and takes the passage of time into account. The method proceeds in three steps. The first step is pre-processing and feature extraction: the user's lifelog is collected with a wearable device; incomplete data, errors, noise, and contradictory data are removed; and missing values are processed. For feature extraction, important variables are selected through principal component analysis, and data with similar relationships are classified using the correlation coefficient and covariance. In the second step, the features extracted from the lifelog are analyzed by dynamic clustering with the K-means algorithm, again considering the passage of time. New data are clustered with a similarity distance measure based on the increment of the sum of squared errors. The third step extracts information about the clusters over time. With a health decision-making system built on these feature clusters, risks can be managed through factors such as physical characteristics, lifestyle habits, disease status, the risk of health care events, and predictability. The performance evaluation compares the proposed method with fuzzy and kernel-based clustering in terms of Precision, Recall, and F-measure, and the proposed method performs well. The proposed method therefore makes it possible to accurately predict and appropriately manage a user's potential health risk by using the user's similarity to patients.
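The abstract mentions clustering new data by the increment of the sum of squared errors (SSE) but gives no formula. One common reading, sketched below as an assumption rather than the authors' method, assigns a new point to the cluster whose SSE would grow the least. Adding a point x to a cluster of n points with centroid c raises the SSE by n / (n + 1) * ||x - c||^2. The data layout and function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def sse_increment(point, centroid, size):
    """Increase in a cluster's SSE if `point` joins it.

    `size` is the number of points already in the cluster.
    """
    return size / (size + 1) * squared_distance(point, centroid)


def assign_new_point(point, clusters):
    """Assign `point` to the cluster with the smallest SSE increment.

    `clusters` is a list of dicts with keys "centroid" and "size"
    (an illustrative layout). The chosen cluster is updated in place
    and its index is returned.
    """
    best = min(range(len(clusters)),
               key=lambda i: sse_increment(point, clusters[i]["centroid"],
                                           clusters[i]["size"]))
    c = clusters[best]
    n = c["size"]
    # Running-mean update of the centroid.
    c["centroid"] = tuple((n * ci + pi) / (n + 1)
                          for ci, pi in zip(c["centroid"], point))
    c["size"] = n + 1
    return best


if __name__ == "__main__":
    clusters = [{"centroid": (1.0, 0.0), "size": 2},
                {"centroid": (10.0, 10.0), "size": 5}]
    print(assign_new_point((1.0, 3.0), clusters), clusters[0])
```

Unlike a plain nearest-centroid rule, this criterion is size-aware: for equal distances it slightly favors smaller clusters, since n / (n + 1) grows with n.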

ModifiedFAST: A New Optimal Feature Subset Selection Algorithm

  • Nagpal, Arpita;Gaur, Deepti
    • Journal of information and communication convergence engineering / Vol. 13, No. 2 / pp. 113-122 / 2015
  • Feature subset selection serves as a pre-processing step in learning algorithms. In this paper, we propose an efficient algorithm, ModifiedFAST, for feature subset selection. The algorithm is suitable for text datasets and uses the concept of information gain to remove irrelevant and redundant features. A new optimal value is found for the symmetric uncertainty threshold used to identify relevant features. The thresholds used by previous feature selection algorithms such as FAST, Relief, and CFS were not optimal. The threshold value is shown to greatly affect both the percentage of selected features and the classification accuracy. A new unified performance metric that combines accuracy with the number of selected features is proposed and applied in the algorithm. Experiments showed that, on most datasets, the proposed algorithm selected a lower percentage of features than existing algorithms. The effectiveness of the algorithm at the optimal threshold was statistically validated against other algorithms.
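Symmetric uncertainty (SU), which this abstract uses to identify relevant features, is information gain normalized by the entropies of both variables: SU(X, Y) = 2 * (H(X) - H(X|Y)) / (H(X) + H(Y)). The sketch below computes it for discrete features and filters by a threshold. The threshold value in the usage example is a placeholder, since the abstract does not state the optimal value the paper found, and the function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy H(X) of a discrete sequence, in bits."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """H(X | Y): entropy of x within each group of equal y, weighted by size."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    n = len(x)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) in [0, 1]; 0 means independent, 1 means fully predictive."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    information_gain = hx - conditional_entropy(x, y)
    return 2.0 * information_gain / (hx + hy)


def relevant_features(features, target, threshold):
    """Keep the names of features whose SU with the target exceeds threshold."""
    return [name for name, column in features.items()
            if symmetric_uncertainty(column, target) > threshold]


if __name__ == "__main__":
    target = [0, 1, 0, 1]
    features = {"a": [0, 1, 0, 1], "b": [0, 0, 1, 1]}
    # 0.5 is a placeholder threshold, not the value reported in the paper.
    print(relevant_features(features, target, threshold=0.5))
```

Raising the threshold keeps fewer features, which is why the abstract reports that the threshold strongly affects both the percentage of selected features and the classification accuracy.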

Music Classification Based On Emotion Utilizing Data Mining (데이터마이닝 기법을 이용한 감정 기반 음악 분류)

  • Jo, Wooyeon;Shon, Taeshik
    • Proceedings of the Korea Information Processing Society Conference / Korea Information Processing Society 2015 Spring Conference / pp. 941-944 / 2015
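  • Rapid advances in storage devices have spurred cloud services for individual users that could not previously be offered. Among these, music streaming and sharing services need a systematic classification scheme to accommodate diverse kinds of music. Existing classification schemes are set unilaterally by the opinion of the composer or uploader, so they are ill-suited to user-centered cloud services. To address this problem, this paper proposes a new classification scheme based on the emotion of love. Data mining techniques were applied for automatic classification. To facilitate mining, the audio music files (raw data) were cut into fixed-size segments and preprocessed through feature extraction. For feature selection, clustering was then used to pick out features with meaningful importance. Based on the selected features, an SVM (Support Vector Machine) was used to validate the feature importance and, at the same time, to perform the classification, yielding emotion-based classification results.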

Multiview Data Clustering by using Adaptive Spectral Co-clustering (적응형 분광 군집 방법을 이용한 다중 특징 데이터 군집화)

  • Son, Jeong-Woo;Jeon, Junekey;Lee, Sang-Yun;Kim, Sun-Joong
    • Journal of KIISE / Vol. 43, No. 6 / pp. 686-691 / 2016
  • This paper introduces adaptive spectral co-clustering, a spectral clustering method for multiview data, especially data with more than three views. Adaptive spectral co-clustering improves performance by sharing information across diverse views. A co-training approach is adopted to make this information sharing efficient. In the co-training step, a set of parameters is estimated to make all views in the data maximally independent, and information is then shared according to the estimated parameters. This co-training step shares information more efficiently than ordinary feature concatenation and than co-training methods that assume independence among views. Adaptive spectral co-clustering was evaluated on a synthetic dataset and a multilingual document dataset. The experimental results demonstrate its efficiency through the performance at every iteration and the similarity matrix generated with information sharing.

Usability Analysis of Structured Abstracts in Journal Articles for Document Clustering (문서 클러스터링을 위한 학술지 논문의 구조적 초록 활용성 연구)

  • Choi, Sang-Hee;Lee, Jae-Yun
    • Journal of the Korean Society for information Management / Vol. 29, No. 1 / pp. 331-349 / 2012
  • Structured abstracts have been regarded as an essential source of information for representing the topics of journal articles. This study offers an unconventional view of how to use structured abstracts by analyzing their subfields in depth. A structured abstract was segmented into four fields: purpose, design, findings, and values/implications. The fields were compared in a performance analysis of document clustering. As a result, the purpose statement of an abstract affected the performance of journal article clustering more than any other field. Furthermore, certain types of keywords were identified that should be excluded from document clustering to improve performance, especially with the within-group average clustering method. These keywords were more strongly related to a specific abstract field, such as research design, than to the topic of an article.

The Method of the Evaluation of Verbal Lexical-Semantic Network Using the Automatic Word Clustering System (단어클러스터링 시스템을 이용한 어휘의미망의 활용평가 방안)

  • Kim, Hae-Gyung;Song, Mi-Young
    • Korean Journal of Oriental Medicine / Vol. 12, No. 3 (Serial No. 18) / pp. 1-15 / 2006
  • In recent years, there has been much interest in lexical-semantic networks. However, it is very difficult to evaluate their effectiveness and correctness and to devise methods for applying them to various problem domains. To offer fundamental ideas about how to evaluate and utilize lexical-semantic networks, we developed two automatic word clustering systems, called system A and system B. Both systems were trained on 68,455,856 words. We compared the clustering results of system A with those of system B, which is extended by the lexical-semantic network. System B is extended by reconstructing the feature vectors using the elements of the lexical-semantic network of 3,656 '-ha' verbs. The target data is the 'multilingual Word Net-CoreNet'. Comparing the accuracy of the two systems, we found that system B achieved 46.6%, which is better than the 45.3% of system A.
