• 제목/요약/키워드: and clustering

검색결과 5,610건 처리시간 0.036초

Refinement of Document Clustering by Using NMF

  • Shinnou, Hiroyuki;Sasaki, Minoru
    • 한국언어정보학회:학술대회논문집
    • /
    • 한국언어정보학회 2007년도 정기학술대회
    • /
    • pp.430-439
    • /
    • 2007
  • In this paper, we use non-negative matrix factorization (NMF) to refine the document clustering results. NMF is a dimensional reduction method and effective for document clustering, because a term-document matrix is high-dimensional and sparse. The initial matrix of the NMF algorithm is regarded as a clustering result, therefore we can use NMF as a refinement method. First we perform min-max cut (Mcut), which is a powerful spectral clustering method, and then refine the result via NMF. Finally we should obtain an accurate clustering result. However, NMF often fails to improve the given clustering result. To overcome this problem, we use the Mcut object function to stop the iteration of NMF.

  • PDF

Web Image Clustering with Text Features and Measuring its Efficiency

  • Cho, Soo-Sun
    • 한국멀티미디어학회논문지
    • /
    • 제10권6호
    • /
    • pp.699-706
    • /
    • 2007
  • This article is an approach to improving the clustering of Web images by using high-level semantic features from text information relevant to Web images as well as low-level visual features of image itself. These high-level text features can be obtained from image URLs and file names, page titles, hyperlinks, and surrounding text. As a clustering algorithm, a self-organizing map (SOM) proposed by Kohonen is used. To evaluate the clustering efficiencies of SOMs, we propose a simple but effective measure indicating the accumulativeness of same class images and the perplexities of class distributions. Our approach is to advance the existing measures through defining and using new measures accumulativeness on the most superior clustering node and concentricity to evaluate clustering efficiencies of SOMs. The experimental results show that the high-level text features are more useful in SOM-based Web image clustering.

  • PDF

효율적인 개념 클러스터링 기법 (An Efficient Conceptual Clustering Scheme)

  • 양기철
    • 한국엔터테인먼트산업학회논문지
    • /
    • 제14권4호
    • /
    • pp.349-354
    • /
    • 2020
  • 본 논문에서는 개체를 자유롭게 설명하고 효율적으로 클러스터링을 수행 할 수 있는 개념 그래프 기반의 새로운 클러스터링 체계 Clustering scheme Based on Conceptual graphs(CBC)를 제안한다. 개념적 클러스터링은 기계 학습 기술 중 하나이다. 개념 클러스터링에서 개체 간의 유사성은 개체의 의미나 환경을 고려하지 않고 유사성을 결정하는 일반적인 클러스터링 체계와 달리 개념 구성원의 자격에 따라 결정된다. 이 논문에서는 다양한 개체를 개념 그래프로 자유롭게 설명하여 효율적인 개념 클러스터링을 수행 할 수 있는 새로운 개념 클러스터링 체계인 CBC를 소개한다.

시퀀스 요소 기반의 유사도를 이용한 시퀀스 데이터 클러스터링 (Mining Clusters of Sequence Data using Sequence Element-based Similarity Measure)

  • 오승준;김재련
    • 한국지능정보시스템학회:학술대회논문집
    • /
    • 한국지능정보시스템학회 2004년도 추계학술대회
    • /
    • pp.221-229
    • /
    • 2004
  • Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web-logs. Such datasets consist of sequence data that have an inherent sequential nature. However, only a few of the existing clustering algorithms consider sequentiality. This study presents a method for clustering such sequence datasets. The similarity between sequences must be decided before clustering the sequences. This study proposes a new similarity measure to compute the similarity between two sequences using a sequence element. Two clustering algorithms using the proposed similarity measure are proposed: a hierarchical clustering algorithm and a scalable clustering algorithm that uses sampling and a k-nearest neighbor method. Using a splice dataset and synthetic datasets, we show that the quality of clusters generated by our proposed clustering algorithms is better than that of clusters produced by traditional clustering algorithms.

  • PDF

Twostep Clustering of Environmental Indicator Survey Data

  • Park, Hee-Chang
    • 한국데이터정보과학회:학술대회논문집
    • /
    • 한국데이터정보과학회 2005년도 추계학술대회
    • /
    • pp.59-69
    • /
    • 2005
  • Data mining technique is used to find hidden knowledge by massive data, unexpectedly pattern, relation to new rule. The methods of data mining are decision tree, association rules, clustering, neural network and so on. Clustering is the process of grouping the data into clusters so that objects within a cluster have high similarity in comparison to one another. It has been widely used in many applications, such that pattern analysis or recognition, data analysis, image processing, market research on off-line or on-line and so on. We analyze Gyeongnam social indicator survey data by 2001 using twostep clustering technique for environment information. The twostep clustering is classified as a partitional clustering method. We can apply these twostep clustering outputs to environmental preservation and improvement.

  • PDF

Zone-Based Self-Organized Clustering with Byzantine Agreement in MANET

  • Sung, Soon-Hwa
    • Journal of Communications and Networks
    • /
    • 제10권2호
    • /
    • pp.221-227
    • /
    • 2008
  • The proposed zone-based self-organized clustering broadcasts neighbor information to only a zone with the same ID. Besides, the zone-based self-organized clustering with unique IDs can communicate securely even if the state transition of nodes in zone-based self-organized clustering is threatened by corrupted nodes. For this security, the Byzantine agreement protocol with proactive asynchronous verifiable secret sharing (AVSS) is considered. As a result of simulation, an efficiency and a security of the proposed clustering are better than those of a traditional clustering. Therefore, this paper describes a new and extended self-organized clustering that securely seeks to minimize the interference in mobile ad hoc networks (MANETs).

군집의 효율향상을 위한 휴리스틱 알고리즘 (Heuristic algorithm to raise efficiency in clustering)

  • 이석환;박승헌
    • 대한안전경영과학회지
    • /
    • 제11권3호
    • /
    • pp.157-166
    • /
    • 2009
  • In this study, we developed a heuristic algorithm to get better efficiency of clustering than conventional algorithms. Conventional clustering algorithm had lower efficiency of clustering as there were no solid method for selecting initial center of cluster and as they had difficulty in search solution for clustering. EMC(Expanded Moving Center) heuristic algorithm was suggested to clear the problem of low efficiency in clustering. We developed algorithm to select initial center of cluster and search solution systematically in clustering. Experiments of clustering are performed to evaluate performance of EMC heuristic algorithm. Squared-error of EMC heuristic algorithm showed better performance for real case study and improved greatly with increase of cluster number than the other ones.

Clustering Routing Algorithms In Wireless Sensor Networks: An Overview

  • Liu, Xuxun;Shi, Jinglun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제6권7호
    • /
    • pp.1735-1755
    • /
    • 2012
  • Wireless sensor networks (WSNs) are becoming increasingly attractive for a variety of applications and have become a hot research area. Routing is a key technology in WSNs and can be coarsely divided into two categories: flat routing and hierarchical routing. In a flat topology, all nodes perform the same task and have the same functionality in the network. In contrast, nodes in a hierarchical topology perform different tasks in WSNs and are typically organized into lots of clusters according to specific requirements or metrics. Owing to a variety of advantages, clustering routing protocols are becoming an active branch of routing technology in WSNs. In this paper, we present an overview on clustering routing algorithms for WSNs with focus on differentiating them according to diverse cluster shapes. We outline the main advantages of clustering and discuss the classification of clustering routing protocols in WSNs. In particular, we systematically analyze the typical clustering routing protocols in WSNs and compare the different approaches based on various metrics. Finally, we conclude the paper with some open questions.

Automatic Switching of Clustering Methods based on Fuzzy Inference in Bibliographic Big Data Retrieval System

  • Zolkepli, Maslina;Dong, Fangyan;Hirota, Kaoru
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • 제14권4호
    • /
    • pp.256-267
    • /
    • 2014
  • An automatic switch among ensembles of clustering algorithms is proposed as a part of the bibliographic big data retrieval system by utilizing a fuzzy inference engine as a decision support tool to select the fastest performing clustering algorithm between fuzzy C-means (FCM) clustering, Newman-Girvan clustering, and the combination of both. It aims to realize the best clustering performance with the reduction of computational complexity from O($n^3$) to O(n). The automatic switch is developed by using fuzzy logic controller written in Java and accepts 3 inputs from each clustering result, i.e., number of clusters, number of vertices, and time taken to complete the clustering process. The experimental results on PC (Intel Core i5-3210M at 2.50 GHz) demonstrates that the combination of both clustering algorithms is selected as the best performing algorithm in 20 out of 27 cases with the highest percentage of 83.99%, completed in 161 seconds. The self-adapted FCM is selected as the best performing algorithm in 4 cases and the Newman-Girvan is selected in 3 cases.The automatic switch is to be incorporated into the bibliographic big data retrieval system that focuses on visualization of fuzzy relationship using hybrid approach combining FCM and Newman-Girvan algorithm, and is planning to be released to the public through the Internet.

소비자 선호 이슈 및 R&D 관점에서의 다차원 이슈 클러스터링 (A Multi-Dimensional Issue Clustering from the Perspective Consumers' Interests and R&D)

  • 현윤진;김남규;조윤호
    • 한국IT서비스학회지
    • /
    • 제14권1호
    • /
    • pp.237-249
    • /
    • 2015
  • The volume of unstructured text data generated by various social media has been increasing rapidly; therefore, use of text mining to support decision making has also been increasing. Especially, issue Clustering-determining a new relation with various issues through clustering-has gained attention from many researchers. However, traditional issue clustering methods can only be performed based on the co-occurrence frequency of issue keywords in many documents. Therefore, an association between issues that have a low co-occurrence frequency cannot be discovered using traditional issue clustering methods, even if those issues are strongly related in other perspectives. Therefore, issue clustering that fits each of criteria needs to be performed by the perspective of analysis and the purpose of use. In this study, a multi-dimensional issue clustering is proposed to overcome the limitation of traditional issue clustering. We assert, specifically in this study, that issue clustering should be performed for a particular purpose. We analyze the results of applying our methodology to two specific perspectives on issue clustering, (i) consumers' interests, and (ii) related R&D terms.