• Title/Summary/Keyword: hierarchical clustering

Multi-class Support Vector Machines Model Based Clustering for Hierarchical Document Categorization in Big Data Environment (빅 데이터 환경에서 계층적 문서 유형 분류를 위한 클러스터링 기반 다중 SVM 모델)

  • Kim, Young Soo;Lee, Byoung Yup
    • The Journal of the Korea Contents Association / v.17 no.11 / pp.600-608 / 2017
  • Data volumes have recently been growing exponentially with the rapid expansion of the Internet. Because users need only a portion of the available information, they bear a heavy workload in examining and discovering the content they need. Information retrieval must therefore provide hierarchical class information and an examination priority by evaluating the similarity between queries and documents. In this paper, we propose a clustering-based multi-class support vector machine (SVM) model for hierarchical document categorization that makes semantic search possible by considering word co-occurrence measures. Combining hierarchical document categorization with an SVM classifier gives high performance in the analytical classification of web documents, whose number increases exponentially as the document hierarchy expands. We expect that more information retrieval systems will adopt the proposed model and be able to provide an accurate and rapid information retrieval service.

A Design of Building a Meaningful Tag Cluster (의미 있는 태그 클러스터 구축을 위한 설계 방안)

  • Park, Byoung-Jae;Woo, Chong-Woo
    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.658-661 / 2008
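  • Tagging is one of the core technologies of Web 2.0 and provides a very flexible and dynamic classification scheme. However, this flexibility and dynamism come at a cost: relationships among tags, such as hierarchical structure or association, are lacking or absent. To compensate for this limitation, hierarchical clustering methods exist for forming hierarchical relationships, and collaborative filtering methods exist for forming associative relationships. Both methods provide relationships among tags, but each provides only one of the two, either association or hierarchy. In this paper, we design a tag clustering algorithm that supports navigation of the hierarchical structure as well as associative relationships during tag search. The proposed algorithm presents a method for computing tag similarity from users' tag sets, and it produces result data that can be visualized in a new form that differs from the existing visualization method (the tag cloud).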

A Scalable Clustering Method for Categorical Sequences (범주형 시퀀스들에 대한 확장성 있는 클러스터링 방법)

  • Oh, Seung-Joon;Kim, Jae-Yearn
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.2 / pp.136-141 / 2004
  • There has been enormous growth in the amount of commercial and scientific data, such as retail transactions, protein sequences, and web-logs. Such datasets consist of sequence data that have an inherent sequential nature. However, few clustering algorithms consider sequentiality. In this paper, we study how to cluster sequence datasets. We propose a new similarity measure to compute the similarity between two sequences. We also present an efficient method for determining the similarity measure and develop a clustering algorithm. Due to the high computational complexity of hierarchical clustering algorithms for clustering large datasets, a new clustering method is required. Therefore, we propose a new scalable clustering method using sampling and a k-nearest-neighbor method. Using a real dataset and a synthetic dataset, we show that the quality of clusters generated by our proposed approach is better than that of clusters produced by traditional algorithms.

An Energy-Efficient Data Aggregation using Hierarchical Filtering in Sensor Network (센서 네트워크에서 계층적 필터링을 이용한 에너지 효율적인 데이터 집계연산)

  • Kim, Jin-Su;Park, Chan-Heum;Kim, Chong-Gun;Kang, Byung-Wook
    • Journal of the Korea Society of Computer and Information / v.12 no.1 s.45 / pp.73-82 / 2007
  • This paper proposes a way to reduce the amount of data transmitted by each sensor and cluster head, in order to lengthen the lifetime of a sensor network that performs data aggregation for continuous queries. The most important factor in reducing a sensor's energy dissipation is reducing the number of messages transmitted. The proposed method combines clustering, in-network data aggregation, and hierarchical filtering. Hierarchical filtering divides the sensor network into two tiers for filtering: the first tier filters data transmitted from cluster members to the cluster head, and the second tier filters data transmitted from the cluster head to the base station. This method is more efficient and effective than previous work. We show through various experiments that our scheme significantly reduces network traffic and increases the network's lifetime compared with existing methods.
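
The two-tier filtering idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual filter conditions, so everything below is an assumption made for illustration: the `ClusterHead` class, the change-threshold rule in `should_send`, and the use of an average as the aggregate are all hypothetical, not the authors' method.

```python
from statistics import mean


def should_send(last_sent, value, threshold):
    """Assumed filter rule: transmit only if the value moved by more than threshold."""
    return last_sent is None or abs(value - last_sent) > threshold


class ClusterHead:
    """Hypothetical cluster head that applies a filter at each of the two tiers."""

    def __init__(self, member_threshold, head_threshold):
        self.member_threshold = member_threshold  # tier 1: member -> head
        self.head_threshold = head_threshold      # tier 2: head -> base station
        self.last_from_member = {}                # last value each member transmitted
        self.last_to_base = None                  # last aggregate sent to the base station
        self.messages_to_head = 0
        self.messages_to_base = 0

    def step(self, readings):
        """Process one sampling round; readings maps member id -> sensed value."""
        # Tier 1: a member stays silent unless its reading changed enough.
        for member, value in readings.items():
            if should_send(self.last_from_member.get(member), value, self.member_threshold):
                self.last_from_member[member] = value
                self.messages_to_head += 1

        # In-network aggregation: average of the latest value known per member.
        aggregate = mean(self.last_from_member.values())

        # Tier 2: the head forwards the aggregate only if it changed enough.
        if should_send(self.last_to_base, aggregate, self.head_threshold):
            self.last_to_base = aggregate
            self.messages_to_base += 1
        return self.last_to_base
```

In this sketch, a round in which no reading moves past its threshold produces no messages at either tier, which is the traffic reduction the abstract describes.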

Relevance Feedback Method of an Extended Boolean Model using Hierarchical Clustering Techniques (계층적 클러스터링 기법을 이용한 확장 불리언 모델의 적합성 피드백 방법)

  • 최종필;김민구
    • Journal of KIISE:Software and Applications / v.31 no.10 / pp.1374-1385 / 2004
  • The relevance feedback process uses information obtained from a user about an initially retrieved set of documents to improve subsequent search formulations and retrieval performance. In the extended Boolean model, relevance feedback implies not only that new query terms must be identified, but also that the terms must be connected properly with the Boolean AND/OR operators. Salton et al. proposed a relevance feedback method for the extended Boolean model, called the DNF (disjunctive normal form) method. However, this method has a critical problem in generating reformulated queries. In this study, we investigate the problem of the DNF method and propose a relevance feedback method that uses hierarchical clustering techniques to solve it. We report the results of experiments performed on two data sets: the DOE collection in TREC 1 and the Web TREC 10 collection.

A Clustering Method for Optimizing Spatial Locality (공간국부성을 최적화하는 클러스터링 방법)

  • 김홍기
    • Journal of KIISE:Databases / v.31 no.2 / pp.83-90 / 2004
  • In this paper, we study the CCD (Clustering with Circular Distance) and COD (Clustering with Obstructed Distance) problems, which must be considered when objects are clustered in a circular search space and in a search space containing obstacles. We also propose a new clustering algorithm that efficiently clusters objects in a multi-dimensional search space where insertions and deletions occur frequently. The proposed algorithm defines the distance function used to solve the CCD and COD problems, and it includes a clustering method that creates clusters with high spatial locality in minimal computation time.
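
To show why a circular search space needs its own distance, here is a minimal sketch of a wrap-around distance on one circular dimension. The abstract does not state the paper's distance function, so `circular_distance` and its `period` parameter are assumptions for illustration, not the definition the author uses.

```python
def circular_distance(a, b, period=360.0):
    """Shortest distance between two positions on a circle of the given period.

    Assumed illustration: the distance wraps around, so 350 and 10 degrees
    are 20 apart rather than 340 apart.
    """
    diff = abs(a - b) % period
    return min(diff, period - diff)


def naive_distance(a, b):
    """Plain absolute difference, which ignores the wrap-around."""
    return abs(a - b)
```

With the plain difference, points near 0 and near 360 would land in different clusters even though they are neighbors on the circle; the wrap-around distance keeps them together.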

Probability-based Deep Learning Clustering Model for the Collection of IoT Information (IoT 정보 수집을 위한 확률 기반의 딥러닝 클러스터링 모델)

  • Jeong, Yoon-Su
    • Journal of Digital Convergence / v.18 no.3 / pp.189-194 / 2020
  • Recently, various clustering techniques have been studied to efficiently handle data generated by heterogeneous IoT devices. However, existing clustering techniques are not suitable for mobile IoT devices because they focus on statically dividing networks. This paper proposes a probabilistic deep learning-based dynamic clustering model for collecting and analyzing information on IoT devices using edge networks. The proposed model establishes a subnet by applying the frequency of the attribute values collected probabilistically to deep learning. The established subnets are used to group information extracted from seeds into hierarchical structures and improve the speed and accuracy of dynamic clustering for IoT devices. The performance evaluation results showed that the proposed model had an average 13.8 percent improvement in data processing time compared to the existing model, and the server's overhead was 10.5 percent lower on average than the existing model. The accuracy of extracting IoT information from servers has improved by 8.7% on average from previous models.

Extraction of higher yeast protein-protein interaction with hierarchical clustering from textual data (계층적 군집화를 통한 이스트(Yeast) 단백질의 고차 상호작용 추출)

  • 엄재홍;장병탁
    • Proceedings of the Korean Information Science Society Conference / 2002.10d / pp.364-366 / 2002
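  • This paper presents a method for extracting the key relationships among proteins that exhibit a particular phenomenon: binary relations between the major proteins of an organism are extracted from textual literature data on that organism and then hierarchically clustered by feature. The binary relations between proteins are extracted from the text data in the form of association rules, using basic data mining techniques. For the experiments, we used MEDLINE abstracts retrieved from PUBMED, which contain relationships among the major proteins of yeast, together with several public databases. The experiments showed that, beyond extracting previously known single relations between proteins such as SH3, clustering on these relations grouped together the relations among key proteins that act on a common phenomenon. Moreover, by examining the relationships among simple rules at a higher level through clustering, rather than through simple binary relations alone, we were able to examine relations of first order or higher that do not appear in the literature data used to extract the binary relations. The paper describes the overall rule extraction process, each component of the extraction system, and the data used.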

i-LEACH : Head-node Constrained Clustering Algorithm for Randomly-Deployed WSN (i-LEACH : 랜덤배치 고정형 WSN에서 헤더수 고정 클러스터링 알고리즘)

  • Kim, Chang-Joon;Lee, Doo-Wan;Jang, Kyung-Sik
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.1 / pp.198-204 / 2012
  • In general, clustering sensor nodes in a WSN is a useful mechanism that helps cope with the scalability problem and, if combined with network data aggregation, may increase the energy efficiency of the network. The hierarchical clustering routing algorithm is a typical algorithm for enhancing the overall energy efficiency of a network; it selects a cluster head that sends the aggregated data arriving from the nodes in its cluster to a base station. In this paper, we propose improved-LEACH (i-LEACH), which uses a comparatively simple and lightweight policy to select cluster-head nodes, reducing the clustering overhead and the overall power consumption of the network. Simulation results using a fine-grained power model show that i-LEACH reduces clustering overhead compared with well-known previous work such as LEACH. In a direct comparison, i-LEACH improved network power consumption by 25% and network traffic by 16% relative to LEACH.
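
For context on the baseline, the sketch below shows the probabilistic cluster-head election rule of the original LEACH protocol, T = p / (1 - p (r mod 1/p)), where p is the desired fraction of cluster heads and r is the current round. It is not the i-LEACH policy, which the abstract does not detail. The function names and the `eligible` list (nodes that have not yet served as head in the current cycle) are hypothetical names chosen for this example.

```python
import random


def leach_threshold(p, round_number):
    """Election threshold of the original LEACH protocol for an eligible node."""
    period = round(1 / p)  # number of rounds in one rotation cycle
    return p / (1 - p * (round_number % period))


def elect_cluster_heads(eligible, p, round_number, rng):
    """Each eligible node draws a random number and becomes head if it is below T."""
    threshold = leach_threshold(p, round_number)
    return [node for node in eligible if rng.random() < threshold]
```

The threshold rises over a cycle and reaches 1 in the last round, so every node that has not yet served is elected by the end of the cycle. Because each node decides independently, the number of heads per round varies, which is the behavior a head-count-constrained scheme would aim to tighten.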

An Efficient Cluster Header Election Technique in Zigbee Environments (Zigbee환경에서 효율적인 Cluster Header 선출 기법)

  • Lee, Joo-Hyun;Lee, Kyung-Hwa;Lee, Jun-Bok;Shin, Yong-Tae
    • Journal of KIISE:Computing Practices and Letters / v.16 no.3 / pp.346-350 / 2010
  • Because sensor nodes in a Zigbee network have restricted resources, a number of studies on improving efficiency are ongoing [1]. A clustering mechanism based on a hierarchical structure prevents duplicated information and facilitates network expansion [2]. However, overhead can occur when the cluster header is elected, and electing an unsuitable cluster header causes resources to be used inefficiently. In this paper, we propose a cluster header election mechanism, based on a hierarchical clustering mechanism, that uses the distances between nodes and the density of nodes. It operates under a central processing system in which the sync nodes hold the location and energy information of the general nodes.