• Title/Summary/Keyword: Post Clustering


A Post Web Document Clustering Algorithm (후처리 웹 문서 클러스터링 알고리즘)

  • Im, Yeong-Hui
    • The KIPS Transactions:PartB
    • /
    • v.9B no.1
    • /
    • pp.7-16
    • /
    • 2002
  • Post-clustering algorithms, which cluster the results returned by a Web search engine, have several requirements that differ from those of conventional clustering algorithms. In this paper, we propose a new post-clustering algorithm that satisfies as many of those requirements as possible. The proposed Concept ART combines the concept vector, which has several advantages in document clustering, with Fuzzy ART, which is known as a real-time clustering algorithm. Moreover, we show that it is applicable to general-purpose clustering as well as post-clustering.
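
For readers unfamiliar with Fuzzy ART, the following is a minimal sketch of the standard single-pass algorithm (complement coding, choice function, vigilance test, weight update). It is not the paper's Concept ART: the concept-vector step is omitted, inputs are assumed to be scaled to [0, 1], and the function names and the default values of `rho`, `alpha`, and `beta` are illustrative assumptions.

```python
from typing import List, Tuple


def complement_code(x: List[float]) -> List[float]:
    """Append 1 - x_i for every feature (inputs assumed to lie in [0, 1])."""
    return x + [1.0 - v for v in x]


def fuzzy_and(a: List[float], b: List[float]) -> List[float]:
    """Component-wise minimum, the fuzzy AND operator."""
    return [min(p, q) for p, q in zip(a, b)]


def fuzzy_art(samples: List[List[float]], rho: float = 0.75,
              alpha: float = 0.001, beta: float = 1.0
              ) -> Tuple[List[int], List[List[float]]]:
    """Cluster samples in one pass; returns (labels, category weights)."""
    weights: List[List[float]] = []
    labels: List[int] = []
    for x in samples:
        i = complement_code(x)
        norm_i = sum(i)
        # Rank existing categories by the choice function T_j.
        order = sorted(
            range(len(weights)),
            key=lambda j: sum(fuzzy_and(i, weights[j])) / (alpha + sum(weights[j])),
            reverse=True,
        )
        chosen = None
        for j in order:
            # Vigilance test: accept the first category that matches well enough.
            if sum(fuzzy_and(i, weights[j])) / norm_i >= rho:
                chosen = j
                break
        if chosen is None:
            # No category passed the vigilance test: create a new one.
            weights.append(list(i))
            chosen = len(weights) - 1
        else:
            w = weights[chosen]
            m = fuzzy_and(i, w)
            weights[chosen] = [beta * a + (1.0 - beta) * b for a, b in zip(m, w)]
        labels.append(chosen)
    return labels, weights
```

Because each sample is processed once and new categories are created on demand, the number of clusters does not have to be fixed in advance, which is one reason this family of algorithms suits search-result clustering.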

Fuzzy Clustering Algorithm for Web-mining (웹마이닝을 위한 퍼지 클러스터링 알고리즘)

  • Lim, Young-Hee;Song, Ji-Young;Park, Dai-Hee
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.12 no.3
    • /
    • pp.219-227
    • /
    • 2002
  • Post-clustering algorithms, which cluster the results returned by a Web search engine, have requirements that differ from those of conventional clustering algorithms. In this paper, we propose a new post-clustering algorithm that satisfies as many of those requirements as possible. The proposed fuzzy Concept ART combines the concept vector, which has several advantages in document clustering, with fuzzy ART, a real-time clustering algorithm based on fuzzy set theory. Moreover, we show that it is applicable to general-purpose clustering as well as post-clustering.

A Hybrid Clustering Technique for Processing Large Data (대용량 데이터 처리를 위한 하이브리드형 클러스터링 기법)

  • Kim, Man-Sun;Lee, Sang-Yong
    • The KIPS Transactions:PartB
    • /
    • v.10B no.1
    • /
    • pp.33-40
    • /
    • 2003
  • Data mining plays an important role in the knowledge discovery process, and various data mining algorithms can be selected for a specific purpose. Most traditional hierarchical clustering methods are suited to small data sets, so they have difficulty handling large data sets because of limited resources and insufficient efficiency. In this study, we propose a hybrid neural network clustering technique called PPC (Pre-Post Clustering) that can be applied to large data sets and can find unknown patterns. PPC combines an artificial intelligence method, the self-organizing map (SOM), with a statistical method, hierarchical clustering, and clusters data in two stages. In the pre-clustering stage, PPC digests large data sets using the SOM. In the post-clustering stage, PPC measures similarity values from cohesive distances, which capture the internal features of a cluster, and adjacent distances, which capture the external distances between clusters. Finally, PPC clusters the large data set using these similarity values. Experiments with UCI repository data showed that PPC achieved better cohesion values than the other clustering techniques.

Development of Clustering Algorithm and Tool for DNA Microarray Data (DNA 마이크로어레이 데이타의 클러스터링 알고리즘 및 도구 개발)

  • 여상수;김성권
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.10
    • /
    • pp.544-555
    • /
    • 2003
  • Because the data produced by DNA microarray experiments contain a large amount of gene expression information, adequate analysis methods are required. Hierarchical clustering is widely used to analyze gene expression profiles. In this paper, we study leaf-ordering, a post-processing step applied to the dendrograms output by hierarchical clustering, to improve the efficiency of DNA microarray data analysis. We first analyze existing leaf-ordering algorithms and then present new approaches to leaf-ordering. We also introduce HCLO (Hierarchical Clustering & Leaf-Ordering Tool), our software implementation of hierarchical clustering, several existing leaf-ordering algorithms, and the algorithms presented in this paper.
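
To illustrate what leaf-ordering does, here is a small sketch of a greedy heuristic: at every internal node of the dendrogram, the two subtrees may be flipped, and the orientation that brings the closest pair of leaves together at the boundary is kept. This is not one of the algorithms from the paper or from HCLO; the nested-tuple tree representation and the name `order_leaves` are assumptions made for the example.

```python
from itertools import product
from typing import Callable, List, Union

Tree = Union[int, tuple]  # a leaf id, or a pair (left_subtree, right_subtree)


def order_leaves(tree: Tree, dist: Callable[[int, int], float]) -> List[int]:
    """Return a leaf order consistent with the dendrogram.

    Greedy rule: at each internal node, try both orientations of each
    child subtree and keep the combination whose two boundary leaves
    (last leaf of the left part, first leaf of the right part) are closest.
    """
    if not isinstance(tree, tuple):
        return [tree]
    left = order_leaves(tree[0], dist)
    right = order_leaves(tree[1], dist)
    k = len(left)  # index of the first leaf of the right part
    candidates = [
        l + r for l, r in product((left, left[::-1]), (right, right[::-1]))
    ]
    return min(candidates, key=lambda seq: dist(seq[k - 1], seq[k]))
```

Reversing the leaf list of a subtree corresponds to flipping every node inside it, so each candidate is still a valid drawing of the same dendrogram. The heuristic only looks at one boundary per node and does not guarantee a globally optimal ordering.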

Novel Techniques for Real Time Computing Critical Clearing Time SIME-B and CCS-B

  • Dinh, Hung Nguyen;Nguyen, Minh Y.;Yoon, Yong Tae
    • Journal of Electrical Engineering and Technology
    • /
    • v.8 no.2
    • /
    • pp.197-205
    • /
    • 2013
  • Real-time transient stability assessment depends mainly on real-time prediction. Unfortunately, conventional techniques based on offline analysis are too slow and unreliable for complex power systems. Hence, fast and reliable stability prediction methods and simple stability criteria must be developed for real-time use. In this paper, two new methods for determining critical clearing time in real time, based on clustering identification, are proposed. The article covers three main parts: (i) clustering generators and recognizing the critical group; (ii) replacing the multi-machine system with a two-machine dynamic equivalent and eventually with a one-machine-infinite-bus system; (iii) presenting a new method to predict the post-fault trajectory and two simple algorithms for calculating critical clearing time, each built on a different transient stability criterion. The methods are expected to determine critical clearing time within 100-150 ms with acceptable accuracy.

Performance Evaluation of Nonkeyword Modeling and Postprocessing for Vocabulary-independent Keyword Spotting (가변어휘 핵심어 검출을 위한 비핵심어 모델링 및 후처리 성능평가)

  • Kim, Hyung-Soon;Kim, Young-Kuk;Shin, Young-Wook
    • Speech Sciences
    • /
    • v.10 no.3
    • /
    • pp.225-239
    • /
    • 2003
  • In this paper, we develop a keyword spotting system using a vocabulary-independent speech recognition technique, and investigate several non-keyword modeling and post-processing methods to improve its performance. To model non-keyword speech segments, monophone clustering and a Gaussian Mixture Model (GMM) are considered. For post-processing, we employ a likelihood ratio scoring method to verify the recognition results, and filler models, anti-subword models, and N-best decoding results are considered as alternative hypotheses for likelihood ratio scoring. We also examine different methods of constructing anti-subword models. We evaluate the performance of our system on an automatic telephone exchange service task. The results show that GMM-based non-keyword modeling yields better performance than monophone clustering. In the post-processing experiment, the method using an anti-keyword model based on Kullback-Leibler distance and the N-best decoding method perform better than the other methods, and keyword recognition errors are reduced by more than 50% at a keyword rejection rate of 5%.
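
The likelihood ratio verification described above can be sketched as follows. This is only an illustration of the general idea, not the paper's system: taking the best-scoring alternative hypothesis, normalizing by the number of frames, and the names `llr_score` and `accept_keyword` are all assumptions.

```python
from typing import Sequence


def llr_score(keyword_loglik: float, alt_logliks: Sequence[float],
              num_frames: int) -> float:
    """Frame-normalized log-likelihood ratio of a detected keyword.

    keyword_loglik: log-likelihood of the segment under the keyword model.
    alt_logliks: log-likelihoods under the alternative hypotheses (for
        example filler, anti-subword, or N-best competitors); the best
        one is used as the alternative here (an assumption).
    """
    best_alt = max(alt_logliks)
    return (keyword_loglik - best_alt) / num_frames


def accept_keyword(keyword_loglik: float, alt_logliks: Sequence[float],
                   num_frames: int, threshold: float = 0.0) -> bool:
    """Accept the detection when the ratio reaches the threshold."""
    return llr_score(keyword_loglik, alt_logliks, num_frames) >= threshold
```

Raising `threshold` rejects more false alarms at the cost of rejecting more true keywords, which is the trade-off behind reporting error reduction at a fixed keyword rejection rate.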


A New Learning Algorithm for Neuro-Fuzzy Modeling Using Self-Constructed Clustering

  • Kim, Sung-Suk;Kwak, Keun-Chang;Kim, Sung-Soo;Ryu, Jeong-Woong
    • Institute of Control, Robotics and Systems (ICROS): Conference Proceedings (제어로봇시스템학회:학술대회논문집)
    • /
    • 2005.06a
    • /
    • pp.1254-1259
    • /
    • 2005
  • In this paper, we propose a learning algorithm for neuro-fuzzy modeling that uses a learning rule to adapt the clustering. The proposed algorithm partitions the data, assigns rules during the partitioning process, and optimizes the parameters using a predetermined threshold value in the self-constructing algorithm. To improve the clustering, the learning method of the neuro-fuzzy model is extended, and the learning scheme is modified so that learning of the overall model is based on error-derivative learning. The effectiveness of the proposed method is demonstrated through simulations that compare it with previous methods.


Computational Approach for the Analysis of Post-PKS Glycosylation Step

  • Kim, Ki-Bong;Park, Kie-Jung
    • Genomics & Informatics
    • /
    • v.6 no.4
    • /
    • pp.223-226
    • /
    • 2008
  • We introduce a computational approach for the analysis of glycosylation in post-PKS tailoring steps. The method predicts the deoxysugar biosynthesis unit pathway and the substrate specificity of the glycosyltransferases involved in the glycosylation of polyketides. In this work, a directed, weighted graph is introduced to represent and predict the deoxysugar biosynthesis unit pathway. In addition, a homology-based gene clustering method is used to predict the substrate specificity of glycosyltransferases. The approach is useful for the rational design of polyketide natural products, which leads to in silico drug discovery.

Image Deduplication Based on Hashing and Clustering in Cloud Storage

  • Chen, Lu;Xiang, Feng;Sun, Zhixin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.4
    • /
    • pp.1448-1463
    • /
    • 2021
  • With the continuous development of cloud storage, a large amount of redundant data now exists in cloud storage, especially multimedia data such as images and videos. Data deduplication is a data reduction technology that significantly reduces storage requirements and increases bandwidth efficiency. To ensure data security, users typically encrypt data before uploading it. However, data encryption conflicts with deduplication. Existing deduplication methods for regular files cannot be applied to image deduplication because duplicate images must be detected based on visual content. In this paper, we propose a secure image deduplication scheme based on hashing and clustering, which incorporates a novel perceptual hash algorithm based on the Local Binary Pattern. In this scheme, the hash value of the image is used as the fingerprint to perform deduplication, and the image is transmitted in encrypted form. Images are clustered to reduce the time complexity of deduplication. The proposed scheme can ensure the security of images and improve deduplication accuracy. A comparison with other image deduplication schemes demonstrates that our scheme has somewhat better performance.
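
The role of clustering in fingerprint-based deduplication can be sketched as below. This is not the paper's scheme: fingerprints are assumed to be precomputed perceptual hashes stored as integers, encryption and the Local Binary Pattern hash itself are left out, and the class name `DedupIndex` and both thresholds are hypothetical.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


class DedupIndex:
    """Fingerprint store that compares a new hash with one cluster only."""

    def __init__(self, dup_threshold: int = 4, cluster_radius: int = 16):
        self.dup_threshold = dup_threshold    # max distance to count as duplicate
        self.cluster_radius = cluster_radius  # max distance to join a cluster
        self.clusters: List[Tuple[int, List[int]]] = []  # (representative, members)

    def add(self, fp: int) -> bool:
        """Store fp and return True, or return False if it is a duplicate."""
        best = None
        for rep, members in self.clusters:
            d = hamming(fp, rep)
            if best is None or d < best[0]:
                best = (d, members)
        if best is not None and best[0] <= self.cluster_radius:
            members = best[1]
            # Only the members of the nearest cluster are checked in detail.
            if any(hamming(fp, m) <= self.dup_threshold for m in members):
                return False
            members.append(fp)
            return True
        # No cluster is close enough: start a new one with fp as representative.
        self.clusters.append((fp, [fp]))
        return True
```

Comparing against cluster representatives first avoids a scan over every stored fingerprint, at the cost of possibly missing a near-duplicate that landed in a different cluster.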