• Title/Summary/Keyword: hierarchical clustering algorithm (계층적 클러스터링 알고리즘)

An Efficient Text Mining method based on Domain Stopword Elimination (도메인 불용어 제거를 통한 효율적인 텍스트 마이닝 기법)

  • Song, Jae-Sun;Joo, Kil-Hong;Lee, Won-Suk
    • Proceedings of the Korea Information Processing Society Conference / 2003.05c / pp.1523-1526 / 2003
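  • In information retrieval, document clustering has been studied extensively as a way to provide users with diverse, high-quality information. However, existing document clustering methods do not express the hierarchical relationships that indicate inclusion between clusters; they only group semantically similar documents into several clusters. This paper therefore proposes an algorithm that extracts stopwords and keywords for each domain to which a document belongs and applies them to document clustering.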

Resource Clustering Simulator for Desktop Virtualization Based on Intra Cloud (인트라 클라우드 기반 데스크탑 가상화를 위한 리소스 클러스터링 시뮬레이터)

  • Kim, Hyun-Woo
    • KIPS Transactions on Software and Data Engineering / v.8 no.1 / pp.45-50 / 2019
  • With the gradual advancement of IT, manual work processes have been automated and the overall quality of life has greatly improved. This is made possible by the organic topology formed among a wide variety of real-life smart devices. To serve these diverse devices, businesses and users rely on the cloud. Cloud services are divided into Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). SaaS runs on PaaS, and PaaS runs on IaaS. Because IaaS underlies all of these services, an algorithm is needed to operate virtualization resources efficiently. In particular, desktop resource virtualization provides high resource availability by exploiting the idle time of existing desktop PCs, and hierarchical clustering is important for achieving it. Selecting a suitable algorithm is also very important, because which clustering algorithm is appropriate depends on the distribution ratio and environment of the desktop PCs. Searching for a suitable algorithm by trial and error in an operating environment would consume a great deal of power, time, and manpower. This paper therefore proposes a resource clustering simulator for selecting a clustering algorithm for desktop virtualization. The simulator provides clustering simulations so that clustering algorithms can be properly selected and applied in different desktop PC environments.

Design of Fuzzy System with Hierarchical Classifying Structures and its Application to Time Series Prediction (계층적 분류구조의 퍼지시스템 설계 및 시계열 예측 응용)

  • Bang, Young-Keun;Lee, Chul-Heui
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.5 / pp.595-602 / 2009
  • Fuzzy rules, which represent the behavior of a fuzzy system, are sensitive to the fuzzy clustering technique used. If the classification ability of the clustering technique improves, the system can serve its purpose more accurately, because the clustering enhances the capabilities of the fuzzy rules and parameters. This paper therefore proposes a new hierarchically structured clustering algorithm that improves classification ability. The proposed technique consists of two clusters based on the correlation and statistical characteristics of the data, which allows more accurate classification. In addition, this paper uses difference data sets to reflect the patterns and regularities of the original data clearly, and constructs multiple fuzzy systems to account for the various characteristics of the differences. To verify the effectiveness of the proposed techniques, the constructed fuzzy systems are applied to time series prediction and evaluated on nonlinear time series examples.

Analytical Study of Fuzzy Clustering Technique for Automatic Term Classification (용어 자동분류를 위한 퍼지 클러스터링 기법 분석)

  • 한승희
    • Proceedings of the Korean Society for Information Management Conference / 2003.08a / pp.95-103 / 2003
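  • Drawing on the structured ways of accessing printed information, such as tables of contents and back-of-book indexes, new ways of accessing the content of electronic documents can be developed; automatic term classification is one means to this end. This study proposes fuzzy clustering as an automatic classification technique that resolves the semantic ambiguity of terms while also expressing hierarchical relationships between terms, and analyzes the fuzzy c-means technique, a representative fuzzy clustering algorithm.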

A Comparative Study on Clustering Methods for Grouping Related Tags (연관 태그의 군집화를 위한 클러스터링 기법 비교 연구)

  • Han, Seung-Hee
    • Journal of the Korean Society for Library and Information Science / v.43 no.3 / pp.399-416 / 2009
  • In this study, methods for clustering related tags were examined as a way to improve search and exploration in the tag space. Experiments were performed on 10 Delicious tags and the strongly related tags extracted from 300 documents for each, and hierarchical and non-hierarchical clustering methods were applied based on tag co-occurrences. To evaluate the results, cluster relevance was measured. Ward's method with the cosine coefficient, which performs well for term clustering, performed best and showed a consistent clustering tendency. The analysis also indicated that cluster membership among related tags reflects users' tagging purposes or interests and can disambiguate word senses. Tag clusters would therefore be helpful for improving search and exploration in the tag space.
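The cosine coefficient mentioned in this abstract can be illustrated with a minimal Python sketch. The tag names and co-occurrence counts below are invented for illustration and are not taken from the study; the sketch covers only the similarity measure, not Ward's method itself.

```python
import math

def cosine_coefficient(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here: an all-zero vector is similar to nothing
    return dot / (norm_u * norm_v)

# Hypothetical co-occurrence counts: each position counts how often the tag
# appears together with one of four other (unnamed) tags.
cooccurrence = {
    "python": [12, 0, 7, 1],
    "programming": [10, 1, 8, 0],
    "recipes": [0, 9, 0, 6],
}

sim_close = cosine_coefficient(cooccurrence["python"], cooccurrence["programming"])
sim_far = cosine_coefficient(cooccurrence["python"], cooccurrence["recipes"])
print(sim_close > sim_far)
```

A hierarchical method would then take one minus this coefficient as the dissimilarity between tags when deciding which clusters to merge.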

An Efficient Clustering Method based on Multi Centroid Set using MapReduce (맵리듀스를 이용한 다중 중심점 집합 기반의 효율적인 클러스터링 방법)

  • Kang, Sungmin;Lee, Seokjoo;Min, Jun-ki
    • KIISE Transactions on Computing Practices / v.21 no.7 / pp.494-499 / 2015
  • As data volumes grow, identifying properties by analyzing big data becomes increasingly important. In this paper, we propose an efficient k-Means-based clustering technique, called MCSK-Means (Multi Centroid Set k-Means), that uses the distributed parallel processing framework MapReduce. A problem with the k-Means algorithm is that clustering accuracy depends on randomly created initial centroids. To alleviate this problem, the MCSK-Means algorithm reduces the dependency on initial centroids by using sets each consisting of k centroids. In addition, we apply agglomerative hierarchical clustering to create k centroids from the centroids in the m centroid sets that result from the clustering phase. We implemented MCSK-Means on the MapReduce framework to process big data efficiently.
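The idea in this abstract can be sketched on a single machine as follows. This is an illustrative reading of the abstract, not the authors' implementation: it omits MapReduce entirely, and the function names, the centroid-distance merge rule, and the choice to average merged centroids are all assumptions.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, k, rng, iters=20):
    """Plain k-means with randomly sampled initial centroids."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous centroid.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids

def merge_centroids(candidates, k):
    """Agglomeratively merge candidate centroids until k groups remain."""
    clusters = [[c] for c in candidates]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist2(mean(clusters[i]), mean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [mean(c) for c in clusters]

def multi_centroid_set_kmeans(points, k, m, seed=0):
    """Run k-means m times, then reduce the m*k centroids to k (hypothetical helper)."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(m):
        candidates.extend(kmeans(points, k, rng))
    return merge_centroids(candidates, k)

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
final_centroids = sorted(multi_centroid_set_kmeans(points, k=2, m=3))
print(final_centroids)
```

Because each of the m runs starts from different random centroids, a single poor initialization has less influence on the final k centroids than it would in one k-means run.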

CACHE:Context-aware Clustering Hierarchy and Energy efficient for MANET (CACHE:상황인식 기반의 계층적 클러스터링 알고리즘에 관한 연구)

  • Mun, Chang-min;Lee, Kang-Hwan
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2009.10a / pp.571-573 / 2009
  • A Mobile Ad-hoc Network (MANET) needs efficient node management because the wireless network is energy constrained. Node mobility in a MANET causes the topology to change more frequently than in a static network. Improving routing in a MANET therefore requires a protocol that is energy efficient and also accounts for mobility. The previously proposed hybrid routing protocol CACH prolongs network lifetime and decreases latency. However, the algorithm has a problem when node density increases. In this paper, we propose a new method, the CACHE (Context-aware Clustering Hierarchy and Energy efficient) algorithm. The proposed analysis not only helps define the optimum depth of the hierarchy architecture that CACH uses, but also mitigates the node density problem.

Fixed Partitioning Methods for Extending lifetime of sensor node for Wireless Sensor Networks (WSN환경에서 센서노드의 생명주기 연장을 위한 고정 분할 기법)

  • Han, Chang-Su;Cho, Young-Bok;Woo, Sung-Hee;Lee, Sang-Ho
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.5 / pp.942-948 / 2016
  • In a WSN built from wireless sensor nodes, the nodes cannot be redeployed or recharged once they have been placed. Each sensor node participates in the communication network with its limited energy. However, existing clustering techniques, when applied to a WSN in which sensor nodes are irregularly dispersed, suffer from network reliability issues: the unbalanced local distribution of nodes leads to communication interruptions. The proposed algorithm therefore divides the sensor field according to the non-uniform distribution of sensor nodes and adopts either a static or a dynamic clustering algorithm for each partition depending on its node density, which extends the communication participation of the sensor nodes by 25%. The lifetime of the entire network was extended by 14%, ensuring network reliability.

A Study on the Construction of Stable Clustering by Minimizing the Order Bias (순서 바이어스 최소화에 의한 안정적 클러스터링 구축에 관한 연구)

  • Lee, Gye-Seong
    • The Transactions of the Korea Information Processing Society / v.6 no.6 / pp.1571-1580 / 1999
  • When a hierarchical structure is derived from a data set for data mining and machine learning using a conceptual clustering algorithm, one of the unsupervised learning paradigms, it is not unusual for the outcome to vary with the order in which data objects are processed. To overcome this problem, a first classification pass is performed to construct an initial partition. This partition is expected to indicate the possible range for the number of final classes. We apply center sorting to the data objects in the classes of the partition to obtain a new data ordering, and build a new partition using the ITERATE clustering procedure. We developed an algorithm, REIT, that leads to a final partition that is stable and has the best partition score. A number of experiments were performed to show that the algorithm minimizes order bias effects.

Evolutionary Computation-based Hybrid Clustering Technique for Manufacturing Time Series Data (제조 시계열 데이터를 위한 진화 연산 기반의 하이브리드 클러스터링 기법)

  • Oh, Sanghoun;Ahn, Chang Wook
    • Smart Media Journal / v.10 no.3 / pp.23-30 / 2021
  • Clustering of manufacturing time series data is an important grouping technique for detecting and remedying equipment and process defects based on large volumes of manufacturing data, but accuracy is low when existing clustering techniques designed for static data are applied to time series data. In this paper, an evolutionary computation-based time series cluster analysis approach is presented to improve the coherence of existing clustering techniques. To this end, the image shapes resulting from the manufacturing process are first converted into one-dimensional time series data using linear scanning, and the optimal sub-clusters for hierarchical cluster analysis and split cluster analysis are derived on the transformed data using the Pearson distance metric. Finally, a genetic algorithm is used to derive an optimal cluster combination with minimal similarity from the two cluster analysis results. The superiority of the proposed clustering is verified by comparing its performance with existing clustering techniques on actual manufacturing process images.
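The first two steps named in this abstract, linear scanning and the Pearson distance, can be sketched as follows. The abstract defines neither, so two assumptions are made here: linear scanning is read as a row-by-row flattening of the image, and the Pearson distance is taken as one minus the Pearson correlation coefficient, a common definition. The function names are hypothetical.

```python
import math

def linear_scan(image):
    """Flatten a 2-D grid into a 1-D series, row by row (assumed scan order)."""
    return [value for row in image for value in row]

def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for perfectly correlated series, 2 for anti-correlated."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("Pearson correlation is undefined for a constant series")
    return 1.0 - cov / (spread_x * spread_y)

# Two tiny made-up "images"; the second is a brightened copy of the first.
image_a = [[1, 2], [3, 5]]
image_b = [[11, 12], [13, 15]]

series_a = linear_scan(image_a)
series_b = linear_scan(image_b)
print(pearson_distance(series_a, series_b))
```

Under this definition a constant offset between two series does not change their distance, which is why shape-based comparison of scanned images can ignore uniform brightness shifts.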