• Title/Summary/Keyword: Hierarchical Clustering

Search Result 552, Processing Time 0.033 seconds

Exploration of Hierarchical Techniques for Clustering Korean Author Names (한글 저자명 군집화를 위한 계층적 기법 비교)

  • Kang, In-Su
    • Journal of Information Management
    • /
    • v.40 no.2
    • /
    • pp.95-115
    • /
    • 2009
  • Author resolution is to disambiguate same-name author occurrences into real individuals. For this, pair-wise author similarities are computed for author name entities, and then clustering is performed. So far, many studies have employed hierarchical clustering techniques for author disambiguation. However, various hierarchical clustering methods have not been sufficiently investigated. This study covers an empirical evaluation and analysis of hierarchical clustering applied to Korean author resolution, using multiple distance functions such as Dice coefficient, Cosine similarity, Euclidean distance, Jaccard coefficient, Pearson correlation coefficient.

A Study on Cluster Hierarchy Depth in Hierarchical Clustering (계층적 클러스터링에서 분류 계층 깊이에 관한 연구)

  • Jin, Hai-Nan;Lee, Shin-won;An, Dong-Un;Chung, Sung-Jong
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2004.05a
    • /
    • pp.673-676
    • /
    • 2004
  • Fast and high-quality document clustering algorithms play an important role in providing data exploration by organizing large amounts of information into a small number of meaningful clusters. In particular, hierarchical clustering provide a view of the data at different levels, making the large document collections are adapted to people's instinctive and interested requires. Many papers have shown that the hierarchical clustering method takes good-performance, but is limited because of its quadratic time complexity. In contrast, K-means has a time complexity that is linear in the number of documents, but is thought to produce inferior clusters. Think of the factor of simpleness, high-quality and high-efficiency, we combine the two approaches providing a new system named CONDOR system [10] with hierarchical structure based on document clustering using K-means algorithm to "get the best of both worlds". The performance of CONDOR system is compared with the VIVISIMO hierarchical clustering system [9], and performance is analyzed on feature words selection of specific topics and the optimum hierarchy depth.

  • PDF

Deduction of Acupoints Selecting Elements on Zhenjiuzishengjing using hierarchical clustering (계층적 군집분석(hierarchical clustering)을 통한 침구자생경(鍼灸資生經) 경혈 선택 요인 분석)

  • Oh, Junho
    • Journal of Haehwa Medicine
    • /
    • v.23 no.1
    • /
    • pp.115-124
    • /
    • 2014
  • Objectives : There are plenty of medical record of acupuncture & moxibustion in Traditional East Asian medicine(TEAM). We performed this study to find out the hidden criteria lies on this record to choose proper acupoints. Methods : "Zhenjiuzishengjing", ancient TEAM book was analysed using document clustering techniques. Corpus was made from this book. It contained 196 texts driven from each symptoms. Each texts converted to vector representing frequency of 349 acupoints. Distance of vectors calculated by weighted Euclidean distance method. According to this distances, hierarchical clustering of symptoms was builded. Results : The cluster consisted of five large groups. they had high corelation with body part; head and face, chest, abdomen, upper extremity, lower extremity, back. Conclusions : It assumes that body part of symptom is the most importance criteria of acupoints selecting. some high similar symptom vectors consolidated this result. the other criteria is cause and pathway of illness. some symptoms bound together which had common cause and pathway.

Classification of network packets using hierarchical clustering (Hierarchical Clustering을 이용한 네트워크 패킷의 분류)

  • Yeo, Insung;Hai, Quan Tran;Hwang, Seong Oun
    • Journal of Internet of Things and Convergence
    • /
    • v.3 no.1
    • /
    • pp.9-11
    • /
    • 2017
  • Recently, with the widespread use of the Internet and mobile devices, the number of attacks by hackers using the network is increasing. When connecting a network, packets are exchanged and communicated, which includes various information. We analyze the information of these packets using hierarchical clustering analysis and classify normal and abnormal packets to detect attacks. With this analysis method, it will be possible to detect attacks by analyzing new packets.

Automatic Categorization of Real World FAQs Using Hierarchical Document Clustering (계층적 문서 클러스터링을 이용한 실세계 질의 메일의 자동 분류)

  • 류중원;조성배
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2001.05a
    • /
    • pp.187-190
    • /
    • 2001
  • Due to the recent proliferation of the internet, it is broadly granted that the necessity of the automatic document categorization has been on the rise. Since it is a heavy time-consuming work and takes too much manpower to process and classify manually, we need a system that categorizes them automatically as their contents. In this paper, we propose the automatic E-mail response system that is based on 2 hierarchical document clustering methods. One is to get the final result from the classifier trained seperatly within each class, after clustering the whole documents into 3 groups so that the first classifier categorize the input documents as the corresponding group. The other method is that the system classifies the most distinct classes first as their similarity, successively. Neural networks have been adopted as classifiers, we have used dendrograms to show the hierarchical aspect of similarities between classes. The comparison among the performances of hierarchical and non-hierarchical classifiers tells us clustering methods have provided the classification efficiency.

  • PDF

Customer Load Pattern Analysis using Clustering Techniques (클러스터링 기법을 이용한 수용가별 전력 데이터 패턴 분석)

  • Ryu, Seunghyoung;Kim, Hongseok;Oh, Doeun;No, Jaekoo
    • KEPCO Journal on Electric Power and Energy
    • /
    • v.2 no.1
    • /
    • pp.61-69
    • /
    • 2016
  • Understanding load patterns and customer classification is a basic step in analyzing the behavior of electricity consumers. To achieve that, there have been many researches about clustering customers' daily load data. Nowadays, the deployment of advanced metering infrastructure (AMI) and big-data technologies make it easier to study customers' load data. In this paper, we study load clustering from the view point of yearly and daily load pattern. We compare four clustering methods; K-means clustering, hierarchical clustering (average & Ward's method) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise). We also discuss the relationship between clustering results and Korean Standard Industrial Classification that is one of possible labels for customers' load data. We find that hierarchical clustering with Ward's method is suitable for clustering load data and KSIC can be well characterized by daily load pattern, but not quite well by yearly load pattern.

On the clustering of huge categorical data

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society
    • /
    • v.21 no.6
    • /
    • pp.1353-1359
    • /
    • 2010
  • Basic objective in cluster analysis is to discover natural groupings of items. In general, clustering is conducted based on some similarity (or dissimilarity) matrix or the original input data. Various measures of similarities between objects are developed. In this paper, we consider a clustering of huge categorical real data set which shows the aspects of time-location-activity of Korean people. Some useful similarity measure for the data set, are developed and adopted for the categorical variables. Hierarchical and nonhierarchical clustering method are applied for the considered data set which is huge and consists of many categorical variables.

Magnetoencephalography Interictal Spike Clustering in Relation with Surgical Outcome of Cortical Dysplasia

  • Jeong, Woorim;Chung, Chun Kee;Kim, June Sic
    • Journal of Korean Neurosurgical Society
    • /
    • v.52 no.5
    • /
    • pp.466-471
    • /
    • 2012
  • Objective : The aim of this study was to devise an objective clustering method for magnetoencephalography (MEG) interictal spike sources, and to identify the prognostic value of the new clustering method in adult epilepsy patients with cortical dysplasia (CD). Methods : We retrospectively analyzed 25 adult patients with histologically proven CD, who underwent MEG examination and surgical resection for intractable epilepsy. The mean postoperative follow-up period was 3.1 years. A hierarchical clustering method was adopted for MEG interictal spike source clustering. Clustered sources were then tested for their prognostic value toward surgical outcome. Results : Postoperative seizure outcome was Engel class I in 6 (24%), class II in 3 (12%), class III in 12 (48%), and class IV in 4 (16%) patients. With respect to MEG spike clustering, 12 of 25 (48%) patients showed 1 cluster, 2 (8%) showed 2 or more clusters within the same lobe, 10 (40%) showed 2 or more clusters in a different lobe, and 1 (4%) patient had only scattered spikes with no clustering. Patients who showed focal clustering achieved better surgical outcome than distributed cases (p=0.017). Conclusion : This is the first study that introduces an objective method to classify the distribution of MEG interictal spike sources. By using a hierarchical clustering method, we found that the presence of focal clustered spikes predicts a better postoperative outcome in epilepsy patients with CD.

Two Phase Hierarchical Clustering Algorithm for Group Formation in Data Mining (데이터 마이닝에서 그룹 세분화를 위한 2단계 계층적 글러스터링 알고리듬)

  • 황인수
    • Korean Management Science Review
    • /
    • v.19 no.1
    • /
    • pp.189-196
    • /
    • 2002
  • Data clustering is often one of the first steps in data mining analysis. It Identifies groups of related objects that can be used as a starling point for exploring further relationships. This technique supports the development of population segmentation models, such as demographic-based customer segmentation. This paper Purpose to present the development of two phase hierarchical clustering algorithm for group formation. Applications of the algorithm for product-customer group formation in customer relationahip management are also discussed. As a result of computer simulations, suggested algorithm outperforms single link method and k-means clustering.

Top-down Hierarchical Clustering using Multidimensional Indexes (다차원 색인을 이용한 하향식 계층 클러스터링)

  • Hwang, Jae-Jun;Mun, Yang-Se;Hwang, Gyu-Yeong
    • Journal of KIISE:Databases
    • /
    • v.29 no.5
    • /
    • pp.367-380
    • /
    • 2002
  • Due to recent increase in applications requiring huge amount of data such as spatial data analysis and image analysis, clustering on large databases has been actively studied. In a hierarchical clustering method, a tree representing hierarchical decomposition of the database is first created, and then, used for efficient clustering. Existing hierarchical clustering methods mainly adopted the bottom-up approach, which creates a tree from the bottom to the topmost level of the hierarchy. These bottom-up methods require at least one scan over the entire database in order to build the tree and need to search most nodes of the tree since the clustering algorithm starts from the leaf level. In this paper, we propose a novel top-down hierarchical clustering method that uses multidimensional indexes that are already maintained in most database applications. Generally, multidimensional indexes have the clustering property storing similar objects in the same (or adjacent) data pares. Using this property we can find adjacent objects without calculating distances among them. We first formally define the cluster based on the density of objects. For the definition, we propose the concept of the region contrast partition based on the density of the region. To speed up the clustering algorithm, we use the branch-and-bound algorithm. We propose the bounds and formally prove their correctness. Experimental results show that the proposed method is at least as effective in quality of clustering as BIRCH, a bottom-up hierarchical clustering method, while reducing the number of page accesses by up to 26~187 times depending on the size of the database. As a result, we believe that the proposed method significantly improves the clustering performance in large databases and is practically usable in various database applications.