• Title/Summary/Keyword: Graph Clustering

Web Document Clustering based on Graph using Hyperlinks (하이퍼링크를 이용한 그래프 기반의 웹 문서 클러스터링)

  • Lee, Joon;Kang, Jin-Beom;Choi, Joong-Min
    • HCI Society of Korea: Conference Proceedings (한국HCI학회:학술대회논문집) / 2009.02a / pp.590-595 / 2009
  • As the number of web documents on the internet grows exponentially, improving the performance of web document clustering methods becomes important. Web document clustering techniques can provide accurate information and fast retrieval by grouping web documents according to their semantic relationships. The mesh-graph-based clustering method achieves high recall by calculating similarity between documents, but at a high computational cost. This paper proposes a clustering method that uses hyperlinks, a structural feature of web documents, to preserve effectiveness while reducing computational cost.

The Graph Partition Problem (그래프분할문제)

  • 명영수
    • Journal of the Korean Operations Research and Management Science Society / v.28 no.4 / pp.131-143 / 2003
  • In this paper, we present a survey of various graph partition problems, including the clustering problem, the k-cut problem, the multiterminal cut problem, the multicut problem, the sparsest cut problem, the network attack problem, and the network disconnection problem. We compare these problems, focusing on characteristics such as the objective function and the conditions that the partitioned clusters must satisfy. We also introduce the mathematical programming formulations and the solution approaches developed for these problems.

NOGSEC: A NOnparametric method for Genome SEquence Clustering (녹섹(NOGSEC): A NOnparametric method for Genome SEquence Clustering)

  • 이영복;김판규;조환규
    • Korean Journal of Microbiology / v.39 no.2 / pp.67-75 / 2003
  • A major topic in comparative genomics is predicting functional annotation by classifying protein sequences. Computational approaches to function prediction include protein structure prediction, sequence alignment, and domain or binding site prediction. This paper addresses another computational approach: searching for sets of homologous sequences in a sequence similarity graph. Methods based on similarity graphs do not need prior knowledge about the sequences, but they depend heavily on the researcher's subjective threshold settings. We propose a genome sequence clustering method based on iterative testing and graph decomposition, together with a simple method for calculating a strict threshold that has biochemical meaning. The proposed method was applied to known bacterial genome sequences, and the results are presented alongside those of the BAG algorithm. The resulting clusters lack some completeness, but the confidence level is very high and the method does not need user-defined thresholds.
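
The threshold dependence mentioned in this abstract can be illustrated with a minimal sketch. It is not the NOGSEC procedure (the abstract does not specify its iterative test); it only shows how clusters taken as connected components of a similarity graph change with a user-chosen cutoff. The function name `threshold_clusters` and the scores are hypothetical.

```python
from collections import defaultdict


def threshold_clusters(similarities, threshold):
    """Connected components of the graph that keeps edges scoring >= threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), score in similarities.items():
        root_a, root_b = find(a), find(b)  # registers both nodes even if the edge is dropped
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical pairwise similarity scores (not from the paper).
scores = {("p1", "p2"): 0.9, ("p2", "p3"): 0.6, ("p3", "p4"): 0.85, ("p4", "p5"): 0.3}
loose = threshold_clusters(scores, 0.5)
strict = threshold_clusters(scores, 0.8)
print(loose)
print(strict)
```

With the looser cutoff, four sequences fall into one cluster; with the stricter one they split into two pairs, which is the sensitivity that a threshold-free method aims to avoid.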

A Generic Algorithm for k-Nearest Neighbor Graph Construction Based on Balanced Canopy Clustering (Balanced Canopy Clustering에 기반한 일반적 k-인접 이웃 그래프 생성 알고리즘)

  • Park, Youngki;Hwang, Heasoo;Lee, Sang-Goo
    • KIISE Transactions on Computing Practices / v.21 no.4 / pp.327-332 / 2015
  • Constructing a k-nearest neighbor (k-NN) graph is a primitive operation in recommender systems, information retrieval, data mining, and machine learning. Although many algorithms have been proposed for constructing a k-NN graph, existing approaches either cannot be used with various types of similarity measures or degrade in performance as the number of nodes or dimensions increases. In this paper, we present a novel algorithm for k-NN graph construction based on "balanced" canopy clustering. The experimental results show that, irrespective of the number of nodes or dimensions, our algorithm is at least five times faster than the brute-force approach while retaining an accuracy of approximately 92%.
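
For context, the brute-force baseline that this abstract compares against can be sketched as follows. This is a generic all-pairs construction that accepts any similarity function, not the paper's canopy-based algorithm. The `knn_recall` helper is an assumption about how accuracy might be measured (the fraction of exact neighbors that an approximate graph recovers), since the abstract does not define its accuracy metric.

```python
import heapq


def brute_force_knn_graph(items, similarity, k):
    """Map each index to its k most similar other indices by comparing all pairs."""
    graph = {}
    for i, x in enumerate(items):
        others = (j for j in range(len(items)) if j != i)
        # Highest similarity first; ties go to the smaller index.
        graph[i] = heapq.nlargest(
            k, others, key=lambda j: (similarity(x, items[j]), -j)
        )
    return graph


def knn_recall(approx, exact):
    """Fraction of exact neighbors that also appear in the approximate graph."""
    hits = sum(len(set(approx[i]) & set(exact[i])) for i in exact)
    total = sum(len(exact[i]) for i in exact)
    return hits / total


# Hypothetical 1-D points; similarity is the negated distance.
points = [0.0, 1.0, 1.5, 5.0]
exact_graph = brute_force_knn_graph(points, lambda a, b: -abs(a - b), k=2)
print(exact_graph)

# A made-up approximate graph with one wrong neighbor for node 0.
approx_graph = {0: [1, 3], 1: [2, 0], 2: [1, 0], 3: [2, 1]}
print(knn_recall(approx_graph, exact_graph))
```

The all-pairs loop evaluates the similarity function a quadratic number of times, which is the cost that clustering-based constructions try to reduce.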

Consistent Triplets of Candidate Paralogs by Graph Clustering

  • Yun, Hwa-Seob;Muchnik, Ilya;Kulikowski, Casimir
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.156-160 / 2005
  • We introduce a fully automatic clustering method to classify candidate paralog clusters from a set of protein sequences within one genome. The protein sequences are represented as the nodes of a graph of protein relationships, each node corresponding to the amino acid sequence of one protein, and the sequence similarities among them constitute the edges. We use graph-based clustering methods to identify structurally consistent sets of nodes that are strongly connected with each other. Our results are consistent with those from current leading systems such as COG/KOG and KEGG, which are based on manual curation. All the results are viewable at http://www.cs.rutgers.edu/~seabee.

Graph Coloring based Clustering Algorithm for Wireless Sensor Network (무선 센서 네트워크에서의 그래프 컬러링 기반의 클러스터링 알고리즘)

  • Kim, J.H.;Chang, H.S.
    • Proceedings of the Korean Information Science Society Conference / 2007.10d / pp.306-311 / 2007
  • The LEACH algorithm elects cluster-heads in a "random" manner to extend the lifetime of all nodes in a wireless sensor network, but neither the number of cluster-heads elected nor the locations of the elected nodes are properly distributed. To solve this problem, this paper proposes "GCCA: Graph Coloring based Clustering Algorithm for Wireless Sensor Networks", a new centralized algorithm that builds on a modified graph coloring problem and performs efficient clustering by electing well-distributed cluster-heads without using node location information. Experiments show that GCCA keeps the number of elected cluster-heads constant and spreads the elected nodes appropriately over the whole network area, and thus achieves better energy efficiency than LEACH.
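
As background for the graph coloring problem that GCCA builds on, here is a minimal sketch of the textbook greedy coloring heuristic. It is not GCCA itself: the abstract does not specify how the coloring problem is modified or how cluster-heads are derived from it. The function name and the example network are hypothetical.

```python
def greedy_coloring(adjacency):
    """Give each node the smallest color not already used by a colored neighbor."""
    colors = {}
    # Visit high-degree nodes first, a common ordering heuristic.
    for node in sorted(adjacency, key=lambda n: (-len(adjacency[n]), n)):
        used = {colors[nb] for nb in adjacency[node] if nb in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


# Hypothetical sensor network: an edge means two nodes are within radio range.
network = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
coloring = greedy_coloring(network)
print(coloring)
```

Because neighboring nodes never share a color, the nodes in any one color class are mutually non-adjacent, which hints at why a coloring can help spread selected nodes across a network.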

Hierarchical Structure in Semantic Networks of Japanese Word Associations

  • Miyake, Maki;Joyce, Terry;Jung, Jae-Young;Akama, Hiroyuki
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.321-329 / 2007
  • This paper reports on the application of network analysis approaches to investigate the characteristics of graph representations of Japanese word associations. Two semantic networks are constructed from two separate Japanese word association databases. The basic statistical features of the networks indicate that they have scale-free and small-world properties and that they exhibit hierarchical organization. A graph clustering method is also applied to the networks with the objective of generating hierarchical structures within the semantic networks. The method is shown to be an efficient tool for analyzing large-scale structures within corpora. To illustrate uses of the network clustering results, we briefly introduce two web-based applications: the first is a search system that highlights various possible relations between words according to association type, while the second presents the hierarchical architecture of a semantic network. The systems realize dynamic representations of network structures based on the relationships between words and concepts.

A Linear Clustering Method for the Scheduling of the Directed Acyclic Graph Model with Multiprocessors Using Genetic Algorithm (다중프로세서를 갖는 유방향무환그래프 모델의 스케쥴링을 위한 유전알고리즘을 이용한 선형 클러스터링 해법)

  • Sung, Ki-Seok;Park, Jee-Hyuk
    • Journal of Korean Institute of Industrial Engineers / v.24 no.4 / pp.591-600 / 1998
  • The scheduling of parallel computing systems consists of two procedures: the assignment of tasks to each available processor and the ordering of tasks in each processor. The assignment procedure is equivalent to clustering. Clustering is classified as linear or nonlinear according to the precedence relationships of the tasks in each cluster. A parallel computing system can be modeled as a Directed Acyclic Graph (DAG). According to granularity theory, DAGs are categorized into Coarse Grain Type (CDAG) and Fine Grain Type (FDAG). We propose a linear clustering method for scheduling CDAGs using a genetic algorithm. The method exploits the property that the optimal schedule of a CDAG is a linear clustering. We present computational comparisons between the proposed method for CDAGs and an existing method for general DAGs, including both CDAGs and FDAGs.
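
The linear versus nonlinear distinction can be made concrete with a small check. The sketch assumes the definition commonly used in the scheduling literature, namely that a clustering is linear when no two independent tasks (tasks with no precedence path between them) share a cluster. The abstract itself only says the classification depends on precedence relationships, so this definition, the function names, and the example DAG are assumptions.

```python
from itertools import combinations


def reachable_from(dag, start):
    """All tasks reachable from `start` by following precedence edges."""
    seen, stack = set(), [start]
    while stack:
        for nxt in dag.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_linear_clustering(dag, clusters):
    """True if every pair of tasks in the same cluster is ordered by precedence."""
    reach = {t: reachable_from(dag, t) for cluster in clusters for t in cluster}
    return all(
        b in reach[a] or a in reach[b]
        for cluster in clusters
        for a, b in combinations(cluster, 2)
    )


# Hypothetical diamond-shaped task graph: t1 precedes t2 and t3, both precede t4.
tasks = {"t1": ["t2", "t3"], "t2": ["t4"], "t3": ["t4"], "t4": []}
linear = is_linear_clustering(tasks, [["t1", "t2", "t4"], ["t3"]])
nonlinear = is_linear_clustering(tasks, [["t1", "t4"], ["t2", "t3"]])
print(linear, nonlinear)
```

In the second clustering, t2 and t3 could run in parallel but are placed together, so the clustering is nonlinear under this definition.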

K-Way Graph Partitioning: A Semidefinite Programming Approach (Semidefinite Programming을 통한 그래프의 동시 분할법)

  • Jaehwan, Kim;Seungjin, Choi;Sung-Yang, Bang
    • Proceedings of the Korean Information Science Society Conference / 2004.10a / pp.697-699 / 2004
  • Despite many successful spectral clustering algorithms (based on the spectral decomposition of the Laplacian(1) or stochastic matrix(2)), several problems remain unsolved. Most spectral clustering problems are based on the normalized cut algorithm(3) and are close to the classical graph partitioning problem, which is NP-hard. To obtain a good solution in polynomial time, a convex form must be established by using relaxation. In this paper, we apply a novel optimization technique, semidefinite programming (SDP), to the unsupervised clustering problem and present a new multiple partitioning method. Experimental results confirm that the proposed method improves clustering performance, especially on problems containing non-compact clusters, compared to previous multiple spectral clustering methods.
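
Spectral clustering is commonly presented as a relaxation of cut objectives such as the normalized cut. As a reference point, the sketch below evaluates the standard two-way normalized cut value, cut(A,B)/vol(A) + cut(A,B)/vol(B), for a given partition. It does not implement the paper's SDP method, and the function name and example graph are hypothetical.

```python
def normalized_cut(weights, part_a):
    """Two-way normalized cut of an undirected weighted graph.

    weights: {(u, v): w} with each edge listed once.
    part_a:  nodes on one side; every other node is on the other side.
    Raises ZeroDivisionError if either side has no incident edges.
    """
    part_a = set(part_a)
    cut = vol_a = vol_b = 0.0
    for (u, v), w in weights.items():
        for node in (u, v):  # each edge adds w to the degree of both endpoints
            if node in part_a:
                vol_a += w
            else:
                vol_b += w
        if (u in part_a) != (v in part_a):
            cut += w  # the edge crosses the partition
    return cut / vol_a + cut / vol_b


# Hypothetical graph: two triangles joined by one weak edge.
graph = {
    ("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
    ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0,
    ("c", "d"): 0.1,
}
balanced = normalized_cut(graph, {"a", "b", "c"})
lopsided = normalized_cut(graph, {"a"})
print(balanced, lopsided)
```

Splitting along the weak edge gives a much smaller value than isolating a single node, which is the behavior that makes this objective favor balanced, well-separated clusters.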

Document Summarization Based on Sentence Clustering Using Graph Division (그래프 분할을 이용한 문장 클러스터링 기반 문서요약)

  • Lee Il-Joo;Kim Min-Koo
    • The KIPS Transactions:PartB / v.13B no.2 s.105 / pp.149-154 / 2006
  • The main purpose of document summarization is to reduce the complexity of documents that consist of sub-themes and to create a summary that covers those sub-themes. This paper proposes a summarization system that extracts salient sentences according to sub-themes by using graph division. A document can be represented as a graph using representative terms chosen through term relativity analysis based on co-occurrence information. This graph is then subdivided to represent sub-themes based on connection information. Each divided graph acts as a cluster of closely related sentences. When salient sentences are extracted from the divided graphs, a summary consisting of the core sentences of the sub-themes can be produced. As a result, summarization quality is improved.