• Title/Summary/Keyword: 클러스터 간 유사도 (inter-cluster similarity)

Search Result 106, Processing Time 0.027 seconds

Construction of Korean Thesaurus Using Machine Readable Dictionary (기계가독사전을 이용한 한국어 시소러스 구축)

  • Lee, Ju-Ho;Un, Koaung-Hi;Choi, Key-Sun
    • Annual Conference on Human and Language Technology / 2001.10d / pp.273-279 / 2001
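  • A thesaurus is a highly useful resource that can be applied in many areas of natural language processing. This paper describes a semi-automatic process for building a Korean noun thesaurus from the Urimal Keun Sajeon (우리말 큰사전) dictionary, starting from an existing thesaurus. First, semantic numbers were assigned to each sense of the basic nouns extracted from the dictionary, focusing on high-frequency words in a corpus, and the results were used to form a cluster for each semantic number from the dictionary definitions. Next, for noun senses that received no semantic number in the previous step, the similarity between each definition and the clusters was computed, and the most similar semantic number was proposed as a candidate. Finally, hyperlinks in the dictionary were used to assign semantic numbers to the noun senses that still lacked one. At each step, manual post-processing was used to improve the accuracy of the thesaurus.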

A Clustering Technique using Common Structures of XML Documents (XML 문서의 공통 구조를 이용한 클러스터링 기법)

  • Hwang, Jeong-Hee;Ryu, Keun-Ho
    • Journal of KIISE:Databases / v.32 no.6 / pp.650-661 / 2005
  • As the Internet grows, the use of XML, a standard for semi-structured documents, is increasing, and work on the integration and retrieval of XML documents is ongoing. Efficient integration and retrieval, however, depend on clustering XML documents with similar structure. Conventional XML clustering approaches use a hierarchical clustering algorithm that produces the required number of clusters through repeated merging, but computing the similarity between XML documents is difficult and repeating the similarity comparison is time-consuming. To address this problem, we use a clustering algorithm for transactional data that scales to large data sets. In this paper we use the common structures of XML documents that have no DTD or schema. To do so, we extract representative structures by decomposing the tree model that represents each XML document, and we cluster the documents using the extracted structures. We also show the efficiency of the proposed method by comparing it with the previous method.

An Efficient Clustering Algorithm based on Heuristic Evolution (휴리스틱 진화에 기반한 효율적 클러스터링 알고리즘)

  • Ryu, Joung-Woo;Kang, Myung-Ku;Kim, Myung-Won
    • Journal of KIISE:Software and Applications / v.29 no.1_2 / pp.80-90 / 2002
  • Clustering is a useful technique for grouping data points so that the points within a single group (cluster) have similar characteristics. Many clustering algorithms have been developed and used in engineering applications such as pattern recognition and image processing. Recently, clustering has drawn increasing attention as an important technique in data mining. However, clustering algorithms such as K-means and Fuzzy C-means have two difficulties: the number of clusters must be determined a priori, and the clustering result depends on the initial set of clusters, which can lead to undesirable results. In this paper, we propose a new clustering algorithm that solves these problems. Our method uses an evolutionary algorithm to solve the local optima problem, in which clustering converges to an undesirable state when it starts from an inappropriate set of clusters. We also adopt a new measure of how well the data are clustered, defined in terms of both intra-cluster dispersion and inter-cluster separability. With this measure, the number of clusters is determined automatically as the result of the optimization process. In addition, we combine heuristics, that is, problem-specific knowledge, with the evolutionary algorithm to speed up its search. We tested our algorithm on several sets of multi-dimensional data, and the results show that it outperforms the existing algorithms.
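
The abstract does not give the formula of its cluster quality measure. As a rough illustration only, the sketch below assumes one simple way to combine the two ingredients it names: mean squared distance of points to their own centroid (intra-cluster dispersion) divided by the smallest squared distance between centroids (inter-cluster separability). The function names and the ratio itself are assumptions, not the authors' measure.

```python
from itertools import combinations


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def validity(clusters):
    """Assumed index (lower is better): dispersion / separation.

    Requires at least two non-empty clusters with distinct centroids.
    """
    centers = [centroid(c) for c in clusters]
    n = sum(len(c) for c in clusters)
    # Intra-cluster dispersion: mean squared distance to the own centroid.
    dispersion = sum(
        sq_dist(p, ctr) for c, ctr in zip(clusters, centers) for p in c
    ) / n
    # Inter-cluster separability: closest pair of centroids.
    separation = min(sq_dist(a, b) for a, b in combinations(centers, 2))
    return dispersion / separation


# Two partitions of the same four points.
good = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]
print(validity(good), validity(bad))
```

Because an index of this kind can be compared across partitions with different numbers of clusters, an optimizer could in principle search over the cluster count as well, which is the role the abstract assigns to its measure.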

Promotion Strategies for Daegu-Kyungbuk Mobile Cluster: Searching for Alternative Regional Innovation Governance (대구.경북 모바일 클러스터 육성전략: 지역혁신 거버넌스의 대안 모색)

  • Lee, Jeong-Hyop;Kim, Hyung-Joo
    • Journal of the Economic Geographical Society of Korea / v.12 no.4 / pp.477-493 / 2009
  • This research examines regional innovation governance in Korea, identifies its structural problems, and explores alternative governance strategies. The alternative governance was explored through a case study of the Daegu-Kyungbuk mobile cluster, in whose formation Samsung is the anchor institution. Regional innovation governance is defined here as a policy system that links the knowledge generation and diffusion subsystem with the knowledge application and exploitation subsystem, together with the institutional conditions that steer the system. The World Bank's "Social Capital Assessment Tool (SOCAT)" was used to assess cluster governance. The regional innovation governance of the Daegu-Kyungbuk mobile cluster is characterized by production networks dominated by one-to-one relationships between Samsung and hardware/software developers, decentralized R&D networks, and policy networks with multiple hubs. Major policy agents have not developed networks with local companies, and interactions among the policy agents are rare. Local companies, especially software developers, responded that they have cooperated on local problem solving and share community goals; however, their degree of trust in major local project leaders is not high. Local hardware/software developers with core technologies need to cooperate to develop similar technologies or products in the Daegu-Kyungbuk mobile cluster. Regional administrative actors, such as the City of Daegu and Kyungsangbuk-do, and diverse innovation-related institutes should build a cooperative environment in which diverse project-based cooperation units are continually created, dissolved, and recreated.

An Analysis on the Linkage Structure of Industrial Complexes (Clusters) in the Internal and External Capital Region (수도권 산업단지(클러스터)의 광역권 내부 및 외부 연계구조 분석)

  • Koo, Yang-Mi;Nahm, Kee-Bom;Park, Sam-Ock
    • Journal of the Economic Geographical Society of Korea / v.13 no.2 / pp.181-195 / 2010
  • Industrial complex (innovative cluster) policy is shifting toward building linkage structures within Mega Economic Regions, in line with the national Mega Economic Region policy. The aim of this analysis is to derive a hypothetical linkage structure for industrial complexes inside and outside the Capital Region. First, survey data from firms located in the industrial complexes are used to identify the firms' regional linkages within the local area and inside and outside the Mega Economic Region. Next, a measure of structural similarity between industrial complexes is calculated from the number of employees by industrial sector. Finally, by jointly considering the geographical distance between industrial complexes, the share of each industrial sector, and the location quotient, a hub-and-spoke linkage structure between clusters is derived.
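
Two of the quantities mentioned in this abstract can be sketched in a few lines. The location quotient below follows its standard definition (a sector's local employment share divided by its share in a reference region). The abstract does not say which structural similarity measure is used, so cosine similarity between sector employment vectors is an assumption here, and all sector names and employee counts are invented for illustration.

```python
import math


def location_quotient(local, reference):
    """Standard LQ per sector: local employment share / reference share."""
    local_total = sum(local.values())
    ref_total = sum(reference.values())
    return {
        sector: (local[sector] / local_total) / (reference[sector] / ref_total)
        for sector in local
    }


def cosine_similarity(a, b):
    """Assumed similarity between two sector-employment profiles."""
    sectors = sorted(set(a) | set(b))
    va = [a.get(s, 0) for s in sectors]
    vb = [b.get(s, 0) for s in sectors]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm


# Hypothetical employee counts by sector (not from the paper).
complex_a = {"electronics": 600, "machinery": 300, "textiles": 100}
complex_b = {"electronics": 300, "machinery": 150, "textiles": 50}
complex_c = {"electronics": 100, "machinery": 300, "textiles": 600}
reference = {"electronics": 3000, "machinery": 4000, "textiles": 3000}

lq_a = location_quotient(complex_a, reference)
sim_ab = cosine_similarity(complex_a, complex_b)
sim_ac = cosine_similarity(complex_a, complex_c)
print(lq_a, sim_ab, sim_ac)
```

An LQ above 1 marks a sector that is more concentrated in the complex than in the reference region. Complexes A and B have the same sector mix at different sizes, so their cosine similarity is 1 even though their head counts differ.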

Web Document Clustering based on Graph using Hyperlinks (하이퍼링크를 이용한 그래프 기반의 웹 문서 클러스터링)

  • Lee, Joon;Kang, Jin-Beom;Choi, Joong-Min
    • 한국HCI학회:학술대회논문집 / 2009.02a / pp.590-595 / 2009
  • Given the exponential growth of web documents on the Internet, improving the performance of web document clustering methods is important. Web document clustering techniques can provide accurate information and fast information retrieval by clustering web documents according to their semantic relationships. The mesh-graph-based clustering method achieves high recall by calculating the similarity between documents, but it has a high computational cost. This paper proposes a clustering method that uses hyperlinks, a structural feature of web documents, to maintain effectiveness while reducing the computational cost.

Cluster Merging Using Density based Fuzzy C-Means algorithm (밀도 기반의 퍼지 C-Means 알고리즘을 이용한 클러스터 합병)

  • 한진우;전성해;오경환
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.05a / pp.235-238 / 2003
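  • The performance of the Fuzzy C-Means (FCM) algorithm varies considerably with the number and positions of the initial cluster centers. In general, however, the number of cluster centers is chosen subjectively and arbitrarily by the analyst, so clustering proceeds without regard to the structure of the original data and may fail to produce an optimal result. To obtain a fuzzy clustering closer to the structure of the original data, this paper proposes an FCM that uses grid-based data density, together with a technique for merging the clusters determined by this density-based FCM. The N-dimensional data space is divided into an N-dimensional grid, and the number and positions of the initial cluster centers are determined from the density of each grid cell. After initialization, clustering is performed with FCM inside each grid cell, and the clustering results of neighboring cells are then merged using an inter-cluster similarity measure, yielding a clustering close to the natural structure of the data. The improved performance of the proposed cluster merging technique was confirmed using data from the UCI Machine Learning Repository.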
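
The grid-based initialization and the FCM membership step described above can be sketched as follows. The membership formula is the standard FCM update; the density rule (a cell contributes one initial center when it holds at least `min_count` points) and the parameter names `cell_size` and `min_count` are assumptions, since the abstract does not state the exact rule. For brevity the memberships are computed against all centers at once rather than inside each cell, and the merging step is not shown.

```python
import math
from collections import defaultdict


def grid_initial_centers(points, cell_size, min_count):
    """Assumed rule: one initial center (the cell mean) per dense grid cell."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(math.floor(x / cell_size) for x in p)
        cells[key].append(p)
    centers = []
    for key in sorted(cells):
        members = cells[key]
        if len(members) >= min_count:
            dim = len(members[0])
            centers.append(
                tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
            )
    return centers


def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(point, c) for c in centers]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # The point coincides with one or more centers: share membership there.
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]


# Hypothetical 2-D data: two dense regions and one isolated point.
points = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.1),
          (5.1, 5.2), (5.3, 5.1), (5.2, 5.4),
          (9.0, 0.5)]
centers = grid_initial_centers(points, cell_size=1.0, min_count=3)
u = fcm_memberships(points[0], centers)
print(centers, u)
```

With these values the isolated point's cell is too sparse to seed a center, so the number of initial clusters follows from the data rather than from an analyst's choice.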

Document clustering based on summarized document using K-means algorithm (요약 문서 기반 문서 클러스터링)

  • Oh, Hyung-Jin;Ko, Ji-Hyun;An, Dong-Un;Chung, Sung-Jong
    • Proceedings of the Korea Information Processing Society Conference / 2002.04a / pp.589-592 / 2002
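  • In an information retrieval system, document clustering groups the documents retrieved for a user query into clusters according to their mutual relevance and presents the clusters as the search result. In this paper, both the summaries produced by an automatic document summarizer and the full texts of the documents retrieved for a user query are clustered dynamically based on inter-document similarity. An evaluation of the implemented system showed that clustering the summaries achieved better precision than clustering the full texts.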

GORank: Semantic Similarity Search for Gene Products using Gene Ontology (GORank: Gene Ontology를 이용한 유전자 산물의 의미적 유사성 검색)

  • Kim, Ki-Sung;Yoo, Sang-Won;Kim, Hyoung-Joo
    • Journal of KIISE:Databases / v.33 no.7 / pp.682-692 / 2006
  • Searching for gene products that have similar biological functions is crucial in bioinformatics. Modern biological databases describe the functions of gene products using the Gene Ontology (GO). In this paper, we propose a technique for semantic similarity search over gene products using GO annotation information. For this purpose, we define an information-theoretic measure of semantic similarity between gene products and propose a search algorithm that uses this measure. We adapt Fagin's Threshold Algorithm to process semantic similarity queries as follows. First, we redefine the threshold for our measure, because our similarity function is not monotonic. Then we propose cluster-skipping and an access ordering of the inverted index lists to reduce the number of disk accesses. Experiments with real GO and annotation data show that GORank is efficient and scalable.
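
For background, the classic form of Fagin's Threshold Algorithm that the abstract builds on can be sketched as below. This is the textbook version with a monotone aggregate (a plain sum of non-negative scores), not GORank's adaptation: the redefined threshold for the non-monotonic measure, cluster-skipping, and access ordering are not reproduced. The list contents and the function name are invented for illustration.

```python
import heapq


def threshold_algorithm(lists, k):
    """Top-k objects by summed score over several score lists.

    `lists` holds dicts mapping object -> non-negative score; an object
    missing from a list scores 0 there. Returns (top_k, rows_read).
    """
    # Sorted access: each list ordered by descending score.
    sorted_lists = [
        sorted(lst.items(), key=lambda kv: (-kv[1], kv[0])) for lst in lists
    ]
    seen = {}
    depth = max(len(s) for s in sorted_lists)
    for row in range(depth):
        last_scores = []
        for s in sorted_lists:
            if row < len(s):
                obj, score = s[row]
                last_scores.append(score)
                if obj not in seen:
                    # Random access: fetch the object's score in every list.
                    seen[obj] = sum(lst.get(obj, 0.0) for lst in lists)
            else:
                last_scores.append(0.0)  # exhausted list bounds unseen at 0
        # No unseen object can score higher than this threshold.
        threshold = sum(last_scores)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top, row + 1
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1]), depth


# Hypothetical per-term score lists for four objects.
list_1 = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
list_2 = {"b": 0.9, "a": 0.7, "d": 0.4, "c": 0.2}
result, rows_read = threshold_algorithm([list_1, list_2], k=1)
print(result, rows_read)
```

The early stop is valid only because the sum is monotone: an unseen object cannot beat the sum of the last scores read. That guarantee fails for a non-monotonic similarity function, which is why the abstract says the threshold must be redefined.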

A clustering algorithm based on dynamic properties in Mobile Ad-hoc network (에드 혹 네트워크에서 노드의 동적 속성 기반 클러스터링 알고리즘 연구)

  • Oh, Young-Jun;Woo, Byeong-Hun;Lee, Kang-Whan
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.3 / pp.715-723 / 2015
  • In this paper, we propose a context-aware routing algorithm, the DDV (Dynamic Direction Vector)-hop algorithm, for Mobile Ad Hoc Networks (MANETs). Existing MANET algorithms are vulnerable to the dynamic network topology and lack network scalability under node mobility. The proposed algorithm forms clusters using a range of directions and a velocity threshold relative to the base station, and it calculates the probability of replacing the cluster head node from direction and velocity in order to maintain cluster formation. The DDV algorithm forms a cluster around the cluster head node. Simulation results show that our scheme maintains an appropriate number of clusters and cluster members regardless of topology changes.