• Title/Summary/Keyword: Co-Clustering

A Mixed Co-clustering Algorithm Based on Information Bottleneck

  • Liu, Yongli;Duan, Tianyi;Wan, Xing;Chao, Hao
    • Journal of Information Processing Systems / v.13 no.6 / pp.1467-1486 / 2017
  • Fuzzy co-clustering is sensitive to noisy data. To overcome this noise sensitivity, possibilistic clustering relaxes the constraints in FCM-type fuzzy (co-)clustering. In this paper, we introduce a new possibilistic fuzzy co-clustering algorithm based on information bottleneck (ibPFCC). This algorithm combines fuzzy co-clustering and possibilistic clustering, and formulates an objective function whose distance function employs information bottleneck theory to measure the distance between a feature data point and a feature cluster centroid. Experiments were conducted on three datasets and one artificial dataset. The results show that ibPFCC outperforms prominent fuzzy (co-)clustering algorithms such as FCM, FCCM, RFCC and FCCI in terms of accuracy and robustness.

Clustering Approaches to Identifying Gene Expression Patterns from DNA Microarray Data

  • Do, Jin Hwan;Choi, Dong-Kug
    • Molecules and Cells / v.25 no.2 / pp.279-288 / 2008
  • Analysis of microarray data is essential for interpreting large amounts of gene expression data. In this review we focus on clustering techniques. The biological rationale for this approach is that many co-expressed genes are co-regulated, so identifying co-expressed genes could aid in functional annotation of novel genes, de novo identification of transcription factor binding sites, and elucidation of complex biological pathways. Co-expressed genes are usually identified in microarray experiments by clustering techniques. There are many such methods, and the results obtained even for the same datasets may vary considerably depending on the algorithms and dissimilarity metrics used, as well as on user-selectable parameters such as the desired number of clusters and initial values. Therefore, biologists who want to interpret microarray data should be aware of the weaknesses and strengths of the clustering methods used. In this review, we survey the basic principles of clustering DNA microarray data, from crisp clustering algorithms such as hierarchical clustering, K-means and self-organizing maps to more complex algorithms such as fuzzy clustering.

Multiview Data Clustering by using Adaptive Spectral Co-clustering

  • Son, Jeong-Woo;Jeon, Junekey;Lee, Sang-Yun;Kim, Sun-Joong
    • Journal of KIISE / v.43 no.6 / pp.686-691 / 2016
  • In this paper, we introduce adaptive spectral co-clustering, a spectral clustering method for multiview data, especially data with more than three views. Adaptive spectral co-clustering improves performance by sharing information across diverse views. To share information efficiently, a co-training approach is adopted. In the co-training step, a set of parameters is estimated to make all views in the data maximally independent, and information is then shared with respect to the estimated parameters. This co-training step makes information sharing more efficient than ordinary feature concatenation and than co-training methods that assume independence among views. Adaptive spectral co-clustering was evaluated on a synthetic dataset and a multilingual document dataset. The experimental results demonstrate its efficiency through the performance at every iteration and the similarity matrix generated with information sharing.

Improved Image Clustering Algorithm based on Weighted Sub-sampling

  • Choi, Byung-In;Nam, Sang-Hoon;Joung, Shi-Chang;Youn, Jung-Su;Yang, Yu-Kyung
    • Proceedings of the IEEK Conference / 2008.06a / pp.939-940 / 2008
  • In this paper, we propose a novel image clustering method based on weighted sub-sampling to reduce clustering time and the number of clusters for target detection and tracking. The proposed method first obtains a sub-sampled image in which each sample carries a weight equal to the number of target pixels in its sampling region. After the clustering procedure, the cluster center position is obtained using the weights of the target pixels in the cluster. The proposed method can therefore not only reduce clustering time but also obtain a proper cluster center.
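
The two steps this abstract describes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the square `block` sampling region, and the use of block midpoints as sample positions are assumptions made for illustration, and the clustering step itself is left out (the cells belonging to one cluster are simply passed in).

```python
def weighted_subsample(mask, block):
    """Sub-sample a binary target mask.

    Each output cell stores the number of target pixels (value 1) found in
    the corresponding block x block region, which serves as its weight.
    """
    rows, cols = len(mask), len(mask[0])
    out = []
    for r in range(0, rows, block):
        out_row = []
        for c in range(0, cols, block):
            count = sum(
                mask[i][j]
                for i in range(r, min(r + block, rows))
                for j in range(c, min(c + block, cols))
            )
            out_row.append(count)
        out.append(out_row)
    return out


def weighted_center(weights, cells, block):
    """Weighted mean position of one cluster, in full-resolution pixels.

    `cells` lists the (row, col) indices of the sub-sampled cells assigned to
    the cluster. Block midpoints are used as sample positions, which assumes
    the image size is a multiple of `block`.
    """
    total = sum(weights[r][c] for r, c in cells)
    if total == 0:
        raise ValueError("cluster contains no target pixels")
    half = (block - 1) / 2
    y = sum(weights[r][c] * (r * block + half) for r, c in cells) / total
    x = sum(weights[r][c] * (c * block + half) for r, c in cells) / total
    return y, x
```

Because each cell is weighted by its target-pixel count, the center is pulled toward densely populated blocks, which is what lets a coarse image still yield a reasonable center estimate.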

Clustering of Web Document Exploiting with the Union of Term frequency and Co-link in Hypertext

  • Lee, Kyo-Woon;Lee, Won-hee;Park, Heum;Kim, Young-Gi;Kwon, Hyuk-Chul
    • Journal of Korean Library and Information Science Society / v.34 no.3 / pp.211-229 / 2003
  • In this paper, we focus on how the number of words in a web document affects clustering performance. Our experimental results clearly show the relationship between the number of words and clustering performance. We also present an algorithm that supplements term-based clustering with the co-link frequency of web documents. The test bed for this research is 1,449 web documents in the 'Natural science' category of the Naver Directory. We clustered these documents by term-based clustering, link-based clustering, and a hybrid clustering method, and compared the results with the categories originally assigned in the Naver Directory.

Movie Recommendation Using Co-Clustering by Infinite Relational Models

  • Kim, Byoung-Hee;Zhang, Byoung-Tak
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.4 / pp.443-449 / 2014
  • User preferences for movies are observable outcomes of various factors related to user attributes and movie features. For movie recommendation, methods for analyzing the relations among users, movies, and preference patterns are essential. As a relational analysis tool, we focus on the Infinite Relational Model (IRM), which was introduced as a tool for multiple concept search. We show that IRM-based co-clustering on preference patterns and movie descriptors can be used as a first tool for movie recommender methods, especially content-based filtering approaches. By introducing well-defined tag sets for movies and performing three-way co-clustering on a movie-rating matrix and a movie-tag matrix, we discovered various explainable relations among users and movies. We suggest various uses of IRM-based co-clustering, especially for incremental and dynamic recommender systems.

A Multi-Dimensional Issue Clustering from the Perspective Consumers' Interests and R&D

  • Hyun, Yoonjin;Kim, Namgyu;Cho, Yoonho
    • Journal of Information Technology Services / v.14 no.1 / pp.237-249 / 2015
  • The volume of unstructured text data generated by social media has been increasing rapidly, and so has the use of text mining to support decision making. In particular, issue clustering, which determines new relations among various issues through clustering, has gained attention from many researchers. However, traditional issue clustering methods rely only on the co-occurrence frequency of issue keywords across many documents. An association between issues with a low co-occurrence frequency therefore cannot be discovered by these methods, even if the issues are strongly related from other perspectives. Issue clustering thus needs to be performed according to criteria that fit the perspective of the analysis and the purpose of use. In this study, we propose multi-dimensional issue clustering to overcome this limitation of traditional issue clustering. Specifically, we assert that issue clustering should be performed for a particular purpose. We analyze the results of applying our methodology to two specific perspectives on issue clustering: (i) consumers' interests, and (ii) related R&D terms.

Course Variance Clustering for Traffic Route Waypoint Extraction

  • Onyango Shem Otoi
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2022.06a / pp.277-279 / 2022
  • Rapid development and adoption of AIS as a surveillance tool has resulted in widespread application of data analysis technology, including AIS ship trajectory clustering. AIS data-based clustering has become an increasingly popular method for marine traffic pattern recognition, ship route prediction and anomaly detection in recent years. In this paper we propose a route waypoint extraction method that clusters ships' CoG variance trajectories using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm in both port approach channels and coastal waters. The algorithm discovers route waypoints effectively. The results of the study could be used in traffic route extraction and, moreover, to develop a maritime anomaly detection tool.

Clustering of Web Document Exploiting with the Co-link in Hypertext

  • 김영기;이원희;권혁철
    • Journal of Korean Library and Information Science Society / v.34 no.2 / pp.233-253 / 2003
  • Knowledge organization is the way we humans understand the world. Two types of information organization mechanisms are studied in information retrieval: classification and clustering. Classification organizes entities by pigeonholing them into predefined categories, whereas clustering organizes information by grouping similar or related entities together. Internet information resource systems extract keywords from the words that appear in a web document and build an inverted file. Term clustering based on grouping related terms, however, did not prove very successful and was mostly abandoned for documents written in different languages or for doorway pages composed only of anchor text. This study examines informetric analysis and the feasibility of clustering web documents based on the co-link topology of web pages.

Hybrid-clustering game Algorithm for Resource Allocation in Macro-Femto HetNet

  • Ye, Fang;Dai, Jing;Li, Yibing
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.4 / pp.1638-1654 / 2018
  • The heterogeneous network (HetNet) has been one of the key technologies in Long Term Evolution-Advanced (LTE-A) as capacity and coverage demands grow. However, the introduction of femtocells has brought serious co-layer and cross-layer interference, which has been a major factor affecting system throughput. It is generally acknowledged that resource allocation has a significant impact on suppressing interference and improving system performance. In this paper, we propose a hybrid-clustering algorithm based on the Matérn hard-core process (MHP) to restrain both kinds of co-channel interference in the HetNet. Because the hexagonal grid model and the homogeneous Poisson point process model, whose points are distributed completely at random, are impractical for establishing the system model, a HetNet model based on the MHP is adopted to capture the negatively correlated distribution of base stations. Based on this system model, the spectrum sharing problem with restricted spectrum resources is further analyzed. Using location information and the interference relations of base stations, a hybrid clustering method that takes into account the fairness of the two types of base stations is first proposed. An auction mechanism is then discussed to achieve spectrum sharing inside each cluster and avoid wasting spectrum resources. By combining clustering theory and the auction mechanism, the proposed algorithm can be applied to restrain the cross-layer and co-layer interference of a HetNet with a high density of base stations. Simulation results show that spectral efficiency and system throughput increase to a certain degree.
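
To make the "negative correlation" of base station locations concrete, here is a minimal Python sketch of Matérn hard-core thinning. It is an illustration under stated assumptions, not the paper's system model: the abstract does not say which Matérn variant is used, so the common type II construction is assumed; the function names are hypothetical; and a fixed number of uniform candidate points stands in for a true Poisson point process.

```python
import math
import random


def uniform_points(n, width, height, rng):
    """Draw n candidate locations uniformly in a width x height region.

    Assumption: a fixed n replaces the Poisson-distributed count of a
    homogeneous Poisson point process, to keep the sketch short.
    """
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


def matern_hard_core(candidates, radius, rng):
    """Matérn type II thinning of candidate points.

    Every candidate gets an independent uniform mark. A candidate is kept
    only if no other candidate closer than `radius` has a smaller mark, so
    any two retained points are at least `radius` apart.
    """
    marks = [rng.random() for _ in candidates]
    kept = []
    for i, (xi, yi) in enumerate(candidates):
        dominated = any(
            j != i
            and math.hypot(xi - xj, yi - yj) < radius
            and marks[j] < marks[i]
            for j, (xj, yj) in enumerate(candidates)
        )
        if not dominated:
            kept.append((xi, yi))
    return kept
```

The minimum separation is what distinguishes this model from the homogeneous Poisson point process mentioned in the abstract, where two base stations may fall arbitrarily close to each other.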