• Title/Summary/Keyword: Clustering Coefficient

STATISTICAL NOISE BAND REMOVAL FOR SURFACE CLUSTERING OF HYPERSPECTRAL DATA

  • Huan, Nguyen Van;Kim, Hak-Il
    • Proceedings of the KSRS Conference / 2008.10a / pp.111-114 / 2008
  • Noise bands can deform the typical shape of a spectrum and degrade clustering accuracy. This paper proposes a statistical approach to removing noise bands from hyperspectral data, using the correlation coefficient between bands as an indicator. When each band is treated as a random variable, two adjacent signal bands in hyperspectral data are highly correlated, whereas a noise band produces a low correlation. For clustering, the unsupervised ${\kappa}$-nearest neighbor clustering method is implemented with three well-accepted spectral matching measures, namely ED, SAM and SID. The paper also proposes a hierarchical scheme for combining those measures. Finally, a separability assessment based on the between-class and within-class scatter matrices evaluates the applicability of the proposed noise band removal method, and the paper presents a comparison of the spectral matching measures.
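The adjacent-band correlation idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the `threshold=0.5` cutoff, the rule "flag a band when it correlates poorly with every adjacent band", and the toy data are all assumptions. Note that under this rule a band at the edge of the spectrum is judged by its single neighbor.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant band: treated as uncorrelated (assumption)
    return cov / sqrt(vx * vy)

def flag_noise_bands(bands, threshold=0.5):
    """Return indices of bands whose correlation with every adjacent band
    falls below the threshold (hypothetical rule and cutoff)."""
    flagged = []
    for i, band in enumerate(bands):
        neighbors = [bands[j] for j in (i - 1, i + 1) if 0 <= j < len(bands)]
        if neighbors and all(pearson(band, nb) < threshold for nb in neighbors):
            flagged.append(i)
    return flagged

# Toy data: 5 bands observed over 6 pixels; band 2 is deliberately unrelated.
bands = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.1, 2.1, 2.9, 4.2, 5.1, 5.9],
    [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
    [1.2, 2.0, 3.1, 4.0, 5.2, 6.1],
    [0.9, 2.2, 3.0, 3.9, 5.0, 6.2],
]
noisy = flag_noise_bands(bands)
```

With this toy data only the middle band is flagged, because its neighbors on both sides correlate strongly with each other but not with it.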

Analysis of Assortativity in the Keyword-based Patent Network Evolution (키워드기반 특허 네트워크 진화에 따른 동종성 분석)

  • Choi, Jinho;Kim, Junguk
    • Journal of Internet Computing and Services / v.14 no.6 / pp.107-115 / 2013
  • Various networks can be observed in the world. Knowledge networks, which are closely related to technology and research, are especially important because they help us understand how knowledge is produced, and many studies of knowledge networks have therefore been conducted. The assortativity coefficient quantifies the tendency of nodes with similar properties to connect to one another. Its characteristics help us understand how technologies have evolved in the keyword-based patent network, which is considered a knowledge network. In a knowledge network where each node is a keyword, the relationships between keywords show the structure of the technology development process. In this paper, we propose two hypotheses based on previous research indicating that core nodes exist in the keyword network, and we conduct an assortativity analysis to verify them. First, the keyword-based patent network becomes disassortative over time. Our assortativity analysis confirms that the knowledge network shows disassortativity as it evolves. Second, as the keyword-based patent network becomes disassortative, clustering coefficients become lower. Testing this hypothesis, we confirm that the clustering coefficient also decreases as the assortativity coefficient of the network decreases. Another interesting result concerning the second hypothesis is that the clustering coefficient decreases much more sharply when the knowledge network is disassortative than when it is assortative.
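For readers unfamiliar with the two quantities compared in this abstract, the sketch below computes them on a small graph stored as a dictionary of neighbor sets. It assumes the standard definitions (local clustering coefficient as the fraction of connected neighbor pairs, degree assortativity as the Pearson correlation of the degrees at the two ends of each edge). The paper may measure assortativity on a different node property, and the graphs here are hypothetical.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: conventionally 0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

def degree_assortativity(adj):
    """Pearson correlation of the degrees at the two ends of each edge."""
    xs, ys = [], []
    for u in adj:
        for v in adj[u]:  # each undirected edge is visited in both directions
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    mean = sum(xs) / n  # xs and ys have the same mean and variance by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var if var else 0.0  # undefined for regular graphs; 0.0 here

# Hypothetical graphs: a star (hub "a") and a triangle with one pendant node.
star = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
tri = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
```

The star graph is the extreme disassortative case: its hub connects only to low-degree nodes, and because no two leaves are linked its clustering coefficient is zero.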

Temperature network analysis of the Korean peninsula linking by DCCA methodology (DCCA 방법으로 연결된 한반도의 기온 네트워크 분석)

  • Min, Seungsik
    • The Korean Journal of Applied Statistics / v.29 no.7 / pp.1445-1458 / 2016
  • This paper derives correlation coefficients using the detrended cross-correlation analysis (DCCA) method for 59 regional temperature series covering the 40 years from 1976 to 2015. The average, maximum, and minimum temperature series are analyzed in 4-year units. We consider a temperature correlation to exist between two regions during a unit period when their correlation coefficient is greater than or equal to 0.9, and we construct a network by linking such pairs of regions. Based on network theory, the average path length, clustering coefficient, assortativity, and modularity are derived. The results show that the temperature network satisfies the small-world property and exhibits assortativity and modularity.
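The network-construction step described here (link two regions when their correlation coefficient is at least 0.9) can be sketched as below. The correlation matrix is invented for illustration and is not DCCA output; the station labels and function names are hypothetical, and only the 0.9 cutoff comes from the abstract.

```python
from collections import deque

def threshold_network(labels, corr, cutoff=0.9):
    """Link every pair of regions whose correlation is >= cutoff."""
    adj = {name: set() for name in labels}
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if corr[i][j] >= cutoff:
                adj[labels[i]].add(labels[j])
                adj[labels[j]].add(labels[i])
    return adj

def average_path_length(adj):
    """Mean breadth-first-search distance over all ordered pairs of
    mutually reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Made-up correlation matrix for four hypothetical stations.
labels = ["A", "B", "C", "D"]
corr = [
    [1.00, 0.95, 0.80, 0.70],
    [0.95, 1.00, 0.92, 0.60],
    [0.80, 0.92, 1.00, 0.90],
    [0.70, 0.60, 0.90, 1.00],
]
adj = threshold_network(labels, corr)
apl = average_path_length(adj)
```

Here the cutoff keeps only the links A-B, B-C, and C-D, so the network is a simple chain.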

Cluster Analysis with Balancing Weight on Mixed-type Data

  • Chae, Seong-San;Kim, Jong-Min;Yang, Wan-Youn
    • Communications for Statistical Applications and Methods / v.13 no.3 / pp.719-732 / 2006
  • A set of clustering algorithms is presented in which a properly weighted distance formulation extends to data with mixed numeric and multiple binary values. Simple matching and Jaccard coefficients are used to measure the similarity between objects for multiple binary attributes. Similarities are converted to dissimilarities between the $i$th and $j$th objects. The performance of clustering algorithms with a balancing weight on different similarity measures is demonstrated. Our experiments show that clustering algorithms with a properly applied weight give a competitive recovery level when a data set with mixed numeric and multiple binary attributes is clustered.
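The two binary similarity measures named in this abstract, and their conversion to a dissimilarity, can be illustrated as follows. The weighted combination in `mixed_distance` is a hypothetical stand-in (the abstract does not give the paper's weighting formula), as are the function names and the `weight=0.5` default.

```python
from math import sqrt

def simple_matching(a, b):
    """Share of positions where two binary vectors agree (0-0 and 1-1 both count)."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

def jaccard(a, b):
    """Share of positions with a 1 in both vectors among positions with a 1 in either."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 1.0  # all-zero pair: 1.0 by convention (assumption)

def mixed_distance(num_a, num_b, bin_a, bin_b, weight=0.5, similarity=jaccard):
    """Hypothetical weighted sum of Euclidean distance on the numeric part and
    (1 - similarity) on the binary part; not the paper's formula."""
    numeric = sqrt(sum((x - y) ** 2 for x, y in zip(num_a, num_b)))
    binary = 1.0 - similarity(bin_a, bin_b)
    return weight * numeric + (1.0 - weight) * binary

a = [1, 0, 1, 0]
b = [1, 1, 0, 0]
smc = simple_matching(a, b)   # agrees on positions 0 and 3
jac = jaccard(a, b)           # one shared 1 among three positions with any 1
d = mixed_distance([0.0, 0.0], [3.0, 4.0], a, b)
```

The two coefficients differ because Jaccard ignores joint absences, which matters when most binary attributes are 0 for most objects.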

Entropy-based Correlation Clustering for Wireless Sensor Networks in Multi-Correlated Regional Environments

  • Nga, Nguyen Thi Thanh;Khanh, Nguyen Kim;Hong, Son Ngo
    • IEIE Transactions on Smart Processing and Computing / v.5 no.2 / pp.85-93 / 2016
  • The existence of correlation characteristics brings significant potential advantages to the development of efficient routing protocols in wireless sensor networks. This research proposes a new simple method of clustering sensor nodes into correlation groups in multiple-correlation areas. At first, the evaluation of joint entropy for multiple-sensed data is considered. Based on the evaluation, the definition of correlation region, based on entropy theory, is proposed. Following that, a correlation clustering scheme with less computation is developed. The results are validated with a real data set.

Colorectal Cancer Staging Using Three Clustering Methods Based on Preoperative Clinical Findings

  • Pourahmad, Saeedeh;Pourhashemi, Soudabeh;Mohammadianpanah, Mohammad
    • Asian Pacific Journal of Cancer Prevention / v.17 no.2 / pp.823-827 / 2016
  • Determination of the colorectal cancer stage is possible only after surgery based on pathology results. However, sometimes this may prove impossible. The aim of the present study was to determine colorectal cancer stage using three clustering methods based on preoperative clinical findings. All patients referred to the Colorectal Research Center of Shiraz University of Medical Sciences for colorectal cancer surgery during 2006 to 2014 were enrolled in the study. Accordingly, 117 cases participated. Three clustering algorithms were utilized including k-means, hierarchical and fuzzy c-means clustering methods. External validity measures such as sensitivity, specificity and accuracy were used for evaluation of the methods. The results revealed maximum accuracy and sensitivity values for the hierarchical and a maximum specificity value for the fuzzy c-means clustering methods. Furthermore, according to the internal validity measures for the present data set, the optimal number of clusters was two (silhouette coefficient) and the fuzzy c-means algorithm was more appropriate than the k-means clustering approach by increasing the number of clusters.
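The silhouette coefficient used in this abstract as an internal validity measure can be sketched in a few lines. This assumes the standard definition with Euclidean distance; the points and labelings are toy values, not the study's clinical data.

```python
from math import dist

def silhouette_score(points, labels):
    """Mean silhouette value (b - a) / max(a, b) over all points.
    Requires at least two distinct cluster labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data: two tight, well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # matches the visible grouping
bad = silhouette_score(points, [0, 1, 0, 1])   # splits each pair apart
```

Choosing the number of clusters by this measure amounts to running the clustering for several candidate counts and keeping the count with the highest mean silhouette.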

Plurality Rule-based Density and Correlation Coefficient-based Clustering for K-NN

  • Aung, Swe Swe;Nagayama, Itaru;Tamaki, Shiro
    • IEIE Transactions on Smart Processing and Computing / v.6 no.3 / pp.183-192 / 2017
  • k-nearest neighbor (K-NN) is a well-known machine learning classification algorithm that classifies a sample according to the nearest training examples in the feature space. However, K-NN is a lazy learning method. Therefore, if a K-NN-based system depends heavily on a huge amount of historical data to achieve an accurate prediction for a particular task, its processing time gradually degrades. We have noticed that many researchers consider only classification accuracy, but estimation speed also plays an essential role in real-time prediction systems. To compensate for this weakness, this paper proposes correlation coefficient-based clustering (CCC), aimed at improving the processing-time performance of K-NN, and plurality rule-based density (PRD), aimed at improving estimation accuracy. For the experiments, we used real datasets (on breast cancer, breast tissue, heart, and the iris) from the University of California, Irvine (UCI) machine learning repository. Real traffic data collected from Ojana Junction, Route 58, Okinawa, Japan, was also used to demonstrate the efficiency of the method. With these datasets, the new approach showed better processing-time performance than classical K-NN. In addition, through experiments on real-world datasets, we compared the prediction accuracy of our approach with that of density peaks clustering based on K-NN and principal component analysis (DPC-KNN-PCA).

On the Clustering Networks using the Kohonen's Self-Organization Architecture (코호넨의 자기조직화 구조를 이용한 클러스터링 망에 관한 연구)

  • Lee, Ji-Young
    • The Journal of Information Technology / v.8 no.1 / pp.119-124 / 2005
  • The learning procedure in a neural network consists of updating the weights between neurons. An inadequate initial learning coefficient causes excessive iterations of the learning process or incorrect learning results, and degrades learning efficiency. In this paper, an adaptive learning algorithm is proposed to increase the efficiency of the learning algorithms of Kohonen's self-organizing neural networks. The algorithm updates the weights adaptively while the learning procedure runs. To demonstrate its efficiency, the algorithm is tested on clustering with random weights. The results show a learning rate improved by about 42~55%, with fewer iterations and correct answers.

Multiview Data Clustering by using Adaptive Spectral Co-clustering (적응형 분광 군집 방법을 이용한 다중 특징 데이터 군집화)

  • Son, Jeong-Woo;Jeon, Junekey;Lee, Sang-Yun;Kim, Sun-Joong
    • Journal of KIISE / v.43 no.6 / pp.686-691 / 2016
  • In this paper, we introduce adaptive spectral co-clustering, a spectral clustering method for multiview data, especially data with more than three views. In adaptive spectral co-clustering, performance is improved by sharing information across diverse views. A co-training approach is adopted to make this information sharing efficient. In the co-training step, a set of parameters is estimated to make all views in the data maximally independent, and information is then shared with respect to the estimated parameters. This co-training step makes information sharing more efficient than ordinary feature concatenation and co-training methods that assume independence among views. Adaptive spectral co-clustering was evaluated on a synthetic dataset and a multilingual document dataset. The experimental results demonstrate its efficiency through the performance at every iteration and the similarity matrix generated with information sharing.

Clustering Meta Information of K-Pop Girl Groups Using Term Frequency-inverse Document Frequency Vectorization (단어-역문서 빈도 벡터화를 통한 한국 걸그룹의 음반 메타 정보 군집화)

  • JoonSeo Hyeon;JaeHyuk Cho
    • Journal of Platform Technology / v.11 no.3 / pp.12-23 / 2023
  • In the 2020s, the K-Pop market has been dominated by girl groups over boy groups and by the fourth generation over the third generation. This paper presents methods and results for lyric clustering to investigate whether the generation of girl groups has started to change. We collected meta-information for 1469 songs by 47 groups released from 2013 to 2022, divided it into lyric information and non-lyric meta-information, and quantified each separately. The lyric information was preprocessed by applying term frequency-inverse document frequency (TF-IDF) vectorization based on previous studies and then selecting only the top vector values. The non-lyric meta-information was preprocessed with one-hot encoding to reduce the bias of using only lyric information and to give better clustering results. On the preprocessed data, Spherical K-Means scored 129% and 45% higher than Hierarchical Clustering on the Silhouette Score and the Calinski-Harabasz Score, respectively. This paper is expected to contribute to the study of Korean popular song development and to the analysis and clustering of girl group lyrics.
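The lyric preprocessing step (TF-IDF vectorization followed by keeping only the top-weighted terms) can be sketched as below. This uses one common TF-IDF variant, relative term frequency times the natural log of N/df. The paper's exact weighting, tokenization, and cutoff are not stated in the abstract, so those choices, the function names, and the toy documents are assumptions.

```python
from math import log

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document, with
    weight = (count / doc length) * ln(N / document frequency)."""
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    vectors = []
    for doc in docs:
        vec = {}
        for term in set(doc):
            tf = doc.count(term) / len(doc)
            vec[term] = tf * log(n / df[term])
        vectors.append(vec)
    return vectors

def top_terms(vec, k):
    """The k highest-weighted terms, with ties broken alphabetically."""
    return sorted(vec, key=lambda t: (-vec[t], t))[:k]

# Toy tokenized "lyrics" (hypothetical).
docs = [
    ["love", "you", "love"],
    ["you", "dance"],
    ["you", "night", "night"],
]
vectors = tfidf(docs)
best = top_terms(vectors[0], 1)
```

A term that appears in every document, such as "you" here, receives zero weight under this variant, which is why selecting the top values filters out words common to all songs.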
