• Title/Summary/Keyword: Clustering coefficient


An Optimal Cluster Analysis Method with Fuzzy Performance Measures (퍼지 성능 측정자를 결합한 최적 클러스터 분석방법)

  • 이현숙;오경환
    • Journal of the Korean Institute of Intelligent Systems / v.6 no.3 / pp.81-88 / 1996
  • Cluster analysis partitions a collection of data points into a number of clusters such that the data points inside a cluster share a certain degree of similarity; it is a fundamental process of data analysis. It has therefore played an important role in solving many problems in pattern recognition and image processing. To this end, many clustering algorithms based on distance criteria have been developed, and fuzzy set theory has been introduced to reflect the description of real data, where boundaries might be fuzzy. If fuzzy cluster analysis is to make a significant contribution to engineering applications, much more attention must be paid to the fundamental question of cluster validity, that is, how well the analysis has identified the structure present in the data. Several validity functionals, such as the partition coefficient, classification entropy, and proportion exponent, have been used to measure validity mathematically. However, because cluster validity involves complex aspects, it is difficult to measure with a single function, as conventional studies do. In this paper, we propose four performance indices and a way to measure the quality of the clustering formed by a given learning strategy.
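
Two of the validity functionals named in this abstract can be illustrated with a short sketch. It assumes the commonly cited Bezdek definitions (partition coefficient as the mean of squared memberships, classification entropy as the mean of -u log u), which the abstract itself does not spell out. The function names and sample membership matrices are hypothetical, and the paper's four proposed performance indices are not reproduced here.

```python
import math

def partition_coefficient(u):
    """Mean of squared memberships; u[i][k] is the membership of point i in cluster k."""
    n = len(u)
    return sum(m * m for row in u for m in row) / n

def classification_entropy(u):
    """Mean of -u*log(u) over all memberships (zero memberships contribute nothing)."""
    n = len(u)
    return -sum(m * math.log(m) for row in u for m in row if m > 0) / n

# Hypothetical membership matrices for two points and two clusters.
crisp = [[1.0, 0.0], [0.0, 1.0]]   # every point belongs fully to one cluster
fuzzy = [[0.5, 0.5], [0.5, 0.5]]   # every point is split evenly

print(partition_coefficient(crisp), classification_entropy(crisp))
print(partition_coefficient(fuzzy), classification_entropy(fuzzy))
```

Under these definitions a crisp partition gives the largest partition coefficient and the smallest entropy, while an evenly split partition gives the opposite extremes.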


Analysis of Genetic Relationship among Cymbidium germplasms Using RAPD and URP (RAPD와 URP를 이용한 심비디움 유전자원 유연관계 분석)

  • Park, Pue Hee;Kim, Mi Seon;Lee, Young Ran;Park, Pil Man;Lee, Dong Soo;Yae, Byeong Woo
    • FLOWER RESEARCH JOURNAL / v.18 no.3 / pp.201-206 / 2010
  • The genetic relationship among 48 Cymbidium cultivars was analyzed using randomly amplified polymorphic DNA (RAPD) with eighty 10-mer random primers (Operon Technologies) and twelve 20-mer random primers. The 48 Cymbidium cultivars included 34 oriental Cymbidium, 7 hybrids, and 7 western Cymbidium. Polymerase chain reaction with thirty selected 10-mer primers and nine 20-mer primers generated 407 (9.9 per primer) and 56 (9.5 per primer) polymorphic bands, respectively. The polymorphic fragments ranged from 0.4 to 1.5 kb in size. A dendrogram was constructed using the UPGMA clustering algorithm based on genetic similarity. The 48 Cymbidium cultivars were classified into four major groups at a similarity coefficient of 0.638.

Genetic characteristics of Korean Jeju Black cattle with high density single nucleotide polymorphisms

  • Alam, M. Zahangir;Lee, Yun-Mi;Son, Hyo-Jung;Hanna, Lauren H.;Riley, David G.;Mannen, Hideyuki;Sasazaki, Shinji;Park, Se Pill;Kim, Jong-Joo
    • Animal Bioscience / v.34 no.5 / pp.789-800 / 2021
  • Objective: Conservation and genetic improvement of cattle breeds require information about the genetic diversity and population structure of the cattle. In this study, we investigated the genetic diversity and population structure of three cattle breeds in the Korean peninsula. Methods: Jeju Black, Hanwoo, and Holstein cattle in Korea, together with six foreign breeds, were examined. Genetic diversity within the cattle breeds was analyzed with minor allele frequency (MAF), observed and expected heterozygosity (HO and HE), inbreeding coefficient (FIS), and past effective population size. Molecular variance and population structure among the nine breeds were analyzed using a model-based clustering method. Genetic distances between breeds were evaluated with Nei's genetic distance and Weir and Cockerham's FST. Results: Our results revealed that Jeju Black cattle had the lowest level of heterozygosity (HE = 0.21) among the studied taurine breeds, and an average MAF of 0.16. The level of inbreeding was -0.076 for Jeju Black and ranged from -0.018 to -0.118 for the other breeds. Principal component analysis and a neighbor-joining tree showed a clear separation of Jeju Black cattle from other local (Hanwoo and Japanese cattle) and taurine/indicine cattle breeds in the evolutionary process, and a distinct admixture pattern in which Jeju Black cattle did not cluster with the other studied populations. The FST value between Jeju Black cattle and Hanwoo was 0.106, the lowest among the pairs of breeds (the others ranging from 0.161 to 0.274), indicating some degree of genetic closeness of Jeju Black cattle with Hanwoo. The past effective population size of Jeju Black cattle was very small, i.e., 38 at 13 generations ago, compared with 209 for Hanwoo. Conclusion: This study indicates the genetic uniqueness of Jeju Black cattle. 
However, a small effective population size of Jeju Black cattle indicates the requirement for an implementation of a sustainable breeding policy to increase the population for genetic improvement and future conservation.
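
The negative inbreeding values reported in this abstract are easier to read against the usual textbook definitions, sketched below for a single biallelic locus: HE = 2p(1-p) under Hardy-Weinberg proportions and FIS = 1 - HO/HE. These formulas are an assumption, since the abstract does not state how the quantities were computed, and the function names and input numbers are hypothetical rather than taken from the study.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p: 2p(1-p)."""
    return 2.0 * p * (1.0 - p)

def inbreeding_coefficient(h_obs, h_exp):
    """F_IS = 1 - HO/HE; negative when observed heterozygosity exceeds expectation."""
    return 1.0 - h_obs / h_exp

# Hypothetical inputs, NOT values from the study.
p = 0.2          # allele frequency
h_obs = 0.344    # observed heterozygosity

h_exp = expected_heterozygosity(p)
f_is = inbreeding_coefficient(h_obs, h_exp)
print(h_exp, f_is)
```

With these definitions, a negative FIS indicates more heterozygotes than Hardy-Weinberg proportions predict, not a negative amount of inbreeding.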

The Effect of Small-World Structure in Team Processes on Team Performance (팀 프로세스의 작은 세상 구조가 팀 성과에 미치는 영향)

  • Seo, Il-Jung
    • The Journal of the Korea Contents Association / v.19 no.3 / pp.539-547 / 2019
  • This study investigated the effect of small-world structure in team processes on team performance. I discussed the theoretical relationship between small-world structure in team processes and team performance and analyzed the relationship using pass data from soccer teams. I constructed 128 pass networks from the pass data of the 2014 FIFA World Cup and then measured the structural features that indicate small-world structure in the networks. Correlation and regression analyses were performed to examine the strength and direction of the relationship. According to the results, clustering has an exponential relationship with team performance, and connectivity has a logarithmic relationship with team performance. Overall, I found a positive effect of small-world structure in team processes on team performance. Through theoretical discussion and empirical analysis, this study found that small-world structure in team processes increases team performance by facilitating task coordination and collaboration between team members.

On-Line Social Network Generation Model (온라인 소셜 네트워크 생성 모델)

  • Lee, Kang-Won
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.7 / pp.914-924 / 2020
  • In this study, we developed an artificial network generation model that can generate on-line social networks. The proposed model not only represents the scale-free and small-world properties but also produces networks with various values of topological characteristics by controlling two input parameters: K, which controls the strength of preferential attachment, and P, which controls the clustering coefficient. We found that on-line social networks can be generated with combinations of K (0~10) and P (0.3~0.5), or with K=0 and P=0.9. Under these combinations of K and P, the small-world and scale-free properties are well represented: the node degree distribution follows a power law, clustering coefficients are between 0.130 and 0.238, and the average shortest path distance is between 5.641 and 5.985. We also found that the on-line social network properties are maintained as the network size increases from 5,000 to 10,000 nodes.

A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing

  • Hyeonwoo Kim;Jiwon Kim;Ji Won Cho;Kwang-Sung Ahn;Dong-Il Park;Sangsoo Kim
    • Genomics & Informatics / v.21 no.3 / pp.40.1-40.11 / 2023
  • Microbial community profiling using 16S rRNA amplicon sequencing allows for taxonomic characterization of diverse microorganisms. While amplicon sequence variant (ASV) methods are increasingly favored for their fine-grained resolution of sequence variants, they often discard substantial portions of sequencing reads during quality control, particularly in datasets with large numbers of samples. We present a streamlined pipeline that integrates FastP for read trimming, HmmUFOtu for operational taxonomic units (OTU) clustering, Vsearch for chimera checking, and Kraken2 for taxonomic assignment. To assess the pipeline's performance, we reprocessed two published stool datasets of normal Korean populations: one with 890 and the other with 1,462 independent samples. In the first dataset, HmmUFOtu retained 93.2% of over 104 million read pairs after quality trimming, discarding chimeric or unclassifiable reads, while DADA2, a commonly used ASV method, retained only 44.6% of the reads. Nonetheless, both methods yielded qualitatively similar β-diversity plots. For the second dataset, HmmUFOtu retained 89.2% of read pairs, while DADA2 retained a mere 18.4% of the reads. HmmUFOtu, being a closed-reference clustering method, facilitates merging separately processed datasets, with shared OTUs between the two datasets exhibiting a correlation coefficient of 0.92 in total abundance (log scale). While the first two dimensions of the β-diversity plot exhibited a cohesive mixture of the two datasets, the third dimension revealed the presence of a batch effect. Our comparative evaluation of ASV and OTU methods within this streamlined pipeline provides valuable insights into their performance when processing large-scale microbial 16S rRNA amplicon sequencing data. The strengths of HmmUFOtu and its potential for dataset merging are highlighted.

Keyword Network Analysis for Technology Forecasting (기술예측을 위한 특허 키워드 네트워크 분석)

  • Choi, Jin-Ho;Kim, Hee-Su;Im, Nam-Gyu
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.227-240 / 2011
  • New concepts and ideas often result from extensive recombination of existing concepts or ideas. Both researchers and developers build on existing concepts and ideas in published papers or registered patents to develop new theories and technologies that in turn serve as a basis for further development. As the importance of patents increases, so does that of patent analysis. Patent analysis is largely divided into network-based and keyword-based analyses. The former lacks the ability to analyze information technology in detail, while the latter is unable to identify the relationships between such technologies. To overcome the limitations of network-based and keyword-based analyses, this study blends the two methods and proposes a keyword-network-based analysis methodology. In this study, we collected significant technology information from each patent related to Light Emitting Diode (LED) through text mining, built a keyword network, and then executed a community network analysis on the collected data. The results of the analysis are as follows. First, the patent keyword network showed very low density and an exceptionally high clustering coefficient. Technically, density is obtained by dividing the number of ties in a network by the number of all possible ties. The value ranges between 0 and 1, with higher values indicating denser networks and lower values indicating sparser networks. In real-world networks, the density varies depending on the size of a network; increasing the size of a network generally leads to a decrease in the density. The clustering coefficient is a network-level measure that illustrates the tendency of nodes to cluster in densely interconnected modules. This measure helps reveal the small-world property, in which a network is highly clustered yet has a small average distance between nodes despite its large number of nodes. 
Therefore, the low density of the patent keyword network means that its nodes are connected sporadically, and the high clustering coefficient shows that nodes in the network are closely connected to one another. Second, the cumulative degree distribution of the patent keyword network, like that of other knowledge networks such as citation or collaboration networks, followed a clear power-law distribution. A well-known mechanism behind this pattern is preferential attachment, whereby a node with more links is likely to attain further new links as the network evolves. Unlike the normal distribution, the power-law distribution does not have a representative scale. This means that one cannot pick a representative or an average value because there is always a considerable probability of finding much larger values. Networks with power-law distributions are therefore often referred to as scale-free networks. The presence of a heavy-tailed scale-free distribution represents the fundamental signature of an emergent collective behavior of the actors who contribute to forming the network. In our context, the more frequently a patent keyword is used, the more often it is selected by researchers and associated with other keywords or concepts to constitute and convey new patents or technologies. The evidence of a power-law distribution suggests that preferential attachment explains the origin of the heavy-tailed distribution in the growing patent keyword network. Third, we found that among keywords that flowed into a particular field, the vast majority of keywords with new links join existing keywords in the associated community in forming the concept of a new patent. This finding held for both the short-term (4-year) and long-term (10-year) analyses. 
Furthermore, using the keyword combination information that was derived from the methodology suggested by our study enables one to forecast which concepts combine to form a new patent dimension and refer to those concepts when developing a new patent.
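
The density definition given in this abstract (ties divided by all possible ties) and the idea of clustering can be made concrete with a small sketch. It assumes an undirected, unweighted graph stored as a dictionary of neighbor sets, and it uses the average local clustering coefficient as the network-level measure, which is one common choice and may differ from the measure used in the study. All names and the toy graph are hypothetical.

```python
from itertools import combinations

def density(adj):
    """Number of ties divided by the number of all possible ties (undirected graph)."""
    n = len(adj)
    ties = sum(len(neighbors) for neighbors in adj.values()) // 2
    return ties / (n * (n - 1) / 2)

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

# Hypothetical toy graph: two triangles joined by a single tie (c-d).
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}

print(density(graph))             # 7 ties out of 15 possible
print(average_clustering(graph))  # high, because most neighbors are interlinked
```

Even in this toy graph the two measures diverge: fewer than half of the possible ties exist, yet most nodes sit inside a fully connected triangle.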

Anatomical Brain Connectivity Map of Korean Children (한국 아동 집단의 구조 뇌연결지도)

  • Um, Min-Hee;Park, Bum-Hee;Park, Hae-Jeong
    • Investigative Magnetic Resonance Imaging / v.15 no.2 / pp.110-122 / 2011
  • Purpose: The purpose of this study is to establish a method for generating human brain anatomical connectivity from Korean children and for evaluating the network's topological properties using small-world network analysis. Materials and Methods: Using diffusion tensor images (DTI) and parcellation maps of structural MRIs acquired from twelve healthy Korean children, we generated a brain structural connectivity matrix for each individual. We applied a one-sample t-test to the connectivity maps to derive a representative anatomical connectivity for the group. By spatially normalizing the white matter bundles of participants into a standard template space, we obtained the anatomical brain network model. Network properties including clustering coefficient, characteristic path length, and global/local efficiency were also calculated. Results: We found that the structural connectivity of the Korean children group preserves the small-world properties. The anatomical connectivity map obtained in this study showed that the children group had higher intra-hemispheric connectivity than inter-hemispheric connectivity. We also observed that the neural connectivity of the group is high between the brain stem and motor-sensory areas. Conclusion: We propose a method to examine the anatomical brain network of a Korean children group. The proposed method can be used to evaluate the efficiency of anatomical brain networks in people with disease.
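
Two of the network properties listed in this abstract, characteristic path length and global efficiency, can be sketched with a breadth-first search. The sketch assumes a connected, undirected, unweighted graph with at least two nodes, which simplifies the DTI-derived connectivity matrices used in the study; the function names and the toy path graph are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def path_length_and_efficiency(adj):
    """Return (characteristic path length, global efficiency).

    Path length averages shortest distances over reachable ordered pairs;
    efficiency averages inverse distances over all ordered pairs.
    """
    n = len(adj)
    total_dist = 0
    total_inverse = 0.0
    reachable_pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total_dist += d
                total_inverse += 1.0 / d
                reachable_pairs += 1
    return total_dist / reachable_pairs, total_inverse / (n * (n - 1))

# Hypothetical toy graph: a path 0 - 1 - 2 - 3.
path_graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
char_len, efficiency = path_length_and_efficiency(path_graph)
print(char_len, efficiency)
```

A small-world network in the usual sense combines a short characteristic path length like this with a high clustering coefficient.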

Preference Prediction System using Similarity Weight granted Bayesian estimated value and Associative User Clustering (베이지안 추정치가 부여된 유사도 가중치와 연관 사용자 군집을 이용한 선호도 예측 시스템)

  • 정경용;최성용;임기욱;이정현
    • Journal of KIISE:Software and Applications / v.30 no.3_4 / pp.316-325 / 2003
  • Existing user preference prediction methods based on collaborative filtering use the nearest-neighbor method on user preferences for items and derive user similarity from the Pearson correlation coefficient. Consequently, they do not reflect the content of items, nor do they solve the sparsity problem. This study proposes a preference prediction system that uses similarity weights granted Bayesian estimated values together with associative user clustering to address the problems of existing collaborative preference prediction methods. The proposed method groups users by genre using the Association Rule Hypergraph Partitioning algorithm, and a new user is classified into one of these genres by a Naive Bayes classifier to solve the sparsity problem in the collaborative filtering system. In addition, to obtain the similarity between the users belonging to the classified genre and the new user, this study assigns different estimated values to the items that users voted on through Naive Bayes learning. Applying the preferences with estimated values to the existing Pearson correlation coefficient improves prediction precision by reducing the prediction error caused by missing values. To evaluate its performance, the proposed method is compared with existing collaborative filtering techniques. The results show that the proposed method improves prediction accuracy by solving the problems of existing collaborative filtering techniques.
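
The baseline that this abstract builds on, user similarity from the Pearson correlation coefficient, can be sketched as follows. This is only the plain co-rated-items variant, with means taken over the shared items; the Bayesian weighting and clustering proposed in the paper are not reproduced. The function name, user names, and ratings are hypothetical.

```python
import math

def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over items rated by both users (dicts of item -> rating)."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # too few co-rated items to correlate
    a = [ratings_a[item] for item in common]
    b = [ratings_b[item] for item in common]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a user with constant ratings has no defined correlation
    return cov / math.sqrt(var_a * var_b)

# Hypothetical ratings; "m4" is rated by only one user and is ignored.
alice = {"m1": 5, "m2": 3, "m3": 1}
bob = {"m1": 4, "m2": 3, "m3": 2, "m4": 5}
carol = {"m1": 1, "m2": 3, "m3": 5}

print(pearson_similarity(alice, bob))    # same ordering of preferences
print(pearson_similarity(alice, carol))  # opposite ordering of preferences
```

The early returns show where sparsity bites: when two users share fewer than two rated items, this variant has nothing to correlate, which is the kind of gap the abstract's estimated values are meant to fill.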

Evaluation of Structural and Functional Changes of Ecological Networks by Land Use Change in a Wetlandscape (토지이용변화에 따른 거시적 습지경관에서의 생태네트워크의 구조 및 기능적 변화 평가)

  • Kim, Bin;Park, Jeryang
    • Ecology and Resilient Infrastructure / v.7 no.3 / pp.189-198 / 2020
  • Wetlands, which provide various ecological services, have been regarded as an important nature-based solution for purposes such as sustainable water quality improvement and buffering the impacts of climate change. Although the importance of conserving wetlands to reduce the impacts of various perturbations (e.g., changes in land use, climate, and hydrology) has been acknowledged, the possibility of applying these efforts as a nature-based solution at a macro scale (e.g., landscape) has been insufficiently explored. In this study, we examine whether ecological network analysis can provide a nature-based engineering solution. Specifically, we analyzed how land use change affects the structural and functional characteristics (connectivity, network efficiency, and clustering coefficient) of ecological networks generated by multiple dispersal models of hypothetical species inhabiting a wetlandscape. Changes in ecological network characteristics were analyzed by simultaneously removing wetlands, under two initial conditions for surface area, in the zones where land use change occurs. We set a total of four zones of land use change with different wetland densities. All analyses showed that mean degree and network efficiency were significantly reduced when wetlands in the zones with high wetland density were removed, and this effect was intensified when the zones contained hubs (nodes with high degree). In contrast, the clustering coefficient increased. We propose our approach for assessing the impacts of land use change on ecological networks, and with additional analysis of betweenness centrality, we expect it can provide a nature-based engineering solution for creating alternative wetlands.