• Title/Summary/Keyword: Bayesian clustering analysis

Online nonparametric Bayesian analysis of parsimonious Gaussian mixture models and scenes clustering

  • Zhou, Ri-Gui;Wang, Wei
    • ETRI Journal
    • /
    • v.43 no.1
    • /
    • pp.74-81
    • /
    • 2021
  • The mixture model is a very powerful and flexible tool in clustering analysis. Based on the Dirichlet process and parsimonious Gaussian distribution, we propose a new nonparametric mixture framework for solving challenging clustering problems. Meanwhile, the inference of the model depends on the efficient online variational Bayesian approach, which enhances the information exchange between the whole and the part to a certain extent and applies to scalable datasets. The experiments on the scene database indicate that the novel clustering framework, when combined with a convolutional neural network for feature extraction, has meaningful advantages over other models.

Fuzzy Clustering Model using Principal Components Analysis and Naive Bayesian Classifier (주성분 분석과 나이브 베이지안 분류기를 이용한 퍼지 군집화 모형)

  • Jun, Sung-Hae
    • The KIPS Transactions:PartB
    • /
    • v.11B no.4
    • /
    • pp.485-490
    • /
    • 2004
  • In data representation, clustering groups the given data into clusters of similar objects. Various similarity measures have been used in many studies. However, the validity of clustering results is subjective and ambiguous, because objective criteria for clustering are scarce and difficult to define. Fuzzy clustering provides a good method for subjective clustering problems. It performs clustering through a similarity matrix that holds a fuzzy membership value for assigning each object. In this paper, for objective fuzzy clustering, we propose a clustering algorithm that joins principal components analysis as a dimension reduction model with Bayesian learning as a statistical learning theory. For performance evaluation of the proposed algorithm, the Iris and Glass Identification data from the UCI Machine Learning Repository are used. The experimental results show a favorable outcome for the proposed model.

Analysis of Saccharomyces Cell Cycle Expression Data using Bayesian Validation of Fuzzy Clustering (퍼지 클러스터링의 베이지안 검증 방법을 이용한 발아효모 세포주기 발현 데이타의 분석)

  • Yoo Si-Ho;Won Hong-Hee;Cho Sung-Bae
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.12
    • /
    • pp.1591-1601
    • /
    • 2004
  • Clustering, a technique for the analysis of genes, organizes patterns into groups by their similarity in the dataset and has been used for identifying the functions of the genes in a cluster or analyzing the functions of unknown genes. Since genes usually belong to multiple functional families, fuzzy clustering methods are more appropriate than conventional hard clustering methods, which assign a sample to a single group. In this paper, a Bayesian validation method is proposed to evaluate fuzzy partitions effectively. The Bayesian validation method is a probability-based approach that selects the fuzzy partition with the largest posterior probability given the dataset. First, the proposed Bayesian validation method is compared with four representative conventional fuzzy cluster validity measures on four well-known datasets, using the fuzzy c-means algorithm. Then, we analyze the results for Saccharomyces cell cycle expression data evaluated by the proposed method.
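
The contrast between hard and fuzzy assignment in the abstract above can be illustrated with the standard fuzzy c-means membership update. This is only a minimal sketch under stated assumptions: the function name `fcm_memberships`, the fuzzifier value `m = 2.0`, and the sample points are hypothetical and are not taken from the paper, and the Bayesian validation step itself is not shown.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return u[i][k], the membership of point i in cluster k.

    Uses the standard fuzzy c-means update
        u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)),
    where d_ik is the Euclidean distance from point i to center k.
    The fuzzifier m > 1 is an assumed parameter (2.0 is a common default).
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on one or more centers:
            # split full membership evenly among those centers.
            hits = [d == 0.0 for d in dists]
            row = [1.0 / sum(hits) if h else 0.0 for h in hits]
        else:
            row = [1.0 / sum((dk / dj) ** exponent for dj in dists)
                   for dk in dists]
        memberships.append(row)
    return memberships


# Hypothetical data: two centers on a line and two sample points.
centers = [(0.0, 0.0), (4.0, 0.0)]
points = [(1.0, 0.0), (2.0, 0.0)]
u = fcm_memberships(points, centers)
```

Each row of `u` sums to 1, so a point lying between two centers receives partial membership in both instead of being forced into a single group.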

Taxonomic reconsideration of Chinese Lespedeza maximowiczii (Fabaceae) based on morphological and genetic features, and recommendation as the independent species L. pseudomaximowiczii

  • JIN, Dong-Pil;XU, Bo;CHOI, Byoung-Hee
    • Korean Journal of Plant Taxonomy
    • /
    • v.48 no.3
    • /
    • pp.153-162
    • /
    • 2018
  • Lespedeza maximowiczii C. K. Schneid. (Fabaceae) is a deciduous shrub which is known to be distributed in the temperate forests of China, Korea and on Tsushima Island of Japan. Due to severe morphological variations within species, numerous examinations have been conducted for Korean L. maximowiczii. However, the morphology of Chinese plants has not been studied as thoroughly, despite doubts about their taxonomy. To clarify this taxonomic issue, we investigated morphological characters and undertook a Bayesian clustering analysis with microsatellite markers. The morphological and genetic traits of Chinese individuals varied considerably from those of typical L. maximowiczii growing in Korea. For example, petals of the former had a different shape and bore long claws, while the calyx lobes were diverged above the middle and the upper surface of the leaflet was pubescent. Their terete buds and spirally arranged bud scales were distinct from those within the series/section Heterolespedeza, which includes L. maximowiczii. Our Bayesian clustering analysis additionally included L. buergeri as an outgroup. Those results indicated that the Chinese samples clustered into a lineage separated from L. maximowiczii (optimum cluster, K = 2), despite the fact that the latter is grouped into the same lineage with L. buergeri. Therefore, we treat those Chinese plants as a new species with the name L. pseudomaximowiczii.

Clustering Algorithm for Data Mining using Posterior Probability-based Information Entropy (데이터마이닝을 위한 사후확률 정보엔트로피 기반 군집화알고리즘)

  • Park, In-Kyoo
    • Journal of Digital Convergence
    • /
    • v.12 no.12
    • /
    • pp.293-301
    • /
    • 2014
  • In this paper, we propose a new measure based on the confidence of the Bayesian posterior probability so as to reduce unimportant information in the clustering process. Because clustering performance depends on selecting the degree of importance of attributes within the databases, the concept of information entropy is added to the posterior probability for attribute discernibility. Hence, the same value of attributes in the confidence of the proposed measure is considerably smaller due to the natural logarithm. Therefore, the posterior probability-based clustering algorithm selects the minimum attribute reducts and improves the efficiency of clustering. Validation of the proposed algorithms against others shows their discernibility as well as their ability to handle uncertainty in clustering with ACME categorical data.

A Short Note on Empirical Penalty Term Study of BIC in K-means Clustering Inverse Regression

  • Ahn, Ji-Hyun;Yoo, Jae-Keun
    • Communications for Statistical Applications and Methods
    • /
    • v.18 no.3
    • /
    • pp.267-275
    • /
    • 2011
  • According to recent studies, the Bayesian information criterion (BIC) has been proposed to determine the structural dimension of the central subspace through sliced inverse regression (SIR) with high-dimensional predictors. The BIC may be useful in K-means clustering inverse regression (KIR) with high-dimensional predictors. However, the direct application of the BIC to KIR may be problematic, because the slicing scheme in SIR is not the same as that of KIR. In this paper, we present empirical penalty term studies of BIC in KIR to identify the most appropriate one. Numerical studies and real data analysis are presented.

Genetic Diversity and Population Genetic Structure of Black-spotted Pond Frog (Pelophylax nigromaculatus) Distributed in South Korean River Basins

  • Park, Jun-Kyu;Yoo, Nakyung;Do, Yuno
    • Proceedings of the National Institute of Ecology of the Republic of Korea
    • /
    • v.2 no.2
    • /
    • pp.120-128
    • /
    • 2021
  • The objective of this study was to analyze the genotype of black-spotted pond frog (Pelophylax nigromaculatus) using seven microsatellite loci to quantify its genetic diversity and population structure throughout the spatial scale of basins of Han, Geum, Yeongsan, and Nakdong Rivers in South Korea. Genetic diversities in these four areas were compared using diversity index and inbreeding coefficient obtained from the number and frequency of alleles as well as heterozygosity. Additionally, the population structure was confirmed with population differentiation, Nei's genetic distance, multivariate analysis, and Bayesian clustering analysis. Interestingly, a negative genetic diversity pattern was observed in the Han River basin, indicating possible recent habitat disturbances or population declines. In contrast, a positive genetic diversity pattern was found for the population in the Nakdong River basin that had remained the most stable. Results of population structure suggested that populations of black-spotted pond frogs distributed in these four river basins were genetically independent. In particular, the population of the Nakdong River basin had the greatest genetic distance, indicating that it might have originated from an independent population. These results support the use of genetics in addition to designations strictly based on geographic stream areas to define the spatial scale of populations for management and conservation practices.

K-means Clustering for Environmental Indicator Survey Data

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Proceedings of the Korean Data and Information Science Society Conference (한국데이터정보과학회:학술대회논문집)
    • /
    • 2005.04a
    • /
    • pp.185-192
    • /
    • 2005
  • There are many data mining techniques, such as association rules, decision trees, neural network analysis, clustering, genetic algorithms, Bayesian networks, and memory-based reasoning. We analyze the 2003 Gyeongnam social indicator survey data using the k-means clustering technique for environmental information. Clustering is the process of grouping data into clusters so that objects within a cluster have high similarity to one another. In this paper, we use k-means clustering, chosen from among several clustering techniques. k-means clustering is classified as a partitional clustering method. The k-means clustering outputs can be applied to environmental preservation and environmental improvement.
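
As a rough illustration of the partitional k-means procedure this abstract refers to, the following is a minimal sketch of plain Lloyd's algorithm. The function names `kmeans` and `sq_dist`, the random initialization, and the toy points are assumptions made for illustration; the survey data and the authors' actual software are not represented here.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Cluster `points` (a list of numeric tuples) into k groups.

    Alternates an assignment step (nearest centroid) and an update
    step (centroid = mean of its members) until the centroids stop
    changing or `iters` rounds have run.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated groups of three points each.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(pts, k=2)
```

Because each point receives exactly one label, the result is a hard partition of the data, which is what "partitional clustering method" means in the abstract.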

On-line Inspection Algorithm of Brown Rice Using Image Processing (영상처리를 이용한 현미의 온라인 품위판정 알고리즘)

  • Kim, Tae-Min;Noh, Sang-Ha
    • Journal of Biosystems Engineering
    • /
    • v.35 no.2
    • /
    • pp.138-145
    • /
    • 2010
  • An on-line algorithm that discriminates brown rice kernels on an echelon feeder using color image processing is presented for quality inspection. A rapid color image segmentation algorithm based on a Bayesian clustering method was developed by means of a look-up table built from the significant clusters selected by experts. A robust estimation method was presented to improve the stability of the color clusters. Discriminant analysis of color distributions was employed to distinguish nine types of brown rice kernels. Discrimination accuracies of the on-line algorithm ranged from 72% to 85% for sound, cracked, green-transparent, and green-opaque kernels; were greater than 93% for colored, red, and unhulled kernels; and were about 92% for white-opaque and 67% for chalky kernels.

Spatial Analysis of Common Gastrointestinal Tract Cancers in Counties of Iran

  • Soleimani, Ali;Hassanzadeh, Jafar;Motlagh, Ali Ghanbari;Tabatabaee, Hamidreza;Partovipour, Elham;Keshavarzi, Sareh;Hossein, Mohammad
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.16 no.9
    • /
    • pp.4025-4029
    • /
    • 2015
  • Background: Gastrointestinal tract cancers are among the most common cancers in Iran and comprise approximately 38% of all the reported cases of cancer. This study aimed to describe the epidemiology and to investigate spatial clustering of common cancers of the gastrointestinal tract across the counties of Iran using full Bayesian smoothing and Moran's I statistics. Materials and Methods: Data from the national cancer registry were used in this study. Indirect standardized rates were calculated for 371 counties of Iran and smoothed using WinBUGS 1.4 software with a full Bayesian method. Global Moran's I and local Moran's I were also used to investigate clustering. Results: According to the results, 75,644 new cases of cancer were nationally registered in Iran, among which 18,019 cases (23.8%) were esophagus, gastric, colorectal, and liver cancers. The results of the global Moran's I test were 0.60 (P=0.001), 0.47 (P=0.001), 0.29 (P=0.001), and 0.40 (P=0.001) for esophagus, gastric, colorectal, and liver cancers, respectively. This shows clustering of the four studied cancers in Iran at the national level. Conclusions: High-level clustering of cases was seen in northern, northwestern, western, and northeastern areas for esophagus, gastric, and colorectal cancers. For liver cancer, high clustering was observed in some counties in central, northeastern, and southern areas.
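
To make the global Moran's I values reported above easier to interpret, here is a minimal sketch of the usual formula, I = (n / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all weights. The function name `morans_i`, the binary adjacency weights, and the four-region example are assumptions for illustration only; the study's county data, its weighting scheme, and its significance test are not reproduced.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` under a spatial weight matrix.

    `weights[i][j]` is the weight linking region i to region j
    (here assumed to be 1 for neighbors and 0 otherwise, with a
    zero diagonal). Values near +1 suggest that similar rates sit
    next to each other; values near -1 suggest alternation.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * cross / variance_sum


# Hypothetical example: four regions in a row, each adjacent to the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 5, 5], chain)    # similar values are neighbors
alternating = morans_i([1, 5, 1, 5], chain)  # neighbors always differ
```

In this toy case the clustered layout gives a positive I and the alternating layout gives a negative one, which is the sense in which the positive values in the abstract indicate spatial clustering.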