• Title/Abstract/Keywords: Bayesian clustering analysis

Search results: 48 items (processing time 0.024 s)

Online nonparametric Bayesian analysis of parsimonious Gaussian mixture models and scenes clustering

  • Zhou, Ri-Gui;Wang, Wei
    • ETRI Journal / Vol. 43, No. 1 / pp. 74-81 / 2021
  • The mixture model is a very powerful and flexible tool in clustering analysis. Based on the Dirichlet process and parsimonious Gaussian distribution, we propose a new nonparametric mixture framework for solving challenging clustering problems. Meanwhile, the inference of the model depends on the efficient online variational Bayesian approach, which enhances the information exchange between the whole and the part to a certain extent and applies to scalable datasets. The experiments on the scene database indicate that the novel clustering framework, when combined with a convolutional neural network for feature extraction, has meaningful advantages over other models.

Fuzzy Clustering Model using Principal Components Analysis and Naive Bayesian Classifier

  • 전성해
    • The KIPS Transactions: Part B / Vol. 11B, No. 4 / pp. 485-490 / 2004
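  • In data representation, clustering groups the given data into several clusters of mutually similar objects. Many studies have used a wide variety of similarity measures for clustering. However, because it is difficult to set objective criteria for measuring the performance of a clustering result, the interpretation of clustering results is often highly subjective and ambiguous. Fuzzy clustering offers an objective way to determine clusters in such subjective clustering problems. It performs clustering through a similarity matrix whose elements are the fuzzy membership function values with which each object belongs to a particular cluster. In this paper, we propose a clustering model that combines principal component analysis, a dimension reduction technique, with Bayesian learning, a powerful statistical learning theory, and use it to perform objective fuzzy clustering. To evaluate the performance of the proposed algorithm, we present experimental results on the Iris and Glass Identification data from the UCI Machine Learning Repository.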

Analysis of Saccharomyces Cell Cycle Expression Data using Bayesian Validation of Fuzzy Clustering

  • 유시호;원홍희;조성배
    • Journal of KISS: Software and Applications / Vol. 31, No. 12 / pp. 1591-1601 / 2004
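  • Clustering, one of the methods used to analyze genes, groups genes with similar functions and is used to analyze the function of gene groups. Because genes can belong to several functional families, fuzzy clustering is better suited to gene clustering than conventional clustering methods that assign each gene to a single cluster. In this paper, we propose a Bayesian validation method that can effectively validate fuzzy clustering results. The Bayesian validation method is probability-based and selects the cluster partition with the largest posterior probability given the data. We first compare the proposed Bayesian validation method with four representative existing fuzzy cluster validation methods on four datasets, using the fuzzy c-means algorithm. We then cluster budding yeast cell-cycle expression data and validate and analyze the result with the proposed method.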
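The abstract above relies on fuzzy c-means, in which each object receives a graded membership in every cluster instead of a single label. The following is a minimal sketch of the standard fuzzy c-means membership update only, not the authors' Bayesian validation method. The function name `fcm_memberships`, the fuzzifier default `m=2.0`, and the toy points are assumptions made for illustration.

```python
from math import dist

def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership update (illustrative sketch).

    u[i][j] = 1 / sum_k (d(i, j) / d(i, k)) ** (2 / (m - 1)),
    where d(i, j) is the Euclidean distance from point i to center j.
    """
    power = 2.0 / (m - 1.0)
    table = []
    for p in points:
        d = [dist(p, c) for c in centers]
        if 0.0 in d:
            # The point coincides with a center: share full membership
            # among the coinciding centers and give none to the others.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            table.append([h / sum(hits) for h in hits])
        else:
            table.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return table

# Hypothetical toy data: two centers and three points on a line.
u = fcm_memberships([(1, 0), (9, 0), (5, 0)], [(0, 0), (10, 0)])
```

Each row of `u` sums to 1, and a point midway between two centers receives equal membership in both, which is the behavior that makes fuzzy clustering suitable for genes belonging to more than one functional family.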

Taxonomic reconsideration of Chinese Lespedeza maximowiczii (Fabaceae) based on morphological and genetic features, and recommendation as the independent species L. pseudomaximowiczii

  • JIN, Dong-Pil;XU, Bo;CHOI, Byoung-Hee
    • Korean Journal of Plant Taxonomy / Vol. 48, No. 3 / pp. 153-162 / 2018
  • Lespedeza maximowiczii C. K. Schneid. (Fabaceae) is a deciduous shrub which is known to be distributed in the temperate forests of China, Korea and on Tsushima Island of Japan. Due to severe morphological variations within species, numerous examinations have been conducted for Korean L. maximowiczii. However, the morphology of Chinese plants has not been studied as thoroughly, despite doubts about their taxonomy. To clarify this taxonomic issue, we investigated morphological characters and undertook a Bayesian clustering analysis with microsatellite markers. The morphological and genetic traits of Chinese individuals varied considerably from those of typical L. maximowiczii growing in Korea. For example, petals of the former had a different shape and bore long claws, while the calyx lobes were diverged above the middle and the upper surface of the leaflet was pubescent. Their terete buds and spirally arranged bud scales were distinct from those within the series/section Heterolespedeza, which includes L. maximowiczii. Our Bayesian clustering analysis additionally included L. buergeri as an outgroup. Those results indicated that the Chinese samples clustered into a lineage separated from L. maximowiczii (optimum cluster, K = 2), despite the fact that the latter is grouped into the same lineage with L. buergeri. Therefore, we treat those Chinese plants as a new species with the name L. pseudomaximowiczii.

Clustering Algorithm for Data Mining using Posterior Probability-based Information Entropy

  • 박인규
    • Journal of Digital Convergence / Vol. 12, No. 12 / pp. 293-301 / 2014
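  • In this paper, we propose a new measure that uses the confidence of the Bayesian posterior probability to reduce unnecessary information in the clustering process required for data mining. Because the importance of attributes for data reduction dominates the clustering result, information entropy was applied to the confidence of the posterior probability to improve the discriminability of the many attributes. With the proposed rough entropy measure based on the posterior probability, the redundancy in attribute confidence is considerably reduced by the natural logarithm of the entropy. Consequently, the clustering algorithm built on the proposed measure improved the discriminability of attribute values and minimized the existing reducts, which in turn improved partitioning efficiency. To validate the proposed algorithm, we compared it with existing algorithms on the ACME data, which is used for pattern classification problems, in terms of the discriminability between attributes and the purity of the resulting partitions.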

A Short Note on Empirical Penalty Term Study of BIC in K-means Clustering Inverse Regression

  • Ahn, Ji-Hyun;Yoo, Jae-Keun
    • Communications for Statistical Applications and Methods / Vol. 18, No. 3 / pp. 267-275 / 2011
  • According to recent studies, the Bayesian information criterion (BIC) has been proposed for determining the structural dimension of the central subspace in sliced inverse regression (SIR) with high-dimensional predictors. The BIC may also be useful in K-means clustering inverse regression (KIR) with high-dimensional predictors. However, applying the BIC directly to KIR may be problematic, because the slicing scheme in SIR differs from that of KIR. In this paper, we present empirical studies of the BIC penalty term in KIR to identify the most appropriate one. Numerical studies and a real data analysis are presented.

Genetic Diversity and Population Genetic Structure of Black-spotted Pond Frog (Pelophylax nigromaculatus) Distributed in South Korean River Basins

  • Park, Jun-Kyu;Yoo, Nakyung;Do, Yuno
    • Proceedings of the National Institute of Ecology of the Republic of Korea / Vol. 2, No. 2 / pp. 120-128 / 2021
  • The objective of this study was to analyze the genotype of black-spotted pond frog (Pelophylax nigromaculatus) using seven microsatellite loci to quantify its genetic diversity and population structure throughout the spatial scale of basins of Han, Geum, Yeongsan, and Nakdong Rivers in South Korea. Genetic diversities in these four areas were compared using diversity index and inbreeding coefficient obtained from the number and frequency of alleles as well as heterozygosity. Additionally, the population structure was confirmed with population differentiation, Nei's genetic distance, multivariate analysis, and Bayesian clustering analysis. Interestingly, a negative genetic diversity pattern was observed in the Han River basin, indicating possible recent habitat disturbances or population declines. In contrast, a positive genetic diversity pattern was found for the population in the Nakdong River basin that had remained the most stable. Results of population structure suggested that populations of black-spotted pond frogs distributed in these four river basins were genetically independent. In particular, the population of the Nakdong River basin had the greatest genetic distance, indicating that it might have originated from an independent population. These results support the use of genetics in addition to designations strictly based on geographic stream areas to define the spatial scale of populations for management and conservation practices.

K-means Clustering for Environmental Indicator Survey Data

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Proceedings of the Korean Data and Information Science Society Conference / 2005 Spring Conference of the Korean Data and Information Science Society / pp. 185-192 / 2005
  • There are many data mining techniques, such as association rules, decision trees, neural network analysis, clustering, genetic algorithms, Bayesian networks, and memory-based reasoning. We analyze the 2003 Gyeongnam social indicator survey data using the k-means clustering technique to extract environmental information. Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. Among the several clustering techniques available, we use k-means clustering, which is classified as a partitional clustering method. The k-means clustering outputs can be applied to environmental preservation and environmental improvement.
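As an illustration of the partitional k-means method named in this abstract, here is a minimal sketch of Lloyd's algorithm in plain Python. It is not the authors' analysis of the survey data. The function name `kmeans`, the choice of the first `k` points as initial centroids, and the toy points are all assumptions.

```python
from math import dist

def kmeans(points, k, iters=100):
    """Lloyd's algorithm for k-means (illustrative sketch).

    Assumption: the first k points serve as the initial centroids.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two well-separated groups in the plane.
labels, centroids = kmeans(
    [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0), (11, 10)], k=2)
```

The result depends on the initial centroids, so practical analyses usually repeat the run from several starting points and keep the partition with the smallest within-cluster sum of squares.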


On-line Inspection Algorithm of Brown Rice Using Image Processing

  • 김태민;노상하
    • Journal of Biosystems Engineering
    • /
    • 제35권2호
    • /
    • pp.138-145
    • /
    • 2010
  • An on-line algorithm that discriminates brown rice kernels on an echelon feeder using color image processing is presented for quality inspection. A rapid color image segmentation algorithm based on a Bayesian clustering method was developed using a look-up table built from the significant clusters selected by experts. A robust estimation method was presented to improve the stability of the color clusters. Discriminant analysis of color distributions was employed to distinguish nine types of brown rice kernels. The discrimination accuracy of the on-line algorithm ranged from 72% to 85% for sound, cracked, green-transparent, and green-opaque kernels, exceeded 93% for colored, red, and unhulled kernels, and was about 92% for white-opaque and 67% for chalky kernels.

Spatial Analysis of Common Gastrointestinal Tract Cancers in Counties of Iran

  • Soleimani, Ali;Hassanzadeh, Jafar;Motlagh, Ali Ghanbari;Tabatabaee, Hamidreza;Partovipour, Elham;Keshavarzi, Sareh;Hossein, Mohammad
    • Asian Pacific Journal of Cancer Prevention
    • /
    • 제16권9호
    • /
    • pp.4025-4029
    • /
    • 2015
  • Background: Gastrointestinal tract cancers are among the most common cancers in Iran and comprise approximately 38% of all the reported cases of cancer. This study aimed to describe the epidemiology and to investigate spatial clustering of common cancers of the gastrointestinal tract across the counties of Iran using full Bayesian smoothing and Moran's I statistics. Materials and Methods: Data from the national cancer registry were used in this study. Indirectly standardized rates were calculated for 371 counties of Iran and smoothed using WinBUGS 1.4 software with a full Bayesian method. Global Moran's I and local Moran's I were also used to investigate clustering. Results: According to the results, 75,644 new cases of cancer were nationally registered in Iran, among which 18,019 cases (23.8%) were esophagus, gastric, colorectal, and liver cancers. The results of the Global Moran's I test were 0.60 (P=0.001), 0.47 (P=0.001), 0.29 (P=0.001), and 0.40 (P=0.001) for esophagus, gastric, colorectal, and liver cancers, respectively. This shows clustering of the four studied cancers in Iran at the national level. Conclusions: High-level clustering of cases was seen in northern, northwestern, western, and northeastern areas for esophagus, gastric, and colorectal cancers. For liver cancer, high clustering was observed in some counties in central, northeastern, and southern areas.
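For readers unfamiliar with the Global Moran's I statistic reported in this abstract, the following is a minimal sketch of the textbook formula, I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all weights. It does not reproduce the study's Bayesian smoothing or its significance test. The function name `morans_i`, the chain-shaped adjacency matrix, and the toy rates are assumptions.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed on n regions (illustrative sketch).

    weights[i][j] is the spatial weight between regions i and j,
    for example 1 for neighbors and 0 otherwise.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial proximity.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * num / den

# Hypothetical example: four regions in a row, each adjacent to the next.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
clustered = morans_i([1, 1, 5, 5], chain)    # similar rates sit side by side
alternating = morans_i([1, 5, 1, 5], chain)  # neighbors always differ
```

In this toy setting the clustered layout gives a positive value and the alternating layout a negative one, which matches the usual reading of the statistic: positive values, such as the 0.29 to 0.60 reported above, indicate that neighboring counties tend to have similar rates.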