• Title/Abstract/Keywords: over-clustering

385 search results (processing time: 0.023 s)

Practical Privacy-Preserving DBSCAN Clustering Over Horizontally Partitioned Data

  • 김기성;정익래
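    • 정보보호학회논문지 / Vol. 20, No. 3 / pp.105-111 / 2010
  • This paper proposes an efficient privacy-preserving DBSCAN clustering scheme for multi-party environments. It extends the conventional DBSCAN clustering scheme to a privacy-preserving multi-party setting by applying a privacy protection technique based on inserting fake data. Existing privacy-preserving clustering schemes for multi-party environments are too inefficient to apply in real environments, whereas the proposed scheme resolves this problem and is highly efficient. The scheme is applicable not only to multi-party environments but also to non-multi-party environments.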

Design of Radial Basis Function with the Aid of Fuzzy KNN and Conditional FCM

  • 노석범;오성권
    • 전기학회논문지 / Vol. 58, No. 6 / pp.1223-1229 / 2009
  • The performance of Radial Basis Function Neural Networks depends on how the radial basis functions are placed over the input space, which is an important step in designing such networks. The existing method for initializing the locations of the radial basis functions over the input space is conditional fuzzy C-means clustering. However, researchers who use conditional fuzzy C-means clustering cannot obtain modeling performance as good as they expect, because it cannot project the information extracted over the output space into the input space. To compensate for this drawback, we apply a fuzzy K-nearest neighbors approach to project the auxiliary information defined over the output space into the input space without loss of information.

Local Distribution Based Density Clustering for Speaker Diarization

  • 노진상;손수원;김성수;이재원;고한석
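    • 한국음향학회지 / Vol. 34, No. 4 / pp.303-309 / 2015
  • Speaker diarization is the task of assigning previously unlabeled data to individual speakers, and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) has been widely used for speaker diarization because of its simplicity and computational efficiency. However, when the data in a cluster are not spatial and different clusters lie close together and share a boundary, an over-clustering problem occurs and the performance of DBSCAN degrades. This paper describes DBSCAN and its problems and proposes a density-based clustering algorithm based on the local characteristics of each object. The proposed algorithm uses a variable decision criterion in its search, depending on the object's local density and degree of variance. Experiments compare the performance of DBSCAN and the proposed method and demonstrate the usefulness of the latter. In the experiments, the proposed method does not over-cluster and achieves higher accuracy than DBSCAN, showing that the approach based on local characteristics is effective.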
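For reference, the baseline that this abstract compares against can be sketched in a few lines. The block below is a minimal version of standard DBSCAN, not the local-distribution algorithm the paper proposes. The toy 1-D layout, the function names, and the parameter values are all hypothetical and chosen only to show that a single global radius `eps` decides whether neighboring groups stay separate or merge.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Standard DBSCAN: one label per point, NOISE (-1) for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but does not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels

# Hypothetical toy data: two groups on a line plus one far outlier.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,), (50.0,)]

tight = dbscan(points, eps=1.5, min_pts=3)
loose = dbscan(points, eps=8.5, min_pts=3)
print(tight)  # [0, 0, 0, 1, 1, 1, -1]
print(loose)  # [0, 0, 0, 0, 0, 0, -1]
```

With the smaller radius the two groups stay separate; with the larger one they merge into a single cluster. Because `eps` and `min_pts` are fixed for the whole data set, the result is sensitive to how close neighboring clusters are, which is the kind of sensitivity a locally adaptive criterion is meant to reduce.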

Mobility-Based Clustering Algorithm for Multimedia Broadcasting over IEEE 802.11p-LTE-enabled VANET

  • Syfullah, Mohammad;Lim, Joanne Mun-Yee;Siaw, Fei Lu
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 3 / pp.1213-1237 / 2019
  • Vehicular Ad-hoc Network (VANET) facilities envision future Intelligent Transporting Systems (ITSs) by providing inter-vehicle communication for metrics such as road surveillance, traffic information, and road condition. In recent years, vehicle manufacturers, researchers and academicians have devoted significant attention to vehicular communication technology because of its highly dynamic connectivity and self-organized, decentralized networking characteristics. However, due to VANET's high mobility, dynamic network topology and low communication coverage, dissemination of large data packets (e.g. multimedia content) is challenging. Clustering enhances network performance by maintaining communication link stability, sharing network resources and efficiently using bandwidth among nodes. This paper proposes a mobility-based, multi-hop clustering algorithm (MBCA) for multimedia content broadcasting over an IEEE 802.11p-LTE-enabled hybrid VANET architecture. The OMNeT++ network simulator and a SUMO traffic generator are used to simulate a network scenario. The simulation results indicate that the proposed clustering algorithm over a hybrid VANET architecture improves the overall network stability and performance, resulting in an overall 20% increase in cluster head duration, a 20% increase in cluster member duration, lower cluster overhead, a 15% improvement in data packet delivery ratio and lower network delay compared with the referenced schemes [46], [47] and [50] during multimedia content dissemination over VANET.

Path based K-means Clustering for RFID Data Sets

  • Yun, Hong-Won
    • Journal of information and communication convergence engineering / Vol. 6, No. 4 / pp.434-438 / 2008
  • Massive data are continuously produced at a rate of several terabytes per day. Applications that handle such data need effective clustering algorithms to achieve high overall computational performance. In this paper, we propose an approach to clustering that uses an ancestor as the cluster center: a K-means algorithm using ancestors. We modify the K-means algorithm accordingly. We present a clustering architecture and a clustering algorithm that minimize the number of I/Os and show excellent performance. In our experimental performance evaluation, we show that our algorithm can improve the I/O speed and the query processing time.

Sub-class Clustering of Land Cover over Asia considering 9-year NDVI and Climate Data

  • Lee, Ga-Lam;Han, Kyung-Soo;Kim, Do-Yong
    • 대한원격탐사학회지 / Vol. 27, No. 3 / pp.289-301 / 2011
  • In this paper, an attempt has been made to classify Asian land cover considering climatic and vegetative characteristics. Sub-class clustering based on the 13 MODIS land cover classes (except water) over Asia was performed with the climate map and the NDVI derived from SPOT 5 VGT D10 data. The unsupervised classification for the sub-class clustering was performed within each land cover class, and a total of 74 clusters were determined over the study area. Using these clusters, the annual variations (from 1999 to 2007) of precipitation rate and temperature were analyzed, as an example, with a simple linear regression model. The annual variations differed from cluster to cluster (negative or positive pattern) because of the various climate zones and NDVI annual cycles. Therefore, the detailed land cover map produced by the sub-class clustering in this study can be useful information for modelling work that requires detailed climatic and vegetative information as a boundary condition.

Online nonparametric Bayesian analysis of parsimonious Gaussian mixture models and scenes clustering

  • Zhou, Ri-Gui;Wang, Wei
    • ETRI Journal / Vol. 43, No. 1 / pp.74-81 / 2021
  • The mixture model is a very powerful and flexible tool in clustering analysis. Based on the Dirichlet process and parsimonious Gaussian distribution, we propose a new nonparametric mixture framework for solving challenging clustering problems. Meanwhile, the inference of the model depends on the efficient online variational Bayesian approach, which enhances the information exchange between the whole and the part to a certain extent and applies to scalable datasets. The experiments on the scene database indicate that the novel clustering framework, when combined with a convolutional neural network for feature extraction, has meaningful advantages over other models.

On hierarchical clustering in sufficient dimension reduction

  • Yoo, Chaeyeon;Yoo, Younju;Um, Hye Yeon;Yoo, Jae Keun
    • Communications for Statistical Applications and Methods / Vol. 27, No. 4 / pp.431-443 / 2020
  • The K-means clustering algorithm has been applied successfully in sufficient dimension reduction. Unfortunately, the algorithm does not have reproducibility and nestness, which will be discussed in this paper. These are clear deficits of the K-means clustering algorithm. The hierarchical clustering algorithm has both reproducibility and nestness, but an intensive comparison between the K-means and hierarchical clustering algorithms has not yet been done in a sufficient dimension reduction context. In this paper, we rigorously study the two clustering algorithms for two popular sufficient dimension reduction methodologies, the inverse mean and clustering mean methods, through intensive numerical studies. Simulation studies and two real data examples confirm that the hierarchical clustering algorithm has a potential advantage over the K-means algorithm.
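The two properties named in this abstract can be illustrated with a small sketch. It assumes that nestness means the partition into k clusters refines the partition into k-1 clusters, which the abstract itself does not spell out. The 1-D data and the helpers `agglomerate` and `is_nested` are hypothetical, and single linkage is used only for brevity, not because the paper uses it.

```python
def agglomerate(values, k):
    """Single-linkage agglomerative clustering of 1-D values down to k clusters."""
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > k:
        # On sorted 1-D data every cluster is a contiguous run, so the
        # closest pair under single linkage is always an adjacent pair.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))  # ties resolved by position, so no randomness
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters

def is_nested(fine, coarse):
    """True if every cluster of `fine` lies inside some cluster of `coarse`."""
    return all(any(set(c) <= set(d) for d in coarse) for c in fine)

# Hypothetical toy data.
values = [1.0, 1.2, 5.0, 5.3, 9.0]

three = agglomerate(values, 3)
two = agglomerate(values, 2)
print(three)                  # [[1.0, 1.2], [5.0, 5.3], [9.0]]
print(two)                    # [[1.0, 1.2], [5.0, 5.3, 9.0]]
print(is_nested(three, two))  # True
```

The procedure has no random initialization, so repeated runs give the same partition, and each coarser partition is obtained by merging clusters of the finer one. K-means with random starting centers guarantees neither property, since different starts can converge to different partitions and the solutions for k and k-1 are computed independently.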

Verification of Improving a Clustering Algorithm for Microarray Data with Missing Values

  • Kim, Su-Young
    • 응용통계연구 / Vol. 24, No. 2 / pp.315-321 / 2011
  • Gene expression microarray data often include multiple missing values. Most gene expression analyses (including gene clustering analysis), however, require a complete data matrix as input. In ordinary clustering methods, a single missing value forces one to abandon all the data for a gene even if the rest of the data for that gene are intact. The quality of the analysis may decrease seriously as the missing rate increases. On the other hand, imputing missing values may introduce artifacts that reduce the reliability of the analysis. To clarify this contradiction in microarray clustering analysis, this paper compared the accuracy of clustering with and without imputation over several microarray data sets having different missing rates. This paper also tested the clustering efficiency of several imputation methods, including our proposed algorithm. The results showed that, for imperfect microarray data, it is worthwhile to check the clustering result in this alternative way, without any imputed data.

Effective Acoustic Model Clustering via Decision Tree with Supervised Decision Tree Learning

  • Park, Jun-Ho;Ko, Han-Seok
    • 음성과학 / Vol. 10, No. 1 / pp.71-84 / 2003
  • In acoustic modeling for large vocabulary speech recognition, the sparse data problem caused by a huge number of context-dependent (CD) models usually makes the estimated models unreliable. In this paper, we develop a new clustering method based on the C4.5 decision-tree learning algorithm that effectively encapsulates the CD modeling. The proposed scheme essentially constructs a supervised decision rule and applies it over the pre-clustered triphones using the C4.5 algorithm, which is known to search effectively through the attributes of the training instances and extract the attribute that best separates the given examples. In particular, the data-driven method is used as a clustering algorithm, while its result is used as the learning target of the C4.5 algorithm. This scheme has been shown to be effective in terms of recognition performance, particularly on databases with a low unknown-context ratio. For a speaker-independent, task-independent continuous speech recognition task, the proposed method reduced the WER by 3.93% compared to the existing rule-based methods.
