• Title/Summary/Keyword: Density-Based

A Density Peak Clustering Algorithm Based on Information Bottleneck

  • Yongli Liu;Congcong Zhao;Hao Chao
    • Journal of Information Processing Systems
    • /
    • v.19 no.6
    • /
    • pp.778-790
    • /
    • 2023
  • Although density peak clustering can often easily yield excellent results, there is still room for improvement when dealing with complex, high-dimensional datasets. One of the main limitations of this algorithm is its reliance on geometric distance as the sole similarity measure. To address this limitation, we draw inspiration from information bottleneck theory and propose a novel density peak clustering algorithm that incorporates this theory as a similarity measure. Specifically, our algorithm utilizes the joint probability distribution between data objects and feature information, and employs the loss of mutual information as the measurement standard. This approach not only eliminates the potential for subjective error in selecting a similarity method, but also enhances performance on datasets with multiple centers and high dimensionality. To evaluate the effectiveness of our algorithm, we conducted experiments on ten carefully selected datasets and compared the results with those of three other algorithms. The experimental results demonstrate that our information bottleneck-based density peaks clustering (IBDPC) algorithm consistently achieves high accuracy, highlighting its potential as a valuable tool for data clustering tasks.
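The quantities that IBDPC changes are easiest to see in a sketch of classic density peak clustering, which uses Euclidean distance as its only similarity measure. This is a minimal illustration of that baseline, not the IBDPC algorithm. The function name `density_peaks`, the cut-off kernel for the local density $\rho_i$, index-based tie breaking, and taking the $k$ largest $\gamma_i = \rho_i\delta_i$ as centres are assumptions. In IBDPC, the geometric distance computed here by `math.dist` is what gets replaced by the loss-of-mutual-information measure.

```python
import math


def density_peaks(points, d_c, k):
    """Classic density peak clustering with Euclidean distance (sketch).

    Returns the local density rho, the separation delta and a cluster label
    for every point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other points closer than the cut-off distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit points from densest to sparsest; ties keep index order.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    parent = [-1] * n  # nearest point that comes earlier (denser) in `order`
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the densest point
        else:
            j = min(order[:rank], key=lambda m: dist[i][m])
            parent[i], delta[i] = j, dist[i][j]
    # Centres: the k largest gamma = rho * delta, plus the densest point,
    # which has no denser neighbour to inherit a label from.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centres = set(sorted(range(n), key=lambda i: -gamma[i])[:k])
    centres.add(order[0])
    labels = [-1] * n
    next_label = 0
    for i in order:  # a parent is always labelled before its children
        if i in centres:
            labels[i] = next_label
            next_label += 1
        else:
            labels[i] = labels[parent[i]]
    return rho, delta, labels


square = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
points = square + [(x + 10.0, y + 10.0) for x, y in square]
rho, delta, labels = density_peaks(points, d_c=1.5, k=2)
print(labels)  # → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Points far from any denser point get a large $\delta$, which is why the two well-separated groups each produce one centre.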

A Comparison on the Differential Entropy

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society
    • /
    • v.16 no.3
    • /
    • pp.705-712
    • /
    • 2005
  • Entropy is the basic concept of information theory. It is well defined for random variables with a known probability density function (pdf). For data with an unknown pdf, entropy must be estimated, usually by approximation. In this paper, we consider a kernel-based approximation and compare it with the cumulant approximation method for several distributions. Monte Carlo simulations are conducted for various sample sizes.
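One common kernel-based estimator plugs a Gaussian kernel density estimate into $H = -E[\log f(X)]$ and averages over the sample. The sketch below assumes a leave-one-out form and Silverman's rule-of-thumb bandwidth; the name `kde_entropy` is hypothetical, the paper's exact kernel approximation may differ, and the cumulant approximation is not shown.

```python
import math
import random


def kde_entropy(sample):
    """Leave-one-out Gaussian-kernel estimate of differential entropy in nats."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    h = 1.06 * sd * n ** (-1 / 5)  # Silverman's rule-of-thumb bandwidth
    norm = 1.0 / ((n - 1) * h * math.sqrt(2 * math.pi))
    total = 0.0
    for i, xi in enumerate(sample):
        # Density at x_i estimated from the other n - 1 observations.
        dens = norm * sum(math.exp(-0.5 * ((xi - xj) / h) ** 2)
                          for j, xj in enumerate(sample) if j != i)
        total -= math.log(dens)
    return total / n


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
estimate = kde_entropy(sample)
exact = 0.5 * math.log(2 * math.pi * math.e)  # entropy of N(0, 1), about 1.419
print(estimate, exact)
```

Comparing against the closed-form entropy of a known distribution, as in the Monte Carlo setting described above, is a simple way to check such an estimator.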

Sequential Confidence Intervals for Quantiles Based on Recursive Density Estimators

  • Kim, Sung-Kyun;Kim, Sung-Lai
    • Journal of the Korean Statistical Society
    • /
    • v.28 no.3
    • /
    • pp.297-309
    • /
    • 1999
  • A sequential procedure of fixed-width confidence intervals for quantiles satisfying a condition of coverage probability is provided based on recursive density estimators. It is shown that the proposed sequential procedure is asymptotically efficient. In addition, the asymptotic normality for the proposed stopping time is derived.
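A recursive density estimator suits a sequential procedure because each new observation updates the estimate without revisiting earlier data. The sketch below assumes a Wolverton-Wagner type estimator $f_n(x) = \frac{n-1}{n} f_{n-1}(x) + \frac{1}{n h_n} K\!\left(\frac{x - X_n}{h_n}\right)$ with a Gaussian kernel $K$ and bandwidth $h_n = c\,n^{-1/5}$; these choices, the class name `RecursiveKDE` and evaluation on a fixed grid are assumptions, and the paper's stopping rule is not shown.

```python
import math


def gaussian(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


class RecursiveKDE:
    """Recursive kernel density estimate kept at a fixed set of points."""

    def __init__(self, grid, c=1.0):
        self.grid = list(grid)
        self.values = [0.0] * len(self.grid)
        self.n = 0
        self.c = c

    def update(self, x):
        """Fold in one observation: f_n = (n-1)/n f_{n-1} + K((t-x)/h_n)/(n h_n)."""
        self.n += 1
        n = self.n
        h = self.c * n ** (-1 / 5)  # bandwidth attached to the n-th observation
        self.values = [(n - 1) / n * f + gaussian((t - x) / h) / (n * h)
                       for t, f in zip(self.grid, self.values)]


def direct_estimate(grid, data, c=1.0):
    """Non-recursive form of the same estimator, for comparison."""
    out = []
    for t in grid:
        total = 0.0
        for i, x in enumerate(data, start=1):
            h = c * i ** (-1 / 5)
            total += gaussian((t - x) / h) / h
        out.append(total / len(data))
    return out


grid = [-1.0, 0.0, 0.5, 2.0]
data = [0.3, -0.7, 1.2, 0.1, -0.2, 0.9]
est = RecursiveKDE(grid, c=0.8)
for x in data:
    est.update(x)
print(est.values)
print(direct_estimate(grid, data, c=0.8))
```

Both printed lists agree up to rounding: the recursion is just the batch average $\frac{1}{n}\sum_i h_i^{-1}K((x - X_i)/h_i)$ computed one term at a time.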

Estimation of Density via Local Polynomial Regression

  • Park, B. U.;Kim, W. C.;Huh, J.;Jeon, J. W.
    • Journal of the Korean Statistical Society
    • /
    • v.27 no.1
    • /
    • pp.91-100
    • /
    • 1998
  • A method of estimating probability density using regression tools is presented here. It is based on equal-length binning and locally weighted approximate likelihood for bin counts. The method is particularly useful for densities with bounded supports, where it automatically corrects edge effects without using boundary kernels.
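The sketch below illustrates the binning idea with a deliberately simpler substitute: it bins the data into equal-width bins, turns counts into histogram heights, and fits a kernel-weighted local linear least-squares line around the evaluation point, instead of the locally weighted approximate likelihood for bin counts that the paper uses. The Gaussian weights, the bandwidth `h` and the name `binned_density` are assumptions. Even this simplified version shows why no boundary kernel is needed: a local linear fit reproduces a linear density exactly, including at the edge of the support, whereas a local constant (kernel average) fit is biased there.

```python
import math


def binned_density(data, a, b, n_bins, h, x):
    """Density at x from equal-width bins on [a, b] smoothed around x.

    Returns (local_linear, local_constant) fits to the histogram heights.
    """
    width = (b - a) / n_bins
    counts = [0] * n_bins
    for v in data:
        counts[min(int((v - a) / width), n_bins - 1)] += 1  # v == b -> last bin
    n = len(data)
    s0 = s1 = s2 = t0 = t1 = 0.0
    for j, count in enumerate(counts):
        d = a + (j + 0.5) * width - x      # bin centre relative to x
        y = count / (n * width)            # histogram height of bin j
        w = math.exp(-0.5 * (d / h) ** 2)  # Gaussian kernel weight
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    local_linear = (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)  # WLS intercept
    local_constant = t0 / s0                                  # kernel average
    return local_linear, local_constant


# Evenly spread quantiles of the density f(x) = 2x on [0, 1], so f(0) = 0.
data = [math.sqrt((i + 0.5) / 10000) for i in range(10000)]
print(binned_density(data, 0.0, 1.0, 20, 0.1, 0.0))
```

At $x = 0$ the local linear value is essentially zero, matching $f(0) = 0$, while the local constant value is pulled upward by the bins to the right of the boundary.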

M-Estimation Functions Induced From Minimum $L_2$ Distance Estimation

  • Pak, Ro-Jin
    • Journal of the Korean Statistical Society
    • /
    • v.27 no.4
    • /
    • pp.507-514
    • /
    • 1998
  • Minimum distance estimation based on the $L_2$ distance between a model density and a density estimator is studied from the M-estimation point of view. We show how a model density and a density estimator are combined to create an M-estimation function. This method enables us to create an M-estimating function that reflects the nature of both an assumed model density and a given set of data. Some new types of M-estimation functions for estimating location and scale parameters are introduced.
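For a normal location model the link can be made explicit. Let $f_\mu$ be the $N(\mu, \sigma^2)$ density with $\sigma$ known. Then $\int (f_\mu - g)^2 = \int f_\mu^2 - 2\int f_\mu g + \int g^2$, and $\int f_\mu^2 = 1/(2\sigma\sqrt{\pi})$ does not depend on $\mu$. Replacing $\int f_\mu g$ by the empirical average $\frac{1}{n}\sum_i f_\mu(X_i)$ (an assumption; the paper works with a density estimator, and with a Gaussian kernel of bandwidth $h$ this only replaces $\sigma^2$ by $\sigma^2 + h^2$ in the weights) and differentiating gives $\sum_i \psi((X_i - \mu)/\sigma) = 0$ with $\psi(u) = u\,e^{-u^2/2}$, a redescending $\psi$ function. The sketch below solves this equation by iterative reweighting; the function name, median start and stopping rule are assumptions.

```python
import math


def l2_location(data, sigma, max_iter=500, tol=1e-12):
    """Solve sum_i psi((x_i - mu) / sigma) = 0 with psi(u) = u * exp(-u^2 / 2).

    Each step is a weighted mean with Gaussian weights centred at the
    current mu, so distant observations receive almost zero weight.
    """
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    mu = xs[mid] if n % 2 else 0.5 * (xs[mid - 1] + xs[mid])  # start at median
    for _ in range(max_iter):
        w = [math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in xs]
        new_mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


data = [-1.0, -0.5, 0.0, 0.5, 1.0, 100.0]
print(sum(data) / len(data), l2_location(data, sigma=1.0))
```

The sample mean is dragged to about 16.7 by the single outlier, while the $L_2$-induced estimate stays near 0 because the outlier's weight underflows to zero.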

The Region of Positivity and Unimodality in the Truncated Series of a Nonparametric Kernel Density Estimator

  • Gupta, A.K.;Im, B.K.K.
    • Journal of the Korean Statistical Society
    • /
    • v.10
    • /
    • pp.140-144
    • /
    • 1981
  • This paper approximates a kernel density estimate by a truncated series expansion involving Hermite polynomials, since this could ease the computational burden of kernel-based density estimation. However, this truncated series may give a multimodal estimate when we are estimating a unimodal density. In this paper we show a way to ensure that the truncated series is positive and unimodal, so that the approximation to a kernel density estimator is meaningful.
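One way to obtain such a series for a Gaussian kernel uses the Hermite generating function $e^{2ut - t^2} = \sum_{k \ge 0} H_k(u)\,t^k/k!$. With $u = x/(h\sqrt{2})$ and $t_i = X_i/(h\sqrt{2})$, each kernel term becomes $e^{-(u - t_i)^2} = e^{-u^2}\sum_k H_k(u)\,t_i^k/k!$, so the moments $\sum_i t_i^k$ can be computed once and each evaluation costs only as many terms as are kept. This sketch assumes physicists' Hermite polynomials and uses hypothetical function names; the paper's exact expansion may differ.

```python
import math


def kde_exact(x, data, h):
    """Gaussian kernel density estimate at x."""
    c = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)


def kde_hermite(x, data, h, terms):
    """The same estimate from a Hermite series truncated after `terms` terms."""
    s = h * math.sqrt(2.0)
    u = x / s
    ts = [xi / s for xi in data]
    # The moments sum_i t_i^k do not depend on x and could be cached.
    moments = [sum(t ** k for t in ts) for k in range(terms)]
    herm_prev, herm = 0.0, 1.0  # H_{k-1}(u) and H_k(u), starting at k = 0
    factorial = 1.0
    series = 0.0
    for k in range(terms):
        if k > 0:
            factorial *= k
        series += herm * moments[k] / factorial
        # Physicists' recurrence: H_{k+1}(u) = 2u H_k(u) - 2k H_{k-1}(u)
        herm_prev, herm = herm, 2 * u * herm - 2 * k * herm_prev
    c = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return c * math.exp(-u * u) * series


data = [-0.5, 0.1, 0.3, 0.8]
for x in (-1.0, 0.0, 0.4, 1.0):
    print(x, kde_exact(x, data, 0.5), kde_hermite(x, data, 0.5, 40))
```

With many terms the two columns agree closely; with only a few terms the truncated sum need not be positive or unimodal, which is the problem the paper addresses.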

Estimation of Crowd Density in Public Areas Based on Neural Network

  • Kim, Gyujin;An, Taeki;Kim, Moonhyun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.6 no.9
    • /
    • pp.2170-2190
    • /
    • 2012
  • There is now strong demand for intelligent surveillance systems that can infer or understand more complex behavior. The application of crowd density estimation methods could lead to a better understanding of crowd behavior, improved design of the built environment, and increased pedestrian safety. In this paper, we propose a new crowd density estimation method that estimates the density of both moving and stationary crowds, using images captured from surveillance cameras situated in various public locations. The density of a moving crowd is measured from the moving area during a specified time period, where the moving area is defined as the area in which the magnitude of the accumulated optical flow exceeds a predefined threshold. In contrast, the density of a stationary crowd is estimated from the coarseness of textures, under the assumption that each person can be regarded as a textural unit. A multilayer neural network is designed to classify crowd density into 5 levels. Finally, the proposed method is evaluated on PETS 2009 and on image sequences of the platform at Gangnam subway station.
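The moving-crowd measurement rests on a simple per-pixel rule. The sketch below assumes that dense optical-flow fields for the time period have already been computed by some external method, accumulates their magnitudes by summation, and reports the fraction of pixels above the threshold. The summation, the ratio output and the name `moving_area_ratio` are assumptions; the texture analysis and the neural network classifier are not shown.

```python
import math


def moving_area_ratio(flows, threshold):
    """Fraction of pixels whose accumulated optical-flow magnitude exceeds threshold.

    flows: list of frames; each frame is a 2-D list of (dx, dy) flow vectors.
    """
    rows, cols = len(flows[0]), len(flows[0][0])
    acc = [[0.0] * cols for _ in range(rows)]
    for frame in flows:
        for r in range(rows):
            for c in range(cols):
                dx, dy = frame[r][c]
                acc[r][c] += math.hypot(dx, dy)  # accumulate per-pixel magnitude
    moving = sum(1 for r in range(rows) for c in range(cols)
                 if acc[r][c] > threshold)
    return moving / (rows * cols)


still = (0.0, 0.0)
frame = [[(3.0, 4.0), still], [still, (0.3, 0.4)]]
print(moving_area_ratio([frame] * 3, threshold=2.0))  # → 0.25
```

Accumulating over several frames lets small, consistent motion cross the threshold while isolated flicker in a single frame usually does not; here only the pixel moving 5 units per frame qualifies.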

Ferroelectric ultra high-density data storage based on scanning nonlinear dielectric microscopy

  • Cho, Ya-Suo;Odagawa, Nozomi;Tanaka, Kenkou;Hiranaga, Yoshiomi
    • Transactions of the Society of Information Storage Systems
    • /
    • v.3 no.2
    • /
    • pp.94-112
    • /
    • 2007
  • Nano-sized inverted domain dots in ferroelectric materials have potential applications in ultrahigh-density rewritable data storage systems. Herein, a data storage system is presented based on scanning nonlinear dielectric microscopy and a thin film of ferroelectric single-crystal lithium tantalate. Through domain engineering, we succeeded in forming the smallest artificial nano-domain single dot, 5.1 nm in diameter, and an artificial nano-domain dot array with a memory density of 10.1 Tbit/inch$^2$ and a bit spacing of 8.0 nm, representing the highest memory density for rewritable data storage reported to date. A sub-nanosecond (500 ps) domain switching speed has also been achieved. Next, the long-term retention of data stored as inverted domain dots was investigated through heat-treatment tests; the obtained lifetime of an inverted dot with a radius of 50 nm was 16.9 years at $80^{\circ}$C. Finally, actual information storage with a low bit error rate and high memory density was performed: a bit error ratio of less than $1\times10^{-4}$ was achieved at an areal density of 258 Gbit/inch$^2$, and actual information storage was demonstrated at a density of 1 Tbit/inch$^2$.
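The reported 10.1 Tbit/inch$^2$ is consistent with the 8.0 nm bit spacing if each bit is assumed to occupy one square cell whose side equals the spacing (a square-lattice assumption not stated in the abstract). The small helpers below, with hypothetical names, do the unit conversion using 1 inch = $2.54\times10^{7}$ nm.

```python
import math

INCH_NM = 2.54e7  # 1 inch = 25.4 mm = 2.54e7 nm


def density_from_spacing(spacing_nm):
    """Bits per square inch if each bit occupies a spacing x spacing square cell."""
    return (INCH_NM / spacing_nm) ** 2


def spacing_from_density(bits_per_sq_inch):
    """Inverse of density_from_spacing: bit spacing in nm."""
    return INCH_NM / math.sqrt(bits_per_sq_inch)


print(density_from_spacing(8.0) / 1e12)  # Tbit/inch^2, about 10.08
print(spacing_from_density(258e9))       # nm, about 50
```

$(2.54\times10^{7}/8)^2 \approx 1.008\times10^{13}$ bits, or about 10.08 Tbit/inch$^2$, which rounds to the reported 10.1. Under the same assumption, 258 Gbit/inch$^2$ corresponds to a bit spacing of roughly 50 nm.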

Identification of the associations between genes and quantitative traits using entropy-based kernel density estimation

  • Yee, Jaeyong;Park, Taesung;Park, Mira
    • Genomics & Informatics
    • /
    • v.20 no.2
    • /
    • pp.17.1-17.11
    • /
    • 2022
  • Genetic associations have been quantified using a number of statistical measures. Entropy-based mutual information may be one of the more direct ways of estimating the association, in the sense that it does not depend on the parametrization. For this purpose, both the entropy and conditional entropy of the phenotype distribution should be obtained. Quantitative traits, however, do not usually allow an exact evaluation of entropy. The estimation of entropy needs a probability density function, which can be approximated by kernel density estimation. We have investigated the proper sequence of procedures for combining the kernel density estimation and entropy estimation with a probability density function in order to calculate mutual information. Genotypes and their interactions were constructed to set the conditions for conditional entropy. Extensive simulation data created using three types of generating functions were analyzed using two different kernels as well as two types of multifactor dimensionality reduction and another probability density approximation method called m-spacing. The statistical power in terms of correct detection rates was compared. Using kernels was found to be most useful when the trait distributions were more complex than simple normal or gamma distributions. A full-scale genomic dataset was explored to identify associations using the 2-h oral glucose tolerance test results and γ-glutamyl transpeptidase levels as phenotypes. Clearly distinguishable single-nucleotide polymorphisms (SNPs) and interacting SNP pairs associated with these phenotypes were found and listed with empirical p-values.
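The quantity being estimated is $I(Y;G) = H(Y) - \sum_g P(G = g)\,H(Y \mid G = g)$, where $Y$ is the quantitative trait and $G$ the genotype. The sketch below plugs a leave-one-out Gaussian kernel entropy estimate into each term. It is one simple ordering of the steps, not necessarily the sequence the authors recommend; the Silverman bandwidth, the 0/1 genotype coding, the simulated data and the function names are assumptions, and no empirical p-values are computed.

```python
import math
import random


def kde_entropy(sample):
    """Leave-one-out Gaussian KDE entropy estimate (nats), Silverman bandwidth."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    h = 1.06 * sd * n ** (-1 / 5)
    norm = 1.0 / ((n - 1) * h * math.sqrt(2 * math.pi))
    total = 0.0
    for i, xi in enumerate(sample):
        dens = norm * sum(math.exp(-0.5 * ((xi - xj) / h) ** 2)
                          for j, xj in enumerate(sample) if j != i)
        total -= math.log(dens)
    return total / n


def mutual_information(trait, genotype):
    """I(Y; G) = H(Y) - sum_g P(G = g) * H(Y | G = g)."""
    groups = {}
    for y, g in zip(trait, genotype):
        groups.setdefault(g, []).append(y)
    n = len(trait)
    conditional = sum(len(ys) / n * kde_entropy(ys) for ys in groups.values())
    return kde_entropy(trait) - conditional


rng = random.Random(1)
genotype = [0] * 200 + [1] * 200
unrelated = [rng.gauss(0.0, 1.0) for _ in genotype]
associated = [rng.gauss(5.0 * g, 1.0) for g in genotype]
mi_null = mutual_information(unrelated, genotype)
mi_assoc = mutual_information(associated, genotype)
print(mi_null, mi_assoc)
```

When the trait does not depend on genotype the estimate stays near zero; when the genotype shifts the trait distribution it approaches $\log 2 \approx 0.69$ nats, the entropy of a balanced binary genotype.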

Development of Accident Density Model in Korea (국내 교통사고 밀도 모형 개발)

  • Park, Na Young;Kim, Tae Yang;Park, Byung Ho
    • Journal of the Korean Society of Safety
    • /
    • v.32 no.3
    • /
    • pp.130-135
    • /
    • 2017
  • This study deals with traffic accidents. Its purpose is to develop accident density models reflecting the transportation and socioeconomic characteristics of 230 zones in Korea. Statistically significant models are developed through multiple linear regression analysis. The main results are as follows. First, in the transportation-based model, road length, avenue ratio, number of intersections and number of tunnels are positively associated with accident density, whereas school zones are negatively associated. Second, in the socioeconomic-based model, population density, transportation-vulnerable ratio, children ratio and truck ratio are positively associated with accident density. Finally, in the integrated models, road ratio, population density, transportation-vulnerable ratio, children ratio, truck ratio and number of companies are positively associated, whereas school zones are negatively associated. These results are expected to provide useful implications for accident-reduction policy-making.