• Title/Summary/Keyword: kernel distribution estimation


Parametric nonparametric methods for estimating extreme value distribution (극단값 분포 추정을 위한 모수적 비모수적 방법)

  • Woo, Seunghyun;Kang, Kee-Hoon
    • The Journal of the Convergence on Culture Technology, v.8 no.1, pp.531-536, 2022
  • This paper compares the performance of parametric and nonparametric methods for estimating the tail of a heavy-tailed distribution. For the parametric method, the generalized extreme value distribution and the generalized Pareto distribution are used; for the nonparametric method, kernel density estimation is applied. To compare the two approaches, the estimates obtained by applying the block maximum model and the threshold excess model to public daily fine dust data from each observatory in Seoul from 2014 to 2018 are shown together. In addition, the areas where high concentrations of fine dust are likely to occur are predicted using the return level.
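
The block maximum model and the return level mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `block_maxima` and `gev_return_level` are hypothetical, the GEV parameters (`mu`, `sigma`, `xi`) are assumed to have been fitted elsewhere, and the return-level expression is the standard textbook inverse of the GEV distribution function.

```python
import math

def block_maxima(values, block_size):
    """Split a series into consecutive full blocks and keep each block's maximum.

    A trailing incomplete block is dropped (an assumption of this sketch).
    """
    return [max(values[i:i + block_size])
            for i in range(0, len(values) - block_size + 1, block_size)]

def gev_return_level(mu, sigma, xi, period):
    """Level exceeded on average once every `period` blocks under a GEV model.

    mu, sigma, xi are the location, scale, and shape parameters (assumed given).
    """
    y = -math.log(1.0 - 1.0 / period)
    if abs(xi) < 1e-12:
        # Gumbel limit of the GEV as the shape parameter tends to zero
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

# Hypothetical daily readings grouped into blocks of three days
daily = [31, 55, 42, 77, 38, 49, 90]
print(block_maxima(daily, 3))
print(gev_return_level(mu=60.0, sigma=15.0, xi=0.1, period=20))
```

A heavier tail (larger `xi`) pushes the return level up for long periods, which is where the parametric and kernel-based tail estimates would be expected to differ most.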

Identification of the associations between genes and quantitative traits using entropy-based kernel density estimation

  • Yee, Jaeyong;Park, Taesung;Park, Mira
    • Genomics & Informatics, v.20 no.2, pp.17.1-17.11, 2022
  • Genetic associations have been quantified using a number of statistical measures. Entropy-based mutual information may be one of the more direct ways of estimating the association, in the sense that it does not depend on the parametrization. For this purpose, both the entropy and conditional entropy of the phenotype distribution should be obtained. Quantitative traits, however, do not usually allow an exact evaluation of entropy. The estimation of entropy needs a probability density function, which can be approximated by kernel density estimation. We have investigated the proper sequence of procedures for combining the kernel density estimation and entropy estimation with a probability density function in order to calculate mutual information. Genotypes and their interactions were constructed to set the conditions for conditional entropy. Extensive simulation data created using three types of generating functions were analyzed using two different kernels as well as two types of multifactor dimensionality reduction and another probability density approximation method called m-spacing. The statistical power in terms of correct detection rates was compared. Using kernels was found to be most useful when the trait distributions were more complex than simple normal or gamma distributions. A full-scale genomic dataset was explored to identify associations using the 2-h oral glucose tolerance test results and γ-glutamyl transpeptidase levels as phenotypes. Clearly distinguishable single-nucleotide polymorphisms (SNPs) and interacting SNP pairs associated with these phenotypes were found and listed with empirical p-values.
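
The idea of combining kernel density estimation with entropy estimation to obtain mutual information can be sketched as below. This is only an illustration under stated assumptions, not the procedure evaluated in the paper: it uses a Gaussian kernel, a normal-reference bandwidth (`1.06 * sd * n^(-1/5)`), and a resubstitution entropy estimate, and all function names are hypothetical. Each genotype group needs at least two distinct trait values.

```python
import math
from statistics import stdev

def gaussian_kde(sample, h):
    """Return a function that evaluates the Gaussian kernel density estimate."""
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)
    return pdf

def kde_entropy(sample):
    """Resubstitution entropy estimate: H = -mean(log f_hat(x_i))."""
    h = 1.06 * stdev(sample) * len(sample) ** (-0.2)  # assumed bandwidth rule
    pdf = gaussian_kde(sample, h)
    return -sum(math.log(pdf(x)) for x in sample) / len(sample)

def mutual_information(trait, genotype):
    """I(trait; genotype) = H(trait) - sum_g p(g) * H(trait | genotype = g)."""
    groups = {}
    for x, g in zip(trait, genotype):
        groups.setdefault(g, []).append(x)
    conditional = sum(len(xs) / len(trait) * kde_entropy(xs)
                      for xs in groups.values())
    return kde_entropy(trait) - conditional
```

Because both entropies are estimated, the result can be slightly negative when the trait and genotype are unrelated; significance would normally be judged against a permutation or other empirical null rather than against zero.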

Comparison Study of Kernel Density Estimation according to Various Bandwidth Selectors (다양한 대역폭 선택법에 따른 커널밀도추정의 비교 연구)

  • Kang, Young-Jin;Noh, Yoojeong
    • Journal of the Computational Structural Engineering Institute of Korea, v.32 no.3, pp.173-181, 2019
  • Kernel density estimation (KDE) is commonly used to estimate a probability distribution function from experimental data when the data are insufficient. The distribution estimated by KDE depends on the bandwidth selector, which can oversmooth the kernel estimator or overfit it to the experimental data. In this study, various bandwidth selectors, such as Silverman's rule of thumb, a rule using adaptive estimates, and the oversmoothing rule, were compared for accuracy and conservativeness. To this end, statistical simulations were carried out using assumed true models, including unimodal and multimodal distributions, and the accuracy and conservativeness of the estimated distribution functions were compared across various data sets. In addition, simple reliability examples were used to verify how the distributions estimated by KDE with different bandwidth selectors affect reliability analysis results.
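
As a rough illustration of how a bandwidth selector feeds into KDE, the sketch below implements Silverman's rule of thumb in its common form, `h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5)`, together with a Gaussian kernel estimator. The function names are hypothetical, and the other selectors compared in the paper are not reproduced here.

```python
import math
from statistics import quantiles, stdev

def silverman_bandwidth(sample):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(sample)
    q1, _, q3 = quantiles(sample, n=4)  # quartiles of the sample
    spread = min(stdev(sample), (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)

def kde(sample, h, x):
    """Gaussian kernel density estimate at point x with bandwidth h."""
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

data = [1.0, 2.0, 2.5, 3.0, 4.5, 5.0]  # hypothetical measurements
h = silverman_bandwidth(data)
# A smaller bandwidth follows the data more closely; a larger one smooths more.
for scale in (0.3, 1.0, 3.0):
    print(scale, kde(data, scale * h, 2.5))
```

Scaling `h` down makes the estimate spiky around individual observations, while scaling it up flattens the estimate, which is the accuracy versus conservativeness trade-off the abstract refers to.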

A Berry-Esseen Type Bound in Kernel Density Estimation for a Random Left-Truncation Model

  • Asghari, P.;Fakoor, V.;Sarmad, M.
    • Communications for Statistical Applications and Methods, v.21 no.2, pp.115-124, 2014
  • In this paper we derive a Berry-Esseen type bound for the kernel density estimator under a random left-truncation model, in which each datum $Y$ is randomly left truncated and is sampled only if $Y \geq T$, where $T$ is the truncation random variable with an unknown distribution. This unknown distribution is estimated with the Lynden-Bell estimator. In particular, the normal approximation rate, for a suitable choice of the bandwidth, is shown to be close to $n^{-1/6}$ up to a logarithmic term. We also investigate this normal approximation rate via a simulation study.

Minimum Distance Estimation Based On The Kernels For U-Statistics

  • Park, Hyo-Il
    • Journal of the Korean Statistical Society, v.27 no.1, pp.113-132, 1998
  • In this paper, we consider minimum distance (M.D.) estimation based on kernels for U-statistics. We use a Cramér-von Mises type distance function, which measures the discrepancy between the U-empirical distribution function (d.f.) and the modeled d.f. of the kernel. In the distance function, we allow various integrating measures, which can be finite, $\sigma$-finite, or discrete. We then derive the asymptotic normality and study the qualitative robustness of the M.D. estimates.


A kernel machine for estimation of mean and volatility functions

  • Shim, Joo-Yong;Park, Hye-Jung;Hwang, Chang-Ha
    • Journal of the Korean Data and Information Science Society, v.20 no.5, pp.905-912, 2009
  • We propose a doubly penalized kernel machine (DPKM), which uses a heteroscedastic location-scale model as its basic model and estimates the mean and volatility functions simultaneously by kernel machines. We also present a model selection method that employs generalized approximate cross validation to choose the hyperparameters that affect the performance of DPKM. Artificial examples are provided to illustrate the usefulness of DPKM for estimating the mean and volatility functions.


On the Equality of Two Distributions Based on Nonparametric Kernel Density Estimator

  • Kim, Dae-Hak;Oh, Kwang-Sik
    • Journal of the Korean Data and Information Science Society, v.14 no.2, pp.247-255, 2003
  • Hypothesis testing for the equality of two distributions was considered. Nonparametric kernel density estimates were used to test the equality of the distributions, with a cross-validatory choice of bandwidth in the kernel density estimation. The sampling distribution of the test statistic was obtained by a resampling method, the bootstrap. A small-sample Monte Carlo simulation was conducted, and the empirical power of the tests was compared for a variety of distributions.
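
A bootstrap test of this general kind can be sketched as follows. The abstract does not state the test statistic, so this example assumes the integrated squared difference between the two kernel density estimates, and it swaps the cross-validatory bandwidth for a fixed normal-reference rule to stay short. All function names and default values are hypothetical.

```python
import math
import random
from statistics import stdev

def kde_on_grid(sample, h, grid):
    """Gaussian kernel density estimate evaluated at every grid point."""
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in sample)
            for g in grid]

def ise_statistic(xs, ys, grid, h):
    """Riemann-sum approximation of the integrated squared KDE difference."""
    fx, fy = kde_on_grid(xs, h, grid), kde_on_grid(ys, h, grid)
    step = grid[1] - grid[0]
    return step * sum((a - b) ** 2 for a, b in zip(fx, fy))

def bootstrap_equality_test(xs, ys, n_boot=200, grid_size=60, seed=0):
    """Bootstrap p-value for H0: both samples come from the same distribution."""
    rng = random.Random(seed)
    pooled = list(xs) + list(ys)
    # Assumed bandwidth rule (the paper uses cross-validation instead)
    h = 1.06 * stdev(pooled) * len(pooled) ** (-0.2)
    lo, hi = min(pooled) - 3 * h, max(pooled) + 3 * h
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    observed = ise_statistic(xs, ys, grid, h)
    exceed = 0
    for _ in range(n_boot):
        # Under H0 both samples are drawn with replacement from the pooled data
        bx = rng.choices(pooled, k=len(xs))
        by = rng.choices(pooled, k=len(ys))
        if ise_statistic(bx, by, grid, h) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)
```

Resampling both groups from the pooled data imposes the null hypothesis, so a small p-value indicates that the observed difference between the two density estimates is larger than resampling alone would typically produce.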


ECG Denoising by Modeling Wavelet Sub-Band Coefficients using Kernel Density Estimation

  • Ardhapurkar, Shubhada;Manthalkar, Ramchandra;Gajre, Suhas
    • Journal of Information Processing Systems, v.8 no.4, pp.669-684, 2012
  • Discrete wavelet transforms are widely used in biomedical signal processing for denoising, feature extraction, and compression. This paper presents a new denoising method based on modeling the discrete wavelet coefficients of ECG in selected sub-bands with kernel density estimation. The modeling provides a statistical distribution of information and noise. A Gaussian kernel with bounded support is used for modeling sub-band coefficients, and thresholds are estimated by placing a sliding window on a normalized cumulative density function. We evaluated this approach on offline noisy ECG records from the Cardiovascular Research Centre of the University of Glasgow and on records from the MIT-BIH Arrhythmia database. Results show that the proposed technique has a more reliable physical basis and improves the Signal-to-Noise Ratio (SNR) and Percentage RMS Difference (PRD). The morphological information of ECG signals is found to be unaffected by denoising. This is quantified by calculating the mean square error (MSE) between the feature vectors of the original and denoised signals. MSE values are less than 0.05 in most cases.

Development of MKDE-ebd for Estimation of Multivariate Probabilistic Distribution Functions (다변량 확률분포함수의 추정을 위한 MKDE-ebd 개발)

  • Kang, Young-Jin;Noh, Yoojeong;Lim, O-Kaung
    • Journal of the Computational Structural Engineering Institute of Korea, v.32 no.1, pp.55-63, 2019
  • In engineering problems, many random variables are correlated, and the correlation of input random variables has a great influence on the reliability analysis results of mechanical systems. However, correlated variables are often treated as independent or modeled by specific parametric joint distributions because joint distributions are difficult to model. In particular, when correlated data are insufficient, it becomes more difficult to model the joint distribution correctly. In this study, multivariate kernel density estimation with bounded data is proposed to estimate various types of highly nonlinear joint distributions. Since it combines the given data with bounded data, which are generated from confidence intervals of uniform distribution parameters for the given data, it is less sensitive to data quality and sample size. Thus, it yields conservative statistical modeling and reliability analysis results, and its performance is verified through statistical simulation and engineering examples.

Online Probability Density Estimation of Nonstationary Random Signal using Dynamic Bayesian Networks

  • Cho, Hyun-Cheol;Fadali, M. Sami;Lee, Kwon-Soon
    • International Journal of Control, Automation, and Systems, v.6 no.1, pp.109-118, 2008
  • We present two estimators for discrete non-Gaussian and nonstationary probability density estimation based on a dynamic Bayesian network (DBN). The first estimator is for off-line computation and consists of a DBN whose transition distribution is represented in terms of kernel functions. The estimator parameters are the weights and shifts of the kernel functions, and they are determined through a recursive learning algorithm using maximum likelihood (ML) estimation. The second estimator is a DBN whose parameters form the transition probabilities. We use an asymptotically convergent, recursive, on-line algorithm to update the parameters using observation data. The DBN calculates the state probabilities using the estimated parameters. We provide examples that demonstrate the usefulness and simplicity of the two proposed estimators.