• Title/Summary/Keyword: discrete distribution (이산형분포)


A Study on the Estimation of Confidence Intervals for Discrete Distribution

  • Kim, Dae-Hak;Oh, Kwang-Sik;Lee, Sang-Bok
    • Korean Data and Information Science Society: Conference Proceedings
    • /
    • 2003.10a
    • /
    • pp.1-11
    • /
    • 2003
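  • In general, interval estimators of a parameter are far more preferred than point estimators, and many of them are well known. For discrete distributions, however, approximate confidence intervals based on large-sample approximation theory are mostly used. In this paper, we introduce various confidence interval estimators for the parameters of the binomial and Poisson distributions, the most widely used discrete distributions. We examine not only intervals based on large-sample approximation but also intervals that remain useful for small samples, and we compare these intervals.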
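
The contrast between large-sample and small-sample intervals can be made concrete with a short Python sketch. It implements three standard binomial intervals (Wald, Wilson score, Clopper-Pearson exact) and an exact interval for a Poisson mean obtained by inverting one-sided tests. The abstract does not say which estimators the paper compares, so this selection, the bisection root finder, and the search bound in `poisson_exact` are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def wald_interval(x, n, alpha=0.05):
    """Large-sample (Wald) interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def wilson_interval(x, n, alpha=0.05):
    """Wilson score interval: inverts the score test instead of plugging in p_hat."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def _bisect(f, lo, hi, iters=200):
    """Root of an increasing function f on [lo, hi], assuming f(lo) < 0 <= f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05):
    """Exact interval obtained by inverting two one-sided binomial tests."""
    lower = 0.0 if x == 0 else _bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)  # P(X >= x) = alpha/2
    upper = 1.0 if x == n else _bisect(
        lambda p: alpha / 2 - binom_cdf(x, n, p), 0.0, 1.0)            # P(X <= x) = alpha/2
    return lower, upper


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), using the recursion P(i) = P(i - 1) * lam / i."""
    term = total = math.exp(-lam)
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def poisson_exact(x, alpha=0.05):
    """Exact interval for a Poisson mean from a single observed count x."""
    hi = x + 20 * math.sqrt(x + 1) + 20  # heuristic upper search bound (assumption)
    lower = 0.0 if x == 0 else _bisect(
        lambda m: (1 - poisson_cdf(x - 1, m)) - alpha / 2, 0.0, hi)
    upper = _bisect(lambda m: alpha / 2 - poisson_cdf(x, m), 0.0, hi)
    return lower, upper


for name, method in [("Wald", wald_interval), ("Wilson", wilson_interval),
                     ("Clopper-Pearson", clopper_pearson)]:
    lo, hi = method(0, 10)
    print(f"{name:>15}: ({lo:.4f}, {hi:.4f})")
print(f"{'Poisson exact':>15}: {poisson_exact(0)}")
```

With x = 0 successes out of n = 10, the Wald interval collapses to the single point 0, which illustrates why small-sample alternatives matter.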

Discretization of continuous-valued attributes considering data distribution (데이터 분포를 고려한 연속 값 속성의 이산화)

  • 이상훈;박정은;오경환
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2003.05a
    • /
    • pp.217-220
    • /
    • 2003
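  • In this paper, we propose a new method that converts continuous values into categorical form by considering how the values of the target attribute (class) are distributed along each attribute, without requiring any user-specified parameter. For each attribute, the distribution of the class values is mapped onto a one-dimensional space, and intervals are clustered according to criteria such as the density of each class and its degree of overlap with other classes. Each resulting cluster is based on a probabilistic measure for predicting the class, and the clusters have discretization boundaries that minimize the loss of the information each attribute provides. The improved performance of the proposed discretization method is demonstrated using the C4.5 algorithm and data from the UCI Machine Learning Data Repository.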
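
As a hypothetical illustration of parameter-free, class-aware discretization (not the density- and overlap-based clustering the authors propose), the sketch below places a cut point wherever the majority class changes between adjacent distinct attribute values. The function names and toy data are invented for this example.

```python
import bisect
from collections import Counter, defaultdict


def class_boundary_cuts(values, labels):
    """Cut points placed where the majority class changes along the sorted attribute.

    Hypothetical illustration only; NOT the clustering procedure of the paper.
    """
    counts = defaultdict(Counter)  # attribute value -> class frequencies
    for v, c in zip(values, labels):
        counts[v][c] += 1
    distinct = sorted(counts)
    majority = [counts[v].most_common(1)[0][0] for v in distinct]
    return [(distinct[i - 1] + distinct[i]) / 2
            for i in range(1, len(distinct))
            if majority[i] != majority[i - 1]]


def discretize(values, cuts):
    """Map each continuous value to the index of the interval containing it."""
    return [bisect.bisect_right(cuts, v) for v in values]


values = [1, 2, 3, 10, 11, 12, 20, 21]
labels = ["a", "a", "a", "b", "b", "b", "a", "a"]
cuts = class_boundary_cuts(values, labels)
print(cuts)                                     # [6.5, 16.0]
print(discretize([0, 5, 7, 15, 17, 30], cuts))  # [0, 0, 1, 1, 2, 2]
```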

Discretization Method for Continuous Data using Wasserstein Distance (Wasserstein 거리를 이용한 연속형 변수 이산화 기법)

  • Ha, Sang-won;Kim, Han-joon
    • Database Research
    • /
    • v.34 no.3
    • /
    • pp.159-169
    • /
    • 2018
  • Discretization of continuous variables aims to improve the performance of various algorithms, such as data mining algorithms, by transforming quantitative variables into qualitative ones. Applying an appropriate discretization technique can be expected to improve not only the performance of classification algorithms but also the accuracy and conciseness of result interpretation, as well as speed. Various discretization techniques have been studied, but further research on discretization is still needed. In this paper, we propose a new discretization technique that sets cut points using the Wasserstein distance, taking into account how the values of the continuous variable are distributed across the classes of the data. We show the superiority of the proposed method through a performance comparison with existing, well-established methods.
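
In one dimension, the Wasserstein-1 distance between two empirical distributions equals the area between their CDFs, so it is cheap to compute on sorted data. The sketch below computes it and then, purely as an assumed criterion, scores each candidate cut by the Wasserstein distance between the (numerically coded) class-label distributions on its two sides. The paper's actual cut-point rule is not described in the abstract, and `best_cut` is a hypothetical helper.

```python
import bisect


def wasserstein_1d(xs, ys):
    """W1 distance between two 1-D empirical distributions: integral of |F(t) - G(t)| dt."""
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        f = bisect.bisect_right(xs, left) / len(xs)  # empirical CDF of xs on [left, right)
        g = bisect.bisect_right(ys, left) / len(ys)  # empirical CDF of ys on [left, right)
        total += abs(f - g) * (right - left)
    return total


def best_cut(values, labels):
    """Hypothetical criterion: the single cut maximizing W1 between the
    class-label distributions on either side (labels coded as numbers)."""
    pairs = sorted(zip(values, labels))
    best_point, best_score = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot cut between equal values
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        score = wasserstein_1d(left, right)
        if score > best_score:
            best_point, best_score = (pairs[i - 1][0] + pairs[i][0]) / 2, score
    return best_point, best_score


print(wasserstein_1d([0, 1, 2], [1, 2, 3]))              # shift by 1, so about 1.0
print(best_cut([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))  # (3.5, 1.0)
```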

Combining Independent Permutation p-Values Associated with Multi-Sample Location Test Data

  • Um, Yonghwan
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.7
    • /
    • pp.175-182
    • /
    • 2020
  • Fisher's classical method for combining independent p-values from continuous distributions is widely used, but it is known to be inadequate for combining p-values from discrete probability distributions. Instead, the discrete analog of Fisher's classical method is used as an alternative for combining p-values from discrete distributions. In this paper, we first obtain p-values from discrete probability distributions associated with multi-sample location test data (Fisher-Pitman test and Kruskal-Wallis test data) by the permutation method, and then combine the permutation p-values by the discrete analog of Fisher's classical method. Finally, we compare the combined p-values obtained from the discrete analog of Fisher's classical method and from Fisher's classical method.
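
Fisher's statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom when the k p-values are continuous and uniform under H0, and for even degrees of freedom its tail probability has a closed form. For discrete p-values, an exact combined p-value can be obtained by enumerating every combination of achievable p-values and summing the null probability of those whose product is no larger than the observed product. The Python sketch below shows both. In the paper the supports would come from the permutation distributions of the Fisher-Pitman and Kruskal-Wallis statistics; here a toy binomial support is assumed, and brute-force enumeration is only practical for a few tests with small supports.

```python
import itertools
import math


def fisher_combined(pvalues):
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square(2k) under H0 for continuous p-values."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # X / 2
    # Chi-square survival function with 2k degrees of freedom (closed form for even df).
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))


def discrete_fisher_combined(observed, supports, tol=1e-12):
    """Exact combined p-value for discrete p-values by full enumeration.

    supports[i] lists (p_value, probability) pairs for every p-value that test i
    can produce under H0. Returns P(prod of P_i <= prod of observed p_i).
    """
    target = math.prod(observed)
    total = 0.0
    for combo in itertools.product(*supports):
        if math.prod(p for p, _ in combo) <= target * (1 + tol):  # tolerance for rounding
            total += math.prod(w for _, w in combo)
    return total


# Toy support: upper-tail p-values P(X >= x) of X ~ Binomial(3, 1/2), x = 0..3.
support = [(1.0, 1 / 8), (7 / 8, 3 / 8), (1 / 2, 3 / 8), (1 / 8, 1 / 8)]
obs = [1 / 8, 1 / 8]
print(fisher_combined(obs))                               # about 0.0806
print(discrete_fisher_combined(obs, [support, support]))  # 0.015625 (= 1/64)
```

In this toy case the continuous approximation is roughly five times larger than the exact discrete value, which is the kind of conservativeness that motivates the discrete analog.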

Criterion of discrete unimodal mixtures (이산분포 혼합의 단봉성이 성립하기 위한 조건)

  • 최대우
    • The Korean Journal of Applied Statistics
    • /
    • v.8 no.1
    • /
    • pp.159-167
    • /
    • 1995
  • When a particular discrete distribution from the exponential family is regarded as a sequence indexed by its support points, the sequence is unimodal in a certain sense. In this paper, we study under what conditions the mixture of such a discrete distribution with respect to a parameter is unimodal. We derive the maximal parameter interval over which every mixture of such a discrete distribution, e.g., the binomial or Poisson, is unimodal.
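
Whether a particular mixture is unimodal can be checked numerically by testing that its probability mass function rises and then falls. The sketch below does this for finite two-component Poisson mixtures, a special case of mixing over the parameter; it only checks individual examples and does not reproduce the paper's derivation of the maximal parameter interval. The truncation point `kmax = 60` is an assumption that is ample for the means used here.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def mixture_pmf(weights, lams, kmax):
    """pmf of sum_j w_j * Poisson(lam_j) evaluated at k = 0..kmax."""
    return [sum(w * poisson_pmf(k, lam) for w, lam in zip(weights, lams))
            for k in range(kmax + 1)]


def is_unimodal(seq, rel_tol=1e-12):
    """True if seq never rises again after it has started to fall (ties allowed)."""
    falling = False
    for prev, cur in zip(seq, seq[1:]):
        if cur > prev * (1 + rel_tol):    # strict rise
            if falling:
                return False
        elif cur < prev * (1 - rel_tol):  # strict fall
            falling = True
    return True


for w, lam in [([1.0], [3.0]), ([0.5, 0.5], [3.0, 4.0]), ([0.5, 0.5], [1.0, 10.0])]:
    print(lam, is_unimodal(mixture_pmf(w, lam, kmax=60)))
```

The equal-weight mixture of Poisson(3) and Poisson(4) stays unimodal, while mixing Poisson(1) and Poisson(10) produces two modes.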

Bootstrap Adjustment of the Misclassification Probability in Kernel Discriminant Analysis (커널 판별분석의 오분류확률에 대한 붓스트랩 조정)

  • 백장선
    • Communications for Statistical Applications and Methods
    • /
    • v.2 no.2
    • /
    • pp.249-265
    • /
    • 1995
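  • In this paper, when a new observation is to be classified into one of two populations whose probability distributions are unknown, the threshold of the kernel discriminant function is determined so that the misclassification probability meets a level specified in advance by the analyst. The threshold that achieves the specified misclassification probability is computed by applying the bootstrap technique to the discriminant function. Because the proposed method makes no parametric assumptions about the populations, it is applicable to any distribution; its performance was verified through simulations in which the populations followed normal distributions, lognormal distributions, and distributions mixing discrete and continuous variables.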
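
The bootstrap idea can be sketched in Python for one-dimensional data: a kernel discriminant score log f1(x) - log f2(x) is computed, and a bootstrap estimates the threshold c below which only a fraction alpha of population-1 cases fall, so that x is assigned to population 1 when its score is at least c. The Gaussian kernel, the fixed bandwidth `h = 0.5`, the out-of-bag scoring, and the simulated normal data are all assumptions for illustration, not the paper's exact procedure.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def kde(sample, h):
    """1-D Gaussian kernel density estimate with fixed bandwidth h."""
    n = len(sample)
    return lambda x: sum(_STD.pdf((x - s) / h) for s in sample) / (n * h)


def score(x, f1, f2, eps=1e-300):
    """Kernel discriminant score log f1(x) - log f2(x); eps guards against log(0)."""
    return math.log(f1(x) + eps) - math.log(f2(x) + eps)


def bootstrap_threshold(s1, s2, alpha=0.05, h=0.5, n_boot=50, seed=0):
    """Threshold c with estimated P(score < c | population 1) close to alpha.

    Each replicate refits both KDEs on bootstrap resamples and scores the
    population-1 points left out of the resample (out-of-bag); c is the
    alpha-quantile of all pooled out-of-bag scores.
    """
    rng = random.Random(seed)
    oob = []
    for _ in range(n_boot):
        idx = [rng.randrange(len(s1)) for _ in s1]
        f1 = kde([s1[i] for i in idx], h)
        f2 = kde([rng.choice(s2) for _ in s2], h)
        in_bag = set(idx)
        oob.extend(score(s1[i], f1, f2) for i in range(len(s1)) if i not in in_bag)
    oob.sort()
    return oob[int(alpha * len(oob))]


data_rng = random.Random(1)
s1 = [data_rng.gauss(0.0, 1.0) for _ in range(40)]  # simulated population 1
s2 = [data_rng.gauss(3.0, 1.0) for _ in range(40)]  # simulated population 2
c = bootstrap_threshold(s1, s2, alpha=0.05)
f1, f2 = kde(s1, 0.5), kde(s2, 0.5)
print("threshold:", round(c, 3))
print("population-1 training points below c:", sum(score(x, f1, f2) < c for x in s1))
```

Out-of-bag scoring is used here because scoring a point with a density estimate that contains the point itself would make the estimated misclassification rate optimistic.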

Estimating the Moments of the Project Completion Time in Project Networks (프로젝트 네트워크에서 사업완성시간의 적률 추정)

  • Cho, Jae-Gyeun
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.22 no.1
    • /
    • pp.61-67
    • /
    • 2017
  • In project network analysis, a fundamental problem is estimating the distribution function of the project completion time. In this paper, we propose a method for evaluating the moments (mean, variance, skewness, kurtosis) of the project completion time under the assumption that activity durations are independent and normally distributed. The proposed method uses discretization to replace the continuous probability density function (pdf) of each activity duration with a discrete pdf, combined with random number generation. The method is easy to apply to large project networks, and computational results indicate that its accuracy is comparable to that of direct Monte Carlo simulation.
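
A minimal sketch of the discretize-then-sample idea follows. It assumes a three-point discretization of each normal duration (nodes mu and mu ± sqrt(3)·sigma with probabilities 1/6, 2/3, 1/6, which reproduce the normal's mean, variance, skewness, and kurtosis) and a tiny hypothetical network whose paths are listed by hand; the paper's actual discretization scheme and network handling may differ.

```python
import random
import statistics

SQRT3 = 3 ** 0.5


def discretize_normal(mu, sigma):
    """Three-point discretization of N(mu, sigma^2): nodes mu, mu +/- sqrt(3)*sigma.

    Probabilities 1/6, 2/3, 1/6 match the normal's mean, variance,
    skewness (0) and kurtosis (3).
    """
    return [mu - SQRT3 * sigma, mu, mu + SQRT3 * sigma], [1 / 6, 2 / 3, 1 / 6]


def completion_time(durations, paths):
    """Completion time = length of the longest path (paths listed explicitly)."""
    return max(sum(durations[a] for a in path) for path in paths)


def moments(xs):
    """Mean, variance, skewness and (non-excess) kurtosis of a sample."""
    m = statistics.fmean(xs)
    var = statistics.fmean([(x - m) ** 2 for x in xs])
    sd = var ** 0.5
    skew = statistics.fmean([(x - m) ** 3 for x in xs]) / sd ** 3
    kurt = statistics.fmean([(x - m) ** 4 for x in xs]) / sd ** 4
    return m, var, skew, kurt


def simulate(activities, paths, n=20000, seed=0):
    """activities maps name -> (mu, sigma); durations are drawn from the discrete pmfs."""
    rng = random.Random(seed)
    pmfs = {a: discretize_normal(mu, s) for a, (mu, s) in activities.items()}
    times = []
    for _ in range(n):
        d = {a: rng.choices(vals, weights=probs)[0] for a, (vals, probs) in pmfs.items()}
        times.append(completion_time(d, paths))
    return moments(times)


# Hypothetical network: A -> B in series, running in parallel with C.
project = {"A": (5.0, 1.0), "B": (3.0, 1.0), "C": (7.0, 2.0)}
print(simulate(project, [["A", "B"], ["C"]]))
```

Even though every duration is symmetric, the maximum over parallel paths generally makes the completion time skewed, which is why the higher moments are worth estimating.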

Multi-fidelity Data-fusion for Improving Strain accuracy using Optical Fiber Sensors (이종 광섬유 센서 데이터 융합을 통한 변형률 정확도 향상 기법)

  • Park, Young-Soo;Jin, Seung-Seop;Yoo, Chul-Hwan;Kim, Sungtae;Park, Young-Hwan
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.40 no.6
    • /
    • pp.547-553
    • /
    • 2020
  • As the number of aging infrastructure facilities increases over time, efficient maintenance becomes more important, and accurate sensor responses are a prerequisite. Among the various responses, strain is commonly used to detect damage such as cracking and fatigue. Optical fiber sensors are a promising technique for measuring strain, offering high durability, immunity to electrical noise, and long transmission distances. A Fiber Bragg Grating (FBG) is a point sensor that measures strain from the signals reflected by the grating, whereas Brillouin Optic Correlation Domain Analysis (BOCDA) is a distributed sensor that measures strain along the optical fiber from scattered signals. Although the FBG provides highly accurate and reproducible signals, the number of sensing points is limited. The BOCDA, on the other hand, can measure quasi-continuous strain along the optical fiber, but its measured signals have low accuracy and reproducibility. This paper proposes a multi-fidelity data-fusion method based on Gaussian Process Regression that improves the fidelity of the strain distribution by combining the advantages of both systems. The proposed method was evaluated in a laboratory test, and the results show that it is a promising way to improve the fidelity of strain measurements.
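
A simplified two-level fusion can illustrate the idea: a Gaussian process is fit to the dense low-fidelity (BOCDA-like) readings, a scale and offset map it onto the accurate point (FBG-like) readings, and a second Gaussian process models the remaining discrepancy. This autoregressive-style structure, the fixed squared-exponential hyperparameters, and the synthetic strain profile are assumptions for illustration; the paper's own formulation is not given in the abstract. The sketch uses NumPy.

```python
import numpy as np


def rbf(a, b, length=0.15, var=1.0):
    """Squared-exponential covariance between 1-D location arrays a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)


def gp_mean(x_tr, y_tr, x_te, noise, length=0.15, var=1.0):
    """Posterior mean of zero-mean GP regression with fixed (assumed) hyperparameters."""
    K = rbf(x_tr, x_tr, length, var) + noise * np.eye(len(x_tr))
    return rbf(x_te, x_tr, length, var) @ np.linalg.solve(K, y_tr)


def fuse(x_lo, y_lo, x_hi, y_hi, x_te):
    """Two-level fusion: high(x) ~ rho * GP_low(x) + b + GP_delta(x)."""
    lo_at_hi = gp_mean(x_lo, y_lo, x_hi, noise=0.01)      # low-fidelity trend at point sensors
    A = np.column_stack([lo_at_hi, np.ones_like(lo_at_hi)])
    (rho, b), *_ = np.linalg.lstsq(A, y_hi, rcond=None)  # scale and offset between fidelities
    resid = y_hi - (rho * lo_at_hi + b)                   # discrepancy at point sensors
    delta = gp_mean(x_hi, resid, x_te, noise=1e-4, var=0.01)
    return rho * gp_mean(x_lo, y_lo, x_te, noise=0.01) + b + delta


def truth(x):
    """Invented strain profile used only for this demo."""
    return np.sin(2 * np.pi * x)


# Synthetic demo: dense, biased, noisy "distributed" readings plus a few accurate "point" readings.
rng = np.random.default_rng(0)
x_lo = np.linspace(0.0, 1.0, 50)
y_lo = truth(x_lo) + 0.1 + rng.normal(0.0, 0.1, x_lo.size)
x_hi = np.linspace(0.05, 0.95, 7)
y_hi = truth(x_hi)
x_te = np.linspace(0.0, 1.0, 101)


def rmse(pred):
    return float(np.sqrt(np.mean((pred - truth(x_te)) ** 2)))


print("low-fidelity GP only:", round(rmse(gp_mean(x_lo, y_lo, x_te, noise=0.01)), 3))
print("fused:               ", round(rmse(fuse(x_lo, y_lo, x_hi, y_hi, x_te)), 3))
```

Fixing the hyperparameters keeps the example short; in practice they would usually be estimated, for example by maximizing the marginal likelihood.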

Estimating the Moments of the Project Completion Time in Stochastic Activity Networks: General Distributions for Activity Durations (확률적 활동 네트워크에서 사업완성시간의 적률 추정: 활동시간의 일반적 분포)

  • Cho, Jae-Gyeun
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.23 no.3
    • /
    • pp.49-57
    • /
    • 2018
  • In a previous article on stochastic activity network analysis, Cho proposed a method for estimating the moments (mean, variance, skewness, kurtosis) of the project completion time under the assumption that activity durations are independent and normally distributed. The present article develops a method for estimating those moments in stochastic activity networks that allow any type of distribution for activity durations. The proposed method uses a moment-matching approach to discretize the distribution function of each activity duration, and then a discrete inverse-transform method to generate the activity durations used to calculate the project completion time. The proposed method can be easily applied to large activity networks, is computationally more efficient than Monte Carlo simulation, and achieves comparable accuracy.
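
As one possible moment-matching discretization (an assumption; the abstract does not state how many points or moments the paper matches), a two-point distribution can reproduce any mean, standard deviation, and skewness γ. Writing the points as a standardized Bernoulli variable, whose skewness is (1 - 2p)/sqrt(p(1 - p)), and solving for p gives p = (1 - γ/sqrt(γ² + 4))/2. Durations are then drawn by the discrete inverse-transform method: generate U ~ Uniform(0, 1) and return the first point whose cumulative probability is at least U.

```python
import bisect
import itertools
import math
import random


def two_point_moment_match(mean, sd, skew):
    """Two-point distribution matching the mean, variance and skewness of any distribution.

    Standardized Bernoulli: sqrt((1-p)/p) w.p. p and -sqrt(p/(1-p)) w.p. 1-p,
    whose skewness is (1 - 2p)/sqrt(p(1-p)); solving for p gives the formula below.
    """
    p = 0.5 * (1 - skew / math.sqrt(skew * skew + 4))
    hi = mean + sd * math.sqrt((1 - p) / p)
    lo = mean - sd * math.sqrt(p / (1 - p))
    return [lo, hi], [1 - p, p]


def discrete_inverse_transform(values, probs, rng):
    """Return values[i] for the first i whose cumulative probability is >= U ~ Uniform(0, 1)."""
    cum = list(itertools.accumulate(probs))
    i = bisect.bisect_left(cum, rng.random())
    return values[min(i, len(values) - 1)]  # guard against rounding in cum[-1]


# Exponential(1) activity duration: mean 1, sd 1, skewness 2.
values, probs = two_point_moment_match(1.0, 1.0, 2.0)
rng = random.Random(0)
draws = [discrete_inverse_transform(values, probs, rng) for _ in range(20000)]
print([round(v, 4) for v in values], [round(q, 4) for q in probs])  # [0.5858, 3.4142] [0.8536, 0.1464]
print(sum(draws) / len(draws))
```

For an exponential(1) duration the two support points are 2 - √2 and 2 + √2; the sampled durations would then feed the longest-path calculation of the project completion time.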

Bayesian Analysis for Categorical Data with Missing Traits Under a Multivariate Threshold Animal Model (다형질 Threshold 개체모형에서 Missing 기록을 포함한 이산형 자료에 대한 Bayesian 분석)

  • Lee, Deuk-Hwan
    • Journal of Animal Science and Technology
    • /
    • v.44 no.2
    • /
    • pp.151-164
    • /
    • 2002
  • Genetic variance and covariance components of linear traits and of ordered categorical traits, which are usually observed as dichotomous or polychotomous outcomes, were estimated simultaneously in a multivariate threshold animal model, using the concept of arbitrary underlying liability scales and Bayesian inference via Gibbs sampling algorithms. The multivariate threshold animal model in this study allows any combination of missing traits while assuming correlations among the traits considered. Gibbs sampling, as a form of hierarchical Bayesian inference, was used to obtain reliable point estimates, taken to be the marginal posterior means of the parameters. The main point of this study is that the underlying values sampled for the categorical observations in the previous iteration, together with the observations on the continuous traits, can be used to sample the underlying values for categorical data and for continuous data with missing records in the current cycle (see appendix). This study also showed that the underlying variables for missing categorical data should be generated taking the correlated traits into account, so as to satisfy the fully conditional posterior distributions of the parameters, although some papers (Wang et al., 1997; VanTassell et al., 1998) generated only the residual effects of missing traits in the same situation. In the present study, Gibbs samplers for fully Bayesian inference on the unknown parameters of interest are combined with methodologies that accommodate any combination of linear and categorical traits with missing observations. Moreover, two kinds of constraints that guarantee identifiability of the arbitrary underlying variables are presented while preserving the fully conditional posterior distributions of those parameters.
A numerical example, based on a simulation study, is provided for a threshold animal model that includes maternal and permanent environmental effects on a multiple ordered categorical trait (calving ease), a binary trait (non-return rate), and a normally distributed trait (birth weight).
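
The core of Gibbs sampling in a threshold model is data augmentation: each categorical record is replaced by a latent liability drawn from a normal distribution truncated to the interval implied by the observed category, and the location parameters are then drawn given those liabilities. The Python sketch below shows only this step for a single binary trait with an overall mean, unit residual variance (an identifiability constraint), and a flat prior. The genetic, maternal, and permanent environmental effects, the ordered categories, and the missing-trait handling of the paper's multivariate animal model are all omitted, and the data are invented.

```python
import random
from statistics import NormalDist, fmean

STD = NormalDist()


def sample_liability(mu, cut, y, rng):
    """Draw l ~ N(mu, 1) truncated to l > 0 if y == 1, or l <= 0 if y == 0 (inverse CDF)."""
    u = rng.uniform(cut, 1.0) if y == 1 else rng.uniform(0.0, cut)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf strictly inside (0, 1)
    return mu + STD.inv_cdf(u)


def gibbs_threshold(y, n_iter=1000, burn_in=100, seed=0):
    """Gibbs sampler for one binary trait: l_i ~ N(mu, 1), y_i = 1 if l_i > 0.

    With a flat prior on mu, the full conditional is mu | l ~ N(mean(l), 1/n).
    Returns the posterior mean of mu estimated from post-burn-in draws.
    """
    rng = random.Random(seed)
    n, mu, draws = len(y), 0.0, []
    for it in range(n_iter):
        cut = STD.cdf(-mu)  # P(l <= 0 | mu) on the uniform scale
        liab = [sample_liability(mu, cut, yi, rng) for yi in y]
        mu = rng.gauss(fmean(liab), (1.0 / n) ** 0.5)
        if it >= burn_in:
            draws.append(mu)
    return fmean(draws)


y = [1] * 207 + [0] * 93  # invented binary records, 69% "successes"
print(round(gibbs_threshold(y), 3), "vs probit MLE", round(STD.inv_cdf(207 / 300), 3))
```

For this data set the posterior mean should land close to the probit maximum-likelihood estimate Φ⁻¹(0.69) ≈ 0.496.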