• Title/Summary/Keyword: data augmentation

ON BAYESIAN ESTIMATION AND PROPERTIES OF THE MARGINAL DISTRIBUTION OF A TRUNCATED BIVARIATE t-DISTRIBUTION

  • KIM HEA-JUNG;KIM Ju SUNG
    • Journal of the Korean Statistical Society, v.34 no.3, pp.245-261, 2005
  • The marginal distribution of X is considered when (X, Y) has a truncated bivariate t-distribution. This paper mainly focuses on the marginal nontruncated distribution of X where Y is truncated below at its mean and its observations are not available. Several properties and applications of this distribution, including its relationship with Azzalini's skew-normal distribution, are obtained. To circumvent the inferential problems that arise from adopting the frequentist approach, a Bayesian method utilizing data augmentation is suggested. Illustrative examples demonstrate the performance of the method.

A Bayesian Approach to Detecting Outliers Using Variance-Inflation Model

  • Lee, Sangjeen;Chung, Younshik
    • Communications for Statistical Applications and Methods, v.8 no.3, pp.805-814, 2001
  • The problem of 'outliers', observations which look suspicious in some way, has long been one of the main concerns of experimenters and data analysts. We propose a model for the outlier problem and analyze it in the linear regression model using a Bayesian approach with the variance-inflation model. We use Geweke's (1996) ideas, which are based on the data augmentation method, to detect outliers in the linear regression model. The advantage of the proposed method is that it finds the subset of data that is most suspicious under the given model by means of the posterior probability. A sampling-based approach can be used to carry out the complicated Bayesian computation. Finally, the proposed methodology is applied to simulated and real data.

Fully Efficient Fractional Imputation for Incomplete Contingency Tables

  • Kang, Shin-Soo
    • Journal of the Korean Data and Information Science Society, v.15 no.4, pp.993-1002, 2004
  • Imputation procedures such as fully efficient fractional imputation (FEFI) or multiple imputation (MI) can be used to construct complete contingency tables from samples with partially classified responses. Variances of FEFI estimators of population proportions are derived. Simulation results, when data are missing completely at random, reveal that FEFI provides more efficient estimates of population proportions than either MI based on data augmentation or complete-case (CC) analysis, but neither FEFI nor MI provides an improvement over CC analysis with respect to the accuracy of estimation of some parameters for association between two variables, such as $\theta_{i+}\theta_{+j}-\theta_{ij}$ and the log odds-ratio.

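The two association parameters named in the abstract above can be made concrete with a small calculation. The following is a minimal sketch, not the paper's procedure: it assumes a fully classified 2x2 table of counts (the numbers are hypothetical), estimates the cell probabilities $\theta_{ij}$ by simple proportions, and then evaluates the deviation from independence $\theta_{i+}\theta_{+j}-\theta_{ij}$ and the log odds-ratio. The function name `association_measures` is an illustrative choice.

```python
import math

def association_measures(counts):
    """Return (deviation-from-independence table, log odds-ratio) for 2x2 counts."""
    total = sum(sum(row) for row in counts)
    # Cell probabilities theta_ij estimated as simple proportions.
    theta = [[c / total for c in row] for row in counts]
    # Row margins theta_{i+} and column margins theta_{+j}.
    row_marg = [sum(row) for row in theta]
    col_marg = [theta[0][j] + theta[1][j] for j in range(2)]
    # theta_{i+} * theta_{+j} - theta_{ij}; all zeros under independence.
    deviation = [[row_marg[i] * col_marg[j] - theta[i][j] for j in range(2)]
                 for i in range(2)]
    # log( theta_11 * theta_22 / (theta_12 * theta_21) ); zero under independence.
    log_odds_ratio = math.log((theta[0][0] * theta[1][1]) /
                              (theta[0][1] * theta[1][0]))
    return deviation, log_odds_ratio

# Hypothetical counts, for illustration only.
example_counts = [[30, 10], [20, 40]]
deviation, log_or = association_measures(example_counts)
```

Both quantities are zero when the two variables are independent, which is why they serve as measures of association.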

Using Bayesian Estimation Technique to Analyze a Dichotomous Choice Contingent Valuation Data (베이지안 추정법을 이용한 양분선택형 조건부 가치측정모형의 분석)

  • Yoo, Seung-Hoon
    • Environmental and Resource Economics Review, v.11 no.1, pp.99-119, 2002
  • As an alternative to the classical maximum likelihood approach for analyzing dichotomous choice contingent valuation (DCCV) data, this paper develops a Bayesian approach. By using Gibbs sampling and data augmentation, the approach enables one to perform exact inference for DCCV models. A by-product of the approach is a welfare measure, such as the mean willingness to pay, together with its confidence interval, which can be used for policy analysis. The efficacy of the approach relative to the classical approach is discussed in the context of empirical DCCV studies. It is concluded that there appears to be considerable scope for the use of Bayesian analysis in dealing with DCCV data.

Short utterance speaker verification using PLDA model adaptation and data augmentation (PLDA 모델 적응과 데이터 증강을 이용한 짧은 발화 화자검증)

  • Yoon, Sung-Wook;Kwon, Oh-Wook
    • Phonetics and Speech Sciences, v.9 no.2, pp.85-94, 2017
  • Conventional speaker verification systems using a time delay neural network, identity vectors, and probabilistic linear discriminant analysis (TDNN-Ivector-PLDA) are known to be very effective for verifying long-duration speech utterances. However, when test utterances are of short duration, the duration mismatch between enrollment and test utterances significantly degrades the performance of TDNN-Ivector-PLDA systems. To compensate for the I-vector mismatch between long and short utterances, this paper proposes to use PLDA model adaptation with augmented data. A PLDA model is trained on a vast amount of speech data, most of which is of long duration. Then, the PLDA model is adapted with the I-vectors obtained from short-utterance data that are augmented by using vocal tract length perturbation (VTLP). In computer experiments using the NIST SRE 2008 database, the proposed method is shown to achieve significantly better performance than the conventional TDNN-Ivector-PLDA systems when there is a duration mismatch between enrollment and test utterances.

MLE for Incomplete Contingency Tables with Lagrangian Multiplier

  • Kang, Shin-Soo
    • Journal of the Korean Data and Information Science Society, v.17 no.3, pp.919-925, 2006
  • The maximum likelihood estimate (MLE) is obtained from the partial log-likelihood function for the cell probabilities of two-way incomplete contingency tables proposed by Chen and Fienberg (1974). The partial log-likelihood function is modified by adding a Lagrangian multiplier so that constraints can be incorporated. Variances of the MLE estimators of population proportions are derived from the matrix of second derivatives of the log-likelihood with respect to the cell probabilities. Simulation results, when data are missing at random (MAR), reveal that complete-case (CC) analysis produces biased estimates of joint probabilities under MAR and is less efficient than either MLE or multiple imputation (MI). MLE and MI provide consistent results under the MAR situation. MLE provides more efficient estimates of population proportions than either MI based on data augmentation or CC analysis. The standard errors of the MLE from the proposed method using the Lagrangian multiplier are valid and have less variation than the standard errors from MI and CC.

Comparison of CNN Structures for Detection of Surface Defects (표면 결함 검출을 위한 CNN 구조의 비교)

  • Choi, Hakyoung;Seo, Kisung
    • The Transactions of The Korean Institute of Electrical Engineers, v.66 no.7, pp.1100-1104, 2017
  • A detector-based approach shows limited performance when inspecting defects such as shallow fine cracks and defects that are indistinguishable from the background. Deep learning is widely used for object recognition, and its application to defect detection has gradually been attempted. Deep learning requires a large amount of training data, but data acquisition can be limited in some industrial applications. The possibility of applying a CNN, one of the deep learning approaches, to surface defect inspection is investigated for industrial parts whose defects are difficult to detect and for which training data are insufficient. VOV is adopted for pre-processing and to obtain a reasonable number of ROIs for data augmentation. Then the CNN method is applied for classification. Three CNN networks, AlexNet, VGGNet, and a modified VGGNet, are compared in defect detection experiments.

An Improved Deep Learning Method for Animal Images (동물 이미지를 위한 향상된 딥러닝 학습)

  • Wang, Guangxing;Shin, Seong-Yoon;Shin, Kwang-Weong;Lee, Hyun-Chang
    • Proceedings of the Korean Society of Computer Information Conference, 2019.01a, pp.123-124, 2019
  • This paper proposes an improved deep learning method based on small data sets for animal image classification. First, we use a CNN to build a training model for small data sets, and use data augmentation to expand the data samples of the training set. Second, using a network pre-trained on large-scale datasets, such as VGG16, the bottleneck features of the small dataset are extracted and stored in two NumPy files as the new training and test datasets. Finally, a fully connected network is trained with the new datasets. In this paper, we use Kaggle's well-known Dogs vs. Cats dataset, a two-category classification dataset, as the experimental dataset.

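The abstract above uses data augmentation to enlarge a small image training set but does not say which transformations were applied. As a minimal sketch, assuming horizontal flipping as the (swapped-in) augmentation and representing an image as a nested list of pixel rows, the idea can be illustrated as follows. The names `horizontal_flip` and `augment` and the tiny example image are hypothetical.

```python
def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def augment(images, labels):
    """Return the original samples followed by one flipped copy of each.

    A flipped image keeps the label of the image it was derived from.
    """
    aug_images = list(images)
    aug_labels = list(labels)
    for image, label in zip(images, labels):
        aug_images.append(horizontal_flip(image))
        aug_labels.append(label)
    return aug_images, aug_labels

# Hypothetical 2x3 single-channel image, for illustration only.
tiny_images = [[[1, 2, 3],
                [4, 5, 6]]]
tiny_labels = ["cat"]
aug_images, aug_labels = augment(tiny_images, tiny_labels)
```

Each call doubles the number of training samples while leaving the originals untouched. In practice a deep learning framework's own augmentation utilities would normally be used instead of hand-written transforms.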

Predicting Blood Glucose Data and Ensuring Data Integrity Based on Artificial Intelligence (인공지능 기반 혈당 데이터 예측 및 데이터 무결성 보장 연구)

  • Lee, Tae Kang
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2022.10a, pp.201-203, 2022
  • Over the past five years, the number of patients treated for diabetes has increased by 27.7% to 3.22 million. Because blood sugar is still checked through finger blood collection, continuous blood glucose measurement and confirmation of blood sugar peaks are difficult and painful. To solve this problem, artificial intelligence technology is used to provide diabetics with three months of predicted blood sugar data based on blood sugar data measured over 14 days.

Development of MATLAB GUI Based Software for Analysis of KASS Availability Performance (KASS 가용성 성능 평가를 위한 MATLAB GUI 기반 소프트웨어 설계)

  • Choi, Bong-kwan;Han, Deok-hwa;Kim, Dong-uk;Kim, Jung-beom;Kee, Chang-don
    • Journal of Advanced Navigation Technology, v.22 no.5, pp.384-390, 2018
  • This paper introduces MATLAB graphical user interface (GUI) based software for analyzing the availability performance of the Korea Augmentation Satellite System (KASS). The software uses a minimum variance (MV) estimator and the Kriging algorithm to generate integrity information such as the user differential range error (UDRE) and the grid ionospheric vertical error (GIVE). This information is offered to ground and aviation users in the Korean region. The software also uses the integrity information to produce accuracy data, protection level data, and an availability map for each user position. In particular, the software calculates the protection level along an aircraft's path. We verified the protection level results for aviation users by comparing them with the results of SBASimulator#2, a simulation tool for the European Geostationary Navigation Overlay Service (EGNOS). The protection level error between our software and SBASimulator#2 was about 2%, which indicates that the results of our software are accurate.