• Title/Summary/Keyword: Statistical prediction procedure

Interval prediction on the sum of binary random variables indexed by a graph

  • Park, Seongoh;Hahn, Kyu S.;Lim, Johan;Son, Won
    • Communications for Statistical Applications and Methods / v.26 no.3 / pp.261-272 / 2019
  • In this paper, we propose a procedure for building a prediction interval for the sum of dependent binary random variables over a graph, accounting for the dependence among the binary variables. Our main interest is a prediction interval for the weighted sum of dependent binary random variables indexed by a graph. The problem is motivated by the prediction of various elections, including the Korean National Assembly and US presidential elections. Traditional and popular approaches to constructing the prediction interval for the seats won by major parties are the normal approximation based on the CLT and the Monte Carlo method, which generates many Bernoulli random variables under the assumptions that the binary random variables are independent and the success probabilities are known constants. In practice, however, the survey results (including exit polls) for an election are random and rarely independent of each other; more often they are spatially correlated random variables. To take this into account, we suggest a spatial auto-regressive (AR) model for the surveyed success probabilities and propose a residual-based bootstrap procedure to construct the prediction interval of the sum of the binary outcomes. Finally, we apply the procedure to build prediction intervals for the number of legislative seats won by each party from the exit poll data in the $19^{th}$ and $20^{th}$ Korea National Assembly elections.
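
The two traditional baselines named in this abstract (the CLT normal approximation and Monte Carlo simulation with independent Bernoulli draws and known success probabilities) can be sketched as follows. This is only an illustration of those baselines, not the paper's spatial AR bootstrap; the function names, the 95% level, and the default simulation count are assumptions.

```python
import math
import random


def normal_interval(probs, weights=None, z=1.96):
    """CLT interval for a weighted sum of independent Bernoulli variables."""
    weights = weights or [1.0] * len(probs)
    mean = sum(w * p for w, p in zip(weights, probs))
    var = sum(w * w * p * (1.0 - p) for w, p in zip(weights, probs))
    half = z * math.sqrt(var)
    return mean - half, mean + half


def monte_carlo_interval(probs, weights=None, n_sims=2000, alpha=0.05, seed=0):
    """Empirical-quantile interval from simulated independent Bernoulli draws."""
    weights = weights or [1.0] * len(probs)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sums = sorted(
        sum(w for w, p in zip(weights, probs) if rng.random() < p)
        for _ in range(n_sims)
    )
    lo_idx = int(math.floor(alpha / 2 * n_sims))
    hi_idx = min(n_sims - 1, int(math.ceil((1 - alpha / 2) * n_sims)) - 1)
    return sums[lo_idx], sums[hi_idx]
```

Both functions treat the success probabilities as fixed and the outcomes as independent, which is exactly the assumption the abstract argues is unrealistic for spatially correlated poll results.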

Development and implementation of statistical prediction procedure for field penetration index using ridge regression with best subset selection (최상부분집합이 고려된 능형회귀를 적용한 현장관입지수에 대한 통계적 예측기법 개발 및 적용)

  • Lee, Hang-Lo;Song, Ki-Il;Kim, Kyoung Yul
    • Journal of Korean Tunnelling and Underground Space Association / v.19 no.6 / pp.857-870 / 2017
  • The use of shield TBMs is gradually increasing with the urbanization of social infrastructure. Reliable estimation of the advance rate is very important for accurately estimating the construction period and cost. For this purpose, a prediction model of the advance rate that reasonably accounts for ground properties is required. In this study, a statistical prediction procedure for the field penetration index (FPI) was modularized, based on a database collected from the field, to calculate the penetration rate of a shield TBM. FPI was selected as the output parameter, and the module includes several components: a procedure for eliminating abnormal data, preprocessing of the dataset, and ridge regression with best subset selection. The procedure was finally validated using a field dataset.
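
A minimal sketch of ridge regression combined with best subset selection is given below. The abstract does not state the selection criterion or implementation, so the held-out mean squared error, the absence of an intercept, the fixed penalty `lam`, and all function names are assumptions made for illustration.

```python
from itertools import combinations


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_fit(X, y, lam):
    """Ridge coefficients from the normal equations (X'X + lam*I) beta = X'y."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(A, b)


def best_subset_ridge(X_train, y_train, X_val, y_val, lam=0.1):
    """Try every non-empty feature subset; keep the lowest validation MSE."""
    p = len(X_train[0])
    best = None
    for k in range(1, p + 1):
        for subset in combinations(range(p), k):
            Xs = [[row[j] for j in subset] for row in X_train]
            beta = ridge_fit(Xs, y_train, lam)
            err = sum(
                (yi - sum(b * row[j] for b, j in zip(beta, subset))) ** 2
                for row, yi in zip(X_val, y_val)
            ) / len(y_val)
            if best is None or err < best[0]:
                best = (err, subset, beta)
    return best  # (validation MSE, selected column indices, coefficients)
```

Exhaustive search costs 2^p - 1 fits, so this form is only practical for a small number of candidate input variables.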

A Hilbert-Huang Transform Approach Combined with PCA for Predicting a Time Series

  • Park, Min-Jeong
    • The Korean Journal of Applied Statistics / v.24 no.6 / pp.995-1006 / 2011
  • A time series can be decomposed into simple components with a multiscale method. Empirical mode decomposition (EMD) is a recently introduced multiscale method proposed in Huang et al. (1998). It is natural to apply a classical prediction method, such as a vector autoregressive (AR) model, to the obtained simple components instead of the original time series; in addition, a prediction procedure combining a classical prediction model with EMD and the Hilbert spectrum is proposed in Kim et al. (2008). In this paper, we suggest adopting principal component analysis (PCA) in the prediction procedure, which enables efficient selection of input variables among the components obtained by EMD. We discuss the utility of adopting PCA in the prediction procedure based on EMD and the Hilbert spectrum, and analyze the daily worm account data with the proposed PCA-adopted prediction method.

Learning fair prediction models with an imputed sensitive variable: Empirical studies

  • Kim, Yongdai;Jeong, Hwichang
    • Communications for Statistical Applications and Methods / v.29 no.2 / pp.251-261 / 2022
  • As AI has a wide range of influence on human social life, issues of transparency and ethics of AI are emerging. In particular, it is widely known that, because data contain historical bias that runs against ethics or regulatory frameworks for fairness, AI models trained on such biased data can also impose bias or unfairness against a certain sensitive group (e.g., non-white, women). Demographic disparities due to AI, which refer to socially unacceptable bias in which an AI model favors certain groups (e.g., white, men) over other groups (e.g., black, women), have been observed frequently in many applications of AI, and many recent studies have developed AI algorithms that remove or alleviate such demographic disparities in trained AI models. In this paper, we consider the problem of using the information in the sensitive variable for fair prediction when using the sensitive variable as one of the input variables is prohibited by laws or regulations to avoid unfairness. As a way of reflecting the information in the sensitive variable in prediction, we consider a two-stage procedure. First, the sensitive variable is fully included in the learning phase to obtain a prediction model that depends on the sensitive variable; then an imputed sensitive variable is used in the prediction phase. The aim of this paper is to evaluate this procedure by analyzing several benchmark datasets. We illustrate that using an imputed sensitive variable helps improve prediction accuracy without much loss in the degree of fairness.
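
The two-stage procedure described in this abstract can be illustrated with a toy sketch. The abstract does not specify the prediction model or the imputation method, so the per-group linear fit, the nearest-centroid imputer, and all names below are assumptions chosen only to show the data flow: the observed sensitive variable `s` is used in training, and an imputed value replaces it at prediction time.

```python
def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def train(xs, ss, ys):
    """Learning phase: the observed sensitive variable s is used in full."""
    models, centroids = {}, {}
    for g in sorted(set(ss)):
        gx = [x for x, s in zip(xs, ss) if s == g]
        gy = [y for y, s in zip(ys, ss) if s == g]
        models[g] = fit_line(gx, gy)       # one model per sensitive group
        centroids[g] = sum(gx) / len(gx)   # used later to impute s from x
    return models, centroids


def impute_s(x, centroids):
    """Prediction phase, step 1: guess s from x alone (nearest centroid)."""
    return min(centroids, key=lambda g: abs(x - centroids[g]))


def predict(x, models, centroids):
    """Prediction phase, step 2: plug the imputed s into the trained model."""
    intercept, slope = models[impute_s(x, centroids)]
    return intercept + slope * x
```

At prediction time only `x` is supplied; the true sensitive value is never read, which matches the setting where using it as an input is prohibited.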

An Additive Sparse Penalty for Variable Selection in High-Dimensional Linear Regression Model

  • Lee, Sangin
    • Communications for Statistical Applications and Methods / v.22 no.2 / pp.147-157 / 2015
  • We consider a sparse high-dimensional linear regression model. Penalized methods using LASSO or non-convex penalties have been widely used for variable selection and estimation in high-dimensional regression models. In penalized regression, the selection and prediction performances depend on which penalty function is used. For example, it is known that LASSO has good prediction performance but tends to select more variables than necessary. In this paper, we propose an additive sparse penalty for variable selection that combines the LASSO and minimax concave penalty (MCP). The proposed penalty is designed to retain the good properties of both LASSO and MCP. We develop an efficient algorithm to compute the proposed estimator by combining a concave-convex procedure with a coordinate descent algorithm. Numerical studies show that the proposed method has better selection and prediction performance than other penalized methods.
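
For reference, the two penalties being combined have standard closed forms, sketched below for a single coefficient `t`. The `additive_penalty` function is a hypothetical illustration of adding the two; the abstract does not give the exact form or tuning parameters of the proposed penalty, so its signature is an assumption.

```python
def lasso_penalty(t, lam):
    """LASSO penalty: lam * |t|."""
    return lam * abs(t)


def mcp_penalty(t, lam, gamma):
    """Minimax concave penalty (MCP) with regularization lam and concavity gamma > 1."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return gamma * lam * lam / 2.0  # flat beyond gamma * lam


def additive_penalty(t, lam1, lam2, gamma):
    """Hypothetical additive combination: LASSO part plus MCP part."""
    return lasso_penalty(t, lam1) + mcp_penalty(t, lam2, gamma)
```

The MCP term levels off for large coefficients while the LASSO term keeps growing linearly, which is one way an additive form can blend the behaviour of both penalties.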

A Dynamic-Stochastic Model for Air Pollutant Concentration (大氣汚染濃度에 관한 動的確率모델)

  • 김해경
    • Journal of Korean Society for Atmospheric Environment / v.7 no.3 / pp.156-168 / 1991
  • The purpose of this paper is to develop a stochastic model for predicting daily sulphur dioxide $(SO_2)$ concentrations in an urban area (Seoul). To this end, the influence of meteorological parameters on the $SO_2$ concentrations is investigated through a statistical analysis of the 24-hr averaged $SO_2$ levels in the Seoul area during 1989 $\sim$ 1990. The annual fluctuations of the regression trend, periodicity, and dependence of the daily concentration are also analyzed. Based on these, a nonlinear regression transfer function model for predicting daily $SO_2$ concentrations is derived. A statistical procedure for using the model to predict the concentration level is also proposed.

Adaptive Regression by Mixing for Fixed Design

  • Oh, Jong-Chul;Lu, Yun;Yang, Yuhong
    • Communications for Statistical Applications and Methods / v.12 no.3 / pp.713-727 / 2005
  • Among different regression approaches, nonparametric procedures perform well under different conditions. In practice, it is very hard to identify the best procedure for the data at hand, so model combination is of practical importance. In this paper, we focus on one-dimensional regression with fixed design. Polynomial regression, local regression, and smoothing splines are considered. The data are split into two parts: one part is used for estimation and the other for prediction. Prediction performances are used to assign weights to the different regression procedures. Simulation results show that the combined estimator performs better than or similarly to the estimator chosen by cross validation. The combined estimator has a risk similar to that of the best candidate procedure for the data.
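
The split-and-weight idea in this abstract can be sketched as follows. This is a simplified illustration, not the paper's method: the two candidates (a constant mean and a straight line) stand in for the polynomial, local, and spline fits, the alternating split and the exponential weighting with a known noise level `sigma` are assumptions, and all names are hypothetical.

```python
import math


def fit_mean(xs, ys):
    """Candidate 1: constant predictor."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Candidate 2: simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def mixing_weights(xs, ys, fitters, sigma=1.0):
    """Fit on even-indexed points, weight by prediction error on odd-indexed points."""
    est_x, est_y = xs[0::2], ys[0::2]
    eva_x, eva_y = xs[1::2], ys[1::2]
    models = [fit(est_x, est_y) for fit in fitters]
    sse = [sum((y - m(x)) ** 2 for x, y in zip(eva_x, eva_y)) for m in models]
    best = min(sse)  # subtract the minimum for numerical stability
    raw = [math.exp(-(s - best) / (2.0 * sigma ** 2)) for s in sse]
    total = sum(raw)
    return [r / total for r in raw], models


def combined_predict(x, weights, models):
    """Weighted average of the candidate predictions."""
    return sum(w * m(x) for w, m in zip(weights, models))
```

Candidates that predict the held-out half poorly receive exponentially smaller weights, so the combined estimator tracks the better candidates without having to pick a single one.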

A convenient approach for penalty parameter selection in robust lasso regression

  • Kim, Jongyoung;Lee, Seokho
    • Communications for Statistical Applications and Methods / v.24 no.6 / pp.651-662 / 2017
  • We propose an alternative procedure for selecting the penalty parameter in $L_1$ penalized robust regression. The procedure is based on marginalizing a prior distribution over the penalty parameter. The resulting objective function therefore does not include the penalty parameter, since it has been marginalized out. In addition, the estimating algorithm automatically chooses a penalty parameter using the previous estimate of the regression coefficients. The proposed approach bypasses cross validation and saves computing time. Variable-wise penalization also performs best in terms of prediction and variable selection. Numerical studies using simulated data demonstrate the performance of our proposals. The proposed methods are applied to the Boston housing data. Through a simulation study and a real data application, we demonstrate that our proposals are competitive with or much better than cross-validation in terms of prediction, variable selection, and computing time.

Optimized Chinese Pronunciation Prediction by Component-Based Statistical Machine Translation

  • Zhu, Shunle
    • Journal of Information Processing Systems / v.17 no.1 / pp.203-212 / 2021
  • To eliminate ambiguities in the existing methods to simplify Chinese pronunciation learning, we propose a model that can predict the pronunciation of Chinese characters automatically. The proposed model relies on a statistical machine translation (SMT) framework. In particular, we consider the components of Chinese characters as the basic unit and consider the pronunciation prediction as a machine translation procedure (the component sequence as a source sentence, the pronunciation, pinyin, as a target sentence). In addition to traditional features such as the bidirectional word translation and the n-gram language model, we also implement a component similarity feature to overcome some typos during practical use. We incorporate these features into a log-linear model. The experimental results show that our approach significantly outperforms other baseline models.
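
A log-linear model of the kind mentioned in this abstract scores each candidate output by a weighted sum of feature values and normalizes with a softmax. The sketch below shows only that scoring step; the feature names, feature values, and weights are hypothetical and are not taken from the paper.

```python
import math


def log_linear_probs(candidates, weights):
    """candidates: {name: {feature: value}}; returns softmax-normalized probabilities."""
    raw = {
        name: sum(weights[f] * v for f, v in feats.items())
        for name, feats in candidates.items()
    }
    top = max(raw.values())  # shift scores before exponentiating for stability
    expd = {name: math.exp(s - top) for name, s in raw.items()}
    z = sum(expd.values())
    return {name: e / z for name, e in expd.items()}


def best_candidate(candidates, weights):
    """Return the candidate with the highest model probability."""
    probs = log_linear_probs(candidates, weights)
    return max(probs, key=probs.get)
```

In SMT systems the feature values are typically log-probabilities from the translation and language models, and the weights are tuned on held-out data; here they would simply be passed in as dictionaries.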

On the Accuracy of Shipboard Noise Prediction Using SEA (SEA에 의한 실선소음 예측 정도에 관한 고찰)

  • Kim, Jae-Seung;Kang, Hyun-Ju;Kim, Hyun-Sil;Kim, Sang-Ryul
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2000.06a / pp.849-854 / 2000
  • Statistical energy analysis (SEA) is suitable for shipboard noise prediction in many respects. It can effectively model large and complicated ship structures for noise analysis. This paper introduces the SEA procedure for shipboard noise analysis, drawn from the authors' experience over the past few years. The prediction accuracy of shipboard noise analysis using SEA is also discussed. It is found that the prediction results can be much improved by using actual measured data for source levels and for material properties such as loss factors and absorption coefficients.
