• Title/Summary/Keyword: Statistical prediction procedure

Search results: 77

Interval prediction on the sum of binary random variables indexed by a graph

  • Park, Seongoh;Hahn, Kyu S.;Lim, Johan;Son, Won
    • Communications for Statistical Applications and Methods, Vol. 26, No. 3, pp. 261-272, 2019
  • In this paper, we propose a procedure for constructing a prediction interval for the sum of dependent binary random variables indexed by a graph that accounts for the dependence among the binary variables. Our main interest is the prediction interval of the weighted sum of such variables. The problem is motivated by election forecasting, including the Korean National Assembly and US presidential elections. Traditional and popular approaches to constructing prediction intervals for the number of seats won by major parties are the normal approximation based on the CLT and the Monte Carlo method that generates many independent Bernoulli random variables; both assume that the binary random variables are independent and that their success probabilities are known constants. In practice, however, survey results (including exit polls) on elections are random and rarely independent of one another; more often they are spatially correlated random variables. To take this into account, we propose a spatial autoregressive (AR) model for the surveyed success probabilities and a residual-based bootstrap procedure to construct the prediction interval of the sum of the binary outcomes. Finally, we apply the procedure to construct prediction intervals for the number of legislative seats won by each party from the exit poll data of the $19^{th}$ and $20^{th}$ Korean National Assembly elections.
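
The two traditional baselines mentioned above, the CLT normal approximation and the Monte Carlo method, both treat the outcomes as independent Bernoulli variables with known success probabilities. A minimal Python sketch of both follows; the function names, the weights `w`, and the example probabilities are hypothetical, and the sketch deliberately omits the spatial AR model and residual bootstrap that the paper proposes to handle dependence.

```python
import random
from statistics import NormalDist


def clt_interval(p, w, level=0.95):
    """Normal (CLT) interval for S = sum_i w_i * Y_i with independent Y_i ~ Bernoulli(p_i)."""
    mean = sum(wi * pi for wi, pi in zip(w, p))
    var = sum(wi * wi * pi * (1 - pi) for wi, pi in zip(w, p))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * var ** 0.5
    return mean - half_width, mean + half_width


def monte_carlo_interval(p, w, level=0.95, n_sim=10_000, seed=0):
    """Percentile interval from simulated weighted sums of independent Bernoulli draws."""
    rng = random.Random(seed)
    sums = sorted(
        sum(wi for wi, pi in zip(w, p) if rng.random() < pi)
        for _ in range(n_sim)
    )
    alpha = 1 - level
    lower = sums[int(alpha / 2 * n_sim)]
    upper = sums[min(n_sim - 1, int((1 - alpha / 2) * n_sim))]
    return lower, upper


# Hypothetical example: five districts, one seat each, with assumed win probabilities.
p = [0.9, 0.6, 0.5, 0.3, 0.1]
w = [1, 1, 1, 1, 1]
print(clt_interval(p, w))
print(monte_carlo_interval(p, w))
```

Both intervals ignore correlation between districts, which is exactly the limitation the spatial AR bootstrap is meant to address.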

Development and implementation of statistical prediction procedure for field penetration index using ridge regression with best subset selection

  • 이항로;송기일;김경열
    • Journal of Korean Tunnelling and Underground Space Association, Vol. 19, No. 6, pp. 857-870, 2017
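  • As social infrastructure is increasingly placed underground, the use of shield TBMs is steadily expanding. Accurately predicting the actual advance rate of a shield TBM is essential for reasonable estimates of construction time and cost. For this reason, Korea needs a prediction model for the actual advance rate of shield TBMs that properly reflects ground properties. In this study, to estimate the net penetration rate of shield TBMs, a statistical prediction procedure for the field penetration index was modularized based on a field database. The field penetration index was chosen as the output parameter, and the module includes a prediction system that applies outlier removal, preprocessing, and ridge regression with best subset selection. The applicability of the prediction model was also verified using field excavation data.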
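
The abstract names two modelling ingredients, ridge regression and best subset selection. A minimal stdlib-only Python sketch of one way to combine them: every non-empty subset of candidate inputs is fitted by ridge regression, and the subset with the lowest validation error is kept. The function names, the fixed ridge parameter `lam`, the single validation split, and the toy data are assumptions for illustration; the paper's outlier removal, preprocessing, and actual TBM input variables are not reproduced.

```python
from itertools import combinations


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge_fit(X, y, lam):
    """Ridge regression with an unpenalized intercept (inputs and response are centered)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    beta = solve(A, b)
    intercept = y_mean - sum(bj * mj for bj, mj in zip(beta, x_mean))
    return intercept, beta


def predict(model, x):
    intercept, beta = model
    return intercept + sum(bj * xj for bj, xj in zip(beta, x))


def columns(X, cols):
    return [[row[j] for j in cols] for row in X]


def best_subset_ridge(X_train, y_train, X_val, y_val, lam=1.0):
    """Return (validation MSE, selected column indices, fitted model) for the best subset."""
    p = len(X_train[0])
    best = None
    for k in range(1, p + 1):
        for cols in combinations(range(p), k):
            model = ridge_fit(columns(X_train, cols), y_train, lam)
            preds = [predict(model, x) for x in columns(X_val, cols)]
            mse = sum((yh - yv) ** 2 for yh, yv in zip(preds, y_val)) / len(y_val)
            if best is None or mse < best[0]:
                best = (mse, cols, model)
    return best


# Toy data (made up): the response depends only on column 0; column 1 is noise.
X_train = [[i, (i * 7) % 5] for i in range(10)]
y_train = [2 * i + 1 for i in range(10)]
X_val = [[i, (i * 3) % 4] for i in range(10, 15)]
y_val = [2 * i + 1 for i in range(10, 15)]
mse, cols, model = best_subset_ridge(X_train, y_train, X_val, y_val, lam=1e-6)
print(mse, cols)
```

Exhaustive search costs $2^p - 1$ fits, so it is only practical for a modest number of candidate inputs.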

A Hilbert-Huang Transform Approach Combined with PCA for Predicting a Time Series

  • Park, Min-Jeong
    • The Korean Journal of Applied Statistics, Vol. 24, No. 6, pp. 995-1006, 2011
  • A time series can be decomposed into simple components with a multiscale method. Empirical mode decomposition (EMD), introduced by Huang et al. (1998), is one such method. It is natural to apply a classical prediction method, such as a vector autoregressive (VAR) model, to the resulting simple components instead of the original time series; in addition, Kim et al. (2008) proposed a prediction procedure that combines a classical prediction model with EMD and the Hilbert spectrum. In this paper, we propose incorporating principal component analysis (PCA) into this prediction procedure to enable efficient selection of input variables among the components obtained by EMD. We discuss the utility of adopting PCA in the prediction procedure based on EMD and the Hilbert spectrum, and we analyze the daily worm account data with the proposed PCA-based prediction method.
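
The PCA step can be sketched as follows, assuming the EMD components (intrinsic mode functions) have already been extracted and are given as equal-length lists; EMD itself and the Hilbert spectrum are not implemented here. The sketch finds the leading principal components of the component series by power iteration with deflation, and the resulting score series could then serve as a reduced set of inputs for a classical model such as a VAR. Function names and the toy series are hypothetical.

```python
def covariance_matrix(series):
    """Sample covariance matrix of equal-length series, plus the centered series."""
    n = len(series[0])
    means = [sum(s) / n for s in series]
    centered = [[v - m for v in s] for s, m in zip(series, means)]
    cov = [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered]
           for ci in centered]
    return cov, centered


def leading_eigenpair(C, iters=1000):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    p = len(C)
    v = [p ** -0.5] * p
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # nothing left to extract
            return 0.0, v
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(C[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v


def pca(series, k):
    """Top-k components as (eigenvalue, loadings) pairs, and their score series."""
    C, centered = covariance_matrix(series)
    p, n = len(series), len(series[0])
    components = []
    for _ in range(k):
        lam, v = leading_eigenpair(C)
        components.append((lam, v))
        # Deflation: remove the component just found before searching for the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    scores = [[sum(v[j] * centered[j][t] for j in range(p)) for t in range(n)]
              for _, v in components]
    return components, scores


# Toy "components": the first two are perfectly correlated, the third is independent.
c1 = [1, 2, 3, 4, 5]
c2 = [2, 4, 6, 8, 10]
c3 = [1, -1, 1, -1, 1]
components, scores = pca([c1, c2, c3], 2)
print([round(lam, 6) for lam, _ in components])
```

Here two principal components capture all the variance of three input series, which illustrates how PCA can shrink the input set.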

Learning fair prediction models with an imputed sensitive variable: Empirical studies

  • Kim, Yongdai;Jeong, Hwichang
    • Communications for Statistical Applications and Methods, Vol. 29, No. 2, pp. 251-261, 2022
  • As AI increasingly influences human social life, issues of transparency and ethics in AI are emerging. In particular, it is widely known that data often contain historical biases that conflict with ethical standards or regulatory frameworks for fairness, so AI models trained on such data can also exhibit bias or unfairness against certain sensitive groups (e.g., non-white people, women). Demographic disparities due to AI, that is, socially unacceptable biases in which an AI model favors certain groups (e.g., white people, men) over others (e.g., Black people, women), have been observed frequently in many applications, and many recent studies have developed AI algorithms that remove or alleviate such disparities in trained models. In this paper, we consider how to use the information in a sensitive variable for fair prediction when laws or regulations prohibit using it as an input variable. To reflect the information in the sensitive variable in prediction, we consider a two-stage procedure. First, the sensitive variable is fully included in the learning phase, yielding a prediction model that depends on it; then an imputed sensitive variable is used in the prediction phase. The aim of this paper is to evaluate this procedure on several benchmark datasets. We illustrate that using an imputed sensitive variable helps improve prediction accuracy without substantially degrading fairness.
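
The two-stage procedure can be sketched in a few lines of Python. In the learning phase the model may depend on the sensitive variable `s`; in the prediction phase `s` is unavailable, so a value `s_hat` imputed from the other inputs is plugged in. The per-group regression lines, the nearest-centroid imputer, the one-dimensional input, and the toy data are all hypothetical choices; the sketch does not measure fairness, which the paper evaluates on benchmark datasets.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def train_two_stage(x, s, y):
    groups = sorted(set(s))
    # Learning phase: the prediction model uses s (here, one regression line per group).
    models = {g: fit_line([xi for xi, si in zip(x, s) if si == g],
                          [yi for yi, si in zip(y, s) if si == g]) for g in groups}
    # Imputation model for s given x (hypothetical choice: nearest group mean of x).
    centroids = {g: sum(xi for xi, si in zip(x, s) if si == g) / s.count(g) for g in groups}
    return models, centroids


def predict_with_imputed_s(models, centroids, x_new):
    # Prediction phase: s is not observed, so it is imputed from x_new.
    s_hat = min(centroids, key=lambda g: abs(x_new - centroids[g]))
    a, b = models[s_hat]
    return a + b * x_new


# Made-up data: group 0 follows y = 2x, group 1 follows y = x + 5.
x = [1, 2, 3, 10, 11, 12]
s = [0, 0, 0, 1, 1, 1]
y = [2, 4, 6, 15, 16, 17]
models, centroids = train_two_stage(x, s, y)
print(predict_with_imputed_s(models, centroids, 2.5))
print(predict_with_imputed_s(models, centroids, 11.5))
```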

An Additive Sparse Penalty for Variable Selection in High-Dimensional Linear Regression Model

  • Lee, Sangin
    • Communications for Statistical Applications and Methods, Vol. 22, No. 2, pp. 147-157, 2015
  • We consider a sparse high-dimensional linear regression model. Penalized methods using the LASSO or non-convex penalties have been widely used for variable selection and estimation in high-dimensional regression models. In penalized regression, selection and prediction performance depend on which penalty function is used. For example, the LASSO is known to have good prediction performance but tends to select more variables than necessary. In this paper, we propose an additive sparse penalty for variable selection that combines the LASSO and the minimax concave penalty (MCP). The proposed penalty is designed to retain the desirable properties of both the LASSO and MCP. We develop an efficient algorithm to compute the proposed estimator by combining the concave-convex procedure and a coordinate descent algorithm. Numerical studies show that the proposed method has better selection and prediction performance than other penalized methods.
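
To make the idea concrete, here is a sketch of coordinate descent for one hypothetical additive form of the penalty, $P(\beta) = \lambda_1|\beta| + \mathrm{MCP}_{\lambda_2,\gamma}(\beta)$, where the MCP equals $\lambda_2|\beta| - \beta^2/(2\gamma)$ for $|\beta| \le \gamma\lambda_2$ and $\gamma\lambda_2^2/2$ otherwise. The paper's exact combination may differ, and this sketch does not use the concave-convex procedure: with standardized columns and $\gamma > 1$, each one-dimensional subproblem is convex and has the closed-form solution in `threshold`, so coordinates are updated directly. All names and the toy data are hypothetical.

```python
def soft(z, t):
    """Soft-thresholding operator S(z, t)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def threshold(z, lam1, lam2, gamma):
    """Minimizer of 0.5*(z - b)**2 + lam1*|b| + MCP(b; lam2, gamma), assuming gamma > 1."""
    if abs(z) <= lam1 + gamma * lam2:
        # Inside the MCP's concave region: both penalties act, rescaled by 1 / (1 - 1/gamma).
        return soft(z, lam1 + lam2) / (1.0 - 1.0 / gamma)
    # Beyond gamma * lam2 the MCP is flat, so only the LASSO part shrinks the estimate.
    return soft(z, lam1)


def coordinate_descent(X, y, lam1, lam2, gamma=3.0, n_iter=100):
    """Minimize (1/2n)||y - X b||^2 + sum_j P(b_j); columns of X must satisfy sum_i X[i][j]**2 / n == 1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    for _ in range(n_iter):
        for j in range(p):
            z = sum(X[i][j] * resid[i] for i in range(n)) / n + beta[j]
            new = threshold(z, lam1, lam2, gamma)
            if new != beta[j]:
                for i in range(n):
                    resid[i] -= X[i][j] * (new - beta[j])
                beta[j] = new
    return beta


# Orthogonal, standardized toy design: y = 2 * (first column).
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]
print(coordinate_descent(X, y, lam1=0.1, lam2=0.5))
```

Large coefficients are shrunk only by $\lambda_1$ (LASSO-like bias), while small ones are pushed to zero by the combined threshold $\lambda_1 + \lambda_2$, which is the kind of trade-off an additive penalty aims for.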

A Dynamic-Stochastic Model for Air Pollutant Concentration

  • 김해경
    • Journal of Korean Society for Atmospheric Environment, Vol. 7, No. 3, pp. 156-168, 1991
  • The purpose of this paper is to develop a stochastic model for predicting daily sulphur dioxide ($SO_2$) concentrations in an urban area (Seoul). To this end, the influence of meteorological parameters on $SO_2$ concentrations is investigated through a statistical analysis of 24-hour average $SO_2$ levels in the Seoul area from 1989 to 1990. The annual fluctuations in the regression trend, periodicity, and dependence of the daily concentrations are also analyzed. Based on these analyses, a nonlinear regression transfer function model for predicting daily $SO_2$ concentrations is derived, and a statistical procedure for using the model to predict concentration levels is proposed.
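
As a much simpler stand-in for a transfer function model, the sketch below fits a linear ARX model $y_t = c + \phi\, y_{t-1} + \beta x_t + \varepsilon_t$ by least squares, where $y_t$ plays the role of a daily $SO_2$ level and $x_t$ a single meteorological input, and then makes a one-step-ahead prediction. The paper's model is nonlinear and built from its own trend, periodicity, and dependence analysis, so this linear form, the function names, and the synthetic series are assumptions for illustration only.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_arx(y, x):
    """Least-squares fit of y[t] = c + phi * y[t-1] + beta * x[t]; returns [c, phi, beta]."""
    rows = [[1.0, y[t - 1], x[t]] for t in range(1, len(y))]
    targets = y[1:]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * v for r, v in zip(rows, targets)) for i in range(3)]
    return solve(XtX, Xty)


def predict_next(params, y_last, x_next):
    """One-step-ahead prediction given yesterday's level and today's input."""
    c, phi, beta = params
    return c + phi * y_last + beta * x_next


# Synthetic, noise-free series generated with c=2, phi=0.5, beta=1.5 (illustrative only).
x = [t % 3 for t in range(30)]
y = [10.0]
for t in range(1, 30):
    y.append(2.0 + 0.5 * y[-1] + 1.5 * x[t])
params = fit_arx(y, x)
print(params)
print(predict_next(params, y[-1], 0))
```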

Adaptive Regression by Mixing for Fixed Design

  • Oh, Jong-Chul;Lu, Yun;Yang, Yuhong
    • Communications for Statistical Applications and Methods, Vol. 12, No. 3, pp. 713-727, 2005
  • Different nonparametric regression procedures perform well under different conditions. In practice, it is very hard to identify the best procedure for the data at hand, so model combination is of practical importance. In this paper, we focus on one-dimensional regression with a fixed design. Polynomial regression, local regression, and smoothing splines are considered. The data are split into two parts: one part is used for estimation and the other for prediction. Prediction performance is used to assign weights to the different regression procedures. Simulation results show that the combined estimator performs better than or similarly to the estimator chosen by cross-validation, and that its risk is similar to that of the best candidate procedure for the data.
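
The data-splitting idea can be sketched as follows: each candidate procedure is fitted on one half of a fixed design, its squared prediction error on the other half determines a weight, and the combined estimate is the weighted average of the candidates. Here the candidates are polynomial fits of degree 0, 1, and 2 (standing in for the paper's polynomial, local regression, and smoothing spline procedures), and the weights are taken proportional to $\exp(-\mathrm{SSE}_j / (2\sigma^2))$ with an assumed known noise variance $\sigma^2$. This weighting follows the general spirit of adaptive regression by mixing but is a simplified assumption rather than the paper's exact scheme; function names and data are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def poly_fit(xs, ys, degree):
    """Least-squares coefficients of c[0] + c[1]*x + ... + c[degree]*x**degree."""
    rows = [[x ** k for k in range(degree + 1)] for x in xs]
    m = degree + 1
    A = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(m)]
    return solve(A, b)


def poly_eval(coef, x):
    return sum(c * x ** k for k, c in enumerate(coef))


def mix(xs, ys, degrees=(0, 1, 2), sigma2=1.0):
    """Fit candidates on even-indexed points; weight them by error on odd-indexed points."""
    est_x, est_y = xs[0::2], ys[0::2]
    val_x, val_y = xs[1::2], ys[1::2]
    fits = [poly_fit(est_x, est_y, d) for d in degrees]
    log_w = [-sum((poly_eval(c, x) - y) ** 2 for x, y in zip(val_x, val_y)) / (2 * sigma2)
             for c in fits]
    top = max(log_w)  # subtract the max before exponentiating, for numerical stability
    raw = [math.exp(lw - top) for lw in log_w]
    total = sum(raw)
    weights = [r / total for r in raw]

    def combined(x):
        return sum(w * poly_eval(c, x) for w, c in zip(weights, fits))

    return weights, combined


xs = list(range(10))
ys = [1 + 2 * x for x in xs]
weights, f = mix(xs, ys)
print(weights, f(4.5))
```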

A convenient approach for penalty parameter selection in robust lasso regression

  • Kim, Jongyoung;Lee, Seokho
    • Communications for Statistical Applications and Methods, Vol. 24, No. 6, pp. 651-662, 2017
  • We propose an alternative procedure for selecting the penalty parameter in $L_1$-penalized robust regression. The procedure is based on marginalizing the prior distribution over the penalty parameter; because the penalty parameter is marginalized out, the resulting objective function does not include it. In addition, the estimation algorithm automatically chooses the penalty parameter using the previous estimate of the regression coefficients. The proposed approach bypasses cross-validation and saves computing time. Variable-wise penalization also performs best in terms of prediction and variable selection. Numerical studies using simulated data demonstrate the performance of our proposals, and the proposed methods are applied to the Boston housing data. Through the simulation study and the real data application, we demonstrate that our proposals are competitive with or much better than cross-validation in terms of prediction, variable selection, and computing time.
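
One standard way such a marginalization can work, assuming a Laplace prior on each of the $p$ coefficients and a Gamma($a$, $b$) hyperprior on the penalty parameter $\lambda$ (these prior choices are assumptions, not taken from the paper), is the following integral:

$$\int_0^\infty \prod_{j=1}^{p} \frac{\lambda}{2} e^{-\lambda|\beta_j|} \cdot \frac{b^a}{\Gamma(a)} \lambda^{a-1} e^{-b\lambda}\, d\lambda = \frac{b^a\,\Gamma(p+a)}{2^p\,\Gamma(a)} \Big(b + \sum_{j=1}^{p} |\beta_j|\Big)^{-(p+a)}.$$

Its negative logarithm contributes the penalty $(p+a)\log\big(b + \sum_{j}|\beta_j|\big)$, which no longer contains $\lambda$. Majorizing the concave logarithm by its tangent at the previous estimate $\beta^{(k)}$ turns each step into an ordinary $L_1$-penalized robust fit with

$$\lambda^{(k)} = \frac{p+a}{b + \sum_{j=1}^{p} |\beta_j^{(k)}|},$$

and giving each coefficient its own Gamma($a$, $b$) hyperprior yields the variable-wise update $\lambda_j^{(k)} = (1+a)/(b + |\beta_j^{(k)}|)$.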

Optimized Chinese Pronunciation Prediction by Component-Based Statistical Machine Translation

  • Zhu, Shunle
    • Journal of Information Processing Systems, Vol. 17, No. 1, pp. 203-212, 2021
  • To eliminate ambiguities in existing methods for simplifying Chinese pronunciation learning, we propose a model that automatically predicts the pronunciation of Chinese characters. The proposed model relies on a statistical machine translation (SMT) framework. In particular, we treat the components of Chinese characters as the basic units and cast pronunciation prediction as a machine translation procedure, with the component sequence as the source sentence and the pronunciation (pinyin) as the target sentence. In addition to traditional features such as bidirectional word translation and an n-gram language model, we implement a component similarity feature to cope with typos in practical use. These features are incorporated into a log-linear model. Experimental results show that our approach significantly outperforms other baseline models.
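
In a log-linear SMT model, each candidate output $e$ for a source $f$ is scored as $\sum_k \lambda_k h_k(e, f)$, and the highest-scoring candidate is chosen. A minimal Python sketch follows, with feature names loosely mirroring those in the abstract (two translation directions, an n-gram language model, and component similarity). The feature values and weights are made up for illustration and are not from the paper; in practice the weights would be tuned on development data.

```python
import math


def score_candidates(candidates, weights):
    """Log-linear scores sum_k weights[k] * h_k, plus their softmax-normalized probabilities."""
    scores = {cand: sum(weights[name] * value for name, value in feats.items())
              for cand, feats in candidates.items()}
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    probs = {cand: math.exp(s - top) / z for cand, s in scores.items()}
    return scores, probs


def decode(candidates, weights):
    """Return the candidate pronunciation with the highest log-linear score."""
    scores, _ = score_candidates(candidates, weights)
    return max(scores, key=scores.get)


# Hypothetical feature values (log-probabilities and a similarity score) for one character.
candidates = {
    "hao3": {"p_pinyin_given_comp": -0.2, "p_comp_given_pinyin": -0.5, "lm": -1.0, "comp_sim": 0.0},
    "hao4": {"p_pinyin_given_comp": -1.5, "p_comp_given_pinyin": -1.2, "lm": -2.0, "comp_sim": 0.0},
}
weights = {"p_pinyin_given_comp": 1.0, "p_comp_given_pinyin": 0.5, "lm": 0.3, "comp_sim": 0.8}
print(decode(candidates, weights))
```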

On the Accuracy of Shipboard Noise Prediction Using SEA

  • 김재승;강현주;김현실;김상렬
    • Proceedings of the Korean Society for Noise and Vibration Engineering (KSNVE) 2000 Spring Conference, pp. 849-854, 2000
  • Statistical energy analysis (SEA) is suitable for shipboard noise prediction in many respects; it can effectively model large, complex ship structures for noise analysis. This paper introduces the SEA procedure for shipboard noise analysis, drawing on the authors' experience over the past few years, and discusses the prediction accuracy of shipboard noise analysis using SEA. The prediction results are found to improve considerably when measured data are used for source levels and for material properties such as loss factors and absorption coefficients.
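
The core of SEA is a power balance between subsystems: for subsystem $i$ with energy $E_i$, damping loss factor $\eta_i$, and coupling loss factors $\eta_{ij}$, the input power satisfies $P_i = \omega \eta_i E_i + \omega \sum_{j \ne i} (\eta_{ij} E_i - \eta_{ji} E_j)$. The sketch below solves this linear system for the subsystem energies. It is the standard textbook formulation rather than the paper's ship models, and the two-subsystem loss factors and input power are hypothetical values. Because the loss factors enter the system matrix directly, errors in them propagate straight into the predicted energies.

```python
import math


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def sea_energies(power, eta, coupling, freq_hz):
    """Solve the SEA power balance for subsystem energies.

    power[i]        input power to subsystem i (W)
    eta[i]          damping loss factor of subsystem i
    coupling[i][j]  coupling loss factor from subsystem i to j (diagonal ignored)
    """
    omega = 2.0 * math.pi * freq_hz
    n = len(power)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Energy leaving subsystem i: internal damping plus transfer to every other subsystem.
        A[i][i] = omega * (eta[i] + sum(coupling[i][j] for j in range(n) if j != i))
        for j in range(n):
            if j != i:
                # Energy flowing into subsystem i from subsystem j.
                A[i][j] = -omega * coupling[j][i]
    return solve(A, power)


# Hypothetical two-subsystem example at 1 kHz: only subsystem 0 is driven.
E = sea_energies([1.0, 0.0], [0.01, 0.02], [[0.0, 0.005], [0.003, 0.0]], 1000.0)
print(E)
```

Summing the balance equations shows that total input power equals total dissipated power $\omega \sum_i \eta_i E_i$, which is a convenient sanity check on any solution.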