• Title/Summary/Keyword: 다중 결측 (multiple missing)

Search Result 43, Processing Time 0.033 seconds

Regression models for interval-censored semi-competing risks data with missing intermediate transition status (중간 사건이 결측되었거나 구간 중도절단된 준 경쟁 위험 자료에 대한 회귀모형)

  • Kim, Jinheum;Kim, Jayoun
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.7
    • /
    • pp.1311-1327
    • /
    • 2016
  • We propose a multi-state model for analyzing semi-competing risks data with interval-censored or missing intermediate events. This model is an extension of the 'illness-death model', which comprises three states: 'healthy', 'diseased', and 'dead'. The 'diseased' state can be considered an intermediate event. Two more states are added to the illness-death model to describe events that are missing because of loss to follow-up before the end of the study. One of them is an 'LTF' state, representing loss to follow-up, and the other is an unobservable state that represents the intermediate event experienced after LTF occurred. Given covariates, we employ the Cox proportional hazards model with a normal frailty and construct a full likelihood to estimate transition intensities between states in the multi-state model. The full likelihood is marginalized using adaptive Gaussian quadrature, and the optimal solution for the regression parameters is obtained through the iterative Newton-Raphson algorithm. Simulation studies are carried out to investigate the finite-sample performance of the proposed estimation procedure in terms of the empirical coverage probability of the true regression parameter. The proposed method is also illustrated with a dataset adapted from Helmer et al. (2001).

Robust multiple imputation method for missings with boundary and outliers (한계와 이상치가 있는 결측치의 로버스트 다중대체 방법)

  • Park, Yousung;Oh, Do Young;Kwon, Tae Yeon
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.6
    • /
    • pp.889-898
    • /
    • 2019
  • Imputing missing values for survey variables with item nonresponse becomes complicated if outliers and logical boundary conditions with other survey items cannot be ignored. If a variable with missing values also has outliers and boundaries, values imputed by previous regression-based imputation methods are likely to be biased and to violate the boundary conditions. In this paper, we address these difficulties by combining various robust regression models and multiple imputation methods. Through a simulation study covering various scenarios of outliers and boundaries, we find and discuss the optimal combination of robust regression and multiple imputation method.

Methods for Handling Incomplete Repeated Measures Data (불완전한 반복측정 자료의 보정방법)

  • Woo, Hae-Bong;Yoon, In-Jin
    • Survey Research
    • /
    • v.9 no.2
    • /
    • pp.1-27
    • /
    • 2008
  • Problems of incomplete data are pervasive in statistical analysis. In particular, incomplete data have been an important challenge in repeated measures studies. The objective of this study is to give a brief introduction to missing data mechanisms and to conventional and recent missing data methods, and to assess the performance of various missing data methods under ignorable and non-ignorable missingness mechanisms. Given the inadequate attention to longitudinal studies with missing data, this study applied recent advances in missing data methods to repeated measures models and investigated the performance of various missing data methods, such as FIML (Full Information Maximum Likelihood Estimation) and MICE (Multivariate Imputation by Chained Equations), under MCAR, MAR, and MNAR mechanisms. Overall, the results showed that listwise deletion and mean imputation performed poorly compared to other recommended missing data procedures. The better performance of EM, FIML, and MICE was more noticeable under MAR than under MCAR. Under non-ignorable missingness, the missing data methods did not perform well. In particular, this problem was noticeable in slope-related estimates. Therefore, this study suggests that if missing data are suspected to be non-ignorable, developmental research may underestimate true rates of change over the life course. This study also suggests that bias from non-ignorable missing data can be substantially reduced by considering rich information from variables related to missingness.

Estimation of radial spectrum for rainfall (호우의 환상스펙트럼 추정)

  • Lee, Jae-Hyeong;Lee, Dong-Ju;Park, Yeong-Gi
    • Water for future
    • /
    • v.22 no.2
    • /
    • pp.201-211
    • /
    • 1989
  • Using storm data augmented through the stochastic correlation with its neighbors, a multiquadric equation for the random surface of total storm depth is constructed. To separate the local components from the regional ones and to find the regional characteristics, a double Fourier analysis was applied to the total storm depths. The local components, that is, the storm residuals of each storm, were assumed to form a homogeneous random field and were investigated through their autocorrelation function. For practical application, isotropy was assumed and was checked against empirical data. The normalized autocorrelation coefficients for all storms showed a similar appearance. Using this empirical result, an example of the radial spectral distribution function, which represents the spatial characteristics of rainfall over the Han River Basin during 1975-1983, is presented.

STL-Attention based Traffic Prediction with Seasonality Embedding (계절성 임베딩을 고려한 STL-Attention 기반 트래픽 예측)

  • Yeom, Sungwoong;Choi, Chulwoong;Kolekar, Shivani Sanjay;Kim, Kyungbaek
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.11a
    • /
    • pp.95-98
    • /
    • 2021
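  • Network traffic prediction is applied in areas such as abnormal network activity detection and network service provisioning. Research on deep learning neural networks has recently become active as a way to overcome the performance degradation caused by missing traffic data due to network communication problems and by the nonlinear characteristics arising from irregular user activity. Among these networks, time-series deep learning networks show low error rates when predicting short-term network traffic volume. However, they also show limitations such as nonlinearity (for example, vanishing and exploding gradients), multiple seasonality, and long-term dependency problems. This paper proposes an attention network-based traffic prediction technique that considers seasonality embedding. The proposed technique embeds daily and weekly seasonality using the traffic trend, seasonality, and residual obtained through STL decomposition, and predicts future traffic with an attention network based on this embedding.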

Study on Proportional Reasoning in Elementary School Mathematics (초등학교 수학 교과에서의 비례 추론에 대한 연구)

  • Jeong, Eun Sil
    • Journal of Educational Research in Mathematics
    • /
    • v.23 no.4
    • /
    • pp.505-516
    • /
    • 2013
  • The purpose of this paper is to analyze the essence of proportional reasoning, to analyze the contents of the textbooks written under the mathematics curriculum revised in 2007, and to seek a direction for developing proportional reasoning in elementary school mathematics, focusing on task variables. The analysis finds that proportional reasoning is a form of qualitative and quantitative reasoning that is related to ratio, rate, and proportion and involves a sense of covariation and multiple comparison. The mathematics textbooks written under the 2007 revised curriculum are examined mainly in terms of the characteristics of proportional reasoning. It is found that some tasks related to proportional reasoning were reduced or deleted and that tasks were approached numerically and algorithmically. It should be recognized that mechanical methods for solving proportions, such as the cross-product algorithm, do not develop proportional reasoning, and tasks should be provided in a wide range of contexts, including visual models.

An Exploratory Study on the Determinants of Manpower Utilization of Container Terminals in the Busan Port and Gwangyang Port (컨테이너 터미널 인력운영 변화요인의 탐색적 연구 -부산항과 광양항을 중심으로-)

  • Kang, Hye-Won;Sim, Min-Seop;Kim, Yul-Seong
    • Journal of Korea Port Economic Association
    • /
    • v.37 no.1
    • /
    • pp.121-142
    • /
    • 2021
  • The purpose of this study is to identify environmental factors affecting manpower utilization at container terminals. A total of 300 questionnaires were distributed to employees of terminal operators in Busan and Gwangyang ports. The results of a multiple regression analysis show, first, that technology development, organizational culture, and the social and external environment have a significant impact on manpower utilization at container terminals. Second, technology development and the social and external environment have a significant impact on the manpower utilization system of container terminals. Third, technology development, the social and external environment, and safety and security change the composition of the workforce between office and field employees. Fourth, the size of the workforce is expected to decrease gradually due to technology development. Finally, technology development, organizational culture, the social and external environment, and terminal operation have an impact on the specialization and training system of human resources.

Linear interpolation and Machine Learning Methods for Gas Leakage Prediction Base on Multi-source Data Integration (다중소스 데이터 융합 기반의 가스 누출 예측을 위한 선형 보간 및 머신러닝 기법)

  • Dashdondov, Khongorzul;Jo, Kyuri;Kim, Mi-Hye
    • Journal of the Korea Convergence Society
    • /
    • v.13 no.3
    • /
    • pp.33-41
    • /
    • 2022
  • In this article, we propose to predict natural gas (NG) leakage levels through feature selection based on a factor analysis (FA) of integrated Korean Meteorological Agency data and natural gas leakage data, in order to account for complex factors. The paper is divided into three modules. First, we fill missing data in the integrated data set using linear interpolation and select essential features using FA with OrdinalEncoder (OE)-based normalization. Second, the dataset is labeled by K-means clustering. The final module uses four algorithms, K-nearest neighbors (KNN), decision tree (DT), random forest (RF), and Naive Bayes (NB), to predict gas leakage levels. The proposed method is evaluated by accuracy, area under the ROC curve (AUC), and mean standard error (MSE). The test results indicate that the OrdinalEncoder-Factor analysis (OE-F)-based classification method improved performance. Moreover, OE-F-based KNN (OE-F-KNN) showed the best performance, with 95.20% accuracy, an AUC of 96.13%, and an MSE of 0.031.
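
As an illustration of the gap-filling step mentioned in the abstract above, the following is a minimal sketch of linear interpolation over missing values. It is not the authors' implementation: the function name `fill_linear`, the use of `None` as the missing-value marker, and the choice to leave leading and trailing gaps unfilled are all assumptions made for this example.

```python
def fill_linear(values):
    """Fill interior None gaps by linear interpolation between known neighbors.

    Assumption for this sketch: gaps at the start or end of the series have
    only one known neighbor, so they are left as None rather than extrapolated.
    """
    filled = list(values)
    # Indices of the observations that are actually present.
    known = [i for i, v in enumerate(filled) if v is not None]
    # Walk over each pair of consecutive known points and fill what lies between.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled


series = [0.0, None, None, None, 4.0, None]
print(fill_linear(series))
```

Leaving the edge gaps unfilled keeps the sketch from inventing values outside the observed range; a real pipeline would need an explicit policy for them.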

Bias-correction of near-real-time multi-satellite precipitation products using machine learning (머신러닝 기반 준실시간 다중 위성 강수 자료 보정)

  • Sungho Jung;Xuan-Hien Le;Van-Giang Nguyen;Giha Lee
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2023.05a
    • /
    • pp.280-280
    • /
    • 2023
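  • Accurate spatiotemporal estimation of precipitation is a core technology for hydrological modeling, including flood response, drought management, and water resources planning. With advances in space technology and the launch of the Global Precipitation Measurement (GPM) project, a variety of high-resolution precipitation products are being produced from multiple satellite sensors, and as climate change increases the frequency of water-related disasters, the usefulness and importance of near-real-time satellite precipitation data are growing. However, near-real-time satellite precipitation data are released after only minimal correction in order to keep latency short, so the uncertainty of their precipitation estimates is relatively high. This study therefore aims to generate corrected near-real-time precipitation data by merging the collected satellite precipitation data with observations using ensemble machine learning. The model inputs were three hourly near-real-time satellite precipitation products (GSMaP_NRT, IMERG_Early, PERSIANN_CCS) and point data on temperature, humidity, and precipitation from disaster-prevention meteorological observation (AWS). For the point precipitation data, 475 stations were selected in consideration of missing values, and random sampling that accounts for spatial distribution was used to assign 375 stations (about 80%) to training and the remaining 100 stations (about 20%) to validation. KGE, MAE, and RMSE were used as quantitative evaluation metrics, and POD, SR, BS, and CSI, computed from the precipitation contingency table, were used as qualitative evaluation metrics. The machine learning model estimated precipitation with higher accuracy than the individual raw satellite precipitation products and the IDW method, and its results were spatially stable. However, maximum precipitation was somewhat underestimated, which is expected to be resolved by updating the number of precipitation-related input variables. The machine learning approach, which merges individual near-real-time satellite products of high uncertainty with observations to generate corrected optimal precipitation data, can therefore support real-time response to sudden water-related disasters and provide reliable quantitative precipitation estimates for flood forecasting.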
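
The contingency-table metrics named in the abstract above can be sketched as follows. This is an illustrative example only, not the study's code: it uses the standard forecast-verification definitions of POD (probability of detection), SR (success ratio), and CSI (critical success index), it assumes that BS denotes the frequency bias score, and the function name `categorical_scores` and the rain/no-rain threshold of 0.1 are hypothetical choices.

```python
def categorical_scores(observed, estimated, threshold=0.1):
    """Compute POD, SR, BS, and CSI from paired precipitation values.

    Assumptions for this sketch: a value >= threshold counts as rain, and the
    data contain at least one observed and one estimated rain event, so no
    denominator is zero.
    """
    hits = misses = false_alarms = 0
    for obs, est in zip(observed, estimated):
        obs_rain = obs >= threshold
        est_rain = est >= threshold
        if obs_rain and est_rain:
            hits += 1            # rain observed and estimated
        elif obs_rain:
            misses += 1          # rain observed but not estimated
        elif est_rain:
            false_alarms += 1    # rain estimated but not observed
    return {
        "POD": hits / (hits + misses),
        "SR": hits / (hits + false_alarms),
        "BS": (hits + false_alarms) / (hits + misses),
        "CSI": hits / (hits + misses + false_alarms),
    }


gauge = [0.0, 1.0, 2.0, 0.0, 3.0]
satellite = [0.0, 2.0, 0.0, 1.0, 1.0]
print(categorical_scores(gauge, satellite, threshold=0.5))
```

All four scores depend only on whether each value crosses the threshold, not on the precipitation amount, which is why they complement magnitude-based metrics such as MAE and RMSE.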

Comparison of Feature Selection Methods Applied on Risk Prediction for Hypertension (고혈압 위험 예측에 적용된 특징 선택 방법의 비교)

  • Khongorzul, Dashdondov;Kim, Mi-Hye
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.3
    • /
    • pp.107-114
    • /
    • 2022
  • In this paper, we enhance the risk prediction of hypertension using feature selection on the Korean National Health and Nutrition Examination Survey (KNHANES) database of the Korea Centers for Disease Control and Prevention. The study identified various risk factors correlated with chronic hypertension. The paper is divided into three parts. First, the data preprocessing step removes missing values and performs a z-transformation. Next, the feature selection (FS) step applies a factor analysis (FA)-based feature selection method to the dataset, and feature importance (FI) and multicollinearity analysis (MC) are compared based on FS. Finally, the predictive analysis stage detects and predicts the risk of hypertension. We compare the accuracy, f-score, area under the ROC curve (AUC), and mean standard error (MSE) of each classification model. In the tests, the proposed MC-FA-RF model achieved the highest accuracy of 80.12%, with an MSE of 0.106, an f-score of 83.49%, and an AUC of 85.96%. These results demonstrate that the proposed MC-FA-RF method for hypertension risk prediction outperforms the other methods.
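
The preprocessing step described in the abstract above (removing missing values, then applying a z-transformation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `drop_missing` and `z_transform` are hypothetical, missing values are marked with `None`, and the population standard deviation is used for scaling.

```python
from statistics import mean, pstdev


def drop_missing(rows):
    """Keep only rows in which every value is present (listwise deletion)."""
    return [row for row in rows if all(v is not None for v in row)]


def z_transform(rows):
    """Standardize each column to zero mean and unit standard deviation.

    Assumption for this sketch: no column is constant, so every standard
    deviation is non-zero.
    """
    columns = list(zip(*rows))
    centers = [mean(col) for col in columns]
    scales = [pstdev(col) for col in columns]
    return [
        [(v - c) / s for v, c, s in zip(row, centers, scales)]
        for row in rows
    ]


records = [[1.0, 10.0], [None, 20.0], [3.0, 30.0], [5.0, 50.0]]
complete = drop_missing(records)
print(z_transform(complete))
```

Standardizing after deletion means the means and standard deviations are computed only from the retained rows.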