• Title/Summary/Keyword: 다중 결측 (multiple, missing)

Search Result 43, Processing Time 0.025 seconds

Multiple Imputation Reducing Outlier Effect using Weight Adjustment Methods (가중치 보정을 이용한 다중대체법)

  • Kim, Jin-Young;Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.26 no.4 / pp.635-647 / 2013
  • Imputation is a commonly used method for handling missing survey data. The performance of an imputation method is influenced by various factors, especially outliers. Removing outliers from a data set is a simple and effective way to reduce their effect. In this paper, to improve the precision of multiple imputation, we study an imputation method that reduces the effect of outliers using various weight adjustment methods, including outlier removal. The regression method of PROC MI in SAS is used for multiple imputation, and the final adjusted weight is used as a weight variable to obtain the imputed values. Simulation studies compare the performance of the weight adjustment methods, and Monthly Labor Statistic data are used for the real data analysis.
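
As a rough illustration of how an adjusted weight can limit the influence of an outlier on regression-based imputation, the following Python sketch fits a weighted least-squares line and fills missing values with the prediction plus random noise. It is a simplified stand-in, not the PROC MI algorithm: the function name `weighted_regression_impute`, the single covariate, and the residual-variance estimate are all assumptions made for this example.

```python
import numpy as np

def weighted_regression_impute(x, y, w, n_imputations=5, seed=0):
    """Return completed copies of y, imputing NaNs by weighted regression on x.

    Hypothetical helper: a weight of 0 removes a unit from the fit,
    and smaller weights reduce its influence.
    """
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(y)
    design = np.column_stack([np.ones_like(x), x])

    # Weighted least squares: scale rows by sqrt(weight), then solve OLS.
    sqrt_w = np.sqrt(w[observed])
    beta, *_ = np.linalg.lstsq(design[observed] * sqrt_w[:, None],
                               y[observed] * sqrt_w, rcond=None)

    # Weighted residual standard deviation (a simplified estimate).
    resid = y[observed] - design[observed] @ beta
    dof = max(int(observed.sum()) - design.shape[1], 1)
    sigma = np.sqrt(np.sum(w[observed] * resid ** 2) / dof)

    completed = []
    for _ in range(n_imputations):
        filled = y.copy()
        noise = rng.normal(0.0, sigma, size=int((~observed).sum()))
        filled[~observed] = design[~observed] @ beta + noise
        completed.append(filled)
    return completed
```

Setting the weight of a suspected outlier to 0 corresponds to the outlier-removal case mentioned in the abstract, while intermediate weights only reduce its influence. A full multiple-imputation procedure would also draw the regression parameters anew for each imputation, which this sketch omits.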

A Case Study of Land-cover Classification Based on Multi-resolution Data Fusion of MODIS and Landsat Satellite Images (MODIS 및 Landsat 위성영상의 다중 해상도 자료 융합 기반 토지 피복 분류의 사례 연구)

  • Kim, Yeseul
    • Korean Journal of Remote Sensing / v.38 no.6_1 / pp.1035-1046 / 2022
  • This study evaluated the applicability of multi-resolution data fusion to land-cover classification. A spatial time-series geostatistical deconvolution/fusion model (STGDFM) was applied as the multi-resolution data fusion model. The study area comprised agricultural lands in the state of Iowa, United States. Considering the landscape of the study area, Moderate Resolution Imaging Spectroradiometer (MODIS) and Landsat satellite images were used as input data for the fusion. Synthetic Landsat images were generated with STGDFM for the dates on which Landsat images were missing. Land-cover classification was then performed using both the acquired Landsat images and the STGDFM fusion results as input data. To evaluate the applicability of multi-resolution data fusion, the classification result using only Landsat images was compared with the result using both Landsat images and fusion results. In the classification using only Landsat images, mixed patterns were prominent in the corn and soybean cultivation areas, which are the main land-cover types in the study area. Substantial mixing also appeared among vegetation land-cover types such as hay and grain areas and grass areas. In contrast, in the classification using both Landsat images and fusion results, these mixed patterns among vegetation land-cover types, as well as between corn and soybean, were greatly alleviated. As a result, classification accuracy improved by about 20 percentage points when both Landsat images and fusion results were used. The gaps in the Landsat images appear to have been compensated for by the time-series spectral information of the MODIS images reflected in the fusion results through STGDFM. This study confirmed that multi-resolution data fusion can be effectively applied to land-cover classification.

A longitudinal data analysis for child academic achievement with Korea welfare panel study data (경시적 자료를 이용한 아동 학업성취도 분석)

  • Lee, Naeun;Huh, Jib
    • Journal of the Korean Data and Information Science Society / v.28 no.1 / pp.1-10 / 2017
  • Longitudinal data on Korean child academic achievement have previously been used to find significant explanatory variables under the assumption that the repeated measurements are independent. Using the explanatory variables from previous research, we fit a linear mixed model incorporating fixed and random effects for child academic achievement to detect the significant explanatory variables. The data come from the Korea Welfare Panel Study, in which children were observed three times between 2006 and 2012 through an additional survey. Child academic achievement is evaluated as the sum of the achievements in Korean, English, and Mathematics. We also investigate multicollinearity and the missing-data mechanism, and select some popular correlation matrices for the linear mixed model analysis.

Analysis of the cause-specific proportional hazards model with missing covariates (누락된 공변량을 가진 원인별 비례위험모형의 분석)

  • Minjung Lee
    • The Korean Journal of Applied Statistics / v.37 no.2 / pp.225-237 / 2024
  • In the analysis of competing risks data, some covariates may not be fully observed for some subjects. In such cases, excluding subjects with missing covariate values from the analysis may result in biased estimates and loss of efficiency. In this paper, we study multiple imputation and the augmented inverse probability weighting method for regression parameter estimation in the cause-specific proportional hazards model with missing covariates. The performance of the estimators obtained from both methods is evaluated by simulation studies, which show that the methods perform well. Both methods were then applied to breast cancer data from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial, in which tumor size has missing values, to identify significant risk factors for death from breast cancer and from other causes. Under the cause-specific proportional hazards model, the methods show that race, marital status, stage, grade, and tumor size are significant risk factors for breast cancer mortality, and that stage has the greatest effect on increasing the risk of breast cancer death. Age at diagnosis and tumor size have significant effects on increasing the risk of other-cause death.
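
The abstract does not say how the estimates from the imputed data sets were combined. The standard approach for multiple imputation is Rubin's rules, sketched below in Python for a single regression coefficient. The function name `pool_rubin` and its inputs are assumptions for this example, not taken from the paper.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool one coefficient across M imputed data sets using Rubin's rules.

    estimates: the M point estimates, one per imputed data set.
    variances: the M squared standard errors of those estimates.
    Returns (pooled estimate, total variance).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    if m < 2 or u.size != m:
        raise ValueError("need at least two imputations with matching variances")

    pooled = q.mean()              # average of the per-imputation estimates
    within = u.mean()              # average within-imputation variance
    between = q.var(ddof=1)        # variance of the estimates across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total
```

The between-imputation term is what carries the extra uncertainty due to the missing covariate values, so the total variance is never smaller than the average within-imputation variance.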

Estimation of regional Low-flow Indices Applicable to Unmetered Areas Using Machine Learning Technique (머신러닝 기법을 이용한 미계측지역에 적용가능한 지역화 Low-flow indices 산정)

  • Jeung, Se Jin;Kang, Dong Ho;Kim, Byung Sik
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.39-39 / 2020
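  • Low-flow is an index representing the lowest water level in a river. The drought flow (Q355) of the flow duration curve is generally used as the representative index. Low-flow affects many fields, including water supply management and planning, irrigation water, and ecosystems. Estimating low-flow requires streamflow data covering a sufficiently long period. However, in Korea, where 70% of the land is mountainous, mountainous basins other than national rivers and first-class rivers either lack water-level gauging stations or have insufficient data because of missing observations, which limits low-flow analysis. In the past, various techniques such as multiple regression analysis and ARIMA models were used to predict the drought flow of ungauged areas, but demand for machine learning models has recently been increasing. This study therefore uses the DNN technique, a machine learning technique in line with this new paradigm. The DNN technique addresses the shortcomings of the ANN technique, namely the difficulty of finding optimal parameter values during training and the slow training time. Accordingly, this study aims to estimate regionalized low-flow indices applicable to ungauged areas using the DNN technique. First, factors affecting low-flow were collected, and statistically significant variables were selected through correlation analysis and multicollinearity analysis among the factors to build the input data for the machine learning model. In addition, the usefulness of the machine learning technique was examined by comparing its results with those of multiple regression analysis, an existing drought-flow prediction technique.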
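
One common way to carry out the multicollinearity screening mentioned in the abstract is the variance inflation factor (VIF). The abstract does not state which diagnostic the authors used, so the Python sketch below is only an assumed illustration, and the function name `variance_inflation_factors` is hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows are basins, columns are factors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        # Perfectly collinear columns give r2 == 1 and an infinite VIF.
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)
```

Factors with a large VIF (thresholds of 5 or 10 are often quoted as rules of thumb) are candidates for removal before the inputs are passed to a regression or machine learning model.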

An Estimation of Link Travel Time by Using BMS Data (BMS 데이터를 활용한 링크단위 여행시간 산출방안에 관한 연구)

  • Jeon, Ok-Hee;Ahn, Gye-Hyeong;Hyun, Cheol-Seung;Hong, Kyung-Sik;Kim, Hyun-Ju;Lee, Choul-Ki
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.13 no.3 / pp.78-88 / 2014
  • UTIS currently collects and provides traffic information through 1,150 RSE units and OBE installed in about 51,000 vehicles. To stabilize the UTIS service, the sources of traffic information used to improve the quality of UTIS traffic information inevitably need to be expanded, because there are sections with missing data. To overcome this problem, this study develops a link travel time estimation model for general vehicles that uses BMS data collected in real time from the BIS (bus information system) installed and operating in the capital area, and uses the model to provide information for the sections with missing data. Partial sections of Suwon and Anyang were selected according to whether or not they include exclusive lanes, and model estimation and verification were conducted using BMS data and UTIS traffic information. As a result, Cases 2, 4, 6, and 8 showed high agreement between the UTIS communication data and the estimated values, but in Cases 3 and 5 the error was too large for the estimates to replace UTIS communication data in the sections with missing data. A more reliable model formula that reflects road management conditions and the situation of the target section therefore needs to be applied.

A Study on the Development of a Technique to Predict Missing Travel Speed Collected by Taxi Probe (결측 택시 Probe 통행속도 예측기법 개발에 관한 연구)

  • Yoon, Byoung Jo
    • KSCE Journal of Civil and Environmental Engineering Research / v.31 no.1D / pp.43-50 / 2011
  • The monitoring system for link travel speed using taxi probes is one of the key subsystems of ITS. Link travel speeds collected by taxi probes have been widely employed both for monitoring the traffic state of urban road networks and for providing real-time travel time information. When the sample size of taxi probes is small and the link travel time is longer than the time interval used to collect travel speed data, a missing state is inevitable. In this missing state, link travel speed data are not collected in real time. The missing state can span from a single to multiple time intervals. Existing single-interval prediction techniques cannot generate multiple future states. For this reason, multiple missing states need to be replaced with estimates generated by a multi-interval prediction method. This study introduces a multi-interval prediction method that generates speed estimates for single and multiple future time steps, overcoming the shortcomings of short-term techniques. The model is based on non-parametric regression (NPR) and, despite its multi-interval prediction scheme, outperformed single-interval prediction methods in prediction accuracy.
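
The abstract does not give the details of the NPR model. A common form of non-parametric regression for traffic forecasting is nearest-neighbor pattern matching, and the Python sketch below uses that form to show how several future intervals can be predicted at once. The function name `knn_multi_step_forecast`, the Euclidean distance, and the simple averaging are assumptions for this example rather than the paper's method.

```python
import numpy as np

def knn_multi_step_forecast(history, recent, horizon, k=3):
    """Predict the next `horizon` speeds from the k most similar past patterns.

    history: 1-D sequence of past link travel speeds.
    recent:  the latest observed speeds (the pattern to match).
    """
    history = np.asarray(history, dtype=float)
    recent = np.asarray(recent, dtype=float)
    lag = recent.size
    n_windows = history.size - lag - horizon + 1
    if n_windows < k:
        raise ValueError("history is too short for the requested lag, horizon, and k")

    # Each candidate is a past window of length `lag` and the `horizon` values after it.
    starts = np.arange(n_windows)
    windows = np.stack([history[i:i + lag] for i in starts])
    futures = np.stack([history[i + lag:i + lag + horizon] for i in starts])

    # Pick the k windows closest to the recent pattern and average what followed them.
    distances = np.linalg.norm(windows - recent, axis=1)
    nearest = np.argsort(distances, kind="stable")[:k]
    return futures[nearest].mean(axis=0)
```

Because the whole future segment of each neighbor is averaged, one search yields estimates for every interval in the horizon, which is the property needed when the missing state lasts for multiple intervals.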

A Study on Estimation of Lowflow Ungauged Basin Using Multiple Regression Analysis (다중회귀분석을 이용한 미계측 유역의 갈수유량 산정에 관한 연구)

  • Lim, Ga Kyun;Jeung, Se Jin;Kim, Byung Sik
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.133-133 / 2020
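  • Drought flow is the discharge that is maintained on 355 days of the year, and determining its magnitude and frequency is very important for water supply planning and management, reservoir design, management of irrigation water quantity and quality, and ecosystem conservation. Estimating drought flow requires long-term observed daily streamflow data, but in Korea the observed streamflow records contain many missing values, so the long-term data needed for drought-flow estimation are lacking. This study therefore developed regression equations for estimating drought flow by drought frequency for 40 mid-sized basins nationwide. Multicollinearity among 18 basin factors and 4 hydrological factors applicable to drought-flow estimation was considered through correlation analysis, and factors applicable to ungauged basins were selected based on the correlation results. Regression equations for estimating drought flow by drought frequency that can be applied to ungauged basins were developed through drought frequency analysis and stepwise regression analysis. In addition, gauged basins were treated as ungauged, drought flow was estimated with the developed regression equations, and the results were compared with the actual drought flow to examine the adequacy of the developed equations.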
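
The definition of drought flow used in the abstract, the discharge maintained on 355 days of the year (Q355), can be computed directly from one year of daily flows. The Python sketch below is a minimal illustration that assumes a complete year of daily values with no missing days. The function name `q355` is hypothetical.

```python
import numpy as np

def q355(daily_flow):
    """Return the flow equaled or exceeded on 355 days of a one-year daily record."""
    flows = np.asarray(daily_flow, dtype=float)
    if np.isnan(flows).any():
        raise ValueError("record contains missing days; fill or drop them first")
    if flows.size < 355:
        raise ValueError("need at least 355 daily values")

    ranked = np.sort(flows)[::-1]   # largest flow first
    return ranked[354]              # the 355th largest daily flow
```

The explicit check for missing days reflects the problem the abstract raises: with many gaps in the record, the ranking is not defined for a full year, which is why regression equations are developed for ungauged or poorly gauged basins.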

Prediction of the daily-flow duration curve and streamflow using the regional flow duration curve creation technique (지역화 유황곡선을 작성기법을 이용한 유역의 일유황곡선 및 유량 예측)

  • Choo, Kyung Su;Jeung, Se Jin;Kim, Byung Sik
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.132-132 / 2020
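  • A flow duration curve compactly represents the variability of river discharge and can be constructed and analyzed using the calendar-year method and the total-period method. This study presents a method for constructing flow duration curves that incorporates basin characteristic factors, and aims to develop curves for estimating the flow duration curve of ungauged basins from flow duration curves regionalized using topographic and meteorological factors. To this end, data on basin characteristic factors were collected and set as independent variables, and multiple regression analysis was performed to regionalize the variables. The regionalized variables were reflected in the flow duration curve and expressed as a single flow duration curve for the target region. The derived flow duration curve was verified by treating a region with data as an ungauged basin. The verification showed results similar to the actual data, which suggests that runoff can be predicted both for ungauged basins lacking runoff data and for basins where much of the past record is missing. It is also expected that various data analyses can be carried out through rainfall scenarios using flow duration curves that account for topographic factors.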

Using multi-sensor for Development of Multiple Occupants' Activities Classification Model Based on LSTM (다중센서를 활용한 LSTM 기반 재실자 행동 분류 모델 개발)

  • Jin Su Park;Chul Seung Yang;Kyung-Ho Kim
    • The Journal of the Convergence on Culture Technology / v.9 no.6 / pp.1065-1071 / 2023
  • This paper discusses the development of an LSTM model for classifying the behavior of occupants within a residence. The multi-sensor setup consists of an IAQ (Indoor Air Quality) sensor that measures indoor air quality, a UWB radar that detects occupancy and tracks location, and a piezo sensor that measures occupants' biometric information. Occupant behavior data such as going out, staying, cooking, cleaning, exercising, and sleeping were collected in an experimental environment constructed to resemble an actual residential environment. After outliers and missing values are removed from the data, the LSTM model is used to calculate the accuracy, sensitivity, specificity, and F1 score of the occupant behavior classification model.
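
The evaluation metrics named in the abstract can be derived from confusion-matrix counts. The Python sketch below does this for a single behavior class treated one-vs-rest. It also computes an F1 score, on the assumption that this is the fourth metric the abstract intends. The function name `binary_classification_metrics` and the example counts in the note below are hypothetical.

```python
def binary_classification_metrics(tp, fp, tn, fn):
    """Compute metrics for one class from true/false positive/negative counts.

    Assumes the class has at least one actual positive and one actual
    negative sample; otherwise a division by zero occurs.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,          # share of all samples classified correctly
        "sensitivity": tp / (tp + fn),          # recall: share of actual positives found
        "specificity": tn / (tn + fp),          # share of actual negatives correctly rejected
        "f1": 2 * tp / (2 * tp + fp + fn),      # harmonic mean of precision and recall
    }
```

For a multi-class problem such as the six behaviors listed above, the same function can be applied once per class, counting that class as positive and all others as negative. For example, `binary_classification_metrics(tp=8, fp=1, tn=9, fn=2)` gives an accuracy of 0.85, a sensitivity of 0.8, and a specificity of 0.9.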