• Title/Summary/Keyword: Robust Estimation


A comparison of imputation methods using nonlinear models (비선형 모델을 이용한 결측 대체 방법 비교)

  • Kim, Hyein; Song, Juwon
    • The Korean Journal of Applied Statistics / v.32 no.4 / pp.543-559 / 2019
  • Data often include missing values for various reasons. If the missing data mechanism is not MCAR, analysis based only on fully observed cases may bias the estimates and reduce their precision, since partially observed cases are excluded. Missing values cause more serious problems when data include many variables. Many imputation techniques have been suggested to overcome this difficulty. However, imputation methods using parametric models may not fit real data that do not satisfy the model assumptions. In this study, we review imputation methods using nonlinear models, such as kernel, resampling, and spline methods, which are robust to violations of model assumptions. In addition, we suggest utilizing imputation classes to improve imputation accuracy, or adding random errors to correctly estimate the variance of the estimates in nonlinear imputation models. The performance of imputation methods using nonlinear models is compared under various simulated data settings. Simulation results indicate that the performance of the imputation methods varies as data settings change. However, imputation based on kernel regression or the penalized spline performs better in most situations. Utilizing imputation classes or adding random errors improves the performance of imputation methods using nonlinear models.
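The kernel-regression imputation mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single covariate, a Gaussian kernel, and a Nadaraya-Watson estimator; the function names, the bandwidth, and the choice to draw the "random error" from the residuals of the observed cases are illustrative assumptions, not the authors' procedure.

```python
import math
import random


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the weighted mean."""
    return math.exp(-0.5 * u * u)


def kernel_impute(x, y, bandwidth, add_noise=False, rng=None):
    """Fill missing y values (None) by Nadaraya-Watson regression on x.

    If add_noise is True, a residual drawn at random from the observed
    cases is added to each imputed value (an assumed way of "adding
    random errors" so that imputed values are not artificially smooth).
    """
    rng = rng or random.Random(0)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]

    def predict(x0):
        weights = [gaussian_kernel((x0 - xi) / bandwidth) for xi, _ in observed]
        return sum(w * yi for w, (_, yi) in zip(weights, observed)) / sum(weights)

    # Residuals of the observed cases around the kernel fit.
    residuals = [yi - predict(xi) for xi, yi in observed]

    completed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)          # observed values are kept as-is
        else:
            value = predict(xi)
            if add_noise:
                value += rng.choice(residuals)
            completed.append(value)
    return completed


# Hypothetical toy data: the third response is missing.
x_demo = [0.0, 1.0, 2.0, 3.0, 4.0]
y_demo = [0.0, 1.0, None, 3.0, 4.0]
imputed = kernel_impute(x_demo, y_demo, bandwidth=1.0)
```

Without the noise term every imputed value lies exactly on the fitted curve, which tends to understate the variance of downstream estimates; that is the motivation the abstract gives for adding random errors.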

Estimating the Demand Function for Industrial Natural Gas Use in Korea : A Cross-sectional Analysis (횡단면 분석을 활용한 한국 산업용 도시가스 수요함수 추정)

  • Lee, Bok-Hee; Lee, Hye-Jeong; Yoo, Seung-Hoon; Huh, Sung-Yoon
    • Journal of the Korean Institute of Gas / v.24 no.6 / pp.34-46 / 2020
  • To ensure a stable supply of natural gas in the future, it is necessary to forecast demand in advance and secure the required quantity. In this paper, we propose a method for estimating the demand function for industrial natural gas, which is expected to drive the future increase in domestic natural gas demand. Cross-sectional data on 304 domestic industries were used to estimate the demand function for industrial natural gas, together with the effects of industry-specific characteristics such as capital investment and manufacturing cost. The results were derived with the least absolute deviation estimation method, which is robust to outliers and assumes neither homogeneity nor normality of the error term. In addition, the economic value of industrial city gas was estimated using its price elasticity. The results suggest that the continued expansion and supply of city gas to the industrial sector is beneficial at the national level, and that the government needs to promote this expansion through an industrial city gas support policy.
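To show why least absolute deviation (LAD) estimation is described as robust to outliers, here is a minimal sketch under stated assumptions: a single regressor, a log-log demand form in which the slope plays the role of a price elasticity, and synthetic data. None of these details come from the paper. For a two-parameter line, an LAD optimum always exists among the lines through two data points, so a small problem can be solved exactly by enumerating pairs.

```python
from itertools import combinations


def lad_line(x, y):
    """Exact LAD fit of y = intercept + slope * x for small samples.

    Enumerates every line through two data points and keeps the one with
    the smallest sum of absolute residuals (O(n^3), illustration only).
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a valid regression fit
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        cost = sum(abs(yk - (intercept + slope * xk)) for xk, yk in zip(x, y))
        if best is None or cost < best[0]:
            best = (cost, intercept, slope)
    return best[1], best[2]


def ols_line(x, y):
    """Ordinary least squares fit, shown only for contrast."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Synthetic log-log demand data (hypothetical): true elasticity -0.5,
# with one grossly outlying industry at the highest price.
log_price = [float(p) for p in range(10)]
log_quantity = [2.0 - 0.5 * p for p in log_price]
log_quantity[9] += 10.0

lad_intercept, lad_elasticity = lad_line(log_price, log_quantity)
ols_intercept, ols_elasticity = ols_line(log_price, log_quantity)
```

On this toy data the LAD fit recovers the line through the nine clean points, while the single outlier is enough to flip the sign of the OLS slope. Larger problems would normally use a linear-programming or quantile-regression solver instead of enumeration.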

Estimation of Fractional Urban Tree Canopy Cover through Machine Learning Using Optical Satellite Images (기계학습을 이용한 광학 위성 영상 기반의 도시 내 수목 피복률 추정)

  • Sejeong Bae; Bokyung Son; Taejun Sung; Yeonsu Lee; Jungho Im; Yoojin Kang
    • Korean Journal of Remote Sensing / v.39 no.5_3 / pp.1009-1029 / 2023
  • Urban trees play a vital role in urban ecosystems, significantly reducing impervious surfaces and impacting carbon cycling within the city. Although previous research has demonstrated the efficacy of combining artificial intelligence with airborne light detection and ranging (LiDAR) data to generate urban tree information, the limited availability and high cost of LiDAR data restrict this approach. Consequently, this study used freely accessible, high-resolution multispectral satellite imagery (i.e., Sentinel-2 data) and machine learning to estimate fractional tree canopy cover (FTC) within the city of Suwon, South Korea. The study used a median composite image derived from a time series of Sentinel-2 images. To account for the diverse land cover found in urban areas, the model incorporated three types of input variables: the average (mean) and standard deviation (std) of 10 m resolution Sentinel-2 optical indices within each 30 m grid cell, and the fractional coverage of distinct land cover classes within each 30 m grid cell from the existing level 3 land cover map. Four schemes with different combinations of input variables were compared. Notably, when all three factors (i.e., mean, std, and fractional cover) were used to capture the variation of land cover in urban areas (Scheme 4, S4), the machine learning model performed better than when using only the mean of optical indices (Scheme 1). Of the models tested, the random forest (RF) model with S4 performed best, achieving an $R^2$ of 0.8196, a mean absolute error (MAE) of 0.0749, and a root mean squared error (RMSE) of 0.1022. According to the variable importance analysis, the std variable had the highest impact on model outputs within heterogeneous land covers. The trained RF model with S4 was then applied to the entire Suwon region, where it delivered robust results with an $R^2$ of 0.8702, MAE of 0.0873, and RMSE of 0.1335. The FTC estimation method developed in this study is expected to be applicable to other regions, providing fundamental data for a better understanding of carbon dynamics in urban ecosystems.
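The mean and std input variables described above can be sketched as a block aggregation of a 10 m index raster onto a 30 m grid, where each grid cell covers 3 x 3 pixels. This is a minimal illustration: the function name, the nested-list raster, and the use of the population standard deviation are assumptions, since the abstract does not give these details.

```python
from statistics import mean, pstdev


def block_features(index_10m, block=3):
    """Aggregate a 10 m raster (list of rows) to 30 m cells.

    Returns a grid of (mean, std) tuples, one per block x block window.
    Incomplete windows at the raster edge are dropped.
    """
    rows, cols = len(index_10m), len(index_10m[0])
    features = []
    for r in range(0, rows - rows % block, block):
        row_features = []
        for c in range(0, cols - cols % block, block):
            pixels = [
                index_10m[r + dr][c + dc]
                for dr in range(block)
                for dc in range(block)
            ]
            row_features.append((mean(pixels), pstdev(pixels)))
        features.append(row_features)
    return features


# Hypothetical optical index values: a homogeneous cell next to a mixed one.
raster = [
    [0.5, 0.5, 0.5, 0.0, 1.0, 0.0],
    [0.5, 0.5, 0.5, 1.0, 0.0, 1.0],
    [0.5, 0.5, 0.5, 0.0, 1.0, 0.0],
]
cells = block_features(raster)
```

The two cells have similar means but very different standard deviations, which illustrates how the std variable can carry information about land cover heterogeneity that the mean alone does not.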

Multiple Linear Analysis for Generating Parametric Images of Irreversible Radiotracer (비가역 방사성추적자 파라메터 영상을 위한 다중선형분석법)

  • Kim, Su-Jin; Lee, Jae-Sung; Lee, Won-Woo; Kim, Yu-Kyeong; Jang, Sung-June; Son, Kyu-Ri; Kim, Hyo-Cheol; Chung, Jin-Wook; Lee, Dong-Soo
    • Nuclear Medicine and Molecular Imaging / v.41 no.4 / pp.317-325 / 2007
  • Purpose: Biological parameters can be quantified from dynamic PET data using compartment modeling and nonlinear least squares (NLS) estimation. However, NLS is not well suited to generating parametric images because of the initial value problem and excessive computation time. For irreversible models, Patlak graphical analysis (PGA) has commonly been used as an alternative to NLS. In PGA, however, the start time ($t^*$, the time at which the linear phase starts) has to be determined. In this study, we suggest a new Multiple Linear Analysis for Irreversible Radiotracer (MLAIR) to estimate the fluoride bone influx rate (Ki). Methods: $[^{18}F]$fluoride dynamic PET scans were acquired for 60 min in three normal mini-pigs. The plasma input curve was derived using blood sampling from the femoral artery. Tissue time-activity curves were measured by drawing regions of interest (ROIs) on the femoral head, vertebra, and muscle. Parametric images of Ki were generated using the MLAIR and PGA methods. Results: In ROI analysis, Ki values estimated using MLAIR and PGA were slightly higher than those of NLS, but the results of MLAIR and PGA were equivalent. Patlak slopes (Ki) changed with different $t^*$ in low-uptake regions. Compared with PGA, the quality of the parametric images was considerably improved with the new method. Conclusion: The results showed that MLAIR is an efficient and robust method for generating Ki parametric images from $[^{18}F]$fluoride PET. It will also be a good alternative to PGA for radiotracers with an irreversible three-compartment model.
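For context, the standard Patlak graphical analysis that the abstract compares against can be sketched as follows. After the start time $t^*$, the ratio of tissue to plasma activity is approximately linear in the normalized integral of the plasma curve, and the slope is Ki. This is a minimal sketch of generic PGA, not of the MLAIR method; the sampling times, curves, and function names are hypothetical.

```python
def cumulative_trapezoid(t, values):
    """Running trapezoidal integral of values over t, starting at 0."""
    total, out = 0.0, [0.0]
    for k in range(1, len(t)):
        total += 0.5 * (values[k] + values[k - 1]) * (t[k] - t[k - 1])
        out.append(total)
    return out


def patlak_fit(t, plasma, tissue, t_star):
    """Return (Ki, intercept) from a straight-line fit for t >= t_star.

    x = integral of plasma up to t divided by plasma(t)
    y = tissue(t) divided by plasma(t)
    """
    integral = cumulative_trapezoid(t, plasma)
    xs, ys = [], []
    for tk, cp, ct, area in zip(t, plasma, tissue, integral):
        if tk >= t_star and cp > 0:
            xs.append(area / cp)
            ys.append(ct / cp)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic curves (hypothetical): the tissue follows the Patlak model
# exactly, with Ki = 0.05 and intercept 0.3.
times = [0.0, 1.0, 2.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
plasma_curve = [0.0, 10.0, 8.0, 5.0, 3.0, 2.0, 1.5, 1.2, 1.0, 0.9]
areas = cumulative_trapezoid(times, plasma_curve)
tissue_curve = [0.05 * a + 0.3 * cp for a, cp in zip(areas, plasma_curve)]

ki, intercept = patlak_fit(times, plasma_curve, tissue_curve, t_star=10.0)
```

With real, noisy voxel data the fitted slope depends on the chosen `t_star`, which is the sensitivity the abstract reports for low-uptake regions.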

Automatic Detection of Stage 1 Sleep (자동 분석을 이용한 1단계 수면탐지)

  • 신홍범; 한종희; 정도언; 박광석
    • Journal of Biomedical Engineering Research / v.25 no.1 / pp.11-19 / 2004
  • Stage 1 sleep provides important information for the interpretation of nocturnal polysomnography, particularly regarding sleep onset. It is a short transition period from wakeful consciousness to sleep. The lack of prominent sleep events characterizing stage 1 sleep is a major obstacle in automatic sleep stage scoring. In this study, we attempted to detect stage 1 sleep automatically through simultaneous processing and analysis of EEG and EOG. Relative powers of the alpha waves and the theta waves were calculated from spectral estimation. An epoch with relative alpha power below 50% or relative theta power above 23% was regarded as stage 1 sleep. SEM (slow eye movement) was defined as a movement of both eyes lasting from 1.5 to 4 seconds, and was also regarded as stage 1 sleep. If any one of these three criteria was met, the epoch was regarded as stage 1 sleep. Results were compared to manual ratings made by two polysomnography experts. A total of 169 epochs were analyzed. The agreement rate for stage 1 sleep between automatic detection and manual scoring was 79.3%, and Cohen's kappa was 0.586 (p<0.01). A significant portion (32%) of automatically detected stage 1 sleep included SEM. Generally, digitally scored sleep staging shows accuracy of up to 70%. Considering the potential difficulties in stage 1 sleep scoring, the accuracy of 79.3% in this study seems to be robust enough. The simultaneous analysis of EOG differentiates the present study from previous ones, which mainly depended on EEG analysis. The issue of the close relationship between SEM and stage 1 sleep raised by Kinnari et al. remains valid in this study.
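The three detection criteria and the agreement statistic reported above can be written out as a short sketch. The thresholds (50%, 23%, 1.5 to 4 seconds) are taken from the abstract; the function names, the representation of an epoch as two relative powers plus an optional SEM duration, and the toy ratings are assumptions made for illustration.

```python
def is_stage1(alpha_rel, theta_rel, sem_duration=None):
    """Apply the three criteria; any single one marks the epoch as stage 1.

    alpha_rel, theta_rel: relative band powers as fractions (0 to 1).
    sem_duration: duration in seconds of a detected movement of both
    eyes, or None when no such movement was detected in the epoch.
    """
    alpha_rule = alpha_rel < 0.50
    theta_rule = theta_rel > 0.23
    sem_rule = sem_duration is not None and 1.5 <= sem_duration <= 4.0
    return alpha_rule or theta_rule or sem_rule


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for four epochs (1 = stage 1, 0 = other).
automatic = [1, 1, 0, 0]
manual = [1, 0, 0, 0]
kappa = cohens_kappa(automatic, manual)
```

Kappa is lower than the raw agreement rate because it discounts the agreement expected by chance, which is why the abstract reports both figures.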

Retrieval of Hourly Aerosol Optical Depth Using Top-of-Atmosphere Reflectance from GOCI-II and Machine Learning over South Korea (GOCI-II 대기상한 반사도와 기계학습을 이용한 남한 지역 시간별 에어로졸 광학 두께 산출)

  • Seyoung Yang; Hyunyoung Choi; Jungho Im
    • Korean Journal of Remote Sensing / v.39 no.5_3 / pp.933-948 / 2023
  • Atmospheric aerosols not only have adverse effects on human health but also exert direct and indirect impacts on the climate system. It is therefore important to understand the characteristics and spatiotemporal distribution of aerosols. Many studies have monitored aerosols, predominantly by retrieving aerosol optical depth (AOD) from satellite-based observations. However, this approach relies primarily on a look-up table-based inversion algorithm, which is computationally intensive and carries associated uncertainties. In this study, a novel high-resolution AOD direct retrieval algorithm based on machine learning was developed using top-of-atmosphere reflectance data from the Geostationary Ocean Color Imager-II (GOCI-II), their differences from the past 30-day minimum reflectance, and meteorological variables from numerical models. The Light Gradient Boosting Machine (LGBM) technique was used, and the resulting estimates were validated through random, temporal, and spatial N-fold cross-validation (CV) against ground-based Aerosol Robotic Network (AERONET) AOD observations. The three CV results consistently demonstrated robust performance, yielding $R^2$ = 0.70-0.80, RMSE = 0.08-0.09, and 75.2-85.1% of estimates within the expected error (EE). The Shapley Additive exPlanations (SHAP) analysis confirmed the substantial influence of reflectance-related variables on AOD estimation. An examination of the spatiotemporal distribution of AOD in Seoul and Ulsan showed that the LGBM model yielded results in close agreement with AERONET AOD over time, confirming its suitability for AOD retrieval at high spatiotemporal resolution (i.e., hourly, 250 m). Furthermore, a comparison of data coverage showed that the LGBM model increased data retrieval frequency by approximately 8.8% compared with the GOCI-II L2 AOD products, alleviating the excessive masking over very bright surfaces that is often encountered in physics-based AOD retrieval.
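The difference between random, temporal, and spatial cross-validation lies in what is held out together. A minimal sketch, assuming that spatial CV holds out whole ground stations and temporal CV holds out whole dates (the abstract does not state the grouping unit, and the helper names and sample records are hypothetical):

```python
def group_folds(groups, n_folds):
    """Assign every distinct group to one fold so no group is split.

    Passing station identifiers gives a spatial split; passing dates
    gives a temporal split. Returns one fold id per sample.
    """
    unique_groups = sorted(set(groups))
    fold_of_group = {g: k % n_folds for k, g in enumerate(unique_groups)}
    return [fold_of_group[g] for g in groups]


def train_test_indices(fold_ids, test_fold):
    """Split sample indices into training and test sets for one fold."""
    train = [i for i, f in enumerate(fold_ids) if f != test_fold]
    test = [i for i, f in enumerate(fold_ids) if f == test_fold]
    return train, test


# Hypothetical samples: (station, date) for each collocated observation.
samples = [
    ("A", "d1"), ("A", "d2"),
    ("B", "d1"), ("B", "d2"),
    ("C", "d1"), ("C", "d2"),
]
stations = [s for s, _ in samples]
dates = [d for _, d in samples]

spatial_folds = group_folds(stations, n_folds=3)
temporal_folds = group_folds(dates, n_folds=2)
train_idx, test_idx = train_test_indices(spatial_folds, test_fold=0)
```

Holding out whole stations or dates tests whether the model generalizes to unseen locations or times, whereas a purely random split can place near-duplicate samples on both sides of the split.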

A Comparative Study of Vegetation Phenology Using High-resolution Sentinel-2 Imagery and Topographically Corrected Vegetation Index (고해상도 Sentinel-2 위성 자료와 지형효과를 고려한 식생지수 기반의 산림 식생 생장패턴 비교)

  • Seungheon Yoo; Sungchan Jeong
    • Korean Journal of Agricultural and Forest Meteorology / v.26 no.2 / pp.89-102 / 2024
  • Land Surface Phenology (LSP) plays a crucial role in understanding vegetation dynamics. The near-infrared reflectance of vegetation (NIRv) has been increasingly adopted in LSP studies, being recognized as a robust proxy for gross primary production (GPP). However, NIRv is sensitive to terrain effects in mountainous areas because artifacts in NIR reflectance cannot be canceled out. As a result, estimates of phenological metrics in mountainous regions carry substantial uncertainty, especially for the end of season (EOS). The topographically corrected NIRv (TCNIRv) employs the path length correction (PLC) method, which is derived from a simplification of the radiative transfer equation, to alleviate the limitations related to terrain effects. TCNIRv has been demonstrated to estimate phenology metrics more accurately than NIRv, with a particular improvement in the estimation of EOS. As the topographic effect is significantly influenced by terrain properties such as slope and aspect, our study compared phenology metrics estimated on south-facing slopes (SFS) and north-facing slopes (NFS) using NIRv and TCNIRv in two distinct mountainous regions: Gwangneung Forest (GF) and Odaesan National Park (ONP), representing relatively flat and rugged areas, respectively. The results indicated that TCNIRv-derived EOS at NFS occurred later than that at SFS for both study sites (GF: DOY 266.8/268.3 at SFS/NFS; ONP: DOY 262.0/264.8 at SFS/NFS), in contrast to the results obtained with NIRv (GF: DOY 270.3/265.5 at SFS/NFS; ONP: DOY 265.0/261.8 at SFS/NFS). Additionally, the gap between SFS and NFS diminished after topographic correction (GF: DOY 270.3/265.5 at SFS/NFS; ONP: DOY 265.0/261.8 at SFS/NFS). We conclude that TCNIRv and NIRv give different EOS estimates once slope orientation is taken into account. Our findings underscore the necessity of topographic correction in estimating photosynthetic phenology, considering slope orientation, especially in diverse terrain conditions.
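The slope-orientation comparison can be checked with simple arithmetic on the day-of-year (DOY) values reported in the abstract. The sketch below only restates those reported numbers; the dictionary layout and the function name are illustrative.

```python
# EOS day of year reported in the abstract, stored as (SFS, NFS) pairs.
eos_doy = {
    ("GF", "NIRv"): (270.3, 265.5),
    ("GF", "TCNIRv"): (266.8, 268.3),
    ("ONP", "NIRv"): (265.0, 261.8),
    ("ONP", "TCNIRv"): (262.0, 264.8),
}


def nfs_minus_sfs(site, index):
    """EOS difference in days; positive means NFS ends later than SFS."""
    sfs, nfs = eos_doy[(site, index)]
    return round(nfs - sfs, 1)


differences = {key: nfs_minus_sfs(*key) for key in eos_doy}
```

The sign of the difference flips from negative with NIRv to positive with TCNIRv at both sites, and the absolute gap shrinks (about 4.8 to 1.5 days at GF, and 3.2 to 2.8 days at ONP), which matches the direction of the two statements in the abstract.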

A Deep Learning Based Approach to Recognizing Accompanying Status of Smartphone Users Using Multimodal Data (스마트폰 다종 데이터를 활용한 딥러닝 기반의 사용자 동행 상태 인식)

  • Kim, Kilho; Choi, Sangwoo; Chae, Moon-jung; Park, Heewoong; Lee, Jaehong; Park, Jonghun
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.163-177 / 2019
  • As smartphones have become widely used, human activity recognition (HAR) tasks that recognize the personal activities of smartphone users from multimodal data have been actively studied. The research area is expanding from recognizing the simple body movements of an individual user to recognizing low-level and high-level behavior. However, HAR tasks for recognizing interaction behavior with other people, such as whether the user is accompanying or communicating with someone else, have received less attention so far. Moreover, previous research on recognizing interaction behavior has usually depended on audio, Bluetooth, and Wi-Fi sensors, which are vulnerable to privacy issues and require much time to collect enough data. In contrast, physical sensors, including accelerometer, magnetic field, and gyroscope sensors, are less vulnerable to privacy issues and can collect a large amount of data within a short time. In this paper, we propose a method for detecting accompanying status with a deep learning model that uses only multimodal physical sensor data from the accelerometer, magnetic field sensor, and gyroscope. The accompanying status was defined as a subset of user interaction behavior, covering whether the user is accompanied by an acquaintance at close distance and whether the user is actively communicating with that acquaintance. A framework based on convolutional neural networks (CNN) and long short-term memory (LSTM) recurrent networks for classifying accompanying and conversation was proposed. First, a data preprocessing method was introduced, consisting of time synchronization of multimodal data from different physical sensors, data normalization, and sequence data generation. We applied nearest interpolation to synchronize the timestamps of data collected from different sensors. Normalization was performed for each x, y, and z axis value of the sensor data, and the sequence data were generated with the sliding window method. The sequence data then became the input to the CNN, which extracted feature maps representing local dependencies of the original sequence. The CNN consisted of 3 convolutional layers and had no pooling layer, in order to preserve the temporal information of the sequence data. Next, the LSTM recurrent networks received the feature maps, learned long-term dependencies from them, and extracted features. The LSTM recurrent networks consisted of two layers, each with 128 cells. Finally, the extracted features were classified by a softmax classifier. The loss function of the model was the cross-entropy function, and the weights of the model were randomly initialized from a normal distribution with a mean of 0 and a standard deviation of 0.1. The model was trained using the adaptive moment estimation (ADAM) optimization algorithm with a mini-batch size of 128. We applied dropout to the input values of the LSTM recurrent networks to prevent overfitting. The initial learning rate was set to 0.001 and decayed exponentially by a factor of 0.99 at the end of each training epoch. An Android smartphone application was developed and released to collect data. We collected smartphone data from a total of 18 subjects. Using these data, the model classified accompanying and conversation with accuracies of 98.74% and 98.83%, respectively. Both the F1 score and the accuracy of the model were higher than those of the majority vote classifier, support vector machine, and deep recurrent neural network. In future research, we will focus on more rigorous multimodal sensor data synchronization methods that minimize timestamp differences. In addition, we will further study transfer learning methods that allow models tailored to the training data to be transferred to evaluation data that follow a different distribution. We expect this to yield a model that performs robustly under changes in data that were not considered during model training.
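The preprocessing steps and the learning-rate schedule described above can be sketched in plain Python. The abstract names the steps but not their details, so the z-score form of normalization, the window length and step, and all function names below are assumptions; the learning-rate constants (0.001 and 0.99) are the ones stated in the abstract.

```python
from bisect import bisect_left
from statistics import mean, pstdev


def nearest_resample(timestamps, values, target_times):
    """Nearest interpolation: for each target time, take the value whose
    timestamp is closest (timestamps must be sorted ascending)."""
    out = []
    for t in target_times:
        k = bisect_left(timestamps, t)
        if k == 0:
            out.append(values[0])
        elif k == len(timestamps):
            out.append(values[-1])
        else:
            before, after = timestamps[k - 1], timestamps[k]
            out.append(values[k] if after - t < t - before else values[k - 1])
    return out


def normalize(series):
    """Z-score normalization of one sensor axis (assumed form)."""
    mu, sigma = mean(series), pstdev(series)
    return [(v - mu) / sigma if sigma else 0.0 for v in series]


def sliding_windows(series, length, step):
    """Cut a series into fixed-length, possibly overlapping sequences."""
    return [series[s:s + length] for s in range(0, len(series) - length + 1, step)]


def learning_rate(epoch, initial=0.001, decay=0.99):
    """Exponential decay applied once at the end of every epoch."""
    return initial * decay ** epoch


# Hypothetical example: align a slower sensor to a common clock,
# then normalize one axis and build sequences from it.
aligned = nearest_resample([0.0, 1.0, 2.0], [10, 20, 30], [0.4, 0.6, 2.5])
scaled = normalize([1, 2, 3])
windows = sliding_windows([1, 2, 3, 4, 5], length=3, step=2)
```

In the described pipeline each window would become one input sequence for the CNN, with one channel per sensor axis.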