• Title/Summary/Keyword: regression outlier

Search Result 116

An Empirical Study on the Consumption Risk Sharing across the EU Regions (EU 지역간 소비위험분산에 대한 실증연구)

  • Park, You-Jin;Song, Jeongseok
    • International Area Studies Review / v.13 no.2 / pp.89-115 / 2009
  • By measuring consumption risk sharing across the EU regions, we evaluate the performance of the EU's various risk sharing channels. Using the DFFITS and DFBETAS regression diagnostics, we identify which countries are likely to form the highest risk sharing group among the EU regions. Our findings suggest that most western European countries display a homogeneous degree of risk sharing. In addition, our results confirm that both the high and the low risk sharing regions are mainly located in eastern European countries that joined the EU later than the western European countries, implying that the EU members remain largely dichotomized in terms of consumption risk sharing.
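
The abstract names DFFITS as one of its diagnostics but gives no formulas, so the following is only a minimal sketch of how DFFITS is commonly computed for a one-predictor regression. The function name `dffits_simple`, the toy data, and the rule-of-thumb cutoff 2*sqrt(p/n) are assumptions for illustration and are not taken from the paper.

```python
import math

def dffits_simple(x, y):
    """Return DFFITS for each observation in the fit y = a + b*x (illustrative helper)."""
    n, p = len(x), 2  # p counts the intercept and the slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    scores = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx              # leverage of this observation
        s2_del = (sse - e * e / (1.0 - h)) / (n - p - 1)   # error variance with it deleted
        t = e / math.sqrt(s2_del * (1.0 - h))              # externally studentized residual
        scores.append(t * math.sqrt(h / (1.0 - h)))
    return scores

# Hypothetical data: a noisy line y = 2x + 1 with one shifted point at the end.
x = list(range(10))
y = [2 * xi + 1 + (0.1 if xi % 2 == 0 else -0.1) for xi in x]
y[9] = 40.0

scores = dffits_simple(x, y)
cutoff = 2 * math.sqrt(2 / len(x))  # common rule of thumb, assumed here
flagged = [i for i, d in enumerate(scores) if abs(d) > cutoff]
```

Observations whose absolute DFFITS exceeds the cutoff are the ones whose removal would shift their own fitted value the most; DFBETAS applies the same leave-one-out idea to each coefficient separately.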

Optimal National Coordinate System Transform Model using National Control Point Network Adjustment Results (국가지준점 망조정 성과를 활용한 최적 국가 좌표계 변환 모델 결정)

  • Song, Dong-Seob;Jang, Eun-Seok;Kim, Tae-Woo;Yun, Hong-Sic
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.25 no.6_2 / pp.613-623 / 2007
  • The main purpose of this study is to investigate coordinate transformation between two different systems: the local geodetic datum (Tokyo datum) and the international geocentric datum (new Korea geodetic datum). For this purpose, three models were used to determine the seven transformation parameters: the Bursa-Wolf model, the Molodensky-Badekas model, and the Veis model. We also adopted the multiple regression equation (MRE) method to convert from the Tokyo datum to KTRF. We used 935 control points as common points and applied gross error analysis to detect outliers among them. The coordinate transformation was carried out as a similarity transformation using the seven estimated parameters, and the precision of the transformed coordinates was evaluated at 9,917 third- or fourth-order control points. The results show that the Bursa-Wolf and Molodensky-Badekas models are more suitable than the others for determining transformation parameters in Korea, and that the transformation accuracy of the MRE is lower than that of the similarity transformation models.
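
As a rough illustration of what a seven-parameter similarity transformation does, the sketch below applies a Bursa-Wolf style transform (three translations, three small rotations, one scale factor) to a geocentric point. The function name `bursa_wolf`, the units, and the coordinate-frame sign convention for the rotations are assumptions; the paper's estimated parameter values are not given in the abstract and none are used here.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)

def bursa_wolf(point, tx, ty, tz, rx_as, ry_as, rz_as, scale_ppm):
    """Apply a small-angle seven-parameter similarity transform (illustrative helper).

    point      -- (X, Y, Z) geocentric coordinates in metres
    tx, ty, tz -- translations in metres
    r?_as      -- rotations about each axis in arc-seconds
    scale_ppm  -- scale change in parts per million
    """
    x, y, z = point
    rx, ry, rz = (a * ARCSEC_TO_RAD for a in (rx_as, ry_as, rz_as))
    m = 1.0 + scale_ppm * 1e-6
    # Linearized rotation, coordinate-frame sign convention (an assumption;
    # the position-vector convention flips the sign of every rotation term).
    xr = x + rz * y - ry * z
    yr = -rz * x + y + rx * z
    zr = ry * x - rx * y + z
    return (tx + m * xr, ty + m * yr, tz + m * zr)

# Hypothetical usage: a pure translation leaves the shape of the network unchanged.
shifted = bursa_wolf((1.0, 2.0, 3.0), 10.0, 20.0, 30.0, 0.0, 0.0, 0.0, 0.0)
```

The Molodensky-Badekas model uses the same seven parameters but applies the rotation and scale about a centroid of the common points rather than the geocentre, which is one reason the two models can yield very different translation values for the same network.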

Robust Outlier-Object Detection in Image Pairs Based on Variable Threshold Using Empirical Correction Constant (실험적 교정상수를 사용한 가변문턱값에 기초한 영상 쌍에서의 강인한 이상 물체 검출)

  • Kim, Dong-Sik
    • Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.1 / pp.14-22 / 2009
  • By calculating the differences between two images of the same scene captured at different times, we can detect a set of outliers, such as occluding objects caused by moving vehicles. To reduce the influence of the differing intensity properties of the images, a simple technique that reruns a regression based on the polynomial regression model is employed. For robust detection of outliers, the image difference is normalized by the noise variance, so an accurate estimate of the noise variance is very important. This paper proposes using an empirically obtained correction constant. Numerical analyses using both synthetic and real images are presented to demonstrate the robust performance of the detection algorithm.

Algorithm for the L1-Regression Estimation with High Breakdown Point (L1-회귀추정량의 붕괴점 향상을 위한 알고리즘)

  • Kim, Bu-Yong
    • Communications for Statistical Applications and Methods / v.17 no.4 / pp.541-550 / 2010
  • The $L_1$-regression estimator is susceptible to leverage points, even though it is highly robust to vertical outliers. This article is concerned with improving the robustness of the $L_1$-estimator. To improve its robustness in terms of the breakdown point, we dampen the influence of the leverage points by reducing the weights corresponding to them. In addition, the algorithm employs the linear scaling transformation technique to solve the linear programming problem of $L_1$-estimation, for higher computational efficiency with large data sets. Monte Carlo simulation results indicate that the proposed algorithm yields $L_1$-estimates that are robust to leverage points as well as vertical outliers.
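
To make the "robust to vertical outliers" property concrete, here is a small sketch that fits an $L_1$ (least absolute deviations) line and an ordinary least squares line to the same toy data. It does not implement the paper's weighting scheme or its linear scaling transformation; instead it relies on the fact that, for a line, some optimal $L_1$ fit passes through at least two data points, and simply searches all pairs. The function names and the data are assumptions for illustration.

```python
from itertools import combinations

def l1_line(x, y):
    """Least absolute deviations line found by exhaustive search over point pairs."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line cannot be written as y = a + b*x
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        cost = sum(abs(yk - (intercept + slope * xk)) for xk, yk in zip(x, y))
        if best is None or cost < best[0]:
            best = (cost, intercept, slope)
    return best[1], best[2]

def ols_line(x, y):
    """Ordinary least squares line, for comparison."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data on y = 2x + 1 with one vertical outlier at x = 2.
x = [0, 1, 2, 3, 4, 5]
y = [1, 3, 25, 7, 9, 11]

l1_intercept, l1_slope = l1_line(x, y)
ols_intercept, ols_slope = ols_line(x, y)
```

The pairwise search costs on the order of n^3 operations and is only practical for tiny examples, which is why $L_1$-estimation on large data sets is normally posed as a linear program, as the abstract describes.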

Estimation of Freeway Accident Likelihood using Real-time Traffic Data (실시간 교통자료 기반 고속도로 교통사고 발생 가능성 추정 모형)

  • Park, Joon-Hyung;Oh, Cheol;NamKoong, Seong
    • Journal of Korean Society of Transportation / v.26 no.2 / pp.157-166 / 2008
  • This study proposes a model to estimate traffic accident likelihood using real-time traffic data obtained from freeway traffic surveillance systems. Traffic variables representing spatio-temporal variations in traffic conditions were used as independent variables in the proposed models. Binary logistic regression models were fitted to relate the traffic variables to accident data collected from the Seohaean freeway over three years, from 2004 to 2006. To obtain more reliable traffic variables, outlier filtering and data imputation were also performed. The model outputs, which are probabilistic measures of accident occurrence, could be used effectively not only in designing warning information systems but also in evaluating the effectiveness of various traffic operations strategies in terms of traffic safety.

Robust Estimation of Geomagnetic Transfer Functions (지자기 전달함수의 로버스트 추정)

  • Yang, Jun-Mo;O, Seok-Hun;Lee, Deok-Gi;Yun, Yong-Hun
    • Journal of the Korean Geophysical Society / v.5 no.2 / pp.131-142 / 2002
  • The geomagnetic transfer function is generally estimated by choosing the transfer function that minimizes the sum of squared differences between observed and predicted values. If the error structure follows a Gaussian distribution, standard least squares (LS) is an appropriate estimator. However, for a non-Gaussian error distribution, the LS estimate can be severely biased and distorted. In this paper, the Gaussian error assumption was tested with a Q-Q (quantile-quantile) plot, which provided information on the real error structure. Robust estimation, such as the regression M-estimate, which does not allow a few bad points to dominate the estimate, was then applied to the error structure with a non-Gaussian distribution. The results indicate that the performance of robust estimation is similar to that of LS estimation for a Gaussian error distribution, whereas robust estimation yields more reliable and smoother transfer function estimates than standard LS for a non-Gaussian error distribution.
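
The abstract does not say which M-estimator or algorithm was used, so the following is only a sketch of one common choice: a Huber-type regression M-estimate computed by iteratively reweighted least squares, with the residual scale re-estimated from the median absolute deviation at each pass. The function names, the tuning constant 1.345, the iteration count, and the real-valued toy data are assumptions; actual transfer function estimation works with complex spectral quantities.

```python
import statistics

def weighted_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (intercept, slope)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def huber_line(x, y, c=1.345, iterations=50):
    """Huber-type M-estimate of a line via iteratively reweighted least squares."""
    w = [1.0] * len(x)  # the first pass is plain least squares
    for _ in range(iterations):
        intercept, slope = weighted_line(x, y, w)
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        med = statistics.median(resid)
        scale = 1.4826 * statistics.median([abs(r - med) for r in resid])
        if scale == 0.0:
            break  # residual spread collapsed; keep the current weights
        # Full weight inside the threshold, shrinking weight for large residuals.
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
    return weighted_line(x, y, w)

# Hypothetical data: a noisy line y = 2x + 1 with one gross error at the end.
x = list(range(10))
y = [2 * xi + 1 + (0.1 if xi % 2 == 0 else -0.1) for xi in x]
y[9] = 60.0

ls_intercept, ls_slope = weighted_line(x, y, [1.0] * len(x))
rob_intercept, rob_slope = huber_line(x, y)
```

On clean Gaussian-like data the weights stay at or near one, so the robust fit nearly coincides with least squares, which matches the behaviour the abstract reports for the Gaussian case.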

Calibration Update for the Measuring Total Nitrogen Content in Rice Plant Tissue Using the Near Infrared Spectroscopy

  • Kwon, Young-Rip;Song, Young-Eun;Choi, Dong-Chil;Ryu, Jeong
    • Korean Journal of Crop Science / v.54 no.1 / pp.29-35 / 2009
  • The aim of the present study was to update the calibration used to measure the total nitrogen content of rice plant samples with the visible and near infrared spectrum. Before the equations were merged, the correlation coefficients of the calibration equations for nitrogen content in each rice part were 0.945 (leaf), 0.928 (stem), and 0.864 (whole plant). Among the calibration models created for each part of the rice plant under the various regression methods, the model for the leaf was relatively accurate. The calibration equation developed by the partial least squares (PLS) method was more accurate than that developed by the multiple linear regression (MLR) method. The calibration equation was sensitive to variety and location variations. However, merging and enlarging the sample sets not only allowed the nitrogen content to be measured more accurately but also made the later sampling populations more diversified. After merging, the $R^2$ values improved significantly to 0.950 (leaf), 0.974 (stem), and 0.940 (whole plant). After removal of outliers, the $R^2$ values increased further to 0.998, 0.995, and 0.997. The standard error of prediction (SEP) and SEP (C) were reduced in the stem and whole plant. Biases were reduced in the leaf, stem, and whole plant. Slopes were high in the stem. The standard deviation was reduced in the stem, while $R^2$ was high in the stem and whole plant. The results indicate that the calibration equation can be updated, and that a more robust calibration equation can be obtained through the merge function and multivariate calibration.

Development on Crop Yield Forecasting Model for Major Vegetable Crops using Meteorological Information of Main Production Area (주산지 기상정보를 활용한 주요 채소작물의 단수 예측 모형 개발)

  • Lim, Chul-Hee;Kim, Gang Sun;Lee, Eun Jung;Heo, Seongbong;Kim, Teayeon;Kim, Young Seok;Lee, Woo-Kyun
    • Journal of Climate Change Research / v.7 no.2 / pp.193-203 / 2016
  • Forecasting agricultural production is receiving growing attention as climate change accelerates. This study suggests three types of crop yield forecasting models for major vegetable crops, using meteorological information for the main production areas downscaled to the farmland level, the lack of which was identified as a limitation of previous studies. First, we conducted a correlation analysis between seven types of farm-level downscaled meteorological information and the reported crop yield of the main production areas. We then selected the three meteorological factors that showed the strongest relation with each crop species and region. Parameters were derived from the highly correlated meteorological factors, but the number of crop species was neglected. Next, the yield of each crop was estimated using the three suggested types of models. Chinese cabbage showed high accuracy overall, while the accuracy for daikon and onion improved considerably once the outlier was excluded. Chili and garlic showed differences by region, but Kyungbuk chili and Chungnam and Kyungsang garlic showed significant accuracy. We also selected, for each crop, the key meteorological factor with the strongest relation to crop yield. If the factor had a significant relation with the quantity, it better explains the variations in the key meteorological factor. This study will contribute to establishing the methodology of future studies by estimating the yield of different crop species using farmland meteorological information and relatively simple multiple linear regression models.

Research on Mining Technology for Explainable Decision Making (설명가능한 의사결정을 위한 마이닝 기술)

  • Kyungyong Chung
    • Journal of the Institute of Convergence Signal Processing / v.24 no.4 / pp.186-191 / 2023
  • Data processing techniques play a critical role in decision-making, including the handling of missing and outlier data and prediction and recommendation models. This requires a clear explanation of the validity, reliability, and accuracy of all processes and results. In addition, it is necessary to solve data problems through explainable models using decision trees, inference, etc., and to make models lightweight by considering various types of learning. The multi-layer mining classification method that applies the sixth principle discovers multidimensional relationships between variables and attributes that occur frequently in transactions after data preprocessing. This paper explains how to discover significant relationships by mining transactions and how to model the data through regression analysis. It develops scalable models and logistic regression models and proposes mining techniques that generate class labels through data cleansing, relevance analysis, data transformation, and data augmentation in order to make explainable decisions.

Prediction of Uniaxial Compressive Strength of Rock using Shield TBM Machine Data and Machine Learning Technique (쉴드 TBM 기계 데이터 및 머신러닝 기법을 이용한 암석의 일축압축강도 예측)

  • Kim, Tae-Hwan;Ko, Tae Young;Park, Yang Soo;Kim, Taek Kon;Lee, Dae Hyuk
    • Tunnel and Underground Space / v.30 no.3 / pp.214-225 / 2020
  • The uniaxial compressive strength (UCS) of rock is one of the important factors determining the advance speed during shield TBM tunnel excavation. UCS can be obtained from the Geotechnical Data Report (GDR), but it is difficult to measure UCS along the entire tunnel alignment. Therefore, the purpose of this study is to predict UCS using TBM machine driving data and machine learning techniques. Several machine learning techniques were compared, and the stacking model was confirmed to have the best prediction performance. The TBM machine data and UCS used in the analysis were obtained from the excavation of rock strata with slurry shield TBMs. The data were split 8:2 into training and test sets and pre-processed with feature selection, scaling, and outlier removal. After hyper-parameter tuning, the stacking model was evaluated with the root-mean-square error (RMSE) and the coefficient of determination ($R^2$), which were 5.556 and 0.943, respectively. Based on these results, stacking models are considered useful for predicting rock strength from TBM excavation data.
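
For readers unfamiliar with the two evaluation metrics, the sketch below shows how RMSE and the coefficient of determination are usually computed from held-out predictions. The function names and the three-point toy data are assumptions; they are not the paper's UCS measurements, and the paper's exact evaluation code is not described in the abstract.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted values for a three-sample test set.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]

test_rmse = rmse(observed, predicted)
test_r2 = r_squared(observed, predicted)
```

RMSE is expressed in the units of the target, so it is only comparable across models evaluated on the same target, while $R^2$ is unitless and reports the share of variance in the observed values that the predictions account for.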