• Title/Summary/Keyword: regression outlier


Outlier Detection Using Dynamic Plots (동적 그림을 이용한 이상치 검색)

  • Ahn, Byung-Jin;Seo, Han-Son
    • The Korean Journal of Applied Statistics / v.24 no.5 / pp.979-986 / 2011
  • Linear regression is commonly used to analyze data because of its simplicity and applicability; however, data may contain outliers and influential cases that can harm a statistical analysis. Detecting and examining outliers or influential cases is therefore an important part of data analysis. When multiple outliers are present, masking effects usually occur and make it difficult to identify the true outliers. We propose dynamic plots as a method resistant to the masking effect. The procedure using dynamic plots is useful for finding appropriate basic sets from which a dependent outlier detection method can start and detect the true set of outliers. Examples are given to demonstrate the effectiveness of the suggested idea.

A Study on the Improvement of Annual Runoff Estimation Model (연유출량 추정모형의 개선방안)

  • 이상훈
    • Water for future / v.26 no.1 / pp.51-62 / 1993
  • The most significant factor in estimating annual runoff must be precipitation. In the previous study, however, the watershed area rather than precipitation was included as an independent variable in the regression model when screening for accurate data. The criterion for accurate data was a runoff ratio in the range of 20% to 100%. In this study, the valid range of evapotranspiration was adopted as the criterion for accurate data, and the same data were reexamined. This yielded the following model, which has a high coefficient of determination and conforms to hydrologic theory: R = -518.25 + 0.8834P, where R is the runoff depth (mm) and P is the precipitation (mm). This regression model was found to be stable under cross-validation and is proposed as an annual runoff estimation model applicable to ungaged small and medium watersheds in Korea.
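
The regression line in the abstract above can be evaluated directly. The following is a minimal sketch; the function name `annual_runoff_mm` and the sample precipitation value are illustrative assumptions, not taken from the paper.

```python
def annual_runoff_mm(precipitation_mm: float) -> float:
    """Annual runoff depth R (mm) from annual precipitation P (mm).

    Uses the coefficients quoted in the abstract: R = -518.25 + 0.8834 * P.
    """
    return -518.25 + 0.8834 * precipitation_mm


# 1200 mm is a made-up input used only to show the call.
print(round(annual_runoff_mm(1200.0), 2))
```

Because the intercept is negative, the line gives negative runoff for precipitation below roughly 587 mm, so this sketch is only meaningful for wetter years.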


Bayesian forecasting approach for structure response prediction and load effect separation of a revolving auditorium

  • Ma, Zhi;Yun, Chung-Bang;Shen, Yan-Bin;Yu, Feng;Wan, Hua-Ping;Luo, Yao-Zhi
    • Smart Structures and Systems / v.24 no.4 / pp.507-524 / 2019
  • A Bayesian dynamic linear model (BDLM) is presented for data-driven response prediction and load effect separation of a revolving auditorium structure, where the main loads are self-weight and dead loads, temperature load, and audience load. Analyses are carried out based on long-term monitoring data for static strains on several key members of the structure. Three improvements are introduced to the ordinary regression BDLM: a classificatory regression term to address the temporary audience load effect, improved inference that continuously updates the variance of the observation noise, and component discount factors for effective load effect separation. The effects of these improvements are evaluated in terms of the root mean square errors, standard deviations, and 95% confidence intervals of the predictions. Bayes factors are used for evaluating the probability distributions of the predictions, which are essential to structural condition assessments such as outlier identification and reliability analysis. The performance of the present BDLM has been successfully verified on simulated data and on real data obtained from the structural health monitoring system installed on the revolving structure.

A Robust Energy Consumption Forecasting Model using ResNet-LSTM with Huber Loss

  • Albelwi, Saleh
    • International Journal of Computer Science & Network Security / v.22 no.7 / pp.301-307 / 2022
  • Energy consumption has grown alongside dramatic population increases. Statistics show that buildings in particular use a significant amount of energy worldwide. Building energy prediction is therefore crucial both for optimizing utilities' energy plans and for giving consumers a predictive model. To improve energy prediction performance, this paper proposes a ResNet-LSTM model that combines residual networks (ResNets) and long short-term memory (LSTM) for energy consumption prediction. ResNets are used to extract complex and rich features, while the LSTM learns temporal correlation; a dense layer serves as the regression output that forecasts energy consumption. To make the model more robust, we employed Huber loss during optimization. Huber loss obtains high efficiency by handling small errors quadratically, and it takes the absolute error for large errors to increase robustness. This makes the model less sensitive to outlier data. The proposed system was trained on historical data to forecast energy consumption for different time series. To evaluate the proposed model, we compared its performance with several popular machine learning and deep learning methods, such as linear regression, neural networks, decision trees, and convolutional neural networks. The results show that the proposed model predicted energy consumption most accurately.
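
To make the quadratic-then-linear behaviour of Huber loss described above concrete, here is a minimal NumPy sketch of the standard definition. The threshold name `delta` and its default of 1.0 are assumptions for illustration; the abstract does not state the value used in the paper.

```python
import numpy as np


def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    residual = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_res = np.abs(residual)
    quadratic = 0.5 * residual ** 2               # small errors, like squared error
    linear = delta * (abs_res - 0.5 * delta)      # large errors, like absolute error
    return float(np.where(abs_res <= delta, quadratic, linear).mean())


# One small error (0.5) and one large error (3.0), both made-up values.
print(huber_loss([0.0, 0.0], [0.5, 3.0], delta=1.0))
```

The two branches meet at `|error| = delta` with the same value and slope, which is why a single large residual adds to the loss only linearly instead of quadratically.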

A Study On Developing Weapon System CERs With Considering Various Data Characteristics (다양한 데이터 특성을 고려한 무기체계 비용추정관계식 개발 연구)

  • Jung, Won-Il;Kim, Dong-Kyu;Kang, Sung-Jin
    • Journal of the military operations research society of Korea / v.36 no.3 / pp.43-56 / 2010
  • Recently, the acquisition environment for Korean defense weapon systems has placed more emphasis on cost analysis for efficient execution of the defense acquisition budget. While cost analysis is emphasized in law and process, however, its infrastructure is still insufficient. Computerized cost models have been used to obtain estimates in the early phase of a project, but those models were developed by foreign companies and have many limitations when used in the Korean defense environment. For this reason, a consensus has formed that a Korean cost estimation model suited to the domestic defense industry is needed, and many studies are now under way. In this study, we suggest methodologies for developing Cost Estimating Relationships (CERs), the key logic of a Korean cost estimation model. In particular, we propose a new CER development process that depends on data characteristics such as multicollinearity, outliers, small samples, and heteroscedasticity. We also present a case study for an artillery weapon system using the methods we developed. We find that these CERs could be verified through theoretical methods.

Robust multiple imputation method for missings with boundary and outliers (한계와 이상치가 있는 결측치의 로버스트 다중대체 방법)

  • Park, Yousung;Oh, Do Young;Kwon, Tae Yeon
    • The Korean Journal of Applied Statistics / v.32 no.6 / pp.889-898 / 2019
  • Imputing missing values for survey variables with item nonresponse becomes complicated when outliers and logical boundary conditions with other survey items cannot be ignored. If a variable with missing values also has outliers and boundaries, values imputed by existing regression-based imputation methods are likely to be biased and to violate the boundary conditions. In this paper, we address these difficulties by combining various robust regression models with multiple imputation methods. Through a simulation study on various scenarios of outliers and boundaries, we find and discuss the optimal combination of robust regression and multiple imputation method.

Outlier-Object Detection Using an Image Pair Based on Regression Analysis: Noise Variance Estimation and Performance Analysis (영상 쌍에서 회귀분석에 기초한 이상 물체 검출: 잡음분산의 추정과 성능 분석)

  • Kim, Dong-Sik
    • Journal of the Institute of Electronics Engineers of Korea SP / v.45 no.5 / pp.25-34 / 2008
  • By comparing two images of the same scene captured at different times, we can detect a set of outliers, such as occluding objects due to moving vehicles. To reduce the influence of the differing intensity properties of the images, an intensity compensation scheme based on a polynomial regression model is employed. To detect outliers accurately while limiting the influence that the outliers themselves have on the fit, a simple technique that reruns the regression is employed. In this paper, an algorithm that iteratively reruns the regression is analyzed theoretically by observing the convergence property of the estimates of the noise variance. A correction constant for the estimate of the noise variance is proposed. The correction makes the detection algorithm robust to the choice of thresholds for selecting outliers. Numerical results on both synthetic and real images are also presented to show the robust performance of the detection algorithm.
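
The idea of rerunning the regression after discarding suspected outliers can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's algorithm: the function name `detect_outliers`, the threshold multiplier `k`, the polynomial degree, and the synthetic data are all hypothetical, and the correction constant for the noise variance that the paper proposes is omitted.

```python
import numpy as np


def detect_outliers(ref_intensity, cur_intensity, degree=2, k=3.0, max_iter=20):
    """Iteratively refit a polynomial on current inliers and flag large residuals."""
    x = np.asarray(ref_intensity, dtype=float)
    y = np.asarray(cur_intensity, dtype=float)
    inliers = np.ones(x.shape, dtype=bool)
    for _ in range(max_iter):
        # Fit the intensity-compensation polynomial using inliers only.
        coeffs = np.polyfit(x[inliers], y[inliers], degree)
        residuals = y - np.polyval(coeffs, x)
        # Noise standard deviation estimated from inlier residuals (uncorrected).
        sigma = max(float(np.sqrt(np.mean(residuals[inliers] ** 2))), 1e-12)
        new_inliers = np.abs(residuals) <= k * sigma
        if np.array_equal(new_inliers, inliers):
            break  # the inlier set has stopped changing
        inliers = new_inliers
    return ~inliers, coeffs, sigma


# Synthetic pixel pairs: a linear intensity change plus small noise (made-up values).
rng = np.random.default_rng(0)
ref = rng.uniform(0.0, 1.0, 500)
cur = 0.1 + 0.8 * ref + rng.normal(0.0, 0.01, 500)
cur[:10] += 0.5  # ten synthetic "occluded" pixels
mask, coeffs, sigma = detect_outliers(ref, cur, degree=1)
print(mask[:10].all(), int(mask.sum()))
```

Because `sigma` is re-estimated from a truncated set of residuals, it tends to shrink with each pass, so the flagged set depends on the choice of `k` in this uncorrected form.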

Study on Health Behavior of Private Security Guards Applying Planned Behavioral Theory (계획된 행동이론을 적용한 민간경비원의 건강행동연구)

  • Kim, Hae-Sun;Gwak, Han-Byeong
    • Korean Security Journal / no.43 / pp.99-120 / 2015
  • This research analyzed the health behavior of private security guards by applying the theory of planned behavior. To this end, purposive sampling was conducted among security guards living in the Seoul-Gyeonggi region. After excluding insincere responses and outliers, data from 187 respondents were used for analysis. The analysis methods were exploratory factor analysis (EFA), polyserial correlation analysis, and multiple regression and logistic regression analysis to estimate the causal relationships among the variables. The results can be summarized as follows. First, attachment, attitude and subjective norm regarding the behavior, and perceived behavioral control had a positive (+) effect on the intention to continue health behavior. Second, attachment had no significant influence on attitude toward the behavior. Third, attachment had a positive (+) influence on the intention to continue health behavior. Fourth, perceived behavioral control had a positive (+) influence on the practice of health behavior: the likelihood of practicing health behavior increased by about 62.9% when perceived behavioral control increased by 1 unit.


A Study of the Application of Machine Learning Methods in the Low-GloSea6 Weather Prediction Solution (Low-GloSea6 기상 예측 소프트웨어의 머신러닝 기법 적용 연구)

  • Hye-Sung Park;Ye-Rin Cho;Dae-Yeong Shin;Eun-Ok Yun;Sung-Wook Chung
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.5 / pp.307-314 / 2023
  • As supercomputing and hardware technology advance, climate prediction models are improving. The Korea Meteorological Administration adopted GloSea5 from the UK Met Office and now operates an updated GloSea6 tailored to Korean weather. Universities and research institutions use Low-GloSea6 on smaller servers, which improves accessibility and research efficiency. In this paper, profiling Low-GloSea6 on smaller servers identified the tri_sor_dp_dp subroutine in tri_sor.F90 of the atmospheric model as a CPU-intensive hotspot. Applying linear regression, a type of machine learning, to this function showed promise. After removing outliers, the linear regression model achieved an RMSE of 2.7665e-08 and an MAE of 1.4958e-08, outperforming the Lasso and ElasticNet regression methods. This suggests the potential of machine learning for optimizing hotspots identified during Low-GloSea6 execution.
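
The two error metrics quoted in the abstract above follow their usual definitions, sketched below in NumPy. The function names and the sample arrays are illustrative assumptions; the snippet does not reproduce the paper's data or models.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y_true - y_pred)^2))."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error: mean(|y_true - y_pred|)."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))


# Made-up values: errors of 3 and 4 on two samples.
print(rmse([0.0, 0.0], [3.0, 4.0]), mae([0.0, 0.0], [3.0, 4.0]))
```

RMSE weights large errors more heavily than MAE, so RMSE is never smaller than MAE on the same predictions; the gap between the two reported values reflects how unevenly the errors are spread.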

Relationship Between Supply Factors of Medical Care and Use of Bed (의료의 공급량과 병상이용량과의 관계에 관한 국제비교연구)

  • 정형선
    • Health Policy and Management / v.5 no.2 / pp.18-34 / 1995
  • To clarify the relationship between medical supply (medical personnel and goods) and the use of beds, the author compared 24 OECD countries. Per Capita Bed-days can be divided into Average Length of Stay and Admission Rate, and these three variables were regressed on both In-patient Care Beds of all medical institutions (including acute somatic, psychiatric, special, nursing homes and other long-term care) and Share of Total Health Employment in Total Employment. The regression analysis shows a statistically significant positive relationship between In-patient Care Beds and Average Length of Stay, and a negative relationship between Share of Total Health Employment and Admission Rate. In addition to Ordinary Least Squares (OLS) estimation, amended Bounded Influence Estimation (BIE) was also performed to adjust for the influence of outliers. Japan shows a very large number of In-patient Care Beds and a very low Share of Total Health Employment, and this situation is judged to be closely related to its long Average Length of Stay and low Admission Rate.
