• Title/Abstract/Keywords: Linear Multiple Regression Method

450 search results (processing time: 0.027 s)

Clustering Observations for Detecting Multiple Outliers in Regression Models

  • Seo, Han-Son;Yoon, Min
    • 응용통계연구 (Korean Journal of Applied Statistics) / Vol. 25, No. 3 / pp.503-512 / 2012
  • Detecting outliers in a linear regression model eventually fails when similar observations are classified differently in a sequential process. In such circumstances, identifying clusters and applying certain methods to the clustered data can prevent a failure to detect outliers and is computationally efficient because the data are reduced. In this paper, we suggest implementing a clustering procedure for this purpose and provide examples that illustrate the suggested procedure applied to the Hadi-Simonoff (1993) method, the reverse Hadi-Simonoff method, and the Gentleman-Wilk (1975) method.

Multivariate statistical analysis of the comparative antioxidant activity of the total phenolics and tannins in the water and ethanol extracts of dried goji berry (Lycium chinense) fruits

  • Kim, Joo-Shin;Kimm, Haklin Alex
    • 한국식품과학회지 (Korean Journal of Food Science and Technology) / Vol. 51, No. 3 / pp.227-236 / 2019
  • Antioxidant activity in water and ethanol extracts of dried Lycium chinense fruit, as a result of the total phenolic and tannin content, was measured using a number of chemical and biochemical assays for radical scavenging and inhibition of lipid peroxidation, with the analysis being extended by applying a bootstrapping statistical method. Previous statistical analyses mostly provided linear correlation and regression analyses between antioxidant activity and increasing concentrations of phenolics and tannins in a concentration-dependent mode. The present study showed that multiple component or multivariate analysis, by applying multiple regression analysis or regression planes, proved more informative than linear regression analysis of the relationship between the concentration of individual components and antioxidant activity. In this paper, we presented the multivariate analysis of the antioxidant activities of phenolic and tannin contents combined in the water and ethanol extracts, which revealed hidden observations that were not evident from linear statistical analysis.

지역빈도해석 및 다중회귀분석을 이용한 산악형 강수해석 (Orographic Precipitation Analysis with Regional Frequency Analysis and Multiple Linear Regression)

  • 윤혜선;엄명진;조원철;허준행
    • 한국수자원학회논문집 (Journal of Korea Water Resources Association) / Vol. 42, No. 6 / pp.465-480 / 2009
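  • In this study, multiple regression analysis was used to identify the relationship between precipitation and the topographic factors that cause orographic effects. Using the mean annual precipitation of Jeju Island, whose entire area is mountainous, and the probability rainfall computed by the index flood method as precipitation data, regression models were built with elevation, latitude, and longitude, which were selected as the topographic factors causing orographic effects. The regression results showed that the linear relationship between mean annual precipitation and elevation also held for probability rainfall, and that adding latitude or longitude individually as an extra factor besides elevation gave a stronger correlation with rainfall. In addition, a geospatial analysis using the regression model with elevation, latitude, and longitude all included showed a precipitation pattern concentrated in the southeast, consistent with the actual precipitation characteristics of Jeju Island, which supports the suitability of the model. However, the validity of the regression equation decreases at high elevations regardless of duration and return period, so additional orographic factors are likely to influence precipitation at high elevations, and further research is needed.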
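The regression of precipitation on elevation, latitude, and longitude described in this abstract can be sketched as an ordinary least-squares fit. The following is a minimal illustration only: the station coordinates, the coefficients, and the helper names `solve` and `fit_mlr` are hypothetical and are not taken from the paper.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Least-squares fit of y on the predictors in rows, with an intercept."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)  # normal equations: (X'X) beta = X'y


# Hypothetical stations: (elevation, latitude offset, longitude offset)
stations = [(0.1, 0.0, 0.0), (0.5, 1.0, 0.0), (1.0, 0.0, 1.0),
            (1.5, 1.0, 1.0), (0.3, 0.5, 0.2), (0.8, 0.2, 0.7)]

# Synthetic precipitation generated from a known plane (assumed values)
true_beta = [1500.0, 800.0, -300.0, 200.0]
precip = [true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], s))
          for s in stations]

beta = fit_mlr(stations, precip)
print(beta)  # intercept, then elevation, latitude, longitude coefficients
```

Because the synthetic data lie exactly on a plane, the fit recovers the assumed coefficients up to rounding. Real station data would leave residuals, and the abstract notes that these grow at high elevations.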

선형회귀 모형에서 자기공분산 기반 추정 (Autocovariance based estimation in the linear regression model)

  • 박철용
    • Journal of the Korean Data and Information Science Society / Vol. 22, No. 5 / pp.839-847 / 2011
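  • In this study, we derive an estimator of the regression coefficients based on autocovariance in the multiple linear regression model. The autocovariance-based method, proposed in Park (2009), is not intuitively appealing, but the estimator based on it is an unbiased estimator of the regression coefficients. If the vector of explanatory variables satisfies certain regularity conditions, then under a weak condition that is satisfied when the errors follow an autoregressive moving average model, this estimator is shown to have asymptotically the same distribution as the least squares estimator and to converge in probability to the regression coefficients. Finally, simulation studies show that these properties also hold in small samples.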

스마트 무인기용 터보축 엔진의 성능진단을 위한 결함 예측에 관한 연구 (A Study on Defect Diagnostics for Health Monitoring of a Turbo-Shaft Engine for SUAV)

  • 박준철;노태성;최동환
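    • 한국추진공학회:학술대회논문집 (Korean Society of Propulsion Engineers: Conference Proceedings) / Proceedings of the 24th Spring Conference of the Korean Society of Propulsion Engineers, 2005 / pp.248-251 / 2005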
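  • In this study, a technique for diagnosing the engine performance degradation caused by faults in a gas turbine engine was investigated. To model the target engine, the commercial program GSP was used to extract the variables for diagnosing degraded performance, and on this basis a Virtual Sensor Model for health monitoring was built. To predict single and multiple faults, the multiple linear regression technique and a weighting-based technique were introduced to predict the location and severity of faults in engine components.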

Determination of Research Octane Number using NIR Spectral Data and Ridge Regression

  • 정호일;이혜선;전지혁
    • Bulletin of the Korean Chemical Society / Vol. 22, No. 1 / pp.37-42 / 2001
  • Ridge regression is compared with multiple linear regression (MLR) for determination of Research Octane Number (RON) when the baseline and signal-to-noise ratio are varied. MLR analysis of near-infrared (NIR) spectroscopic data usually encounters a collinearity problem, which adversely affects long-term prediction performance. The collinearity problem can be eliminated or greatly reduced by using ridge regression, which is a biased estimation method. To evaluate the robustness of each calibration, the calibration models developed by both calibration methods were used to predict RONs of gasoline spectra in which the baseline and signal-to-noise ratio were varied. The ridge calibration model showed more stable prediction performance than the MLR model, especially when the spectral baselines were varied. In conclusion, ridge regression is shown to be a viable method for calibration of RON with NIR data when only a few wavelengths are available, as in a hand-carried device using a few diodes.
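To make the collinearity point in this abstract concrete, the sketch below fits ordinary least squares and ridge regression to two nearly collinear predictors. It is an illustration under stated assumptions: the data are synthetic stand-ins for absorbances at two wavelengths, the penalty `lam=0.1` is arbitrary, and the helper names `solve`, `fit`, and `norm` are hypothetical rather than taken from the paper.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(X, y, lam=0.0):
    """Solve (X'X + lam*I) beta = X'y; lam=0 gives ordinary least squares."""
    p = len(X[0])
    a = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(a, b)


def norm(beta):
    """Euclidean length of a coefficient vector."""
    return sum(v * v for v in beta) ** 0.5


# Synthetic readings at two nearly collinear wavelengths (assumed values)
X = [[1.00, 1.01], [2.00, 1.98], [3.00, 3.02], [4.00, 3.99], [5.00, 5.01]]
y = [2.1, 3.9, 6.2, 7.9, 10.1]  # synthetic response, roughly x1 + x2

ols = fit(X, y)             # unpenalized fit, sensitive to collinearity
ridge = fit(X, y, lam=0.1)  # penalized fit, shrinks the coefficients

print("OLS:  ", ols, norm(ols))
print("Ridge:", ridge, norm(ridge))
```

For any positive penalty the ridge coefficient vector is shorter than the least-squares one. That shrinkage is the bias the abstract refers to, and it is what stabilizes the estimates when the predictors are nearly collinear.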

다중선형회귀모델을 이용한 움직임 추정방법 (Motion estimation method using multiple linear regression model)

  • 김학수;임원택;이재철;이규원;박규택
    • 전자공학회논문지S (Journal of the Korean Institute of Telematics and Electronics S) / Vol. 34S, No. 10 / pp.98-103 / 1997
  • Given the small bit allocation for motion information in very low bit-rate coding, motion estimation using the block matching algorithm (BMA) fails to maintain an acceptable level of prediction errors. The reason is that the motion model, or spatial transformation, assumed in block matching cannot approximate real-world motion precisely with a small number of parameters. To overcome this drawback of the conventional block matching algorithm, several triangle-based methods that use triangular patches instead of blocks have been proposed. To estimate the motion of image sequences, these methods have usually been based on a combination of the optical flow equation, affine transform, and iteration, but their computational cost is high. This paper presents a fast motion estimation algorithm using a multiple linear regression model to address the defects of the BMA and the triangle-based methods. After describing the basic 2-D triangle-based method, the details of the proposed multiple linear regression model are presented along with the motion estimation results from one standard video sequence, representative of MPEG-4 class A data. The simulation results show that with the proposed method the average PSNR is improved by about 1.24 dB in comparison with the BMA method, and the computational cost is reduced by about 25% in comparison with the 2-D triangle-based method.

전기 가격 예측을 위한 맵리듀스 기반의 로컬 단위 선형회귀 모델 (MapReduce-based Localized Linear Regression for Electricity Price Forecasting)

  • 한진주;이인규;온병원
    • 전기학회논문지P (The Transactions of the Korean Institute of Electrical Engineers P) / Vol. 67, No. 4 / pp.183-190 / 2018
  • Predicting electricity prices accurately is an important task in the electricity trading market. To address the electricity price forecasting problem, various approaches have been proposed so far, and it is known that linear regression-based approaches are the best. However, the use of such linear regression-based methods is limited due to low accuracy and performance. In traditional linear regression methods, it is not practical to find a nonlinear regression model that explains the training data well. If the training data is complex (i.e., small-sized individual data and large-sized features), it is difficult to find the polynomial function with n terms as the model that fits the training data. On the other hand, when a linear regression model is used to approximate a nonlinear regression model, the accuracy of the model drops considerably because it does not accurately reflect the characteristics of the training data. To cope with this problem, we propose a new electricity price forecasting method that divides the entire dataset into multiple split datasets and finds the best linear regression models, each of which is the optimal model for its dataset. Meanwhile, to improve the performance of the proposed method, we adapt the proposed localized linear regression method to MapReduce, a framework for parallel processing of data stored in a Hadoop distributed file system. Our experimental results show that the proposed model outperforms the existing linear regression model. Specifically, the accuracy of the proposed method is improved by 45%, and it runs 5 times faster than the existing linear regression-based model.
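The idea of fitting one linear model per split dataset, as described in this abstract, can be sketched as follows. This is a single-process illustration under assumptions: the data are synthetic, the equal-width splitting rule and the names `fit_line`, `sse`, and `chunk` are hypothetical, and the MapReduce/Hadoop parallelization from the paper is omitted.

```python
def fit_line(xs, ys):
    """Simple least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def sse(model, xs, ys):
    """Sum of squared errors of a fitted line on the given points."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Synthetic nonlinear series standing in for price data (assumed values)
xs = [float(i) for i in range(20)]
ys = [x * x for x in xs]

# One global line fitted to the whole dataset
global_sse = sse(fit_line(xs, ys), xs, ys)

# One local line per contiguous split of 5 points (assumed splitting rule)
chunk = 5
local_models = []
local_sse = 0.0
for start in range(0, len(xs), chunk):
    cx, cy = xs[start:start + chunk], ys[start:start + chunk]
    model = fit_line(cx, cy)
    local_models.append(model)
    local_sse += sse(model, cx, cy)

print("global SSE:", global_sse)
print("local SSE: ", local_sse)
```

Each local fit minimizes the squared error within its own split, so the combined local error can never exceed that of the single global line. On curved data such as this it is strictly smaller. Because the per-split fits are independent of each other, they map naturally onto parallel workers.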

Subset selection in multiple linear regression: An improved Tabu search

  • Bae, Jaegug;Kim, Jung-Tae;Kim, Jae-Hwan
    • Journal of Advanced Marine Engineering and Technology / Vol. 40, No. 2 / pp.138-145 / 2016
  • This paper proposes an improved tabu search method for subset selection in multiple linear regression models. Variable selection is a vital combinatorial optimization problem in multivariate statistics. Selecting the optimal subset of variables is necessary in order to construct a reliable multiple linear regression model. Its applications range widely from machine learning, time-series prediction, and multi-class classification to noise detection. Because this problem is NP-complete in nature, it becomes more difficult to find the optimal solution as the number of variables increases. Two typical metaheuristic methods have been developed to tackle the problem: the tabu search algorithm and the hybrid genetic and simulated annealing algorithm. However, these two methods have shortcomings. The tabu search method requires a large amount of computing time, and the hybrid algorithm produces a less accurate solution. To overcome these shortcomings, we propose an improved tabu search algorithm that reduces the moves in the neighborhood and adopts an effective move search strategy. To evaluate the performance of the proposed method, comparative studies are performed on small data sets from the literature and on large simulated data sets. Computational results show that the proposed method outperforms the two metaheuristic methods in terms of computing time and solution quality.
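The subset selection problem in this abstract can be illustrated with a brute-force baseline that scores every non-empty subset of predictors. Note that this is exhaustive enumeration, not the tabu search the paper proposes. The orthogonal design, the per-variable penalty of 1.0, and the helper names are assumptions made for the example.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def sse_for_subset(cols, X, y):
    """Residual sum of squares of the least-squares fit using only `cols`."""
    Z = [[row[j] for j in cols] for row in X]
    k = len(cols)
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    beta = solve(ztz, zty)
    return sum((yi - sum(b * v for b, v in zip(beta, z))) ** 2
               for z, yi in zip(Z, y))


# Hypothetical orthogonal two-level design with four candidate predictors
X = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1],
     [-1, 1, 1, 1], [-1, 1, -1, -1], [-1, -1, 1, -1], [-1, -1, -1, 1]]
# Synthetic response that depends only on predictors 0 and 2
y = [2 * row[0] - 3 * row[2] for row in X]

penalty = 1.0  # assumed cost per selected variable
subsets = [c for k in range(1, 5) for c in combinations(range(4), k)]
best = min(subsets, key=lambda c: sse_for_subset(c, X, y) + penalty * len(c))

print(len(subsets), "subsets scored; best:", best)
```

With p candidate predictors there are 2^p - 1 non-empty subsets, so enumeration is feasible only for small p. That growth is why the abstract turns to metaheuristics such as tabu search for larger problems.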

On study for change point regression problems using a difference-based regression model

  • Park, Jong Suk;Park, Chun Gun;Lee, Kyeong Eun
    • Communications for Statistical Applications and Methods / Vol. 26, No. 6 / pp.539-556 / 2019
  • This paper derives a method for solving change point regression problems by using properties of a difference-based intercept estimator, first introduced by Park and Kim (Communications in Statistics - Theory Methods, 2019) for outlier detection in multiple linear regression models. We describe the statistical properties of the difference-based regression model in a piecewise simple linear regression model and then propose an efficient algorithm for change point detection. We illustrate the merits of our proposed method by comparison with several existing methods in simulation studies and real data analysis. This methodology is quite valuable, "no matter what regression lines" and "no matter what the number of change points".