• Title/Summary/Keyword: Linear Multiple Regression Method

An Approach to Applying Multiple Linear Regression Models by Interlacing Data in Classifying Similar Software

  • Lim, Hyun-il
    • Journal of Information Processing Systems
    • /
    • v.18 no.2
    • /
    • pp.268-281
    • /
    • 2022
  • The development of information technology is bringing many changes to everyday life, and machine learning can be used as a technique to solve a wide range of real-world problems. Analysis and utilization of data are essential processes in applying machine learning to real-world problems. As a method of processing data in machine learning, we propose an approach based on applying multiple linear regression models by interlacing data to the task of classifying similar software. Linear regression is widely used in estimation problems to model the relationship between input and output data. In our approach, multiple linear regression models are generated by training on interlaced feature data. A combination of these multiple models is then used as the prediction model for classifying similar software. Experiments are performed to evaluate the proposed approach as compared to conventional linear regression, and the experimental results show that the proposed method classifies similar software more accurately than the conventional model. We anticipate the proposed approach to be applied to various kinds of classification problems to improve the accuracy of conventional linear regression.

Inter-comparison of Prediction Skills of Multiple Linear Regression Methods Using Monthly Temperature Simulated by Multi-Regional Climate Models (다중 지역기후모델로부터 모의된 월 기온자료를 이용한 다중선형회귀모형들의 예측성능 비교)

  • Seong, Min-Gyu;Kim, Chansoo;Suh, Myoung-Seok
    • Atmosphere
    • /
    • v.25 no.4
    • /
    • pp.669-683
    • /
    • 2015
  • In this study, we investigated the prediction skills of four multiple linear regression methods for monthly air temperature over South Korea. We used simulation results from four regional climate models (RegCM4, SNURCM, WRF, and YSURSM) driven by two boundary conditions (NCEP/DOE Reanalysis 2 and ERA-Interim). We selected 15 years (1989~2003) as the training period and the last 5 years (2004~2008) as the validation period. The four regression methods used in this study are as follows: 1) homogeneous multiple linear regression (HMR), 2) HMR with the regression coefficients constrained to be nonnegative (HMR+), 3) non-homogeneous multiple linear regression (EMOS; Ensemble Model Output Statistics), and 4) EMOS with the coefficients constrained to be nonnegative (EMOS+), which is otherwise identical to the third method. The four regression methods showed similar prediction skills for the monthly air temperature over South Korea. However, the prediction skills of the methods that do not constrain the regression coefficients to be nonnegative are clearly affected by the presence of outliers. Among the four methods, HMR+ and EMOS+ showed the best skill during the validation period, with very similar performance in terms of MAE and RMSE. We therefore recommend HMR+ as the best method because it is easier to develop and apply.
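
The nonnegativity constraint described for HMR+ can be illustrated with a small sketch. It is not the authors' implementation: the data are synthetic, the function name `fit_nonneg_regression` is hypothetical, and leaving the intercept unconstrained (handled by centering) is an assumption. The constrained fit uses `scipy.optimize.nnls`.

```python
import numpy as np
from scipy.optimize import nnls

def fit_nonneg_regression(X, y):
    """Least squares with nonnegative slopes and a free intercept (assumed setup)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    # Centering removes the intercept, so NNLS only constrains the slopes.
    weights, _ = nnls(X - x_mean, y - y_mean)
    intercept = y_mean - x_mean @ weights
    return intercept, weights

# Synthetic stand-in for observations and four model simulations (not real data).
rng = np.random.default_rng(0)
truth = rng.normal(10.0, 8.0, size=180)
members = np.column_stack(
    [truth + rng.normal(0.0, sd, size=180) for sd in (1.0, 1.5, 2.0, 6.0)]
)

intercept, weights = fit_nonneg_regression(members, truth)
prediction = intercept + members @ weights

# The two skill scores named in the abstract.
mae = np.mean(np.abs(prediction - truth))
rmse = np.sqrt(np.mean((prediction - truth) ** 2))
print("weights:", weights, "MAE:", mae, "RMSE:", rmse)
```

Because every weight is at least zero, no single simulation can enter the combination with a negative sign, which is one plausible reason the constrained variants are less sensitive to outliers.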

Prediction of Pitting Corrosion Characteristics of AL-6XN Steel with Sensitization and Environmental Variables Using Multiple Linear Regression Method (다중선형회귀법을 활용한 예민화와 환경변수에 따른 AL-6XN강의 공식특성 예측)

  • Jung, Kwang-Hu;Kim, Seong-Jong
    • Corrosion Science and Technology
    • /
    • v.19 no.6
    • /
    • pp.302-309
    • /
    • 2020
  • This study aimed to predict the pitting corrosion characteristics of AL-6XN super-austenitic steel using multiple linear regression. The variables used in the model are degree of sensitization, temperature, and pH. Experiments were designed and cyclic polarization curve tests were conducted accordingly. The data obtained from the cyclic polarization curve tests were used as training data for the multiple linear regression model. The significance of each factor for the responses (critical pitting potential and repassivation potential) was analyzed. The multiple linear regression model was validated using experimental conditions that were not included in the training data. As a result, the degree of sensitization showed a greater effect than the other variables. Multiple linear regression showed poor performance for prediction of repassivation potential. On the other hand, the model showed a considerable degree of predictive performance for critical pitting potential. The coefficient of determination (R²) was 0.7745. The feasibility of predicting pitting potential using multiple linear regression was confirmed.
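
The workflow in this abstract (fit on designed experiments, validate on conditions left out of training, report R²) can be sketched as follows. All numbers are synthetic placeholders, and the helper names `fit_mlr`, `predict_mlr`, and `r_squared` are hypothetical; nothing here reproduces the paper's data or coefficients.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic predictors: degree of sensitization, temperature, pH (made-up ranges).
rng = np.random.default_rng(1)
X = np.column_stack([
    rng.uniform(0.0, 30.0, 40),
    rng.uniform(20.0, 80.0, 40),
    rng.uniform(2.0, 10.0, 40),
])
true_coef = np.array([500.0, -8.0, -3.0, 20.0])  # assumed, for illustration only
y = true_coef[0] + X @ true_coef[1:] + rng.normal(0.0, 5.0, 40)

# Hold out the last 10 conditions for validation, as the abstract describes.
coef = fit_mlr(X[:30], y[:30])
r2_validation = r_squared(y[30:], predict_mlr(coef, X[30:]))
print("coefficients:", coef, "validation R^2:", r2_validation)
```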

Quantitative Analysis by Diffuse Reflectance Infrared Fourier Transform and Linear Stepwise Multiple Regression Analysis I -Simultaneous quantitation of ethenzamide, isopropylantipyrine, caffeine, and allylisopropylacetylurea in tablet by DRIFT and linear stepwise multiple regression analysis-

  • Park, Man-Ki;Yoon, Hye-Ran;Kim, Kyoung-Ho;Cho, Jung-Hwan
    • Archives of Pharmacal Research
    • /
    • v.11 no.2
    • /
    • pp.99-113
    • /
    • 1988
  • Quantitation of ethenzamide, isopropylantipyrine, and caffeine takes about 41 hrs by the conventional GC method, and quantitation of allylisopropylacetylurea takes about 40 hrs by the conventional UV method. By contrast, quantitation of all four takes about 6 hrs by the DRIFT method developed here. Each standard and sample was sieved and powdered, and its DRIFT spectrum was acquired. From each spectrum, a peak was selected for each component and the ratio of that peak to the standard peak was calculated; linear stepwise multiple regression was then performed on these ratios and the concentrations. We modified the reflectance value, the Kubelka-Munk equation, and the inverse Kubelka-Munk equation. The inverse Kubelka-Munk equation compensated for the deficiency of the Kubelka-Munk equation. Correlation coefficients between the conventional GC and UV results and the DRIFT results were greater than 0.95.

Multiple linear regression and fuzzy linear regression based assessment of postseismic structural damage indices

  • Fani I. Gkountakou;Anaxagoras Elenas;Basil K. Papadopoulos
    • Earthquakes and Structures
    • /
    • v.24 no.6
    • /
    • pp.429-437
    • /
    • 2023
  • This paper studied the prediction of structural damage indices of buildings after an earthquake using Multiple Linear Regression (MLR) and Fuzzy Linear Regression (FLR) methods. In particular, the degree of structural damage, represented by the Maximum Inter Story Drift Ratio (MISDR), is an essential factor in ensuring the safety of a building. Thus, the seismic response of a steel building was evaluated using 65 seismic accelerograms as input signals. Among the several response quantities, the focus is on the MISDR, which expresses the postseismic damage status. Comparing the outputs of the MLR and FLR methods with the corresponding values obtained by nonlinear dynamic analyses, it was concluded that the FLR method gave more accurate predictions than the MLR method. A blind prediction using a set of 10 additional artificial accelerograms was also carried out to examine the model's effectiveness. The results revealed that the FLR method had the smallest average percentage error for every set of applied accelerograms, and thus it is a suitable modeling tool in earthquake engineering.

Robust Estimation and Outlier Detection

  • Myung Geun Kim
    • Communications for Statistical Applications and Methods
    • /
    • v.1 no.1
    • /
    • pp.33-40
    • /
    • 1994
  • The conditional expectation of a random variable in a multivariate normal random vector is a multiple linear regression on its predecessors. Using this fact, the least median of squares estimation method developed for multiple linear regression is adapted to multivariate data to identify influential observations. The resulting method clearly detects outliers and avoids the masking effect.

Pre-processing and Bias Correction for AMSU-A Radiance Data Based on Statistical Methods (통계적 방법에 근거한 AMSU-A 복사자료의 전처리 및 편향보정)

  • Lee, Sihye;Kim, Sangil;Chun, Hyoung-Wook;Kim, Ju-Hye;Kang, Jeon-Ho
    • Atmosphere
    • /
    • v.24 no.4
    • /
    • pp.491-502
    • /
    • 2014
  • As a part of the KIAPS (Korea Institute of Atmospheric Prediction Systems) Package for Observation Processing (KPOP), we have developed modules for Advanced Microwave Sounding Unit-A (AMSU-A) pre-processing and bias correction. The KPOP system calculates the airmass bias correction coefficients by multiple linear regression, in which the scan-corrected innovation is the dependent variable and the thicknesses of the 850~300, 200~50, 50~5, and 10~1 hPa layers are the independent variables. Multicollinearity among the four airmass predictors was identified using the Variance Inflation Factor (VIF), which quantifies the severity of multicollinearity in a least squares regression. To resolve the multicollinearity, we adopted simple linear regression and Principal Component Regression (PCR) to calculate the airmass bias correction coefficients and compared the results with those from the multiple linear regression. The analysis shows that, in order of performance, the methods rank as multiple linear, principal component, and simple linear regression. For bias correction of AMSU-A channel 4, which is the most sensitive to the lower troposphere, the multiple linear regression with all four airmass predictors is superior to the simple linear regression with the single 850~300 hPa airmass predictor. PCR retaining the components that account for 95% of the accumulated variance gave results similar to those of the multiple linear regression.
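
As a rough illustration of the VIF diagnostic mentioned in this abstract: for predictor j, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors. The sketch below uses synthetic predictors rather than layer thicknesses, and the function name `variance_inflation_factors` is hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF of each column of X, computed by regressing it on the other columns."""
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Synthetic predictors: the first two are nearly collinear, the third is independent.
rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
predictors = np.column_stack([x1, x2, x3])

vifs = variance_inflation_factors(predictors)
print("VIF per predictor:", vifs)
```

A common rule of thumb treats VIF values above about 10 as a sign of severe multicollinearity; the abstract does not state which threshold the authors applied.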

MOISTURE CONTENT MEASUREMENT OF POWDERED FOOD USING RF IMPEDANCE SPECTROSCOPIC METHOD

  • Kim, K. B.;Lee, J. W.;S. H. Noh;Lee, S. S.
    • Proceedings of the Korean Society for Agricultural Machinery Conference
    • /
    • 2000.11b
    • /
    • pp.188-195
    • /
    • 2000
  • This study was conducted to measure the moisture content of powdered food using an RF impedance spectroscopic method. In the frequency range of 1.0 to 30 MHz, the impedance (reactance and resistance) of a parallel-plate sample holder filled with wheat flour or red-pepper powder, whose moisture contents ranged from 5.93 to 17.07% w.b. and from 10.87 to 27.36% w.b., respectively, was characterized using a Q-meter (HP4342). The reactance was a better parameter than the resistance for estimating the moisture density, defined as the product of moisture content and bulk density, which was used in this study to eliminate the effect of bulk density on the RF spectral data. Multivariate data analyses such as principal component regression, partial least squares regression, and multiple linear regression were performed to develop a single calibration model, with moisture density and reactance spectral data as parameters, for determining the moisture content of both wheat flour and red-pepper powder. The best model was the one obtained by multiple linear regression. For unknown samples of powdered food, its bias, standard error of prediction, and determination coefficient were 0.179% moisture content, 1.679% moisture content, and 0.8849, respectively.
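
The three validation statistics reported here can be computed as in the sketch below. The definitions are the usual calibration conventions and are assumptions, since the abstract does not spell them out: bias is the mean prediction error, the standard error of prediction (SEP) is the bias-corrected standard deviation of the errors, and the determination coefficient is taken as 1 - SS_res / SS_tot (some reports use the squared correlation instead). The sample values are made up.

```python
import math

def validation_statistics(reference, predicted):
    """Return (bias, SEP, R^2) for paired reference and predicted values."""
    n = len(reference)
    errors = [p - r for r, p in zip(reference, predicted)]
    bias = sum(errors) / n
    # SEP: standard deviation of the errors after removing the bias.
    sep = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    mean_ref = sum(reference) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return bias, sep, 1.0 - ss_res / ss_tot

# Made-up moisture contents (% w.b.): every prediction is 0.5 too high.
reference = [10.0, 12.0, 14.0, 16.0]
predicted = [10.5, 12.5, 14.5, 16.5]
bias, sep, r2 = validation_statistics(reference, predicted)
print(bias, sep, r2)
```

In this toy case the constant offset shows up entirely in the bias, so the SEP is zero even though R² is below 1.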

Study on the Critical Storm Duration Decision of the Rivers Basin (중소하천유역의 임계지속시간 결정에 관한 연구)

  • Ahn, Seung-Seop;Lee, Hyeo-Jung;Jung, Do-June
    • Journal of Environmental Science International
    • /
    • v.16 no.11
    • /
    • pp.1301-1312
    • /
    • 2007
  • The objective of this study is to propose a model for forecasting the critical storm duration of storm runoff in small river basins. Critical storm duration data for 582 sub-basins, reported in disaster impact assessment reports of the National Emergency Management Agency from 2004 to 2007, were collected and analyzed. The stepwise multiple regression method was used to establish critical storm duration forecasting models of linear and exponential type. The multiple regression analysis favored the linear type over the exponential type. Multiple linear regression between the critical storm duration and five basin characteristic parameters (basin area, main stream length, average slope of the main stream, shape factor, and CN) gave a multiple correlation coefficient of more than 0.75.

Optimized Neural Network Weights and Biases Using Particle Swarm Optimization Algorithm for Prediction Applications

  • Ahmadzadeh, Ezat;Lee, Jieun;Moon, Inkyu
    • Journal of Korea Multimedia Society
    • /
    • v.20 no.8
    • /
    • pp.1406-1420
    • /
    • 2017
  • Artificial neural networks (ANNs) play an important role in the fields of function approximation, prediction, and classification. ANN performance is critically dependent on the input parameters, including the number of neurons in each layer and the optimal values of the weights and biases assigned to each neuron. In this study, we apply the particle swarm optimization method, a popular optimization algorithm, to determine the optimal values of the weights and biases for every neuron in the different layers of the ANN. Several regression models, including general linear regression, Fourier regression, smoothing spline, and polynomial regression, are fitted to evaluate the proposed method's prediction power compared to multiple linear regression (MLR) methods. In addition, residual analysis is conducted to evaluate the accuracy of the optimized ANN on both training and test datasets. The experimental results demonstrate that the proposed method can effectively determine optimal values for neuron weights and biases, and that high accuracy is obtained in prediction applications. Evaluations of the proposed method reveal that it can be used for prediction and estimation purposes with high accuracy, and the designed model provides a reliable technique for optimization. The simulation results show that the optimized ANN exhibits superior performance to MLR for prediction purposes.
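
The idea of searching for weights and biases with particle swarm optimization can be sketched on the smallest possible case, a single linear neuron. This is a generic textbook PSO, not the authors' configuration: the swarm size, iteration count, inertia and acceleration constants, the toy data, and the names `pso_minimize` and `linear_neuron_loss` are all assumptions.

```python
import random

def pso_minimize(loss, dim, n_particles=20, n_iters=300, seed=0,
                 inertia=0.729, c_personal=1.49445, c_global=1.49445):
    """Minimize loss(params) over a dim-dimensional vector with a basic PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # best position seen by each particle
    best_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # best position seen by the swarm
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = loss(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val

# Toy data generated by y = 2x + 1; params = [weight, bias] of one linear neuron.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

def linear_neuron_loss(params):
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

best_params, best_loss = pso_minimize(linear_neuron_loss, dim=2)
print("weight, bias:", best_params, "loss:", best_loss)
```

For a multi-layer network, the same routine would apply with `params` holding all weights and biases flattened into one vector and `loss` returning the network's training error.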