• Title/Summary/Keyword: robust regression


A Comparative Study of Estimation by Analogy using Data Mining Techniques

  • Nagpal, Geeta;Uddin, Moin;Kaur, Arvinder
    • Journal of Information Processing Systems / v.8 no.4 / pp.621-652 / 2012
  • Software estimation provides an inclusive set of directives for software project developers, project managers, and management, helping them produce more realistic estimates from deficient, uncertain, and noisy data. A range of estimation models is being explored in industry and in academia, but choosing the best model is difficult. Estimation by Analogy (EbA) is a form of case-based reasoning that uses fuzzy logic, grey system theory, or machine-learning techniques for optimization. This research compares the estimation accuracy of several conventional data mining models with that of a hybrid model. The data mining models considered include linear regression models, such as ordinary least squares and ridge regression, and nonlinear models, such as neural networks, support vector machines, and multivariate adaptive regression splines. A precise and comprehensible predictive model based on the integration of grey relational analysis (GRA) and regression is introduced and compared. Empirical results show that regression combined with GRA gives outstanding results, indicating that the methodology has great potential as a candidate approach for software effort estimation.

Robust Outlier-Object Detection in Image Pairs Based on Variable Threshold Using Empirical Correction Constant

  • Kim, Dong-Sik
    • Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.1 / pp.14-22 / 2009
  • By calculating the differences between two images of the same scene captured at different times, we can detect a set of outliers, such as occluding objects due to moving vehicles. To reduce the influence of the differing intensity properties of the images, a simple technique that reruns the regression, based on a polynomial regression model, is employed. For robust detection of outliers, the image difference is normalized by the noise variance, so an accurate estimate of the noise variance is very important. This paper proposes the use of an empirically obtained correction constant for that estimate. Numerical analyses using both synthetic and real images are presented to demonstrate the robust performance of the detection algorithm.
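The pipeline this abstract describes (regress one image's intensities on the other's, normalize the difference by a noise estimate, threshold, then rerun the regression without the flagged pixels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: it uses a degree-1 polynomial, normalizes by an estimated noise standard deviation rather than the variance, and replaces the paper's empirical correction constant with the standard scaled median absolute deviation. The names `fit_line`, `detect_outliers`, `k`, and `reruns` are hypothetical.

```python
import statistics

def fit_line(x, y):
    """Least-squares fit y ~ intercept + slope * x (a degree-1 polynomial)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def detect_outliers(ref, cur, k=3.0, reruns=2):
    """Return indices of pixels in `cur` that disagree with `ref`.

    `ref` and `cur` are flat lists of pixel intensities from the two images.
    """
    inliers = list(range(len(ref)))
    flagged = []
    for _ in range(reruns + 1):
        # Intensity correction: regress current image on reference, inliers only.
        a, b = fit_line([ref[i] for i in inliers], [cur[i] for i in inliers])
        resid = [c - (a + b * r) for r, c in zip(ref, cur)]
        # Assumed noise estimate: 1.4826 * MAD of the inlier residuals
        # (a stand-in for the paper's empirically corrected estimate).
        inlier_resid = [resid[i] for i in inliers]
        med = statistics.median(inlier_resid)
        mad = statistics.median([abs(e - med) for e in inlier_resid])
        sigma = max(1.4826 * mad, 1e-12)
        # Threshold the normalized difference, then rerun without flagged pixels.
        flagged = [i for i, e in enumerate(resid) if abs(e) / sigma > k]
        inliers = [i for i in range(len(ref)) if i not in flagged]
        if len(inliers) < 2:
            break
    return flagged
```

Rerunning the fit matters because large outliers bias the first regression and inflate the first noise estimate; once they are excluded, the threshold tightens around the true noise level.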

Development of robust Calibration for Determination Apple Sweetness using Near Infrared Spectroscopy

  • Sohn, Mi-Ryeong;Kwon, Young-Kil;Cho, Rae-Kwang
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference / 2001.06a / pp.1614-1614 / 2001
  • The sweetness (°Brix) of fruit is the main quality factor contributing to its taste. The Brix of apple fruit can be measured non-destructively by near infrared (NIR) spectroscopy, allowing individual apples to be graded by sweetness. However, fruit quality is influenced by factors such as growing location, production year, variety, and harvest time, so a robust NIR calibration is required. This study examines the influence of two sources of variation, growing location and production year, on calibrations for sweetness, and develops a stable and highly accurate calibration. Apple fruit (Fuji) was collected every year from 1995 to 1997 in three growing locations (Andong, Youngchun, and Chungsong) in Kyungpook, Korea. NIR reflectance spectra were scanned over the 1100-2500 nm wavelength range using an InfraAlyzer 500C (Bran+Luebbe) with a halogen lamp and PbS detector. Multiple linear regression and stepwise regression were carried out between the raw NIR spectra and the Brix measured by refractometer to select the best regression equations. The calibration models for each growing district predicted the dependent sample set well but the independent sample set poorly. A combined calibration model using data from all three growing districts predicted reasonably well for a population set drawn from all growing districts (SEP = 0.69%, Bias = -0.075). The calibration models for each harvest year were not transferable across harvest years; however, a combined calibration model using data from all three harvest years was sufficiently robust to predict each sample set (SEP = 0.53%, Bias = 0.004).
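The abstract reports SEP and bias without defining them. A minimal sketch of how these two figures are commonly computed in NIR calibration work is given below, assuming the usual bias-corrected definition of SEP and the sign convention "predicted minus reference"; the abstract confirms neither, and the function name `bias_and_sep` is hypothetical.

```python
import math

def bias_and_sep(predicted, reference):
    """Bias and standard error of prediction (SEP) for a validation set.

    Assumed definitions (not stated in the abstract):
      bias = mean(predicted - reference)
      SEP  = sqrt(sum((error - bias)^2) / (n - 1))
    """
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)
    bias = sum(errors) / n
    sep = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    return bias, sep
```

Under this definition a calibration can have a small SEP and still carry a large bias, which is why the two are reported together when a model is transferred across locations or years.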


The Regional Homogeneity in the Presence of Heteroskedasticity

  • Chung, Kyoun-Sup;Lee, Sang-Yup
    • Korean System Dynamics Review / v.8 no.2 / pp.25-49 / 2007
  • An important assumption of the classical linear regression model is that the disturbances appearing in the population regression function are homoskedastic; that is, they all have the same variance. If we persist in using the usual testing procedures despite heteroskedasticity, whatever conclusions we draw or inferences we make may be very misleading. The contribution of this paper is a concrete procedure for proper estimation when heteroskedasticity exists in the data, because the quality of dependent variable predictions, i.e., the estimated variance of the dependent variable, can be improved by considering regional homogeneity and/or heteroskedasticity across the research area. With respect to estimation, specific attention should be paid to selecting an appropriate strategy for the auxiliary regression model. The paper shows that testing for heteroskedasticity, and using robust methods both with and without heteroskedasticity, yields more efficient statistical inferences.
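To illustrate why the usual procedures mislead under heteroskedasticity, the sketch below computes the slope of a simple regression together with two standard errors: the classical one, which assumes a common error variance, and White's heteroskedasticity-consistent (HC0) one. The abstract does not say which robust method the paper uses, so HC0 is an assumption chosen for illustration, and `slope_standard_errors` is a hypothetical name.

```python
import math

def slope_standard_errors(x, y):
    """Return (slope, classical_se, hc0_se) for the regression y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [xi - mx for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - my) for d, yi in zip(dx, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical: one pooled error variance s^2 = SSR / (n - 2).
    classical_se = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # HC0: each observation contributes its own squared residual.
    hc0_se = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return slope, classical_se, hc0_se
```

When the error variance is roughly constant the two standard errors are close; a large gap between them is an informal sign that the homoskedasticity assumption is doing real work in the classical inference.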


A Psychophysical Approach to the Evaluation of Perceived Focusing Quality of CRT Displays

  • Yoon, Kwang-Ho;Kim, Sang-Ho;Chang, Sung-Ho
    • Journal of Information Display / v.5 no.3 / pp.35-40 / 2004
  • In this study, we collected data to formulate the relationship between quantitative metrological parameters of CRT displays and perceived focus quality. Human perception of focusing quality was evaluated in terms of character-legibility scores given by four highly trained inspectors. Thirteen CRT monitors from five different manufacturers were compared relative to the norm monitor. The electron beam profile (spot size and the shape of the distribution made by the beam), contrast, convergence of the RGB beams, and luminance characteristics were measured using a precision measurement system. Linear regression analysis and artificial neural network models were used to formulate the relationship between human perception and the quantitative measurements. The accuracy of the linear regression model ($R^2$=0.515) was not satisfactory, but the nonlinear neural network model ($R^2$=0.716) was fairly convincing and robust even though the data included subjective differences.

Model-based inverse regression for mixture data

  • Choi, Changhwan;Park, Chongsun
    • Communications for Statistical Applications and Methods / v.24 no.1 / pp.97-113 / 2017
  • This paper proposes a method for sufficient dimension reduction (SDR) of mixture data. We consider mixture data containing more than one component, where the components have distinct central subspaces. We apply model-based sliced inverse regression (MSIR) to the mixture data in a simple and intuitive manner. We employ mixture probabilistic principal component analysis (MPPCA) to estimate each central subspace and cluster the data points. Results from simulation studies and a real data set show that our method captures the appropriate central subspaces satisfactorily and is robust to the number of slices chosen. Discussions of root selection, estimation accuracy, and classification with the initial value issues of MPPCA, along with related simulation results, are also provided.

Mechanical Parameter Identification of Servo Systems using Robust Support Vector Regression

  • Cho, Kyung-Rae;Seok, Jul-Ki;Lee, Dong-Choon
    • Proceedings of the KIEE Conference / 2004.04a / pp.106-108 / 2004
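  • The overall control performance of a servo system is strongly affected by variations in its mechanical parameters and by the load torque. To improve servo system performance, the mechanical parameters and load torque therefore need to be known accurately. This paper proposes an algorithm that estimates the mechanical parameters and load torque using Support Vector Regression (SVR). SVR is a new estimation algorithm based on statistical learning theory that overcomes the problems of small sample sizes, nonlinearity, and local minima, and delivers strong performance. Experimental results show that the proposed SVR algorithm estimates the mechanical parameters and load torque with reasonable accuracy.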


On the Robustness of $L_1$-estimator in Linear Regression Models

  • Bu-Yong Kim
    • Communications for Statistical Applications and Methods / v.2 no.2 / pp.277-287 / 1995
  • It is well known that the $L_1$-estimator is robust with respect to vertical outliers in regression data, even though it is susceptible to bad leverage points. This article is concerned with the robustness of the $L_1$-estimator. To investigate its robustness against vertical outliers, we may find intervals for the value of the response variable within which the $L_1$-estimates do not change. A procedure for constructing those intervals in multiple linear regression is illustrated in the context of sensitivity analysis. The vertical breakdown point of the $L_1$-estimator is then defined on the basis of properties related to those intervals.
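The claim that the $L_1$-estimates stay fixed while a response value moves within some interval can be seen in a small sketch. The code below is an illustration, not the article's interval-construction procedure: it finds the exact $L_1$ (least absolute deviations) line for simple regression by enumerating lines through pairs of data points, relying on the fact that some optimal $L_1$ fit passes through at least two observations. The names `lad_fit` and `ols_fit` are hypothetical, and the brute-force search is only practical for small samples.

```python
from itertools import combinations

def ols_fit(x, y):
    """Least-squares (intercept, slope), for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def lad_fit(x, y):
    """Exact L1 (intercept, slope) for simple regression.

    Enumerates every line through two data points and keeps the one
    with the smallest sum of absolute residuals.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a valid regression fit
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(abs(yk - intercept - slope * xk) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

# Points on y = 1 + 2x, with one vertical outlier at x = 5.
x = [0, 1, 2, 3, 4, 5, 6]
y_mild = [1, 3, 5, 7, 9, 50, 13]
y_wild = [1, 3, 5, 7, 9, 500, 13]
print("L1 :", lad_fit(x, y_mild), lad_fit(x, y_wild))
print("OLS:", ols_fit(x, y_mild), ols_fit(x, y_wild))
```

In this example the $L_1$ line is the same for both outlier values, while the least-squares line follows the outlier. The example does not cover bad leverage points, where the $L_1$-estimator offers no such protection.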


Taxi-demand forecasting using dynamic spatiotemporal analysis

  • Gangrade, Akshata;Pratyush, Pawel;Hajela, Gaurav
    • ETRI Journal / v.44 no.4 / pp.624-640 / 2022
  • Taxi-demand forecasting and hotspot prediction can be critical in reducing response times and designing a cost-effective online taxi-booking model. Taxi demand in a region can be predicted by considering the past demand accumulated in that region over a span of time. However, other covariates (such as neighborhood influence, sociodemographic parameters, and point-of-interest data) may also influence the spatiotemporal variation of demand. To study the effects of these covariates, we propose three models that consider different covariates in order to select a set of independent variables. These models predict taxi demand in spatial units for a given temporal resolution using linear and ensemble regression. We then combine the characteristics (covariates) of each of these models to propose a robust forecasting framework, which we call the combined covariates model (CCM). Experimental results show that the CCM performs better than the other models proposed in this paper.

Identifying the Optimal Machine Learning Algorithm for Breast Cancer Prediction

  • ByungJoo Kim
    • International journal of advanced smart convergence / v.13 no.3 / pp.80-88 / 2024
  • Breast cancer remains a significant global health burden, necessitating accurate and timely detection for improved patient outcomes. Machine learning techniques have demonstrated remarkable potential in assisting breast cancer diagnosis by learning complex patterns from multi-modal patient data. This study comprehensively evaluates several popular machine learning models, including logistic regression, decision trees, random forests, support vector machines (SVMs), naive Bayes, k-nearest neighbors (KNN), XGBoost, and ensemble methods for breast cancer prediction using the Wisconsin Breast Cancer Dataset (WBCD). Through rigorous benchmarking across metrics like accuracy, precision, recall, F1-score, and area under the ROC curve (AUC), we identify the naive Bayes classifier as the top-performing model, achieving an accuracy of 0.974, F1-score of 0.979, and highest AUC of 0.988. Other strong performers include logistic regression, random forests, and XGBoost, with AUC values exceeding 0.95. Our findings showcase the significant potential of machine learning, particularly the robust naive Bayes algorithm, to provide highly accurate and reliable breast cancer screening from fine needle aspirate (FNA) samples, ultimately enabling earlier intervention and optimized treatment strategies.