Analyzing Influence of Outlier Elimination on Accuracy of Software Effort Estimation  

Seo, Yeong-Seok (Department of Computer Science, KAIST)
Yoon, Kyung-A (Department of Computer Science, KAIST)
Bae, Doo-Hwan (Department of Computer Science, KAIST)
Abstract
Accurate software effort estimation has always been a challenge for both the software industry and the academic software engineering community. Many studies have focused on improving the accuracy of effort estimation methods. Although data quality is one of the important factors in accurate effort estimation, most prior work has not considered it. In this paper, we investigate the influence of outlier elimination on the accuracy of software effort estimation through empirical studies that combine two outlier elimination methods (least trimmed squares regression and K-means clustering) with three effort estimation methods (least squares regression, neural network, and Bayesian network). The empirical studies are performed on two industry data sets (the ISBSG Release 9 data set and the Bank data set, which consists of project data collected from a bank in Korea), each evaluated with and without outlier elimination.
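The combination described above, outlier elimination followed by effort estimation, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single predictor (project size), uses a simplified trimming loop in the spirit of least trimmed squares rather than a full LTS algorithm (which would normally use many random starting subsets), and measures accuracy with the mean magnitude of relative error (MMRE), which is an assumed metric here. All function names and the toy data are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def trimmed_subset(xs, ys, keep=0.75, iters=20):
    """Simplified LTS-style trimming (an assumption, not full LTS).

    Repeatedly refits on the `keep` fraction of points with the
    smallest squared residuals and returns the indices that remain.
    """
    n = len(xs)
    h = max(2, int(keep * n))
    idx = list(range(n))  # start from all points
    for _ in range(iters):
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        ranked = sorted(range(n), key=lambda i: (ys[i] - (a + b * xs[i])) ** 2)
        new_idx = sorted(ranked[:h])
        if new_idx == idx:
            break
        idx = new_idx
    return idx


def mmre(actual, predicted):
    """Mean magnitude of relative error (assumed accuracy metric)."""
    return mean(abs(a - p) / a for a, p in zip(actual, predicted))


# Hypothetical training data: effort = 5 * size + 100, with one outlier.
sizes = [100, 200, 300, 400, 500, 600, 700, 800]
efforts = [600, 1100, 1600, 9000, 2600, 3100, 3600, 4100]

# Hypothetical hold-out projects that follow the same clean relation.
test_sizes = [250, 450, 650]
test_efforts = [5 * s + 100 for s in test_sizes]

# Estimation without outlier elimination.
a0, b0 = fit_line(sizes, efforts)
mmre_without = mmre(test_efforts, [a0 + b0 * s for s in test_sizes])

# Estimation after outlier elimination.
kept = trimmed_subset(sizes, efforts)
a1, b1 = fit_line([sizes[i] for i in kept], [efforts[i] for i in kept])
mmre_with = mmre(test_efforts, [a1 + b1 * s for s in test_sizes])

print("MMRE without elimination:", mmre_without)
print("MMRE with elimination:", mmre_with)
```

On this toy data the trimming step discards the single inflated project, so the refitted line tracks the clean relation more closely. Real data sets such as those named in the abstract would need several predictors and a more careful evaluation design.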
Keywords
outlier elimination; effort estimation; data quality