http://dx.doi.org/10.3745/JIPS.2012.8.4.621

A Comparative Study of Estimation by Analogy using Data Mining Techniques  

Nagpal, Geeta (Dept. of Computer Science and Engineering, National Institute of Technology)
Uddin, Moin (Delhi Technological University)
Kaur, Arvinder (University School of IT, Guru Gobind Singh Indraprastha University)
Publication Information
Journal of Information Processing Systems / v.8, no.4, 2012, pp. 621-652
Abstract
Software estimation gives software developers, project managers, and management a set of guidelines for producing more realistic estimates from incomplete, uncertain, and noisy data. A range of estimation models is being explored in both industry and academia, but choosing the best model is difficult. Estimation by Analogy (EbA) is a form of case-based reasoning that can be optimized with fuzzy logic, grey system theory, or machine-learning techniques. This research compares the estimation accuracy of several conventional data mining models with that of a hybrid model. The data mining models considered include linear regression models, such as ordinary least squares and ridge regression, and nonlinear models, such as neural networks, support vector machines, and multivariate adaptive regression splines. A precise and comprehensible predictive model that integrates Grey Relational Analysis (GRA) with regression is introduced and compared against these models. Empirical results show that regression combined with GRA gives outstanding results, indicating that the methodology has great potential and is a candidate approach for software effort estimation.
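As a rough illustration of how Grey Relational Analysis (GRA) can drive analogy selection, the sketch below ranks past projects by their grey relational grade against a new project and averages the effort of the closest ones. It is not the authors' model: the abstract pairs GRA with regression, whereas this sketch substitutes a plain mean over the nearest analogies. The function names, the feature layout (size, team size), the distinguishing coefficient `zeta = 0.5`, and the sample data are all assumptions made for illustration.

```python
def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def grey_relational_grades(target, candidates, zeta=0.5):
    """Grey relational grade of each candidate with respect to the target.

    Coefficient per feature: (d_min + zeta * d_max) / (d + zeta * d_max),
    where d is the absolute difference to the target on that feature.
    The grade is the mean coefficient over all features.
    """
    deltas = [[abs(t - c) for t, c in zip(target, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:  # every candidate is identical to the target
        return [1.0] * len(candidates)
    return [
        sum((d_min + zeta * d_max) / (d + zeta * d_max) for d in row) / len(row)
        for row in deltas
    ]


def estimate_by_analogy(target, projects, efforts, k=2):
    """Return (estimated effort, indices of the k closest past projects).

    Assumption: the estimate is the plain mean effort of the k analogies;
    the paper's hybrid model uses regression instead.
    """
    normalized = min_max_normalize(projects + [target])
    grades = grey_relational_grades(normalized[-1], normalized[:-1])
    ranked = sorted(range(len(projects)), key=lambda i: grades[i], reverse=True)
    nearest = ranked[:k]
    return sum(efforts[i] for i in nearest) / len(nearest), nearest


# Hypothetical data: [size in KLOC, team size] and effort in person-months.
past_projects = [[10, 3], [50, 8], [12, 4], [100, 20]]
past_efforts = [20, 120, 30, 400]
estimate, analogies = estimate_by_analogy([11, 3], past_projects, past_efforts)
print(estimate, analogies)
```

Normalizing the new project together with the historical ones keeps all features on the same scale before the grades are computed. Replacing the final mean with a regression fitted on the selected analogies would bring the sketch closer to the hybrid approach the abstract describes.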
Keywords
Software Estimations; Estimation by Analogy; Grey Relational Analysis; Robust Regression; Data Mining Techniques