Tuning the Architecture of Neural Networks for Multi-Class Classification

  • Received : 2012.10.09
  • Accepted : 2012.11.09
  • Published : 2013.03.31

Abstract

The purpose of this study is to demonstrate the validity of tuning the architecture of neural network models for multi-class classification. Such a model is typically constructed as a series of neural network models for binary classification. Before building a neural network model, we must set the values of parameters such as the number of hidden nodes and the weight decay parameter, and this step deserves special attention because model performance can differ considerably depending on those values. For better performance, it is therefore necessary to tune the parameters every time a neural network model is built. Nonetheless, previous studies have neither noted the necessity of this tuning process nor demonstrated its validity. In this study, we argue that the parameters should be tuned every time a neural network model for multi-class classification is built. Through an empirical analysis of wine data, we show that the model with tuned parameters outperforms untuned models.
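The tuning process described above can be sketched in code. This is a minimal illustration, not the authors' procedure: it uses scikit-learn's bundled wine dataset as a stand-in for the paper's wine data, wraps a multilayer perceptron in a one-vs-rest scheme to mirror the "series of binary classifiers" construction, and grid-searches the two parameters the abstract names, with `MLPClassifier`'s `alpha` standing in for the weight decay parameter. The specific candidate values are arbitrary choices for the example.

```python
# Sketch: tune number of hidden nodes and weight decay via cross-validated
# grid search, rebuilding the model for each candidate setting.
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)  # 3 wine classes, 13 features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# One binary MLP per class (one-vs-rest), with inputs standardized.
pipe = make_pipeline(
    StandardScaler(),
    OneVsRestClassifier(MLPClassifier(max_iter=2000, random_state=0)),
)

# Candidate architectures: hidden-layer size and weight decay (alpha).
grid = {
    "onevsrestclassifier__estimator__hidden_layer_sizes": [(3,), (10,)],
    "onevsrestclassifier__estimator__alpha": [1e-3, 1e-1, 1.0],
}

search = GridSearchCV(pipe, grid, cv=5).fit(X_tr, y_tr)
print("best parameters:", search.best_params_)
print("held-out accuracy:", round(search.best_estimator_.score(X_te, y_te), 3))
```

The point of the sketch is the workflow, not the numbers: an untuned model fixes one `(hidden nodes, alpha)` pair in advance, whereas the grid search re-fits the model for every candidate pair and selects by cross-validated accuracy before evaluating on held-out data.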
