http://dx.doi.org/10.5392/IJoC.2014.10.1.018

Effect of Nonlinear Transformations on Entropy of Hidden Nodes  

Oh, Sang-Hoon (Department of Information Communication Engineering Mokwon University)
Abstract
Hidden nodes play a key role in the information processing of feed-forward neural networks, in which inputs are processed through a series of weighted sums and nonlinear activation functions. To understand the role of hidden nodes, we must analyze the effect of the nonlinear activation functions on the weighted sums arriving at the hidden nodes. In this paper, we examine this effect from the viewpoint of information theory. Under the assumption that the nonlinear activation function can be approximated piecewise linearly, we prove that the entropy of the weighted sums to hidden nodes decreases after passing through the piecewise-linear functions. We therefore argue that the nonlinear activation function reduces the uncertainty among hidden nodes. Furthermore, the more saturated the hidden nodes are, the more their entropy decreases. Based on this result, we can say that, after successful training of feed-forward neural networks, hidden nodes tend to lie not in the linear regions but in the saturated regions of the activation function, with the accompanying reduction in uncertainty.
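The abstract's claim can be illustrated with the change-of-variables identity for differential entropy: for an invertible differentiable map y = g(x), h(Y) = h(X) + E[log |g'(X)|]. For the logistic sigmoid, g'(x) = g(x)(1 - g(x)) ≤ 1/4, so the correction term is always negative, and it becomes more negative the deeper the inputs sit in the saturated regions. The sketch below (an illustration under the assumption of Gaussian weighted sums, not the paper's piecewise-linear derivation) estimates that correction term by Monte Carlo for a near-linear and a saturated operating regime.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def entropy_change(samples):
    """Monte Carlo estimate of E[log g'(X)] for the logistic sigmoid,
    i.e. the change h(Y) - h(X) under y = sigmoid(x)."""
    s = sigmoid(samples)
    return np.mean(np.log(s * (1.0 - s)))  # log g'(x) = log[s(1 - s)]

# Weighted sums to a hidden node, modeled here as zero-mean Gaussian.
x_linear = rng.normal(0.0, 0.5, 100_000)     # mostly in the near-linear region
x_saturated = rng.normal(0.0, 4.0, 100_000)  # mostly in the saturated regions

dh_linear = entropy_change(x_linear)
dh_saturated = entropy_change(x_saturated)

print(dh_linear, dh_saturated)
# Both estimates are negative, and the saturated case is more negative:
# saturation yields the larger entropy (uncertainty) reduction.
```

Since log g'(x) ≤ log(1/4) < 0 for every sample, the estimate is negative regardless of the input distribution; the Gaussian choice only controls how much of the mass falls in the saturated tails.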
Keywords
Entropy; Hidden Nodes; Nonlinear Activation Function; Feed-Forward Neural Networks
Citations & Related Records
Times Cited By KSCI: 1
1 K. Hornik, M. Stinchcombe, and H. White, "Multilayer Feedforward Networks are Universal Approximators," Neural Networks, vol. 2, 1989, pp. 359-366.
2 K. Hornik, "Approximation Capabilities of Multilayer Feedforward Networks," Neural Networks, vol. 4, 1991, pp. 251-257.
3 S. Suzuki, "Constructive Function Approximation by Three-Layer Artificial Neural Networks," Neural Networks, vol. 11, 1998, pp. 1049-1058.
4 Y. Liao, S. C. Fang, and H. L. W. Nuttle, "Relaxed Conditions for Radial-Basis Function Networks to be Universal Approximators," Neural Networks, vol. 16, 2003, pp. 1019-1028.
5 S. H. Oh and Y. Lee, "Effect of Nonlinear Transformations on Correlation Between Weighted Sums in Multilayer Perceptrons," IEEE Trans. Neural Networks, vol. 5, 1994, pp. 508-510.
6 J. V. Shah and C. S. Poon, "Linear Independence of Internal Representations in Multilayer Perceptrons," IEEE Trans. Neural Networks, vol. 10, 1999, pp. 10-18.
7 A. El-Jaroudi and J. Makhoul, "A New Error Criterion for Posterior Probability Estimation with Neural Nets," Proc. IJCNN'90, vol. 3, June 1990, pp. 185-192.
8 M. Bichsel and P. Seitz, "Minimum Class Entropy: A Maximum Information Approach to Layered Networks," Neural Networks, vol. 2, 1989, pp. 133-141.
9 S. Ridella, S. Rovetta, and R. Zunino, "Representation and Generalization Properties of Class-Entropy Networks," IEEE Trans. Neural Networks, vol. 10, 1999, pp. 31-47.
10 D. Erdogmus and J. C. Principe, "Entropy Minimization Algorithm for Multilayer Perceptrons," Proc. IJCNN'01, vol. 4, 2001, pp. 3003-3008.
11 D. Erdogmus and J. C. Principe, "Information Transfer Through Classifiers and Its Relation to Probability of Error," Proc. IJCNN'01, vol. 1, 2001, pp. 50-54.
12 K. E. Hild II, D. Erdogmus, K. Torkkola, and J. C. Principe, "Feature Extraction Using Information-Theoretic Learning," IEEE Trans. PAMI, vol. 28, no. 9, 2006, pp. 1385-1392.
13 R. Li, W. Liu, and J. C. Principe, "A Unifying Criterion for Instantaneous Blind Source Separation Based on Correntropy," Signal Processing, vol. 87, no. 8, 2007, pp. 1872-1881.
14 S. J. Lee, M. T. Jone, and H. L. Tsai, "Constructing Neural Networks for Multiclass-Discretization Based on Information Theory," IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 29, 1999, pp. 445-453.
15 R. Kamimura and S. Nakanishi, "Hidden Information Maximization for Feature Detection and Rule Discovery," Network: Computation in Neural Systems, vol. 6, 1995, pp. 577-602.
16 K. Torkkola, "Nonlinear Feature Transforms Using Maximum Mutual Information," Proc. IJCNN'01, vol. 4, 2001, pp. 2756-2761.
17 A. Papoulis, Probability, Random Variables, and Stochastic Processes, 2nd ed., New York: McGraw-Hill, 1984.
18 Y. Lee, S. H. Oh, and M. W. Kim, "An Analysis of Premature Saturation in Back-Propagation Learning," Neural Networks, vol. 6, 1993, pp. 719-728.
19 Y. Lee and S. H. Oh, "Input Noise Immunity of Multilayer Perceptrons," ETRI Journal, vol. 16, 1994, pp. 35-43.
20 T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, Inc., 1991.
21 S. H. Oh, "Decreasing of Correlations Among Hidden Neurons of Multilayer Perceptrons," Journal of the Korea Contents Association, vol. 3, no. 3, 2003, pp. 98-102.
22 S. Ekici, S. Yildirim, and M. Poyraz, "Energy and Entropy-Based Feature Extraction for Locating Fault on Transmission Lines by Using Neural Network and Wavelet Packet Decomposition," Expert Systems with Applications, vol. 34, 2008, pp. 2937-2944.
23 Y. Lee and H. K. Song, "Analysis on the Efficiency of Pattern Recognition Layers Using Information Measures," Proc. IJCNN'93 Nagoya, vol. 3, Oct. 1993, pp. 2129-2132.