http://dx.doi.org/10.5392/IJoC.2014.10.1.018

Effect of Nonlinear Transformations on Entropy of Hidden Nodes  

Oh, Sang-Hoon (Department of Information Communication Engineering Mokwon University)
Abstract
Hidden nodes play a key role in the information processing of feed-forward neural networks, in which inputs are processed through a series of weighted sums and nonlinear activation functions. To understand the role of hidden nodes, we must analyze the effect of the nonlinear activation functions on the weighted sums arriving at the hidden nodes. In this paper, we examine this effect from the viewpoint of information theory. Under the assumption that the nonlinear activation function can be approximated piecewise linearly, we prove that the entropy of the weighted sums to hidden nodes decreases after passing through the piecewise-linear functions. We therefore argue that the nonlinear activation function reduces the uncertainty among hidden nodes. Furthermore, the more saturated the hidden nodes are, the more their entropy decreases. Based on this result, we can say that, after successful training of feed-forward neural networks, hidden nodes tend to lie not in the linear regions but in the saturated regions of the activation function, with the accompanying reduction in uncertainty.
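The abstract's claim can be illustrated with the change-of-variables identity for differential entropy: for an invertible differentiable map y = g(x), h(Y) = h(X) + E[log |g'(X)|]. For the logistic sigmoid, g'(x) = g(x)(1 - g(x)) ≤ 1/4, so the correction term is always negative, and it becomes more negative the deeper the inputs sit in the saturated regions. The sketch below (an illustration under the assumption of Gaussian weighted sums, not the paper's piecewise-linear derivation) estimates that correction term by Monte Carlo for a near-linear and a saturated operating regime.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def entropy_change(samples):
    """Monte Carlo estimate of E[log g'(X)] for the logistic sigmoid,
    i.e. the change h(Y) - h(X) under y = sigmoid(x)."""
    s = sigmoid(samples)
    return np.mean(np.log(s * (1.0 - s)))  # log g'(x) = log[s(1 - s)]

# Weighted sums to a hidden node, modeled here as zero-mean Gaussian.
x_linear = rng.normal(0.0, 0.5, 100_000)     # mostly in the near-linear region
x_saturated = rng.normal(0.0, 4.0, 100_000)  # mostly in the saturated regions

dh_linear = entropy_change(x_linear)
dh_saturated = entropy_change(x_saturated)

print(dh_linear, dh_saturated)
# Both estimates are negative, and the saturated case is more negative:
# saturation yields the larger entropy (uncertainty) reduction.
```

Since log g'(x) ≤ log(1/4) < 0 for every sample, the estimate is negative regardless of the input distribution; the Gaussian choice only controls how much of the mass falls in the saturated tails.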
Keywords
Entropy; Hidden Nodes; Nonlinear Activation Function; Feed-Forward Neural Networks
Citations & Related Records
Times Cited By KSCI: 1
1 K. Hornik, M. Stinchcombe, and H. White, "Multilayer Feedforward Networks are Universal Approximators," Neural Networks, vol. 2, 1989, pp. 359-366.
2 K. Hornik, "Approximation Capabilities of Multilayer Feedforward Networks," Neural Networks, vol. 4, 1991, pp. 251-257.
3 S. Suzuki, "Constructive Function Approximation by Three-Layer Artificial Neural Networks," Neural Networks, vol. 11, 1998, pp. 1049-1058.
4 Y. Liao, S. C. Fang, and H. L. W. Nuttle, "Relaxed Conditions for Radial-Basis Function Networks to be Universal Approximators," Neural Networks, vol. 16, 2003, pp. 1019-1028.
5 S. H. Oh and Y. Lee, "Effect of Nonlinear Transformations on Correlation Between Weighted Sums in Multilayer Perceptrons," IEEE Trans. Neural Networks, vol. 5, 1994, pp. 508-510.
6 J. V. Shah and C. S. Poon, "Linear Independence of Internal Representations in Multilayer Perceptrons," IEEE Trans. Neural Networks, vol. 10, 1999, pp. 10-18.
7 A. El-Jaroudi and J. Makhoul, "A New Error Criterion for Posterior Probability Estimation with Neural Nets," Proc. IJCNN'90, vol. 3, June 1990, pp. 185-192.
8 M. Bichsel and P. Seitz, "Minimum Class Entropy: A Maximum Information Approach to Layered Networks," Neural Networks, vol. 2, 1989, pp. 133-141.
9 S. Ridella, S. Rovetta, and R. Zunino, "Representation and Generalization Properties of Class-Entropy Networks," IEEE Trans. Neural Networks, vol. 10, 1999, pp. 31-47.
10 D. Erdogmus and J. C. Principe, "Entropy Minimization Algorithm for Multilayer Perceptrons," Proc. IJCNN'01, vol. 4, 2001, pp. 3003-3008.
11 D. Erdogmus and J. C. Principe, "Information Transfer Through Classifiers and Its Relation to Probability of Error," Proc. IJCNN'01, vol. 1, 2001, pp. 50-54.
12 K. E. Hild II, D. Erdogmus, K. Torkkola, and J. C. Principe, "Feature Extraction Using Information-Theoretic Learning," IEEE Trans. PAMI, vol. 28, no. 9, 2006, pp. 1385-1392.
13 R. Li, W. Liu, and J. C. Principe, "A Unifying Criterion for Instantaneous Blind Source Separation Based on Correntropy," Signal Processing, vol. 87, no. 8, 2007, pp. 1872-1881.
14 S. J. Lee, M. T. Jone, and H. L. Tsai, "Constructing Neural Networks for Multiclass-Discretization Based on Information Theory," IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 29, 1999, pp. 445-453.
15 R. Kamimura and S. Nakanishi, "Hidden Information Maximization for Feature Detection and Rule Discovery," Network: Computation in Neural Systems, vol. 6, 1995, pp. 577-602.
16 K. Torkkola, "Nonlinear Feature Transforms Using Maximum Mutual Information," Proc. IJCNN'01, vol. 4, 2001, pp. 2756-2761.
17 A. Papoulis, Probability, Random Variables, and Stochastic Processes, 2nd ed., New York: McGraw-Hill, 1984.
18 Y. Lee, S. H. Oh, and M. W. Kim, "An Analysis of Premature Saturation in Back-Propagation Learning," Neural Networks, vol. 6, 1993, pp. 719-728.
19 Y. Lee and S. H. Oh, "Input Noise Immunity of Multilayer Perceptrons," ETRI Journal, vol. 16, 1994, pp. 35-43.
20 T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, Inc., 1991.
21 S. H. Oh, "Decreasing of Correlations Among Hidden Neurons of Multilayer Perceptrons," Journal of the Korea Contents Association, vol. 3, no. 3, 2003, pp. 98-102.
22 S. Ekici, S. Yildirim, and M. Poyraz, "Energy and Entropy-Based Feature Extraction for Locating Fault on Transmission Lines by Using Neural Network and Wavelet Packet Decomposition," Expert Systems with Applications, vol. 34, 2008, pp. 2937-2944.
23 Y. Lee and H. K. Song, "Analysis on the Efficiency of Pattern Recognition Layers Using Information Measures," Proc. IJCNN'93 Nagoya, vol. 3, Oct. 1993, pp. 2129-2132.