http://dx.doi.org/10.5392/IJoC.2015.11.2.057

Deriving a New Divergence Measure from Extended Cross-Entropy Error Function  

Oh, Sang-Hoon (Division of Information Communication Engineering, Mokwon University)
Wakuya, Hiroshi (Graduate School of Science and Engineering, Saga University)
Park, Sun-Gyu (Division of Architecture, Mokwon University)
Noh, Hwang-Woo (Department of Visual Design, Hanbat National University)
Yoo, Jae-Soo (School of Information and Communication Engineering, Chungbuk National University)
Min, Byung-Won (Division of Information Communication Engineering, Mokwon University)
Oh, Yong-Sun (Division of Information Communication Engineering, Mokwon University)
Abstract
Relative entropy is a divergence measure between two probability density functions of a random variable. When the random variable has an alphabet of only two symbols, the relative entropy reduces to the cross-entropy error function, which can accelerate the training convergence of multi-layer perceptron neural networks. Furthermore, the n-th order extension of cross-entropy (nCE) error function shows improved performance in terms of learning convergence and generalization capability. In this paper, we derive a new divergence measure between two probability density functions from the nCE error function, and we compare the new measure with the relative entropy through three-dimensional plots.
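As a brief worked sketch of the reduction mentioned in the abstract, written in standard information-theoretic notation (the symbols t and y below, denoting the target probability and the network output, are assumptions for illustration and not necessarily the paper's own notation): the relative entropy (Kullback-Leibler divergence) between two probability density functions p and q is

\[ D(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)} . \]

For a binary alphabet, writing p = (t, 1-t) for the desired distribution and q = (y, 1-y) for the distribution given by the network output y, this becomes

\[ D(p \,\|\, q) = t \log \frac{t}{y} + (1-t) \log \frac{1-t}{1-y} , \]

and discarding the terms that depend only on t (constant with respect to y) leaves the cross-entropy error function

\[ E_{\mathrm{CE}} = -\, t \log y - (1-t) \log (1-y) . \]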
Keywords
Cross-Entropy; The n-th Order Extension of Cross-Entropy; Divergence Measure; Information Theory; Neural Networks