
A NEW EXPONENTIAL DIRECTED DIVERGENCE INFORMATION MEASURE

  • JAIN, K.C. (Department of Mathematics, Malaviya National Institute of Technology) ;
  • CHHABRA, PRAPHULL (Department of Mathematics, Malaviya National Institute of Technology)
  • Received : 2015.09.18
  • Accepted : 2016.01.24
  • Published : 2016.05.30

Abstract

Depending upon the nature of the problem, different divergence measures are suitable, so it is always desirable to develop a new divergence measure. In the present work, a new information divergence measure, which is exponential in nature, is introduced and characterized. Bounds of this new measure are obtained in terms of various symmetric and non-symmetric measures, together with numerical verification using two discrete distributions: binomial and Poisson. The fuzzy information measure and the useful information measure corresponding to the new exponential divergence measure are also introduced.

1. Introduction

Divergence measures are essentially measures of the distance between two probability distributions, or ways of comparing two probability distributions. A divergence measure must increase as the probability distributions move farther apart.

Divergence measures have proved very useful in a variety of disciplines such as Bayesian model validation [50], quantum information theory [33,35], model validation [4], robust detection [39], economics and political science [48,49], biology [38], analysis of contingency tables [18], approximation of probability distributions [11,29], signal processing [27,28], pattern recognition [2,7,10,26], color image segmentation [34], 3D image segmentation and word alignment [47], cost-sensitive classification for medical diagnosis [42], magnetic resonance image analysis [51], etc.

We can also use divergence measures in fuzzy mathematics as fuzzy directed divergences and fuzzy entropies [1,20,25], which are very useful for finding the amount of average ambiguity, or the difficulty in deciding whether an element belongs to a set or not. Fuzzy information measures have recently found applications in fuzzy aircraft control, fuzzy traffic control, engineering, medicine, computer science, management, decision making, etc. Divergence measures are also very useful for finding the utility of an event [6,44], i.e., how useful one event is compared to another.

Without essential loss of insight, we have restricted ourselves to discrete probability distributions, so let

$$\Gamma_n=\Bigl\{P=(p_1,p_2,\ldots,p_n):p_i>0,\ \sum_{i=1}^{n}p_i=1\Bigr\},\qquad n\geq 2,$$

be the set of all complete finite discrete probability distributions. The restriction here to discrete distributions is only for convenience; similar results hold for continuous distributions as well. If we take $p_i\geq 0$ for some $i=1,2,\ldots,n$, then we have to suppose that $0\,f\!\bigl(\tfrac{0}{0}\bigr)=0=0\,f\!\bigl(\tfrac{a}{0}\bigr)$.

Some generalized f-information divergence measures have been introduced, characterized, and applied in a variety of fields, such as Csiszar's f-divergence [12,13], Bregman's f-divergence [8], Burbea-Rao's f-divergence [9], Renyi-like f-divergence [40], M-divergence [41], Jain-Saraswat f-divergence [22], etc.

Besides these, the f-divergence measure [3] with respect to two functions (f, g) has also been introduced; it is defined as

$$C_{f,g}(P,Q)=g\Biggl(\sum_{i=1}^{n}q_i\,f\!\Bigl(\frac{p_i}{q_i}\Bigr)\Biggr),$$

where g is an increasing function on R and f is a real, continuous, and convex function on R+. Many standard divergence measures are obtained by suitably defining the functions f and g. For instance, for $f(t)=-t^{1-r}$, $g(t)=-\log(-t)$, $0\leq r\leq 1$, we get the Chernoff coefficient, and at $r=\tfrac{1}{2}$ the well-known Bhattacharyya distance [5]. Similarly, for a suitable pair (f, g) depending on r, we obtain the so-called generalized Matusita distance, and at r = 1 we obtain the well-known variational distance or $l_1$ distance [30]. Csiszar's f-divergence is widely used due to its compact nature; it is given by

$$C_f(P,Q)=\sum_{i=1}^{n}q_i\,f\!\Bigl(\frac{p_i}{q_i}\Bigr),\tag{1}$$

where $f:(0,\infty)\rightarrow\mathbb{R}$ (the set of real numbers) is a real, continuous, and convex function and $P=(p_1,p_2,\ldots,p_n)$, $Q=(q_1,q_2,\ldots,q_n)\in\Gamma_n$, where $p_i$ and $q_i$ are probabilities.

Cf (P,Q) is a natural distance measure from a true probability distribution P to an arbitrary probability distribution Q. Typically P represents observations or a precise calculated probability distribution, whereas Q represents a model, a description or an approximation of P. Fundamental properties of Cf (P,Q) can be seen in literature [36], in detail.
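Since everything that follows is built on (1), a minimal computational sketch may be helpful. It is only an illustration: the helper name csiszar_f_divergence and the sample vectors are not from the paper, and the generator f(t) = t log t is just one admissible convex choice (it yields the Kullback-Leibler divergence of item (h) in Section 3).

```python
import math

def csiszar_f_divergence(p, q, f):
    """C_f(P, Q) = sum_i q_i * f(p_i / q_i) for strictly positive
    probability vectors p and q of equal length (see (1))."""
    assert len(p) == len(q)
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

# Illustrative generator: f(t) = t*log(t) is convex with f(1) = 0,
# and with it C_f(P, Q) becomes the Kullback-Leibler divergence.
f_kl = lambda t: t * math.log(t)

P = [0.2, 0.5, 0.3]  # hypothetical example distributions
Q = [0.3, 0.4, 0.3]
print(csiszar_f_divergence(P, Q, f_kl))  # non-negative; zero when P == Q
```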

Remark 1.1. For comparing multiple discrete probability distributions, the following is Csiszar's generalized f-divergence [15]

and the following relation can also be seen in the same literature

Divergences between more than two probability distributions are useful for discrimination and taxonomy.

Definition 1.1. Convex function: A function $f(t)$ is said to be convex over an interval $(a,b)$ if for every $t_1,t_2\in(a,b)$ and $0\leq\lambda\leq 1$, we have

$$f\bigl(\lambda t_1+(1-\lambda)t_2\bigr)\leq\lambda f(t_1)+(1-\lambda)f(t_2),$$

and it is said to be strictly convex if equality holds only for $\lambda=0$ or $\lambda=1$. Geometrically, this means that if A, B, C are three distinct points on the graph of the convex function f with B between A and C, then B lies on or below the chord AC.
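As a quick worked instance of this definition, take $f(t)=t^2$:

$$\lambda f(t_1)+(1-\lambda)f(t_2)-f\bigl(\lambda t_1+(1-\lambda)t_2\bigr)=\lambda t_1^{2}+(1-\lambda)t_2^{2}-\bigl(\lambda t_1+(1-\lambda)t_2\bigr)^{2}=\lambda(1-\lambda)(t_1-t_2)^{2}\geq 0,$$

with equality only when $\lambda\in\{0,1\}$ or $t_1=t_2$, so $f(t)=t^2$ is strictly convex.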

Definition 1.2. Jensen inequality: Let $f:I\subset\mathbb{R}\rightarrow\mathbb{R}$ be differentiable convex on $I^{0}$ ($I^{0}$ is the interior of the interval $I$), $t_i\in I^{0}$, $\lambda_i>0\ \forall\ i=1,2,\ldots,n$ and $\sum_{i=1}^{n}\lambda_i=1$; then we have the following inequality

$$f\Biggl(\sum_{i=1}^{n}\lambda_i t_i\Biggr)\leq\sum_{i=1}^{n}\lambda_i f(t_i).$$

If the function is concave, then Jensen's inequality is reversed.

Corollary 1.3. After replacing $\lambda_i$ with $q_i$ (as $\sum_{i=1}^{n}q_i=1$) and $t_i$ with $\frac{p_i}{q_i}$ for each $i=1,\ldots,n$, and assuming that the function is normalized, i.e., $f(1)=0$, we get

$$0=f(1)=f\Biggl(\sum_{i=1}^{n}q_i\,\frac{p_i}{q_i}\Biggr)\leq\sum_{i=1}^{n}q_i\,f\!\Bigl(\frac{p_i}{q_i}\Bigr)=C_f(P,Q).$$

The following theorem is well known in literature [13].

Theorem 1.4. If the function f is convex and normalized, i.e., $f''(t)\geq 0\ \forall\ t>0$ and $f(1)=0$ respectively, then $C_f(P,Q)$ and its adjoint $C_f(Q,P)$ are both non-negative and convex in the pair of probability distributions $(P,Q)\in\Gamma_n\times\Gamma_n$.

The following theorem, which relates two generalized f-divergence measures, is given by Taneja [46].

Theorem 1.5. Let $f_1,f_2:I\subset\mathbb{R}^{+}\rightarrow\mathbb{R}$ be two convex, differentiable, and normalized functions, i.e., $f_1''(t)\geq 0$, $f_2''(t)\geq 0$ and $f_1(1)=f_2(1)=0$ respectively, and suppose the following assumptions:

(i) f1 and f2 are twice differentiable on (α, β), 0 < α ≤ 1 < β < ∞.

(ii) There exist real constants $m, M$ such that $m<M$ and

$$m\leq\frac{f_1''(t)}{f_2''(t)}\leq M\qquad\forall\ t\in(\alpha,\beta).$$

If $P,Q\in\Gamma_n$, then we have the following inequalities

$$m\,C_{f_2}(P,Q)\leq C_{f_1}(P,Q)\leq M\,C_{f_2}(P,Q),\tag{4}$$

where Cf (P,Q) is given by (1).
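The way Theorem 1.5 is used in Section 3 can be illustrated numerically. The sketch below is only a toy under stated assumptions: it does not use the paper's exponential generator (introduced in Section 2) but two standard generators, $f_1(t)=t\log t$ and $f_2(t)=(t-1)^2$, for which $g(t)=f_1''(t)/f_2''(t)=1/(2t)$ is decreasing, so $m=g(\beta)$ and $M=g(\alpha)$ on $(\alpha,\beta)$; the sample distributions are arbitrary.

```python
import math

def csiszar(p, q, f):
    """C_f(P, Q) = sum_i q_i * f(p_i / q_i), as in (1)."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

# Illustrative convex normalized generators (NOT the paper's f1):
f1 = lambda t: t * math.log(t)   # Kullback-Leibler generator, f1''(t) = 1/t
f2 = lambda t: (t - 1.0) ** 2    # chi-square generator,       f2''(t) = 2

P = [0.2, 0.5, 0.3]              # hypothetical example distributions
Q = [0.3, 0.4, 0.3]

# All likelihood ratios p_i/q_i lie in [alpha, beta], with alpha <= 1 <= beta.
ratios = [pi / qi for pi, qi in zip(P, Q)]
alpha, beta = min(ratios), max(ratios)

# g(t) = f1''(t)/f2''(t) = 1/(2t) is decreasing on (0, infinity):
m, M = 1.0 / (2.0 * beta), 1.0 / (2.0 * alpha)

C1, C2 = csiszar(P, Q, f1), csiszar(P, Q, f2)
print(m * C2 <= C1 <= M * C2)    # True: inequality (4) holds for this pair
```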

 

2. New exponential divergence measure and properties

In this section, we introduce a new exponential divergence measure of the Csiszar class and establish its properties.

Let $f_1:(0,\infty)\rightarrow\mathbb{R}$ be a real differentiable mapping, which is defined as

and

We can check that the function $f_1(t)$ is exponential in nature and is convex and normalized, since $f_1''(t)\geq 0$ for all $t>0$ and $f_1(1)=0$ respectively. Further, $f_1(t)$ is monotonically increasing in $(0,\infty)$, as $f_1'(t)>0$ for all $t>0$.

After putting this exponential function in (1), we obtain

In view of Corollary 1.3 and Theorem 1.4, we see that $G_{\exp}(P,Q)$ is non-negative and convex for the pair of probability distributions $(P,Q)\in\Gamma_n\times\Gamma_n$, and it equals zero (non-degeneracy), i.e., attains its minimum value, when $p_i=q_i$ for all $i$. We can also see that $G_{\exp}(P,Q)$ is a non-symmetric divergence with respect to P and Q, because $G_{\exp}(P,Q)\neq G_{\exp}(Q,P)$.

Remark 2.1. If a function $f(t)$ is convex on the interval $(0,\infty)$, then its conjugate $f^{*}(t)=t\,f\bigl(\tfrac{1}{t}\bigr)$ is a convex function as well. By putting this conjugate convex function in (1), we get $C_{f^{*}}(P,Q)=C_f(Q,P)$, and we can see that the symmetrized combination of $G_{\exp}(P,Q)$ with its adjoint $G_{\exp}(Q,P)$ is a symmetric exponential divergence.
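The adjoint relation invoked in this remark can be verified directly from (1):

$$C_{f^{*}}(P,Q)=\sum_{i=1}^{n}q_i\,f^{*}\!\Bigl(\frac{p_i}{q_i}\Bigr)=\sum_{i=1}^{n}q_i\,\frac{p_i}{q_i}\,f\!\Bigl(\frac{q_i}{p_i}\Bigr)=\sum_{i=1}^{n}p_i\,f\!\Bigl(\frac{q_i}{p_i}\Bigr)=C_f(Q,P).$$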

Consequently, by applying Remark 1.1 for comparing multiple discrete probability distributions with a normalized function, we obtain the following intra-relations among the new exponential divergences

Remark 2.2. Bajaj and Hooda [1] have defined the 'useful' fuzzy directed divergence of a fuzzy set A from a fuzzy set B. We can also define a new exponential measure of 'useful' fuzzy directed divergence along the same lines. For this, let A and B be two standard fuzzy sets with the same supporting points $x_1,x_2,\ldots,x_n$ and with fuzzy vectors $\mu_A(x_1),\ldots,\mu_A(x_n)$ and $\mu_B(x_1),\ldots,\mu_B(x_n)$; then the fuzzy information measure corresponding to the new exponential measure (7) will be

Consequently, let $u_i>0$ be the utility of the event $E_i$ with probability $p_i$ and revised probability $q_i$, for all $i=1,2,\ldots,n$. Then the useful information measure corresponding to the new exponential divergence measure (7) will be

If utilities are ignored, i.e., $u_i=1$ for each i, then we obtain the usual $G_{\exp}(P,Q)$. Fuzzy information measures are very useful for finding the amount of average ambiguity, or the difficulty in deciding whether an element belongs to a set or not, whereas useful information measures quantify the utility of an event, i.e., how useful one event is compared to another.

FIGURE 1. Convex function f1(t)

 

3. Bounds of new divergence measure

To estimate the new exponential divergence Gexp (P,Q), it would be very interesting to establish some upper and lower bounds. So in this section, we obtain bounds of the exponential divergence measure (7) in terms of other symmetric and non- symmetric divergence measures.

Proposition 3.1. Let P,Q ∈ Γn and 0 < α ≤ 1 < β < ∞, then we have

where Gexp (P,Q) and ∆ (P,Q) are given by (7) and (15) respectively.

Proof. Let us consider

$$f_2(t)=\frac{(t-1)^2}{t+1},\qquad t\in(0,\infty),$$

and

$$f_2''(t)=\frac{8}{(t+1)^3}.\tag{14}$$

Since $f_2''(t)>0$ for all $t>0$ and $f_2(1)=0$, $f_2(t)$ is a strictly convex and normalized function, respectively. By putting $f_2(t)$ in (1), we get

$$\Delta(P,Q)=\sum_{i=1}^{n}\frac{(p_i-q_i)^2}{p_i+q_i},\tag{15}$$

where $\Delta(P,Q)$ is called the triangular discrimination [14].
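With $f_2(t)=\frac{(t-1)^2}{t+1}$ as above, the substitution into (1) can be checked in one line:

$$C_{f_2}(P,Q)=\sum_{i=1}^{n}q_i\,\frac{\bigl(\frac{p_i}{q_i}-1\bigr)^2}{\frac{p_i}{q_i}+1}=\sum_{i=1}^{n}q_i\,\frac{(p_i-q_i)^2/q_i^{2}}{(p_i+q_i)/q_i}=\sum_{i=1}^{n}\frac{(p_i-q_i)^2}{p_i+q_i}=\Delta(P,Q).$$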

Now, let

where $f_1''(t)$ and $f_2''(t)$ are given by (6) and (14) respectively, and

It is clear that $g'(t)>0$ for $t>0$; therefore $g(t)$ is a strictly increasing function in the interval $(0,\infty)$. So

and

The result (12) is obtained by using (7), (15), (16), and (17) in inequality (4). □

Proposition 3.2. Let P,Q ∈ Γn and 0 < α ≤ 1 < β < ∞, then we have

where Gexp (P,Q) and h (P,Q) are given by (7) and (21) respectively.

Proof. Let us consider

$$f_2(t)=\frac{\bigl(\sqrt{t}-1\bigr)^2}{2},\qquad t\in(0,\infty),$$

and

$$f_2''(t)=\frac{1}{4\,t^{3/2}}.\tag{20}$$

Since $f_2''(t)>0$ for all $t>0$ and $f_2(1)=0$, $f_2(t)$ is a strictly convex and normalized function, respectively. By putting $f_2(t)$ in (1), we get

$$h(P,Q)=\frac{1}{2}\sum_{i=1}^{n}\bigl(\sqrt{p_i}-\sqrt{q_i}\bigr)^2,\tag{21}$$

where $h(P,Q)$ is called the Hellinger discrimination or Kolmogorov's divergence [19].

Now, let

where $f_1''(t)$ and $f_2''(t)$ are given by (6) and (20) respectively, and

It is clear that $g'(t)>0$ for $t>0$; therefore $g(t)$ is a strictly increasing function in the interval $(0,\infty)$. So

and

The result (18) is obtained by using (7), (21), (22), and (23) in inequality (4). □

By a similar procedure, we obtain bounds of $G_{\exp}(P,Q)$ in terms of other well-known divergence measures. The results are as follows.

(a) If , then we have

where

is the Jensen-Shannon divergence or Information radius [9,43].

(b) If f2 (t) = (t − 1) log t, then we have

where

is the J-divergence or Jeffreys-Kullback divergence [24,31].

(c) If , then we have

where

is the Jain-Srivastava divergence [23].

(d) If , then we have

where

is the Symmetric chi-square divergence [17].

(e) If , then we have

where

is the Arithmetic-Geometric mean divergence [45].

(f) If , then we have

where

is the Kumar-Johnson divergence [32].

(g) If , then we have

where

is the Relative J-divergence [16].

(h) If f2 (t) = t log t, then we have

where

is the Kullback-Leibler divergence, also called Relative entropy, Directed divergence, or Information gain [31].

(i) If , then we have

where

is the Relative Arithmetic-Geometric divergence [45].

(j) If f2 (t) = (t − 1)2, then we have

where

is the Chi-square divergence or Pearson divergence [37].

(k) If , then we have

where

is the Relative Jensen-Shannon divergence [43].

(l) If , then we have

where

is the Jain and Chhabra divergence [21].

Remark 3.1. Divergences (15), (21), (25), (27), (29), (31), (33), (35) are symmetric and divergences (37), (39), (41), (43), (45), (47) are non-symmetric with respect to the probability distributions P,Q ∈ Γn.

 

4. Numerical verification of bounds

In this section, we take an example for calculating the divergences ∆(P,Q), h(P,Q), G(P,Q) and Gexp(P,Q), and numerically verify the inequalities (12), (18), and (40), i.e., the bounds of Gexp(P,Q).

Example 4.1. Let P be the binomial probability distribution with parameters (n = 10, p = 0.7) and Q its approximating Poisson probability distribution with parameter (λ = np = 7) for the random variable X. Then we have

By using Table 1, we get the following.

Putting the approximated values from (48) to (52) into inequalities (12), (18), and (40) respectively, we get the following results

respectively. Hence the bounds of Gexp(P,Q) in terms of ∆(P,Q), h(P,Q) and G(P,Q) are verified for p = 0.7.

TABLE 1. Evaluation of discrete probability distributions for (n = 10, p = 0.7, q = 0.3)

Similarly, we can verify the bounds of Gexp(P,Q) in terms of the other divergences, or verify the other inequalities, for different values of p and q and for other discrete probability distributions as well, such as the negative binomial, geometric, and uniform distributions.
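The tabulated values for ∆(P,Q) and h(P,Q) can be reproduced with a short script; a sketch is given below. It assumes the Poisson distribution is simply evaluated at x = 0, 1, ..., 10 without renormalization, and it omits Gexp(P,Q) and G(P,Q), since these need the explicit generators from Sections 2 and 3.

```python
import math

n, p, lam = 10, 0.7, 7.0

# P: binomial(n=10, p=0.7); Q: Poisson(lambda=7) evaluated at x = 0,...,10
# (how the Poisson tail beyond x = 10 is treated is an assumption here).
P = [math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
Q = [math.exp(-lam) * lam**x / math.factorial(x) for x in range(n + 1)]

def triangular(P, Q):
    """Triangular discrimination (15): sum (p - q)^2 / (p + q)."""
    return sum((pi - qi) ** 2 / (pi + qi) for pi, qi in zip(P, Q))

def hellinger(P, Q):
    """Hellinger discrimination (21): (1/2) * sum (sqrt(p) - sqrt(q))^2."""
    return 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(P, Q))

print("Delta(P,Q) =", triangular(P, Q))
print("h(P,Q)     =", hellinger(P, Q))
```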

In Figure 2, we have considered P = (a, 1 − a) and Q = (1 − a, a), where a ∈ (0, 1).

FIGURE 2. Comparison of divergence measures with the new exponential divergence measure

It is clear from Figure 2 that the new exponential divergence Gexp (P,Q) has a steeper slope than ψ (P,Q), χ2 (P,Q), E (P,Q), ∆ (P,Q), h (P,Q), I (P,Q), J (P,Q), T (P,Q), and JR (P,Q).

 

5. Conclusion and discussion

To design a communication system with a specific message handling capability, we need a measure of the information content to be transmitted. Divergence measures quantify the dissimilarity among probability distributions. In this work we introduced a new exponential divergence measure, obtained its bounds by using Csiszar's information inequality on the interval (α, β), 0 < α ≤ 1 < β < ∞, and verified the bounds numerically. The fuzzy exponential information measure and the useful exponential information measure were also introduced. Work on further generalizations of this new divergence measure is in progress and will be reported elsewhere; directions include applications to mutual information, further relations obtained by using standard algebraic and exponential inequalities, and the question of whether the square root of this new measure is a metric.

We hope that this work will motivate the reader to consider the extensions of divergence measures in information theory, other problems of functional analysis and fuzzy mathematics.

References

  1. R.K. Bajaj and D.S. Hooda, Generalized measures of fuzzy directed divergence, total ambiguity and information improvement, Journal of Applied Mathematics, Statistics and Informatics, 6 (2010), 31- 44.
  2. M.B. Bassat, f- Entropies, probability of error and feature selection, Inform. Control, 39 (1978), 227-242. https://doi.org/10.1016/S0019-9958(78)90587-9
  3. M. Basseville, Distance measures for signal processing and pattern recognition, Signal Processing, 18 (1989), 349-369. https://doi.org/10.1016/0165-1684(89)90079-0
  4. A. Benveniste, M. Basseville and G. Moustakides, The asymptotic local approach to change detection and model validation, IEEE Trans. Automatic Control, AC-32 (1987), 583- 592. https://doi.org/10.1109/TAC.1987.1104683
  5. A. Bhattacharyya, On a measure of divergence between two multinomial populations, Sankhaya: The Indian Journal of Statistics (1933-1960), 7 (1946), 401- 406.
  6. J.S. Bhullar, O.P. Vinocha and M. Gupta, Generalized measure for two utility distributions, Proceedings of the World Congress on Engineering, 3 (2010).
  7. D.E. Boekee and J.C.A. Van Der Lubbe, Some aspects of error bounds in feature selection, Pattern Recognition, 11 (1979), 353- 360. https://doi.org/10.1016/0031-3203(79)90047-5
  8. L.M. Bregman, The relaxation method to find the common point of convex sets and its applications to the solution of problems in convex programming, USSR Comput. Math. Phys., 7 (1967), 200-217. https://doi.org/10.1016/0041-5553(67)90040-7
  9. J. Burbea and C.R. Rao, On the convexity of some divergence measures based on entropy functions, IEEE Trans. on Inform. Theory, IT-28 (1982), 489-495. https://doi.org/10.1109/TIT.1982.1056497
  10. H.C. Chen, Statistical pattern recognition, Hoyderc Book Co., Rocelle Park, New York, (1973).
  11. C.K. Chow and C.N. Lin, Approximating discrete probability distributions with dependence trees, IEEE Trans. Inform. Theory, 14 (1968), 462-467. https://doi.org/10.1109/TIT.1968.1054142
  12. I. Csiszar, Information measures: A Critical survey, in Trans. In: Seventh Prague Conf. on Information Theory, Academia, Prague, (1974), 73-86.
  13. I. Csiszar, Information type measures of differences of probability distribution and indirect observations, Studia Math. Hungarica, 2 (1967), 299-318.
  14. D. Dacunha-Castelle, Ecole d'Ete de Probabilites de Saint-Flour VII-1977, Springer, Berlin, Heidelberg, New York, (1978).
  15. S.S. Dragomir, A generalized f- divergence for probability vectors and applications, Research Report Collection 5 (2000).
  16. S.S. Dragomir, V. Gluscevic and C.E.M. Pearce, Approximation for the Csiszars f- divergence via midpoint inequalities, in Inequality Theory and Applications - Y.J. Cho, J.K. Kim, and S.S. Dragomir (Eds.), Nova Science Publishers, Inc., Huntington, New York, 1 (2001), 139-154.
  17. S.S. Dragomir, J. Sunde and C. Buse, New inequalities for Jeffreys divergence measure, Tamusi Oxford Journal of Mathematical Sciences, 16 (2000), 295-309.
  18. D.V. Gokhale and S. Kullback, Information in contingency Tables, New York, Marcel Dekker, (1978).
  19. E. Hellinger, Neue Begrundung der Theorie der quadratischen Formen von unendlich vielen Veranderlichen, J. Reine Angew. Math., 136 (1909), 210-271.
  20. D.S. Hooda, On generalized measures of fuzzy entropy, Mathematica Slovaca, 54 (2004), 315- 325.
  21. K.C. Jain and P. Chhabra, New series of information divergence measures and their properties, Accepted in Applied Mathematics and Information Sciences.
  22. K.C. Jain and R.N. Saraswat, Some new information inequalities and its applications in information theory, International Journal of Mathematics Research, 4 (2012), 295- 307.
  23. K.C. Jain and A. Srivastava, On symmetric information divergence measures of Csiszar’s f- divergence class, Journal of Applied Mathematics, Statistics and Informatics, 3 (2007), 85-102.
  24. H. Jeffreys, An invariant form for the prior probability in estimation problem, Proc. Roy. Soc. Lon. Ser. A, 186 (1946), 453-461. https://doi.org/10.1098/rspa.1946.0056
  25. P. Jha and V.K. Mishra, Some new trigonometric, hyperbolic and exponential measures of fuzzy entropy and fuzzy directed divergence, International Journal of Scientific and Engineering Research, 3 (2012), 1-5. https://doi.org/10.15373/22778179/OCT2013/11
  26. L. Jones and C. Byrne, General entropy criteria for inverse problems with applications to data compression, pattern classification and cluster analysis, IEEE Trans. Inform. Theory, 36 (1990), 23-30. https://doi.org/10.1109/18.50370
  27. T.T. Kadota and L.A. Shepp, On the best finite set of linear observables for discriminating two Gaussian signals, IEEE Trans. Inform. Theory, 13 (1967), 288-294.
  28. T. Kailath, The divergence and Bhattacharyya distance measures in signal selection, IEEE Trans. Comm. Technology, COM-15 (1967), 52-60. https://doi.org/10.1109/TCOM.1967.1089532
  29. D. Kazakos and T. Cotsidas, A decision theory approach to the approximation of discrete probability densities, IEEE Trans. Perform. Anal. Machine Intell, 1 (1980), 61- 67. https://doi.org/10.1109/TPAMI.1980.4766971
  30. A.N. Kolmogorov, On the approximation of distributions of sums of independent summands by infinitely divisible distributions, Sankhya, 25, 159-174.
  31. S. Kullback and R.A. Leibler, On information and sufficiency, Ann. Math. Statist., 22 (1951), 79-86. https://doi.org/10.1214/aoms/1177729694
  32. P. Kumar and A. Johnson, On a symmetric divergence measure and information inequalities, Journal of Inequalities in Pure and Applied Mathematics, 6 (2005), 1-13.
  33. P.W. Lamberti, A.P. Majtey, A. Borras, M. Casas and A. Plastino, Metric character of the quantum Jensen- Shannon divergence, Physical Review A, 77 (2008), 052311. https://doi.org/10.1103/PhysRevA.77.052311
  34. F. Nielsen and S. Boltz, The Burbea-Rao and Bhattacharyya centroids, Apr. (2010), Arxiv.
  35. M.A. Nielsen and I.L. Chuang, Quantum computation and information, Cambridge University Press, Cambridge, UK, 3 (2000), 9.
  36. F. Osterreicher, Csiszar's f- divergence basic properties, Homepage: http://www.sbg.ac.at/mat/home.html, November 22, (2002).
  37. K. Pearson, On the Criterion that a given system of deviations from the probable in the case of correlated system of variables is such that it can be reasonable supposed to have arisen from random sampling, Phil. Mag., 50 (1900), 157-172. https://doi.org/10.1080/14786440009463897
  38. E.C. Pielou, Ecological diversity, New York, Wiley, (1975).
  39. H.V. Poor, Robust decision design using a distance criterion, IEEE Trans. Inf. Th., IT 26 (1980), 575- 587. https://doi.org/10.1109/TIT.1980.1056249
  40. A. Renyi, On measures of entropy and information, Proc. 4th Berkeley Symposium on Math. Statist. and Prob., 1 (1961), 547-561.
  41. M. Salicru, Measures of information associated with Csiszar’s divergences, Kybernetika, 30 (1994), 563- 573.
  42. R. Santos-Rodriguez, D. Garcia-Garcia and J. Cid-Sueiro, Cost-sensitive classification based on Bregman divergences for medical diagnosis, In M.A. Wani, editor, Proceedings of the 8th International Conference on Machine Learning and Applications (ICMLA'09), Miami Beach, Fl., USA, December 13-15, (2009), 551- 556.
  43. R. Sibson, Information radius, Z. Wahrs. Undverw. Geb., 1 (1969), 149-160. https://doi.org/10.1007/BF00537520
  44. H.C. Taneja and R.K. Tuteja, Characterization of a quantitative- qualitative measure of inaccuracy, Kybernetika, 22 (1986), 393- 402.
  45. I.J. Taneja, New developments in generalized information measures, Chapter in: Advances in Imaging and Electron Physics, Ed. P.W. Hawkes, 91 (1995), 37-135.
  46. I.J. Taneja, Generalized symmetric divergence measures and inequalities, RGMIA Research Report Collection, http://rgmia.vu.edu.au, 7(2004), Art. 9. Available on-line at: arXiv:math.ST/0501301 v1 19 Jan 2005.
  47. B. Taskar, S. Lacoste-Julien and M.I. Jordan, Structured prediction, dual extra gradient and Bregman projections, Journal of Machine Learning Research, 7 (2006), 1627-1653.
  48. H. Theil, Statistical decomposition analysis, Amsterdam, North-Holland, 1972.
  49. H. Theil, Economics and information theory, Amsterdam, North-Holland, 1967.
  50. K. Tumer and J. Ghosh, Estimating the Bayes error rate through classifier combining, Proceedings of 13th International Conference on Pattern Recognition, (1996), 695-699.
  51. B. Vemuri, M. Liu, S. Amari and F. Nielsen, Total Bregman divergence and its applications to DTI analysis, IEEE Transactions on Medical Imaging, (2010).
