Adaptive Algorithm in Image Reconstruction Based on Information Geometry

  • Wang, Meng (School of Applied Science Beijing Information Science and Technology University) ;
  • Ning, Zhen Hu (College of Computer Science, Faculty of Information Technology Beijing University of Technology) ;
  • Yu, Jing (College of Computer Science, Faculty of Information Technology Beijing University of Technology) ;
  • Xiao, Chuang Bai (College of Computer Science, Faculty of Information Technology Beijing University of Technology)
  • Received : 2019.07.30
  • Reviewed : 2020.11.17
  • Published : 2021.02.28

Abstract

Compressed sensing in image reconstruction has attracted considerable attention, and many methods have been proposed. It is well known that adding prior knowledge about the distribution of the support of the original signal to CS can improve the quality of reconstruction. However, it is still difficult for a recovery framework to adjust its strategy for exploiting the prior knowledge efficiently according to the current estimated signals over successive iterations. Using the theory of information geometry, we propose an adaptive strategy based on the current estimated signal in each iteration of the recovery. We also improve the performance of existing algorithms through this adaptive strategy. Simulations are presented to validate the results. Finally, we show the application of the model to images.

1. Introduction

Compressed Sensing (CS) is a signal processing technique that recovers sparse signals of interest from far fewer samples than conventional sampling requires, and it has been widely used in image reconstruction owing to the rapid growth of media [1]. Chai et al. proposed an image encryption algorithm based on the memristive chaotic system, elementary cellular automata and CS [2]. Zha et al. proposed a group-based sparse representation method with non-convex regularization for image CS reconstruction [3]. Furthermore, CS was applied to the reconstruction of under-sampled atomic images [4]. In 2018, Wang et al. proposed an image encryption method integrating CS and detour cylindrical diffraction [5].

CS acquires and reconstructs a signal x∈R^N by finding a solution to the underdetermined linear system [6-8]

\(y=\Phi x+e\)       (1)

where y∈R^M is the compressed representation of x, Φ=(Φ_ij)_{M×N} is the random projection matrix, e∈R^M represents the noise, and M<<N. The signal x is inherently sparse or sparse in some domain; that is, most of its entries are zero, or there exists an orthogonal transformation matrix ψ such that most of the entries of ψ^{-1}x are zero. One may recover an estimate of x as the solution to an l0-minimization problem P0 [9], which is discontinuous and NP-hard. Under the restricted isometry property [10], P0 is usually relaxed to:
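The measurement model in (1) can be sketched in a few lines. This is only an illustration: the Gaussian projection matrix, its 1/sqrt(M) scaling, the noise level, and the helper name `make_cs_problem` are assumptions of the sketch, not settings taken from the paper.

```python
import numpy as np

def make_cs_problem(n=300, m=100, s=10, noise_std=0.01, seed=0):
    """Draw an s-sparse x in R^n and noisy measurements y = Phi @ x + e."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    # Indexes of the nonzero entries (the support), drawn uniformly here.
    support = np.sort(rng.choice(n, size=s, replace=False))
    x[support] = rng.standard_normal(s)
    # Assumed Gaussian random projection matrix with M << N rows.
    phi = rng.standard_normal((m, n)) / np.sqrt(m)
    e = noise_std * rng.standard_normal(m)
    y = phi @ x + e
    return phi, x, y, support

phi, x, y, support = make_cs_problem()
print(phi.shape, y.shape, np.count_nonzero(x))
```

Here the support is uniform over the indexes; the method discussed in this paper instead assumes that the support follows a known, non-uniform distribution.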

\(\text { P1 } \quad \hat{x}=\underset{x \in R^{N}}{\arg \min }\|x\|_{l_{1}} \quad \text { s.t. } y=\Phi x\)       (2)

Many methods have been developed to solve the two problems (P0 and P1). Recovery frameworks fall into three categories: greedy algorithms [11], convex optimization algorithms [12-14], and statistical algorithms [15-17].

Because of their lower computational complexity, greedy algorithms are widely used in CS. A typical greedy algorithm is Orthogonal Matching Pursuit (OMP) [18]. OMP can accommodate various scenarios, and it can exactly recover a signal with S nonzero entries in dimension N from O(S ln N) random linear measurements. OMP has therefore been studied by many scholars, and a family of variants has been proposed, such as Stagewise Orthogonal Matching Pursuit (StOMP) [19], Regularized Orthogonal Matching Pursuit (ROMP) [20], Compressed Sampling Matching Pursuit (CoSaMP) [21], Iterative Hard Thresholding (IHT) [22], Gradient Descent with Sparsification (GraDeS) [23] and Generalized Orthogonal Matching Pursuit (GOMP) [24].
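As a point of reference, plain OMP can be sketched as follows. This is the textbook form of the algorithm (pick the column most correlated with the residual, then refit by least squares on the selected columns). The function name `omp`, its stopping tolerance and the toy problem are assumptions of this sketch, and it does not include the information-geometric modification proposed later.

```python
import numpy as np

def omp(phi, y, sparsity, tol=1e-10):
    """Textbook Orthogonal Matching Pursuit for y ~ phi @ x with sparse x."""
    if sparsity < 1:
        raise ValueError("sparsity must be at least 1")
    n = phi.shape[1]
    residual = y.copy()
    support = []
    for _ in range(sparsity):
        # Correlation between each column and the current residual.
        corr = np.abs(phi.T @ residual)
        if support:
            corr[support] = 0.0  # never select the same index twice
        support.append(int(np.argmax(corr)))
        # Least-squares fit restricted to the selected columns.
        coef, *_ = np.linalg.lstsq(phi[:, support], y, rcond=None)
        residual = y - phi[:, support] @ coef
        if np.linalg.norm(residual) <= tol:
            break
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat, sorted(support)

# Toy usage on a noise-free random problem (sizes chosen only for illustration).
rng = np.random.default_rng(0)
phi = rng.standard_normal((40, 64)) / np.sqrt(40)
x = np.zeros(64)
x[[5, 20, 41]] = [1.5, -2.0, 1.0]
x_hat, support = omp(phi, phi @ x, sparsity=3)
print(support, np.linalg.norm(x - x_hat))
```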

A well-known convex optimization algorithm is the Least Absolute Shrinkage and Selection Operator (LASSO) [25]

\(\begin{aligned} &\tilde{x}=\underset{x \in R^{N}}{\arg \min } \frac{1}{2}\|y-\Phi x\|_{l_{2}}^{2}\\ &\text { s.t. } \quad\|x\|_{l_{1}} \leq t \end{aligned}\)       (3)

where t≥0 is a tuning parameter. A stage-wise fast LASSO algorithm for the image reconstruction from CS optimizes an insensitive Huber objective function to achieve a decision function [26]. Lian et al. [27] exploit a multilevel prior support information model and incorporate it into the LASSO using a weighted l1-norm penalty function. Least Angle Regression (LARS) provides an efficient algorithm for computing the solution paths of LASSO [28]. LARS is an algorithm for fitting linear regression models to high-dimensional data [29]. In 2007, Keerthi et al. [30] applied a fast tracking algorithm for LARS to sparse kernel logistic regression.

In many applications, it is possible to obtain some prior knowledge about the vector representation of a natural image in the domain of some linear transform [31,32]. For example, some regions in magnetic resonance images are known to be more likely to be highlighted than others [33]. In statistical algorithms, adding this prior knowledge to CS can improve the quality of reconstruction. Many studies on sparse recovery with prior knowledge use statistics to enhance signal recovery to various degrees. We next introduce typical statistical algorithms. Babacan et al. [34] proposed a greedy algorithm using Laplace priors within a Bayesian framework. The main idea of model-based CS is to introduce an extra structure that exploits the distribution of the original signal [35]. In [36], a method based on approximate message passing and a Markov-tree prior was proposed. Furthermore, neural networks have been trained to learn the prior knowledge for signal recovery [37].

However, the computational complexity of these methods is high because of the extra structure used as a model. For tree-based algorithms, the reconstruction traverses the whole tree in every iteration. As far as we know, the average complexity per iteration of the fastest tree-based algorithm for CS is O(MN^2) [38]. The algorithm based on neural networks requires forward propagation from the bottom layer to the top layer of the network. The lower bound on its average complexity per iteration is O(2N) for a single forward computation with the simplest architecture, a hidden layer of 2[(α-2)/M+1] hidden neurons, where α is the size of the training set and   N [39].

Instead of using such models, a modified method penalizes the selection of a wrong support [40]. In [41], relations between entries of the original signal were built with a Kalman filter under the assumption that the support changes slowly. However, the Kalman filter does not always use the actual covariances of the changes in the estimated signals. Furthermore, weighted minimization provides better upper bounds on the reconstruction error when the weights are determined by prior knowledge about the probability of nonzero entries [42,43]. In 2018, the weights were chosen to minimize the expected statistical dimension of the descent cones of a weighted cross-polytope [44]. Moreover, such optimal weights have been used to improve greedy algorithms and convex optimization algorithms [45], such as the Weighted Orthogonal Matching Pursuit (WOMP) and Weighted LARS (W-LARS). These algorithms combine either greedy or convex optimization algorithms with statistics through weights. In each iteration, they generate a new estimated signal from the results of the previous iterations. However, the optimal weights stay fixed throughout the reconstruction and do not account for changes in the estimated signal. Thus, it is more reasonable to adjust the strategy for exploiting the prior knowledge according to the current estimated signal.

Information geometry [46] is a branch of mathematics that applies the techniques of differential geometry to the study of probability theory and statistics. It has been applied to many problems, such as the asymptotic theory of statistical inference [47], the expectation maximisation (EM) algorithm [48], the learning of neural networks [49-51], and many others [52-56]. Wang et al. proposed a model-CS method with neural networks based on information geometry [57]. However, its computational complexity is high because the neural networks act as an extra structure.

To address the problems of the previous algorithms mentioned above, we introduce information geometry to improve the performance of greedy algorithms and convex optimization algorithms without any extra structure. Image features can be reflected by a specific manifold structure [58]. Based on this fact, we evaluate the candidate estimated signals with the Fisher distance [59] between the distribution over the estimated support set and the known distribution over the true support of the original signal. We then choose the recovered signal whose distribution is closest to the true one. The strategy is adaptive to the support sets generated in different iterations, because we calculate the estimated distribution from the current estimated support set in each iteration. Experiments are presented to validate the results.

2. Methods

2.1 Construction of Geometry Model

As discussed above, the true distribution of the indexes of the non-zero entries of x can be obtained from statistics. The index distribution can be described by a specific Probability Density Function (PDF) p(k|θ), where θ=(θ_1,θ_2,…,θ_N)'.

As prior knowledge, we know the true parameters θ^1 of the original signals. In addition, θ^2 denotes the estimated parameters in the model. In the iterative algorithm, we estimate the distribution by computing θ^2 over the set of candidate indexes in each iteration. We choose the recovered signal whose support has a distribution closer to the known distribution. The details are described as follows.

A distribution from the exponential family [60] is defined as a function in the following form

\(p(k \mid \theta)=\exp \{\theta \cdot k-\varphi(\theta)\}\)       (4)

where φ is a function of θ.

As a discrete distribution, the position distribution naturally belongs to the exponential family. Suppose that the frequency of occurrence of index i∈{0,1,…,N} is p(k=i|θ)=p_i, where θ=(θ_1,θ_2,…,θ_N)', and denote the i-th element of θ by θ_i. Then, it can be rewritten as

\(p(\xi \mid \theta)=\exp \{\theta \cdot \xi-\varphi(\theta)\}\)       (5)

where

\(\xi_{i}=\delta_{i}(k)=\left\{\begin{array}{l} 1, k=i \\ 0, k \neq i \end{array}\right.\)       (6)

\(\theta_{i}=\log \frac{p_{i}}{p_{0}}\)       (7)

then,

\(p_{0}=1-\sum_{i=1}^{N} p_{i}=1-p_{0} \sum_{i=1}^{N} e^{\theta_{i}}\)       (8)

From (8), we obtain

\(\varphi(\theta)=\log \left(1+\sum_{i=1}^{N} e^{\theta_{i}}\right)\)       (9)
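The relations (7) to (9) can be checked numerically. The sketch below converts a probability vector (p_0, p_1, ..., p_N) into θ, evaluates φ(θ), and maps back, using p_0 = exp(-φ(θ)), which follows from (8) and (9). The function names and the example probabilities are illustrative only.

```python
import math

def natural_params(p):
    """theta_i = log(p_i / p_0) for i = 1..N, as in (7); p = [p_0, ..., p_N]."""
    return [math.log(pi / p[0]) for pi in p[1:]]

def log_partition(theta):
    """phi(theta) = log(1 + sum_i exp(theta_i)), as in (9)."""
    return math.log(1.0 + sum(math.exp(t) for t in theta))

def probabilities(theta):
    """Invert (7): p_0 = exp(-phi), p_i = exp(theta_i - phi)."""
    phi = log_partition(theta)
    return [math.exp(-phi)] + [math.exp(t - phi) for t in theta]

p = [0.4, 0.3, 0.2, 0.1]          # example frequencies, all strictly positive
theta = natural_params(p)
p_back = probabilities(theta)
print(theta, p_back)
```

The mapping requires every p_i to be strictly positive; an index with zero frequency would give an infinite θ_i.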

A classic parameter space for this family of PDFs is

\(H=\left\{\theta \in R^{N} \mid \theta \geq 0\right\}\)       (10)

The Fisher Information Matrix (FIM) is defined as

\(G(\theta)=\left[g_{i j}(\theta)\right]\)       (11)

\(g_{i j}(\theta)=E\left[\frac{\partial \log p(k \mid \theta)}{\partial \theta_{i}} \cdot \frac{\partial \log p(k \mid \theta)}{\partial \theta_{j}}\right]\)       (12)

where E denotes the expectation. The FIM measures the amount of information about the parameter, and a distance called the Fisher distance arises from it [61]. The distance between two points θ^1=θ(t_1) and θ^2=θ(t_2) in the parameter space H measures the dissimilarity between the associated distributions p(k|θ^1) and p(k|θ^2). The Fisher distance is the minimum of the lengths of all piecewise smooth paths θ(t), t_1≤t≤t_2, that join θ^1 and θ^2. That is

\(D_{F}\left(\theta^{1}, \theta^{2}\right):=\min _{\left\{\theta(t): \theta\left(t_{1}\right)=\theta^{1}, \theta\left(t_{2}\right)=\theta^{2}\right\}} \int_{t_{1}}^{t_{2}} \sqrt{\left(\frac{d \theta}{d t}\right)^{\prime} G(\theta) \frac{d \theta}{d t}} d t\)       (13)

where t is the parameter of the curve θ(t). The minimization in (13) leads to the Euler-Lagrange equations

\(\frac{d^{2} \theta_{h}}{d t^{2}}+\sum_{q=1}^{N} \sum_{z=1}^{N}\left[\frac{1}{2} \sum_{l=1}^{N} g^{h l}\left(\frac{\partial g_{q l}}{\partial \theta_{z}}+\frac{\partial g_{z l}}{\partial \theta_{q}}-\frac{\partial g_{q z}}{\partial \theta_{l}}\right)\right] \frac{d \theta_{q}}{d t} \frac{d \theta_{z}}{d t}=0, \quad \forall h \in\{1,2, \ldots, N\}\)       (14)

where the subscripts have the same meanings as above, while the superscripts index the elements in the inverse matrix of G(θ) in (11).
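For the discrete family in (5), both quantities have simple closed forms, which gives a convenient way to experiment with them. In the natural parameters, the FIM of a categorical distribution is g_ij = p_i δ_ij - p_i p_j, and the Fisher distance between two categorical distributions p and q equals 2 arccos(Σ_i sqrt(p_i q_i)). The paper does not state how D_F is evaluated, so using these closed forms instead of solving (14) numerically is an assumption of this sketch, as are the function names.

```python
import math

def probabilities(theta):
    """p_0 = exp(-phi), p_i = exp(theta_i - phi), with phi from (9)."""
    phi = math.log(1.0 + sum(math.exp(t) for t in theta))
    return [math.exp(-phi)] + [math.exp(t - phi) for t in theta]

def fisher_information(theta):
    """FIM of the categorical family in natural parameters: diag(p) - p p'."""
    p = probabilities(theta)[1:]
    n = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
            for i in range(n)]

def fisher_distance(theta1, theta2):
    """Closed-form Fisher distance between two categorical distributions."""
    p = probabilities(theta1)
    q = probabilities(theta2)
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 2.0 * math.acos(min(1.0, bc))  # clamp guards against rounding

theta_a = [0.0]              # p = (0.5, 0.5)
theta_b = [math.log(3.0)]    # p = (0.25, 0.75)
print(fisher_information(theta_a), fisher_distance(theta_a, theta_b))
```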

2.2 Construction of Geometry Model

The choice of the support is important to recovery algorithms. The common method is as follows. Firstly, the correlation between each column of the measurement matrix and the residual is calculated. Secondly, the index (or indexes) with the highest correlation is (are) added to the support. The correlation is therefore the indicator that determines the support.

In this paper, we improve the traditional indicator with a reward term. If adding an index to the support decreases D_F between the estimated distribution and the known distribution, the reward is set higher. Define the reward term by:

\(\operatorname{reward}(\tilde{A})=\frac{1}{\lambda\left(\theta^{1}, S\right)+D_{F}\left(\theta^{1}, \theta^{2}\right)}\)       (15)

where λ(θ^1, S) is a constant that offsets the influence of D_F on the indicator. It depends on the parameters and the sparsity. The details will be discussed in section 3.

θ^2 is determined from the support \(\widetilde{A}\) generated in each iteration. In an iteration, the support set \(\tilde{A}=\left\{k^{1}, k^{2}, \ldots, k^{j}\right\}\) has j elements. We obtain θ^2 by the Maximum Likelihood Estimate (MLE) [62].

\(\theta^{2}=\underset{\theta}{\arg \max } \sum_{i=1}^{j} \log p\left(k^{i} \mid \theta\right)\)       (16)
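One possible way to obtain θ^2 from a support set is sketched below. For a categorical family, the likelihood is maximized by the empirical frequencies, but a support set contains each index at most once, so unvisited indexes would give zero frequencies and infinite parameters. To keep θ^2 finite, this sketch groups the N indexes into a few bins and adds a small pseudo-count to each bin. The binning, the pseudo-count `alpha` and the function name are assumptions made here for illustration; the paper does not specify them.

```python
import math

def estimate_theta(support, n, n_bins=4, alpha=1.0):
    """Smoothed frequency estimate of theta from a support set.

    support : indexes in range(n) currently in the estimated support
    n       : signal length N
    n_bins  : number of index bins (hypothetical coarsening of the index axis)
    alpha   : pseudo-count per bin (hypothetical smoothing, keeps theta finite)
    """
    counts = [alpha] * n_bins
    for k in support:
        counts[k * n_bins // n] += 1.0      # bin that index k falls into
    total = sum(counts)
    p = [c / total for c in counts]
    # theta_i = log(p_i / p_0), as in (7), with bin 0 as the reference.
    return [math.log(pi / p[0]) for pi in p[1:]]

theta2 = estimate_theta([3, 17, 42, 120, 250], n=300)
print(theta2)
```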

The reward term is used to modify the evaluation of the support in each iteration: each indicator is multiplied by its reward term. The reward adapts to the different estimated support set in each iteration. In other words, it is an adaptive strategy based on the current estimated signal in each iteration of the recovery. The improved greedy algorithm is shown in Algorithm 1, and the improved convex optimization algorithm is shown in Algorithm 2.

Algorithm 1

Algorithm 2

fun() in (*) is defined by

\(\operatorname{fun}\left(r^{j-1}, \phi_{i}\right)=\left\{\begin{array}{ll} \left\langle W r^{j-1}, \phi_{i}\right\rangle, & j \leq \text { thres }_{\min } \\ \left\langle W r^{j-1}, \phi_{i}\right\rangle \cdot \operatorname{reward}\left(A^{j-1} \cup k^{i}\right), & j>\text { thres }_{\min } \end{array}\right.\)       (17)

where <·,·> denotes the inner product in Euclidean space, W is a positive definite matrix and thres_min is a threshold. For non-weighted algorithms, W is the identity matrix; for algorithms with weights, W is a diagonal matrix.
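The indicator in (17) and the reward in (15) can be written as two small functions. In this sketch, W is represented by its diagonal, and the Fisher distance for the candidate support is passed in as a precomputed number `d_f`. Both choices, together with the argument names, are assumptions made to keep the example self-contained.

```python
def reward(d_f, lam):
    """Reward term (15): 1 / (lambda + D_F). Assumes lam + d_f > 0."""
    return 1.0 / (lam + d_f)

def fun(residual, column, j, thres_min, d_f, lam, weights=None):
    """Indicator (17) for one candidate column in iteration j.

    residual : r^{j-1}
    column   : candidate column phi_i of the measurement matrix
    weights  : diagonal of W (None means W is the identity matrix)
    d_f      : Fisher distance for the support A^{j-1} with the candidate added
    """
    if weights is None:
        weights = [1.0] * len(residual)
    inner = sum(w * r * c for w, r, c in zip(weights, residual, column))
    if j <= thres_min:
        return inner                 # too few indexes for reliable statistics
    return inner * reward(d_f, lam)  # correlation scaled by the reward

print(fun([1.0, 2.0], [3.0, 4.0], j=1, thres_min=30, d_f=0.5, lam=1.5))
print(fun([1.0, 2.0], [3.0, 4.0], j=31, thres_min=30, d_f=0.5, lam=1.5))
```

A smaller `d_f` gives a larger reward, so candidates that move the estimated index distribution toward the known one are favoured once j exceeds the threshold.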

In the algorithms discussed above, the computational cost is dominated by the evaluation of (15) to (17). In the j-th iteration, this computation is carried out for N-j+1 candidates. However, the computation does not involve any traversal or propagation across a model with a complex architecture. Thus, we infer that the computational complexity of the proposed algorithm is lower than that of the model-based algorithms introduced in section 1.

3. Experimental Results and Discussion

Generally, the analysis of various applications allows the extraction of prior knowledge about the specific distribution over the supports of the signal’s sparse representation. In this section, we conduct a series of experiments with simulated signals and imaging signals.

3.1 Simulation

We randomly choose two PDFs, listed in Table 1, as the known index distributions and generate 1000 simulated signals following each of them. We suppose that the amplitude of each nonzero entry follows a standard normal distribution. N is set to 300. We set iter_max=8×M. The res_min is 10^-5, while the thres_min is 30 (large enough for the statistics). λ is determined from the sparsity for each PDF, as shown in Table 1.

Table 1. Settings for PDF

a For simplicity, λ is set to the expectation of D_F over the samples minus 1.

The signal is acquired from noisy measurements with SNR_mes=30 dB. Accurate recovery is declared when

\(\|x-\tilde{x}\|_{l_{2}} /\|x\|_{l_{2}} \leq 10^{-1}\)       (18)
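The criterion in (18) is a relative l2 error test, sketched below together with a helper that picks a noise standard deviation for a target SNR. The SNR helper assumes the usual definition, 10 log10 of the ratio of mean signal power to noise variance, applied to the noise-free measurements. The paper does not spell out its definition of SNR_mes, so that part is an assumption, as are the function names.

```python
import math

def relative_error(x, x_hat):
    """||x - x_hat||_2 / ||x||_2, the left-hand side of (18)."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
    den = math.sqrt(sum(a * a for a in x))
    return num / den

def is_accurate(x, x_hat, tol=1e-1):
    """Accurate recovery as declared in (18)."""
    return relative_error(x, x_hat) <= tol

def noise_std_for_snr(clean_measurements, snr_db):
    """Noise standard deviation giving the requested SNR in dB (assumed definition)."""
    power = sum(v * v for v in clean_measurements) / len(clean_measurements)
    return math.sqrt(power / 10.0 ** (snr_db / 10.0))

print(is_accurate([3.0, 4.0], [3.0, 4.4]))   # relative error 0.08
print(noise_std_for_snr([1.0, 1.0, 1.0, 1.0], 30.0))
```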

During the experiment, M/N is increased gradually while we compute the accurate recovery rate. Firstly, the proposed method is used to improve OMP and WOMP. The improved algorithms are called IG-OMP and IG-WOMP, respectively. We illustrate the results in Fig. 1 and Fig. 2. Secondly, the proposed method is used to improve LARS and WLARS. The improved algorithms are called IG-LARS and IG-WLARS, respectively. We illustrate the results in Fig. 3 and Fig. 4.

Fig. 1. Comparison among OMP, IG-OMP, WOMP and IG-WOMP with the distribution described by f1

Fig. 2. Comparison among OMP, IG-OMP, WOMP and IG-WOMP with the distribution described by f2

Fig. 3. Comparison among LARS, IG-LARS, WLARS and IG-WLARS with the distribution described by f1

Fig. 4. Comparison among LARS, IG-LARS, WLARS and IG-WLARS with the distribution described by f2

To validate the advantage of the proposed algorithm in computational complexity, we compare an algorithm based on information geometry with a typical model-based algorithm for CS (i.e., neural networks for CS introduced in section 1) in terms of both the average running time per iteration and the accurate recovery rate. The experimental settings are the same as above. For each algorithm, we take the average over all conditions. Naturally, the improved algorithms are more complex than the original algorithms. In addition, OMP has a lower time complexity than LARS, and similarly IG-WOMP has a lower time complexity than IG-WLARS. Hence, IG-WLARS is the most complex of the algorithms (OMP, IG-OMP, WOMP, IG-WOMP, LARS, IG-LARS, WLARS and IG-WLARS), and the running times of the less complex algorithms add nothing to the validation. We therefore report the result of the most complex one (i.e., IG-WLARS) for the comparison in Table 2. For simplicity, Neural Networks for CS is abbreviated as NNCS.

Table 2. Results

3.2 Application Examples

An image is a typical signal whose support follows a specific distribution. Here, x is an imaging signal in the DCT domain. The image is divided into blocks, so that N is 32×32, and M is set to 200.

The results are shown numerically in Table 3 and Table 4, and visually in Fig. 5 and Fig. 6.
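The block setup can be sketched as follows: each 32×32 block is transformed with a 2-D DCT and flattened into a vector of length N = 1024, which is then measured with M = 200 random projections. The orthonormal DCT-II, the Gaussian measurement matrix, the random stand-in block and the function names are assumptions of this sketch; the paper does not specify these details.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ C.T is the identity."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def block_to_vector(block):
    """2-D DCT of a square block, flattened to a vector x."""
    c = dct_matrix(block.shape[0])
    return (c @ block @ c.T).reshape(-1)

def vector_to_block(x, size):
    """Inverse of block_to_vector."""
    c = dct_matrix(size)
    return c.T @ x.reshape(size, size) @ c

rng = np.random.default_rng(0)
block = rng.random((32, 32))                  # stand-in for one image block
x = block_to_vector(block)                    # N = 32 * 32 = 1024
phi = rng.standard_normal((200, x.size)) / np.sqrt(200)   # M = 200
y = phi @ x
print(x.shape, y.shape)
```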

Table 3. Results

a The definition of the Peak-Signal-to-Noise Ratio (PSNR) is given in [63].

Table 4. Results

Fig. 5. (a) Original image (b) IG-WOMP (c) WOMP (d) IG-OMP (e) OMP

Fig. 6. (a) Original image (b) IG-WLARS (c) WLARS (d) IG-LARS (e) LARS

To validate the advantage of the proposed algorithm in computational complexity, we compare the most complex algorithm based on information geometry in the paper (i.e., IG-WLARS) with a typical model-based algorithm for CS (i.e., NNCS) in terms of both the average running time and the recovery quality of the images (i.e., the average PSNR over all images). The results are listed in Table 5.

Table 5. Results

3.3 Discussion

Firstly, we discuss Fig. 1, Fig. 2, Fig. 3, and Fig. 4. S-sparse means that x has S nonzero entries. The results of two types of decoders are shown: original decoders (OMP, WOMP, LARS, and WLARS) and improved decoders (IG-OMP, IG-WOMP, IG-LARS, and IG-WLARS). Under different sparsities, the algorithms improved by information geometry consistently outperform the original algorithms.

Secondly, we discuss the results in Fig. 5, Fig. 6, Table 3, and Table 4. As shown there, the recovery is applied to different standard test images (fruits, boat, camera man). In Table 3 and Table 4, a higher PSNR indicates a higher-quality reconstruction. In Fig. 5 and Fig. 6, the reconstructed images show that the algorithms improved by the proposed method outperform the original algorithms on real data.
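For reference, the commonly used form of PSNR is 10 log10(peak^2 / MSE). The sketch below uses that form with a peak value of 255 for 8-bit images. The paper defers its exact definition to [63], so both the formula variant and the peak value are assumptions here.

```python
import math

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0.0:
        return math.inf          # identical images
    return 10.0 * math.log10(peak * peak / mse)

print(psnr([10.0, 20.0, 30.0, 40.0], [11.0, 21.0, 31.0, 41.0]))
```

Pixels are passed as flat lists; a 2-D image would be flattened first.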

Thirdly, we discuss the results in Table 2 and Table 5. In Table 2, the running time of IG-WLARS is lower than that of NNCS, while the accurate recovery rates are close. In Table 5, the running time of IG-WLARS is lower than that of NNCS, while the recovery quality of IG-WLARS is better. The running times of these algorithms differ sharply, while the recovery qualities are very close. The results support the conclusion that the proposed algorithm has a lower computational complexity while keeping a comparable recovery accuracy.

4. Conclusion

Using the theory of information geometry, we propose an adaptive strategy based on the current estimated signal in each iteration of the recovery. We improve the performance of recovery algorithms through this adaptive strategy for exploiting the prior knowledge about the index distribution. Simulations are presented to validate the results. Finally, we show the application of the model to images.

Acknowledgement

This research was funded by the Beijing Science and Technology Planning Program of China (Z171100004717001), Beijing Natural Science Foundation (4172002), Natural Science Foundation of China (61701009), Conventional Project of Promoting of the Connotation Developing of Colleges and Universities (5112011030), and Natural Science Foundation of China (11772063).

References

  1. J. Zhou, J. Ai, Z. Wang, S. Chen, and Q. Wei, "Discovering attractive segments in the user-generated video streams," Lecture Notes in Computer Science, vol. 11642, pp. 236-250, 2019.
  2. X. Chai, Z. Xiaoyu, G. Zhihua, H. Daojun, and Y. Chen, "An image encryption algorithm based on chaotic system and compressive sensing," Signal Processing, vol. 148, pp. 124-144, July 2018. https://doi.org/10.1016/j.sigpro.2018.02.007
  3. Z. Zha, X. Zhang, Q. Wang, L. Tang, and X. Liu, "Group-based sparse representation for image compressive sensing reconstruction with non-convex regularization," Neurocomputing, vol. 296, pp. 55-63, June 2018. https://doi.org/10.1016/j.neucom.2018.03.027
  4. G. Han and B. Lin, "Optimal sampling and reconstruction of undersampled atomic force microscope images using compressive sensing," Ultramicroscopy, vol. 189, pp. 85-94, June 2018. https://doi.org/10.1016/j.ultramic.2018.03.019
  5. J. Wang, Q. Wang, and Yu. Hu, "Image Encryption Using Compressive Sensing and Detour Cylindrical Diffraction," IEEE Photonics Journal, vol. 10, no. 3, pp.1-14, June 2018.
  6. E. J. Candes and M. B. Wakin, "An introduction to compressive sampling," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21-30, Mar. 2008.
  7. D. L. Donoho, "Compressed sensing," IEEE Trans Inform Theory, vol. 52, no. 4, pp. 1289-1306, May 2006.
  8. E. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Transactions Inform. Theory, vol. 52, no. 2, pp. 489-509, Feb. 2006.
  9. S. Muthukrishnan, "Data Streams: Algorithms and Applications," Foundations & Trends in Theoretical Computer Science, p. 135, 2005.
  10. E. J. Candes and T. Tao, "Decoding by linear programming," IEEE Trans on information theory, vol. 51, no. 12, pp. 4203-4215, Dec. 2005.
  11. S. G. Mallat and Z. Zhang, "Matching pursuit with time-frequency dictionaries," IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397-3415, Dec. 1993.
  12. M. Figueiredo, D. Robert, and S. J. Wright, "Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems," IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 4, pp. 586-597, Dec. 2007. https://doi.org/10.1109/JSTSP.2007.910281
  13. S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit," Society for Industrial and Applied Mathematics, vol. 43, no. 1, pp. 129-159, 2001.
  14. I. Daubechies, M. Defrise, and C. D. Mol, "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint," Communications on Pure and Applied Mathematics, vol. 57, no. 11, pp. 1413-1457, Dec. 2004. https://doi.org/10.1002/cpa.20042
  15. H. Zayyani, M. Babaie-Zadeh, and C. Jutten, "Bayesian pursuit algorithm for sparse representation," in Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009.
  16. D. Baron, S. Sarvotham, and R. G. Baraniuk, "Bayesian Compressive Sensing Via Belief Propagation," IEEE Transactions on Signal Processing, vol. 58, no. 1, pp. 269-280, Jan. 2010. https://doi.org/10.1109/TSP.2009.2027773
  17. M. W. Seeger, "Bayesian inference and optimal design for the sparse linear model," Journal of Machine Learning Research, vol. 9, pp. 759-813, May 2008.
  18. J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans on information theory, vol. 53, no. 12, pp. 4655-4666, Dec. 2007. https://doi.org/10.1109/TIT.2007.909108
  19. D. L. Donoho, Y. Tsaig, and I. Drori, "Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit," Statistics Technical Report Aerospace Corp., vol. 58, no 2, pp. 1094-1211, Feb. 2012.
  20. D. Needell and R. Vershynin, "Uniform Uncertainty Principle and Signal Recovery via Regularized Orthogonal Matching Pursuit," Foundation of Computational Mathematics, vol. 9, pp. 317-334, Jan. 2009. https://doi.org/10.1007/s10208-008-9031-3
  21. D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301-321, May 2009. https://doi.org/10.1016/j.acha.2008.07.002
  22. T. Blumensath and M. E. Davies, "Iterative hard thresholding for compressed sensing," Applied and Computational Harmonic Analysis, vol. 27, pp. 265-274, Feb. 2009. https://doi.org/10.1016/j.acha.2009.04.002
  23. R. Garg and R. Khandekar, "Gradient descent with sparsification: an iterative algorithm for sparse recovery with restricted isometry property," in Proc. of the 26th International Conference on Machine Learning, pp. 337-344, 2009.
  24. J. Wang, S. Kwon, and B. Shim, "Generalized orthogonal matching pursuit," IEEE Transactions on Signal Processing, vol. 60, pp. 6202-6216, Dec. 2012. https://doi.org/10.1109/TSP.2012.2218810
  25. R. Tibshirani, "Regression Shrinkage and Selection via the Lasso," Journal of the Royal Statistical Society, vol. 73, no. 3, pp. 273-282, 2011. https://doi.org/10.1111/j.1467-9868.2011.00771.x
  26. W. Jiao, L. Fang, and J. Licheng, "Reconstruction of images from compressive sensing based on the stagewise fast LASSO," in Proc. of SPIE, vol. 7498, 2009.
  27. L. Lian, A. Liu, and V. K. N. Lau, "Weighted LASSO for Sparse Recovery with Statistical Prior Support Information," IEEE Transactions on Signal Processing, vol. 66, no. 6, pp. 1607-1618, 2018. https://doi.org/10.1109/tsp.2018.2791949
  28. C. Y. Yau and T. S. Hui, "LARS-type algorithm for group lasso," Statistics and Computing, vol. 27, pp. 1041-1048, 2017.
  29. B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least Angle Regression," The Annals of Statistics, vol. 32, no. 2, pp. 407-499, 2004. https://doi.org/10.1214/009053604000000067
  30. S. S. Keerthi and S. Shevade, "A Fast Tracking Algorithm for Generalized LARS/LASSO," IEEE Transactions on Neural Networks, vol. 18, no. 6, pp. 1826-1830, 2007. https://doi.org/10.1109/TNN.2007.900229
  31. M. A. Khajehnejad, W. Xu, A. Avestimehr, and B. Hassibi, "Analyzing Weighted l1 Minimization for Sparse Recovery with Nonuniform Sparse Models," IEEE Transactions on Signal Processing, vol. 59, pp.1985-2001, Jan. 2011. https://doi.org/10.1109/TSP.2011.2107904
  32. Z. Wang, K. Chen, M. Zhang, P. He, Y. Wang, P. Zhu, and Y. Yang. "Multi-scale aggregation network for temporal action proposals," Pattern Recognition Letters, vol. 122, no. 1, pp. 60-65, 2019.
  33. W. Lu and N. Vaswani, "Modified compressive sensing for real-time dynamic MR imaging," Proc. IEEE Int. Conf. Image Process, 2009.
  34. S. D. Babacan, R. Molina, and A. K. Katsaggelos, "Bayesian compressive sensing using Laplace priors," IEEE Transactions Image Process., vol. 19, no. 1, pp. 53-63, Jan. 2010. https://doi.org/10.1109/TIP.2009.2032894
  35. R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde, "Model-based compressive sensing," IEEE Transactions Inf. Theory, vol. 56, no. 4, pp. 1982-2001, 2010. https://doi.org/10.1109/TIT.2010.2040894
  36. S. Som and P. Schniter, "Compressive imaging using approximate message passing and a Markov-tree prior," IEEE Transactions Signal Process, vol. 60, no. 7, pp. 3439-3448, 2012. https://doi.org/10.1109/TSP.2012.2191780
  37. D. Merhej, C. Diab, M. Khalil, and R. Prost, "Embedding prior knowledge within compressed sensing by neural networks," IEEE Transactions Neural Netw., vol. 22, no. 10, pp. 1638-1649, Oct. 2011. https://doi.org/10.1109/TNN.2011.2164810
  38. H. Q. Bui, C. N. H. La, and M. N. Da, "A fast tree-based algorithm for Compressed Sensing with sparse-tree prior," Signal Processing, vol. 108, pp. 628-641, 2015. https://doi.org/10.1016/j.sigpro.2014.10.026
  39. Z. Zhang, X. Ma, and Y. Yang, "Bounds on the number of hidden neurons in three-layer binary neural networks," Neural Networks, vol. 16, no. 7, pp. 995-1002, 2003. https://doi.org/10.1016/S0893-6080(03)00006-6
  40. N. Vaswani and W. Lu, "Modified-CS: Modifying compressive sensing for problems with partially known support," IEEE Transactions Signal Process., vol. 58, no. 9, pp. 4595-4607, Sep. 2010. https://doi.org/10.1109/TSP.2010.2051150
  41. N. Vaswani, "Kalman filtered compressed sensing," in Proc. of IEEE International Conference Image Process, 2008.
  42. M. A. Khajehnejad, W. Xu, A. S. Avestimehr, and B. Hassibi, "Analyzing Weighted Minimization for Sparse Recovery With Nonuniform Sparse Models," IEEE Transactions Signal Process, vol. 59, no. 5, pp. 1985-2001, May 2011. https://doi.org/10.1109/TSP.2011.2107904
  43. C. La and M. Do, "Tree-based Orthogonal Matching Pursuit algorithm for signal reconstruction," in Proc. of IEEE International Conference on Image Processing, pp. 1277-1280, May 2006.
  44. D. Mateo, J. Mauricio, R. Felipe, and V. Mauricio, "Compressed sensing of data with a known distribution," Applied and Computational Harmonic Analysis, vol. 45, no. 3, pp. 486-504, Nov. 2018. https://doi.org/10.1016/j.acha.2016.12.001
  45. D. Escoda, L. Granai, and P. Vandergheynst, "On the use of a priori information for sparse signal approximations," IEEE Transactions Signal Processing, vol. 54, no. 9, pp. 3468-3482, Sep. 2006. https://doi.org/10.1109/TSP.2006.879306
  46. N. Ay, J. Jost, H. V. Le, and L. Schwachhofer. Information Geometry, Springer, 2017.
  47. R.E. Kass and P.W. Vos, Geometrical Foundations of Asymptotic Inference, New York, USA: Wiley-Interscience, 1997.
  48. S. Amari, "Information geometry of the EM and em algorithms for neural networks," Neural Networks, vol. 8, no. 9, pp. 1379-1408, Dec. 1995. https://doi.org/10.1016/0893-6080(95)00003-8
  49. D. Meng and R. Liu, "Information geometry-Geometric methods for Computational Neuroscience," Acta Biophysica Sinica, vol. 15, pp. 243-248, May 1999. https://doi.org/10.3321/j.issn:1000-6737.1999.02.001
  50. M. Hu, Y. Yang, F. Shen, N. Xie, R. Hong, and H. T. Shen, "Collective Reconstructive Embeddings for Cross-Modal Hashing," IEEE Transactions on Image Processing, vol. 28, no. 6, pp. 2770-2784, 2019. https://doi.org/10.1109/tip.2018.2890144
  51. Z. Yang and J. Laaksonen, "Principal whitened gradient for information geometry," Neural Networks, vol. 21, pp. 232-240, Nov. 2008. https://doi.org/10.1016/j.neunet.2007.12.016
  52. S. Amari, K. Kurata, and H. Nagaoka, "Information geometry of Boltzmann machines," IEEE Transactions on Neural Networks, vol. 3, no. 2, pp. 260-271, Jan. 1992.
  53. S. Amari, "Natural gradient works efficiently in learning," Neural Computation, vol. 10, pp. 251-276, Sep. 1998. https://doi.org/10.1162/089976698300017746
  54. S. Amari, "Fisher information under restriction of Shannon information in multi-terminal situations," Annals of the Institute of Statistical Mathematics, vol. 41, pp. 623-648, Jan. 1989. https://doi.org/10.1007/BF00057730
  55. L. Campbell, "The relation between information theory and the differential geometry approach to statistics," Information Sciences, vol. 35, no. 3, pp. 199-210, Feb. 1985. https://doi.org/10.1016/0020-0255(85)90050-7
  56. S. Amari, "Information geometry on hierarchy of probability distributions," IEEE Transactions on Information Theory, vol. 47, no. 5, pp. 1701-1711, Mar. 2001.
  57. M. Wang, C. B. Xiao, and Z. H. Ning, "Neural Networks for Compressed Sensing Based on Information Geometry," Circuits Systems and Signal Process, vol. 38, no. 2, pp. 569-589, Feb. 2019. https://doi.org/10.1007/s00034-018-0869-6
  58. M. Hu, Y. Yang, F. Shen, N. Xie, R. Hong, and H. T. Shen, "Collective Reconstructive Embeddings for Cross-Modal Hashing," IEEE Transactions on Image Processing, vol. 28, no. 6, pp. 2770-2784, 2019. https://doi.org/10.1109/tip.2018.2890144
  59. S. He, B. Wang, Z. Wang, Y. Yang, F. Shen, Z. Huang, and H. T. Shen, "Bidirectional Discrete Matrix Factorization Hashing for Image Search," IEEE Transactions on Cybernetics, vol. 5, no. 9, pp. 4157-4168, 2020.
  60. B. R. Frieden, Science from Fisher Information: A Unification, Cambridge Univ. Press, 2004.
  61. F. Nielsen and V. Garcia, "Statistical exponential families: A digest with flash cards," arXiv.org:0911.4863, 2009.
  62. M. Menendez, D. Morales, L. Pardo, and M. Salicrij, "Statistical tests based on geodesic distances," Applied Mathematics Letters, vol. 8, no. 1, pp. 65-69, Jun. 1995. https://doi.org/10.1016/0893-9659(94)00112-P
  63. H. Quan and G. Mohammed, "The accuracy of PSNR in predicting video quality for different video scenes and frame rates," Telecommunication Systems, vol. 49, pp. 35-48, Feb. 2012. https://doi.org/10.1007/s11235-010-9351-x