1. Introduction
Compressed Sensing (CS) is a signal processing technique that recovers sparse signals of interest from far fewer samples than conventional sampling requires. Driven by the rapid growth of multimedia, it has been widely used in image reconstruction [1]. Chai et al. proposed an image encryption algorithm based on a memristive chaotic system, elementary cellular automata, and CS [2]. Zha et al. proposed a group-based sparse representation method with non-convex regularization for image CS reconstruction [3]. CS has also been applied to the reconstruction of undersampled atomic force microscope images [4]. In 2018, Wang et al. proposed an image encryption method integrating CS and detour cylindrical diffraction [5].
CS acquires and reconstructs a signal \(x \in R^{N}\) by finding a solution to the underdetermined linear system [6-8]
\(y=\Phi x+e\) (1)
where \(y \in R^{M}\) is the compressed representation of x, \(\Phi=(\Phi_{ij})_{M \times N}\) is the random projection matrix, \(e \in R^{M}\) represents the noise, and \(M \ll N\). The signal x is sparse, either inherently or in some transform domain: most of its entries are zero, or there exists an orthogonal transform matrix \(\psi\) such that most entries of \(\psi^{-1} x\) are zero. An estimate of x can be obtained as the solution to the \(l_0\)-minimization problem P0, \(\min_{x}\|x\|_{l_0}\) s.t. \(y=\Phi x\) [9], whose objective is discontinuous and which is NP-hard. Under the restricted isometry property [10], P0 is therefore usually relaxed to
\(\text { P1 } \quad \hat{x}=\underset{x \in R^{N}}{\arg \min }\|x\|_{l_{1}} \quad \text { s.t. } y=\Phi x\) (2)
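To make the combinatorial cost of P0 concrete, the following Python (NumPy) sketch solves P0 by exhaustive search on a toy, noiseless instance of (1). The sizes N = 10, M = 6, S = 2 and the helper name `solve_p0_bruteforce` are illustrative assumptions, not part of the method in this paper. The search visits up to \(\sum_{s \le S}\binom{N}{s}\) supports, which is why P0 is impractical at realistic N and the relaxation P1 is used instead.

```python
import itertools

import numpy as np


def solve_p0_bruteforce(Phi, y, max_sparsity, tol=1e-8):
    """Exhaustive search for the sparsest x with Phi @ x = y (problem P0).

    Supports are tried in order of increasing size; the first support whose
    least-squares fit reproduces y (relative residual <= tol) is returned.
    """
    M, N = Phi.shape
    y_norm = np.linalg.norm(y)
    for s in range(1, max_sparsity + 1):
        for support in itertools.combinations(range(N), s):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(Phi[:, cols], y, rcond=None)
            if np.linalg.norm(Phi[:, cols] @ coef - y) <= tol * y_norm:
                x_hat = np.zeros(N)
                x_hat[cols] = coef
                return x_hat
    return None  # no support of size <= max_sparsity explains y


rng = np.random.default_rng(0)
N, M, S = 10, 6, 2
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # random projection matrix
x = np.zeros(N)
x[[2, 7]] = [1.5, -0.8]                           # S-sparse signal
y = Phi @ x                                       # noiseless case of (1)

x_hat = solve_p0_bruteforce(Phi, y, max_sparsity=S)
print(np.allclose(x_hat, x, atol=1e-8))           # True
```

With a generic Gaussian Φ and M ≥ 2S, the S-sparse solution is unique, so the first exact fit found is the true support.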
Many algorithms have been developed to solve P0 and P1. These recovery frameworks fall into three categories: greedy algorithms [11], convex optimization algorithms [12-14], and statistical algorithms [15-17].
Because of their lower computational complexity, greedy algorithms are widely used in CS. A typical greedy algorithm is Orthogonal Matching Pursuit (OMP) [18]. OMP can accommodate various scenarios, and with high probability it exactly recovers a signal with S nonzero entries in dimension N from O(S ln N) random linear measurements. OMP has therefore been studied extensively, and many related greedy and iterative algorithms have been proposed, such as Stagewise Orthogonal Matching Pursuit (StOMP) [19], Regularized Orthogonal Matching Pursuit (ROMP) [20], Compressed Sampling Matching Pursuit (CoSaMP) [21], Iterative Hard Thresholding (IHT) [22], Gradient Descent with Sparsification (GraDeS) [23], and Generalized Orthogonal Matching Pursuit (GOMP) [24].
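The basic OMP iteration can be sketched as follows. This is a minimal textbook version in Python with NumPy, assuming the sparsity S is known and used as the stopping rule together with a residual threshold; the function name `omp` and the toy sizes are our own choices, not details taken from [18].

```python
import numpy as np


def omp(Phi, y, sparsity, res_min=1e-10):
    """Orthogonal Matching Pursuit: add one index to the support per iteration."""
    M, N = Phi.shape
    support = []
    r = y.copy()                      # residual r^0 = y
    coef = np.zeros(0)
    for _ in range(sparsity):
        corr = np.abs(Phi.T @ r)      # correlation of every column with the residual
        corr[support] = -np.inf       # never re-select an index
        support.append(int(np.argmax(corr)))
        # Orthogonal projection: least squares on all selected columns
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        r = y - Phi[:, support] @ coef
        if np.linalg.norm(r) <= res_min:
            break
    x_hat = np.zeros(N)
    x_hat[support] = coef
    return x_hat


rng = np.random.default_rng(1)
N, M, S = 256, 128, 5
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x = np.zeros(N)
x[rng.choice(N, S, replace=False)] = rng.choice([-1.0, 1.0], S) * (1 + rng.random(S))
y = Phi @ x
x_hat = omp(Phi, y, S)
print(np.linalg.norm(x_hat - x) / np.linalg.norm(x))   # relative recovery error
```

The least-squares step is what distinguishes OMP from plain matching pursuit: the residual is always orthogonal to the selected columns, so an index is never chosen twice.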
A well-known convex optimization algorithm is the Least Absolute Shrinkage and Selection Operator (LASSO) [25]
\(\begin{aligned} &\tilde{x}=\underset{x \in R^{N}}{\arg \min } \frac{1}{2}\|y-\Phi x\|_{l_{2}}^{2}\\ &\text { s.t. } \quad\|x\|_{l_{1}} \leq t \end{aligned}\) (3)
where \(t \geq 0\) is a tuning parameter. A stage-wise fast LASSO algorithm for image reconstruction from CS optimizes an insensitive Huber objective function to obtain a decision function [26]. Lian et al. [27] exploit a multilevel prior support information model and incorporate it into the LASSO through a weighted \(l_1\)-norm penalty function. Least Angle Regression (LARS) is an algorithm for fitting linear regression models to high-dimensional data [29], and it efficiently computes the solution paths of the LASSO [28]. In 2007, Keerthi and Shevade [30] applied a fast tracking algorithm for LARS to sparse kernel logistic regression.
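LARS traces the whole LASSO solution path and is too long to sketch here. As a stand-in, the following Python sketch solves the penalized form of (3), \(\min_x \frac{1}{2}\|y-\Phi x\|_{l_2}^2+\mu\|x\|_{l_1}\), by iterative soft thresholding (ISTA). ISTA is a different algorithm from LARS; by convex duality, the constrained form (3) and the penalized form share solutions under a suitable correspondence between t and μ. The penalty μ, the step size, and the iteration count are illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def lasso_ista(Phi, y, mu, n_iter=5000):
    """ISTA for min_x 0.5*||y - Phi x||_2^2 + mu*||x||_1."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2   # 1/L, L = largest eigenvalue of Phi^T Phi
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)            # gradient of the quadratic term
        x = soft_threshold(x - step * grad, step * mu)
    return x


rng = np.random.default_rng(2)
N, M, S = 200, 80, 6
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x = np.zeros(N)
x[rng.choice(N, S, replace=False)] = rng.choice([-1.0, 1.0], S) * (1 + rng.random(S))
y = Phi @ x
x_hat = lasso_ista(Phi, y, mu=1e-3)
print(np.linalg.norm(x_hat - x) / np.linalg.norm(x))   # relative recovery error
```

A small μ keeps the shrinkage bias low in this noiseless example; with noisy measurements μ would be tuned to the noise level.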
In many applications, prior knowledge about the representation of a natural image in the domain of some linear transform can be obtained [31,32]. For example, some regions of magnetic resonance images are known to be more likely to be bright than others [33]. In statistical algorithms, adding such prior knowledge to CS can improve the quality of reconstruction, and many studies of sparse recovery exploit prior knowledge to improve recovery to varying degrees. We next review typical statistical algorithms. Babacan et al. [34] proposed a greedy algorithm using Laplace priors within a Bayesian framework. The main idea of model-based CS is to introduce an extra structure that captures the distribution of the original signal [35]. In [36], a method based on approximate message passing and a Markov-tree prior was proposed. Furthermore, neural networks have been trained to learn the prior knowledge for signal recovery [37]. However, the extra model structure makes these methods computationally expensive. Tree-based algorithms traverse the whole tree in every iteration; to the best of our knowledge, the average complexity per iteration of the fastest tree-based CS algorithm is \(O(MN^{2})\) [38]. Likewise, an algorithm based on neural networks requires a forward pass from the bottom layer to the top layer of the network, and the lower bound on its average complexity per iteration is O(2N) for a single forward pass through the simplest architecture, a single hidden layer of \(2[(\alpha-2)/M+1]\) hidden neurons, where α is the size of the training set [39]. Instead of using a model, Modified-CS penalizes the selection of a wrong support [40]. In [41], a Kalman filter was used to model the relations between entries of the original signal, under the assumption that the support changes slowly.
However, the Kalman filter does not always use the actual covariances of the changes in the estimated signal. Furthermore, weighted \(l_1\) minimization provides better upper bounds on the reconstruction error when the weights are determined by prior knowledge of the probability that each entry is nonzero [42,43]. In 2018, the weights were chosen to minimize the expected statistical dimension of the descent cones of a weighted cross-polytope [44]. Moreover, such optimal weights have been used to improve greedy and convex optimization algorithms [45], yielding, for example, Weighted Orthogonal Matching Pursuit (WOMP) and Weighted LARS (WLARS). These algorithms combine greedy or convex optimization algorithms with statistics through the weights, and in each iteration they generate a new estimated signal from the results of the previous iterations. However, the optimal weights are fixed throughout the reconstruction and do not account for changes in the estimated signal. It is therefore more reasonable to adapt the way prior knowledge is exploited to the current estimated signal.
Information geometry [46] is a branch of mathematics that applies the techniques of differential geometry to probability theory and statistics. It has been widely applied, for example to the asymptotic theory of statistical inference [47], the expectation-maximization (EM) algorithm [48], the learning of neural networks [49-51], and many other problems [52-56]. Wang et al. proposed a model-based CS method with neural networks based on information geometry [57]. However, its computational complexity is high because the neural network acts as an extra structure.
To address the problems of the previous algorithms, we introduce information geometry to improve the performance of greedy and convex optimization algorithms without any extra structure. Image features can be reflected by a specific manifold structure [58]. Based on this fact, we evaluate candidate estimated signals using the Fisher distance [59] between the distribution over the estimated support set and the known distribution over the true support of the original signal. We then choose the recovered signal whose distribution is closest to the true one according to these evaluations. The strategy adapts to the support sets developed in different iterations, because the estimated distribution is recomputed from the current estimated support set in each iteration. Experiments are presented to validate the results.
2. Methods
2.1 Construction of Geometry Model
As discussed above, the true distribution of the indexes of the nonzero entries of x can be obtained statistically. This index distribution can be described by a specific Probability Density Function (PDF) \(p(k \mid \theta)\), where \(\theta=(\theta_{1}, \theta_{2}, \ldots, \theta_{N})'\).
As prior knowledge, the true parameter \(\theta^{1}\) of the original signals is known, while \(\theta^{2}\) denotes the parameter estimated in the model. In each iteration of the algorithm, we estimate the distribution by computing \(\theta^{2}\) over the set of candidate indexes, and we choose the recovered signal whose support has a distribution closer to the known distribution. The details are as follows.
A distribution from the exponential family [60] has the form
\(p(k \mid \theta)=\exp \{\theta \cdot k-\varphi(\theta)\}\) (4)
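where \(\varphi(\theta)\) is a function of θ (the log-partition function, which normalizes the distribution).
The position distribution is discrete and therefore naturally belongs to the exponential family. Suppose that index \(i \in\{0,1, \ldots, N\}\) occurs with frequency \(p(k=i \mid \theta)=p_{i}\), where \(\theta=(\theta_{1}, \theta_{2}, \ldots, \theta_{N})'\) and \(\theta_{i}\) denotes the i-th element of θ. The distribution can then be rewritten as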
\(p(\xi \mid \theta)=\exp \{\theta \cdot \xi-\varphi(\theta)\}\) (5)
where
\(\xi_{i}=\delta_{i}(k)=\left\{\begin{array}{l} 1, k=i \\ 0, k \neq i \end{array}\right.\) (6)
\(\theta_{i}=\log \frac{p_{i}}{p_{0}}\) (7)
then,
\(p_{0}=1-\sum_{i=1}^{N} p_{i}=1-p_{0} \sum_{i=1}^{N} e^{\theta_{i}}\) (8)
From (8), we obtain
\(\varphi(\theta)=\log \left(1+\sum_{i=1}^{N} e^{\theta_{i}}\right)\) (9)
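Equations (5) to (9) can be checked numerically. The following Python sketch, using an arbitrary probability vector chosen for illustration, maps \((p_0, p_1, \ldots, p_N)\) to natural parameters via (7), evaluates \(\varphi(\theta)\) via (9), and confirms that (5) reproduces each \(p_i\). The function names are hypothetical.

```python
import numpy as np


def to_natural(p):
    """Map p = (p_0, p_1, ..., p_N) to theta_i = log(p_i / p_0), i = 1..N (eq. 7)."""
    p = np.asarray(p, dtype=float)
    return np.log(p[1:] / p[0])


def log_partition(theta):
    """phi(theta) = log(1 + sum_i exp(theta_i)) (eq. 9)."""
    return np.log1p(np.exp(theta).sum())


def prob(i, theta):
    """p(k = i | theta) from (5); xi is the indicator vector (6), all zeros for i = 0."""
    xi = np.zeros_like(theta)
    if i > 0:
        xi[i - 1] = 1.0
    return np.exp(theta @ xi - log_partition(theta))


p = np.array([0.1, 0.2, 0.3, 0.4])   # p_0, p_1, p_2, p_3 (N = 3), illustrative values
theta = to_natural(p)
recovered = np.array([prob(i, theta) for i in range(len(p))])
print(np.allclose(recovered, p))                         # True
print(np.isclose(log_partition(theta), -np.log(p[0])))   # True: phi(theta) = -log p_0
```

Because every \(p_i \geq p_0\) in this example, the resulting θ is nonnegative and lies in the parameter space H of (10).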
A classic parameter space for this family of PDFs is
\(H=\left\{\theta \in R^{N} \mid \theta \geq 0\right\}\) (10)
The Fisher Information Matrix (FIM) is defined as
\(G(\theta)=\left[g_{i j}(\theta)\right]\) (11)
\(g_{i j}(\theta)=E\left[\frac{\partial \log p(k \mid \theta)}{\partial \theta_{i}} \cdot \frac{\partial \log p(k \mid \theta)}{\partial \theta_{j}}\right]\) (12)
where E denotes the expectation. The FIM measures the amount of information that an observation carries about the parameter, and it induces a distance called the Fisher distance [61]. The Fisher distance between two points \(\theta^{1}=\theta(t_{1})\) and \(\theta^{2}=\theta(t_{2})\) in H measures the dissimilarity between the associated distributions \(p(k \mid \theta^{1})\) and \(p(k \mid \theta^{2})\). It is the minimum length over all piecewise smooth paths \(\theta(t)\), \(t_{1} \leq t \leq t_{2}\), that join \(\theta^{1}\) and \(\theta^{2}\). That is
\(D_{F}\left(\theta^{1}, \theta^{2}\right):=\min _{\left\{\theta(t): \theta\left(t_{1}\right)=\theta^{1}, \theta\left(t_{2}\right)=\theta^{2}\right\}} \int_{t_{1}}^{t_{2}} \sqrt{\left(\frac{d \theta}{d t}\right)^{\prime} G(\theta) \frac{d \theta}{d t}} d t\) (13)
where t is the parameter of the curve θ(t). A path attaining the minimum in (13) is a geodesic, and it satisfies the Euler-Lagrange equations
\(\frac{d^{2} \theta_{h}}{d t^{2}}+\sum_{q=1}^{N} \sum_{z=1}^{N}\left[\frac{1}{2} \sum_{l=1}^{N} g^{h l}\left(\frac{\partial g_{q l}}{\partial \theta_{z}}+\frac{\partial g_{z l}}{\partial \theta_{q}}-\frac{\partial g_{q z}}{\partial \theta_{l}}\right)\right] \frac{d \theta_{q}}{d t} \frac{d \theta_{z}}{d t}=0, \quad h=1,2, \ldots, N\) (14)
where \(g_{ql}\) denotes the entries of G(θ) in (11), and the superscripted \(g^{hl}\) denotes the entries of its inverse \(G(\theta)^{-1}\).
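A small numerical check of (11) to (13) for this family is sketched below. For the exponential family (5), the score is \(\xi_i - p_i\), so (12) evaluates to \(g_{ij} = p_i\delta_{ij} - p_i p_j\). For the distance (13), the sketch uses the well-known closed form of the Fisher-Rao distance between categorical distributions, \(D_F = 2\arccos\sum_{i=0}^{N}\sqrt{p_i q_i}\), instead of integrating the geodesic equations (14) numerically, and it checks that a straight segment in θ-coordinates is never shorter than this distance. The probability vectors are illustrative values inside H, and the function names are hypothetical.

```python
import numpy as np


def probs(theta):
    """Return (p_0, p_1, ..., p_N) for natural parameters theta (eqs. 7-9)."""
    e = np.concatenate(([1.0], np.exp(theta)))
    return e / e.sum()


def fisher_information(theta):
    """FIM (11)-(12) for this family: g_ij = p_i*delta_ij - p_i*p_j, i, j = 1..N."""
    p = probs(theta)[1:]
    return np.diag(p) - np.outer(p, p)


def fisher_information_by_expectation(theta):
    """The same FIM computed literally from (12) as E[score score^T] over k = 0..N."""
    P = probs(theta)
    N = len(theta)
    G = np.zeros((N, N))
    for k, pk in enumerate(P):
        xi = np.zeros(N)
        if k > 0:
            xi[k - 1] = 1.0
        score = xi - P[1:]            # d log p(k|theta) / d theta = xi - p
        G += pk * np.outer(score, score)
    return G


def fisher_distance(theta1, theta2):
    """Closed-form Fisher-Rao (geodesic) distance between two categorical distributions."""
    bc = np.sum(np.sqrt(probs(theta1) * probs(theta2)))   # Bhattacharyya coefficient
    return 2.0 * np.arccos(np.clip(bc, 0.0, 1.0))


def path_length(theta1, theta2, steps=2000):
    """Length (13) of the straight segment from theta1 to theta2 (an upper bound on D_F)."""
    d = theta2 - theta1
    ts = (np.arange(steps) + 0.5) / steps        # midpoint rule on t in [0, 1]
    speeds = [np.sqrt(d @ fisher_information(theta1 + t * d) @ d) for t in ts]
    return np.mean(speeds)


theta1 = np.log(np.array([0.2, 0.3, 0.4]) / 0.1)   # p = (0.1, 0.2, 0.3, 0.4)
theta2 = np.log(np.array([0.4, 0.3, 0.2]) / 0.1)   # p = (0.1, 0.4, 0.3, 0.2)
print(np.allclose(fisher_information(theta1), fisher_information_by_expectation(theta1)))  # True
print(fisher_distance(theta1, theta2), path_length(theta1, theta2))
```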
2.2 Support Selection with a Reward Term
The choice of support is central to recovery algorithms. The common approach is as follows. First, the correlation between each column of the measurement matrix and the residual is calculated. Second, the index (or indexes) with the highest correlation is added to the support. The correlation is thus the indicator that determines the support.
In this paper, we improve the traditional indicator with a reward term: if adding an index to the support decreases \(D_F\) between the estimated distribution and the known distribution, the reward is higher. The reward term is defined as
\(\operatorname{reward}(\tilde{A})=\frac{1}{\lambda\left(\theta^{1}, S\right)+D_{F}\left(\theta^{1}, \theta^{2}\right)}\) (15)
where \(\lambda(\theta^{1}, S)\) is a constant that offsets the influence of \(D_F\) on the indicator. It depends on the parameters and the sparsity S, and its choice is discussed in Section 3.
\(\theta^{2}\) is determined from the support \(\tilde{A}\) generated in each iteration. In an iteration, the support set \(\tilde{A}=\left\{k^{1}, k^{2}, \ldots, k^{j}\right\}\) has j elements, and \(\theta^{2}\) is obtained by Maximum Likelihood Estimation (MLE) [62]:
\(\theta^{2}=\underset{\theta}{\arg \max } \sum_{i=1}^{j} \log p\left(k^{i} \mid \theta\right)\) (16)
The reward term modifies the evaluation of the support in each iteration: each indicator is multiplied by its reward term. Because the reward is computed from the estimated support set of the current iteration, the strategy adapts to the current estimated signal throughout the recovery. Algorithm 1 shows the improved greedy algorithm, and Algorithm 2 shows the improved convex optimization algorithm.
Algorithm 1
Algorithm 2
fun() in (*) is defined by
\(\operatorname{fun}\left(r^{j-1}, \phi_{i}\right)=\begin{cases}\left\langle W r^{j-1}, \phi_{i}\right\rangle, & j \leq \text{thres}_{\min} \\ \left\langle W r^{j-1}, \phi_{i}\right\rangle \cdot \operatorname{reward}\left(A^{j-1} \cup \{k^{i}\}\right), & j>\text{thres}_{\min}\end{cases}\) (17)
where \(\langle \cdot, \cdot \rangle\) denotes the inner product in Euclidean space, W is a positive definite matrix, and \(\text{thres}_{\min}\) is a threshold. For unweighted algorithms W is the identity matrix, and for weighted algorithms W is a diagonal matrix.
In the algorithms above, most of the computation is spent on (15) to (17). In the j-th iteration, this computation is carried out for N-j+1 candidates. However, it does not involve any traversal of, or propagation through, a model with a complex architecture. We therefore infer that the proposed algorithm has lower computational complexity than the model-based algorithms introduced in Section 1.
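Since the listings of Algorithms 1 and 2 are not reproduced here, the following Python sketch shows one plausible way to wire (15) to (17) into OMP. It rests on several assumptions of our own: the index distribution is handled as a probability vector over the N positions, so \(D_F\) takes the closed form \(2\arccos\sum_i\sqrt{p_i q_i}\) of the categorical family; the MLE (16) of a support set is its empirical frequency; the magnitude of the inner product is used, as in standard OMP; W is passed as its diagonal w; and the names `ig_omp`, `reward`, `q_known`, `lam`, and the default `iter_max` are hypothetical.

```python
import numpy as np


def fisher_distance(p, q):
    """Closed-form Fisher-Rao distance between two probability vectors."""
    bc = np.sum(np.sqrt(p * q))                  # Bhattacharyya coefficient
    return 2.0 * np.arccos(np.clip(bc, 0.0, 1.0))


def mle_index_distribution(support, n):
    """Empirical (maximum-likelihood) index distribution of a support set, cf. (16)."""
    p_hat = np.zeros(n)
    np.add.at(p_hat, support, 1.0)
    return p_hat / len(support)


def reward(support, q_known, lam):
    """Reward term (15): 1 / (lambda + D_F(known, estimated))."""
    p_hat = mle_index_distribution(support, len(q_known))
    return 1.0 / (lam + fisher_distance(q_known, p_hat))


def ig_omp(Phi, y, q_known, lam, w=None, thres_min=30, iter_max=None, res_min=1e-5):
    """Hypothetical IG-OMP: OMP whose selection score follows fun() in (17)."""
    M, N = Phi.shape
    w = np.ones(M) if w is None else w           # diagonal of W; all ones = unweighted
    iter_max = M if iter_max is None else iter_max
    support, coef = [], np.zeros(0)
    r = y.copy()
    for j in range(1, iter_max + 1):
        scores = np.abs(Phi.T @ (w * r))         # |<W r^{j-1}, phi_i>| for every column
        if j > thres_min:                        # second case of (17): apply the reward
            for i in range(N):
                if i not in support:
                    scores[i] *= reward(support + [i], q_known, lam)
        scores[support] = -np.inf                # never re-select an index
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        r = y - Phi[:, support] @ coef
        if np.linalg.norm(r) <= res_min * np.linalg.norm(y):
            break
    x_hat = np.zeros(N)
    x_hat[support] = coef
    return x_hat


# Toy usage: the known index distribution favours the first 16 positions.
rng = np.random.default_rng(4)
N, M, S = 64, 32, 4
q_known = np.full(N, 0.1 / 48)
q_known[:16] = 0.9 / 16
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x = np.zeros(N)
x[rng.choice(16, S, replace=False)] = rng.standard_normal(S)
y = Phi @ x
x_hat = ig_omp(Phi, y, q_known, lam=0.5, thres_min=2)
```

The reward only reorders candidates whose correlations are comparable: an index in a high-probability region pulls the estimated distribution toward the known one and so receives a larger multiplier.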
3. Experimental Results and Discussion
In general, analyzing an application allows prior knowledge about the specific distribution of the supports of the signal's sparse representation to be extracted. In this section, we conduct a series of experiments on simulated signals and image signals.
3.1 Simulation
We randomly choose two PDFs from Table 1 as the known index distributions and generate 1000 simulated signals from each of them. The amplitude of each nonzero entry is assumed to follow a standard normal distribution, and N is set to 300. We set \(\text{iter}_{\max}=8 \times M\), \(\text{res}_{\min}=10^{-5}\), and \(\text{thres}_{\min}=30\) (large enough for the statistics to be meaningful). λ is determined separately for each PDF and each sparsity, as shown in Table 1.
Table 1. Settings for PDF
ᵃFor simplicity, λ is set to the expectation of \(D_F\) over the samples minus 1.
The signals are acquired from noisy measurements with \(\mathrm{SNR}_{\mathrm{mes}}=30\) dB. Recovery is declared accurate when
\(\|x-\tilde{x}\|_{l_{2}} /\|x\|_{l_{2}} \leq 10^{-1}\) (18)
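The noisy measurements and the accurate-recovery criterion (18) can be sketched as follows. We assume here that \(\mathrm{SNR}_{\mathrm{mes}}\) is the ratio, in dB, of the energy of the noiseless measurements Φx to that of the noise e, which the text does not spell out; the helper names are hypothetical.

```python
import numpy as np


def add_noise_at_snr(clean, snr_db, rng):
    """Add white Gaussian noise so that 10*log10(||clean||^2 / ||e||^2) equals snr_db."""
    noise = rng.standard_normal(clean.shape)
    scale = np.linalg.norm(clean) / (np.linalg.norm(noise) * 10 ** (snr_db / 20))
    return clean + scale * noise


def is_accurate(x, x_hat, tol=1e-1):
    """Accurate-recovery test (18): relative l2 error at most 10^-1."""
    return np.linalg.norm(x - x_hat) / np.linalg.norm(x) <= tol


def accurate_recovery_rate(pairs):
    """Fraction of (x, x_hat) pairs that satisfy (18)."""
    return np.mean([is_accurate(x, x_hat) for x, x_hat in pairs])


rng = np.random.default_rng(3)
clean = rng.standard_normal(100)             # stands in for the noiseless measurements Phi @ x
noisy = add_noise_at_snr(clean, 30.0, rng)
e = noisy - clean
snr = 10 * np.log10(np.sum(clean ** 2) / np.sum(e ** 2))
print(round(snr, 6))                         # 30.0

x = np.array([1.0, 0.0, -2.0])
print(accurate_recovery_rate([(x, x + 0.01), (x, 0.5 * x)]))  # 0.5
```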
During the experiment, M/N is increased gradually, and the accurate recovery rate is computed at each value. First, the proposed method is used to improve OMP and WOMP; the improved algorithms are called IG-OMP and IG-WOMP, respectively, and the results are illustrated in Fig. 1 and Fig. 2. Second, the proposed method is used to improve LARS and WLARS; the improved algorithms are called IG-LARS and IG-WLARS, respectively, and the results are illustrated in Fig. 3 and Fig. 4.
Fig. 1. Comparison among OMP, IG-OMP, WOMP and IG-WOMP with the distribution described by f1
Fig. 2. Comparison among OMP, IG-OMP, WOMP and IG-WOMP with the distribution described by f2
Fig. 3. Comparison among LARS, IG-LARS, WLARS and IG-WLARS with the distribution described by f1
Fig. 4. Comparison among LARS, IG-LARS, WLARS and IG-WLARS with the distribution described by f2
To validate the lower computational complexity of the proposed approach, we compare an algorithm based on information geometry with a typical model-based CS algorithm, namely the neural networks for CS introduced in Section 1, in terms of both the average running time per iteration and the accurate recovery rate. The experimental settings are the same as above, and for each algorithm we average over all conditions. Naturally, the improved algorithms are more complex than the original ones. In addition, OMP has a lower time complexity than LARS, and similarly IG-WOMP has a lower time complexity than IG-WLARS. Hence, IG-WLARS is the most complex of the eight algorithms (OMP, IG-OMP, WOMP, IG-WOMP, LARS, IG-LARS, WLARS, and IG-WLARS), and the running times of the less complex algorithms add nothing to this validation. We therefore report the results of IG-WLARS in Table 2. For simplicity, Neural Networks for CS is abbreviated as NNCS.
Table 2. Results
3.2 Application Examples
Images are typical signals whose supports follow a specific distribution. Here, x is an image signal in the DCT domain. The image is divided into blocks of 32 × 32 pixels, so that N = 32 × 32 = 1024, and M is set to 200.
Tables 3 and 4 report the results numerically, and Figs. 5 and 6 show them visually.
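The block-DCT representation can be sketched as follows. This Python sketch assumes an orthonormal 2-D DCT-II applied to each 32 × 32 block (the text does not state the exact DCT variant or normalization) and builds the transform from an explicit matrix so that only NumPy is needed; flattening each block's coefficients gives a signal x with N = 1024. The function names and the synthetic test image are illustrative assumptions.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ v is the 1-D DCT of v."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C


def image_to_dct_blocks(img, b=32):
    """Split img (sides multiples of b) into b x b blocks; return each block's 2-D DCT, flattened."""
    C = dct_matrix(b)
    H, W = img.shape
    coeffs = []
    for r in range(0, H, b):
        for c in range(0, W, b):
            block = img[r:r + b, c:c + b]
            coeffs.append((C @ block @ C.T).ravel())   # x in R^N with N = b * b
    return np.array(coeffs)


def dct_blocks_to_image(coeffs, shape, b=32):
    """Inverse of image_to_dct_blocks (C is orthonormal, so its inverse is C^T)."""
    C = dct_matrix(b)
    H, W = shape
    img = np.zeros(shape)
    idx = 0
    for r in range(0, H, b):
        for c in range(0, W, b):
            img[r:r + b, c:c + b] = C.T @ coeffs[idx].reshape(b, b) @ C
            idx += 1
    return img


img = np.add.outer(np.linspace(0.0, 100.0, 64), np.linspace(0.0, 50.0, 64))  # smooth 64 x 64 image
coeffs = image_to_dct_blocks(img)            # four blocks, each a vector in R^1024
energy = np.sort(coeffs[0] ** 2)[::-1]
print(energy[:50].sum() / energy.sum())      # share of block energy in its 50 largest coefficients
```

For smooth image content most of the energy concentrates in a few low-frequency coefficients, which is what makes the DCT-domain x approximately sparse.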
Table 3. Results
ᵃThe Peak Signal-to-Noise Ratio (PSNR) is defined in [63].
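For reference, the usual PSNR definition for 8-bit images, \(10\log_{10}(\mathrm{peak}^2/\mathrm{MSE})\), is sketched below, assuming a peak value of 255; [63] remains the authority for the exact convention used in the tables. The function name is hypothetical.

```python
import numpy as np


def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10*log10(peak^2 / MSE)."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstruction, dtype=float)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)


ref = np.full((32, 32), 100.0)
print(psnr(ref, ref + 1.0))          # MSE = 1, so 10*log10(255^2), about 48.13 dB
```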
Table 4. Results
Fig. 5. (a) Original image (b) IG-WOMP (c) WOMP (d) IG-OMP (e) OMP
Fig. 6. (a) Original image (b) IG-WLARS (c) WLARS (d) IG-LARS (e) LARS
To validate the lower computational complexity of the proposed approach on images, we compare the most complex information-geometry-based algorithm in this paper (IG-WLARS) with a typical model-based CS algorithm (NNCS) in terms of both the average running time and the reconstruction quality (the average PSNR over all images). The results are listed in Table 5.
Table 5. Results
3.3 Discussion
First, consider Fig. 1, Fig. 2, Fig. 3, and Fig. 4, where S-sparse means that x has S nonzero entries. Two types of decoders are compared: the original decoders (OMP, WOMP, LARS, and WLARS) and the improved decoders (IG-OMP, IG-WOMP, IG-LARS, and IG-WLARS). Under all sparsities tested, the algorithms improved by information geometry consistently outperform the original decoders.
Second, consider Fig. 5, Fig. 6, Table 3, and Table 4, in which recovery is applied to different standard test images (fruits, boat, and cameraman). In Table 3 and Table 4, a higher PSNR indicates a higher-quality reconstruction. The reconstructed images in Fig. 5 and Fig. 6 show that the algorithms improved by the proposed method outperform the original algorithms on real data.
Third, consider Table 2 and Table 5. In Table 2, IG-WLARS runs faster than NNCS, while their accurate recovery rates are close. In Table 5, IG-WLARS runs faster than NNCS and achieves better recovery quality. Overall, the running times of the two algorithms differ sharply, while their recovery qualities are close. These results support the conclusion that the proposed algorithm has lower computational complexity while keeping commensurate recovery accuracy.
4. Conclusion
Based on the theory of information geometry, we propose an adaptive strategy that depends on the current estimated signal in each iteration of the recovery. Through this strategy, we improve the performance of recovery algorithms by exploiting prior knowledge about the index distribution. Simulations are presented to validate the results, and we also demonstrate the application of the method to images.
Acknowledgement
This research was funded by the Beijing Science and Technology Planning Program of China (Z171100004717001), Beijing Natural Science Foundation (4172002), Natural Science Foundation of China (61701009), Conventional Project of Promoting of the Connotation Developing of Colleges and Universities (5112011030), and Natural Science Foundation of China (11772063).
References
- J. Zhou, J. Ai, Z. Wang, S. Chen, and Q. Wei, "Discovering attractive segments in the user-generated video streams," Lecture Notes in Computer Science, vol. 11642, pp. 236-250, 2019.
- X. Chai, X. Zheng, Z. Gan, D. Han, and Y. Chen, "An image encryption algorithm based on chaotic system and compressive sensing," Signal Processing, vol. 148, pp. 124-144, July 2018. https://doi.org/10.1016/j.sigpro.2018.02.007
- Z. Zha, X. Zhang, Q. Wang, L. Tang, and X. Liu, "Group-based sparse representation for image compressive sensing reconstruction with non-convex regularization," Neurocomputing, vol. 296, pp. 55-63, June 2018. https://doi.org/10.1016/j.neucom.2018.03.027
- G. Han and B. Lin, "Optimal sampling and reconstruction of undersampled atomic force microscope images using compressive sensing," Ultramicroscopy, vol. 189, pp. 85-94, June 2018. https://doi.org/10.1016/j.ultramic.2018.03.019
- J. Wang, Q. Wang, and Yu. Hu, "Image Encryption Using Compressive Sensing and Detour Cylindrical Diffraction," IEEE Photonics Journal, vol. 10, no. 3, pp.1-14, June 2018.
- E. J. Candes and M. B. Wakin, "An introduction to compressive sampling," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21-30, Mar. 2008.
- D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289-1306, May 2006.
- E. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489-509, Feb. 2006.
- S. Muthukrishnan, "Data Streams: Algorithms and Applications," Foundations & Trends in Theoretical Computer Science, p. 135, 2005.
- E. J. Candes and T. Tao, "Decoding by linear programming," IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203-4215, Dec. 2005.
- S. G. Mallat and Z. Zhang, "Matching pursuit with time-frequency dictionaries," IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397-3415, Dec. 1993.
- M. Figueiredo, D. Robert, and S. J. Wright, "Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems," IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 4, pp. 586-597, Dec. 2007. https://doi.org/10.1109/JSTSP.2007.910281
- S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit," SIAM Review, vol. 43, no. 1, pp. 129-159, 2001.
- I. Daubechies, M. Defrise, and C. D. Mol, "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint," Communications on Pure and Applied Mathematics, vol. 57, no. 11, pp. 1413-1457, Dec. 2004. https://doi.org/10.1002/cpa.20042
- H. Zayyani, M. Babaie-Zadeh, and C. Jutten, "Bayesian pursuit algorithm for sparse representation," in Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009.
- D. Baron, S. Sarvotham, and R. G. Baraniuk, "Bayesian Compressive Sensing Via Belief Propagation," IEEE Transactions on Signal Processing, vol. 58, no. 1, pp. 269-280, Jan. 2010. https://doi.org/10.1109/TSP.2009.2027773
- M. W. Seeger, "Bayesian inference and optimal design for the sparse linear model," Journal of Machine Learning Research, vol. 9, pp. 759-813, May 2008.
- J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Transactions on Information Theory, vol. 53, no. 12, pp. 4655-4666, Dec. 2007. https://doi.org/10.1109/TIT.2007.909108
- D. L. Donoho, Y. Tsaig, and I. Drori, "Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit," IEEE Transactions on Information Theory, vol. 58, no. 2, pp. 1094-1121, Feb. 2012.
- D. Needell and R. Vershynin, "Uniform Uncertainty Principle and Signal Recovery via Regularized Orthogonal Matching Pursuit," Foundations of Computational Mathematics, vol. 9, pp. 317-334, Jan. 2009. https://doi.org/10.1007/s10208-008-9031-3
- D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301-321, May 2009. https://doi.org/10.1016/j.acha.2008.07.002
- T. Blumensath and M. E. Davies, "Iterative hard thresholding for compressed sensing," Applied and Computational Harmonic Analysis, vol. 27, pp. 265-274, Feb. 2009. https://doi.org/10.1016/j.acha.2009.04.002
- R. Garg and R. Khandekar, "Gradient descent with sparsification: an iterative algorithm for sparse recovery with restricted isometry property," in Proc. of the 26th International Conference on Machine Learning, pp. 337-344, 2009.
- J. Wang, S. Kwon, and B. Shim, "Generalized orthogonal matching pursuit," IEEE Transactions on Signal Processing, vol. 60, pp. 6202-6216, Dec. 2012. https://doi.org/10.1109/TSP.2012.2218810
- R. Tibshirani, "Regression shrinkage and selection via the lasso: a retrospective," Journal of the Royal Statistical Society: Series B, vol. 73, no. 3, pp. 273-282, 2011. https://doi.org/10.1111/j.1467-9868.2011.00771.x
- W. Jiao, L. Fang, and J. Licheng, "Reconstruction of images from compressive sensing based on the stagewise fast LASSO," in Proc. of SPIE, vol. 7498, 2009.
- L. Lian, A. Liu, and V. K. N. Lau, "Weighted LASSO for Sparse Recovery with Statistical Prior Support Information," IEEE Transactions on Signal Processing, vol. 66, no. 6, pp. 1607-1618, 2018. https://doi.org/10.1109/tsp.2018.2791949
- C. Y. Yau and T. S. Hui, "LARS-type algorithm for group lasso," Statistics and Computing, vol. 27, pp. 1041-1048, 2017.
- B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least Angle Regression," The Annals of Statistics, vol. 32, no. 2, pp. 407-499, 2004. https://doi.org/10.1214/009053604000000067
- S. S. Keerthi and S. Shevade, "A Fast Tracking Algorithm for Generalized LARS/LASSO," IEEE Transactions on Neural Networks, vol. 18, no. 6, pp. 1826-1830, 2007. https://doi.org/10.1109/TNN.2007.900229
- M. A. Khajehnejad, W. Xu, A. Avestimehr, and B. Hassibi, "Analyzing Weighted l1 Minimization for Sparse Recovery with Nonuniform Sparse Models," IEEE Transactions on Signal Processing, vol. 59, pp.1985-2001, Jan. 2011. https://doi.org/10.1109/TSP.2011.2107904
- Z. Wang, K. Chen, M. Zhang, P. He, Y. Wang, P. Zhu, and Y. Yang. "Multi-scale aggregation network for temporal action proposals," Pattern Recognition Letters, vol. 122, no. 1, pp. 60-65, 2019.
- W. Lu and N. Vaswani, "Modified compressive sensing for real-time dynamic MR imaging," Proc. IEEE Int. Conf. Image Process, 2009.
- S. D. Babacan, R. Molina, and A. K. Katsaggelos, "Bayesian compressive sensing using Laplace priors," IEEE Transactions on Image Processing, vol. 19, no. 1, pp. 53-63, Jan. 2010. https://doi.org/10.1109/TIP.2009.2032894
- R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde, "Model-based compressive sensing," IEEE Transactions on Information Theory, vol. 56, no. 4, pp. 1982-2001, 2010. https://doi.org/10.1109/TIT.2010.2040894
- S. Som and P. Schniter, "Compressive imaging using approximate message passing and a Markov-tree prior," IEEE Transactions on Signal Processing, vol. 60, no. 7, pp. 3439-3448, 2012. https://doi.org/10.1109/TSP.2012.2191780
- D. Merhej, C. Diab, M. Khalil, and R. Prost, "Embedding prior knowledge within compressed sensing by neural networks," IEEE Transactions on Neural Networks, vol. 22, no. 10, pp. 1638-1649, Oct. 2011. https://doi.org/10.1109/TNN.2011.2164810
- H. Q. Bui, C. N. H. La, and M. N. Da, "A fast tree-based algorithm for Compressed Sensing with sparse-tree prior," Signal Processing, vol. 108, pp. 628-641, 2015. https://doi.org/10.1016/j.sigpro.2014.10.026
- Z. Zhang, X. Ma, and Y. Yang, "Bounds on the number of hidden neurons in three-layer binary neural networks," Neural Networks, vol. 16, no. 7, pp. 995-1002, 2003. https://doi.org/10.1016/S0893-6080(03)00006-6
- N. Vaswani and W. Lu, "Modified-CS: Modifying compressive sensing for problems with partially known support," IEEE Transactions on Signal Processing, vol. 58, no. 9, pp. 4595-4607, Sep. 2010. https://doi.org/10.1109/TSP.2010.2051150
- N. Vaswani, "Kalman filtered compressed sensing," in Proc. of the IEEE International Conference on Image Processing, 2008.
- M. A. Khajehnejad, W. Xu, A. S. Avestimehr, and B. Hassibi, "Analyzing Weighted Minimization for Sparse Recovery With Nonuniform Sparse Models," IEEE Transactions on Signal Processing, vol. 59, no. 5, pp. 1985-2001, May 2011. https://doi.org/10.1109/TSP.2011.2107904
- C. La and M. Do, "Tree-based Orthogonal Matching Pursuit algorithm for signal reconstruction," in Proc. of IEEE International Conference on Image Processing, pp. 1277-1280, May 2006.
- M. Díaz, M. Junca, F. Rincón, and M. Velasco, "Compressed sensing of data with a known distribution," Applied and Computational Harmonic Analysis, vol. 45, no. 3, pp. 486-504, Nov. 2018. https://doi.org/10.1016/j.acha.2016.12.001
- D. Escoda, L. Granai, and P. Vandergheynst, "On the use of a priori information for sparse signal approximations," IEEE Transactions on Signal Processing, vol. 54, no. 9, pp. 3468-3482, Sep. 2006. https://doi.org/10.1109/TSP.2006.879306
- N. Ay, J. Jost, H. V. Le, and L. Schwachhofer. Information Geometry, Springer, 2017.
- R.E. Kass and P.W. Vos, Geometrical Foundations of Asymptotic Inference, New York, USA: Wiley-Interscience, 1997.
- S. Amari, "Information geometry of the EM and em algorithms for neural networks," Neural Networks, vol. 8, no. 9, pp. 1379-1408, Dec. 1995. https://doi.org/10.1016/0893-6080(95)00003-8
- D. Meng and R. Liu, "Information geometry-Geometric methods for Computational Neuroscience," Acta Biophysica Sinica, vol. 15, pp. 243-248, May 1999. https://doi.org/10.3321/j.issn:1000-6737.1999.02.001
- M. Hu, Y. Yang, F. Shen, N. Xie, R. Hong, and H. T. Shen, "Collective Reconstructive Embeddings for Cross-Modal Hashing," IEEE Transactions on Image Processing, vol. 28, no. 6, pp. 2770-2784, 2019. https://doi.org/10.1109/tip.2018.2890144
- Z. Yang and J. Laaksonen, "Principal whitened gradient for information geometry," Neural Networks, vol. 21, pp. 232-240, Nov. 2008. https://doi.org/10.1016/j.neunet.2007.12.016
- S. Amari, K. Kurata, and H. Nagaoka, "Information geometry of Boltzmann machines," IEEE Transactions on Neural Networks, vol. 3, no. 2, pp. 260-271, Jan. 1992.
- S. Amari, "Natural gradient works efficiently in learning," Neural Computation, vol. 10, pp. 251-276, Sep. 1998. https://doi.org/10.1162/089976698300017746
- S. Amari, "Fisher information under restriction of Shannon information in multi-terminal situations," Annals of the Institute of Statistical Mathematics, vol. 41, pp. 623-648, Jan. 1989. https://doi.org/10.1007/BF00057730
- L. Campbell, "The relation between information theory and the differential geometry approach to statistics," Information Sciences, vol. 35, no. 3, pp. 199-210, Feb. 1985. https://doi.org/10.1016/0020-0255(85)90050-7
- S. Amari, "Information geometry on hierarchy of probability distributions," IEEE Transactions on Information Theory, vol. 47, no. 5, pp. 1701-1711, Mar. 2001.
- M. Wang, C. B. Xiao, and Z. H. Ning, "Neural Networks for Compressed Sensing Based on Information Geometry," Circuits, Systems, and Signal Processing, vol. 38, no. 2, pp. 569-589, Feb. 2019. https://doi.org/10.1007/s00034-018-0869-6
- M. Hu, Y. Yang, F. Shen, N. Xie, R. Hong, and H. T. Shen, "Collective Reconstructive Embeddings for Cross-Modal Hashing," IEEE Transactions on Image Processing, vol. 28, no. 6, pp. 2770-2784, 2019. https://doi.org/10.1109/tip.2018.2890144
- S. He, B. Wang, Z. Wang, Y. Yang, F. Shen, Z. Huang, and H. T. Shen, "Bidirectional Discrete Matrix Factorization Hashing for Image Search," IEEE Transactions on Cybernetics, vol. 50, no. 9, pp. 4157-4168, 2020.
- B. R. Frieden, Science from Fisher Information: A Unification, Cambridge Univ. Press, 2004.
- F. Nielsen and V. Garcia, "Statistical exponential families: A digest with flash cards," arXiv.org:0911.4863, 2009.
- M. Menendez, D. Morales, L. Pardo, and M. Salicrú, "Statistical tests based on geodesic distances," Applied Mathematics Letters, vol. 8, no. 1, pp. 65-69, Jun. 1995. https://doi.org/10.1016/0893-9659(94)00112-P
- Q. Huynh-Thu and M. Ghanbari, "The accuracy of PSNR in predicting video quality for different video scenes and frame rates," Telecommunication Systems, vol. 49, pp. 35-48, Feb. 2012. https://doi.org/10.1007/s11235-010-9351-x