Large-Scale Phase Retrieval via Stochastic Reweighted Amplitude Flow

  • Xiao, Zhuolei (College of Electronic and Optical Engineering & College of Microelectronics, Nanjing University of Posts and Telecommunications) ;
  • Zhang, Yerong (College of Electronic and Optical Engineering & College of Microelectronics, Nanjing University of Posts and Telecommunications) ;
  • Yang, Jie (College of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications)
  • Received : 2020.06.19
  • Accepted : 2020.11.15
  • Published : 2020.11.30

Abstract

Phase retrieval, the recovery of a signal from phaseless measurements, is generally considered an NP-hard problem. This paper adopts an amplitude-based nonconvex optimization cost function to develop a new stochastic gradient algorithm, named stochastic reweighted phase retrieval (SRPR). SRPR is a stochastic gradient iteration algorithm that runs in two stages: first, a truncated sample stochastic variance reduction algorithm computes an initial estimate; second, in the gradient refinement stage, an amplitude-based stochastic reweighted gradient iteration continuously updates and improves the initial estimate. Because of the stochastic method, each iteration in either stage of SRPR involves only one equation. Therefore, SRPR is simple, scalable, and fast. Compared with state-of-the-art phase retrieval algorithms, simulation results show that SRPR converges faster and requires fewer magnitude-only measurements to reconstruct the signal, in both the real and complex cases.

1. Introduction

Phase retrieval, the recovery of a signal from intensity- or magnitude-only measurements, arises in various fields of science and engineering, such as astronomy [1], electron microscopy [2], optics [3], and X-ray crystallography [4]. Generally, in optical imaging, due to the physical limitations of optical detectors [5], optical detection equipment such as photosensitive film, a charge-coupled device (CCD) camera, or the human eye can only record the intensity, not the phase, of a light wave. However, for an image, the structural information carried by its phase far exceeds that carried by its intensity. Hence, reconstructing the signal from intensity- or magnitude-only measurements is both challenging and of great significance.

Mathematically speaking, phase retrieval is the problem of recovering an n-dimensional real or complex signal 𝒙 from m linear transformation (or Fourier transform) measurements

\(y_{i}=\left|\left\langle\boldsymbol{a}_{i}, \boldsymbol{x}\right\rangle\right|^{2}, \quad 1 \leq i \leq m\)       (1)

where the given measurement vector is 𝒚 := [𝑦₁ ⋯ 𝑦ₘ]^𝒯, the sensing/feature vectors 𝒂ᵢ ∈ ℝⁿ or ℂⁿ are known, and the signal vector 𝒙 ∈ ℝⁿ or ℂⁿ is the unknown we seek. It is worth noting that 𝑦ᵢ denotes the intensity or squared modulus, whereas \(\psi_{i}:=\sqrt{y_{i}}\) denotes the magnitude or amplitude. Phrased differently, we want to uniquely recover 𝒙 (up to the global sign or phase) from the system of quadratic equations (1), or equivalently retrieve the sign or phase of \(\left\{\left\langle\boldsymbol{a}_{i}, \boldsymbol{x}\right\rangle\right\}_{i=1}^{m}\) in the real or complex setting. Due to the loss of sign or phase information, exact recovery of 𝒙 requires the number of measurements m and the dimension n of the sought signal to satisfy m ≥ 2n − 1 when 𝒂ᵢ, 𝒙 ∈ ℝⁿ, or m ≥ 4n − 4 when 𝒂ᵢ, 𝒙 ∈ ℂⁿ, and m = 2n − 1 is also necessary in the real case [6], [7]. Nevertheless, solving the system of quadratic equations (1) remains NP-hard, and its solution is always challenging.
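To make the measurement model concrete, the following is a minimal NumPy sketch of (1) under the real Gaussian design adopted later in the paper; the dimension, oversampling ratio, and seed are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                           # signal dimension (illustrative)
m = 2 * n - 1                     # information-theoretic limit in the real case

x = rng.standard_normal(n)        # unknown signal; known here only for testing
A = rng.standard_normal((m, n))   # rows are sensing vectors a_i ~ N(0, I_n)

y = (A @ x) ** 2                  # intensities  y_i = |<a_i, x>|^2
psi = np.sqrt(y)                  # amplitudes   psi_i = sqrt(y_i)
```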

1.1 Related Works

The measurements are generally corrupted by noise. When the additive noise follows a Gaussian distribution, we may use the least-squares method (which coincides with the maximum likelihood estimate of the original signal), and the problem of solving the system of quadratic equations (1) can be rewritten via one of the following three empirical loss functions: the intensity-based empirical loss [8], [9]

\(\operatorname{minimize}_{z \in \mathbb{R}^{n} / \mathbb{C}^{n}} \frac{1}{2 m} \sum_{i=1}^{m}\left(y_{i}-\left|\left\langle\boldsymbol{a}_{i}, \mathbf{z}\right\rangle\right|^{2}\right)^{2},\)       (2)

and its related Poisson likelihood [10]

\(\operatorname{minimize}_{\mathrm{z} \in \mathbb{R}^{n} / \mathbb{C}^{n}} \frac{1}{2 m} \sum_{i=1}^{m}\left|\left\langle\boldsymbol{a}_{\boldsymbol{i}}, \mathbf{z}\right\rangle\right|^{2}-y_{i} \log \left(\left|\left\langle\boldsymbol{a}_{i}, \mathbf{z}\right\rangle\right|^{2}\right),\)       (3)

and the amplitude-based empirical loss [11]

\(\operatorname{minimize}_{\mathbf{z} \in \mathbb{R}^{n} / \mathrm{C}^{n}} \frac{1}{2 m} \sum_{i=1}^{m}\left(\psi_{i}-\left|\left\langle\boldsymbol{a}_{i}, \mathbf{z}\right\rangle\right|\right)^{2}\)       (4)

Obviously, due to the quadratic terms in (2) and (3), or the modulus in (4), these empirical loss functions are nonconvex, so minimizing them involves many stationary points and the problem is NP-hard. In 1-dimensional (1D) phase retrieval, uniqueness fails because of spectral factorization, i.e., even up to trivial ambiguities (global phase shift, conjugate inversion, and spatial shift) [12], there are multiple solutions with the same magnitude. There are several approaches to overcome this ill-posedness. One can add constraints on the sought signal, such as sparsity or non-negativity [13]–[16]; another is to introduce redundancy into the measurements, including the short-time Fourier transform [17], [18], coded diffraction patterns using random masks and structured illuminations [19], or a random measurement assumption [10], [11], [20]–[25]. In this paper, we adopt the latter assumption: the measurements {𝜓ᵢ} come from random Gaussian designs {𝒂ᵢ} that are independently and identically distributed (i.i.d.), with 𝒂ᵢ ~ 𝒩(𝟎, 𝑰ₙ) in the real-valued Gaussian setting and 𝒂ᵢ ~ 𝒞𝒩(𝟎, 𝑰ₙ) := 𝒩(𝟎, 𝑰ₙ/2) + j𝒩(𝟎, 𝑰ₙ/2) in the complex-valued one.

There are two families of approaches to such problems: convex and nonconvex. Based on the so-called matrix-lifting technique, the convex approaches solve a semidefinite program (SDP), such as PhaseLift [22], PhaseCut [26], and CoRk [27]; other convex approaches rely on convex relaxation to reformulate the problem as sparse recovery, solving a basis pursuit problem in the dual domain, such as PhaseMax [28]. On the other hand, the nonconvex approaches include alternating projection (e.g., the Gerchberg-Saxton and Fienup methods) [30], [31], AltMinPhase [21], Wirtinger flow and its variants (WF/TWF/RWF) [10], [20], [32], trust-region methods [33], proximal linear algorithms [33], [34], sparse truncated amplitude flow (SPARTA) [29], and (stochastic) truncated amplitude flow variants (TAF/RAF/STAF) [11], [23]–[25]. These nonconvex methods operate directly on vector-valued optimization variables, so they enjoy significant computational advantages over convex methods based on matrix lifting.

However, although nonconvex optimization can attain the best statistical guarantees in noise-free or noisy environments, even under a random Gaussian design it still has several shortcomings. First, the nonconvex landscape contains many stationary points, which complicate the computation and often trap the optimization in a local optimum; our task is to develop an algorithm that approximates the global optimum building on existing algorithms. More troublesome is the existence of saddle points, which often stall the optimization; how to escape saddle points is also one of the primary purposes of this paper. Furthermore, given the information-theoretic limit m = 2n − 1 (or 4n − 4), it is very challenging to develop a practical algorithm that achieves perfect recovery with optimal statistical accuracy when the number of measurements is close to this limit. Finally, as applications and data scales keep growing, the existing incremental gradient methods [35] and stochastic gradient methods [23], [36] can no longer meet the requirements on recovery accuracy and convergence speed, so increasing the convergence speed and reducing the computational complexity in large-scale applications is also challenging.

1.2 Our Contribution

Inspired by TAF, this paper uses an amplitude-based nonconvex optimization formulation to develop a new stochastic gradient algorithm, referred to as stochastic reweighted phase retrieval (SRPR). SRPR is an iterative algorithm that builds on and extends TAF [11] so that it can be applied to large-scale phase retrieval applications such as imaging. The proposed algorithm has two stages. The first is the initialization stage, which adopts a sample stochastic variance reduction algorithm to compute the initial estimate; as in TAF, we use orthogonality promotion to determine the truncation threshold. In the gradient refinement stage, we use a weighted gradient method to update the initial estimate, with a stochastic gradient method to accelerate the process.

Even in a high-dimensional setting, SRPR can perfectly reconstruct any signal given the minimum number of equations (m = 2n − 1) without any additional assumptions. This paper uses a stochastic variance reduced gradient (SVRG) variant to compute the initial estimate in the initialization stage. Compared with the classic SVRG algorithm [23] and the power method with a truncation procedure [11], the new algorithm estimates the per-epoch gradient snapshot from a small sample instead of computing the full gradient, which can significantly reduce computation time. In the gradient refinement stage, we use a stochastic gradient method to update the initial estimate, with a truncation rule to eliminate wrong estimates. With these two stages, SRPR can accurately recover the signal even when the number of measurements approaches the information-theoretic limit. At the same time, at the relatively large scales of image applications, SRPR outperforms existing algorithms such as (T)AF [11], ITWF [37], (T)WF [8], [10], RAF [24], [25], and STAF [23] in exact recovery performance and convergence speed.

The rest of this article is organized as follows. Section 2 describes and analyzes the amplitude-based loss function and details the two stages of the SRPR algorithm. Simulated tests comparing SRPR with (T)WF [8], [10], TAF [11], RAF [24], [25], and STAF [23] are presented in Section 3.

Notation: Lower- (upper-) case boldface letters denote vectors (matrices). The superscript 𝒯 (ℋ) denotes transposition (conjugate transposition). The symbol ⌊∙⌋ stands for the largest integer not exceeding the given number. If 𝒙 ∈ ℝⁿ is a true solution of (1), then so is −𝒙. We define the Euclidean distance of any estimate 𝒛 ∈ ℝⁿ to the true solution set {±𝒙} of (1) as dist(𝒛, 𝒙) := min ‖𝒛 ± 𝒙‖.

2. Stochastic Reweighted Phase Retrieval

SRPR is based on TAF [11], and the stochastic gradient descent method is used in both the initialization and gradient refinement stages. We therefore detail SRPR stage by stage.

For concreteness, we mainly discuss the real-valued Gaussian model with a real signal 𝒙 ∈ ℝⁿ and i.i.d. Gaussian random designs 𝒂ᵢ ~ 𝒩(𝟎, 𝑰ₙ), 1 ≤ i ≤ m. Nevertheless, SRPR also applies to the complex-valued Gaussian model with a complex vector 𝒙 ∈ ℂⁿ and i.i.d. design vectors 𝒂ᵢ ~ 𝒞𝒩(𝟎, 𝑰ₙ) := 𝒩(𝟎, 𝑰ₙ/2) + j𝒩(𝟎, 𝑰ₙ/2). In this paper, using the least-squares criterion, solving the phaseless quadratic equations (1) is converted into minimizing the amplitude-based empirical loss function

\(\underset{\mathbf{z} \in \mathbb{R}^{n}}{\operatorname{minimize}} \ell(\mathbf{z}):=\frac{1}{2 m} \sum_{i=1}^{m}\left(\psi_{i}-\left|\left\langle\boldsymbol{a}_{i}, \mathbf{z}\right\rangle\right|\right)^{2}\)       (5)
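For later reference, the loss (5) can be evaluated in a few lines; a minimal sketch (reusing the arrays A and psi from the model sketch in Section 1) is:

```python
import numpy as np

def amplitude_loss(z, A, psi):
    """Amplitude-based empirical loss (5): (1/2m) * sum_i (psi_i - |a_i^T z|)^2."""
    return 0.5 * np.mean((psi - np.abs(A @ z)) ** 2)
```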

2.1 Sample Stochastic Variance Reducing Initialization

In this section, we elaborate on the initialization stage. Since (5) is nonconvex and nonsmooth, obtaining a good initialization is essential. We develop a new initialization method based on the orthogonality-promoting initialization (OPI) of TAF. Specifically, since random vectors in high-dimensional space are approximately pairwise orthogonal [11], we adopt

\(\cos ^{2} \theta_{i}:=\frac{\left|\left\langle\boldsymbol{a}_{i}, \boldsymbol{x}\right\rangle\right|^{2}}{\left\|\boldsymbol{a}_{i}\right\|^{2}\|\boldsymbol{x}\|^{2}}=\frac{\psi_{i}^{2}}{\left\|\boldsymbol{a}_{i}\right\|^{2}\|\boldsymbol{x}\|^{2}}, \quad 1 \leq i \leq m\)      (6)

to represent the geometric relationship between 𝒂ᵢ and 𝒙, where θᵢ is the angle between 𝒂ᵢ and 𝒙. The larger θᵢ is, the smaller cos²θᵢ is, indicating that 𝒂ᵢ and 𝒙 are closer to orthogonal. Based on this feature, we can approximate 𝒙 by finding the vector 𝒛 most orthogonal to \(\{\boldsymbol{a}_{i}\}_{i \in \mathcal{J}_{0}}\), where the index set \(\mathcal{J}_{0}\) contains the indices i for which cos²θᵢ is smallest.

Without loss of generality, assuming ‖𝒙‖ = 1, we find the estimated vector \(\tilde{\mathbf{z}}_{0}\) through

\(\tilde{\mathbf{z}}_{0}:=\arg \min _{\|\mathbf{z}\|=1} \mathbf{z}^{\mathcal{T}} \boldsymbol{Y} \mathbf{z}\)       (7)

where \(\boldsymbol{Y}:=\frac{1}{\left|\mathcal{J}_{0}\right|} \sum_{i \in \mathcal{J}_{0}} \frac{\boldsymbol{a}_{i} \boldsymbol{a}_{i}^{\mathcal{T}}}{\left\|\boldsymbol{a}_{i}\right\|^{2}}\) and \(\left|\mathcal{J}_{0}\right|\) is on the order of n. On the other hand, when ‖𝒙‖ ≠ 1, we can scale \(\tilde{\mathbf{z}}_{0}\) by a norm estimate of 𝒙 to obtain the initial estimate \(\boldsymbol{z}_{0}:=\sqrt{\frac{1}{m} \sum_{i=1}^{m} y_{i}} \tilde{\mathbf{z}}_{0}\) [11].

Solving (7) amounts to computing the eigenvector corresponding to the smallest eigenvalue of 𝒀, which requires an eigendecomposition or matrix inversion with computational complexity 𝒪(n³). Define \(\overline{\mathcal{J}}_{0}\) as the complement of \(\mathcal{J}_{0}\) in [m]. To reduce the computational complexity, note that by the strong law of large numbers (SLLN), for the i.i.d. standard Gaussian vectors \(\left\{\boldsymbol{a}_{i} \sim \mathcal{N}\left(0, I_{n}\right)\right\}_{i=1}^{m}\), we have

\(\frac{1}{m} \sum_{i=1}^{m} \frac{\boldsymbol{a}_{i} \boldsymbol{a}_{i}^{\mathcal{T}}}{\left\|\boldsymbol{a}_{i}\right\|^{2}} \approx \mathbb{E}\left[\frac{\boldsymbol{a}_{i} \boldsymbol{a}_{i}^{\mathcal{T}}}{\left\|\boldsymbol{a}_{i}\right\|^{2}}\right]=\frac{1}{n} I_{n}\)       (8)

where 𝔼[∙] expresses expectations. Thus

\(\sum_{i \in J_{0}} \frac{\boldsymbol{a}_{i} \boldsymbol{a}_{i}^{\mathcal{T}}}{\left\|\boldsymbol{a}_{i}\right\|^{2}}=\sum_{i \in[m]} \frac{\boldsymbol{a}_{i} \boldsymbol{a}_{i}^{\mathcal{T}}}{\left\|\boldsymbol{a}_{i}\right\|^{2}}-\sum_{i \in \bar{J}_{0}} \frac{\boldsymbol{a}_{i} \boldsymbol{a}_{i}^{\mathcal{T}}}{\left\|\boldsymbol{a}_{i}\right\|^{2}} \cong \frac{m}{\mathrm{n}} I_{n}-\sum_{i \in \bar{J}_{0}} \frac{\boldsymbol{a}_{i} \boldsymbol{a}_{i}^{\mathcal{T}}}{\left\|\boldsymbol{a}_{i}\right\|^{2}}\)       (9)

Defining \(\boldsymbol{B}=\left[\boldsymbol{a}_{i} /\left\|\boldsymbol{a}_{i}\right\|\right]_{i \in \overline{\mathcal{J}}_{0}}\), the problem of finding the eigenvector associated with the smallest eigenvalue of 𝒀 in (7) reduces to finding the eigenvector associated with the largest eigenvalue of \(\overline{\boldsymbol{Y}}=\boldsymbol{B} \boldsymbol{B}^{\mathcal{T}}\), i.e.

\(\tilde{\mathbf{z}}_{0}:=\arg \max _{\|\mathbf{z}\|=1} \mathbf{z}^{\mathcal{T}} \overline{\boldsymbol{Y}} \mathbf{z}\)        (10)

which is a typical principal component analysis (PCA) problem. On the one hand, WF/TWF/TAF [8], [10], [11] use the power method to solve this problem. The computational complexity the power method requires to obtain an ϵ-accurate solution is \(\mathcal{O}\left(\frac{1}{\delta} n\left|\overline{\mathcal{J}}_{0}\right| \log (1 / \epsilon)\right)\), where the eigengap δ > 0 is defined as the difference between the largest and second-largest eigenvalues [40]. However, the power method is not suitable for relatively high-dimensional data such as image applications, especially when the eigengap is small. On the other hand, stochastic algorithms can be used to solve (10). STAF uses the variance-reducing orthogonality-promoting initialization (VR-OPI) algorithm [11], an instance of the stochastic variance reduced gradient (SVRG) method [38]. It consists of a series of epochs, each containing a large number of iterative updates. At the start of every epoch, VR-OPI scans the entire dataset and computes a snapshot of the full gradient; in each iteration within an epoch, the algorithm randomly selects one instance from \(\left|\overline{\mathcal{J}}_{0}\right|\) to compute a stochastic gradient, and combines the outer-loop (epoch) gradient with the inner-loop iterations to reduce the variance of the stochastic gradient and achieve effective convergence [39]. However, when the dataset contains many instances and the data dimension is large, the full-gradient computation in every epoch requires a scan of the entire dataset, so the full-gradient snapshots entail a large number of gradient evaluations and are very time-consuming. If the full-gradient snapshots could be avoided in each epoch, VR-OPI [11] could be significantly accelerated, yielding better convergence performance.
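For reference, a minimal sketch of the power method applied to (10) follows; it assumes B holds the normalized vectors \(\boldsymbol{a}_{i} /\left\|\boldsymbol{a}_{i}\right\|\), \(i \in \overline{\mathcal{J}}_{0}\), as its columns, and the iteration count and seed are illustrative.

```python
import numpy as np

def power_method(B, num_iter=100, seed=1):
    """Leading eigenvector of Ybar = B B^T by power iteration (sketch)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(B.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(num_iter):
        w = B @ (B.T @ w)        # apply Ybar without forming the n-by-n matrix
        w /= np.linalg.norm(w)
    return w
```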

This paper uses a new acceleration algorithm, sampleVR [40], to solve the orthogonality-promoting initialization problem (10); we name the result sample stochastic variance reducing orthogonality-promoting initialization (SSVR-OPI), summarized in Algorithm 1. Specifically, SSVR-OPI iterates over two nested loops: each execution of the inner loop is called an iteration, and each execution of the outer loop is called an epoch. SSVR-OPI runs S epochs in total, and each epoch contains T iterations. The core of the algorithm is to estimate the full-gradient snapshot \(\frac{1}{\left|\overline{\mathcal{J}}_{0}\right|} \sum_{j=1}^{\left|\overline{\mathcal{J}}_{0}\right|} \boldsymbol{a}_{i_{j}} \boldsymbol{a}_{i_{j}}^{\mathcal{T}} \widetilde{\boldsymbol{\omega}}_{s+1}\) by the stochastic gradient \(g=\frac{1}{k} \sum_{j=1}^{k} \boldsymbol{a}_{i_{j}} \boldsymbol{a}_{i_{j}}^{\mathcal{T}} \widetilde{\boldsymbol{\omega}}_{s+1}\) in Step 7, which replaces the per-epoch full-gradient snapshot of VR-OPI [11] and thereby reduces the computational complexity. The stochastic gradient g is obtained from \(k\left(k \ll\left|\overline{\mathcal{J}}_{0}\right|\right)\) sampled instances in the inner loop, representing the average stochastic gradient over the k instances, and is used in Step 4 as the update direction for the next epoch.
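A hedged sketch of this scheme is given below: it mirrors the structure just described (S epochs of T variance-reduced inner iterations, with the per-epoch snapshot estimated from k sampled instances), but the step size, loop counts, and normalization are illustrative choices rather than the exact steps of Algorithm 1.

```python
import numpy as np

def ssvr_opi(B, S=10, T=50, k=20, eta=1e-2, seed=2):
    """Sketch of SSVR-OPI: SVRG-style PCA updates with a sampled snapshot."""
    rng = np.random.default_rng(seed)
    n, N = B.shape                      # columns of B are a_i / ||a_i||
    w_tilde = rng.standard_normal(n)
    w_tilde /= np.linalg.norm(w_tilde)
    for _ in range(S):                  # outer loop: epochs
        idx = rng.choice(N, size=k, replace=False)
        Bk = B[:, idx]
        g = Bk @ (Bk.T @ w_tilde) / k   # sampled snapshot replacing the full gradient
        w = w_tilde.copy()
        for _ in range(T):              # inner loop: iterations
            b = B[:, rng.integers(N)]   # one randomly selected instance
            grad = b * (b @ w) - b * (b @ w_tilde) + g  # variance-reduced direction
            w = w + eta * grad          # ascent step: (10) maximizes w^T Ybar w
            w /= np.linalg.norm(w)
        w_tilde = w
    return w_tilde
```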

We can rewrite (10) as

\(F(\boldsymbol{\omega})=\boldsymbol{\omega}^{\mathcal{T}} \overline{\boldsymbol{Y}} \boldsymbol{\omega}=\frac{1}{\left|\overline{\mathcal{J}}_{0}\right|} \sum_{i \in \overline{\mathcal{J}}_{0}} f_{i}(\boldsymbol{\omega})\)         (11)

where \(f_{i}(\boldsymbol{\omega})=\boldsymbol{\omega}^{\mathcal{T}} \boldsymbol{a}_{i} \boldsymbol{a}_{i}^{\mathcal{T}} \boldsymbol{\omega}\) is L-Lipschitz continuous, i.e., for all 𝝎ᵢ and 𝝎ⱼ, \(\left\|f_{i}\left(\boldsymbol{\omega}_{i}\right)-f_{i}\left(\boldsymbol{\omega}_{j}\right)\right\| \leq L\left\|\boldsymbol{\omega}_{i}-\boldsymbol{\omega}_{j}\right\|\). The following Proposition 1 [40, Theorem 2] shows that SSVR-OPI makes the objective converge at a linear rate, up to a constant neighborhood of the optimum, as proven in [40]. Since SSVR-OPI does not compute the full gradient in every epoch but only the average stochastic gradient over k instances, it is significantly better than other stochastic algorithms in terms of computational complexity. These advantages can be seen in the simulation tests in Section 3.

Proposition 1 [40]: Let k denote the number of sampled instances in the s-th epoch. For any μ > 0, when \(\delta=\frac{1+4 L \mu\left|\overline{\mathcal{J}}_{0}\right| \eta^{2}}{\mu\left|\overline{\mathcal{J}}_{0}\right| \eta\left(1-2 \eta L-\frac{\eta}{\mu}\right)}<1\) holds, F(𝝎) converges as

\(\mathbb{E}\left[F\left(\widetilde{\boldsymbol{\omega}}_{s+1}\right)-F\left(\widetilde{\boldsymbol{\omega}}_{*}\right)\right] \leq \delta \mathbb{E}\left[F\left(\widetilde{\boldsymbol{\omega}}_{s}\right)-F\left(\widetilde{\boldsymbol{\omega}}_{*}\right)\right]+\mathcal{O}\left(\left(\frac{\left|\overline{\mathcal{J}}_{0}\right|-k}{\left|\overline{\mathcal{J}}_{0}\right|}\right)^{2}\right)\)       (12)

Although the sampled gradient estimate g is unbiased, it still deviates from the true full gradient with some variance, especially when the estimate approaches the optimum. We can reduce this variance by increasing the number k of sampled instances. A larger k, however, incurs additional computational cost when estimating the full gradient g, whereas a smaller k increases the variance between the full gradient and its estimate. We therefore need to balance k between computational cost and variance.

Let

\(g=\frac{1}{k} \sum_{j=1}^{k}\left\|\nabla f_{i_{j}}\left(\widetilde{\boldsymbol{\omega}}_{s+1}\right)\right\|=\frac{1}{k} \sum_{j=1}^{k}\left\|\boldsymbol{a}_{i_{j}} \boldsymbol{a}_{i_{j}}^{\mathcal{T}} \widetilde{\boldsymbol{\omega}}_{s+1}\right\|,\)       (13)

and its expectation is \(\mathbb{E}(g)=\frac{1}{\left|\overline{\mathcal{J}}_{0}\right|} \sum_{j=1}^{\left|\overline{\mathcal{J}}_{0}\right|}\left\|\nabla f_{j}\left(\widetilde{\boldsymbol{\omega}}_{s+1}\right)\right\|\). Since 𝑓ᵢ(𝝎) is L-Lipschitz continuous, ‖∇𝑓ᵢ(𝝎)‖ ≤ L. According to Hoeffding's inequality, we have

\(\alpha=P(|g-\mathbb{E}(g)| \geq t) \leq 2 e^{\frac{-2 k^{2} t^{2}}{\sum_{i=1}^{k}(L-0)^{2}}}=2 \mathrm{e}^{\frac{-2 k^{2} t^{2}}{k L^{2}}}=2 \mathrm{e}^{\frac{-2 k t^{2}}{L^{2}}},\)       (14)

where α denotes the significance level and 1 − α is the confidence that g − 𝔼(g) lies in the interval [−t, t]; hence k should satisfy

\(k \geq-L^{2} \frac{\log \frac{\alpha}{2}}{2 t^{2}}=-\frac{\log \frac{\alpha}{2}}{2 t^{2} / L^{2}}=-\frac{\log \frac{\alpha}{2}}{\epsilon}\)       (15)

where ϵ = 2t²/L² is a positive real number. In this way, the value of k is determined by ϵ: as the iterations proceed, decreasing ϵ continuously increases k, so the variance of the estimate keeps decreasing and significant convergence is achieved.
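As a sketch, the resulting sample-size rule can be coded directly from (15); the values of α, t, and L below are illustrative.

```python
import numpy as np

def sample_size(alpha, t, L):
    """Smallest k meeting the Hoeffding bound (15) for level alpha and tolerance t."""
    eps = 2.0 * t ** 2 / L ** 2
    return int(np.ceil(-np.log(alpha / 2.0) / eps))

print(sample_size(0.05, 0.1, 1.0))   # -> 185
```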

[Algorithm 1. Sample stochastic variance reducing orthogonality-promoting initialization (SSVR-OPI): pseudocode figure]

2.2 Stochastic Reweighted Gradient Stage

The amplitude-based empirical loss function

\(\ell(\mathbf{z})=\frac{1}{2 m} \sum_{i=1}^{m} \ell_{i}(\mathbf{z})=\frac{1}{2 m} \sum_{i=1}^{m}\left(\psi_{i}-\left|\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}\right|\right)^{2}\)       (16)

in (5) is a non-convex non-smooth function, and its gradient is

\(\nabla \ell(\mathbf{z})=\frac{1}{m} \sum_{i=1}^{m} \nabla \ell_{i}(\mathbf{z})=\frac{1}{m} \sum_{i=1}^{m}\left(\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}-\psi_{i} \frac{\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}}{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}\right|}\right) \boldsymbol{a}_{i}\)       (17)

The term \(\frac{\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}}{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}\right|}\) introduces a bias into the update direction whenever \(\frac{\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}_{t}}{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}_{t}\right|} \neq \frac{\boldsymbol{a}_{i}^{\mathcal{T}} \boldsymbol{x}}{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \boldsymbol{x}\right|}\), which may make the corresponding gradient component too large and result in a "bad/misleading" search direction. Such gradient components may move z away from x, especially when the number of measurements approaches the information-theoretic limit m = 2n − 1 or m = 4n − 4.

There are several ways to address this problem. TAF [11] introduces a truncation rule to detect and remove these "bad" gradient components so as to maintain a correct search direction; however, this may discard too many gradient components or retain defective ones, thereby affecting the search direction. STAF [41] goes a step further: it refines the initial estimate by stochastic truncated gradient iterations, where each iteration evaluates the current generalized gradient of one component selected uniformly at random or with a given probability, and its truncation rule is very likely to reject the "bad" gradient components. RAF [24], [25] takes a different approach and adopts a (timely) adaptive weighted gradient method, attaching a different weight ωᵢ to each gradient component ∇ℓᵢ(𝒛) at the current point 𝒛ₜ in order to find a suitable search direction for the current iteration, i.e.

\(\mathbf{z}_{t+1}=\mathbf{z}_{t}-\mu \nabla \ell\left(\mathbf{z}_{t}\right)=\mathbf{z}_{t}-\frac{\mu}{m} \sum_{i=1}^{m} \omega_{i} \nabla \ell_{i}\left(\mathbf{z}_{t}\right)\)       (18)

The weights ωᵢ attached to the different gradient components ∇ℓᵢ(𝒛) are closely related to the truncation rule \(\mathcal{L}:=\left\{1 \leq i \leq m: \frac{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}\right|}{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \boldsymbol{x}\right|} \geq \frac{1}{1+\gamma}\right\}\) in TAF. Studies in [11] have shown that a gradient component with a large ratio \(\frac{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}\right|}{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \boldsymbol{x}\right|}\) has a high probability of pointing toward the true value x. If \(\frac{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}\right|}{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \boldsymbol{x}\right|}\) is regarded as the confidence parameter α of the corresponding gradient component, then the gradient components can be distinguished according to their confidence, which determines their contribution to the overall search direction.

TAF/STAF discard, via the truncation rule, the gradient components whose α is relatively small. RAF instead weights all gradient components: components with large confidence parameters receive large weights, while the "bad" components receive small weights. Based on the truncation rule, the weight ωᵢ corresponding to gradient component i is

\(\omega_{i}:=\frac{1}{1+\beta_{i} /\left(\left|\boldsymbol{a}_{\boldsymbol{i}}^{*} \mathbf{z}\right| /\left|\boldsymbol{a}_{\boldsymbol{i}}^{*} \boldsymbol{x}\right|\right)}, \quad 1 \leq i \leq m\)       (19)

where {βᵢ}₁≤ᵢ≤ₘ are pre-selected parameters. In this way, compared with TAF/STAF, the weights ωᵢ not only reinforce the contribution of good gradient components but also retain the information in weak ones.
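Since ψᵢ = |⟨𝒂ᵢ, 𝒙⟩|, the ratio in (19) is directly computable from the data. A minimal sketch of the weighting rule, with a single illustrative β shared by all i, reads:

```python
import numpy as np

def raf_weights(z, A, psi, beta=10.0):
    """Weights (19), using the observed amplitude psi_i for |a_i^T x|."""
    ratio = np.abs(A @ z) / psi      # confidence of each gradient component
    return ratio / (ratio + beta)    # equivalent to 1 / (1 + beta / ratio)
```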

In large-scale applications, the data dimension is high (and there are more samples), so full passes over the data by the classic gradient descent (GD) method are very costly in time. Hence, in the gradient refinement stage we adopt the stochastic gradient descent (SGD) method, with its fast convergence and low per-iteration complexity. On the one hand, the number of measurements existing methods need for exact recovery has not reached the information-theoretic limit m = 2n − 1: TWF needs m ≥ 4.5n noise-free measurements for an ϵ-accurate solution, TAF needs about 3n, while STAF, which also adopts a stochastic algorithm in the gradient stage, needs only about 2.3n.

On the other hand, as can be seen from [33], phase retrieval is essentially a nonconvex optimization problem, so there are many saddle points in its search space, and gradient iterations are likely to get stuck at them. The SGD method can automatically escape saddle points and poor local optima, and converge to a local minimum with strong generalization.

Therefore, to better apply the RAF-style weighted gradient optimization to large-scale data, we introduce the stochastic gradient method into the weighted gradient iteration and refine the initial estimate 𝒛₀ through the resulting stochastic reweighted gradient iterations, thereby accelerating the gradient refinement stage. Specifically, SRPR continuously updates z using the stochastic reweighted gradient iteration

\(\mathbf{z}_{t+1}=\mathbf{z}_{t}-\mu \omega_{k_{t}} \nabla \ell_{k_{t}}\left(\mathbf{z}_{t}\right)=\mathbf{z}_{t}-\mu \omega_{k_{t}}\left(\boldsymbol{a}_{k_{t}}^{\mathcal{T}} \mathbf{z}_{t}-\psi_{k_{t}} \frac{\boldsymbol{a}_{k_{t}}^{\mathcal{T}} \mathbf{z}_{t}}{\left|\boldsymbol{a}_{k_{t}}^{\mathcal{T}} \mathbf{z}_{t}\right|}\right) \boldsymbol{a}_{k_{t}}\)       (20)

where the step size μ is set to a constant. Each SRPR iteration selects the gradient component ∇ℓ_{k_t}(𝒛ₜ) by randomly sampling the index k_t, where t is the current iteration number; the index k_t may be sampled uniformly from {1, 2, ⋯, m} or with a given probability. After obtaining the gradient component ∇ℓ_{k_t}(𝒛ₜ), we use (19) to evaluate its weight ω_{k_t}. The proposed SRPR algorithm is summarized in Algorithm 2.

[Algorithm 2. Stochastic reweighted phase retrieval (SRPR): pseudocode figure]

It is worth emphasizing that this stochastic reweighted gradient method is very effective. If the randomly selected gradient component is reliable, its weight is large, so it makes a strong contribution to the search direction in the current iteration. Conversely, if the gradient component is unreliable, its weight is very small, so the current iteration performs almost no refinement. Moreover, if the current iterate 𝒛ₜ satisfies \(\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}_{t}=0\), i.e., the loss is non-smooth there, the weight is set to ωᵢ = 0 to eliminate the non-smooth term and prevent it from affecting the search direction and the iteration. The specific experimental results are presented in Section 3.
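The sketch below puts the refinement stage together for the real-valued case, using the ψ-based weights above; the step size, β, and iteration count are illustrative choices, and the non-smooth case \(\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}_{t}=0\) receives zero weight as just described.

```python
import numpy as np

def srpr_refine(z0, A, psi, mu=0.5, beta=10.0, num_iter=5000, seed=3):
    """Sketch of the stochastic reweighted refinement (20), real-valued case."""
    rng = np.random.default_rng(seed)
    z, m = z0.copy(), A.shape[0]
    for _ in range(num_iter):
        i = rng.integers(m)              # sample the index k_t uniformly
        a, u = A[i], A[i] @ z
        if u == 0.0:
            continue                     # non-smooth term: weight set to zero
        w = 1.0 / (1.0 + beta / (np.abs(u) / psi[i]))   # weight (19)
        z = z - mu * w * (u - psi[i] * np.sign(u)) * a  # update (20)
    return z
```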

3. Simulation Results

This section evaluates the performance of the proposed SRPR algorithm against TWF [8], [10], TAF [11], RAF [24], [25], and STAF [23]. For fairness, each algorithm uses its suggested parameter values. We use the relative error := dist(𝒛, 𝒙)/‖𝒙‖ as the performance metric, where dist(𝒛, 𝒙) := min ‖𝒛 ± 𝒙‖ is the Euclidean distance, modulo the global sign, of the estimate 𝒛 ∈ ℝⁿ to the true solution set {±𝒙}. All simulation tests are performed over 100 independent Monte Carlo trials. In each trial, an algorithm runs 100 initialization iterations and 1,000 gradient refinement iterations. It is worth emphasizing that each iteration of a stochastic algorithm such as STAF or SRPR is counted here as m stochastic updates, i.e., one pass over the entire data. A trial is declared a success when the returned estimate attains a relative error below 10⁻⁵.
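For reference, the metric and the success test can be sketched as follows.

```python
import numpy as np

def relative_error(z, x):
    """dist(z, x) / ||x||, with dist(z, x) := min(||z - x||, ||z + x||)."""
    dist = min(np.linalg.norm(z - x), np.linalg.norm(z + x))
    return dist / np.linalg.norm(x)

# a trial counts as a success when relative_error(z, x) < 1e-5
```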

The first experiment compares the initialization stage, including the power method in TAF, the VR-OPI algorithm in STAF, and the SSVR-OPI algorithm proposed in this paper. We compare exact recovery and convergence speed to observe the performance of these initialization algorithms in large-scale applications. We use real- and complex-valued Gaussian models with data dimension n = 10000, and all tests are run at the known information-theoretic limits, i.e., m = 2n − 1 in the real case and m = 4n − 4 in the complex case. The first comparison is the error evolution of the iterates of each initialization algorithm, that is, the number of iterations each algorithm uses to reach its exact solution. The error is expressed in logarithmic form, defined as \(\log _{10}\left(1-\left\|\boldsymbol{B}^{\mathcal{T}} \boldsymbol{\omega}_{i}\right\|^{2} /\left\|\boldsymbol{B}^{\mathcal{T}} \boldsymbol{u}_{0}\right\|^{2}\right)\), where 𝒖₀ is the exact leading eigenvector of \(\overline{\boldsymbol{Y}}=\boldsymbol{B} \boldsymbol{B}^{\mathcal{T}}\) in (10), computed accurately by SVD.

Fig. 1 shows the error evolution of the power method, VR-OPI, and SSVR-OPI. Under the iteration budget, each epoch of SSVR-OPI only estimates the gradient snapshot from a sample rather than passing over the entire data to compute the full gradient as VR-OPI does. Hence the exact recovery of SSVR-OPI is slightly weaker than that of VR-OPI but significantly better than that of the power method. On the other hand, because SSVR-OPI does not pass over all the data in each epoch as VR-OPI does, it requires less computation time: about 0.813 s per iteration versus 1.129 s for VR-OPI in the real case, as shown in Table 1.

Fig. 1. Error evolution of the iterates using different initialization to solve the problem (11): (top) Noiseless real-valued Gaussian model with a data dimension of n = 10000 and 𝑚 = 2𝑛 − 1; (bottom) Noiseless complex-valued Gaussian model with a data dimension of n = 10000 and 𝑚 = 4𝑛 − 4.

Table 1. Comparison of time cost between VR-OPI and SSVR-OPI

In the second experiment, the algorithms, each equipped with its own initialization method and suggested parameter settings, are simulated to compare the empirical success rate and the convergence speed. We test separately under noiseless real and complex Gaussian models. In the real case, the signal is generated as 𝒙 ~ 𝒩(𝟎, 𝑰₁₀₀₀) with i.i.d. sampling vectors 𝒂ᵢ ~ 𝒩(𝟎, 𝑰₁₀₀₀); in the complex case, 𝒙 ~ 𝒞𝒩(𝟎, 𝑰₁₀₀₀) and 𝒂ᵢ ~ 𝒞𝒩(𝟎, 𝑰₁₀₀₀). Fig. 2 depicts the empirical success rate of all algorithms. For the real case, when m/n = 1.7 the success rate of SRPR exceeds 90%, and SRPR guarantees perfect recovery when m = 1.9n, as shown in Fig. 2.

Fig. 2. Empirical success rate with n = 1000. Noiseless real-valued Gaussian model with m/n varying by 0.1 from 1 to 5.

Fig. 3 compares the convergence speed of each algorithm with m = 2n − 1 in the real case and m = 4n − 4 in the complex case. It is worth mentioning that TWF is not included in Fig. 3 (top) because it did not perform well. SRPR solves the phase retrieval problem in fewer iterations than the other competitive algorithms.

Fig. 3. Relative error versus iterations for n = 1000. (top) Noiseless real-valued Gaussian model with m/n = 2. (bottom) Noiseless complex-valued Gaussian model with m/n = 4.

The last experiment tests the robustness of SRPR against additive noise. We set up a real noisy Gaussian model ψᵢ = |⟨𝒂ᵢ, 𝒙⟩| + ηᵢ with ηᵢ ~ 𝒩(0, σ²), where σ² is set to satisfy the signal-to-noise ratio (SNR)

\(S N R:=10 \log _{10} \sum_{i=1}^{m}\left|\left\langle\boldsymbol{a}_{i}, \boldsymbol{x}\right\rangle\right|^{2} / m \sigma^{2}\)       (21)

which varies from 10 dB to 50 dB, with the noisy data generated as \(\psi_{i}=\sqrt{y_{i}}\). Fig. 4 shows the normalized mean-square error

\(\mathrm{NMSE}:=\operatorname{dist}^{2}\left(\mathbf{z}_{T}, \boldsymbol{x}\right) /\|x\|^{2}\)        (22)

as a function of SNR for different m/n values. As seen from Fig. 4, the NMSE for every m/n decreases in proportion to the increase in SNR. Moreover, when the ratio m/n ranges from 3 to 6, the curves almost overlap, which demonstrates the stability of SRPR under additive noise.

Fig. 4. The normalized mean-square error as a function of SNR for different m/n values.
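As a sketch, noisy amplitudes at a prescribed SNR can be generated from (21) as follows (real-valued case; the seed is illustrative).

```python
import numpy as np

def noisy_amplitudes(A, x, snr_db, seed=4):
    """psi_i = |<a_i, x>| + eta_i with sigma^2 chosen to meet the SNR in (21)."""
    clean = np.abs(A @ x)
    sigma2 = np.mean(clean ** 2) / 10.0 ** (snr_db / 10.0)   # from (21)
    eta = np.random.default_rng(seed).normal(0.0, np.sqrt(sigma2), clean.shape)
    return clean + eta
```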

4. Conclusion

This paper proposed a stochastic reweighted gradient algorithm for large-scale phase retrieval problems. The algorithm has two stages: first, a new stochastic variance reduction method solves the estimation problem of the initialization stage; then, a new stochastic reweighted amplitude-based iteration refines the initial estimate. The simulation tests show that, compared with TAF, TWF, STAF, and RAF, SRPR improves both the recovery success rate and the convergence speed in the real and complex cases. Moreover, even approaching the information-theoretic limit, SRPR can still perfectly recover the signal, at about m = 1.9n under the real Gaussian model.

References

  1. J. C. Dainty and J. R. Fienup, "Phase Retrieval and Image Reconstruction for Astronomy," Image Recover. theory Appl., pp. 231-275, 1987.
  2. S. C. Mayo et al., "X-ray phase-contrast microscopy and microtomography," Opt. Express, vol. 11, no. 19, pp. 2289-2302, 2003. https://doi.org/10.1364/OE.11.002289
  3. R. P. Millane, "Phase retrieval in crystallography and optics," J. Opt. Soc. Am. A, vol. 7, no. 3, pp. 394-411, 1990. https://doi.org/10.1364/josaa.7.000394
  4. J. Miao, P. Charalambous, J. Kirz, and D. Sayre, "Extending the methodology of X-ray crystallography to allow imaging of micrometre-sized non-crystalline specimens," Nature, vol. 400, no. 6742, pp. 342-344, 1999. https://doi.org/10.1038/22498
  5. C. Jaramillo, R. G. Valenti, L. Guo, and J. Xiao, "Design and analysis of a single-Camera omnistereo sensor for quadrotor Micro Aerial Vehicles (MAVs)," Sensors (Switzerland), vol. 16, no. 2, 2016.
  6. R. Balan, P. Casazza, and D. Edidin, "On signal reconstruction without phase," Appl. Comput. Harmon. Anal., vol. 20, no. 3, pp. 345-356, 2006. https://doi.org/10.1016/j.acha.2005.07.001
  7. A. S. Bandeira, J. Cahill, D. G. Mixon, and A. A. Nelson, "Saving phase: Injectivity and stability for phase retrieval," Appl. Comput. Harmon. Anal., vol. 37, no. 1, pp. 106-125, 2014. https://doi.org/10.1016/j.acha.2013.10.002
  8. E. J. Candes, X. Li, and M. Soltanolkotabi, "Phase retrieval via Wirtinger flow: Theory and algorithms," IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 1985-2007, 2015. https://doi.org/10.1109/TIT.2015.2399924
  9. Y. C. Eldar and S. Mendelson, "Phase retrieval: Stability and recovery guarantees," Appl. Comput. Harmon. Anal., vol. 36, no. 3, pp. 473-494, 2014. https://doi.org/10.1016/j.acha.2013.08.003
  10. Y. Chen and E. J. Candes, "Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems," Commun. Pure Appl. Math., vol. 70, no. 5, pp. 822-883, 2017. https://doi.org/10.1002/cpa.21638
  11. G. Wang, G. B. Giannakis, and Y. C. Eldar, "Solving systems of random quadratic equations via truncated amplitude flow," IEEE Trans. Inf. Theory, vol. 64, no. 2, pp. 773-794, 2018. https://doi.org/10.1109/tit.2017.2756858
  12. Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, "Phase Retrieval with Application to Optical Imaging: A contemporary overview," IEEE Signal Process. Mag., vol. 32, no. 3, pp. 87-109, 2015. https://doi.org/10.1109/MSP.2014.2352673
  13. Y. Shechtman, A. Beck, and Y. C. Eldar, "GESPAR: Efficient phase retrieval of sparse signals," IEEE Trans. Signal Process., vol. 62, no. 4, pp. 928-938, 2014. https://doi.org/10.1109/TSP.2013.2297687
  14. Z. Xiao, Y. Zhang, K. Zhang, D. Zhao, and G. Gui, "GARLM: Greedy autocorrelation retrieval Levenberg-Marquardt algorithm for improving sparse phase retrieval," Appl. Sci., vol. 8, no. 10, 2018.
  15. Y. C. Eldar, P. Sidorenko, D. G. Mixon, S. Barel, and O. Cohen, "Sparse phase retrieval from shorttime fourier measurements," IEEE Signal Process. Lett., vol. 22, no. 5, pp. 638-642, 2015. https://doi.org/10.1109/LSP.2014.2364225
  16. K. Jaganathan, S. Oymak, and B. Hassibi, "Sparse phase retrieval: Uniqueness guarantees and recovery algorithms," IEEE Trans. Signal Process., vol. 65, no. 9, pp. 2402-2410, 2017. https://doi.org/10.1109/TSP.2017.2656844
  17. K. Jaganathan, Y. C. Eldar, and B. Hassibi, "STFT Phase Retrieval: Uniqueness Guarantees and Recovery Algorithms," IEEE J. Sel. Top. Signal Process., vol. 10, no. 4, pp. 770-781, Jun. 2016. https://doi.org/10.1109/JSTSP.2016.2549507
  18. T. Bendory, Y. C. Eldar and N. Boumal, "Non-Convex Phase Retrieval From STFT Measurements," IEEE Transactions on Information Theory, vol. 64, no. 1, pp. 467-484, Jan. 2018. https://doi.org/10.1109/tit.2017.2745623
  19. E. J. Candes, X. Li, and M. Soltanolkotabi, "Phase retrieval from coded diffraction patterns," Appl. Comput. Harmon. Anal., vol. 39, no. 2, pp. 277-299, 2015. https://doi.org/10.1016/j.acha.2014.09.004
  20. E. J. Candes, X. Li, and M. Soltanolkotabi, "Phase retrieval via wirtinger flow: Theory and algorithms," IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 1985-2007, 2015. https://doi.org/10.1109/TIT.2015.2399924
  21. P. Netrapalli, P. Jain, and S. Sanghavi, "Phase retrieval using alternating minimization," IEEE Trans. Signal Process., vol. 63, no. 18, pp. 4814-4826, 2015. https://doi.org/10.1109/TSP.2015.2448516
  22. E. J. Candes, T. Strohmer, and V. Voroninski, "PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming," Commun. Pure Appl. Math., vol. 66, no. 8, pp. 1241-1274, 2013. https://doi.org/10.1002/cpa.21432
  23. G. Wang, G. B. Giannakis, and J. Chen, "Scalable Solvers of Random Quadratic Equations via Stochastic Truncated Amplitude Flow," IEEE Trans. Signal Process., vol. 65, no. 8, pp. 1961-1974, 2017. https://doi.org/10.1109/TSP.2017.2652392
  24. G. Wang, G. B. Giannakis, Y. Saad, and J. Chen, "Phase Retrieval via Reweighted Amplitude Flow," IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2818-2833, 2018. https://doi.org/10.1109/tsp.2018.2818077
  25. L. Zhang, G. Wang, G. B. Giannakis, and J. Chen, "Compressive Phase Retrieval via Reweighted Amplitude Flow," IEEE Trans. Signal Process., vol. 66, no. 19, pp. 5029-5040, 2018. https://doi.org/10.1109/tsp.2018.2862395
  26. I. Waldspurger, A. D'Aspremont, and S. Mallat, "Phase recovery, MaxCut and complex semidefinite programming," Math. Program., vol. 149, no. 1-2, pp. 47-81, 2015. https://doi.org/10.1007/s10107-013-0738-9
  27. K. Huang, Y. C. Eldar, and N. D. Sidiropoulos, "Phase Retrieval from 1D Fourier Measurements: Convexity, Uniqueness, and Algorithms," IEEE Trans. Signal Process., vol. 64, no. 23, pp. 6105-6117, 2016. https://doi.org/10.1109/TSP.2016.2601291
  28. T. Goldstein and C. Studer, "PhaseMax: Convex Phase Retrieval via Basis Pursuit," IEEE Trans. Inf. Theory, vol. 64, no. 4, pp. 2675-2689, 2018. https://doi.org/10.1109/TIT.2018.2800768
  29. G. Wang, L. Zhang, G. B. Giannakis, M. Akcakaya, and J. Chen, "Sparse phase retrieval via truncated amplitude flow," IEEE Trans. Signal Process., vol. 66, no. 2, pp. 479-491, 2018. https://doi.org/10.1109/TSP.2017.2771733
  30. R. W. Gerchberg and W. O. Saxton, "A practical algorithm for the determination of phase from image and diffraction plane pictures," Optik (Stuttg), vol. 35, no. 2, pp. 237-246, 1972.
  31. J. R. Fienup, "Reconstruction of an object from the modulus of its Fourier transform," Opt. Lett., vol. 3, no. 1, pp. 27-29, 1978. https://doi.org/10.1364/OL.3.000027
  32. H. Zhang, Y. Zhou, Y. Liang, and Y. Chi, "Reshaped Wirtinger Flow and Incremental Algorithm for Solving Quadratic System of Equations," to be published.
  33. J. Sun, Q. Qu, and J. Wright, "A Geometric Analysis of Phase Retrieval," Found. Comput. Math., vol. 18, no. 5, pp. 1131-1198, 2018. https://doi.org/10.1007/s10208-017-9365-9
  34. T. Qiu and D. P. Palomar, "Undersampled Sparse Phase Retrieval via Majorization-Minimization," IEEE Trans. Signal Process., vol. 65, no. 22, pp. 5957-5969, 2017. https://doi.org/10.1109/TSP.2017.2745459
  35. R. Kolte and A. Ozgur, "Phase Retrieval via Incremental Truncated Wirtinger Flow," to be published.
  36. K. Wei, "Solving systems of phaseless equations via Kaczmarz methods: A proof of concept study," Inverse Probl., vol. 31, no. 12, 2015.
  37. R. Kolte and A. Ozgur, "Phase Retrieval via Incremental Truncated Wirtinger Flow," to be published.
  38. O. Shamir, "Fast stochastic algorithms for SVD and PCA: Convergence properties and convexity," in Proc. of 33rd International Conference on Machine Learning, ICML 2016, pp. 248-256, 2016.
  39. R. Johnson and T. Zhang, "Accelerating stochastic gradient descent using predictive variance reduction," Advances in Neural Information Processing Systems, vol. 26, pp. 1-9, 2013.
  40. E. Min, J. Long and J. Cui, "Analysis of the Variance Reduction in SVRG and a New Acceleration Method," IEEE Access, vol. 6, pp. 16165-16175, 2018. https://doi.org/10.1109/access.2018.2814212
  41. G. Wang, G. B. Giannakis and J. Chen, "Solving large-scale systems of random quadratic equations via stochastic truncated amplitude flow," in Proc. of 2017 25th European Signal Processing Conference (EUSIPCO), Kos, pp. 1420-1424, 2017.