Large-Scale Phase Retrieval via Stochastic Reweighted Amplitude Flow

  • Xiao, Zhuolei (College of Electronic and Optical Engineering & College of Microelectronics, Nanjing University of Posts and Telecommunications) ;
  • Zhang, Yerong (College of Electronic and Optical Engineering & College of Microelectronics, Nanjing University of Posts and Telecommunications) ;
  • Yang, Jie (College of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications)
  • Received : 2020.06.19
  • Accepted : 2020.11.15
  • Published : 2020.11.30

Abstract

Phase retrieval, the recovery of a signal from phaseless measurements, is generally considered an NP-hard problem. This paper adopts an amplitude-based nonconvex optimization cost function to develop a new stochastic gradient algorithm, named stochastic reweighted phase retrieval (SRPR). SRPR is a stochastic gradient iteration algorithm that runs in two stages: first, a truncated sample stochastic variance reduction algorithm computes an initial estimate; second, in the gradient refinement stage, an amplitude-based stochastic reweighted gradient iteration continuously updates and improves the initial estimate. Because of the stochastic method, each iteration in either stage of SRPR involves only one equation. Therefore, SRPR is simple, scalable, and fast. Compared with state-of-the-art phase retrieval algorithms, simulation results show that SRPR converges faster and requires fewer magnitude-only measurements to reconstruct the signal, in both the real and complex cases.

1. Introduction

Phase retrieval, the recovery of a signal from intensity- or magnitude-only measurements, arises in various fields of science and engineering, such as astronomy [1], electron microscopy [2], optics [3], and X-ray crystallography [4]. Generally, in optical imaging, due to the physical limitations of optical detectors [5], optical detection equipment such as photosensitive film, a charge-coupled device (CCD) camera, or the human eye can only record the intensity, not the phase, of a light wave. However, for an image, the structural information carried by its phase far exceeds that carried by its intensity. Hence, reconstructing the signal from intensity- or magnitude-only measurements is both challenging and of great significance.

Mathematically speaking, phase retrieval is the problem of recovering an n-dimensional real or complex signal 𝒙 from m linear transformation (or Fourier transform) measurements

\(y_{i}=\left|\left\langle\boldsymbol{a}_{i}, \boldsymbol{x}\right\rangle\right|^{2}, \quad 1 \leq i \leq m\)       (1)

where the given measurement vector is 𝒚 := [𝑦₁ ⋯ 𝑦ₘ]^𝒯, the sensing/feature vectors 𝒂ᵢ ∈ ℝⁿ or ℂⁿ are known, and the signal vector 𝒙 ∈ ℝⁿ or ℂⁿ is the unknown we seek. It is worth noting that 𝑦ᵢ denotes the intensity or squared modulus, whereas \(\psi_{i}:=\sqrt{y_{i}}\) denotes the magnitude or amplitude. Phrased differently, we want to uniquely recover 𝒙 (up to the global sign or phase) from the system of quadratic equations (1), or equivalently retrieve the sign or phase of \(\left\{\left\langle\boldsymbol{a}_{i}, \boldsymbol{x}\right\rangle\right\}_{i=1}^{m}\) in the real or complex setting. Due to the loss of sign or phase information, exact recovery of 𝒙 requires the number of measurements m and the dimension n of the sought signal to satisfy m ≥ 2n − 1 when 𝒂ᵢ, 𝒙 ∈ ℝⁿ, or m ≥ 4n − 4 when 𝒂ᵢ, 𝒙 ∈ ℂⁿ, and m = 2n − 1 is also necessary in the real case [6], [7]. Nevertheless, solving the system of quadratic equations (1) remains NP-hard, and its solution is always challenging.
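To make the measurement model concrete, the following is a minimal NumPy sketch of (1) under the real Gaussian design adopted later in the paper; the dimension, oversampling ratio, and seed are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                           # signal dimension (illustrative)
m = 2 * n - 1                     # information-theoretic limit in the real case

x = rng.standard_normal(n)        # unknown signal; known here only for testing
A = rng.standard_normal((m, n))   # rows are sensing vectors a_i ~ N(0, I_n)

y = (A @ x) ** 2                  # intensities  y_i = |<a_i, x>|^2
psi = np.sqrt(y)                  # amplitudes   psi_i = sqrt(y_i)
```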

1.1 Related Works

The measurements are generally corrupted by noise. When the additive noise follows a Gaussian distribution, we may use the least-squares method (which coincides with the maximum likelihood estimate of the original signal), and the problem of solving the system of quadratic equations (1) can be rewritten via one of the following three empirical loss functions: the intensity-based empirical loss [8], [9]

\(\operatorname{minimize}_{z \in \mathbb{R}^{n} / \mathbb{C}^{n}} \frac{1}{2 m} \sum_{i=1}^{m}\left(y_{i}-\left|\left\langle\boldsymbol{a}_{i}, \mathbf{z}\right\rangle\right|^{2}\right)^{2},\)       (2)

and its related Poisson likelihood [10]

\(\operatorname{minimize}_{\mathrm{z} \in \mathbb{R}^{n} / \mathbb{C}^{n}} \frac{1}{2 m} \sum_{i=1}^{m}\left|\left\langle\boldsymbol{a}_{\boldsymbol{i}}, \mathbf{z}\right\rangle\right|^{2}-y_{i} \log \left(\left|\left\langle\boldsymbol{a}_{i}, \mathbf{z}\right\rangle\right|^{2}\right),\)       (3)

and the amplitude-based empirical loss [11]

\(\operatorname{minimize}_{\mathbf{z} \in \mathbb{R}^{n} / \mathrm{C}^{n}} \frac{1}{2 m} \sum_{i=1}^{m}\left(\psi_{i}-\left|\left\langle\boldsymbol{a}_{i}, \mathbf{z}\right\rangle\right|\right)^{2}\)       (4)

Obviously, due to the quadratic terms in (2) and (3), or the modulus in (4), these empirical loss functions are nonconvex, so minimizing them involves many stationary points and the problem is NP-hard. In 1-dimensional (1D) phase retrieval, uniqueness fails because of spectral factorization, i.e., even up to trivial ambiguities (global phase shift, conjugate inversion, and spatial shift) [12], there are multiple solutions with the same magnitude. There are several approaches to overcome this ill-posedness. One can add constraints on the sought signal, such as sparsity or non-negativity [13]–[16]; another is to introduce redundancy into the measurements, including the short-time Fourier transform [17], [18], coded diffraction patterns using random masks and structured illuminations [19], or a random measurement assumption [10], [11], [20]–[25]. In this paper, we adopt the latter assumption: the measurements {𝜓ᵢ} come from random Gaussian designs {𝒂ᵢ} that are independently and identically distributed (i.i.d.), with 𝒂ᵢ ~ 𝒩(𝟎, 𝑰ₙ) in the real-valued Gaussian setting and 𝒂ᵢ ~ 𝒞𝒩(𝟎, 𝑰ₙ) := 𝒩(𝟎, 𝑰ₙ/2) + j𝒩(𝟎, 𝑰ₙ/2) in the complex-valued one.

There are two families of approaches to such problems: convex and nonconvex. Based on the so-called matrix-lifting technique, the convex approaches solve a semidefinite program (SDP), such as PhaseLift [22], PhaseCut [26], and CoRk [27]; other convex approaches rely on convex relaxation to reformulate the problem as sparse recovery, solving a basis pursuit problem in the dual domain, such as PhaseMax [28]. On the other hand, the nonconvex approaches include alternating projection (e.g., the Gerchberg-Saxton and Fienup methods) [30], [31], AltMinPhase [21], Wirtinger flow and its variants (WF/TWF/RWF) [10], [20], [32], trust-region methods [33], proximal linear algorithms [33], [34], sparse truncated amplitude flow (SPARTA) [29], and (stochastic) truncated amplitude flow variants (TAF/RAF/STAF) [11], [23]–[25]. These nonconvex methods operate directly on vector-valued optimization variables, so they enjoy significant computational advantages over convex methods based on matrix lifting.

However, although nonconvex optimization can attain the best statistical guarantees in noise-free or noisy environments, even under a random Gaussian design it still has several shortcomings. First, the nonconvex landscape contains many stationary points, which complicate the computation and often trap the optimization in a local optimum; our task is to develop an algorithm that approximates the global optimum building on existing algorithms. More troublesome is the existence of saddle points, which often stall the optimization; how to escape saddle points is also one of the primary purposes of this paper. Furthermore, given the information-theoretic limit m = 2n − 1 (or 4n − 4), it is very challenging to develop a practical algorithm that achieves perfect recovery with optimal statistical accuracy when the number of measurements is close to this limit. Finally, as applications and data scales keep growing, the existing incremental gradient methods [35] and stochastic gradient methods [23], [36] can no longer meet the requirements on recovery accuracy and convergence speed, so increasing the convergence speed and reducing the computational complexity in large-scale applications is also challenging.

1.2 Our Contribution

Inspired by TAF, this paper uses an amplitude-based nonconvex optimization formulation to develop a new stochastic gradient algorithm, referred to as stochastic reweighted phase retrieval (SRPR). SRPR is an iterative algorithm that builds on and extends TAF [11] so that it can be applied to large-scale phase retrieval applications such as imaging. The proposed algorithm has two stages. The first is the initialization stage, which adopts a sample stochastic variance reduction algorithm to compute the initial estimate; as in TAF, we use orthogonality promotion to determine the truncation threshold. In the gradient refinement stage, we use a weighted gradient method to update the initial estimate, with a stochastic gradient method to accelerate the process.

Even in a high-dimensional setting, SRPR can perfectly reconstruct any signal given the minimum number of equations (m = 2n − 1) without any additional assumptions. This paper uses a stochastic variance reduced gradient (SVRG) variant to compute the initial estimate in the initialization stage. Compared with the classic SVRG algorithm [23] and the power method with a truncation procedure [11], the new algorithm estimates the per-epoch gradient snapshot from a small sample instead of computing the full gradient, which can significantly reduce computation time. In the gradient refinement stage, we use a stochastic gradient method to update the initial estimate, with a truncation rule to eliminate wrong estimates. With these two stages, SRPR can accurately recover the signal even when the number of measurements approaches the information-theoretic limit. At the same time, at the relatively large scales of image applications, SRPR outperforms existing algorithms such as (T)AF [11], ITWF [37], (T)WF [8], [10], RAF [24], [25], and STAF [23] in exact recovery performance and convergence speed.

The rest of this article is organized as follows. Section 2 describes and analyzes the amplitude-based loss function and details the two stages of the SRPR algorithm. Simulated tests comparing SRPR with (T)WF [8], [10], TAF [11], RAF [24], [25], and STAF [23] are presented in Section 3.

Notation: Lower- (upper-) case boldface letters denote vectors (matrices). The superscript 𝒯 (ℋ) denotes transposition (conjugate transposition). The symbol ⌊∙⌋ stands for the largest integer not exceeding the given number. If 𝒙 ∈ ℝⁿ is a true solution of (1), then so is −𝒙. We define the Euclidean distance of any estimate 𝒛 ∈ ℝⁿ to the true solution set {±𝒙} of (1) as dist(𝒛, 𝒙) := min ‖𝒛 ± 𝒙‖.

2. Stochastic Reweighted Phase Retrieval

SRPR is based on TAF [11], and the stochastic gradient descent method is used in both the initialization and gradient refinement stages. We therefore detail SRPR stage by stage.

For concreteness, we mainly discuss the real-valued Gaussian model with a real signal 𝒙 ∈ ℝⁿ and i.i.d. Gaussian random designs 𝒂ᵢ ~ 𝒩(𝟎, 𝑰ₙ), 1 ≤ i ≤ m. Nevertheless, SRPR also applies to the complex-valued Gaussian model with a complex vector 𝒙 ∈ ℂⁿ and i.i.d. design vectors 𝒂ᵢ ~ 𝒞𝒩(𝟎, 𝑰ₙ) := 𝒩(𝟎, 𝑰ₙ/2) + j𝒩(𝟎, 𝑰ₙ/2). In this paper, using the least-squares criterion, solving the phaseless quadratic equations (1) is converted into minimizing the amplitude-based empirical loss function

\(\underset{\mathbf{z} \in \mathbb{R}^{n}}{\operatorname{minimize}} \ell(\mathbf{z}):=\frac{1}{2 m} \sum_{i=1}^{m}\left(\psi_{i}-\left|\left\langle\boldsymbol{a}_{i}, \mathbf{z}\right\rangle\right|\right)^{2}\)       (5)
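For later reference, the loss (5) can be evaluated in a few lines; a minimal sketch (reusing the arrays A and psi from the model sketch in Section 1) is:

```python
import numpy as np

def amplitude_loss(z, A, psi):
    """Amplitude-based empirical loss (5): (1/2m) * sum_i (psi_i - |a_i^T z|)^2."""
    return 0.5 * np.mean((psi - np.abs(A @ z)) ** 2)
```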

2.1 Sample Stochastic Variance Reducing Initialization

In this section, we elaborate on the initialization stage. Since (5) is nonconvex and nonsmooth, obtaining a good initialization is essential. We develop a new initialization method based on the orthogonality-promoting initialization (OPI) of TAF. Specifically, since random vectors in high-dimensional space are approximately pairwise orthogonal [11], we adopt

\(\cos ^{2} \theta_{i}:=\frac{\left|\left\langle\boldsymbol{a}_{i}, \boldsymbol{x}\right\rangle\right|^{2}}{\left\|\boldsymbol{a}_{i}\right\|^{2}\|\boldsymbol{x}\|^{2}}=\frac{\psi_{i}^{2}}{\left\|\boldsymbol{a}_{i}\right\|^{2}\|\boldsymbol{x}\|^{2}}, \quad 1 \leq i \leq m\)      (6)

to represent the geometric relationship between 𝒂ᵢ and 𝒙, where θᵢ is the angle between 𝒂ᵢ and 𝒙. The larger θᵢ is, the smaller cos²θᵢ is, indicating that 𝒂ᵢ and 𝒙 are closer to orthogonal. Based on this feature, we can approximate 𝒙 by finding the vector 𝒛 most orthogonal to \(\{\boldsymbol{a}_{i}\}_{i \in \mathcal{J}_{0}}\), where the index set \(\mathcal{J}_{0}\) contains the indices i for which cos²θᵢ is smallest.

Without loss of generality, assuming ‖𝒙‖ = 1, we find the estimated vector \(\tilde{\mathbf{z}}_{0}\) through

\(\tilde{\mathbf{z}}_{0}:=\arg \min _{\|\mathbf{z}\|=1} \mathbf{z}^{\mathcal{T}} \boldsymbol{Y} \mathbf{z}\)       (7)

where \(\boldsymbol{Y}:=\frac{1}{\left|\mathcal{J}_{0}\right|} \sum_{i \in \mathcal{J}_{0}} \frac{\boldsymbol{a}_{i} \boldsymbol{a}_{i}^{\mathcal{T}}}{\left\|\boldsymbol{a}_{i}\right\|^{2}}\) and \(\left|\mathcal{J}_{0}\right|\) is on the order of n. On the other hand, when ‖𝒙‖ ≠ 1, we can scale \(\tilde{\mathbf{z}}_{0}\) by a norm estimate of 𝒙 to obtain the initial estimate \(\boldsymbol{z}_{0}:=\sqrt{\frac{1}{m} \sum_{i=1}^{m} y_{i}} \tilde{\mathbf{z}}_{0}\) [11].

Solving (7) amounts to computing the eigenvector corresponding to the smallest eigenvalue of 𝒀, which requires an eigendecomposition or matrix inversion with computational complexity 𝒪(n³). Define \(\overline{\mathcal{J}}_{0}\) as the complement of \(\mathcal{J}_{0}\) in [m]. To reduce the computational complexity, note that by the strong law of large numbers (SLLN), for the i.i.d. standard Gaussian vectors \(\left\{\boldsymbol{a}_{i} \sim \mathcal{N}\left(0, I_{n}\right)\right\}_{i=1}^{m}\), we have

\(\frac{1}{m} \sum_{i=1}^{m} \frac{\boldsymbol{a}_{i} \boldsymbol{a}_{i}^{\mathcal{T}}}{\left\|\boldsymbol{a}_{i}\right\|^{2}} \approx \mathbb{E}\left[\frac{\boldsymbol{a}_{i} \boldsymbol{a}_{i}^{\mathcal{T}}}{\left\|\boldsymbol{a}_{i}\right\|^{2}}\right]=\frac{1}{n} I_{n}\)       (8)

where 𝔼[∙] expresses expectations. Thus

\(\sum_{i \in J_{0}} \frac{\boldsymbol{a}_{i} \boldsymbol{a}_{i}^{\mathcal{T}}}{\left\|\boldsymbol{a}_{i}\right\|^{2}}=\sum_{i \in[m]} \frac{\boldsymbol{a}_{i} \boldsymbol{a}_{i}^{\mathcal{T}}}{\left\|\boldsymbol{a}_{i}\right\|^{2}}-\sum_{i \in \bar{J}_{0}} \frac{\boldsymbol{a}_{i} \boldsymbol{a}_{i}^{\mathcal{T}}}{\left\|\boldsymbol{a}_{i}\right\|^{2}} \cong \frac{m}{\mathrm{n}} I_{n}-\sum_{i \in \bar{J}_{0}} \frac{\boldsymbol{a}_{i} \boldsymbol{a}_{i}^{\mathcal{T}}}{\left\|\boldsymbol{a}_{i}\right\|^{2}}\)       (9)

Defining \(\boldsymbol{B}=\left[\boldsymbol{a}_{i} /\left\|\boldsymbol{a}_{i}\right\|\right]_{i \in \overline{\mathcal{J}}_{0}}\), the problem of finding the eigenvector associated with the smallest eigenvalue of 𝒀 in (7) reduces to finding the eigenvector associated with the largest eigenvalue of \(\overline{\boldsymbol{Y}}=\boldsymbol{B} \boldsymbol{B}^{\mathcal{T}}\), i.e.

\(\tilde{\mathbf{z}}_{0}:=\arg \max _{\|\mathbf{z}\|=1} \mathbf{z}^{\mathcal{T}} \overline{\boldsymbol{Y}} \mathbf{z}\)        (10)

which is a typical principal component analysis (PCA) problem. On the one hand, WF/TWF/TAF [8], [10], [11] use the power method to solve this problem. The computational complexity the power method requires to obtain an ϵ-accurate solution is \(\mathcal{O}\left(\frac{1}{\delta} n\left|\overline{\mathcal{J}}_{0}\right| \log (1 / \epsilon)\right)\), where the eigengap δ > 0 is defined as the difference between the largest and second-largest eigenvalues [40]. However, the power method is not suitable for relatively high-dimensional data such as image applications, especially when the eigengap is small. On the other hand, stochastic algorithms can be used to solve (10). STAF uses the variance-reducing orthogonality-promoting initialization (VR-OPI) algorithm [11], an instance of the stochastic variance reduced gradient (SVRG) method [38]. It consists of a series of epochs, each containing a large number of iterative updates. At the start of every epoch, VR-OPI scans the entire dataset and computes a snapshot of the full gradient; in each iteration within an epoch, the algorithm randomly selects one instance from \(\left|\overline{\mathcal{J}}_{0}\right|\) to compute a stochastic gradient, and combines the outer-loop (epoch) gradient with the inner-loop iterations to reduce the variance of the stochastic gradient and achieve effective convergence [39]. However, when the dataset contains many instances and the data dimension is large, the full-gradient computation in every epoch requires a scan of the entire dataset, so the full-gradient snapshots entail a large number of gradient evaluations and are very time-consuming. If the full-gradient snapshots could be avoided in each epoch, VR-OPI [11] could be significantly accelerated, yielding better convergence performance.
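For reference, a minimal sketch of the power method applied to (10) follows; it assumes B holds the normalized vectors \(\boldsymbol{a}_{i} /\left\|\boldsymbol{a}_{i}\right\|\), \(i \in \overline{\mathcal{J}}_{0}\), as its columns, and the iteration count and seed are illustrative.

```python
import numpy as np

def power_method(B, num_iter=100, seed=1):
    """Leading eigenvector of Ybar = B B^T by power iteration (sketch)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(B.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(num_iter):
        w = B @ (B.T @ w)        # apply Ybar without forming the n-by-n matrix
        w /= np.linalg.norm(w)
    return w
```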

This paper uses a new acceleration algorithm, sampleVR [40], to solve the orthogonality-promoting initialization problem (10); we name the result sample stochastic variance reducing orthogonality-promoting initialization (SSVR-OPI), summarized in Algorithm 1. Specifically, SSVR-OPI iterates over two nested loops: each execution of the inner loop is called an iteration, and each execution of the outer loop is called an epoch. SSVR-OPI runs S epochs in total, and each epoch contains T iterations. The core of the algorithm is to estimate the full-gradient snapshot \(\frac{1}{\left|\overline{\mathcal{J}}_{0}\right|} \sum_{j=1}^{\left|\overline{\mathcal{J}}_{0}\right|} \boldsymbol{a}_{i_{j}} \boldsymbol{a}_{i_{j}}^{\mathcal{T}} \widetilde{\boldsymbol{\omega}}_{s+1}\) by the stochastic gradient \(g=\frac{1}{k} \sum_{j=1}^{k} \boldsymbol{a}_{i_{j}} \boldsymbol{a}_{i_{j}}^{\mathcal{T}} \widetilde{\boldsymbol{\omega}}_{s+1}\) in Step 7, which replaces the per-epoch full-gradient snapshot of VR-OPI [11] and thereby reduces the computational complexity. The stochastic gradient g is obtained from \(k\left(k \ll\left|\overline{\mathcal{J}}_{0}\right|\right)\) sampled instances in the inner loop, representing the average stochastic gradient over the k instances, and is used in Step 4 as the update direction for the next epoch.
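A hedged sketch of this scheme is given below: it mirrors the structure just described (S epochs of T variance-reduced inner iterations, with the per-epoch snapshot estimated from k sampled instances), but the step size, loop counts, and normalization are illustrative choices rather than the exact steps of Algorithm 1.

```python
import numpy as np

def ssvr_opi(B, S=10, T=50, k=20, eta=1e-2, seed=2):
    """Sketch of SSVR-OPI: SVRG-style PCA updates with a sampled snapshot."""
    rng = np.random.default_rng(seed)
    n, N = B.shape                      # columns of B are a_i / ||a_i||
    w_tilde = rng.standard_normal(n)
    w_tilde /= np.linalg.norm(w_tilde)
    for _ in range(S):                  # outer loop: epochs
        idx = rng.choice(N, size=k, replace=False)
        Bk = B[:, idx]
        g = Bk @ (Bk.T @ w_tilde) / k   # sampled snapshot replacing the full gradient
        w = w_tilde.copy()
        for _ in range(T):              # inner loop: iterations
            b = B[:, rng.integers(N)]   # one randomly selected instance
            grad = b * (b @ w) - b * (b @ w_tilde) + g  # variance-reduced direction
            w = w + eta * grad          # ascent step: (10) maximizes w^T Ybar w
            w /= np.linalg.norm(w)
        w_tilde = w
    return w_tilde
```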

We can rewrite (10) as

\(F(\boldsymbol{\omega})=\boldsymbol{\omega}^{\mathcal{T}} \overline{\boldsymbol{Y}} \boldsymbol{\omega}=\frac{1}{\left|\overline{\mathcal{J}}_{0}\right|} \sum_{i \in \overline{\mathcal{J}}_{0}} f_{i}(\boldsymbol{\omega})\)         (11)

where \(f_{i}(\boldsymbol{\omega})=\boldsymbol{\omega}^{\mathcal{T}} \boldsymbol{a}_{i} \boldsymbol{a}_{i}^{\mathcal{T}} \boldsymbol{\omega}\) is L-Lipschitz continuous, i.e., for all 𝝎ᵢ and 𝝎ⱼ, \(\left\|f_{i}\left(\boldsymbol{\omega}_{i}\right)-f_{i}\left(\boldsymbol{\omega}_{j}\right)\right\| \leq L\left\|\boldsymbol{\omega}_{i}-\boldsymbol{\omega}_{j}\right\|\). The following Proposition 1 [40, Theorem 2] shows that SSVR-OPI makes the objective converge at a linear rate, up to a constant neighborhood of the optimum, as proven in [40]. Since SSVR-OPI does not compute the full gradient in every epoch but only the average stochastic gradient over k instances, it is significantly better than other stochastic algorithms in terms of computational complexity. These advantages can be seen in the simulation tests in Section 3.

Proposition 1 [40]: Let k denote the number of sampled instances in the s-th epoch. For any μ > 0, when \(\delta=\frac{1+4 L \mu\left|\overline{\mathcal{J}}_{0}\right| \eta^{2}}{\mu\left|\overline{\mathcal{J}}_{0}\right| \eta\left(1-2 \eta L-\frac{\eta}{\mu}\right)}<1\) holds, F(𝝎) converges as

\(\mathbb{E}\left[F\left(\widetilde{\boldsymbol{\omega}}_{s+1}\right)-F\left(\widetilde{\boldsymbol{\omega}}_{*}\right)\right] \leq \delta \mathbb{E}\left[F\left(\widetilde{\boldsymbol{\omega}}_{s}\right)-F\left(\widetilde{\boldsymbol{\omega}}_{*}\right)\right]+\mathcal{O}\left(\left(\frac{\left|\overline{\mathcal{J}}_{0}\right|-k}{\left|\overline{\mathcal{J}}_{0}\right|}\right)^{2}\right)\)       (12)

Although the sampled gradient estimate g is unbiased, it still deviates from the true full gradient with some variance, especially when the estimate approaches the optimum. We can reduce this variance by increasing the number k of sampled instances. A larger k, however, incurs additional computational cost when estimating the full gradient g, whereas a smaller k increases the variance between the full gradient and its estimate. We therefore need to balance k between computational cost and variance.

Let

\(g=\frac{1}{k} \sum_{j=1}^{k}\left\|\nabla f_{i_{j}}\left(\widetilde{\boldsymbol{\omega}}_{s+1}\right)\right\|=\frac{1}{k} \sum_{j=1}^{k}\left\|\boldsymbol{a}_{i_{j}} \boldsymbol{a}_{i_{j}}^{\mathcal{T}} \widetilde{\boldsymbol{\omega}}_{s+1}\right\|,\)       (13)

and its expectation is \(\mathbb{E}(g)=\frac{1}{\left|\overline{\mathcal{J}}_{0}\right|} \sum_{j=1}^{\left|\overline{\mathcal{J}}_{0}\right|}\left\|\nabla f_{j}\left(\widetilde{\boldsymbol{\omega}}_{s+1}\right)\right\|\). Since 𝑓ᵢ(𝝎) is L-Lipschitz continuous, ‖∇𝑓ᵢ(𝝎)‖ ≤ L. According to Hoeffding's inequality, we have

\(\alpha=P(|g-\mathbb{E}(g)| \geq t) \leq 2 e^{\frac{-2 k^{2} t^{2}}{\sum_{i=1}^{k}(L-0)^{2}}}=2 \mathrm{e}^{\frac{-2 k^{2} t^{2}}{k L^{2}}}=2 \mathrm{e}^{\frac{-2 k t^{2}}{L^{2}}},\)       (14)

where α denotes the significance level and 1 − α is the confidence that g − 𝔼(g) lies in the interval [−t, t]; hence k should satisfy

\(k \geq-L^{2} \frac{\log \frac{\alpha}{2}}{2 t^{2}}=-\frac{\log \frac{\alpha}{2}}{2 t^{2} / L^{2}}=-\frac{\log \frac{\alpha}{2}}{\epsilon}\)       (15)

where ϵ = 2t²/L² is a positive real number. In this way, the value of k is determined by ϵ: as the iterations proceed, decreasing ϵ continuously increases k, so the variance of the estimate keeps decreasing and significant convergence is achieved.
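As a sketch, the resulting sample-size rule can be coded directly from (15); the values of α, t, and L below are illustrative.

```python
import numpy as np

def sample_size(alpha, t, L):
    """Smallest k meeting the Hoeffding bound (15) for level alpha and tolerance t."""
    eps = 2.0 * t ** 2 / L ** 2
    return int(np.ceil(-np.log(alpha / 2.0) / eps))

print(sample_size(0.05, 0.1, 1.0))   # -> 185
```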

[Algorithm 1. Sample stochastic variance reducing orthogonality-promoting initialization (SSVR-OPI): pseudocode figure]

2.2 Stochastic Reweighted Gradient Stage

The amplitude-based empirical loss function

\(\ell(\mathbf{z})=\frac{1}{2 m} \sum_{i=1}^{m} \ell_{i}(\mathbf{z})=\frac{1}{2 m} \sum_{i=1}^{m}\left(\psi_{i}-\left|\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}\right|\right)^{2}\)       (16)

in (5) is a non-convex non-smooth function, and its gradient is

\(\nabla \ell(\mathbf{z})=\frac{1}{m} \sum_{i=1}^{m} \nabla \ell_{i}(\mathbf{z})=\frac{1}{m} \sum_{i=1}^{m}\left(\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}-\psi_{i} \frac{\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}}{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}\right|}\right) \boldsymbol{a}_{i}\)       (17)

The term \(\frac{\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}}{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}\right|}\) introduces a bias into the update direction whenever \(\frac{\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}_{t}}{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}_{t}\right|} \neq \frac{\boldsymbol{a}_{i}^{\mathcal{T}} \boldsymbol{x}}{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \boldsymbol{x}\right|}\), which may make the corresponding gradient component too large and result in a "bad/misleading" search direction. Such gradient components may move z away from x, especially when the number of measurements approaches the information-theoretic limit m = 2n − 1 or m = 4n − 4.

There are several ways to address this problem. TAF [11] introduces a truncation rule to detect and remove these "bad" gradient components so as to maintain a correct search direction; however, this may discard too many gradient components or retain defective ones, thereby affecting the search direction. STAF [41] goes a step further: it refines the initial estimate by stochastic truncated gradient iterations, where each iteration evaluates the current generalized gradient of one component selected uniformly at random or with a given probability, and its truncation rule is very likely to reject the "bad" gradient components. RAF [24], [25] takes a different approach and adopts a (timely) adaptive weighted gradient method, attaching a different weight ωᵢ to each gradient component ∇ℓᵢ(𝒛) at the current point 𝒛ₜ in order to find a suitable search direction for the current iteration, i.e.

\(\mathbf{z}_{t+1}=\mathbf{z}_{t}-\mu \nabla \ell\left(\mathbf{z}_{t}\right)=\mathbf{z}_{t}-\frac{\mu}{m} \sum_{i=1}^{m} \omega_{i} \nabla \ell_{i}\left(\mathbf{z}_{t}\right)\)       (18)

The weights ωᵢ attached to the different gradient components ∇ℓᵢ(𝒛) are closely related to the truncation rule \(\mathcal{L}:=\left\{1 \leq i \leq m: \frac{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}\right|}{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \boldsymbol{x}\right|} \geq \frac{1}{1+\gamma}\right\}\) in TAF. Studies in [11] have shown that a gradient component with a large ratio \(\frac{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}\right|}{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \boldsymbol{x}\right|}\) has a high probability of pointing toward the true value x. If \(\frac{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}\right|}{\left|\boldsymbol{a}_{i}^{\mathcal{T}} \boldsymbol{x}\right|}\) is regarded as the confidence parameter α of the corresponding gradient component, then the gradient components can be distinguished according to their confidence, which determines their contribution to the overall search direction.

TAF/STAF discard, via the truncation rule, the gradient components whose α is relatively small. RAF instead weights all gradient components: components with large confidence parameters receive large weights, while the "bad" components receive small weights. Based on the truncation rule, the weight ωᵢ corresponding to gradient component i is

\(\omega_{i}:=\frac{1}{1+\beta_{i} /\left(\left|\boldsymbol{a}_{\boldsymbol{i}}^{*} \mathbf{z}\right| /\left|\boldsymbol{a}_{\boldsymbol{i}}^{*} \boldsymbol{x}\right|\right)}, \quad 1 \leq i \leq m\)       (19)

where {βᵢ}₁≤ᵢ≤ₘ are pre-selected parameters. In this way, compared with TAF/STAF, the weights ωᵢ not only reinforce the contribution of good gradient components but also retain the information in weak ones.
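Since ψᵢ = |⟨𝒂ᵢ, 𝒙⟩|, the ratio in (19) is directly computable from the data. A minimal sketch of the weighting rule, with a single illustrative β shared by all i, reads:

```python
import numpy as np

def raf_weights(z, A, psi, beta=10.0):
    """Weights (19), using the observed amplitude psi_i for |a_i^T x|."""
    ratio = np.abs(A @ z) / psi      # confidence of each gradient component
    return ratio / (ratio + beta)    # equivalent to 1 / (1 + beta / ratio)
```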

In large-scale applications, the data dimension is high (and there are more samples), so full passes over the data by the classic gradient descent (GD) method are very costly in time. Hence, in the gradient refinement stage we adopt the stochastic gradient descent (SGD) method, with its fast convergence and low per-iteration complexity. On the one hand, the number of measurements existing methods need for exact recovery has not reached the information-theoretic limit m = 2n − 1: TWF needs m ≥ 4.5n noise-free measurements for an ϵ-accurate solution, TAF needs about 3n, while STAF, which also adopts a stochastic algorithm in the gradient stage, needs only about 2.3n.

On the other hand, as can be seen from [33], phase retrieval is essentially a nonconvex optimization problem, so there are many saddle points in its search space, and gradient iterations are likely to get stuck at them. The SGD method can automatically escape saddle points and poor local optima, and converge to a local minimum with strong generalization.

Therefore, to better apply the RAF-style weighted gradient optimization to large-scale data, we introduce the stochastic gradient method into the weighted gradient iteration and refine the initial estimate 𝒛₀ through the resulting stochastic reweighted gradient iterations, thereby accelerating the gradient refinement stage. Specifically, SRPR continuously updates z using the stochastic reweighted gradient iteration

\(\mathbf{z}_{t+1}=\mathbf{z}_{t}-\mu \omega_{k_{t}} \nabla \ell_{k_{t}}\left(\mathbf{z}_{t}\right)=\mathbf{z}_{t}-\mu \omega_{k_{t}}\left(\boldsymbol{a}_{k_{t}}^{\mathcal{T}} \mathbf{z}_{t}-\psi_{k_{t}} \frac{\boldsymbol{a}_{k_{t}}^{\mathcal{T}} \mathbf{z}_{t}}{\left|\boldsymbol{a}_{k_{t}}^{\mathcal{T}} \mathbf{z}_{t}\right|}\right) \boldsymbol{a}_{k_{t}}\)       (20)

where the step size μ is set to a constant. Each SRPR iteration selects the gradient component ∇ℓ_{k_t}(𝒛ₜ) by randomly sampling the index k_t, where t is the current iteration number; the index k_t may be sampled uniformly from {1, 2, ⋯, m} or with a given probability. After obtaining the gradient component ∇ℓ_{k_t}(𝒛ₜ), we use (19) to evaluate its weight ω_{k_t}. The proposed SRPR algorithm is summarized in Algorithm 2.

[Algorithm 2. Stochastic reweighted phase retrieval (SRPR): pseudocode figure]

It is worth emphasizing that this stochastic reweighted gradient method is very effective. If the randomly selected gradient component is reliable, its weight is large, so it makes a strong contribution to the search direction in the current iteration. Conversely, if the gradient component is unreliable, its weight is very small, so the current iteration performs almost no refinement. Moreover, if the current iterate 𝒛ₜ satisfies \(\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}_{t}=0\), i.e., the loss is non-smooth there, the weight is set to ωᵢ = 0 to eliminate the non-smooth term and prevent it from affecting the search direction and the iteration. The specific experimental results are presented in Section 3.
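The sketch below puts the refinement stage together for the real-valued case, using the ψ-based weights above; the step size, β, and iteration count are illustrative choices, and the non-smooth case \(\boldsymbol{a}_{i}^{\mathcal{T}} \mathbf{z}_{t}=0\) receives zero weight as just described.

```python
import numpy as np

def srpr_refine(z0, A, psi, mu=0.5, beta=10.0, num_iter=5000, seed=3):
    """Sketch of the stochastic reweighted refinement (20), real-valued case."""
    rng = np.random.default_rng(seed)
    z, m = z0.copy(), A.shape[0]
    for _ in range(num_iter):
        i = rng.integers(m)              # sample the index k_t uniformly
        a, u = A[i], A[i] @ z
        if u == 0.0:
            continue                     # non-smooth term: weight set to zero
        w = 1.0 / (1.0 + beta / (np.abs(u) / psi[i]))   # weight (19)
        z = z - mu * w * (u - psi[i] * np.sign(u)) * a  # update (20)
    return z
```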

3. Simulation Results

This section evaluates the performance of the proposed SRPR algorithm against TWF [8], [10], TAF [11], RAF [24], [25], and STAF [23]. For fairness, each algorithm uses its suggested parameter values. We use the relative error := dist(𝒛, 𝒙)/‖𝒙‖ as the performance metric, where dist(𝒛, 𝒙) := min ‖𝒛 ± 𝒙‖ is the Euclidean distance, modulo the global sign, of the estimate 𝒛 ∈ ℝⁿ to the true solution set {±𝒙}. All simulation tests are performed over 100 independent Monte Carlo trials. In each trial, an algorithm runs 100 initialization iterations and 1,000 gradient refinement iterations. It is worth emphasizing that each iteration of a stochastic algorithm such as STAF or SRPR is counted here as m stochastic updates, i.e., one pass over the entire data. A trial is declared a success when the returned estimate attains a relative error below 10⁻⁵.
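For reference, the metric and the success test can be sketched as follows.

```python
import numpy as np

def relative_error(z, x):
    """dist(z, x) / ||x||, with dist(z, x) := min(||z - x||, ||z + x||)."""
    dist = min(np.linalg.norm(z - x), np.linalg.norm(z + x))
    return dist / np.linalg.norm(x)

# a trial counts as a success when relative_error(z, x) < 1e-5
```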

The first experiment compares the initialization stage, including the power method in TAF, the VR-OPI algorithm in STAF, and the SSVR-OPI algorithm proposed in this paper. We compare exact recovery and convergence speed to observe the performance of these initialization algorithms in large-scale applications. We use real- and complex-valued Gaussian models with data dimension n = 10000, and all tests are run at the known information-theoretic limits, i.e., m = 2n − 1 in the real case and m = 4n − 4 in the complex case. The first comparison is the error evolution of the iterates of each initialization algorithm, that is, the number of iterations each algorithm uses to reach its exact solution. The error is expressed in logarithmic form, defined as \(\log _{10}\left(1-\left\|\boldsymbol{B}^{\mathcal{T}} \boldsymbol{\omega}_{i}\right\|^{2} /\left\|\boldsymbol{B}^{\mathcal{T}} \boldsymbol{u}_{0}\right\|^{2}\right)\), where 𝒖₀ is the exact leading eigenvector of \(\overline{\boldsymbol{Y}}=\boldsymbol{B} \boldsymbol{B}^{\mathcal{T}}\) in (10), computed accurately by SVD.

Fig. 1 shows the error evolution of the power method, VR-OPI, and SSVR-OPI. Under the iteration budget, each epoch of SSVR-OPI only estimates the gradient snapshot from a sample rather than passing over the entire data to compute the full gradient as VR-OPI does. Hence the exact recovery of SSVR-OPI is slightly weaker than that of VR-OPI but significantly better than that of the power method. On the other hand, because SSVR-OPI does not pass over all the data in each epoch as VR-OPI does, it requires less computation time: about 0.813 s per iteration versus 1.129 s for VR-OPI in the real case, as shown in Table 1.

Fig. 1. Error evolution of the iterates using different initialization to solve the problem (11): (top) Noiseless real-valued Gaussian model with a data dimension of n = 10000 and 𝑚 = 2𝑛 − 1; (bottom) Noiseless complex-valued Gaussian model with a data dimension of n = 10000 and 𝑚 = 4𝑛 − 4.

Table 1. Comparison of time cost between VR-OPI and SSVR-OPI

In the second experiment, the algorithms, each equipped with its own initialization method and suggested parameter settings, are simulated to compare the empirical success rate and the convergence speed. We test separately under noiseless real and complex Gaussian models. In the real case, the signal is generated as 𝒙 ~ 𝒩(𝟎, 𝑰₁₀₀₀) with i.i.d. sampling vectors 𝒂ᵢ ~ 𝒩(𝟎, 𝑰₁₀₀₀); in the complex case, 𝒙 ~ 𝒞𝒩(𝟎, 𝑰₁₀₀₀) and 𝒂ᵢ ~ 𝒞𝒩(𝟎, 𝑰₁₀₀₀). Fig. 2 depicts the empirical success rate of all algorithms. For the real case, when m/n = 1.7 the success rate of SRPR exceeds 90%, and SRPR guarantees perfect recovery when m = 1.9n, as shown in Fig. 2.

Fig. 2. Empirical success rate with n = 1000. Noiseless real-valued Gaussian model with m/n varying by 0.1 from 1 to 5.

Fig. 3 compares the convergence speed of each algorithm with m = 2n − 1 in the real case and m = 4n − 4 in the complex case. It is worth mentioning that TWF is not included in Fig. 3 (top) because it did not perform well. SRPR solves the phase retrieval problem in fewer iterations than the other competitive algorithms.

Fig. 3. Relative error versus iterations for n = 1000. (top) Noiseless real-valued Gaussian model with m/n = 2. (bottom) Noiseless complex-valued Gaussian model with m/n = 4.

The last experiment tests the robustness of SRPR against additive noise. We set up a real noisy Gaussian model ψᵢ = |⟨𝒂ᵢ, 𝒙⟩| + ηᵢ with ηᵢ ~ 𝒩(0, σ²), where σ² is set to satisfy the signal-to-noise ratio (SNR)

\(S N R:=10 \log _{10} \sum_{i=1}^{m}\left|\left\langle\boldsymbol{a}_{i}, \boldsymbol{x}\right\rangle\right|^{2} / m \sigma^{2}\)       (21)

which varies from 10 dB to 50 dB, with the noisy data generated as \(\psi_{i}=\sqrt{y_{i}}\). Fig. 4 shows the normalized mean-square error

\(\mathrm{NMSE}:=\operatorname{dist}^{2}\left(\mathbf{z}_{T}, \boldsymbol{x}\right) /\|x\|^{2}\)        (22)

as a function of SNR for different m/n values. As seen from Fig. 4, the NMSE for every m/n decreases in proportion to the increase in SNR. Moreover, when the ratio m/n ranges from 3 to 6, the curves almost overlap, which demonstrates the stability of SRPR under additive noise.

Fig. 4. The normalized mean-square error as a function of SNR for different m/n values.
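As a sketch, noisy amplitudes at a prescribed SNR can be generated from (21) as follows (real-valued case; the seed is illustrative).

```python
import numpy as np

def noisy_amplitudes(A, x, snr_db, seed=4):
    """psi_i = |<a_i, x>| + eta_i with sigma^2 chosen to meet the SNR in (21)."""
    clean = np.abs(A @ x)
    sigma2 = np.mean(clean ** 2) / 10.0 ** (snr_db / 10.0)   # from (21)
    eta = np.random.default_rng(seed).normal(0.0, np.sqrt(sigma2), clean.shape)
    return clean + eta
```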

4. Conclusion

This paper proposed a stochastic reweighted gradient algorithm for large-scale phase retrieval problems. The algorithm has two stages: first, a new stochastic variance reduction method solves the estimation problem of the initialization stage; then, a new stochastic reweighted amplitude-based iteration refines the initial estimate. The simulation tests show that, compared with TAF, TWF, STAF, and RAF, SRPR improves both the recovery success rate and the convergence speed in the real and complex cases. Moreover, even approaching the information-theoretic limit, SRPR can still perfectly recover the signal, at about m = 1.9n under the real Gaussian model.

References

  1. J. C. Dainty and J. R. Fienup, "Phase Retrieval and Image Reconstruction for Astronomy," Image Recover. theory Appl., pp. 231-275, 1987.
  2. S. C. Mayo et al., "X-ray phase-contrast microscopy and microtomography," Opt. Express, vol. 11, no. 19, pp. 2289-2302, 2003. https://doi.org/10.1364/OE.11.002289
  3. R. P. Millane, "Phase retrieval in crystallography and optics," J. Opt. Soc. Am. A, vol. 7, no. 3, pp. 394-411, 1990. https://doi.org/10.1364/josaa.7.000394
  4. J. Miao, P. Charalambous, J. Kirz, and D. Sayre, "Extending the methodology of X-ray crystallography to allow imaging of micrometre-sized non-crystalline specimens," Nature, vol. 400, no. 6742, pp. 342-344, 1999. https://doi.org/10.1038/22498
  5. C. Jaramillo, R. G. Valenti, L. Guo, and J. Xiao, "Design and analysis of a single-Camera omnistereo sensor for quadrotor Micro Aerial Vehicles (MAVs)," Sensors (Switzerland), vol. 16, no. 2, 2016.
  6. R. Balan, P. Casazza, and D. Edidin, "On signal reconstruction without phase," Appl. Comput. Harmon. Anal., vol. 20, no. 3, pp. 345-356, 2006. https://doi.org/10.1016/j.acha.2005.07.001
  7. A. S. Bandeira, J. Cahill, D. G. Mixon, and A. A. Nelson, "Saving phase: Injectivity and stability for phase retrieval," Appl. Comput. Harmon. Anal., vol. 37, no. 1, pp. 106-125, 2014. https://doi.org/10.1016/j.acha.2013.10.002
  8. E. J. Candes, X. Li, and M. Soltanolkotabi, "Phase retrieval via Wirtinger flow: Theory and algorithms," IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 1985-2007, 2015. https://doi.org/10.1109/TIT.2015.2399924
  9. Y. C. Eldar and S. Mendelson, "Phase retrieval: Stability and recovery guarantees," Appl. Comput. Harmon. Anal., vol. 36, no. 3, pp. 473-494, 2014. https://doi.org/10.1016/j.acha.2013.08.003
  10. Y. Chen and E. J. Candes, "Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems," Commun. Pure Appl. Math., vol. 70, no. 5, pp. 822-883, 2017. https://doi.org/10.1002/cpa.21638
  11. G. Wang, G. B. Giannakis, and Y. C. Eldar, "Solving systems of random quadratic equations via truncated amplitude flow," IEEE Trans. Inf. Theory, vol. 64, no. 2, pp. 773-794, 2018. https://doi.org/10.1109/tit.2017.2756858
  12. Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, "Phase Retrieval with Application to Optical Imaging: A contemporary overview," IEEE Signal Process. Mag., vol. 32, no. 3, pp. 87-109, 2015. https://doi.org/10.1109/MSP.2014.2352673
  13. Y. Shechtman, A. Beck, and Y. C. Eldar, "GESPAR: Efficient phase retrieval of sparse signals," IEEE Trans. Signal Process., vol. 62, no. 4, pp. 928-938, 2014. https://doi.org/10.1109/TSP.2013.2297687
  14. Z. Xiao, Y. Zhang, K. Zhang, D. Zhao, and G. Gui, "GARLM: Greedy autocorrelation retrieval Levenberg-Marquardt algorithm for improving sparse phase retrieval," Appl. Sci., vol. 8, no. 10, 2018.
  15. Y. C. Eldar, P. Sidorenko, D. G. Mixon, S. Barel, and O. Cohen, "Sparse phase retrieval from shorttime fourier measurements," IEEE Signal Process. Lett., vol. 22, no. 5, pp. 638-642, 2015. https://doi.org/10.1109/LSP.2014.2364225
  16. K. Jaganathan, S. Oymak, and B. Hassibi, "Sparse phase retrieval: Uniqueness guarantees and recovery algorithms," IEEE Trans. Signal Process., vol. 65, no. 9, pp. 2402-2410, 2017. https://doi.org/10.1109/TSP.2017.2656844
  17. K. Jaganathan, Y. C. Eldar, and B. Hassibi, "STFT Phase Retrieval: Uniqueness Guarantees and Recovery Algorithms," IEEE J. Sel. Top. Signal Process., vol. 10, no. 4, pp. 770-781, Jun. 2016. https://doi.org/10.1109/JSTSP.2016.2549507
  18. T. Bendory, Y. C. Eldar and N. Boumal, "Non-Convex Phase Retrieval From STFT Measurements," IEEE Transactions on Information Theory, vol. 64, no. 1, pp. 467-484, Jan. 2018. https://doi.org/10.1109/tit.2017.2745623
  19. E. J. Candes, X. Li, and M. Soltanolkotabi, "Phase retrieval from coded diffraction patterns," Appl. Comput. Harmon. Anal., vol. 39, no. 2, pp. 277-299, 2015. https://doi.org/10.1016/j.acha.2014.09.004
  20. E. J. Candes, X. Li, and M. Soltanolkotabi, "Phase retrieval via wirtinger flow: Theory and algorithms," IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 1985-2007, 2015. https://doi.org/10.1109/TIT.2015.2399924
  21. P. Netrapalli, P. Jain, and S. Sanghavi, "Phase retrieval using alternating minimization," IEEE Trans. Signal Process., vol. 63, no. 18, pp. 4814-4826, 2015. https://doi.org/10.1109/TSP.2015.2448516
  22. E. J. Candes, T. Strohmer, and V. Voroninski, "PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming," Commun. Pure Appl. Math., vol. 66, no. 8, pp. 1241-1274, 2013. https://doi.org/10.1002/cpa.21432
  23. G. Wang, G. B. Giannakis, and J. Chen, "Scalable Solvers of Random Quadratic Equations via Stochastic Truncated Amplitude Flow," IEEE Trans. Signal Process., vol. 65, no. 8, pp. 1961-1974, 2017. https://doi.org/10.1109/TSP.2017.2652392
  24. G. Wang, G. B. Giannakis, Y. Saad, and J. Chen, "Phase Retrieval via Reweighted Amplitude Flow," IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2818-2833, 2018. https://doi.org/10.1109/tsp.2018.2818077
  25. L. Zhang, G. Wang, G. B. Giannakis, and J. Chen, "Compressive Phase Retrieval via Reweighted Amplitude Flow," IEEE Trans. Signal Process., vol. 66, no. 19, pp. 5029-5040, 2018. https://doi.org/10.1109/tsp.2018.2862395
  26. I. Waldspurger, A. D'Aspremont, and S. Mallat, "Phase recovery, MaxCut and complex semidefinite programming," Math. Program., vol. 149, no. 1-2, pp. 47-81, 2015. https://doi.org/10.1007/s10107-013-0738-9
  27. K. Huang, Y. C. Eldar, and N. D. Sidiropoulos, "Phase Retrieval from 1D Fourier Measurements: Convexity, Uniqueness, and Algorithms," IEEE Trans. Signal Process., vol. 64, no. 23, pp. 6105-6117, 2016. https://doi.org/10.1109/TSP.2016.2601291
  28. T. Goldstein and C. Studer, "PhaseMax: Convex Phase Retrieval via Basis Pursuit," IEEE Trans. Inf. Theory, vol. 64, no. 4, pp. 2675-2689, 2018. https://doi.org/10.1109/TIT.2018.2800768
  29. G. Wang, L. Zhang, G. B. Giannakis, M. Akcakaya, and J. Chen, "Sparse phase retrieval via truncated amplitude flow," IEEE Trans. Signal Process., vol. 66, no. 2, pp. 479-491, 2018. https://doi.org/10.1109/TSP.2017.2771733
  30. R. W. Gerchberg and W. O. Saxton, "A practical algorithm for the determination of phase from image and diffraction plane pictures," Optik (Stuttg), vol. 35, no. 2, pp. 237-246, 1972.
  31. J. R. Fienup, "Reconstruction of an object from the modulus of its Fourier transform," Opt. Lett., vol. 3, no. 1, pp. 27-29, 1978. https://doi.org/10.1364/OL.3.000027
  32. H. Zhang, Y. Zhou, Y. Liang, and Y. Chi, "Reshaped Wirtinger Flow and Incremental Algorithm for Solving Quadratic System of Equations," to be published.
  33. J. Sun, Q. Qu, and J. Wright, "A Geometric Analysis of Phase Retrieval," Found. Comput. Math., vol. 18, no. 5, pp. 1131-1198, 2018. https://doi.org/10.1007/s10208-017-9365-9
  34. T. Qiu and D. P. Palomar, "Undersampled Sparse Phase Retrieval via Majorization-Minimization," IEEE Trans. Signal Process., vol. 65, no. 22, pp. 5957-5969, 2017. https://doi.org/10.1109/TSP.2017.2745459
  35. R. Kolte and A. Ozgur, "Phase Retrieval via Incremental Truncated Wirtinger Flow," to be published.
  36. K. Wei, "Solving systems of phaseless equations via Kaczmarz methods: A proof of concept study," Inverse Probl., vol. 31, no. 12, 2015.
  37. R. Kolte and A. Ozgur, "Phase Retrieval via Incremental Truncated Wirtinger Flow," to be published.
  38. O. Shamir, "Fast stochastic algorithms for SVD and PCA: Convergence properties and convexity," in Proc. of 33rd International Conference on Machine Learning, ICML 2016, pp. 248-256, 2016.
  39. R. Johnson and T. Zhang, "Accelerating stochastic gradient descent using predictive variance reduction," Advances in Neural Information Processing Systems, vol. 26, pp. 1-9, 2013.
  40. E. Min, J. Long and J. Cui, "Analysis of the Variance Reduction in SVRG and a New Acceleration Method," IEEE Access, vol. 6, pp. 16165-16175, 2018. https://doi.org/10.1109/access.2018.2814212
  41. G. Wang, G. B. Giannakis and J. Chen, "Solving large-scale systems of random quadratic equations via stochastic truncated amplitude flow," in Proc. of 2017 25th European Signal Processing Conference (EUSIPCO), Kos, pp. 1420-1424, 2017.