Cloud Removal Using Gaussian Process Regression for Optical Image Reconstruction

  • Park, Soyeon (Department of Geoinformatic Engineering, Inha University) ;
  • Park, No-Wook (Department of Geoinformatic Engineering, Inha University)
  • Received : 2022.08.02
  • Accepted : 2022.08.16
  • Published : 2022.08.31

Abstract

Cloud removal is often required to construct time-series sets of optical images for environmental monitoring. In regression-based cloud removal, the selection of an appropriate regression model and the impact analysis of the input images significantly affect the prediction performance. This study evaluates the potential of Gaussian process (GP) regression for cloud removal and also analyzes the effects of cloud-free optical images and spectral bands on prediction performance. Unlike other machine learning-based regression models, GP regression provides uncertainty information and automatically optimizes hyperparameters. An experiment using Sentinel-2 multi-spectral images was conducted for cloud removal in the two agricultural regions. The prediction performance of GP regression was compared with that of random forest (RF) regression. Various combinations of input images and multi-spectral bands were considered for quantitative evaluations. The experimental results showed that using multi-temporal images with multi-spectral bands as inputs achieved the best prediction accuracy. Highly correlated adjacent multi-spectral bands and temporally correlated multi-temporal images resulted in an improved prediction accuracy. The prediction performance of GP regression was significantly improved in predicting the near-infrared band compared to that of RF regression. Estimating the distribution function of input data in GP regression could reflect the variations in the considered spectral band with a broader range. In particular, GP regression was superior to RF regression for reproducing structural patterns at both sites in terms of structural similarity. In addition, uncertainty information provided by GP regression showed a reasonable similarity to prediction errors for some sub-areas, indicating that uncertainty estimates may be used to measure the prediction result quality. 
These findings suggest that GP regression could be beneficial for cloud removal and optical image reconstruction. In addition, the impact analysis results of the input images provide guidelines for selecting optimal images for regression-based cloud removal.

1. Introduction

Remote sensing imagery is one of the most important data sources for monitoring the Earth's environment owing to its ability to provide quantitative information at various spatiotemporal scales (Park et al., 2019). In particular, the periodicity of remote sensing images is beneficial for monitoring the natural environment, such as vegetation monitoring, which requires continuous temporal information (Ahn et al., 2018; Li et al., 2021).

However, it is often difficult or even impossible to acquire a complete set of time-series optical satellite images as a consequence of cloud contamination and sensor malfunctions. This limited data availability is an obstacle to continuous environmental monitoring. For example, the growing season of paddy rice overlaps with the rainy season in Korea (Na et al., 2017). Hence, it is often impossible to obtain cloud-free optical images from June to July, which coincides with the growing season of paddy rice.

To overcome the limited availability of cloud-free optical images, cloud removal, which predicts reflectance values obscured by clouds, has been developed for time-series image construction. The cloud removal methods for reconstructing the missing information in optical images can be categorized into spatial-based and temporal-based methods, depending on the supplementary information sources used for the missing value prediction (Shen et al., 2015).

Spatial-based methods, such as spatial interpolation using geostatistical kriging (Zhang et al., 2007) and variation-based methods (Shen and Zhang, 2009), reconstruct missing values using reflectance values in cloud-free regions within the same image. Temporal-based methods include temporal replacement, filtering, and learning methods (Shen et al., 2015). Chen et al. (2011) proposed a neighborhood-similar pixel interpolator (NSPI) to fill the gaps in Landsat ETM+ scan-line corrector (SLC)-off images. Zhu et al. (2012) developed the geostatistical NSPI (GNSPI), which combines NSPI with ordinary kriging to restore missing values in heterogeneous regions. In addition, a weighted linear regression-based multi-temporal recovery method was presented to quantify the temporal relationships between cloudy images and cloud-free images acquired at other times (Zeng et al., 2013). However, this linear relationship may fail to completely explain the seasonal vegetation variations.

Recently, there has been a growing interest in the remote sensing community in machine learning capable of learning non-linear relationships of input data (Johnson et al., 2016; Kim et al., 2018; Kwak et al., 2021). Among numerous machine learning algorithms, kernel-based methods, including kernel ridge regression, support vector regression, and Gaussian process (GP) regression, have attracted attention because of their ability to quantify the non-linear relationships of input data (Verrelst et al., 2012).

GP regression is a promising kernel-based machine learning method because of its many advantages (Camps-Valls et al., 2016; Schulz et al., 2018). It is a non-parametric model that defines a distribution over functions via a stochastic process and performs inference in the space of functions (Rasmussen and Williams, 2006). An attractive aspect of GP regression is its ability to provide uncertainty estimates associated with prediction because it estimates the distribution of the prediction values. Another advantage is automatic hyperparameter optimization via marginal likelihood maximization during model training.

Several studies have applied GP regression to estimate biophysical parameters from remote sensing images, including vegetation index and chlorophyll content (Pasolli et al., 2010; Verrelst et al., 2013; Pipia et al., 2021). These studies demonstrated that the uncertainty estimate could be an indicator of the prediction performance. Despite the great potential of GP regression for prediction tasks, very few studies have applied GP regression to gap filling, including cloud removal (Belda et al., 2020). The prediction performance of GP regression for cloud removal depends on the information content of the input images. Hence, it is necessary to analyze the effects of various combinations of input images and spectral bands on prediction performance. However, to the best of our knowledge, this analysis has not yet been conducted on cloud removal.

This study aims to evaluate the potential of GP regression for restoring cloud-contaminated optical images. To this end, various experimental cases were designed in terms of the (1) number of input images and (2) number of spectral bands. Cloud removal experiments using Sentinel-2 images were conducted on two croplands. The prediction performance of GP regression was quantitatively compared with that of random forest (RF) regression. Furthermore, the advantage of GP regression was highlighted by analyzing the relationship between uncertainty information and prediction results.

2. Study Areas and Data

Cloud removal experiments were conducted in two sub-areas in Korea, Gimje and Hapcheon, which are the major paddy rice and onion/garlic cultivation areas in Korea, respectively (Fig. 1). Agricultural sites were chosen because periodic vegetation monitoring requires cloud-free time-series images.

Fig. 1. True color composite of Sentinel-2 imagery in two study areas: (a) image on Aug. 20, 2021 of Gimje and (b) image on Apr. 14, 2021 of Hapcheon.

The vegetation vitality of the paddy rice at the Gimje site increases from July to August, and harvesting continues from September to the end of October. Onion and garlic at the Hapcheon site reach maximum vegetation vitality in late April or early May and are harvested in late May.

Sentinel-2 Level-2A products were utilized for the cloud removal experiments (Table 1). The red-edge band with a spatial resolution of 20 m was converted to 10 m imagery using bilinear resampling to use all spectral bands as inputs for regression modeling. Five images acquired from May to October were collected for the Gimje site, and the Hapcheon site used five images acquired approximately semi-monthly from March to May in consideration of the growth cycle of crops in the two areas.

Table 1. Summary of Sentinel-2 images used in this study

Synthetic cloud masks were first generated randomly and then used as test data to compute the quantitative prediction accuracy statistics of the regression models (Fig. 2). This simulation approach is commonly employed in cloud removal experiments. When using the synthetic cloud masks for the cloud removal experiment, the prediction results are affected by various factors, such as the cloud size and land-cover types (Wang et al., 2022). Therefore, synthetic cloud masks of different sizes and locations were created to mimic real situations. This study aims to analyze the effects of input data and compare the prediction performance of regression-based cloud removal. To this end, the quantitative evaluation of prediction performance is essential. When the cloud mask layer from Sentinel-2 products is used for the experiment, it is not feasible to obtain actual reflectance values in the masked areas, indicating that the computation of accuracy statistics is impossible. Hence, the simulation approach using synthetic cloud masks was adopted in this study. Imagery with synthetic cloud masks was considered as the target imagery for cloud removal. As the actual reflectance values are known in cloud masks, quantitative measures of prediction performance can be easily computed.
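The simulation idea described above can be sketched as follows. This is only an illustration under stated assumptions: the study does not say how the mask shapes were drawn, so random discs are used here, and the function name `make_synthetic_cloud_mask`, its parameters, and the random stand-in image are all hypothetical.

```python
import numpy as np

def make_synthetic_cloud_mask(shape, n_clouds=5, radius_range=(10, 40), seed=0):
    """Return a boolean mask (True = synthetic cloud) built from random discs.

    Disc-shaped clouds are an assumption made for this sketch only.
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.indices(shape)
    mask = np.zeros(shape, dtype=bool)
    for _ in range(n_clouds):
        r0 = rng.integers(0, shape[0])          # random cloud centre (row)
        c0 = rng.integers(0, shape[1])          # random cloud centre (column)
        radius = rng.integers(radius_range[0], radius_range[1] + 1)
        mask |= (rows - r0) ** 2 + (cols - c0) ** 2 <= radius ** 2
    return mask

mask = make_synthetic_cloud_mask((200, 200))
image = np.random.default_rng(1).random((200, 200))  # stand-in for one reflectance band
truth_under_cloud = image[mask]                      # known values, held out as test data
cloudy = np.where(mask, np.nan, image)               # simulated cloud-contaminated band
```

Because `truth_under_cloud` is retained, any prediction made for the masked pixels can be scored directly, which is the point of using synthetic rather than real cloud masks.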

Fig. 2. Simulated cloud-contaminated images for two study sites: (a) Gimje and (b) Hapcheon. White areas represent synthetic cloud masks.

3. Methods

1) Gaussian Process Regression

As a generalization of the multivariate Gaussian probability distribution, a GP is a collection of random variables, any finite number of which has a joint Gaussian distribution (Rasmussen and Williams, 2006). An essential principle for applying GP regression to cloud removal, synthesized from Rasmussen and Williams (2006), Camps-Valls et al. (2016), Liu et al. (2018), and Schulz et al. (2018), is briefly described.

Given a dataset x, GP defines a real process f(x) as a distribution over functions specified by its mean and covariance functions:

\(\mathrm{f}(\mathbf{x}) \sim \mathcal{G P}\left(m(\mathbf{x}), k\left(\mathbf{x}, \mathbf{x}^{\prime}\right)\right)\)       (1)

where m(x) and k(x, x′) are the mean function and the covariance function between the data values (x and x′), respectively. The covariance function is also called the kernel function.

The mean function reflects the average of all functions in the distribution evaluated at x. The prior mean function is usually set to zero. The kernel function represents the similarity between the input data values. The greater the value of k(x, x′), the greater the correlation between the corresponding outputs f(x) and f(x′).

In this study, the squared exponential kernel, also called the radial basis function (RBF) kernel, was used as the kernel function. It is defined as:

\(k\left(\mathbf{x}, \mathbf{x}^{\prime}\right)=\sigma_f^2 \exp \left(-\frac{\left\|\mathbf{x}-\mathbf{x}^{\prime}\right\|^2}{2 \lambda^2}\right)\)      (2)

where  \(\sigma_f^2\) and λ are the signal variance and length-scale parameter, respectively. These are the hyperparameters to be determined in the GP regression.
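As a minimal sketch, Eq. (2) can be evaluated for two sets of input vectors with NumPy. The function name `rbf_kernel` and the argument names `signal_var` and `length_scale` are illustrative choices, not names from the study.

```python
import numpy as np

def rbf_kernel(X1, X2, signal_var=1.0, length_scale=1.0):
    """Squared exponential kernel of Eq. (2); rows of X1 and X2 are input vectors."""
    # Squared Euclidean distance between every row of X1 and every row of X2.
    sq_dist = (
        np.sum(X1 ** 2, axis=1)[:, None]
        + np.sum(X2 ** 2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return signal_var * np.exp(-sq_dist / (2.0 * length_scale ** 2))

X = np.array([[0.0], [0.1], [2.0]])
K = rbf_kernel(X, X, signal_var=1.0, length_scale=0.5)
```

In `K`, the entry for the nearby pair (0.0, 0.1) is close to the signal variance, while the entry for the distant pair (0.0, 2.0) is close to zero, which is how the kernel encodes similarity between inputs.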

Given a training dataset \(D=\{\mathbf{x}_n, y_n\}^N_{n=1}\), the output target variable \(y_n\) in GP regression is formulated as the sum of the unknown latent function \(f(\mathbf{x}_n)\) and a noise term:

\(y_n=f\left(\mathbf{x}_n\right)+\varepsilon_n, \quad \varepsilon_n \sim N\left(0, \sigma_{\varepsilon}^2\right)\)      (3)

where the noise term \(\varepsilon_n\) follows a normal distribution with zero mean and noise variance \(\sigma_{\varepsilon}^2\).

GP encodes prior distributions over functions with x and estimates posterior distributions by adopting a Bayesian framework. The three hyperparameters (\(\sigma_f^2\), λ, \(\sigma_{\varepsilon}^2\) ) are inferred from the training dataset and optimized by maximizing the log marginal likelihood (Rasmussen and Williams, 2006). Once the hyperparameters have been determined, the output for a new data value x* is obtained as the mean of the distribution along with the variance representing the output uncertainty.
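For reference, the standard closed-form expressions behind this step (Rasmussen and Williams, 2006) are given here under a zero prior mean. The symbols K, k*, I, X, and y are introduced only for this illustration and do not appear in the text above: K is the N × N kernel matrix with entries k(xn, xm), k* is the vector of kernel values between x* and the N training inputs, I is the identity matrix, X collects the training inputs, and y collects the training targets.

Predictive mean at x*:

\(\mu_*=\mathbf{k}_*^{\top}\left(\mathbf{K}+\sigma_{\varepsilon}^2 \mathbf{I}\right)^{-1} \mathbf{y}\)

Predictive variance of the latent function at x*:

\(\sigma_*^2=k\left(\mathbf{x}_*, \mathbf{x}_*\right)-\mathbf{k}_*^{\top}\left(\mathbf{K}+\sigma_{\varepsilon}^2 \mathbf{I}\right)^{-1} \mathbf{k}_*\)

Log marginal likelihood maximized with respect to the three hyperparameters:

\(\log p(\mathbf{y} \mid \mathbf{X})=-\frac{1}{2} \mathbf{y}^{\top}\left(\mathbf{K}+\sigma_{\varepsilon}^2 \mathbf{I}\right)^{-1} \mathbf{y}-\frac{1}{2} \log \left|\mathbf{K}+\sigma_{\varepsilon}^2 \mathbf{I}\right|-\frac{N}{2} \log 2 \pi\)

For the RBF kernel of Eq. (2), \(k\left(\mathbf{x}_*, \mathbf{x}_*\right)=\sigma_f^2\), so the predictive variance never exceeds the signal variance and shrinks as x* moves closer to the training inputs.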

2) Regression-based Cloud Removal

Let Y(x, bi, t0) and Y(x, bi, tj) be the reflectances at pixel location x in the ith spectral band bi of the cloud-contaminated and cloud-free images acquired at different times (t0 and tj) in the study area, respectively. Y(x, bi, t0) is the target image containing both cloud (C) and non-cloud (NC) pixels. The cloud-free images are referred to as supplementary images in this study.

A regression-based approach for cloud removal first quantifies the temporal relationships between Y(xNC, bi, t0) and Y(xNC, bi, tj) for non-cloud pixels (xNC) via regression modeling. Then, Y*(xC, bi, t0), the reflectance at cloud-contaminated pixels (xC), is predicted by applying the quantified relationships (Fig. 3). This study employed GP regression as the primary regression model.

Fig. 3. Workflow for regression-based cloud removal.

It should be noted that the number of cloud-free images can be one or more. In other words, single- or multi-temporal imagery can be used as input for regression modeling, depending on the data availability within the study area. Furthermore, other spectral bands (bk, k≠i) can also be used to predict the reflectance in band bi, since multi-spectral bands with strong correlations provide significant information for regression modeling.
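The train-on-non-cloud, predict-on-cloud procedure can be sketched with a small NumPy GP. Several things here are assumptions made for illustration and are not taken from the study: the hyperparameters are fixed by hand instead of being optimized by marginal likelihood maximization, the targets are centered so that a zero prior mean is reasonable, the training pixels are subsampled to 200, and the images are random stand-ins. The helper names `rbf` and `gp_fit_predict` are hypothetical.

```python
import numpy as np

def rbf(A, B, signal_var, length_scale):
    """Squared exponential kernel between the rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-np.maximum(d2, 0.0) / (2.0 * length_scale**2))

def gp_fit_predict(X_train, y_train, X_new,
                   signal_var=1.0, length_scale=1.0, noise_var=1e-2):
    """Posterior mean and standard deviation of a GP with fixed hyperparameters."""
    y_mean = y_train.mean()                      # centre targets (assumption of this sketch)
    K = rbf(X_train, X_train, signal_var, length_scale)
    K += noise_var * np.eye(len(X_train))
    L = np.linalg.cholesky(K)                    # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    K_s = rbf(X_train, X_new, signal_var, length_scale)
    mean = K_s.T @ alpha + y_mean
    v = np.linalg.solve(L, K_s)
    var = signal_var - np.sum(v**2, axis=0)      # latent-function variance
    return mean, np.sqrt(np.maximum(var, 0.0))

rng = np.random.default_rng(0)
H, W, B = 40, 40, 3
supp = rng.random((H, W, B))                     # cloud-free supplementary bands at t_j
target = (0.6 * supp[..., 0] + 0.3 * supp[..., 1] ** 2
          + 0.01 * rng.standard_normal((H, W)))  # band b_i at t_0 (toy relationship)
cloud = np.zeros((H, W), dtype=bool)
cloud[10:20, 15:30] = True                       # synthetic cloud mask

X_nc, y_nc = supp[~cloud], target[~cloud]        # non-cloud pixels: training pool
idx = rng.choice(len(X_nc), size=200, replace=False)
mean, std = gp_fit_predict(X_nc[idx], y_nc[idx], supp[cloud], length_scale=0.5)

restored = target.copy()
restored[cloud] = mean                           # fill cloud pixels with the GP mean
rmse = np.sqrt(np.mean((mean - target[cloud]) ** 2))
```

The `std` array is the per-pixel uncertainty that accompanies each filled value. Using several columns in `supp` corresponds to using other spectral bands or additional dates as inputs.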

3) Experimental Design

In the cloud removal experiments, the period with the highest vegetation vitality at the two study sites, which is the most crucial time for vegetation monitoring, was used as the target prediction date (t0). The prediction dates for the Gimje and Hapcheon sites were August and April, respectively. The images acquired before and after the target prediction date were considered as supplementary images. As five Sentinel-2 images were collected, four supplementary images can be available. The experimental setup is listed in Table 2.

Table 2. List of prediction target and supplementary images utilized for the experiment (t0: target prediction date)

Four combination cases were prepared by combining the number of supplementary images and spectral bands to analyze the effect of input data combinations (Table 3). First, single imagery (S) and multi-temporal images (M) were considered for regression modeling. The use of single imagery is considered to mimic the actual case in which the availability of cloud-free images is extremely limited. When using multi-temporal images, the minimum and maximum numbers of input images were two and four, respectively. Second, the number of spectral bands is divided into two cases: using a single band and using five bands. Using single-band imagery indicates that only the spectral band in the supplementary imagery corresponding to the target spectral band for prediction is used as the input. Among the four combination cases, SS in Table 3 refers to predicting the reflectance in a given spectral band at t0 using input data with the same spectral band in the supplementary imagery obtained at a single time. In contrast, the MM case in Table 3 predicts the reflectance in any spectral band at t0 using five multi-spectral bands from multi-temporal supplementary images.

Table 3. Combination cases for the number of supplementary images and spectral bands

Finally, eight experiments were designed based on the available supplementary images in Table 2 and combination cases in Table 3 (Table 4). For example, SM#1 uses the five spectral bands from the supplementary images obtained at t1 as inputs to predict all spectral bands. For MS#2, the same two spectral bands in the supplementary images obtained at t1 and t2 were used to predict each of the five individual bands. In particular, SS and SM used the imagery at t1 with a large temporal difference from t0. In the case of using the single-date supplementary imagery, it is not always possible to use the imagery obtained at the time close to the prediction date. Hence, the imagery at t1 was utilized as the single-date supplementary imagery for SS and SM. Using multiple spectral bands from multi-temporal supplementary images is more likely to improve prediction performance because they provide more valuable information for cloud removal. However, using more inputs for GP regression modeling generally results in increased computational complexity.
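How the four combination cases translate into regression inputs can be sketched as follows. The array layout (time, band, row, column), the helper `build_inputs`, and the band index `target_band` are assumptions for illustration, and the exact date combinations of Table 4 are not reproduced here.

```python
import numpy as np

def build_inputs(stack, times, bands):
    """Stack the chosen (time, band) layers into an (n_pixels, n_features) matrix.

    stack: array of shape (T, B, H, W) holding the supplementary images.
    """
    layers = [stack[t, b].ravel() for t in times for b in bands]
    return np.column_stack(layers)

T, B, H, W = 4, 5, 8, 8                  # four supplementary dates, five bands (toy size)
stack = np.random.default_rng(0).random((T, B, H, W))
target_band = 3                          # hypothetical index of the band being predicted

cases = {
    "SS": build_inputs(stack, times=[0], bands=[target_band]),       # one date, same band
    "SM": build_inputs(stack, times=[0], bands=range(B)),            # one date, all bands
    "MS": build_inputs(stack, times=range(T), bands=[target_band]),  # all dates, same band
    "MM": build_inputs(stack, times=range(T), bands=range(B)),       # all dates, all bands
}
shapes = {name: X.shape for name, X in cases.items()}
```

The number of input features grows from 1 (SS) to 20 (MM) in this toy setting, which is the source of the added information and of the added computational cost noted above.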

Table 4. Experimental cases of different combinations of input data (○: utilized, ×: not utilized)

The performance of GP regression was evaluated through a comparison with RF regression, which is a representative machine learning method. Among many machine learning models, RF regression was selected because it is a non-parametric method, like GP regression, and adequately quantifies non-linear relationships. Random forest is an ensemble machine learning method that trains multiple bagged decision trees and can be effectively applied to regression and classification tasks (Breiman, 2001; Kwak et al., 2021). RF regression has two hyperparameters to be set: the number of features for node splitting and the number of trees to be grown. The optimal values for the two hyperparameters were determined using a grid search procedure. A random sample of 1% of the total number of non-cloud pixels in the study area was used as training data for regression modeling, in consideration of the computational efficiency and the availability of non-cloud pixels in real applications.
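A possible way to set up the 1% random sampling and the grid search for the two RF hyperparameters is sketched below with scikit-learn. The study does not name its software, grid values, or validation scheme, so the library choice, the candidate values, the 3-fold cross-validation, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
n_pixels, n_features = 20000, 5
X_all = rng.random((n_pixels, n_features))                  # stand-in for non-cloud input pixels
y_all = 0.5 * X_all[:, 0] + 0.2 * np.sin(3 * X_all[:, 1])   # stand-in for target reflectance

# Randomly draw 1% of the non-cloud pixels as training data.
train_idx = rng.choice(n_pixels, size=n_pixels // 100, replace=False)
X_train, y_train = X_all[train_idx], y_all[train_idx]

param_grid = {
    "n_estimators": [50, 100],   # number of trees to grow (illustrative values)
    "max_features": [1, 2, 3],   # features considered at each split (illustrative values)
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,                                   # 3-fold cross-validation (assumption)
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)
best_rf = search.best_estimator_            # refit on all training pixels with the best setting
```

`search.best_params_` then holds the selected pair of hyperparameters, and `best_rf.predict` can be applied to the input features of the cloud pixels.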

For a quantitative comparison of the cloud removal results, accuracy statistics were computed using the actual values at the synthetic cloud masks. The root mean square error (RMSE) and structural similarity index measure (SSIM) were computed as quantitative measures of prediction accuracy and spatial similarity, respectively. The ideal values for the RMSE and SSIM are 0 and 1, respectively. For example, as the SSIM approaches 1, the prediction result is more structurally similar to the actual values.
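Both measures can be sketched in a few lines of NumPy. Note the simplification: SSIM is normally averaged over local sliding windows, whereas the hypothetical `global_ssim` below evaluates the SSIM formula once over the whole patch, and the study does not state which implementation or window size it used. The constants 0.01 and 0.03 are the commonly used defaults, and `data_range` is assumed to be 1 for reflectance scaled to [0, 1].

```python
import numpy as np

def rmse(pred, actual):
    """Root mean square error between predicted and actual reflectance."""
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

def global_ssim(pred, actual, data_range=1.0):
    """SSIM formula evaluated once over the whole patch (no sliding window)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_p, mu_a = pred.mean(), actual.mean()
    var_p, var_a = pred.var(), actual.var()
    cov = np.mean((pred - mu_p) * (actual - mu_a))
    numerator = (2 * mu_p * mu_a + c1) * (2 * cov + c2)
    denominator = (mu_p ** 2 + mu_a ** 2 + c1) * (var_p + var_a + c2)
    return float(numerator / denominator)

rng = np.random.default_rng(0)
actual = rng.random((32, 32))                              # stand-in for actual reflectance
noisy = actual + 0.05 * rng.standard_normal((32, 32))      # stand-in for a prediction

rmse_value = rmse(noisy, actual)
ssim_value = global_ssim(noisy, actual)
```

A perfect prediction gives an RMSE of 0 and an SSIM of 1, while any deviation raises the RMSE and lowers the SSIM.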

4. Results and Discussion

1) Correlation Analysis of Input Data

Prior to the cloud removal experiments, the correlations between the input image bands obtained at t0 and tj were analyzed (Table 5) because the relationships between the input images may affect the prediction result. At the Gimje site, the target image at t0 (Aug. 20) was relatively highly correlated with the supplementary data at t2 (Jul. 26) for all spectral bands, whereas the lowest correlation was observed between images obtained at t0 and t4 (Oct. 9) due to harvesting.

Table 5. Correlation coefficients for each spectral band between target and supplementary images

The input images at the Hapcheon site generally showed greater correlations than those at the Gimje site. In addition, the closer the image acquisition date is to t0 (Apr. 14), the higher the correlation. This result is mainly due to the difference in spectral reflectance according to land-cover types. Phenological changes in major crops were prominent in Hapcheon. In contrast, the changes in physical conditions (i.e., growing and harvesting) of paddy rice parcels were significant in Gimje, thereby resulting in significant differences in spectral reflectance depending on the image acquisition dates. This difference in the correlation between the two sites may affect the prediction performance for different combinations of input images and spectral bands.
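The per-band correlation analysis can be sketched as below, assuming Pearson correlation over all pixels and images stored as (band, row, column) arrays. The helper name `band_correlations` and the toy images are hypothetical, and the values in Table 5 are not reproduced.

```python
import numpy as np

def band_correlations(target, supplementary):
    """Pearson correlation per band between two images of shape (B, H, W)."""
    return np.array([
        np.corrcoef(t.ravel(), s.ravel())[0, 1]
        for t, s in zip(target, supplementary)
    ])

rng = np.random.default_rng(0)
target = rng.random((5, 16, 16))                               # image at t0 (toy data)
near = target + 0.05 * rng.standard_normal(target.shape)       # small change: nearby date
far = target + 0.50 * rng.standard_normal(target.shape)        # large change: distant date

r_near = band_correlations(target, near)
r_far = band_correlations(target, far)
```

In this toy setting `r_near` exceeds `r_far` in every band, mirroring the pattern in which supplementary images that differ less from the target image are more strongly correlated with it.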

2) Comparison of Different Input Combination Cases

Fig. 4 shows the prediction accuracy values of the GP regression for the different input combination cases defined in Tables 3 and 4. Using multi-temporal images as inputs generally improved the prediction accuracy for both sites; however, this behavior was not always observed. For the cases of MS#2 and MS#3, the errors slightly increased for the green and red bands, despite the utilization of multi-temporal supplementary images. The inclusion of the image at t4, which has a low correlation with the target image at t0 (see Table 5), in the multi-temporal dataset led to increased errors. Thus, for some cases, using a small number of highly correlated supplementary images may improve the predictive performance more than using a larger number of images with low correlations to the target image. This emphasizes the importance of selecting the optimal data for cloud removal.

Fig. 4. RMSE values of GP regression results for different combination cases at two sites. (a) Gimje site results and (b) Hapcheon site results.

When multiple bands were used as inputs, the prediction error slightly decreased compared with the case using a single spectral band. Additionally, the prediction accuracy increased with the number of supplementary images. However, the effect of the number of supplementary images on the prediction accuracy became less significant, indicating that high correlations between multiple bands could improve prediction accuracy. Therefore, the use of multi-temporal, multi-band images highly correlated with the target image is suggested as input data for regression-based cloud removal.

3) Quantitative Comparison Result

The prediction performance of the GP regression was also compared with that of the RF regression. As shown in Fig. 4, all spectral bands, except for the near-infrared (NIR) band, showed a slight variation in RMSE (0.02 or less). Therefore, a significant difference in prediction accuracy between the GP regression and RF regression was not observed.

As the NIR band is the essential spectral band for vegetation monitoring, its prediction performance was compared in terms of the RMSE and SSIM, and the results are listed in Table 6. The prediction accuracies of GP regression and RF regression were similar for the Gimje site. In particular, GP regression showed greater prediction performance for the SM and MM cases using multiple bands as inputs compared to RF regression. Although the RMSE value of the RF regression was less than that of the GP regression for the SS and MS cases using a single spectral band as input, the GP regression yielded higher structural similarity in most cases. The high correlations between the input spectral bands had a greater influence on the prediction results of the GP regression than those of the RF regression. The temporal changes between t0 and tj were better reflected in the RF regression results. For the Gimje site which is mainly composed of paddy rice fields and is spatially homogeneous, the RF regression satisfactorily predicted the spectral value. However, the GP regression could reproduce the structural characteristics better than the RF regression.

The Hapcheon site includes more diverse land cover types and heterogeneously distributed small crop fields than the Gimje site. For this heterogeneous landscape, the prediction performance of the GP regression for the NIR band was superior to that of the RF regression, unlike the Gimje site. In most cases, the GP regression showed smaller prediction errors and greater SSIM than the RF regression. In particular, the GP regression can reflect the structural characteristics of both homogeneous and heterogeneous areas in the prediction results.

Fig. 5 shows the scatterplots between the actual values and the prediction results of the NIR band. At both sites, overestimation of low values and underestimation of high values were observed in the RF regression predictions. The GP regression predictions better reproduced the actual values. The slightly elevated RMSE value for the GP regression of MM#4 at the Gimje site in Table 6 is due to the overestimation of high values. However, the broader distribution of the NIR reflectance could be well reproduced in the GP regression predictions compared with the RF regression predictions. A significant difference in predictive performance between the two regression models was not observed. Hence, further analyses were performed, including the visual comparison of the predicted results and the effectiveness of the uncertainty information provided by the GP regression.

Fig. 5. Scatterplots of actual values versus prediction results for MM#4 within the NIR band at two sites. (a) Gimje site and (b) Hapcheon site.

Table 6. RMSE and SSIM values for the NIR band prediction by GP regression and RF regression for different combination cases (GPR: GP regression; RFR: RF regression). The best accuracy case is highlighted in bold

4) Qualitative Comparison Results

The prediction performance of GP regression was also qualitatively compared with that of RF regression through visual inspection. Fig. 6 shows the prediction results for some sub-areas of the Gimje site. Area A is inside the cloud area, whereas area B contains both cloud-free and cloud pixels. For SM#1, the RF regression exhibited a slightly reduced RMSE value compared to that of the GP regression. As shown in area A in Fig. 6, the prediction result by RF regression included some pixels with extraordinarily high and low reflectance values within the same field, which causes the low SSIM value. In contrast, GP regression predicts pixel values within the same field more homogeneously.

Fig. 6. Comparison of the GP and RF regression results with the actual imagery (NIR-red-green bands as R-G-B) in two sub-areas within the Gimje site. A and B in the left image are the sub-areas for the SM#1 and MM#4 results, respectively.

The speckle effects in the RF regression results were significantly reduced in the case of MM#4. As area B included a cloud boundary, both regression models showed discontinuity as a result of differences in reflectance at the cloud boundary. In the prediction results of the GP regression, fields with black patterns appeared clearly, and the field boundary was less discontinuous than that of the RF regression.

Fig. 7 illustrates the prediction results of both regression models for the Hapcheon site. Similar prediction results were observed in area A, which represents the case of SS#1, as in the case of SM#1 at the Gimje site. The dark spectral patterns inside the field were underestimated in the RF regression prediction. In contrast, the GP regression yielded a prediction result with high structural similarity because there were no spectral patterns within the field.

Fig. 7. Comparison of the GP and RF regression results with the actual imagery (NIR-red-green bands as R-G-B) in two sub-areas within the Hapcheon site. A and B in the left image are the sub-areas for the SS#1 and MS#4 results, respectively.

For area B in the case of MS#4, the field boundaries were unclear, and reflectance was overestimated inside the fields in the prediction result of RF regression. However, GP regression could reproduce the boundaries between crop fields well and predict reflectance values similar to the actual image. Nevertheless, the prediction results of both regression models still included the discontinuity in the reflectance near the cloud boundaries. Thus, post-processing of the prediction results is required to reduce the discontinuity.

5) Uncertainty Estimation

The GP regression returns the mean of the estimated posterior probability distribution as a prediction value. Because the posterior probability distribution is readily available, the variance or standard deviation of the probability distribution can be computed and used to quantify the uncertainty of the prediction. This study used the standard deviation as a quantitative measure of the uncertainty.

The relationship between the uncertainty and prediction error was further analyzed in this study. The absolute error between the predicted and actual values was first calculated for each cloud pixel and then compared with the uncertainty value. As shown in Fig. 8, a clear correlation between the absolute error and the uncertainty was not observed in the NIR band. For the Gimje site, the correlation between absolute error and uncertainty values was approximately 0.6 for visible bands, whereas the correlation for the NIR band was approximately 0.2. The Hapcheon site showed correlation values of approximately 0.7 and 0.3 for visible and NIR bands, respectively. As the actual value is not used directly to estimate the uncertainty attached to the prediction, a distinct correlation between the errors and uncertainty estimates may not be obtained. However, some pixels with high prediction errors exhibited high uncertainty values, as shown in Fig. 8. As the actual reflectance values in cloud pixels are unavailable for real applications, it is not feasible to compute errors in cloud pixels. Thus, it is suggested that uncertainty estimates by GP regression can be used as an indirect measure of the quality of the reconstructed reflectance values. The low correlation between absolute error and uncertainty values for the NIR band may result from the significant temporal variability in spectral reflectance of crops occupying most of the study areas. This suggests that additional experiments in urban and barren areas with less temporal variability would be needed to further verify the relationship between errors and uncertainty estimates.
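The comparison between absolute error and uncertainty can be sketched as a per-pixel Pearson correlation. The data below are synthetic and deliberately constructed so that the errors scale with the stated standard deviation; the function name `error_uncertainty_correlation` and all numbers are illustrative assumptions, not results of the study.

```python
import numpy as np

def error_uncertainty_correlation(pred_mean, pred_std, actual):
    """Pearson correlation between absolute error and predictive standard deviation."""
    abs_err = np.abs(pred_mean - actual)
    return float(np.corrcoef(abs_err, pred_std)[0, 1])

rng = np.random.default_rng(0)
n = 5000                                            # number of cloud pixels (toy value)
pred_std = rng.uniform(0.01, 0.10, size=n)          # stand-in GP standard deviations
actual = rng.random(n)                              # stand-in actual reflectance
# Errors are drawn with exactly the stated standard deviation (idealized case).
pred_mean = actual + pred_std * rng.standard_normal(n)

r = error_uncertainty_correlation(pred_mean, pred_std, actual)
```

Even in this idealized case the correlation `r` stays well below 1, because each absolute error is a single random draw whose spread, not its value, is described by the standard deviation. This is one reason why a moderate correlation can still be compatible with informative uncertainty estimates.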

Fig. 8. Absolute error and uncertainty maps of GP regression for the NIR band within cloud masks.

5. Conclusions

Reconstruction of time-series optical images by cloud removal is often required for continuous environmental monitoring. In this study, the potential of GP regression for cloud removal was demonstrated through experiments using Sentinel-2 images from two agricultural regions. Particular attention was paid to analyzing the effect of input data combinations on prediction performance, which is important in regression-based prediction tasks. To this end, the prediction performance of the GP regression was evaluated for various experimental cases by combining the number of supplementary images and the number of spectral bands. First, the prediction performance for the two sites increased when using multi-temporal data, suggesting the necessity of using supplementary images highly correlated to the image at the prediction date as inputs. Second, using multiple spectral bands could improve the prediction accuracy because of the rich information from the highly correlated spectral bands. Thus, using multi-temporal images with multi-spectral bands is recommended for regression-based cloud removal. The prediction performance of GP regression was superior to that of RF regression for the NIR band. In particular, the advantage of GP regression over RF regression was its ability to reproduce structural patterns. The uncertainty estimates of GP regression did not show a clear relationship with the prediction errors, but they may still be useful for evaluating the prediction quality. More experiments on the removal of discontinuities near cloud boundaries and a detailed analysis of uncertainty estimates will be conducted in future work to further validate the potential of GP regression for cloud removal.

References

  1. Ahn, H.-Y., K.-Y. Kim, K.-D. Lee, C.-W. Park, K.-H. So, and S.-I. Na, 2018. Feasibility assessment of spectral band adjustment factor of KOMPSAT-3 for agriculture remote sensing, Korean Journal of Remote Sensing, 34(6-3): 1369-1382 (in Korean with English abstract). https://doi.org/10.7780/kjrs.2018.34.6.3.5
  2. Belda, S., L. Pipia, P. Morcillo-Pallares, and J. Verrelst, 2020. Optimizing Gaussian process regression for image time series gap-filling and crop monitoring, Agronomy, 10(5): 618. https://doi.org/10.3390/agronomy10050618
  3. Breiman, L., 2001. Random forests, Machine Learning, 45: 5-32. https://doi.org/10.1023/A:1010933404324
  4. Camps-Valls, G., J. Verrelst, J. Munoz-Mari, V. Laparra, F. Mateo-Jimenez, and J. Gomez-Dans, 2016. A survey on Gaussian processes for earth-observation data analysis: A comprehensive investigation, IEEE Geoscience and Remote Sensing Magazine, 4(2): 58-78. https://doi.org/10.1109/mgrs.2015.2510084
  5. Chen, J., X. Zhu, J.E. Vogelmann, F. Gao, and S. Jin, 2011. A simple and effective method for filling gaps in Landsat ETM+ SLC-off images, Remote Sensing of Environment, 115(4): 1053-1064. https://doi.org/10.1016/j.rse.2010.12.010
  6. Johnson, M.D., W.W. Hsieh, A.J. Cannon, A. Davidson, and F. Bedard, 2016. Crop yield forecasting on the Canadian Prairies by remotely sensed vegetation indices and machine learning methods, Agricultural and Forest Meteorology, 218: 74-84. https://doi.org/10.1016/j.agrformet.2015.11.003
  7. Kim, Y., G.-H. Kwak, K.-D. Lee, S.-I. Na, C.-W. Park, and N.-W. Park, 2018. Performance evaluation of machine learning and deep learning algorithms in crop classification: Impact of hyper-parameters and training sample size, Korean Journal of Remote Sensing, 34(5): 811-827 (in Korean with English abstract). https://doi.org/10.7780/kjrs.2018.34.5.9
  8. Kwak, G.-H., C.-W. Park, K.-D. Lee, S.-I. Na, H.-Y. Ahn, and N.-W. Park, 2021. Potential of hybrid CNN-RF model for early crop mapping with limited input data, Remote Sensing, 13(9): 1629. https://doi.org/10.3390/rs13091629
  9. Li, S., L. Xu, Y. Jing, H. Yin, X. Li, and X. Guan, 2021. High-quality vegetation index product generation: A review of NDVI time series reconstruction techniques, International Journal of Applied Earth Observation and Geoinformation, 105: 102640. https://doi.org/10.1016/j.jag.2021.102640
  10. Liu, M., G. Chowdhary, B.C. Da Silva, S.Y. Liu, and J.P. How, 2018. Gaussian processes for learning and control: A tutorial with examples, IEEE Control Systems Magazine, 38(5): 53-86. https://doi.org/10.1109/mcs.2018.2851010
  11. Na, S.-I., C.-W. Park, K.-D. So, J.-M. Park, and K.-D. Lee, 2017. Development of garlic & onion yield prediction model on major cultivation regions considering MODIS NDVI and meteorological elements, Korean Journal of Remote Sensing, 33(5-2): 647-659 (in Korean with English abstract). https://doi.org/10.7780/kjrs.2017.33.5.2.5
  12. Park, N.-W., Y. Kim, and G.-H. Kwak, 2019. An overview of theoretical and practical issues in spatial downscaling of coarse resolution satellite-derived products, Korean Journal of Remote Sensing, 35(4): 589-607. https://doi.org/10.7780/kjrs.2019.35.4.8
  13. Pasolli, L., F. Melgani, and E. Blanzieri, 2010. Gaussian process regression for estimating chlorophyll concentration in subsurface waters from remote sensing data, IEEE Geoscience and Remote Sensing Letters, 7(3): 464-468. https://doi.org/10.1109/lgrs.2009.2039191
  14. Pipia, L., E. Amin, S. Belda, M. Salinero-Delgado, and J. Verrelst, 2021. Green LAI mapping and cloud gap-filling using Gaussian process regression in Google Earth Engine, Remote Sensing, 13(3): 403. https://doi.org/10.3390/rs13030403
  15. Rasmussen, C.E. and C.K.I. Williams, 2006. Gaussian Processes for Machine Learning, The MIT Press, Cambridge, MA, USA.
  16. Schulz, E., M. Speekenbrink, and A. Krause, 2018. A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions, Journal of Mathematical Psychology, 85: 1-16. https://doi.org/10.1016/j.jmp.2018.03.001
  17. Shen, H. and L. Zhang, 2009. A MAP-based algorithm for destriping and inpainting of remotely sensed images, IEEE Transactions on Geoscience and Remote Sensing, 47(5): 1492-1502. https://doi.org/10.1109/TGRS.2008.2005780
  18. Shen, H., X. Li, Q. Cheng, C. Zeng, G. Yang, H. Li, and L. Zhang, 2015. Missing information reconstruction of remote sensing data: A technical review, IEEE Geoscience and Remote Sensing Magazine, 3(3): 61-85. https://doi.org/10.1109/mgrs.2015.2441912
  19. Verrelst, J., J. Munoz, L. Alonso, J. Delegido, J.P. Rivera, G. Camps-Valls, and J. Moreno, 2012. Machine learning regression algorithms for biophysical parameter retrieval: Opportunities for Sentinel-2 and -3, Remote Sensing of Environment, 118: 127-139. https://doi.org/10.1016/j.rse.2011.11.002
  20. Verrelst, J., J.P. Rivera, J. Moreno, and G. Camps-Valls, 2013. Gaussian processes uncertainty estimates in experimental Sentinel-2 LAI and leaf chlorophyll content retrieval, ISPRS Journal of Photogrammetry and Remote Sensing, 86: 157-167. https://doi.org/10.1016/j.isprsjprs.2013.09.012
  21. Wang, Q., L. Wang, X. Zhu, Y. Ge, X. Tong, and P.M. Atkinson, 2022. Remote sensing image gap filling based on spatial-spectral random forests, Science of Remote Sensing, 5: 100048. https://doi.org/10.1016/j.srs.2022.100048
  22. Zhang, C., W. Li, and D. Travis, 2007. Gaps-fill of SLC-off Landsat ETM+ satellite image using a geostatistical approach, International Journal of Remote Sensing, 28(22): 5103-5122. https://doi.org/10.1080/01431160701250416
  23. Zeng, C., H. Shen, and L. Zhang, 2013. Recovering missing pixels for Landsat ETM+ SLC-off imagery using multi-temporal regression analysis and a regularization method, Remote Sensing of Environment, 131: 182-194. https://doi.org/10.1016/j.rse.2012.12.012
  24. Zhu, X., D. Liu, and J. Chen, 2012. A new geostatistical approach for filling gaps in Landsat ETM+ SLC-off images, Remote Sensing of Environment, 124: 49-60. https://doi.org/10.1016/j.rse.2012.04.019