
No-Reference Image Quality Assessment Using Complex Characteristics of Shearlet Transform

  • Mahmoudpour, Saeed (Department of Computer & Communications Engineering, Kangwon National University) ;
  • Kim, Manbae (Department of Computer & Communications Engineering, Kangwon National University)
  • Received : 2015.12.07
  • Accepted : 2016.04.22
  • Published : 2016.05.30

Abstract

The field of Image Quality Measure (IQM) has been growing rapidly in recent years, and significant progress has been made in No-Reference (NR) IQM methods in particular. In this paper, a general-purpose NR IQM algorithm is proposed based on the statistical characteristics of natural images in the shearlet domain. The method utilizes a set of distortion-sensitive features extracted from the statistical properties of shearlet coefficients. A complex version of the shearlet transform is employed to take advantage of phase and amplitude features in quality estimation. Furthermore, since the shearlet transform can analyze images at multiple scales, the effect of distortion on the across-scale dependencies of shearlet coefficients is explored for feature extraction. For quality prediction, the features are used to train image classification and quality prediction models using a Support Vector Machine (SVM). The experimental results show that the proposed NR IQM is highly correlated with human subjective assessment and outperforms several Full-Reference (FR) and state-of-the-art NR IQMs.


Ⅰ. Introduction

Following the widespread adoption of imaging systems, digital images can be easily captured, stored and shared among users. However, visual information is subject to various degradations during image processing and compression, which can affect the visual quality. Human subjective testing is a reliable method of visual quality assessment; however, it is costly and time-consuming. Thus, image quality is instead measured with objective tools that are highly consistent with human evaluation. Different objective Image Quality Measures (IQMs) have been developed in recent years to assess visual quality without human involvement[1].

The objective IQMs fall into three categories based on the amount of accessible information: Full-Reference (FR), Reduced-Reference (RR) and No-Reference (NR). In FR methods, both the reference and distorted images are available. RR methods aim to measure the quality of the distorted image using only partial information about the reference image. NR or blind IQMs are used when no information about the reference image is available.

Most existing NR IQMs fall into two categories: (1) distortion-specific methods, which are dedicated to measuring the severity of a single type of distortion[2,3]; and (2) general-purpose approaches based on Natural Scene Statistics (NSS), which can handle various distortion types and rely on deviations from the regularity of NSS features. Recent works have mostly focused on extracting a number of quality-related features from statistical models of natural images and mapping the feature space to a predicted quality score[4,5,6].

The shearlet transform is a multidimensional extension of the conventional wavelet transform that analyzes an image at multiple scales and directional subbands[7]. In this paper, a new NR IQM (ShearletIQM) is proposed based on modeling NSS in the shearlet domain. Natural images possess certain statistical properties that vary in the presence of distortion, and the shearlet representation can efficiently determine the types of statistical variations caused by distortion. Thus, quality-related features extracted from the statistics of shearlet coefficients are used to capture these variations and classify different distortion types. Finally, the image quality is obtained by mapping the feature space to a quality index using learning and regression methods.

The rest of the paper is organized as follows: Section 2 describes the framework of the proposed method. Section 3 explains the complex shearlet transform, and the feature extraction method is described in Section 4. In Section 5, the two-stage framework for distortion classification and quality prediction is presented. The experimental results are reported and discussed in Section 6. Finally, Section 7 concludes the paper.

 

Ⅱ. Methodology

The proposed method is based on the fact that natural images exhibit certain statistical properties that vary in the presence of distortion. These statistical changes are well represented in the shearlet domain, and both the amount and the characteristics of the variations depend on the degree and type of distortion. Therefore, quality degradation can be predicted by quantifying how far the shearlet coefficients of a distorted image deviate from those of a pristine image. Here, a number of quality-related features are extracted in the shearlet domain and a quality index is obtained using a machine learning approach. The framework of the proposed method is summarized in Fig. 1. First, a Complex Shearlet Transform (CST) decomposes an input image into multiple scales and directional subbands. Second, various features are extracted from the real- and complex-valued shearlet coefficients of the subbands. Finally, the features are used to train a distortion classifier followed by quality prediction using SVM and Support Vector Regression (SVR)[8].

Fig. 1. Flow diagram of the proposed framework

 

Ⅲ. Shearlet Transform

A disadvantage of the wavelet transform is its limited capability in dealing with multivariate and directional data. Therefore, several extensions of wavelets such as curvelets[9], contourlets[10] and shearlets[7] have been proposed to overcome this limitation. The shearlet transform provides a sparse representation of multi-dimensional data and anisotropic information at multiple scales, and can therefore deliver an accurate detection of signal singularities. Considering these properties, the shearlet transform provides accurate information about distortion effects. The shearlets form an affine system parameterized by three parameters: scaling, shear, and translation. The shearlet transform of an image f is defined as:

SH_f(a, s, t) = ⟨f, ψ_{a,s,t}⟩

where a > 0 is the scale parameter, s ∈ R is the shear parameter and t ∈ R² denotes the translation parameter. The shearlet ψ_{a,s,t} is given by:

ψ_{a,s,t}(x) = a^(−3/4) ψ(A_a^(−1) S_s^(−1) (x − t))

where

A_a = [a, 0; 0, √a],   S_s = [1, s; 0, 1]

In order to achieve optimal sparsity, the anisotropic dilation matrix A_a ensures the multi-scale property while the shear matrix S_s provides a means to detect directions.
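For illustration only, the following numpy sketch builds the dilation and shear matrices defined above and evaluates the coordinate argument fed to the mother shearlet at a given scale, shear and translation (the function names and test values are mine, not from the paper):

```python
import numpy as np

def dilation_matrix(a):
    """Anisotropic (parabolic) dilation A_a = diag(a, sqrt(a))."""
    return np.array([[a, 0.0],
                     [0.0, np.sqrt(a)]])

def shear_matrix(s):
    """Shear matrix S_s that tilts the directional support by slope s."""
    return np.array([[1.0, s],
                     [0.0, 1.0]])

# A shearlet at scale a, shear s, translation t evaluates the mother
# shearlet psi at A_a^{-1} S_s^{-1} (x - t), weighted by a^(-3/4).
a, s, t = 0.25, 0.5, np.array([10.0, 20.0])
x = np.array([12.0, 23.0])
arg = np.linalg.inv(dilation_matrix(a)) @ np.linalg.inv(shear_matrix(s)) @ (x - t)
print(arg)  # coordinates passed to the mother shearlet psi
```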

The CST is especially useful for analyzing the phase. Thus, in addition to the real-valued coefficients, the relation between the real and imaginary parts is explored using phase and amplitude to extract discriminative features. The imaginary part can be obtained from the Hilbert transform of the real part. Let ψ = Hilbert(ϕ). The CST coefficients are then computed by

SH_ϕ f(a, s, t) = ⟨f, ϕ_{a,s,t}⟩

and

SH_ψ f(a, s, t) = ⟨f, ψ_{a,s,t}⟩

where SH_ϕ and SH_ψ denote the real and imaginary parts of the complex shearlet coefficients, respectively.
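As a rough illustration (an assumption on my part, not the authors' implementation), the imaginary counterpart of a real-valued subband can be approximated by a 1-D Hilbert transform, after which the amplitude and phase follow directly:

```python
import numpy as np
from scipy.signal import hilbert

def complex_subband(real_coeffs, axis=1):
    """Approximate CST coefficients from one real-valued subband.

    The imaginary part is taken as the Hilbert transform of the real
    part, here applied along a single spatial axis as a simplification.
    """
    analytic = hilbert(real_coeffs, axis=axis)   # real + i * Hilbert(real)
    amplitude = np.abs(analytic)
    phase = np.angle(analytic)                   # values in [-pi, pi]
    return analytic, amplitude, phase

# Example with random data standing in for one fine-scale subband
subband = np.random.randn(256, 256)
coeffs, amp, ph = complex_subband(subband)
```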

 

Ⅳ. Feature Extraction

Feature selection is an important part of building an NR IQM model. The features should be independent of the image content and sensitive to the degree and type of distortion. To extract the features, the statistical characteristics of shearlet coefficients are modeled for natural and distorted images. Fig. 2 shows an original natural image (bikes) from the LIVE image database[11] and its five distorted versions.

Fig. 2. Original image (bikes) and five distorted versions from the LIVE image database. (a) Original image, (b) JP2K distorted image, (c) Gaussian blur distorted image, (d) JPEG distorted image, (e) Gaussian white noise distorted image, and (f) Fast fading distorted image

First, each image is divided into blocks of size 256x256 and each block is decomposed into 4 scales and 6 directions (24 subbands in total) using the shearlet transform. Subsequently, the characterizing features are extracted in each block, and the average feature vector over all blocks is used as the final one. As verified in the experiments, the statistical properties of shearlet coefficients are very similar across the subbands of one scale but change across scales. Therefore, in this work only the first subband of each scale is considered. In the following, the various features captured from the real- and complex-valued coefficients are explained.
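A minimal sketch of this block-wise procedure is given below; `decompose` and `extract` are hypothetical placeholders standing in for a shearlet toolbox call and the feature computations of Section IV:

```python
import numpy as np

def blockwise_features(image, decompose, extract, block=256):
    """Average the per-block feature vectors, as described above.

    `decompose` maps a block to its shearlet subbands and `extract`
    maps those subbands to a 1-D feature vector; both are supplied by
    the caller (e.g. a shearlet toolbox plus the features of Section IV).
    """
    h, w = image.shape
    feats = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = image[y:y + block, x:x + block]
            feats.append(extract(decompose(patch)))
    return np.mean(feats, axis=0)   # final feature vector for the image
```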

1. Real-valued shearlet features

Here, the features obtained from the statistical properties of real-valued shearlet coefficients are described.

1.1 Single-subband statistics

The distribution of real-valued shearlet coefficients varies with distortion, and modeling the distribution makes it possible to quantify these changes. Since the high-frequency components of an image are more sensitive to distortion, the distribution of the real coefficients in the finest scale is used in the modeling; the first subband of the finest scale is selected. Fig. 3 plots the histogram of the normalized shearlet coefficients for the original image of Fig. 2 and its distorted versions. The distribution for the original image is characterized by a large concentration of values around zero and heavy tails. Distortion affects the subband coefficients and consequently changes the shape of the distribution. As shown in the figure, distortions such as Gaussian Blur (GBlur) increase the concentration of coefficients around zero, so the distribution is better fitted by a Laplacian model, whereas noise produces a more Gaussian appearance due to the increase of high-frequency components. Therefore, a GGD (Generalized Gaussian Distribution) model is used to capture the wide range of shearlet statistics in distorted images. The univariate GGD with zero mean is given by:

f(x; γ, σ²) = α exp(−(|x|/β)^γ)

where γ is the shape parameter, and α and β are the normalizing and scale parameters, respectively, given by

α = γ / (2β Γ(1/γ)),   β = σ √(Γ(1/γ) / Γ(3/γ))

where σ is the standard deviation and Γ is the gamma function,

Γ(z) = ∫₀^∞ t^(z−1) e^(−t) dt.

By adjusting the shape parameter, the GGD model can span both the Gaussian (γ=2) and Laplacian (γ=1) distributions. The features obtained from the statistical properties of the real-valued coefficients in a single subband can thus be summarized by the estimated GGD parameters.

Fig. 3. Histograms of normalized real-valued shearlet coefficients for the original image and five distortions
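As an illustration of the single-subband statistics, the GGD parameters can be estimated with scipy's generalized normal distribution, whose density belongs to the same exp(−|x/scale|^shape) family; the paper does not specify its estimator, so this is only a stand-in:

```python
import numpy as np
from scipy.stats import gennorm

def ggd_features(coeffs):
    """Fit a zero-mean GGD to the shearlet coefficients of one subband.

    gennorm's density is proportional to exp(-|x/scale|^shape), so the
    fitted `shape` plays the role of gamma above and `scale` of beta.
    """
    x = np.ravel(coeffs)
    shape, loc, scale = gennorm.fit(x, floc=0)   # force zero mean
    return shape, scale

# Intuition: blur pushes the shape toward 1 (Laplacian-like),
# while noise pushes it toward 2 (Gaussian-like).
```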

1.2 Joint distribution of coefficients across scales

A natural extension of univariate subband modeling is to consider the joint density of subband coefficients across different scales. In order to explore the statistical dependencies across scales, the histograms of the first subband of the four scales are presented in Fig. 4 for (a) the original image, (b) JP2K and (c) noise distortion. Comparing the histograms of the four scales of the original image (Fig. 4(a)), the peak and tail weight increase fairly monotonically from the coarse to the fine scale (Scale 1 to 4). However, distortion can alter the relation between the statistical properties of subbands across scales. For instance, comparing the four histograms of the JP2K distortion with those of the original image (Figs. 4(a) and (b)), JP2K shows a larger peak increase from coarse to fine scale than the original image, especially from Scale 3 to 4. The significant amount of low-frequency content introduced by JP2K distortion yields an increase in the peak and tail weight of the finer scales.

Fig. 4. Histograms of shearlet coefficients in the first subband of four scales for (a) Original image, (b) JP2K, and (c) Noise

Since each distortion affects the across-scale dependencies of subbands in its own way, modeling the joint distribution of subbands across scales can effectively reveal these changes. Here, an MGGD (Multivariate Generalized Gaussian Distribution) is used to model the joint distribution of the four subbands. The MGGD with zero mean is given by

f(x; Σ, γ) = |Σ|^(−1/2) h_γ(xᵀ Σ^(−1) x)

where

h_γ(y) = [γ Γ(m/2) / (π^(m/2) Γ(m/(2γ)) 2^(m/(2γ)))] exp(−y^γ / 2),

m is the probability space dimension and Σ is a dispersion matrix, which equals the covariance matrix only for the Gaussian case. Similar to the univariate case, γ is the shape parameter: a Laplacian distribution is obtained with γ=0.5, while γ=1 yields a Gaussian distribution.

Various parameter estimation methods have been proposed for the MGGD. We use the fast and reliable method of moments given in [12]. Note that m is equal to the number of scales (m=4). The first subband of each scale is selected to form n 4-dimensional random vectors, and the method of moments then estimates the MGGD model parameters (γ, Σ). The second feature set is referred to as f_jds.
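The method-of-moments estimator of [12] is not reproduced here; the sketch below is a simplified moment-matching stand-in of my own that recovers the shape parameter by matching a Mardia-style multivariate kurtosis, which for an m-variate GGD equals m²·Γ(m/(2γ))·Γ((m+4)/(2γ)) / Γ((m+2)/(2γ))²:

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import brentq

def mggd_shape(X):
    """Estimate the MGGD shape gamma from an (n, m) data matrix.

    Rows of X are the m-dimensional vectors formed from the first
    subband of each scale (m = 4 here).  This kurtosis-matching scheme
    is a simplified stand-in for the estimator of [12].
    """
    Xc = X - X.mean(axis=0)
    n, m = Xc.shape
    S = np.cov(Xc, rowvar=False)                          # sample covariance
    d = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(S), Xc)
    kappa_hat = np.mean(d ** 2)                           # sample multivariate kurtosis

    def kappa(g):                                         # theoretical MGGD kurtosis
        return m ** 2 * np.exp(gammaln((m + 4) / (2 * g)) + gammaln(m / (2 * g))
                               - 2 * gammaln((m + 2) / (2 * g)))

    return brentq(lambda g: kappa(g) - kappa_hat, 0.05, 5.0)  # gamma in a plausible range
```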

2. Complex-valued Shearlet features

The complex extension of the shearlet transform is used to represent the visual appearance of an image by its phase and amplitude information. The dependency between the real and imaginary parts of the shearlet coefficients is well described in terms of phase and amplitude in polar coordinates. The distributions of phase and amplitude have a consistent shape across original images but change significantly in the presence of distortion.

2.1 Phase statistics

Extracting features from the phase statistics of images can be very useful in predicting image quality. Fig. 5 presents the histograms of the phase values in the finest scale for the original image as well as the JPEG and noise distortion types. It can be observed that the phase histogram of the original image is characterized by a bimodal distribution. The noise distortion yields a uniform histogram, while the JPEG distortion presents a bimodal shape with higher peaks.

Fig. 5. Histograms of the phase values in the finest scale. (a) Original image, (b) JPEG, and (c) Noise

In order to model the observed bimodal phase distribution, Maboudi et al.[13] proposed a phase model composed of two Von-Mises distributions and a uniform circular distribution. Furthermore, by comparing the peak positions and concentration parameters, they observed that the two peaks are always π apart and that the shapes of the peaks are similar to each other. Inspired by their method, and because the bimodal phase distribution is symmetric around zero, a simple and fast method is used in which only the half of the phase values lying in [0, π] is modeled with a unimodal Von-Mises distribution. The model of the phase values in [−π, 0] is the same as that for [0, π] but with the peak center shifted to the left by π. The unimodal Von-Mises distribution is given by:

f(φ; θ, k) = exp(k cos(φ − θ)) / (2π I₀(k))

where I₀(k) is the modified Bessel function of order 0, θ is the mean direction and k is the concentration parameter. When k is zero, the Von-Mises distribution reduces to the uniform distribution. The maximum likelihood estimates of θ and k are found for each distribution using the method in [14]. Fig. 6 shows the histogram of the original image and the fitted Von-Mises function in the range [0, π]. Using the mean direction and concentration parameters of the Von-Mises model, a two-element feature vector (θ, k) is defined.

Fig. 6. Histogram of phase values and the fitted Von-Mises model
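A compact version of these maximum-likelihood estimates could look like the following sketch (the concentration uses the standard approximation from Fisher's book [14]; folding the phases into [0, π] is assumed to be done by the caller):

```python
import numpy as np

def vonmises_fit(phase):
    """ML estimates of the Von-Mises mean direction and concentration.

    `phase` holds the phase values in [0, pi] (the positive half of the
    bimodal histogram).  The concentration k uses the usual
    approximation from Fisher's circular-statistics book [14].
    """
    C, S = np.mean(np.cos(phase)), np.mean(np.sin(phase))
    theta = np.arctan2(S, C)                 # mean direction
    R = np.hypot(C, S)                       # mean resultant length
    if R < 0.53:
        k = 2 * R + R ** 3 + 5 * R ** 5 / 6
    elif R < 0.85:
        k = -0.4 + 1.39 * R + 0.43 / (1 - R)
    else:
        k = 1.0 / (R ** 3 - 4 * R ** 2 + 3 * R)
    return theta, k                          # the two phase features
```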

2.2 Energy variations across scales

The amplitude represents the local energy distribution, and distortion changes the energy spectrum of an image. Fig. 7 shows the mean scalar energy values for all images of the LIVE database in the 24 subbands (6 directions over 4 scales). It can be observed that the subband energy increases from the coarse to the fine scale for noise, while it decreases for the other distortion types. Also, comparing subbands in any two scales, distortion changes the energy relation across scales while the monotonicity of the energy variation within each scale is not affected. Since the energy of distorted images changes across scales, the mean of the logarithm of the amplitudes is computed for the subbands of the four scales, and the energy differences between scales are captured as features. The energy of the ith subband in the jth scale is given by

E_{i,j} = (1/N) Σ_t log(A_{i,j}(t)),

where A_{i,j}(t) denotes the amplitude of the tth coefficient and N is the number of coefficients in the subband.

As mentioned earlier, only the first subband of each scale is considered. A 3-dimensional feature vector is therefore obtained from the energy differences between consecutive scales:

f_as = (E_{1,2} − E_{1,1}, E_{1,3} − E_{1,2}, E_{1,4} − E_{1,3})

Fig. 7. Mean values of the logarithms of the shearlet amplitudes in all 24 subbands for the original images and different distorted images of the LIVE database. The vertical axis shows the mean values and the horizontal axis denotes the subband number. The vertical dashed lines separate the scales
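Assuming `amplitudes` holds the amplitude array of the first subband of each of the four scales, ordered coarse to fine, the three energy-difference features could be computed as in the sketch below (my own arrangement of the equations above):

```python
import numpy as np

def energy_features(amplitudes):
    """Mean log-amplitude of the first subband of each scale and the
    differences between consecutive scales (3 features for 4 scales).

    `amplitudes` is a list of 2-D amplitude arrays, one per scale,
    ordered from coarse to fine.
    """
    E = np.array([np.mean(np.log(a + 1e-12)) for a in amplitudes])
    return np.diff(E)        # energy differences across scales (f_as)
```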

Fig. 8 shows a 3D scatter plot of three features: the shape parameter γ (F(1)), the mean direction of the phase mode θ (F(2)), and the first element of f_as (F(3)) in Eq. (13). The features are normalized to [0, 1] and are shown for the JPEG, noise and GBlur distortion types. As shown in the figure, the features occupy different regions of this parameter space depending on the distortion type.

Fig. 8. 3D scatter plot of three extracted features across the JPEG, noise and GBlur distortion types

 

Ⅴ. Learning-based Quality Evaluation

In order to map the feature space to image quality scores, a two-stage framework is employed using a machine learning approach. In the first stage, a probabilistic classifier based on the extracted features is trained to identify the type of distortion and the probability of its occurrence. Using a Support Vector Machine (SVM), the probability that a distorted image belongs to each of the five distortion classes is computed, yielding a 5-dimensional classification probability vector p. In the second stage, Support Vector Regression (SVR) is adopted to train a quality prediction model. From the regression models, a 5-dimensional quality estimation vector q is obtained for a distorted image classified in stage 1; the elements of q denote the quality scores of the image along the five distortion types. The final quality score Q is computed by the probability-weighted combination:

Q = Σ_{i=1}^{5} p_i q_i

The LIBSVM package[15] is used to implement both the SVM and the SVR, each with a Radial Basis Function (RBF) kernel.
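The sketch below illustrates this two-stage scheme with scikit-learn's SVC and SVR (which wrap LIBSVM); hyperparameter tuning, cross-validation and the exact LIBSVM interface used in the paper are omitted, so treat it as a simplified stand-in:

```python
import numpy as np
from sklearn.svm import SVC, SVR

def train_two_stage(X, labels, dmos):
    """Stage 1: probabilistic distortion classifier.  Stage 2: one SVR
    per distortion type, trained only on images of that distortion.
    """
    clf = SVC(kernel="rbf", probability=True).fit(X, labels)
    regs = {d: SVR(kernel="rbf").fit(X[labels == d], dmos[labels == d])
            for d in np.unique(labels)}
    return clf, regs

def predict_quality(clf, regs, x):
    """Final score Q = sum_i p_i * q_i over the distortion classes."""
    x = np.asarray(x).reshape(1, -1)
    p = clf.predict_proba(x)[0]                                   # probability vector p
    q = np.array([regs[d].predict(x)[0] for d in clf.classes_])   # quality vector q
    return float(p @ q)
```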

 

Ⅵ. Experimental Results

The performance of ShearletIQM is compared with several FR and NR quality assessment methods. The LIVE image database was used for training ShearletIQM. This database contains 29 reference images subjected to five distortion types, namely JPEG, JP2K, GBlur, Gaussian white noise (GWN) and Fast Fading (FF), yielding a total of 779 distorted images. A Differential Mean Opinion Score (DMOS) representing the human subjective score is available for each image. DMOS values lie in the range [0,100], where lower values indicate higher quality. Note that the measured performance can vary with the choice of test data, which are used for both the NR and FR metrics.

The image database was iteratively partitioned into train and test subsets to evaluate the performance of the proposed method. The train and test images were separated by content to ensure the validity of the experiment. In each iteration, the training set contained 80% of the original images and their corresponding distorted images, while the remaining 20% of the images were used as the test set. Classification and regression models were learned from the training set, and the test images were evaluated using the constructed models. The train-test partitioning was randomly repeated 1,000 times and the performance indices were computed in each iteration. Finally, the median performance indices across the 1,000 trials were reported as the IQM performance.

The performance indices include the Linear Correlation Coefficient (LCC) and the Spearman Rank Order Correlation Coefficient (SROCC) between the objective IQM and the subjective DMOS.
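For reference, a sketch of this evaluation protocol (content-separated 80/20 splits, median correlations over the repetitions) is given below; `evaluate` is a placeholder that trains the model on the training indices and returns predicted scores and DMOS for the test indices:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def median_performance(contents, evaluate, n_iter=1000, train_ratio=0.8, seed=0):
    """Median LCC/SROCC over random content-separated train/test splits.

    `contents` is the array of reference-image IDs of the distorted
    images, and `evaluate(train_idx, test_idx)` returns (predicted, dmos)
    for the test images, e.g. by fitting the two-stage model above.
    """
    rng = np.random.default_rng(seed)
    uniq = np.unique(contents)
    lcc, srocc = [], []
    for _ in range(n_iter):
        train_refs = rng.choice(uniq, int(train_ratio * len(uniq)), replace=False)
        train_idx = np.flatnonzero(np.isin(contents, train_refs))
        test_idx = np.flatnonzero(~np.isin(contents, train_refs))
        pred, dmos = evaluate(train_idx, test_idx)
        lcc.append(pearsonr(pred, dmos)[0])
        srocc.append(spearmanr(pred, dmos)[0])
    return np.median(lcc), np.median(srocc)
```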

Three FR IQMs, namely PSNR, SSIM[16] and VIF[17], and five state-of-the-art NR IQMs, namely BIQI[4], BLIINDS II[5], DIIVINE[18], CurveletIQM[19] and BRISQUE[6], were used for comparison. Tables 1-4 compare the performance of ShearletIQM with the other methods in terms of the median and standard deviation of the LCC and SROCC values.

Table 1. Median LCC comparison across 1,000 train-test trials

Table 2. Standard deviation of LCC values across 1,000 train-test trials

Table 3. Median SROCC comparison across 1,000 train-test trials

Table 4. Standard deviation of SROCC values across 1,000 train-test trials

As presented in the tables, the performance of the proposed method is statistically comparable with the other FR and state-of-the-art NR methods. Compared to the FR IQMs, ShearletIQM has higher LCC and SROCC values than PSNR and SSIM for all distortion types. The overall performance of VIF is the highest among all methods, and the proposed IQM is competitive with VIF across the five distortion types.

ShearletIQM delivers better overall performance than BIQI, DIIVINE, BLIINDS II and CurveletIQM. Compared to DIIVINE and CurveletIQM, which are based on the wavelet and curvelet transforms respectively, ShearletIQM achieves higher performance for all distortion types except GWN, where it is slightly inferior. ShearletIQM is also highly competitive with BRISQUE across the different distortion types, and for the JPEG and GBlur distortion types it outperforms all of the NR IQMs.

To compare the computational cost, the original Matlab code of each NR IQM algorithm was executed on the bikes image (768x512). The processing times are tabulated in Table 5. The time complexity of ShearletIQM is quite reasonable, and the algorithm is faster than all the other methods except BRISQUE.

Table 5. Comparison of the time complexity of the different methods

 

Ⅶ. Conclusion

A novel NR image quality assessment method was proposed based on feature extraction from the statistical properties of natural images in the shearlet domain. A complex version of the shearlet transform is employed to capture features from both the real- and complex-valued shearlet coefficients. Experimental results show that ShearletIQM outperforms several state-of-the-art NR IQMs such as BIQI, DIIVINE, BLIINDS II and CurveletIQM for various distortion types, and achieves results comparable with BRISQUE.

References

  1. W. Lin and C. Kuo, ″Perceptual visual quality metrics: a survey″, J. Vis. Commun. Image Represent. 22(4), pp. 297–312, 2011. https://doi.org/10.1016/j.jvcir.2011.01.005
  2. M. Chen and A. Bovik, ″No-reference image blur assessment using multi-scale gradient″, EURASIP J. Image Vid. Process. 2011(1), pp. 1-11, 2011. https://doi.org/10.1155/2011/790598
  3. Z. Wang, H. Sheikh and A. Bovik, ″No-reference perceptual quality assessment of JPEG compressed images″, Proceedings of IEEE International Conference on Image Processing, vol. 1, pp. 477-480, 2002.
  4. A. Moorthy and A. Bovik, ″A two-step framework for constructing blind image quality indices″, IEEE Signal Process. Lett., 17(5), pp. 513-516, 2010. https://doi.org/10.1109/LSP.2010.2043888
  5. M. Saad and A. Bovik, ″Blind image quality assessment: a natural scene statistics approach in the DCT domain″, IEEE Trans. Image Process., 21(8), pp. 3339-3352, 2012. https://doi.org/10.1109/TIP.2012.2191563
  6. A. Mittal, A. Moorthy and A. Bovik, ″No-Reference image quality assessment in the spatial domain″, IEEE Trans. Image Process. 21(12), pp. 4695-4708, 2012. https://doi.org/10.1109/TIP.2012.2214050
  7. G. Kutyniok, W. Lim and X. Zhuang, ″Digital shearlet transforms″, Shearlet, Birkhauser, Boston, pp. 239-282, 2012.
  8. A. Smola and B. Schölkopf, ″A tutorial on support vector regression″, Stat. Comput. 14(3), pp. 199-222, 2004. https://doi.org/10.1023/B:STCO.0000035301.49549.88
  9. E. Candes, L. Demanet, D. Donoho and L. Ying, ″Fast discrete curvelet transforms″, Multiscale Model. Simul. 5(3), pp. 861-889, 2006. https://doi.org/10.1137/05064182X
  10. M. Do and M. Vetterli, ″The contourlet transform: an efficient directional multi-resolution image representation″, IEEE Trans. Image Process. 14(12), pp. 2091–2106, 2005. https://doi.org/10.1109/TIP.2005.859376
  11. H. Sheikh, Z. Wang, L. Cormack and A. Bovik, LIVE image quality assessment database release 2. http://live.ece.utexas.edu/research/quality.
  12. G. Verdoolaege and P. Scheunders, ″Geodesics on the manifold of multivariate generalized Gaussian distributions with an application to multicomponent texture discrimination″, Int. J. Comput. Vis. 95(3), pp. 265-286, 2011. https://doi.org/10.1007/s11263-011-0448-9
  13. H. Maboudi, H. Shimazaki, S. Amari and H. Soltanian-Zadeh, ″Representation of higher-order statistical structures in natural scenes via spatial phase distributions″, Vis. Res., 2015.
  14. N. Fisher, Statistical analysis of circular data, Cambridge University Press, 1996.
  15. C. Chang and C. Lin, ″LIBSVM: A library for support vector machines″, ACM Trans. Intell. Syst. Technol., 2(3), pp. 1-27, 2011. https://doi.org/10.1145/1961189.1961199
  16. Z. Wang, A. Bovik, H. Sheikh and E. Simoncelli, ″Image quality assessment: From error visibility to structural similarity″, IEEE Trans. Image Process. 13(4), pp. 600-612, 2004. https://doi.org/10.1109/TIP.2003.819861
  17. H. Sheikh, A. Bovik and G. de Veciana, ″Image information and visual quality″, IEEE Trans. Image Process. 15(2), pp. 430-444, 2006. https://doi.org/10.1109/TIP.2005.859378
  18. A. Moorthy and A. Bovik, ″Blind image quality assessment: From natural scene statistics to perceptual quality″, IEEE Trans. Image Process. 20(12), pp. 3350–3364, 2011. https://doi.org/10.1109/TIP.2011.2147325
  19. L. Liu, H. Dong, H. Huang and A. Bovik, ″No-reference image quality assessment in curvelet domain″, Sig. Process. Image Comm. 24(4), pp. 494-505, 2014. https://doi.org/10.1016/j.image.2014.02.004