1. Introduction
Illegal copying and distribution of digital images over the Internet have become increasingly easy as the Internet and multimedia technologies develop, and the demand for image authentication is growing accordingly. Perceptual image hashing addresses this problem by identifying the content of digital images through the comparison of their hash values [1]-[4]. Unlike cryptographic hashing, where a one-bit change in the input leads to a totally different hash value [5], perceptual image hashing is robust to content-preserving manipulations. At the same time, it generates different hash values for distinct images, which provides discriminability.
Recently, several perceptual hashing schemes for color images have been developed. In [6], the histogram of the color vector angles in the inscribed circle is calculated and compressed by a one-dimensional DCT to obtain a hash value composed of the first n AC coefficients. This method is resilient to rotation, but it disregards information outside the inscribed circle. A hashing approach that concatenates the invariant moments of each component of the HSI and YCbCr color spaces to form the hash value is given in [7]. In [8], local color features are extracted by calculating the block means and variances of each component of the HSI and YCbCr spaces, but the hash value, obtained by concatenating the Euclidean distances between the block features and a reference feature, cannot resist rotation well. In the perceptual hashing schemes proposed in [9] and [10], the input color image is represented as a pure quaternion matrix by taking the RGB components as the three imaginary parts of pure quaternions. In [9], the non-overlapping blocks of the pure quaternion matrix are transformed by the Quaternion Fourier Transform (QFT), and a binary hash value is generated by comparing the block mean frequency energies with the global mean frequency energy. In [10], intermediate features are extracted by applying Quaternion Singular Value Decomposition (QSVD) to pseudo-randomly selected regions of the pure quaternion matrix, and QSVD is then applied to these features to construct the final hash value. However, the robustness of these two quaternion-based hashing schemes needs to be enhanced.
The quaternion, introduced by Hamilton [11], is a generalization of the complex number that consists of one real part and three imaginary parts. The classical pure quaternion representation for color images presented by Sangwine et al. [12] offers a sound way to deal with the three color channels simultaneously and is employed in color image analysis [13]-[15], registration [16], watermarking [17] and perceptual hashing [9][10]. However, since this pure quaternion representation is sensitive to image manipulations, it degrades the robustness when it is used in color image perceptual hashing. The motivation of our work is to improve the robustness, and our contributions are twofold. First, a new full quaternion representation for color images is proposed, in which each color pixel is represented as a full quaternion by setting the local luminance variance as the real part and the RGB values as the three imaginary parts, respectively. Second, based on this new representation and the Quaternion Discrete Cosine Transform (QDCT), a Full Quaternion Discrete Cosine Transform (FQDCT)-based hashing is proposed by applying QDCT to pseudo-randomly selected regions of the full quaternion image. Two quaternion feature matrices are constructed from the QDCT coefficients, and a binary hash value is computed from these two feature matrices. In our experiments, the effect of the proposed full quaternion representation is verified and the performance of the proposed FQDCT-based hashing is evaluated.
The rest of this paper is organized as follows. Section 2 describes the proposed full quaternion representation for color images. Section 3 details the proposed FQDCT-based hashing. Experimental results and analysis are shown in Section 4 and the conclusions are given in Section 5.
2. The Proposed Full Quaternion Representation for Color Images
2.1 Basic Knowledge of Quaternions
A quaternion q is made up of four components in a hypercomplex form as:
$$q = q_r + q_i\, i + q_j\, j + q_k\, k \tag{1}$$
where qr, qi, qj, qk ∈ R and i, j, k satisfy the following relations:
$$i^2 = j^2 = k^2 = -1, \quad ij = -ji = k, \quad jk = -kj = i, \quad ki = -ik = j \tag{2}$$
Eq. (2) reveals that the multiplication of quaternions is not commutative. A quaternion can also be considered as the sum of a scalar part and a vector part:
$$q = s(q) + v(q) \tag{3}$$
where s(q) = qr and v(q) = qi⋅i + qj⋅j + qk⋅k. q is called a pure quaternion if s(q) = 0; otherwise q is called a full quaternion.
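The non-commutativity noted above can be checked numerically. The following is a minimal sketch, assuming quaternions are stored as (real, i, j, k) tuples; the helper names `qmul` and `is_pure` are illustrative and not taken from the paper.

```python
def qmul(a, b):
    """Hamilton product of two quaternions given as (r, i, j, k) tuples."""
    ar, ai, aj, ak = a
    br, bi, bj, bk = b
    return (
        ar * br - ai * bi - aj * bj - ak * bk,  # real part
        ar * bi + ai * br + aj * bk - ak * bj,  # i part
        ar * bj - ai * bk + aj * br + ak * bi,  # j part
        ar * bk + ai * bj - aj * bi + ak * br,  # k part
    )


def is_pure(q):
    """A quaternion is pure when its scalar (real) part is zero."""
    return q[0] == 0


I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)

print(qmul(I, J))  # (0, 0, 0, 1), i.e. ij = k
print(qmul(J, I))  # (0, 0, 0, -1), i.e. ji = -k
```

Swapping the operands flips the sign of the result, which is why the QDCT later in the paper has distinct left-hand and right-hand forms.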
2.2 The Proposed Full Quaternion Representation for Color Images
The classical pure quaternion representation for color images [12] represents each color pixel as a pure quaternion and can be described as:
$$f(x,y) = f_R(x,y)\, i + f_G(x,y)\, j + f_B(x,y)\, k \tag{4}$$
where (x,y) is the pixel coordinate and fR(x,y), fG(x,y), fB(x,y) are the RGB values of the pixel, respectively.
Based on this representation, a color image can be represented intuitively as a two-dimensional pure quaternion matrix, which provides the ability to preserve the inter-relations among the color channels [9][10][13]-[17]. However, this pure quaternion representation is sensitive to the changes of the RGB components of the image, even though the visual content of the image is preserved. When it is used in color image perceptual hashing, the robustness performance will be weakened when features are extracted from this pure quaternion matrix. Moreover, setting the real parts of the quaternions to zero does not make full use of the nature of quaternions.
Local statistical characteristic values can describe image structures and are not sensitive to content-preserving manipulations. Therefore, in our proposed full quaternion representation for color images, local statistical characteristic values are introduced as the real parts to enhance the robustness. In this paper, local luminance variances are adopted because of their robustness and computational efficiency. Suppose that the size of the input image has been normalized to S×S. By dividing the luminance layer into non-overlapping L×L blocks, where S is an integral multiple of L, the local luminance variance V(blk) of each block blk can be calculated as follows:
where yl (1 ≤ l ≤ L²) is the luminance value of each pixel in the block and is obtained by:
For a pixel in the block blk, the real part of its full quaternion representation can be defined as:
$$f_V(x,y) = V(blk), \quad (x,y) \in blk$$
Pixels in the same block therefore have the same real part. Each color pixel can then be represented as a full quaternion as:
$$f(x,y) = f_V(x,y) + f_R(x,y)\, i + f_G(x,y)\, j + f_B(x,y)\, k$$
This gives the full quaternion representation for color images, with which each color image can be represented as a full quaternion matrix.
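The construction in this section can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the luminance weights (ITU-R BT.601) and the population form of the variance are assumed, since the corresponding formulas are not reproduced here, and the function name `full_quaternion_image` is hypothetical.

```python
import numpy as np


def full_quaternion_image(rgb, block=32):
    """Return an S x S x 4 array (real, i, j, k) for an S x S x 3 RGB image."""
    rgb = np.asarray(rgb, dtype=np.float64)
    s = rgb.shape[0]
    if rgb.shape != (s, s, 3) or s % block != 0:
        raise ValueError("expected an S x S x 3 image with S a multiple of block")
    # Assumed luminance weights (ITU-R BT.601); the paper's formula may differ.
    lum = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    nb = s // block
    # Axes after reshape: (block row, row in block, block column, column in block).
    tiles = lum.reshape(nb, block, nb, block)
    # Assumed population variance (division by L*L) of the luminance per block.
    var = tiles.var(axis=(1, 3))
    # Every pixel of a block receives that block's variance as its real part.
    real = np.repeat(np.repeat(var, block, axis=0), block, axis=1)
    return np.concatenate([real[..., None], rgb], axis=-1)
```

For a 512×512 input and `block=32`, the real-part channel holds only 16×16 distinct values, one per block, while the three imaginary channels carry the RGB values unchanged.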
3. The Proposed FQDCT-based Hashing
In this section, a novel perceptual image hashing scheme based on the proposed full quaternion representation and QDCT, termed the FQDCT-based hashing, is proposed. Its block diagram is shown in Fig. 1.
Fig. 1. Block diagram of the FQDCT-based hashing
In the preprocessing, the input color image is first resized to S×S to ensure that the generated hash value is of fixed length and is robust against image scaling operations. Then the resized image is blurred by a Gaussian low-pass filter to remove insignificant noise without endangering the image content. After preprocessing, the color image is represented as a full quaternion matrix by using the full quaternion representation proposed in Section 2. Two new quaternion feature matrices are constructed by using QDCT and a hash value in binary is calculated from these two feature matrices. The feature matrices construction and the hash value computation modules are detailed below.
3.1 Feature Matrices Construction by Using QDCT
In this module, a secret key K is used to pseudo-randomly select N blocks of size M×M from the full quaternion matrix for security consideration [18]. Then QDCT [19] is applied to each selected block to obtain the features of the image. Similar to the traditional DCT, QDCT has a strong energy compaction property and the low-frequency QDCT coefficients contain most of the signal information. Due to the non-commutative multiplication property of quaternions, QDCT has both left-hand and right-hand forms, but the difference between using these two forms is very small. Therefore without loss of generality, the left-hand QDCT [19] is adopted in this paper.
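The key-driven block selection at the start of this module might look as follows. This is a sketch only: the paper does not specify the pseudo-random generator, so seeding NumPy's `default_rng` with an integer key is an assumption, as is the function name `select_blocks`.

```python
import numpy as np


def select_blocks(fq, key, n_blocks=150, m=64):
    """Pick n_blocks possibly overlapping m x m blocks from an H x W x 4 array."""
    rng = np.random.default_rng(key)  # assumed PRNG; the secret key acts as the seed
    h, w = fq.shape[:2]
    if m > h or m > w:
        raise ValueError("block size exceeds image size")
    # Top-left corners are drawn uniformly so that each block fits in the image.
    rows = rng.integers(0, h - m + 1, size=n_blocks)
    cols = rng.integers(0, w - m + 1, size=n_blocks)
    return [fq[r:r + m, c:c + m] for r, c in zip(rows, cols)]
```

Because the corner positions depend only on the key, the same key reproduces the same blocks, while a party without the key cannot tell which regions contribute to the hash.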
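Let Bn (1 ≤ n ≤ N) be the nth selected block. Its left-hand QDCT matrix Qn can be calculated by:
$$Q_n(p,s) = \alpha(p)\,\alpha(s) \sum_{x=0}^{M-1} \sum_{y=0}^{M-1} u_q \cdot B_n(x,y) \cdot N(p,s,x,y)$$
where 0 ≤ p ≤ M - 1 and 0 ≤ s ≤ M - 1. uq can be any pure quaternion which satisfies $u_q^2 = -1$, and in this paper it is given by: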
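α(p), α(s) and N(p,s,x,y) are defined as:
$$\alpha(p) = \begin{cases} \sqrt{1/M}, & p = 0 \\ \sqrt{2/M}, & p \neq 0 \end{cases} \qquad \alpha(s) = \begin{cases} \sqrt{1/M}, & s = 0 \\ \sqrt{2/M}, & s \neq 0 \end{cases}$$
$$N(p,s,x,y) = \cos\left[\frac{\pi (2x+1) p}{2M}\right] \cos\left[\frac{\pi (2y+1) s}{2M}\right]$$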
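Because the DCT kernel is real, a left-hand QDCT can be computed by applying an ordinary 2-D DCT to each of the four quaternion components and then left-multiplying every coefficient by uq. The sketch below follows that idea. It assumes the usual orthonormal DCT-II normalization and takes uq = (i + j + k)/√3 as the default axis, which is a common choice but an assumption here; the function names are illustrative.

```python
import numpy as np


def dct_matrix(m):
    """Orthonormal DCT-II matrix C with C[p, x] = alpha(p) * cos(pi*(2x+1)*p / (2m))."""
    p = np.arange(m)[:, None]
    x = np.arange(m)[None, :]
    c = np.sqrt(2.0 / m) * np.cos(np.pi * (2 * x + 1) * p / (2 * m))
    c[0, :] = np.sqrt(1.0 / m)
    return c


def qmul(a, b):
    """Hamilton product over the last axis (r, i, j, k), with broadcasting."""
    ar, ai, aj, ak = np.moveaxis(np.asarray(a, dtype=np.float64), -1, 0)
    br, bi, bj, bk = np.moveaxis(np.asarray(b, dtype=np.float64), -1, 0)
    return np.stack([
        ar * br - ai * bi - aj * bj - ak * bk,
        ar * bi + ai * br + aj * bk - ak * bj,
        ar * bj - ai * bk + aj * br + ak * bi,
        ar * bk + ai * bj - aj * bi + ak * br,
    ], axis=-1)


def qdct_left(block, u=None):
    """Left-hand QDCT of an M x M x 4 quaternion block; returns M x M x 4."""
    block = np.asarray(block, dtype=np.float64)
    m = block.shape[0]
    if u is None:
        u = np.array([0.0, 1.0, 1.0, 1.0]) / np.sqrt(3.0)  # assumed unit pure axis
    c = dct_matrix(m)
    # Real 2-D DCT of each component: coeffs[p, s] = sum_x sum_y C[p,x] B[x,y] C[s,y].
    coeffs = np.einsum("px,xyc,sy->psc", c, block, c)
    # Left-multiply each coefficient by the unit pure quaternion u.
    return qmul(u, coeffs)
```

Since the transform matrix is orthonormal and uq has unit modulus, the total energy of the block is preserved in the coefficient array, which makes a convenient sanity check.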
The coefficients in the first row and the first column of each Qn are used to construct two quaternion feature matrices because these coefficients do not change much under content-preserving manipulations. Let Qn(r,c) denote the element in the (r + 1)th row and (c + 1)th column of Qn, where 0 ≤ r ≤ M - 1 and 0 ≤ c ≤ M - 1. For the first row of Qn, t coefficients (t ≤ M - 1), from the second to the (t + 1)th element, are used to form a vector dn as:
$$d_n = [Q_n(0,1),\, Q_n(0,2),\, \ldots,\, Q_n(0,t)]^T$$
For the N blocks, a quaternion feature matrix D is obtained by taking dn as the nth column as follows:
$$D = [d_1,\, d_2,\, \ldots,\, d_N]$$
Similarly, the second to the (t + 1)th elements of the first column of Qn are used to generate a vector en as:
$$e_n = [Q_n(1,0),\, Q_n(2,0),\, \ldots,\, Q_n(t,0)]^T$$
Accordingly, another quaternion feature matrix E is obtained as below:
$$E = [e_1,\, e_2,\, \ldots,\, e_N]$$
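Assembling D and E from the QDCT matrices amounts to slicing the first row and first column of each Qn while skipping the DC coefficient. A minimal sketch, assuming each Qn is stored as an M×M×4 NumPy array with the last axis holding the quaternion components (the storage layout and the name `feature_matrices` are assumptions):

```python
import numpy as np


def feature_matrices(qdct_blocks, t=32):
    """Build D and E, each of shape t x N x 4, from N arrays of shape M x M x 4."""
    m = qdct_blocks[0].shape[0]
    if not 1 <= t <= m - 1:
        raise ValueError("t must satisfy 1 <= t <= M - 1")
    # d_n: 2nd to (t+1)th elements of the first row, Q_n(0,1) ... Q_n(0,t).
    d_cols = [q[0, 1:t + 1] for q in qdct_blocks]
    # e_n: 2nd to (t+1)th elements of the first column, Q_n(1,0) ... Q_n(t,0).
    e_cols = [q[1:t + 1, 0] for q in qdct_blocks]
    # Stack so that column n of each matrix holds the vector from block n.
    return np.stack(d_cols, axis=1), np.stack(e_cols, axis=1)
```

With the settings reported in Section 4.2 (N = 150, M = 64, t = 32), both matrices contain 32×150 quaternions.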
3.2 Hash Value Computation
To enhance the robustness, data normalization is applied to the two quaternion feature matrices. Suppose z = [z1,z2,...,zN] is a row of matrix D or E. z is normalized to z' by using:
$$z'_n = \frac{z_n - \mu}{\sigma}$$
where zn and z'n are the nth elements of z and z', while μ and σ are the mean and standard deviation of z, respectively.
Let D1 and E1 be the normalized versions of D and E, and let d'n and e'n be the nth columns of D1 and E1, respectively. Then the L2 norm distance disn between d'n and e'n is calculated for initial compression, which gives the sequence dis = [dis1,dis2,...,disN]. A threshold τ is chosen to quantize all the L2 norm distances into a hash value h = [h1,h2,...,hN] as follows:
where disn and hn are the nth elements of dis and h. When the number of zeros and the number of ones in h are approximately the same, the information content of the extracted N-tuple is highest [20]; therefore the median of dis is chosen as the threshold τ. Note that the length of the hash value equals the number of pseudo-randomly selected blocks.
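The normalization, distance and quantization steps of this section can be sketched as below. Several details are assumptions made for illustration: the mean of a quaternion row is taken component-wise, σ is taken as the root-mean-square modulus of the deviations, and a bit is set to 1 when the distance exceeds the median (the direction of the comparison is not recoverable from the text). The function names are hypothetical.

```python
import numpy as np


def normalize_rows(f, eps=1e-12):
    """Z-score each row of a t x N x 4 quaternion matrix (assumed convention)."""
    mu = f.mean(axis=1, keepdims=True)  # quaternion mean of each row
    dev = f - mu
    # Assumed sigma: root-mean-square quaternion modulus of the deviations.
    sigma = np.sqrt((dev ** 2).sum(axis=2, keepdims=True).mean(axis=1, keepdims=True))
    return dev / (sigma + eps)  # eps guards against constant rows


def hash_bits(d, e):
    """Quantize per-block L2 distances between normalized D and E at their median."""
    d1, e1 = normalize_rows(d), normalize_rows(e)
    diff = d1 - e1
    # L2 norm over the t quaternion entries of each column: one distance per block.
    dis = np.sqrt((diff ** 2).sum(axis=(0, 2)))
    tau = np.median(dis)
    return (dis > tau).astype(np.uint8)  # assumed: 1 above the median, 0 otherwise
```

With an even N and distinct distances, the median threshold yields exactly N/2 ones, which matches the balance argument given for choosing τ.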
4. Experimental Results and Analysis
4.1 Similarity Metric and Image Database
Several methods for similarity measurement have been studied, including the enhanced perceptual distance functions [21], the robust structured subspace learning (RSSL) algorithm [22], the neighborhood discriminant hashing (NDH) [23] and the normalized Hamming distance (NHD) [24]. In this paper the NHD is adopted to measure the similarity between two images because of its simplicity and low complexity. The NHD between two hash values h and h' is defined as:
$$NHD(h, h') = \frac{1}{N} \sum_{n=1}^{N} |h_n - h'_n|$$
where N denotes the length of the hash value, and hn and h'n are the nth elements of h and h', respectively. If the NHD between the hash values of two images is less than a given threshold η, the two images are judged as similar; otherwise they are judged as distinct.
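As a small illustration of the similarity test, the NHD of two binary hashes and the threshold decision can be written as follows. The function names and the example threshold are illustrative only.

```python
def nhd(h1, h2):
    """Normalized Hamming distance between two equal-length binary sequences."""
    if len(h1) != len(h2) or len(h1) == 0:
        raise ValueError("hashes must be non-empty and of equal length")
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)


def is_similar(h1, h2, eta):
    """Two images are judged similar when the NHD is below the threshold eta."""
    return nhd(h1, h2) < eta


print(nhd([0, 1, 1, 0], [0, 1, 0, 1]))  # 0.5, two of four bits differ
```

The distance lies in [0, 1], so a single threshold η can be swept over that interval when tracing an ROC curve.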
Images in the uncompressed color image database (UCID) [25] are used to evaluate the performance of the proposed hashing. These original images are of 512×384 or 384×512 pixels, and some examples are displayed in Fig. 2. To test the robustness of the proposed hashing, 100 similar versions of each original image are generated by using the 10 kinds of content-preserving manipulations listed in Table 1, with reference to the StirMark Benchmark [26]. These manipulations are illustrated in Fig. 3.
Fig. 2. Examples of test images
Table 1. Content-preserving manipulations and parameters
Fig. 3. Illustrations of the content-preserving manipulations in Table 1. (a) Original, (b) Affine, (c) Cropping, (d) JPEG compression, (e) Median filter, (f) Noise, (g) Scaling, (h) Random distortion, (i) Rotation, (j) Rotation-cropping, (k) Rotation-scaling
4.2 Setting of Parameters
As mentioned in Section 3, the input color image is resized to 512×512 and blurred by a 3×3 Gaussian low-pass filter. Since the size of each block blk in Section 2 determines the real parts fV(x,y) and affects the performance of the color image hashing, six sizes of blk are tested in our experiment: 2×2, 4×4, 8×8, 16×16, 32×32 and 64×64. The length of the hash value equals the number N of pseudo-randomly selected blocks: if N is too small, collisions become likely, and if N is too large, the computation and matching efficiency decreases. With this in mind, 150 overlapping blocks are pseudo-randomly selected from the full quaternion matrix by using a secret key, and the size of the blocks Bn is set to 64×64 to adequately cover the whole matrix. For each selected block Bn, the 2nd to 33rd elements in the first row and the first column of its QDCT matrix Qn are used to construct the two feature matrices D and E, respectively. Receiver operating characteristic (ROC) curves [27] are used to compare the performance of the FQDCT-based hashing when the size of blk varies.
ROC curves are obtained by setting various thresholds η and computing the true positive rate (TPR) and false positive rate (FPR) for each threshold. A higher TPR indicates better robustness and a lower FPR indicates better discriminability. They are calculated as:
$$TPR = \frac{n_1}{N_1}, \qquad FPR = \frac{n_2}{N_2}$$
where n1 is the number of pairs of visually similar images correctly judged as similar, n2 is the number of pairs of distinct images falsely judged as similar, and N1 and N2 denote the total numbers of pairs of visually similar and distinct images, respectively.
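A threshold sweep of this kind can be sketched as follows, assuming the NHDs of similar pairs (intra-distances) and of distinct pairs (inter-distances) have already been collected into two lists. The function name and the sample distances are made up for illustration and are not results from the paper.

```python
def roc_points(intra, inter, thresholds):
    """Return (FPR, TPR) pairs, one for each threshold eta."""
    points = []
    for eta in thresholds:
        # n1 / N1: similar pairs correctly judged similar (distance below eta).
        tpr = sum(d < eta for d in intra) / len(intra)
        # n2 / N2: distinct pairs falsely judged similar.
        fpr = sum(d < eta for d in inter) / len(inter)
        points.append((fpr, tpr))
    return points


# Illustrative distances only.
intra = [0.05, 0.10, 0.20, 0.40]
inter = [0.30, 0.45, 0.50, 0.55]
print(roc_points(intra, inter, [0.25, 0.35]))  # [(0.0, 0.75), (0.25, 0.75)]
```

Raising η moves the operating point towards the upper right of the ROC plane, since both rates are non-decreasing in the threshold.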
The experiment is carried out on an image set of 10100 images, consisting of the first 100 original images in the UCID and their manipulated versions according to Table 1. The ROC curves corresponding to the six sizes of blk are shown in Fig. 4, from which we can see that the proposed FQDCT-based hashing achieves the best performance when the size of blk is 32×32. Considering both performance and computational efficiency, the size of blk in the FQDCT-based hashing is set to 32×32.
Fig. 4. Performance comparison for the FQDCT-based hashing when the size of the blk varies
4.3 Test on the Proposed Full Quaternion Representation for Color Images
To verify the robustness improvement brought by our proposed full quaternion representation, a new FQFT-based hashing is developed by combining this new representation with the QFT-based hashing [9]. In the FQFT-based hashing, after being resized to 128×128 as in [9], the input color image is represented as a full quaternion matrix using our proposed full quaternion representation, and the size of the blk for calculating the real parts is set to 4×4.
Four well-known images, Sailboat, Baboon, Lena and Peppers, are used to illustrate the robustness of the QFT-based and the FQFT-based hashing. For each of the four images, 100 similar versions are generated by using the manipulations listed in Table 1; they correspond to the 100 image indices given in Table 1. The 100 intra-distances, i.e. the 100 NHDs between the original image and its 100 similar versions, are calculated for each of the four images by using the QFT-based hashing and the FQFT-based hashing, respectively. Fig. 5 shows the comparison results, where the x-axes denote the image indices of the similar versions. For the Sailboat image, 94% of the intra-distances calculated by using the FQFT-based hashing are smaller than those calculated by using the QFT-based hashing, and the corresponding figures for the Baboon, Lena and Peppers images are 78%, 70% and 87%, respectively. The results show that most of the intra-distances calculated by using the FQFT-based hashing are smaller than those calculated by using the QFT-based hashing, which indicates that the FQFT-based hashing achieves better robustness. The robustness improvement is attributed to the proposed full quaternion representation for color images. Since the local luminance variances are approximately invariant as long as the content of the image is not attacked, taking them as the real parts improves the robustness against content-preserving manipulations when features are extracted from the full quaternion matrix.
Fig. 5. Intra-distances calculated by using the QFT-based and the FQFT-based hashing (a) Sailboat, (b) Baboon, (c) Lena, (d) Peppers
4.4 Intra-distances and Inter-distances Analysis
The distributions of the intra-distances and inter-distances are calculated to illustrate the robustness and discriminability. The first 100 original images in the UCID and their manipulated versions are used to calculate the distribution of the intra-distances. For each original image and its 100 manipulated versions there are $\binom{101}{2} = 5050$ intra-distances, so 505000 intra-distances are obtained in total. The first 1000 original images in the UCID are used to calculate the distribution of the inter-distances, which gives $\binom{1000}{2} = 499500$ inter-distances.
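The pair counts quoted above follow from counting unordered pairs within each group of images, and can be checked with a few lines of Python (variable names are illustrative):

```python
from math import comb

versions_per_image = 100  # manipulated versions generated per original image
originals = 100           # original images used for the intra-distances

# Each group holds the original plus its versions, i.e. 101 mutually similar images.
intra_per_group = comb(versions_per_image + 1, 2)
total_intra = originals * intra_per_group

# Inter-distances: every unordered pair among 1000 distinct original images.
total_inter = comb(1000, 2)

print(intra_per_group, total_intra, total_inter)  # 5050 505000 499500
```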
Fig. 6 shows the distributions of the intra-distances and inter-distances for the QFT-based hashing, the FQFT-based hashing and the FQDCT-based hashing, respectively. In a dual distribution, the overlap between the two distributions determines the error rate [28]; thus a lower overlap indicates better robustness and discriminability. It can be observed that the FQFT-based hashing yields distributions that have lower overlap than those of the QFT-based hashing, which demonstrates the performance improvement brought by the proposed full quaternion representation. At the same time, the proposed FQDCT-based hashing yields distributions that are better separated than those of the FQFT-based hashing, which shows that the feature construction by using QDCT and the hash computation process are effective in generating robust hash values.
Fig. 6. Distributions of intra-distances and inter-distances for different hashing methods (a) the QFT-based hashing, (b) the FQFT-based hashing, (c) the FQDCT-based hashing
4.5 Performance Comparison by Using ROC Curves
To evaluate the performance of the proposed FQDCT-based hashing in resisting different kinds of manipulations, the ROC curves for each kind of manipulation listed in Table 1 are calculated and shown in Fig. 7, where the proposed FQDCT-based hashing is compared with the QFT-based hashing [9], the FQFT-based hashing and the QSVD-based hashing [10]. These experiments are carried out on the image set consisting of the first 100 original images in the UCID and their manipulated versions. It can be observed that for all the manipulations, the proposed FQDCT-based hashing attains a higher TPR for a given FPR, and a lower FPR for a given TPR. Hence, the proposed FQDCT-based hashing outperforms the other hashing methods in terms of robustness and discrimination.
Fig. 7. Performance comparison for different manipulations (a) Affine, (b) Cropping, (c) JPEG compression, (d) Median filter, (e) Noise, (f) Scaling, (g) Random distortion, (h) Rotation, (i) Rotation-cropping, (j) Rotation-scaling
The area under the curve (AUC) of the ROC curve can be used to measure the performance of the hashing schemes, and a larger AUC indicates better performance. The AUC values of each manipulation for the four hashing schemes are calculated and presented in Table 2. The comparison results show that the FQFT-based hashing achieves better performance than the QFT-based hashing, which further supports the robustness and discriminability improvement brought by the proposed full quaternion representation. Moreover, it can be seen that the FQDCT-based hashing performs very well for all the given manipulations. The proposed FQDCT-based hashing is robust for three main reasons: first, the proposed full quaternion representation of the input color image provides an invariant property when the image goes through content-preserving manipulations; second, this invariant property is preserved in the QDCT coefficients selected to construct the two feature matrices; and finally, the hash computation compresses the feature matrices efficiently to generate similar hash values for similar images and distinct hash values for distinct images.
Table 2. AUC values of each manipulation for different hashing schemes
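One common way to obtain an AUC value from a finite set of ROC points is the trapezoidal rule. The paper does not state how its AUC values were computed, so the sketch below is an assumption, and the sample points are made up rather than taken from Table 2.

```python
def auc(points):
    """Trapezoidal area under a ROC curve given as (FPR, TPR) pairs."""
    pts = sorted(points)  # order by FPR, then by TPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Illustrative curve; the end points (0, 0) and (1, 1) should be included.
print(auc([(0.0, 0.0), (0.25, 0.75), (1.0, 1.0)]))  # 0.75
```

A perfect separator passes through (0, 1) and gives an AUC of 1, while the chance diagonal gives 0.5.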
5. Conclusions
In this paper, a full quaternion representation for color images is proposed. Rather than representing each color pixel as a pure quaternion, we represent each color pixel as a full quaternion by setting the local luminance variance as the real part and the RGB values as the three imaginary parts, which improves the robustness of color image perceptual hashing schemes. A novel FQDCT-based hashing scheme combining the proposed full quaternion representation and QDCT is then proposed, in which two new quaternion feature matrices are constructed from the QDCT coefficients and a binary hash value is calculated from the two feature matrices. Our experimental results and analysis indicate that the proposed full quaternion representation improves the robustness of color image perceptual hashing, and that the proposed FQDCT-based hashing has superior robustness and discrimination compared with existing notable quaternion-based color image perceptual hashing schemes.
References
[1] M. Schneider and S.-F. Chang, "A robust content based digital signature for image authentication," in Proc. of IEEE Int. Conf. Image Processing, vol. 3, pp. 227-230, 1996.
[2] C.-S. Lu, C.-Y. Hsu, S.-W. Sun, and P.-C. Chang, "Robust mesh-based hashing for copy detection and tracing of images," in Proc. of IEEE Int. Conf. Multimedia and Expo, vol. 1, pp. 731-734, 2004.
[3] Z. Xu, H. Ling, F. Zou, Z. Lu, and P. Li, "Robust image copy detection using multi-resolution histogram," in Proc. of ACM Int. Conf. Multimedia Information Retrieval, pp. 129-136, 2010.
[4] Y. Zhao, S. Wang, X. Zhang, and H. Yao, "Robust hashing for image authentication using Zernike moments and local features," IEEE Trans. Inf. Forensics and Security, vol. 8, no. 1, pp. 55-63, Jan. 2013. https://doi.org/10.1109/TIFS.2012.2223680
[5] A. J. Menezes, P. C. Van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, 2010.
[6] Z. Tang, Y. Dai, X. Zhang, and S. Zhang, "Perceptual image hashing with histogram of color vector angles," in Proc. of the 8th Int. Conf. Active Media Technology, Lecture Notes in Computer Science, vol. 7669, pp. 237-246, 2012.
[7] Z. Tang, Y. Dai, and X. Zhang, "Perceptual hashing for color images using invariant moments," Appl. Math. Inf. Sci., vol. 6, no. 2S, pp. 643S-650S, 2012.
[8] Z. Tang, X. Zhang, X. Dai, J. Yang, and T. Wu, "Robust image hash function using local color features," Int. J. Electron. and Comm., vol. 67, no. 8, pp. 717-722, 2013. https://doi.org/10.1016/j.aeue.2013.02.009
[9] I. H. Laradji, L. Ghouti, and E. H. Khiari, "Perceptual hashing of color images using hypercomplex representations," in Proc. of IEEE Int. Conf. Image Processing, vol. 4, pp. 4402-4406, 2013.
[10] L. Ghouti, "Robust perceptual color image hashing using quaternion singular value decomposition," in Proc. of IEEE Int. Conf. Acoustics, Speech and Signal Processing, pp. 3794-3798, 2014.
[11] W. R. Hamilton, Elements of Quaternions, London, U.K.: Longmans, Green, 1866.
[12] S. J. Sangwine, "Fourier transforms of colour images using quaternion, or hypercomplex, numbers," Electron. Lett., vol. 32, no. 21, pp. 1979-1980, Oct. 1996. https://doi.org/10.1049/el:19961331
[13] S. C. Pei, J. J. Ding, and J. H. Chang, "Efficient implementation of quaternion Fourier transform, convolution, and correlation by 2-D complex FFT," IEEE Trans. Signal Processing, vol. 49, no. 11, pp. 2844-2852, Nov. 2001. https://doi.org/10.1109/78.960432
[14] S. C. Pei, J. H. Chang, and J. J. Ding, "Quaternion matrix singular value decomposition and its applications for color image processing," in Proc. of IEEE Int. Conf. Image Processing, vol. 1, pp. 805-808, 2003.
[15] T. A. Ell and S. J. Sangwine, "Hypercomplex Fourier transforms of color images," IEEE Trans. Image Processing, vol. 16, pp. 22-35, Jan. 2007. https://doi.org/10.1109/TIP.2006.884955
[16] Q. Wang and Z. Wang, "Color image registration based on quaternion Fourier transformation," Optical Engineering, vol. 51, no. 5, pp. 1-8, 2012.
[17] T. K. Tsui, X. P. Zhang, and D. Androutsos, "Color image watermarking using multidimensional Fourier transforms," IEEE Trans. Inf. Forensics and Security, vol. 3, no. 1, pp. 16-28, Mar. 2008. https://doi.org/10.1109/TIFS.2007.916275
[18] V. Monga and B. L. Evans, "Perceptual image hashing via feature points: Performance evaluation and tradeoffs," IEEE Trans. Image Processing, vol. 15, no. 11, pp. 3453-3466, Nov. 2006. https://doi.org/10.1109/TIP.2006.881948
[19] W. Feng and B. Hu, "Quaternion discrete cosine transform and its application in color template matching," in Proc. of IEEE Int. Cong. Image and Signal Processing, vol. 2, pp. 252-256, 2008.
[20] J. Fridrich and M. Goljan, "Robust hash functions for digital watermarking," in Proc. of IEEE Int. Conf. Information Technology: Coding Computing, pp. 178-183, 2000.
[21] A. Qamra, Y. Meng, and E. Y. Chang, "Enhanced perceptual distance functions and indexing for image replica recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 3, pp. 379-391, Mar. 2005. https://doi.org/10.1109/TPAMI.2005.54
[22] Z. Li, J. Liu, and J. Tang, "Robust structured subspace learning for data representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 10, pp. 2085-2098, Oct. 2015. https://doi.org/10.1109/TPAMI.2015.2400461
[23] J. Tang, Z. Li, M. Wang, and R. Zhao, "Neighborhood discriminant hashing for large-scale image retrieval," IEEE Trans. Image Processing, vol. 24, no. 9, pp. 2827-2840, Sep. 2015. https://doi.org/10.1109/TIP.2015.2421443
[24] A. Swaminathan, Y. Mao, and M. Wu, "Robust and secure image hashing," IEEE Trans. Inf. Forensics and Security, vol. 1, no. 2, pp. 215-230, Jun. 2006. https://doi.org/10.1109/TIFS.2006.873601
[25] G. Schaefer and M. Stich, "UCID: An uncompressed color image database," in Proc. of SPIE, Storage and Retrieval Methods and Applications for Multimedia, vol. 5307, pp. 472-480, 2004.
[26] M. Steinebach, F. A. P. Petitcolas, F. Raynal, J. Dittmann, C. Fontaine, S. Seibel, N. Fates, and L. C. Ferri, "StirMark Benchmark: audio watermarking attacks," in Proc. of IEEE Int. Conf. Information Technology: Coding and Computing, pp. 49-54, 2001.
[27] T. Fawcett, "An introduction to ROC analysis," Pattern Recognit. Lett., vol. 27, no. 8, pp. 861-874, Jun. 2006. https://doi.org/10.1016/j.patrec.2005.10.010
[28] J. Daugman, "The importance of being random: statistical principles of iris recognition," Pattern Recognition, vol. 36, no. 2, pp. 279-291, 2003. https://doi.org/10.1016/S0031-3203(02)00030-4