Convolutional Neural Network with Expert Knowledge for Hyperspectral Remote Sensing Imagery Classification

  • Wu, Chunming (Institute of Geological Survey, China University of Geosciences) ;
  • Wang, Meng (Hubei Key Laboratory of Intelligent Geo-Information Processing, School of Computer Science, China University of Geosciences) ;
  • Gao, Lang (Hubei Key Laboratory of Intelligent Geo-Information Processing, School of Computer Science, China University of Geosciences) ;
  • Song, Weijing (Hubei Key Laboratory of Intelligent Geo-Information Processing, School of Computer Science, China University of Geosciences) ;
  • Tian, Tian (Hubei Key Laboratory of Intelligent Geo-Information Processing, School of Computer Science, China University of Geosciences) ;
  • Choo, Kim-Kwang Raymond (Department of Information Systems and Cyber Security, The University of Texas at San Antonio)
  • Received : 2018.10.14
  • Accepted : 2019.02.12
  • Published : 2019.08.31

Abstract

The recent interest in artificial intelligence and machine learning has partly contributed to an interest in the use of such approaches for hyperspectral remote sensing (HRS) imagery classification, as evidenced by the increasing number of deep frameworks with deep convolutional neural network (CNN) structures proposed in the literature. In these approaches, however, obtaining high-quality deep features with a CNN is not always easy or efficient because of the complex data distribution and the limited sample size. In this paper, conventional handcrafted multi-features based on expert knowledge are introduced as the input of a specially designed CNN to improve the pixel description and classification performance of HRS imagery. Introducing these handcrafted features reduces the complexity of the original HRS data and the sample requirements by eliminating redundant information and improving the starting point of deep feature training. It also provides concise and effective features that are not readily available from direct training with a CNN. Evaluations using three public HRS datasets demonstrate the utility of our proposed method in HRS classification.

1. Introduction

With constant advances in remote sensing platforms and data processing technologies, the spectral and spatial resolution of remote sensing images has been significantly enhanced in recent years. Such hyperspectral remote sensing (HRS) images have been used in a broad range of applications, from disaster detection [1] and urban planning [2, 3] to environmental monitoring [4, 5, 6] and several other fields [30, 31]. However, the availability of such remote sensing information poses new challenges for researchers, due in part to its complex data distribution. In addition, effective feature extraction is crucial to the classification of HRS images.

In recent years, the performance of imagery classification has rapidly increased due to the use of deep convolutional neural networks (CNN). Inspired by the human visual system, a deep CNN is able to extract high-level abstract features of imagery by stacking a number of convolutional and pooling layers. Based on the benefits afforded by CNN in extracting deep features and its promising results on natural image interpretation, it has been applied to remote sensing classification and recognition (e.g., see the technical tutorial on deep learning for remote sensing data in [11]). Hu et al. [21], for example, proposed a deep CNN for HRS classification based on the spectral domain. As the input of the CNN, every HRS pixel sample with hundreds of spectral bands can be regarded as a 2-D image whose height is equal to 1. In [22], a 3-D CNN-based feature extraction model was proposed to extract the high-level features of HRS imagery, and a combined regularization strategy based on a sparsity constraint was then developed to handle the high dimensionality of the input data.

Due to the distribution of HRS imagery, it is inefficient to use the CNN method directly. HRS images are composed of hundreds of spectral bands and provide rich spectral information. However, such high-dimensional data is challenging to work with. On one hand, the data's high dimension can easily lead to the Hughes phenomenon. On the other hand, the redundant structure of HRS images makes it difficult to extract high-level features with a CNN: it not only increases the complexity of the neural network structure, but also reduces the efficiency and quality of the extracted features. Thus, a number of approaches have been introduced to reduce dimensionality, that is, to preprocess the HRS image before using the CNN to extract deep features. In [25], for example, fractional-order Darwinian particle swarm optimization was used to select the most informative bands suitable for the designed CNN network; these selected bands are then sent to the classification system to produce the classification map. In [24], the deep spatial features extracted automatically by a CNN are integrated with the spectral features based on a balanced local discriminant embedding algorithm to reduce the dimension and classify the HRS data. Sharma et al. [23] proposed a patch-based CNN network for medium-resolution remote sensing image classification. A gated CNN for semantic segmentation of high-resolution remote sensing images was presented in [26]. Other advances include the use of CNN to achieve improvements in image classification [33, 32], segmentation [35], and recognition [34].

However, most existing methods use the prior knowledge of spectral information [12] and assume that the constructed CNN can extract efficient deep spectral-spatial features. This assumption is not always satisfied in the classification of HRS images because of their complex distribution characteristics [8], especially for images with high spatial resolution. The more complex the data distribution, the more CNN parameters are needed to extract features [37]. However, it is difficult to obtain sufficient samples in the remote sensing domain to train a robust classifier based on a CNN with a complex structure.

Motivated by the above limitations, we introduce multiple traditional handcrafted features, including spectral and spatial expert knowledge information, into the model as the input of the deep CNN in order to further improve the classification accuracy of HRS images. The handcrafted features, which can effectively represent raw data while removing unnecessary redundant information [9], reduce the complexity of the data distribution and the demand for CNN parameters. In other words, we can effectively fit the feature distribution and obtain a promising classification result on the original data by building a simpler network structure. Introducing handcrafted features into our framework also improves the training starting point of the deep features obtained by the CNN, and thereby improves both the network training efficiency and the quality of the high-level features. Thus, we are able to obtain good classification results using small sample sizes. In order to mitigate the over-fitting problems caused by small samples, L2 regularization and dropout strategies are used during the training stage of the CNN network.

The remainder of this paper is organized as follows. The proposed methodology is discussed in Section 2. The experimental results and analysis are discussed in Section 3. The conclusion is summarized in Section 4.

2. Expert Knowledge-based CNN Deep Approach

The deep convolutional neural network has raised expectations for improving the interpretation performance of HRS images because of its potential for high-level feature extraction. However, it remains a challenge to use a CNN directly to effectively improve classification and recognition performance, due to the complex distribution characteristics and limited samples of HRS images, as discussed earlier. In order to reduce the impact of these two factors, prior information about HRS images based on expert knowledge (EK) is introduced as the input of a specific CNN in the proposed framework (more details are given in Section 2.1).

The EK information used in this paper is obtained by traditional handcrafted feature extraction methods that have been proposed based on the properties of remote sensing images over the past few decades. This EK information is generally obtained in two different ways: one is to remove the redundant information or noise of the HRS image; the other is to extract some of the main attribute information of the HRS image. Through these two approaches, the information based on EK can represent ground objects more succinctly and efficiently, and contains a smaller amount of data than the original image. The EK information is easier to acquire and more refined than the features obtained by training the bottom layers of a CNN. This means that the training starting point for deep features can be effectively improved by using the EK information as the CNN input. The selection of EK information is described in Section 2.2.

On the premise of obtaining comparable classification results, the number of effective function parameters needed to characterize the distribution of EK information is far smaller than the number needed to characterize the original HRS distribution. Generally speaking, given that the data distribution is effectively characterized, the fewer the parameters of the function, the smaller the demand for sample size.

In this article, the demand for sample size is controlled by designing an appropriate deep CNN to classify HRS images from EK information rather than from the original HRS imagery. More details about the CNN are given in Section 2.3.

2.1 The Proposed Framework

In the basic common framework with a CNN for HRS classification, the band information is usually used as the input of the CNN to extract deep features and obtain the classification result. Such a framework, as shown in Fig. 1, effectively improves the classification performance of HRS images. To illustrate, the overall accuracy of Indian Pines dataset classification is over 97% with 1765 training samples and 6223 testing samples in [22].

Fig. 1. Common framework with CNN for HRS classification

Based on the above framework, this article introduces traditional handcrafted HRS features and designs a corresponding CNN structure to handle HRS classification, as shown in Fig. 2. First, we extract the spectral features of the HRS images using the PCA method, which preserves the main spectral features of the data by removing the redundant information between bands without changing the shape distribution of the ground objects. After that, typical spatial features of the ground objects are extracted as the input of the convolutional neural network. Finally, we train the CNN end to end and judge the category of each pixel in the HRS images with the Softmax classifier; a sketch of this pipeline follows Fig. 2. This EK information, including the typical spatial features, can be efficiently and easily obtained using the Image Processing Toolbox in MATLAB. Experiments show that the classification accuracy on the Indian Pines dataset can exceed 95% with 510 training samples and 9856 testing samples at the optimal number of iterations, and this without rebalancing the unequal class sizes of the Indian Pines dataset when sampling by percentage.

Fig. 2. Pipeline of the proposed deep feature extraction of HRS images
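To make this pipeline concrete, below is a minimal Python sketch of how the EK input stack can be assembled (the paper itself uses MATLAB's Image Processing Toolbox). The helpers `extract_glcm` and `extract_dmp` are hypothetical stand-ins for the feature extractors sketched in Section 2.2 and are passed in as arguments.

```python
import numpy as np
from sklearn.decomposition import PCA

def build_ek_input(cube, extract_glcm, extract_dmp, n_pc=3):
    """Assemble the expert-knowledge (EK) feature stack for a (H, W, B) cube.

    Step 1: PCA keeps the first n_pc principal components as base images.
    Step 2: handcrafted spatial features (GLCM texture, DMP shape) are
            computed on each base image and stacked along the channel
            axis (vector stacking); the result is the CNN input.
    """
    h, w, b = cube.shape
    pcs = PCA(n_components=n_pc).fit_transform(cube.reshape(-1, b))
    base = pcs.reshape(h, w, n_pc)
    feats = [base]
    for k in range(n_pc):
        feats.append(np.atleast_3d(extract_glcm(base[..., k])))  # texture
        feats.append(np.atleast_3d(extract_dmp(base[..., k])))   # shape
    return np.concatenate(feats, axis=-1)
```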

2.2 Expert Knowledge Selection

Spectral and spatial information are the most widely used characteristics in pixel-wise HRS classification [16]. In this paper, three main feature extraction techniques are used to obtain EK information: principal component analysis (PCA), the gray-level co-occurrence matrix (GLCM) and differential morphological profiles (DMP). The feature parameter values are shown in Table 1.

Table 1. The introduction of the feature parameters

As is well known, HRS images such as the PaviaU dataset contain rich spectral information with hundreds of bands, which can easily lead to the curse of dimensionality given the small ratio between the number of samples and the number of features. In our framework, principal component analysis (PCA) is introduced first to extract spectral features, for its simple mathematical principles and effective performance. PCA can effectively reduce the data dimension and eliminate the potential threat of the dimensionality curse by eliminating the redundant information between HRS bands. The first three principal components of each image are used to represent the spectral feature, because in our experiments they contain more than 99% of the original image information. The feature image acquired by PCA serves as the basic image for subsequent processing in the framework.
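A minimal sketch of this spectral step using scikit-learn (the paper does not prescribe an implementation); the default of three components follows the >99% variance observation above.

```python
import numpy as np
from sklearn.decomposition import PCA

def spectral_base_images(cube, n_pc=3):
    """Project a (H, W, B) HRS cube onto its first n_pc principal components."""
    h, w, b = cube.shape
    pca = PCA(n_components=n_pc)
    pcs = pca.fit_transform(cube.reshape(-1, b).astype(np.float64))
    # Sanity check: the retained components should explain ~99% of variance.
    print("retained variance:", pca.explained_variance_ratio_.sum())
    return pcs.reshape(h, w, n_pc)
```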

The GLCM CNN-based deep extraction (GLCM-CDE) method, which uses GLCM features as the input of the CNN, is proposed for two reasons. First, the GLCM is a typical texture feature extraction method for the classification of remote sensing images. From a statistical point of view, the texture information extracted by the GLCM can reflect the regularity of the distribution of ground objects in the spatial scene, which cannot be obtained directly from a CNN. Using the HRS image spatial arrangement characteristics obtained by the GLCM as the input of the CNN is not only conducive to obtaining abstract semantic features, but also enhances the interpretability of the deep features thanks to the underlying statistical theory. GLCM-based features, which mainly preserve texture information, greatly reduce the complexity of the original data distribution, which helps to reduce the deep CNN structure complexity and the sample size requirement. The second reason is to consider the effect of the patch size of the CNN input data on the classification accuracy and on the characterization ability of the deep features. The GLCM texture feature is calculated from local area information, and the size of the local area has an important influence on the characterization ability of the feature. We can assume a strong correlation between the window size in the GLCM, which is closely related to the local features of the image, and the patch size of the CNN. Therefore, the GLCM-based feature extraction method can be used to explore the contribution of the patch size to the deep features and the HRS image classification performance; a sketch of the window-based computation follows.
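As an illustration, the sketch below computes a per-pixel GLCM contrast map over a sliding window with scikit-image (recent releases spell the functions graycomatrix/graycoprops; older releases use greycomatrix/greycoprops). The window size, number of gray levels and the choice of the contrast statistic are our assumptions; Table 1 lists the parameters actually used, and the nested loops are kept for clarity rather than speed.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_texture_map(band, win=15, levels=32):
    """Per-pixel GLCM contrast over a sliding win x win window."""
    # Quantize to a few gray levels so each local GLCM stays small.
    bins = np.linspace(band.min(), band.max(), levels)
    q = (np.digitize(band, bins) - 1).clip(0, levels - 1).astype(np.uint8)
    pad = win // 2
    qp = np.pad(q, pad, mode="reflect")
    out = np.zeros(band.shape, dtype=np.float64)
    for i in range(band.shape[0]):
        for j in range(band.shape[1]):
            patch = qp[i:i + win, j:j + win]
            g = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                             levels=levels, symmetric=True, normed=True)
            out[i, j] = graycoprops(g, "contrast").mean()
    return out  # (H, W) texture map
```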

Morphological profiles (MPs) are constructed by a series of morphological operations with a family of structuring elements of increasing size [10]. Obtaining such structure-specific geometric features by training the bottom layers of a CNN is not an easy task. Through these morphological operators, the MP feature not only simplifies the image data and preserves the basic shape information of the image, but also removes irrelevant structures and noise effectively. In order to further enhance the heterogeneity among different objects in the scene, differential morphological profiles, based on morphological gradient theory, are used as the input of the CNN in this paper; they have a great impact on boundary extraction and image segmentation. Introducing the DMP feature, which mainly retains boundary information and fully eliminates unnecessary information, into our deep model as the input of the CNN helps to reduce the demand for model complexity while extracting effective high-level features. The DMP-based deep geometry features obtained by the proposed CNN further enhance the differences between ground objects and improve the interpretability of the deep features. In this article, the differential morphological profiles-based CNN deep extraction method is referred to as DMP-CDE.
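A minimal sketch of a differential profile on a single band. Classical MPs use opening and closing by reconstruction; plain openings and closings with disk structuring elements are used here to keep the sketch short, and the radii are our assumption (Table 1 gives the paper's settings).

```python
import numpy as np
from skimage.morphology import disk, opening, closing

def dmp_features(band, radii=(2, 4, 6, 8)):
    """Differential morphological profile: differences between successive
    opening/closing levels with structuring elements of increasing size."""
    opens = [band] + [opening(band, disk(r)) for r in radii]
    closes = [band] + [closing(band, disk(r)) for r in radii]
    d_open = [opens[k] - opens[k + 1] for k in range(len(radii))]
    d_close = [closes[k + 1] - closes[k] for k in range(len(radii))]
    return np.stack(d_open + d_close, axis=-1)  # (H, W, 2 * len(radii))
```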

According to the mathematical principles of the different features, the GLCM-based features capture the arrangement rules of ground objects, while the DMP features measure the distances between different ground objects. That is, the GLCM-based and DMP-based features reflect different attributes of HRS images. It is natural to fuse the DMP feature and the GLCM feature together for HRS classification, because the effective fusion of different features can enhance the difference between different ground objects [20]. The vector stacking method is used to fuse these two spatial features together, as sketched below. Although the amount of feature data fed to the CNN is slightly increased, the redundancy between the data can be effectively controlled because of the difference in attributes between the GLCM-based texture feature and the DMP-based shape feature, so that we can further increase the differences between objects and obtain a promising classification result with the proposed CNN network. The deep feature extraction method that takes the fused data as CNN input is called the multi selected feature-based CNN deep extraction method (MS-CDE); it further enhances the CNN network representation and trains a more generalized and robust CNN without increasing the cost of the framework. The direct fusion of the GLCM-CDE feature and the DMP-CDE feature (i.e., fusing the deep features themselves) is not discussed here for three reasons: first, the amount of fused data poses a challenge for the classifier; second, it would change the complexity and uniformity of the proposed framework; and third, the effect of fusing deep features at different levels on the classification result is uncertain.
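A minimal sketch of the vector-stacking fusion. The per-channel standardization is our own addition; the paper does not state how (or whether) the stacked features are normalized before entering the CNN.

```python
import numpy as np

def stack_features(glcm_feat, dmp_feat):
    """Vector stacking: concatenate per-pixel feature vectors along channels.

    glcm_feat: (H, W, F1) texture features; dmp_feat: (H, W, F2) shape
    features; returns the fused (H, W, F1 + F2) MS feature stack.
    """
    ms = np.concatenate([np.atleast_3d(glcm_feat), np.atleast_3d(dmp_feat)],
                        axis=-1)
    # Standardize each channel so features with different dynamic ranges
    # contribute comparably (an assumption, not stated in the paper).
    mean = ms.mean(axis=(0, 1), keepdims=True)
    std = ms.std(axis=(0, 1), keepdims=True) + 1e-8
    return (ms - mean) / std
```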

2.3 Convolutional Neural Network for EK Information

The convolutional neural network is a common network structure in deep learning theory. It consists of multiple convolutional layers and pooling layers, and the input data are represented by learning the corresponding weights and offsets. Compared with the Deep Belief Network (DBN), Deep Auto-Encoder (DAE) and other neural network structures [36], the sparse connectivity and weight sharing mechanisms within the CNN effectively control the size of the network parameters. The deep CNN has attracted the attention of researchers in the field of remote sensing due to its promising performance in feature extraction and image classification. In this paper, the CNN is used to handle pixel-wise HRS classification from EK information. The proposed CNN structure is shown in Fig. 3. This neural network, containing only five convolutional layers, a pooling layer and a fully connected layer for classification, has a simpler structure than classical networks such as AlexNet and VGGNet, because the distribution characteristics of the CNN input data are greatly simplified in our framework. Table 2 shows the differences in network structure among these three deep neural networks. From the table we can see that the proposed network has a simpler structure and fewer parameters. In addition, the ReLU function [15, 17] is used in each convolutional layer as the activation function to extract multilayer abstract features. The max pooling strategy is used to extract the significant characteristics of the data while reducing the computational cost. We improve the abstraction level of the features by reducing the size of the convolution kernel layer by layer. Instead of the fully connected layer in the traditional CNN structure, a convolutional layer with a 1×1 convolution kernel [18] is employed to reduce the number of training parameters. L2 regularization and dropout strategies [19] are utilized during the training process in order to improve the generalization of the CNN network: L2 regularization minimizes the cost function in the training stage by keeping the sum of the squares of the parameters small, while the dropout strategy reduces overfitting by preventing complex co-adaptations on the training dataset. End-to-end training is used to learn the weights and offset values, extract the abstract features of the HRS images and improve the classification performance. A sketch of such a network follows Table 2.

Fig. 3. The structure of the CNN employed in our experiments

Table 2. The comparison of the network structures among the proposed CNN (ProCNN), AlexNet and VGGNet
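For concreteness, the PyTorch sketch below mirrors the ingredients described in this subsection: five convolutional layers with kernel sizes shrinking layer by layer, a single max-pooling layer, a 1×1 convolutional classifier in place of a fully connected layer, ReLU activations, dropout, and L2 regularization via the optimizer's weight decay. The channel widths, kernel sizes and hyperparameters are our assumptions; Fig. 3 and Table 2 give the actual configuration.

```python
import torch
import torch.nn as nn

class ProCNNSketch(nn.Module):
    """Illustrative stand-in for the proposed CNN (not the exact Fig. 3 net)."""

    def __init__(self, in_ch, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=5), nn.ReLU(),  # 15 -> 11
            nn.Conv2d(32, 64, kernel_size=4), nn.ReLU(),     # 11 -> 8
            nn.MaxPool2d(2),                                 # 8  -> 4
            nn.Conv2d(64, 64, kernel_size=3), nn.ReLU(),     # 4  -> 2
            nn.Dropout2d(0.5),
            nn.Conv2d(64, 128, kernel_size=2), nn.ReLU(),    # 2  -> 1
        )
        # 1x1 convolution replaces the fully connected classifier (cf. [18]).
        self.classifier = nn.Conv2d(128, n_classes, kernel_size=1)

    def forward(self, x):              # x: (N, in_ch, 15, 15) EK patches
        z = self.classifier(self.features(x))
        return z.mean(dim=(2, 3))      # global average -> (N, n_classes)

model = ProCNNSketch(in_ch=16, n_classes=9)
# L2 regularization enters through the optimizer's weight decay.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```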

 

3. Experiments and Results

3.1 Data Description and Evaluation Criteria

In our work, three HRS images are used to evaluate the proposed method: the Pavia University (PaviaU) dataset (Fig. 4(a)), the Salinas dataset (Fig. 4(b)) and the Indian Pines dataset (Fig. 4(c)). The first image is an urban scene and the other two are agricultural scenes, each with different spectral and spatial information. In our experiments, 5% of the samples were selected for training, and cross-validation was used to adjust the parameters of the classifier during the training process. The remaining 95% of the samples were used to test the performance of the algorithm.
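A minimal sketch of this split with scikit-learn; the per-class stratification and the fixed random seed are our assumptions about how the 5% was drawn.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_labeled_pixels(label_map, train_frac=0.05, seed=0):
    """Split labeled pixels (label > 0) into train/test index sets,
    drawing train_frac of each class for training."""
    idx = np.flatnonzero(label_map.ravel() > 0)
    y = label_map.ravel()[idx]
    train_idx, test_idx = train_test_split(
        idx, train_size=train_frac, stratify=y, random_state=seed)
    return train_idx, test_idx
```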

The Pavia University scene was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor during a flight campaign over Pavia, northern Italy, in 2003. The urban scene, with 1.3 m/pixel spatial resolution, contains 610 × 340 pixels and 115 spectral bands, of which 103 were used in the experiments. Fig. 4(a) shows the false-color image of the PaviaU dataset and the corresponding ground truth map with 9 classes. Table 3 provides the numbers of training and testing samples for the different classes.

The Salinas scene was collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over Salinas Valley, California, in 1998. This agricultural area comprises 512 lines by 217 samples with 3.7 m/pixel spatial resolution. 204 available spectral bands of the Salinas dataset were used in the experiments. There are 16 classes of objects in the ground truth map (Fig. 4(b)), and the number of training and testing samples for each class is provided in Table 4.

The Indian Pines dataset was gathered by the AVIRIS sensor over the Indian Pines test site in north-western Indiana in 1992. This agricultural scene is widely used to measure the performance of HRS image classification algorithms because of the unbalanced number of available labeled pixels per class and the presence of mixed pixels. It consists of 145 × 145 pixels and 224 spectral bands with 20 m/pixel spatial resolution. The number of bands is reduced to 200 by removing noisy and water-absorption bands. The raw dataset and the corresponding ground truth map with 16 classes are shown in Fig. 4(c). Table 5 provides the numbers of training and testing samples for the different classes.

The evaluation criteria for these classification algorithms in this work are overall classification accuracy (OA), per-class average accuracy (AA) and Kappa coefficient (Kappa).
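For reference, a minimal sketch of the three criteria computed from a confusion matrix (assuming integer labels 0..n_classes-1 and that every class occurs in the test set):

```python
import numpy as np

def evaluate(y_true, y_pred, n_classes):
    """Overall accuracy (OA), average accuracy (AA) and Kappa coefficient."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)            # confusion matrix
    n = cm.sum()
    oa = np.trace(cm) / n
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))    # mean per-class accuracy
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```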

Fig. 4. HRS datasets employed in our experiments. (a), (b) and (c) are the images for classification evaluation; (d), (e) and (f) are the corresponding ground truth maps.

Table 3. Training and testing pixels for Pavia University dataset

Table 4. Training and testing pixels for Salinas dataset

Table 5. Training and testing pixels for the Indian Pines dataset

3.2 The Contribution of the Expert Knowledge to HRS Classification Performance

In order to measure the contribution of the handcrafted features, we compared the classification accuracy between the original data and the handcrafted features using the structure of our proposed CNN network. The datasets for all experiments were stored in the HDF5 (.h5) format; a storage sketch follows.
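A sketch of that storage step with h5py; the file layout and dataset names are our assumptions, as the paper only states the format.

```python
import h5py

def save_split(path, x_train, y_train, x_test, y_test):
    """Store a train/test split in HDF5 (.h5) with compressed feature arrays."""
    with h5py.File(path, "w") as f:
        f.create_dataset("x_train", data=x_train, compression="gzip")
        f.create_dataset("y_train", data=y_train)
        f.create_dataset("x_test", data=x_test, compression="gzip")
        f.create_dataset("y_test", data=y_test)
```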

Table 6 shows the total size of the training and testing sets used as network input, as well as the classification results based on the proposed CNN. As can be seen from Table 6, the volume of the training and testing sets fed to the CNN based on the GLCM feature or the DMP feature is far smaller than the original image, because handcrafted features are able to represent the original images while eliminating redundant information and undesired noise. Although the volume of the training and testing sets based on the MS feature is slightly higher than the original image on the PaviaU dataset, the redundancy and the complexity of the data distribution can be effectively controlled because of the difference in attributes between the GLCM-based texture feature and the DMP-based shape feature. All these features improve the starting point of deep feature training. The classification results based on the handcrafted features with the proposed CNN are much higher than those based on the original images in Table 6. The optimal classification results on each dataset are obtained with the MS-CDE feature.

Table 6. The performance comparison between the raw images and the handcrafted features based on the CNN

There are several reasons for the poor results based on the original HRS images. Because of the large volume and complex structure of the original HRS images, as well as the simple structure of the network we proposed, it is difficult to obtain satisfactory classification results on the original data. From Table 6 we can also see that the classification accuracy on the Indian Pines dataset is worse than on the other two datasets because of its more complex data attributes (see Section 3.1 for details).

3.3 Effects of Different Network Structures on Classification Results

In order to evaluate the contribution of the proposed CNN structure, we compare the classification results obtained by three different deep convolutional neural networks: AlexNet, VGGNet and the CNN proposed in Section 2.3. The differences in network structure between these three deep CNNs are shown in Table 2. The raw data and three different handcrafted features, namely GLCM, DMP and MS, are used as the input of each network.

A summary of the findings is presented in Table 7. As can be seen from the table, the deep features with the proposed CNN achieve higher classification performance than the other neural networks for each dataset. The proposed CNN and VGGNet have a similar ability to distinguish ground objects on the PaviaU and Salinas datasets. A large number of small convolution kernels are used in the VGGNet structure, giving the VGG network good feature extraction capabilities and good results on these datasets with high spatial resolution. However, for the Indian Pines dataset, with its more complex distribution characteristics, VGGNet has difficulty achieving a satisfactory result, while the proposed CNN shows a more significant improvement in overall classification accuracy compared to the AlexNet and VGGNet networks. The kappa coefficient values for each dataset in Table 7 show the effectiveness of our proposed network for HRS image classification. The high-level abstract features are extracted by gradually reducing the size of the convolution kernel in the proposed CNN, and the L2 regularization and dropout strategies used in multiple convolutional layers of our network yield better generalization performance.

Table 7. Classification accuracy and Kappa coefficient based on different neural networks with the optimal number of epochs

In order to further illustrate the contribution of the proposed CNN in our framework, we compared the time overhead and the classification performance among the proposed CNN, VGGNet and AlexNet. Computational time is influenced by three main factors: the experimental platform (software and hardware), the experimental data size and the classification algorithm. Thus, experiments on a specific dataset based on the same handcrafted features should be run on the same platform. Two computers were used during our experimental process: the first has two NVIDIA GeForce GTX 1080 Ti GPUs and an i7-5930K CPU, while the second has an NVIDIA GeForce GTX 960 and an i7-4790K CPU.

Four datasets for each HRS image were used as the input of the deep CNN: the raw data, the GLCM-based feature, the DMP-based feature and the MS-based feature. The experiments on the latter two datasets were run only on the first computer, while the experiments on the former two were run on both computers, so only the timing results for the DMP-based and MS-based features are directly comparable. In order to compare the performance of the different CNNs, the time consumed from the CNN training process through the testing stage was recorded, with the number of epochs set to 500 for each experiment. The results are shown in Table 8.

Table 8. The comparison of computational time and classification performance among AlexNet, VGGNet and the proposed CNN (ProCNN)

3.4 The Relationship between the Deep Features and Corresponding Handcrafted Features on Classification Results

This section explores the relationship between different handcrafted features and the corresponding deep features in terms of image classification results. Three different handcrafted features were used in the experiment: the texture feature extracted by the GLCM algorithm, the shape feature extracted by the DMP algorithm, and the multi handcrafted feature based on the fusion of the two. Fig. 5 compares the classification performance of two different classifiers: the deep feature classification results obtained with the proposed CNN structure, and the corresponding handcrafted feature classification results using an RBF-SVM classifier. The SVM is one of the most widely used classifiers in HRS image classification [27, 29, 28]: it can effectively solve complex nonlinear problems and obtain promising results by using kernel functions, and classification results are easy to generate using an SVM library directly.

As can be seen, the results of the deep features are much higher than those of the corresponding handcrafted features, because deep learning-based methods can extract high-level abstract and invariant features of ground objects that help to improve the intra- and inter-class separability. By comparing the classification results based on the different handcrafted methods, it can be found that the stronger the representation ability of a handcrafted feature, the higher the classification accuracy of the corresponding deep features.

Fig. 5. Comparison of classification results using different spatial information between handcrafted features and deep features

3.5 The Impact of Patch Size for the Input of CNN

The CNN-based deep feature extraction method requires a fixed-size patch as input during the training and testing phases. In this experiment, the GLCM-CDE feature is used to explore the effect of patch size on deep feature learning performance, given the correlation between the window size in the GLCM feature extraction method and the patch size of the CNN input. A series of different patch/window sizes were used, and we applied the softmax classifier to the GLCM-CDE feature and the RBF-SVM classifier to the GLCM-based handcrafted feature for the HRS datasets.

Fig. 6. Comparison of classification results using different window sizes between handcrafted features and the GLCM-based CNN deep features

The effect of the patch/window size on HRS image classification is shown in Fig. 6. As can be seen, the features extracted by the deep CNN significantly improve the classification accuracy compared with the GLCM-based handcrafted features. We can also observe that the classification performance of the GLCM-CDE features stabilizes at relatively smaller window sizes than that of the handcrafted features. The larger the difference between the window size and the average size of the ground objects, the more obvious the performance improvement of the GLCM-CDE features. This means that the features obtained by the CNN network can effectively reduce the distance between the internal homogeneous information of the ground objects and improve the classification results of the HRS images. Square patches of 15 × 15 are used for the PaviaU and Salinas datasets, and 17 × 17 for the Indian Pines dataset, as the CNN input in all other experiments; a patch extraction sketch follows.
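A minimal sketch of the patch extraction; reflect padding at the image border is our choice, as the paper does not state its border handling.

```python
import numpy as np

def extract_patches(feat, rows, cols, patch=15):
    """Cut a fixed-size patch centered on each labeled pixel.

    feat: (H, W, F) EK feature stack; rows/cols: pixel coordinates.
    Returns an array of shape (len(rows), patch, patch, F).
    """
    pad = patch // 2
    fp = np.pad(feat, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    out = np.empty((len(rows), patch, patch, feat.shape[-1]), feat.dtype)
    for k, (r, c) in enumerate(zip(rows, cols)):
        out[k] = fp[r:r + patch, c:c + patch]
    return out
```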

3.6 The Effect of the Number of Iterations

In a neural network derived from the data, the training results (network structure, weights and offsets) are a representation of the sample data distribution. The backpropagation (BP) algorithm iteratively fine-tunes the network weights, which enables the network to approximate the distribution characteristics of the overall data from the sample data. Therefore, the degree of learning of the training samples affects the classification result. If the number of training iterations is insufficient, i.e., the sample information is not fully used, it is difficult to accurately describe the distribution characteristics of the data. Conversely, if the number of iterations is too large, the network absorbs information unique to the training samples, which degrades how well the network represents the distribution of the HRS images. Therefore, we train the neural network with different numbers of iterations to observe the impact on classification performance (n_epoch = [500, 1000, 1500, 2000, 3000, 5000, 10000]). The relationship between the epoch number and the optimal classification result is presented in Table 9.

Table 9. The relationship between the epoch number and the optimal classification results

Fig. 7 shows the curve of the loss function on the validation dataset during training with the MS-CDE feature for each image. Clearly, over-fitting occurs when the number of iterations exceeds a certain value. By observing the trend of the curve, we can find the optimal value of the training epoch and obtain the best classification result. The conversion between training steps and the epoch value is calculated as follows, where ratio represents the proportion of the validation set in the training samples, n the number of training samples, n_epoch the number of epochs and batch_size the training batch size:

step = (1 − ratio) × n × n_epoch / batch_size          (1)

According to Fig. 7 and Eq. (1), a suitable epoch value can be obtained to train the network for each dataset and obtain an optimal classification result for the HRS images; a small helper implementing this conversion follows. Fig. 8 shows the classification performance for each class of ground objects under different numbers of iterations. It can be seen that in a well-trained network, the separation between different ground objects can be effectively improved. The classification performance with the optimal epoch based on the MS-CDE feature is provided in Fig. 9.
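A small helper implementing the conversion of Eq. (1) as reconstructed above; the batch size in the example is an assumed value.

```python
def steps_for_epochs(n_samples, n_epoch, batch_size, val_ratio):
    """Training steps for n_epoch passes over the non-validation portion,
    following Eq. (1): step = (1 - ratio) * n * n_epoch / batch_size."""
    return int((1 - val_ratio) * n_samples * n_epoch / batch_size)

# e.g. 510 Indian Pines training samples, 10% held out for validation,
# batch size 50 (assumed): 500 epochs correspond to 4590 training steps.
print(steps_for_epochs(510, 500, 50, 0.1))
```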

Fig. 7. The relationship between training epochs and the CNN classification performance

Fig. 8. Per-class classification results with different epochs
Fig. 9. Visualization of the classification performance based on the MS-CDE feature with the optimal epoch

4. Conclusion

In this paper, the proposed approach, a fusion of traditional handcrafted and deep features, is used to handle the inherent challenges in HRS classification. The rationale is that the data obtained from handcrafted features based on expert knowledge contain only the primary information of the original HRS image, with a simplified data distribution. This, in turn, reduces the required scale of the CNN network structure parameters. Spectral features based on the PCA algorithm are used as the basic map with low dimension. Then, GLCM-based texture features and DMP-based shape features are extracted as the shallow spectral-spatial features for the deep CNN network input. Meanwhile, considering the differences in the recognition ability of different handcrafted features on different ground objects, the vector stacking method is used to further improve the classification performance of the HRS images with the proposed deep CNN.

These handcrafted feature maps, used as the input of the CNN, not only improve the training starting point of the deep features, but also offer better guidance and interpretability for them. A simple neural network structure suited to these HRS images was then proposed to obtain high-level abstract features according to the characteristics of the handcrafted data and the theory of convolutional neural networks, so that satisfactory classification results can be obtained with smaller sample sizes. L2 regularization and dropout strategies are used during the construction of the CNN network to mitigate over-fitting. In the end, we obtained the optimal network classification results with a suitable number of iterations and CNN input patch size for each dataset.

Acknowledgments

We thank the Editor and anonymous referees for their constructive feedback. This work was supported by the National Key Research and Development Program of China (No. 2017YFC1404700), the National Natural Science Foundation of China (No. U1711266) and the National Natural Science Foundation of China under Grant No. 41701417. It was also supported by the Youth League Committee of CUG (No. 1610491B24).

References

  1. Y. Ren and Y. Liu, "Geological disaster detection from remote sensing image based on experts knowledge and image features," in Proc. of Geoscience and Remote Sensing Symposium (IGARSS), 2016 IEEE International, pp. 677-680, 2016.
  2. S. Valero, J. Chanussot, J. A. Benediktsson, H. Talbot, and B. Waske, "Advanced directional mathematical morphology for the detection of the road network in very high resolution remote sensing images," Pattern Recognition Letters, vol. 31, no. 10, pp. 1120-1127, 2010. https://doi.org/10.1016/j.patrec.2009.12.018
  3. V. O. Kare, "Using Deep Convolutional Networks to Detect Roads in Aerial Images," NTNU, 2016.
  4. J. Yan, Y. Ma, and L. Wang, "A cloud-based remote sensing data production system," Future Generation Computer Systems, vol. 86, pp. 1-13, 2017. https://doi.org/10.1016/j.future.2018.03.013
  5. F. Yuan, K. E. Sawaya, B. C. Loeffelholz, and M. E. Bauer, "Land cover classification and change analysis of the Twin Cities (Minnesota) Metropolitan Area by multitemporal Landsat remote sensing," Remote sensing of Environment, vol. 98, no. 2, pp. 317-328, 2005. https://doi.org/10.1016/j.rse.2005.08.006
  6. M. Hussain, D. Chen, A. Cheng, H. Wei, and D. Stanley, "Change detection from remotely sensed images: From pixel-based to object-based approaches," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 80, pp. 91-106, 2013.
  7. A. Puissant, J. Hirsch, and C. Weber, "The utility of texture analysis to improve per-pixel classification for high to very high spatial resolution imagery," International Journal of Remote Sensing, vol. 26, no. 4, pp. 733-745, 2005. https://doi.org/10.1080/01431160512331316838
  8. W. Han, R. Feng, L. Wang, Y. Cheng, "A semi-supervised generative framework with deep learning features for high-resolution remote sensing image scene classification," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 145, Part A, pp. 23-43, 2018. https://doi.org/10.1016/j.isprsjprs.2017.11.004
  9. L. Wang, K. Lu, P. Liu, R. Ranjan, L. Chen, "IK-SVD: Dictionary Learning for Spatial Big Data via Incremental Atom Update," Computing in Science and Engineering, vol. 16, no. 4, pp. 41-52, 2014. https://doi.org/10.1109/MCSE.2014.52
  10. H. G. Akcay and S. Aksoy, "Automatic detection of geospatial objects using multiple hierarchical segmentations," IEEE Transactions on Geoscience and Remote Sensing, vol. 46, no. 7, pp. 2097-2111, 2008. https://doi.org/10.1109/TGRS.2008.916644
  11. L. Zhang, L. Zhang, and B. Du, "Deep learning for remote sensing data: A technical tutorial on the state of the art," IEEE Geoscience and Remote Sensing Magazine, vol. 4, no. 2, pp. 22-40, 2016. https://doi.org/10.1109/MGRS.2016.2540798
  12. J. Wei, L. Wang, P. Liu, X. Chen, W. Li, Z. Y. Albert, "Spatiotemporal Fusion of MODIS and Landsat-7 Reflectance Images via Compressed Sensing," IEEE Trans. Geoscience and Remote Sensing, vol. 55, no. 12, pp. 7126-7139, 2017. https://doi.org/10.1109/TGRS.2017.2742529
  13. Y. Chen, X. Zhao, and X. Jia, "Spectral-spatial classification of hyperspectral data based on deep belief network," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 8, no. 6, pp. 2381-2392, 2015. https://doi.org/10.1109/JSTARS.2015.2388577
  14. T. Tian, L. Gao, W. Song, K.-K. R. Choo, and J. He, "Feature extraction and classification of VHR images with attribute profiles and convolutional neural networks," Multimedia Tools and Applications, vol. 77, no. 14, pp. 18637-18656, 2018. https://doi.org/10.1007/s11042-017-5331-4
  15. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, vol. 60, no. 6, pp. 84-90, 2017. https://doi.org/10.1145/3065386
  16. W. Chen, X. Li, H. He, L. Wang, "A Review of Fine-Scale Land Use and Land Cover Classification in Open-Pit Mining Areas by Remote Sensing Techniques," Remote Sensing, vol. 10, no. 1, pp. 1-15, 2018.
  17. V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc. of the 27th International Conference on Machine Learning (ICML-10), pp. 807-814, 2010.
  18. M. Lin, Q. Chen, and S. Yan, "Network in network," arXiv preprint arXiv:1312.4400, 2013.
  19. J. Fan, J. Yan. and Y. Ma, "Big Data Integration in Remote Sensing across a Distributed Metadata-Based Spatial Infrastructure," Remote Sensing, vol. 10, no. 1, pp. 1-7, 2018.
  20. B. Waske and J. A. Benediktsson, "Fusion of support vector machines for classification of multisensor data," IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 12, pp. 3858-3866, 2007. https://doi.org/10.1109/TGRS.2007.898446
  21. W. Hu, Y. Huang, L. Wei, F. Zhang, and H. Li, "Deep convolutional neural networks for hyperspectral image classification," Journal of Sensors, vol. 2015, no. 2, pp. 1-12, 2015.
  22. Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi, "Deep feature extraction and classification of hyperspectral images based on convolutional neural networks," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 10, pp. 6232-6251, 2016. https://doi.org/10.1109/TGRS.2016.2584107
  23. A. Sharma, X. Liu, X. Yang, and D. Shi, "A patch-based convolutional neural network for remote sensing image classification," Neural Networks, vol. 95, pp. 19-28, 2017. https://doi.org/10.1016/j.neunet.2017.07.017
  24. W. Zhao and S. Du, "Spectral-spatial feature extraction for hyperspectral image classification: A dimension reduction and deep learning approach," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 8, pp. 4544-4554, 2016. https://doi.org/10.1109/TGRS.2016.2543748
  25. P. Ghamisi, Y. Chen, and X. Zhu, "A self-improving convolution neural network for the classification of hyperspectral data," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 10, pp. 1537-1541, 2016. https://doi.org/10.1109/LGRS.2016.2595108
  26. H. Wang, Y. Wang, Q. Zhang, S. Xiang, and C. Pan, "Gated Convolutional Neural Network for Semantic Segmentation in High-Resolution Images," Remote Sensing, vol. 9, no. 5, pp. 446, 2017. https://doi.org/10.3390/rs9050446
  27. M. Marconcini, G. Camps-Valls, and L. Bruzzone, "A composite semisupervised SVM for classification of hyperspectral images," IEEE Geoscience and Remote Sensing Letters, vol. 6, no. 2, pp. 234-238, 2009. https://doi.org/10.1109/LGRS.2008.2009324
  28. H. Ishida, Y. Oishi, K. Morita, K. Moriwaki, and T. Y. Nakajima, "Development of a support vector machine based cloud detection method for MODIS with the adjustability to various conditions," Remote Sensing of Environment, vol. 205, pp. 390-407, 2018. https://doi.org/10.1016/j.rse.2017.11.003
  29. J. Yan and L. Wang, "Suitability Evaluation for Products Generation from Multisource Remote Sensing Data," Remote Sensing, vol. 8, no. 12, pp. 982-995, 2016. https://doi.org/10.3390/rs8120982
  30. Y. Yuan, H. Lv, and X. Lu, "Semi-supervised change detection method for multi-temporal hyperspectral images," Neurocomputing, vol. 148, pp. 363-375, 2015. https://doi.org/10.1016/j.neucom.2014.06.024
  31. A. Erturk and A. Plaza, "Informative change detection by unmixing for hyperspectral images," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 6, pp. 1252-1256, 2015. https://doi.org/10.1109/LGRS.2015.2390973
  32. J. Cao, Z. Chen, and B. Wang, "Deep convolutional networks with superpixel segmentation for hyperspectral image classification," in Proc. of Geoscience and Remote Sensing Symposium (IGARSS), 2016 IEEE International, pp. 3310-3313, 2016.
  33. Y. Liu, Y. Zhong, F. Fei, and L. Zhang, "Scene semantic classification based on random-scale stretched convolutional neural network for high-spatial resolution remote sensing imagery," in Proc. of Geoscience and Remote Sensing Symposium (IGARSS), 2016 IEEE International, pp. 763-766, 2016.
  34. J. Ding, B. Chen, H. Liu, and M. Huang, "Convolutional neural network with data augmentation for SAR target recognition," IEEE Geoscience and remote sensing letters, vol. 13, no. 3, pp. 364-368, 2016. https://doi.org/10.1109/LGRS.2015.2513754
  35. Y. Liu, M. Zhang, P. Xu, and Z. Guo, "SAR ship detection using sea-land segmentation-based convolutional neural network," in Proc. of Remote Sensing with Intelligent Processing (RSIP), 2017 International Workshop on, pp. 1-4, 2017.
  36. L. Wang, J. Zhang, P. Liu, K.-K. R. Choo, and F. Huang, "Spectral-spatial multi-feature-based deep learning for hyperspectral remote sensing image classification," Soft Computing, vol. 21, no. 1, pp. 213-221, 2017. https://doi.org/10.1007/s00500-016-2246-3
  37. W. Song, L. Wang, P. Liu, and K.-K. R. Choo, "Improved t-SNE based manifold dimensional reduction for remote sensing data processing," Multimedia Tools and Applications, vol. 78, no. 4, pp. 4311-4326, 2019. https://doi.org/10.1007/s11042-018-5715-0
