1. INTRODUCTION
Alzheimer’s disease (AD) is the most common type of dementia in the elderly, making up the majority of dementia cases in people 65 years of age or older [1]. Late-onset AD symptoms usually begin in the mid-60s, while early-onset symptoms occur between the 30s and mid-60s. It is estimated that by the year 2050, roughly 106 million people may be affected by AD due to the aging of the world population [2]. As AD progresses, the brain structure changes; the initial damage occurs in the hippocampus. Mild cognitive impairment (MCI) can be considered an early sign of AD; however, MCI does not progress to AD in all cases. As the disease progresses, hardened plaques and tangles begin to appear in the brain, causing cognitive issues such as memory loss and, at later stages, an inability to move. AD also causes atrophy of the hippocampus and cerebral cortex. The hippocampus is responsible for episodic and spatial memory and acts as a relay between the brain and the rest of the body; as its neurons are damaged, they can no longer communicate through synapses, which causes issues with thinking, planning, and judgment. This damage appears as low-intensity regions in medical imaging. Unfortunately, there is no definitive cure for AD, and current treatments only help alleviate symptoms and improve quality of life.
As AD progresses, protein deposits named amyloid-β (Aβ) plaques and hyperphosphorylated tau tangles accumulate in the brain, leading to further neuron and axon damage. These changes generally begin in the medial temporal lobe (entorhinal cortex and hippocampus) before spreading to the neocortex [3]. The changes can be seen in medical images using different chemicals injected into the bloodstream, as shown in Fig. 1. The indented surfaces of the brain fade, leaving a smoother, more featureless appearance behind.
Fig. 1. Amyloid-positive (left) and amyloid-negative (right) PET/CT images used in the diagnosis of Alzheimer’s disease.
These changes begin to appear years before AD symptoms become apparent. It is now widely believed that the toxic effects of these plaque changes in the brain cause the symptoms of AD. This paper works with PET/CT (Positron Emission Tomography/Computed Tomography) images in which the prevalence of these hardened plaques is highlighted in patients.
Since the early 1970s, there has been a massive R&D effort on developing CAD (computer-aided diagnosis) systems based on manually extracted feature vectors [4]. These vectors would then be fed into a supervised training model, such as a support vector machine, to perform classification. It was soon realized that such systems had serious limitations [5]. Due to these limitations, researchers shifted their focus to data-mining approaches in the 1980s and 1990s in the hope of developing more accurate and flexible systems.
Nowadays, the performance of CAD systems has improved so drastically that they are being compared to human experts. Because deep learning methods such as convolutional neural networks (CNN) can extract meaningful features from images automatically through supervised training and give highly accurate results, the future replacement of human experts with CAD systems is being discussed [6]. Deep-learning-based CAD systems also perform well in medical segmentation, disease and tumor detection [7], tracking, and classification across organs in microscopy and ultrasound imaging.
In this paper, a stacked convolutional autoencoder is proposed to classify Alzheimer’s disease from PET/CT images. Extracted features are sent through an added classification layer, and the diagnosis is performed. The proposed model is trained on a PET/CT dataset obtained from the Dong-A University Department of Nuclear Medicine.
The paper is organized as follows: Section 2 discusses related work on AD classification in medical images, Section 3 explains the proposed model and the dataset, Section 4 presents the simulation results and discussion, and Section 5 concludes the paper and discusses future work.
2. ALZHEIMER’S DISEASE CLASSIFICATION
In the field of medical imaging, there has been a surge of developments aimed at better identifying discrepancies in brain images as AD progresses. Various machine learning applications have been used for the classification of AD using imaging data [8-10]. A correlation between brain connectivity and patient behavior has also been documented [11]. Different imaging techniques have been used in AD diagnosis, such as PET [12], sMRI [13], and fMRI [14]; however, it has been shown that combining features from multiple modalities improves the overall accuracy [15].
PET/CT fuses the cross-sectional anatomic information obtained through CT with the metabolic information obtained through PET. In AD screening especially, this gives advantages over CT or PET alone, such as the ability to localize increased FBB (florbetaben) activity in abnormal locations, which may be impossible with PET alone. The literature shows that PET images provide better classification accuracy [16] than MRI images alone [17,18].
Deep learning models have a wide area of use, such as object recognition, object tracking, segmentation, and natural language processing, due to their automatic feature extraction capabilities. Their deep structures allow them to learn high-level features and, further down the pipeline, more abstract features. They are also capable of learning disease-caused features in brain images. This is why CAD research was quick to apply and develop such models for more accurate diagnosis of disorders. Brosch et al. [19] developed a deep belief network (DBN) using manifold learning to detect AD. Suk et al. [20-22] developed SVM kernels on features learned by stacked autoencoders for AD/MCI classification. Cárdenas-Peña et al. [23] developed a model using centered kernel alignment and showed that supervised pre-training of a stacked autoencoder provides higher classification accuracy than unsupervised pre-training with plain autoencoders and principal component analysis (PCA).
As of 2019, AD can only be classified in its later stages using medical imaging, and early detection can only help slow down the progression of cognitive decline. It is therefore vital to detect AD as early as possible, in the MCI stage. In this paper, a stacked convolutional autoencoder is developed that is trained on three classes, AD, MCI, and normal control (NC), and provides high classification accuracy. The data used was provided by the collaborating Dong-A University Department of Nuclear Medicine.
3. THE PROPOSED STACKED CONVOLUTIONAL AUTOENCODER FOR CLASSIFICATION
3.1 Deep Learning Model
Autoencoders, although initially proposed for pre-training purposes [24], are seeing a surge of interest due to their flexible nature, for example in medical image segmentation tasks such as U-Net [25]. Recently developed stacked convolutional autoencoders (SAE) are also state-of-the-art learning and pre-training tools. The features they learn can be applied to different scenarios, such as classification or segmentation.
When producing an output, an autoencoder tries to copy its input to its output through a number of hidden layers \(h\) that define a code representing the input. Autoencoders are comprised of two major components: an encoder \(h=f(x)\), in which the input data is spatially shrunk through feature extraction into a latent-space representation (or bottleneck), and a decoder \(r=g(h)\), which reconstructs the input, in the CNN case through the use of transpose convolutions. The convolution operation is defined as,
\((I * K)_{i j}=\sum_{m=0}^{k_{1}-1} \sum_{n=0}^{k_{2}-1} \sum_{c=1}^{C} K_{m, n, c} \cdot I_{i+m, j+n, c}+b\) (1)
where \(I \in R^{H \times W \times C}\) is the input image with height \(H\), width \(W\), and \(C=3\) channels (red, green, blue), and \(R\) denotes the real numbers. For a bank of \(D\) filters, there are kernels \(K \in R^{k_{1} \times k_{2} \times C \times D}\) and biases \(b \in R^{D}\), one for each filter.
Convolution between the input feature map of dimension \(H \times W\) and the weight kernel of dimension \(k_{1} \times k_{2}\) gives an output feature map of size \(\left(H-k_{1}+1\right)\) by \(\left(W-k_{2}+1\right)\). The gradient component for the individual weights can be obtained using the chain rule,
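To make Eq. (1) and the resulting output size concrete, the following minimal NumPy sketch performs a valid convolution with a single filter; the variable names and random data are ours, for illustration only:

```python
import numpy as np

def conv2d_valid(I, K, b):
    """Naive valid convolution of Eq. (1): I is (H, W, C), K is (k1, k2, C),
    b is a scalar bias. Output shape is (H - k1 + 1, W - k2 + 1)."""
    H, W, C = I.shape
    k1, k2, _ = K.shape
    out = np.zeros((H - k1 + 1, W - k2 + 1))
    for i in range(H - k1 + 1):
        for j in range(W - k2 + 1):
            # element-wise product over a k1 x k2 x C window, summed, plus bias
            out[i, j] = np.sum(I[i:i + k1, j:j + k2, :] * K) + b
    return out

I = np.random.rand(64, 64, 3)         # input image: H = W = 64, C = 3
K = np.random.rand(3, 3, 3)           # one 3 x 3 filter spanning all channels
print(conv2d_valid(I, K, 0.1).shape)  # -> (62, 62) = (H - k1 + 1, W - k2 + 1)
```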
\(\frac{\partial E}{\partial w_{m, n}^{l}}=\sum_{i=0}^{H-k_{1}} \sum_{j=0}^{W-k_{2}} \frac{\partial E}{\partial x_{i, j}^{l}} \frac{\partial x_{i, j}^{l}}{\partial w_{m, n}^{l}}=\sum_{i=0}^{H-k_{1}} \sum_{j=0}^{W-k_{2}} \delta_{i, j}^{l} \frac{\partial x_{i, j}^{l}}{\partial w_{m, n}^{l}}\) (2)
where \(E\) is the error or cost function, \(l\) is the \(l^{th}\) layer (where \(l=1\) is the first layer and \(l=L\) is the last layer), \(x\) is of dimension \(H \times W\) with \(i\) and \(j\) as its iterators, \(w\) is of dimension \(k_{1} \times k_{2}\) with \(m\) and \(n\) as its iterators, and \(w_{m, n}^{l}\) is the weight matrix connecting the neurons of layer \(l\) with the neurons of layer \(l-1\).
As seen in the left part of Fig. 2, the encoder part of the proposed model has eight convolutional layers with an initial input size of 64 × 64 × 3. Each of these layers uses convolutional filters of size 3 × 3, followed by ReLU (Rectified Linear Unit) activation and batch normalization [26]. The last encoding layer also has a dropout [27] operation with 50% probability. After every two convolution layers, a max-pooling layer with a filter size of 2 × 2 is used for dimension reduction. Lastly, two fully-connected layers with 1024 units each are added, and the last fully-connected layer is connected to a softmax layer for classification with three classes (AD, MCI, and NC).
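The following Keras sketch mirrors this encoder and classification head. It is a minimal illustration rather than the exact implementation: the per-layer filter counts are not stated in the text, so the values 32-256 below are assumptions.

```python
from keras.models import Model
from keras.layers import (Input, Conv2D, Activation, BatchNormalization,
                          MaxPooling2D, Dropout, Flatten, Dense)

def conv_block(x, filters):
    # 3 x 3 convolution -> ReLU -> batch normalization, as described above
    x = Conv2D(filters, (3, 3), padding='same')(x)
    x = Activation('relu')(x)
    x = BatchNormalization()(x)
    return x

inputs = Input(shape=(64, 64, 3))
x = inputs
# Eight convolutional layers, max-pooling after every two (filter counts assumed)
for filters in (32, 64, 128, 256):
    x = conv_block(x, filters)
    x = conv_block(x, filters)
    x = MaxPooling2D((2, 2))(x)
x = Dropout(0.5)(x)      # 50% dropout on the last encoding layer
bottleneck = x           # 4 x 4 x 256 latent representation under these choices

# Classification head: two 1024-unit dense layers and a 3-way softmax
y = Flatten()(bottleneck)
y = Dense(1024, activation='relu')(y)
y = Dense(1024, activation='relu')(y)
outputs = Dense(3, activation='softmax')(y)   # AD, MCI, NC

classifier = Model(inputs, outputs)
```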
Fig. 2. Proposed stacked autoencoder architecture for classification of AD. The left side of the architecture is the encoder, where the input image is encoded into its most abstract features through convolution and pooling operations and then passed to a classification layer (softmax). The right side is the decoder, where the learned representation in the bottleneck is used to re-create the original input. The pixel-wise difference between the original image and the reconstructed image is used as the loss value (mean squared error) during training, whose main purpose is to reduce this value as much as possible over the epochs.
Moreover, as seen in the right part of Fig. 2, the decoder part also has eight convolutional layers with 3 × 3 filters corresponding to those of the encoder in order to make the output the same size, with up-sampling (also called deconvolution, or transpose convolution) [28] used instead of pooling in order to increase the spatial dimensions. After the last convolution layer, the reconstructed image is given as the output.
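Continuing the sketch above, a matching decoder might look as follows, with up-sampling in place of pooling; again, this is an assumed reconstruction of Fig. 2, not the exact implementation:

```python
from keras.layers import UpSampling2D

# Decoder: mirror the encoder, doubling the spatial dimensions at each stage
z = bottleneck
for filters in (256, 128, 64, 32):
    z = conv_block(z, filters)
    z = conv_block(z, filters)
    z = UpSampling2D((2, 2))(z)
# Final convolution maps back to 3 channels, yielding a 64 x 64 x 3 image
reconstruction = Conv2D(3, (3, 3), padding='same', activation='sigmoid')(z)

autoencoder = Model(inputs, reconstruction)
```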
The loss function used in autoencoder-type architectures is generally the mean squared error (MSE), \(J(x, z)=\|x-z\|^{2}\), which measures how similar the reconstructed output \(z\) is to the original input \(x\).
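In code, this is simply the squared difference between input and reconstruction; the per-pixel averaged form below is what Keras' built-in 'mean_squared_error' computes:

```python
import numpy as np

def mse_loss(x, z):
    """Mean squared reconstruction error: per-pixel average of ||x - z||^2."""
    return np.mean((x - z) ** 2)

x = np.random.rand(64, 64, 3)                # a stand-in original image
z = x + 0.05 * np.random.randn(64, 64, 3)    # a slightly noisy reconstruction
print(mse_loss(x, z))                        # small positive value
```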
For optimization, stochastic gradient descent (SGD) with momentum was used for the proposed model. The training data was split 80%/20% between training and validation, respectively. Nesterov momentum was also applied in order to speed up the learning process and improve convergence. Given an objective function \(f(\alpha)\) to be minimized, regular momentum is given as,
\(v_{t}=\mu v_{t-1}-\epsilon \nabla f\left(\alpha_{t-1}\right)\) (3)
\(\alpha_{t}=\alpha_{t-1}+v_{t}\) (4)
where \(v_t\) is the velocity, \(\epsilon>0\) is the learning rate, \(\mu \in[0,1]\) is the momentum coefficient, and \(\nabla f\left(\alpha_{t-1}\right)\) is the gradient at \(\alpha_{t-1}\), whereas Nesterov momentum is given as,
\(v_{t}=\mu v_{t-1}-\epsilon \nabla f\left(\alpha_{t-1}+\mu v_{t-1}\right)\) (5)
\(\alpha_{t}=\alpha_{t-1}+v_{t}\) (6)
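A minimal NumPy sketch of Eqs. (3)-(6) on a toy quadratic objective illustrates the difference: Nesterov momentum evaluates the gradient at the look-ahead point \(\alpha_{t-1}+\mu v_{t-1}\) rather than at \(\alpha_{t-1}\). The objective and hyperparameters below are illustrative only.

```python
def momentum_step(alpha, v, grad_f, lr, mu):
    """Regular momentum, Eqs. (3)-(4)."""
    v = mu * v - lr * grad_f(alpha)
    return alpha + v, v

def nesterov_step(alpha, v, grad_f, lr, mu):
    """Nesterov momentum, Eqs. (5)-(6): gradient at the look-ahead point."""
    v = mu * v - lr * grad_f(alpha + mu * v)
    return alpha + v, v

grad_f = lambda a: 2.0 * a   # toy objective f(a) = a^2, minimum at a = 0
alpha, v = 5.0, 0.0
for _ in range(100):
    alpha, v = nesterov_step(alpha, v, grad_f, lr=0.1, mu=0.9)
print(alpha)                 # converges toward 0
```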
3.2 Dataset
The PET/CT image dataset used in this paper is the property of Dong-A University Hospital, Department of Nuclear Medicine. The dataset contains three classes: AD, MCI, and NC. Detailed information about the dataset can be found in Table 1. As can be seen in Fig. 3, each class in the dataset shows a different stage of the brain (AD, MCI, and NC) whose features may be used to classify an individual’s case. The FBB activity that is prevalent in the AD and MCI cases can be observed.
Table 1. Dataset information with the total number of images and patients for each class.
Fig. 3. Sample AD, MCI, and NC images from the dataset, in that order. Each image was taken from the 60th slice (out of 110) of a different patient’s brain scan.
In the original dataset, a single person has a total of 110 axial images comprising their brain scan. In this work, the top and bottom 30 slices were discarded since they did not provide any useful information for the task. Therefore, a single patient contributes 50 brain images, each with a size of 64 × 64 × 3.
In order to see the model’s performance on other axes of the brain, sagittal and coronal slices were also created from a 3D brain model. For these axes, only the first and last 15 images were discarded, as they did not carry useful information about the plaque formations in the brain. This means more informative images can be used for training on those specific axes. The total number of images for the coronal and sagittal axes can be seen in Table 2. The flowchart for creating these axes can be seen in Fig. 4, and examples of coronal and sagittal images can be seen in Fig. 5 and Fig. 6.
Table 2. The number of images created and used for training after bicubic interpolation.
Fig. 4. Flowchart of the method for creating the sagittal and coronal axes from the tensor of axial images. Using the cubic spline method, a 3D image of the brain is first created; then, through bicubic interpolation, sagittal and coronal images are extracted to be used in training.
Fig. 5. Examples of the created coronal images, from the AD, MCI, and NC classes, respectively. The sample slices are taken from different patients.
Fig. 6. Examples of the created sagittal images, from the AD, MCI, and NC classes, respectively. The sample slices are taken from different patients.
For a specific set of data points \(\left(x_{j}, y_{j}\right), j=0: n\), a cubic spline consists of cubic polynomials \(s_{j}(x)\), one assigned to each subinterval, satisfying the given constraints. A cubic spline is therefore defined as a function \(S(x)=s_{j}(x)\) on the interval \(\left[x_{j}, x_{j+1}\right]\) for \(j=0,1, \ldots, n-1\), given as,
\(S_{j}(x)=a_{j}+b_{j}\left(x-x_{j}\right)+c_{j}\left(x-x_{j}\right)^{2}+d_{j}\left(x-x_{j}\right)^{3}\) (7)
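A minimal SciPy sketch of the pipeline in Fig. 4 follows; the resampling factor, placeholder data, and slice indices are assumptions for illustration. `scipy.ndimage.zoom` with `order=3` applies the cubic-spline interpolation of Eq. (7) along each axis.

```python
import numpy as np
from scipy.ndimage import zoom

# A patient's 50 retained axial slices stacked into a volume (placeholder data)
axial = np.random.rand(50, 64, 64)            # (slice, height, width)

# Cubic-spline resampling (order=3, i.e., Eq. (7)) along the slice axis so the
# 3D volume is roughly isotropic before re-slicing (factor is an assumption)
volume = zoom(axial, (64.0 / 50.0, 1.0, 1.0), order=3)   # -> (64, 64, 64)

# Re-slice along the other two axes, discarding the first and last 15 slices
coronal  = [volume[:, i, :] for i in range(15, volume.shape[1] - 15)]
sagittal = [volume[:, :, j] for j in range(15, volume.shape[2] - 15)]
print(len(coronal), len(sagittal))   # 34 slices per axis under these assumptions
```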
The dataset for all axes is split into 90% training and 10% testing. Training and testing images were not mixed, so that the model is never evaluated on images it has already seen. The training portion is further split into 80% training and 20% validation.
In machine learning, data augmentation is defined as enlarging the dataset through different methods without affecting the characteristics of possible features, thereby preventing overfitting [29]. Because acquiring more data is especially costly in medical imaging, augmentation is the most reliable way to increase the total number of images in the dataset. These methods include, but are not limited to, horizontal flips, width shifts, and height shifts. This process was applied to the training and testing datasets separately.
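As an illustration, the augmentations named above can be expressed with Keras’ `ImageDataGenerator`; the shift ranges and placeholder arrays below are assumptions, as the paper does not state them.

```python
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import to_categorical

# Placeholder arrays standing in for the prepared training set
x_train = np.random.rand(100, 64, 64, 3)
y_train = to_categorical(np.random.randint(0, 3, size=100), num_classes=3)

# Horizontal flip, width shift, and height shift, as named in the text
augmenter = ImageDataGenerator(horizontal_flip=True,
                               width_shift_range=0.1,
                               height_shift_range=0.1)

train_flow = augmenter.flow(x_train, y_train, batch_size=80)
```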
4. SIMULATION RESULTS AND CONSIDERATIONS
4.1 Simulation Environment and Training
The proposed model was implemented with Tensorflow [30] and Keras [31], using Python 2.7 on an Ubuntu 16.04 machine. For training, SGD with a mini-batch size of 80, a learning rate of 0.001, a weight decay of 0.06, and Nesterov momentum of 0.9 was used. Training was done for 100 epochs. The change in the model’s accuracy and loss can be seen in Fig. 7.
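Using the sketches from Section 3, this training configuration might be written as follows; the choice of cross-entropy for the classification head is an assumption, since the paper only specifies MSE for the reconstruction.

```python
from keras.optimizers import SGD

# SGD with the stated hyperparameters: lr 0.001, weight decay 0.06,
# Nesterov momentum 0.9, mini-batch size 80, 100 epochs
opt = SGD(lr=0.001, decay=0.06, momentum=0.9, nesterov=True)

# Reconstruction objective (MSE) and classification objective (assumed)
autoencoder.compile(optimizer=opt, loss='mean_squared_error')
classifier.compile(optimizer=opt, loss='categorical_crossentropy',
                   metrics=['accuracy'])

classifier.fit(x_train, y_train, batch_size=80, epochs=100,
               validation_split=0.2)    # the 80/20 training/validation split
```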
Fig. 7. Accuracy and loss graphs of the proposed model during training for 100 epochs, for both the training and validation steps. The small increments observed in the accuracy and loss are due to the small learning rate used.
The fluctuations seen in Fig. 7 are caused by the SGD optimization reducing the loss value computed in the forward pass by updating the weights in the backward pass. It can be observed that after 100 epochs, the validation loss has effectively reached its minimum. Pushing the loss value even lower may be possible; however, this increases the chance that the proposed model overfits, meaning the model memorizes the dataset instead of generalizing, which drops its accuracy when tested on images it has not encountered.
4.2 Model Performance
After training, the model reconstructs its inputs, as can be seen in Fig. 8 for the axial (a), sagittal (b), and coronal (c) axes, respectively. This gives important information about the learned features and helps verify that the model is not overfitting.
Fig. 8. Autoencoder’s reconstructed images from the given input images in the axial (a), sagittal (b), and coronal (c) axes, respectively. It should be noted that slightly blurry images are desired for a well-generalized model, rather than a perfect replica of the original input.
Autoencoders produce a generalized reconstruction of the initial input rather than an exact copy (which would mean the system has overfit instead of generalizing to the input dataset), which is why the output generally appears blurry. The proposed model produces the desired output: the images are slightly blurry, and the accuracy is not negatively affected by overtraining.
4.3 Results
The proposed model was compared with other state-of-the-art classification models, namely VGG16 [32], GoogLeNet (Inception) [33], and AlexNet [34]; the comparison can be seen in Table 3. The results show that the proposed autoencoder provides higher classification accuracy than the compared models. In model inference, the accuracy of the model is calculated as,
\(\text { Accuracy }=\frac{\text { True Positive }+\text { True Negative }}{\text { True Positive }+\text { True Negative }+\text { False Positive }+\text { False Negative }}\) (8)
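Equivalently, accuracy is the fraction of correctly classified test samples; a small sketch with hypothetical labels:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true labels, Eq. (8)."""
    return np.mean(y_true == y_pred)

# Hypothetical labels: 0 = AD, 1 = MCI, 2 = NC
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])
print(accuracy(y_true, y_pred))   # 0.833...
```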
Table 3. Accuracy comparison between different deep learning models and axes.
The proposed model was able to extract more meaningful features from the created sagittal images than from the original axial images. This means useful spatial information is carried in the other planes.
5. CONCLUSION
In this work, a stacked convolutional autoencoder was proposed for AD classification. The proposed model was found to extract features better than the benchmark models, which is reflected in its higher classification accuracy. This shows that autoencoders are still very powerful tools for classification purposes. It was also found that sagittal images created from the 3D brain volume give better overall accuracy than the axial images from which they were created. More research into why the sagittal images give better accuracy will be conducted.
For future work, more transition stages of AD will be introduced, along with innovative approaches for higher classification accuracy. Ways to make the model more lightweight will also be researched.
References
- Alzheimer's Association, "2019 Alzheimer's Disease Facts and Figures," Journal of Alzheimer's and Dementia, Vol. 15, No. 3, pp. 321-387, 2019. https://doi.org/10.1016/j.jalz.2019.01.010
- R. Brookmeyer, E. Johnson, K. Ziegler-Graham, and H.M. Arrighi, "Forecasting the Global Burden of Alzheimer's Disease," Journal of Alzheimer's and Dementia, Vol. 3, No. 3, pp. 186-191, 2007. https://doi.org/10.1016/j.jalz.2007.04.381
- G. Frisoni, N.C. Fox, C. Jack, P. Scheltens, and P. Thompson, "The Clinical Use of Structural MRI in Alzheimer's Disease," Journal of Nature Reviews Neurology, Vol. 6, No.3, pp. 78-87, 2010. https://doi.org/10.1038/nrneurol.2009.217
- G. Litjens, T. Kooi, B.E. Bejnordi, A.A.A. Setio, F. Ciompi, M. Ghafoorian, et al., "A Survey on Deep Learning in Medical Image Analysis," Journal of Medical Image Analysis, Vol. 42, No. 12, pp. 60-88, 2017. https://doi.org/10.1016/j.media.2017.07.005
- J. Yanase and E. Triantaphyllou, "A Systematic Survey of Computer-aided Diagnosis in Medicine: Past and Present Developments," Journal of Expert Systems with Applications, Vol. 138, No. 12, pp. 112-150, 2019.
- A. Tufail, C. Rudisill, C. Egan, V.V. Kapetanakis, S. Vega-Salas, C.G. Owen, et al., "Automated Diabetic Retinopathy Image Assessment Software: Diagnostic Accuracy and Cost-Effectiveness Compared with Human Graders," Journal of Ophthalmology, Vol. 124, No. 3, pp. 343-351, 2017. https://doi.org/10.1016/j.ophtha.2016.11.014
- S.Y. Kwon, Y.J. Kim, and G.G. Kim, "An Automatic Breast Mass Segmentation Based on Deep Learning on Mammogram," Journal of Korea Multimedia Society, Vol. 21, No. 12, pp. 1363-1369, 2018. https://doi.org/10.9717/KMMS.2018.21.12.1363
- Y. Fan, S.M. Resnick, X. Wu, and C. Davatzikos, "Structural and Functional Biomarkers of Prodromal Alzheimer's Disease: A High-dimensional Pattern Classification Study," Journal of Neuroimage, Vol. 41, No. 2, pp. 277-285, 2008. https://doi.org/10.1016/j.neuroimage.2008.02.043
- K. Hu, Y. Wang, K. Chen, L. Hou, and X. Zhang, "Multi-scale Features Extraction from Baseline Structure MRI for MCI Patient Classification and AD Early Diagnosis," Journal of Neurocomputing, Vol. 175, No. 1, pp. 132-145, 2016. https://doi.org/10.1016/j.neucom.2015.10.043
- R. Filipovych, C. Davatzikos, and the Alzheimer's Disease Neuroimaging Initiative, "Semi-supervised Pattern Classification of Medical Images: Application to Mild Cognitive Impairment (MCI)," Journal of Neuroimage, Vol. 55, No. 3, pp. 1109-1119, 2011. https://doi.org/10.1016/j.neuroimage.2010.12.066
- A.K. Ambastha, "A Deep Learning Approach to Neuroanatomical Characterization of Alzheimer's Disease," Journal of Studies in Health Technology and Informatics, Vol. 245, No. 1, pp. 1249-1249, 2017.
- K.R. Gray, R. Wolz, R.A. Heckemann, P. Aljabar, A. Hammers, D. Rueckert, et al., "Multi-region Analysis of Longitudinal FDG-PET for Classification of Alzheimer's Disease," Journal of Neuroimage, Vol. 60, No. 1, pp. 221-229, 2012. https://doi.org/10.1016/j.neuroimage.2011.12.071
- C. Davatzikos, P. Bhatt, L.M. Shaw, K.N. Batmangehelich, and J.Q. Trojanowski, "Prediction of MCI to AD Conversion, via MRI, CSF Biomarkers, and Pattern Classification," Journal of Neurobiology of Aging, Vol. 32, No. 12, pp. 19-27, 2011.
- H. Suk, C.Y. Wee, and D. Shen, "Discriminative Group Sparse Representation for Mild Cognitive Impairment Classification," Proceeding of 4th International Workshop on Machine Learning in Medical Imaging, Vol. 8134, pp. 131-138, 2013.
- R.J. Perrin, A.M. Fagan, and D.M. Holtzman, "Multimodal Techniques for Diagnosis and Prognosis of Alzheimer's Disease," Nature Journal, Vol. 461 No. 7266, pp. 916-922, 2009. https://doi.org/10.1038/nature08538
- H. Choi and K.H. Jin, "Predicting Cognitive Decline with Deep Learning of Brain Metabolism and Amyloid Imaging," Journal of Behavioural Brain Research, Vol. 344, pp. 103-109, 2018. https://doi.org/10.1016/j.bbr.2018.02.017
- M. Liu, J. Zhang, E. Adeli, and D. Shen, "Landmark-based Deep Multi-instance Learning for Brain Disease Diagnosis," Journal of Medical Image Analysis, Vol. 43, No. 1, pp. 157-168, 2017.
- J. Islam and Y. Zhang, "Brain MRI Analysis for Alzheimer's Disease Diagnosis Using an Ensemble System of Deep Convolutional Neural Networks," Journal of Brain Informatics, Vol. 5, No. 2, pp. 1-14, 2018.
- T. Brosch and R. Tam, "Manifold Learning of Brain MRIs by Deep Learning," Proceeding of the International Conference on Medical Image Computing and Computer-assisted Intervention, Vol. 16, No. 9, pp. 633-640, 2013.
- H.I. Suk, S.W. Lee, and D. Shen, "Hierarchical Feature Representation and Multimodal Fusion with Deep Learning for AD/MCI Diagnosis," Journal of Neuroimage, Vol. 101, No. 11, pp. 569-582, 2014. https://doi.org/10.1016/j.neuroimage.2014.06.077
- H.I. Suk, D. Shen, and Alzheimer's Disease Neuroimaging Initiative, "Deep Learning-based Feature Representation for AD/MCI Classification," Proceeding of International Conference on Medical Image Computing and Computer-assisted Intervention, Vol. 16, No. 7, pp. 583-590, 2013.
- H.I. Suk, S.W. Lee, and D. Shen, "Latent Feature Representation with Stacked Auto-encoder for AD/MCI Diagnosis," Journal of Brain Structure and Function, Vol. 220, No. 2, pp. 841-859, 2013.
- D. Cardenas-Pena, D. Collazos-Huertas, and G. Castellanos-Dominguez, "Centered Kernel Alignment Enhancing Neural Network Pretraining for MRI-based Dementia Diagnosis," Journal of Computational and Mathematical Methods in Medicine, Vol. 3, No. 4, pp. 1-10, 2016.
- J. Schmidhuber, "Deep Learning in Neural Networks: An Overview," Journal of Neural Networks, Vol. 61, No. 1, pp. 85-117, 2015. https://doi.org/10.1016/j.neunet.2014.09.003
- O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," Proceeding of International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234-241, 2015.
- S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," Proceeding of the International Conference on Machine Learning, pp. 448-456, 2015.
- N. Srivastava, G.E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting," Journal of Machine Learning Research, Vol. 15, No. 1, pp. 1929-1958, 2014.
- H.W. Noh, S.H. Hong, and B.H. Han, "Learning Deconvolution Network for Semantic Segmentation," Proceeding of the International Conference on Computer Vision, pp. 1520-1528, 2015.
- A. Krizhevsky, I. Sutskever, and G.E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Proceeding of Neural Information Processing Systems, pp. 1097-1105, 2012.
- M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, et al., "Tensorflow: Large-scale Machine Learning on Heterogeneous Systems," https://www.tensorflow.org, (accessed April 12, 2019).
- F. Chollet, "Keras," https://github.com/keras-team/keras, (accessed April 15, 2019).
- K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-scale Image Recognition," Journal of Computing Research Repository (CoRR), Vol. 1409, No. 1556, pp. 1-14, 2014.
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S.E. Reed, D. Anguelov, et al., "Going Deeper with Convolutions," Proceeding of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, 2015.
- A. Krizhevsky, I. Sutskever, and G.E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Proceeding of Neural Information Processing Systems, pp. 1097-1105, 2012.