1. Introduction
Image segmentation is an important preprocessing step in image recognition and computer vision. With the development of medical imaging technology, image segmentation has become highly significant in medical applications. Many methods have been applied to medical image analysis, including traditional methods based on statistical analysis and partial differential equations. With the emergence of convolutional neural networks (CNNs), a growing number of segmentation methods based on deep learning have been proposed, and they have gradually replaced the traditional algorithms as the mainstream. The emergence of U-Net established the use of convolutional neural networks for segmenting biomedical images.
U-Net is a segmentation network proposed by Ronneberger et al. in the ISBI Challenge [1] and inspired by the FCN [2]. Fig. 1 shows its overall structure. The network is mainly composed of two components: a contracting path and an expanding path. The contracting path captures the context information of the image. The input passes through two 3×3 convolution layers and a max-pooling layer; this process is repeated four times, and the feature maps are reduced to 1⁄64. The expanding path precisely locates the part of the image to be segmented. The output of the contracting path passes through a deconvolution layer, which expands its size, and two 3×3 convolution layers; this process is repeated four times to restore the feature maps to the original size. Skip connections are also used to transfer the shallow feature maps symmetrically to the corresponding deeper layers, which makes better use of information at different scales. In many cases, training deep learning networks requires large data sets, and biomedical data (images and texts) are costly to obtain. U-Net is very effective for segmenting medical images with few samples and also has good noise immunity: to a certain extent, noisy images have little influence on the training process. However, the model also has problems: 1) most medical images have weak edges, which makes accurate classification difficult for the network and causes a partial loss of detail; 2) structurally, simply stacking convolution layers can improve the expressive ability of the network, but it adds a large number of parameters and makes the network difficult to train. To date, many improved versions of U-Net have been proposed [3, 4, 10, 12, 13].
Fig. 1. The U-Net structure, which includes convolution, max-pooling, concatenation and deconvolution.
Chen L et al. proposed DRINet [3] and a fully automatic acute ischemic lesion segmentation model (EDD+MUSCLE Net) [4]. DRINet draws on the ideas of DenseNet [5], ResNet [6] and Inception V1 [7] and adopts dense connection blocks, residual inception blocks and unpooling blocks. Compared to U-Net, DRINet has no skip connections; the connections are packaged in blocks, which makes the method more flexible and yields more accurate segmentation results. The EDD+MUSCLE Net [4] combines EDD Net with MUSCLE Net. EDD Net consists of two parallel fully convolutional network architectures whose outputs are combined into an integrated segmentation result. MUSCLE Net, composed of a mini VGG-Net [8, 9], is used to distinguish true positives from false positives. The fully automatic acute ischemic lesion segmentation model has strong segmentation and recognition ability on tumor images. However, some details are lost in the segmentation results because of the simple network structure and the small number of layers.
Diakogiannis F I et al. proposed ResUNet-a [10], which introduces residual blocks to effectively alleviate vanishing or exploding gradients. A pyramid scene parsing pooling layer exploits background context information [11], which strengthens the use of feature information across the whole network and improves its performance. They also improved the Dice loss function to speed up the convergence of the network. However, the segmentation results are still not ideal in the details.
MDU-Net was proposed by Zhang J et al. [12], who created three multi-scale densely connected structures: the Dense Encoder and Decoder Block, the Dense Cross-connections Block and the Fully Multi-scale Dense connected U-shape architecture. Dense connections improve gradient back-propagation, and they are applied throughout the U-shaped structure to make training easier. Zhou Z et al. created a nested U-Net architecture [13] that integrates feature information from different levels. Deep supervision applied at the end of the structure helps the weights update more quickly during training. Both methods use a variety of skip connections, which also leads to a large amount of computation during training because of the large number of parameters.
Badrinarayanan V et al. proposed SegNet [14], which is very similar to FCN. The pooling indices reused in the upsampling layers make the model easier to optimize. PSPNet [15] introduces a pyramid pooling block that aggregates context information from different regions, which enhances the use of global information and greatly improves scene parsing performance. However, although these two methods perform well in scene image segmentation, they perform poorly in medical image segmentation.
DeepLab [16] applies atrous convolution, which expands the receptive field while keeping the amount of computation the same as for ordinary convolution. A larger receptive field can extract more global information from images. The authors improved boundary localization by introducing a fully connected Conditional Random Field (CRF) at the final output of the network. Although the model is successful, it still has at least two limitations. First, it needs to perform convolutions on a large number of detailed feature maps that usually have high-dimensional features, which is computationally expensive. Second, a large number of high-dimensional, high-resolution feature maps also require a large amount of GPU memory [17].
2. Related Work
Brain diseases are a major threat to human health and are difficult to cure. Their accurate diagnosis and efficient treatment have always been a medical challenge. In the first year of life the brain develops rapidly, and some disorders, such as attention deficit hyperactivity disorder (ADHD), infantile autism [18], bipolar affective disorder and schizophrenia, may be reflected in pathological brain tissue during the patient's infancy. Therefore, accurate segmentation of brain tissues in infant images is important.
T1-weighted (T1-w) and T2-weighted (T2-w) multimodal magnetic resonance (MR) images of the infant brain are non-invasive and commonly available, which provides sufficient data and good conditions for our segmentation study. Our main task is to accurately segment three types of brain tissue: cerebrospinal fluid (CSF), gray matter (GM) and white matter (WM). This segmentation is also very important for registration [19] and atlas building [20, 21]. However, T1-w and T2-w infant brain MR images suffer from low contrast and uneven gray levels. Fig. 2 shows the gray-level distributions of the tissues in the MR images, from which it can be observed that the gray values of the three tissues overlap heavily. In the T1-w distribution, the gray values of GM and WM largely overlap, and in the T2-w distribution all tissues largely overlap. This indicates very low contrast, which is the main challenge of the tissue segmentation task. Traditional methods struggle with such problems, which led us to use deep learning for this task.
Fig. 2. T1- and T2-weighted infant MR images in the isointense phase, which differ clearly in contrast. Shown to the left of the MR images are the gray-level distributions of the brain tissues; the distributions of WM and GM overlap heavily.
In deep learning networks, a large number of layers are stacked so that more features can be extracted and the network gains stronger expressive ability. However, increasing the number of layers also introduces many parameters, which makes training more difficult, slows convergence and can lead to lower accuracy than shallower networks. Residual learning effectively addresses this problem [6]. In each bottleneck block, the output is the sum of the convolution result and an identity mapping. The skip connection means that the convolution kernels only learn the residual between the input and the output, which keeps the gradient from vanishing during back-propagation and therefore makes the network easier to train. Inception v1 [7] was proposed to break with the conventional convolution pattern. The inception module substantially reduces the number of parameters and saves computational overhead. At the same time, its design effectively broadens the range of features that can be expressed.
U-Net and its variants tend to lose details on many image segmentation data sets, and some of them also suffer from long training times, vanishing gradients and slow convergence. To address these problems, we propose a triple residual multiscale fully convolutional network (TRMFCN) with three inputs, which can effectively extract features at multiple scales. Moreover, we introduce the Residual Multiscale (RM) block to ease convergence and apply the Concatenate Block to extract more information. Our main contributions are: 1) the multi-scale input method increases the utilization of image information; 2) the Residual Multiscale (RM) block structure improves computational efficiency; 3) sufficient concatenations between layers enhance the reusability of feature information.
3. TRMFCN
3.1 Triple-Branched model
Fig. 3 depicts the overall structure of the proposed model (TRMFCN), which consists of an encoding process and a decoding process. The encoding process includes conventional 2D convolution, max-pooling and residual multiscale (RM) blocks. The decoding process includes deconvolution, RM blocks, concatenate blocks and conventional 2D convolution. The RM block is inspired by ResNet [6] and Inception V1 [7], and the concatenate block is inherited from U-Net.
Fig. 3. The structure of TRMFCN. The RM Block and the Concatenate Block are the key components of the structure; their details are given in Fig. 4 and Fig. 5, respectively.
In the TRMFCN structure, a 2D convolution with a 3 × 3 kernel is placed before every RM block to increase the number of feature maps. The number of kernels increases to 60, 120, 240 and 480 in the encoding process and decreases to 240, 120 and 60 in the decoding process. To let the RM blocks extract enough features, we place four 2D convolutions with a kernel size of 3 × 3 in front of the model output. We also place four 2D convolutions with a kernel size of 1 × 1 at the end of the model to obtain the desired classification results (four classes). All deconvolution strides are set to 2 so that the image size is restored step by step; after three upsampling steps, the feature maps are restored to the original image size. The advantages of this design are: 1) the form of input strengthens information extraction from the data; 2) the input branches can be adjusted for different data sets, so the structure is flexible.
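The size bookkeeping described above can be checked with a short sketch. It is only an illustration under stated assumptions: the helper name `trace_shapes` is hypothetical, each pooling step is assumed to halve both spatial dimensions, and three pooling steps are inferred from the three upsampling steps mentioned in the text. The 192 × 144 input size is the one used later in the experiments.

```python
# Hypothetical helper: tracks (stage, height, width, channels) through the
# encoder and decoder. Pooling is ASSUMED to halve each spatial dimension.
def trace_shapes(height, width,
                 encoder_filters=(60, 120, 240, 480),
                 decoder_filters=(240, 120, 60)):
    """Return a list of (stage, height, width, channels) tuples."""
    shapes = []
    h, w = height, width
    for i, filters in enumerate(encoder_filters):
        shapes.append((f"enc{i + 1}", h, w, filters))
        # Pool between encoder stages, but not after the deepest one.
        if i < len(encoder_filters) - 1:
            if h % 2 or w % 2:
                raise ValueError("spatial size must be even at every pooling step")
            h, w = h // 2, w // 2
    for i, filters in enumerate(decoder_filters):
        # A deconvolution with stride 2 doubles the spatial size.
        h, w = h * 2, w * 2
        shapes.append((f"dec{i + 1}", h, w, filters))
    return shapes


for stage in trace_shapes(192, 144):
    print(stage)
```

Under these assumptions the deepest stage works on 24 × 18 feature maps with 480 channels, and the last decoder stage is back at 192 × 144 with 60 channels.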
3.2 Residual Multiscale block
In feature extraction, a large convolution kernel lets the model learn large-scale features, and a small kernel lets it learn fine details. Fig. 4 shows the residual multiscale (RM) block: the basic convolution is split into three convolutions with different kernel sizes, and the number of kernels in each is one third of the number of input channels. Finally, the concatenation of all convolution results and the input are combined through the shortcut (addition). The essence of the shortcut connection is the identity mapping. The block output can be expressed as follows:
\(x_{k+1}=g\left(f_{1}\left(x_{k}\right) \odot f_{3}\left(x_{k}\right) \odot f_{5}\left(x_{k}\right),\ x_{k}\right)\) (1)
where \(\odot\) indicates concatenation and \(g(\cdot)\) denotes the shortcut connection, which adds the identity-mapped input \(x_{k}\) to the concatenated result. \(f_{1}(\cdot)\), \(f_{3}(\cdot)\) and \(f_{5}(\cdot)\) represent convolutions with 1 × 1, 3 × 3 and 5 × 5 kernels, respectively, each applied after batch normalization (BN) [22] and a rectified linear unit (ReLU) [23].
Fig. 4. The residual multiscale block combines the inception module with a residual connection: the branches have convolution kernels of different sizes, and a skip connection is applied after the feature maps are concatenated.
Unlike a conventional 2D convolution, our block can extract more information from feature maps. The residual connections [24] make training easier, and the multiscale convolutions let the model learn more features.
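The reason each branch uses one third of the input channels can be made concrete with a small channel-counting sketch. It is not the authors' implementation: the function name `rm_block_channels` is hypothetical, and "same" padding is assumed so that only the channel dimension matters. Because the shortcut is an addition, the concatenated branches must have exactly as many channels as the input.

```python
def rm_block_channels(in_channels, kernel_sizes=(1, 3, 5)):
    """Channel bookkeeping for one RM block (Eq. 1), assuming 'same' padding."""
    if in_channels % len(kernel_sizes) != 0:
        raise ValueError("input channels must be divisible by the number of branches")
    # Each branch gets one third of the input channels.
    per_branch = in_channels // len(kernel_sizes)
    branches = {f"conv{k}x{k}": per_branch for k in kernel_sizes}
    concatenated = sum(branches.values())
    # The shortcut is an element-wise addition, so channel counts must match.
    assert concatenated == in_channels
    return branches, concatenated


# Kernel counts used in the encoding process of TRMFCN.
for channels in (60, 120, 240, 480):
    branches, out_channels = rm_block_channels(channels)
    print(channels, branches, out_channels)
```

All four kernel counts used in the encoder (60, 120, 240 and 480) are divisible by three, so the addition is well defined at every stage.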
3.3 Concatenate Block
Fig. 5 shows the structure of the concatenate block, in which we concatenate feature maps of the same size from the encoding and decoding processes to complete feature fusion. Compared with U-Net, the fused elements from the encoding process have two more parts; Fig. 1 and Fig. 3 clearly show the difference. The concatenate layer fuses many more shallow features, which improves feature reuse and benefits the decoding process.
Fig. 5. A is the encoding part of TRMFCN, B is its decoding part, and the Concatenate block represents the way of feature fusion.
3.4 Fully convolutional networks
A convolutional neural network (CNN) usually appends several fully connected layers after a series of convolutions and finally obtains a feature vector for the whole image, namely a probability vector, which is used for image-level classification or regression. FCN [2] performs pixel-level classification: it replaces the fully connected layers with convolution and deconvolution, so that the feature maps are restored to the original image size. The computation of each layer can be expressed as follows:
\(C_{i}=W_{i} \otimes C_{i-1}+b_{i}\) (2)
\(C_{i}=\sigma\left(C_{i}\right)\) (3)
where \(C_{i}\) denotes the result of the \(i\)-th convolution, \(W_{i}\) denotes the \(i\)-th convolution weight vector, \(b_{i}\) denotes the \(i\)-th convolution bias, \(\otimes\) is the convolution operation, and \(\sigma(\cdot)\) is the activation function. The objective function, namely the energy function, is defined as:
\(E=-\sum_{I_{i j} \in I} \sum_{c=1}^{C} q_{c}\left(I_{i j}\right) \log p_{c}\left(I_{i j}\right) \) (4)
Here \(p_{c}(I_{ij})\) is the output probability that pixel \(I_{ij}\) belongs to class \(c\), and \(q_{c}(I_{ij})\) is the true probability distribution.
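As a worked illustration of Eq. (4), the sketch below evaluates the energy for a handful of pixels in plain Python. It assumes a one-hot true distribution (each pixel has exactly one correct class), so only the true-class term of the inner sum survives; the function name, the clipping constant `eps` and the toy probabilities are illustrative choices, not taken from the paper.

```python
import math


def pixel_cross_entropy(probabilities, labels, eps=1e-12):
    """Eq. (4) with a one-hot q: sum over pixels of -log p_c for the true class c."""
    energy = 0.0
    for pixel_probs, true_class in zip(probabilities, labels):
        # Clip to eps so that a zero probability does not produce log(0).
        energy -= math.log(max(pixel_probs[true_class], eps))
    return energy


# Two pixels, four classes; the numbers are made up for illustration.
probs = [[0.10, 0.70, 0.10, 0.10],
         [0.25, 0.25, 0.25, 0.25]]
labels = [1, 2]
print(pixel_cross_entropy(probs, labels))
```

A confident correct prediction contributes almost nothing to the energy, while a uniform prediction over four classes contributes log 4 per pixel.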
3.5 Evaluation metric
To compare the segmentation results of different models, we use the Jaccard similarity (Js value) [25], which is defined as:
\(J_{s}\left(S_{1}, S_{2}\right)=\frac{\left|S_{1} \cap S_{2}\right|}{\left|S_{1} \cup S_{2}\right|}\) (5)
where \(S_{1}\) denotes the segmentation result, \(S_{2}\) denotes the ground truth, and \(|\cdot|\) is the number of pixels in a set. A higher Js value indicates that the model performs better in the test. In the subsequent evaluation, the Js values of CSF, GM and WM are taken as the main indicators of model quality. The three values are expressed as follows:
\(Js_{CSF}\left(S_{CSF1}, S_{CSF2}\right)=\frac{\left|S_{CSF1} \cap S_{CSF2}\right|}{\left|S_{CSF1} \cup S_{CSF2}\right|}\) (6)
\(Js_{GM}\left(S_{GM1}, S_{GM2}\right)=\frac{\left|S_{GM1} \cap S_{GM2}\right|}{\left|S_{GM1} \cup S_{GM2}\right|}\) (7)
\(Js_{WM}\left(S_{WM1}, S_{WM2}\right)=\frac{\left|S_{WM1} \cap S_{WM2}\right|}{\left|S_{WM1} \cup S_{WM2}\right|}\) (8)
where \(S_{CSF1}\) denotes the segmentation result of cerebrospinal fluid (CSF) and \(S_{CSF2}\) the ground truth of CSF; \(S_{GM1}\) denotes the segmentation result of gray matter (GM) and \(S_{GM2}\) the ground truth of GM; \(S_{WM1}\) denotes the segmentation result of white matter (WM) and \(S_{WM2}\) the ground truth of WM.
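The per-tissue Js values can be computed from two label maps as in the following minimal sketch. The label encoding (0 = background, 1 = CSF, 2 = GM, 3 = WM), the flattened toy maps and the convention for a class that is absent from both maps are assumptions made for illustration only.

```python
def jaccard(prediction, truth, label):
    """Js for one tissue label: |S1 ∩ S2| / |S1 ∪ S2| over pixel index sets."""
    predicted = {i for i, value in enumerate(prediction) if value == label}
    actual = {i for i, value in enumerate(truth) if value == label}
    union = predicted | actual
    if not union:
        # Assumption: a class absent from both maps counts as a perfect match.
        return 1.0
    return len(predicted & actual) / len(union)


# Hypothetical label encoding and flattened toy label maps.
LABELS = {"CSF": 1, "GM": 2, "WM": 3}
prediction = [0, 1, 1, 2, 2, 3, 3, 3]
truth = [0, 1, 2, 2, 2, 3, 3, 0]

scores = {name: jaccard(prediction, truth, label) for name, label in LABELS.items()}
print(scores)
```

In this toy case CSF scores 1/2 (one shared pixel out of two in the union), and GM and WM each score 2/3.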
4. Experiments
4.1 Segmentation in infant brain MR images
Data preprocessing: The data set used in the experiment is from the iSeg-2017 challenge; the infants were scanned at an average age of 6 months [26] and had no pathology. We selected the 10 labeled subjects, each with 256 groups (256 brain MR T1-w images, 256 brain MR T2-w images and the corresponding label images). Since about two-thirds of the images for each subject are entirely black background (all pixel values are 0), we removed these images to avoid interfering with network training. In the end, a total of 996 groups were obtained and shuffled. About 10% (96 groups) were selected as the test set and another 10% as the validation set.
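The preprocessing steps above (drop all-black groups, shuffle, split) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the rule that a group is dropped when both its T1-w and T2-w images are entirely zero, the fixed random seed, and taking the validation set from the shuffled groups right after the test set are all assumptions.

```python
import random


def filter_and_split(groups, n_test=96, val_fraction=0.1, seed=0):
    """groups: list of (t1, t2, label) tuples; each image is a 2D list of pixels."""
    def is_blank(image):
        return all(pixel == 0 for row in image for pixel in row)

    # Assumption: drop a group when both modalities are entirely black.
    kept = [g for g in groups if not (is_blank(g[0]) and is_blank(g[1]))]
    random.Random(seed).shuffle(kept)

    n_val = round(val_fraction * len(kept))
    test = kept[:n_test]
    val = kept[n_test:n_test + n_val]
    train = kept[n_test + n_val:]
    return train, val, test


# Toy data: 10 informative groups and 20 all-black groups of 2 x 2 images.
blank = [[0, 0], [0, 0]]
brain = [[0, 5], [7, 0]]
groups = [(brain, brain, brain)] * 10 + [(blank, blank, blank)] * 20
train, val, test = filter_and_split(groups, n_test=1, val_fraction=0.1)
print(len(train), len(val), len(test))
```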
Model training: The encoding structure of our model (TRMFCN) has three inputs: the first is the T1-w image with a size of 192 × 144 × 1; the second is the T2-w image, also with a size of 192 × 144 × 1; the third is the concatenation of the T1-w and T2-w images, with a size of 192 × 144 × 2. The comparison methods have only one input, so for them we concatenate each T1-w image with the corresponding T2-w image into an input of size 192 × 144 × 2.
The Adam optimizer is used to train our model. We set the mini-batch size to 10 and the number of iterations to 80. We also apply dropout [27] to prevent overfitting, with a dropout rate of 0.3 applied to the output of the residual multiscale block. The experiments were implemented in Python with the Keras package. Training lasts about 2 hours, with an average of 90 seconds per iteration. The entire training process was run on an NVIDIA GeForce GTX 1080 Ti GPU (11 GB).
Experimental results: We used seven methods for comparison: FCN, U-Net, DRINet, EDD+MUSCLE Net, MDU-Net, ResUNet-a and U-Net++. Each method was trained 8 times and the best model was saved each time; we then randomly selected one of them for testing and comparison. We used the trained models to test 96 T1-w images and the corresponding 96 T2-w images. The Js values (CSF, GM and WM) of the test results are listed in Table 1.
Table 1. Comparison of the mean Js values (%) of CSF, GM and WM for the eight methods
The seven comparison results show clear differences. First, the Js values of U-Net are far superior to those of FCN in all classes (CSF, GM and WM), which indicates the positive effect of concatenation in segmentation. Although DRINet builds a simple network structure in a modular way and removes the skip connections, it does not perform as well as the original U-Net on this data set. The other three methods, EDDNet, MDUNet and U-Net++, also perform worse than U-Net. In contrast, ResUNet-a outperforms U-Net on WM and GM, which indicates that the residual connection improves the model to some extent.
Compared with these seven methods, our proposed model achieves the best results, obtaining the highest Js values (CSF, GM and WM) among the eight methods. We attribute the improved accuracy to two main reasons: 1) residual learning makes our model converge faster; 2) the various concatenations allow the network to learn more features from shallow layers.
We randomly selected one image from the 96 test results to compare the segmentation results of the different methods visually. Fig. 6 (a-g) shows the segmentation results of seven methods: Fig. 6 (a-f) shows the six comparison methods and Fig. 6 (g) shows our method. Fig. 6 (h) is the corresponding label. Because the segmentation results of FCN are poor, we do not show them. We marked the same representative position in the label and in all segmentation images with a dark red box. Comparing the contents of these boxes, we find that the white matter and cerebrospinal fluid bands are clearly broken in the segmentation images of the six comparison methods, which lose some details of the brain tissue. These methods therefore do not perform as well as ours in segmenting infant brain MR images.
Fig. 6. Comparison of seven segmentation methods. From (a) to (h): U-Net, DRINet, EDDNet, MDUNet, ResUNet-a, U-Net++, the proposed method, and the ground truth. The dark red box marks a region with obvious segmentation errors.
We computed the accuracy (Acc) and variance (Var) of the 350 test results, which are listed in Table 2. The accuracy (Acc) can be expressed as follows:
\(Acc\left(S_{1}^{\prime}, S_{2}^{\prime}\right)=\frac{\left|S_{1}^{\prime} \cap S_{2}^{\prime}\right|}{\left|S_{2}^{\prime}\right|}\) (9)
where \(S_{1}^{\prime}\) denotes the segmentation result without the black background, \(S_{2}^{\prime}\) denotes the corresponding ground truth without the black background, and \(|\cdot|\) is the number of pixels in a set. Our model obtained the lowest variance in accuracy (Acc), CSF and GM, which shows that it is robust.
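One possible reading of Eq. (9) is sketched below: the accuracy is taken as the fraction of non-background ground-truth pixels whose predicted label matches the ground truth. This interpretation, the background label 0, the toy label maps and the use of the population variance are assumptions for illustration; the paper does not specify these details.

```python
from statistics import pvariance


def foreground_accuracy(prediction, truth, background=0):
    """Eq. (9) as read above: correctly labelled foreground pixels / foreground pixels."""
    foreground = [i for i, value in enumerate(truth) if value != background]
    if not foreground:
        raise ValueError("ground truth contains no foreground pixels")
    correct = sum(1 for i in foreground if prediction[i] == truth[i])
    return correct / len(foreground)


# Flattened toy label maps (0 = background, 1 = CSF, 2 = GM, 3 = WM).
prediction = [0, 1, 1, 2, 2, 3, 3, 3]
truth = [0, 1, 2, 2, 2, 3, 3, 0]
acc = foreground_accuracy(prediction, truth)
print(acc)

# Variance over per-image accuracies (made-up values, population variance).
per_image_acc = [0.9, 0.8, 1.0]
print(pvariance(per_image_acc))
```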
Table 2. Accuracy (%) of the methods and variances (\(10^{-4}\)) of the Js values
4.2 Segmentation in 2.5D infant brain MR images
Data preprocessing: To obtain complete neighborhood information, we keep all the black-background images, which gives 2560 groups of images. The extra black-background images are used only as neighborhood information. The test set (96 groups) is the same as in Section 4.1.
Model training: We changed the input of the model to a 2.5D form [28], which lets the network learn neighborhood feature information of the image and thereby improves segmentation accuracy. Fig. 7 shows the difference between the 2D input and the 2.5D input. The 2D input has only one channel, whereas the 2.5D input has \(2k+1\) channels (the image itself plus the \(k\) images above and the \(k\) images below), where \(k\) is the number of neighboring images on each side. In the experiment we set \(k=2\). Therefore, in our method the first input is five T1-w images (192 × 144 × 5), the second input is five T2-w images (192 × 144 × 5), and the third input is the concatenation of the first two (192 × 144 × 10). The input of all comparison methods is the same as the third input of our method. For training, we again use the Adam optimizer with a mini-batch size of 10 and 80 iterations. The dropout rate is set to 0.3 after the residual multiscale block.
Fig. 7. The left half is the normal 2D input and the right half is the 2.5D input.
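The construction of a 2.5D input from a stack of slices can be sketched as follows. The function name `stack_2p5d` is hypothetical, and padding with all-zero slices beyond the ends of the volume is an assumption (the text only states that black background images are kept as neighborhood information). The sketch returns the channels as a list ordered from the slice k positions before the centre to the slice k positions after it; a channels-last array such as 192 × 144 × 5 would be obtained by stacking these along the last axis.

```python
def stack_2p5d(volume, index, k=2):
    """Return the 2k+1 slices centred on `index` (neighbours outside are zeros)."""
    if not 0 <= index < len(volume):
        raise IndexError("slice index out of range")
    rows, cols = len(volume[0]), len(volume[0][0])
    # Assumption: neighbours outside the volume are all-zero (black) slices.
    blank = [[0] * cols for _ in range(rows)]
    channels = []
    for offset in range(-k, k + 1):
        j = index + offset
        channels.append(volume[j] if 0 <= j < len(volume) else blank)
    return channels


# Toy volume: 6 slices of size 2 x 2, slice s filled with the value s + 1.
volume = [[[s + 1] * 2 for _ in range(2)] for s in range(6)]

first = stack_2p5d(volume, index=0, k=2)
middle = stack_2p5d(volume, index=3, k=2)
print(len(first))                      # number of channels, 2k + 1
print([ch[0][0] for ch in first])      # source of each channel (0 = padding)
print([ch[0][0] for ch in middle])
```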
Experimental results: With the same number of iterations, our method achieves the highest Js values among the eight methods; Table 3 lists the performance of each method. Comparing Table 1 and Table 3 shows that most models achieve better segmentation accuracy with the 2.5D input than with the 2D input. In Table 4, although our variances for white matter and accuracy are not the lowest, our model is overall still the most robust.
Table 3. Comparison of the mean Js values (%) of CSF, GM and WM for the eight 2.5D methods
Table 4. Accuracy (%) of the 2.5D methods and variances (\(10^{-4}\)) of the Js values
5. Discussion and Conclusion
In this paper, we propose TRMFCN, a fully convolutional network designed around the characteristics of multi-modal data. We introduce a new form of input, the residual multiscale (RM) block and the concatenate block. The RM block alleviates the vanishing gradient problem and makes network training more efficient. The concatenate block greatly enhances feature reuse, so that global feature information is learned more fully. Our model is flexible for multi-modal MR image data: if a new modality is added, we can add an extra input branch.
Our method can also be applied to single-modality MR image segmentation. To validate this, we selected a data set of adult brain MR images from the Internet Brain Segmentation Repository (IBSR), which contains 2304 adult brain MR images and 2304 corresponding labels. We again removed the 257 all-black background images. We then randomly selected 247 images as the test set and 1800 images as the training set, of which 200 images were used for validation. We fed the same brain MR image into all three inputs so that a single-modality image can enter the network. The Js values are listed in Table 5, where TRMFCN again achieves the best results.
Table 5. Comparison of the mean Js values (%) of CSF, GM and WM for the eight methods
Because the structure is extended and the number of concatenated feature maps increases, the number of parameters grows. To reduce the amount of computation, in future work we will improve the method, possibly using transfer learning [29], while preserving its segmentation performance.
References
- Ronneberger O, Fischer P, Brox T, "U-Net: Convolutional Networks for Biomedical Image Segmentation," in Proc. of MICCAI 2015. Springer International Publishing, 234-241, 2015.
- Long J, Shelhamer E, Darrell T, "Fully convolutional networks for semantic segmentation," IEEE, 3431-3440, 2015.
- Chen L, Bentley P, Mori K, et al., "DRINet for Medical Image Segmentation," IEEE Transactions on Medical Imaging, 2018.
- Chen L, Bentley P, Rueckert D, "Fully automatic acute ischemic lesion segmentation in DWI using convolutional neural networks," Neuroimage Clinical, 15, 633-643, 2017. https://doi.org/10.1016/j.nicl.2017.06.016
- Huang G, Liu Z, Laurens V D M, et al., "Densely Connected Convolutional Networks," 2016.
- He K, Zhang X, Ren S, et al., "Deep Residual Learning for Image Recognition," in Proc. of IEEE Conference on Computer Vision & Pattern Recognition. IEEE Computer Society, vol. 1, pp. 770-778, 2016.
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proc. of CVPR, pp. 1-9, 2015.
- Simonyan K, Zisserman A, "Very Deep Convolutional Networks for Large-Scale Image Recognition," Computer Science, 2014.
- Princy Matlani and Manish Shrivastava, "Hybrid Deep VGG-NET Convolutional Classifier for Video Smoke Detection," CMES: Computer Modeling in Engineering & Sciences, Vol.119, No.3, pp.427-458, 2019. https://doi.org/10.32604/cmes.2019.04985
- Diakogiannis F I, Waldner, Francois, Caccetta P, et al., "ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data," 2019.
- Lanlan Rui, Yabin Qin, Biyao Li and Zhipeng Gao, "Context-Based Intelligent Scheduling and Knowledge Push Algorithms for AR-Assist Communication Network Maintenance," CMES: Computer Modeling in Engineering & Sciences, Vol.118, No.2, pp.291-315, 2019. https://doi.org/10.31614/cmes.2018.04240
- Zhang J, Jin Y, Xu J, et al., "MDU-Net: Multi-scale Densely Connected U-Net for biomedical image segmentation," 2018.
- Zhou Z, Siddiquee M M R, Tajbakhsh N, et al., "UNet++: A Nested U-Net Architecture for Medical Image Segmentation," 2018.
- Badrinarayanan V, Kendall A, Cipolla R, "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Scene Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1-1, 2017.
- Zhao H, Shi J, Qi X, et al., "Pyramid Scene Parsing Network," 2016.
- Chen L C, Papandreou G, Kokkinos I, et al., "Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs," Computer Science, 2014(4), 357-361, 2014.
- Lin G, Milan A, Shen C, et al., "RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation," 2016.
- G. Li, et al., "Early Diagnosis of Autism Disease by Multi-Channel Cnns," Machine Learning in Medical Imaging, pp. 303-309, 2018.
- S. Hu, et al., "Learning-Based Deformable Image Registration for Infant MR Images in the First Year of Life," Medical Physics, vol. 44, pp. 158-170, Jan 2017. https://doi.org/10.1002/mp.12007
- F. Shi, et al., "Neonatal Atlas Construction Using Sparse Representation," Human Brain Mapping, vol. 35, pp. 4663-4677, Sep 2014. https://doi.org/10.1002/hbm.22502
- F. Shi, et al., "Construction of Multi-Region-Multi-Reference Atlases for Neonatal Brain MRI Segmentation," NeuroImage, vol. 51, pp. 684-693, Jun 2010. https://doi.org/10.1016/j.neuroimage.2010.02.025
- S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," ICML, pp. 448-456, 2015.
- Kazuhiko Kakuda, Tomoyuki Enomoto and Shinichiro Miura, "Nonlinear Activation Functions in CNN Based on Fluid Dynamics and Its Applications," CMES: Computer Modeling in Engineering & Sciences, Vol.118, No.1, pp.1-14, 2019. https://doi.org/10.31614/cmes.2019.04676
- He K, Zhang X, Ren S, et al., "Identity Mappings in Deep Residual Networks," 2016.
- Li C, Xu C, Anderson A W, et al., "MRI Tissue Classification and Bias Field Estimation Based on Coherent Local Intensity Clustering: A Unified Energy Minimization Framework," Springer Berlin Heidelberg, 288-299, 2009.
- L. Wang, et al., "Longitudinally Guided Level Sets for Consistent Tissue Segmentation of Neonates," Human Brain Mapping, vol. 34, pp. 956-972, Apr 2013. https://doi.org/10.1002/hbm.21486
- N. Srivastava, et al., "Dropout: A Simple Way to Prevent Neural Networks from Overfitting," The Journal of Machine Learning Research, vol. 15, pp. 1929-1958, 2014.
- Hu K, Liu C, Yu X, et al., "A 2.5D Cancer Segmentation for MRI Images Based on U-Net," in Proc. of 2018 5th International Conference on Information Science and Control Engineering (ICISCE), 2018.
- Zamir A, Sax A, Shen W, et al., "Taskonomy: Disentangling Task Transfer Learning," 2018.