• Title/Summary/Keyword: Fully convolutional Network

A Triple Residual Multiscale Fully Convolutional Network Model for Multimodal Infant Brain MRI Segmentation

  • Chen, Yunjie;Qin, Yuhang;Jin, Zilong;Fan, Zhiyong;Cai, Mao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.3 / pp.962-975 / 2020
  • Accurate segmentation of infant brain MR images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) is very important for the early study of brain growth patterns and morphological changes in neurodevelopmental disorders. Because of the inherent myelination and maturation process, the WM and GM of infants between 6 and 9 months of age exhibit similar intensity levels in both T1-weighted (T1w) and T2-weighted (T2w) MR images during the isointense phase, which makes brain tissue segmentation very difficult. We propose a deep network architecture based on U-Net, called the Triple Residual Multiscale Fully Convolutional Network (TRMFCN), which has three input gates and inserts two kinds of block: a residual multiscale block and a concatenate block. With this model, we addressed these difficulties and completed the segmentation task. Our model outperforms U-Net and several cutting-edge U-Net-based deep networks in the evaluation of WM, GM, and CSF. The data set used for training and testing comes from the iSeg-2017 challenge (http://iseg2017.web.unc.edu).

Speech Emotion Recognition Using 2D-CNN with Mel-Frequency Cepstrum Coefficients

  • Eom, Youngsik;Bang, Junseong
    • Journal of information and communication convergence engineering / v.19 no.3 / pp.148-154 / 2021
  • With the advent of context-aware computing, many attempts have been made to understand emotions. Among these attempts, Speech Emotion Recognition (SER) is a method of recognizing a speaker's emotions from speech information. The success of SER depends on selecting distinctive 'features' and 'classifying' them appropriately. In this paper, the performance of SER using neural network models (e.g., fully connected network (FCN), convolutional neural network (CNN)) with Mel-Frequency Cepstral Coefficients (MFCC) is examined in terms of the accuracy and distribution of emotion recognition. For the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset, after tuning model parameters, a two-dimensional Convolutional Neural Network (2D-CNN) model with MFCC showed the best performance, with an average accuracy of 88.54% for five emotions (anger, happiness, calm, fear, and sadness) across male and female speakers. In addition, an examination of the distribution of emotion recognition accuracies across neural network models indicates that the 2D-CNN with MFCC can be expected to achieve an overall accuracy of 75% or more.
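MFCC features are computed on the mel frequency scale. The following is a minimal sketch of the mel conversion, assuming the commonly used 2595·log10(1 + f/700) variant; the abstract does not say which variant or which filter-bank settings the authors used, and the function names and the `mel_band_edges` helper are illustrative only.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (assumed 2595*log10 variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min: float, f_max: float, n_bands: int) -> list:
    """Return n_bands + 2 frequencies (Hz) equally spaced on the mel scale.

    These are the edge/centre frequencies a triangular mel filter bank
    would use; n_bands is a hypothetical parameter, not taken from the paper.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]
```

Equal spacing in mels gives narrow bands at low frequencies and wide bands at high frequencies, which is the property MFCC extraction relies on.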

Facial Expression Classification Using Deep Convolutional Neural Network

  • Choi, In-kyu;Ahn, Ha-eun;Yoo, Jisang
    • Journal of Electrical Engineering and Technology / v.13 no.1 / pp.485-492 / 2018
  • In this paper, we propose a facial expression recognition method using a CNN (Convolutional Neural Network), one of the deep learning technologies. The proposed structure is designed to give general classification performance for any environment or subject. For this purpose, we collect a variety of databases and organize them into six expression classes: 'expressionless', 'happy', 'sad', 'angry', 'surprised', and 'disgusted'. Pre-processing and data augmentation techniques are applied to improve training efficiency and classification performance. Starting from an existing CNN structure, the structure that best expresses the features of the six facial expressions is found by adjusting the number of feature maps in the convolutional layers and the number of nodes in the fully connected layer. The experimental results show good classification performance compared to the state of the art in cross-validation and cross-database experiments. Compared to other conventional models, the proposed structure is also confirmed to give superior classification performance with less execution time.

Traffic Light Recognition Using a Deep Convolutional Neural Network

  • Kim, Min-Ki
    • Journal of Korea Multimedia Society / v.21 no.11 / pp.1244-1253 / 2018
  • The color of a traffic light is sensitive to illumination conditions. In particular, hue information is lost when oversaturation occurs in the lit area. This paper proposes a traffic light recognition method that is robust to these illumination variations. The method consists of two steps: traffic light detection and recognition. The first step, detection, uses only intensity and saturation. The use of hue information is delayed until the second step, which recognizes the signal of the traffic light. We used a deep learning technique in the second step, designing a deep convolutional neural network (DCNN) composed of three convolutional networks and two fully connected networks. Twelve video clips were used to evaluate the performance of the proposed method. Experimental results show a traffic light detection precision of 93.9% and recall of 91.6%, and a recognition accuracy of 89.4%. Considering that the maximum distance between the camera and the traffic lights is 70 m, the results show that the proposed method is effective.
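The two-step idea (detect with intensity and saturation, read hue only afterwards) can be illustrated with a minimal sketch. It assumes an HSV decomposition via Python's standard `colorsys` module and simple per-pixel thresholds; the threshold values, function names, and the use of HSV are assumptions for illustration, not the paper's actual detector.

```python
import colorsys


def detection_mask(image, min_value=0.8, min_saturation=0.0):
    """Step 1 sketch: mark candidate pixels using only saturation and value.

    image is a list of rows of (r, g, b) tuples in 0..255.
    The thresholds are hypothetical; hue is deliberately ignored here.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            _, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            mask_row.append(v >= min_value and s >= min_saturation)
        mask.append(mask_row)
    return mask


def candidate_hues(image, mask):
    """Step 2 input sketch: hue (degrees) is read only for candidate pixels."""
    hues = []
    for row, mask_row in zip(image, mask):
        for (r, g, b), keep in zip(row, mask_row):
            if keep:
                h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
                hues.append(h * 360.0)
    return hues
```

With `min_saturation=0.0`, an oversaturated (white) pixel still passes the first step even though its hue carries no information, which is why deferring hue to the recognition step is attractive.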

Surface Water Mapping of Remote Sensing Data Using Pre-Trained Fully Convolutional Network

  • Song, Ah Ram;Jung, Min Young;Kim, Yong Il
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.36 no.5 / pp.423-432 / 2018
  • Surface water mapping has been widely used in various remote sensing applications. Water indices have commonly been used to distinguish water bodies from land; however, determining the optimal threshold and discriminating water bodies from similar objects such as shadows and snow is difficult. Deep learning algorithms have greatly advanced image segmentation and classification. In particular, the FCN (Fully Convolutional Network) is state-of-the-art in per-pixel image segmentation and is used in most benchmarks, such as PASCAL VOC2012 and Microsoft COCO (Common Objects in Context). However, these data sets are designed for everyday scenes, and few studies have been conducted on applying FCN to large-scale remotely sensed data sets. This paper aims to fine-tune a pre-trained FCN using the CRMS (Coastwide Reference Monitoring System) data set for surface water mapping. The CRMS provides color infrared aerial photos and ground truth maps for the monitoring and restoration of wetlands in Louisiana, USA. To effectively learn the characteristics of surface water, we used the pre-trained DeepWaterMap network, which classifies water, land, snow, ice, clouds, and shadows using Landsat satellite images. The DeepWaterMap network was then fine-tuned for the CRMS data set using two classes: water and land. The fine-tuned network finally classifies surface water without any additional learning process. The experimental results show that the proposed method enables high-quality surface water mapping from the CRMS data set and demonstrate the suitability of pre-trained FCNs for surface water mapping from remote sensing data.
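For context, the water-index baseline that the abstract contrasts with FCNs can be sketched as follows. This assumes the widely used NDWI form (green minus near-infrared over their sum) and a fixed threshold; the abstract does not name a specific index, so the choice of NDWI, the default threshold, and the function names are assumptions.

```python
def ndwi(green: float, nir: float) -> float:
    """Normalized Difference Water Index for one pixel (assumed index)."""
    denom = green + nir
    return 0.0 if denom == 0 else (green - nir) / denom


def water_mask(green_band, nir_band, threshold=0.0):
    """Label a pixel as water when its NDWI exceeds a hand-picked threshold.

    green_band and nir_band are equally sized lists of rows of reflectances.
    The threshold is exactly the quantity that is hard to choose in practice.
    """
    return [
        [ndwi(g, n) > threshold for g, n in zip(g_row, n_row)]
        for g_row, n_row in zip(green_band, nir_band)
    ]
```

A single global threshold cannot separate water from shadows or snow that happen to produce similar index values, which is the limitation motivating the learned per-pixel classifier.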

Residual Learning Based CNN for Gesture Recognition in Robot Interaction

  • Han, Hua
    • Journal of Information Processing Systems / v.17 no.2 / pp.385-398 / 2021
  • The complexity of deep learning models affects the real-time performance of gesture recognition, thereby limiting the application of gesture recognition algorithms in practical scenarios. Hence, a residual learning neural network based on a deep convolutional neural network is proposed. First, small convolution kernels are used to extract the local details of gesture images. Subsequently, a shallow residual structure is built to share weights, thereby avoiding vanishing or exploding gradients as the network deepens; consequently, model optimisation becomes easier. Additional convolutional neural networks are used to accelerate the refinement of deep abstract features based on the spatial importance of the gesture feature distribution. Finally, a fully connected cascade softmax classifier is used to complete the gesture recognition. Compared with the dense connection multiplexing feature information network, the proposed algorithm is optimised in feature multiplexing to avoid performance fluctuations caused by feature redundancy. Experimental results on the ISOGD gesture dataset and the Gesture dataset show that the proposed algorithm achieves fast convergence and high accuracy.
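Two building blocks named in the abstract, a residual (skip) connection and a softmax classifier, can be sketched generically. This is not the proposed network: `transform` is a hypothetical stand-in for whatever layers sit inside the residual branch, and plain Python lists stand in for feature vectors.

```python
import math


def residual(x, transform):
    """Return x + transform(x): the identity path is added to the branch output.

    transform is a hypothetical callable mapping a vector to a vector
    of the same length.
    """
    fx = transform(x)
    return [a + b for a, b in zip(x, fx)]


def softmax(logits):
    """Numerically stable softmax turning scores into class probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Because the identity path passes the input through unchanged, the gradient has a direct route to earlier layers, which is the usual explanation for why residual structures ease optimisation.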

Development and Evaluation of Automatic Pothole Detection Using Fully Convolutional Neural Networks

  • Chun, Chanjun;Shim, Seungbo;Kang, Sungmo;Ryu, Seung-Ki
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.17 no.5 / pp.55-64 / 2018
  • In this paper, we propose automatic pothole detection based on fully convolutional neural networks; potholes directly cause accidents that endanger drivers and damage vehicles. First, the training DB is collected through a camera installed in a vehicle while driving on the road, and the model is trained as a semantic segmentation task using fully convolutional neural networks. To obtain robust performance in dark environments, we augmented the training DB according to brightness, generating a total of 30,000 training images. In addition, an evaluation DB of 450 images was created to verify the performance of the proposed automatic pothole detection, and four experts evaluated each image. As a result, the proposed pothole detection showed robust performance for missing.
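Brightness-based augmentation of the kind described can be sketched as below. This assumes grayscale images stored as lists of rows with 8-bit values and a simple multiplicative brightness change; the factor values and function names are hypothetical, since the abstract does not state how brightness was varied.

```python
def adjust_brightness(image, factor):
    """Scale every pixel by factor and clamp to the 0..255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def augment_by_brightness(image, factors=(0.4, 0.7, 1.0, 1.3)):
    """Return one brightness-adjusted copy of image per factor.

    Factors below 1.0 darken the image (simulating dark environments);
    the default values are illustrative, not taken from the paper.
    """
    return [adjust_brightness(image, f) for f in factors]
```

For semantic segmentation, the label mask of each augmented copy is the same as the original's, since changing brightness does not move any pixel.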

Bender Gestalt Test Image Recognition with Convolutional Neural Network

  • Chang, Won-Du;Yang, Young-Jun;Choi, Seong-Jin
    • Journal of Korea Multimedia Society / v.22 no.4 / pp.455-462 / 2019
  • This paper proposes a method that uses a convolutional neural network to classify images from the Bender Gestalt Test (BGT), a tool for understanding and analyzing a person's characteristics. The proposed network is composed of 29 layers, including 18 convolutional layers and 2 fully connected layers, and is trained with augmented images. To verify the proposed method, 10-fold validation was adopted. The proposed method classified the images into 9 classes with a mean F1 score of 97.05%, which is 13.71 percentage points higher than a previous method. Analysis of the results shows that the classification accuracy of the proposed method is stable across all patterns, as the worst F1 score among them was 92.11%.
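The mean and worst-case F1 figures quoted above are per-class statistics. A minimal sketch of how per-class F1 can be derived from a confusion matrix follows, assuming rows are true classes and columns are predicted classes; the function names and this layout are assumptions, and the abstract does not say how the authors averaged across folds.

```python
def f1_scores(confusion):
    """Per-class F1 from a square confusion matrix (rows: true, cols: predicted)."""
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, wrong
        fn = sum(confusion[k]) - tp                       # true k, missed
        denom = 2 * tp + fp + fn
        scores.append(0.0 if denom == 0 else 2 * tp / denom)
    return scores


def mean_f1(confusion):
    """Unweighted (macro) mean of the per-class F1 scores."""
    scores = f1_scores(confusion)
    return sum(scores) / len(scores)
```

Reporting both `mean_f1(confusion)` and `min(f1_scores(confusion))` mirrors the abstract's pairing of a mean score with the worst per-pattern score.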

A Fully Convolutional Network Model for Classifying Liver Fibrosis Stages from Ultrasound B-mode Images

  • Kang, Sung Ho;You, Sun Kyoung;Lee, Jeong Eun;Ahn, Chi Young
    • Journal of Biomedical Engineering Research / v.41 no.1 / pp.48-54 / 2020
  • In this paper, we deal with the problem of classifying liver fibrosis using ultrasound B-mode images. Representative methods for classifying the stages of liver fibrosis include liver biopsy and diagnosis based on ultrasound images. The overall liver shape and the smoothness or roughness of the speckle pattern in ultrasound images are used to determine the fibrosis stage. Although ultrasound-based classification is frequently used as an alternative or complement to invasive biopsy, it has the limitation that the fibrosis stage decision depends on the image quality and the doctor's experience. With the rapid development of deep learning algorithms, several studies have applied deep learning methods to automated liver fibrosis classification and have shown high accuracy. The performance of those deep learning methods depends closely on the amount of data. We propose an enhanced U-net architecture to maximize classification accuracy with a limited amount of image data. U-net is well known as a neural network for fast and precise segmentation of medical images. We redesign it for the purpose of classifying liver fibrosis stages. To assess the performance of the proposed architecture, numerical experiments are conducted on a total of 118 ultrasound B-mode images acquired from 78 patients with liver fibrosis symptoms of stages F0 to F4. The experimental results indicate that the proposed architecture performs much better than transfer learning using the pre-trained VGGNet model.

Single Image Depth Estimation With Integration of Parametric Learning and Non-Parametric Sampling

  • Jung, Hyungjoo;Sohn, Kwanghoon
    • Journal of Korea Multimedia Society / v.19 no.9 / pp.1659-1668 / 2016
  • Understanding the 3D structure of scenes is of great interest in various vision-related tasks. In this paper, we present a unified approach for estimating depth from a single monocular image. The key idea of our approach is to take advantage of both parametric learning and non-parametric sampling. Using a parametric convolutional network, our approach learns the relations among various monocular cues and makes a coarse global prediction. We also use a local prediction, estimated in a non-parametric framework, to refine the global prediction. The local and global predictions are integrated by concatenating the feature maps of the global prediction with those of the local ones. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods both qualitatively and quantitatively.