• Title/Summary/Keyword: Region-based Convolutional Neural Network


Bottleneck-based Siam-CNN Algorithm for Object Tracking (객체 추적을 위한 보틀넥 기반 Siam-CNN 알고리즘)

  • Lim, Su-Chang; Kim, Jong-Chan
    • Journal of Korea Multimedia Society, v.25 no.1, pp.72-81, 2022
  • Visual object tracking is one of the most fundamental problems in computer vision: the tracker localizes the target object with a bounding box in each frame of a video. In this paper, a custom CNN is designed to extract object features that carry strong and varied information. The network is constructed as a Siamese network and used as a feature extractor. Input images pass through convolution blocks composed of bottleneck layers, which emphasize salient features. The feature maps of the target object and the search area extracted by the Siamese network are fed into a region proposal network, which estimates the object area. The tracker was evaluated on the OTB2013 dataset using the success plot and precision plot as evaluation metrics, achieving 0.611 on the success plot and 0.831 on the precision plot.
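
To make the structure above concrete, here is a minimal PyTorch sketch of a Siamese feature extractor built from bottleneck blocks; the layer sizes, block counts, and input resolutions are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: Siamese extractor with ResNet-style bottleneck blocks.
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """1x1 -> 3x3 -> 1x1 bottleneck block with a residual connection."""
    def __init__(self, channels, reduced):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, reduced, 1, bias=False),
            nn.BatchNorm2d(reduced), nn.ReLU(inplace=True),
            nn.Conv2d(reduced, reduced, 3, padding=1, bias=False),
            nn.BatchNorm2d(reduced), nn.ReLU(inplace=True),
            nn.Conv2d(reduced, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
    def forward(self, x):
        return torch.relu(x + self.body(x))

class SiamExtractor(nn.Module):
    """Shared-weight backbone applied to both template and search images."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            Bottleneck(64, 16), Bottleneck(64, 16),
        )
    def forward(self, template, search):
        # The same weights embed both branches (the Siamese property).
        return self.backbone(template), self.backbone(search)

z, x = SiamExtractor()(torch.randn(1, 3, 127, 127), torch.randn(1, 3, 255, 255))
```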

Siamese Network for Learning Robust Feature of Hippocampi

  • Ahmed, Samsuddin; Jung, Ho Yub
    • Smart Media Journal, v.9 no.3, pp.9-17, 2020
  • The hippocampus is a complex brain structure embedded deep in the temporal lobe. Studies have shown that it is affected by neurological and psychiatric disorders, and it is a significant landmark for diagnosing neurodegenerative diseases. Hippocampal features play a significant role in region-of-interest-based analysis for disease diagnosis and prognosis. In this study, we attempt to learn embeddings of this important biomarker. Because conventional metric learning methods for feature embedding are known to be poor at capturing semantic similarity among the data under study, we trained a deep Siamese convolutional neural network to learn a metric for the hippocampus. We used the Gwangju Alzheimer's and Related Dementia cohort dataset. The input to the network was pairs of three-view patches (TVPs) of size 32 × 32 × 3. Positive samples were taken from the vicinity of a specified hippocampal landmark, and negative samples were taken from random brain locations excluding the hippocampi. We achieved 98.72% accuracy in verifying hippocampus TVPs.
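
A hedged sketch of the pair-based training this abstract describes, using a small PyTorch embedder and a standard contrastive loss; the paper's exact loss and architecture are not given here, so both are assumptions.

```python
# Sketch: Siamese pair training on 32x32x3 patches with a contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchEmbedder(nn.Module):
    """Small CNN mapping a 32x32x3 three-view patch to a unit embedding."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 32 -> 16
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 16 -> 8
            nn.Flatten(), nn.Linear(64 * 8 * 8, dim),
        )
    def forward(self, x):
        return F.normalize(self.net(x), dim=1)

def contrastive_loss(e1, e2, same, margin=1.0):
    """same=1 pulls a pair together; same=0 pushes it past the margin."""
    d = F.pairwise_distance(e1, e2)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

net = PatchEmbedder()
a, b = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 2, (8,)).float()     # 1 = same landmark vicinity
loss = contrastive_loss(net(a), net(b), labels)
loss.backward()
```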

VGG-based BAPL Score Classification of 18F-Florbetaben Amyloid Brain PET

  • Kang, Hyeon; Kim, Woong-Gon; Yang, Gyung-Seung; Kim, Hyun-Woo; Jeong, Ji-Eun; Yoon, Hyun-Jin; Cho, Kook; Jeong, Young-Jin; Kang, Do-Young
    • Biomedical Science Letters, v.24 no.4, pp.418-425, 2018
  • Amyloid brain positron emission tomography (PET) images are analyzed visually and subjectively by physicians, at a large cost in time and effort, to determine β-amyloid (Aβ) deposition. We designed a convolutional neural network (CNN) model that predicts Aβ-positive and Aβ-negative status. We performed 18F-florbetaben (FBB) brain PET on controls and patients (n=176) with mild cognitive impairment and Alzheimer's disease (AD). We classified the brain PET images visually according to the brain amyloid plaque load (BAPL) score. We designed a Visual Geometry Group (VGG16) model for the visual assessment of slice-based samples. To evaluate only the gray matter and not the white matter, gray matter masking (GMM) was applied to the slice-based standard samples. All performance metrics were higher with GMM than without it (accuracy 92.39% vs. 89.60%, sensitivity 87.93% vs. 85.76%, and specificity 98.94% vs. 95.32%). For the patient-based standard, accuracy was almost the same (89.78% vs. 89.21%), sensitivity was lower (93.97% vs. 99.14%), and specificity was higher (81.67% vs. 70.00%). The area under the curve of the VGG16 model that observed only the gray matter region was slightly higher than that of the model that observed the whole brain, for both slice-based and patient-based decisions. Amyloid brain PET images can thus be analyzed appropriately with a CNN model to predict Aβ-positive and Aβ-negative status.
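
As a rough illustration of the setup, the sketch below fine-tunes torchvision's VGG16 for the two-class Aβ decision and applies a gray-matter mask to each slice before it enters the network; the mask source and input formatting are assumptions, not the paper's pipeline.

```python
# Sketch: VGG16 with a final 2-way head and gray-matter masking of slices.
import torch
import torch.nn as nn
from torchvision.models import vgg16

model = vgg16(weights=None)                    # pretrained weights optional
model.classifier[6] = nn.Linear(4096, 2)       # Abeta-positive / Abeta-negative

def apply_gm_mask(slice_batch, gm_mask):
    """Zero out non-gray-matter pixels so the CNN sees gray matter only."""
    return slice_batch * gm_mask               # mask is 0/1 and broadcastable

slices = torch.randn(4, 3, 224, 224)           # PET slices as 3-channel input
gm_mask = (torch.rand(4, 1, 224, 224) > 0.5).float()  # placeholder mask
logits = model(apply_gm_mask(slices, gm_mask))
```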

Ensemble Learning Based on Tumor Internal and External Imaging Patch to Predict the Recurrence of Non-small Cell Lung Cancer Patients in Chest CT Image (흉부 CT 영상에서 비소세포폐암 환자의 재발 예측을 위한 종양 내외부 영상 패치 기반 앙상블 학습)

  • Lee, Ye-Sel; Cho, A-Hyun; Hong, Helen
    • Journal of Korea Multimedia Society, v.24 no.3, pp.373-381, 2021
  • In this paper, we propose a classification model based on a convolutional neural network (CNN) for predicting 2-year recurrence in non-small cell lung cancer (NSCLC) patients using preoperative chest CT images. Based on regions of interest (ROIs) defined over the tumor's internal and external areas, the input images consist of an intratumoral patch, a peritumoral patch, and a peritumoral texture patch that focuses on the texture information of the peritumoral region. A network is trained for each patch through AlexNet pretrained on ImageNet, to explore the usefulness and performance of the various patches. Additionally, ensemble learning over the networks trained on each patch is used to analyze the performance of different patch combinations. Among all results, the ensemble model with intratumoral and peritumoral patches achieved the best performance (accuracy 98.28%, sensitivity 100%, NPV 100%).
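
A minimal sketch of the patch-ensemble idea, assuming one AlexNet per patch type with softmax probabilities averaged at inference; the paper's exact combination rule is not specified here, so the averaging is an assumption.

```python
# Sketch: two patch-specific AlexNets combined by probability averaging.
import torch
import torch.nn as nn
from torchvision.models import alexnet

def make_patch_model():
    m = alexnet(weights=None)              # the paper uses ImageNet weights
    m.classifier[6] = nn.Linear(4096, 2)   # recurrence vs. no recurrence
    return m

intra_net, peri_net = make_patch_model(), make_patch_model()

@torch.no_grad()
def ensemble_predict(intra_patch, peri_patch):
    """Average the two networks' class probabilities (late fusion)."""
    p1 = torch.softmax(intra_net(intra_patch), dim=1)
    p2 = torch.softmax(peri_net(peri_patch), dim=1)
    return ((p1 + p2) / 2).argmax(dim=1)

pred = ensemble_predict(torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224))
```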

Video smoke detection with block DNCNN and visual change image

  • Liu, Tong; Cheng, Jianghua; Yuan, Zhimin; Hua, Honghu; Zhao, Kangcheng
    • KSII Transactions on Internet and Information Systems (TIIS), v.14 no.9, pp.3712-3729, 2020
  • Smoke detection is helpful for early fire detection. With its large coverage area and low cost, vision-based smoke detection is the main research direction for outdoor smoke detection. We propose a two-stage smoke detection method that combines a block Deep Normalization and Convolutional Neural Network (DNCNN) with visual change images. In the first stage, suspected smoke regions are detected in each frame using the block DNCNN. Motivated by the physical characteristics of smoke diffusion, this paper puts forward the concept of a visual change image, constructed from the motion state of the suspected smoke regions across frames, which describes the physical diffusion of smoke in the time and space domains. In the second stage, a Support Vector Machine (SVM) classifier is applied to Histogram of Oriented Gradients (HOG) features of the visual change images of the suspected smoke regions, reducing false alarms caused by smoke-like objects such as clouds and fog. Experiments on two public smoke datasets show that the accuracy and recall of smoke detection are high, and the false alarm rate is much lower than that of the comparison methods.
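
The second stage can be sketched with scikit-image and scikit-learn as below; the image size, HOG parameters, and training data are placeholders rather than the authors' settings.

```python
# Sketch: HOG features of visual change images classified by an SVM.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hog_features(change_images):
    """Extract HOG descriptors from grayscale visual change images."""
    return np.array([
        hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
        for img in change_images
    ])

# Placeholder data: 64x64 visual change images, label 1 = real smoke.
train_imgs = [np.random.rand(64, 64) for _ in range(20)]
train_labels = np.array([0, 1] * 10)

clf = SVC(kernel="rbf").fit(hog_features(train_imgs), train_labels)
is_smoke = clf.predict(hog_features([np.random.rand(64, 64)]))
```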

Vehicle Detection in Dense Area Using UAV Aerial Images (무인 항공기를 이용한 밀집영역 자동차 탐지)

  • Seo, Chang-Jin
    • Journal of the Korea Academia-Industrial cooperation Society, v.19 no.3, pp.693-698, 2018
  • This paper proposes a vehicle detection method for parking areas using unmanned aerial vehicles (UAVs) and YOLOv2, a recent, fast, real-time object-detection algorithm. The YOLOv2 convolutional network computes class probabilities and bounding-box locations over an entire image in a single pass; because detection runs in a single network, it is very fast, simple, and well optimized at detection time. Sliding-window methods and the region-based convolutional neural network family of detectors rely on many region proposals and require too much computation per class, which puts them at a disadvantage in real-time applications. This research uses YOLOv2 to overcome those real-time processing limitations. Darknet, OpenCV, and the Compute Unified Device Architecture (CUDA) are used as open-source components, and a deep learning server handles training and car detection. In the experiments, the algorithm detected cars in dense areas from UAV imagery, reduced the object-detection overhead, and could be applied in real time.
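
As one way to run a trained Darknet YOLOv2 model from Python, the sketch below uses OpenCV's DNN module (the paper works with Darknet directly, so this is a substitution); the cfg/weights file names, input size, and confidence threshold are placeholders.

```python
# Sketch: YOLOv2 inference via OpenCV's Darknet importer.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov2.cfg", "yolov2.weights")  # placeholders

def detect_vehicles(image, conf_thresh=0.5):
    h, w = image.shape[:2]
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True)
    net.setInput(blob)
    boxes = []
    # Each output row is [cx, cy, bw, bh, objectness, class scores...],
    # with coordinates normalized to the image size.
    for row in np.vstack(net.forward(net.getUnconnectedOutLayersNames())):
        scores = row[5:]
        if row[4] * scores.max() > conf_thresh:
            cx, cy, bw, bh = row[:4] * [w, h, w, h]
            boxes.append((int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)))
    return boxes

frame = cv2.imread("uav_frame.jpg")        # placeholder path
if frame is not None:
    print(detect_vehicles(frame))
```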

Convolutional Neural Networks for Rice Yield Estimation Using MODIS and Weather Data: A Case Study for South Korea (MODIS와 기상자료 기반 회선신경망 알고리즘을 이용한 남한 전역 쌀 생산량 추정)

  • Ma, Jong Won; Nguyen, Cong Hieu; Lee, Kyungdo; Heo, Joon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, v.34 no.5, pp.525-534, 2016
  • In South Korea, paddy rice is consumed throughout the country and is the main source of income for farmers, so a mathematical model for estimating rice yield is required for agricultural decision-making. The objectives of our study are to (1) develop a rice yield estimation model using convolutional neural networks (CNN), (2) choose the hyperparameters that give the best model performance, and (3) investigate whether the CNN model predicts rice yield effectively, by comparison with a model using artificial neural networks (ANN). Weather data and MODIS (MOderate Resolution Imaging Spectroradiometer) products from April to September of 2000-2013 were used as inputs to the rice yield estimation models, and cross-validation was used for accuracy assessment. The CNN and ANN models showed root mean square errors (RMSE) of 36.10 and 48.61 kg/10a, respectively, at the rice-point level, and 31.30 and 39.31 kg/10a, respectively, at the 'Si-Gun-Gu' district level. The CNN models outperformed the ANN models, demonstrating the applicability of CNNs to rice yield estimation in South Korea.
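
A hedged PyTorch sketch of a CNN regressor evaluated with the paper's RMSE metric; the input layout (channels standing in for MODIS bands plus weather layers) and the network sizes are assumptions.

```python
# Sketch: CNN regression of yield from stacked satellite/weather rasters.
import torch
import torch.nn as nn

class YieldCNN(nn.Module):
    def __init__(self, in_channels=8):   # assumed band/weather channel count
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)     # predicted yield in kg/10a

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = YieldCNN()
x, y = torch.randn(16, 8, 32, 32), torch.randn(16, 1)
rmse = torch.sqrt(nn.functional.mse_loss(model(x), y))  # the paper's metric
rmse.backward()
```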

Attention Deep Neural Networks Learning based on Multiple Loss functions for Video Face Recognition (비디오 얼굴인식을 위한 다중 손실 함수 기반 어텐션 심층신경망 학습 제안)

  • Kim, Kyeong Tae; You, Wonsang; Choi, Jae Young
    • Journal of Korea Multimedia Society, v.24 no.10, pp.1380-1390, 2021
  • Video face recognition (FR) is one of the most popular research areas in computer vision due to its variety of applications. In particular, research using attention mechanisms is being actively conducted. In video face recognition, attention indicates where to focus, based on the input values of the whole image or a specific region, or which frames to focus on when many frames are available. In this paper, we propose a novel attention-based deep learning method. The main novelties of our method are (1) the combination of two loss functions, namely a weighted softmax loss and a triplet loss, and (2) end-to-end learning that includes both the feature embedding network and the attention weight computation. Through the combined loss function and end-to-end learning, the feature embedding network has a positive effect on the attention weight computation. To demonstrate the effectiveness of the proposed method, extensive comparative experiments were carried out on the IJB-A dataset under its standard evaluation protocols. Our method achieved recognition rates better than or comparable to other state-of-the-art video FR methods.
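
The combined objective can be sketched as a weighted sum of a softmax (cross-entropy) loss and a triplet loss; the weighting factor and embedding size below are assumptions, not the paper's values.

```python
# Sketch: weighted combination of classification and triplet objectives.
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
triplet = nn.TripletMarginLoss(margin=0.2)

def combined_loss(logits, labels, anchor, positive, negative, alpha=0.5):
    """Weighted sum of the classification and metric-learning objectives."""
    return alpha * ce(logits, labels) + (1 - alpha) * triplet(anchor, positive, negative)

logits = torch.randn(8, 100)                        # class scores for 8 frames
labels = torch.randint(0, 100, (8,))
a, p, n = (torch.randn(8, 128) for _ in range(3))   # face embeddings
loss = combined_loss(logits, labels, a, p, n)
```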

Manchu Script Letters Dataset Creation and Labeling

  • Aaron Daniel Snowberger; Choong Ho Lee
    • Journal of information and communication convergence engineering, v.22 no.1, pp.80-87, 2024
  • The Manchu language holds historical significance, but a complete dataset of Manchu script letters for training optical character recognition machine-learning models is currently unavailable. Therefore, this paper describes the process of creating a robust dataset of extracted Manchu script letters. Rather than performing automatic letter segmentation based on whitespace or the thickness of the central word stem, an image of the Manchu script was manually inspected, and one copy of the desired letter was selected as a region of interest. This region of interest was then used as a template to match all other occurrences of the same letter within the Manchu script image. Although the dataset in this study contained only 4,000 images of five Manchu script letters, these letters were collected from twenty-eight writing styles. A full dataset of Manchu letters is expected to be obtained through this process. The collected dataset was normalized and used to train a simple convolutional neural network to verify its effectiveness.
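
The template-matching step maps directly onto OpenCV's matchTemplate; in this sketch the page and ROI are synthetic stand-ins and the score threshold is a placeholder.

```python
# Sketch: extracting all occurrences of a letter by template matching.
import cv2
import numpy as np

page = np.random.randint(0, 255, (512, 512), dtype=np.uint8)  # stand-in page
letter = page[100:140, 50:80].copy()   # stand-in for the manually chosen ROI

# Normalized cross-correlation scores every placement of the template.
scores = cv2.matchTemplate(page, letter, cv2.TM_CCOEFF_NORMED)
h, w = letter.shape

# Every location scoring above the threshold is treated as an occurrence.
ys, xs = np.where(scores >= 0.8)
crops = [page[y:y + h, x:x + w] for y, x in zip(ys, xs)]
print(f"extracted {len(crops)} candidate letter images")
```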

Building Detection by Convolutional Neural Network with Infrared Image, LiDAR Data and Characteristic Information Fusion (적외선 영상, 라이다 데이터 및 특성정보 융합 기반의 합성곱 인공신경망을 이용한 건물탐지)

  • Cho, Eun Ji; Lee, Dong-Cheon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, v.38 no.6, pp.635-644, 2020
  • Object recognition, detection, and instance segmentation based on DL (Deep Learning) are being used in various applications, and optical images are mainly used as training data for DL models. The major objective of this paper is object segmentation and building detection by utilizing multimodal datasets, in addition to optical images, to train the Detectron2 model, one of the improved R-CNN (Region-based Convolutional Neural Network) frameworks. For the implementation, infrared aerial images, LiDAR (Light Detection And Ranging) data, edges extracted from the images, and Haralick features, which represent statistical texture information derived from the LiDAR data, were generated. The performance of DL models depends not only on the amount and characteristics of the training data, but also on the fusion method, especially for multimodal data. Segmenting objects and detecting buildings with hybrid fusion, a mixed method of early fusion and late fusion, improved the building detection rate by 32.65% compared to training on optical images only. The experiments demonstrated the complementary effect of training on multimodal data with unique characteristics, combined with an appropriate fusion strategy.
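
A small PyTorch sketch contrasting early and late fusion of two modalities, the two ingredients of the hybrid strategy described above; channel counts and layer sizes are illustrative, not the paper's configuration.

```python
# Sketch: early fusion (stacked channels) vs. late fusion (merged features).
import torch
import torch.nn as nn

def small_cnn(in_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

optical = torch.randn(2, 3, 128, 128)   # optical/infrared image
lidar = torch.randn(2, 1, 128, 128)     # LiDAR-derived raster

# Early fusion: stack modalities as channels before a single backbone.
early_net = small_cnn(4)
early_feat = early_net(torch.cat([optical, lidar], dim=1))

# Late fusion: separate backbones per modality, features merged afterwards.
opt_net, lid_net = small_cnn(3), small_cnn(1)
late_feat = torch.cat([opt_net(optical), lid_net(lidar)], dim=1)

print(early_feat.shape, late_feat.shape)   # (2, 16) and (2, 32)
```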