• Title/Summary/Keyword: deep-learning dataset


Classification of bearded seals signal based on convolutional neural network (Convolutional neural network 기법을 이용한 턱수염물범 신호 판별)

  • Kim, Ji Seop;Yoon, Young Geul;Han, Dong-Gyun;La, Hyoung Sul;Choi, Jee Woong
    • The Journal of the Acoustical Society of Korea / v.41 no.2 / pp.235-241 / 2022
  • Several studies using Convolutional Neural Networks (CNNs) have been conducted to detect and classify the sounds of marine mammals in underwater acoustic data collected through passive acoustic monitoring. In this study, the possibility of automatically classifying bearded seal sounds was confirmed using a CNN model trained on underwater acoustic spectrogram images collected from August 2017 to August 2018 in the East Siberian Sea. When only clear seal sounds were used as the training dataset, overfitting due to memorization occurred. When some of the training data were replaced with data containing noise and the model was evaluated again, overfitting was prevented and the model generalized better than before, with accuracy (0.9743), precision (0.9783), and recall (0.9520). As a result, the performance of the classification model for bearded seal signals improved when noise was included in the training data.
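
The three scores quoted in this abstract are standard binary classification metrics. A minimal Python sketch of how they are computed from confusion-matrix counts follows; the function name `classification_metrics` and the counts are hypothetical illustrations, not values from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    # Accuracy: share of all samples that were classified correctly.
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted positives that are true positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts for illustration only (not taken from the paper).
acc, prec, rec = classification_metrics(tp=90, fp=5, fn=10, tn=95)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f}")
```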

A Study on the building Dataset of Similar Case Matching in Legal Domain using Deep Learning Algorithm (딥러닝 알고리즘을 이용한 유사 판례 매칭 데이터셋 구축 방안 연구)

  • Kang, Ye-Jee;Kang, Hye-Rin;Park, Seo-Yoon;Jang, Yeon-Ji;Kim, Han-Saem
    • Annual Conference on Human and Language Technology / 2021.10a / pp.72-76 / 2021
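  • Precedents are prior court rulings that laypeople and legal professionals alike can consult first when dealing with a case. Despite their usefulness, however, the current Supreme Court precedent search system does not make precedents easy to find: laypeople without legal expertise have difficulty obtaining results that accurately match their search intent, while legal professionals must spend considerable time and money on searching. Overseas, similar case matching datasets have already been built; they not only make similar-precedent search easier for laypeople and professionals but are also used in a variety of natural language processing tasks. In Korea, by contrast, much legal AI research focuses solely on performing specific law-related subtasks, and no similar case matching dataset has been built as a resource. To build a precedent dataset as a resource, this paper therefore applied two deep learning embedding models that can reflect the meaning of a document, Doc2Vec and SBERT, to measure and compare the similarity between precedent documents. The similar precedents retrieved with the SBERT model showed higher content similarity between documents, and on this basis a foundational similar-precedent matching dataset was built using the SBERT model.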
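
Matching similar documents through embeddings usually comes down to comparing vectors. The sketch below assumes cosine similarity as the comparison measure, which the abstract does not specify; the three-dimensional vectors and the names `cosine_similarity`, `most_similar`, and `case_a` to `case_c` are hypothetical stand-ins for real Doc2Vec or SBERT output.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_id, embeddings, top_k=1):
    """Return the ids of the top_k documents closest to query_id."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), doc_id)
        for doc_id, vec in embeddings.items()
        if doc_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical toy embeddings; real document embeddings have hundreds of dimensions.
embeddings = {
    "case_a": [1.0, 0.0, 1.0],
    "case_b": [0.9, 0.1, 1.1],
    "case_c": [0.0, 1.0, 0.0],
}
print(most_similar("case_a", embeddings))
```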


Multi-Class Multi-Object Tracking in Aerial Images Using Uncertainty Estimation

  • Hyeongchan Ham;Junwon Seo;Junhee Kim;Chungsu Jang
    • Korean Journal of Remote Sensing / v.40 no.1 / pp.115-122 / 2024
  • Multi-object tracking (MOT) is a vital component in understanding the surrounding environment. Previous research has demonstrated that MOT can successfully detect and track surrounding objects. Nonetheless, inaccurate classification of the tracked objects remains a challenge that needs to be solved. When an object approaching from a distance is recognized, not only detection and tracking but also classification to determine the level of risk must be performed. However, adopting erroneous classification results from the detector as the track class can degrade performance. In this paper, we discuss the limitations of classification in tracking under the classification uncertainty of the detector. To address this problem, a class update module is proposed, which leverages the class uncertainty estimation of the detector to mitigate the classification error of the tracker. We evaluated our approach on the VisDrone-MOT2021 dataset, which includes multi-class and uncertain far-distance object tracking. We show that our method has low certainty for a distant object and quickly classifies it as the object approaches and the level of certainty increases. In this manner, our method outperforms previous approaches across different detectors. In particular, the You Only Look Once (YOLO)v8 detector shows a notable enhancement of 4.33 in multi-object tracking accuracy (MOTA) in comparison to the previous state-of-the-art method. This intuitive insight improves MOT for tracking objects approaching from a distance and classifying them quickly.

Semantic Segmentation of Drone Images Based on Combined Segmentation Network Using Multiple Open Datasets (개방형 다중 데이터셋을 활용한 Combined Segmentation Network 기반 드론 영상의 의미론적 분할)

  • Ahram Song
    • Korean Journal of Remote Sensing / v.39 no.5_3 / pp.967-978 / 2023
  • This study proposed and validated a combined segmentation network (CSN) designed to effectively train on multiple drone image datasets and enhance the accuracy of semantic segmentation. CSN shares the entire encoding domain to accommodate the diversity of three drone datasets, while the decoding domains are trained independently. During training, the segmentation accuracy of CSN was lower than that of U-Net and the pyramid scene parsing network (PSPNet) on single datasets because it considers the loss values for all datasets simultaneously. However, when applied to domestic autonomous drone images, CSN demonstrated the ability to classify pixels into appropriate classes without requiring additional training, outperforming PSPNet. This research suggests that CSN can serve as a valuable tool for effectively training on diverse drone image datasets and improving object recognition accuracy in new regions.

Application of Mask R-CNN Algorithm to Detect Cracks in Concrete Structure (콘크리트 구조체 균열 탐지에 대한 Mask R-CNN 알고리즘 적용성 평가)

  • Bae, Byongkyu;Choi, Yongjin;Yun, Kangho;Ahn, Jaehun
    • Journal of the Korean Geotechnical Society / v.40 no.3 / pp.33-39 / 2024
  • Inspecting cracks to determine a structure's condition is crucial for accurate safety diagnosis. However, visual crack inspection methods can be subjective and are dependent on field conditions, thereby resulting in low reliability. To address this issue, this study automates the detection of concrete cracks in image data using ResNet, FPN, and the Mask R-CNN components as the backbone, neck, and head of a convolutional neural network. The performance of the proposed model is analyzed using the intersection over union (IoU). The experimental dataset contained 1,203 images divided into training (70%), validation (20%), and testing (10%) sets. The model achieved an IoU value of 95.83% on the test set, and there were no cases where a crack was not detected. These findings demonstrate that the proposed model achieved highly accurate detection of concrete cracks in image data.
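
The IoU metric named in this abstract compares a predicted mask with a ground-truth mask pixel by pixel. A minimal Python sketch on small binary masks is given below; the function name `mask_iou` and the masks are hypothetical, and the paper's own evaluation code may differ.

```python
def mask_iou(pred, target):
    """Intersection over union of two same-shaped binary masks (lists of rows)."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # pixel marked in both masks
            if p or t:
                union += 1  # pixel marked in at least one mask
    # Two empty masks are treated as a perfect match (an assumption of this sketch).
    return intersection / union if union else 1.0


# Hypothetical 2x3 masks: 1 marks a crack pixel, 0 marks background.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
iou = mask_iou(pred, target)
print(iou)
```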

Computing machinery techniques for performance prediction of TBM using rock geomechanical data in sedimentary and volcanic formations

  • Hanan Samadi;Arsalan Mahmoodzadeh;Shtwai Alsubai;Abdullah Alqahtani;Abed Alanazi;Ahmed Babeker Elhag
    • Geomechanics and Engineering / v.37 no.3 / pp.223-241 / 2024
  • Evaluating the performance of Tunnel Boring Machines (TBMs) is a pivotal step in hard rock mechanized tunneling, essential for achieving both a dependable construction timeline and utilization rate. In this investigation, three artificial neural networks, namely gated recurrent unit (GRU), back propagation neural network (BPNN), and simple recurrent neural network (SRNN), were developed to predict the TBM rate of penetration (ROP). The study drew on a dataset of 1125 data points collected during the construction of the Alborze Service Tunnel. Initially, five geomechanical parameters were scrutinized for their impact on TBM-ROP efficiency. Subsequent statistical analyses narrowed down the effective parameters to three: uniaxial compressive strength (UCS), peak slope index (PSI), and Brazilian tensile strength (BTS). Among the methodologies employed, GRU emerged as the most robust model, with the best accuracy metrics for TBM-ROP prediction on the testing subset (R2 = 0.87, NRMSE = 6.76E-04, MAD = 2.85E-05). The proposed models present viable solutions for analogous ground and TBM tunneling scenarios, and are particularly beneficial in routes predominantly composed of volcanic and sedimentary rock formations. Leveraging forecasted parameters holds the promise of enhancing both machine efficiency and construction safety in TBM tunneling.
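
Two of the regression metrics quoted above can be sketched in a few lines of Python. R2 follows its usual definition; for NRMSE the abstract does not say how the error is normalized, so dividing the RMSE by the range of the observed values is an assumption of this sketch. The function names and sample values are hypothetical and unrelated to the paper's data.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def nrmse(y_true, y_pred):
    """RMSE divided by the range of y_true (normalization is an assumption)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (max(y_true) - min(y_true))


# Hypothetical observed and predicted values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(r_squared(y_true, y_pred), nrmse(y_true, y_pred))
```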

LH-FAS v2: Head Pose Estimation-Based Lightweight Face Anti-Spoofing (LH-FAS v2: 머리 자세 추정 기반 경량 얼굴 위조 방지 기술)

  • Hyeon-Beom Heo;Hye-Ri Yang;Sung-Uk Jung;Kyung-Jae Lee
    • The Journal of the Korea institute of electronic communication sciences / v.19 no.1 / pp.309-316 / 2024
  • Facial recognition technology is widely used in various fields but faces challenges due to its vulnerability to fraudulent activities such as photo spoofing. Extensive research has been conducted to overcome this challenge. Most of these approaches, however, require specialized equipment such as multi-modal cameras or operation in high-performance environments. In this paper, we introduce LH-FAS v2 (Lightweight Head-pose-based Face Anti-Spoofing v2), a system designed to operate on a commercial webcam without any specialized equipment, to address the issue of facial recognition spoofing. LH-FAS v2 utilizes FSA-Net for head pose estimation and ArcFace for facial recognition, effectively assessing changes in head pose and verifying facial identity. We developed the VD4PS dataset, incorporating photo spoofing scenarios to evaluate the model's performance. The experimental results show the model's balanced accuracy and speed, indicating that head pose estimation-based facial anti-spoofing technology can be effectively used to counteract photo spoofing.

Three-Dimensional Convolutional Vision Transformer for Sign Language Translation (수어 번역을 위한 3차원 컨볼루션 비전 트랜스포머)

  • Horyeor Seong;Hyeonjoong Cho
    • The Transactions of the Korea Information Processing Society / v.13 no.3 / pp.140-147 / 2024
  • In the Republic of Korea, people with hearing impairments are the second-largest demographic within the registered disability community, following those with physical disabilities. Despite this demographic significance, research on sign language translation technology is limited for several reasons, including the limited market size and the lack of adequately annotated datasets. Despite the difficulties, a few researchers continue to improve the performance of sign language translation technologies by employing recent advances in deep learning such as the transformer architecture, as transformer-based models have demonstrated noteworthy performance in tasks such as action recognition and video classification. This study focuses on enhancing the recognition performance of sign language translation by combining transformers with 3D-CNN. Through experimental evaluations using the PHOENIX-Weather-2014T dataset [1], we show that the proposed model exhibits comparable performance to existing models in terms of Floating Point Operations Per Second (FLOPs).

Research on Ocular Data Analysis and Eye Tracking in Divers

  • Ye Jun Lee;Yong Kuk Kim;Da Young Kim;Jeongtack Min;Min-Kyu Kim
    • Journal of the Korea Society of Computer and Information / v.29 no.8 / pp.43-51 / 2024
  • This paper proposes a method for acquiring and analyzing ocular data using a special-purpose diver mask targeted at divers who primarily engage in underwater activities. This involves tracking the user's gaze with the help of a custom-built ocular dataset and a YOLOv8-nano model developed for this purpose. The model achieved an average processing time of 45.52ms per frame and successfully recognized states of eyes being open or closed with 99% accuracy. Based on the analysis of the ocular data, a gaze tracking algorithm was developed that can map to real-world coordinates. The validation of this algorithm showed an average error rate of about 1% on the x-axis and about 6% on the y-axis.

Driver Group Clustering Technique and Risk Estimation Method for Traffic Accident Prevention

  • Tae-Wook Kim;Ji-Woong Yang;Hyeon-Jin Jung;Han-Jin Lee;Ellen J. Hong
    • Journal of the Korea Society of Computer and Information / v.29 no.8 / pp.53-58 / 2024
  • Traffic accidents are not only a threat to human lives but also pose significant societal costs. Recently, research has been conducted to address the issue of traffic accidents by predicting the risk using deep learning technology and spatiotemporal information of roads. However, while traffic accidents are influenced not only by the spatiotemporal information of roads but also by human factors, research on the latter has been relatively less active. This paper analyzes driver groups and characteristics by applying clustering techniques to a traffic accident dataset and proposes and applies a method to calculate the Risk Level for each driver group and characteristic. In this process, the preprocessing technique suggested in this paper demonstrates a higher Silhouette Score of 0.255 compared to the commonly used One-Hot Embedding & Min-Max Scaling techniques, indicating its suitability as a preprocessing method.