• Title/Summary/Keyword: deep-learning

Search Result 5,450, Processing Time 0.035 seconds

A modified U-net for crack segmentation by Self-Attention-Self-Adaption neuron and random elastic deformation

  • Zhao, Jin;Hu, Fangqiao;Qiao, Weidong;Zhai, Weida;Xu, Yang;Bao, Yuequan;Li, Hui
    • Smart Structures and Systems
    • /
    • v.29 no.1
    • /
    • pp.1-16
    • /
    • 2022
  • Despite recent breakthroughs in deep learning and computer vision fields, the pixel-wise identification of tiny objects in high-resolution images with complex disturbances remains challenging. This study proposes a modified U-net for tiny crack segmentation in real-world steel-box-girder bridges. The modified U-net adopts the common U-net framework and a novel Self-Attention-Self-Adaption (SASA) neuron as the fundamental computing element. The Self-Attention module applies softmax and gate operations to obtain the attention vector. It enables the neuron to focus on the most significant receptive fields when processing large-scale feature maps. The Self-Adaption module consists of a multiplayer perceptron subnet and achieves deeper feature extraction inside a single neuron. For data augmentation, a grid-based crack random elastic deformation (CRED) algorithm is designed to enrich the diversities and irregular shapes of distributed cracks. Grid-based uniform control nodes are first set on both input images and binary labels, random offsets are then employed on these control nodes, and bilinear interpolation is performed for the rest pixels. The proposed SASA neuron and CRED algorithm are simultaneously deployed to train the modified U-net. 200 raw images with a high resolution of 4928 × 3264 are collected, 160 for training and the rest 40 for the test. 512 × 512 patches are generated from the original images by a sliding window with an overlap of 256 as inputs. Results show that the average IoU between the recognized and ground-truth cracks reaches 0.409, which is 29.8% higher than the regular U-net. A five-fold cross-validation study is performed to verify that the proposed method is robust to different training and test images. Ablation experiments further demonstrate the effectiveness of the proposed SASA neuron and CRED algorithm. Promotions of the average IoU individually utilizing the SASA and CRED module add up to the final promotion of the full model, indicating that the SASA and CRED modules contribute to the different stages of model and data in the training process.

A Multi-speaker Speech Synthesis System Using X-vector (x-vector를 이용한 다화자 음성합성 시스템)

  • Jo, Min Su;Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology
    • /
    • v.7 no.4
    • /
    • pp.675-681
    • /
    • 2021
  • With the recent growth of the AI speaker market, the demand for speech synthesis technology that enables natural conversation with users is increasing. Therefore, there is a need for a multi-speaker speech synthesis system that can generate voices of various tones. In order to synthesize natural speech, it is required to train with a large-capacity. high-quality speech DB. However, it is very difficult in terms of recording time and cost to collect a high-quality, large-capacity speech database uttered by many speakers. Therefore, it is necessary to train the speech synthesis system using the speech DB of a very large number of speakers with a small amount of training data for each speaker, and a technique for naturally expressing the tone and rhyme of multiple speakers is required. In this paper, we propose a technology for constructing a speaker encoder by applying the deep learning-based x-vector technique used in speaker recognition technology, and synthesizing a new speaker's tone with a small amount of data through the speaker encoder. In the multi-speaker speech synthesis system, the module for synthesizing mel-spectrogram from input text is composed of Tacotron2, and the vocoder generating synthesized speech consists of WaveNet with mixture of logistic distributions applied. The x-vector extracted from the trained speaker embedding neural networks is added to Tacotron2 as an input to express the desired speaker's tone.

Development of Graph based Deep Learning methods for Enhancing the Semantic Integrity of Spaces in BIM Models (BIM 모델 내 공간의 시멘틱 무결성 검증을 위한 그래프 기반 딥러닝 모델 구축에 관한 연구)

  • Lee, Wonbok;Kim, Sihyun;Yu, Youngsu;Koo, Bonsang
    • Korean Journal of Construction Engineering and Management
    • /
    • v.23 no.3
    • /
    • pp.45-55
    • /
    • 2022
  • BIM models allow building spaces to be instantiated and recognized as unique objects independently of model elements. These instantiated spaces provide the required semantics that can be leveraged for building code checking, energy analysis, and evacuation route analysis. However, theses spaces or rooms need to be designated manually, which in practice, lead to errors and omissions. Thus, most BIM models today does not guarantee the semantic integrity of space designations, limiting their potential applicability. Recent studies have explored ways to automate space allocation in BIM models using artificial intelligence algorithms, but they are limited in their scope and relatively low classification accuracy. This study explored the use of Graph Convolutional Networks, an algorithm exclusively tailored for graph data structures. The goal was to utilize not only geometry information but also the semantic relational data between spaces and elements in the BIM model. Results of the study confirmed that the accuracy was improved by about 8% compared to algorithms that only used geometric distinctions of the individual spaces.

Performance Evaluation of YOLOv5s for Brain Hemorrhage Detection Using Computed Tomography Images (전산화단층영상 기반 뇌출혈 검출을 위한 YOLOv5s 성능 평가)

  • Kim, Sungmin;Lee, Seungwan
    • Journal of the Korean Society of Radiology
    • /
    • v.16 no.1
    • /
    • pp.25-34
    • /
    • 2022
  • Brain computed tomography (CT) is useful for brain lesion diagnosis, such as brain hemorrhage, due to non-invasive methodology, 3-dimensional image provision, low radiation dose. However, there has been numerous misdiagnosis owing to a lack of radiologist and heavy workload. Recently, object detection technologies based on artificial intelligence have been developed in order to overcome the limitations of traditional diagnosis. In this study, the applicability of a deep learning-based YOLOv5s model was evaluated for brain hemorrhage detection using brain CT images. Also, the effect of hyperparameters in the trained YOLOv5s model was analyzed. The YOLOv5s model consisted of backbone, neck and output modules. The trained model was able to detect a region of brain hemorrhage and provide the information of the region. The YOLOv5s model was trained with various activation functions, optimizer functions, loss functions and epochs, and the performance of the trained model was evaluated in terms of brain hemorrhage detection accuracy and training time. The results showed that the trained YOLOv5s model is able to provide a bounding box for a region of brain hemorrhage and the accuracy of the corresponding box. The performance of the YOLOv5s model was improved by using the mish activation function, the stochastic gradient descent (SGD) optimizer function and the completed intersection over union (CIoU) loss function. Also, the accuracy and training time of the YOLOv5s model increased with the number of epochs. Therefore, the YOLOv5s model is suitable for brain hemorrhage detection using brain CT images, and the performance of the model can be maximized by using appropriate hyperparameters.

A Korean menu-ordering sentence text-to-speech system using conformer-based FastSpeech2 (콘포머 기반 FastSpeech2를 이용한 한국어 음식 주문 문장 음성합성기)

  • Choi, Yerin;Jang, JaeHoo;Koo, Myoung-Wan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.3
    • /
    • pp.359-366
    • /
    • 2022
  • In this paper, we present the Korean menu-ordering Sentence Text-to-Speech (TTS) system using conformer-based FastSpeech2. Conformer is the convolution-augmented transformer, which was originally proposed in Speech Recognition. Combining two different structures, the Conformer extracts better local and global features. It comprises two half Feed Forward module at the front and the end, sandwiching the Multi-Head Self-Attention module and Convolution module. We introduce the Conformer in Korean TTS, as we know it works well in Korean Speech Recognition. For comparison between transformer-based TTS model and Conformer-based one, we train FastSpeech2 and Conformer-based FastSpeech2. We collected a phoneme-balanced data set and used this for training our models. This corpus comprises not only general conversation, but also menu-ordering conversation consisting mainly of loanwords. This data set is the solution to the current Korean TTS model's degradation in loanwords. As a result of generating a synthesized sound using ParallelWave Gan, the Conformer-based FastSpeech2 achieved superior performance of MOS 4.04. We confirm that the model performance improved when the same structure was changed from transformer to Conformer in the Korean TTS.

Design and Implementation of Sandcastle Play Guide Application using Artificial Intelligence and Augmented Reality (인공지능과 증강현실 기술을 이용한 모래성 놀이 가이드 애플리케이션 설계 및 구현)

  • Ryu, Jeeseung;Jang, Seungwoo;Mun, Yujeong;Lee, Jungjin
    • Journal of the Korea Computer Graphics Society
    • /
    • v.28 no.3
    • /
    • pp.79-89
    • /
    • 2022
  • With the popularity and the advanced graphics hardware technology of mobile devices, various mobile applications that help children with physical activities have been studied. This paper presents SandUp, a mobile application that guides the play of building sand castles using artificial intelligence and augmented reality(AR) technology. In the process of building the sandcastle, children can interactively explore the target virtual sandcastle through the smartphone display using AR technology. In addition, to help children complete the sandcastle, SandUp informs the sand shape and task required step by step and provides visual and auditory feedback while recognizing progress in real-time using the phone's camera and deep learning classification. We prototyped our SandUp app using Flutter and TensorFlow Lite. To evaluate the usability and effectiveness of the proposed SandUp, we conducted a questionnaire survey on 50 adults and a user study on 20 children aged 4~7 years. The survey results showed that SandUp effectively helps build the sandcastle with proper interactive guidance. Based on the results from the user study on children and feedback from their parents, we also derived usability issues that can be further improved and suggested future research directions.

Training of a Siamese Network to Build a Tracker without Using Tracking Labels (샴 네트워크를 사용하여 추적 레이블을 사용하지 않는 다중 객체 검출 및 추적기 학습에 관한 연구)

  • Kang, Jungyu;Song, Yoo-Seung;Min, Kyoung-Wook;Choi, Jeong Dan
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.21 no.5
    • /
    • pp.274-286
    • /
    • 2022
  • Multi-object tracking has been studied for a long time under computer vision and plays a critical role in applications such as autonomous driving and driving assistance. Multi-object tracking techniques generally consist of a detector that detects objects and a tracker that tracks the detected objects. Various publicly available datasets allow us to train a detector model without much effort. However, there are relatively few publicly available datasets for training a tracker model, and configuring own tracker datasets takes a long time compared to configuring detector datasets. Hence, the detector is often developed separately with a tracker module. However, the separated tracker should be adjusted whenever the former detector model is changed. This study proposes a system that can train a model that performs detection and tracking simultaneously using only the detector training datasets. In particular, a Siam network with augmentation is used to compose the detector and tracker. Experiments are conducted on public datasets to verify that the proposed algorithm can formulate a real-time multi-object tracker comparable to the state-of-the-art tracker models.

Comparative study of data augmentation methods for fake audio detection (음성위조 탐지에 있어서 데이터 증강 기법의 성능에 관한 비교 연구)

  • KwanYeol Park;Il-Youp Kwak
    • The Korean Journal of Applied Statistics
    • /
    • v.36 no.2
    • /
    • pp.101-114
    • /
    • 2023
  • The data augmentation technique is effectively used to solve the problem of overfitting the model by allowing the training dataset to be viewed from various perspectives. In addition to image augmentation techniques such as rotation, cropping, horizontal flip, and vertical flip, occlusion-based data augmentation methods such as Cutmix and Cutout have been proposed. For models based on speech data, it is possible to use an occlusion-based data-based augmentation technique after converting a 1D speech signal into a 2D spectrogram. In particular, SpecAugment is an occlusion-based augmentation technique for speech spectrograms. In this study, we intend to compare and study data augmentation techniques that can be used in the problem of false-voice detection. Using data from the ASVspoof2017 and ASVspoof2019 competitions held to detect fake audio, a dataset applied with Cutout, Cutmix, and SpecAugment, an occlusion-based data augmentation method, was trained through an LCNN model. All three augmentation techniques, Cutout, Cutmix, and SpecAugment, generally improved the performance of the model. In ASVspoof2017, Cutmix, in ASVspoof2019 LA, Mixup, and in ASVspoof2019 PA, SpecAugment showed the best performance. In addition, increasing the number of masks for SpecAugment helps to improve performance. In conclusion, it is understood that the appropriate augmentation technique differs depending on the situation and data.

Fake News Detection on YouTube Using Related Video Information (관련 동영상 정보를 활용한 YouTube 가짜뉴스 탐지 기법)

  • Junho Kim;Yongjun Shin;Hyunchul Ahn
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.3
    • /
    • pp.19-36
    • /
    • 2023
  • As advances in information and communication technology have made it easier for anyone to produce and disseminate information, a new problem has emerged: fake news, which is false information intentionally shared to mislead people. Initially spread mainly through text, fake news has gradually evolved and is now distributed in multimedia formats. Since its founding in 2005, YouTube has become the world's leading video platform and is used by most people worldwide. However, it has also become a primary source of fake news, causing social problems. Various researchers have been working on detecting fake news on YouTube. There are content-based and background information-based approaches to fake news detection. Still, content-based approaches are dominant when looking at conventional fake news research and YouTube fake news detection research. This study proposes a fake news detection method based on background information rather than content-based fake news detection. In detail, we suggest detecting fake news by utilizing related video information from YouTube. Specifically, the method detects fake news through CNN, a deep learning network, from the vectorized information obtained from related videos and the original video using Doc2vec, an embedding technique. The empirical analysis shows that the proposed method has better prediction performance than the existing content-based approach to detecting fake news on YouTube. The proposed method in this study contributes to making our society safer and more reliable by preventing the spread of fake news on YouTube, which is highly contagious.

Content-based Korean journal recommendation system using Sentence BERT (Sentence BERT를 이용한 내용 기반 국문 저널추천 시스템)

  • Yongwoo Kim;Daeyoung Kim;Hyunhee Seo;Young-Min Kim
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.3
    • /
    • pp.37-55
    • /
    • 2023
  • With the development of electronic journals and the emergence of various interdisciplinary studies, the selection of journals for publication has become a new challenge for researchers. Even if a paper is of high quality, it may face rejection due to a mismatch between the paper's topic and the scope of the journal. While research on assisting researchers in journal selection has been actively conducted in English, the same cannot be said for Korean journals. In this study, we propose a system that recommends Korean journals for submission. Firstly, we utilize SBERT (Sentence BERT) to embed abstracts of previously published papers at the document level, compare the similarity between new documents and published papers, and recommend journals accordingly. Next, the order of recommended journals is determined by considering the similarity of abstracts, keywords, and title. Subsequently, journals that are similar to the top recommended journal from previous stage are added by using a dictionary of words constructed for each journal, thereby enhancing recommendation diversity. The recommendation system, built using this approach, achieved a Top-10 accuracy level of 76.6%, and the validity of the recommendation results was confirmed through user feedback. Furthermore, it was found that each step of the proposed framework contributes to improving recommendation accuracy. This study provides a new approach to recommending academic journals in the Korean language, which has not been actively studied before, and it has also practical implications as the proposed framework can be easily applied to services.