• Title/Summary/Keyword: Deep Learning Model

Search Result 2,840, Processing Time 0.03 seconds

A Korean menu-ordering sentence text-to-speech system using conformer-based FastSpeech2 (콘포머 기반 FastSpeech2를 이용한 한국어 음식 주문 문장 음성합성기)

  • Choi, Yerin;Jang, JaeHoo;Koo, Myoung-Wan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.3
    • /
    • pp.359-366
    • /
    • 2022
  • In this paper, we present the Korean menu-ordering Sentence Text-to-Speech (TTS) system using conformer-based FastSpeech2. Conformer is the convolution-augmented transformer, which was originally proposed in Speech Recognition. Combining two different structures, the Conformer extracts better local and global features. It comprises two half Feed Forward module at the front and the end, sandwiching the Multi-Head Self-Attention module and Convolution module. We introduce the Conformer in Korean TTS, as we know it works well in Korean Speech Recognition. For comparison between transformer-based TTS model and Conformer-based one, we train FastSpeech2 and Conformer-based FastSpeech2. We collected a phoneme-balanced data set and used this for training our models. This corpus comprises not only general conversation, but also menu-ordering conversation consisting mainly of loanwords. This data set is the solution to the current Korean TTS model's degradation in loanwords. As a result of generating a synthesized sound using ParallelWave Gan, the Conformer-based FastSpeech2 achieved superior performance of MOS 4.04. We confirm that the model performance improved when the same structure was changed from transformer to Conformer in the Korean TTS.

Comparative study of data augmentation methods for fake audio detection (음성위조 탐지에 있어서 데이터 증강 기법의 성능에 관한 비교 연구)

  • KwanYeol Park;Il-Youp Kwak
    • The Korean Journal of Applied Statistics
    • /
    • v.36 no.2
    • /
    • pp.101-114
    • /
    • 2023
  • The data augmentation technique is effectively used to solve the problem of overfitting the model by allowing the training dataset to be viewed from various perspectives. In addition to image augmentation techniques such as rotation, cropping, horizontal flip, and vertical flip, occlusion-based data augmentation methods such as Cutmix and Cutout have been proposed. For models based on speech data, it is possible to use an occlusion-based data-based augmentation technique after converting a 1D speech signal into a 2D spectrogram. In particular, SpecAugment is an occlusion-based augmentation technique for speech spectrograms. In this study, we intend to compare and study data augmentation techniques that can be used in the problem of false-voice detection. Using data from the ASVspoof2017 and ASVspoof2019 competitions held to detect fake audio, a dataset applied with Cutout, Cutmix, and SpecAugment, an occlusion-based data augmentation method, was trained through an LCNN model. All three augmentation techniques, Cutout, Cutmix, and SpecAugment, generally improved the performance of the model. In ASVspoof2017, Cutmix, in ASVspoof2019 LA, Mixup, and in ASVspoof2019 PA, SpecAugment showed the best performance. In addition, increasing the number of masks for SpecAugment helps to improve performance. In conclusion, it is understood that the appropriate augmentation technique differs depending on the situation and data.

Development of Prediction Model for Yard Tractor Working Time in Container Terminal (컨테이너 터미널 야드 트랙터 작업시간 예측 모형 개발)

  • Jae-Young Shin;Do-Eun Lee;Yeong-Il Kim
    • Proceedings of the Korean Institute of Navigation and Port Research Conference
    • /
    • 2023.05a
    • /
    • pp.57-58
    • /
    • 2023
  • The working time for loading and transporting containers in the container terminal is one of the factors directly related to port productivity, and minimizing working time for these operations can maximize port productivity. Among working time for container operations, the working time of yard tractors(Y/T) responsible for the transportation of containers between berth and yard is a significant portion. However, it is difficult to estimate the working time of yard tractors quantitatively, although it is possible to estimate it based on the practical experience of terminal operators. Recently, a technology based on IoT(Internet of Things), one of the core technologies of the 4th industrial revolution, is being studied to monitoring and tracking logistics resources within the port in real-time and calculate working time, but it is challenging to commercialize this technology at the actual port site. Therefore, this study aims to develop yard tractor working time prediction model to enhance the operational efficiency of the container terminal. To develop the prediction model, we analyze actual port operation data to identify factors that affect the yard tractor's works and predict its working time accordingly.

  • PDF

Detection of video editing points using facial keypoints (얼굴 특징점을 활용한 영상 편집점 탐지)

  • Joshep Na;Jinho Kim;Jonghyuk Park
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.4
    • /
    • pp.15-30
    • /
    • 2023
  • Recently, various services using artificial intelligence(AI) are emerging in the media field as well However, most of the video editing, which involves finding an editing point and attaching the video, is carried out in a passive manner, requiring a lot of time and human resources. Therefore, this study proposes a methodology that can detect the edit points of video according to whether person in video are spoken by using Video Swin Transformer. First, facial keypoints are detected through face alignment. To this end, the proposed structure first detects facial keypoints through face alignment. Through this process, the temporal and spatial changes of the face are reflected from the input video data. And, through the Video Swin Transformer-based model proposed in this study, the behavior of the person in the video is classified. Specifically, after combining the feature map generated through Video Swin Transformer from video data and the facial keypoints detected through Face Alignment, utterance is classified through convolution layers. In conclusion, the performance of the image editing point detection model using facial keypoints proposed in this paper improved from 87.46% to 89.17% compared to the model without facial keypoints.

Research on Generative AI for Korean Multi-Modal Montage App (한국형 멀티모달 몽타주 앱을 위한 생성형 AI 연구)

  • Lim, Jeounghyun;Cha, Kyung-Ae;Koh, Jaepil;Hong, Won-Kee
    • Journal of Service Research and Studies
    • /
    • v.14 no.1
    • /
    • pp.13-26
    • /
    • 2024
  • Multi-modal generation is the process of generating results based on a variety of information, such as text, images, and audio. With the rapid development of AI technology, there is a growing number of multi-modal based systems that synthesize different types of data to produce results. In this paper, we present an AI system that uses speech and text recognition to describe a person and generate a montage image. While the existing montage generation technology is based on the appearance of Westerners, the montage generation system developed in this paper learns a model based on Korean facial features. Therefore, it is possible to create more accurate and effective Korean montage images based on multi-modal voice and text specific to Korean. Since the developed montage generation app can be utilized as a draft montage, it can dramatically reduce the manual labor of existing montage production personnel. For this purpose, we utilized persona-based virtual person montage data provided by the AI-Hub of the National Information Society Agency. AI-Hub is an AI integration platform aimed at providing a one-stop service by building artificial intelligence learning data necessary for the development of AI technology and services. The image generation system was implemented using VQGAN, a deep learning model used to generate high-resolution images, and the KoDALLE model, a Korean-based image generation model. It can be confirmed that the learned AI model creates a montage image of a face that is very similar to what was described using voice and text. To verify the practicality of the developed montage generation app, 10 testers used it and more than 70% responded that they were satisfied. The montage generator can be used in various fields, such as criminal detection, to describe and image facial features.

Calculation of Damage to Whole Crop Corn Yield by Abnormal Climate Using Machine Learning (기계학습모델을 이용한 이상기상에 따른 사일리지용 옥수수 생산량에 미치는 피해 산정)

  • Ji Yung Kim;Jae Seong Choi;Hyun Wook Jo;Moonju Kim;Byong Wan Kim;Kyung Il Sung
    • Journal of The Korean Society of Grassland and Forage Science
    • /
    • v.43 no.1
    • /
    • pp.11-21
    • /
    • 2023
  • This study was conducted to estimate the damage of Whole Crop Corn (WCC; Zea Mays L.) according to abnormal climate using machine learning as the Representative Concentration Pathway (RCP) 4.5 and present the damage through mapping. The collected WCC data was 3,232. The climate data was collected from the Korea Meteorological Administration's meteorological data open portal. The machine learning model used DeepCrossing. The damage was calculated using climate data from the automated synoptic observing system (ASOS, 95 sites) by machine learning. The calculation of damage was the difference between the dry matter yield (DMY)normal and DMYabnormal. The normal climate was set as the 40-year of climate data according to the year of WCC data (1978-2017). The level of abnormal climate by temperature and precipitation was set as RCP 4.5 standard. The DMYnormal ranged from 13,845-19,347 kg/ha. The damage of WCC which was differed depending on the region and level of abnormal climate where abnormal temperature and precipitation occurred. The damage of abnormal temperature in 2050 and 2100 ranged from -263 to 360 and -1,023 to 92 kg/ha, respectively. The damage of abnormal precipitation in 2050 and 2100 was ranged from -17 to 2 and -12 to 2 kg/ha, respectively. The maximum damage was 360 kg/ha that the abnormal temperature in 2050. As the average monthly temperature increases, the DMY of WCC tends to increase. The damage calculated through the RCP 4.5 standard was presented as a mapping using QGIS. Although this study applied the scenario in which greenhouse gas reduction was carried out, additional research needs to be conducted applying an RCP scenario in which greenhouse gas reduction is not performed.

Predictive Clustering-based Collaborative Filtering Technique for Performance-Stability of Recommendation System (추천 시스템의 성능 안정성을 위한 예측적 군집화 기반 협업 필터링 기법)

  • Lee, O-Joun;You, Eun-Soon
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.119-142
    • /
    • 2015
  • With the explosive growth in the volume of information, Internet users are experiencing considerable difficulties in obtaining necessary information online. Against this backdrop, ever-greater importance is being placed on a recommender system that provides information catered to user preferences and tastes in an attempt to address issues associated with information overload. To this end, a number of techniques have been proposed, including content-based filtering (CBF), demographic filtering (DF) and collaborative filtering (CF). Among them, CBF and DF require external information and thus cannot be applied to a variety of domains. CF, on the other hand, is widely used since it is relatively free from the domain constraint. The CF technique is broadly classified into memory-based CF, model-based CF and hybrid CF. Model-based CF addresses the drawbacks of CF by considering the Bayesian model, clustering model or dependency network model. This filtering technique not only improves the sparsity and scalability issues but also boosts predictive performance. However, it involves expensive model-building and results in a tradeoff between performance and scalability. Such tradeoff is attributed to reduced coverage, which is a type of sparsity issues. In addition, expensive model-building may lead to performance instability since changes in the domain environment cannot be immediately incorporated into the model due to high costs involved. Cumulative changes in the domain environment that have failed to be reflected eventually undermine system performance. This study incorporates the Markov model of transition probabilities and the concept of fuzzy clustering with CBCF to propose predictive clustering-based CF (PCCF) that solves the issues of reduced coverage and of unstable performance. The method improves performance instability by tracking the changes in user preferences and bridging the gap between the static model and dynamic users. Furthermore, the issue of reduced coverage also improves by expanding the coverage based on transition probabilities and clustering probabilities. The proposed method consists of four processes. First, user preferences are normalized in preference clustering. Second, changes in user preferences are detected from review score entries during preference transition detection. Third, user propensities are normalized using patterns of changes (propensities) in user preferences in propensity clustering. Lastly, the preference prediction model is developed to predict user preferences for items during preference prediction. The proposed method has been validated by testing the robustness of performance instability and scalability-performance tradeoff. The initial test compared and analyzed the performance of individual recommender systems each enabled by IBCF, CBCF, ICFEC and PCCF under an environment where data sparsity had been minimized. The following test adjusted the optimal number of clusters in CBCF, ICFEC and PCCF for a comparative analysis of subsequent changes in the system performance. The test results revealed that the suggested method produced insignificant improvement in performance in comparison with the existing techniques. In addition, it failed to achieve significant improvement in the standard deviation that indicates the degree of data fluctuation. Notwithstanding, it resulted in marked improvement over the existing techniques in terms of range that indicates the level of performance fluctuation. The level of performance fluctuation before and after the model generation improved by 51.31% in the initial test. Then in the following test, there has been 36.05% improvement in the level of performance fluctuation driven by the changes in the number of clusters. This signifies that the proposed method, despite the slight performance improvement, clearly offers better performance stability compared to the existing techniques. Further research on this study will be directed toward enhancing the recommendation performance that failed to demonstrate significant improvement over the existing techniques. The future research will consider the introduction of a high-dimensional parameter-free clustering algorithm or deep learning-based model in order to improve performance in recommendations.

Development of Intelligent Severity of Atopic Dermatitis Diagnosis Model using Convolutional Neural Network (합성곱 신경망(Convolutional Neural Network)을 활용한 지능형 아토피피부염 중증도 진단 모델 개발)

  • Yoon, Jae-Woong;Chun, Jae-Heon;Bang, Chul-Hwan;Park, Young-Min;Kim, Young-Joo;Oh, Sung-Min;Jung, Joon-Ho;Lee, Suk-Jun;Lee, Ji-Hyun
    • Management & Information Systems Review
    • /
    • v.36 no.4
    • /
    • pp.33-51
    • /
    • 2017
  • With the advent of 'The Forth Industrial Revolution' and the growing demand for quality of life due to economic growth, needs for the quality of medical services are increasing. Artificial intelligence has been introduced in the medical field, but it is rarely used in chronic skin diseases that directly affect the quality of life. Also, atopic dermatitis, a representative disease among chronic skin diseases, has a disadvantage in that it is difficult to make an objective diagnosis of the severity of lesions. The aim of this study is to establish an intelligent severity recognition model of atopic dermatitis for improving the quality of patient's life. For this, the following steps were performed. First, image data of patients with atopic dermatitis were collected from the Catholic University of Korea Seoul Saint Mary's Hospital. Refinement and labeling were performed on the collected image data to obtain training and verification data that suitable for the objective intelligent atopic dermatitis severity recognition model. Second, learning and verification of various CNN algorithms are performed to select an image recognition algorithm that suitable for the objective intelligent atopic dermatitis severity recognition model. Experimental results showed that 'ResNet V1 101' and 'ResNet V2 50' were measured the highest performance with Erythema and Excoriation over 90% accuracy, and 'VGG-NET' was measured 89% accuracy lower than the two lesions due to lack of training data. The proposed methodology demonstrates that the image recognition algorithm has high performance not only in the field of object recognition but also in the medical field requiring expert knowledge. In addition, this study is expected to be highly applicable in the field of atopic dermatitis due to it uses image data of actual atopic dermatitis patients.

  • PDF

A LSTM Based Method for Photovoltaic Power Prediction in Peak Times Without Future Meteorological Information (미래 기상정보를 사용하지 않는 LSTM 기반의 피크시간 태양광 발전량 예측 기법)

  • Lee, Donghun;Kim, Kwanho
    • The Journal of Society for e-Business Studies
    • /
    • v.24 no.4
    • /
    • pp.119-133
    • /
    • 2019
  • Recently, the importance prediction of photovoltaic power (PV) is considered as an essential function for scheduling adjustments, deciding on storage size, and overall planning for stable operation of PV facility systems. In particular, since most of PV power is generated in peak time, PV power prediction in a peak time is required for the PV system operators that enable to maximize revenue and sustainable electricity quantity. Moreover, Prediction of the PV power output in peak time without meteorological information such as solar radiation, cloudiness, the temperature is considered a challenging problem because it has limitations that the PV power was predicted by using predicted uncertain meteorological information in a wide range of areas in previous studies. Therefore, this paper proposes the LSTM (Long-Short Term Memory) based the PV power prediction model only using the meteorological, seasonal, and the before the obtained PV power before peak time. In this paper, the experiment results based on the proposed model using the real-world data shows the superior performance, which showed a positive impact on improving the PV power in a peak time forecast performance targeted in this study.

Implementation of Interactive Media Content Production Framework based on Gesture Recognition (제스처 인식 기반의 인터랙티브 미디어 콘텐츠 제작 프레임워크 구현)

  • Koh, You-jin;Kim, Tae-Won;Kim, Yong-Goo;Choi, Yoo-Joo
    • Journal of Broadcast Engineering
    • /
    • v.25 no.4
    • /
    • pp.545-559
    • /
    • 2020
  • In this paper, we propose a content creation framework that enables users without programming experience to easily create interactive media content that responds to user gestures. In the proposed framework, users define the gestures they use and the media effects that respond to them by numbers, and link them in a text-based configuration file. In the proposed framework, the interactive media content that responds to the user's gesture is linked with the dynamic projection mapping module to track the user's location and project the media effects onto the user. To reduce the processing speed and memory burden of the gesture recognition, the user's movement is expressed as a gray scale motion history image. We designed a convolutional neural network model for gesture recognition using motion history images as input data. The number of network layers and hyperparameters of the convolutional neural network model were determined through experiments that recognize five gestures, and applied to the proposed framework. In the gesture recognition experiment, we obtained a recognition accuracy of 97.96% and a processing speed of 12.04 FPS. In the experiment connected with the three media effects, we confirmed that the intended media effect was appropriately displayed in real-time according to the user's gesture.