Search | Korea Science

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

Sohee Han;Jisub Um;Hoirin Kim
- Phonetics and Speech Sciences / v.16 no.1 / pp.67-76 / 2024
Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, to the point where it can closely imitate natural human speech. In particular, TTS models that offer diverse voice characteristics and personalized speech are widely used in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper we propose a one-shot multi-speaker TTS system that ensures acoustic diversity and synthesizes personalized voices by generating speech from utterances of unseen target speakers. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. Furthermore, we present not only an English one-shot multi-speaker TTS but also a Korean one-shot multi-speaker TTS. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. In the objective evaluation, the proposed English and Korean one-shot multi-speaker TTS achieved a predicted MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that the proposed model outperforms the baseline models in both naturalness and speaker similarity.
https://doi.org/10.13064/KSSS.2024.16.1.067
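
The abstract states that a RawNet3-based speaker encoder is integrated into FastSpeech2 but not exactly where or how. One common pattern in multi-speaker FastSpeech2 implementations is to project a fixed-size speaker embedding to the encoder's hidden size and add it to every encoder output frame. The sketch below assumes that pattern; `condition_on_speaker` and the projection matrix `proj` are hypothetical, plain Python lists stand in for tensors, and the paper's actual integration point may differ.

```python
from typing import List


def condition_on_speaker(
    encoder_out: List[List[float]],  # T frames x D hidden units (phoneme encoder output)
    speaker_emb: List[float],        # E-dim utterance-level speaker embedding
    proj: List[List[float]],         # hypothetical learned D x E projection matrix
) -> List[List[float]]:
    """Add a linearly projected speaker embedding to every encoder frame (assumed scheme)."""
    if any(len(row) != len(speaker_emb) for row in proj):
        raise ValueError("projection rows must match the speaker embedding size")
    # Project the E-dim embedding to the D-dim hidden size: p = proj @ speaker_emb
    projected = [sum(w * e for w, e in zip(row, speaker_emb)) for row in proj]
    # Broadcast-add the same speaker vector to each time step
    return [[h + p for h, p in zip(frame, projected)] for frame in encoder_out]
```

For example, `condition_on_speaker([[0.0, 1.0], [2.0, 3.0]], [1.0, 0.5, -1.0], [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])` returns `[[1.0, 2.0], [3.0, 4.0]]`. Because the embedding is computed once per reference utterance, the same vector shifts every frame, which is what lets a single unseen-speaker utterance steer the whole output.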

Target Scattering Echo Simulation by Geometry Acoustic Theory (GAT(Geometry Acoustic Theory)에 의한 표적신호 합성)

신기철
- Proceedings of the Acoustical Society of Korea Conference / 1998.06c / pp.473-476 / 1998
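In this study, we present the theoretical background of a target signal synthesis model based on GAT (Geometry Acoustic Theory) and compare the results of the numerical model with experimental data from a scale-model target measured in an acoustic water tank. The GAT-based target signal synthesis model not only adequately describes the sound field produced by the source and the target in a three-dimensional ocean environment, but also precisely computes the effects of the target's shape, enabling high-precision target signal synthesis.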

Composition-based Simulation Speedup Methodology (모델합성 기법을 이용한 시뮬레이션 속도 개선)

이완복;김탁곤
- Proceedings of the Korea Society for Simulation Conference / 2002.11a / pp.91-97 / 2002
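Modular system modeling methods, including the DEVS formalism, are advantageous for modeling complex systems. On the other hand, modular component models reference and copy the state information of other component models, which causes frequent message passing and slows down simulation. Model composition is an operation that merges several component models into one, and it has been widely used in the field of system verification. This paper proposes a method that uses model composition to reduce the number of messages exchanged between component models and thereby improve simulation speed. The proposed method is illustrated in detail through a simple example.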

Composite Beam Element for Nonlinear Seismic Analysis of Steel Frames (강재 골조의 비선형 지진해석을 위한 합성 보 요소)

Kim, Kee Dong;Ko, Man Gi;Yi, Gyu Sei;Hwang, Byoung Kuk
- Journal of Korean Society of Steel Construction / v.14 no.5 s.60 / pp.577-591 / 2002
This study presents a composite beam element for modeling the inelastic behavior of steel beams with composite slabs in steel moment frames subjected to earthquake ground motions, and investigates the effects of composite slabs on the seismic behavior of steel moment frames. The element can be regarded as a single-component series hinge-type model, and its predicted results were consistent with experimental results. The element also performed significantly better than bare steel beam elements. Compared with bare steel models, the composite model predicts both the local deformation demands and the overall response of structural systems under earthquake loading more accurately. These results show that composite slabs can significantly affect both the local and the global predicted responses of steel moment frames.

A Comparative Analysis of Target Strength Estimated Models for Underwater Echo Signal Synthesis (수중 반사신호 합성을 위한 표적강도 예측모델 비교분석)

김부일
- Journal of the Korea Institute of Military Science and Technology / v.4 no.1 / pp.93-103 / 2001
In a high-frequency active sonar, the reflected signal consists mainly of a specular reflection from the surface of the object together with several equivalent scatterers inside it, which are characterized by the spatial distribution of highlights on the object. This study analyzes existing echo signal synthesis models for a simulated target, namely the random distribution model, the equivalent interval distribution model, and the MUTAHID (Modified Underwater TArget by HIlight Distribution) model, and compares the characteristics of the reflected signals synthesized by each model under various conditions. These highlight distribution models can be applied efficiently to simulated target signal synthesis in various real systems that require echo signal synthesis for underwater targets.
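
The highlight models compared above share one core idea: the echo is a sum of time-delayed, amplitude-weighted copies of the transmitted pulse, one copy per highlight, with each delay equal to the two-way travel time 2r/c. The sketch below assumes a CW transmit pulse, a nominal sound speed of 1500 m/s, and highlights placed along the target's projected length either at equal spacing or uniformly at random, which is how the names "equivalent interval" and "random" distribution suggest the models differ. All function names are hypothetical, and the MUTAHID model is not reproduced.

```python
import math
import random
from typing import List, Optional

SOUND_SPEED = 1500.0  # m/s; nominal sound speed in seawater (assumption)


def cw_pulse(freq_hz: float, duration_s: float, fs: float) -> List[float]:
    """Transmitted continuous-wave (CW) pulse sampled at fs."""
    n = int(duration_s * fs)
    return [math.sin(2 * math.pi * freq_hz * k / fs) for k in range(n)]


def highlight_ranges(center_range: float, length: float, n: int, mode: str,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Ranges (m) of n highlights spread along the target's projected length."""
    start = center_range - length / 2
    if mode == "equal":   # equally spaced highlights
        if n == 1:
            return [center_range]
        step = length / (n - 1)
        return [start + i * step for i in range(n)]
    if mode == "random":  # uniformly random highlights
        rng = rng or random.Random()
        return sorted(start + rng.random() * length for _ in range(n))
    raise ValueError(f"unknown mode: {mode}")


def synthesize_echo(pulse: List[float], ranges: List[float],
                    amplitudes: List[float], fs: float) -> List[float]:
    """Sum of delayed, scaled pulse copies, one per highlight."""
    # Two-way travel time 2r/c, rounded to a whole number of samples
    delays = [round(2 * r / SOUND_SPEED * fs) for r in ranges]
    echo = [0.0] * (max(delays) + len(pulse))
    for d, a in zip(delays, amplitudes):
        for k, x in enumerate(pulse):
            echo[d + k] += a * x
    return echo
```

For instance, `highlight_ranges(150.0, 30.0, 4, "equal")` gives `[135.0, 145.0, 155.0, 165.0]`. Rounding delays to whole samples keeps the sketch simple; a finer model would use fractional delays and range- and aspect-dependent amplitudes.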

Time-Dependent Behavior of Partially Composite Beams (부분 강합성보의 시간의존적 거동해석)

곽효경;서영재
- Journal of the Computational Structural Engineering Institute of Korea / v.13 no.4 / pp.461-473 / 2000
This paper presents a numerical model for the time-dependent analysis of steel-concrete composite beams with partial shear connection. A linear partial interaction theory is adopted to formulate the structural slip behavior, and the effects of concrete creep and shrinkage are considered. The proposed model effectively simulates the slip behavior of multi-span continuous composite beams in combination with concrete creep and shrinkage. Finally, correlation studies and several parametric studies are conducted to establish the validity of the proposed model.
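
The abstract does not give the formulation, but the two ingredients it names are usually written as follows. Under linear partial interaction, the shear force per unit length q transmitted by the connectors is proportional to the interface slip s, with connection stiffness k per unit length; u_c and u_s are the longitudinal displacements of the concrete and steel at the interface. For a constant concrete stress σ_c applied at age t_0, the total concrete strain adds creep, through the creep coefficient φ, and shrinkage ε_sh to the elastic strain. These are standard textbook forms offered as an illustration; the paper's own notation and creep model may differ, and under varying stress a step-by-step or age-adjusted formulation is needed.

```latex
\begin{aligned}
q(x) &= k\, s(x), \qquad s(x) = u_{c}(x) - u_{s}(x) \\
\varepsilon_{c}(t) &= \frac{\sigma_{c}}{E_{c}(t_{0})}\left[1 + \varphi(t, t_{0})\right] + \varepsilon_{sh}(t)
\end{aligned}
```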

Facial Expression Synthesis Using 3D Facial Modeling (3차원 얼굴 모델링을 이용한 표정 합성)

심연숙;변혜란;정찬섭
- Proceedings of the Korean Society for Emotion and Sensibility Conference / 1998.11a / pp.40-44 / 1998
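Research on natural facial animation is actively under way in order to provide user-friendly interfaces [5][6]. In this paper, we propose an animation method for natural facial expression synthesis. For facial animation modeled on a specific person, a generic model composed of a 3D mesh is first fitted to that person to obtain a person-specific 3D face model. To synthesize natural Korean facial expressions, we built a generic model that reflects the characteristics of Korean faces, based on prior research on the standard Korean face, and used it to obtain person-specific 3D face models. We used an expression synthesis method based on the anatomical structure of the face, such as the actual facial muscles and skin tissue, so that the resulting facial animation is realistic and natural.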

Experimental Study on the Active Controller of Structures Considering Modeling Uncertainty (구조물의 모델링 불확실성을 고려한 능동 제어기의 실험연구)

민경원;김성춘
- Journal of the Earthquake Engineering Society of Korea / v.4 no.4 / pp.53-61 / 2000
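Designing an active controller requires a mathematical model of the structure to be controlled. However, because an exact model of an infinite-dimensional structure cannot be obtained, the controller is designed using a finite-dimensional, reduced-order model. The error between the actual structure and the reduced-order model degrades controller performance, and the performance degrades further because of controller-structure interaction, uncertainty in disturbances such as earthquakes, and changes in the dynamic characteristics of the structure during an earthquake. These factors act as uncertainties in the structural model required for controller design, so they degrade control performance and can make the response unstable. In this study, a robust control technique that accounts for the uncertainty due to model error was applied to a three-story building model equipped with an active mass driver (AMD), and its control performance and stability were analyzed experimentally. The frequency filters that serve as the weighting functions required by the $\mu$-synthesis robust control technique were selected quantitatively, considering the characteristics of the building and the AMD, the model error, the control law and AMD performance, the measurement noise, and the characteristics of the seismic disturbance. The controller was designed by $\mu$-synthesis, and for comparison of robustness an LQG controller, which does not consider uncertainty, was chosen. Because $\mu$-synthesis guarantees robust control against specified uncertainties, its robustness for a building model with altered dynamic characteristics was compared with the control performance of the LQG controller. The results showed that, for the building with altered dynamic characteristics, only the $\mu$-synthesis controller remained robust enough to maintain effective control.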

Model Composition Methodology for High Speed Simulation (고속 시뮬레이션을 위한 모델합성 방법)

Lee, Wan-Bok
- The Journal of the Korea Contents Association / v.6 no.11 / pp.258-265 / 2006
The DEVS formalism is advantageous for modeling large-scale complex systems and offers good readability, because it specifies discrete event systems hierarchically. However, DEVS models simulate comparatively slowly, because they require frequent message passing between component models at run time. This paper proposes a method, called model composition, for speeding up the simulation of DEVS models. The method can be viewed as a compiled simulation technique that eliminates the run-time interpretation of communication paths between component models. Experimental results show that the transformed DEVS models simulate about 18 times faster than the original ones.
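
The speedup described in the two DEVS model-composition entries comes from resolving inter-component communication paths before the simulation runs instead of interpreting them on every event. The toy sketch below is not a DEVS simulator: component models are reduced to hypothetical pure functions on an acyclic, linear coupling chain, which is enough to contrast run-time message routing with a single composed model. The names `run_coupled` and `compose` are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical component model: maps an input event value to an output value.
Component = Callable[[float], float]


def run_coupled(components: Dict[str, Component], couplings: Dict[str, str],
                source: str, inputs: List[float]) -> Tuple[List[float], int]:
    """Interpret the coupling table at run time and count inter-model messages."""
    results: List[float] = []
    messages = 0
    for x in inputs:
        name, value = source, components[source](x)
        while name in couplings:        # run-time lookup of the communication path
            name = couplings[name]
            messages += 1               # one message passed to the next component
            value = components[name](value)
        results.append(value)
    return results, messages


def compose(components: Dict[str, Component], couplings: Dict[str, str],
            source: str) -> Component:
    """Resolve the (acyclic) coupling chain once and return one merged model."""
    chain = [source]
    while chain[-1] in couplings:
        chain.append(couplings[chain[-1]])
    steps = [components[n] for n in chain]  # path fixed before simulation starts

    def composed(x: float) -> float:
        for step in steps:                  # direct calls: no lookups, no messages
            x = step(x)
        return x

    return composed
```

With `components = {"gen": lambda v: v + 1, "scale": lambda v: 2 * v, "sink": lambda v: v}` and `couplings = {"gen": "scale", "scale": "sink"}`, `run_coupled(components, couplings, "gen", [1.0, 2.0])` returns `([4.0, 6.0], 4)`, while the composed model produces the same outputs with no messages exchanged.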

SKU-Net: Improved U-Net using Selective Kernel Convolution for Retinal Vessel Segmentation

Hwang, Dong-Hwan;Moon, Gwi-Seong;Kim, Yoon
- Journal of the Korea Society of Computer and Information / v.26 no.4 / pp.29-37 / 2021
In this paper, we propose a deep learning-based retinal vessel segmentation model that handles the multi-scale information in fundus images. We integrate selective kernel convolution into a U-Net-based convolutional neural network. The proposed model extracts and segments feature information on retinal blood vessels of various shapes and sizes, which is important for diagnosing eye-related diseases from fundus images. The model consists of standard convolutions and selective kernel convolutions. Whereas a standard convolutional layer extracts information with a single kernel size, selective kernel convolution extracts information from branches with different kernel sizes and combines them by weighting them adaptively through split-attention. To evaluate the proposed model, we used the DRIVE and CHASE DB1 datasets, on which it achieved F1 scores of 82.91% and 81.71%, respectively, confirming that the model is effective for segmenting retinal blood vessels.
https://doi.org/10.9708/jksci.2021.26.04.029
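
The split-attention fusion described in the SKU-Net abstract can be sketched after the original Selective Kernel Networks formulation: sum the branch outputs, squeeze them by global average pooling, pass the result through a reduction layer, then apply a softmax across branches separately for each channel and use those weights to mix the branches. Plain Python lists stand in for tensors, batch normalization is omitted, and `w_reduce` and `w_select` are hypothetical learned weights; the abstract does not give SKU-Net's exact layer configuration.

```python
import math
from typing import List

Feature = List[List[float]]  # C channels, each a flattened H*W feature map


def sk_fuse(branches: List[Feature],
            w_reduce: List[List[float]],         # d x C reduction weights (hypothetical)
            w_select: List[List[List[float]]],   # K x C x d selection weights (hypothetical)
            ) -> Feature:
    """Fuse K branch outputs with channel-wise softmax attention across branches."""
    num_ch = len(branches[0])
    size = [len(ch) for ch in branches[0]]
    # Fuse: element-wise sum over all branches
    fused = [[sum(b[c][i] for b in branches) for i in range(size[c])]
             for c in range(num_ch)]
    # Squeeze: global average pooling per channel
    s = [sum(ch) / len(ch) for ch in fused]
    # Compact descriptor z = ReLU(W_reduce s); batch norm omitted in this sketch
    z = [max(0.0, sum(w * v for w, v in zip(row, s))) for row in w_reduce]
    out: Feature = []
    for c in range(num_ch):
        # Select: one logit per branch for this channel, softmax across branches
        logits = [sum(w * v for w, v in zip(w_select[k][c], z))
                  for k in range(len(branches))]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(logit - m) for logit in logits]
        total = sum(exps)
        attn = [e / total for e in exps]
        # Weighted mix of the branch outputs for channel c
        out.append([sum(attn[k] * branches[k][c][i] for k in range(len(branches)))
                    for i in range(size[c])])
    return out
```

Because the attention weights for each channel sum to one, the layer can lean toward the small-kernel branch for thin vessels and the large-kernel branch for wide ones; with all selection weights set to zero it reduces to a plain average of the branches.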
