• Title/Summary/Keyword: 소리 합성 (sound synthesis)

62 search results

Application and Technology of Voice Synthesis Engine for Music Production (음악제작을 위한 음성합성엔진의 활용과 기술)

  • Park, Byung-Kyu
    • Journal of Digital Contents Society
    • /
    • v.11 no.2
    • /
    • pp.235-242
    • /
    • 2010
  • Unlike past instruments that merely synthesized sounds and tones, voice synthesis engines for music production have reached the level of creating music as if an actual artist were singing. They use samples of human voices naturally connected at the phoneme level across the frequency range. The voice synthesis engine is not limited to music production; it is changing the cultural paradigm through secondary creations of new music forms, including character music concerts, media productions, albums, and mobile services. Current voice synthesis engine technology lets users input pitch, lyrics, and musical expression parameters through a score editor; the engine then mixes and concatenates voice samples drawn from a database to sing. The new music forms derived from this development of computer music have had a significant cultural impact. Accordingly, this paper examines specific case studies and the underlying synthesis technologies so that users can understand voice synthesis engines more easily, which should contribute to a greater variety of music production.
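
As a rough illustration of the score-editor workflow described above (pitch, lyrics, and expression parameters in, concatenated voice samples out), here is a minimal Python sketch. The `load_phoneme` database lookup and the naive resampling pitch shift are illustrative assumptions, not the engine's actual method.

```python
import numpy as np

SR = 44100  # sample rate (Hz)

def load_phoneme(name: str) -> np.ndarray:
    """Hypothetical database lookup: returns a recorded phoneme sample.
    Stubbed here with a vowel-like decaying tone at C4 (261.63 Hz)."""
    t = np.arange(int(0.4 * SR)) / SR
    return np.sin(2 * np.pi * 261.63 * t) * np.exp(-3 * t)

def pitch_shift(sample: np.ndarray, semitones: float) -> np.ndarray:
    """Naive pitch shift by resampling (changes duration as a side effect)."""
    ratio = 2 ** (semitones / 12)
    idx = np.arange(0, len(sample), ratio)
    return np.interp(idx, np.arange(len(sample)), sample)

def sing(score):
    """Concatenate pitch-shifted phoneme samples in score order."""
    out = np.zeros(0)
    for phoneme, semitones in score:
        note = pitch_shift(load_phoneme(phoneme), semitones)
        out = np.concatenate([out, note])
    return out

# 'score' pairs a lyric phoneme with an offset in semitones from C4.
audio = sing([("a", 0), ("i", 4), ("u", 7)])
```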

Interactive Hybrid Media Application Technology: Focusing on MPEG-4 SNHC (인터렉티브 하이브리드 미디어 응용기술 -MPEG-4 SNHC를 중심으로-)

  • 김형곤
    • Broadcasting and Media Magazine
    • /
    • v.3 no.2
    • /
    • pp.44-58
    • /
    • 1998
  • Recent multimedia technology, driven by the digitization of information and the move online, is converging consumer electronics, computing, communications, and broadcasting, and is characterized by interactive hybrid multimedia. Hybrid multimedia is produced by adding 2D/3D graphics and sound generated artificially with computer graphics and MIDI technology to real, natural video and audio. MPEG-4 targets the digital hybrid multimedia coding of such synthetic and natural visual or audio information, enabling content-based processing of mixed media, interaction, and easy user access. SNHC (Synthetic-Natural Hybrid Coding) covers not only the conventional passive delivery of media but also interactive applications requiring real-time processing, and uses a unified spatio-temporal coding scheme to handle standard AV (Aural/Visual) objects in various forms, including audio, video, and 2D/3D computer graphics. Standardization centers on mesh-segmented video coding, structure coding, synchronization between objects, multiplexing of AV object streams, and spatio-temporal integration of mixed media. The ultimate goal is to provide a framework in which multiple users can interact with one another in a networked Virtual Environment. Once such a framework is in place, the new form of information called interactive hybrid multimedia will enable a variety of applications and services that existing media cannot provide.
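
The core SNHC idea above, layering synthetic objects onto natural content, can be pictured with a minimal alpha-compositing sketch; the frame shapes and the single rectangular overlay are illustrative assumptions, not part of the MPEG-4 toolset.

```python
import numpy as np

def composite(natural: np.ndarray, synthetic: np.ndarray,
              alpha: np.ndarray) -> np.ndarray:
    """Blend a synthetic layer over a natural video frame per pixel."""
    return alpha * synthetic + (1.0 - alpha) * natural

# Natural frame: grayscale noise standing in for camera video.
frame = np.random.rand(120, 160)

# Synthetic object: a bright rectangle with its own alpha mask.
overlay = np.zeros_like(frame)
mask = np.zeros_like(frame)
overlay[40:80, 60:120] = 1.0
mask[40:80, 60:120] = 0.8  # 80 % opaque

out = composite(frame, overlay, mask)
```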

A Study on Sound Timbre Learning Using Convolutional Network (음색 러닝을 위한 합성 곱 신경망 모델 분석)

  • Park, So-Hyun;Ihm, Sun-Young;Park, Young-Ho
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2019.05a
    • /
    • pp.470-471
    • /
    • 2019
  • While much research exists on classifying different speech data, studies on learning the timbre of an individual's voice or of individual instruments remain scarce. This paper analyzes convolutional neural network models for timbre learning. Timbre is the composite attribute that allows two sounds to be distinguished even when their pitch and intensity are the same.
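
As a minimal sketch of the kind of model analyzed above, here is a small PyTorch CNN over mel-spectrogram patches; the layer sizes and input shape are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TimbreCNN(nn.Module):
    """Small CNN mapping a mel-spectrogram patch to a timbre class."""
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_classes)

    def forward(self, x):            # x: (batch, 1, 64, 64)
        h = self.features(x)         # -> (batch, 32, 16, 16)
        return self.classifier(h.flatten(1))

logits = TimbreCNN()(torch.randn(8, 1, 64, 64))  # dummy batch
```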

Bird sounds classification by combining PNCC and robust Mel-log filter bank features (PNCC와 robust Mel-log filter bank 특징을 결합한 조류 울음소리 분류)

  • Badi, Alzahra;Ko, Kyungdeuk;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.38 no.1
    • /
    • pp.39-46
    • /
    • 2019
  • In this paper, combining features is proposed as a way to enhance the classification accuracy of sounds in noisy environments using a CNN (Convolutional Neural Network) structure. A robust log Mel-filter bank using a Wiener filter and PNCCs (Power Normalized Cepstral Coefficients) are extracted and combined into a 2-dimensional feature that is used as input to the CNN. An eBird database is used to classify 43 bird species recorded in their natural environment. To evaluate the performance of the combined feature in noisy environments, the database is augmented with 3 types of noise at 4 different SNRs (Signal to Noise Ratios) (20 dB, 10 dB, 5 dB, 0 dB). The combined feature is compared to the log Mel-filter bank with and without the Wiener filter and to the PNCCs. The combined feature outperforms the other features in clean environments, with a 1.34 % increase in overall average accuracy. Additionally, accuracy in noisy environments across the 4 SNR levels increases by 1.06 % and 0.65 % for shop and schoolyard noise backgrounds, respectively.
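
A rough sketch of the feature-stacking idea: log-mel energies plus a second cepstral-style feature combined into one multi-channel CNN input. librosa ships no PNCC implementation, so a plain MFCC stands in for it here, and the Wiener-filtering step is omitted.

```python
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))  # stand-in for a bird call

# Feature 1: log Mel-filter bank energies.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40)
log_mel = librosa.power_to_db(mel)

# Feature 2: MFCCs as a placeholder for PNCCs.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)

# Stack along a channel axis -> (2, 40, frames), a 2-channel CNN input.
combined = np.stack([log_mel, mfcc])
```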

Interactive Interface for Virtual Korean Percussion Instruments (인터렉티브 국악 타악기 인터페이스 제작 연구)

  • Han, Ki-Yul;Park, Sang-Bum;Kim, Jun
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.11
    • /
    • pp.1500-1506
    • /
    • 2011
  • This paper proposes a new digital interface for the four percussion instruments of samulnori: the drum (buk), janggu, jing, and kkwaenggwari. The newly designed interface resembles a jing-shaped percussion instrument and is designed so that all four instruments can be played from a single interface. Two batting surfaces, located on the front and on the body of the interface, generate hit data when the interface is struck, and a control unit in the handle generates pressure data when the handle is gripped. The generated information is transmitted to a computer over a wireless link, and the computer then synthesizes a sound based on the characteristics of each instrument.
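
The receive-and-synthesize step on the computer side might look like the following sketch. The event format and the single-sine-decay voices are hypothetical placeholders, not the paper's actual protocol or timbre models.

```python
import numpy as np

SR = 44100

# Hypothetical mapping from instrument to a rough resonant frequency (Hz);
# the real system would use measured instrument characteristics.
FREQ = {"buk": 80.0, "janggu": 180.0, "jing": 120.0, "kkwaenggwari": 900.0}

def synthesize_hit(instrument: str, velocity: float, pressure: float):
    """Render one hit: velocity scales loudness, grip pressure damps decay."""
    t = np.arange(int(0.5 * SR)) / SR
    decay = 4.0 + 12.0 * pressure  # firmer grip -> faster damping
    return velocity * np.sin(2 * np.pi * FREQ[instrument] * t) * np.exp(-decay * t)

# Events as they might arrive over the wireless link:
# (instrument name, hit velocity, grip pressure).
for event in [("jing", 0.9, 0.1), ("kkwaenggwari", 0.7, 0.6)]:
    audio = synthesize_hit(*event)
```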

Audio-Visual Scene Aware Dialogue System Utilizing Action From Vision and Language Features (이미지-텍스트 자질을 이용한 행동 포착 비디오 기반 대화시스템)

  • Jungwoo Lim;Yoonna Jang;Junyoung Son;Seungyoon Lee;Kinam Park;Heuiseok Lim
    • Annual Conference on Human and Language Technology
    • /
    • 2023.10a
    • /
    • pp.253-257
    • /
    • 2023
  • Recently, various dialogue systems have been deployed in real-world human-machine interfaces such as smartphone assistants, in-car navigation, voice-controlled speakers, and human-centered robots. However, most dialogue systems operate on text alone and cannot handle multimodal input. Solving this requires dialogue systems that integrate multimodal scene understanding, such as video. Existing video-grounded dialogue systems mostly focus on fusing diverse features such as vision, image, and audio, or on aligning images and text well through pre-training, and thus miss important action cues and sound cues. This paper improves a video-grounded dialogue system by exploiting pre-trained image-text alignment embeddings together with action cues and sound cues. The proposed model encodes text, image, and audio embeddings, extracts relevant frames and action cues from them, and then generates an utterance. Experiments on the AVSD dataset show that the proposed model outperforms existing models, and representative image-text features are comparatively analyzed in the video-grounded dialogue setting.
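
A minimal sketch of the encode-and-fuse step described above, in PyTorch; the embedding sizes and the simple concatenation fusion are assumptions, and the actual model's encoders and cue extraction are more involved.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Project text/image/audio embeddings into one space and fuse them."""
    def __init__(self, d_text=768, d_image=512, d_audio=128, d_model=256):
        super().__init__()
        self.text = nn.Linear(d_text, d_model)
        self.image = nn.Linear(d_image, d_model)
        self.audio = nn.Linear(d_audio, d_model)

    def forward(self, text, image, audio):
        # Concatenate projected modalities along the sequence axis so a
        # downstream decoder can attend over all three when generating a reply.
        return torch.cat([self.text(text), self.image(image),
                          self.audio(audio)], dim=1)

fused = MultimodalFusion()(torch.randn(2, 12, 768),   # token embeddings
                           torch.randn(2, 8, 512),    # frame embeddings
                           torch.randn(2, 8, 128))    # audio embeddings
# fused: (2, 28, 256)
```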

Convolutional neural network based amphibian sound classification using covariance and modulogram (공분산과 모듈로그램을 이용한 콘볼루션 신경망 기반 양서류 울음소리 구별)

  • Ko, Kyungdeuk;Park, Sangwook;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.37 no.1
    • /
    • pp.60-65
    • /
    • 2018
  • In this paper, a covariance matrix and a modulogram are proposed for amphibian sound classification using a CNN (Convolutional Neural Network). First, a database is established by collecting amphibian sounds, including endangered species, in the natural environment. To apply the database to a CNN, acoustic signals of different lengths must be standardized. To do so, a covariance matrix, which captures distribution information, and a modulogram, which captures change over time, are extracted and used as input to the CNN. The experiment is conducted by varying the numbers of convolutional and fully-connected layers. For performance assessment, several conventional methods representing various feature extraction and classification approaches are considered. The results confirm that the convolutional layers have a greater impact on performance than the fully-connected layers, and the CNN-based method attains the highest recognition rate among the considered methods, at 99.07 %.
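
The fixed-size trick is easy to see in code: whatever the clip length, the covariance of frame-level features is always d x d. A sketch, assuming MFCCs as the frame features (the paper's exact feature set may differ):

```python
import numpy as np
import librosa

def covariance_feature(y: np.ndarray, sr: int) -> np.ndarray:
    """Variable-length audio -> fixed-size (20, 20) covariance matrix."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)  # (20, frames)
    return np.cov(mfcc)                                 # (20, 20), any length

short, sr = librosa.load(librosa.ex("trumpet"), duration=1.0)
long_, _ = librosa.load(librosa.ex("trumpet"), duration=3.0)
assert covariance_feature(short, sr).shape == covariance_feature(long_, sr).shape
```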

CNN-based Automatic Machine Fault Diagnosis Method Using Spectrogram Images (스펙트로그램 이미지를 이용한 CNN 기반 자동화 기계 고장 진단 기법)

  • Kang, Kyung-Won;Lee, Kyeong-Min
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.21 no.3
    • /
    • pp.121-126
    • /
    • 2020
  • Sound-based machine fault diagnosis is the automatic detection of abnormal sounds in the acoustic emission signals of machines. Conventional methods based on mathematical models had difficulty diagnosing machine failures owing to the complexity of industrial machinery and the presence of nonlinear factors such as noise. We therefore cast machine fault diagnosis as a deep learning-based image classification problem. This paper proposes a CNN-based automatic machine fault diagnosis method using spectrogram images. The proposed method uses the STFT to effectively extract feature vectors from the frequencies generated by machine defects; the extracted feature vectors are converted into spectrogram images and classified by machine status with a CNN. The results show that the proposed method can be used effectively not only to detect defects but also in various sound-based automatic diagnosis systems.
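
The preprocessing path, raw signal to STFT to a log-scaled spectrogram image that a CNN can classify, might look like this; the window and hop sizes are illustrative choices, not the paper's settings.

```python
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))  # stand-in for a machine recording

# STFT magnitude -> log scale, mimicking the spectrogram-image input.
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))
img = librosa.amplitude_to_db(S, ref=np.max)   # (513, frames) "image"

# Normalize to [0, 1] so it can be fed to a CNN like any grayscale image.
img = (img - img.min()) / (img.max() - img.min())
```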

Sound Synthesis of Gayageum by Impulse Responses of Body and Anjok (안족과 몸통의 임펄스 응답을 이용한 가야금 사운드 합성)

  • Cho Sang-Jin;Choi Gin-Kyu;Chong Ui-Pil
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.7 no.3
    • /
    • pp.102-107
    • /
    • 2006
  • In this paper, we propose a method for synthesizing the sound of the Korean plucked string instrument gayageum by physical modeling, using impulse responses of the body and the anjok (bridge). The gayageum consists of three subsystems, string, body, and anjok, treated as a serial combination of linear time-invariant systems. The string can be modeled with a digital delay line, while the body and anjok can be estimated from their impulse responses. We found three resonance frequencies in the body impulse response and implemented the body as a resonator; the anjok was implemented as a high-pass filter in the fundamental frequency band of the gayageum. The RMSEs of the synthesized sounds range from 0.01 to 0.03, and it was difficult to distinguish the synthesized sounds from the original sounds by ear.
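
The digital delay line for the string is essentially the Karplus-Strong idea; a minimal sketch of that stage follows, with the body resonator and anjok high-pass filter estimated from impulse responses omitted.

```python
import numpy as np

SR = 44100

def plucked_string(f0: float, dur: float = 2.0, damping: float = 0.996):
    """Karplus-Strong style digital delay line for the string subsystem.
    The paper's body resonator and anjok high-pass filter would be
    applied in series after this stage."""
    n = int(SR / f0)                   # delay-line length sets the pitch
    buf = np.random.uniform(-1, 1, n)  # pluck excitation
    out = np.empty(int(dur * SR))
    for i in range(len(out)):
        out[i] = buf[i % n]
        # Two-tap averaging in the feedback loop damps high frequencies.
        buf[i % n] = damping * 0.5 * (buf[i % n] + buf[(i + 1) % n])
    return out

audio = plucked_string(261.63)  # roughly C4
```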

Performance comparison of lung sound classification using various convolutional neural networks (다양한 합성곱 신경망 방식을 이용한 폐음 분류 방식의 성능 비교)

  • Kim, Gee Yeun;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.38 no.5
    • /
    • pp.568-573
    • /
    • 2019
  • In the diagnosis of pulmonary diseases, auscultation is simpler than other methods, and lung sounds can be used both to identify patients with pulmonary diseases and to predict the type of disease. In this paper, we identify patients with pulmonary diseases and classify lung sounds according to their acoustic characteristics using various convolutional neural networks, and we compare the classification performance of each network. First, lung sounds over the affected areas of the chest are collected with a single-channel lung sound recording device; spectral features are extracted from the collected time-domain sounds and applied to each network. As classification methods, we use general, parallel, and residual convolutional neural networks and compare their lung sound classification performance through experiments.
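
Of the three architectures compared, the residual variant differs mainly in its skip connections; a minimal residual block sketch in PyTorch (channel count and input shape are illustrative assumptions):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv block with a skip connection, the 'residual' variant above."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(self.body(x) + x)  # skip connection

y = ResidualBlock()(torch.randn(4, 32, 40, 40))
```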