Search | Korea Science

Estimating speech parameters for ultrasonic Doppler signal using LSTM recurrent neural networks (LSTM 순환 신경망을 이용한 초음파 도플러 신호의 음성 패러미터 추정)

Joo, Hyeong-Kil;Lee, Ki-Seung
- The Journal of the Acoustical Society of Korea
- /
- v.38 no.4
- /
- pp.433-441
- /
- 2019
In this paper, a method of estimating speech parameters for ultrasonic Doppler signals reflected from the articulatory muscles using LSTM (Long Short Term Memory) RNN (Recurrent Neural Networks) was introduced and compared with the method using MLP (Multi-Layer Perceptrons). LSTM RNN were used to estimate the Fourier transform coefficients of speech signals from the ultrasonic Doppler signals. The log energy value of the Mel frequency band and the Fourier transform coefficients, which were extracted respectively from the ultrasonic Doppler signal and the speech signal, were used as the input and reference for training LSTM RNN. The performance of LSTM RNN and MLP was evaluated and compared by experiments using test data, and the RMSE (Root Mean Squared Error) was used as a measure. The RMSE of each experiment was 0.5810 and 0.7380, respectively. The difference was about 0.1570, so that it confirmed that the performance of the method using the LSTM RNN was better.
https://doi.org/10.7776/ASK.2019.38.4.433 인용 PDF KSCI HTML

A study on the vocal characteristics of spoken emotional expressions (구어체 정서표현에 있어서의 음성 특성 연구)

이수정;김명재;김정수
- Science of Emotion and Sensibility
- /
- v.2 no.2
- /
- pp.53-66
- /
- 1999
현 연구에서는 음성합성의 기초자료 수집을 위하여 대화체 감정표현의 음성적인 패러미터를 찾아내려고 시도하였다. 이를 이하여 일단 가장 자주 사용되는 대화체 감정표현자료가 수집되었고 이들 표현을 발화할 때 가장 주의를 기울이는 발성의 특징들이 탐색되었다. 구어체적 감정표현의 타당한 데이터베이스를 작성하기 위하여 20대와 30대로 연령층을 구분하여 자료를 수집, 분석하였다. 그 결과 다양한 감정표현의 발화특성들은 음의 강도, 강도변화, 그리고 음색이 중요한 기준으로 작용하는 것으로 나타났다. 다차원분석 결과 산출된 20대와 30대의 음성표현이 도면은 개별정서들이 음성의 잠재차원 상에서 상당한 일관된 특징을 지님을 보여 주었다.
PDF

A study on the vocal characteristics of spoken emotional expressions (구어체 정서표현에 있어서의 음성 특성 연구)

이수정
- Proceedings of the Korean Society for Emotion and Sensibility Conference
- /
- 1999.11a
- /
- pp.277-291
- /
- 1999
현 연구에서는 음성합성의 기초자료 수집을 위하여 대화체 감정표현의 음성적인 패러미터를 찾아내려고 시도하였다. 이를 위하여 일단 가장 자주 사용되는 대화체 감정 표현자료가 수집되었고 이들 표현을 발화할 때 가장 주의를 기울이는 발성의 특징들이 탐색되었다. 구어체적 감정표현의 타당한 데이타베이스를 작성하기 위하여 20대와 30로 연령층을 구분하여 자료를 수집, 분석하였다. 그 결과 다양한 감정표현의 발화특성들은 음의 강도, 강도변화, 그리고 음색이 중요한 기준으로 작용하는 것으로 나타났다. 다차원 분석 결과 산출된 20대와 30대의 음성표현의 도면은 개별정서들이 음성의 잠재차원 상에서 상당한 일관된 특징을 지님을 보여 주었다.
PDF

Speech Activity Detection using Lip Movement Image Signals (입술 움직임 영상 선호를 이용한 음성 구간 검출)

Kim, Eung-Kyeu
- Journal of the Institute of Convergence Signal Processing
- /
- v.11 no.4
- /
- pp.289-297
- /
- 2010
In this paper, A method to prevent the external acoustic noise from being misrecognized as the speech recognition object is presented in the speech activity detection process for the speech recognition. Also this paper confirmed besides the acoustic energy to the lip movement image signals. First of all, the successive images are obtained through the image camera for personal computer and the lip movement whether or not is discriminated. The next, the lip movement image signal data is stored in the shared memory and shares with the speech recognition process. In the mean time, the acoustic energy whether or not by the utterance of a speaker is verified by confirming data stored in the shared memory in the speech activity detection process which is the preprocess phase of the speech recognition. Finally, as a experimental result of linking the speech recognition processor and the image processor, it is confirmed to be normal progression to the output of the speech recognition result if face to the image camera and speak. On the other hand, it is confirmed not to the output the result of the speech recognition if does not face to the image camera and speak. Also, the initial feature values under off-line are replaced by them. Similarly, the initial template image captured while off-line is replaced with a template image captured under on-line, so the discrimination of the lip movement image tracking is raised. An image processing test bed was implemented to confirm the lip movement image tracking process visually and to analyze the related parameters on a real-time basis. As a result of linking the speech and image processing system, the interworking rate shows 99.3% in the various illumination environments.
PDF KSCI

Search Result 4, Processing Time 0.02 seconds

Estimating speech parameters for ultrasonic Doppler signal using LSTM recurrent neural networks (LSTM 순환 신경망을 이용한 초음파 도플러 신호의 음성 패러미터 추정)

A study on the vocal characteristics of spoken emotional expressions (구어체 정서표현에 있어서의 음성 특성 연구)

A study on the vocal characteristics of spoken emotional expressions (구어체 정서표현에 있어서의 음성 특성 연구)

Speech Activity Detection using Lip Movement Image Signals (입술 움직임 영상 선호를 이용한 음성 구간 검출)

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)