• Title/Summary/Keyword: Speech Processing

Robust Non-negative Matrix Factorization with β-Divergence for Speech Separation

  • Li, Yinan;Zhang, Xiongwei;Sun, Meng
    • ETRI Journal
    • /
    • v.39 no.1
    • /
    • pp.21-29
    • /
    • 2017
  • This paper addresses the problem of unsupervised speech separation based on robust non-negative matrix factorization (RNMF) with β-divergence, when neither speech nor noise training data is available beforehand. We propose a robust version of non-negative matrix factorization, inspired by the recently developed sparse and low-rank decomposition, in which the data matrix is decomposed into the sum of a low-rank matrix and a sparse matrix. Efficient multiplicative update rules to minimize the β-divergence-based cost function are derived. A convolutional extension of the proposed algorithm is also proposed, which considers the time dependency of the non-negative noise bases. Experimental speech separation results show that the proposed convolutional RNMF successfully separates the repeating time-varying spectral structures from the magnitude spectrum of the mixture, and does so without any prior training.
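As background, the multiplicative updates for β-divergence NMF that the paper builds on can be sketched in NumPy. This is a minimal sketch of plain β-NMF only; the sparse-outlier term that makes the paper's RNMF robust, and its convolutional extension, are omitted, and all names here are illustrative:

```python
import numpy as np

def beta_nmf(V, rank, beta=2.0, n_iter=200, seed=0):
    """Multiplicative-update NMF minimizing the beta-divergence D_beta(V || WH).

    A minimal sketch of the classic update rules the paper starts from;
    the robust sparse term of RNMF is NOT included here.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + 1e-9   # nonnegative spectral bases
    H = rng.random((rank, T)) + 1e-9   # nonnegative activations
    eps = 1e-12
    for _ in range(n_iter):
        WH = W @ H + eps
        # standard beta-divergence multiplicative update for H
        H *= (W.T @ (WH ** (beta - 2) * V)) / (W.T @ WH ** (beta - 1) + eps)
        WH = W @ H + eps
        # and the matching update for W
        W *= ((WH ** (beta - 2) * V) @ H.T) / (WH ** (beta - 1) @ H.T + eps)
    return W, H
```

With β = 2 these updates reduce to the familiar Euclidean (Lee–Seung) NMF rules; β = 1 and β = 0 give the KL and Itakura–Saito cases commonly used on magnitude spectra.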

Maximum mutual information estimation linear spectral transform based adaptation (Maximum mutual information estimation을 이용한 linear spectral transformation 기반의 adaptation)

  • Yoo, Bong-Soo;Kim, Dong-Hyun;Yook, Dong-Suk
    • Proceedings of the KSPS conference
    • /
    • 2005.04a
    • /
    • pp.53-56
    • /
    • 2005
  • In this paper, we propose a transformation-based robust adaptation technique that uses maximum mutual information (MMI) estimation for the objective function and linear spectral transformation (LST) for adaptation. LST is an adaptation method that deals with environmental noise in the linear spectral domain, so that a small number of parameters can be used for fast adaptation. The proposed technique, called MMI-LST, is evaluated on the TIMIT and FFMTIMIT corpora and shown to be advantageous when only a small amount of adaptation speech is available.

Performance Evaluation of Nonkeyword Modeling and Postprocessing for Vocabulary-independent Keyword Spotting (가변어휘 핵심어 검출을 위한 비핵심어 모델링 및 후처리 성능평가)

  • Kim, Hyung-Soon;Kim, Young-Kuk;Shin, Young-Wook
    • Speech Sciences
    • /
    • v.10 no.3
    • /
    • pp.225-239
    • /
    • 2003
  • In this paper, we develop a keyword spotting system using a vocabulary-independent speech recognition technique and investigate several non-keyword modeling and post-processing methods to improve its performance. To model non-keyword speech segments, monophone clustering and Gaussian Mixture Models (GMM) are considered. We employ a likelihood ratio scoring method in the post-processing scheme to verify recognition results, with filler models, anti-subword models, and N-best decoding results considered as the alternative hypothesis for likelihood ratio scoring. We also examine different methods of constructing anti-subword models. We evaluate the performance of our system on an automatic telephone exchange service task. The results show that GMM-based non-keyword modeling yields better performance than monophone clustering. In the post-processing experiments, the anti-keyword model based on Kullback-Leibler distance and the N-best decoding method outperform the other methods, reducing keyword recognition errors by more than 50% at a keyword rejection rate of 5%.
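The likelihood ratio verification step described above can be illustrated with a small sketch. The function name, the per-frame duration normalization, and the threshold value are illustrative assumptions, not taken from the paper:

```python
def verify_keyword(loglik_keyword, loglik_anti, n_frames, threshold=0.0):
    """Accept a putative keyword hit if the per-frame log-likelihood ratio
    between the keyword model and the alternative hypothesis model
    (a filler, anti-subword, or N-best competitor) exceeds a threshold.

    Normalizing by the segment length keeps the threshold comparable
    across keywords of different durations.
    """
    llr = (loglik_keyword - loglik_anti) / max(n_frames, 1)
    return llr >= threshold
```

Raising the threshold trades more false rejections of true keywords for fewer false alarms, which is the operating-point trade-off behind figures like "50% error reduction at a 5% rejection rate."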

A Study on the Visible Speech Processing System for the Hearing Impaired (청각 장애자를 위한 시각 음성 처리 시스템에 관한 연구)

  • Kim, Won-Ky;Kim, Nam-Hyun;Yoo, Sun-Kook;Jung, Sung-Hun
    • Proceedings of the KOSOMBE Conference
    • /
    • v.1990 no.05
    • /
    • pp.57-61
    • /
    • 1990
  • The purpose of this study is to aid speech training for the hearing impaired with a visible speech processing system. In brief, the system converts features of the speech signal into graphics on a monitor so that the features of a hearing-impaired speaker's speech can be adjusted toward those of normal speech. The features used in this system are formant and pitch, extracted with digital signal processing techniques such as linear predictive analysis and the AMDF (Average Magnitude Difference Function). To train hearing-impaired speakers' abnormal speech effectively, easily visible feature representations are being studied.

Speech and Textual Data Fusion for Emotion Detection: A Multimodal Deep Learning Approach (감정 인지를 위한 음성 및 텍스트 데이터 퓨전: 다중 모달 딥 러닝 접근법)

  • Edward Dwijayanto Cahyadi;Mi-Hwa Song
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.11a
    • /
    • pp.526-527
    • /
    • 2023
  • Speech emotion recognition (SER) is one of the interesting topics in the machine learning field. A multi-modal speech emotion recognition system offers numerous benefits. This paper explains how to fuse BERT as the text recognizer and a CNN as the speech recognizer to build a multi-modal SER system.

A Novel Algorithm for Discrimination of Voiced Sounds (유성음 구간 검출 알고리즘에 관한 연구)

  • Jang, Gyu-Cheol;Woo, Soo-Young;Yoo, Chang-D.
    • Speech Sciences
    • /
    • v.9 no.3
    • /
    • pp.35-45
    • /
    • 2002
  • A simple algorithm for discriminating voiced sounds in speech is proposed. In addition to low-frequency energy and zero-crossing rate (ZCR), both of which have been widely used for identifying voiced sounds, the proposed algorithm incorporates pitch variation to improve the discrimination rate. On the TIMIT corpus, evaluation results show an improvement of 13% in the discrimination of voiced phonemes over the traditional algorithm using only energy and ZCR.
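The two classic cues the algorithm starts from, frame energy and ZCR, can be computed as in the sketch below. The thresholds and window sizes are illustrative assumptions, full-band rather than low-frequency energy is used for brevity, and the paper's pitch-variation cue is omitted:

```python
import numpy as np

def voiced_frames(x, sr, frame_len=0.025, hop=0.010,
                  energy_thresh=0.1, zcr_thresh=0.25):
    """Frame-level voiced/unvoiced decisions from energy and zero-crossing
    rate: voiced speech tends to have high energy and a low ZCR, while
    unvoiced sounds and silence show the opposite pattern.
    """
    n = int(frame_len * sr)
    h = int(hop * sr)
    decisions = []
    for start in range(0, len(x) - n + 1, h):
        frame = x[start:start + n]
        energy = np.mean(frame ** 2)                       # mean-square energy
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2  # crossings per sample
        decisions.append(energy > energy_thresh and zcr < zcr_thresh)
    return decisions
```

A low-pitched periodic tone trips the voiced test (high energy, few crossings), whereas low-level noise fails it, which is the baseline behavior the paper's pitch-variation cue then refines.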

Speech Activity Detection using Lip Movement Image Signals (입술 움직임 영상 신호를 이용한 음성 구간 검출)

  • Kim, Eung-Kyeu
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.11 no.4
    • /
    • pp.289-297
    • /
    • 2010
  • In this paper, a method is presented to prevent external acoustic noise from being misrecognized as speech during the speech activity detection stage of speech recognition, by using lip movement image signals in addition to acoustic energy. First, successive images are captured with a PC camera and the presence or absence of lip movement is determined. The lip movement image data are then stored in shared memory, where they are accessible to the speech recognition process. During speech activity detection, the preprocessing phase of recognition, whether the detected acoustic energy originates from the speaker's utterance is verified against the data in the shared memory. Experiments linking the speech recognizer and the image processor confirm that a recognition result is produced normally when the user faces the camera and speaks, and that no result is produced when the user speaks without facing the camera. The initial feature values and the template image captured off-line are replaced with those captured on-line, which improves the reliability of lip movement tracking. An image processing test bed was implemented to visually confirm the lip tracking process and to analyze the related parameters in real time. Linking the speech and image processing systems, the interworking rate reached 99.3% under various illumination conditions.

The Real-Time Implementation of G.726 ADPCM on OAK DSP Core based CSD17C00A (OAK DSP Core 기반 CSD17C00A에서의 G.726 ADPCM의 실시간 구현)

  • Hong SeongHoon;Shim MinKyu;Sung YooNa;Ha JungHo
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.52-55
    • /
    • 1999
  • The G.726 coder, which supports multiple bit rates (16, 24, 32, and 40 kbps), uses ADPCM (Adaptive Differential Pulse Code Modulation) coding. In this paper, the G.726 ADPCM algorithm is implemented for real-time applications on the CSD17C00A, a general-purpose DSP for speech signal processing developed by C&S Technology. Bidirectional evaluation of G.726 was performed through a codec loopback test, following the test procedure provided by the ITU-T. The implemented G.726 coder requires an average of 11 MIPS, 2.8K words of program memory, and 550 words of data memory.

A Real-Time Implementation of Speech Recognition System Using Oak DSP core in the Car Noise Environment (자동차 환경에서 Oak DSP 코어 기반 음성 인식 시스템 실시간 구현)

  • Woo, K.H.;Yang, T.Y.;Lee, C.;Youn, D.H.;Cha, I.H.
    • Speech Sciences
    • /
    • v.6
    • /
    • pp.219-233
    • /
    • 1999
  • This paper presents a real-time implementation of a speaker-independent speech recognition system based on a discrete hidden Markov model (DHMM). The system is developed for a car navigation system, targeting an on-chip VLSI speech recognizer built on the fixed-point Oak DSP core from DSP GROUP LTD. We analyze the recognition procedure in C to derive fixed-point real-time algorithms. Based on the analysis, we modify the algorithms so that they run in real time and can deliver the recognition result as soon as the speech ends, by processing all recognition routines within a frame. Car noise is colored noise concentrated heavily in the low-frequency region below 400 Hz. For noise-robust processing, high-pass filtering and liftering of the distance measure on the feature vectors are applied to the recognition system. Recognition experiments on twelve isolated command words were performed. The recognition rates of the baseline recognizer were 98.68% with the car stopped and 80.7% with the car running. With the noise processing methods, the recognition rate while running was enhanced to 89.04%.
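The high-pass filtering used against low-frequency car noise can be approximated by a first-order pre-emphasis filter; this is a generic sketch with an illustrative coefficient, not the paper's exact filter design:

```python
import numpy as np

def preemphasis(x, coeff=0.97):
    """First-order high-pass (pre-emphasis) filter y[n] = x[n] - a*x[n-1].

    Attenuates low-frequency content (where car noise below 400 Hz
    concentrates) while passing high-frequency speech energy.
    """
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    y[1:] = x[1:] - coeff * x[:-1]
    return y
```

At DC the gain is 1 - a (about 0.03 here), while a signal alternating at the Nyquist rate is amplified by 1 + a, so the filter tilts the spectrum away from the noisy low band.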

A Study on Pitch Period Detection Algorithm Based on Rotation Transform of AMDF and Threshold

  • Seo, Hyun-Soo;Kim, Nam-Ho
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.7 no.4
    • /
    • pp.178-183
    • /
    • 2006
  • With the recent rapid development of information and communication technology, much research on speech signal processing has been carried out, and the pitch period is used as an important element in various speech applications such as speech recognition, speaker identification, speech analysis, and speech synthesis. A variety of pitch period detection algorithms for the time and frequency domains have been suggested. Among the time-domain algorithms, the AMDF (average magnitude difference function) uses the distance between two valley points as the calculated pitch period. However, selecting the valley points for pitch period detection makes the algorithm complex. Therefore, in this paper we propose a modified AMDF (M-AMDF) algorithm that takes the global minimum valley point as the pitch period of the speech signal by applying a rotation transform to the AMDF. In addition, a threshold is set on the beginning portion of the speech so that it can serve as the selection criterion for the pitch period. The proposed algorithm is compared with conventional ones by simulation and shows better properties.
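A plain AMDF pitch estimator, without the paper's rotation transform or threshold refinement, can be sketched as follows; the lag range is an illustrative assumption:

```python
import numpy as np

def amdf_pitch_period(frame, min_lag, max_lag):
    """Estimate the pitch period (in samples) as the lag of the deepest
    AMDF valley, where AMDF(k) = mean_n |x[n] - x[n+k]|.

    For a periodic signal the AMDF dips toward zero at the pitch period,
    so the argmin over a plausible lag range gives a baseline estimate.
    """
    n = len(frame)
    amdf = np.array([
        np.mean(np.abs(frame[:n - k] - frame[k:]))
        for k in range(min_lag, max_lag + 1)
    ])
    return min_lag + int(np.argmin(amdf))
```

Because every multiple of the true period is also a valley, practical detectors restrict the lag range (as above) or, as in the paper, reshape the AMDF so that the correct valley is easier to pick out.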