http://dx.doi.org/10.7840/kics.2015.40.4.619

Speech Basis Matrix Using Noise Data and NMF-Based Speech Enhancement Scheme  

Kwon, Kisoo (Department of Electrical and Computer Engineering and the Institute of New Media and Communications, Seoul National University)
Kim, Hyung Young (Department of Electrical and Computer Engineering and the Institute of New Media and Communications, Seoul National University)
Kim, Nam Soo (Department of Electrical and Computer Engineering and the Institute of New Media and Communications, Seoul National University)
Abstract
This paper presents a speech enhancement method based on non-negative matrix factorization (NMF). In the training phase, a basis matrix for each source signal is learned from an appropriate database, and these basis matrices are then used for source separation. The performance of speech enhancement therefore depends heavily on the basis matrices. In the proposed method, the speech basis matrix is trained so that it yields a high reconstruction error for the noise signal. This method performs better than standard NMF, in which each basis matrix is trained independently. For comparison, we also propose an alternative method and evaluate a previously published method. Performance is evaluated in terms of the perceptual evaluation of speech quality (PESQ) and the signal-to-distortion ratio (SDR), and the proposed method outperforms the other methods.
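The supervised NMF pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes non-negative magnitude spectrograms, the generalized Kullback-Leibler (KL) divergence, Lee-Seung multiplicative updates, and a Wiener-style soft mask for the final speech estimate. All function names and the toy data are hypothetical. The sketch covers the baseline in which speech and noise bases are trained independently and then held fixed during separation, plus a `reconstruction_error` helper that measures how well a fixed basis explains a given signal. The proposed method would additionally train the speech basis so that the noise reconstruction error is high; that joint training criterion is not reproduced here.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def kl_div(V, Vhat):
    """Generalized KL divergence D(V || Vhat), summed over all entries."""
    V = V + EPS
    Vhat = Vhat + EPS
    return float(np.sum(V * np.log(V / Vhat) - V + Vhat))


def nmf_kl(V, rank, n_iter=200, W=None, seed=0):
    """Multiplicative-update NMF under the KL divergence.

    If W is given, it is held fixed and only the activations H are updated
    (rank is then ignored); this is how pre-trained bases are used at
    enhancement time.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    learn_W = W is None
    if learn_W:
        W = rng.random((F, rank)) + EPS
    H = rng.random((W.shape[1], T)) + EPS
    ones = np.ones_like(V, dtype=float)
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / (W.T @ ones + EPS)
        if learn_W:
            W *= ((V / (W @ H + EPS)) @ H.T) / (ones @ H.T + EPS)
            # Normalize basis columns and rescale H so that W @ H is unchanged.
            norms = W.sum(axis=0, keepdims=True) + EPS
            W /= norms
            H *= norms.T
    return W, H


def reconstruction_error(V, W, n_iter=200):
    """KL error when V is approximated using only the fixed bases W."""
    _, H = nmf_kl(V, rank=W.shape[1], n_iter=n_iter, W=W)
    return kl_div(V, W @ H)


def enhance(V_noisy, W_speech, W_noise, n_iter=200):
    """Separate a noisy magnitude spectrogram with fixed, pre-trained bases.

    Returns a soft-mask (Wiener-style) estimate of the speech magnitude.
    """
    W = np.hstack([W_speech, W_noise])
    _, H = nmf_kl(V_noisy, rank=W.shape[1], n_iter=n_iter, W=W)
    k = W_speech.shape[1]
    S_hat = W_speech @ H[:k]   # speech part of the model
    N_hat = W_noise @ H[k:]    # noise part of the model
    mask = S_hat / (S_hat + N_hat + EPS)
    return mask * V_noisy


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    F, T = 64, 100
    # Hypothetical toy "spectrograms" occupying disjoint frequency bands.
    speech = np.zeros((F, T))
    speech[:32] = rng.random((32, T))
    noise = np.zeros((F, T))
    noise[32:] = rng.random((32, T))
    W_s, _ = nmf_kl(speech, rank=8)   # speech basis, trained on speech only
    W_n, _ = nmf_kl(noise, rank=8)    # noise basis, trained on noise only
    est = enhance(speech + noise, W_s, W_n)
    print("noise error with speech basis:", reconstruction_error(noise, W_s))
    print("noise error with noise basis: ", reconstruction_error(noise, W_n))
```

In this toy case the two sources do not overlap in frequency, so the speech basis reconstructs noise poorly and separation is nearly perfect. With real speech and noise spectra the bases overlap, which is the situation that motivates training the speech basis to reconstruct noise poorly.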
Keywords
communication; signal processing; Neutral systems; Communication Sciences; Network;