Local Distribution Based Density Clustering for Speaker Diarization

화자분할을 위한 지역적 특성 기반 밀도 클러스터링

  • Received : 2015.03.03
  • Accepted : 2015.07.06
  • Published : 2015.07.31

Abstract

Speaker diarization is the task of assigning unlabeled data to individual speakers. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) has been widely used for this task because of its simplicity and computational efficiency. One challenging issue, however, is that when different clusters in a non-spatial dataset are adjacent to each other, over-clustering may occur, which degrades the performance of DBSCAN. In this paper, we identify the drawbacks of DBSCAN and propose a new density clustering algorithm based on the local distribution around each object. The algorithm applies variable density criteria, derived from the local density and spread of each object, to cluster the data effectively. We compare the proposed algorithm with DBSCAN in terms of clustering accuracy. Experimental results show that the proposed algorithm achieves higher accuracy than DBSCAN without over-clustering, indicating that an approach based on local density and object spread is effective.

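Speaker diarization is the task of assigning previously unlabeled data to individual speakers, and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) has been widely used in this field because of its simplicity and computational efficiency. However, when the data in the clusters are non-spatial and different clusters lie close together and share a boundary, over-clustering occurs and the performance of DBSCAN degrades. This paper describes DBSCAN and its problems and proposes a density-based clustering algorithm built on the local characteristics of each object. The proposed algorithm searches with a criterion that varies according to the local density and degree of spread of each object. Experiments compare DBSCAN with the proposed method and demonstrate the usefulness of the proposed method. The results show that the proposed method does not over-cluster and achieves higher accuracy than DBSCAN, demonstrating that an approach using local characteristics is effective.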
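
The limitation of a single global density threshold, which motivates the abstract's move to local criteria, can be illustrated on toy data. The sketch below is a minimal pure-Python version of standard DBSCAN, not the algorithm proposed in this paper. The function names `region_query` and `dbscan`, the one-dimensional toy points, and the parameter values are all assumptions chosen for illustration. It places a dense group next to a sparser group and runs DBSCAN with two radii.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Standard DBSCAN with one global (eps, min_pts) pair.

    Returns one label per point: 0, 1, ... for clusters, NOISE for outliers.
    """
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_neighbors)
    return labels


# Hypothetical toy data: a dense group (spacing 1) next to a sparse group (spacing 3).
dense = [(x, 0) for x in range(0, 10)]
sparse = [(x, 0) for x in range(12, 31, 3)]
points = dense + sparse

tight = dbscan(points, eps=1.5, min_pts=3)  # radius tuned to the dense group
loose = dbscan(points, eps=3, min_pts=3)    # radius tuned to the sparse group

print("eps=1.5:", tight)
print("eps=3:  ", loose)
```

With `eps=1.5` the dense group is recovered but every point of the sparse group is labeled noise. With `eps=3` the sparse group becomes dense enough to cluster, but the two groups chain together into a single cluster. For this `min_pts`, no single radius separates the two groups, which is the kind of failure a per-object, locally adaptive criterion is meant to avoid.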
