References
- D. Lee, S. Kim, and Y. Kay, "A speech recognition system based on a new endpoint estimation method jointly using audio/video informations," Journal of Broadcast Engineering, Vol. 8, No. 2, pp. 198-203, 2003.
- G. Kim, J. Ryu, and N. Cho, "Voice activity detection using motion and variation of intensity in the mouth region," Journal of Broadcast Engineering, Vol. 17, No. 3, pp. 519-528, 2012. https://doi.org/10.5909/JBE.2012.17.3.519
- DARPA Broadcast News Transcription and Understanding Workshop, 1998.
- T. Hain and P. C. Woodland, "Segmentation and classification of broadcast news audio," Proceedings of International Conference on Spoken Language Processing (ICSLP), pp. 2727-2730, 1998.
- L. Lu, H. J. Zhang, and S. Z. Li, "Content-based audio classification and segmentation by using support vector machines," Multimedia Systems, Vol. 8, No. 6, pp. 482-492, 2003. https://doi.org/10.1007/s00530-002-0065-0
- T. L. Nwe and H. Li, "Broadcast news segmentation by audio type analysis," Proceedings of 2005 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2005.
- A. Misra, "Speech/nonspeech segmentation in web video," Proceedings of 13th Annual Conference of the International Speech Communication Association (INTERSPEECH 2012), September 9-13, Portland, Oregon, USA, pp. 1977-1980, 2012.
- N. Ryant, M. Liberman, and J. Yuan, "Speech activity detection on YouTube using deep neural network," Proceedings of 14th Annual Conference of the International Speech Communication Association (INTERSPEECH 2013), August 25-29, Lyon, France, pp. 728-731, 2013.
- F. Eyben, F. Weninger, S. Squartini and B. Schuller, "Real-life voice activity detection with LSTM recurrent neural networks and an application to Hollywood movies," Proceedings of 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 483-487, 2013.
- M.A. Pitt, L. Dilley, K. Johnson, S. Kiesling, W. Raymond, E. Hume, and E. Fosler-Lussier, Buckeye Corpus of Conversational Speech (2nd release), Department of Psychology, Ohio State University (Distributor), Columbus, OH, USA, 2007, www.buckeyecorpus.osu.edu (accessed Aug. 18, 2017).
- J.S. Garofolo, L.F. Lamel, W.M. Fisher, J.G. Fiscus, D.S. Pallett, N.L. Dahlgren, and V. Zue, "TIMIT acoustic-phonetic continuous speech corpus," 1993, https://catalog.ldc.upenn.edu/ldc93s1 (accessed Aug. 18, 2017).
- B. Lehner, G. Widmer and R. Sonnleitner, "Improving voice activity detection in movies," Proceedings of 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), September 6-10, Dresden, Germany, pp. 2942-2946, 2015.
- I. Jang, C. Ahn, and Y. Jang, "Non-dialog section detection for the descriptive video service contents authoring," Journal of Broadcast Engineering, Vol. 19, No. 3, pp. 296-306, 2014. https://doi.org/10.5909/JBE.2014.19.3.296
- I. Jang, C. Ahn, J. Seo, and Y. Jang, "Enhanced feature extraction for speech detection in media audio," Proceedings of 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), August 20-24, Stockholm, Sweden, pp. 479-483, 2017.
- D. FitzGerald, "Harmonic/percussive separation using median filtering," Proceedings of the 13th International Conference on Digital Audio Effects (DAFx-10), 2010.
- C. Hsu, D. Wang, J. R. Jang, and K. Hu, "A tandem algorithm for singing pitch extraction and voice separation from music accompaniment," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, No. 5, pp. 1482-1491, 2012. https://doi.org/10.1109/TASL.2011.2182510
- R. Füg, A. Niedermeier, J. Driedger, S. Disch, and M. Müller, "Harmonic-percussive-residual sound separation using the structure tensor on spectrograms," Proceedings of 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
- D. FitzGerald and M. Gainza, "Single channel vocal separation using median filtering and factorisation techniques," ISAST Transactions on Electronic and Signal Processing, Vol. 4, No. 1, pp. 62-73, 2010.
- S. Leglaive, R. Hennequin, and R. Badeau, "Singing voice detection with deep recurrent neural networks," Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 121-125, 2015.