A Study about the Users' Preferred Playing Speeds on Categorized Video Content using WSOLA method

  • Kim, I-Gil (KT Institute of Convergence Technology)
  • Received: 2015.03.13
  • Reviewed: 2015.04.30
  • Published: 2015.04.30

Abstract

In a rapidly evolving IT environment, video content is moving away from one-way television viewing toward VOD (Video on Demand), which can be watched anytime, anywhere, on a variety of devices. Because the content is digital, this shift gives users an additional benefit: playback speed can be adjusted freely. Many video players today offer a fine-speed-control function that lets viewers speed through dull material and slow down for scenes of interest. To understand the content accurately during speed-controlled playback, hearing the audio matters as much as seeing the picture, so audio time-scale modification techniques that reduce the distortion arising at faster or slower than normal speed have been developed steadily in the speech-processing field. This paper reviews one such well-regarded algorithm, WSOLA (Waveform-Similarity-Based Overlap-Add), and analyzes how well a fine-speed-control function actually meets users' needs when watching video. In particular, video content is categorized by the user's purpose in consuming it, preferred playback speeds are surveyed for each category, and the results are analyzed. On that basis, the paper proposes that a fine-speed-control function should offer playback speeds suited to the consumption purpose of each content type.
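The idea behind WSOLA, the algorithm the abstract refers to, can be sketched roughly as follows. Output frames are overlap-added at a fixed hop, while each input frame is taken from near its nominal position at the spot that best matches what would naturally have followed the previous frame. This is a minimal illustration in Python with NumPy, not the implementation used in the paper. The function name `wsola`, the parameters `frame_len` and `tolerance`, the Hann window, and the unnormalized cross-correlation as the similarity measure are all assumptions made for the example.

```python
import numpy as np


def wsola(x, speed, frame_len=1024, tolerance=256):
    """Time-scale a mono signal by `speed` (>1 faster, <1 slower), keeping pitch.

    Illustrative sketch only: frame_len, tolerance, the Hann window and the
    unnormalized cross-correlation are assumptions, not the paper's settings.
    """
    x = np.asarray(x, dtype=np.float64)
    hop_out = frame_len // 2                       # synthesis hop (50% overlap)
    hop_in = max(1, int(round(hop_out * speed)))   # analysis hop
    # Periodic Hann window: copies shifted by half a frame sum to 1.
    window = np.hanning(frame_len + 1)[:-1]

    # Zero-pad so every candidate segment stays inside the buffer.
    xp = np.concatenate([
        np.zeros(tolerance), x, np.zeros(frame_len + hop_out + tolerance)
    ])
    n_frames = max(1, int(np.ceil(len(x) / hop_in)))
    y = np.zeros((n_frames - 1) * hop_out + frame_len)

    prev_start = tolerance                         # frame 0 starts at x[0]
    y[:frame_len] += window * xp[prev_start:prev_start + frame_len]

    for k in range(1, n_frames):
        nominal = k * hop_in + tolerance
        # The input segment that would naturally follow the previous one.
        follow = prev_start + hop_out
        target = xp[follow:follow + frame_len]
        # Search nominal +/- tolerance for the most similar segment.
        region = xp[nominal - tolerance:nominal + tolerance + frame_len]
        scores = np.correlate(region, target, mode="valid")
        start = nominal - tolerance + int(np.argmax(scores))
        out = k * hop_out
        y[out:out + frame_len] += window * xp[start:start + frame_len]
        prev_start = start
    return y
```

Under these assumptions, a call such as `wsola(samples, 1.2)` would return audio for 1.2x playback that is shorter than the input but keeps roughly the original pitch, whereas simply resampling the signal would raise the pitch along with the speed.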
