
Research on Emotional Factors and Voice Trends by Country to Be Considered in Designing AI Voices - An Analysis of In-depth Interviews with Experts in Finland and Norway


  • Namkung, Kiechan (Industry Academic Cooperation Foundation, Kookmin University)
  • Received : 2020.06.27
  • Accepted : 2020.09.20
  • Published : 2020.09.28

Abstract

Use of voice-based interfaces that can interact with users is increasing as AI technology develops. To date, however, most research on voice-based interfaces has been technical in nature, focused on areas such as improving the accuracy of speech recognition. As a result, the voices of most voice-based interfaces are uniform and do not provide users with differentiated sensibilities. The purpose of this study is to add an emotional factor suitable for AI interfaces. To this end, we derived the emotional factors that should be considered in designing a voice interface. In addition, we examined voice trends that differ from country to country. For this study, we conducted in-depth interviews with voice industry experts from Finland and Norway, two countries that use their own independent languages.



References

  1. Homma, T., Obuchi, Y., Shima, K., Ikeshita, R., Kokubo, H. & Matsumoto, T. (2018). In-vehicle voice interface with improved utterance classification accuracy using off-the-shelf cloud speech recognizer. IEICE Transactions on Information and Systems, E101D(12), 3123-3137. DOI : 10.1587/transinf.2018EDK0001.
  2. Scanion, P., Elliss, D. P. W. & Reilly, R. B. (2007). Using broad phonetic group experts for improved speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 15(3), 803-802. DOI : 10.1109/TASL.2006.885907.
  3. Ramirez, J., Segura, J. C., Gorriz, J. M. & Garcia, L. (2007). Improved voice activity detection using contextual multiple hypothesis testing for robust speech recognition. IEEE Transactions on Audio, Speech & Language Processing, 15(8), 2177-2189. DOI : 10.1109/TASL.2007.903937.
  4. Zolnay, A., Kocharov, D., Schluter, R. & Ney, H. (2007). Using multiple acoustic feature sets for speech recognition. Speech Communication, 49(6), 514-525. DOI : 10.1016/j.specom.2007.04.005.
  5. HeeEun, L. (2018). Why do voice-activated technologies sound female? Sound technology and gendered voice of digital voice assistants. Korean Journal of Communication & Information, 90, 126-153. https://doi.org/10.46407/kjci.2018.08.90.126
  6. Nguyen, Q. N., Ta, A. & Prybutok, V. (2019). An integrated model of voice-user interface continuance intention: The gender effect. International Journal of Human-Computer Interaction, 35(15), 1362-1377. DOI : 10.1080/10447318.2018.1525023.
  7. Mabanza, N. (2018, December). Gender influences on preference of pedagogical interface agents. Proceedings of the International Conference on Intelligent & Innovative Computing Applications , Plaine Magnien, Mauritius. DOI : 10.1109/ICONIC.2018.8601292.
  8. Couper, M. P., Singer, E. & Tourangeau, R. (2004). Does voice matter? An interactive voice response (IVR) experiment. Journal of Official Statistics, 20(3), 551-570.
  9. Myles, J. F. (2013). Instrumentalizing voice: Applying Bahktin and Bourdieu to analyze interactive voice response service. Journal of Communication Inquiry, 37(3), 233-248. DOI : 10.1177/0196859913491765.
  10. Sydell, L. (2018). The push for a gender-neutral Siri. [Online]. www.npr.org/2018/07/09/627266501/the-push-for-agender- neutral-siri
  11. Mehrabian, A. & Ferris, S. R. (1967). Inference of attitudes from nonverbal communication in two channels. Journal of Consulting Psychology, 31(3), 248-252. https://doi.org/10.1037/h0024648
  12. GwanHae, C., DongUk, C., BumJoo, L., Yeong, P. & YeonMan, J. (2017). A study on characterizing the voices of active announcers using voice analysis technology. The Journal of Korean Institute of Communication and Information Sciences, 42(7), 1422-1431. DOI : 10.7840/kics.2017.42.7.1422.
  13. Peterson, R. A., Cannito, M. P. & Brown, S. P. (1995). An exploratory investigation of voice characteristics and selling effectiveness. Journal of Personal Selling & Sales Management, 15(1), 1-15.
  14. Suzuki, H., Zen, H., Nankaku, Y., Miyajima, C., Tokuda, K. & Kitamura, T. (2003). Speech recognition using voice-characteristic-dependent acoustic models. Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing , Hong Kong, China. DOI : 10.1109/ICASSP.2003.1198887.
  15. McCroskey, J. C. & Teven, J. J. (1999). Goodwill: A reexamination of the construct and its measurement. Communication Monographs, 66, 90-103. DOI : 10.1080/03637759909376464.
  16. McCroskey, J. C. & McCain, T. A. (1974). The measurement of interpersonal attraction. Speech Monographs, 41, 261-266. DOI : 10.1080/03637757409375845.
  17. Chad, E., Automn, E., Brett, S., Xialing, L. & Noelle, M. (2019). Evaluations of an artificial intelligence instructor's voice: Social identity theory in human-robot interactions. Computers in Human Behavior, 90, 357-362. DOI : 10.1016/j.chb.2018.08.027.