• Title/Summary/Keyword: artificial intelligence speaker


Analysis of unfairness of artificial intelligence-based speaker identification technology (인공지능 기반 화자 식별 기술의 불공정성 분석)

  • Shin Na Yeon;Lee Jin Min;No Hyeon;Lee Il Gu
    • Convergence Security Journal / v.23 no.1 / pp.27-33 / 2023
  • Digitalization driven by COVID-19 has accelerated the development of artificial intelligence-based voice recognition technology. However, when datasets are biased against some groups, this technology causes social unfairness such as racial and gender discrimination, and degrades the reliability and security of artificial intelligence services. In this work, we compare and analyze accuracy-based unfairness in biased data environments using VGGNet (Visual Geometry Group Network), ResNet (Residual Neural Network), and MobileNet, which are representative CNN (Convolutional Neural Network) models. Experimental results show that ResNet34 achieved the highest Top-1 accuracy for women and men, at 91% and 89.9% respectively, while ResNet18 showed the smallest accuracy difference between genders, at 1.8%. The difference in accuracy between genders by model leads to differences in service quality and unfair results between men and women when using the service.
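
    The accuracy-based unfairness measure described in this abstract can be illustrated with a minimal Python sketch. It assumes that unfairness is taken as the gap between per-group Top-1 accuracies. The function names and the toy data are hypothetical, and none of this is taken from the paper's code.

```python
from collections import defaultdict


def top1_accuracy_by_group(labels, predictions, groups):
    """Return the Top-1 accuracy for each group (e.g. gender)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for label, prediction, group in zip(labels, predictions, groups):
        total[group] += 1
        correct[group] += int(label == prediction)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Gap between the best- and worst-served group (assumed unfairness measure)."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical toy data: true speaker IDs, predicted speaker IDs, speaker gender.
labels = ["s1", "s2", "s3", "s4", "s1", "s2", "s3", "s4"]
predictions = ["s1", "s2", "s3", "s1", "s1", "s2", "s4", "s1"]
groups = ["female"] * 4 + ["male"] * 4

per_group = top1_accuracy_by_group(labels, predictions, groups)
gap = accuracy_gap(per_group)
print(per_group, gap)
```

    Under this assumed measure, a model with a smaller gap treats the groups more evenly, even if its overall accuracy is not the highest.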

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

  • Sohee Han;Jisub Um;Hoirin Kim
    • Phonetics and Speech Sciences / v.16 no.1 / pp.67-76 / 2024
  • Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, reaching a level where it can closely imitate natural human speech. In particular, TTS models offering various voice characteristics and personalized speech are widely used in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper, we propose a one-shot multi-speaker TTS system that can ensure acoustic diversity and synthesize personalized voices by generating speech from the utterances of unseen target speakers. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. Furthermore, the proposed approach includes not only an English one-shot multi-speaker TTS but also a Korean one-shot multi-speaker TTS. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. The objective evaluation of the proposed English and Korean one-shot multi-speaker TTS showed a prediction MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that the proposed model improves on the baseline models in terms of both naturalness and speaker similarity.

Analysis of the utility of intelligent speakers in the Internet of Things environment (사물인터넷 환경에서 지능형 스피커의 활용성 분석)

  • Lee, Seong-Hoon;Lee, Dong-Woo
    • Journal of Internet of Things and Convergence / v.8 no.3 / pp.41-46 / 2022
  • A smart home in the Internet of Things (IoT) environment aims to provide an optimal living environment for users by connecting all devices in the home. In such a smart home environment, artificial intelligence speakers are being used as a way to manage and control all devices. The role of the speaker is changing from simple music playback to an interface that controls and manages all devices in the smart home space. This study analyzed the market status and usability of artificial intelligence speakers in the US and Korea, the leaders in this field. The main target companies were Amazon, Google, and Apple in the US, as well as Kakao, SKT, and KT in Korea. In addition, major problems and directions for improvement were derived from the reactions of domestic users to artificial intelligence speakers.

A Study on the Use of Artificial Intelligence Speakers for the People with Physical disability using Technology Acceptance Model (기술수용모델을 활용한 지체장애인의 인공지능 스피커 사용 의도에 관한 연구)

  • Park, Hye-Hyun;Lee, Sun-Min
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.2 / pp.283-289 / 2021
  • Many people with disabilities have shown interest in artificial intelligence speakers that serve as the main hub of the smart home. Therefore, the purpose of this study was to identify the intention of people with disabilities to use such speakers. The focus is on those with physical disabilities, the group that accounts for the largest share among disability types. Based on the technology acceptance model, the effects of perceived ease of use and perceived usefulness of artificial intelligence speakers among people with disabilities were analyzed using Structural Equation Modeling (SEM). The results confirmed that the technology acceptance model is suitable for identifying the intention of people with disabilities to use artificial intelligence speakers, and specifically that perceived ease of use has a significant impact on perceived usefulness. Furthermore, perceived ease of use did not have a statistically significant effect on intention to use, whereas perceived usefulness did have a significant effect on it. This study is meaningful as a foundation for developing customized artificial intelligence speaker services and improving the use of artificial intelligence speakers by people with disabilities.

Proposal for a Sensory Integration Self-system based on an Artificial Intelligence Speaker for Children with Developmental Disabilities: Pilot Study

  • YeJin Wee;OnSeok Lee
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.4 / pp.1216-1233 / 2023
  • Conventional occupational therapy (OT) is conducted under the observation of an occupational therapist, and there are limitations in measuring and analyzing details such as the degree of hand tremor and movement tendency, so this important information may be lost. It is therefore difficult to identify quantitative performance indicators, and the presence of observers during performance sometimes makes subjects feel that they have to achieve good results. In this study, using Unity3D and an artificial intelligence (AI) speaker, we propose a system that subjects can use steadily on their own and that helps the occupational therapist evaluate objectively through quantitative data. The system is based on OT using the sensory integration approach. Its purpose is to improve children's activities of daily living by providing various feedback to induce sensory integration, which allows them to develop the ability to use their bodies effectively. A dynamic OT cognitive assessment tool for children used in clinical practice was implemented in Unity3D to create an OT environment in virtual space. The Leap Motion Controller makes it possible to track and record hand motion data in real time. Occupational therapists can control the user's performance environment remotely by connecting Unity3D and the AI speaker. An experiment was conducted with the conventional OT tool and the proposed system. The results showed that when the system was used without an observer, users performed the tasks spontaneously and repeatedly, feeling at ease and actively engaged.

Development of a Work Management System Based on Speech and Speaker Recognition

  • Gaybulayev, Abdulaziz;Yunusov, Jahongir;Kim, Tae-Hyong
    • IEMEK Journal of Embedded Systems and Applications / v.16 no.3 / pp.89-97 / 2021
  • Voice interfaces can not only make daily life more convenient through artificial intelligence speakers but also improve the working environment of a factory. This paper presents a voice-assisted work management system that supports both speech and speaker recognition. The system provides machine control and authorized worker authentication by voice at the same time. We applied two speech recognition methods: Google's Speech application programming interface (API) service and the DeepSpeech speech-to-text engine. For worker identification, the SincNet architecture for speaker recognition was adopted. We implemented a prototype of the work management system that provides voice control with 26 commands and identifies 100 workers by voice. Worker identification using our model was almost perfect, and the command recognition accuracy was 97.0% with the Google API after post-processing and 92.0% with our DeepSpeech model.

The Effect of Perceived Anthropomorphic Characteristics on Continuous Usage Intention of Artificial Intelligence Voice Speaker : Based on the Integrated Adoption Model (인공지능 음성 스피커의 의인화 특성 지각 정도가 지속적 이용 의향에 미치는 영향: 통합 수용 모델을 기반으로)

  • Lee, Sungjoon
    • The Journal of the Korea Contents Association / v.21 no.11 / pp.41-55 / 2021
  • The AI voice speaker has played an important role in forming an early market for AI-based goods and services and in driving their development, drawing growing attention from many people. In this context, this research examined factors affecting continuous usage intention for AI voice speakers based on the integrated adoption model, which combines two factors, perceived playfulness and innovation resistance, with the extended technology acceptance model. It also examined whether three perceived anthropomorphic features (i.e., perceived rational support, perceived intimacy, and perceived cognitive openness) influence continuous usage intention for AI voice speakers. The data were collected through an online survey of respondents in their 20s and 30s who have experience using an AI voice speaker, and were analyzed using SEM (Structural Equation Modeling). The results showed that perceived ease of use, perceived usefulness, perceived playfulness, and innovation resistance all had significant influences on continuous usage intention for AI voice speakers. In addition, perceived rational support, perceived intimacy, and perceived cognitive openness, as perceived anthropomorphic features, all had significant influences on perceived ease of use, perceived usefulness, and perceived playfulness. The implications of these findings are also discussed.

A study on the usage intention of AI(artificial intelligence) speaker

  • Kwon, Soon-Hong;Lim, Yang-Whan;Kim, Hyun-Jeong
    • Journal of the Korea Society of Computer and Information / v.25 no.1 / pp.199-206 / 2020
  • This study examined the factors affecting consumers' intention to use AI speakers, focusing on the perceived value of the product and the perceived necessity of the product. Factors affecting consumers' perceived value of the product were divided into benefits and costs. Reflecting the characteristics of information technology products, perceptions of the usefulness of the product were also included. Empirical results show that consumers' perceptions of the benefits and usefulness of AI speaker products have a positive effect on perceived value and perceived necessity. Perception of necessity had a significant positive (+) effect on perception of value. Perception of necessity and perception of value each had a positive (+) effect on intention to use. However, the cost perceived by consumers did not have a significant effect on perception of value.

Customer Attitude to Artificial Intelligence Features: Exploratory Study on Customer Reviews of AI Speakers (인공지능 속성에 대한 고객 태도 변화: AI 스피커 고객 리뷰 분석을 통한 탐색적 연구)

  • Lee, Hong Joo
    • Knowledge Management Research / v.20 no.2 / pp.25-42 / 2019
  • AI speakers, which are wireless speakers with smart features, have been released by many manufacturers and adopted by many customers. Though smart features including voice recognition, control of connected devices, and information provision are embedded in many mobile phones, AI speakers sit in the home and serve as the central entertainment and information provider. Many surveys have investigated the important factors in adopting AI speakers and the factors influencing satisfaction. Though most surveys on AI speakers are cross-sectional, customer attitudes toward AI speakers can be tracked longitudinally by analyzing customer reviews. However, there is not much research on changes in customer attitude toward AI speakers. Therefore, in this study, we examine how attitudes toward AI speakers change over time by applying text mining-based analysis. We collected customer reviews from Amazon.com for Amazon Echo, which has the highest share of the global AI speaker market. Since Amazon Echo already has two generations, we can analyze the characteristics of reviews and compare attitudes according to adoption time. We identified all subtopics of the customer reviews and specified the topics for smart features. We then analyzed how the share of topics varied with time and analyzed diverse metadata for comparison. The proportions of the topics for general satisfaction and satisfaction with music increased over time, while the proportions of the topics for music quality, speakers, and wireless speakers decreased. Though the proportions of topics for smart features were similar over time, the share of these topics in positive reviews and their importance metrics were reduced in the 2nd generation of Amazon Echo. Even though smart features were mentioned at a similar rate in the reviews, their influence on satisfaction decreased over time, especially in the 2nd generation of Amazon Echo.

Age differences of preference for humanoid AI speakers (얼굴형 인공지능 스피커에 대한 선호의 나이 효과)

  • Oh, Songjoo;Hwang, Jihyun;Yew, Jiho;Hahn, Sowon
    • Korean Journal of Cognitive Science / v.29 no.1 / pp.1-16 / 2018
  • In this study, we investigated age differences in preference and trust ratings when the appearance of an artificial intelligence speaker resembles a human face. The appearance of the artificial intelligence speaker was presented in seven levels from robot face to human face. In addition, face stimuli were divided by gender (male and female) and age (20s / 60s). Participants evaluated the reliability and likability of each face stimulus on a 7-point scale. The results show that younger adults tended to prefer the face that was halfway between the robot and the human face, while older adults rated reliability and likability higher when the stimuli resembled the human face. When asked to choose the most preferred of the four face categories, all participants chose a younger face. However, with additional conditions including an emoticon face and an empty condition, older adults still preferred the human face, while younger adults preferred the emoticon face and the empty condition. Taken together, older adults are more receptive to human faces than robotic faces in the context of artificial intelligence speakers. Because artificial intelligence speakers can play an important role for elderly people living alone, the present study will be a good reference for the design and development of artificial intelligence speakers for elderly users.