Search | Korea Science

Analysis of unfairness of artificial intelligence-based speaker identification technology (인공지능 기반 화자 식별 기술의 불공정성 분석)

Shin Na Yeon;Lee Jin Min;No Hyeon;Lee Il Gu
- Convergence Security Journal, v.23 no.1, pp.27-33, 2023
Digitalization driven by COVID-19 has accelerated the development of artificial intelligence-based voice recognition technology. However, when datasets are biased against some groups, this technology can cause social unfairness, such as race and gender discrimination, and degrade the reliability and security of artificial intelligence services. In this work, we compare and analyze accuracy-based unfairness in biased data environments using VGGNet (Visual Geometry Group Network), ResNet (Residual Neural Network), and MobileNet, which are representative CNN (Convolutional Neural Network) models. Experimental results show that ResNet34 achieved the highest Top-1 accuracy for women and men, at 91% and 89.9% respectively, while ResNet18 showed the smallest accuracy difference between genders, at 1.8%. Such model-dependent accuracy differences between genders lead to differences in service quality and unfair outcomes between men and women using the service.
https://doi.org/10.33778/kcsa.2023.23.1.027

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

Sohee Han;Jisub Um;Hoirin Kim
- Phonetics and Speech Sciences, v.16 no.1, pp.67-76, 2024
Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, reaching a level where it can closely imitate natural human speech. In particular, TTS models offering various voice characteristics and personalized speech are widely used in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper, we propose a one-shot multi-speaker TTS system that can ensure acoustic diversity and synthesize personalized voices by generating speech from unseen target speakers' utterances. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. Furthermore, the proposed approach includes not only an English one-shot multi-speaker TTS but also a Korean one-shot multi-speaker TTS. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. The objective evaluation of the proposed English and Korean one-shot multi-speaker TTS showed a prediction MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that the proposed model improves on the baseline models in terms of both naturalness and speaker similarity.
https://doi.org/10.13064/KSSS.2024.16.1.067

Lightweight Speaker Recognition for Pet Robots using Residuals Neural Network (잔차 신경망을 활용한 펫 로봇용 화자인식 경량화)

Seong-Hyun Kang;Tae-Hee Lee;Myung-Ryul Choi
- Journal of IKEEE, v.28 no.2, pp.168-173, 2024
Speaker recognition is a technology that analyzes voice frequencies, which differ for each individual, and compares them with pre-stored voices to determine a person's identity. Deep learning-based speaker recognition is being applied in many fields, and pet robots are one of them. However, the hardware performance of pet robots is very limited relative to the large memory and computation requirements of deep learning. This is an important problem that pet robots must solve for real-time interaction with users. Making deep learning models lightweight has become an important way to solve this problem, and a great deal of research has been done recently. In this paper, we describe research on lightweight speaker recognition for pet robots, in which we construct a voice dataset of specific command types for pet robots and compare the results of models using residuals. In the conclusion, we present the results of the proposed method and describe future research plans.
https://doi.org/10.7471/ikeee.2024.28.2.168

Analysis of the utility of intelligent speakers in the Internet of Things environment (사물인터넷 환경에서 지능형 스피커의 활용성 분석)

Lee, Seong-Hoon;Lee, Dong-Woo
- Journal of Internet of Things and Convergence, v.8 no.3, pp.41-46, 2022
The smart home in the Internet of Things (IoT) environment aims to provide an optimal living environment for users by connecting all devices in the home. In such a smart home environment, artificial intelligence speakers are being used as a way to manage and control all devices. The role of the speaker is changing from simple music playback to that of an interface that controls and manages all devices in the smart home space. This study analyzes the market status and usability of artificial intelligence speakers in the US and Korea, the leaders in this field. The main target companies were Amazon, Google, and Apple in the US, as well as Kakao, SKT, and KT in Korea. In addition, based on domestic users' reactions to artificial intelligence speakers, major problems are identified and directions for improvement are described.
https://doi.org/10.20465/KIOTS.2022.8.3.041

A Study on the Use of Artificial Intelligence Speakers for the People with Physical disability using Technology Acceptance Model (기술수용모델을 활용한 지체장애인의 인공지능 스피커 사용 의도에 관한 연구)

Park, Hye-Hyun;Lee, Sun-Min
- Journal of the Korea Academia-Industrial cooperation Society, v.22 no.2, pp.283-289, 2021
Many people with disabilities have shown interest in artificial intelligence speakers that serve as the main hub of the smart home. Therefore, the purpose of this study was to identify the intention of people with disabilities to use such speakers. The focus is on those with physical disabilities, the segment that accounts for the largest share among disability types. Based on the technology acceptance model, the effects of perceived ease of use and perceived usefulness of artificial intelligence speakers among people with disabilities were analyzed using Structural Equation Modeling (SEM). The results confirmed that the technology acceptance model is suitable for identifying the intention of people with disabilities to use artificial intelligence speakers, and specifically that perceived ease of use has a significant impact on perceived usefulness. Furthermore, perceived ease of use did not have a statistically significant effect on the intention to use, whereas perceived usefulness did have a significant effect on it. This study is meaningful as a foundation for developing customized artificial intelligence speaker services and improving the use of artificial intelligence speakers by people with disabilities.
https://doi.org/10.5762/KAIS.2021.22.2.283

Proposal for a Sensory Integration Self-system based on an Artificial Intelligence Speaker for Children with Developmental Disabilities: Pilot Study

YeJin Wee;OnSeok Lee
- KSII Transactions on Internet and Information Systems (TIIS), v.17 no.4, pp.1216-1233, 2023
Conventional occupational therapy (OT) is conducted under the observation of an occupational therapist, and because details such as the degree of hand tremor and movement tendency are difficult to measure and analyze, this important information may be lost. It is therefore difficult to identify quantitative performance indicators, and the presence of observers during performance sometimes makes subjects feel that they have to achieve good results. In this study, using Unity3D and an artificial intelligence (AI) speaker, we propose a system that subjects can use steadily by themselves and that helps the occupational therapist evaluate them objectively through quantitative data. The system is based on the sensory integration approach to OT, and its purpose is to improve children's activities of daily living by providing various feedback to induce sensory integration, which allows them to develop the ability to use their bodies effectively. A dynamic OT cognitive assessment tool for children used in clinical practice was implemented in Unity3D to create a virtual OT environment. The Leap Motion Controller allows users' hand motion data to be tracked and recorded in real time. Occupational therapists can control the user's performance environment remotely by connecting Unity3D and the AI speaker. An experiment was conducted with both the conventional OT tool and the proposed system. The results showed that, when the system was used without an observer, users performed spontaneously and repeatedly, feeling at ease and actively engaged.
https://doi.org/10.3837/tiis.2023.04.010

Development of a Work Management System Based on Speech and Speaker Recognition

Gaybulayev, Abdulaziz;Yunusov, Jahongir;Kim, Tae-Hyong
- IEMEK Journal of Embedded Systems and Applications, v.16 no.3, pp.89-97, 2021
Voice interfaces can not only make daily life more convenient through artificial intelligence speakers but also improve the working environment of factories. This paper presents a voice-assisted work management system that supports both speech and speaker recognition. The system provides machine control and authorized worker authentication by voice at the same time. We applied two speech recognition methods: Google's Speech application programming interface (API) service and the DeepSpeech speech-to-text engine. For worker identification, the SincNet architecture for speaker recognition was adopted. We implemented a prototype of the work management system that provides voice control with 26 commands and identifies 100 workers by voice. Worker identification using our model was almost perfect, and the command recognition accuracy was 97.0% with the Google API after post-processing and 92.0% with our DeepSpeech model.
https://doi.org/10.14372/IEMEK.2021.16.3.89

The Effect of Perceived Anthropomorphic Characteristics on Continuous Usage Intention of Artificial Intelligence Voice Speaker : Based on the Integrated Adoption Model (인공지능 음성 스피커의 의인화 특성 지각 정도가 지속적 이용 의향에 미치는 영향: 통합 수용 모델을 기반으로)

Lee, Sungjoon
- The Journal of the Korea Contents Association, v.21 no.11, pp.41-55, 2021
AI voice speakers have played an important role in forming an early market for AI-based goods and services and in their development, attracting growing attention from many people. In this context, this research examined factors affecting the continuous usage intention of AI voice speakers based on the integrated adoption model, which combines two factors, perceived playfulness and innovation resistance, with the extended technology acceptance model. It also examined whether three perceived anthropomorphic features (i.e., perceived rational support, perceived intimacy, perceived cognitive openness) influence the continuous usage intention of AI voice speakers. The data were collected through an online survey of respondents in their 20s and 30s who had experience using AI voice speakers, and were analyzed using SEM (Structural Equation Modeling). The results showed that perceived ease of use, perceived usefulness, perceived playfulness, and innovation resistance all had significant influences on the continuous usage intention of AI voice speakers. In addition, perceived rational support, perceived intimacy, and perceived cognitive openness, as perceived anthropomorphic features, all had significant influences on perceived ease of use, perceived usefulness, and perceived playfulness. The implications of these results are also discussed.
https://doi.org/10.5392/JKCA.2021.21.11.041

A study on the usage intention of AI(artificial intelligence) speaker

Kwon, Soon-Hong;Lim, Yang-Whan;Kim, Hyun-Jeong
- Journal of the Korea Society of Computer and Information, v.25 no.1, pp.199-206, 2020
In this study, the factors affecting consumers' intention to use AI speakers were examined with a focus on the perceived value of the product and the perceived necessity of the product. Factors affecting consumers' perceived value of the product were divided into benefits and costs. Reflecting the characteristics of information technology products, perceptions of the usefulness of the product were also included. Empirical results show that consumers' perceptions of the benefits and usefulness of AI speaker products have a positive effect on perceived value and perceived necessity. Perception of necessity had a significant positive (+) effect on perception of value. Perception of necessity and perception of value each had a positive (+) effect on intention to use. However, the cost perceived by consumers did not have a significant effect on perception of value.
https://doi.org/10.9708/jksci.2020.25.01.199

Customer Attitude to Artificial Intelligence Features: Exploratory Study on Customer Reviews of AI Speakers (인공지능 속성에 대한 고객 태도 변화: AI 스피커 고객 리뷰 분석을 통한 탐색적 연구)

Lee, Hong Joo
- Knowledge Management Research, v.20 no.2, pp.25-42, 2019
AI speakers, which are wireless speakers with smart features, have been released by many manufacturers and adopted by many customers. Though smart features including voice recognition, control of connected devices, and information provision are embedded in many mobile phones, AI speakers sit in the home and serve as the central entertainment and information provider. Many surveys have investigated the important factors in adopting AI speakers and the factors influencing satisfaction. Though most surveys on AI speakers are cross-sectional, customer attitudes toward AI speakers can be tracked longitudinally by analyzing customer reviews. However, there is not much research on changes in customer attitudes toward AI speakers. Therefore, in this study, we try to grasp how attitudes toward AI speakers change over time by applying text mining-based analysis. We collected customer reviews of Amazon Echo, which has the highest share of the global AI speaker market, from Amazon.com. Since Amazon Echo already has two generations, we can analyze the characteristics of reviews and compare attitudes according to adoption time. We identified all subtopics of the customer reviews and specified the topics for smart features. We then analyzed how the share of topics varied over time and analyzed diverse metadata for comparison. The proportions of the topics for general satisfaction and satisfaction with music increased, while the proportions of the topics for music quality, speakers, and wireless speakers decreased over time. Though the proportions of topics for smart features remained similar over time, the share of these topics in positive reviews and their importance metrics were reduced in the 2nd generation of Amazon Echo. Even though smart features were mentioned at a similar rate in the reviews, their influence on satisfaction decreased over time, especially in the 2nd generation of Amazon Echo.
https://doi.org/10.15813/kmr.2019.20.2.002

