• Title/Summary/Keyword: Speaker

Comparison of Speaker's Source Characteristics in Different Recording Environments by Using Phonation Type Index k (녹음 환경의 차이에 따른 화자의 음원 특성 비교: 발성유형지수 k를 중심으로)

  • Lee, Hoo-Dong;Kang, Sun-Mee;Park, Han-Sang;Chang, Moon-Soo
    • Speech Sciences / v.10 no.3 / pp.213-224 / 2003
  • Spoken sound reflects not only the speaker's source but also the characteristics of the vocal tract and speech radiation. This paper builds on the theory of Park [1], who proposes the Phonation Type Index (PTI) k, a variable that captures the characteristics of the speaker's source while excluding those of the vocal tract and speech radiation. Following Park's theory, we vary the recording environment, expand the experimental data, and analyze the collected data to test whether PTI k has good discriminating power as a variable for speaker recognition. In the experiment, each of 5 male speakers records 8 sentences ten times in both a recording room and an office; we extract PTI k for each speaker and measure its discriminating power. The results show that PTI k discriminates between speakers very well. We also confirm that PTI k yields similar results even when the recording environment changes.

Effects of AI Speaker Users' Usage Motivations and Perception of Relationship Type with AI Speaker on Enjoyment (AI 스피커 이용자의 이용동기 및 AI 스피커에 대한 관계 유형 인식이 즐거움에 미치는 영향)

  • Jang, Yei-Beech
    • The Journal of the Korea Contents Association / v.19 no.11 / pp.558-566 / 2019
  • Artificial intelligence (AI) smart speaker sales have increased rapidly, and AI technology has become more pervasive in our daily lives. This study explored motivations for smart speaker use and examined how motivation and relationship type with AI speakers affect enjoyment. Smart speaker use is primarily motivated by conversational, trend-leading, efficient, and entertaining factors. Among these four, the trend-leading, efficient, and entertaining factors positively influenced users' enjoyment. However, among the three types of relationship with AI speakers, only the assistant/helper type affected enjoyment. The results of the current study provide practical implications for future directions in AI speaker interaction design.

Speaker Adaptation Using i-Vector Based Clustering

  • Kim, Minsoo;Jang, Gil-Jin;Kim, Ji-Hwan;Lee, Minho
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.7 / pp.2785-2799 / 2020
  • We propose a novel speaker adaptation method using acoustic model clustering. The similarity of different speakers is defined by the cosine distance between their i-vectors (intermediate vectors), and various efficient clustering algorithms are applied to obtain a number of speaker subsets with different characteristics. The speaker-independent model is then retrained with the training data of the individual speaker subsets grouped by the clustering results, and unknown speech is recognized by the retrained model of the closest cluster. The proposed method is applied to a large-scale speech recognition system implemented in a hybrid hidden Markov model and deep neural network framework. An experiment was conducted to evaluate the word error rates using the Resource Management database. When the proposed speaker adaptation method using i-vector based clustering was applied, performance relative to the conventional speaker-independent speech recognition model improved by as much as 12.2% for the conventional fully neural network and by as much as 10.5% for the bidirectional long short-term memory.
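
The closest-cluster step in the abstract above can be illustrated with a minimal sketch. It assumes each cluster is summarized by a centroid i-vector, which the abstract does not state; the function names and the toy 3-dimensional vectors are hypothetical and only meant to show how a cosine distance picks a cluster.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v); 0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def closest_cluster(ivector, centroids):
    """Index of the centroid with the smallest cosine distance to ivector."""
    return min(range(len(centroids)),
               key=lambda k: cosine_distance(ivector, centroids[k]))

# Toy 3-dimensional stand-ins for i-vectors (assumed values for illustration).
centroids = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
unknown = [0.1, 0.9, 0.8]
chosen = closest_cluster(unknown, centroids)  # index of the model to decode with
```

In a system like the one described, the chosen index would select which retrained acoustic model decodes the utterance.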

Speaker Tracking Using Eigendecomposition and an Index Tree of Reference Models

  • Moattar, Mohammad Hossein;Homayounpour, Mohammad Mehdi
    • ETRI Journal / v.33 no.5 / pp.741-751 / 2011
  • This paper focuses on online speaker tracking for telephone conversations and broadcast news. Since online applicability imposes some limitations on the tracking strategy, such as data insufficiency, a reliable approach should be applied to compensate for this shortage. In this framework, a set of reference speaker models is used as side information to facilitate online tracking. To improve the indexing accuracy, adaptation approaches in eigenvoice decomposition space are proposed in this paper. We believe that the eigenvoice adaptation techniques help to embed the speaker space in the models and hence enrich the generality of the selected speaker models. Also, an index structure of the reference models is proposed to speed up the search in the model space. The proposed framework is evaluated on the 2002 Rich Transcription Broadcast News and Conversational Telephone Speech corpus as well as a synthetic dataset. The indexing errors of the proposed framework on telephone conversations, broadcast news, and the synthetic dataset are 8.77%, 9.36%, and 12.4%, respectively. Using the index tree structure approach, the run time of the proposed framework is improved by 22%.

Speaker Recognition Using Optimal Path and Weighted Orthogonal Parameters (최적경로와 가중직교인자를 이용한 화자인식)

  • 남기환;배철수
    • Journal of the Korea Institute of Information and Communication Engineering / v.7 no.7 / pp.1539-1544 / 2003
  • Recently, many researchers have studied speaker recognition through statistical processing methods based on the Karhunen-Loève transform. However, the content of the speaker's identity and the vocalization speed lower the speaker recognition rate. This paper studies a speaker recognition method that uses weighted parameters, weighted by the eigenvalues of the speech, to emphasize the speaker's identity, and an optimal path, produced by DWP, to normalize the dynamic time features of the speech. To validate the method, we compare its speaker recognition rate with that of the conventional statistical processing method. The results show that the proposed method achieves a higher speaker recognition rate than the conventional method.

Multidisciplinary Design Optimization for Acoustic Characteristics of a Speaker Diaphragm (스피커 진동판의 음향특성 다분야통합최적설계)

  • Kim, Sung-Kuk;Lee, Tae-Hee;Lee, Surk-Soon
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2004.11a / pp.763-766 / 2004
  • Recently, various acoustic products containing speakers, such as cellular phones, have been produced. A speaker consists of a diaphragm that generates sound and a coil that vibrates the diaphragm. In general, a good speaker has a wide frequency range, a high ratio of output power to input power, and a flat sound pressure level over a specified frequency range. Over the last decade, acoustic characteristics have been estimated through experiment and computer simulation, or sound power has been controlled using acoustic sensitivity in a natural frequency range. However, the flatness of the sound pressure level has not been considered as a way to enhance the sound quality of a speaker. In this study, a speaker design method is proposed that targets good acoustic characteristics, namely the flatness of the sound pressure level (SPL) and the width of the band between the first and second natural frequencies. SYSNOISE is used for acoustic analysis, and ANSYS is used for harmonic response analysis and modal analysis. Optimization of the acoustic characteristics of a speaker diaphragm is performed using ModelCenter. All analyses are done in the frequency domain. We confirm that the experimental and computational results show similar trends.

Speaker Verification Using SVM Kernel with GMM-Supervector Based on the Mahalanobis Distance (Mahalanobis 거리측정 방법 기반의 GMM-Supervector SVM 커널을 이용한 화자인증 방법)

  • Kim, Hyoung-Gook;Shin, Dong
    • The Journal of the Acoustical Society of Korea / v.29 no.3 / pp.216-221 / 2010
  • In this paper, we propose a speaker verification method using a Support Vector Machine (SVM) kernel with Gaussian Mixture Model (GMM) supervectors based on the Mahalanobis distance. The proposed GMM-supervector SVM kernel method combines GMM with SVM. The GMM supervectors are generated from the GMM parameters of the speaker's and other speakers' utterances. The speaker verification threshold for the GMM supervectors is decided by the SVM kernel based on the Mahalanobis distance to improve speaker verification accuracy. Experimental results for text-independent speaker verification with 20 speakers demonstrate the performance of the proposed method compared with GMM, SVM, a GMM-supervector SVM kernel based on Kullback-Leibler (KL) divergence, and a GMM-supervector SVM kernel based on the Bhattacharyya distance.
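
As a rough illustration of the distance named in the abstract above, the sketch below computes a Mahalanobis distance between two supervectors. It assumes a diagonal covariance, which the abstract does not specify. The `rbf_style_kernel` function is a hypothetical way to turn the distance into a kernel value and is not the paper's kernel; all values are made up.

```python
import math

def mahalanobis_diag(a, b, variances):
    """Mahalanobis distance between a and b under a diagonal covariance (assumption)."""
    return math.sqrt(sum((x - y) ** 2 / s for x, y, s in zip(a, b, variances)))

def rbf_style_kernel(a, b, variances, gamma=0.5):
    """Hypothetical kernel: exp(-gamma * d^2), with d the Mahalanobis distance."""
    d = mahalanobis_diag(a, b, variances)
    return math.exp(-gamma * d * d)

# Toy 3-dimensional stand-ins for GMM supervectors (stacked component means).
sv_a = [0.0, 2.0, 4.0]
sv_b = [1.0, 2.0, 2.0]
variances = [1.0, 4.0, 4.0]
distance = mahalanobis_diag(sv_a, sv_b, variances)  # sqrt(1/1 + 0/4 + 4/4) = sqrt(2)
```

Dividing each squared difference by its variance keeps high-variance dimensions from dominating the distance, which is the usual motivation for preferring it to a plain Euclidean distance.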

A Speaker Detection System based on Stereo Vision and Audio (스테레오 시청각 기반의 화자 검출 시스템)

  • An, Jun-Ho;Hong, Kwang-Seok
    • Journal of Internet Computing and Services / v.11 no.6 / pp.21-29 / 2010
  • In this paper, we propose a system that detects the current speaker among a number of users. The proposed speaker detection system, based on stereo vision and audio, consists mainly of the following: position estimation of speaker candidates using a stereo camera and microphones, detection of the current speaker, and acquisition of speaker information on a mobile device. We use Haar-like features and the AdaBoost algorithm to detect the faces of speaker candidates with the stereo camera, and the positions of the candidates are estimated by triangulation. Next, the Time Delay Of Arrival (TDOA) is estimated by Cross Power Spectrum Phase (CPSP) analysis to find the direction of the source with two microphones. Finally, we acquire the speaker's information, including position, voice, and face, by comparing the information from the stereo camera with that from the two microphones. Furthermore, the proposed system includes a TCP client/server connection method for mobile service.
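
The CPSP-based delay estimate mentioned in the abstract above can be sketched in a few lines. This is a generic illustration rather than the authors' implementation: it uses a naive O(n²) DFT from the standard library instead of an FFT, treats the delay as circular, and the function names and toy signals are assumptions made for the example.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (adequate for short toy signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def cpsp_delay(sig_a, sig_b):
    """Samples by which sig_b lags sig_a (negative if sig_b leads)."""
    n = len(sig_a)
    spec_a, spec_b = dft(sig_a), dft(sig_b)
    cross = [b * a.conjugate() for a, b in zip(spec_a, spec_b)]
    # Keep only the phase of the cross-power spectrum (drop the magnitude).
    whitened = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    corr = [v.real for v in dft(whitened, inverse=True)]
    peak = max(range(n), key=lambda i: corr[i])
    # Indices past the midpoint correspond to negative lags.
    return peak if peak <= n // 2 else peak - n

# Toy signals: a short pulse, and the same pulse circularly delayed by 3 samples.
n = 32
sig_a = [0.0] * n
sig_a[5:8] = [1.0, -0.5, 0.25]
sig_b = sig_a[-3:] + sig_a[:-3]
delay = cpsp_delay(sig_a, sig_b)
```

Dividing the sample delay by the sampling rate gives the TDOA in seconds.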

Inter-speaker and intra-speaker variability on sound change in contemporary Korean

  • Kim, Mi-Ryoung
    • Phonetics and Speech Sciences / v.9 no.3 / pp.25-32 / 2017
  • Besides their effect on the f0 contour of the following vowel, Korean stops are undergoing a sound change in which a partial or complete consonantal merger on voice onset time (VOT) is taking place between aspirated and lax stops. Many previous studies on sound change have mainly focused on group-normative effects, that is, effects that are representative of the population as a whole. Few systematic quantitative studies of change in adult individuals have been carried out. The current study examines whether the sound change holds for individual speakers. It focuses on inter-speaker and intra-speaker variability on sound change in contemporary Korean. Speech data were collected for thirteen Seoul Korean speakers studying abroad in America. In order to minimize the possible effects of speech production, socio-phonetic factors such as age, gender, dialect, speech rate, and L2 exposure period were controlled when recruiting participants. The results showed that, for nine out of thirteen speakers, the consonantal merger is taking place between the aspirated and lax stops in terms of VOT. There were also intra-speaker variations on the merger in three aspects: First, is the consonantal (VOT) merger between the two stops in progress or not? Second, are VOTs for aspirated stops getting shorter or not (i.e., the aspirated-shortening process)? Third, are VOTs for lax stops getting longer or not (i.e., the lax-lengthening process)? The results of remarkable inter-speaker and intra-speaker variability indicate a synchronous speech sound change of the stop system in contemporary Korean. Some speakers are early adopters or active propagators of sound change whereas others are not. Further study is necessary to see whether the inter-speaker differences exceed intra-speaker differences in sound change.

Automatic Speech Style Recognition Through Sentence Sequencing for Speaker Recognition in Bilateral Dialogue Situations (양자 간 대화 상황에서의 화자인식을 위한 문장 시퀀싱 방법을 통한 자동 말투 인식)

  • Kang, Garam;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.27 no.2 / pp.17-32 / 2021
  • Speaker recognition is generally divided into speaker identification and speaker verification. Speaker recognition plays an important role in automatic voice systems, and its importance is becoming more prominent as portable devices, voice technology, and audio content continue to develop and expand. Previous speaker recognition studies have aimed at automatically determining who the speaker is from voice files and at improving accuracy. Speech style is an important sociolinguistic subject: it contains very useful information that reveals the speaker's attitude, conversational intention, and personality, and this can be an important clue for speaker recognition. The final ending used in the speaker's speech determines the type of sentence and carries information such as the speaker's intention, psychological attitude, or relationship to the listener. The use of sentence-final endings varies in probability with the characteristics of the speaker, so the type and distribution of the sentence-final endings of a specific unidentified speaker are helpful in recognizing the speaker. However, few studies of existing text-based speaker recognition have considered speech style, and adding speech style information to speech signal-based speaker recognition techniques can further improve accuracy. Hence, the purpose of this paper is to propose a novel method that uses speech style, expressed as sentence-final endings, to improve the accuracy of Korean speaker recognition. To this end, we propose a method called sentence sequencing, which generates vector values from the type and frequency of the sentence-final endings appearing in the utterances of a specific person. To evaluate the performance of the proposed method, training and performance evaluation were conducted with an actual drama script. The method proposed in this study can be used as a means to improve the performance of Korean speech recognition services.