Search | Korea Science

A Study on the Speaker Adaptation in CDHMM (CDHMM의 화자적응에 관한 연구)

Kim, Gwang-Tae
- Journal of the Institute of Electronics Engineers of Korea SP / v.39 no.2 / pp.116-127 / 2002
A new approach is proposed that improves speaker adaptation in a CDHMM speech recognizer by using a variable number of observation density functions. The proposed method uses observation density functions with more than one mixture in each state to represent speech characteristics in detail. The number of mixtures in each state is determined in two ways: by the number of frames and by the determinant of the variance. MAP parameters are extracted for every mixture determined by these two methods. In addition, the state segmentation required for speaker adaptation segments the adaptation speech more precisely by using, as a priori knowledge, a speaker-independent model trained on a sufficiently large database. The state duration distribution is also used to adapt the speech duration information that reflects the speaker's utterance habits and speaking rate. The recognition rates of the proposed methods are significantly higher than that of the conventional method, which uses one mixture in each state.
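The per-mixture MAP parameter extraction above can be illustrated with the common relevance-factor form of MAP mean adaptation. This is a minimal sketch, not the paper's formulation: it assumes 1-D features, a Gaussian mixture per state, and a hypothetical relevance factor `tau`. Each adapted mean moves from the speaker-independent prior toward the new speaker's data in proportion to the soft count of frames that mixture absorbs.

```python
import math

def gaussian_pdf(x, mean, var):
    """Univariate normal density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def map_adapt_means(frames, weights, means, variances, tau=10.0):
    """MAP-adapt the mixture means of one state to a new speaker's frames.

    tau is a hypothetical relevance factor: larger values keep the
    adapted means closer to the speaker-independent prior means.
    """
    K = len(means)
    counts = [0.0] * K  # soft occupation count n_k per mixture
    firsts = [0.0] * K  # posterior-weighted sum of frames per mixture
    for x in frames:
        likes = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:  # frame too far from every mixture to contribute
            continue
        for k in range(K):
            post = likes[k] / total  # posterior of mixture k given frame x
            counts[k] += post
            firsts[k] += post * x
    # adapted mean = (n_k * data_mean + tau * prior_mean) / (n_k + tau)
    return [(firsts[k] + tau * means[k]) / (counts[k] + tau) for k in range(K)]
```

With no adaptation frames every mean stays at its prior; with many frames a mixture's mean approaches the speaker's data mean.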

Development of the hybrid-type ultrasound speaker (하이브리드형 초음파 스피커 개발)

Lee, Hyoung-Sang;Kim, Bok-Kyu
- The Journal of the Acoustical Society of Korea / v.40 no.3 / pp.247-253 / 2021
Directional ultrasonic speakers, which make sound audible only within a specific area, have been continuously researched to improve sound quality and cost relative to general speakers. Because the sensor characteristics of ultrasonic speakers make low-band sound below 500 Hz difficult to hear, this paper proposes a DSP-based hybrid-type ultrasonic speaker that plays back through a general speaker at the same time to compensate for the low band. A system implemented by simply connecting a general speaker and an ultrasonic speaker suffers from high cost and difficult control, since two amplifiers are needed to play back the ultrasonic and general sound sources. In addition, sound quality deteriorates because of the difference in playback timing between the ultrasonic and general sound sources. To address cost, control, and sound quality, we developed a hybrid-type ultrasonic speaker with a DSP-based amplifier that plays back the general sound source in sync with the regenerated ultrasonic sound source, while also implementing existing CODEC functions such as Dynamic Range Control (DRC) and Equalizer (EQ).
https://doi.org/10.7776/ASK.2021.40.3.247
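The low-band/high-band split and playback alignment described above can be sketched as follows. This is an assumption-laden illustration, not the authors' DSP design: a one-pole low-pass at the 500 Hz figure from the abstract feeds the general speaker, the complementary residual stands in for the ultrasonic path (the ultrasonic modulation itself is not modeled), and the hypothetical `delay` helper shows how one path could be shifted to keep both in sync.

```python
import math

def split_bands(samples, fs, fc=500.0):
    """Split a signal into a low band (general speaker) and a complementary high band.

    A one-pole low-pass filter with cutoff fc produces the low band; the high band
    is the residual x - low, so low + high reproduces the input up to rounding.
    """
    a = 1.0 - math.exp(-2.0 * math.pi * fc / fs)  # one-pole smoothing coefficient
    low, high = [], []
    state = 0.0
    for x in samples:
        state += a * (x - state)
        low.append(state)
        high.append(x - state)
    return low, high

def delay(samples, n):
    """Delay a signal by n samples (zero-padded, same length) to align two playback paths."""
    n = max(0, min(n, len(samples)))
    return [0.0] * n + list(samples[: len(samples) - n])
```

Because the high band is defined as the residual, the two outputs always sum back to the input; a production crossover would likely use steeper filters.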

The Speaker Recognition System using the Pitch Alteration (피치변경을 이용한 화자인식 시스템)

Jung JongSoon;Bae MyungJin
- Proceedings of the Acoustical Society of Korea Conference / spring / pp.115-118 / 2002
Parameters used in a speaker recognition system should fully express the speaker characteristics contained in speech. That is, a feature whose inter-speaker variance is larger than its intra-speaker variance is useful for distinguishing between speakers. To minimize errors between speakers, improved recognition technology is required in addition to discriminative features. Recent simulation results show that more accurate performance is obtained by using dynamic characteristics together with constant characteristics that arise from speaking habits. We therefore propose the following solution, in which prosodic information is used as a speech feature vector. The feature vectors generally used in speaker recognition systems model spectral information and perform well in noise-free conditions. However, such feature vectors are distorted in noisy conditions, which reduces the recognition rate. In this paper, we alter the pitch contour divided into segments, from which a dynamic characteristic can be estimated, and use it as a recognition feature. Simulations confirmed that this dynamic characteristic is very robust in noisy conditions. Acceptance or rejection is decided by comparing test patterns, and the recognition rate of the proposed algorithm is higher than that obtained with spectral and prosodic information. In particular, the simulations show that a stable recognition rate can be obtained in noisy conditions.

Modified GMM Training for Inexact Observation and Its Application to Speaker Identification

Kim, Jin-Young;Min, So-Hee;Na, Seung-You;Choi, Hong-Sub;Choi, Seung-Ho
- Speech Sciences / v.14 no.1 / pp.163-174 / 2007
All observations contain uncertainty due to noise or channel characteristics, and this uncertainty should be taken into account when modeling them. In this paper, we propose a modified optimization objective function for GMM training that considers inexact observations. The objective function is modified by introducing observation confidence as a weighting factor on the probabilities. The proposed criterion is optimized with the standard EM algorithm. To verify the proposed method, we apply it to speaker recognition. In text-independent speaker identification experiments on the VidTimit DB, the modified GMM training reduces the error rate from 14.8% to 11.7%.
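One way to read "observation confidence as a weighting factor" is a confidence-weighted EM step, sketched below for a 1-D GMM. The abstract does not give the exact objective; this sketch assumes each frame's posterior is multiplied by a confidence value in [0, 1] (the `conf` list is hypothetical input), which corresponds to maximizing a confidence-weighted log-likelihood.

```python
import math

def weighted_em_step(frames, conf, weights, means, variances, var_floor=1e-3):
    """One EM iteration for a 1-D GMM where each frame's posterior is scaled by its confidence."""
    K = len(means)
    n = [0.0] * K   # confidence-weighted soft counts
    s1 = [0.0] * K  # weighted first-order sums
    s2 = [0.0] * K  # weighted second-order sums
    for x, c in zip(frames, conf):
        likes = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:  # frame has negligible likelihood under every component
            continue
        for k in range(K):
            g = c * likes[k] / total  # confidence-weighted responsibility
            n[k] += g
            s1[k] += g * x
            s2[k] += g * x * x
    total_n = sum(n)
    new_w, new_m, new_v = [], [], []
    for k in range(K):
        if n[k] == 0.0:  # component received no weight: keep its old mean and variance
            new_w.append(0.0)
            new_m.append(means[k])
            new_v.append(variances[k])
            continue
        mean = s1[k] / n[k]
        new_w.append(n[k] / total_n)
        new_m.append(mean)
        new_v.append(max(s2[k] / n[k] - mean * mean, var_floor))  # floor avoids collapse
    return new_w, new_m, new_v
```

A frame with confidence 0 is effectively ignored, while confidence 1 everywhere reduces the step to ordinary EM.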

Performance comparison of Text-Independent Speaker Recognizer Using VQ and GMM (VQ와 GMM을 이용한 문맥독립 화자인식기의 성능 비교)

Kim, Seong-Jong;Chung, Hoon;Chung, Ik-Joo
- Speech Sciences / v.7 no.2 / pp.235-244 / 2000
This paper focuses on implementing text-independent speaker recognizers with the VQ and GMM algorithms and on studying the characteristics of recognizers that adopt each algorithm. Because it was difficult to determine theoretically how the two algorithms affect the speaker recognizer, we performed recognition experiments with various parameters. The results show that the GMM algorithm outperforms the VQ algorithm in the following respects: the GMM performed better with small amounts of training data, and its recognition rate varied only slightly as the type of feature vector and the length of the input data changed. Overall, the GMM showed better recognition performance than the VQ.
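The VQ side of this comparison can be sketched as minimum-average-distortion identification. The per-speaker codebooks are assumed to be trained beforehand (for example with k-means); the speaker labels and feature values in any usage are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def avg_distortion(frames, codebook):
    """Average, over frames, of the squared distance to the nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)

def identify(frames, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion.

    codebooks maps a speaker label to that speaker's list of codewords.
    """
    return min(codebooks, key=lambda spk: avg_distortion(frames, codebooks[spk]))
```

A GMM recognizer would replace the nearest-codeword distance with a log-likelihood under each speaker's mixture and pick the maximum instead.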

Serial Transmission of Audio Signals for Multi-channel Speaker Systems (다채널 스피커 시스템을 위한 오디오 신호지 직렬 전송)

Kwon, Oh-Kyun;Song, Moon-Vin;Lee, Seung-Won;Lee, Young-Won;Chung, Yun-Mo
- The Journal of the Acoustical Society of Korea / v.24 no.7 / pp.387-394 / 2005
In this paper, we propose a new audio-signal transmission technique for serially connecting the speakers of a multi-channel audio system. Analog audio signals from the multi-channel audio system are converted into digital signals through signal processing steps and transferred to each speaker over a serial line. The signal processing steps include data compression and packet generation that take audio signal characteristics into account. Each speaker extracts its corresponding digital audio signal from the transmitted packets and converts it back into an analog signal to drive the speaker. All the proposed functions are modeled in VHDL, implemented with FPGA chips, and tested on actual multi-channel audio systems.
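The packet generation and per-speaker extraction described above can be sketched with Python's `struct` module. The packet layout (a sequence number, channel count, and sample count followed by interleaved 16-bit samples) is hypothetical, and the data compression step is omitted.

```python
import struct

HEADER = "<HBH"  # hypothetical header: seq (u16), channels (u8), sample count (u16)

def build_packet(seq, frames):
    """Pack one block of multi-channel samples into a packet.

    frames is a list of tuples, one signed 16-bit value per channel.
    """
    channels = len(frames[0])
    header = struct.pack(HEADER, seq & 0xFFFF, channels, len(frames))
    body = b"".join(struct.pack(f"<{channels}h", *f) for f in frames)
    return header + body

def extract_channel(packet, channel):
    """What one speaker node would do: pull only its own channel out of the packet."""
    seq, channels, count = struct.unpack_from(HEADER, packet, 0)
    samples = struct.unpack_from(f"<{channels * count}h", packet, struct.calcsize(HEADER))
    return seq, list(samples[channel::channels])  # de-interleave this channel
```

Every speaker receives the same serial stream and keeps only its slice, which is what lets the speakers be daisy-chained on one line.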

A Proposition of the Fuzzy Correlation Dimension for Speaker Recognition (화자인식을 위한 퍼지상관차원 제안)

Yoo, Byong-Wook;Kim, Chang-Seok;Park, Hyun-Sook
- Journal of the Korean Institute of Telematics and Electronics S / v.36S no.1 / pp.115-122 / 1999
In this paper, we confirmed that a speech signal is a chaotic signal and analyzed its chaos dimension in order to use it as a speaker recognition parameter. To improve speaker identification and pattern recognition, we reconstructed a strange attractor that captures an individual's vocal tract characteristics well and proposed the fuzzy correlation dimension, which applies a fuzzy membership function to the correlation dimension. Because the correlation of the points making up an attractor is estimated within limits set by the space dimension value, the fuzzy correlation dimension absorbs the variation between the reference-pattern attractor and the test-pattern attractor. We investigated the validity of the fuzzy correlation dimension as a speaker recognition parameter by estimating distances according to the average discrimination error for each speaker and reference pattern.
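A correlation sum in the Grassberger-Procaccia style, with an optional fuzzy membership in place of the hard distance threshold, might look like the sketch below. The linear-ramp membership, its `width` parameter, and the two-radius slope estimate are assumptions; the abstract does not specify the membership function the authors used.

```python
import math

def embed(signal, dim, lag):
    """Time-delay embedding: reconstruct attractor points from a scalar signal."""
    n = len(signal) - (dim - 1) * lag
    return [tuple(signal[i + j * lag] for j in range(dim)) for i in range(n)]

def membership(d, r, width):
    """Hypothetical fuzzy membership: 1 below r - width, 0 above r + width, linear between.

    width == 0 gives the crisp (Heaviside) count of the ordinary correlation sum.
    """
    if width == 0:
        return 1.0 if d < r else 0.0
    return min(1.0, max(0.0, (r + width - d) / (2 * width)))

def correlation_sum(points, r, width=0.0):
    """Fraction of point pairs closer than r (fuzzily counted when width > 0)."""
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += membership(math.dist(points[i], points[j]), r, width)
    return 2.0 * total / (n * (n - 1))

def dimension_estimate(points, r1, r2, width=0.0):
    """Slope of log C(r) between two radii: a crude correlation-dimension estimate."""
    c1 = correlation_sum(points, r1, width)
    c2 = correlation_sum(points, r2, width)
    return (math.log(c2) - math.log(c1)) / (math.log(r2) - math.log(r1))
```

Softening the threshold means a pair whose distance shifts slightly across r changes the sum gradually rather than abruptly, which is the kind of tolerance to attractor variation the abstract describes.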

A Study on the Educational Uses of Smart Speaker (스마트 스피커의 교육적 활용에 관한 연구)

Chang, Jiyeun
- Journal of the Korea Convergence Society / v.10 no.11 / pp.33-39 / 2019
Edutech, which combines education and information technology, is in the spotlight, and core technologies of the 4th Industrial Revolution are being actively used in education. Students use AI-based learning platforms to self-diagnose their needs and receive personalized training online through cloud learning platforms. Recently, a new educational medium called the smart speaker, which combines artificial intelligence and voice recognition technology, has emerged and provides various educational services. The purpose of this study is to suggest ways to use smart speakers educationally to overcome the limitations of existing education. To this end, the concept and characteristics of smart speakers were analyzed, and implications were derived by analyzing the content that smart speakers provide. The problems of using smart speakers were also considered.
https://doi.org/10.15207/JKCS.2019.10.11.033

Automatic Speech Style Recognition Through Sentence Sequencing for Speaker Recognition in Bilateral Dialogue Situations (양자 간 대화 상황에서의 화자인식을 위한 문장 시퀀싱 방법을 통한 자동 말투 인식)

Kang, Garam;Kwon, Ohbyung
- Journal of Intelligence and Information Systems / v.27 no.2 / pp.17-32 / 2021
Speaker recognition is generally divided into speaker identification and speaker verification. It plays an important role in automatic voice systems, and the importance of speaker recognition technology is growing as portable devices, voice technology, and audio content continue to expand. Previous speaker recognition studies have aimed to determine automatically who the speaker is from voice files and to improve accuracy. Speech style is an important sociolinguistic subject: it contains useful information that reveals the speaker's attitude, conversational intention, and personality, and it can be an important clue for speaker recognition. The sentence-final ending used in a speaker's speech determines the sentence type and carries functions and information such as the speaker's intention, psychological attitude, or relationship to the listener. Because the use of sentence-final endings varies in probability with the characteristics of the speaker, the type and distribution of the sentence-final endings of an unidentified speaker can help recognize that speaker. However, few existing text-based speaker recognition studies have considered speech style, and adding speech style information to speech-signal-based speaker recognition techniques can further improve recognition accuracy. Hence, the purpose of this paper is to propose a novel method that uses speech style, expressed as sentence-final endings, to improve the accuracy of Korean speaker recognition. To this end, we propose a method called sentence sequencing, which generates vectors from the type and frequency of the sentence-final endings that appear in a specific person's utterances. To evaluate the proposed method, training and performance evaluation were conducted using an actual drama script.
The method proposed in this study can be used to improve the performance of Korean speech recognition services.
https://doi.org/10.13088/jiis.2021.27.2.017
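The sentence-sequencing idea, turning the type and frequency of sentence-final endings into a vector, can be sketched as follows. The ending inventory `ENDINGS` and the punctuation-based sentence splitter are hypothetical simplifications; real Korean ending analysis would likely need a morphological analyzer.

```python
import re

# Hypothetical inventory of sentence-final endings; the paper's actual list is not given.
ENDINGS = ["습니다", "습니까", "네요", "어요", "아요", "다", "까", "니", "지", "자"]

def split_sentences(text):
    """Split on sentence-final punctuation and drop empty pieces."""
    return [s.strip() for s in re.split(r"[.?!]+", text) if s.strip()]

def ending_vector(utterances):
    """Relative frequency of each sentence-final ending across one speaker's utterances."""
    counts = [0] * len(ENDINGS)
    # Try longer endings first so that, e.g., "습니다" wins over "다".
    by_length = sorted(range(len(ENDINGS)), key=lambda i: len(ENDINGS[i]), reverse=True)
    for utt in utterances:
        for sentence in split_sentences(utt):
            for i in by_length:
                if sentence.endswith(ENDINGS[i]):
                    counts[i] += 1
                    break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(ENDINGS)
```

Vectors like this, computed per speaker, could then be compared or fed to a classifier alongside acoustic features.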

A Study on the Channel Normalized Pitch Synchronous Cepstrum for Speaker Recognition (채널에 강인한 화자 인식을 위한 채널 정규화 피치 동기 켑스트럼에 관한 연구)

김유진;정재호
- The Journal of the Acoustical Society of Korea / v.23 no.1 / pp.61-74 / 2004
In this paper, a context- and speaker-dependent cepstrum extraction method and a channel normalization method that minimizes the loss of speaker characteristics in the cepstrum are proposed for speaker recognition that is robust to the channel. The proposed extraction method computes the cepstrum through pitch synchronous analysis using the speaker's inherent pitch. The resulting cepstrum, called the "pitch synchronous cepstrum" (PSC), therefore represents the impulse response of the vocal tract more accurately in voiced speech. The PSC can also compensate for channel distortion, because pitch is more robust in a channel environment than the speech spectrum. The proposed channel normalization method, the "formant-broadened pitch synchronous CMS" (FBPSCMS), applies formant-broadened CMS to the PSC and improves the accuracy of intraframe processing. We compared text-independent closed-set speaker identification on 56 female and 112 male speakers using the TIMIT and NTIMIT databases, respectively. The results show that the pitch synchronous cepstrum improves the error reduction rate by up to 7.7% compared with the conventional short-time cepstrum, and that the error rates of FBPSCMS are more stable and lower than those of pole-filtered CMS.
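Plain cepstral mean subtraction (CMS), of which formant-broadened CMS and pole-filtered CMS are variants, is sketched below. It is not the proposed FBPSCMS: it simply removes the per-coefficient mean over an utterance, which cancels a fixed linear channel because such a channel multiplies the spectrum and therefore adds a constant offset to every cepstral frame.

```python
def cepstral_mean_subtraction(cepstra):
    """Subtract the per-coefficient mean over all frames of one utterance.

    cepstra is a list of frames, each a list of cepstral coefficients.
    """
    n = len(cepstra)
    dim = len(cepstra[0])
    mean = [sum(frame[d] for frame in cepstra) / n for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in cepstra]
```

Adding the same offset to every frame, as a stationary channel would, leaves the output unchanged.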
