• Title/Abstract/Keywords: Speaker Variation

Search Result 74, Processing Time 0.024 seconds

Compensation of low Frequency Resonance in Current Driven Loudspeakers using DSP (DSP를 이용한 전류구동 스피커의 저주파 공진 보상)

  • Park, Jong-phil;Eun, Changsoo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.584-588 / 2021
  • Loudspeaker impedance is often treated as a fixed value. In practice it varies with frequency, and the variation is largest around the resonant frequency. The sound pressure level of a loudspeaker is determined by the current flowing through its coil. If a loudspeaker is driven by voltage, its sound pressure level is distorted by the variation in impedance. Current drive can solve this problem, but the sound pressure level is then distorted at low frequencies because of resonance, which can degrade the sound quality of the system. To solve this problem, this paper proposes a resonance compensation circuit using DSP. We simulate audio systems using an equivalent model of the loudspeaker to verify the distortion of sound pressure level caused by impedance variation, and propose a circuit to compensate for it. The proposed circuit is built around a state variable filter whose center frequency and output can be adjusted, so it can be used in a variety of sound systems.

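As a rough illustration of the state variable filter idea in the abstract above, the following sketch uses a digital Chamberlin state variable filter and subtracts a scaled band-pass output from the input, which attenuates a band around a chosen center frequency. This is not the authors' circuit: the function names, the `depth` parameter, and the example values (60 Hz center, Q of 2, 48 kHz sampling) are all assumptions made for illustration.

```python
import math

def resonance_compensator(samples, fs, fc, q_factor, depth):
    """Attenuate a band around fc using a Chamberlin state variable filter.

    depth = 0 leaves the signal unchanged; depth = 1 gives a full notch at fc.
    All parameter names here are illustrative assumptions.
    """
    f = 2.0 * math.sin(math.pi * fc / fs)   # frequency coefficient
    damp = 1.0 / q_factor                   # damping coefficient (1/Q)
    low = 0.0
    band = 0.0
    out = []
    for x in samples:
        low += f * band
        high = x - low - damp * band
        band += f * high
        # damp * band has roughly unity gain at fc, so subtracting a
        # fraction of it lowers the gain at fc to about (1 - depth).
        out.append(x - depth * damp * band)
    return out

def steady_state_peak(freq, fs=48000, fc=60.0, q_factor=2.0, depth=0.5):
    """Peak output amplitude for a unit sine at freq, after settling."""
    n = fs  # one second of signal
    tone = [math.sin(2.0 * math.pi * freq * i / fs) for i in range(n)]
    y = resonance_compensator(tone, fs, fc, q_factor, depth)
    return max(abs(v) for v in y[n // 2:])

if __name__ == "__main__":
    print("gain at 60 Hz:", round(steady_state_peak(60.0), 3))
    print("gain at 1 kHz:", round(steady_state_peak(1000.0), 3))
```

Changing `fc` moves the attenuated band and changing `depth` sets how much of the peak is removed, which mirrors the adjustable center frequency and output mentioned in the abstract.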

Integrated receptive field diversification method for improving speaker verification performance for variable-length utterances (가변 길이 입력 발성에서의 화자 인증 성능 향상을 위한 통합된 수용 영역 다양화 기법)

  • Shin, Hyun-seo;Kim, Ju-ho;Heo, Jungwoo;Shim, Hye-jin;Yu, Ha-Jin
    • The Journal of the Acoustical Society of Korea / v.41 no.3 / pp.319-325 / 2022
  • Variation in utterance length is a representative factor that can degrade the performance of speaker verification systems. To handle this issue, previous studies have attempted either to extract speaker features from multiple branches or to use convolution layers with different receptive fields. Combining the advantages of these two approaches for variable-length input, this paper proposes integrated receptive field diversification, which extracts speaker features through more diverse receptive fields. The proposed method processes the input features with convolutional layers that have different receptive fields in multiple time-axis branches, and extracts the speaker embedding by dynamically aggregating the processed features according to the length of the input utterance. The deep neural networks in this study were trained on the VoxCeleb2 dataset and tested on the VoxCeleb1 evaluation dataset, which was divided into 1 s, 2 s, 5 s, and full-length utterances. Experimental results demonstrated that the proposed method reduces the equal error rate by 19.7% compared to the baseline.
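To make "different receptive fields" in the abstract above concrete, the sketch below computes how many input frames one output frame of a stack of 1-D convolutions can see. The branch configurations are hypothetical and are not taken from the paper; only the receptive-field arithmetic is standard.

```python
def receptive_field(layers):
    """Receptive field (in input frames) of stacked 1-D convolutions.

    layers: list of (kernel_size, stride, dilation) tuples, first layer first.
    """
    field = 1   # a single output frame initially covers one input frame
    jump = 1    # distance in input frames between adjacent outputs
    for kernel, stride, dilation in layers:
        field += (kernel - 1) * dilation * jump
        jump *= stride
    return field

# Hypothetical time-axis branches with different kernel sizes and dilations.
branches = {
    "narrow": [(3, 1, 1), (3, 1, 1)],
    "medium": [(3, 1, 2), (3, 1, 2)],
    "wide": [(5, 1, 3), (5, 1, 3)],
}

if __name__ == "__main__":
    for name, layers in branches.items():
        print(name, receptive_field(layers))
```

With these assumed settings the branches cover 5, 9, and 25 frames, so a short utterance is still well covered by the narrow branch while a long one benefits from the wide branch.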

Quantification of Glottal Cycle According to the Variation of Frequency and Intensity in Normal Speaker (발성의 강도와 주파수 변화에 따른 성대 움직임의 정량적 분석)

  • 손영익;이경아;류준선;백정환
    • Proceedings of the KSLP Conference / 1996.11a / pp.92-92 / 1996
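  • Objective evaluation of the glottal cycle through quantification of videostroboscopic images is expected to play an important role in differentiating various disorders and in comparing results before and after treatment, yet reports on reference values for normal phonation or pathological conditions, and on their significance, remain rare. The authors therefore quantified the changes in the glottal cycle that accompany changes in the frequency and intensity of phonation in normal adults, with the aim of providing baseline data for future research and clinical application. (Remainder omitted)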


Consideration on the Fuzzy Chaos Dimension for Speech Recognition (음성인식을 위한 퍼지 카오스 차원의 고찰)

  • Yoo, B.W.;Kim, S.K.;Park, H.S.;Kim, C.S.
    • Speech Sciences / v.4 no.2 / pp.25-39 / 1998
  • This paper deals with a fuzzy correlation dimension suited to speech recognition. The proposed fuzzy correlation dimension absorbs the time variation of the strange attractor by applying a fuzzy membership function when calculating the correlation integral. When the proposed dimension was applied to speech recognition, the fuzzy correlation dimension proved superior for speech recognition, whereas the correlation dimension proved superior for speaker discrimination.

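The abstract above does not give the membership function, so the following is only a sketch of the general idea: the classical correlation integral counts point pairs closer than a radius with a hard step, and a fuzzy variant replaces that step with a graded membership. The linear ramp, its `softness` parameter, and the test data are assumptions for illustration.

```python
import math

def delay_embed(signal, dim, tau):
    """Reconstruct attractor points from a 1-D signal by time-delay embedding."""
    count = len(signal) - (dim - 1) * tau
    return [tuple(signal[i + k * tau] for k in range(dim)) for i in range(count)]

def crisp_membership(distance, radius):
    """Hard step used by the classical correlation integral."""
    return 1.0 if distance <= radius else 0.0

def ramp_membership(distance, radius, softness=0.2):
    """Hypothetical fuzzy membership: linear ramp from 1 to 0 around the radius."""
    width = softness * radius
    value = (radius + width - distance) / (2.0 * width)
    return min(1.0, max(0.0, value))

def correlation_integral(points, radius, membership):
    """Average membership of all point pairs for the given radius."""
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += membership(math.dist(points[i], points[j]), radius)
    return 2.0 * total / (n * (n - 1))

def dimension_estimate(points, r_small, r_large, membership):
    """Slope of log C(r) against log r between two radii."""
    c_small = correlation_integral(points, r_small, membership)
    c_large = correlation_integral(points, r_large, membership)
    return math.log(c_large / c_small) / math.log(r_large / r_small)

# Sanity-check data: points on a straight line, whose dimension should be near 1.
line = [(i / 200.0, i / 200.0) for i in range(200)]

if __name__ == "__main__":
    print(dimension_estimate(line, 0.05, 0.2, crisp_membership))
    print(dimension_estimate(line, 0.05, 0.2, ramp_membership))
```

For a speech frame, `delay_embed` would supply the points; the embedding dimension and delay are further choices the abstract does not specify.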

MODELING QUANTITATIVE VARIATION - In the Kyungnam Dialect of Korean -

  • Cho, Yong-Hyung
    • Speech Sciences / v.1 / pp.137-152 / 1997
  • The objectives of this paper are to see how declination is realized at different positions and lengths of the utterance, to see whether the $F_0$ value changes in a predictable way throughout the utterance, and, if so, to find the quantitative model that best fits the declination. The experimental results are as follows. First, the peak value over the utterance can be affected by the position of the peak and the length of the utterance. Second, the choice of quantitative model depends on the list length. Third, in everyone's speech there is a baseline (the lowest $F_0$ value a speaker can use), and the $F_0$ will not fall below the baseline. Fourth, the peak $F_0$ of the last word in each list shows little variation in pitch value (target $F_0$), while the number of words in the list affects the starting $F_0$ values.


Pitch Patterns of Interrogative Sentences in relation to the Focus (초점과 관련된 의문문 억양 패턴 실험)

  • Kim, Mi-Ran;Shin, Dong-Hyun;Choe, Jae-Woong;Kim, Kee-Ho
    • Speech Sciences / v.7 no.4 / pp.203-217 / 2000
  • In spoken language, the characteristics of prosodic realization are related to the meaning of the utterance. The pitch pattern of an interrogative sentence, which differs from that of declarative sentences, can be considered in this respect. If we consider the question-answer pair, we find that the most important variation comes from the intended meaning of asking. In this paper, we experiment with four kinds of interrogative sentences and show that the differences in their pitch patterns can be explained in relation to focus phenomena; that is, the differences in the boundary tones of interrogative sentences are due to differences in the prosodic domain of focus. To relate the explanation to focus phenomena, we divided focus into two categories: emphatic focus, which plays a role in delivering the speaker's intended meaning for sentence interpretation, and informational focus, which delivers the central intended meaning of the utterance. The results can be summarized in three points. First, a high boundary tone delivers the meaning of asking. Second, the different boundary tones found in wh-questions and alternative questions are just phonetic variations caused by focusing. Third, the high rising boundary tone in echo questions is related to the meaning of surprise or incredulity, which agrees with the existing view that the speaker's attitude of surprise can raise the pitch range. From these results we can distinguish between boundary type and phonetic variation, and we can also assign appropriate meanings to the different boundary tones in interrogative sentences, which have been regarded as merely a part of sentence type.


Formant-broadened CMS Using the Log-spectrum Transformed from the Cepstrum (켑스트럼으로부터 변환된 로그 스펙트럼을 이용한 포먼트 평활화 켑스트럴 평균 차감법)

  • 김유진;정혜경;정재호
    • The Journal of the Acoustical Society of Korea / v.21 no.4 / pp.361-373 / 2002
  • In this paper, we propose a channel normalization method to improve the performance of CMS (cepstral mean subtraction), which is widely adopted to normalize channel variation in speech and speaker recognition. CMS estimates the channel effect by averaging the long-term cepstrum, and its weak point is that the estimated channel is biased by the formants of voiced speech, which carry useful speech information. The proposed Formant-broadened Cepstral Mean Subtraction (FBCMS) is based on two facts: the formants can be found easily in the log spectrum obtained from the cepstrum by Fourier transform, and the formants correspond to the dominant poles of the all-pole model that is usually used to model the vocal tract. FBCMS identifies only the poles to be broadened from the log spectrum, without polynomial factorization, and produces a formant-broadened cepstrum by broadening the bandwidths of the formant poles. The channel cepstrum can then be estimated effectively by averaging the formant-broadened cepstral coefficients. We performed experiments comparing FBCMS with CMS and PFCMS using 4 simulated telephone channels. In the channel estimation experiment, we evaluated the cepstral distance between the real channel and the estimated channel and found that the mean cepstrum came closer to the channel cepstrum because the bias of the mean cepstrum toward speech was softened. In the text-independent speaker identification experiment, the proposed method was superior to conventional CMS and comparable to pole-filtered CMS. Consequently, we showed that the proposed method, which builds on conventional CMS, can normalize channel variation efficiently.
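The conventional CMS step that the abstract above builds on can be sketched in a few lines: a stationary channel adds the same vector to every cepstral frame, so subtracting the long-term mean removes it. The frame values and the channel vector below are made-up illustrative numbers, and the formant-broadening step of FBCMS is not reproduced here.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames (conventional CMS).

    frames: list of equal-length cepstral vectors. Returns (normalized, mean).
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[k] for frame in frames) / n for k in range(dim)]
    normalized = [[frame[k] - mean[k] for k in range(dim)] for frame in frames]
    return normalized, mean

# Illustrative (made-up) cepstral frames and a constant channel offset.
clean = [[1.0, 0.5, -0.2], [0.8, 0.1, 0.3], [1.2, -0.3, 0.1], [0.6, 0.4, -0.4]]
channel = [0.7, -0.2, 0.05]
distorted = [[c + h for c, h in zip(frame, channel)] for frame in clean]

if __name__ == "__main__":
    norm_clean, mean_clean = cepstral_mean_subtraction(clean)
    norm_dist, mean_dist = cepstral_mean_subtraction(distorted)
    print(norm_dist[0], norm_clean[0])
    # The estimated "channel" also contains the speech mean, which is the
    # bias the abstract describes.
    print(mean_dist, channel)
```

Note that the subtracted mean equals the channel vector plus the average of the speech frames, so the channel estimate is only as good as the assumption that the speech cepstrum averages out.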

Measurement of the vocal tract area of vowels By MRI and their synthesis by area variation (MRI에 의한 모음의 성도 단면적 측정 및 면적 변이에 따른 합성 연구)

  • Yang, Byung-Gon
    • Speech Sciences / v.4 no.1 / pp.19-34 / 1998
  • The author collected and compared midsagittal, coronal, coronal oblique, and transversal images of Korean monophthongs /a, i, e, o, u, i, v/ produced by a healthy male speaker using 1.5 T MR, VISION. Area was measured by computer software after tracing the cross-section at different points along the tract. Results showed that the widths of the oral and pharyngeal cavities varied in a compensatory manner on the midsagittal dimension. Formant frequency values estimated from the area functions of the seven vowels showed a strong correlation (r=0.978) with those analyzed from the spoken vowels. Moreover, almost all of the 35 students who listened to the vowels synthesized from area data perceived them as equivalent to the spoken ones. Moving the constriction point of the vowel /u/ with a wider lip opening produced a sound like /i/ and led to slight changes in vowel quality. Jaw and tongue movement led to major volume variation within an anatomical limitation. Each corner vowel varied systematically from a somewhat constant volume of the average area. Thus, the author proposed that any simulation study related to vocal tract area variation should reflect this constant volume. The results may help to verify exact measurement of the vocal tract area through vowel synthesis and a simulation study before any operation on the vocal tract.


Emotion Recognition using Robust Speech Recognition System (강인한 음성 인식 시스템을 사용한 감정 인식)

  • Kim, Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.5 / pp.586-591 / 2008
  • This paper studies an emotion recognition system combined with a robust speech recognition system in order to improve emotion recognition performance. For this purpose, the effect of emotional variation on the speech recognition system and robust feature parameters for speech recognition were studied using a speech database containing various emotions. Final emotion recognition is performed using the input utterance and the emotional model selected according to the speech recognition result. In the experiment, the robust speech recognition system is an HMM-based speaker-independent word recognizer that uses RASTA mel-cepstral coefficients and their derivatives, with cepstral mean subtraction (CMS) for signal bias removal. Experimental results showed that the emotion recognizer combined with the speech recognition system performed better than the emotion recognizer alone.
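The "derivatives" of cepstral coefficients mentioned in the abstract above are commonly computed as delta features with a regression over neighboring frames. The abstract does not say which formula was used, so the sketch below assumes the widely used regression form with a window of two frames and edge frames repeated as padding.

```python
def delta_features(frames, window=2):
    """Regression-based delta coefficients for a sequence of feature vectors.

    delta[t] = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2), n = 1..window.
    Frames beyond either end are replaced by the first or last frame.
    """
    count = len(frames)
    dim = len(frames[0])
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(count):
        row = []
        for k in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(count - 1, t + n)][k]
                behind = frames[max(0, t - n)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas

if __name__ == "__main__":
    # A coefficient rising by 3 per frame has an interior delta of 3.
    ramp = [[3.0 * t] for t in range(6)]
    print(delta_features(ramp))
```

In a typical front end these deltas are appended to the static coefficients, so each frame carries both the spectral shape and its local rate of change.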

On a Study of Measurement Method of Utterance Velocity for the Reduction of Transmission Rate in CELP Vocoder. (LSP 파라미터를 이용한 발성측정법)

  • 장경아;배명진
    • Proceedings of the IEEK Conference / 2000.11d / pp.199-202 / 2000
  • Speaking rate varies with the situation and the habits of the speaker. It has been studied extensively in speaker recognition. In speech recognition, speaking rate is an important consideration when recognizing speakers, and it is measured using large speech databases and complicated estimation for accuracy. Conventional vocoders encode and transmit the speech signal without regard to speaking rate, so applying speaking rate in a vocoder requires a simpler algorithm with less computation than the conventional speaking-rate methods used in speech recognition. We propose a speaking-rate algorithm that uses a simple parameter, the Line Spectrum Pair (LSP). The proposed method measures speaking rate from the LSP information in speech. We measured the variation rate for utterances spoken at different speeds. As a result, fast and slow utterances showed distinct variation rates, and the rate was 42.8% higher for fast utterances than for slow ones.

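The abstract above does not define how the LSP variation rate is computed, so the following is only one plausible reading: take the LSP vector of each frame and average how much it changes from one frame to the next. The function names and the two synthetic trajectories are hypothetical and do not reproduce the paper's data or its 42.8% figure.

```python
def lsp_variation_rate(lsp_frames):
    """Mean absolute frame-to-frame LSP change, averaged over coefficients.

    lsp_frames: list of equal-length LSP vectors, one per analysis frame.
    This measure is an assumption, not the definition used in the paper.
    """
    changes = []
    for prev, cur in zip(lsp_frames, lsp_frames[1:]):
        step = sum(abs(c - p) for p, c in zip(prev, cur)) / len(cur)
        changes.append(step)
    return sum(changes) / len(changes)

def relative_increase_percent(fast_rate, slow_rate):
    """How much higher the fast-speech rate is, as a percentage of the slow one."""
    return 100.0 * (fast_rate - slow_rate) / slow_rate

# Synthetic trajectories: the "fast" one moves twice as far per frame.
slow = [[0.30 + 0.01 * i, 0.60 + 0.01 * i] for i in range(5)]
fast = [[0.30 + 0.02 * i, 0.60 + 0.02 * i] for i in range(5)]

if __name__ == "__main__":
    slow_rate = lsp_variation_rate(slow)
    fast_rate = lsp_variation_rate(fast)
    print(slow_rate, fast_rate, relative_increase_percent(fast_rate, slow_rate))
```

Because LSP vectors are already produced by a CELP encoder, a measure of this kind would add little computation, which fits the low-complexity goal stated in the abstract.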