• Title/Abstract/Keywords: voice parameter

HOS 특징 벡터를 이용한 장애 음성 분류 성능의 향상 (Performance Improvement of Classification Between Pathological and Normal Voice Using HOS Parameter)

  • 이지연;정상배;최흥식;한민수
    • 대한음성학회지:말소리 / No. 66 / pp. 61-72 / 2008
  • This paper proposes a method to improve pathological and normal voice classification performance by combining multiple features such as auditory-based and higher-order features. Performance is measured using Gaussian mixture models (GMMs) and linear discriminant analysis (LDA). The combination of multiple features proposed by the frame-based LDA method is shown to be effective for pathological and normal voice classification, with an 87.0% classification rate. In terms of error reduction, this is a noticeable improvement of 17.72% over the MFCC-based GMM algorithm.
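The two figures in this abstract can be related with a short calculation. The sketch below assumes that "error reduction" means relative error reduction, (baseline error - new error) / baseline error; the abstract does not state this, so the implied baseline is an inference, not a reported result. The function name `baseline_error_rate` is hypothetical.

```python
def baseline_error_rate(new_accuracy: float, relative_error_reduction: float) -> float:
    """Return the baseline error rate (in %) implied by a relative error reduction.

    Assumption: reduction = (baseline_error - new_error) / baseline_error.
    """
    new_error = 100.0 - new_accuracy
    return new_error / (1.0 - relative_error_reduction / 100.0)


new_accuracy = 87.0   # % classification rate reported in the abstract
reduction = 17.72     # % error reduction reported in the abstract

baseline_error = baseline_error_rate(new_accuracy, reduction)
baseline_accuracy = 100.0 - baseline_error
print(f"implied baseline error:    {baseline_error:.2f}%")
print(f"implied baseline accuracy: {baseline_accuracy:.2f}%")
```

Under this reading, the MFCC-based GMM baseline would have an error rate of roughly 15.8%; a different definition of error reduction would give a different figure.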

DHMM과 어휘해석을 이용한 Voice dialing 시스템 (The Voice Dialing System Using Dynamic Hidden Markov Models and Lexical Analysis)

  • 최성호;이강성;김순협
    • 전자공학회논문지B / Vol. 28B, No. 7 / pp. 548-556 / 1991
  • In this paper, Korean spoken continuous digits are recognized using a DHMM (Dynamic Hidden Markov Model) and lexical analysis to provide a basis for developing a voice dialing system. Speech is first segmented into phoneme units and then recognized. The system is divided into four sections: segmentation, standard speech design, recognition, and lexical analysis. In the segmentation section, speech is segmented using the ZCR, the 0th-order LPC cepstrum, and Ai, a voice detection parameter that changes over time. In the standard speech design section, 19 phonemes or syllables are trained by DHMM and designed as standard speech. In the recognition section, phoneme streams are recognized by the Viterbi algorithm. Finally, in the lexical decoder section, the recognized continuous digits are output. The experiment showed a recognition rate of 85.1% using data in which 10 men each spoke, 7 times, 21 classes of 7 continuous digits covering all combinations of occurrence.
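Of the segmentation cues named in this abstract, the ZCR (zero-crossing rate) is the simplest to illustrate. The following is a minimal sketch of a frame-wise ZCR, not the paper's segmentation procedure; the function names, frame length, and hop size are assumptions chosen for illustration.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def frame_zcr(signal, frame_len, hop):
    """ZCR for each frame of `frame_len` samples taken every `hop` samples."""
    return [
        zero_crossing_rate(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


# Toy signal: an alternating (noise-like) stretch followed by a constant one.
toy_signal = [1, -1, 1, -1, 1, 1, 1, 1]
toy_zcr = frame_zcr(toy_signal, frame_len=4, hop=4)
print(toy_zcr)
```

A high ZCR is typical of unvoiced or noisy stretches and a low ZCR of voiced ones, which is why it is commonly paired with an energy-like measure for segmentation.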

음원 파라미터 모델과 인공신경망을 이용한 음성장애 검출 (Screening of Voice Disorder using Source Parameter Model and Artificial Neural Network)

  • 파벨시틸;조철우;미샤파벨
    • 음성과학 / Vol. 15, No. 2 / pp. 89-97 / 2008
  • There are a number of clinical conditions that directly or indirectly affect the physical properties of the vocal folds and thereby the pressure waveforms of elicited sounds. If the relationships between the clinical conditions and the voice quality are sufficiently reliable, it should be possible to detect these diseases or disorders. The focus of this paper is to determine the set of features and their values that characterize the state of the speaker's vocal folds. To the extent that these features capture the anatomical, physiological, and neurological aspects of the speaker, they can potentially be used to mediate an unobtrusive approach to diagnosis. We present a new approach to this problem, supported by results obtained from two disordered voice corpora.

남녀 음성 변환 기술연구 (A Study On Male-To-Female Voice Conversion)

  • 최정규;김재민;한민수
    • 한국음향학회:학술대회논문집 / Proceedings of the Acoustical Society of Korea 2000 Summer Conference, Vol. 19, No. 1 / pp. 115-118 / 2000
  • Voice conversion technology is essential for TTS systems because constructing a speech database takes much effort. In this paper, male-to-female voice conversion in a Korean LPC TTS system is studied. In general, the parameters for voice color conversion are categorized into acoustic and prosodic parameters. This paper adopts LSF (Line Spectral Frequency) as the acoustic parameter, and pitch period and duration as the prosodic parameters. For the voice conversion, the pitch period is shortened by half, the duration is shortened by 25%, and the LSFs are shifted linearly. The synthesized speech is then post-filtered by a bandpass filter. The proposed algorithm is simpler than other algorithms, for example VQ and neural network based methods, and it does not even require estimating formant information. The MOS (Mean Opinion Score) test shows 2.25 for naturalness and 3.2 for female closeness. In conclusion, by using the proposed algorithm, a male-to-female voice conversion system can be implemented simply with relatively successful results.
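The three parameter changes named in this abstract can be written down as a per-frame transform. This is only a sketch under stated assumptions: the abstract does not say how far the LSFs are shifted or whether "shifted linearly" means a constant offset or a scaling, so the constant offset `lsf_shift` (in radians) and its default value are hypothetical, as is the function name `convert_frame`. The bandpass post-filter is not shown.

```python
import math


def convert_frame(pitch_period, duration, lsfs, lsf_shift=0.1):
    """Apply the male-to-female parameter changes described in the abstract.

    pitch_period and duration may be in any time unit; lsfs are in radians.
    lsf_shift is a hypothetical constant offset, not a value from the paper.
    """
    new_pitch_period = pitch_period * 0.5   # halved period, i.e. doubled F0
    new_duration = duration * 0.75          # shortened by 25%
    # Shift every LSF by the same offset and keep it inside [0, pi].
    new_lsfs = [min(max(w + lsf_shift, 0.0), math.pi) for w in lsfs]
    return new_pitch_period, new_duration, new_lsfs


period, duration, lsfs = convert_frame(8.0, 100.0, [0.5, 1.0, 2.0])
print(period, duration, lsfs)
```

A constant offset keeps the LSFs in ascending order, which an LPC synthesis filter needs for stability; a real system would also keep a minimum spacing between adjacent LSFs.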

파킨슨증의 음성진전 : 감별진단을 위한 예비연구 (Voice Tremor in Parkinsonism : A Preliminary Study for Differential Diagnosis)

  • 최성희;김향희;이원용;최홍식
    • 음성과학 / Vol. 12, No. 3 / pp. 19-33 / 2005
  • Tremor is a main factor of parkinsonism. Voice tremor may be the first, a later, or the only symptom of a neurological disease, and its frequency, amplitude, and regularity may differ among diseases of different neural subsystems. Differential diagnosis between idiopathic Parkinson's disease (IPD) and multiple system atrophy (MSA) has been difficult. This study included three groups: (1) 6 IPD patients; (2) 6 MSA patients; and (3) 20 age- and sex-matched normal controls. The MDVP (Multidimensional Voice Program) was used to analyze sustained /a/ phonation. The results were as follows: (1) the frequency perturbation parameters (jitter, sPPQ, Vf0) and the tremor parameter FTRI of the two patient groups differed significantly from those of the controls (p < .01); (2) short-term and long-term f0 and amplitude perturbation measures were higher in MSA than in IPD; (3) however, no acoustic parameter differed significantly between IPD and MSA except the rate of frequency tremor, which was 4-5 Hz in IPD and 5-11 Hz in MSA; and (4) histograms of voice tremor regularity indicated that amplitude was irregular in IPD, while both f0 and amplitude were irregular in MSA. In conclusion, F0, the rate of frequency tremor, and the pattern of f0 regularity may be predictors for differential diagnosis. These findings may indicate that voice tremor in parkinsonism results from modulation of f0.

음성 특징 파라메터를 이용한 모바일 기반의 OTP 설계 (Design of OTP based on Mobile Device using Voice Characteristic Parameter)

  • 차병래;김남호;김종원
    • 한국항행학회논문지 / Vol. 14, No. 4 / pp. 512-520 / 2010
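  • With the wide application of ubiquitous and mobile computing, communication security has recently become an important concern. Accordingly, research on various techniques and applications for each security element, and on their system-level application, is being actively pursued. This paper proposes a method for generating a one-time password key for mobile OTP using voice characteristics. The study generates a variable and secure one-time password key from voice information, a biometric used for strong personal authentication. For the proposed technique, it also simulates the homomorphic variability of voice feature points using a dendrogram, as well as the distribution of the voice feature points.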

음성 신호 분류에 따른 장애 음성의 변동률 분석, 비선형 동적 분석, 캡스트럼 분석의 유용성 (The Utility of Perturbation, Non-linear dynamic, and Cepstrum measures of dysphonia according to Signal Typing)

  • 최성희;최철희
    • 말소리와 음성과학 / Vol. 6, No. 3 / pp. 63-72 / 2014
  • The current study assessed the utility of the acoustic analyses most commonly used in routine clinical voice assessment, including perturbation, nonlinear dynamic, and spectral/cepstral analysis, based on signal typing of dysphonic voices, and investigated the applicability of these clinical acoustic analysis methods. A total of 70 dysphonic voice samples were classified by signal type using narrowband spectrograms. The traditional parameters %jitter, %shimmer, and signal-to-noise ratio (SNR) were calculated using TF32, together with the correlation dimension (D2) as a nonlinear dynamic parameter. Spectral/cepstral measures including mean CPP, CPP_sd, CPPf0, CPPf0_sd, L/H ratio, and L/H ratio_sd were calculated with ADSV (Analysis of Dysphonia in Speech and Voice™). Auditory-perceptual analysis was performed by two blinded speech-language pathologists using GRBAS. The results showed that nearly periodic Type 1 signals were all from functional dysphonia, while Type 4 signals comprised neurogenic and organic voice disorders. Only Type 1 voice signals were reliable for perturbation analysis in this study. Significant differences related to signal type were found in all acoustic and auditory-perceptual measures. SNR, CPP, and L/H ratio values for Type 4 were significantly lower than those of the other voice signals, and significantly higher %jitter and %shimmer were observed in Type 4 voice signals (p < .001). Additionally, D2 values increased significantly with signal type, representing more complex and nonlinear patterns. Nevertheless, D2 could not be obtained for voice signals with a high noise component associated with breathiness. In particular, CPP was more sensitive to the voice qualities 'G', 'R', and 'B' than any other acoustic measure. Thus, spectral and cepstral analyses may be applied to more severely dysphonic voices such as Type 4 signals, and CPP can be a more accurate and predictive acoustic marker for measuring voice quality and severity in dysphonia.
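For readers unfamiliar with the perturbation measures named in this abstract, the sketch below uses the common "local" definitions: the mean absolute difference between consecutive cycles, expressed as a percentage of the mean. Whether TF32 computes %jitter and %shimmer in exactly this way is an assumption, and the function name `percent_perturbation` and the example cycle values are hypothetical.

```python
def percent_perturbation(values):
    """Mean absolute difference of consecutive values, as % of the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    mean_abs_diff = sum(
        abs(b - a) for a, b in zip(values, values[1:])
    ) / (len(values) - 1)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_abs_diff / mean_value


# Hypothetical per-cycle measurements from a sustained vowel.
periods_ms = [9.0, 11.0, 9.0, 11.0]   # glottal cycle lengths
peak_amps = [1.0, 1.0, 1.0, 1.0]      # per-cycle peak amplitudes

jitter_pct = percent_perturbation(periods_ms)   # cycle-to-cycle period variation
shimmer_pct = percent_perturbation(peak_amps)   # cycle-to-cycle amplitude variation
print(jitter_pct, shimmer_pct)
```

Both measures rely on reliable cycle boundaries, which is consistent with the abstract's finding that only nearly periodic Type 1 signals were suitable for perturbation analysis.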

CDMA 역방향 링크에서 OPEN LOOP 전력제어 알고리즘 분석 (Analysis of OPEN LOOP Power Control in CDMA Reverse Link)

  • 이철희;박종안
    • 한국통신학회논문지 / Vol. 22, No. 4 / pp. 804-811 / 1997
  • In the CDMA mobile communication system, reverse power control can be used to minimize the interference level for good voice channel quality and to maximize the system capacity. In this paper, we analyze the environment of the K-parameter and the access procedure for the mobile station, and propose a new algorithm for the access probe procedure of the station. The K-parameter is determined according to the environment of the base station, and the access probe can adaptively control the power according to changes in the position of the mobile station or rapid and varied changes in the state of the channel. Simulation results in a limited test environment show that the algorithm can increase the system capacity and decrease the power consumption of the mobile station while maintaining good and stable voice channel quality.
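As background for this abstract, open-loop power control in IS-95-style systems sets the mobile's transmit power from the received power plus a constant, and raises it step by step on successive access probes. The sketch below shows that general idea only; it is not the algorithm proposed in the paper. The function name, the step size, the power cap, and the value -73 dB used for the constant (the figure often quoted for the IS-95 cellular band) are assumptions for illustration.

```python
def access_probe_power(rx_power_dbm, k_db, probe_index, step_db, max_power_dbm):
    """Transmit power (dBm) for the given access probe under open-loop control.

    Open-loop estimate: tx = -rx + K. Each further probe adds step_db,
    and the result is capped at max_power_dbm.
    """
    open_loop_dbm = -rx_power_dbm + k_db
    return min(open_loop_dbm + probe_index * step_db, max_power_dbm)


# Illustrative values only: a mobile receiving -80 dBm, stepping 1 dB per probe.
probe_powers = [
    access_probe_power(-80.0, -73.0, i, step_db=1.0, max_power_dbm=23.0)
    for i in range(3)
]
print(probe_powers)
```

Because the transmit power moves opposite to the received power, a mobile far from the base station transmits more strongly; the constant K shifts the whole curve, which is why its choice affects both interference and battery consumption.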

Transformer 네트워크를 이용한 음성신호 변환 (Voice-to-voice conversion using transformer network)

  • 김준우;정호영
    • 말소리와 음성과학 / Vol. 12, No. 3 / pp. 55-63 / 2020
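  • Voice conversion can be applied to a variety of speech processing applications and can also play an important role in augmenting training data for speech recognition. Conventional methods perform voice conversion with a structure based on speech synthesis, in which the mel filterbank serves as the key parameter. The mel filterbank makes neural network training convenient and computation fast, but a vocoder is required to generate natural speech waveforms. Moreover, this approach is not effective for obtaining diverse data for speech recognition. To address this problem, this paper attempts to convert the speech signal itself using the raw spectrum, and proposes a transformer-network-based deep learning architecture in which an attention mechanism efficiently finds the relationships among spectral components and learns features for conversion. Individual digit conversion models were trained on the TIDIGITS data, which consists of English digits, and the results were evaluated through a continuous-digit voice conversion decoder. Thirty listeners were recruited to evaluate the naturalness and similarity of the converted speech, yielding scores of 3.52±0.22 for naturalness and 3.89±0.19 for similarity.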

음성질환자의 음성검사 시 강도 증가에 따른 음향학적 지표의 변화 (Changes in Acoustic Parameters According to Intensity Increase in Voice Assessment)

  • 남도현;임성수;윤보람;조선아;최홍식
    • 대한후두음성언어의학회지 / Vol. 22, No. 2 / pp. 143-150 / 2011
  • Background and Objectives: Acoustic analysis is widely used clinically as a tool for voice assessment before and after surgery or voice treatment. In clinical situations, however, acoustic parameters vary according to how the assessment is made. Thus, with patients with voice disorders as subjects, we investigate how an increase in intensity influences acoustic parameters and how to reduce variation due to the assessment method. Materials and Methods: At the voice clinic of the Department of Otorhinolaryngology at Gangnam Severance Hospital, 30 female patients (mean age 40.6 years) and 23 male patients (mean age 40.1 years) with voice disorders were assessed using the Dr Speech vocal assessment program. We statistically tested the difference in each acoustic parameter between the "Ah" vowel produced with a normal voice and the same vowel produced with a loud voice. Results: The acoustic parameters that showed a statistically significant difference with increased intensity were Jitter, SD F0, and NNE for females, and Jitter, SD F0, HNR, SNR, and NNE for males. Voice quality estimates showed a statistically significant difference with increased intensity for female hoarse voice, female breathy voice, and male breathy voice. Conclusion: In acoustic analysis, which is generally used for voice assessment before and after surgery or voice treatment, acoustic parameters tended to improve with increased intensity, except in cases of severe voice disorder. Thus, to raise the reliability of voice assessment, a standard intensity range needs to be established. This should be a topic for future research.
