Search | Korea Science

A Study on the Pitch Detection of Speech Harmonics by the Peak-Fitting (음성 하모닉스 스펙트럼의 피크-피팅을 이용한 피치검출에 관한 연구)

Kim, Jong-Kuk;Jo, Wang-Rae;Bae, Myung-Jin
- Speech Sciences / v.10 no.2 / pp.85-95 / 2003
In speech signal processing, accurate pitch detection is very important for speech recognition, synthesis, and analysis. If the pitch is detected accurately, it can be used in analysis to obtain the vocal tract parameters properly, in synthesis to change or maintain the naturalness and intelligibility of speech easily, and in recognition to remove speaker-specific characteristics for speaker independence. In this paper, we propose a new pitch detection algorithm. First, in the time domain, positive center clipping is applied using the slope of the speech signal in order to emphasize the pitch period through a glottal component from which the vocal tract characteristic has been removed. Next, a rough formant envelope is computed by peak-fitting the spectrum of the original speech signal in the frequency domain. The rough formant envelope is then smoothed by linear interpolation. The flattened harmonics waveform is obtained as the algebraic difference between the spectrum of the original speech signal and the smoothed formant envelope. An inverse fast Fourier transform (IFFT) of this flattened harmonics spectrum yields a residual signal from which the vocal tract component has been removed. The performance was compared with the LPC, cepstrum, and ACF methods. With this algorithm, pitch detection accuracy improved and the gross error rate was reduced in voiced speech regions and in transition regions where the phoneme changes.
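The flattening steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the fixed-width bands used for peak-fitting (`band_width`), the periodic Hann window, and the zero-phase IFFT are all assumptions, and the center-clipping step is omitted. The synthetic test frame is deliberately chosen so that its harmonics fall on exact FFT bins; real speech frames would not be this clean.

```python
import numpy as np

def flatten_spectrum(frame, band_width=25):
    """Subtract a peak-fitted, linearly interpolated envelope from the log spectrum."""
    n = len(frame)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)  # periodic Hann window
    log_mag = np.log(np.abs(np.fft.rfft(frame * window)) + 1e-12)
    # Rough envelope (assumed scheme): the largest spectral value in each fixed-width band.
    starts = range(0, len(log_mag) - band_width + 1, band_width)
    knots = np.array([s + np.argmax(log_mag[s:s + band_width]) for s in starts])
    # Smoothed envelope: linear interpolation between the band peaks.
    envelope = np.interp(np.arange(len(log_mag)), knots, log_mag[knots])
    # Flattened harmonics: difference between the spectrum and the envelope.
    return log_mag - envelope

def estimate_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Estimate pitch from the zero-phase IFFT of the flattened spectrum."""
    flat = flatten_spectrum(frame)
    residual = np.fft.irfft(np.exp(flat), n=len(frame))
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(residual[lo:hi + 1])  # strongest periodicity in the search range
    return fs / lag

# Hypothetical test frame: 100 Hz harmonics shaped by one formant-like bump near 700 Hz.
fs, n = 8000, 800
t = np.arange(n) / fs
k = np.arange(1, 40)
amps = (1.0 + 2.0 * np.exp(-((k * 100 - 700) / 300.0) ** 2)) / k
frame = (amps[:, None] * np.sin(2 * np.pi * 100 * k[:, None] * t)).sum(axis=0)
print(estimate_pitch(frame, fs))
```

Picking the lag by a plain maximum is the simplest choice here; a practical detector would likely add a voicing decision and some guard against octave errors.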

The Change of the Length of Vocal Tract in Singers according to the Phonation at Different Levels of Pitch (성악인에서 발성 시 음의 높낮이에 따른 성도 길이의 변화)

Ban, Jae-Ho;Kim, Chang-Gyu;Lee, Sang-Hyuk;Lee, Kyung-Chul;Jin, Sung-Min
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.17 no.1 / pp.14-16 / 2006
Background and Objectives: The purpose of this study is to investigate how the vocal tract length of singers changes with the level of pitch. Materials and Methods: Fifteen tenors were asked to produce a sustained /a/ sound at G4 (392 Hz) for the head register, at C3 (131 Hz) for the chest register, and at their usual speaking pitch. The control group consisted of 15 males of a similar age who were not professional singers. The length of the vocal tract was calculated by applying the formula $F_n=(2n-1)c/4L$ ($F_n$: formant frequency, $c$: the speed of sound in the vocal tract (350 m/sec), $L$: length of the vocal tract, $n=1,2,3,4,{\ldots}{\infty}$). Results: In the singers group, there was no statistically significant difference in vocal tract length among the head register, the chest register, and the usual speaking sound. In the control group, however, the difference in length was statistically significant. When the absolute differences in vocal tract length across pitch levels were compared between the two groups, the change in length from G4 phonation to C3 phonation and from C3 phonation to the usual speaking sound was statistically significantly smaller in the singers group than in the control group. Conclusion: The change in vocal tract length, in either speaking or singing, was smaller in the singers than in the control group. We could assume that singers keep their larynx position constant throughout the pitch range during phonation.
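As a rough illustration of the formula above, solving $F_n=(2n-1)c/4L$ for $L$ gives $L=(2n-1)c/(4F_n)$. The short sketch below applies it; the function name and the 500 Hz first-formant value are hypothetical and are not taken from the study's measurements.

```python
def vocal_tract_length(formant_hz, n=1, c=350.0):
    """Length L (in metres) from the n-th formant, using F_n = (2n - 1) * c / (4 * L)."""
    return (2 * n - 1) * c / (4.0 * formant_hz)

# Hypothetical example: a first formant of 500 Hz with c = 350 m/s.
length_m = vocal_tract_length(500.0, n=1)
print(f"{length_m * 100:.1f} cm")
```

With these assumed inputs the result is 17.5 cm. A lower first formant gives a longer estimated tract, which is how a change in larynx position would show up in this measure.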

A Study for the Changes of Laryngeal Position and Vocal Pitch with Ageing Process (연령에 따른 정상인의 후두 위치 및 발화 기저주파수의 변화에 대한 연구)

홍기환;김현기;정경수;윤희완;김성완
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.9 no.1 / pp.79-85 / 1998
The human voice changes between infancy and old age, reflecting a myriad of biological changes that influence the size, shape, and physical properties of the larynx. The human larynx is located near the base of the neck; it is attached inferiorly to the trachea and opens superiorly into the pharynx. By the third month of fetal life, the larynx has the same features recognizable at birth. The fundamental frequency (Fo) of vocal fold vibration is generally higher in early life, lower in middle age, and higher again in old age. The decreases in Fo undoubtedly result from a combination of factors, including a modest increase in the length and mass of the muscle and connective tissues of the vocal fold. However, the level of the larynx in the neck may also be closely and directly connected with Fo, with a high larynx related to high pitch and a low larynx to low pitch. The purpose of this study is to determine the difference in laryngeal level between the child and adult larynx using conventional radiography, and the change in speaking fundamental frequency from the second to the sixth decade of life.

Flattening Techniques for Pitch Detection (피치 검출을 위한 스펙트럼 평탄화 기법)

김종국;조왕래;배명진
- Proceedings of the IEEK Conference / 2002.06d / pp.381-384 / 2002
In speech signal processing, accurate pitch detection is very important for speech recognition, synthesis, and analysis. However, it is very difficult to detect pitch from a speech signal because of the effects of formants and transition amplitude. In this paper, we therefore propose a pitch detection method using spectrum flattening techniques. Spectrum flattening eliminates the formant and transition amplitude effects. In the time domain, positive center clipping is applied in order to emphasize the pitch period through a glottal component from which the vocal tract characteristic has been removed. A rough formant envelope is then computed by peak-fitting the spectrum of the original speech signal in the frequency domain. As a result, the flattened harmonics waveform is obtained as the algebraic difference between the spectrum of the original speech signal and the smoothed formant envelope. Finally, we obtain a residual signal from which the vocal tract component has been removed. The performance was compared with the LPC, cepstrum, and ACF methods. With this algorithm, pitch detection accuracy improved and the gross error rate was reduced in voiced speech regions and in transition regions where the phoneme changes.

Performance Assessment of Several Established Pitch Detection Algorithms in Voices of Benign Vocal Fold Lesions (양성후두 질환 음성에 대한 여러 기존 피치검출 알고리즘의 성능 평가)

Jang, Seung-Jin;Choi, Seong-Hee;Kim, Hyo-Min;Choi, Hong-Shik;Yoon, Young-Ro
- Proceedings of the IEEK Conference / 2007.07a / pp.407-408 / 2007
Robust pitch estimation is an important topic in many areas of speech processing. In voice pathology, diverse statistics extracted from pitch are commonly used to assess voice quality. In this study, we compared several established pitch detection algorithms (PDAs) to verify their adequacy. Using a database of 99 pathological voices and 30 normal voices, pitch detection errors were evaluated between pathological and normal voices and among types of pathological voices with benign vocal fold lesions: polyps, nodules, and cysts. The results indicate that the severity of the tested voice must be taken into account in order to obtain accurate pitch estimates.

An Amplitude Warping Approach to Intra-Speaker Normalization for Speech Recognition (음성인식에서 화자 내 정규화를 위한 진폭 변경 방법)

Kim Dong-Hyun;Hong Kwang-Seok
- Journal of Internet Computing and Services / v.4 no.3 / pp.9-14 / 2003
Vocal tract normalization is a successful method for improving the accuracy of inter-speaker normalization. In this paper, we present an intra-speaker warping factor estimation based on pitch-altered utterances. The feature space distributions of untransformed speech from a single speaker's pitch-altered utterances vary because of acoustic differences in the speech produced by the glottis and vocal tract. The utterances vary in two ways: in frequency and in amplitude. Among inter-speaker normalization methods, vocal tract normalization is a frequency normalization. We therefore have to consider amplitude variation as well, and it may be possible to determine the amplitude warping factor by calculating the inverse ratio of the input pitch to the reference pitch. In the recognition results, the error rate is reduced by 0.4% to 2.3% for digit and word decoding.

Analysis of Singing Technique of Mongolian Traditional Singing Called Khoomei (몽골 전통 발성 흐미의 발성 방법 분석에 대한 사례연구)

Nam, Do-Hyun;Paik, Jae-Yeon;Hwang, Yoen-Shin;Choi, Hong-Shik
- Speech Sciences / v.15 no.3 / pp.145-156 / 2008
The goal of this study was to investigate the acoustic and physiologic characteristics of two phonation types of 'Khoomei', a traditional singing style of people who live around the Altai mountains and the Mongolia region. Khoomei produces two pitches simultaneously: a high melody pitch is perceived along with a low drone pitch. Sygyt and Kargyraa are the most popular and identifiable styles, and they are recognized as different sounds depending on the method of voice production. Two trained Mongolian singers who had used the technique for at least 5 to 6 years participated. The characteristics of this voice production were measured using a flexible fiberscope, stroboscopy, Lx Speech Studio, Spead, and Doctor Speech. In the Sygyt style, very high vocal fold closure (71.50%), with contact of both the true and false vocal folds, and strong breathing support were observed. Tongue height and harmonics also increased (by around 10 dB) with movement of the resonance cavity. In contrast, the Kargyraa sound had a very low pitch, with a relaxed stomach, less laryngeal tension, and lower vocal fold contact (69.50%) than the hard Sygyt sound, and the tongue was not raised during phonation. 'Khoomei' phonation can be produced by strong contact of both the true and false vocal folds and by increasing the harmonics.

A Study on the Perceptual Aspects of an Emotional Voice Using Prosody Transplantation (운율이식을 통해 나타난 감정인지 양상 연구)

Yi, So-Pae
- MALSORI / no.62 / pp.19-32 / 2007
This study investigated the perception of emotional voices by transplanting some or all of the prosodic aspects (pitch, duration, and intensity) of utterances produced with emotional voices onto those produced with normal voices, and vice versa. A listening evaluation by 24 raters revealed that the prosodic effect on the perception of emotion was greater than the effect of segmental and vocal quality. The degree of influence of prosody and that of segments and vocal quality varied according to the type of emotion. For fear, prosodic elements had far greater influence than segmental and vocal quality elements, whereas for happy voices segmental and vocal quality elements had as much effect as prosody. The prosodic features contributed to the perception of emotion in different amounts, in descending order of pitch, duration, and intensity. As for utterance length, emotion was perceived more effectively in long utterances than in short ones.

Changes in Aerodynamic Function and Closed Quotient with the Variable Pitch and Loudness in Male Classic Singers (남성 성악가의 음도고정시 강도 변화와 강도고정시 음도 변화의 공기역학 및 성대접촉율의 변화)

Nam, Do-Hyun;Paik, Jae-Yeon;Kim, Jae-Ok;Park, Sun-Young;Choi, Hong-Shik
- Speech Sciences / v.14 no.2 / pp.23-33 / 2007
This study examined the aerodynamic functions (mean airflow rate, MFR; subglottal pressure, Psub) and closed quotients (CQs) at fixed pitch (C3, E3, G3, C4) with variable loudness (70 and 80 dB), as well as at fixed loudness (70 dB and 80 dB) with variable pitch (C3, E3, G3, C4), in five male classical singers (baritones). Results showed that MFR significantly increased at C3, E3, and G3, and Psub significantly increased at C4, when the loudness increased from 70 to 80 dB. At 70 dB, MFR and Psub significantly increased and CQ significantly decreased when the pitch increased from C3 to C4. At 80 dB, MFR significantly decreased when the pitch increased from C3 to G3. However, Psub decreased significantly as the pitch increased at 80 dB. In conclusion, as the loudness increases, the aerodynamic loss becomes greater and vocal efficiency becomes lower at low pitch than at higher pitch. At a low loudness level, the main mechanism for controlling loudness is the amount of medial compression of the vocal folds rather than the aerodynamic function. In addition, the aerodynamic function and medial compression of the vocal folds play a significant role in increasing the loudness level.

Acoustic Features of Oral Vowels in the Esophagus Speakers (식도음성의 모음종류에 따른 음향학적 특성)

Yun, Eunmi;Mok, Eunhee;Minh, Phan huu Ngoc;Hong, Kihwan
- Phonetics and Speech Sciences / v.7 no.4 / pp.85-92 / 2015
This study aimed to establish characteristics related to voice and speech through fundamental frequency analysis of esophageal speech. Eight esophageal speakers were selected as subjects, and 10 other subjects were selected as a control group. MDVP (Multi-dimensional Voice Program, Model 4800, USA, 2001) and Multi Speech (Model 3700, KayPENTAX, USA, 2008) were used as experimental equipment. The speech samples selected for evaluation were vowels and sentences (both declarative and interrogative). For acoustic analysis, fo, jitter, energy, shimmer, HNR, and the intonation patterns of the speech samples were measured. The results were as follows. First, the fundamental frequency of sustained vowels in the esophageal speech group was lower than in the normal voice group. In particular, the frequency difference for the high vowel /i/ was much greater than for the low vowel /a/. Second, the jitter values of the esophageal speech group were higher than those of the control group. In particular, there was a large difference between the jitter values for /a/ and /i/, with the jitter values being highest for /i/. Third, there was no significant difference in vocal intensity between the esophageal speech group and the control group. Fourth, the shimmer values of the esophageal speech group were higher than those of the control group. In particular, there was a large difference in shimmer values for the low vowel /a/. Fifth, the HNR values of the esophageal speech group were significantly lower than those of the control group. In particular, the largest difference in HNR values between the two groups was for the high vowel /i/. Sixth, the pitch contours of interrogative and declarative sentences in the esophageal speech group either took a different form from those of the normal voice group or differed from them only slightly, thus presenting an inconsistent pattern.
https://doi.org/10.13064/KSSS.2015.7.4.085
