• Title/Abstract/Keywords: Speech sound

628 search results (processing time: 0.034 s)

A Correlation Study among Acoustic Parameters of MDVP, Praat, and Dr. Speech

  • 유재연;정옥란;장태엽;고도흥
    • Speech Sciences / Vol. 10, No. 3 / pp. 29-36 / 2003
  • The purpose of this study was to conduct a correlational analysis of $F_0$, Jitter, Shimmer, NHR (HNR), and NNE as estimated by three speech analysis programs: MDVP, Praat, and Dr. Speech. Thirty females and 15 males with normal voices participated. Their voices were recorded with Sound Forge 6.0, and MDVP, Praat, and Dr. Speech were used to measure the acoustic parameters. Pearson correlation coefficients were computed for the statistical analysis. The results were as follows. First, $F_0$ and Shimmer showed strong correlations between the instruments, whereas Jitter showed no correlation between them. Second, Shimmer correlated more strongly than Jitter with HNR, NHR, and NNE. Shimmer was therefore considered a more useful and sensitive parameter than Jitter for identifying dysphonic voice.

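The abstract compares Jitter, Shimmer, and their cross-program agreement but does not give formulas, and each program defines these measures slightly differently. As a minimal sketch, assuming Praat-style "local" definitions and hypothetical per-cycle period and amplitude sequences already produced by a pitch tracker, the Python below computes local jitter and shimmer and the Pearson r used to compare one measure across programs. MDVP's own variants may differ from these definitions.

```python
import math
from statistics import mean


def local_jitter(periods):
    """Local jitter (%): mean absolute difference between consecutive
    glottal periods, divided by the mean period (Praat-style, assumed)."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return 100.0 * mean(diffs) / mean(periods)


def local_shimmer(amplitudes):
    """Local shimmer (%): the same ratio computed on per-cycle peak amplitudes."""
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return 100.0 * mean(diffs) / mean(amplitudes)


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical per-speaker shimmer values (%) reported by two programs.
shimmer_program_a = [3.1, 2.4, 4.0, 2.8, 3.5]
shimmer_program_b = [2.9, 2.2, 3.8, 2.9, 3.3]
print(round(pearson_r(shimmer_program_a, shimmer_program_b), 3))
```

In the study's design, one such r would be computed per parameter and per pair of programs, across the 45 speakers.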

The perceptual judgment of sound prolongation: Equal-appearing interval and direct magnitude estimation

  • 박진;차화정;배세진
    • Phonetics and Speech Sciences / Vol. 15, No. 3 / pp. 59-67 / 2023
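  • This study obtained auditory-perceptual ratings of perceived disfluency as a function of sound-prolongation length using both an equal-appearing interval scale and direct magnitude estimation, and examined whether the two sets of ratings are related linearly or nonlinearly. On that basis, it aimed to propose an appropriate method for rating perceived disfluency as a function of prolongation length. Thirty-four native Korean-speaking adults aged 19 or older (9 men, 25 women; mean age 32.9 years) served as raters. Participants first listened to sentences containing a total of 25 stimuli in which the Korean lenis fricative /s/ had been lengthened from its original duration in 80 ms steps by up to 2,000 ms (i.e., 285 ms, 365 ms, ..., 2,125 ms, 2,205 ms), and rated them on an equal-appearing interval scale (1-7, where 1 = 'normal' and 7 = 'severe'). Next, the speech sample rated 'mild-to-moderate' (4 points) on the equal-appearing interval scale was selected as the reference value (modulus) for direct magnitude estimation. After scatterplots of the two sets of ratings were drawn, model analysis using $R^2$ values examined whether the relationship between the two measures was linear or curvilinear. The relationship between the two sets of ratings proved to be nonlinear, indicating that direct magnitude estimation is a more appropriate method than the equal-appearing interval scale for rating disfluency as a function of prolongation length.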
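The linear-versus-curvilinear decision rests on comparing $R^2$ for competing models of DME against EAI ratings. A minimal sketch, assuming a straight line versus a quadratic as the two candidate models (the abstract does not name the curvilinear form) and hypothetical mean ratings, is below; it also checks the stimulus arithmetic, since 285 + 24 × 80 = 2,205 ms gives exactly 25 stimuli.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ..., c_degree],
    solved from the normal equations by Gaussian elimination."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = s / a[i][i]
    return coeffs


def r_squared(xs, ys, coeffs):
    """Coefficient of determination of the fitted polynomial."""
    my = sum(ys) / len(ys)
    pred = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, pred))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize extra terms so linear and quadratic fits compare fairly."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# The 25 stimulus durations listed in the abstract: 285, 365, ..., 2,205 ms.
durations_ms = [285 + 80 * k for k in range(25)]

# Hypothetical mean ratings for a subset of stimuli (EAI on x, DME on y).
eai = [1.2, 1.9, 2.6, 3.4, 4.0, 4.7, 5.3, 5.8, 6.3]
dme = [12, 25, 42, 70, 100, 150, 210, 280, 360]

for degree in (1, 2):
    c = polyfit(eai, dme, degree)
    r2 = r_squared(eai, dme, c)
    print(degree, round(r2, 4), round(adjusted_r_squared(r2, len(eai), degree), 4))
```

Because adding a quadratic term can never lower raw $R^2$, a fair comparison also looks at adjusted $R^2$ or a significance test on the extra term; which criterion the authors applied is not stated in the abstract.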

Improved Melody Recognition Performance of a Cochlear Implant Speech Processing Strategy Using Instantaneous Frequency Encoding Based on Teager Energy Operator

  • Choi, Sung-Jin;Ryu, Sang-Baek;Kim, Kyung-Hwan
    • Korean Society of Medical and Biological Engineering: Journal of Biomedical Engineering Research / Vol. 31, No. 6 / pp. 417-426 / 2010
  • We present a speech processing strategy that incorporates instantaneous frequency (IF) encoding to enhance the melody recognition performance of cochlear implants. To extract the IF from incoming sound, we propose using a Teager energy operator (TEO), which has the advantage of a lower computational load. Time-frequency analysis verified that the TEO-based method properly encodes the IF of the input sound, which is crucial for melody recognition. A Hilbert transform (HT) could provide a similar benefit, but at a much higher computational cost. The melody recognition performance of the proposed strategy was compared with that of a conventional strategy using envelope extraction and with that of HT-based IF encoding. Listening tests were performed on normal-hearing subjects using acoustic simulation and a musical contour identification task. No significant difference in melody recognition performance was observed between the TEO-based and HT-based IF encodings, and both were superior to the conventional strategy. However, the TEO-based strategy was advantageous in that it was approximately 35% faster than the HT-based strategy.
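The abstract does not say how the IF is derived from the TEO. One widely used TEO-based estimator is the DESA-2 energy separation algorithm of Maragos, Kaiser, and Quatieri, sketched below in Python as an assumption, not as the paper's method. For a sinusoid $A\cos(\Omega n + \phi)$ the operator gives $\Psi[x] = A^2 \sin^2\Omega$, and with $y[n] = x[n+1] - x[n-1]$ the frequency follows from $\Omega = \tfrac{1}{2}\arccos\!\left(1 - \Psi[y] / (2\Psi[x])\right)$, using only a few multiplications per sample. In a cochlear-implant strategy this would presumably run on each band-pass channel output; that channel structure is not shown here.

```python
import math


def teager(x):
    """Discrete Teager energy operator: psi[n] = x[n]**2 - x[n-1]*x[n+1].
    Output index k corresponds to input sample k + 1."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2_frequency(x, fs):
    """Instantaneous frequency (Hz) per sample via DESA-2:
    y[n] = x[n+1] - x[n-1],  Omega = 0.5 * arccos(1 - psi_y / (2 * psi_x)).
    Output index k corresponds to input sample k + 2."""
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]  # y[k] ~ x[k+1]
    psi_x = teager(x)   # psi_x[k] ~ x[k+1]
    psi_y = teager(y)   # psi_y[k] ~ y[k+1] ~ x[k+2]
    freqs = []
    for k, py in enumerate(psi_y):
        px = psi_x[k + 1]  # same centre sample x[k+2]
        if px <= 0.0:
            freqs.append(float("nan"))  # energy too small to separate
            continue
        arg = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        freqs.append(0.5 * math.acos(arg) * fs / (2.0 * math.pi))
    return freqs


fs = 16000
tone = [math.cos(2 * math.pi * 440 * n / fs) for n in range(400)]
estimates = desa2_frequency(tone, fs)
print(min(estimates), max(estimates))
```

Since $\arccos$ returns values in $[0, \pi]$, DESA-2 resolves frequencies only up to $f_s/4$, which is one reason such estimators are usually applied after band-pass filtering.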

Analysis of Singing Technique of Mongolian Traditional Singing Called Khoomei

  • 남도현;백재연;황연신;최홍식
    • Speech Sciences / Vol. 15, No. 3 / pp. 145-156 / 2008
  • The goal of this study was to investigate the acoustic and physiologic characteristics of two phonation types of 'Khoomei', a traditional singing style of people living around the Altai Mountains and in the Mongolia region. Khoomei produces two pitches simultaneously: a high melody pitch is perceived along with a low drone pitch. Sygyt and kargyraa are the most popular and identifiable styles, and they are recognized as different sounds depending on the method of voice production. Two trained Mongolian singers, each with at least 5-6 years of experience with the technique, participated. The characteristics of this voice production were measured using a flexible fiberscope, stroboscopy, Lx Speech Studio, Spead, and Doctor Speech. In the Sygyt style, very high vocal fold closure (71.50%), with contact of both the true and false vocal folds, and strong breath support were observed. Tongue height also increased, and harmonics were amplified (by around 10 dB) with movement of the resonance cavity. In contrast, the Kargyraa sound had a very low pitch, with a relaxed abdomen, less laryngeal tension, and lower vocal fold contact (69.50%) than the hard Sygyt-style sound, and the tongue was not raised during phonation. 'Khoomei' phonation can thus be produced by strong contact of both the true and false vocal folds together with increased harmonics.


Diachronic Change of High Vowel Devoicing in Japanese Dialects

  • 변희경
    • Phonetics and Speech Sciences / Vol. 5, No. 4 / pp. 171-184 / 2013
  • This study investigated the devoicing rate of Japanese high vowels, focusing on regional and generational differences, by acoustically analyzing vowels from two large speech databases. The first database was collected between 1986 and 1988 in 41 areas (prefectures) and included 607 participants (299 high school students and 308 of their grandparents). The second was collected in 2006-2007 in seven areas as a follow-up to the first and consists of 463 participants aged 8 to 90 years. The results revealed generational as well as regional differences in the devoicing rate in almost all areas. Based on these results, a new distribution map reflecting the current devoicing rate of the younger generation was presented. Furthermore, a comparison of the two data sets confirmed that the age difference in the devoicing rate reflects not age-grading but a sound change in progress. The study also discusses the social factors behind changes in the devoicing rate in some areas and then applies the devoicing rates of five areas to an S-curve model to predict future devoicing rates.
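The abstract does not specify the S-curve's form, its time axis, or how it was fitted. A common choice for a sound change in progress is the logistic curve $p(t) = 1 / (1 + e^{-k(t - t_0)})$. The Python below is a minimal sketch assuming that form, speaker birth year as the time axis, and a least-squares fit on the logit scale; the rates are hypothetical, not the study's data.

```python
import math


def fit_logistic(times, rates):
    """Fit p(t) = 1 / (1 + exp(-k * (t - t0))) by ordinary least squares on the
    logit scale, where ln(p / (1 - p)) = k * t - k * t0. Rates must be in (0, 1)."""
    logits = [math.log(p / (1.0 - p)) for p in rates]
    mt = sum(times) / len(times)
    ml = sum(logits) / len(logits)
    k = (sum((t - mt) * (l - ml) for t, l in zip(times, logits))
         / sum((t - mt) ** 2 for t in times))
    t0 = mt - ml / k  # midpoint of the change, where p = 0.5
    return k, t0


def predict(t, k, t0):
    """Predicted devoicing rate at time t on the fitted S-curve."""
    return 1.0 / (1.0 + math.exp(-k * (t - t0)))


# Hypothetical devoicing rates for one area by speaker birth year.
birth_years = [1920, 1945, 1970, 1990]
rates = [0.35, 0.52, 0.70, 0.82]
k, t0 = fit_logistic(birth_years, rates)
print(round(predict(2010, k, t0), 3))
```

The slope $k$ measures how fast the change spreads, and $t_0$ marks the point at which half the relevant vowels are devoiced; extrapolating past the data assumes the curve keeps its logistic shape.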

Denasalization error pattern for typically developing and SSD children

  • 김민정
    • Phonetics and Speech Sciences / Vol. 7, No. 2 / pp. 3-8 / 2015
  • Denasalization, in which nasals are replaced by stops, is an unusual error pattern related to manner of articulation. The purpose of this study was to investigate the prevalence of denasalization and to examine nasal production by phonological context in typically developing children and children with speech sound disorders (SSD). A total of 220 typically developing children and 48 children with SSD aged 2-6 years were tested with a formal word test, and those who demonstrated denasalization were selected. In addition, the nasal productions of the children with SSD who showed denasalization were analyzed for accuracy and error type using the formal word test and spontaneous conversation. The results were as follows: (1) Denasalization was shown by fewer than 10% of typically developing children aged 2-3 years and by more than 20% of children with SSD aged 2-5 years. (2) The children with SSD who demonstrated denasalization were categorized into four types according to the context of their nasal errors: errors in all word positions; errors in word-final and word-medial positions; errors in word-medial position preceding vowels; and errors in word-medial position preceding obstruents. These results indicate that denasalization is a clinically important error pattern and that the word-medial position preceding obstruents is an essential context for denasalization in terms of Korean phonotactics.

Retrieval of Player Event in Golf Videos Using Spoken Content Analysis

  • 김형국
    • The Journal of the Acoustical Society of Korea / Vol. 28, No. 7 / pp. 674-679 / 2009
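  • This paper proposes a method for retrieving each player's event segments by combining event-sound segments detected in the audio of golf videos with speech segments that contain golf players' names. The overall system consists of two modules. The automatic indexing module performs noise removal, audio segmentation, and speech recognition on the audio stream separated from the video. The retrieval module converts a player name entered as text by the user into a pronunciation sequence and finds, in the indexed database, the speech segments matching the queried name together with the event segments linked to them. For player-name retrieval, the paper compares per-player event retrieval results obtained with phoneme-based, word-based, and hybrid (word plus phoneme) approaches.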
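The abstract names the three matching approaches but not how matches are scored or how speech hits are linked to event sounds. The Python below is a minimal sketch of a hybrid search under stated assumptions: each indexed segment carries a word-level and a phoneme-level recognition hypothesis, the query's pronunciation has already been produced by a grapheme-to-phoneme step (not shown), a phoneme match is an approximate substring match within a hypothetical edit-distance threshold, and event sounds are linked by a hypothetical time window. All names, thresholds, and data are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SpeechSegment:
    start: float        # seconds into the video
    end: float
    words: list         # word-level recognition hypothesis
    phonemes: list      # phoneme-level recognition hypothesis


def substring_edit_distance(query, phones):
    """Fewest edits needed to match `query` against any contiguous span of
    `phones` (approximate substring matching: start and end are free)."""
    prev = [0] * (len(phones) + 1)  # an empty query matches anywhere at cost 0
    for i, q in enumerate(query, 1):
        cur = [i] + [0] * len(phones)
        for j, p in enumerate(phones, 1):
            cur[j] = min(prev[j] + 1,              # query phone missing
                         cur[j - 1] + 1,           # extra phone in hypothesis
                         prev[j - 1] + (q != p))   # match or substitution
        prev = cur
    return min(prev)


def find_player_segments(name, pronunciation, segments, max_phone_errors=1):
    """Hybrid search: accept a word-level hit, else fall back to a
    phoneme-level approximate match of the name's pronunciation."""
    hits = []
    for seg in segments:
        if name in seg.words:
            hits.append(seg)
        elif substring_edit_distance(pronunciation, seg.phonemes) <= max_phone_errors:
            hits.append(seg)
    return hits


def link_events(hits, events, max_gap=5.0):
    """Attach event-sound segments (start, end) that begin within
    `max_gap` seconds of each matching speech segment."""
    return [(seg, [ev for ev in events
                   if seg.start - max_gap <= ev[0] <= seg.end + max_gap])
            for seg in hits]
```

The phoneme fallback recovers names the word recognizer missed (for example, an out-of-vocabulary player name), at the cost of more false alarms as `max_phone_errors` grows.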

SPEECH SYNTHESIS USING LARGE SPEECH DATA-BASE

  • Lee, Kyu-Keon;Mochida, Takemi;Sakurai, Naohiro;Shirai, Katsuhiko
    • Acoustical Society of Korea: Conference Proceedings / Acoustical Society of Korea 1994 Fifth Western Pacific Regional Acoustics Conference, Seoul, Korea / pp. 949-956 / 1994
  • In this paper, we introduce a new speech synthesis method for arbitrary Japanese and Korean sentences that uses a natural speech database, and we discuss the application of this method to a CAI system. In our synthesis method, a basic sentence and basic accent phrases are selected from the database for a target sentence. The selection factors are phrase dependency structure (separation degree), number of morae, accent type, and phonemic labels. The target pitch pattern and phonemic parameter series are generated from the selected basic units. Because the pitch pattern is generated from patterns extracted directly from real speech, it is expected to be more natural than a pattern estimated by a model. So far, we have examined this method on Japanese sentence speech and confirmed that the synthetic speech preserves human-like features fairly well. We are now extending the method to Korean sentence speech synthesis, and we are also trying to apply this synthesis unit to a CAI system.

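The abstract lists the factors used to select basic accent phrases (dependency separation degree, number of morae, accent type, phonemic labels) but not how they are combined. A minimal sketch, assuming a weighted mismatch cost with hypothetical weights and a hypothetical integer encoding of each factor, picks the database phrase with the lowest cost:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    text: str
    separation: int      # dependency separation degree (hypothetical integer code)
    morae: int
    accent_type: int
    phonemes: tuple


# Hypothetical weights; the abstract names the factors but not their weighting.
WEIGHTS = {"separation": 1.0, "morae": 0.5, "accent": 2.0, "phonemes": 1.0}


def phoneme_mismatch(a, b):
    """Share of aligned positions (over the longer sequence) whose labels differ."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 1.0 - same / longest


def selection_cost(target, cand, w=WEIGHTS):
    """Weighted mismatch between a target phrase and a database candidate."""
    return (w["separation"] * abs(target.separation - cand.separation)
            + w["morae"] * abs(target.morae - cand.morae)
            + w["accent"] * (target.accent_type != cand.accent_type)
            + w["phonemes"] * phoneme_mismatch(target.phonemes, cand.phonemes))


def select_basic_phrase(target, database):
    """Pick the database phrase with the lowest selection cost."""
    return min(database, key=lambda cand: selection_cost(target, cand))
```

Weighting accent type heavily reflects the method's stated goal of reusing natural pitch patterns, since a phrase with the wrong accent type would carry the wrong contour; in practice the weights would have to be tuned by listening tests.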

Speech Recognition Interface in the Communication Environment

  • 한태근;김종근;이동욱
    • Korean Institute of Electrical Engineers: Conference Proceedings / Proceedings of the KIEE 2001 Summer Annual Conference, Vol. D / pp. 2610-2612 / 2001
  • This study examines the recognition of users' spoken commands based on speech recognition and natural language processing, and develops a natural language interface agent that can analyze the recognized commands. The agent consists of a speech recognizer and a semantic interpreter. The speech recognizer recognizes a spoken command and transcribes it into a character string. The semantic interpreter analyzes the character string and generates the commands and queries to be passed to the application program. We also consider problems related to the speech recognizer and the semantic interpreter, such as the ambiguity of natural language and the ambiguity and errors of the speech recognizer. Such a natural language interface agent can be applied to telephony environments involving all kinds of communication media, such as telephone, fax, and e-mail.

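The recognizer-to-interpreter pipeline described above can be illustrated with a toy semantic interpreter. This is a minimal keyword-based sketch, not the authors' design: the intent vocabulary, the confidence threshold, and the "word after *to* is the recipient" rule are all hypothetical. It shows the two failure cases the abstract raises. Low recognizer confidence triggers a confirmation request, and input matching more than one command is returned as ambiguous rather than executed.

```python
import re

# Hypothetical intent vocabulary: keyword sets that trigger each application command.
INTENTS = {
    "call": {"call", "phone", "dial"},
    "send_email": {"email", "e-mail"},
    "send_fax": {"fax"},
}


def interpret(recognized_text, confidence=1.0, min_confidence=0.5):
    """Turn a recognizer's output string into an application command.
    Low-confidence or ambiguous input is returned for confirmation
    instead of being executed."""
    if confidence < min_confidence:
        # Likely recognition error: ask the user to confirm or repeat.
        return {"status": "confirm", "text": recognized_text}
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)?", recognized_text.lower())
    matched = sorted(name for name, kws in INTENTS.items() if kws & set(tokens))
    if not matched:
        return {"status": "unknown", "text": recognized_text}
    if len(matched) > 1:
        # Natural-language ambiguity: more than one command fits.
        return {"status": "ambiguous", "candidates": matched}
    recipient = None
    if "to" in tokens[:-1]:
        # Hypothetical rule: the word after "to" names the recipient.
        recipient = tokens[tokens.index("to") + 1]
    return {"status": "ok", "intent": matched[0], "recipient": recipient}
```

For example, `interpret("Send a fax to Kim")` yields the `send_fax` intent with recipient `kim`. A realistic interpreter would replace keyword sets with a grammar or parser, but the contract stays the same: text in, command or clarification request out.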