• Title/Summary/Keyword: Syllable Frequency

Search Results: 92

Exclusion of Non-similar Candidates using Positional Accuracy based on Levenstein Distance from N-best Recognition Results of Isolated Word Recognition (레벤스타인 거리에 기초한 위치 정확도를 이용한 고립 단어 인식 결과의 비유사 후보 단어 제외)

  • Yun, Young-Sun; Kang, Jeom-Ja
    • Phonetics and Speech Sciences, v.1 no.3, pp.109-115, 2009
  • Many isolated word recognition systems may generate non-similar words as recognition candidates because they use only acoustic information. In this paper, we investigate several techniques that can exclude non-similar words from the N-best candidate words by applying the Levenstein distance measure. First, word distance methods based on phone and syllable distances are considered; these methods apply the Levenstein distance to the phones of the candidates, or a double Levenstein distance algorithm to their syllables. Next, word similarity approaches are presented that use the positional information of the characters in the candidate words. After alignment between the source and target strings, each character position is labeled as inserted, deleted, or correct. The word similarities are then obtained from positional probabilities, i.e., the frequency ratio of observing the same character at a given position. Experimental results show that the proposed methods effectively remove non-similar words from the N-best recognition candidates without loss of system performance (a hedged sketch of the underlying distance and similarity computations follows this entry).

  • PDF
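
The exclusion idea in the entry above reduces to two computable pieces: a Levenshtein (edit) distance between candidate strings and a similarity score derived from how the characters line up by position. The Python sketch below is only a hedged illustration of that idea under simplifying assumptions; the similarity score, the threshold, and the example words are invented for demonstration and are not the authors' implementation.

```python
# Hedged sketch of Levenshtein-based candidate filtering. Illustrative only:
# the similarity score and threshold below are simplifications, not the
# paper's positional-probability method.

def levenshtein(src: str, tgt: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    m, n = len(src), len(tgt)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[m][n]


def word_similarity(src: str, tgt: str) -> float:
    """Toy similarity in [0, 1]: 1 minus the edit distance normalized by
    the longer string (a stand-in for the positional-accuracy score)."""
    if not src and not tgt:
        return 1.0
    return 1.0 - levenshtein(src, tgt) / max(len(src), len(tgt))


def filter_candidates(top1: str, nbest: list[str], threshold: float = 0.5) -> list[str]:
    """Drop N-best candidates whose similarity to the top hypothesis falls
    below a hypothetical threshold."""
    return [word for word in nbest if word_similarity(top1, word) >= threshold]


if __name__ == "__main__":
    candidates = ["samsong", "samsung", "television", "samson"]  # invented example
    print(filter_candidates("samsung", candidates))  # near matches survive
```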

Correlation of acoustic features and electrophysiological outcomes of stimuli at the level of auditory brainstem (자극음의 음향적 특성과 청각 뇌간에서의 전기생리학적 반응의 상관성)

  • Chun, Hyungi; Han, Woojae
    • The Journal of the Acoustical Society of Korea, v.35 no.1, pp.63-73, 2016
  • It is widely acknowledged that the human auditory system is organized tonotopically and that sounds are processed as a function of their frequency distribution through the auditory system. However, it is still unclear how the acoustic features of speech sounds are represented in the human brain during speech perception. Thus, the purpose of this study is to investigate whether two sounds with similar high-frequency characteristics in acoustic analysis show similar results at the level of the auditory brainstem. Thirty-three young adults with normal hearing participated in the study. As stimuli, two Korean monosyllables (i.e., /ja/ and /cha/) and four toneburst frequencies (i.e., 500, 1000, 2000, and 4000 Hz) were used to elicit the auditory brainstem response (ABR). Measures of the monosyllables and tonebursts were highly replicable, and wave V of the waveform was detectable in all subjects. In the results of the Pearson correlation analysis, the /ja/ syllable had a high correlation with the 4000 Hz toneburst, meaning that its acoustic characteristics (i.e., 3671~5384 Hz) were reflected consistently at the brainstem. However, the /cha/ syllable had a high correlation with the 1000 and 2000 Hz tonebursts, although its acoustic energy is distributed over 3362~5412 Hz. We concluded that there was disagreement between acoustic features and physiological outcomes at the auditory brainstem level. This finding suggests that an acoustic-perceptual mapping study is needed to scrutinize human speech perception (a brief sketch of the Pearson correlation arithmetic follows this entry).
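
The comparison above hinges on Pearson correlations between syllable-evoked and toneburst-evoked ABR measures. As a reminder of the arithmetic only, the sketch below computes r for two hypothetical arrays of wave V latencies; the values are invented placeholders, not data from the study.

```python
# Hedged sketch of the Pearson correlation arithmetic. The latency values
# below are invented placeholders, not data from the study.
import numpy as np
from scipy.stats import pearsonr

# Hypothetical ABR wave V latencies (ms) for the same six listeners.
latency_ja = np.array([5.6, 5.8, 5.7, 6.0, 5.9, 5.5])   # /ja/ syllable
latency_4k = np.array([5.5, 5.9, 5.6, 6.1, 5.8, 5.6])   # 4000 Hz toneburst

r, p = pearsonr(latency_ja, latency_4k)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```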

A Study of Dysfluency Characteristics in Normal Adults and Children in Monologue (혼자말하기에서 정상 아동 및 성인의 비유창성 특성에 관한 연구)

  • Shin, Myung-Sun; Ahn, Jong-Bok; Nam, Hyun-Wook; Kwon, Do-Ha
    • Speech Sciences, v.12 no.3, pp.49-57, 2005
  • The purpose of this study was to establish preliminary data on the characteristics of dysfluency in monologue. The subjects were 30 normally speaking adults (15 males and 15 females), aged 18 to 30, and 30 normally speaking children, aged 8 to 10. The study sampled a 1-minute portion of talk about the daily routine. Videotapes were made to analyze each speech sample in terms of the patterns and frequency of dysfluency. The results of the present study were as follows: (1) The children had a total dysfluency type ratio of 12.48% and a dysfluency type ratio of 2.83%; interjection was the most frequently occurring type, followed by revision and incomplete phrase. (2) The adults had a total dysfluency type ratio of 8.51% and a dysfluency type ratio of 0.59%; interjection was the most frequently occurring type, followed by revision and syllable repetition. (3) In adults, both the total dysfluency type ratio and the dysfluency type ratio differed significantly by gender. (4) Both ratios also differed significantly between adults and children.

  • PDF

The Role of Post-lexical Intonational Patterns in Korean Word Segmentation

  • Kim, Sa-Hyang
    • Speech Sciences, v.14 no.1, pp.37-62, 2007
  • The current study examines the role of post-lexical tonal patterns of a prosodic phrase in word segmentation. In a word spotting experiment, native Korean listeners were asked to spot a disyllabic or trisyllabic word in a twelve-syllable speech stream composed of three Accentual Phrases (APs). Words occurred with various post-lexical intonation patterns. The results showed that listeners spotted more words in phrase-initial than in phrase-medial position, suggesting that the AP-final H tone from the preceding AP helped listeners segment the phrase-initial word in the target AP. Results also showed that listeners' error rates were significantly lower when words occurred with an initial rising tonal pattern, which is the most frequent intonational pattern imposed on multisyllabic words in Korean, than with non-rising patterns. This result was observed both in AP-initial and in AP-medial positions, regardless of the frequency and legality of the overall AP tonal patterns. Tonal cues other than the initial rising tone did not positively influence the error rate. These results not only indicate that a rising tone in AP-initial and AP-final position is a reliable cue for word boundary detection for Korean listeners, but further suggest that phrasal intonation contours serve as a possible word boundary cue in languages without lexical prominence.

  • PDF

Distinct Segmental Implementations in English and Spanish Prosody

  • Lee, Joo-Kyeong
    • Speech Sciences, v.11 no.4, pp.199-206, 2004
  • This paper attempts to provide a substantial explanation of the different prosodic implementations of segments in English and Spanish, arguing that the phonetic modification invoked by prosody may effectively reflect phonological structure. In English, a high front vowel in accented syllables is acoustically realized with higher F1 and F2 frequencies than in unaccented syllables, due to its more peripheral and sonorous articulation (Harrington et al. 1999). In this paper, an acoustic experiment was conducted to see whether this manner of prosodically invoked segmental modification in English extends to other languages such as Spanish. Results show that relatively more prominent syllables entail higher F1 values as a result of their more sonorous articulation in Spanish, but neither the front nor the back vowel shows a higher or lower F2 frequency. This is interpreted as an indication that a prosodically prominent syllable entails vocalic enhancement in both the horizontal and vertical dimensions of articulation in English. In Spanish, however, only the vertical dimension of articulation is maximized, resulting in a higher F1. I suggest that this difference may be attributed to the different phonological vowel structures of English and Spanish, and that sonority expansion alone is sufficient in the articulation of prosodic prominence as long as the phonological distinction of vowels is well retained.

  • PDF

Perception of native Korean Speakers on English and German

  • Kang, Hyun-Sook; Koo, So-Ryeong; Lee, Sook-hyang
    • Proceedings of the KSPS conference, 2000.07a, pp.86-87, 2000
  • In this paper, we discuss why two different surface forms appear in loanwords for English and German /ʃ/. In Korean, a vowel is inserted into loanwords if a consonant cannot be properly syllabified; therefore, /ʃ/ in some positions of loanwords triggers vowel insertion. Interestingly, /ʃ/ in the onset clusters of English and German words was borrowed into Korean as [ʃu] with the inserted vowel [u], whereas /ʃ/ in the coda position of English and German words was borrowed as [ʃi] with the inserted vowel [i]. For example, 'shrimp' is adopted as [ʃurimphi] whereas 'rush' is adopted as [raʃi]. In this paper, we attempt to find the phonetic reason for this distribution of the surface forms of /ʃ/. We assume that, since the formant frequency of [i] is higher than that of [u], the peak frequency of /ʃ/ with the surface form [ʃi] in loanwords may be higher than that of /ʃ/ with the surface form [ʃu]. We also assume that duration may be another factor in the distribution of [ʃi] and [ʃu]: since /ʃ/ and /u/ use lip rounding whereas /i/ does not, the duration of [ʃi] might be longer than that of [ʃu]. German supports our assumption: /ʃ/ in the onset cluster is longer than /ʃ/ in the coda position, and it also has a higher peak frequency than /ʃ/ in the coda position. In loanwords, /ʃ/ in the onset cluster is borrowed as [ʃu], as in Spiegel, whereas /ʃ/ in the coda position is borrowed as [ʃi], as in Bosch. English, however, does not support our assumption: the peak frequency of [ʃ] depends on the preceding vowel, not on its position in the syllable structure. If the preceding vowel is front, then the peak frequency of the following /ʃ/ is high, but if the preceding vowel is back, then the peak frequency of the following /ʃ/ is low; the peak frequency of /ʃ/ in the onset cluster seems to be in between. With these mixed results, as we assumed, the duration of /ʃ/ in the coda position is longer than that of /ʃ/ in the onset cluster, and we question whether Koreans really hear two different sounds for /ʃ/ in English words. In a future experiment, we would like to perform a perception test for /ʃ/ in English words.

  • PDF

Tonal Characteristics Based on Intonation Pattern of the Korean Emotion Words (감정단어 발화 시 억양 패턴을 반영한 멜로디 특성)

  • Yi, Soo Yon; Oh, Jeahyuk; Chong, Hyun Ju
    • Journal of Music and Human Behavior, v.13 no.2, pp.67-83, 2016
  • This study investigated the tonal characteristics of Korean emotion words by analyzing the pitch patterns derived from word utterances. Participants were 30 women, aged 19-23. Each participant was instructed to talk about their emotional experiences using 4-syllable target words. A total of 180 utterances were analyzed in terms of the frequency of each syllable using Praat, and the data were transformed into mean tones based on the semitone scale (the Hz-to-note mapping is sketched below). When the emotion words were used in the middle of a sentence, the pitch pattern was transformed to A3-A3-G3-G3 for '즐거워서(joyful)', C4-D4-B3-A3 for '행복해서(happy)', G3-A3-G3-G3 for '억울해서(resentful)', A3-A3-G3-A3 for '불안해서(anxious)', and C4-C4-A3-G3 for '침울해서(frustrated)'. When the emotion words were used at the end of a sentence, the pitch pattern was transformed to G4-G4-F4-F4 for '즐거워요(joyful)', D4-D4-A3-G3 for '행복해요(happy)', G3-G3-G3-A3 and F3-G3-E3-D3 for '억울해요(resentful)', A3-G3-F3-F3 for '불안해요(anxious)', and A3-A3-F3-F3 for '침울해요(frustrated)'. These results indicate differences in pitch patterns depending on the conveyed emotion and the position of the word in a sentence. This study presents baseline data on the tonal characteristics of emotion words, thereby suggesting how pitch patterns could be utilized when creating a melody during songwriting for emotional expression.
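
The note-name patterns reported above (e.g., A3-A3-G3-G3) presuppose a mapping from each syllable's fundamental frequency in Hz to the nearest equal-tempered semitone. The sketch below illustrates that conversion, assuming the conventional A4 = 440 Hz reference; the example frequencies are placeholders, not measurements from the study.

```python
# Minimal sketch: map a fundamental frequency in Hz to the nearest
# equal-tempered note name, assuming A4 = 440 Hz. Example values are
# placeholders, not measurements from the study.
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_note(freq_hz: float) -> str:
    """Return the nearest note name (e.g., 'A3') for a frequency in Hz."""
    # Semitone offset from A4 (440 Hz), rounded to the nearest semitone.
    semitones_from_a4 = round(12 * math.log2(freq_hz / 440.0))
    midi = 69 + semitones_from_a4          # MIDI note number of A4 is 69
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"

if __name__ == "__main__":
    for f0 in [220.0, 196.0, 262.0, 294.0]:    # placeholder syllable F0s (Hz)
        print(f0, "->", hz_to_note(f0))         # e.g., 220.0 -> A3
```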

Analysis of Acoustic Characteristics of Vowel and Consonants Production Study on Speech Proficiency in Esophageal Speech (식도발성의 숙련 정도에 따른 모음의 음향학적 특징과 자음 산출에 대한 연구)

  • Choi, Seong-Hee; Choi, Hong-Shik; Kim, Han-Soo; Lim, Sung-Eun; Lee, Sung-Eun; Pyo, Hwa-Young
    • Speech Sciences, v.10 no.3, pp.7-27, 2003
  • Esophageal speech uses esophageal air during phonation. Fluent esophageal speakers intake air frequently during oral communication, whereas unskilled esophageal speakers have difficulty swallowing large amounts of air. The purpose of this study was to investigate differences in the acoustic characteristics of vowel production and in consonant production according to speech proficiency level in esophageal speech. Thirteen normal male speakers and 13 male esophageal speakers (5 unskilled and 8 skilled), aged 50 to 70 years, participated. The stimuli were a sustained /a/ vowel and 36 meaningless two-syllable words. The vowel used was /a/, and the 18 consonants were /k, n, t, m, p, s, c, cʰ, kʰ, tʰ, pʰ, h, l, k', t', p', s', c'/. Fundamental frequency (Fx), jitter, shimmer, HNR, and MPT were measured by electroglottography using Lx Speech Studio (Laryngograph Ltd., London, UK); textbook jitter and shimmer definitions are sketched after this entry. The 36 meaningless words produced by the esophageal speakers were presented to 3 speech-language pathologists, who phonetically transcribed the responses. The Fx, jitter, and HNR parameters differed significantly between skilled and unskilled esophageal speakers (p<.05). Considering manner of articulation, ANOVA showed significant differences between the two esophageal groups by proficiency: in the unskilled group, glides were most often confused with other phoneme classes and affricates were the most intelligible, whereas in the skilled group fricatives produced the highest number of confusions and nasals were the most intelligible. By place of articulation, the glottal /h/ was the most frequently confused consonant in both groups; bilabials were the most intelligible in the skilled group and velars in the unskilled group. Regarding syllable structure, 'CV+V' caused more confusion in the skilled group, while the unskilled group showed similar confusion for both structures. In unskilled esophageal speech, the significantly different acoustic parameters of the vowel (Fx, jitter, HNR) and the high confusion of liquid and nasal consonants could be attributed to unstable, improper contact of the neoglottis as a vibratory source, insufficiency of the phonatory air supply, and the higher motoric demands on the remaining articulators due to the morphological characteristics of the vocal tract after laryngectomy.

  • PDF
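
The voice measures compared between the speaker groups above include jitter and shimmer, which reduce to simple cycle-to-cycle statistics. The sketch below applies the textbook local jitter and shimmer definitions to hypothetical glottal-cycle data; it is an illustration of the definitions only, not the Lx Speech Studio computation used in the study.

```python
# Textbook definitions of local jitter and shimmer, applied to hypothetical
# glottal-cycle measurements. Illustrative only; not the Lx Speech Studio
# algorithms used in the study.
import numpy as np

def local_jitter(periods_s: np.ndarray) -> float:
    """Mean absolute difference of consecutive periods / mean period (%)."""
    diffs = np.abs(np.diff(periods_s))
    return 100.0 * diffs.mean() / periods_s.mean()

def local_shimmer(amplitudes: np.ndarray) -> float:
    """Mean absolute difference of consecutive peak amplitudes / mean amplitude (%)."""
    diffs = np.abs(np.diff(amplitudes))
    return 100.0 * diffs.mean() / amplitudes.mean()

if __name__ == "__main__":
    # Placeholder cycle data: roughly 110 Hz voice with mild cycle-to-cycle variation.
    rng = np.random.default_rng(0)
    periods = 1.0 / 110.0 + rng.normal(0.0, 1e-4, size=50)
    amps = 1.0 + rng.normal(0.0, 0.03, size=50)
    print(f"jitter  (local) = {local_jitter(periods):.2f} %")
    print(f"shimmer (local) = {local_shimmer(amps):.2f} %")
```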

Comparison of Acoustic Phonetic Characteristics of Korean Fricative Sounds Pronounced by Hearing-impaired Children and Normal Children (청각장애 아동과 일반 아동의 마찰음에 나타난 음향음성학적 특성 비교)

  • Kim, YunHa; Kim, Eunyeon; Jang, Seoung-Jin; Choi, Yaelin
    • Phonetics and Speech Sciences, v.6 no.2, pp.73-79, 2014
  • The alveolar fricatives /s/ and /s'/ are the last sounds acquired by normally developing Korean children, and they are especially difficult for hearing-impaired children to articulate, often causing articulation errors. Acoustic phonetic evaluation uses testing tools to provide indirect and objective information, and these objective resources can be compared with standardized speech data when interpreting test results. However, most previous studies in Korea have not examined acoustic analyses based on the spectral moment values of hearing-impaired children's speech. Therefore, this study compared the characteristics of hearing-impaired children's fricative production using spectral moment values. For this purpose, a total of 10 hearing-impaired children (5 boys and 5 girls) in 3rd or 5th grade, attending elementary schools in Seoul or Gyeonggi-do, were selected. In the selection process, their age, type of hearing aid, cochlear implantation (CI) before two years of age, hearing level (dB) before and after wearing the hearing aid, duration of speech rehabilitation, and time of acquiring the alveolar fricatives were all considered. In addition, 10 normal-hearing children (5 boys and 5 girls) in 3rd or 5th grade attending elementary schools in Seoul or Gyeonggi-do were selected. The subjects were asked to read the carrier sentence "I say _______," with a list of 12 meaningless syllables composed of CV and VCV syllables containing the alveolar fricatives /s/ and /s'/ and the vowels /a/, /i/, and /u/. The recordings were processed with the Time-Frequency Analysis software program to measure M1 (mean), M2 (variance), M3 (skewness), and M4 (kurtosis) of the fricative noise (the moment definitions are sketched below). No significant differences in spectral moment values were found between the hearing-impaired and normal children for the vowels /a/, /i/, and /u/, the alveolar fricatives /s/ and /s'/, or the syllable structures (CV, VCV), except for M3 in the comparison between the two groups. In the comparison of syllable structures, there were statistically and clinically significant differences in M1, M2, M3, and M4. However, there was no significant difference when comparing the alveolar fricatives according to the vowels.
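
The moment analysis above (M1 mean, M2 variance, M3 skewness, M4 kurtosis of the fricative noise) can be written as weighted moments of the normalized power spectrum over frequency. The sketch below is a generic illustration of those definitions applied to a synthetic noise segment; it is not the exact procedure of the analysis software used in the study.

```python
# Generic spectral-moment computation (M1 mean, M2 variance, M3 skewness,
# M4 kurtosis) for a fricative-like noise segment. Illustrative only; not the
# exact procedure of the analysis software used in the study.
import numpy as np

def spectral_moments(signal: np.ndarray, sr: int):
    """Return (M1, M2, M3, M4) of the power spectrum, treating the
    normalized spectrum as a probability distribution over frequency."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal)))) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    p = spectrum / spectrum.sum()                      # normalize to sum to 1
    m1 = np.sum(freqs * p)                             # spectral mean (Hz)
    m2 = np.sum((freqs - m1) ** 2 * p)                 # variance (Hz^2)
    m3 = np.sum((freqs - m1) ** 3 * p) / m2 ** 1.5     # skewness (unitless)
    m4 = np.sum((freqs - m1) ** 4 * p) / m2 ** 2 - 3   # excess kurtosis
    return m1, m2, m3, m4

if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(1)
    noise = rng.normal(size=4096)                      # placeholder fricative noise
    m1, m2, m3, m4 = spectral_moments(noise, sr)
    print(f"M1={m1:.0f} Hz, M2={m2:.0f}, M3={m3:.2f}, M4={m4:.2f}")
```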

The Inter- and Intra-specific Comparison of Stereotyped Songs in Sympatric Gray-headed Bunting (Emberiza fucata) and Siberian-Meadow Bunting (Emberiza cioides) (동소성 붉은 뺨멧새 ( Emberiza fucata ) 와 멧새 ( Emberiza cioides ) 의 Stereotyped Song 의 비교)

  • Kim, Kil-Won; Park, Shi-Ryong
    • The Korean Journal of Ecology, v.16 no.3, pp.317-327, 1993
  • Stand profiles, yearly changes in annual-ring growth, age and diameter structure, and the spatial distribution pattern of individuals in the Pinus densiflora stands around the Yeocheon industrial complex were investigated. Annual-ring growth in the Pinus densiflora trees that survived when the vegetation of this area was damaged by air pollutants was suppressed for about 10 years from 1974, when factories in the area began to operate, but the suppressed growth has since tended to recover. It was supposed that the suppressed growth originated from air pollution and that the improvement in growth after the suppressed period was due to release from competition through the death of neighbouring trees and to the reduction in air pollutants. The physiognomy of the Pinus densiflora stands showed a mosaic pattern composed of different patches. The spatial distribution pattern of individuals and the stand profiles were similar to those of Pinus densiflora stands regenerated after natural and artificial disturbances. In the age distribution diagram, the ages of the Pinus densiflora population ranged from 1 to 33 years; many of these individuals were recruited during the period corresponding to the suppressed annual-ring growth of the trees that survived the air-pollution damage. On the other hand, from the analysis of the diameter frequency distribution, it was postulated that even if this Pinus densiflora community can be maintained as it is for the time being, it might change into a Quercus community over time. (Keywords: regeneration; Pinus densiflora; air pollution; annual ring; age structure; diameter structure; Quercus spp.) In these analyses, factors for individual recognition and species recognition were suggested.

  • PDF