• Title/Summary/Keyword: acoustic features

A Study of Segmental and Syllabic Intervals of Canonical Babbling and Early Speech

  • Chen, Xiaoxiang;Xiao, Yunnan
    • Cross-Cultural Studies / v.28 / pp.115-139 / 2012
  • The interval, or duration, of segments, syllables, words and phrases is an important acoustic feature that influences the naturalness of speech. A number of cross-sectional studies of the acoustic characteristics of children's speech development have found that intervals of segments, syllables, words and phrases tend to change with age. One hypothesis holds that decreases in intervals are greater when children are younger and smaller when they are older (Thelen, 1991); it has been supported by a number of cross-sectional studies (Tingley & Allen, 1975; Kent & Forner, 1980; Chermak & Schneiderman, 1986). The other hypothesis predicts that decreases in intervals are smaller when children are younger and greater when they are older (Smith, Kenney & Hussain, 1996). Researchers have thus arrived at conflicting postulations and inconsistent results about how intervals of segments, syllables, words and phrases change, leaving the issue unresolved. Most acoustic investigations of children's speech production have used cross-sectional designs, which involve studying several groups of children; so far, there are only a few longitudinal studies. The issue needs more longitudinal investigation; moreover, acoustic measures of the intervals of child speech are scarcely available. All former studies focus on word stages and exclude the babbling stages, especially the canonical babbling stage, but we need to find out when concrete changes in intervals begin to occur and what causes them. Therefore, we conducted an acoustic study of the interval characteristics of segments and words in canonical babbling (CB) and early speech in an infant aged 0;9 to 2;4 acquiring Mandarin Chinese. The current research addresses the following two questions: 1. 
Are decreases in interval greater when children are younger and smaller when they are older, or vice versa? 2. Does child speech, with respect to the acoustic features of interval, drift in the direction of the language the child is exposed to? The female infant, whose L1 was Southern Mandarin and who lived in Changsha, was audio- and video-taped at her home for about one hour on an almost weekly basis from age 0;9 to 2;4, under natural observation by the investigators. The recordings were digitized, and parts of the digitized material were labeled. All repetitions were excluded. The utterances were extracted from 44 sessions ranging from 30 minutes to one hour and were divided into segments as well as syllable-sized units. The age stages are 0;9-1;0, 1;1-1;5, 1;6-2;0 and 2;1-2;4. The subject was a monolingual normal child of well-educated parents. Segments and syllables from the 44 sessions, which span the transition from babble to speech, were transcribed in narrow IPA and coded for analysis. Babble was coded for ages 0;9-1;0 and words for ages 1;0 to 2;4; the data were checked by two professionally trained persons who majored in phonetics. The present investigation is a longitudinal analysis of some temporal characteristics of the child's speech during the age periods 0;9-1;0, 1;1-1;5, 1;6-2;0 and 2;1-2;4. The answer to Research Question 1 is that our results agree with neither hypothesis: neither the assumption that decreases in intervals are greater when children are younger and smaller when they are older (Thelen, 1991), nor the prediction that decreases are smaller when children are younger and greater when they are older (Smith, Kenney & Hussain, 1996). 
On the whole, segmental and syllabic duration tends to decrease with age, but the changes are neither drastic nor abrupt. For example, /a/ after /k/ in Table 1 shows a greater decrease during 1;1-1;5, while /a/ after /p/, /t/ and /w/ shows a greater decrease during 2;1-2;4. /ka/ shows a greater decrease during 1;1-1;5, while /ta/ and /na/ show a greater decrease during 2;1-2;4. Across the age periods, interval change fluctuates throughout. The answer to Research Question 2 is yes. The babbling stage is a period in which the acoustic features of children's intervals of segments, syllables, words and phrases shift in the direction of the language to be learned; babbling and the emergence of children's speech are greatly influenced by the ambient language. Phonetic changes in duration continue until as late as 10-12 years of age before reaching adult-like levels. With increasing exposure to the ambient language, the variation becomes smaller and smaller until children attain adult-like competence. An analysis with SPSS 15.0 shows that the decrease in segmental and syllabic intervals across the four age periods is not statistically significant (p>0.05). This means that the change in segmental and syllabic intervals is continuous, and it reveals that the process of child speech development is gradual and cumulative.

English Phoneme Recognition using Segmental-Feature HMM (분절 특징 HMM을 이용한 영어 음소 인식)

  • Yun, Young-Sun
    • Journal of KIISE: Software and Applications / v.29 no.3 / pp.167-179 / 2002
  • In this paper, we propose a new acoustic model for characterizing segmental features, together with an algorithm based on the general framework of hidden Markov models (HMMs), in order to compensate for the weaknesses of the HMM assumptions. The segmental features are represented as a trajectory of the observed vector sequence, modeled by a polynomial regression function, because a single-frame feature cannot effectively represent the temporal dynamics of speech signals. To apply the segmental features to pattern classification, we adopted the segmental HMM (SHMM), which is known as an effective method for representing the trend of speech signals. The SHMM separates the observation probability of a given state into extra- and intra-segmental variations, which capture long-term and short-term variability, respectively. To account for segmental characteristics in the acoustic model, we present the segmental-feature HMM (SFHMM), a modification of the SHMM. The SFHMM represents the external and internal variation as the observation probability of the trajectory in a given state and the trajectory estimation error for the given segment, respectively. We conducted several experiments on the TIMIT database to establish the effectiveness of the proposed method and the characteristics of the segmental features. From the experimental results, we conclude that the proposed method, although it has more parameters than the conventional HMM, is valuable for its flexible and informative feature representation and its performance improvement.
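
The trajectory idea in this abstract can be illustrated with a minimal sketch. It assumes a least-squares polynomial fit over normalized time for each feature dimension; the function name `fit_trajectory`, the polynomial order and the toy data are hypothetical and are not taken from the paper, whose actual estimation procedure may differ.

```python
import numpy as np

def fit_trajectory(frames, order=2):
    """Fit a polynomial trajectory to one segment of feature frames.

    frames: array of shape (T, D), T frames of D-dimensional features.
    Returns (coeffs, trajectory, error), where coeffs has shape
    (order + 1, D), trajectory has shape (T, D), and error is the mean
    squared residual between the frames and the fitted trajectory.
    """
    frames = np.asarray(frames, dtype=float)
    num_frames = frames.shape[0]
    # Normalized time in [0, 1] so segments of different lengths are comparable.
    t = np.linspace(0.0, 1.0, num_frames)
    # Design matrix with columns 1, t, t^2, ...
    design = np.vander(t, order + 1, increasing=True)
    # One least-squares fit per feature dimension, solved jointly.
    coeffs, *_ = np.linalg.lstsq(design, frames, rcond=None)
    trajectory = design @ coeffs
    error = float(np.mean((frames - trajectory) ** 2))
    return coeffs, trajectory, error

# Toy segment (assumed data): 5 frames of 2-dimensional features.
t = np.linspace(0.0, 1.0, 5)
segment = np.stack([1 + 2 * t + 3 * t**2, 4 - t], axis=1)
coeffs, trajectory, error = fit_trajectory(segment, order=2)
```

Here `coeffs` plays the role of the segment-level trajectory parameters and `error` the trajectory estimation error for the segment; how these enter the state observation probability is specific to the paper and is not reproduced here.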

The Acoustic Analysis of Korean Read Speech - with respect to the prosodic phrasing - (한국어 낭독체 문장의 음향분석 -바람과 햇님의 운율구 생성을 중심으로-)

  • Sung Chuljae
    • Proceedings of the KSPS conference / 1996.02a / pp.157-172 / 1996
  • This study aims to suggest a theoretical methodology for analyzing the prosodic patterns of Korean read speech. Engineering efforts related to phonetic study have focused on the importance of prosodic phrasing, which may play a major role in analyzing a phonetic DB. Before establishing the prosodic phrase as the prosodic unit, we should describe the features of the boundary signal in a target sentence. With this in mind, the general characteristics of read speech and ToBI (Tones and Break Indices), a currently popular prosodic labelling system, are presented as a first step. The concrete analysis was carried out on the Korean version of the fable 'The North Wind and the Sun', in which about 25 prosodic units were distinguished by a perceptual approach for 5 subjects. Having established the various kinds of information that can be used to decide a boundary position systematically, we can proceed to the next step, namely acoustic analysis of the prosodic unit. To improve the naturalness of synthetic speech, the most important task to study first may be detecting the boundary signals in the speech file and re-establishing them accordingly in the raw text.

A Study of Korean Phonetic and Phonological Properties for Speech Recognition and Synthesis (음성 인식/합성을 위한 국어의 음성-음운론적 특성 연구)

  • Chung, Kook;Koo, Hee-San;Lee, Chan-Do;Kim, Jong-Mi;Han, Sun-Hee
    • The Journal of the Acoustical Society of Korea / v.13 no.6 / pp.31-44 / 1994
  • The paper introduces several studies of various aspects of Korean phonology and phonetics for speech recognition and synthesis. The phonological and phonetic studies presented in this paper are: i) For segmental phonology, we made an annotated list of Korean allophones and their corresponding alphabetic symbols for typing into computers. ii) For segmental phonetics, we present some acoustic regularities in Korean consonants according to their phonological environment within a word. iii) For prosodic phonology, we suggest the phonological functions of prosodic features and their acoustic cues. iv) For prosodic phonetics, we present the characteristic patterns of accent and intonation in Korean. v) Finally, we suggest some ways of using this phonological and phonetic knowledge to improve speech recognition and synthesis.

Fracture Analysis of Plasma Spray Coating by Classification of AE Signals (AE파형분류에 의한 용사코팅재의 파손해석)

  • Kim, G.S.;Park, K.S.;Hong, Y.U.
    • Journal of Power System Engineering / v.6 no.3 / pp.24-30 / 2002
  • The deformation and fracture behaviors of both Al2O3 and Ni 4.5wt.%Al plasma thermal spray coatings were investigated by an acoustic emission (AE) method. A plasma thermal spray coating is formed by a process in which melted particles fly at high speed toward the substrate, then crash and spread on the substrate surface, where they cool and solidify in a very short time; the stacking of the particles forms the coating. A tensile test was conducted on notched specimens in a stress range below the elastic limit of the substrate, and a bending test was done on smooth specimens. The AE waveforms generated from both types of coated test specimens can be classified by FFT analysis into two types: low frequency (type I) and high frequency (type II). The type I waveform is considered to correspond to exfoliation of the coating layers, and the type II waveform to plastic deformation at the notch tip. The fracture of the coating layers can be estimated from AE events and amplitude, because these AE features increase when deformation occurs.
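
The FFT-based split into low-frequency (type I) and high-frequency (type II) waveforms could be sketched as below. This is only an illustration under stated assumptions: the abstract gives no frequency boundary, so `threshold_hz`, the sample rate and the synthetic sine waveforms are all hypothetical, and using the dominant spectral peak as the criterion is a simplification rather than the authors' procedure.

```python
import numpy as np

def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the largest FFT magnitude peak."""
    signal = np.asarray(signal, dtype=float)
    # Remove the mean so the DC bin does not dominate the spectrum.
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])

def classify_ae(signal, sample_rate, threshold_hz):
    """Label a waveform 'type I' (low frequency) or 'type II' (high frequency)."""
    if dominant_frequency(signal, sample_rate) < threshold_hz:
        return "type I"
    return "type II"

# Synthetic stand-ins for AE waveforms (assumed values, not measured data).
sample_rate = 1_000_000          # 1 MHz sampling, assumed
t = np.arange(1000) / sample_rate
low_wave = np.sin(2 * np.pi * 100_000 * t)    # 100 kHz tone
high_wave = np.sin(2 * np.pi * 300_000 * t)   # 300 kHz tone
threshold_hz = 200_000           # hypothetical boundary between the types

low_label = classify_ae(low_wave, sample_rate, threshold_hz)
high_label = classify_ae(high_wave, sample_rate, threshold_hz)
```

In practice the boundary would have to come from the measured spectra of the two waveform families, not from a fixed constant as assumed here.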

A Study on Fracture Behaviors of Single-Edge-Notched Glass Fiber/Aluminum Laminates Using Acoustic Emission (음향방출법을 이용한 편측노치를 갖는 유리섬유/알루미늄 적층판의 파괴거동 해석)

  • Woo Sung-Choong;Choi Nak-Sam
    • Composites Research / v.18 no.2 / pp.1-12 / 2005
  • Fracture behaviors of single-edge-notched monolithic aluminum plates and glass fiber/aluminum laminates under tensile loading have been studied using acoustic emission (AE) monitoring. AE signals from monolithic aluminum could be classified into two different types. For glass fiber/aluminum laminates, AE signals with high amplitude and long duration were additionally confirmed by FFT frequency analysis; these corresponded to macrocrack propagation and/or delamination. The AE source location determined from signal arrival times showed the zone of fracture. On the basis of the above AE analysis and fracture observation, characteristic features of the fracture processes of single-edge-notched glass fiber/aluminum laminates were elucidated for different fiber ply orientations and fiber/aluminum lay-up ratios.

Acoustic Identification of Six Fish Species using an Artificial Neural Network (인공 신경망에 의한 6개 어종의 음향학적 식별)

  • Lee, Dae-Jae
    • Korean Journal of Fisheries and Aquatic Sciences / v.49 no.2 / pp.224-233 / 2016
  • The objective of this study was to develop an artificial neural network (ANN) model for the acoustic identification of commercially important fish species in Korea. A broadband echo acquisition and processing system operating over the frequency range of 85-225 kHz was used to collect and process species-specific, time-frequency feature images from six fish species: black rockfish Sebastes schlegeli, black scraper Thamnaconus modestus, chub mackerel Scomber japonicus, goldeye rockfish Sebastes thompsoni, konoshiro gizzard shad Konosirus punctatus and large yellow croaker Larimichthys crocea. An ANN classifier was developed to identify fish species acoustically on the basis of only 100-dimensional time-frequency features extracted by principal components analysis (PCA). The overall mean identification rate for the six fish species was 88.5%, with individual identification rates of 76.6% for black rockfish, 82.8% for black scraper, 93.8% for chub mackerel, 90.6% for goldeye rockfish, 96.9% for konoshiro gizzard shad and 90.6% for large yellow croaker. These results demonstrate that individual live fish in well-controlled environments can be identified accurately by the proposed ANN model.

ETRI small-sized dialog style TTS system (ETRI 소용량 대화체 음성합성시스템)

  • Kim, Jong-Jin;Kim, Jeong-Se;Kim, Sang-Hun;Park, Jun;Lee, Yun-Keun;Hahn, Min-Soo
    • Proceedings of the KSPS conference / 2007.05a / pp.217-220 / 2007
  • This study outlines a small-sized, dialog-style ETRI Korean TTS system that applies HMM-based speech synthesis techniques. To build the VoiceFont, 500 dialog-style sentences were used to train the HMMs. Context information about phonemes, syllables, words, phrases and sentences was extracted fully automatically to build context-dependent HMMs. In training the acoustic model, acoustic features such as Mel-cepstrums, logF0 and its delta and delta-delta were used. The VoiceFont built through the training is 0.93Mb in size. The developed HMM-based TTS system was installed on an ARM720T processor running at a 60 MHz clock. To reduce computation time, the MLSA inverse filtering module is implemented in assembly language. The fully implemented system runs 1.73 times faster than real time.
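
As a rough illustration of the delta and delta-delta features mentioned in this abstract, the sketch below appends first- and second-order dynamic features to a static feature track. It assumes the common regression-style delta with edge padding; the window width, the function name `delta` and the toy values are hypothetical, and the abstract does not state which delta windows the system actually used.

```python
import numpy as np

def delta(features, width=1):
    """Regression-style delta of a (T, D) feature track, edge-padded.

    delta[t] = sum_{n=1..width} n * (x[t+n] - x[t-n]) / (2 * sum n^2)
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    # Repeat the first and last frames so every frame has neighbours.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]
        behind = padded[width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom

# Toy one-dimensional static track (assumed values, not real logF0 data).
static = np.array([[0.0], [1.0], [4.0], [9.0], [16.0]])
d1 = delta(static)        # delta
d2 = delta(d1)            # delta-delta, computed as the delta of the delta
observation = np.hstack([static, d1, d2])   # shape (5, 3)
```

Stacking static, delta and delta-delta columns in this way yields the kind of observation vector that HMM-based synthesis systems are typically trained on.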

An Experimental Phonetic study of Perception of native Korean speakers on English and German $/\int/$ (한국인의 외국어 $/\int/$음에 대한 실험음성학적 연구)

  • Lee Sook-hyang;Kang Hyunsook
    • MALSORI / no.40 / pp.1-12 / 2000
  • This paper investigated how $/\int/$ in English and German is perceived and interpreted in Korean loanwords. $/\int/$ in these languages does not show a one-to-one correspondence in Korean: $/\int/$ in coda position in English and German is perceived as [swi] in Korean, while $/\int/$ in onset position is perceived as [syu]. This paper examined the phonetic characteristics of $/\int/$ in English and German through acoustic analysis and attempted to determine which factor could explain this surface distribution of [swi] and [syu]: a phonological factor (onset vs. coda) or a phonetic one (coarticulation). Two acoustic features of $/\int/$ in English and German were examined: the duration and the energy peak frequency of the frication noise. German $/\int/$ perceived as [swi] in Korean showed a higher energy peak frequency and longer duration than that perceived as [syu]. English $/\int/$ perceived as [swi] also showed longer duration than that perceived as [syu], but its energy peak frequency behaved differently. English $/\int/$ showed coarticulation with the preceding vowel rather than being affected by its position in the syllable. This paper concludes that 1) the phonetic characteristics used when $/\int/$ in English and German is adopted into Korean are the duration and energy peak frequency of its frication noise, and 2) duration takes priority over energy peak frequency, which can serve as an enhancing feature.

ACOUSTIC FEATURES DIFFERENTIATING KOREAN MEDIAL LAX AND TENSE STOPS

  • Shin, Ji-Hye
    • Proceedings of the KSPS conference / 1996.10a / pp.53-69 / 1996
  • Much research has been done on the cues differentiating the three Korean stops in word-initial position. This paper focuses on a more neglected area: the acoustic cues differentiating the medial tense and lax unaspirated stops. Eight adult native speakers of Korean, four males and four females, pronounced sixteen minimal pairs containing the two series of medial stops with different preceding vowel qualities. The average duration of vowels before lax stops is 31 msec longer than before their tense counterparts (70 msec for lax vs 39 msec for tense). In addition, the average stop closure duration of tense stops is 135 msec longer than that of lax stops (69 msec for lax vs 204 msec for tense). These durational differences are so large that they may be phonologically, not phonetically, determined. Moreover, vowel duration varies with the speaker's sex: female speakers have 5 msec shorter vowel duration before both stop types. The quality of voicing, tense or lax, is also a cue to these two stop types, as it is in initial position, but the relative duration of the stops appears to be a much more important cue. The duration of the stop changes stop perception, while that of the preceding vowel does not. The consequences of these results for the phonological description of Korean, as well as for the synthesis and automatic recognition of Korean, will be discussed.
