• Title/Summary/Keyword: acoustic features

Search results: 328

Classification of Diphthongs using Acoustic Phonetic Parameters (음향음성학 파라메터를 이용한 이중모음의 분류)

  • Lee, Suk-Myung;Choi, Jeung-Yoon
    • The Journal of the Acoustical Society of Korea, v.32 no.2, pp.167-173, 2013
  • This work examines classification of diphthongs, as part of a distinctive feature-based speech recognition system. Acoustic measurements related to the vocal tract and the voice source are examined, and analysis of variance (ANOVA) results show that vowel duration, energy trajectory, and formant variation are significant. A balanced error rate of 17.8% is obtained for 2-way diphthong classification on the TIMIT database, and error rates of 32.9%, 29.9%, and 20.2% are obtained for /aw/, /ay/, and /oy/, for 4-way classification, respectively. Adding the acoustic features to widely used Mel-frequency cepstral coefficients also improves classification.
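
A minimal sketch of the kind of feature combination this abstract describes: standard MFCCs augmented with coarse acoustic measures such as segment duration and an energy-trajectory slope. This is not the authors' implementation; the librosa-based extraction and the 16 kHz sampling rate are assumptions made for illustration.

```python
# Hypothetical feature extractor: MFCC means plus duration and energy-slope
# features for a diphthong segment (illustrative only, not the paper's code).
import numpy as np
import librosa

def diphthong_features(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=16000)                      # assumed sampling rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # (n_mfcc, n_frames)
    rms = librosa.feature.rms(y=y)[0]                         # frame-wise energy
    duration = len(y) / sr                                    # segment duration in seconds
    energy_slope = np.polyfit(np.arange(len(rms)), rms, 1)[0] # rising/falling energy trajectory
    return np.concatenate([mfcc.mean(axis=1), [duration, energy_slope]])
```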

Comparison of acoustic features due to the Lombard effect in typically developing children and adults (롬바르드 효과가 아동과 성인의 말소리 산출에 미치는 영향: 음향학적 특성과 모음공간면적을 중심으로)

  • Yelim Jang;Jaehee Hwang;Nuri Lee;Nakyung Lee;Seeun Eum;Youngmee Lee
    • Phonetics and Speech Sciences, v.16 no.2, pp.19-27, 2024
  • The Lombard effect is speakers' involuntary adjustment of their speech when noise is present during voice communication. This study aimed to investigate the Lombard effect by comparing the acoustic features of children and adults under different listening conditions. Twelve male children (5-9 years old) and 12 young adult men (24-35 years old) were recruited to produce speech under three listening conditions (quiet, noise-55 dB, noise-70 dB). Acoustic analyses were then carried out to characterize their acoustic features, such as F0, intensity, duration, and vowel space area, under the three listening conditions. A Lombard effect was observed in intensity and duration for both the children and the adults under the adverse listening conditions. However, we did not observe a Lombard effect in the F0 or vowel space area of either group. These findings suggest that children can adjust their speech production in challenging listening conditions as much as adults do.
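
For context, a rough sketch of how the measures compared in this study (mean F0, mean intensity, and utterance duration) could be extracted with the praat-parselmouth package; the tooling and default analysis settings are assumptions, not the study's actual scripts.

```python
# Illustrative extraction of Lombard-related measures (assumed tooling).
import numpy as np
import parselmouth  # Praat bindings

def lombard_measures(wav_path):
    snd = parselmouth.Sound(wav_path)
    f0 = snd.to_pitch().selected_array["frequency"]
    f0 = f0[f0 > 0]                                   # keep voiced frames only
    return {
        "mean_f0_hz": float(np.mean(f0)) if f0.size else float("nan"),
        "mean_intensity_db": float(np.mean(snd.to_intensity().values)),
        "duration_s": snd.get_total_duration(),
    }
```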

Post-Affricate Phonatory Processes in Korean and English: Acoustic Correlates and Implications for Phonological Analysis

  • Ahn, Hyun-Kee
    • Speech Sciences, v.9 no.1, pp.137-148, 2002
  • This study investigates phonation modes of vowels following the affricate consonants in Korean and English: the tense affricate /c'/, lenis affricate /c/, and aspirated affricate /cʰ/ for Korean; the voiced affricate /ǰ/ and aspirated affricate /c/ for English. The investigation makes significant use of the H1*-H2* measure (a normalized amplitude difference between the first and second harmonics) to provide acoustic correlates of the phonation types. The major finding for English is that the H1*-H2* measure at vowel onset was significantly larger in post-aspirated position than in post-voiced position. The Korean data showed the H1*-H2* measure at vowel onset to be significantly higher in the post-aspirated class than in the post-tense class. On the other hand, the F0 values for post-lenis vowels were significantly lower than those of the other two classes during the first half of the vowel. Based on these phonetic results, this study argues for the need to incorporate the [stiff vocal folds] and [slack vocal folds] features into the phonological treatment of Korean affricates, while maintaining the two features [constricted glottis] and [spread glottis].
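
As an illustration of the core measure, here is a simplified, unnormalized H1-H2 computation: the dB difference between the amplitudes of the first and second harmonics in a vowel-onset frame. The paper uses the formant-corrected H1*-H2*; that correction, and a proper F0 tracker, are omitted here, so treat this as a sketch only.

```python
# Simplified H1-H2 (no formant correction), assuming F0 is known for the frame.
import numpy as np

def h1_h2(frame, sr, f0):
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)

    def harmonic_db(k):
        idx = np.argmin(np.abs(freqs - k * f0))        # bin nearest the k-th harmonic
        peak = spec[max(idx - 2, 0):idx + 3].max()     # small search window around it
        return 20 * np.log10(peak + 1e-12)

    return harmonic_db(1) - harmonic_db(2)             # H1 - H2 in dB
```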


Diagnostic Imaging Features of Abdominal Foreign Body in Dogs; Retained Surgical Gauze (개에서 복강내 잔존한 거즈 이물의 진단영상)

  • Choi, Ji-Hye;Kim, Gye-Dong;Keh, Seo-Yeun;Jang, Jae-Yong;Choi, Hee-Yeon;Yoon, Jung-Hee
    • Journal of Veterinary Clinics, v.28 no.1, pp.94-100, 2011
  • This study describes the radiographic and ultrasonographic features of retained surgical gauze, known as gossypiboma, in 9 dogs. Female dogs (n = 8) were at higher risk, and seven of the eight had a history of ovariohysterectomy. Seven dogs were symptomatic, and the most common clinical signs were vomiting, anorexia, and inertia. A palpable abdominal mass was detected in six dogs. Radiographic signs included a localized abdominal mass with soft tissue density (n = 7) or a mass containing speckled gas (n = 1). Ultrasonography showed a hypoechoic mass with a hyperechoic center (n = 4) or a homogeneous hypoechoic mass (n = 3); the remaining dogs (n = 2) showed an intestinal wall surrounding a hyperechoic center. Regardless of the characteristics of the mass, acoustic shadowing from the center of the mass was present in all dogs. Ultrasonography also revealed complications such as adhesion between the mass and adjacent organs, peritonitis, and intestinal obstruction around the mass. Gossypiboma should be considered when a hypoechoic mass with a hyperechoic center and acoustic shadowing is observed on ultrasound examination.

Noise Effects on Foreign Language Learning (소음이 외국어 학습에 미치는 영향)

  • Lim, Eun-Su;Kim, Hyun-Gi;Kim, Byung-Sam;Kim, Jong-Kyo
    • Speech Sciences, v.6, pp.197-217, 1999
  • In a noisy classroom, the acoustic-phonetic features of the teacher's speech and the perceptual responses of learners change compared with a quiet environment. Acoustic analyses were carried out on a set of French monosyllables consisting of 17 consonants and three vowels /a, e, i/, produced by one male speaker talking in quiet and in 50, 60, and 70 dB SPL of masking noise presented over headphones. The acoustic analyses showed consistent differences in the energy and formant center-frequency amplitude of consonants and vowels, the F1 frequency of vowels, and the duration of voiceless stops, suggesting increased vocal effort. Perceptual experiments, in which 18 female undergraduate students learning French served as subjects, were conducted in quiet and in 50 and 60 dB of masking noise. Identification scores for consonants were higher for Lombard speech than for normal speech, suggesting that the speaker's vocal effort helps overcome the masking effect of noise. With increasing noise level, the perceptual responses to the French consonants tended to become more complex, and the subjective reaction scores describing the noise with vocabulary expressing an 'unpleasant' sensation tended to be higher. From the viewpoint of L2 (second language) acquisition, the influence of L1 (first language) on L2 observed in the perceptual results supports the interference theory.
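
Not the study's analysis scripts, but a rough illustration of one of the measures it reports: an LPC-based estimate of the first formant frequency (F1) of a vowel segment, the kind of value compared between quiet and noise-masked speech. The LPC order and the 90 Hz cutoff are arbitrary illustrative choices.

```python
# Rough LPC-based F1 estimate for a vowel segment (illustrative assumptions).
import numpy as np
import librosa

def estimate_f1(y, sr, order=12):
    a = librosa.lpc(y, order=order)                        # LPC polynomial coefficients
    roots = [r for r in np.roots(a) if np.imag(r) > 0]     # one root per conjugate pair
    freqs = sorted(np.angle(r) * sr / (2 * np.pi) for r in roots)
    freqs = [f for f in freqs if f > 90]                   # discard near-DC poles
    return freqs[0] if freqs else float("nan")             # lowest remaining resonance ~ F1
```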


Deep neural network based seafloor sediment mapping using bathymetric features of MBES multifrequency

  • Khomsin;Mukhtasor;Suntoyo;Danar Guruh Pratomo
    • Ocean Systems Engineering, v.14 no.2, pp.101-114, 2024
  • Seafloor sediment mapping is an essential research topic in shallow coastal waters, especially for port development, benthic habitat mapping, and underwater communications. Seafloor sediments can be characterized by collecting samples directly in the field with a grab sampler or corer, or by optical methods using underwater cameras and video. Each of these methods has weaknesses, in area coverage (mechanical sampling) and in positioning accuracy (optical methods). The latest technology used to overcome these limitations is the acoustic method (echosounder) combined with Global Navigation Satellite System (GNSS) Real Time Kinematic (RTK) positioning. This study therefore proposes classifying seafloor sediments in coastal waters using a Multibeam Echosounder (MBES) operated at five frequencies (200 kHz, 250 kHz, 300 kHz, 350 kHz, and 400 kHz). A deep neural network (DNN) takes the multi-frequency bathymetry, the inter-frequency bathymetric differences, and the bathymetric features from the five frequencies as its input layer, and the four sediment types observed in 74 sediment samples as its output layer, to produce a seafloor sediment map. Sediment mapping with the DNN achieves an overall accuracy of 71.6% (significant) and a kappa coefficient of 0.59 (moderate). The seafloor sediment in the study area is mainly silt (41.6%), followed by clayey sand (36.6%), sandy silt (14.2%), and silty sand (7.5%).
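
A minimal stand-in for the classifier described above, assuming one feature row per sediment sample built from multi-frequency depths and inter-frequency differences; the network size, the synthetic placeholder data, and the scikit-learn implementation are illustrative assumptions, not the authors' DNN.

```python
# Placeholder sediment classifier: bathymetric features in, 4 sediment types out.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(74, 15))        # hypothetical features per sediment sample
y = rng.integers(0, 4, size=74)      # 4 classes: silt, clayey sand, sandy silt, silty sand

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
clf.fit(X_tr, y_tr)
print("overall accuracy:", clf.score(X_te, y_te))
```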

Knowledge-driven speech features for detection of Korean-speaking children with autism spectrum disorder

  • Seonwoo Lee;Eun Jung Yeo;Sunhee Kim;Minhwa Chung
    • Phonetics and Speech Sciences, v.15 no.2, pp.53-59, 2023
  • Detection of children with autism spectrum disorder (ASD) based on speech has relied on predefined feature sets because of their ease of use and the capabilities of speech analysis. However, clinical impressions may not be adequately captured due to the broad range and large number of features included. This paper demonstrates that knowledge-driven speech features (KDSFs) specifically tailored to the speech traits of ASD are more effective and efficient for distinguishing the speech of ASD children from that of children with typical development (TD) than a predefined feature set, the extended Geneva Minimalistic Acoustic Standard Parameter Set (eGeMAPS). The KDSFs encompass speech characteristics related to frequency, voice quality, speech rate, and spectral features that have been identified as corresponding to distinctive attributes of ASD speech. The speech dataset used for the experiments consists of 63 ASD children and 9 TD children. To alleviate the imbalance in the number of training utterances, a data augmentation technique was applied to the TD children's utterances. A support vector machine (SVM) classifier trained with the KDSFs achieved an accuracy of 91.25%, surpassing the 88.08% obtained using the predefined set. This result underscores the importance of incorporating domain knowledge in the development of speech technologies for individuals with disorders.
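
A hedged sketch of the classification setup the abstract outlines: an SVM trained on a matrix of knowledge-driven speech features. Feature extraction and the augmentation of TD utterances are assumed to happen elsewhere; the file names and the RBF-kernel SVM configuration are hypothetical.

```python
# Hypothetical KDSF-based ASD/TD classification with an SVM.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.load("kdsf_features.npy")   # (n_utterances, n_features); hypothetical file
y = np.load("labels.npy")          # 1 = ASD, 0 = TD; hypothetical file

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```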

Speech Feature based Double-talk Detector for Acoustic Echo Cancellation (반향제거를 위한 음성특징 기반의 동시통화 검출 기법)

  • Park, Jun-Eun;Lee, Yoon-Jae;Kim, Ki-Hyeon;Ko, Han-Seok
    • Journal of IKEEE, v.13 no.2, pp.132-139, 2009
  • In this paper, a speech-feature-based double-talk detection method is proposed for acoustic echo cancellation in hands-free communication systems. The double-talk detector is an important element, since it controls the update of the adaptive filter used for acoustic echo cancellation. In previous research, double-talk detection was treated purely as a signal-processing problem without taking speech characteristics into account. In the proposed method, however, speech features used in speech recognition serve as discriminative features between far-end and near-end speech. We obtained a substantial improvement over previous double-talk detection methods that use only the time-domain signal.
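
For context only: a sketch of an NLMS echo canceller whose filter adaptation is frozen whenever a double-talk detector fires. The detector below is a crude near-end/far-end energy-ratio test standing in for the speech-feature-based detector the paper actually proposes; step size, filter length, and threshold are arbitrary.

```python
# NLMS acoustic echo canceller with a placeholder double-talk detector (DTD).
import numpy as np

def nlms_with_dtd(far, mic, L=128, mu=0.5, dtd_thresh=2.0, eps=1e-8):
    w = np.zeros(L)                               # adaptive filter (echo path estimate)
    out = np.zeros(len(mic))
    for n in range(L, len(mic)):
        x = far[n - L:n][::-1]                    # far-end reference frame
        e = mic[n] - w @ x                        # error = mic minus estimated echo
        out[n] = e
        double_talk = np.mean(mic[n - L:n] ** 2) > dtd_thresh * np.mean(x ** 2)
        if not double_talk:                       # adapt only when no near-end speech
            w += mu * e * x / (x @ x + eps)
    return out
```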


Bearing Multi-Faults Detection of an Induction Motor using Acoustic Emission Signals and Texture Analysis (음향 방출 신호와 질감 분석을 이용한 유도전동기의 베어링 복합 결함 검출)

  • Jang, Won-Chul;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information, v.19 no.4, pp.55-62, 2014
  • This paper proposes a fault detection method that uses images converted from acoustic emission signals and texture analysis to identify bearing multi-faults, which frequently occur in induction motors. The proposed method analyzes three texture features of the converted images: entropy, homogeneity, and energy. These features are then used as inputs to a fuzzy-ARTMAP to identify each multi-fault, including outer-inner, inner-roller, and outer-roller faults. Experimental results over ten trials indicate that the proposed method achieves 100% accuracy in fault classification.
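
A generic sketch (not the paper's pipeline) of computing the three texture features named above, entropy, homogeneity, and energy, from a grayscale image via a gray-level co-occurrence matrix (GLCM); the conversion of acoustic emission signals to images and the fuzzy-ARTMAP classifier are out of scope here, and the single-distance, single-angle GLCM is an illustrative choice.

```python
# GLCM texture features from an 8-bit grayscale image (illustrative settings).
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_texture_features(image_u8):
    glcm = graycomatrix(image_u8, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    p = glcm[:, :, 0, 0]                                  # normalized co-occurrence matrix
    entropy = -np.sum(p * np.log2(p + 1e-12))             # not provided by graycoprops
    homogeneity = graycoprops(glcm, "homogeneity")[0, 0]
    energy = graycoprops(glcm, "energy")[0, 0]
    return entropy, homogeneity, energy
```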

A Study on Automatic Phoneme Segmentation of Continuous Speech Using Acoustic and Phonetic Information (음향 및 음소 정보를 이용한 연속음성의 자동 음소 분할에 대한 연구)

  • 박은영;김상훈;정재호
    • The Journal of the Acoustical Society of Korea, v.19 no.1, pp.4-10, 2000
  • This paper presents a postprocessor that improves the performance of an automatic speech segmentation system by correcting phoneme boundary errors. We propose a postprocessor that reduces the range of errors in auto-labeled results so that they can be used directly as synthesis units. Starting from a baseline automatic segmentation system, the proposed postprocessor is trained on the features of hand-labeled results using a multi-layer perceptron (MLP); the auto-labeled result combined with the MLP postprocessor then determines the new phoneme boundary. In detail, we first select feature sets of speech based on acoustic-phonetic knowledge. We adopt the MLP as the pattern classifier because of its excellent nonlinear discrimination capability and because it can readily reflect the various types of acoustic features appearing at phoneme boundaries within a short time window. Finally, an appropriate feature set analyzed for each phonetic event is applied to the postprocessor to compensate for the phoneme boundary error. On phonetically rich sentence data, we achieved a 19.9% improvement in frame accuracy compared with the plain automatic labeling system, and reduced the absolute error rate by about 28.6%.
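
A schematic illustration of the postprocessing idea, under assumed data layouts: an MLP regressor trained to predict the offset between an auto-labeled phoneme boundary and the corresponding hand-labeled boundary, given features extracted around the auto-labeled boundary. The placeholder data and network size are not the paper's.

```python
# Hypothetical MLP postprocessor that corrects auto-labeled boundary positions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 26))    # features around each auto-labeled boundary (placeholder)
y = rng.normal(size=1000)          # hand-labeled minus auto-labeled offset, in frames (placeholder)

mlp = MLPRegressor(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)
mlp.fit(X, y)

corrected = 120 + mlp.predict(X[:1])[0]   # shift an auto boundary at frame 120 by the predicted offset
print(corrected)
```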
