• Title/Summary/Keyword: Speech sound

Phonological Awareness in Hearing Impaired Children (청각장애아동의 음운인식능력에 대한 연구)

  • Park, Sang-Hee;Seok, Dong-Il;Jeong, Ok-Ran
    • Speech Sciences / v.9 no.2 / pp.193-202 / 2002
  • The purpose of this study is to examine the phonological awareness of hearing impaired children. A number of studies indicate that hearing impaired children have articulation disorders due to their impaired auditory feedback. However, even children who are able to distinguish certain phonemes sometimes misarticulate them. Phonological awareness refers to recognizing the speech-sound units and their forms in spoken language (Hong, 2001). The subjects who participated in the experiment were four hearing impaired children (3 cochlear implanted children and 1 hearing aided child). Phonological awareness was evaluated with the test battery developed by Paik et al. (2001). The subtests consisted of rhyme matching, onset matching I and II, and word initial segmentation and matching I and II. If a child asked for an item to be repeated, it was repeated up to 4 times. Each item was scored 1 point. The results were compared with those of Paik et al. (2001). Subject 1 showed superior rhyme matching ability, subjects 2 and 3 fair ability, and subject 4 inferior ability. In onset matching I, all subjects showed inferior ability except for subject 3; interestingly, subject 1 showed the lowest onset matching I score. In word initial segmentation and matching I, subjects 1 and 4 showed inferior ability while subjects 2 and 3 showed fair ability. In onset matching II, subject 2 achieved the perfect score of 10 even though she had scored very low on onset matching I. In word initial segmentation and matching II, only subjects 2 and 3 showed appropriate levels of the skill. The results show that the phonological awareness of hearing impaired children differs from that of normally hearing children.

Low delay window switching modified discrete cosine transform for speech and audio coder (음성 및 오디오 부호화기를 위한 저지연 윈도우 스위칭 modified discrete cosine transform)

  • Kim, Young-Joon;Lee, In-Sung
    • The Journal of the Acoustical Society of Korea / v.37 no.2 / pp.110-117 / 2018
  • In this paper, we propose a low delay window switching MDCT (Modified Discrete Cosine Transform) method for speech/audio coders. Window switching is used to reduce the degradation of sound quality in non-stationary transient intervals, and the algorithmic delay is reduced by using low delay TDAC (Time Domain Aliasing Cancellation). While conventional window switching algorithms use overlap-add with different lengths, the proposed method uses a fixed overlap-add length. This halves the algorithmic delay and saves 1 bit of frame indication information, since only 2 window types are used. We applied the proposed algorithm to the MDCT-based G.729.1 codec to evaluate its performance. The proposed method halves the algorithmic delay while maintaining the same speech quality as the conventional method.
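
As a rough illustration of the MDCT/TDAC machinery this coder builds on, the sketch below implements a plain 50%-overlap MDCT analysis/synthesis chain with a sine window and verifies perfect reconstruction. It is a minimal sketch of the generic transform only: the paper's low-delay window switching (fixed overlap length, two window types) is not reproduced, and the frame size N is an arbitrary choice.

```python
import numpy as np

def mdct(frame):
    """MDCT of a windowed 2N-sample frame -> N coefficients."""
    N = len(frame) // 2
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ frame

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples."""
    N = len(coeffs)
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return (2.0 / N) * (basis @ coeffs)

N = 64                                   # hop size; frame length is 2N
win = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))   # sine (Princen-Bradley) window
x = np.random.randn(8 * N)

# Analysis: 50%-overlapped, windowed frames -> MDCT coefficients
spec = [mdct(win * x[i:i + 2 * N]) for i in range(0, len(x) - 2 * N + 1, N)]

# Synthesis: inverse MDCT, window again, overlap-add.
# Time-domain aliasing cancels between adjacent frames (TDAC).
y = np.zeros_like(x)
for j, c in enumerate(spec):
    y[j * N:j * N + 2 * N] += win * imdct(c)

assert np.allclose(x[N:-N], y[N:-N])     # interior samples reconstruct exactly
```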

Optimization of the Kernel Size in CNN Noise Attenuator (CNN 잡음 감쇠기에서 커널 사이즈의 최적화)

  • Lee, Haeng-Woo
    • The Journal of the Korea institute of electronic communication sciences / v.15 no.6 / pp.987-994 / 2020
  • In this paper, we study the effect of the kernel size of a CNN layer on the performance of an acoustic noise attenuator. The system uses a deep learning algorithm with a neural-network adaptive prediction filter instead of a conventional adaptive filter. Speech is estimated from a single noisy input speech signal using a CNN filter with 100 neurons and 16 filters, trained with the error back-propagation algorithm. This exploits the quasi-periodicity of the voiced segments of the speech signal. A simulation program was written with the Tensorflow and Keras libraries, and simulations were performed to verify the attenuator's performance as a function of kernel size. The results show that the MSE and MAE values are smallest when the kernel size is about 16, and both increase when the size is smaller or larger than 16. This suggests that, for a speech signal, the features are best captured with a kernel size of about 16.
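
A minimal Keras sketch of this kind of kernel-size sweep is given below. The abstract does not publish the exact architecture, so the frame length, the single Conv1D layer with 16 filters, the 100-unit dense layer, the toy data, and the training settings are all assumptions chosen only to mirror the quantities mentioned above.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model(input_len=128, kernel_size=16):
    """Predict a clean sample from a window of noisy speech (assumed setup)."""
    return keras.Sequential([
        keras.Input(shape=(input_len, 1)),
        layers.Conv1D(filters=16, kernel_size=kernel_size,
                      padding="causal", activation="relu"),
        layers.Flatten(),
        layers.Dense(100, activation="relu"),
        layers.Dense(1),                  # estimated clean sample
    ])

# Sweep the kernel size and compare MSE/MAE, as in the study
x = np.random.randn(2000, 128, 1).astype("float32")   # noisy frames (toy data)
y = np.random.randn(2000, 1).astype("float32")        # target clean samples
for ks in (8, 16, 32):
    model = build_model(kernel_size=ks)
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    hist = model.fit(x, y, epochs=2, batch_size=64, verbose=0)
    print(ks, hist.history["loss"][-1], hist.history["mae"][-1])
```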

Longitudinal music perception performance of postlingual deaf adults with cochlear implants using acoustic and/or electrical stimulation

  • Chang, Son A;Shin, Sujin;Kim, Sungkeong;Lee, Yeabitna;Lee, Eun Young;Kim, Hanee;Shin, You-Ree;Chun, Young-Myoung
    • Phonetics and Speech Sciences / v.13 no.2 / pp.103-109 / 2021
  • In this study, we investigated the longitudinal music perception of adult cochlear implant (CI) users and how acoustic stimulation combined with CI affects their music performance. Data from a total of 163 participants were analyzed retrospectively: 96 participants used acoustic stimulation with CI and 67 used electrical stimulation only via CI. The music performance data (melody identification, appreciation, and satisfaction) were collected pre-implantation and at 1 and 2 years post-implantation. A mixed repeated-measures ANOVA and Tukey-adjusted pairwise comparisons were used for the statistical analysis. In both groups, melody identification, music appreciation, and music satisfaction improved significantly at 1 and 2 years post-implantation relative to pre-implantation, but there was no significant difference between 1 and 2 years on any variable. The group using acoustic stimulation with CI also showed better melody identification than the CI-only group. However, no differences were found in music appreciation and satisfaction between the two groups, and possible explanations are discussed. In conclusion, acoustic and/or electrical hearing devices benefit recipients' music performance over time. Although acoustic stimulation combined with electrical stimulation can improve listening skills, those benefits may not extend to the subjective acceptance of music. These results suggest the need for improved sound-processing mechanisms and music rehabilitation.
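
A hypothetical sketch of this statistical design, a mixed repeated-measures ANOVA (within-subject factor: test time; between-subject factor: stimulation group) followed by Tukey-adjusted pairwise comparisons, is shown below using the pingouin library. The column names, file name, and choice of library are illustrative assumptions, not the authors' actual pipeline.

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format table: one row per subject per test time,
# with columns: subject, group (acoustic+CI vs CI-only),
# time (pre, 1yr, 2yr), melody_score
df = pd.read_csv("melody_scores.csv")   # hypothetical file

# Mixed repeated-measures ANOVA
aov = pg.mixed_anova(data=df, dv="melody_score",
                     within="time", subject="subject",
                     between="group")
print(aov)

# Tukey-adjusted pairwise comparison between the two groups
print(pg.pairwise_tukey(data=df, dv="melody_score", between="group"))
```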

A preliminary study on laryngeal and supralaryngeal articulatory distinction of the three-way contrast of Korean velar stops

  • Jiyeon Song;Sahyang Kim;Taehong Cho
    • Phonetics and Speech Sciences / v.15 no.1 / pp.19-24 / 2023
  • This study investigated the acoustic (VOT, voice onset time) and articulatory characteristics of Korean velar stops in monosyllabic CV structures, examining how the three-way distinction is realized in the laryngeal and supralaryngeal domains and how it is manifested in male versus female speakers' production. EMA data were collected from 22 speakers. In line with previous studies, male speakers preserved the three-way differentiation of velar stops (/k*/</k/</kh/) in terms of VOT, while female speakers showed only a two-way distinction (/k*/</k/=/kh/). As for the kinematic characteristics, a clear three-way distinction was found only in male speakers' peak velocity measure of the C-to-V opening movement (/kh/</k/</k*/). For the other kinematic measures (i.e., articulatory closure duration, deceleration duration of the opening movement, and the entire opening movement duration), male speakers showed only a two-way distinction between fortis and the other two stops. Female speakers did not show a three-way contrast in any kinematic measure: they showed a two-way distinction between lenis and the other two stops in C-to-V deceleration duration (/k*/=/kh/</k/), and a two-way distinction between fortis and lenis stops in the opening movement duration. An overall comparison of the VOT and articulatory analyses revealed that the lenis-aspirated kinematic distinction is diminishing, driven by female speakers, in line with the ongoing loss of the lenis-aspirated VOT distinction, which may in turn influence supralaryngeal articulation.

MPEG-D USAC: Unified Speech and Audio Coding Technology (MPEG-D USAC: 통합 음성 오디오 부호화 기술)

  • Lee, Tae-Jin;Kang, Kyeong-Ok;Kim, Whan-Woo
    • The Journal of the Acoustical Society of Korea / v.28 no.7 / pp.589-598 / 2009
  • As mobile devices become multi-functional and converge into a single platform, there is a strong need for a codec that can provide consistent quality for both speech and music content. MPEG-D USAC standardization activities started at the 82nd MPEG meeting with a Call for Proposals (CfP), and WD3 was approved at the 88th MPEG meeting. MPEG-D USAC is a convergence of AMR-WB+ and HE-AAC V2 technologies. Specifically, USAC utilizes three core codecs (AAC, ACELP, and TCX) for the low-frequency region, SBR for the high-frequency region, and the MPEG Surround tool for stereo information. USAC can provide consistent sound quality for both speech and music content and can be applied to various applications such as multimedia downloads to mobile devices, digital radio, mobile TV, and audio books.
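
The sketch below illustrates, in deliberately toy form, the kind of per-frame core-codec routing such a unified coder performs: music-like frames go to the transform (AAC-style) core and speech-like frames to the LPC-domain (ACELP/TCX) cores. The classifier features and thresholds are invented for illustration; the real USAC signal classifier and mode decision are far more elaborate.

```python
import numpy as np

def classify_frame(frame):
    """Crude speech/music decision from two toy features (illustrative only)."""
    spectrum = np.abs(np.fft.rfft(frame))
    bins = np.arange(len(spectrum))
    centroid = (spectrum * bins).sum() / (spectrum.sum() + 1e-9)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2   # zero-crossing rate
    speech_like = zcr > 0.05 and centroid < 0.3 * len(spectrum)
    return "LPD" if speech_like else "FD"

def choose_core(frame):
    if classify_frame(frame) == "FD":
        return "AAC-style MDCT core"      # stationary / music-like content
    # Inside the LPD path, USAC further selects ACELP or TCX per
    # subframe; this sketch only names the branch.
    return "ACELP/TCX core"               # speech-like content

print(choose_core(np.random.randn(1024)))
```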

Performance Improvement of CPSP Based TDOA Estimation Using the Preemphasis (프리엠퍼시스를 이용한 CPSP 기반의 도달시간차이 추정 성능 개선)

  • Kwon, Hong-Seok;Bae, Keun-Sung
    • The Journal of the Acoustical Society of Korea / v.28 no.5 / pp.461-470 / 2009
  • We investigate and analyze the problems encountered in frame-based estimation of TDOA (Time Difference of Arrival) using the CPSP (Cross-Power Spectrum Phase) function. Spectral leakage caused by framing a speech signal with a rectangular window can make the estimated CPSP spectrum inaccurate. Framing with other windows to reduce the spectral leakage distorts the signal, because the weighting is not synchronized with the signal, particularly at both ends of the frame. These problems degrade the performance of CPSP-based TDOA estimation. In this paper, we propose to alleviate them by pre-emphasizing the speech signal: pre-emphasis reduces the dynamic range of the speech spectrum, which in turn reduces the influence of spectral leakage. To validate the proposed method, we carried out TDOA estimation experiments under various noise and reverberation conditions. Experimental results show that framing the pre-emphasized microphone outputs with a rectangular window achieves a higher TDOA estimation success rate than any other framing method.
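
Below is a minimal sketch of the pre-emphasis idea: a first-order high-pass filter y[n] = x[n] - a*x[n-1] flattens the spectral tilt of speech before framing, shrinking the spectrum's dynamic range. The coefficient 0.97 and the toy harmonic signal are conventional illustrative choices, not values taken from the paper.

```python
import numpy as np

def preemphasize(x, a=0.97):
    """First-order high-pass: y[n] = x[n] - a*x[n-1]."""
    y = np.copy(x)
    y[1:] -= a * x[:-1]
    return y

# Toy voiced-like signal whose harmonics roll off with frequency
sr, n = 8000, 512
t = np.arange(n) / sr
x = sum(np.sin(2 * np.pi * f * t) / f for f in (200, 400, 600, 800))

freqs = np.fft.rfftfreq(n, 1 / sr)
for sig, name in ((x, "raw"), (preemphasize(x), "pre-emphasized")):
    mag = 20 * np.log10(np.abs(np.fft.rfft(sig)) + 1e-12)
    lo = mag[np.argmin(np.abs(freqs - 200))]    # level near 200 Hz
    hi = mag[np.argmin(np.abs(freqs - 800))]    # level near 800 Hz
    print(f"{name}: 200->800 Hz spectral tilt = {hi - lo:+.1f} dB")
```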

Generalized cross correlation with phase transform sound source localization combined with steered response power method (조정 응답 파워 방법과 결합된 generalized cross correlation with phase transform 음원 위치 추정)

  • Kim, Young-Joon;Oh, Min-Jae;Lee, In-Sung
    • The Journal of the Acoustical Society of Korea / v.36 no.5 / pp.345-352 / 2017
  • We propose a method that reduces the direction estimation error for a sound source in reverberant and noisy environments. The proposed algorithm divides the speech signal into voiced and unvoiced frames using VAD (Voice Activity Detection), and the source direction is estimated only in voiced frames. In such a frame, the TDOA (Time-Difference of Arrival) between the microphone-array channels is estimated with the GCC-PHAT (Generalized Cross Correlation with Phase Transform) method. Then, to improve the accuracy of the source location, the cross-correlation peak at the estimated time delay is compared with the values at the other candidate delays in a time table. If, within a run of successive voiced frames, the angle of the current frame differs greatly from those of the preceding and following frames, it is replaced with the mean of the angles estimated in those frames.
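
For reference, a minimal GCC-PHAT delay estimator for one microphone pair is sketched below; the candidate-delay comparison and the angle-smoothing step described in the abstract are not reproduced, and the test signal is a toy example.

```python
import numpy as np

def gcc_phat(x1, x2, sr, max_tau=None):
    """Delay of x2 relative to x1 (seconds) via GCC-PHAT."""
    nfft = 2 * len(x1)                        # zero-pad to avoid circular wrap
    X1, X2 = np.fft.rfft(x1, nfft), np.fft.rfft(x2, nfft)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12            # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, nfft)
    max_shift = nfft // 2 if max_tau is None else int(sr * max_tau)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / sr

# Toy check: x2 lags x1 by 5 samples, so the estimate is ~5/sr seconds
sr = 16000
x1 = np.random.randn(1024)
x2 = np.roll(x1, 5)
print(gcc_phat(x1, x2, sr) * sr)              # ~5.0
```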

Source Localization Based on Independent Doublet Array (독립적인 센서쌍 배열에 기반한 음원 위치추정 기법)

  • Choi, Young Doo;Lee, Ho Jin;Yoon, Kyung Sik;Lee, Kyun Kyung
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.10 / pp.164-170 / 2014
  • A bearing and ranging method for a single near-field sound source, based on an independent doublet array, is proposed. Conventional bearing estimation methods use a uniform linear array or a uniform circular array, whose structure constrains the aperture available for estimating the distance to the sound source. Recently, a bearing and distance estimation method using an independent doublet array was proposed to obtain a wide aperture, but it is limited to the case where the doublets lie on a straight line. In this paper, we generalize this approach and estimate the location of a sound source for various array structures. The performance of the proposed algorithm was verified through simulation.
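
The sketch below shows the basic geometry such methods rest on: each doublet contributes a bearing line toward the source, and a near-field source position follows from a least-squares intersection of those lines. The doublet positions and noise-free bearings are invented for illustration; the paper's estimator is more general than this.

```python
import numpy as np

def intersect_bearings(positions, bearings_rad):
    """Least-squares intersection of bearing lines.
    positions: (M, 2) doublet centers; bearings_rad: (M,) bearing angles."""
    A, b = [], []
    for (px, py), th in zip(positions, bearings_rad):
        # A point (x, y) on the bearing line satisfies
        # -sin(th)*(x - px) + cos(th)*(y - py) = 0
        nvec = np.array([-np.sin(th), np.cos(th)])   # normal to the bearing
        A.append(nvec)
        b.append(nvec @ np.array([px, py]))
    sol, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return sol

# Two doublets at known positions observing a source at (3, 4)
src = np.array([3.0, 4.0])
pos = np.array([[0.0, 0.0], [6.0, 0.0]])
brg = [np.arctan2(src[1] - p[1], src[0] - p[0]) for p in pos]
print(intersect_bearings(pos, brg))   # ~[3, 4]
```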

A Phonetic Study of German (2) (독어음의 음성학적 고찰(2) - 현대독어의 복모음에 관하여 -)

  • Yun Jong-sun
    • MALSORI / no.19_20 / pp.33-42 / 1990
  • Those who are interested in the German diphthongs will find that they are classified into three kinds according to their gliding direction: closing, centring, and rising. The German [aI], for example, which derives from the [i:] of Middle High German, is regarded as a distinctive feature that distinguishes New High German from Middle High German. The diphthong [aI] is called a falling one, because the sonority of the sound diminishes as the articulation proceeds: the end part of the diphthong [aI] is less sonorous than the beginning part. In most of the German diphthongs the diminution of prominence is caused by the fact that the end part is inherently less sonorous than the beginning; this applies to the other closing and centring diphthongs. This way of diminishing sonority influences methods of constructing systems of phonetic notation. The above-mentioned less sonorous end part [I] shows that it differs from analogous sounds in other contexts. It is useful to demonstrate the occurrence of particular allophones by introducing special symbols to denote them (here: [aI] → [ae]). Forms of transcription embodying extra symbols are called narrow. But since strict adherence to the principle 'one sound, one symbol' would involve the introduction of a large number of symbols, it would render phonetic transcriptions cumbrous and difficult to read. A broad style of transcription provides one symbol for each phoneme of the language that is transcribed. Phonemic transcriptions are simple and unambiguous to everyone who knows the principles governing the use of allophones in the language transcribed. Among the German ways of transcribing diphthongs (aɪ, aʊ, ɔʏ; ae, ao, ɔø; ae, ao, ɔø), the phonemic (broad) transcription is generally to be recommended, for instance, in teaching the pronunciation of a foreign language, since it combines accuracy with the greatest measure of simplicity. (Some passages and terms are from Daniel Jones.)
