• Title/Summary/Keyword: Acoustic Feature

Knowledge-driven speech features for detection of Korean-speaking children with autism spectrum disorder

  • Seonwoo Lee;Eun Jung Yeo;Sunhee Kim;Minhwa Chung
    • Phonetics and Speech Sciences, v.15 no.2, pp.53-59, 2023
  • Detection of children with autism spectrum disorder (ASD) based on speech has relied on predefined feature sets because of their ease of use and the capabilities of speech analysis. However, clinical impressions may not be adequately captured because of the broad range and large number of features included. This paper demonstrates that knowledge-driven speech features (KDSFs), specifically tailored to the speech traits of ASD, are more effective and efficient than a predefined feature set, the extended Geneva Minimalistic Acoustic Standard Parameter Set (eGeMAPS), for distinguishing the speech of ASD children from that of children with typical development (TD). The KDSFs encompass speech characteristics related to frequency, voice quality, speech rate, and spectral features that have been identified as corresponding to distinctive speech attributes of ASD. The speech dataset used for the experiments consists of 63 ASD children and 9 TD children. To alleviate the imbalance in the number of training utterances, a data augmentation technique was applied to the TD children's utterances. The support vector machine (SVM) classifier trained with the KDSFs achieved an accuracy of 91.25%, surpassing the 88.08% obtained using the predefined set. This result underscores the importance of incorporating domain knowledge in the development of speech technologies for individuals with disorders.
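
The imbalance-handling step above (63 ASD vs. 9 TD speakers) can be sketched generically; the paper does not specify its augmentation technique here, so the noise-jittered oversampling below is only an illustrative stand-in, with invented feature dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_minority(X_minority, target_count, noise_std=0.01):
    """Oversample the minority class by adding jittered copies of
    existing feature vectors until it matches the majority count."""
    n, d = X_minority.shape
    needed = target_count - n
    idx = rng.integers(0, n, size=needed)
    noisy = X_minority[idx] + rng.normal(0.0, noise_std, size=(needed, d))
    return np.vstack([X_minority, noisy])

# 9 TD feature vectors brought up to the 63 of the ASD class
# (24-dimensional vectors are illustrative, not the KDSF dimensionality)
X_td = rng.normal(size=(9, 24))
X_td_aug = augment_minority(X_td, target_count=63)
print(X_td_aug.shape)  # (63, 24)
```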

The Effect of Auditory Condition on Voice Parameter of Orofacial Pain Patient (청각 환경이 구강안면 통증환자의 음성 파라미터에 미치는 영향)

  • Lee, Ju-Young;Baek, Kwang-Hyun;Hong, Jung-Pyo
    • Journal of Oral Medicine and Pain, v.30 no.4, pp.427-432, 2005
  • This study compared and analyzed voice parameters under a normal condition and auditory conditions (noise and music) for 29 orofacial pain patients and 31 normal subjects, to investigate the voice features of orofacial pain patients and their vocal variation across auditory conditions. 1. Compared to normal voices, orofacial pain patients showed lower and more unstable voice features, with a low F0 and high jitter and shimmer rates. 2. The voices of orofacial pain patients were more relaxed and stable, with lower F0 and shimmer rates, in the music condition than in the noise condition. 3. Normal subjects' voices showed no significant difference between the music and noise conditions, although F0 was higher under the noise condition. In conclusion, orofacial pain patients showed different voice features and different responses to external auditory conditions compared to normal subjects. Providing a positive emotional environment, such as music, could be considered to improve outcomes for orofacial pain patients' functional disability.
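
The jitter and shimmer rates discussed above are commonly defined as relative perturbations of consecutive pitch periods and peak amplitudes; a minimal sketch of these generic definitions (not the study's measurement tool, and the numbers below are invented):

```python
import numpy as np

def jitter_local(periods):
    """Local jitter (%): mean absolute difference between consecutive
    pitch periods, divided by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer_local(amplitudes):
    """Local shimmer (%): the same measure applied to peak amplitudes."""
    amps = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(amps))) / np.mean(amps)

# A perfectly periodic voice gives zero jitter; perturbation raises it
steady = [0.008, 0.008, 0.008, 0.008]       # pitch periods in seconds (125 Hz)
perturbed = [0.008, 0.0082, 0.0079, 0.0081]
print(jitter_local(steady))           # 0.0
print(jitter_local(perturbed) > 0.0)  # True
```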

Bird sounds classification by combining PNCC and robust Mel-log filter bank features (PNCC와 robust Mel-log filter bank 특징을 결합한 조류 울음소리 분류)

  • Badi, Alzahra;Ko, Kyungdeuk;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea, v.38 no.1, pp.39-46, 2019
  • In this paper, combining features is proposed as a way to enhance the classification accuracy of sounds under noisy environments using a CNN (Convolutional Neural Network) structure. A robust log Mel-filter bank using a Wiener filter and PNCCs (Power Normalized Cepstral Coefficients) are extracted to form a 2-dimensional feature that is used as input to the CNN. The eBird database is used to classify 43 bird species recorded in their natural environment. To evaluate the performance of the combined feature under noisy environments, the database is augmented with 3 types of noise at 4 different SNRs (Signal to Noise Ratios) (20 dB, 10 dB, 5 dB, 0 dB). The combined feature is compared to the log Mel-filter bank with and without the Wiener filter, and to the PNCCs alone. The combined feature outperforms the other features under clean conditions, with a 1.34% increase in overall average accuracy. Additionally, accuracy under noisy conditions at the 4 SNR levels increases by 1.06% and 0.65% for shop and schoolyard noise backgrounds, respectively.
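
Forming the 2-dimensional combined feature described above amounts to stacking the two streams along a channel axis before feeding the CNN; a minimal sketch with illustrative shapes (the paper's actual coefficient and frame counts are not given here):

```python
import numpy as np

def combine_features(pncc, log_mel):
    """Stack PNCC and robust log-Mel features along a channel axis,
    producing a (channels, coefficients, frames) tensor for CNN input.
    Both streams must share the same coefficient/frame grid."""
    assert pncc.shape == log_mel.shape
    return np.stack([pncc, log_mel], axis=0)

pncc = np.random.rand(40, 100)     # 40 coefficients x 100 frames (illustrative)
log_mel = np.random.rand(40, 100)
x = combine_features(pncc, log_mel)
print(x.shape)  # (2, 40, 100)
```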

Acoustic Characteristics of Gas-related Structures in the Upper Sedimentary Layer of the Ulleung Basin, East Sea (동해 울릉분지 퇴적층 상부에 존재하는 가스관련 퇴적구조의 음향 특성연구)

  • Park, Hyun-Tak;Yoo, Dong-Geun;Han, Hyuk-Soo;Lee, Jeong-Min;Park, Soo-Chul
    • Economic and Environmental Geology, v.45 no.5, pp.513-523, 2012
  • The upper sedimentary layer of the Ulleung Basin in the East Sea shows stacked mass-flow deposits: slide/slump deposits on the upper slope, debris-flow deposits on the middle and lower slope, and turbidites in the basin plain. Shallow gases or gas hydrates are also reported in many areas of the Ulleung Basin and are very important in terms of marine resources, environmental changes, and geohazards. This paper studies the acoustic characteristics and distribution patterns of gas-related structures, such as acoustic columns, enhanced reflectors, dome structures, pockmarks, and gas seepage, in the upper sedimentary layer by analysing high-resolution chirp profiles. An acoustic column shows a transparent pillar shape in the sedimentary layer and mainly occurs in the basin plain. An enhanced reflector is characterized by increased amplitude and extends laterally up to several tens of kilometers. A dome structure is characterized by an upward convex feature at the seabed and mainly occurs on the lower slope. Pockmarks show a small crater-like feature and usually occur on the middle and lower slope. Gas seepage is commonly found on the middle slope of the southern Ulleung Basin. These gas-related structures seem to be mainly caused by gas migration and escape within the sedimentary layer. Their distribution pattern indicates that their formation in the Ulleung Basin is controlled not only by the sedimentary facies of the upper sedimentary layer but also by gas-solubility changes depending on water depth. In particular, the chaotic and discontinuous sedimentary structures of debris-flow deposits are interpreted to facilitate gas migration, whereas the continuous sedimentary layers of turbidites restrict the vertical migration of gases.

A New Method for Segmenting Speech Signal by Frame Averaging Algorithm

  • Byambajav D.;Kang Chul-Ho
    • The Journal of the Acoustical Society of Korea, v.24 no.4E, pp.128-131, 2005
  • A new algorithm for speech signal segmentation is proposed. The algorithm finds successive similar frames belonging to a segment and represents the segment by an average spectrum. Speech is a slowly time-varying signal in the sense that, when examined over a sufficiently short period of time (between 10 and 100 ms), its characteristics are fairly stationary; the approach is based on finding these fairly stationary periods. Advantages of the algorithm are accurate segment-boundary decisions and simple computation. Automatic segmentations using frame averaging coincide with manually verified segmentations of the CMU ARCTIC corpus in as many as 82.20% of cases within a 16 ms time range, and more than 90% of segment boundaries coincide within a range of 32 ms. The method can also be combined with many types of automatic segmentation (HMM-based, acoustic-cue or feature-based, etc.).
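
The frame-averaging idea, growing a segment while new frames stay close to its running average spectrum, can be sketched as follows (a greedy variant with an assumed Euclidean distance and threshold; the paper's exact similarity measure is not given here):

```python
import numpy as np

def segment_by_frame_averaging(frames, threshold):
    """Greedy sketch: extend the current segment while a new frame stays
    close to the segment's running average spectrum; otherwise close the
    segment and start a new one. Returns boundaries and average spectra."""
    boundaries, averages = [0], []
    acc, count = frames[0].astype(float), 1
    for i in range(1, len(frames)):
        mean = acc / count
        if np.linalg.norm(frames[i] - mean) <= threshold:
            acc = acc + frames[i]
            count += 1
        else:
            averages.append(mean)
            boundaries.append(i)
            acc, count = frames[i].astype(float), 1
    averages.append(acc / count)
    return boundaries, averages

# Two artificial stationary stretches -> one internal boundary at frame 3
frames = np.array([[1.0, 1.0]] * 3 + [[5.0, 5.0]] * 3)
b, a = segment_by_frame_averaging(frames, threshold=0.5)
print(b)  # [0, 3]
```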

Implementation of HMM-Based Speech Recognizer Using TMS320C6711 DSP

  • Bae Hyojoon;Jung Sungyun;Bae Keunsung
    • MALSORI, no.52, pp.111-120, 2004
  • This paper focuses on the DSP implementation of an HMM-based speech recognizer that can handle a vocabulary of several hundred words with speaker independence. First, we develop an HMM-based speech recognition system on the PC that operates on a frame basis, with parallel processing of feature extraction and Viterbi decoding to keep the processing delay as small as possible. Techniques such as linear discriminant analysis, state-based Gaussian selection, and a phonetic tied-mixture model are employed to reduce the computational burden and memory size. The system is then optimized and compiled on the TMS320C6711 DSP for real-time operation. The implemented system uses 486 kbytes of memory for data and acoustic models, and 24.5 kbytes for program code. A maximum required time of 29.2 ms for processing a 32 ms frame of speech validates real-time operation of the implemented system.
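
The frame-synchronous Viterbi decoding at the core of such a recognizer can be sketched on a toy discrete HMM (all numbers invented; the actual system uses tied-mixture acoustic models, not a discrete emission table):

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """Viterbi decoding over an HMM: at each frame, extend every state's
    best partial path, keep back-pointers, then trace back the most
    likely state sequence."""
    n_states, T = len(log_pi), len(obs)
    delta = np.full((T, n_states), -np.inf)   # best log-score per state
    psi = np.zeros((T, n_states), dtype=int)  # back-pointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] + log_A[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] + log_B[j, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Toy 2-state, 2-symbol HMM: state 0 favors symbol 0, state 1 favors symbol 1
log_pi = np.log([0.6, 0.4])
log_A = np.log([[0.7, 0.3], [0.4, 0.6]])
log_B = np.log([[0.9, 0.1], [0.2, 0.8]])
print(viterbi(log_pi, log_A, log_B, [0, 0, 1, 1]))  # [0, 0, 1, 1]
```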

An Emotion Recognition and Expression Method using Facial Image and Speech Signal (음성 신호와 얼굴 표정을 이용한 감정인식 및 표현 기법)

  • Ju, Jong-Tae;Mun, Byeong-Hyeon;Seo, Sang-Uk;Jang, In-Hun;Sim, Gwi-Bo
    • Proceedings of the Korean Institute of Intelligent Systems Conference, 2007.04a, pp.333-336, 2007
  • In this paper, speech signals and facial images, the modalities most widely used in emotion recognition, are used to recognize four emotions (happiness, sadness, anger, surprise), and the recognition results from each modality are fused using a multi-modal technique. For emotion recognition from facial images, feature vectors are extracted using Principal Component Analysis (PCA); for speech signals, acoustic features excluding linguistic characteristics are used. The extracted features are each fed to a neural network to classify patterns by emotion, and the recognized results drive an emotion expression system.
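
The PCA feature extraction described above can be sketched with an SVD on vectorized face images (image count and pixel dimension below are invented; the paper does not state its image resolution here):

```python
import numpy as np

def pca_features(X, k):
    """Project centered data onto the top-k principal components
    (eigenface-style feature extraction via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # one k-dimensional feature vector per image

rng = np.random.default_rng(1)
images = rng.normal(size=(20, 64))   # 20 face images flattened to 64 pixels
feats = pca_features(images, k=5)
print(feats.shape)  # (20, 5)
```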

Design of a Korean Speech Recognition Platform (한국어 음성인식 플랫폼의 설계)

  • Kwon Oh-Wook;Kim Hoi-Rin;Yoo Changdong;Kim Bong-Wan;Lee Yong-Ju
    • MALSORI, no.51, pp.151-165, 2004
  • For educational and research purposes, a Korean speech recognition platform is designed. It is based on an object-oriented architecture and can be easily modified so that researchers can readily evaluate the performance of a recognition algorithm of interest. This platform will save development time for many who are interested in speech recognition. The platform includes the following modules: noise reduction, end-point detection, mel-frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP)-based feature extraction, hidden Markov model (HMM)-based acoustic modeling, n-gram language modeling, n-best search, and Korean language processing. The decoder of the platform can handle both lexical search trees for large-vocabulary speech recognition and finite-state networks for small-to-medium-vocabulary speech recognition. It performs a word-dependent n-best search algorithm with a bigram language model in the first forward search stage, then extracts a word lattice and rescores each lattice path with a trigram language model in the second stage.
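
The two-stage search, a bigram-scored first pass followed by trigram rescoring of the surviving hypotheses, can be illustrated on a toy n-best list (all words, probabilities, and acoustic scores below are invented for the example; a real decoder operates over lattices, not flat lists):

```python
import math

# First-pass n-best hypotheses with acoustic log-scores (invented)
nbest = [(["a", "c"], -1.0), (["b", "c"], -1.2)]

bigram = {("<s>", "a"): 0.6, ("<s>", "b"): 0.4,
          ("a", "c"): 0.5, ("b", "c"): 0.9}
trigram = {("<s>", "<s>", "a"): 0.6, ("<s>", "<s>", "b"): 0.4,
           ("<s>", "a", "c"): 0.1, ("<s>", "b", "c"): 0.9}

def lm_score(words, model, order):
    """Sum of log n-gram probabilities with start-symbol padding."""
    ctx = ["<s>"] * (order - 1) + words
    return sum(math.log(model[tuple(ctx[i:i + order])])
               for i in range(len(words)))

# First pass ranks by acoustic + bigram; second pass reranks with trigram
first = sorted(nbest, key=lambda h: h[1] + lm_score(h[0], bigram, 2), reverse=True)
second = sorted(nbest, key=lambda h: h[1] + lm_score(h[0], trigram, 3), reverse=True)
print(first[0][0], second[0][0])  # rescoring flips the winner here
```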

A Study on Formants of Vowels for Speaker Recognition (화자 인식을 위한 모음의 포만트 연구)

  • Ahn Byoung-seob;Shin Jiyoung;Kang Sunmee
    • MALSORI, no.51, pp.1-16, 2004
  • The aim of this paper is to analyze vowels in voice imitation and disguised voice, and to find invariant phonetic features of the speaker. We examined the formants of the monophthongs /a, u, i, o, ɯ, ɛ, ʌ/. The results of the present study are as follows: ① Speakers change their vocal tract features. ② The vowels /a, ɛ, i/ appear to be suitable for speaker recognition since they show invariant acoustic features during voice modulation. ③ F1 does not change easily compared to the higher formants. ④ F3-F2 appears to be a useful parameter for speaker identification in the vowels /a/ and /ɛ/, and F4-F2 in the vowel /i/. ⑤ According to the F-ratio results, differences between formants were more useful for speaker recognition than the individual formants of a vowel.
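
The F-ratio used in such studies compares between-speaker variance to within-speaker variance of a measure such as F3-F2; a minimal sketch with invented formant values (not data from the paper):

```python
import numpy as np

def f_ratio(groups):
    """F-ratio sketch: variance of per-speaker means divided by the mean
    within-speaker variance; higher values mean the measure separates
    speakers better."""
    means = np.array([np.mean(g) for g in groups])
    within = np.mean([np.var(g) for g in groups])
    return np.var(means) / within

# F3 - F2 (Hz) for vowel /a/, three speakers, repeated tokens (invented)
spk1 = [900, 920, 910]
spk2 = [1200, 1210, 1190]
spk3 = [700, 690, 710]
print(f_ratio([spk1, spk2, spk3]) > 1.0)  # True for well-separated speakers
```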

In-Process Monitoring of Chatter Vibration Using Neural Networks (신경회로망을 이용한 채터진동의 인프로세스 감시)

  • Park, Chul;Kang, Myung-Chang;Kim, Jung-Suk
    • Proceedings of the Korean Society of Precision Engineering Conference, 1993.10a, pp.70-75, 1993
  • Chatter vibration is an unwanted phenomenon in metal cutting that always affects surface finish, tool life, machine life, and the productivity of the machining process. In-process monitoring and control of chatter vibration is necessarily required for an automated system. In this study, we constructed a multi-sensing system using a tool dynamometer, an accelerometer, and an AE (Acoustic Emission) sensor for reliable detection of chatter vibration, and we propose a new approach using a neural network to process the features of the multiple sensors for the recognition of chatter vibration in turning operations. Through back-propagation training, the neural network memorizes and classifies the feature differences of the multi-sensor signals.
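
As a stand-in for the back-propagation network above, a minimal gradient-descent classifier over invented multi-sensor features shows the same in-process decision shape (feature values, learning rate, and epochs are all illustrative, not the paper's setup):

```python
import numpy as np

def train_logistic(X, y, lr=0.5, epochs=500):
    """Single-layer gradient-descent classifier: multi-sensor feature
    vectors in, a chatter / no-chatter probability out."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid activation
        grad = p - y                             # error signal
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

# Rows: [cutting-force RMS, acceleration RMS, AE energy] (invented values)
X = np.array([[0.1, 0.2, 0.1], [0.2, 0.1, 0.2],   # stable cutting
              [0.9, 0.8, 0.9], [0.8, 0.9, 0.8]])  # chatter
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = train_logistic(X, y)
pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
print(pred.tolist())  # [0, 0, 1, 1]
```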
