• Title/Summary/Keyword: Acoustic Feature


Identification and Detection of Emotion Using Probabilistic Output SVM (확률출력 SVM을 이용한 감정식별 및 감정검출)

  • Cho, Hoon-Young;Jung, Gue-Jun
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.8
    • /
    • pp.375-382
    • /
    • 2006
  • This paper addresses how to identify emotional information and how to detect a specific emotion from speech signals. For the emotion identification and detection task, we use long-term acoustic feature parameters and select the optimal parameters with a feature selection technique based on the F-score. We transform the conventional SVM into a probabilistic-output SVM for our emotion identification and detection system. We propose three approximation methods for the log-likelihoods in a hypothesis test and compare their performance. Experimental results on the SUSAS database showed the effectiveness of both the feature selection and the probabilistic-output SVM in the emotion identification task. The proposed methods detected the anger emotion with 91.3% correctness.
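A minimal sketch of the pipeline this abstract describes, assuming scikit-learn and hypothetical arrays X (long-term acoustic features per utterance) and y (emotion labels); the F-score ranking is approximated here by the ANOVA F-statistic, and the probabilistic output comes from Platt scaling rather than the paper's exact transformation:

```python
# Hypothetical sketch: F-score-style feature selection + probabilistic-output SVM.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def build_emotion_classifier(n_features_to_keep=20):
    # Keep the features with the highest ANOVA F-score, then fit an RBF SVM
    # whose decision values are mapped to class posteriors (Platt scaling).
    return make_pipeline(
        SelectKBest(score_func=f_classif, k=n_features_to_keep),
        SVC(kernel="rbf", probability=True),
    )

# Usage with placeholder data (X_train, y_train, X_test are assumptions):
# clf = build_emotion_classifier().fit(X_train, y_train)
# posteriors = clf.predict_proba(X_test)          # per-class probabilities
# anger = posteriors[:, list(clf.classes_).index("anger")]
# log_odds = np.log(anger) - np.log(1.0 - anger)  # crude log-likelihood proxy for detection
```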

Electromyographic evidence for a gestural-overlap analysis of vowel devoicing in Korean

  • Jun, Sun-A;Beckman, M.;Niimi, Seiji;Tiede, Mark
    • Speech Sciences
    • /
    • v.1
    • /
    • pp.153-200
    • /
    • 1997
  • In languages such as Japanese, it is very common to observe that short peripheral vowels become completely voiceless when surrounded by voiceless consonants. The same phenomenon has been reported in Montreal French, Shanghai Chinese, Greek, and Korean. Traditionally it has been described as a phonological rule that either categorically deletes the vowel or changes the [+voice] feature of the vowel to [-voice]. This analysis was supported by the observation of Sawashima (1971) and Hirose (1971) that there are two distinct EMG patterns for voiced and devoiced vowels in Japanese. Close examination of the phonetic evidence based on acoustic data, however, shows that these phonological characterizations are not tenable (Jun & Beckman 1993, 1994). In this paper, we examine the vowel devoicing phenomenon in Korean using EMG, fiberscopic, and acoustic recordings of 100 sentences produced by one Korean speaker. The results show that there is variability in the 'degree of devoicing' in both the acoustic and EMG signals, and in the patterns of glottal closing and opening across different devoiced tokens. There seems to be no categorical difference between devoiced and voiced tokens, for either EMG activity events or glottal patterns. All of these observations support the notion that vowel devoicing in Korean cannot be described as the result of the application of a phonological rule. Rather, devoicing seems to be a highly variable 'phonetic' process, a more or less subtle variation in the specification of such phonetic parameters as the degree and timing of glottal opening, or of the associated subglottal pressure or intra-oral airflow associated with concurrent tone and stricture specifications. Some of the token-pair comparisons are amenable to an explanation in terms of gestural overlap and undershoot. However, the effect of gestural timing on vocal fold state seems to be a highly nonlinear function of the interaction among specifications for the relative timing of glottal adduction and abduction gestures, the amplitudes of the overlapped gestures, the aerodynamic conditions created by concurrent oral and tonal gestures, and so on. In summary, to understand devoicing, it will be necessary to examine its effect on the phonetic representation of events in many parts of the vocal tract, and at many stages of the speech chain between the motor intent and the acoustic signal that reaches the hearer's ear.


Study on the Acoustic Behaviour Pattern of Fish Shoal and Species Identification 1. Shoal Behaviour Pattern of Anchovy (Engraulis japonicus) in Korean Waters and Species Identification Test (어군의 음향학적 형태 및 분포특성과 어종식별에 관한 연구 1.한국 연근해 멸치어군의 형태 및 분포특성과 종식별 실험)

  • 김장근
    • Journal of the Korean Society of Fisheries and Ocean Technology
    • /
    • v.34 no.1
    • /
    • pp.52-61
    • /
    • 1998
  • We studied the behaviour patterns of anchovy (Engraulis japonicus) shoals by a shoal echo integration method and tested species identification with an artificial neural network, using acoustic data collected in the East China Sea in March 1994 and in the southern coastal waters of the East Sea of Korea in March 1995. The frequency distributions of 10 shoal descriptors differed between the areas, showing characteristics of shoal behaviour in size, bathymetric position, and acoustic strength. The range and mean of the shoal size distribution in length and height were wider and larger in the southern coastal waters of the East Sea than in the East China Sea, and relative shoal size also differed between the two areas. The fractal dimension of the shoals was almost the same in both areas. The mean volume reverberation index of the shoals was 3 dB higher in the southern coastal waters of the East Sea than in the East China Sea. The depth layer of shoal distribution was related to bottom depth in the southern coastal waters of the East Sea, while it lay between the near surface and the central layer in the East China Sea. Principal component analysis of the shoal descriptors showed a correlation between shoal size and acoustic strength that was higher in the southern coastal waters of the East Sea than in the East China Sea. Correlation was also found among the bathymetric positions of the shoals, again somewhat higher in the southern coastal waters of the East Sea than in the East China Sea. The anchovy shoals of the two areas were identified by an artificial neural network. The contribution factor index (Cio) of the shoal descriptors was almost identical between the two areas. The shoal volume reverberation index (Rv) showed the highest contribution to species identification, while shoal length and shoal height showed relatively high negative contributions.
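A minimal sketch of the neural-network identification step, assuming scikit-learn; the descriptor matrix X (one row per shoal, columns for length, height, depth, volume reverberation index, and the other descriptors) and the labels y are hypothetical placeholders:

```python
# Hypothetical sketch: feed-forward network classifying shoals from echo-integration descriptors.
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

shoal_net = make_pipeline(
    StandardScaler(),                      # descriptors (m, dB, depth) sit on very different scales
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
# shoal_net.fit(X_train, y_train)          # y_train: area/species label per shoal
# print(shoal_net.score(X_test, y_test))   # identification rate
```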


Segmentation of Continuous Speech based on PCA of Feature Vectors (주요고유성분분석을 이용한 연속음성의 세그멘테이션)

  • 신옥근
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.2
    • /
    • pp.40-45
    • /
    • 2000
  • In speech corpus generation and speech recognition, it is sometimes necessary to segment the input speech data without any prior knowledge. One way to accomplish this kind of segmentation, often called blind segmentation or acoustic segmentation, is to find boundaries that minimize the Euclidean distances among the feature vectors within each segment. However, using this metric alone is prone to errors because of fluctuations or variations of the feature vectors within a segment. In this paper, we introduce principal component analysis to take the trend of the feature vectors into consideration, so that the proposed distance measure becomes the distance between the feature vectors and their projections onto the principal components. The proposed distance measure is applied in the LBDP (level building dynamic programming) algorithm in a continuous speech segmentation experiment. The results were promising, yielding a 3-6% reduction in deletion rate compared to the pure Euclidean measure.
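A minimal numpy sketch of the distance measure described above: each feature vector in a candidate segment is projected onto the segment's leading principal components, and the residual (vector-to-projection) distance replaces the plain Euclidean distance to the segment mean; the LBDP search itself is omitted:

```python
# Sketch: residual distance to the principal-component subspace of a candidate segment.
import numpy as np

def pca_residual_distance(segment, n_components=1):
    """segment: (n_frames, n_dims) feature vectors (e.g., cepstra) of one candidate segment."""
    centered = segment - segment.mean(axis=0)
    # Principal directions of this segment are the right singular vectors.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                     # (n_components, n_dims)
    projected = centered @ basis.T @ basis        # projection onto the principal subspace
    residual = centered - projected
    return float(np.sum(residual ** 2))           # accumulated distance for this segment
```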


A STUDY ON THE IMPLEMENTATION OF ARTIFICIAL NEURAL NET MODELS WITH FEATURE SET INPUT FOR RECOGNITION OF KOREAN PLOSIVE CONSONANTS (한국어 파열음 인식을 위한 피쳐 셉 입력 인공 신경망 모델에 관한 연구)

  • Kim, Ki-Seok;Kim, In-Bum;Hwang, Hee-Yeung
    • Proceedings of the KIEE Conference
    • /
    • 1990.07a
    • /
    • pp.535-538
    • /
    • 1990
  • The main problem in speech recognition is the enormous variability in acoustic signals due to complex but predictable contextual effects. Especially for plosive consonants it is very difficult to find invariant cues because of these contextual effects, yet humans use them as helpful information in plosive consonant recognition. In this paper we experimented with three artificial neural net models for the recognition of plosive consonants. Neural Net Model I used a multi-layer perceptron. Model II used a variation of the self-organizing feature map model. Model III used an interactive and competitive model to examine contextual effects. The recognition experiment was performed on 9 Korean plosive consonants, using VCV speech chains to study the contextual effects. Each speech chain consists of one of the Korean plosive consonants /g, d, b, K, T, P, k, t, p/ (/ㄱ, ㄷ, ㅂ, ㄲ, ㄸ, ㅃ, ㅋ, ㅌ, ㅍ/) and eight Korean monophthongs. The inputs to the neural net models were several temporal cues extracted from the acoustic signals - the durations of the silence, the transition, and the VOT - together with the extent of the VC formant transitions, the presence of voicing energy during closure, burst intensity, presence of aspiration, the amount of low-frequency energy present at voicing onset, and the extent of the CV formant transitions. Model I achieved about 55-67%, Model II about 60%, and Model III about 67% recognition rate.
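A minimal sketch of Model I under stated assumptions: the hand-crafted cue vector layout in the comment is illustrative (not the paper's exact feature set), and scikit-learn's MLPClassifier stands in for the original multi-layer perceptron:

```python
# Hypothetical sketch: MLP over acoustic cue features extracted from a VCV chain.
from sklearn.neural_network import MLPClassifier

# Illustrative cue vector per token:
# [silence_dur, transition_dur, vot, vc_f2_extent, cv_f2_extent,
#  voicing_during_closure, burst_intensity, aspiration, low_freq_energy_at_onset]
plosive_mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=3000, random_state=0)
# plosive_mlp.fit(X_train, y_train)           # y_train in {g, d, b, kk, tt, pp, k, t, p}
# print(plosive_mlp.score(X_test, y_test))
```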


Analysis of Feature Extraction Methods for Distinguishing the Speech of Cleft Palate Patients (구개열 환자 발음 판별을 위한 특징 추출 방법 분석)

  • Kim, Sung Min;Kim, Wooil;Kwon, Tack-Kyun;Sung, Myung-Whun;Sung, Mee Young
    • Journal of KIISE
    • /
    • v.42 no.11
    • /
    • pp.1372-1379
    • /
    • 2015
  • This paper presents an analysis of feature extraction methods used for distinguishing the speech of patients with cleft palates from that of people with normal palates. This research is a basic study toward the development of a software system for automatic recognition and restoration of speech disorders, in pursuit of improving the welfare of speech-disabled persons. Monosyllable voice data for the experiments were collected for three groups: normal speech, cleft palate speech, and simulated cleft palate speech. The data consist of 14 basic Korean consonants, 5 complex consonants, and 7 vowels. Feature extraction is performed using three well-known methods: LPC, MFCC, and PLP. The pattern recognition process is executed using a GMM acoustic model. From our experiments, we concluded that the MFCC method is generally the most effective way to identify speech distortions. These results may contribute to the automatic detection and correction of the distorted speech of cleft palate patients, along with the development of an identification tool for levels of speech distortion.
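A minimal sketch of the best-performing combination in the abstract (MFCC features scored against per-group GMMs), assuming librosa and scikit-learn; the file names and frame pools are placeholders:

```python
# Hypothetical sketch: MFCC extraction + per-group GMM scoring.
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def mfcc_frames(wav_path, sr=16000, n_mfcc=13):
    y, _ = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (n_frames, n_mfcc)

# Train one GMM per group on pooled frames, then label a test syllable by the
# group whose GMM gives the higher average frame log-likelihood.
# gmm_normal = GaussianMixture(n_components=8, random_state=0).fit(np.vstack(normal_frame_pool))
# gmm_cleft  = GaussianMixture(n_components=8, random_state=0).fit(np.vstack(cleft_frame_pool))
# frames = mfcc_frames("test_syllable.wav")
# label = "cleft" if gmm_cleft.score(frames) > gmm_normal.score(frames) else "normal"
```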

Linear prediction analysis-based method for detecting snapping shrimp noise (선형 예측 분석 기반의 딱총 새우 잡음 검출 기법)

  • Jinuk Park;Jungpyo Hong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.3
    • /
    • pp.262-269
    • /
    • 2023
  • In this paper, we propose a Linear Prediction (LP) analysis-based feature for detecting Snapping Shrimp (SS) Noise (SSN) in underwater acoustic data. SS is a species that creates high-amplitude signals in shallow, warm waters, and its frequent and loud sound is a major source of noise. The proposed feature exploits the sudden, rapidly decaying character of SSN, using LP analysis to detect the exact noise interval and reduce the effects of SSN. At such impulsive intervals the error between the predicted and measured values becomes large, which enables effective SSN detection. To further improve performance, a constant false alarm rate detector is incorporated with the proposed feature. Our evaluation shows that the proposed methods outperform the state-of-the-art MultiLayer-Wavelet Packet Decomposition (ML-WPD) in terms of the receiver operating characteristic curve and the Area Under the Curve (AUC), with the LP analysis-based feature achieving an AUC higher by 0.12 on average at lower computational complexity.
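A minimal sketch of the core idea, assuming librosa (for the LP coefficients) and scipy; the fixed median-based threshold below is only a stand-in for the paper's constant false alarm rate detector:

```python
# Sketch: LP residual as an impulsive-noise feature for snapping shrimp detection.
import numpy as np
import librosa
from scipy.signal import lfilter

def lp_residual(x, order=12):
    a = librosa.lpc(x, order=order)      # prediction-error filter [1, a1, ..., ap]
    return lfilter(a, [1.0], x)          # e[n] = x[n] - predicted x[n]

def detect_ssn(x, order=12, scale=5.0):
    e = np.abs(lp_residual(x, order=order))
    threshold = scale * np.median(e)     # crude global threshold (the paper uses CFAR)
    return e > threshold                 # boolean mask of candidate SSN samples
```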

Tool Condition Monitoring using AE Signal in Micro Endmilling (마이크로 엔드밀링에서 AE 신호를 이용한 공구상태 감시)

  • Kang Ik Soo;Jeong Yun Sik;Kwon Dong Hee;Kim Jeon Ha;Kim Jeong Suk;Ahn Jung Hwan
    • Journal of the Korean Society for Precision Engineering
    • /
    • v.23 no.1 s.178
    • /
    • pp.64-71
    • /
    • 2006
  • Ultraprecision machining and MEMS technology have taken an increasingly important position in the machining of microparts. Micro endmilling is one of the prominent technologies, with a wide spectrum of applications ranging from macro parts to micro products. Micro-grooving with a micro endmill is also widely used owing to its many merits, but it suffers from problems with the precision and quality of products due to tool wear and tool fracture. This investigation deals with tool condition monitoring using the acoustic emission (AE) signal in micro-grooving. A characteristic evaluation of the raw AE signal, AE hits, and frequency analysis for condition monitoring is presented. Feature extraction from the AE signal directly related to the machining process is also carried out. The distinctive micro endmill states corresponding to each tool condition are then classified by the fuzzy C-means algorithm.
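A minimal numpy sketch of fuzzy C-means, the clustering step named in the abstract; the AE feature names in the comment are illustrative, not the paper's exact set:

```python
# Sketch: fuzzy C-means over AE feature vectors (e.g., hit rate, RMS, dominant frequency).
import numpy as np

def fuzzy_c_means(X, n_clusters=3, m=2.0, n_iter=100, seed=0):
    """X: (n_samples, n_features) AE features; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    u = rng.random((X.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)                  # memberships sum to 1 per sample
    for _ in range(n_iter):
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]   # fuzzily weighted cluster centers
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        u = 1.0 / d ** (2.0 / (m - 1.0))               # closer centers get higher membership
        u /= u.sum(axis=1, keepdims=True)
    return centers, u
```

Each cluster can then be read as a tool state (for example normal cutting, wear, or fracture) by inspecting its center.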

Early warning of hazard for pipelines by acoustic recognition using principal component analysis and one-class support vector machines

  • Wan, Chunfeng;Mita, Akira
    • Smart Structures and Systems
    • /
    • v.6 no.4
    • /
    • pp.405-421
    • /
    • 2010
  • This paper proposes a method for early warning of hazards for pipelines. Many pipelines transport dangerous contents, so any damage incurred might lead to catastrophic consequences. Most such damage, however, results from surrounding third-party activities, mainly construction. In order to prevent accidents and disasters, detection of potential hazards from third-party activities is indispensable. This paper focuses on recognizing the running of construction machines, because their operation indicates construction activity. Acoustic information is used for the recognition, and a novel pipeline monitoring approach is proposed. Principal Component Analysis (PCA) is applied, and the obtained eigenvalues are regarded as the signature and used to build feature vectors. A one-class Support Vector Machine (SVM) is used as the classifier. The denoising ability of PCA makes the approach robust to noise interference, while the powerful classifying ability of the SVM provides good recognition results. Some related issues such as standardization are also studied and discussed. On-site experiments were conducted, and the results prove the effectiveness of the proposed early warning method. Thus possible hazards can be prevented and the integrity of pipelines ensured.
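A minimal sketch of the recognition chain, assuming scikit-learn; the windowing, frame shapes, and normalization choice are assumptions rather than the paper's exact settings:

```python
# Hypothetical sketch: PCA-eigenvalue signatures + one-class SVM for machine-sound detection.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM

def eigenvalue_signature(frames, n_components=8):
    """frames: (n_frames, frame_len) acoustic snapshots from one monitoring window."""
    pca = PCA(n_components=n_components).fit(frames)
    sig = pca.explained_variance_          # eigenvalues of the frame covariance matrix
    return sig / np.linalg.norm(sig)       # simple standardization of the signature

# Train only on construction-machine sound, then flag new windows that match it:
# detector = OneClassSVM(kernel="rbf", nu=0.05)
# detector.fit(np.vstack([eigenvalue_signature(w) for w in machine_windows]))
# alarm = detector.predict(eigenvalue_signature(new_window).reshape(1, -1))[0] == 1
```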

Implementation of HMM Based Speech Recognizer with Medium Vocabulary Size Using TMS320C6201 DSP (TMS320C6201 DSP를 이용한 HMM 기반의 음성인식기 구현)

  • Jung, Sung-Yun;Son, Jong-Mok;Bae, Keun-Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.1E
    • /
    • pp.20-24
    • /
    • 2006
  • In this paper, we focus on the real-time implementation of a speech recognition system with a medium-sized vocabulary, considering its application to mobile phones. First, we developed a PC-based variable-vocabulary word recognizer with the program memory and total acoustic model size kept as small as possible. To reduce the memory footprint of the acoustic models, linear discriminant analysis and phonetic tied mixtures were applied in the feature selection process and in training the HMMs, respectively. In addition, a state-based Gaussian selection method with real-time cepstral normalization was used to reduce the computational load and provide robust recognition. We then verified the real-time operation of the implemented recognition system on the TMS320C6201 EVM board. The implemented system uses about 610 kbytes of memory, including both program and data memory. The recognition rate was 95.86% for the ETRI 445DB, and 96.4%, 97.92%, and 87.04% for three kinds of name databases collected through mobile phones.
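A minimal sketch of the real-time cepstral normalization mentioned above, implemented as a recursive running-mean subtraction; the smoothing factor is an illustrative choice, not the paper's value:

```python
# Sketch: online (recursive) cepstral mean normalization for a streaming MFCC front end.
import numpy as np

def online_cmn(cepstra, alpha=0.995):
    """cepstra: (n_frames, n_coeffs) cepstral stream; subtract a running mean frame by frame."""
    mean = cepstra[0].copy()
    out = np.empty_like(cepstra)
    for t, frame in enumerate(cepstra):
        mean = alpha * mean + (1.0 - alpha) * frame    # exponentially weighted mean estimate
        out[t] = frame - mean
    return out
```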