• Title/Summary/Keyword: acoustic features

Performance Improvement of Continuous Digits Speech Recognition Using the Transformed Successive State Splitting and Demi-syllable Pair (반음절쌍과 변형된 연쇄 상태 분할을 이용한 연속 숫자 음 인식의 성능 향상)

  • Seo Eun-Kyoung;Choi Gab-Keun;Kim Soon-Hyob;Lee Soo-Jeong
    • Journal of Korea Multimedia Society / v.9 no.1 / pp.23-32 / 2006
  • This paper describes the optimization of a language model and an acoustic model to improve speech recognition of Korean unit digits. Since the model is composed of a finite state network (FSN) with disyllables, recognition errors of the language model were reduced by analyzing the grammatical features of Korean unit digits. The acoustic models use demi-syllable pairs to reduce recognition errors caused by inaccurate division of a phone or monosyllable due to short pronunciation time and articulation. We used the K-means clustering algorithm with transformed successive state splitting at the feature level to model the features of the recognition unit efficiently. In experiments, the proposed language model raised the recognition rate by 10.5%, the acoustic model with demi-syllable pairs raised it by 12.5%, and transformed successive state splitting raised it by 1.5%.
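
The clustering step mentioned in this abstract can be illustrated with plain K-means. This is only a sketch under stated assumptions: the abstract combines K-means with transformed successive state splitting, which is not reproduced here, and the two-dimensional `frames` below are made-up stand-ins for acoustic feature vectors.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain K-means on a list of equal-length feature vectors."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            [sum(dim) / len(c) for dim in zip(*c)] if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical feature frames forming two well-separated groups.
frames = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centroids, clusters = kmeans(frames, k=2)
print(sorted(centroids))
```

In a recognizer, each centroid would stand for one cluster of similar frames; how the clusters map onto model states is specific to the paper and is not modeled here.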

Estimation of the zone of excavation disturbance around tunnels, using resistivity and acoustic tomography

  • Suzuki Koichi;Nakata Eiji;Minami Masayuki;Hibino Etsuhisa;Tani Tomonori;Sakakibara Jyunichi;Yamada Naouki
    • Geophysics and Geophysical Exploration / v.7 no.1 / pp.62-69 / 2004
  • The objective of this study is to estimate the distribution of a zone disturbed by excavation (EDZ) around tunnels that have been excavated at about 500 m depth in pre-Tertiary hard sedimentary rock. One of the most important tasks is to evaluate changes in the dynamic stability and permeability of the rock around the tunnels, by investigating the properties of the rock after the excavation. We performed resistivity and acoustic tomography using two boreholes, 5 m in length, drilled horizontally from the wall of a tunnel in pre-Tertiary hard conglomerate. By these methods, we detected a low-resistivity and low-velocity zone 1 m in thickness around the wall of the tunnel. The resulting profiles were verified by permeability and evaporation tests performed at the same boreholes. This anomalous zone matched a high-permeability zone caused by open fractures. Next, we performed resistivity monitoring along annular survey lines in a tunnel excavated in pre-Tertiary hard shale by a tunnel-boring machine (TBM). We detected anomalous zones in 2D resistivity profiles surrounding the tunnel. A low-resistivity zone 1 m in thickness was detected around the tunnel one year after the excavation. However, two years later, the resistivity around the tunnel had increased in a portion of this zone about 30 cm thick. To investigate this change, we studied the relationship between groundwater flow from the surroundings and evaporation from the wall around the tunnel. These features were verified by the relationship between the resistivity and porosity of rocks obtained by laboratory tests on core samples. Furthermore, the profiles matched well with highly permeable zones detected by permeability and evaporation tests at a horizontal borehole drilled near the survey line. We conclude that the anomalous zones in these profiles indicate the EDZ around the tunnel.

Temporal attention based animal sound classification (시간 축 주의집중 기반 동물 울음소리 분류)

  • Kim, Jungmin;Lee, Younglo;Kim, Donghyeon;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.39 no.5 / pp.406-413 / 2020
  • In this paper, to improve the classification accuracy of bird and amphibian sounds, we use a GLU (Gated Linear Unit) and self-attention, which encourage the network to extract important features from the data and to pick out the relevant frames from the whole input sequence. To use the acoustic data, we convert the 1-D signal to a log-Mel spectrogram. Undesirable components in the log-Mel spectrogram, such as background noise, are then reduced by the GLU. Next, we apply the proposed temporal self-attention to improve classification accuracy. The data consist of 6 species of birds and 8 species of amphibians, including endangered species, in their natural environment. The proposed method achieves an accuracy of 91 % on the bird data and 93 % on the amphibian data, an improvement of about 6 % to 7 % over existing algorithms.
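
The two building blocks named in this abstract, GLU gating and attention over time frames, can be sketched in a few lines. This is a minimal illustration, not the authors' network. In a trained model the gate inputs and attention scores come from learned layers; here `gates` and `scores` are supplied by hand as hypothetical values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def glu(values, gates):
    """Gated linear unit: each value is scaled by the sigmoid of its gate."""
    return [v * sigmoid(g) for v, g in zip(values, gates)]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frames, scores):
    """Weighted average of frame vectors, with softmax weights over time."""
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


# A strongly negative gate suppresses a bin, a strongly positive gate passes it.
gated = glu([2.0, 2.0], [-10.0, 10.0])

# Three hypothetical frames; the third gets the highest attention score.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = temporal_attention_pool(frames, scores=[0.0, 0.0, 2.0])
print(gated, weights, pooled)
```

The gate lets the network attenuate spectrogram bins it treats as noise, while the attention weights decide how much each frame contributes to the clip-level representation.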

Two Dimensional Numerical Study in Gangway of Next Generation High Speed Train For Reduction of Aero-acoustic Noise (차세대 고속전철 차량연결부의 저소음 형상설계를 위한 차량연결부의 2차원적 수치해석 연구)

  • Kang, Hyung-Min;Kim, Cheol-Wan;Cho, Tae-Hwan;Jeon, Wan-Ho;Yun, Su-Hwan;Kwon, Hyeok-Bin;Park, Chun-Su
    • Journal of the Korean Society for Railway / v.14 no.4 / pp.327-332 / 2011
  • As preliminary research for the design of the gangway of the next-generation high-speed train, the aero-acoustic noise at the gangway is calculated. For this purpose, the gangway with mud flaps is modeled as a two-dimensional cavity. Five gap sizes between the mud flaps are then selected, and a parametric study is performed over them. From this study, aerodynamic features such as vortex shedding and pressure are computed. The aero-acoustic properties of tonal noise and overall noise are also analyzed at three microphone locations, and the relation between the mud-flap gap size and the noise level is assessed. The study shows that the noise characteristics of the base and specific models are better than those of the other models.

Analysis and Classification of Acoustic Emission Signals During Wood Drying Using the Principal Component Analysis (주성분 분석을 이용한 목재 건조 중 발생하는 음향방출 신호의 해석 및 분류)

  • Kang, Ho-Yang;Kim, Ki-Bok
    • Journal of the Korean Society for Nondestructive Testing / v.23 no.3 / pp.254-262 / 2003
  • In this study, acoustic emission (AE) signals due to surface cracking and moisture movement in flat-sawn boards of oak (Quercus variabilis) during drying under ambient conditions were analyzed and classified using principal component analysis. The AE signals corresponding to surface cracking showed higher peak amplitude and peak frequency, and shorter rise time, than those corresponding to moisture movement. To reduce the multicollinearity among AE features and to extract the significant AE parameters, correlation analysis was performed. Over 99% of the variance of the AE parameters could be accounted for by the first four principal components. The classification feasibility and success rate were investigated for two statistical classifiers, one with six independent variables (AE parameters) and one with six principal components. The classifier based on AE parameters showed a success rate of 70.0%. The classifier based on principal components showed a success rate of 87.5%, which was considerably higher than that of the classifier based on AE parameters.
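
The statement that a few principal components account for most of the variance can be illustrated with a small explained-variance calculation. This is a sketch under stated assumptions. The measurements in `rows` are invented, only three features are used rather than six, and the features are not standardized. The eigenvalues are estimated with power iteration and deflation, a swapped-in technique chosen to keep the example dependency-free.

```python
import math
import random


def covariance_matrix(rows):
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenpairs(matrix, count, iters=500, seed=0):
    """Leading eigenpairs of a symmetric matrix via power iteration with deflation."""
    rng = random.Random(seed)
    d = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for _component in range(count):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _step in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for the unit vector v.
        eigval = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
        pairs.append((eigval, v))
        # Deflate: remove the found component before searching for the next one.
        m = [[m[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs


# Hypothetical AE measurements: [peak amplitude, peak frequency, rise time]
rows = [
    [1.0, 2.0, 0.50],
    [2.0, 4.1, 0.40],
    [3.0, 5.9, 0.35],
    [4.0, 8.2, 0.20],
    [5.0, 9.9, 0.10],
]
cov = covariance_matrix(rows)
total_variance = sum(cov[i][i] for i in range(len(cov)))
pairs = top_eigenpairs(cov, count=2)
ratios = [eigval / total_variance for eigval, _ in pairs]
print([round(r, 4) for r in ratios])
```

Because the invented features are strongly correlated, the first component dominates; this redundancy is the multicollinearity the abstract refers to.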

Acoustic features of diphthongs produced by children with speech sound disorders (말소리장애 아동이 산출한 이중모음의 음향학적 특성)

  • Cho, Yoon Soo;Pyo, Hwa Young;Han, Jin Soon;Lee, Eun Ju
    • Phonetics and Speech Sciences / v.13 no.1 / pp.65-72 / 2021
  • The aim of this study is to prepare basic data for evaluation and intervention by investigating the characteristics of diphthongs produced by children with speech sound disorders. To this end, two groups of 10 children each, with and without speech sound disorders, were asked to imitate meaningless two-syllable 'diphthong + da' sequences. The slopes of F1 and F2, the amount of formant change, and the duration of the glide were analyzed with Praat (version 6.1.16). A difference between the two groups was found in the slope of F1 of /ju/. Children with speech sound disorders had smaller formant changes and shorter durations than normal children, and the differences were statistically significant. Significant differences in the amount of formant change in the glide were found in F1 of /ju, jɛ/ and F2 of /jɑ, jɛ/, and in the duration of the glide in /ju, jɛ/. The results show that the range of articulation of diphthongs in children with speech sound disorders is smaller than that of normal children, and thus the time taken to articulate them is shorter. These results suggest that the range of articulation and acoustic analysis should be further investigated for evaluation and intervention regarding diphthongs of children with speech sound disorders.
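
The three glide measures listed in this abstract (duration, amount of formant change, and slope) can be computed from a formant track as sketched below. The abstract does not give the exact definitions, so this sketch assumes the slope is the formant change divided by the glide duration. The F2 values are hypothetical, not data from the study.

```python
def glide_metrics(times_s, formant_hz):
    """Return (duration in s, formant change in Hz, slope in Hz/s) across a glide."""
    duration = times_s[-1] - times_s[0]
    change = formant_hz[-1] - formant_hz[0]
    slope = change / duration
    return duration, change, slope


# Hypothetical F2 track sampled every 20 ms across a /ju/-like glide.
times = [0.00, 0.02, 0.04, 0.06, 0.08]
f2 = [2200.0, 2000.0, 1700.0, 1300.0, 1000.0]
duration, change, slope = glide_metrics(times, f2)
print(duration, change, slope)
```

In practice the track would come from a formant analysis tool such as Praat, with the glide boundaries marked by hand or by a threshold on the rate of formant change.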

Bird sounds classification by combining PNCC and robust Mel-log filter bank features (PNCC와 robust Mel-log filter bank 특징을 결합한 조류 울음소리 분류)

  • Badi, Alzahra;Ko, Kyungdeuk;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.38 no.1 / pp.39-46 / 2019
  • In this paper, combining features is proposed as a way to enhance the classification accuracy of sounds in noisy environments using a CNN (Convolutional Neural Network) structure. A robust log Mel-filter bank using a Wiener filter and PNCCs (Power Normalized Cepstral Coefficients) are extracted to form a 2-dimensional feature that is used as input to the CNN structure. An ebird database is used to classify 43 bird species in their natural environment. To evaluate the performance of the combined features in noisy environments, the database is augmented with 3 types of noise at 4 different SNRs (Signal to Noise Ratios) (20 dB, 10 dB, 5 dB, 0 dB). The combined feature is compared to the log Mel-filter bank with and without the Wiener filter, and to the PNCCs. The combined feature is shown to outperform the other features in clean environments, with a 1.34 % increase in overall average accuracy. Additionally, the accuracy in noisy environments at the 4 SNR levels is increased by 1.06 % and 0.65 % for shop and schoolyard noise backgrounds, respectively.
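
The noise augmentation at fixed SNRs described in this abstract can be sketched as follows. This is a minimal illustration of mixing at a target SNR, not the authors' pipeline. The sine tone and uniform random noise are stand-ins for a bird recording and a background noise recording, and the function names are hypothetical.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Scale the noise so that 10*log10(P_signal / P_noise) equals snr_db, then add it."""
    target_noise_power = power(signal) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(signal, scaled_noise)]
    return mixture, scaled_noise


rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(1600)]

for snr_db in (20, 10, 5, 0):
    mixture, scaled_noise = mix_at_snr(clean, noise, snr_db)
    measured = 10 * math.log10(power(clean) / power(scaled_noise))
    print(f"target {snr_db} dB, measured {measured:.2f} dB")
```

Only the noise is rescaled, so the clean signal keeps its original level at every SNR.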

Passive sonar signal classification using attention based gated recurrent unit (어텐션 기반 게이트 순환 유닛을 이용한 수동소나 신호분류)

  • Kibae Lee;Guhn Hyeok Ko;Chong Hyun Lee
    • The Journal of the Acoustical Society of Korea / v.42 no.4 / pp.345-356 / 2023
  • The target signal of a passive sonar shows narrow-band harmonic characteristics, with intensity variation within a few seconds and long-term frequency variation due to the Lloyd's mirror effect. We propose a signal classification algorithm based on the Gated Recurrent Unit (GRU) that learns local and global time-series features. The proposed algorithm implements a multi-layer network using GRUs and extracts local and global time-series features via dilated connections. An attention mechanism is learned to weight the time-series features and classify passive sonar signals. In experiments using public underwater acoustic data, the proposed network achieved a classification accuracy of 96.50 %, which is 4.17 % higher than that of the existing skip-connected GRU network.
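
The idea of dilated recurrent connections can be sketched with a scalar GRU whose recurrent input comes from several steps back. This is an illustration under stated assumptions, not the network from the paper. The state is a single number, the weights in `params` are untrained hypothetical values, and the attention and classification stages are omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dilated_gru(inputs, dilation, params):
    """Scalar GRU whose recurrent input at step t is the state from step t - dilation."""
    states = []
    for t, x in enumerate(inputs):
        h_prev = states[t - dilation] if t >= dilation else 0.0
        z = sigmoid(params["wz"] * x + params["uz"] * h_prev)  # update gate
        r = sigmoid(params["wr"] * x + params["ur"] * h_prev)  # reset gate
        h_cand = math.tanh(params["wh"] * x + params["uh"] * r * h_prev)
        states.append((1.0 - z) * h_prev + z * h_cand)
    return states


# Hypothetical, untrained weights.
params = {"wz": 1.0, "uz": 0.5, "wr": 1.0, "ur": 0.5, "wh": 1.0, "uh": 1.0}
signal = [math.sin(0.3 * t) for t in range(32)]

layer1 = dilated_gru(signal, dilation=1, params=params)  # short-range context
layer2 = dilated_gru(layer1, dilation=4, params=params)  # longer-range context
print(layer2[-1])
```

Stacking layers with growing dilation lets upper layers relate samples that are far apart in time while lower layers track short-term changes.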

A Phase-related Feature Extraction Method for Robust Speaker Verification (열악한 환경에 강인한 화자인증을 위한 위상 기반 특징 추출 기법)

  • Kwon, Chul-Hong
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.3 / pp.613-620 / 2010
  • Additive noise and channel distortion strongly degrade the performance of speaker verification systems, as they distort the features of speech. This distortion causes a mismatch between the training and recognition conditions, such that acoustic models trained with clean speech do not accurately model noisy and channel-distorted speech. This paper presents a phase-related feature extraction method to improve the robustness of speaker verification systems. The instantaneous frequency is computed from the phase of the speech signal, and features are obtained from the histogram of the instantaneous frequency. Experimental results show that the proposed technique offers significant improvements over the standard techniques in both clean and adverse testing environments.
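
The two steps named in this abstract, computing instantaneous frequency from phase and summarizing it as a histogram, can be sketched as follows. This is a simplified illustration, not the paper's feature extractor. It assumes a complex analytic signal is already available, which for real speech would need an extra step such as a Hilbert transform. A synthetic 440 Hz complex tone is used as input, and the bin layout is a hypothetical choice.

```python
import cmath
import math


def instantaneous_frequency(analytic, fs):
    """Frequency in Hz from the phase advance between consecutive analytic samples."""
    return [
        cmath.phase(b * a.conjugate()) * fs / (2 * math.pi)
        for a, b in zip(analytic, analytic[1:])
    ]


def histogram(values, lo, hi, bins):
    """Normalized histogram of the values that fall in [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


fs = 8000
analytic = [cmath.exp(2j * math.pi * 440 * n / fs) for n in range(200)]
freqs = instantaneous_frequency(analytic, fs)
hist = histogram(freqs, lo=0.0, hi=4000.0, bins=8)  # 500 Hz wide bins
print(round(freqs[0], 3), hist)
```

Multiplying each sample by the conjugate of the previous one gives the phase difference directly, which avoids explicit phase unwrapping.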

Human Laughter Generation using Hybrid Generative Models

  • Mansouri, Nadia;Lachiri, Zied
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.5 / pp.1590-1609 / 2021
  • Laughter is one of the most important nonverbal sounds that humans produce and is a means of expressing emotions. The acoustic and contextual features of this sound differ from those of speech, and many difficulties arise in modeling them. In this work, we propose an audio laughter generation system based on unsupervised generative models: the autoencoder (AE) and its variants. The procedure combines three main sub-processes: (1) analysis, which consists of extracting the log magnitude spectrogram from the laughter database; (2) training of the generative models; and (3) synthesis, which involves an intermediate mechanism, the vocoder. To improve the synthesis quality, we suggest hybrid models (LSTM-VAE, GRU-VAE and CNN-VAE) that combine the representation learning capacity of the variational autoencoder (VAE) with the temporal modelling ability of a long short-term memory RNN (LSTM) and the ability of the CNN to learn invariant features. To assess the performance of the proposed audio laughter generation process, an objective evaluation (RMSE) and a perceptual audio quality test (listening test) were conducted. According to these evaluation metrics, the GRU-VAE outperforms the other VAE models.
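
The objective metric named in this abstract, RMSE, can be sketched as below. The abstract does not say exactly which quantities are compared, so this sketch assumes the error is taken between corresponding values of a reference and a generated log magnitude spectrogram, flattened into lists. The numbers are made up.

```python
import math


def rmse(reference, generated):
    """Root mean squared error between two equal-length sequences."""
    if len(reference) != len(generated):
        raise ValueError("sequences must have the same length")
    squared = [(r - g) ** 2 for r, g in zip(reference, generated)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical flattened log-magnitude values for a reference and a generated clip.
reference = [-3.0, -2.5, -1.0, -0.5]
generated = [-2.0, -1.5, 0.0, 0.5]
print(rmse(reference, generated))
```

A lower RMSE means the generated spectrogram is numerically closer to the reference, which is why the abstract pairs it with a listening test for perceptual quality.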