Search | Korea Science

Compromised feature normalization method for deep neural network based speech recognition (심층신경망 기반의 음성인식을 위한 절충된 특징 정규화 방식)

Kim, Min Sik;Kim, Hyung Soon
- Phonetics and Speech Sciences / v.12 no.3 / pp.65-71 / 2020
Feature normalization reduces the effect of environmental mismatch between training and test conditions by normalizing the statistical characteristics of acoustic feature parameters. It yields excellent performance improvements in traditional Gaussian mixture model-hidden Markov model (GMM-HMM)-based speech recognition systems. However, in a deep neural network (DNN)-based speech recognition system, minimizing the effects of environmental mismatch does not necessarily lead to the best performance. In this paper, we attribute this phenomenon to information loss caused by excessive feature normalization. We investigate whether there is a feature normalization method that maximizes speech recognition performance by properly reducing the impact of environmental mismatch while preserving information useful for training acoustic models. To this end, we introduce mean and exponentiated variance normalization (MEVN), a compromise between mean normalization (MN) and mean and variance normalization (MVN), and compare the performance of a DNN-based speech recognition system in noisy and reverberant environments according to the degree of variance normalization. Experimental results reveal that MEVN yields a slight performance improvement over MN and MVN, depending on the degree of variance normalization.
https://doi.org/10.13064/KSSS.2020.12.3.065 KSCI
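The relationship between MN, MVN, and MEVN described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the MEVN formula, so the form used here is an assumption: each feature dimension is mean-subtracted and then divided by the standard deviation raised to a hypothetical exponent `alpha`, so that `alpha = 0` reduces to MN and `alpha = 1` reduces to MVN. The function name `mevn` and the per-utterance statistics are also illustrative choices.

```python
import math


def mevn(features, alpha, eps=1e-12):
    """Mean and exponentiated variance normalization (assumed form).

    features: list of frames, each frame a list of floats (one per dimension).
    alpha:    hypothetical degree of variance normalization;
              0.0 gives mean normalization, 1.0 gives mean and variance normalization.
    """
    num_frames = len(features)
    dim = len(features[0])
    normalized = [[0.0] * dim for _ in range(num_frames)]
    for d in range(dim):
        column = [frame[d] for frame in features]
        mean = sum(column) / num_frames
        variance = sum((x - mean) ** 2 for x in column) / num_frames
        std = math.sqrt(variance)
        # Partial variance normalization: divide by std**alpha instead of std.
        scale = max(std, eps) ** alpha
        for t in range(num_frames):
            normalized[t][d] = (features[t][d] - mean) / scale
    return normalized


if __name__ == "__main__":
    # Toy 3-frame, 2-dimensional feature matrix (made-up values).
    feats = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
    for a in (0.0, 0.5, 1.0):
        print(a, mevn(feats, a))
```

Intermediate values of `alpha` keep part of the original dynamic range of each dimension, which is one way to read the abstract's "degree of variance normalization".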

The basic experiments for the fabrication of the SPUDT type filter using the SFIT type filter (SFIT형태를 이용한 SPUDT형 필터제작에 관한 기초실험)

You, Il-Hyun
- Journal of the Korea Institute of Information and Communication Engineering / v.11 no.10 / pp.1916-1923 / 2007
We studied a surface acoustic wave (SAW) filter whose passband structure was formed on a Langasite substrate by evaporating an aluminum-copper alloy, and then performed computer simulations of the design. Based on the simulation results, we can fabricate a filter with a block-weighted IDT as the input transducer and a withdrawal-weighted IDT as the output transducer. We also sought appropriate design conditions for the phase shift of a SAW filter for WCDMA. The input and output IDTs have 50 pairs, and the thickness and width of the reflector are 5000 Å and 3.6 μm, respectively. The distances from the hot electrode to the reflector are 2.0 μm and 2.4 μm, and the distance from the hot electrode to the ground is 1.5 μm. The fabricated SAW filter has a center frequency of about 190 MHz and a 3 dB bandwidth of about 7.8 MHz. After impedance matching, the return loss is less than -18 dB, the ripple is about 3 dB, and the triple transit echo is less than -25 dB.
https://doi.org/10.6109/jkiice.2007.11.10.1916 KSCI

Comparison of target classification accuracy according to the aspect angle and the bistatic angle in bistatic sonar (양상태 소나에서의 자세각과 양상태각에 따른 표적 식별 정확도 비교)

Choo, Yeon-Seong;Byun, Sung-Hoon;Choo, Youngmin;Choi, Giyung
- The Journal of the Acoustical Society of Korea / v.40 no.4 / pp.330-336 / 2021
In bistatic sonar operation, the scattering strength of a sonar target is characterized by the probe signal frequency, the aspect angle, and the bistatic angle. Therefore, the target detection and identification performance of a bistatic sonar may vary depending on how the positions of the target, sound source, and receiver change during operation. This study evaluates which variable is more advantageous to change by comparing the target identification performance obtained when the aspect angle is varied with that obtained when the bistatic angle is varied. A scenario of identifying a hollow sphere and a cylinder was assumed. Using a finite element method-based acoustic scattering simulation, the two targets were classified with a support vector machine and the classification accuracies were compared. The comparison showed that using the scattering strength defined by the frequency and the bistatic angle, with the aspect angle fixed, gave superior average classification accuracy. This means that, for target identification, moving the receiver to change the bistatic angle is more effective than moving the sound source to change the aspect angle.
https://doi.org/10.7776/ASK.2021.40.4.330 KSCI

Investigation of the listening environment of classrooms for elderly people using speech intelligibility tests (음성명료도 시험에 의한 노인 교육시설의 청취환경 조사)

Park, Chan-Jae;Kim, Bo-Gyeong;Haan, Chan-Hoon
- The Journal of the Acoustical Society of Korea / v.40 no.1 / pp.18-30 / 2021
The ultimate goal of the present study is to establish acoustical performance standards for classrooms for the elderly, whose hearing ability is diminished. As a pilot survey, the present study investigated the listening environment and the actual speech perception performance in education facilities for the elderly. The acoustic performance of two education facilities for the elderly in Cheongju was measured, and a questionnaire survey of elderly people was conducted. In addition, speech intelligibility tests were carried out using the Consonant Vowel Consonant (CVC) and Phonetically Balanced Words (PBW) methods. The questionnaire survey showed that the elderly were generally satisfied with the listening environment of the educational facilities. The acoustical performance was also found to satisfy the acoustic criteria for general classrooms in Korea. However, the speech intelligibility tests showed that the scores of the elderly were significantly lower than those of people in their twenties with normal hearing, and that the scores decrease as age increases. Thus, it was concluded that acoustical performance standards for educational facilities designed for people with normal hearing are not suitable for educational facilities for the elderly.
https://doi.org/10.7776/ASK.2021.40.1.018 KSCI

Investigation of the listening environment for lower grade students in elementary school using subjective tests (주관적 평가법을 이용한 초등학교 저학년 교실의 청취환경 조사)

Park, Chan-Jae;Haan, Chan-Hoon
- The Journal of the Acoustical Society of Korea / v.40 no.3 / pp.201-212 / 2021
The present study was conducted as a pilot investigation to suggest acoustic performance standards for classrooms suitable for people whose hearing ability is not yet fully developed, such as children under 9 years of age. Subjective evaluations, consisting of a questionnaire and a speech intelligibility test, were conducted with a total of 264 students at two elementary schools in Cheong-ju in order to analyze the characteristics of the listening environment in lower-grade elementary school classrooms. The questionnaire investigated the students' satisfaction with the classroom listening environment. Students responded that the most helpful type of information for understanding class content is the teacher's voice. They also rated the current volume of the teacher's voice as normal and its clarity as highly satisfactory. As for the acoustic performance of the classroom, the dominant opinion in the overall satisfaction with the listening environment was that the noise was normal and the reverberation was very short. Meanwhile, the speech intelligibility test, which used a word list selected for lower-grade elementary school students, suggested that for 8-year-olds the longitudinal distance from the sound source is a factor that affects speech recognition.
https://doi.org/10.7776/ASK.2021.40.3.201 KSCI

Underwater object radial velocity estimation method using two different band hyperbolic frequency modulation pulses with opposite sweep directions and its performance analysis (두 대역 상반된 스윕방향 hyperbolic frequency modulation 펄스로 수중물체 시선속도추정 기법 및 성능분석)

Chomgun Cho;Euicheol Jeong
- The Journal of the Acoustical Society of Korea / v.42 no.1 / pp.25-31 / 2023
To estimate the radial velocity of an underwater object (target) with active sonar, a Continuous Wave (CW) pulse is generally used. However, if the target is slow and nearby, acoustic reverberation in the ocean makes it difficult to estimate the target's radial velocity. In 2017, Wang et al. overcame this problem by using a broadband signal consisting of two Hyperbolic Frequency Modulation (HFM) pulses, which are known to be Doppler-invariant, with the same frequency band and opposite sweep directions, and successfully estimated the radial velocity of a slow-moving nearby target. They demonstrated the estimation of the radial velocity through computer simulation, using the difference between the starting times of the two HFM pulses and their receiving times as parameters. However, because that method uses two HFM pulses in the same frequency band, cross-correlation between the two pulses degrades detection performance. To mitigate this cross-correlation effect, we suggest using two HFM pulses in different bands with opposite sweep directions. In this paper, a radial velocity estimation method is derived and simulated using two HFM pulses with a pulse length of 1 second and a bandwidth of 400 Hz. With the suggested method, the radial velocity was estimated with a relative error of approximately 6 % in the simulation.
https://doi.org/10.7776/ASK.2023.42.1.025
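The idea of using two opposite-sweep HFM pulses to recover radial velocity can be sketched as follows. This is not the authors' derivation, which the abstract does not reproduce. It relies only on the standard property that a Doppler-scaled HFM pulse looks like a time-shifted copy of itself, with the sign of the shift depending on the sweep direction. The band edges, the sound speed of 1500 m/s, the two-way Doppler factor `eta = (c + v) / (c - v)`, and all function names are assumptions chosen for illustration; only the 1 s pulse length and 400 Hz bandwidth come from the abstract.

```python
def hfm_doppler_shift(f_start, f_end, duration, eta):
    """Apparent matched-filter time shift of an HFM pulse under Doppler scaling.

    The HFM period is linear in time: 1/f(t) = 1/f_start + b*t, with
    b = (1/f_end - 1/f_start) / duration. Scaling time by eta maps the pulse
    onto a copy of itself shifted by (1 - 1/eta) * duration * f_end / (f_start - f_end).
    """
    return (1.0 - 1.0 / eta) * duration * f_end / (f_start - f_end)


def estimate_radial_velocity(delta_t, pulse_a, pulse_b, sound_speed=1500.0):
    """Recover radial velocity from the difference of the two apparent shifts.

    delta_t: shift of pulse_a minus shift of pulse_b, in seconds. The common
             propagation delay cancels in this difference.
    pulse_a, pulse_b: (f_start, f_end, duration) tuples (hypothetical layout).
    Returns velocity in m/s, positive for a closing target.
    """
    coeff_a = pulse_a[2] * pulse_a[1] / (pulse_a[0] - pulse_a[1])
    coeff_b = pulse_b[2] * pulse_b[1] / (pulse_b[0] - pulse_b[1])
    ratio = delta_t / (coeff_a - coeff_b)  # equals 1 - 1/eta
    eta = 1.0 / (1.0 - ratio)
    return sound_speed * (eta - 1.0) / (eta + 1.0)


if __name__ == "__main__":
    c = 1500.0                    # assumed sound speed in m/s
    true_v = 2.0                  # assumed slow closing target in m/s
    eta = (c + true_v) / (c - true_v)
    up = (3000.0, 3400.0, 1.0)    # up-sweep, 400 Hz band, 1 s (band edges assumed)
    down = (4400.0, 4000.0, 1.0)  # down-sweep in a different 400 Hz band
    delta = hfm_doppler_shift(*up, eta) - hfm_doppler_shift(*down, eta)
    print(estimate_radial_velocity(delta, up, down, c))
```

Because the two sweeps shift in opposite directions, their difference grows with target speed, while placing them in separate bands is what the abstract proposes to reduce cross-correlation between the pulses.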

High-resolution range and velocity estimation method based on generalized sinusoidal frequency modulation for high-speed underwater vehicle detection (고속 수중운동체 탐지를 위한 일반화된 사인파 주파수 변조 기반 고해상도 거리 및 속도 추정 기법)

Jinuk Park;Geunhwan Kim;Jongwon Seok;Jungpyo Hong
- The Journal of the Acoustical Society of Korea / v.42 no.4 / pp.320-328 / 2023
Underwater active target detection is vital for defense systems, requiring accurate detection and estimation of distance and velocity. Sequential transmission is necessary at each beam angle, but a divided pulse length leads to range ambiguity. Multi-frequency transmission results in time-bandwidth product losses when the bandwidth is divided. To overcome these problems, we propose a novel method using Generalized Sinusoidal Frequency Modulation (GSFM) for rapid target detection, enabling low correlation between subpulses without bandwidth division. The proposed method allows for rapid updates of the distance and velocity of the target by employing GSFM with minimized pulse length. To evaluate our method, we simulated an underwater environment with reverberation. In the simulation, a linear frequency modulation pulse of 0.05 s caused an average distance estimation error of 50 % and a velocity estimation error of 103 % due to the limited frequency band. In contrast, GSFM accurately and quickly tracked targets with distance and velocity estimation errors of 10 % and 14 %, respectively, even with pulses of the same length. Furthermore, GSFM provided approximate azimuth information by transmitting highly orthogonal subpulses for each azimuth.
https://doi.org/10.7776/ASK.2023.42.4.320
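As a rough illustration of how a GSFM pulse differs from a plain sinusoidal FM pulse, the sketch below evaluates an instantaneous frequency law. The abstract does not state the waveform definition, so the form used here, `fc + (B/2) * sin(2*pi*alpha*t**rho)`, is an assumption based on how GSFM is commonly described, and the centre frequency, bandwidth, `alpha`, and `rho` values are hypothetical. Only the 0.05 s pulse length is taken from the abstract.

```python
import math


def gsfm_instantaneous_frequency(t, fc, bandwidth, alpha, rho):
    """Instantaneous frequency of an (assumed) GSFM pulse at time t >= 0.

    fc:        centre frequency in Hz (hypothetical value in the demo)
    bandwidth: swept bandwidth in Hz (hypothetical value in the demo)
    alpha:     modulation rate parameter (assumed)
    rho:       exponent; rho = 1 gives ordinary sinusoidal FM, larger values
               make the modulation speed up over the pulse, so it is no
               longer periodic
    """
    return fc + 0.5 * bandwidth * math.sin(2.0 * math.pi * alpha * t ** rho)


if __name__ == "__main__":
    pulse_length = 0.05  # seconds, the pulse length quoted in the abstract
    for k in range(6):
        t = k * pulse_length / 5
        # All parameter values below are illustrative assumptions.
        f = gsfm_instantaneous_frequency(t, fc=10000.0, bandwidth=2000.0,
                                         alpha=4000.0, rho=2.0)
        print(round(t, 3), round(f, 1))
```

The whole bandwidth is covered by every subpulse, and changing `alpha` or `rho` between subpulses is one way to obtain pulses that correlate weakly with each other without splitting the band.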

Evaluation of floor impact sound and airborne sound insulation performance of cross laminated timber slabs and their toppings (구조용 직교 집성판 슬래브와 상부 토핑 조건에 따른 바닥충격음 및 공기전달음 평가)

Hyo-Jin Lee;Yeon-Su Ha;Sang-Joon Lee
- The Journal of the Acoustical Society of Korea / v.42 no.6 / pp.572-583 / 2023
Demand for wood in construction is increasing worldwide. In Korea, technical reviews of high-rise Cross Laminated Timber (CLT) buildings are under way. In this paper, the Floor Impact Sound Insulation Performance (FISIP) and Transmission Loss (TL) of 150 mm thick CLT floor panels made of two domestic species, Larix kaempferi and Pinus densiflora, are investigated. The CLT slabs were tested in vertically connected reverberation chambers. Comparing the Single Number Quantity (SNQ) of the FISIP of the bare panels, the Larix CLT is 3 dB lower for heavy-weight impact sound and 1 dB lower for light-weight impact sound than the Pinus CLT. However, there was no difference when concrete toppings were added to improve the performance. As the concrete toppings became thicker, the heavy-weight impact sound was reduced by 9 dB ~ 20 dB and the light-weight impact sound by 20 dB ~ 30 dB. Analysis of these results against area density confirmed that area density is highly correlated (R² = 0.94 ~ 0.99) with the FISIP of the CLT. The type of CLT did not affect the TL. Comparison of theoretical and measured TL values showed that the frequency characteristics are similar but that the measured values are 8 dB ~ 12 dB lower. The relationship between the TL and the frequency characteristics of the tested CLT slabs was derived by using a correction value.
https://doi.org/10.7776/ASK.2023.42.6.572
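The comparison of theoretical and measured TL with a correction value can be pictured with a small sketch. The abstract does not say which theoretical model was used, so the field-incidence mass law, TL = 20 log10(m f) - 47 dB with area density m in kg/m² and frequency f in Hz, is used here as an assumed stand-in. The area density of 75 kg/m² and the 10 dB correction (a value inside the 8 dB ~ 12 dB gap the abstract reports) are hypothetical illustration values, and the function names are not from the paper.

```python
import math


def mass_law_tl(area_density, frequency):
    """Field-incidence mass law transmission loss in dB (assumed model).

    area_density: surface mass in kg/m^2
    frequency:    frequency in Hz
    """
    return 20.0 * math.log10(area_density * frequency) - 47.0


def corrected_tl(area_density, frequency, correction_db):
    """Theoretical TL lowered by a fixed correction, as a stand-in for
    aligning predictions with measurements."""
    return mass_law_tl(area_density, frequency) - correction_db


if __name__ == "__main__":
    area_density = 75.0  # kg/m^2, hypothetical value for a 150 mm CLT panel
    correction = 10.0    # dB, hypothetical value within the reported 8-12 dB gap
    for freq in (125.0, 250.0, 500.0, 1000.0, 2000.0):
        print(freq,
              round(mass_law_tl(area_density, freq), 1),
              round(corrected_tl(area_density, freq, correction), 1))
```

Under this model the TL rises by about 6 dB per doubling of either frequency or area density, which is consistent with the abstract's observation that added concrete toppings, and hence added mass, improve insulation.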

A Study on the Measurement Method for Improvement of Reliability for Heavy-Weight Floor Impact Sound Measurement (중량 바닥충격음 측정의 신뢰성 향상을 위한 측정방법 검토)

Joo, Moon-Ki;Park, Jong-Young;Yang, Kwan-Seop;Oh, Yang-Ki
- The Journal of the Acoustical Society of Korea / v.27 no.4 / pp.163-170 / 2008
Most receiving rooms used for measuring floor impact sound are rectangular, a few meters in each dimension, with reflective finishes and no furniture or curtains. Modal overlap under those conditions is the major reason for low reproducibility and, as a matter of course, low credibility. The main purpose of this study is to find a better measurement method that mitigates the effect of modal overlap on the measurement. Two approaches were tested. One is the method described in ISO standards, which controls the room modes of the receiving room; the other obtains more precise spatial averages in receiving rooms that have room modes. Although controlling the room modes with bass traps proved effective, it is not easy to keep the reverberation time of the low-frequency bands within the range of 1 s to 2 s. Space-time averaged SPLs obtained through combinations of rotating microphones are easy to measure and agree well with the average SPL of the entire receiving room.
https://doi.org/10.7776/ASK.2008.27.4.163 KSCI
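The averaging of sound pressure levels over microphone positions mentioned above is normally done on an energy basis rather than by averaging the decibel values directly. The sketch below shows that standard energy average. The function name and the sample levels are illustrative assumptions, and the abstract does not describe the authors' exact averaging procedure.

```python
import math


def energy_average_spl(levels_db):
    """Energy (power) average of sound pressure levels given in dB.

    Each level is converted to a relative mean-square pressure, the values
    are averaged arithmetically, and the result is converted back to dB.
    """
    mean_square = sum(10.0 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_square)


if __name__ == "__main__":
    # Made-up levels at four microphone positions in a receiving room.
    positions = [62.0, 68.0, 71.0, 65.0]
    print(round(energy_average_spl(positions), 1))
```

In a room with strong low-frequency modes the level can differ by many decibels between positions, so the energy average is dominated by the loud positions. This is why the choice and number of microphone positions, or a rotating microphone path, matters for reproducibility.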

Improving target recognition of active sonar multi-layer processor through deep learning of a small amounts of imbalanced data (소수 불균형 데이터의 심층학습을 통한 능동소나 다층처리기의 표적 인식성 개선)

Young-Woo Ryu;Jeong-Goo Kim
- The Journal of the Acoustical Society of Korea / v.43 no.2 / pp.225-233 / 2024
Active sonar transmits sound waves to detect covertly maneuvering underwater objects and receives the signals reflected back from the target. However, in addition to the target echo, the received signal contains seafloor and sea surface reverberation, biological noise, and other noise, which makes target recognition difficult. Conventional techniques that detect signals above a threshold not only produce false detections or miss targets depending on the chosen threshold, but also require an appropriate threshold to be set for each underwater environment. To overcome this, research has been conducted on automatic threshold calculation through techniques such as Constant False Alarm Rate (CFAR) and on the application of advanced tracking filters and association techniques, but these have limitations in environments where a significant number of detections occur. With the recent development of deep learning, efforts have been made to apply it to underwater target detection. However, active sonar data for training a discriminator are very difficult to acquire, so the data are scarce, and they are also imbalanced, with only a very small number of targets and a relatively large number of non-targets. In this paper, an image of the energy distribution of the detection signal is used, and a classifier that distinguishes targets from non-targets is trained in a way that takes the data imbalance into account and is added to the existing technique. With the proposed technique, target misclassification was minimized and non-targets were eliminated, making target recognition easier for active sonar operators. The effectiveness of the proposed technique was verified with sea experiment data obtained in the East Sea.
https://doi.org/10.7776/ASK.2024.43.2.225
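The automatic threshold calculation that the abstract attributes to CFAR can be sketched with the cell-averaging variant. The abstract does not say which CFAR variant the existing processor uses, so cell-averaging CFAR is an assumption, as are the window sizes, the false alarm probability, and the function name. The scaling factor used here is the textbook one for exponentially distributed noise power.

```python
def ca_cfar(power, num_train=8, num_guard=2, pfa=1e-3):
    """Cell-averaging CFAR over a 1-D sequence of power samples.

    num_train: total number of training cells (split evenly on both sides)
    num_guard: total number of guard cells (split evenly on both sides)
    pfa:       desired false alarm probability
    Returns the indices of cells whose power exceeds the adaptive threshold.
    Cells too close to the edges to have a full window are skipped.
    """
    half_train = num_train // 2
    half_guard = num_guard // 2
    # Threshold multiplier for exponentially distributed noise power.
    scale = num_train * (pfa ** (-1.0 / num_train) - 1.0)
    detections = []
    for i in range(len(power)):
        first = i - half_guard - half_train
        last = i + half_guard + half_train
        if first < 0 or last >= len(power):
            continue
        leading = power[first:i - half_guard]
        trailing = power[i + half_guard + 1:last + 1]
        noise_estimate = (sum(leading) + sum(trailing)) / num_train
        if power[i] > scale * noise_estimate:
            detections.append(i)
    return detections


if __name__ == "__main__":
    # Flat noise floor with one strong echo (made-up values).
    samples = [1.0] * 21
    samples[10] = 50.0
    print(ca_cfar(samples))
```

Because the threshold follows the local noise estimate, it adapts to changing reverberation levels without manual tuning, but in dense clutter many cells can still exceed it. That is the limitation the abstract gives as motivation for adding a learned classifier.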

