• Title/Summary/Keyword: 음향데이터 (acoustic data)

Image enhancement in ultrasound passive cavitation imaging using centroid and flatness of received channel data (수신 채널 신호의 무게중심과 평탄도를 이용한 초음파 수동 공동 영상의 화질 개선)

  • Jeong, Mok Kun;Kwon, Sung Jae;Choi, Min Joo
    • The Journal of the Acoustical Society of Korea / v.38 no.4 / pp.450-458 / 2019
  • Passive cavitation imaging is used to observe the ultrasonic waves generated when a group of bubbles collapses. Its drawbacks are low resolution and high side lobe levels. Since ultrasound signals generated by cavitation take the form of a pulse, the amplitude distribution of the signals received across the receive channels varies depending on the direction of incidence. To discriminate between main lobe and side lobe signals from the amplitude distribution of the received channel data, and thereby reduce the side lobe levels, both the centroid and the flatness were calculated to determine weights at the imaging points. The centroid quantifies how the channel data are distributed across the receive channels, and the flatness measures the variance of the channel data. We applied the centroid and flatness weights to passive cavitation images constructed using delay-and-sum focusing and minimum variance beamforming to improve the image quality. Using computer simulation and experiment, we show that applying the weighting to delay-and-sum and minimum variance beamforming reduces side lobe levels.
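
The weighting idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' formulation: the function name `centroid_flatness_weight`, the normalisation of the centroid to [-1, 1], the mapping of variance to a flatness score, and the product used to combine them are all assumptions made for the example. The input is taken to be the delay-aligned channel amplitudes at one imaging point.

```python
def centroid_flatness_weight(amplitudes):
    """Return (centroid, flatness, weight) for one imaging point.

    Assumed definitions (hypothetical, for illustration only):
      centroid: amplitude-weighted mean channel index, shifted and scaled so
                that 0 is the array centre and -1/+1 are the array edges.
      flatness: 1 / (1 + variance / mean^2) of the channel magnitudes, so a
                perfectly uniform distribution scores 1.
      weight:   (1 - |centroid|) * flatness, in [0, 1].
    """
    n = len(amplitudes)
    mags = [abs(a) for a in amplitudes]
    total = sum(mags)
    if n < 2 or total == 0.0:
        return 0.0, 0.0, 0.0
    center = (n - 1) / 2.0
    centroid = (sum(i * m for i, m in enumerate(mags)) / total - center) / center
    mean = total / n
    variance = sum((m - mean) ** 2 for m in mags) / n
    flatness = 1.0 / (1.0 + variance / mean ** 2)
    weight = (1.0 - abs(centroid)) * flatness
    return centroid, flatness, weight


# Uniform amplitudes (main-lobe-like) versus energy concentrated on one edge
# of the aperture (side-lobe-like).
uniform = centroid_flatness_weight([1.0] * 8)
one_sided = centroid_flatness_weight([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

Under these assumed definitions, a pixel whose channel amplitudes are evenly spread keeps its full beamformed value, while a pixel dominated by one end of the aperture is suppressed.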

Direct blast suppression for bi-static sonar systems with high duty cycle based on adaptive filters (고반복률을 갖는 양상태 소나 시스템에서의 적응형 필터를 이용한 송신 직접파 제거 연구)

  • Lee, Wonnyoung;Jeong, Euicheol;Yoon, Kyungsik;Kim, Geunhwan;Kim, Dohyung;You, Yena;Lee, Seokjin
    • The Journal of the Acoustical Society of Korea / v.41 no.4 / pp.446-460 / 2022
  • In this paper, we propose an algorithm that uses adaptive filters to mitigate the degradation of the target detection rate caused by the direct blast in bi-static sonar systems with a high duty cycle. Suppressing the direct blast is very important in such sonar systems because it severely affects actual system operation. The performance was evaluated by applying the Normalized Least Mean Square (NLMS) and Recursive Least Square (RLS) algorithms to simulation and sea experimental data. The beam signals of the target and direct blast bearings were used as the input and desired signals, respectively. By optimizing the difference between the two signals, the direct blast is removed and only the target signal remains. Evaluating the matched filter outputs in the simulation confirmed that the direct blast was suppressed to the noise level for both Linear Frequency Modulated (LFM) and Generalized Sinusoidal Frequency Modulated (GSFM) waveforms, and that for GSFM the target side lobe decreased by more than 20 dB, improving performance. In the sea experiment, the direct blast level was reduced by 10 dB for LFM and by about 4 dB for GSFM, and the target side lobe decreased by about 4 dB, improving performance.
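
As a rough illustration of the NLMS step mentioned above, the sketch below implements a textbook NLMS interference canceller. It is not the paper's implementation: the names `nlms_cancel`, `primary` and `reference`, the tap count, and the step size are assumptions, and the sketch uses the standard arrangement in which the filter input is a reference channel dominated by the interference and the error signal is the cleaned output. How those roles map onto the target and direct blast beams in the paper is left open here.

```python
import random


def nlms_cancel(primary, reference, num_taps=4, mu=0.5, eps=1e-8):
    """Subtract the part of `primary` that is predictable from `reference`.

    Returns (cleaned, weights), where `cleaned` is the NLMS error signal.
    """
    weights = [0.0] * num_taps
    buf = [0.0] * num_taps          # most recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(w * b for w, b in zip(weights, buf))      # interference estimate
        e = d - y                                         # cleaned sample
        power = eps + sum(b * b for b in buf)             # input power for normalisation
        weights = [w + mu * e * b / power for w, b in zip(weights, buf)]
        cleaned.append(e)
    return cleaned, weights


# Synthetic check (illustrative data only): the interference in the primary
# channel is a one-sample-delayed, scaled copy of the reference.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(600)]
primary = [0.0] + [0.8 * x for x in reference[:-1]]
cleaned, weights = nlms_cancel(primary, reference)
```

With this synthetic input the filter should learn a weight close to 0.8 on the one-sample delay tap, after which the residual in `cleaned` becomes negligible. A real sonar signal would also contain the target echo and noise, which remain in the error output.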

Feasibility of hearing aid gain self-adjustment using speech recognition (말소리 인지를 이용한 보청기 이득 자가 조절의 실현)

  • Yun, Donghyeon;Shen, Yi;Zhang, Zhuohuang
    • The Journal of the Acoustical Society of Korea / v.41 no.1 / pp.76-86 / 2022
  • Personal hearing devices, such as hearing aids, may be fine-tuned by allowing the users to conduct self-adjustment. Two self-adjustment procedures were developed to collect the listener preferred gains in six octave-frequency bands from 0.25 kHz to 8 kHz. These procedures were designed to allow rapid exploration of a multi-dimensional parameter space using a simple, one-dimensional user control interface (i.e., a programmable knob). The two procedures differ in whether the user interface controls the gains in all frequency bands simultaneously (Procedure A) or only the gain in one frequency band (Procedure B) on a given trial. Monte-Carlo simulations suggested that for both procedures the gain preference identified by simulated listeners rapidly converged to the ground-truth preferred gain profile over the first 20 trials. Initial behavioral evaluations of the self-adjustment procedures, in terms of test-retest reliability, were conducted using 20 young, normal-hearing listeners. Each estimate of the preferred gain profile took less than 20 minutes. The deviation between two separate estimates of the preferred gain profile, conducted at least a week apart, was about 10 dB ~ 15 dB.

A study on fault diagnosis of marine engine using a neural network with dimension-reduced vibration signals (차원 축소 진동 신호를 이용한 신경망 기반 선박 엔진 고장진단에 관한 연구)

  • Sim, Kichan;Lee, Kangsu;Byun, Sung-Hoon
    • The Journal of the Acoustical Society of Korea / v.41 no.5 / pp.492-499 / 2022
  • This study experimentally investigates the effect of dimensionality reduction of the vibration signal on fault diagnosis of a marine engine. Using principal component analysis, a 513-dimensional vibration signal is converted into a low-dimensional signal of 1 to 15 dimensions, and the variation in fault diagnosis accuracy with the number of dimensions is observed. The vibration signal measured from a full-scale marine generator diesel engine is used, and the contribution of the dimension-reduced signal is quantitatively evaluated using two variable importance analysis algorithms: integrated gradients and feature permutation. The experimental data analysis shows that fault diagnosis accuracy improves as the number of dimensions increases, and that near-perfect fault classification accuracy is achieved as the dimension approaches 10. This shows that the dimension of the vibration signal can be considerably reduced without degrading fault diagnosis accuracy. In the variable importance analysis, the dimension-reduced principal components show a higher contribution than the conventional statistical features, which supports the effectiveness of the dimension-reduced signals for fault diagnosis.
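
The dimensionality reduction step can be sketched with a small, dependency-free PCA. This is an illustrative sketch only: the function name `pca_reduce`, the use of power iteration with orthogonalisation, and the three-dimensional toy data are assumptions, and the paper's 513-dimensional vibration features are not reproduced here.

```python
import random


def pca_reduce(rows, k, iters=200, seed=0):
    """Project `rows` (list of equal-length feature lists) onto k principal axes.

    Returns (scores, components). Components are found one at a time by power
    iteration on X^T X, orthogonalising against the components already found.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            proj = [sum(x[j] * v[j] for j in range(d)) for x in centered]
            u = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
            for c in components:                      # remove earlier directions
                dot = sum(a * b for a, b in zip(u, c))
                u = [a - dot * b for a, b in zip(u, c)]
            norm = sum(a * a for a in u) ** 0.5
            if norm == 0.0:
                raise ValueError("data has fewer than k directions of variance")
            v = [a / norm for a in u]
        components.append(v)
    scores = [[sum(x[j] * c[j] for j in range(d)) for c in components]
              for x in centered]
    return scores, components


# Toy data lying on a single line in 3-D, reduced to one dimension.
toy_rows = [[float(t), 2.0 * t, 0.0] for t in range(-5, 6)]
toy_scores, toy_components = pca_reduce(toy_rows, k=1)
```

In a setting like the one described, each row would be one vibration feature vector and `k` would be varied (for example from 1 to 15) before feeding the scores to a classifier.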

Speech extraction based on AuxIVA with weighted source variance and noise dependence for robust speech recognition (강인 음성 인식을 위한 가중화된 음원 분산 및 잡음 의존성을 활용한 보조함수 독립 벡터 분석 기반 음성 추출)

  • Shin, Ui-Hyeop;Park, Hyung-Min
    • The Journal of the Acoustical Society of Korea / v.41 no.3 / pp.326-334 / 2022
  • In this paper, we propose a speech enhancement algorithm as a pre-processing step for robust speech recognition in noisy environments. Auxiliary-function-based Independent Vector Analysis (AuxIVA) is performed with a weighted covariance matrix using time-varying variances with a scaling factor from target masks representing the time-frequency contributions of target speech. The mask estimates can be obtained using a Neural Network (NN) pre-trained for speech extraction, or from diffuseness using the Coherence-to-Diffuse power Ratio (CDR) to find the direct sound component of the target speech. In addition, outputs for omni-directional noise are closely chained by sharing the time-varying variances, similarly to independent subspace analysis or IVA. The speech extraction method based on AuxIVA is also performed in the Independent Low-Rank Matrix Analysis (ILRMA) framework by extending the Non-negative Matrix Factorization (NMF) for noise outputs to Non-negative Tensor Factorization (NTF) to maintain the inter-channel dependency in the noise output channels. Experimental results on the CHiME-4 datasets demonstrate the effectiveness of the presented algorithms.

Case study on frequency bands contributing the single number quantity for heavy-weight impact sound based on assessment method changes (중량충격음 평가방법 변화에 따른 단일수치평가량 기여 주파수 대역 사례 분석)

  • Hye-kyung Shin;Sang Hee Park;Kyoung-woo Kim
    • The Journal of the Acoustical Society of Korea / v.42 no.6 / pp.565-571 / 2023
  • With the introduction of the post-verification system, on-site measurement of floor impact noise performance has become mandatory, and the evaluation method has changed. To track performance changes since the policy was implemented, research is needed on how the characteristics of heavyweight impact sound change under the new evaluation method. In this study, we analyzed the contribution rate of the sound pressure level in each frequency band to the single-number quantity for a multi-family housing complex comprising 59 households with the same floor plan and floor structure, based on the changed impact sources and evaluation indicators. A simple comparison is difficult because the method of calculating the contribution of each frequency band differs with the single-number quantity, but under the evaluation method used before the introduction of the post-verification system (tire measurement, evaluated as L'i,Fmax,AW), the average contribution rate was 80.8 % at 63 Hz and 19.2 % at 125 Hz. The current evaluation method (rubber ball measurement, evaluated as L'iA,Fmax) yields average contribution rates of 33.1 % at 50 Hz ~ 80 Hz, 58.7 % at 100 Hz ~ 160 Hz, 6.9 % at 200 Hz ~ 315 Hz, and 1.3 % at 400 Hz ~ 630 Hz. This result is a case analysis for the target apartment building, and measurement data for more diverse apartment buildings need to be analyzed.

A study on end-to-end speaker diarization system using single-label classification (단일 레이블 분류를 이용한 종단 간 화자 분할 시스템 성능 향상에 관한 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea / v.42 no.6 / pp.536-543 / 2023
  • Speaker diarization, which labels "who spoke when?" in speech with multiple speakers, has been studied with deep neural network-based end-to-end methods to handle speech overlap and to optimize speaker diarization models. Most deep neural network-based end-to-end speaker diarization systems treat the task as a multi-label classification problem that predicts the labels of all speakers active in each frame of speech. However, the performance of multi-label-based models varies greatly depending on the threshold setting. In this paper, we study a speaker diarization system using single-label classification so that diarization can be performed without thresholds. The proposed model converts the speaker labels into a single label and estimates that label from the model output. To account for speaker label permutations during training, the proposed model uses a combination of Permutation Invariant Training (PIT) loss and cross-entropy loss. In addition, we study how to add residual connection structures to the model for effective training of deep speaker diarization models. The experiments used simulated noisy data for two speakers generated from the LibriSpeech database. Compared with the baseline model in terms of Diarization Error Rate (DER), the proposed method performs labeling without a threshold and improves performance by about 20.7 %.
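
One way to picture the single-label idea with a permutation-invariant cross-entropy is sketched below. The encoding is an assumption: the abstract does not say how speaker labels are merged, so the sketch uses a power-set encoding in which each combination of active speakers becomes one class. The function names `to_single_label` and `pit_cross_entropy` and the toy probabilities are hypothetical.

```python
import math
from itertools import permutations


def to_single_label(active):
    """Map per-speaker activity bits, e.g. (1, 0), to one class index (power-set code)."""
    return sum(bit << i for i, bit in enumerate(active))


def pit_cross_entropy(probs, reference):
    """Return (loss, permutation) minimising frame-averaged cross-entropy.

    probs:     per-frame class probabilities over 2**S single labels.
    reference: per-frame tuples of S speaker activity bits.
    """
    n_speakers = len(reference[0])
    best = None
    for perm in permutations(range(n_speakers)):
        loss = 0.0
        for p, frame in zip(probs, reference):
            label = to_single_label([frame[j] for j in perm])
            loss -= math.log(max(p[label], 1e-12))
        loss /= len(reference)
        if best is None or loss < best[0]:
            best = (loss, perm)
    return best


# Two speakers, three frames. The toy "model output" has the speakers swapped
# relative to the reference, so the swapped permutation should win.
reference = [(1, 0), (1, 0), (0, 1)]
probs = [
    [0.03, 0.03, 0.90, 0.04],
    [0.03, 0.03, 0.90, 0.04],
    [0.03, 0.90, 0.03, 0.04],
]
loss, perm = pit_cross_entropy(probs, reference)
```

Because the prediction for each frame is a single class chosen by argmax, no per-speaker activity threshold is needed at inference time, which is the property the abstract highlights.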

A study on speech enhancement using complex-valued spectrum employing Feature map Dependent attention gate (특징 맵 중요도 기반 어텐션을 적용한 복소 스펙트럼 기반 음성 향상에 관한 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea / v.42 no.6 / pp.544-551 / 2023
  • Speech enhancement, which is used to improve the perceptual quality and intelligibility of noisy speech, has moved from methods using a magnitude spectrum to methods using a complex-valued spectrum, which can improve both magnitude and phase. In this paper, we study how to apply an attention mechanism to complex-valued spectrum-based speech enhancement systems to further improve the intelligibility and quality of noisy speech. The attention is based on additive attention and allows the attention weight to be calculated in consideration of the complex-valued spectrum. In addition, global average pooling was used to consider the importance of each feature map. Complex-valued spectrum-based speech enhancement was performed with the Deep Complex U-Net (DCUNET) model, and additive attention was applied using the proposed method on the basis of the Attention U-Net model. Experiments on noisy speech in a living room environment showed that the proposed method outperforms the baseline model on evaluation metrics such as Source to Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI), and that it consistently improves performance across various background noise environments and low Signal-to-Noise Ratio (SNR) conditions. These results demonstrate the effectiveness of the proposed speech enhancement system in improving the intelligibility and quality of noisy speech.

Modified AWSSDR method for frequency-dependent reverberation time estimation (주파수 대역별 잔향시간 추정을 위한 변형된 AWSSDR 방식)

  • Min Sik Kim;Hyung Soon Kim
    • Phonetics and Speech Sciences / v.15 no.4 / pp.91-100 / 2023
  • Reverberation time (T60) is a typical acoustic parameter that provides information about reverberation. Since the impacts of reverberation vary depending on the frequency bands even in the same space, frequency-dependent (FD) T60, which offers detailed insights into the acoustic environments, can be useful. However, most conventional blind T60 estimation methods, which estimate the T60 from speech signals, focus on fullband T60 estimation, and a few blind FDT60 estimation methods commonly show poor performance in the low-frequency bands. This paper introduces a modified approach based on Attentive pooling based Weighted Sum of Spectral Decay Rates (AWSSDR), previously proposed for blind T60 estimation, by extending its target from fullband T60 to FDT60. The experimental results show that the proposed method outperforms conventional blind FDT60 estimation methods on the acoustic characterization of environments (ACE) challenge evaluation dataset. Notably, it consistently exhibits excellent estimation performance in all frequency bands. This demonstrates that the mechanism of the AWSSDR method is valuable for blind FDT60 estimation because it reflects the FD variations in the impact of reverberation, aggregating information about FDT60 from the speech signal by processing the spectral decay rates associated with the physical properties of reverberation in each frequency band.
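
The link between a decay rate and T60 that underlies decay-rate-based estimators can be shown with a short worked example. This is not the AWSSDR method, which pools many spectral decay rates with learned attention weights. It is only a plain least-squares line fit on one band's energy envelope, with the function name `t60_from_energy` and the synthetic envelope chosen for illustration. If the level falls at a slope of s dB per second (s < 0), then T60 = -60 / s.

```python
import math


def t60_from_energy(energy, frame_rate):
    """Estimate T60 (seconds) from a decaying energy envelope of one band.

    energy:     positive energy values, one per frame.
    frame_rate: frames per second.
    """
    levels = [10.0 * math.log10(e) for e in energy]       # level in dB
    n = len(levels)
    times = [i / frame_rate for i in range(n)]
    t_mean = sum(times) / n
    l_mean = sum(levels) / n
    slope = (sum((t - t_mean) * (l - l_mean) for t, l in zip(times, levels))
             / sum((t - t_mean) ** 2 for t in times))      # dB per second
    if slope >= 0.0:
        raise ValueError("envelope does not decay")
    return -60.0 / slope


# Synthetic envelope decaying at 120 dB/s, sampled at 100 frames per second.
envelope = [10.0 ** (-12.0 * i / 100.0) for i in range(50)]
estimate = t60_from_energy(envelope, frame_rate=100.0)
```

Applying the same fit separately to each frequency band's envelope gives one value per band, which is the sense in which T60 becomes frequency dependent.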

Robust Speech Recognition Algorithm of Voice Activated Powered Wheelchair for Severely Disabled Person (중증 장애우용 음성구동 휠체어를 위한 강인한 음성인식 알고리즘)

  • Suk, Soo-Young;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea / v.26 no.6 / pp.250-258 / 2007
  • Current speech recognition technology has achieved high performance with the development of hardware devices; however, it is insufficient for some applications where high reliability is required, such as voice control of powered wheelchairs for disabled persons. For a system that aims to operate powered wheelchairs safely by voice in a real environment, non-voice inputs such as the user's coughing, breathing, and spark-like mechanical noise should be rejected, and the wheelchair system needs to recognize speech commands affected by disability, which have a specific pronunciation speed and frequency. In this paper, we propose a non-voice rejection method that performs voice/non-voice classification using both YIN-based fundamental frequency (F0) extraction and reliability in preprocessing. We adopted a multi-template dictionary and acoustic modeling based speaker adaptation to cope with the pronunciation variation of inarticulately uttered speech. In recognition tests conducted with data collected in a real environment, the proposed YIN-based fundamental frequency extraction showed a recall-precision rate of 95.1 %, better than the 62 % of the cepstrum-based method. A recognition test of the new system with the multi-template dictionary and MAP adaptation also showed a much higher accuracy of 99.5 %, compared with 78.6 % for the baseline system.
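
A compact sketch of YIN-style F0 extraction used as a voice/non-voice cue is given below. It follows the core published YIN steps (difference function, cumulative mean normalised difference, absolute threshold) but omits parabolic interpolation and the paper's reliability measure. The function name `yin_f0`, the search range, and the threshold of 0.15 are assumptions for illustration.

```python
import math
import random


def yin_f0(frame, fs, fmin=60.0, fmax=400.0, threshold=0.15):
    """Return (f0_hz or None, aperiodicity) for one frame of samples."""
    tau_min = int(fs / fmax)
    tau_max = int(fs / fmin)
    window = len(frame) - tau_max
    if window <= 0:
        raise ValueError("frame too short for the requested fmin")
    # Step 1: squared difference function d(tau).
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))
    # Step 2: cumulative mean normalised difference d'(tau).
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0.0 else 1.0
    # Step 3: first lag under the threshold, then walk down to the local minimum.
    tau = tau_min
    while tau <= tau_max:
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return fs / tau, cmnd[tau]
        tau += 1
    return None, min(cmnd[tau_min:])       # no periodic lag found: treat as non-voice


# A 100 Hz sine (periodic, voice-like) versus white noise (aperiodic).
fs = 8000
voiced = [math.sin(2.0 * math.pi * 100.0 * n / fs) for n in range(400)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(400)]
voiced_f0, voiced_ap = yin_f0(voiced, fs)
noise_f0, noise_ap = yin_f0(noise, fs)
```

A frame for which no lag falls under the threshold returns `None`, which a rejection stage could treat as non-voice. The second return value is a simple aperiodicity score that could serve as a confidence measure.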