• Title/Summary/Keyword: 음향 신호 생성

Search Result 136, Processing Time 0.021 seconds

I-vector similarity based speech segmentation for interested speaker to speaker diarization system (화자 구분 시스템의 관심 화자 추출을 위한 i-vector 유사도 기반의 음성 분할 기법)

  • Bae, Ara;Yoon, Ki-mu;Jung, Jaehee;Chung, Bokyung;Kim, Wooil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.5
    • /
    • pp.461-467
    • /
    • 2020
  • In noisy and multi-speaker environments, the performance of speech recognition is unavoidably lower than in a clean environment. To improve speech recognition, in this paper, the signal of the speaker of interest is extracted from the mixed speech signals with multiple speakers. The VoiceFilter model is used to effectively separate overlapped speech signals. In this work, clustering by Probabilistic Linear Discriminant Analysis (PLDA) similarity score was employed to detect the speech signal of the interested speaker, which is used as the reference speaker to VoiceFilter-based separation. Therefore, by utilizing the speaker feature extracted from the detected speech by the proposed clustering method, this paper propose a speaker diarization system using only the mixed speech without an explicit reference speaker signal. We use phone-dataset consisting of two speakers to evaluate the performance of the speaker diarization system. Source to Distortion Ratio (SDR) of the operator (Rx) speech and customer speech (Tx) are 5.22 dB and -5.22 dB respectively before separation, and the results of the proposed separation system show 11.26 dB and 8.53 dB respectively.

Comprehensive analysis of deep learning-based target classifiers in small and imbalanced active sonar datasets (소량 및 불균형 능동소나 데이터세트에 대한 딥러닝 기반 표적식별기의 종합적인 분석)

  • Geunhwan Kim;Youngsang Hwang;Sungjin Shin;Juho Kim;Soobok Hwang;Youngmin Choo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.4
    • /
    • pp.329-344
    • /
    • 2023
  • In this study, we comprehensively analyze the generalization performance of various deep learning-based active sonar target classifiers when applied to small and imbalanced active sonar datasets. To generate the active sonar datasets, we use data from two different oceanic experiments conducted at different times and ocean. Each sample in the active sonar datasets is a time-frequency domain image, which is extracted from audio signal of contact after the detection process. For the comprehensive analysis, we utilize 22 Convolutional Neural Networks (CNN) models. Two datasets are used as train/validation datasets and test datasets, alternatively. To calculate the variance in the output of the target classifiers, the train/validation/test datasets are repeated 10 times. Hyperparameters for training are optimized using Bayesian optimization. The results demonstrate that shallow CNN models show superior robustness and generalization performance compared to most of deep CNN models. The results from this paper can serve as a valuable reference for future research directions in deep learning-based active sonar target classification.

Implementation of Parallel Processor for Sound Synthesis of Guitar (기타의 음 합성을 위한 병렬 프로세서 구현)

  • Choi, Ji-Won;Kim, Yong-Min;Cho, Sang-Jin;Kim, Jong-Myon;Chong, Ui-Pil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.3
    • /
    • pp.191-199
    • /
    • 2010
  • Physical modeling is a synthesis method of high quality sound which is similar to real sound for musical instruments. However, since physical modeling requires a lot of parameters to synthesize sound of a musical instrument, it prevents real-time processing for the musical instrument which supports a large number of sounds simultaneously. To solve this problem, this paper proposes a single instruction multiple data (SIMD) parallel processor that supports real-time processing of sound synthesis of guitar, a representative plucked string musical instrument. To control six strings of guitar, we used a SIMD parallel processor which consists of six processing elements (PEs). Each PE supports modeling of the corresponding string. The proposed SIMD processor can generate synthesized sounds of six strings simultaneously when a parallel synthesis algorithm receives excitation signals and parameters of each string as an input. Experimental results using a sampling rate 44.1 kHz and 16 bits quantization indicate that synthesis sounds using the proposed parallel processor were very similar to original sound. In addition, the proposed parallel processor outperforms commercial TI's TMS320C6416 in terms of execution time (8.9x better) and energy efficiency (39.8x better).

Implementation of Non-Stringed Guitar Based on Physical Modeling Synthesis (물리적 모델링 합성법에 기반을 둔 줄 없는 기타 구현)

  • Kang, Myeong-Su;Cho, Sang-Jin;Chong, Ui-Pil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.2
    • /
    • pp.119-126
    • /
    • 2009
  • This paper describes the non-stringed guitar composed of laser strings, frets, sound synthesis algorithm and a processor. The laser strings that can depict stroke and playing arpeggios comprise laser modules and photo diodes. Frets are implemented by voltage divider. The guitar body does not need to implement physically because commuted waveguide synthesis is used. The proposed frets enable; players to represent all of chords by the chord glove as well as guitar solo. Sliding, hammering-on and pulling-off sounds are synthesized by using parameters from the voltage divider. Because the pitch shifting corresponds to the time-varying propagation speed in the digital waveguide model, the proposed model can synthesize vibrato as well. After transformation of signals from the laser strings and frets into parameters for synthesis algorithm, the digital signal processor, TMS320F2812, performs the real-time synthesis algorithm and communicates with the DAC. The demonstration movieclip available via the Internet shows one to play a song, 'Arirang', synthesized by proposed algorithm and interfaces in real-time. Consequently, we can conclude that the proposed synthesis algorithm is efficient in guitar solo and there is no problem to play the non-stringed guitar in real-time.

A study on DEMONgram frequency line extraction method using deep learning (딥러닝을 이용한 DEMON 그램 주파수선 추출 기법 연구)

  • Wonsik Shin;Hyuckjong Kwon;Hoseok Sul;Won Shin;Hyunsuk Ko;Taek-Lyul Song;Da-Sol Kim;Kang-Hoon Choi;Jee Woong Choi
    • The Journal of the Acoustical Society of Korea
    • /
    • v.43 no.1
    • /
    • pp.78-88
    • /
    • 2024
  • Ship-radiated noise received by passive sonar that can measure underwater noise can be identified and classified ship using Detection of Envelope Modulation on Noise (DEMON) analysis. However, in a low Signal-to-Noise Ratio (SNR) environment, it is difficult to analyze and identify the target frequency line containing ship information in the DEMONgram. In this paper, we conducted a study to extract target frequency lines using semantic segmentation among deep learning techniques for more accurate target identification in a low SNR environment. The semantic segmentation models U-Net, UNet++, and DeepLabv3+ were trained and evaluated using simulated DEMONgram data generated by changing SNR and fundamental frequency, and the DEMONgram prediction performance of DeepShip, a dataset of ship-radiated noise recordings on the strait of Georgia in Canada, was compared using the trained models. As a result of evaluating the trained model with the simulated DEMONgram, it was confirmed that U-Net had the highest performance and that it was possible to extract the target frequency line of the DEMONgram made by DeepShip to some extent.

A Study on Performance Evaluation of Hidden Markov Network Speech Recognition System (Hidden Markov Network 음성인식 시스템의 성능평가에 관한 연구)

  • 오세진;김광동;노덕규;위석오;송민규;정현열
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.4 no.4
    • /
    • pp.30-39
    • /
    • 2003
  • In this paper, we carried out the performance evaluation of HM-Net(Hidden Markov Network) speech recognition system for Korean speech databases. We adopted to construct acoustic models using the HM-Nets modified by HMMs(Hidden Markov Models), which are widely used as the statistical modeling methods. HM-Nets are carried out the state splitting for contextual and temporal domain by PDT-SSS(Phonetic Decision Tree-based Successive State Splitting) algorithm, which is modified the original SSS algorithm. Especially it adopted the phonetic decision tree to effectively express the context information not appear in training speech data on contextual domain state splitting. In case of temporal domain state splitting, to effectively represent information of each phoneme maintenance in the state splitting is carried out, and then the optimal model network of triphone types are constructed by in the parameter. Speech recognition was performed using the one-pass Viterbi beam search algorithm with phone-pair/word-pair grammar for phoneme/word recognition, respectively and using the multi-pass search algorithm with n-gram language models for sentence recognition. The tree-structured lexicon was used in order to decrease the number of nodes by sharing the same prefixes among words. In this paper, the performance evaluation of HM-Net speech recognition system is carried out for various recognition conditions. Through the experiments, we verified that it has very superior recognition performance compared with the previous introduced recognition system.

  • PDF