Sources separation of passive sonar array signal using recurrent neural network-based deep neural network with 3-D tensor

Sangheon Lee; Dongku Jung; Jaesok Yu

doi:10.7776/ASK.2023.42.4.357

The Journal of the Acoustical Society of Korea (한국음향학회지)

Volume 42, Issue 4, Pages 357-363, 2023. pISSN 1225-4428, eISSN 2287-3775

The Acoustical Society of Korea (한국음향학회)

Development of a multi-channel passive sonar signal separation technique using a 3-D tensor and a recurrent neural network-based deep neural network

Sangheon Lee (Department of Robotics and Mechatronics Engineering, DGIST);
Dongku Jung (Department of Robotics and Mechatronics Engineering, DGIST);
Jaesok Yu (Department of Robotics and Mechatronics Engineering, DGIST)

Received : 2023.05.09
Accepted : 2023.07.18
Published : 2023.07.31

https://doi.org/10.7776/ASK.2023.42.4.357

Abstract

In underwater signal processing, separating individual signals from a mixture has long been a challenge because of the low quality of the received signals. The common approach, spectrogram analysis based on the Short-time Fourier transform, has been criticized for its difficult parameter optimization and its loss of phase information. We propose a Triple-path Recurrent Neural Network that builds on the Dual-path Recurrent Neural Network, which has performed well on long time-series signals, and extends it to handle three-dimensional tensors formed from multi-channel sensor input. The method divides the input signals into short chunks and assembles them into a 3-D tensor, so that it can model relationships within chunks, between chunks, and between channels, which enables learning of both local and global features. The proposed technique achieves improved Root Mean Square Error and Scale-Invariant Signal-to-Noise Ratio compared with the existing method.
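The chunking step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the chunk length, the overlap, or the padding rule, so the function name `segment_channels`, the parameters `chunk_len` and `hop`, and the zero-padding of the tail are all assumptions made for illustration (overlapping chunks follow the usual Dual-path RNN convention). The result is indexed as [channel][chunk][sample], which are the three axes a triple-path model would process in turn.

```python
def segment_channels(signals, chunk_len, hop):
    """Split each channel into overlapping chunks (hypothetical helper).

    signals: list of C equal-length lists of samples.
    Returns a nested list indexed as [channel][chunk][sample].
    """
    if not signals:
        raise ValueError("signals must contain at least one channel")
    if chunk_len <= 0 or hop <= 0:
        raise ValueError("chunk_len and hop must be positive")
    n_samples = len(signals[0])
    if any(len(ch) != n_samples for ch in signals):
        raise ValueError("all channels must have the same length")

    # Ceiling division gives the number of hops needed after the first chunk.
    extra = n_samples - chunk_len
    n_chunks = max(1, -(-extra // hop) + 1)
    padded_len = (n_chunks - 1) * hop + chunk_len

    tensor = []
    for ch in signals:
        # Assumption: the tail is zero-padded so the last chunk is full length.
        padded = list(ch) + [0.0] * (padded_len - n_samples)
        tensor.append(
            [padded[s * hop : s * hop + chunk_len] for s in range(n_chunks)]
        )
    return tensor


# Two channels of nine samples each, chunk length 4, 50 % overlap.
signals = [[float(t + 10 * c) for t in range(9)] for c in range(2)]
tensor = segment_channels(signals, chunk_len=4, hop=2)
print(len(tensor), len(tensor[0]), len(tensor[0][0]))  # prints "2 4 4"
```

In this layout, a pass along the last axis sees the samples inside one chunk (local features), a pass along the middle axis sees the same position across chunks (global features), and a pass along the first axis relates the sensor channels.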

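Techniques for separating individual signals from underwater recordings in which many signals are mixed have been studied for a long time, but the problem remains difficult because of the low quality of underwater signals. The method mainly used today applies the Short-time Fourier transform to obtain a spectrogram of the received acoustic signal and then separates the signals by analyzing their frequency characteristics. However, its parameters are difficult to optimize, and phase information is lost in the conversion to a spectrogram. To address these problems, this study proposes a Triple-path Recurrent Neural Network, a modification of the Dual-path Recurrent Neural Network (which has performed well on long time-series signals) that can process the three-dimensional tensor formed from multi-channel sensor input. The proposed technique first divides the multi-channel input signal into short chunks, builds a 3-D tensor that accounts for the relationships among samples within a chunk, among chunks, and among channel signals, and then learns local and global features. The proposed method was confirmed to achieve improved Root Mean Square Error and Scale Invariant Signal to Noise Ratio compared with the existing method.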
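The two evaluation metrics named in the abstract can be sketched as follows. The abstract gives no formulas, so this uses the definitions commonly adopted in time-domain separation work: RMSE as the root of the mean squared sample error, and SI-SNR as the ratio between the energy of the estimate's projection onto the zero-mean target and the energy of the residual. The function names `rmse` and `si_snr` and the stabilizing constant `eps` are illustrative assumptions, not the authors' implementation.

```python
import math


def rmse(estimate, target):
    """Root mean square error between two equal-length sample lists."""
    if len(estimate) != len(target) or not target:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((e - t) ** 2 for e, t in zip(estimate, target))
    return math.sqrt(squared / len(target))


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (common definition)."""
    if len(estimate) != len(target) or not target:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(target)
    # Remove the mean so a constant offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target; the remainder counts as noise.
    scale = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    projection = [scale * b for b in t]
    noise = [a - p for a, p in zip(e, projection)]
    signal_energy = sum(p * p for p in projection) + eps
    noise_energy = sum(q * q for q in noise) + eps
    return 10.0 * math.log10(signal_energy / noise_energy)


# Synthetic example: a sinusoid plus a small deterministic disturbance.
target = [math.sin(0.3 * k) for k in range(200)]
estimate = [s + 0.1 * math.cos(1.7 * k) for k, s in enumerate(target)]
print(rmse(estimate, target), si_snr(estimate, target))
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves SI-SNR essentially unchanged, whereas RMSE is sensitive to amplitude; reporting both therefore covers waveform shape and absolute error.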

Acknowledgement

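This work was supported by the Korea Research Institute for Defense Technology Planning and Advancement (KRIT) with funding from the Korean government (Defense Acquisition Program Administration) in 2023 (20-106-00-003).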

