• Title/Summary/Keyword: spectrogram (스펙트로그램)

Search Result 135, Processing Time 0.033 seconds

Preprocessing performance of convolutional neural networks according to characteristic of underwater targets (수중 표적 분류를 위한 합성곱 신경망의 전처리 성능 비교)

  • Kyung-Min, Park;Dooyoung, Kim
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.6
    • /
    • pp.629-636
    • /
    • 2022
  • We present a preprocessing method for an underwater target detection model based on a convolutional neural network. The acoustic characteristics of ships are expressed ambiguously because of the strong signal power at low frequencies. To solve this problem, we combine various feature scaling methods with spectrogram methods. We define a simple convolutional neural network model and train it to measure preprocessing performance. Through experiments, we found that combining the log Mel-spectrogram with standardization and robust scaling gave the best classification performance.
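
The two scaling methods named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: scaling per frequency band, the `1e-8` stabilizer, the function names, and the synthetic stand-in for a log Mel-spectrogram are all assumptions.

```python
import numpy as np

def standardize(features):
    """Scale each frequency band (row) to zero mean and unit variance."""
    mean = features.mean(axis=1, keepdims=True)
    std = features.std(axis=1, keepdims=True)
    return (features - mean) / (std + 1e-8)

def robust_scale(features):
    """Center each band on its median and divide by its interquartile range."""
    median = np.median(features, axis=1, keepdims=True)
    q1 = np.percentile(features, 25, axis=1, keepdims=True)
    q3 = np.percentile(features, 75, axis=1, keepdims=True)
    return (features - median) / (q3 - q1 + 1e-8)

rng = np.random.default_rng(0)
# Hypothetical stand-in for a log Mel-spectrogram: 40 bands x 100 frames,
# with a large offset in the low bands to mimic strong low-frequency power.
log_mel = rng.normal(size=(40, 100)) + np.linspace(20.0, 0.0, 40)[:, None]

scaled_std = standardize(log_mel)
scaled_robust = robust_scale(log_mel)
```

After either scaling, the low bands no longer dominate the value range, which is the effect the abstract attributes to feature scaling.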

Porcine Wasting Diseases Detection using Light Weight Deep Learning (경량 딥러닝 기반의 돼지 호흡기 질병 탐지)

  • Hong, Minki;Ahn, Hanse;Lee, Jonguk;Park, Daihee;Chung, Yongwha
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2020.11a
    • /
    • pp.964-966
    • /
    • 2020
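  • If highly contagious porcine respiratory diseases are not detected quickly and accurately, they spread not only within the affected pig house but also to other regions, causing serious economic losses. This paper proposes a system that can detect such diseases even on a low-cost embedded board. The system automatically detects abnormal pig sounds from a sound sensor installed in the pig house and converts the detected sound signal into a spectrogram. Finally, the spectrogram is fed to a deep learning algorithm that detects and identifies porcine respiratory diseases. To run in an embedded environment, which costs less than a general computing environment, the lightweight deep learning model MnasNet was used. Evaluating the system's respiratory disease identification performance on the NVIDIA TX-2 embedded board confirmed high detection performance and real-time detection capability.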
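
The step of converting a sound signal into a spectrogram can be sketched with a short-time Fourier transform. This is an illustrative sketch only: the frame length, hop size, Hann window, sample rate, and test tone are assumptions, and the abstract does not state which parameters the authors used.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Return a log-power spectrogram of shape (freq_bins, n_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power.T + 1e-10)

sample_rate = 8000                         # assumed sampling rate in Hz
t = np.arange(sample_rate) / sample_rate   # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)        # synthetic 1 kHz test tone
spec = spectrogram(tone)

# The frequency bin with the most energy should match the tone frequency.
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 256
```

The resulting 2D array can be treated as an image and passed to an image classification network.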

Passive sonar signal classification using graph neural network based on image patch (영상 패치 기반 그래프 신경망을 이용한 수동소나 신호분류)

  • Guhn Hyeok Ko;Kibae Lee;Chong Hyun Lee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.43 no.2
    • /
    • pp.234-242
    • /
    • 2024
  • We propose a passive sonar signal classification algorithm using a Graph Neural Network (GNN). The proposed algorithm segments spectrograms into image patches and builds graphs by connecting adjacent image patches. A Graph Convolutional Network (GCN) is then trained on these graphs to classify signals. In experiments with publicly available underwater acoustic data, the proposed algorithm represents the line frequency features of spectrograms in graph form and achieves a classification accuracy of 92.50 %. This is 8.15 % higher than the classification accuracy of a conventional Convolutional Neural Network (CNN).
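
The patch-and-graph construction described here can be sketched as below. It is a hedged illustration rather than the paper's method: non-overlapping patches, 4-neighbor (up, down, left, right) adjacency, flattened patches as node features, and the function name are all assumptions.

```python
import numpy as np

def spectrogram_to_patch_graph(spec, patch_h, patch_w):
    """Split a spectrogram into patches and connect grid-adjacent patches.

    Returns node features (one flattened patch per row) and a symmetric
    adjacency matrix over the patch grid.
    """
    rows, cols = spec.shape[0] // patch_h, spec.shape[1] // patch_w
    cropped = spec[: rows * patch_h, : cols * patch_w]
    patches = cropped.reshape(rows, patch_h, cols, patch_w).transpose(0, 2, 1, 3)
    nodes = patches.reshape(rows * cols, patch_h * patch_w)

    adjacency = np.zeros((rows * cols, rows * cols), dtype=np.float32)
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:   # edge to the patch on the right
                adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0
            if r + 1 < rows:   # edge to the patch below
                adjacency[i, i + cols] = adjacency[i + cols, i] = 1.0
    return nodes, adjacency

# Synthetic 64 x 96 array standing in for a spectrogram.
spec = np.arange(64 * 96, dtype=np.float32).reshape(64, 96)
nodes, adjacency = spectrogram_to_patch_graph(spec, 16, 16)
```

The node features and adjacency matrix are the usual inputs to a graph convolution layer. A narrowband line in the spectrogram then appears as a chain of connected patches along the time axis.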

A Study on the English Pronunciation for English-related Industry (교육산업 활성화를 위한 영어발음 연구)

  • Park, Hee-Suk
    • Journal of Convergence for Information Technology
    • /
    • v.8 no.1
    • /
    • pp.37-42
    • /
    • 2018
  • This study investigates and compares the lengths of five words, the lengths of their vowels, and the ratio of vowel length to word length between Korean college students and a native English speaker. For the experiment, English sentences were read and recorded by Korean subjects. Vowel lengths were measured from sound spectrograms using the Praat software program, and the data were analyzed statistically. The differences between the groups were significant. For the English low front vowel /æ/, native subjects pronounced the vowel differently from Korean subjects, and the differences were significant. However, for the English diphthong /ai/, native subjects produced a significantly shorter vowel than Korean subjects.

Principal component analysis based frequency-time feature extraction for seismic wave classification (지진파 분류를 위한 주성분 기반 주파수-시간 특징 추출)

  • Min, Jeongki;Kim, Gwantea;Ku, Bonhwa;Lee, Jimin;Ahn, Jaekwang;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.38 no.6
    • /
    • pp.687-696
    • /
    • 2019
  • Conventional features for seismic classification focus on strong seismic waves and are not suitable for classifying micro-seismic waves. We propose a feature extraction method based on histograms and Principal Component Analysis (PCA) in frequency-time space that is suitable for classifying strong, micro, and artificial seismic waves as well as noise. The proposed method concatenates frequency and time information into histogram- and PCA-based features for three binary classification tasks: strong-micro-artificial/noise, micro/noise, and micro/artificial seismic waves. Using recent earthquake data from 2017 to 2018, we demonstrate the effectiveness of the proposed feature extraction method by comparing it with existing methods.

Comparative study of data augmentation methods for fake audio detection (음성위조 탐지에 있어서 데이터 증강 기법의 성능에 관한 비교 연구)

  • KwanYeol Park;Il-Youp Kwak
    • The Korean Journal of Applied Statistics
    • /
    • v.36 no.2
    • /
    • pp.101-114
    • /
    • 2023
  • Data augmentation is an effective way to reduce model overfitting because it lets the training dataset be viewed from various perspectives. In addition to image augmentation techniques such as rotation, cropping, horizontal flip, and vertical flip, occlusion-based data augmentation methods such as Cutmix and Cutout have been proposed. For models based on speech data, an occlusion-based augmentation technique can be applied after converting the 1D speech signal into a 2D spectrogram. In particular, SpecAugment is an occlusion-based augmentation technique for speech spectrograms. In this study, we compare data augmentation techniques that can be used for fake audio detection. Using data from the ASVspoof2017 and ASVspoof2019 fake audio detection competitions, we trained an LCNN model on datasets augmented with the occlusion-based methods Cutout, Cutmix, and SpecAugment. All three augmentation techniques generally improved the performance of the model. Cutmix performed best on ASVspoof2017, Mixup on ASVspoof2019 LA, and SpecAugment on ASVspoof2019 PA. In addition, increasing the number of masks for SpecAugment helps to improve performance. In conclusion, the appropriate augmentation technique differs depending on the situation and the data.
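
Occlusion-based augmentation of a spectrogram can be sketched as frequency and time masking in the style of SpecAugment. This is a minimal sketch under stated assumptions: it covers only the masking part (no time warping), fills masked regions with zeros, and uses a hypothetical function name and parameters that are not taken from the paper.

```python
import numpy as np

def mask_spectrogram(spec, n_freq_masks, n_time_masks, max_width, rng):
    """Return a copy of spec with random frequency and time bands zeroed."""
    out = spec.copy()
    n_freq, n_time = out.shape
    for _ in range(n_freq_masks):
        width = int(rng.integers(0, max_width + 1))       # mask height in bins
        start = int(rng.integers(0, n_freq - width + 1))
        out[start : start + width, :] = 0.0               # occlude a frequency band
    for _ in range(n_time_masks):
        width = int(rng.integers(0, max_width + 1))       # mask length in frames
        start = int(rng.integers(0, n_time - width + 1))
        out[:, start : start + width] = 0.0               # occlude a time span
    return out

rng = np.random.default_rng(0)
spec = rng.random((80, 200)) + 1.0   # synthetic spectrogram, all values nonzero
augmented = mask_spectrogram(spec, n_freq_masks=2, n_time_masks=2,
                             max_width=10, rng=rng)
```

Raising `n_freq_masks` and `n_time_masks` corresponds to the "number of masks" that the abstract reports as helpful.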

A Study on the Foreign Accent of English Stressed Syllables (영어강세음절의 외국인어투에 관한 연구)

  • Park, Hee-Suk
    • Journal of Convergence Society for SMB
    • /
    • v.6 no.4
    • /
    • pp.51-57
    • /
    • 2016
  • This study investigates and compares the lengths of eight stressed-syllable vowels between Korean college students and native English speakers. To do this, English sentences were uttered and recorded by twenty Korean subjects. Acoustic features were measured from sound spectrograms with the Praat software program and analyzed statistically. The results showed that the differences in the lengths of stressed vowels in the first syllable were significant. In particular, for the English low front vowel /æ/, native subjects produced a significantly longer vowel than Korean subjects, a result that could be used as teaching material in pronunciation classes.

Comparison of environmental sound classification performance of convolutional neural networks according to audio preprocessing methods (오디오 전처리 방법에 따른 콘벌루션 신경망의 환경음 분류 성능 비교)

  • Oh, Wongeun
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.3
    • /
    • pp.143-149
    • /
    • 2020
  • This paper presents the effect of the feature extraction methods used in audio preprocessing on the classification performance of Convolutional Neural Networks (CNN). We extract the mel spectrogram, log mel spectrogram, Mel Frequency Cepstral Coefficient (MFCC), and delta MFCC from the UrbanSound8K dataset, which is widely used in environmental sound classification studies. We then scale the data to 3 distributions. Using the data, we test four CNNs, VGG16, and MobileNetV2 networks to assess performance according to the audio features and scaling. The highest recognition rate is achieved when using the unscaled log mel spectrum as the audio feature. Although this result does not hold for all audio recognition problems, it is useful for classifying the environmental sounds included in UrbanSound8K.

A High Speed Data Acquisition System using FPGA for Filter Bank System in Radio Telescope (FPGA를 이용한 전파망원경 필터뱅크의 고속 데이터 취득시스템 개발)

  • 위석오;이창훈;김효령;김광동
    • Proceedings of the IEEK Conference
    • /
    • 2003.07c
    • /
    • pp.2681-2684
    • /
    • 2003
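  • This study concerns high-speed data acquisition for a filter bank, a device used to obtain spectrograms in radio astronomy. We designed an FPGA-based data acquisition system. Replacing the data I/O, previously built from monolithic ICs, with an FPGA reduced the size of the system and enabled high-speed data processing. When observing cosmic phenomena, high-speed data processing makes it possible to divide the data in time so that poor data caused by unstable atmospheric conditions or system instability can be accurately selected and removed. Applying the system developed in this paper made data processing about 15 times faster than with the existing system.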


The Pitch detection of 3 Level Clipping Algorithm using by Pre-Post Processing (전.후 처리를 이용한 3 레벨 클리핑 알고리즘의 피치검출)

  • 최승영
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.06c
    • /
    • pp.167-170
    • /
    • 1998
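  • Among the algorithms for detecting pitch, a characteristic component of speech signals, we implemented the three-level clipping algorithm, which is easy to run in real time, for processing on a PC. To improve the stability and accuracy of the detected pitch, the algorithm was combined with preprocessing (a window function, LPF, clipped autocorrelation computation, nonlinear attenuation, and so on) and with post-processing (detection and correction of pitch multiples, and median filtering). The pitch detection results are shown on the output screen of Visual Analysis Tool for sounds (VAT), a program that uses this algorithm to analyze speech on a PC without the aid of a DSP and outputs the spectrogram, waveform, energy, pitch, and other features.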
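
The core of three-level clipping pitch detection can be sketched as follows. This covers only the clipping and autocorrelation steps, not the windowing, LPF, nonlinear attenuation, or post-processing mentioned above. The clipping ratio of 0.6, the 60 to 400 Hz search range, the function names, and the synthetic test frame are assumptions.

```python
import numpy as np

def three_level_clip(frame, ratio=0.6):
    """Map samples to +1, 0, or -1 using a threshold relative to the peak."""
    threshold = ratio * np.max(np.abs(frame))
    clipped = np.zeros(len(frame))
    clipped[frame > threshold] = 1.0
    clipped[frame < -threshold] = -1.0
    return clipped

def detect_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate pitch in Hz from the autocorrelation peak of the clipped frame."""
    clipped = three_level_clip(frame)
    # Keep non-negative lags only; index k holds the autocorrelation at lag k.
    autocorr = np.correlate(clipped, clipped, mode="full")[len(clipped) - 1:]
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag = lag_min + int(np.argmax(autocorr[lag_min : lag_max + 1]))
    return sample_rate / best_lag

sample_rate = 8000
t = np.arange(400) / sample_rate   # 50 ms analysis frame
# Synthetic voiced frame: 100 Hz fundamental plus its second harmonic.
frame = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
pitch_hz = detect_pitch(frame, sample_rate)
```

Because the clipped signal takes only three values, its autocorrelation needs no multiplications in a hardware or fixed-point implementation, which is one reason the method suits real-time use.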
