Search | Korea Science

Preprocessing performance of convolutional neural networks according to characteristic of underwater targets (수중 표적 분류를 위한 합성곱 신경망의 전처리 성능 비교)

Kyung-Min, Park;Dooyoung, Kim
- The Journal of the Acoustical Society of Korea / v.41 no.6 / pp.629-636 / 2022
We present a preprocessing method for an underwater target detection model based on a convolutional neural network. The acoustic characteristics of ships are expressed ambiguously because of the strong signal power at low frequencies. To solve this problem, we combine various feature scaling methods with spectrogram methods. We define a simple convolutional neural network model and train it to measure the performance of each preprocessing combination. Through experiments, we found that combining the log Mel-spectrogram with standardization and robust scaling gave the best classification performance.
https://doi.org/10.7776/ASK.2022.41.6.629 (KSCI)
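The two scaling methods named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the array `log_mel` is a hypothetical random stand-in for a log Mel-spectrogram, the function names are made up for this sketch, and scaling over the whole array (rather than per frequency bin) is an assumption. The abstract also does not say whether the two scalers were applied jointly or as alternatives, so they are shown separately.

```python
import numpy as np

def standardize(x):
    """Zero-mean, unit-variance scaling over the whole array."""
    return (x - x.mean()) / (x.std() + 1e-8)

def robust_scale(x):
    """Center on the median and divide by the interquartile range."""
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return (x - med) / (q3 - q1 + 1e-8)

# Hypothetical stand-in for a log Mel-spectrogram, shape (n_mels, n_frames), in dB.
rng = np.random.default_rng(0)
log_mel = rng.normal(loc=-40.0, scale=12.0, size=(64, 128))

z = standardize(log_mel)   # mean ~0, standard deviation ~1
r = robust_scale(log_mel)  # median ~0, interquartile range ~1
```

Robust scaling relies on the median and quartiles, so a few very strong low-frequency bins affect it less than they affect the mean and standard deviation used by standardization.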

Porcine Wasting Diseases Detection using Light Weight Deep Learning (경량 딥러닝 기반의 돼지 호흡기 질병 탐지)

Hong, Minki;Ahn, Hanse;Lee, Jonguk;Park, Daihee;Chung, Yongwha
- Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.964-966 / 2020
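If highly contagious porcine respiratory diseases are not detected quickly and accurately, they spread not only within the affected pig house but also to other regions, causing serious economic losses. This paper proposes a system that can detect such porcine respiratory diseases even on a low-cost embedded board. The system automatically detects abnormal pig sounds from a sound sensor installed in the pig house and then converts the detected sound signals into spectrograms. Finally, the spectrograms are fed to a deep learning algorithm that detects and identifies porcine respiratory diseases. To run in an embedded environment, which costs less than a general computing environment, we used MnasNet, a lightweight deep learning model. Evaluating the system's respiratory disease identification performance on the NVIDIA TX-2 embedded board confirmed both high detection performance and the feasibility of real-time detection.
https://doi.org/10.3745/PKIPS.y2020m11a.964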

Passive sonar signal classification using graph neural network based on image patch (영상 패치 기반 그래프 신경망을 이용한 수동소나 신호분류)

Guhn Hyeok Ko;Kibae Lee;Chong Hyun Lee
- The Journal of the Acoustical Society of Korea / v.43 no.2 / pp.234-242 / 2024
We propose a passive sonar signal classification algorithm using a Graph Neural Network (GNN). The proposed algorithm segments spectrograms into image patches and builds graphs by connecting adjacent image patches. A Graph Convolutional Network (GCN) is then trained on these graphs to classify signals. In experiments with publicly available underwater acoustic data, the proposed algorithm represents the line frequency features of spectrograms in graph form and achieves a classification accuracy of 92.50 %. This is an 8.15 % higher classification accuracy than that of a conventional Convolutional Neural Network (CNN).
https://doi.org/10.7776/ASK.2024.43.2.234
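The patch-to-graph step described above might look roughly like the sketch below. It assumes non-overlapping square patches and a 4-neighbour (left/right/up/down) notion of adjacency. The abstract specifies neither, and `patch_graph` is a hypothetical helper name, not something from the paper.

```python
import numpy as np

def patch_graph(spec, patch):
    """Split a spectrogram into non-overlapping square patches and link
    each patch to its horizontal and vertical neighbours."""
    rows, cols = spec.shape[0] // patch, spec.shape[1] // patch
    # Node features: one flattened patch per node, in row-major patch order.
    cropped = spec[:rows * patch, :cols * patch]
    nodes = (cropped.reshape(rows, patch, cols, patch)
                    .transpose(0, 2, 1, 3)
                    .reshape(rows * cols, patch * patch))
    # Symmetric adjacency matrix over the patch grid.
    adj = np.zeros((rows * cols, rows * cols), dtype=np.float32)
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:   # neighbour to the right
                adj[i, i + 1] = adj[i + 1, i] = 1.0
            if r + 1 < rows:   # neighbour below
                adj[i, i + cols] = adj[i + cols, i] = 1.0
    return nodes, adj

# Toy 8x12 "spectrogram" split into a 2x3 grid of 4x4 patches.
spec = np.arange(8 * 12, dtype=np.float32).reshape(8, 12)
nodes, adj = patch_graph(spec, patch=4)
```

The `nodes` matrix and `adj` matrix are the two inputs a graph convolution layer typically consumes. Which GCN variant and node features the paper uses is not stated in the abstract.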

A Study on the English Pronunciation for English-related Industry (교육산업 활성화를 위한 영어발음 연구)

Park, Hee-Suk
- Journal of Convergence for Information Technology / v.8 no.1 / pp.37-42 / 2018
This study investigates and compares the lengths of five words, the lengths of their vowels, and the ratio of vowel length to word length between Korean college students and a native English speaker. For the experiment, English sentences were read and recorded by Korean subjects. The vowel lengths were measured from sound spectrograms using the Praat software program, and the data were analyzed statistically. I found that the differences between the groups were significant. For the English low front vowel /æ/, native subjects pronounced it differently from Korean subjects, and the differences were significant. For the English diphthong /ai/, however, native subjects pronounced it significantly shorter than Korean subjects.
https://doi.org/10.22156/CS4SMB.2018.8.1.037 (KSCI)

Principal component analysis based frequency-time feature extraction for seismic wave classification (지진파 분류를 위한 주성분 기반 주파수-시간 특징 추출)

Min, Jeongki;Kim, Gwantea;Ku, Bonhwa;Lee, Jimin;Ahn, Jaekwang;Ko, Hanseok
- The Journal of the Acoustical Society of Korea / v.38 no.6 / pp.687-696 / 2019
Conventional features for seismic classification focus on strong seismic waves and are not suitable for classifying micro-seismic waves. We propose a feature extraction method based on histograms and Principal Component Analysis (PCA) in frequency-time space that is suitable for classifying strong, micro, and artificial seismic waves, as well as noise. The proposed method concatenates frequency and time information into histogram- and PCA-based features for three binary classification tasks: strong-micro-artificial/noise, micro/noise, and micro/artificial seismic waves. Using earthquake data from 2017 to 2018, we demonstrate the effectiveness of the proposed feature extraction method by comparing it with existing methods.
https://doi.org/10.7776/ASK.2019.38.6.687 (KSCI)

Comparative study of data augmentation methods for fake audio detection (음성위조 탐지에 있어서 데이터 증강 기법의 성능에 관한 비교 연구)

KwanYeol Park;Il-Youp Kwak
- The Korean Journal of Applied Statistics / v.36 no.2 / pp.101-114 / 2023
Data augmentation is an effective remedy for model overfitting because it lets the training dataset be viewed from various perspectives. In addition to image augmentation techniques such as rotation, cropping, horizontal flip, and vertical flip, occlusion-based data augmentation methods such as Cutmix and Cutout have been proposed. For models based on speech data, an occlusion-based augmentation technique can be used after converting the 1D speech signal into a 2D spectrogram. In particular, SpecAugment is an occlusion-based augmentation technique for speech spectrograms. In this study, we compare data augmentation techniques that can be used for fake audio detection. Using data from the ASVspoof2017 and ASVspoof2019 competitions, which were held to detect fake audio, we trained an LCNN model on datasets augmented with the occlusion-based methods Cutout, Cutmix, and SpecAugment. All three augmentation techniques generally improved the performance of the model. Cutmix performed best on ASVspoof2017, Mixup on ASVspoof2019 LA, and SpecAugment on ASVspoof2019 PA. In addition, increasing the number of masks for SpecAugment helps to improve performance. In conclusion, the appropriate augmentation technique differs depending on the situation and the data.
https://doi.org/10.5351/KJAS.2023.36.2.101
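To make the occlusion idea concrete, here is a minimal sketch of SpecAugment-style masking on a spectrogram. It covers only frequency and time masking (no time warping) and fills masked bands with zeros. The helper name `spec_mask`, the mask widths, and the toy input are assumptions for illustration, not the settings used in the study.

```python
import numpy as np

def spec_mask(spec, n_masks, max_width, axis, rng):
    """Zero out `n_masks` random bands along `axis` (0 = frequency, 1 = time)."""
    out = spec.copy()
    size = out.shape[axis]
    for _ in range(n_masks):
        width = int(rng.integers(0, max_width + 1))     # band width in [0, max_width]
        start = int(rng.integers(0, size - width + 1))  # band fits inside the axis
        index = [slice(None)] * out.ndim
        index[axis] = slice(start, start + width)
        out[tuple(index)] = 0.0
    return out

rng = np.random.default_rng(0)
spec = np.ones((40, 100), dtype=np.float32)  # toy (n_freq, n_frames) spectrogram
augmented = spec_mask(spec, n_masks=2, max_width=8, axis=0, rng=rng)        # frequency masks
augmented = spec_mask(augmented, n_masks=2, max_width=20, axis=1, rng=rng)  # time masks
```

Raising `n_masks` is the knob that corresponds to the abstract's remark about increasing the number of masks.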

Spontaneous Speech Emotion Recognition Based On Spectrogram With Convolutional Neural Network (CNN 기반 스펙트로그램을 이용한 자유발화 음성감정인식)

Guiyoung Son;Soonil Kwon
- The Transactions of the Korea Information Processing Society / v.13 no.6 / pp.284-290 / 2024
Speech emotion recognition (SER) analyzes a speaker's voice patterns, including vibration, intensity, and tone, to determine the speaker's emotional state. Interest in artificial intelligence (AI) techniques has grown, and they are now widely used in medicine, education, industry, and the military. Most existing studies, however, have attained strong results using acted speech recorded by skilled actors in controlled environments. Acted and spontaneous speech are mismatched, since acted speech contains more explicit emotional expression than spontaneous speech. For this reason, spontaneous speech emotion recognition remains a challenging task. This paper aims to perform emotion recognition on spontaneous speech data and improve its performance. To this end, we implement deep learning-based speech emotion recognition using the VGG (Visual Geometry Group) network after converting 1-dimensional audio signals into 2-dimensional spectrogram images. The experimental evaluations are performed on the Korean spontaneous emotional speech database from AI-Hub, which covers 7 emotions: joy, love, anger, fear, sadness, surprise, and neutral. Using a 2-dimensional time-frequency spectrogram, we achieved average accuracies of 83.5% for adults and 73.0% for young people. Our findings show that the proposed framework outperformed current state-of-the-art techniques for spontaneous speech and performed promisingly despite the difficulty of quantifying emotional expression in spontaneous speech.
https://doi.org/10.3745/TKIPS.2024.13.6.284

A Study on the Foreign Accent of English Stressed Syllables (영어강세음절의 외국인어투에 관한 연구)

Park, Hee-Suk
- Journal of Convergence Society for SMB / v.6 no.4 / pp.51-57 / 2016
This study investigates and compares the lengths of eight stressed-syllable vowels between Korean college students and native English speakers. To do this, English sentences were uttered and recorded by twenty Korean subjects. Acoustic features were measured from sound spectrograms with the Praat software program and analyzed statistically. The results of the experiment showed that the differences in the lengths of the stressed vowels in first syllables were significant. In particular, native subjects pronounced the English low front vowel /æ/ significantly longer than Korean subjects, and this result could be used as teaching material in pronunciation classes.
https://doi.org/10.22156/CS4SMB.2016.6.4.051

Comparison of environmental sound classification performance of convolutional neural networks according to audio preprocessing methods (오디오 전처리 방법에 따른 콘벌루션 신경망의 환경음 분류 성능 비교)

Oh, Wongeun
- The Journal of the Acoustical Society of Korea / v.39 no.3 / pp.143-149 / 2020
This paper presents the effect of the feature extraction methods used in audio preprocessing on the classification performance of Convolutional Neural Networks (CNNs). We extract the mel spectrogram, log mel spectrogram, Mel Frequency Cepstral Coefficients (MFCC), and delta MFCC from the UrbanSound8K dataset, which is widely used in environmental sound classification studies. We then scale the data to 3 distributions. Using the data, we test four CNNs and the VGG16 and MobileNetV2 networks to assess performance according to the audio features and scaling. The highest recognition rate is achieved when the unscaled log mel spectrum is used as the audio feature. Although this result may not apply to all audio recognition problems, it is useful for classifying the environmental sounds included in UrbanSound8K.
https://doi.org/10.7776/ASK.2020.39.3.143 (KSCI)
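Of the four features listed above, delta MFCC is the one derived from another feature, so a short sketch may help. The code below uses a common regression-based delta formula over a window of ±`n` frames with edge padding. The paper's exact delta computation is not given in the abstract, so the formula choice, the window size, and the toy `mfcc` array are all assumptions.

```python
import numpy as np

def delta(features, n=2):
    """Regression-based delta over time for a (n_coeffs, n_frames) array."""
    # Repeat the first and last frames so every frame has n neighbours per side.
    padded = np.pad(features, ((0, 0), (n, n)), mode="edge")
    frames = features.shape[1]
    num = sum(
        k * (padded[:, n + k:n + k + frames] - padded[:, n - k:n - k + frames])
        for k in range(1, n + 1)
    )
    return num / (2 * sum(k * k for k in range(1, n + 1)))

# Toy "MFCC" matrix: 13 coefficients that each rise by 1 per frame.
mfcc = np.tile(np.arange(10, dtype=np.float64), (13, 1))
d = delta(mfcc)  # slope is 1 away from the padded edges
```

Stacking `mfcc` and `d` along the coefficient axis gives the kind of combined input a CNN could consume as a two-part feature map.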

A High Speed Data Acquisition System using FPGA for Filter Bank System in Radio Telescope (FPGA를 이용한 전파망원경 필터뱅크의 고속 데이터 취득시스템 개발)

위석오;이창훈;김효령;김광동
- Proceedings of the IEEK Conference / 2003.07c / pp.2681-2684 / 2003
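This study concerns high-speed data acquisition for a filter bank, a device used to obtain spectrograms in radio astronomy. We designed the data acquisition system around an FPGA. Replacing the data I/O, previously built from monolithic ICs, with an FPGA reduces the system's size and enables high-speed data processing. When observing cosmic phenomena, high-speed data processing allows the data to be divided finely enough in time that bad data caused by unstable atmospheric conditions or system instability can be accurately selected and removed. With the system developed in this paper, data can be processed about 15 times faster than with the existing system.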
