• Title/Summary/Keyword: Sound classification


Classification of Whale Sounds using LPC and Neural Networks (신경망과 LPC 계수를 이용한 고래 소리의 분류)

  • An, Woo-Jin;Lee, Eung-Jae;Kim, Nam-Gyu;Chong, Ui-Pil
    • Journal of the Institute of Convergence Signal Processing / v.18 no.2 / pp.43-48 / 2017
  • Underwater transient signals are complex, time-varying, nonlinear, and of short duration, which makes them very hard to model with reference patterns. In this paper we divide each signal into short frames of constant length with overlap between consecutive frames. Twenty LPC (Linear Predictive Coding) coefficients are extracted from the original signals using the Durbin algorithm and fed to a neural network with two hidden layers; 65% of the signals were used for training and 35% for testing. The whale species classified are the Blue whale, Dulsae whale, Gray whale, Humpback whale, Minke whale, and Northern Right whale. Finally, we obtained a classification rate of more than 83% on the test signals.

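
The entry above frames each recording, extracts 20 LPC coefficients per frame, and classifies the frames with a two-hidden-layer neural network. Below is a minimal sketch of that kind of pipeline, assuming librosa for LPC (librosa's implementation uses Burg's method rather than the paper's Durbin recursion) and scikit-learn for the network; the file names, frame sizes, and network widths are illustrative, not the paper's settings.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def lpc_features(path, order=20, frame_len=2048, hop=1024, sr=22050):
    """Frame a recording and return one LPC coefficient vector per frame."""
    y, _ = librosa.load(path, sr=sr)
    frames = librosa.util.frame(y, frame_length=frame_len, hop_length=hop)
    feats = []
    for frame in frames.T:
        # librosa.lpc returns [1, a_1, ..., a_order]; drop the leading 1
        a = librosa.lpc(frame.astype(float), order=order)
        feats.append(a[1:])
    return np.array(feats)

# hypothetical dataset: list of (wav_path, species_label) pairs
dataset = [("blue_01.wav", "blue"), ("minke_01.wav", "minke")]  # placeholders

X, y = [], []
for path, label in dataset:
    f = lpc_features(path)
    X.append(f)
    y.extend([label] * len(f))
X = np.vstack(X)

# 65/35 split and two hidden layers, in the spirit of the abstract
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.65, stratify=y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)
clf.fit(X_train, y_train)
print("frame-level accuracy:", clf.score(X_test, y_test))
```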

Classification of bearded seals signal based on convolutional neural network (Convolutional neural network 기법을 이용한 턱수염물범 신호 판별)

  • Kim, Ji Seop;Yoon, Young Geul;Han, Dong-Gyun;La, Hyoung Sul;Choi, Jee Woong
    • The Journal of the Acoustical Society of Korea / v.41 no.2 / pp.235-241 / 2022
  • Several studies using Convolutional Neural Networks (CNNs) have been conducted to detect and classify marine mammal sounds in underwater acoustic data collected through passive acoustic monitoring. In this study, the feasibility of automatically classifying bearded seal sounds was confirmed using a CNN model trained on underwater acoustic spectrogram images collected from August 2017 to August 2018 in the East Siberian Sea. When only clear seal sounds were used as the training dataset, overfitting due to memorization occurred. When some of the training data were replaced with data containing noise, overfitting was prevented and the model generalized better than before, reaching an accuracy of 0.9743, a precision of 0.9783, and a recall of 0.9520. In short, the performance of the classification model for bearded seal signals improved when noise was included in the training data.
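
As a rough illustration of the approach in the entry above, a small binary CNN over spectrogram images (seal call vs. noise) might look like the sketch below; the input size, layer widths, and training setup are assumptions, not the architecture reported in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(128, 128, 1)):
    """Small binary CNN over spectrogram images (illustrative sizes)."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # probability of a seal call
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy",
                           tf.keras.metrics.Precision(),
                           tf.keras.metrics.Recall()])
    return model

model = build_model()
model.summary()
# model.fit(train_spectrograms, train_labels, validation_data=(val_specs, val_labels))
```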

Sound event detection model using self-training based on noisy student model (잡음 학생 모델 기반의 자가 학습을 활용한 음향 사건 검지)

  • Kim, Nam Kyun;Park, Chang-Soo;Kim, Hong Kook;Hur, Jin Ook;Lim, Jeong Eun
    • The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.479-487 / 2021
  • In this paper, we propose a Sound Event Detection (SED) model that uses self-training based on a noisy student model. The proposed SED model consists of two stages. In the first stage, a mean-teacher model based on a Residual Convolutional Recurrent Neural Network (RCRNN) is constructed to provide target labels for weakly labeled or unlabeled data. In the second stage, a self-training-based noisy student model is constructed by applying different noise types: feature noises such as time-frequency shift, mixup, and SpecAugment, as well as dropout-based model noise. In addition, a semi-supervised loss function that acts as label noise injection is applied to train the noisy student model. The performance of the proposed SED model is evaluated on the validation set of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge Task 4. The experiments show that the single model and the ensemble model of the proposed noisy-student-based SED improve the F1-score by 4.6% and 3.4%, respectively, compared to the top-ranked model in DCASE 2020 Challenge Task 4.
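
One of the feature-noise ingredients named above, SpecAugment-style masking, is easy to illustrate. The sketch below applies random frequency and time masks to a spectrogram before the student model sees it; the mask counts and widths are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, num_time_masks=2,
                 max_freq_width=8, max_time_width=20, rng=None):
    """Randomly zero out frequency bands and time spans of a (freq, time) spectrogram."""
    rng = rng or np.random.default_rng()
    out = spec.copy()
    n_freq, n_time = out.shape
    for _ in range(num_freq_masks):
        w = rng.integers(0, max_freq_width + 1)
        f0 = rng.integers(0, max(1, n_freq - w))
        out[f0:f0 + w, :] = 0.0
    for _ in range(num_time_masks):
        w = rng.integers(0, max_time_width + 1)
        t0 = rng.integers(0, max(1, n_time - w))
        out[:, t0:t0 + w] = 0.0
    return out

# usage: noisy_input = spec_augment(mel_spectrogram) before the student forward pass
```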

Sound Visualization based on Emotional Analysis of Musical Parameters (음악 구성요소의 감정 구조 분석에 기반 한 시각화 연구)

  • Kim, Hey-Ran;Song, Eun-Sung
    • The Journal of the Korea Contents Association / v.21 no.6 / pp.104-112 / 2021
  • In this study, emotional analysis was conducted based on the basic attributes of music and an emotional model from psychology, and the results were applied to visualization rules in the formative arts. Most existing studies using musical parameters had more practical purposes, such as classifying, searching, and recommending music. This study instead focuses on enabling sound data to serve as material for creating artworks and for aesthetic expression. To study music visualization as an art form, a method that can incorporate human emotion, which is a characteristic of art itself, must be designed. Therefore, a well-structured basic classification of musical attributes and a classification system for emotions were provided. The musical elements were then visualized through the shape, color, and animation of visual elements, reflecting subdivided input parameters based on emotion. This study can serve as basic data for artists exploring music visualization, and the analysis method and resulting works for matching emotion-based musical components to visualizations can form the basis for automated visualization by artificial intelligence in the future.
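
As a purely hypothetical illustration of the kind of rule-based mapping the entry describes, the sketch below maps a few musical attributes to visual properties; the attribute names, thresholds, and emotion labels are assumptions, not the rules defined in the study.

```python
def visual_rules(tempo_bpm, mode, loudness_db):
    """Map musical attributes to visual properties (illustrative thresholds only)."""
    emotion = "high-arousal" if tempo_bpm > 120 or loudness_db > -10 else "low-arousal"
    hue = 0.0 if mode == "major" else 0.6               # warm hue for major, cool for minor
    size = min(1.0, max(0.1, (loudness_db + 60) / 60))  # louder -> larger shapes
    speed = tempo_bpm / 200.0                           # faster tempo -> faster animation
    return {"emotion": emotion, "hue": hue, "size": size, "animation_speed": speed}

print(visual_rules(tempo_bpm=140, mode="minor", loudness_db=-8))
```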

Development of On-line Sorting System for Detection of Infected Seed Potatoes Using Visible Near-Infrared Transmittance Spectral Technique (가시광 및 근적외선 투과분광법을 이용한 감염 씨감자 온라인 선별시스템 개발)

  • Kim, Dae Yong;Mo, Changyeun;Kang, Jun-Soon;Cho, Byoung-Kwan
    • Journal of the Korean Society for Nondestructive Testing / v.35 no.1 / pp.1-11 / 2015
  • In this study, an online seed potato sorting system using a visible and near-infrared (400 nm to 1100 nm) transmittance spectral technique and a statistical model was evaluated for nondestructively discriminating infected from sound seed potatoes. Seed potatoes artificially infected with Pectobacterium atrosepticum, which is known to cause a soil-borne disease, were prepared for the experiments. After acquiring transmittance spectra from sound and infected seed potatoes, a detection algorithm for infected seed potatoes was developed using partial least squares discriminant analysis (PLS-DA). The coefficient of determination ($R^2_p$) of the prediction model was 0.943, and the classification accuracy for discriminating diseased seed potatoes from sound ones was above 99% (n = 80). This online sorting system shows good potential for detecting agricultural products that are infected or contaminated by pathogens.
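
A common way to implement PLS-DA, as used in the entry above, is to regress a 0/1 class label on the spectra with partial least squares and threshold the prediction. The sketch below does this with scikit-learn on synthetic transmittance spectra; the wavelength count, component number, and data split are assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_wavelengths = 300  # stand-in for spectra sampled over the visible/NIR range

# synthetic transmittance spectra: 0 = sound, 1 = infected
X_sound = rng.normal(1.0, 0.05, (40, n_wavelengths))
X_infected = rng.normal(0.9, 0.05, (40, n_wavelengths))
X = np.vstack([X_sound, X_infected])
y = np.array([0] * 40 + [1] * 40)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

pls = PLSRegression(n_components=10)
pls.fit(X_tr, y_tr)
y_pred = (pls.predict(X_te).ravel() > 0.5).astype(int)  # threshold the PLS prediction
print("classification accuracy:", (y_pred == y_te).mean())
```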

Portable Piezoelectric Film-based Glove Sensor System for Detecting Internal Defects of Watermelon (수박 내부결함판정을 위한 휴대형 압전형 장갑 센서시스템)

  • Choi, Dong-Soo;Lee, Young-Hee;Choi, Seung-Ryul;Kim, Hak-Jin;Park, Jong-Min;Kato, Koro
    • Journal of Biosystems Engineering / v.33 no.1 / pp.30-37 / 2008
  • Dynamic excitation and response analysis is an accepted method for determining some of the physical properties of agricultural products for quality evaluation. The internal viscoelasticity of sound and defective fruits differs because of their different geometric structures, so they show different vibration characteristics. This study was carried out to develop a portable piezoelectric film-based glove sensor system that can separate internally damaged watermelons from sound ones using an acoustic impulse response technique. Two piezoelectric sensors based on polyvinylidene fluoride (PVDF) films, measuring the impact force and the vibration response, were mounted on separate gloves. Various signal parameters, including the number of peaks, energy ratio, standard deviation of peak-to-peak distance, zero-crossing rate, and integral value of peaks, were examined to develop a regression model. Using SMLR (Stepwise Multiple Linear Regression) analysis in SAS, three parameters, i.e., the zero-crossing value, number of peaks, and standard deviation of peaks, were selected as usable factors, with a coefficient of determination ($r^2$) of 0.92 and a standard error of calibration (SEC) of 0.15. In validation tests using twenty watermelon samples (9 sound, 11 defective), the developed model performed well, showing a classification accuracy of 95%.
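
The time-domain parameters listed above (number of peaks, zero-crossing rate, peak-spacing spread, peak integral) can be computed from an impulse-response signal along the lines of the sketch below; the peak threshold and the synthetic damped-oscillation signal are illustrative assumptions, not the study's measurements.

```python
import numpy as np
from scipy.signal import find_peaks

def impulse_response_features(signal, fs):
    """Extract simple vibration-response parameters from one impulse response."""
    peaks, props = find_peaks(signal, height=0.1 * np.max(signal))
    n_peaks = len(peaks)
    zero_crossings = np.nonzero(np.diff(np.signbit(signal)))[0]
    zcr = len(zero_crossings) / (len(signal) / fs)               # crossings per second
    peak_spacing_std = np.std(np.diff(peaks) / fs) if n_peaks > 1 else 0.0
    peak_integral = float(np.sum(props["peak_heights"]))
    return {"n_peaks": n_peaks, "zcr": zcr,
            "peak_spacing_std": peak_spacing_std, "peak_integral": peak_integral}

fs = 10_000
t = np.arange(0, 0.2, 1 / fs)
demo = np.exp(-20 * t) * np.sin(2 * np.pi * 180 * t)             # damped-oscillation stand-in
print(impulse_response_features(demo, fs))
```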

An Study on the Correlation between Sound Characteristics and Sasang Constitution by CSL (CSL을 통한 음향특성과 사상체질간의 상관성 연구)

  • Shin, Mi-ran;Kim, Dal-lae
    • Journal of Sasang Constitutional Medicine / v.11 no.1 / pp.137-157 / 1999
  • The purpose of this study is to help classify Sasang constitution through its correlation with sound characteristics. The study was conducted under the supposition that Sasang constitution correlates with the sound spectrogram. The following results were obtained from comparison and analysis of the correlation between the sound spectrogram and Sasang constitution. 1. In the survey, Soeumin described their voice as low-toned, smooth, and quiet; Soyangin described their voice as high, clear, fast, and spoken at random; Taeumin described their voice as low, thick, and muddy. 2. Taeyangin was significantly slower than the others in the time taken to read a composition and significantly lower in Formant frequency 1. Taeyangin was significantly discriminated from Soeumin in Formant frequency 5 and was significantly lower than the others in Bandwidth 2. Soeumin was significantly lower than Taeyangin in Pitch Maximum and in Pitch Maximum minus Pitch Minimum. Taeyangin was significantly higher than the others in mean energy. 3. With the list of specifications, the discrimination rate was higher than with the list of 13 items in the multi-dimensional 4-class minimum-distance results. The discrimination rate for three constitutions, excluding Soyangin, was higher than that for four constitutions in the one-way ANOVA and discriminant analysis in SPSS/PC+. With CART, the estimated rate of Sasang constitution discrimination was higher than with any other method. These results suggest a correlation between the sound spectrogram and Sasang constitution, and classification through sound spectrogram analysis can serve as an assistant method for objectifying Sasang constitution classification.

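
The entry reports that CART gave the best discrimination rate over formant, pitch, and energy features. A minimal CART-style sketch using scikit-learn's decision tree is shown below; the feature set and the synthetic data are assumptions, not the study's measurements.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# synthetic stand-ins for per-speaker acoustic features:
# Formant 1, Formant 5, Bandwidth 2, pitch range, mean energy
X = rng.normal(size=(80, 5))
y = rng.choice(["Taeyangin", "Soyangin", "Taeumin", "Soeumin"], size=80)

cart = DecisionTreeClassifier(max_depth=4, random_state=0)  # CART-style tree
scores = cross_val_score(cart, X, y, cv=5)
print("cross-validated discrimination rate:", scores.mean())
```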

A COVID-19 Diagnosis Model based on Various Transformations of Cough Sounds (기침 소리의 다양한 변환을 통한 코로나19 진단 모델)

  • Minkyung Kim;Gunwoo Kim;Keunho Choi
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.57-78 / 2023
  • COVID-19, which started in Wuhan, China in November 2019, spread beyond China in early 2020 and worldwide by March 2020. For a highly contagious virus like COVID-19 it is important to prevent infection in advance and to actively treat confirmed cases, but because the virus spreads quickly it is even more important to identify confirmed cases quickly and prevent further spread. However, the PCR test is costly and time-consuming, and although self-test kits are easy to access, their cost makes repeated testing burdensome. If COVID-19 positivity could be determined from the sound of a cough, anyone could easily check whether they are infected at any time and place, with great economic advantages. In this study, an experiment was conducted on a method for identifying COVID-19 infection from cough sounds. Cough sound features were extracted with MFCC, Mel-Spectrogram, and spectral contrast. To ensure sound quality, noisy recordings were removed using the SNR, and only the cough sound was extracted from each voice file by chunking. Since the objective is classifying COVID-19 positives and negatives, models were trained with the XGBoost, LightGBM, and FCNN algorithms, which are often used for classification, and the results were compared. Additionally, we conducted a comparative experiment on model performance using multidimensional vectors obtained by converting the cough sounds into both images and vectors. The experimental results showed that the LightGBM model using features obtained by converting basic health-status information and cough sounds into multidimensional vectors through MFCC, Mel-Spectrogram, spectral contrast, and spectrogram achieved the highest accuracy of 0.74.
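
A minimal sketch of the feature-plus-gradient-boosting pipeline described above might look like the following, assuming librosa for MFCC, mel-spectrogram, and spectral-contrast features and LightGBM for classification; the file names, labels, and hyperparameters are placeholders, not the study's data or settings.

```python
import numpy as np
import librosa
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def cough_features(path, sr=22050):
    """Summarize one recording as mean MFCC, mel-spectrogram, and spectral-contrast vectors."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).mean(axis=1)
    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr)).mean(axis=1)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr).mean(axis=1)
    return np.concatenate([mfcc, mel, contrast])

# placeholder list of (wav_path, label) pairs, label 1 = COVID-19 positive
recordings = [("cough_0001.wav", 1), ("cough_0002.wav", 0)]

X = np.vstack([cough_features(p) for p, _ in recordings])
y = np.array([label for _, label in recordings])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
clf = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```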

Binary Tree Architecture Design for Support Vector Machine Using Dynamic Time Warping (DTW를 이용한 SVM 기반 이진트리 구조 설계)

  • Kang, Youn Joung;Lee, Jaeil;Bae, Jinho;Lee, Seung Woo;Lee, Chong Hyun
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.6 / pp.201-208 / 2014
  • In this paper, we propose a classifier structure design algorithm using Dynamic Time Warping (DTW). The proposed algorithm uses DTW results to design the binary tree architecture of an SVM that classifies multi-class data. The binary tree architecture for the Support Vector Machine (SVM-BTA) is designed using a threshold criterion calculated from the column sums of a square matrix whose components are obtained from the reference data of each class. To evaluate the proposed algorithm, its performance is compared with classifiers whose binary tree structures are designed from a database and from the k-means algorithm. The classification data consist of 333 signals from 18 classes of underwater transient noise. The proposed classifier improves classification performance compared with the classifier designed from the database system, and improves the probability of detection for non-biological transient signals compared with the classifiers using k-means. The proposed SVM-BTA correctly classified 68.77% of biological sounds (BO), 92.86% of chain (CHAN) mechanical sounds, and 100% of the other 6 classes.
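
The pairwise measure underlying the proposed design is DTW. A minimal DTW distance and a class-distance matrix whose column sums could drive the tree splits are sketched below; the reference signals are illustrative, not the underwater transient data used in the paper.

```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance between two 1-D sequences via dynamic programming."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# pairwise DTW matrix over per-class reference signals (illustrative data);
# its column sums could then be thresholded to design the binary-tree splits
refs = [np.sin(np.linspace(0, 2 * np.pi, 50)),
        np.sin(np.linspace(0, 2 * np.pi, 70)),
        np.cos(np.linspace(0, 2 * np.pi, 60))]
D = np.array([[dtw_distance(x, y) for y in refs] for x in refs])
print("column sums:", D.sum(axis=0))
```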

Comparison of target classification accuracy according to the aspect angle and the bistatic angle in bistatic sonar (양상태 소나에서의 자세각과 양상태각에 따른 표적 식별 정확도 비교)

  • Choo, Yeon-Seong;Byun, Sung-Hoon;Choo, Youngmin;Choi, Giyung
    • The Journal of the Acoustical Society of Korea / v.40 no.4 / pp.330-336 / 2021
  • In bistatic sonar operation, the scattering strength of a sonar target is characterized by the probe signal frequency, the aspect angle, and the bistatic angle. Therefore, the target detection and identification performance of a bistatic sonar may vary depending on how the positions of the target, sound source, and receiver are changed during operation. In this study, we evaluated which variable is more advantageous to change by comparing the target identification performance when varying the aspect angle with that when varying the bistatic angle during operation. A scenario of identifying a hollow sphere and a cylinder was assumed, and performance was compared by classifying the two targets with a support vector machine and comparing the accuracy, using a finite element method-based acoustic scattering simulation. The comparison showed that using the scattering strength defined by frequency and bistatic angle, with the aspect angle fixed, gave superior average classification accuracy. This means that moving the receiver to change the bistatic angle is more effective for target identification than moving the sound source to change the aspect angle.
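
As a rough illustration of the comparison described above, the sketch below trains an SVM on two synthetic scattering-strength feature sets (one standing in for frequency-bistatic-angle features, one for frequency-aspect-angle features) and compares cross-validated accuracy; all data and separability levels are assumptions, not the FEM simulation results.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)

def synthetic_features(offset, n_samples=60, n_features=40):
    """Stand-in scattering-strength feature vectors for one target class."""
    return rng.normal(offset, 1.0, (n_samples, n_features))

def svm_accuracy(offset_sphere, offset_cylinder):
    X = np.vstack([synthetic_features(offset_sphere), synthetic_features(offset_cylinder)])
    y = np.array([0] * 60 + [1] * 60)          # 0 = hollow sphere, 1 = cylinder
    return cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()

acc_bistatic = svm_accuracy(0.0, 0.6)   # features over frequency x bistatic angle
acc_aspect = svm_accuracy(0.0, 0.3)     # features over frequency x aspect angle
print("bistatic-angle features:", acc_bistatic, "| aspect-angle features:", acc_aspect)
```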