• Title/Summary/Keyword: Spectrogram

Search Result 239

Preprocessing performance of convolutional neural networks according to characteristic of underwater targets (수중 표적 분류를 위한 합성곱 신경망의 전처리 성능 비교)

  • Kyung-Min, Park;Dooyoung, Kim
    • The Journal of the Acoustical Society of Korea, v.41 no.6, pp.629-636, 2022
  • We present a preprocessing method for an underwater target detection model based on a convolutional neural network. The acoustic characteristics of a ship are expressed ambiguously because of the strong signal power at low frequencies. To solve this problem, we combine various feature scaling methods with spectrogram methods. We define a simple convolutional neural network model and train it to measure the performance of each preprocessing combination. Through experiments, we found that combining the log Mel-spectrogram with the standardization and robust scaling methods gave the best classification performance.
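
The two scaling methods named in the abstract can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the band powers in `mel_power` are hypothetical, the helper names are invented for this example, and the scalers are applied to a flat list rather than a full time-frequency matrix.

```python
import math
import statistics

def log_compress(power, eps=1e-10):
    """Convert linear power values to decibels (eps avoids log of zero)."""
    return [10.0 * math.log10(p + eps) for p in power]

def standardize(values):
    """Z-score scaling: zero mean, unit population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]

# Hypothetical mel-band powers dominated by low-frequency energy.
mel_power = [1e4, 1e3, 1e2, 1e1, 1e0]
log_mel = log_compress(mel_power)   # roughly [40, 30, 20, 10, 0] dB
z_scored = standardize(log_mel)
robust = robust_scale(log_mel)
```

Log compression narrows the gap between the strong low-frequency bands and the weaker high-frequency ones before either scaler is applied. Both scalers assume the input is not constant; a constant input would divide by zero.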

Infant cry recognition using a deep transfer learning method (딥 트랜스퍼 러닝 기반의 아기 울음소리 식별)

  • Bo, Zhao;Lee, Jonguk;Atif, Othmane;Park, Daihee;Chung, Yongwha
    • Proceedings of the Korea Information Processing Society Conference, 2020.11a, pp.971-974, 2020
  • Infants express their physical and emotional needs to the outside world mainly through crying. However, most parents find it challenging to understand the reason behind their babies' cries. Failure to correctly understand the cause of a baby's cry and take appropriate action can affect the cognitive and motor development of newborns undergoing rapid brain development. In this paper, we propose an infant cry recognition system based on deep transfer learning to help parents identify crying babies' needs the same way a specialist would. The proposed system transforms the waveform of the cry signal into a log-mel spectrogram, then uses the VGGish model pre-trained on AudioSet to extract a 128-dimensional feature vector from the spectrogram. Finally, a softmax function is used to classify the extracted feature vector and recognize the corresponding type of cry. The experimental results show that our method achieves good performance, exceeding 0.96 in precision, recall, and F1-score.
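
The final classification step can be sketched as a linear layer followed by a softmax over a 128-dimensional embedding. This is only an illustration under stated assumptions: the weights, biases, class count, and the `classify` helper are hypothetical, and the embedding here is a placeholder rather than real VGGish output.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(embedding, weights, biases):
    """Return (predicted class index, class probabilities) for one embedding."""
    logits = [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    return probs.index(max(probs)), probs

# Placeholder 128-dimensional embedding and three hypothetical cry classes.
embedding = [1.0] * 128
weights = [[0.0] * 128, [0.1] * 128, [-0.1] * 128]
biases = [0.0, 0.0, 0.0]
label, probs = classify(embedding, weights, biases)
```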

Analyzing performance of time series classification using STFT and time series imaging algorithms

  • Sung-Kyu Hong;Sang-Chul Kim
    • Journal of the Korea Society of Computer and Information, v.28 no.4, pp.1-11, 2023
  • In this paper, instead of using a recurrent neural network, we compare the classification performance of time series imaging algorithms using a convolutional neural network. The TSC (Time Series Classification) community has traditional algorithms that convert time series data into images, e.g. GAF (Gramian Angular Field), MTF (Markov Transition Field), and RP (Recurrence Plot). We additionally compare the STFT (Short-Time Fourier Transform) algorithm, which produces a spectrogram that visualizes the features of voice data. We evaluate the CNN's performance while adjusting the hyperparameters of the imaging algorithms. When evaluated on the GunPoint dataset in the UCR archive, STFT has higher accuracy than the other algorithms. GAF also reaches 98~99% accuracy, but has the disadvantage that the image size is massive.
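
As an illustration of one of the imaging algorithms listed above, the summation variant of the Gramian Angular Field can be sketched in a few lines of standard-library Python. The function name and the three-sample input are chosen for this example only; the paper's own implementation and hyperparameters are not given in the abstract.

```python
import math

def gramian_angular_summation_field(series):
    """Return the n x n GASF image of a 1-D series as nested lists."""
    lo, hi = min(series), max(series)
    # Rescale to [-1, 1] so that arccos is defined for every sample.
    scaled = [2.0 * (x - lo) / (hi - lo) - 1.0 for x in series]
    # Clamp against floating-point drift before taking arccos.
    phi = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    # Entry (i, j) encodes the pair of samples as cos(phi_i + phi_j).
    return [[math.cos(a + b) for b in phi] for a in phi]

image = gramian_angular_summation_field([0.0, 1.0, 2.0])
```

A series of length n produces an n x n image, so the output grows quadratically with the series length, which is consistent with the image-size drawback the abstract mentions. The sketch assumes the series is not constant.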

Classification of infant cries using 3D feature vectors (3D 특징 벡터를 이용한 영아 울음소리 분류)

  • Park, JeongHyeon;Kim, MinSeo;Choi, HyukSoon;Moon, Nammee
    • Proceedings of the Korea Information Processing Society Conference, 2022.11a, pp.597-599, 2022
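  • Infants express all of their needs through crying, a nonverbal means of communication. However, understanding an infant's cry is difficult, and much research has been conducted on interpreting infant cries. In this paper, we propose classifying infant cries using 3D feature vectors. We use the Donate-a-corpus-cry dataset, whose data are divided into five classes: belly pain, burping, discomfort, hunger, and tiredness. The data are augmented through tempo adjustment, which changes the speed to 90% and 110% of the original. Feature vectors are extracted as Spectrogram, Mel-Spectrogram, and MFCC, and the resulting 2D feature vectors are then stacked into a 3D feature vector. The 3D feature vectors are used to train ResNet and EfficientNet models. As a result, the 2D feature vectors achieved 0.89 (F1) and the 3D feature vectors achieved 0.98 (F1), a performance improvement of 0.09.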
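
The stacking step can be sketched as follows, assuming the three 2D feature maps have already been brought to a common shape. The abstract does not say how that is done, so the shape check and the helper name `stack_features` are assumptions for this illustration, and the tiny matrices are placeholders rather than real features.

```python
def stack_features(spectrogram, mel_spectrogram, mfcc):
    """Stack three equally shaped 2-D maps into a [channel][row][col] tensor."""
    maps = (spectrogram, mel_spectrogram, mfcc)
    shapes = {(len(m), len(m[0])) for m in maps}
    if len(shapes) != 1:
        raise ValueError("all feature maps must share one shape")
    return [[list(row) for row in m] for m in maps]

# Placeholder 2 x 3 feature maps standing in for real audio features.
spec = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
mel = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
mfcc = [[-1.0, 0.0, 1.0], [2.0, 3.0, 4.0]]
stacked = stack_features(spec, mel, mfcc)
```

The result has three channels, which matches the three-channel input that image models such as ResNet and EfficientNet conventionally take.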

Real time instruction classification system

  • Sang-Hoon Lee;Dong-Jin Kwon
    • International Journal of Internet, Broadcasting and Communication, v.16 no.3, pp.212-220, 2024
  • With the recent advancement of society, AI technology has made significant strides, especially in the fields of computer vision and voice recognition. This study introduces a system that leverages these technologies to recognize users through a camera and relay commands within a vehicle based on voice commands. The system uses the YOLO (You Only Look Once) machine learning algorithm, widely used for object and entity recognition, to identify specific users. For voice command recognition, a machine learning model based on spectrogram voice analysis is employed to identify specific commands. This design aims to enhance security and convenience by preventing unauthorized access to vehicles and IoT devices by anyone other than registered users. The system converts camera input data into YOLO system inputs to determine whether a person is present. Additionally, it collects voice data through a microphone embedded in the device or computer and converts it into time-domain spectrogram data to be used as input for the voice recognition machine learning system. The input camera image data and voice data undergo inference through pre-trained models, enabling the recognition of simple commands within a limited space based on the inference results. This study demonstrates the feasibility of constructing a device management system within a confined space that enhances security and user convenience through a simple real-time system model. Finally, our work aims to provide practical solutions in various application fields, such as smart homes and autonomous vehicles.

A COVID-19 Diagnosis Model based on Various Transformations of Cough Sounds (기침 소리의 다양한 변환을 통한 코로나19 진단 모델)

  • Minkyung Kim;Gunwoo Kim;Keunho Choi
    • Journal of Intelligence and Information Systems, v.29 no.3, pp.57-78, 2023
  • COVID-19, which started in Wuhan, China in November 2019, spread beyond China in 2020 and worldwide in March 2020. It is important to prevent a highly contagious virus like COVID-19 in advance and to treat it actively when confirmed, but because it spreads quickly, it is even more important to identify confirmed cases quickly and prevent further spread. However, the PCR test for infection is costly and time-consuming, and although self-test kits are easy to access, the cost of a kit makes it hard to use one every time. Therefore, if COVID-19 status could be determined from the sound of a cough, anyone could easily check whether they are infected anytime and anywhere, with great economic advantages. In this study, we conducted experiments on a method for identifying COVID-19 infection from cough sounds. Cough sound features were extracted through MFCC, Mel-Spectrogram, and spectral contrast. To ensure cough sound quality, noisy data were removed based on SNR, and only the cough sound was extracted from each voice file through chunking. Since the objective is COVID-19 positive/negative classification, training was performed with the XGBoost, LightGBM, and FCNN algorithms, which are often used for classification, and the results were compared. Additionally, we conducted a comparative experiment on model performance using multidimensional vectors obtained by converting cough sounds into both images and vectors. The experimental results showed that the LightGBM model, using features obtained by converting basic health status information and cough sounds into multidimensional vectors through MFCC, Mel-Spectrogram, spectral contrast, and Spectrogram, achieved the highest accuracy of 0.74.

Effective brain-wave DB building system using the five senses stimulation (오감자극을 활용한 효율적인 뇌파 DB구축 시스템)

  • Shin, Jeong-Hoon;Jin, Sang-Hyeon
    • Journal of the Institute of Convergence Signal Processing, v.8 no.4, pp.227-236, 2007
  • Ubiquitous systems have grown explosively over the past few years. Users' demand for high-quality service has led to various types of user terminals, and various effective human-computer interface (HCI) methods have been developed as user interfaces for them. Many researchers have focused on the brain-wave interface, that is, BCI, and research is under way to find effective BCI methods. However, most BCI research is neither centralized nor systematic, which has led to ineffective results. In most HCI research, that is, pattern recognition, the most important foundation is building a correct and sufficient DB. Yet there are no effective and reliable standard research conditions for gathering brain waves in BCI. Neither subjects nor researchers know effective methods for gathering a DB: researchers do not know how to instruct subjects, and subjects do not know how to follow researchers' instructions. To solve these problems, we propose an effective brain-wave DB building system using stimulation of the five senses. The researcher instructs the subject to use the five senses, and the subject imagines the instructed senses. Researchers can also determine whether a gathered brain wave is valid. In real time, researchers verify the gathered brain-wave data using a spectrogram. To verify the effectiveness of the proposed system, we analyze the spectrogram and pattern of the gathered brain-wave DB. On the basis of this analysis, we propose an effective brain-wave DB building method using stimulation of the five senses.


Impulse Response Filtration Technique for the Determination of Phase Velocities from SASW Measurements (SASW시험에 의한 위상속도 결정을 위한 임펄스 응답필터 기법)

  • ;Stokoe, K.H., Il
    • Geotechnical Engineering, v.13 no.1, pp.111-122, 1997
  • The calculation of phase velocities in Spectral-Analysis-of-Surface-Waves (SASW) measurements requires unwrapping phase angles. In layered systems with strong stiffness contrast, such as a pavement system, the conventional phase unwrapping algorithm, which adds integer multiples of 2π to the principal value of a phase angle, may lead to wrong phase velocities. This is because it is difficult to count the number of jumps in the phase spectrum, especially at receiver spacings where the measurements are in the transition zone of different modes. A new phase interpretation scheme, called the "Impulse Response Filtration (IRF) Technique," is proposed, which is based on the separation of wave groups by filtration of the impulse response determined between two receivers. The separation of a wave group is based on the impulse response filtered using information from the Gabor spectrogram, which visualizes the propagation of wave groups in frequency-time space. The filtered impulse response leads to a clear interpretation of the phase spectrum, which eliminates the difficulty in counting the number of jumps in the phase spectrum. The IRF technique was verified by theoretical simulation of the SASW measurement on a pavement system, which complicates wave propagation.
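
For orientation, the conventional unwrapping step that the abstract describes (not the proposed IRF technique) can be sketched as below. The phase values, the sign convention, and the helper names are assumptions for this example. The velocity formula is the usual travel-time relation, velocity = 2π × frequency × spacing / unwrapped phase.

```python
import math

def unwrap(phases):
    """Add integer multiples of 2*pi so successive samples differ by < pi."""
    out = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        delta = cur - prev
        if delta > math.pi:
            offset -= 2.0 * math.pi   # downward jump was hidden by wrapping
        elif delta < -math.pi:
            offset += 2.0 * math.pi   # upward jump was hidden by wrapping
        out.append(cur + offset)
    return out

def phase_velocity(frequency_hz, spacing_m, unwrapped_phase_rad):
    """Phase velocity from the unwrapped phase lag between two receivers."""
    return 2.0 * math.pi * frequency_hz * spacing_m / abs(unwrapped_phase_rad)

# Hypothetical true phase that grows steadily, then its wrapped principal value.
true_phase = [0.0, 2.0, 4.0, 6.0, 8.0]
wrapped = [math.atan2(math.sin(p), math.cos(p)) for p in true_phase]
recovered = unwrap(wrapped)
velocity = phase_velocity(100.0, 1.0, 2.0 * math.pi)
```

The sketch recovers the true phase only when each real step between samples is smaller than π. A missed or spurious jump shifts every later value by 2π, which is the failure mode the abstract attributes to the conventional algorithm in strongly layered systems.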


Underwater Target Localization Using the Interference Pattern of Broadband Spectrogram Estimated by Three Sensors (3개 센서의 광대역 신호 스펙트로그램에 나타나는 간섭패턴을 이용한 수중 표적의 위치 추정)

  • Kim, Se-Young;Chun, Seung-Yong;Kim, Ki-Man
    • The Journal of the Acoustical Society of Korea, v.26 no.4, pp.173-181, 2007
  • In this paper, we propose a moving target localization algorithm using acoustic spectrograms. A time-versus-frequency spectrogram provides information on the trajectory of a moving underwater target. For a source at sufficiently long range from a receiver, the broadband striation patterns seen in the spectrogram represent the mutual interference between modes reflected by the surface and bottom. The slope of the maximum-intensity striation is influenced by the waveguide invariant parameter $\beta$ and the distance between target and sensor. When two or more sensors measure the noise radiated by a moving ship, the slope and frequency of the maximum-intensity striation depend on the distance between target and receiver. With two sensors at fixed points, the possible target positions form a circle of Apollonius, the set of all points whose distances from two fixed points are in a constant ratio. When three sensors are used, two such circles intersect, and the coordinates of the intersection point can be taken as the estimated target position. To evaluate the performance of the proposed localization algorithm, a simulation is performed using an acoustic propagation program.
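
The geometric step can be sketched as follows. This is an illustration of the circle construction only. In the paper the distance ratios come from the striation slopes, whereas here they are computed from a hypothetical known target position to check the geometry. Sensor coordinates and function names are assumptions for this example.

```python
import math

def apollonius_circle(a, b, k):
    """Circle of points P with |PA| / |PB| = k (k != 1): (center, radius)."""
    d = 1.0 - k * k
    center = ((a[0] - k * k * b[0]) / d, (a[1] - k * k * b[1]) / d)
    radius = k * math.dist(a, b) / abs(d)
    return center, radius

def circle_intersections(c0, r0, c1, r1):
    """Intersection points of two circles (empty list if they do not meet)."""
    d = math.dist(c0, c1)
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []
    a = (r0 * r0 - r1 * r1 + d * d) / (2.0 * d)   # distance from c0 to the chord
    h = math.sqrt(max(0.0, r0 * r0 - a * a))      # half-length of the chord
    mx = c0[0] + a * (c1[0] - c0[0]) / d
    my = c0[1] + a * (c1[1] - c0[1]) / d
    ox = h * (c1[1] - c0[1]) / d
    oy = h * (c1[0] - c0[0]) / d
    return [(mx + ox, my - oy), (mx - ox, my + oy)]

# Hypothetical sensor layout and target position, in arbitrary units.
s1, s2, s3 = (0.0, 0.0), (10.0, 0.0), (0.0, 10.0)
target = (3.0, 4.0)
k12 = math.dist(target, s1) / math.dist(target, s2)
k13 = math.dist(target, s1) / math.dist(target, s3)
points = circle_intersections(*apollonius_circle(s1, s2, k12),
                              *apollonius_circle(s1, s3, k13))
```

Two circles generally meet at two points, so the sketch returns both candidates. Choosing between them needs extra information that this example does not model.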

Objective Evaluation of Vehicle Interior Noise in Operation (주행중 차실 내부 소음의 평가)

  • Jeong, Hyuk;Ih, Jeong-Guon
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference, 1996.04a, pp.47-52, 1996
  • Interior noise, engine speed, and vehicle speed are measured under road-load conditions, and the interior noise signal is transformed using transient signal analysis methods such as the spectrogram and the wavelet transform. From the analyzed results, subjective noise criteria such as loudness, noisiness, and articulation index can be estimated at each vehicle speed, and the characteristics of interior noise for various running modes can be discussed from the viewpoint of noise quality.
