• Title/Summary/Keyword: spectrogram

Consecutive Vowel Segmentation of Korean Speech Signal using Phonetic-Acoustic Transition Pattern (음소 음향학적 변화 패턴을 이용한 한국어 음성신호의 연속 모음 분할)

  • Park, Chang-Mok;Wang, Gi-Nam
    • Proceedings of the Korea Information Processing Society Conference / 2001.10a / pp.801-804 / 2001
  • This article addresses the automatic segmentation of two adjacent vowels in speech signals. Every transition between adjacent vowels can be characterized by its spectrogram. First, voiced speech is extracted by histogram analysis of a vowel indicator composed of wavelet low-pass components. Second, given the phonetic transcription and the transition-pattern spectrogram, the voiced-speech portion containing consecutive vowels is segmented automatically by template matching. The cross-correlation function is adopted as the template matching method, and a modified correlation coefficient is calculated for every frame. The largest value in the modified correlation coefficient series indicates the boundary between the two consecutive vowel sounds. The experiment covers 154 vowel transition sets: the 154 spectrogram templates were gathered from 154 words (PRW Speech DB), and 161 test words (PBW Speech DB) uttered by 5 speakers were tested. The experimental results show the validity of the method.
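The template-matching step described in this abstract can be illustrated with a minimal sketch. The abstract does not define its "modified correlation coefficient", so the code below substitutes a plain Pearson correlation computed between a transition template and every same-width window of the spectrogram. The function names, the synthetic data, and the choice to report the window center as the boundary are all assumptions made for illustration.

```python
import numpy as np

def correlation_series(spec, template):
    """Pearson correlation between the template and each same-width window of spec.

    spec:     2-D array, shape (n_freq, n_frames)
    template: 2-D array, shape (n_freq, width), with width <= n_frames
    """
    n_freq, n_frames = spec.shape
    t_freq, width = template.shape
    if t_freq != n_freq or width > n_frames:
        raise ValueError("template must share the frequency axis and fit inside spec")
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    scores = np.zeros(n_frames - width + 1)
    for start in range(scores.size):
        patch = spec[:, start:start + width]
        p = patch - patch.mean()
        denom = np.linalg.norm(p) * t_norm
        scores[start] = float((p * t).sum() / denom) if denom > 0 else 0.0
    return scores

def find_boundary(spec, template):
    """Return the frame at the center of the best-matching window, plus all scores."""
    scores = correlation_series(spec, template)
    best_start = int(np.argmax(scores))
    return best_start + template.shape[1] // 2, scores

# Synthetic example (assumed data): energy sits in one band for the first
# 30 frames and in another band afterwards, imitating a vowel transition.
rng = np.random.default_rng(0)
spec = 0.01 * rng.random((32, 60))
spec[5:10, :30] += 1.0
spec[20:25, 30:] += 1.0

# Hypothetical 10-frame transition template: 5 frames of each pattern.
template = np.zeros((32, 10))
template[5:10, :5] = 1.0
template[20:25, 5:] = 1.0

boundary, scores = find_boundary(spec, template)
print(boundary)
```

In this toy case the highest score falls on the window that straddles the change of band, so the reported frame coincides with the transition. Real speech would need one template per vowel pair, as the abstract describes.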

Performance change of defect classification model of rotating machinery according to noise addition and denoising process (노이즈 추가와 디노이징 처리에 따른 회전 기계설비의 결함 분류 모델 성능 변화)

  • Se-Hoon Lee;Sung-Soo Kim;Bi-gun Cho
    • Proceedings of the Korean Society of Computer Information Conference / 2023.07a / pp.1-2 / 2023
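  • This study added noise resembling that found at industrial sites to laboratory data collected under controlled environmental conditions. For each noise type and SNR, it generated three kinds of images (STFT log spectrogram, Mel-spectrogram, and CWT spectrogram) and evaluated the performance of a CNN defect classification model that takes each image as input. When the noise was mixed at an SNR of 0 dB or higher, where the original data dominates, the classification results differed little from those on the original data. When it was mixed at an SNR of 0 dB or lower, where the noise dominates, performance dropped by about 26% for the STFT image at -20 dB. In addition, denoising with Wiener filtering removed the noise effectively and improved classification performance.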
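Mixing noise into a clean recording at a chosen SNR, as this abstract describes, comes down to scaling the noise so that the ratio of signal power to noise power matches the target in decibels. The sketch below shows one common way to do this. The function names, the 120 Hz test tone, and the 16 kHz sample rate are assumptions and are not taken from the paper.

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Add noise to signal after scaling the noise to the requested SNR in dB."""
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if noise.size < signal.size:
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[:signal.size]
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    if p_signal == 0 or p_noise == 0:
        raise ValueError("signal and noise must both have non-zero power")
    # SNR_dB = 10 * log10(p_signal / (gain**2 * p_noise)), solved for gain.
    gain = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + gain * noise

def measured_snr_db(signal, mixed):
    """SNR of a mixture, treating (mixed - signal) as the noise component."""
    residual = mixed - signal
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(residual ** 2))

# Assumed test data: a 120 Hz tone standing in for a machine signal.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
clean = np.sin(2 * np.pi * 120 * t)
noise = rng.standard_normal(16000)

noisy_minus20 = mix_at_snr(clean, noise, -20.0)  # noise dominates
noisy_plus10 = mix_at_snr(clean, noise, 10.0)    # signal dominates
print(measured_snr_db(clean, noisy_minus20), measured_snr_db(clean, noisy_plus10))
```

At -20 dB the noise carries 100 times the power of the signal, which is consistent with the abstract's observation that classification degrades most in that regime.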

A Study on the Correlation between Sound Spectrogram and Sasang Constitution (성문(聲紋)과 사상체질(四象體質)과의 상관성(相關性)에 관(關)한 연구(硏究))

  • Yang, Seung-hyun;Kim, Dal Lae
    • Journal of Sasang Constitutional Medicine / v.8 no.2 / pp.191-202 / 1996
  • Sasang constitution classification is a very important subject, and many medical practitioners have studied it, but there is still no established method for classifying objectively. The purpose of this study is to aid Sasang constitution classification through its correlation with the sound spectrogram. The study was conducted under the assumption that Sasang constitution is correlated with the sound spectrogram. The following results were obtained by comparing and analyzing the pitch and reading speed of the Sasang constitutions: 1. There was a similar tendency in composition reading speed among taeeumin, soeumin and soyangin. 2. In the pitch graph and in the normal-curve fit, the center for taeeumin was lower than for soeumin and soyangin, while soeumin and soyangin showed a similar tendency. 3. There was a similar tendency in the width of the pitch graph among all constitutions. 4. There was a significant difference between taeeumin and soeumin in the mean pitch of the three constitutions, which means that taeeumin use a lower voice than soeumin. According to these results, there appears to be a correlation between the pitch of the sound spectrogram and Sasang constitution, and sound spectrogram analysis can serve as an auxiliary method for making Sasang constitution classification more objective.

Watermarking System That Inserts Copyright Holder's Logo (저작권자의 로고를 워터 마킹하는 장치)

  • 남상엽;이천우;김형배;이상원;박인정
    • Proceedings of the IEEK Conference / 2003.07d / pp.1487-1490 / 2003
  • This paper presents a watermarking system that inserts the copyright holder's logo into a music file. In other words, a sound file can carry image information such as a logo or letters. The watermarking system converts a sound file into an image using a spectrogram. In the spectrogram domain, the logo is inserted using spread spectrum. The proposed technique verifies copyright better than the method using a PN sequence.
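The abstract gives no details of the embedding scheme, so the following is only a generic sketch of multiplicative spread-spectrum embedding in a magnitude spectrogram, not the authors' method. Each logo bit is spread over many spectrogram cells with a key-seeded carrier of +1/-1 values, and it is recovered by correlating against the same carrier. The function names, the `strength` parameter, and the non-blind detector (which needs the original magnitudes, assumed strictly positive) are all assumptions.

```python
import numpy as np

def _carrier(key, n_bits, chips_per_bit):
    """Key-seeded pseudo-random carrier of +1/-1 chips, one row per logo bit."""
    rng = np.random.default_rng(key)
    return rng.choice(np.array([-1.0, 1.0]), size=(n_bits, chips_per_bit))

def embed_logo(mag, logo_bits, key, strength=0.05):
    """Scale each spectrogram cell by (1 + strength * bit_sign * chip)."""
    bits = np.asarray(logo_bits).ravel()
    chips_per_bit = mag.size // bits.size
    if chips_per_bit == 0:
        raise ValueError("spectrogram has fewer cells than logo bits")
    used = bits.size * chips_per_bit
    carrier = _carrier(key, bits.size, chips_per_bit)
    signs = np.where(bits > 0, 1.0, -1.0)[:, None]
    flat = mag.astype(float).ravel().copy()
    flat[:used] *= 1.0 + strength * (signs * carrier).ravel()
    return flat.reshape(mag.shape)

def extract_logo(marked, original, key, n_bits):
    """Non-blind detection: correlate the relative change with the carrier."""
    chips_per_bit = original.size // n_bits
    used = n_bits * chips_per_bit
    carrier = _carrier(key, n_bits, chips_per_bit)
    ratio = marked.ravel()[:used] / original.ravel()[:used] - 1.0
    corr = (ratio.reshape(n_bits, chips_per_bit) * carrier).sum(axis=1)
    return (corr > 0).astype(int)

# Assumed toy data: a random positive "spectrogram" and an 8x8 binary logo.
rng = np.random.default_rng(1)
mag = rng.random((64, 100)) + 0.1
logo = rng.integers(0, 2, size=(8, 8))

marked = embed_logo(mag, logo, key=1234)
recovered = extract_logo(marked, mag, key=1234, n_bits=logo.size).reshape(logo.shape)
print(np.array_equal(recovered, logo))
```

A practical system would also have to invert the marked spectrogram back to audio and survive compression, neither of which this sketch attempts.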

Objective Evaluation of Vehicle Interior Noise in Transient Operation (주행중 차실 내부 소음의 평가)

  • Jeong, Hyuk;Ih, Jeong-Guon
    • Journal of KSNVE / v.6 no.4 / pp.499-502 / 1996
  • Interior noise, engine speed, and vehicle speed are measured under transient road-load conditions, and the interior noise signal is transformed using transient signal analysis methods such as the spectrogram and the wavelet transform. From the analyzed results, subjective noise metrics such as loudness, sharpness, and articulation index can be estimated at each vehicle speed, and the characteristics of interior noise for various running modes can be discussed from the viewpoint of noise quality.

Eddy Current Testing for Radiator Tubes Surrounded by Cooling Fins

  • Nagata, Shoichiro;Tsubusa, Yoshiaki;Enokizono, Masato
    • Journal of Magnetics / v.16 no.3 / pp.276-280 / 2011
  • This paper presents a non-destructive evaluation study on a radiator with cooling fins as a complex-shaped specimen. Radiator structures are used in various heat exchangers, such as those in automobiles, air conditioners, and refrigerators. An eddy current testing method, namely the multi-frequency excitation and spectrogram method (MFES), was employed to detect a defect on a radiator tube surrounded by cooling fins. Overall, experimental results suggested that the influence of the cooling fins is not as noticeable as that of the defect signals.

Comparison of environmental sound classification performance of convolutional neural networks according to audio preprocessing methods (오디오 전처리 방법에 따른 콘벌루션 신경망의 환경음 분류 성능 비교)

  • Oh, Wongeun
    • The Journal of the Acoustical Society of Korea / v.39 no.3 / pp.143-149 / 2020
  • This paper examines how the feature extraction method used in audio preprocessing affects the classification performance of Convolutional Neural Networks (CNN). We extract the mel spectrogram, log mel spectrogram, Mel Frequency Cepstral Coefficients (MFCC), and delta MFCC from the UrbanSound8K dataset, which is widely used in environmental sound classification studies, and then scale the data to 3 distributions. Using these data, we test four CNNs as well as the VGG16 and MobileNetV2 networks to assess performance according to audio feature and scaling. The highest recognition rate is achieved with the unscaled log mel spectrum as the audio feature. Although this result does not necessarily hold for all audio recognition problems, it is useful for classifying the environmental sounds included in UrbanSound8K.
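The four features compared in this abstract form a chain: the log mel spectrogram is a log-compressed mel spectrogram, MFCCs are a discrete cosine transform of the log mel spectrogram along the mel axis, and delta MFCCs are a time derivative of the MFCCs. The sketch below illustrates that chain starting from an already computed mel power spectrogram. The function names, the 40-band and 13-coefficient sizes, and the use of a simple finite difference for the delta are assumptions. The paper's own extraction settings are not given in the abstract.

```python
import numpy as np

def log_compress(mel_power, floor=1e-10):
    """Convert a mel power spectrogram to decibels, flooring to avoid log(0)."""
    return 10.0 * np.log10(np.maximum(mel_power, floor))

def dct2_matrix(n_out, n_in):
    """Orthonormal DCT-II basis, shape (n_out, n_in)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.sqrt(2.0 / n_in) * np.cos(np.pi * (n + 0.5) * k / n_in)
    basis[0] /= np.sqrt(2.0)
    return basis

def mfcc_from_log_mel(log_mel, n_mfcc=13):
    """MFCCs as the DCT-II of each log-mel frame (columns are frames)."""
    return dct2_matrix(n_mfcc, log_mel.shape[0]) @ log_mel

def delta(features):
    """Finite-difference delta along time (needs at least two frames)."""
    return np.gradient(features, axis=1)

# Assumed toy input: 40 mel bands x 20 frames of constant power.
mel_power = np.full((40, 20), 2.0)
log_mel = log_compress(mel_power)
mfcc = mfcc_from_log_mel(log_mel)
delta_mfcc = delta(mfcc)
print(mfcc.shape, delta_mfcc.shape)
```

With a flat input, only the zeroth coefficient is non-zero and the deltas vanish, which is a quick sanity check that the transform behaves as a DCT.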

Experimental Study on Estimation of Flight Trajectory Using Ground Reflection and Comparison of Spectrogram and Cepstrogram Methods (지면 반사효과를 이용한 비행 궤적 추정에 대한 실험적 연구와 스펙트로그램 및 캡스트로그램 방법 비교)

  • Jung, Ookjin;Go, Yeong-Ju;Lee, Jaehyung;Choi, Jong-Soo
    • Journal of the Korea Institute of Military Science and Technology / v.18 no.2 / pp.115-124 / 2015
  • A methodology is proposed to estimate the trajectory and velocity of a flying target using time-frequency analysis of its acoustic signal. A microphone placed above the ground receives both the direct and the ground-reflected sound waves emitted by a flying acoustic source. For certain frequency components, destructive interference occurs in the received signal when the reflected path length is an integer multiple of the direct path length. This phenomenon is referred to as the acoustical mirror effect, and it can be observed in a spectrogram plot. The spectrogram of an acoustic measurement of a flying vehicle shows several orders of destructive interference curves. The first- or second-order curve is used to find the best approximate path with a nonlinear least-squares method. A simulated acoustic signal is generated for a known geometry of the sensor and the source in flight. Estimation based on cepstrogram analysis is more accurate than estimation based on the spectrogram.
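A short sketch can show why a cepstrum is well suited to this kind of measurement. When a signal is summed with a delayed copy of itself, its spectrum acquires a periodic ripple whose notches are spaced by the reciprocal of the delay, and the cepstrum turns that ripple into a single peak at the delay. The code below is an illustration under simplifying assumptions (a stationary white-noise source, one reflection, a fixed delay in samples), not the paper's estimation procedure. All names and values are hypothetical.

```python
import numpy as np

def power_cepstrum(x):
    """Inverse FFT of the log power spectrum (a small constant avoids log(0))."""
    spectrum = np.fft.fft(x)
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)
    return np.real(np.fft.ifft(log_power))

def estimate_echo_delay(x, min_lag=1):
    """Return the lag (in samples) of the largest cepstral peak below len(x) // 2."""
    ceps = power_cepstrum(x)
    half = len(x) // 2
    return min_lag + int(np.argmax(ceps[min_lag:half]))

# Assumed scenario: the reflected wave arrives 40 samples after the direct
# wave with 60% of its amplitude.
rng = np.random.default_rng(0)
n, true_delay, reflection = 4096, 40, 0.6
source = rng.standard_normal(n + true_delay)
direct = source[true_delay:]
reflected = reflection * source[:-true_delay]  # direct wave delayed by true_delay
received = direct + reflected

estimated_delay = estimate_echo_delay(received)
print(estimated_delay)
```

Repeating this on successive short frames gives a delay-versus-time map, which is the basic idea of a cepstrogram. For a moving source the delay changes over time, so each frame has to be short enough for the delay to be roughly constant within it.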

On-Line Audio Genre Classification using Spectrogram and Deep Neural Network (스펙트로그램과 심층 신경망을 이용한 온라인 오디오 장르 분류)

  • Yun, Ho-Won;Shin, Seong-Hyeon;Jang, Woo-Jin;Park, Hochong
    • Journal of Broadcast Engineering / v.21 no.6 / pp.977-985 / 2016
  • In this paper, we propose a new method for on-line genre classification using the spectrogram and a deep neural network. For on-line processing, the proposed method takes a 1-second audio signal as input and classifies it into one of 3 genres: speech, music, and effect. To keep the processing general, it uses the spectrogram as the feature vector instead of MFCC, which has been widely used for audio analysis. We measure genre classification performance on real TV audio signals and confirm that the proposed method outperforms the conventional method for all genres. In particular, it reduces the rate of classification errors between music and effect, which occur often with the conventional method.
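To make the idea of using a spectrogram directly as the feature concrete, the sketch below computes a log power spectrogram for a single 1-second segment with a short-time Fourier transform. The sample rate, FFT size, hop size, Hann window, and function name are assumptions, since the abstract does not state the analysis parameters. The classifier itself is omitted.

```python
import numpy as np

def log_spectrogram(x, n_fft=512, hop=256, floor=1e-10):
    """Log power spectrogram, shape (n_fft // 2 + 1, n_frames)."""
    x = np.asarray(x, dtype=float)
    if len(x) < n_fft:
        raise ValueError("segment is shorter than one analysis frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(np.maximum(power, floor)).T

# Assumed input: one second of a 1 kHz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
segment = np.sin(2 * np.pi * 1000 * t)

features = log_spectrogram(segment)
peak_bin = int(features.mean(axis=1).argmax())
print(features.shape, peak_bin * sample_rate / 512)
```

The resulting matrix, or a flattened copy of it, is what would be passed to a network in place of MFCCs. The frequency bin with the highest average level corresponds to the 1 kHz test tone.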

Text-to-speech with linear spectrogram prediction for quality and speed improvement (음질 및 속도 향상을 위한 선형 스펙트로그램 활용 Text-to-speech)

  • Yoon, Hyebin
    • Phonetics and Speech Sciences / v.13 no.3 / pp.71-78 / 2021
  • Most neural-network-based speech synthesis models utilize neural vocoders to convert mel-scaled spectrograms into high-quality, human-like voices. However, neural vocoders combined with mel-scaled spectrogram prediction models demand considerable computer memory and time during the training phase and are subject to slow inference speeds in an environment where GPU is not used. This problem does not arise in linear spectrogram prediction models, as they do not use neural vocoders, but these models suffer from low voice quality. As a solution, this paper proposes a Tacotron 2 and Transformer-based linear spectrogram prediction model that produces high-quality speech and does not use neural vocoders. Experiments suggest that this model can serve as the foundation of a high-quality text-to-speech model with fast inference speed.