• Title/Summary/Keyword: Musical Acoustics (음악음향학)


Automatic Indexing Algorithm of Golf Video Using Audio Information (오디오 정보를 이용한 골프 동영상 자동 색인 알고리즘)

  • Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.5
    • /
    • pp.441-446
    • /
    • 2009
  • This paper proposes an automatic indexing algorithm for golf video using audio information. In the proposed algorithm, the input video stream is demultiplexed into video and audio streams. By means of an Adaboost-cascade classifier, the continuous audio stream is classified into segments of announcer's speech recorded in the studio, music accompanying players' names shown on screen, audience reactions to the play, reporter's speech against a field background, and field noise such as wind or waves. Golf swing sounds, including drive shots, iron shots, and putting shots, are detected by impulse onset detection and modulation spectrum verification. The detected swings and applause are used effectively to index action or highlight units. Compared with video-based semantic analysis, the main advantage of the proposed system is its small computational requirement, which makes the technology easy to apply to embedded consumer electronic devices for fast browsing.
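The swing-sound detection described above rests on impulse onset detection. As a rough illustration of the idea only (not the authors' exact method, which additionally applies modulation spectrum verification), a short-time-energy onset detector can be sketched in plain Python; the frame length and energy-ratio threshold here are illustrative assumptions:

```python
def detect_impulse_onsets(samples, frame_len=256, ratio=4.0, floor=1e-6):
    """Flag frames whose short-time energy jumps sharply above the
    previous frame's energy -- a crude stand-in for impulse onset
    detection of percussive events such as a golf swing sound."""
    # Short-time energy per non-overlapping frame.
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    # An onset is a frame whose energy exceeds `ratio` times the
    # previous frame's energy (floored to avoid division-by-silence).
    onsets = []
    for i in range(1, len(energies)):
        if energies[i] > ratio * max(energies[i - 1], floor):
            onsets.append(i * frame_len)  # sample index of the onset frame
    return onsets
```

A real detector would operate on decoded PCM from the demultiplexed audio stream and pass each candidate onset to a verification stage; this sketch only shows the energy-jump criterion.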

MPEG-D USAC: Unified Speech and Audio Coding Technology (MPEG-D USAC: 통합 음성 오디오 부호화 기술)

  • Lee, Tae-Jin;Kang, Kyeong-Ok;Kim, Whan-Woo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.7
    • /
    • pp.589-598
    • /
    • 2009
  • As mobile devices become multi-functional and converge into a single platform, there is a strong need for a codec that can provide consistent quality for both speech and music content. MPEG-D USAC standardization activities started at the 82nd MPEG meeting with a Call for Proposals (CfP), and WD3 was approved at the 88th MPEG meeting. MPEG-D USAC is a convergence of AMR-WB+ and HE-AAC V2 technology. Specifically, USAC utilizes three core codecs (AAC, ACELP, and TCX) for the low-frequency region, SBR for the high-frequency region, and the MPEG Surround tool for stereo information. USAC can provide consistent sound quality for both speech and music content and can be applied to various applications such as multimedia download to mobile devices, digital radio, mobile TV, and audio books.

Mask Estimation Based on Band-Independent Bayesian Classifier for Missing-Feature Reconstruction (Missing-Feature 복구를 위한 대역 독립 방식의 베이시안 분류기 기반 마스크 예측 기법)

  • Kim Wooil;Stern Richard M.;Ko Hanseok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.2
    • /
    • pp.78-87
    • /
    • 2006
  • In this paper, we propose an effective mask estimation scheme for missing-feature reconstruction in order to achieve robust speech recognition under unknown noise environments. In previous work, colored noise spanning the entire partitioned frequency range is used for training the mask classifier; however, this gives limited performance when the training database is restricted in size. To reflect the spectral characteristics of more varied background noise and improve performance at the same time, a new Bayesian classifier for mask estimation is proposed, which operates on each frequency band independently of the others. In the proposed method, we employ colored noise obtained by combining colored noises generated in each frequency band, in order to reflect more varied noise environments and mitigate the 'sparse' database problem. Combined with cluster-based missing-feature reconstruction, the performance of the proposed method is evaluated on a noisy speech recognition task. The results show that the proposed method improves performance compared with the previous method under white noise, car noise, and background music conditions.
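The band-independent idea can be illustrated with a toy per-band classifier: each frequency band keeps its own two-class model ("reliable" speech-dominated vs. "unreliable" noise-dominated observations) and makes a MAP decision without consulting other bands. This is a minimal sketch under assumed single-Gaussian models and equal class priors, not the paper's actual classifier or training setup:

```python
import math

def gaussian_logpdf(x, mean, var):
    # Log-density of a univariate Gaussian.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

class BandMaskClassifier:
    """One instance per frequency band: fits a Gaussian to 'reliable'
    and 'unreliable' training values for that band only, then labels
    new observations by maximum a posteriori decision (equal priors)."""

    def fit(self, reliable_vals, unreliable_vals):
        self.models = {}
        for label, vals in (("reliable", reliable_vals),
                            ("unreliable", unreliable_vals)):
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals) or 1e-6
            self.models[label] = (mean, var)
        return self

    def predict(self, x):
        scores = {label: gaussian_logpdf(x, m, v)
                  for label, (m, v) in self.models.items()}
        return max(scores, key=scores.get)
```

A full mask estimator would run one such classifier per band over log-spectral features and hand the resulting binary mask to the cluster-based reconstruction stage.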

Effect of noise and reverberation on subjective measure of speech transmission performance for elderly person with hearing loss in residential space (주거 공간에서 고령자 청력손실을 고려한 소음 및 잔향에 따른 음성 전송 성능의 주관적 평가)

  • Oh, Yang Ki;Ryu, Jong-Kwan;Song, Han-Sol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.37 no.5
    • /
    • pp.369-377
    • /
    • 2018
  • This study investigated the effect of noise and reverberation on subjective measures of speech transmission performance for elderly persons with hearing loss in residential space through a listening test. Floor impact, road traffic, airborne, and drainage noise were employed as residential noises, and several impulse responses were obtained through room-acoustical computer simulation of an apartment building. Sound sources for the listening test consisted of residential noises and speech sounds for both the young (the original sound) and the aged (the sound filtered with frequency responses modeling the hearing loss of a 65-year-old person). In the listening test, subjects evaluated speech intelligibility and listening difficulty for the presented words ($L_{Aeq}$ 55 dB) at three noise levels ($L_{Aeq}$ 30, 40, 50 dB) and three reverberation times (0.5, 1.0, 1.5 s). Results showed that residential spaces with noise levels at or below 50 dB ($L_{i,Fmax,AW}$) for jumping noise and 40 dB ($L_{Aeq}$) for road traffic, airborne, and drainage noise had speech intelligibility of 90 % or more and listening difficulty of 30 % or less. Speech intelligibility for the aged sound source was 0 % ~ 5 % lower, and listening difficulty 2 % ~ 20 % higher, than for the young sound source.

Analysis of auditory temporal processing in within- and cross-channel gap detection thresholds for low-frequency pure tones (저주파수 순음에 대한 within- 및 cross-channel gap detection thresholds를 이용한 auditory temporal processing 특성 연구)

  • Koo, Sungmin;Lim, Dukhwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.1
    • /
    • pp.58-63
    • /
    • 2022
  • This study was conducted to examine the characteristics of pitch perception and temporal resolution through Within-/Cross-Channel Gap Detection Thresholds (WC/CC GDTs) using low-frequency pure tones (264 Hz, 373 Hz, and 528 Hz, related to the musical tones C4, C4#, and C5). Forty young people and 20 elderly people with normal hearing participated in this study. WC GDTs were approximately 2 ms ~ 4 ms regardless of frequency in both groups, with no statistically significant difference between groups. In both groups, CC GDTs were larger than WC GDTs, and as the frequency difference increased, the CC GDTs also increased. In particular, the CC GDTs of the elderly group were 8 times ~ 10 times larger than those of the young group, a statistically significant difference between the groups. These data also showed a different trend of GDTs compared with previous data obtained from musical stimuli. This study suggests that GDTs may influence pitch perception mechanisms and can be used as psychoacoustic evidence for nonlinear responses of the auditory nervous system.

A study on combination of loss functions for effective mask-based speech enhancement in noisy environments (잡음 환경에 효과적인 마스크 기반 음성 향상을 위한 손실함수 조합에 관한 연구)

  • Jung, Jaehee;Kim, Wooil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.40 no.3
    • /
    • pp.234-240
    • /
    • 2021
  • In this paper, mask-based speech enhancement is improved for effective speech recognition in noisy environments. In mask-based speech enhancement, the enhanced spectrum is obtained by multiplying the noisy speech spectrum by the mask. The VoiceFilter (VF) model is used for mask estimation, and the Spectrogram Inpainting (SI) technique is used to remove residual noise from the enhanced spectrum. In this paper, we propose a combined loss to further improve speech enhancement. In order to effectively remove residual noise in the speech, the positive part of the triplet loss is used together with the component loss. For the experiments, the TIMIT database is reconstructed using NOISEX-92 noise and background music samples under various Signal-to-Noise Ratio (SNR) conditions. Source-to-Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI) are used as performance evaluation metrics. When the VF model was trained with the mean squared error and the SI model was trained with the combined loss, SDR, PESQ, and STOI improved by 0.5, 0.06, and 0.002 respectively, compared with the system trained only with the mean squared error.
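The combined loss can be sketched in plain Python. This is one plausible reading of the abstract, not the paper's exact formulation: the hinged (positive-part) triplet term treats the enhanced spectrum as the anchor, the clean spectrum as the positive, and the noisy input as the negative, and is added to a mean-squared-error component loss. The margin and weight values are illustrative assumptions:

```python
def mse(a, b):
    # Mean squared error between two equal-length spectra (as flat lists).
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def combined_loss(enhanced, clean, noisy, margin=1.0, weight=1.0):
    """Component (MSE) loss plus the positive part of a triplet loss:
    the enhanced spectrum should be closer to the clean spectrum than
    to the noisy input, by at least `margin`."""
    component = mse(enhanced, clean)
    triplet_pos = max(mse(enhanced, clean) - mse(enhanced, noisy) + margin, 0.0)
    return component + weight * triplet_pos
```

In an actual training setup these distances would be computed on batched magnitude spectrograms inside the SI model's training loop; the sketch only shows how the two terms combine.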

A Study of Sound Expression in Webtoon (웹툰의 사운드 표현에 관한 연구)

  • Mok, Hae Jung
    • Cartoon and Animation Studies
    • /
    • s.36
    • /
    • pp.469-491
    • /
    • 2014
  • Webtoons have developed methods that make it possible to express sound visually, and through the development of web technology we can now also hear sound in webtoons. It is natural to analyze the sound we can hear, but we can also analyze the sound we cannot hear. This study is based on 'dual coding' in cognitive psychology: cartoonists can create visual expressions on the basis of auditory impression and memory, and readers can recall the sound through the processes of memory and retrieval. This study analyzes both audible and inaudible sound, borrowing its analytic method from film sound theory. The three main factors, volume, pitch, and tone, are characterized by frequency in acoustics; in comics, on the other hand, they are expressed by the thickness and size of lines and by images of the sound source. The visual expression of in-screen and off-screen sound is related to the comics frame: generally, the outside of the frame implies off-screen sound, but some off-screen sound appears inside the frame. In addition, horror comics use a great deal of sound for genre effect, as horror films do. Analyzing comics sound with this kind of film-sound method shows that webtoons have developed creative expressive methods compared with the simple ones of early comics. In particular, arranging frames and expressing sound along vertical scrolling movement are new to webtoons, and the types and arrangement of frames have become varied. BGM was the first use of audible sound, and recently BGM mixed with sound effects has come into use; programs have also appeared that let readers hear sound in time with scrolling, which horror webtoons in particular use to heighten genre effects. Various methods of visualizing sound are being created, and this change shows that the webtoon could be a model of convergence in content.