• Title/Summary/Keyword: Non-speech Audio

Search Result 21, Processing Time 0.023 seconds

Speech Enhancement for Voice commander in Car environment (차량환경에서 음성명령어기 사용을 위한 음성개선방법)

  • 백승권;한민수;남승현;이봉호;함영권
    • Journal of Broadcast Engineering
    • /
    • v.9 no.1
    • /
    • pp.9-16
    • /
    • 2004
  • In this paper, we present a speech enhancement method as a pre-processor for voice commander under car environment. For the friendly and safe use of voice commander in a running car, non-stationary audio signals such as music and non-candidate speech should be reduced. Ow technique is a two microphone-based one. It consists of two parts Blind Source Separation (BSS) and Kalman filtering. Firstly, BSS is operated as a spatial filter to deal with non-stationary signals and then car noise is reduced by kalman filtering as a temporal filter. Algorithm Performance is tested for speech recognition. And the results show that our two microphone-based technique can be a good candidate to a voice commander.

A Study of Automatic Detection of Music Signal from Broadcasting Audio Signal (방송 오디오 신호로부터 음악 신호 검출에 관한 연구)

  • Yoon, Won-Jung;Park, Kyu-Sik
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.47 no.5
    • /
    • pp.81-88
    • /
    • 2010
  • In this paper, we proposed an automatic music/non-music signal discrimination system from broadcasting audio signal as a preliminary study of building a sound source monitoring system in real broadcasting environment. By reflecting human speech articulation characteristics, we used three simple time-domain features such as energy standard deviation, log energy standard deviation and log energy mean. Based on the experimental threshold values of each feature, we developed a rule-based algorithm to classify music portion of the input audio signal. For the verification of the proposed algorithm, actual FM broadcasting signal was recorded for 24 hours and used as source input audio signal. From the experimental results, the proposed system can effectively recognize music section with the accuracy of 96% and non-music section with that of 87%, where the performance is good enough to be used as a pre-process module for the a sound source monitoring system.

Intelligibility Enhancement of Multimedia Contents Using Spectral Shaping (스펙트럼 성형기법을 이용한 멀티미디어 콘텐츠의 명료도 향상)

  • Ji, Youna;Park, Young-cheol;Hwang, Young-su
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.11
    • /
    • pp.82-88
    • /
    • 2016
  • In this paper, we propose an intelligibility enhancement algorithm for multimedia contents using spectral shaping. The dialogue signals is essential to understand the plot of audio-visual media contents such as movie and TV. However, the non-dialogue components as like sound effects and background music often degrade the dialogue clarity. To overcome this problem, this paper tries to improves the dialogue clarity of audio soundtracks which contain important cues for the visual scenes. In the proposed method, the dialogue components are first detected by soft masker based on speech presence probability (SPP) which is widely used in speech enhancement field. Then, extracted dialogue signals are applied to the spectral shaping method. It reallocate the spectral-temporal energy of speech to enhanced the intelligibility. The total energy is maintained as unchanged via a loudness normalization process to prevent saturation. The algorithm was evaluated using the modeled and real movie soundtracks and it was shown that the proposed algorithm enhances the dialogue clarity while preserving the total audio power.

Frequency Band Selection Exited Linear Prediction Wideband Speech/Audio Coding Using SBR (SBR을 이용한 주파수 밴드선택 여기 선형예측 광대역 음성/오디오 부호화)

  • Jang, Sunghoon;Lee, Insung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.32 no.6
    • /
    • pp.556-562
    • /
    • 2013
  • This paper is aimed to improve performance of Band-Selection speech/audio Coder reconstucted band spectrum that is not sent by the comfort noise. To improve the performance, we use the Spectral Band Replication(SBR) technique instead of substitution of Comfort noise. To synthesize SBR signal, the SBR algorithm is referenced in selected signals and the spectrum synthesized by SBR is injected to non-selected band. Each sub-band spectrum has been energy-weighted by real audio signal. We propose the enhanced the Band-Selection Coder that utilizes synthesized SBR signal from selected signal instead of comfort noise.

Classification of Phornographic Videos Based on the Audio Information (오디오 신호에 기반한 음란 동영상 판별)

  • Kim, Bong-Wan;Choi, Dae-Lim;Lee, Yong-Ju
    • MALSORI
    • /
    • no.63
    • /
    • pp.139-151
    • /
    • 2007
  • As the Internet becomes prevalent in our lives, harmful contents, such as phornographic videos, have been increasing on the Internet, which has become a very serious problem. To prevent such an event, there are many filtering systems mainly based on the keyword-or image-based methods. The main purpose of this paper is to devise a system that classifies pornographic videos based on the audio information. We use the mel-cepstrum modulation energy (MCME) which is a modulation energy calculated on the time trajectory of the mel-frequency cepstral coefficients (MFCC) as well as the MFCC as the feature vector. For the classifier, we use the well-known Gaussian mixture model (GMM). The experimental results showed that the proposed system effectively classified 98.3% of pornographic data and 99.8% of non-pornographic data. We expect the proposed method can be applied to the more accurate classification system which uses both video and audio information.

  • PDF

Auto Frame Extraction Method for Video Cartooning System (동영상 카투닝 시스템을 위한 자동 프레임 추출 기법)

  • Kim, Dae-Jin;Koo, Ddeo-Ol-Ra
    • The Journal of the Korea Contents Association
    • /
    • v.11 no.12
    • /
    • pp.28-39
    • /
    • 2011
  • While the broadband multimedia technologies have been developing, the commercial market of digital contents has also been widely spreading. Most of all, digital cartoon market like internet cartoon has been rapidly large so video cartooning continuously has been researched because of lack and variety of cartoon. Until now, video cartooning system has been focused in non-photorealistic rendering and word balloon. But the meaningful frame extraction must take priority for cartooning system when applying in service. In this paper, we propose new automatic frame extraction method for video cartooning system. At frist, we separate video and audio from movie and extract features parameter like MFCC and ZCR from audio data. Audio signal is classified to speech, music and speech+music comparing with already trained audio data using GMM distributor. So we can set speech area. In the video case, we extract frame using general scene change detection method like histogram method and extract meaningful frames in the cartoon using face detection among the already extracted frames. After that, first of all existent face within speech area image transition frame extract automatically. Suitable frame about movie cartooning automatically extract that extraction image transition frame at continuable period of time domain.

Fillers in the Hong Kong Corpus of Spoken English (HKCSE)

  • Seto, Andy
    • Asia Pacific Journal of Corpus Research
    • /
    • v.2 no.1
    • /
    • pp.13-22
    • /
    • 2021
  • The present study employed an analytical framework that is characterised by a synthesis of quantitative and qualitative analyses with a specially designed computer software SpeechActConc to examine speech acts in business communication. The naturally occurring data from the audio recordings and the prosodic transcriptions of the business sub-corpora of the HKCSE (prosodic) are manually annotated with a speech act taxonomy for finding out the frequency of fillers, the co-occurring patterns of fillers with other speech acts, and the linguistic realisations of fillers. The discoursal function of fillers to sustain the discourse or to hold the floor has diverse linguistic realisations, ranging from a sound (e.g. 'uhuh') and a word (e.g. 'well') to sounds (e.g. 'um er') and words, namely phrase ('sort of') and clause (e.g. 'you know'). Some are even combinations of sound(s) and word(s) (e.g. 'and um', 'yes er um', 'sort of erm'). Among the top five frequent linguistic realisations of fillers, 'er' and 'um' are the most common ones found in all the six genres with relatively higher percentages of occurrence. The remaining more frequent realisations consist of clause ('you know'), word ('yeah') and sound ('erm'). These common forms are syntactically simpler than the less frequent realisations found in the genres. The co-occurring patterns of fillers and other speech acts are diverse. The more common co-occurring speech acts with fillers include informing and answering. The findings show that fillers are not only frequently used by speakers in spontaneous conversation but also mostly represented in sounds or non-linguistic realisations.

Low delay window switching modified discrete cosine transform for speech and audio coder (음성 및 오디오 부호화기를 위한 저지연 윈도우 스위칭 modified discrete cosine transform)

  • Kim, Young-Joon;Lee, In-Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.37 no.2
    • /
    • pp.110-117
    • /
    • 2018
  • In this paper, we propose a low delay window switching MDCT (Modified Discrete Cosine Transform) method for speech/audio coder. The window switching algorithm is used to reduce the degradation of sound quality in non-stationary trasient duration and to reduce the algorithm delay by using the low delay TDAC (Time Domain Aliasing Cancellation). While the conventional window switching algorithms uses overlap-add with different lengths, the proposed method uses the fixed overlap add length. It results the reduction of algorithm delay by half and 1 bit reduction in frame indication information by using 2 window types. We apply the proposed algorithm to G.729.1 based on MDCT in order to evaluate the performance. The propose method shows the reduction of algorithm delay by half while speech quality of the proposed method maintains same as the conventional method.

A Study on the Acoustic Characteristics of Sexy Voice (섹시한 음성의 음향학적 특징 연구)

  • Jeong Ok-Ran;Jo Sung-Mi
    • MALSORI
    • /
    • no.57
    • /
    • pp.73-84
    • /
    • 2006
  • The purpose of this study was to explore the acoustic characteristics of sexy voice. In this study, we measured acoustic parameters (fundamental frequency, jitter, shimmer, and nasalance) of a sustained vowel sound produced by 40 actors (20 males and 20 females) and 40 non-actors (20 males and 20 females). Digital audio recordings were made in the sustained vowel |a| for acoustic analyses using Praat (version 4.1.9) and Nasal View (version 4.5). Twenty voice pathologists participated in the listening experiment and judged the degree of sexiness on a 7-point scale. The results showed that fundamental frequency, shimmer and nasalance had significant differences between actors and non-actors. The acoustic parameters of sexy voice matched perceptual aspects of a previous study: Low fundamental frequency-low pitch and high shimmer-husky voice. On the other hand, the nasalance score did not match that of the previous study: Decreased nasalance had a higher score on sexiness scale judged by the listeners. It would be desirable to study the voice quality by analyzing and controlling more acoustic and auditory parameters for practical applications in the future.

  • PDF

Design of Low Bits Rate Transform Excitation Wide Band Speech and Audio Coder of Analysis-by-Synthesis Structure (분석/합성 구조의 저 전송률 변환여기 광대역 음성/오디오 부호화기 설계)

  • Jang, Sunghoon;Hong, Kibong;Lee, Insung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.31 no.7
    • /
    • pp.472-479
    • /
    • 2012
  • This paper is aimed to design 9.2 kbps low bits late transform excitation coder that target to voice and audio signal. To set up low bit rate, we used Band-selection in frequency domain and gain-shape quantization and AbS structure. To decrease lots of calculation from ABS structure, we used each band IDFT and synthesis. And we designed non-transfer band for performance by inserting comfort noise. We propose coder that has low bit rate and similar performance comparing with original 10.4 kbps AMR-WB+ TCX mode.