• Title/Summary/Keyword: Audio Generation

Audio Format Comparative Study and Suggestion for Next Generation DTV (차세대 디지털 TV 방송을 위한 오디오 규격 비교 분석 및 제언)

  • Lee, Jae-Hong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.30 no.6
    • /
    • pp.337-343
    • /
    • 2011
  • With trial 3D digital broadcasting now underway, research on next-generation digital broadcasting technology for the coming UHDTV era is progressing actively. In this paper, I propose surround audio formats for next-generation digital TV broadcasting, based on a comparative study of the major surround audio formats in use or under development. The study compares the current major competing surround formats, such as Dolby TrueHD and DTS-HD MA, together with the 22.2-channel surround format proposed by NHK for UHDTV. Based on this comparison and on typical domestic living environments, I propose a lossy 7.1-channel 3D surround format together with lossless 2.0 and 4.0 hi-fi formats as the next-generation digital TV broadcasting standard. In addition, I propose transmitting binaural 2-channel audio data as a sub-audio stream; when processed with an individual HRTF (Head Related Transfer Function) and played over headphones, it can deliver a holographic sound experience. A table of the data rates of the proposed audio formats is also presented.
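
The binaural sub-audio idea above amounts to convolving a source signal with the listener's left- and right-ear head-related impulse responses (HRIRs) before headphone playback. As a rough illustration only (the paper does not give an implementation; the file names and the assumption of a shared sample rate are hypothetical), a minimal rendering sketch might look like this:

```python
# Minimal binaural rendering sketch: convolve a mono source with
# left/right HRIRs and write a 2-channel file for headphone playback.
# Assumptions (not from the paper): HRIRs stored as per-listener WAV files,
# and all signals share one sample rate.
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

fs, mono = wavfile.read("source_mono.wav")        # hypothetical mono source
_, hrir_left = wavfile.read("hrir_left.wav")      # hypothetical individual HRIRs
_, hrir_right = wavfile.read("hrir_right.wav")

mono = mono.astype(np.float64)

# Filter the source through each ear's impulse response.
left = fftconvolve(mono, hrir_left.astype(np.float64))
right = fftconvolve(mono, hrir_right.astype(np.float64))

# Interleave into a binaural 2-channel signal and normalize.
binaural = np.stack([left, right], axis=1)
binaural /= np.max(np.abs(binaural)) + 1e-12

wavfile.write("binaural_out.wav", fs, (binaural * 32767).astype(np.int16))
```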

Music Recognition Using Audio Fingerprint: A Survey (오디오 Fingerprint를 이용한 음악인식 연구 동향)

  • Lee, Dong-Hyun;Lim, Min-Kyu;Kim, Ji-Hwan
    • Phonetics and Speech Sciences
    • /
    • v.4 no.1
    • /
    • pp.77-87
    • /
    • 2012
  • Interest in music recognition has grown dramatically since NHN and Daum released their mobile applications for music recognition in 2010. Methods for music recognition based on audio analysis fall into two categories: music recognition using audio fingerprints and Query-by-Singing/Humming (QBSH). While music recognition using audio fingerprints takes recorded music as its input, QBSH takes a user-hummed melody. In this paper, research trends in music recognition using audio fingerprints are described, focusing on two methods: one based on fingerprint generation from energy differences between consecutive bands, and the other based on hash key generation from pairs of spectral peak points. Details presented in the representative papers on each method are introduced.
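
The first of the two approaches is commonly associated with the Philips-style fingerprint, where each sub-fingerprint bit encodes the sign of an energy difference taken across consecutive frequency bands and consecutive frames. The sketch below shows only that bit-derivation step; the frame size, hop, and band layout are illustrative assumptions, not values from the surveyed papers:

```python
# Sketch of energy-difference fingerprint bits (Philips/Haitsma-Kalker style):
# bit(n, m) = 1 if (E(n,m) - E(n,m+1)) - (E(n-1,m) - E(n-1,m+1)) > 0,
# where E(n, m) is the energy of band m in frame n.
import numpy as np

def fingerprint_bits(signal, fs, frame_len=2048, hop=1024, n_bands=33):
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    band_edges = np.linspace(300, 2000, n_bands + 1)   # illustrative band layout
    freqs = np.fft.rfftfreq(frame_len, 1.0 / fs)

    # Per-frame, per-band energies from the short-time spectrum.
    energies = np.zeros((n_frames, n_bands))
    for n in range(n_frames):
        frame = signal[n * hop: n * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        for m in range(n_bands):
            mask = (freqs >= band_edges[m]) & (freqs < band_edges[m + 1])
            energies[n, m] = power[mask].sum()

    # Difference across bands, then across consecutive frames; keep the sign.
    band_diff = energies[:, :-1] - energies[:, 1:]      # (n_frames, n_bands - 1)
    frame_diff = band_diff[1:] - band_diff[:-1]         # (n_frames - 1, n_bands - 1)
    return (frame_diff > 0).astype(np.uint8)            # one 32-bit sub-fingerprint per frame
```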

Research on Machine Learning Rules for Extracting Audio Sources in Noise

  • Kyoung-ah Kwon
    • International Journal of Advanced Culture Technology
    • /
    • v.12 no.3
    • /
    • pp.206-212
    • /
    • 2024
  • This study presents five selection rules for training algorithms to extract audio sources from noise. The five rules are Dynamics, Roots, Tonal Balance, Tonal-Noisy Balance, and Stereo Width, and the suitability of each rule for sound extraction was determined by spectrogram analysis using various types of sample sources, such as environmental sounds, musical instruments, and the human voice, as well as white, brown, and pink noise mixed with sine waves. The training area of the algorithm covers both melody and beat, and with these rules the algorithm can analyze which specific audio sources are contained in the given noise and extract them. The results of this study are expected to improve the accuracy of the algorithm in audio source extraction and enable automated sound clip selection, providing a new methodology for sound processing and audio source generation using noise.
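
The abstract does not define the five rules themselves, but the suitability analysis it describes rests on a log-magnitude spectrogram of a noisy mixture. The sketch below is only a generic illustration of that kind of analysis; the test signal and the crude tonal-ratio measure are stand-ins, not the paper's rules or data:

```python
# Sketch: log-magnitude spectrogram of a noisy mixture plus a crude
# tonal-vs-noisy score of the kind a rule-based suitability analysis might use.
# The mixture (sine tone + brown-ish noise) is an illustrative stand-in only.
import numpy as np
from scipy.signal import spectrogram

fs = 16000
t = np.arange(0, 2.0, 1.0 / fs)
tone = 0.5 * np.sin(2 * np.pi * 440 * t)              # target-like tonal component
noise = np.cumsum(np.random.randn(t.size))            # brown-ish noise (random walk)
noise /= np.max(np.abs(noise))
mixture = tone + 0.3 * noise

f, frames, sxx = spectrogram(mixture, fs=fs, nperseg=1024, noverlap=512)
log_spec = 10 * np.log10(sxx + 1e-12)                 # log-magnitude spectrogram

# Example score: fraction of energy near the tone, per frame.
tonal_energy = sxx[(f > 400) & (f < 480), :].sum(axis=0)
tonal_ratio = tonal_energy / (sxx.sum(axis=0) + 1e-12)
print(f"mean tonal ratio: {tonal_ratio.mean():.3f}")
```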

A Study on the Development for 3D Audio Generation Machine

  • Kim Sung-Eun;Kim Myong-Hee;Park Man-Gon
    • Journal of Korea Multimedia Society
    • /
    • v.8 no.6
    • /
    • pp.807-813
    • /
    • 2005
  • The production and authoring of digital multimedia content are among the most important fields in multimedia technology. Web-based technology and related multimedia software technology are growing in the IT industry and evolving rapidly in everyday life. Digital audio and video processing technology is being adopted rapidly to improve quality of life, and interest in rich, artistic sound in the music and entertainment fields keeps growing through three-dimensional (3D) digital sound technology as well as 3D digital video technology. Digital audio content services are expanding rapidly through the Internet, and Internet users want audio content services of higher quality. Recently, users are no longer satisfied with 2-channel stereo and seek higher-quality sound such as the 5.1-channel 3D audio of motion pictures, but serving such 3D sound normally requires dedicated hardware. In this paper, we extend a previously developed simple 3D audio generator and propose a web-based music bank through the software development of a 3D audio generation player that creates a 3D sound environment with only two speakers, minimizing hardware requirements. We believe this study will contribute greatly to high-quality digital 3D sound services for music and entertainment enthusiasts.

MPEG Surround Extension Technique for MPEG-H 3D Audio

  • Beack, Seungkwon;Sung, Jongmo;Seo, Jeongil;Lee, Taejin
    • ETRI Journal
    • /
    • v.38 no.5
    • /
    • pp.829-837
    • /
    • 2016
  • In this paper, we introduce extension tools for MPEG Surround, which were recently adopted as MPEG-H 3D Audio tools by the ISO/MPEG standardization group. MPEG-H 3D Audio is a next-generation technology for representing spatial audio in an immersive manner. However, a considerably large number of input signals can degrade compression performance at low bitrates. The proposed MPEG Surround extension is designed on the basis of the original MPEG Surround technology, with its limitations addressed through a new coding structure. The proposed MPEG-H 3D Audio technologies will play a pivotal role in dramatically improving sound quality at lower bitrates.
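
MPEG Surround belongs to the family of parametric spatial audio coders: the encoder transmits a downmix plus spatial parameters such as channel level differences (CLDs), and the decoder uses those parameters to redistribute the downmix energy across output channels. The toy two-to-one example below illustrates only that general principle under simplifying assumptions (one CLD per frame instead of per band, trivial decorrelation); it is not the MPEG-H extension tool described in the paper:

```python
# Toy parametric stereo coder in the spirit of MPEG Surround:
# transmit a mono downmix plus per-frame channel level differences (CLDs),
# then re-distribute the downmix energy at the decoder.
# Didactic simplification only, not the standardized tool chain.
import numpy as np

def encode(left, right, frame=1024):
    n = (min(len(left), len(right)) // frame) * frame
    L = left[:n].reshape(-1, frame)
    R = right[:n].reshape(-1, frame)
    downmix = (L + R) / 2.0
    e_l = np.sum(L ** 2, axis=1) + 1e-12
    e_r = np.sum(R ** 2, axis=1) + 1e-12
    cld_db = 10 * np.log10(e_l / e_r)            # one CLD per frame (per band in reality)
    return downmix, cld_db

def decode(downmix, cld_db):
    ratio = 10 ** (cld_db / 10.0)                # E_L / E_R per frame
    g_l = np.sqrt(2 * ratio / (1 + ratio))       # gains that restore the level difference
    g_r = np.sqrt(2 / (1 + ratio))
    L = downmix * g_l[:, None]
    R = downmix * g_r[:, None]
    return L.ravel(), R.ravel()
```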

Serial Transmission of Audio Signals for Multi-channel Speaker Systems (다채널 스피커 시스템을 위한 오디오 신호의 직렬 전송)

  • Kwon, Oh-Kyun;Song, Moon-Vin;Lee, Seung-Won;Lee, Young-Won;Chung, Yun-Mo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.24 no.7
    • /
    • pp.387-394
    • /
    • 2005
  • In this paper, we propose a new transmission technique for audio signals in which the speakers of a multi-channel audio system are connected through a serial line. Analog audio signals from a multi-channel audio system are converted into digital signals, processed, and transferred to each speaker over the serial line. The signal processing steps include data compression and packet generation suited to the characteristics of the audio signals. Each speaker extracts its corresponding digital audio signals from the transmitted packets and converts them back into analog audio signals to produce sound. All the proposed functions in this paper are modeled in VHDL, implemented with FPGA chips, and tested with actual multi-channel audio systems.
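
The packetization step described above, compressing per-channel samples and framing them with a channel address so each speaker can pick out its own data from the shared serial line, can be sketched roughly as follows. The header layout, field sizes, and use of a generic lossless compressor are assumptions for the sketch; the paper implements the real logic in VHDL on FPGAs:

```python
# Illustrative audio packet framing for a shared serial line:
# each packet carries a sync byte, a channel id, and a compressed payload,
# so every speaker node can filter out the packets addressed to it.
import struct
import zlib

def make_packet(channel_id: int, pcm_bytes: bytes) -> bytes:
    payload = zlib.compress(pcm_bytes)                 # stand-in for the paper's compression
    header = struct.pack(">BBH", 0xA5, channel_id, len(payload))  # sync, channel, length
    return header + payload

def parse_packet(packet: bytes, my_channel: int):
    sync, channel_id, length = struct.unpack(">BBH", packet[:4])
    if sync != 0xA5 or channel_id != my_channel:
        return None                                    # not addressed to this speaker
    return zlib.decompress(packet[4:4 + length])       # recovered PCM bytes for the DAC

# Usage: speaker 3 keeps only its own packets.
pkt = make_packet(3, b"\x00\x01" * 512)
assert parse_packet(pkt, 3) is not None and parse_packet(pkt, 1) is None
```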

Modeling and Analysis of Class D Audio Amplifiers using Control Theories (제어이론을 이용한 D급 디지털 오디오 증폭기의 모델링과 해석)

  • Ryu, Tae-Ha;Ryu, Ji-Yeol;Doh, Tae-Yong
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.13 no.4
    • /
    • pp.385-391
    • /
    • 2007
  • A class D digital audio amplifier with small size, low cost, and high quality is essential in the multimedia era. Since the digital audio amplifier is based on PWM signal processing, its signal-generation principle cannot be properly analyzed with linear system theory. In this paper, a class D digital audio amplifier based on ADSM (Advanced Delta-Sigma Modulation) is considered. We first model the digital audio amplifier and then explain its operating principle using a variable structure control algorithm. Moreover, the ripple signal generated by the hysteresis in the comparator has a significant effect on system performance, so we present a method to find the magnitude and frequency of the ripple signal using a describing function. Finally, simulations and experiments are provided to show the validity of the proposed methods.
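
Although the paper's ADSM design and its hysteresis/describing-function analysis are more involved, the core idea of delta-sigma modulation, an integrator in a feedback loop driving a 1-bit quantizer so that quantization noise is shaped out of the audio band, can be sketched as below. This first-order loop is a textbook simplification, not the ADSM structure from the paper:

```python
# First-order delta-sigma modulator sketch: the error between the input and the
# fed-back 1-bit output is integrated, and the integrator's sign decides the
# next output bit, pushing quantization noise toward high frequencies.
import numpy as np

def delta_sigma_1st_order(x):
    """x: input samples in [-1, 1]; returns a +/-1 bitstream of the same length."""
    integrator = 0.0
    y = np.empty_like(x)
    for n, xn in enumerate(x):
        integrator += xn - (y[n - 1] if n > 0 else 0.0)   # accumulate the tracking error
        y[n] = 1.0 if integrator >= 0 else -1.0           # 1-bit quantizer
    return y

# Example: a heavily oversampled 1 kHz tone; the local bitstream average tracks the tone.
fs = 64 * 48000
t = np.arange(0, 0.002, 1.0 / fs)
x = 0.5 * np.sin(2 * np.pi * 1000 * t)
bits = delta_sigma_1st_order(x)
print("mean of bitstream over first window:", bits[:1000].mean())
```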

The Noise Influence of 4G Mobile Transmitter on Audio Devices (4G 휴대 단말기 송신에 의한 오디오 잡음 영향)

  • Yun, Hye-Ju;Lee, Il-Kyoo
    • Journal of Satellite, Information and Communications
    • /
    • v.8 no.1
    • /
    • pp.31-34
    • /
    • 2013
  • This paper deals with the interfering audio noise that an LTE (Long Term Evolution) UE (User Equipment), a 4th-generation mobile device, causes in audio devices. First, through analysis and measurement we confirmed that the interfering signal of the LTE UE is determined by its transmit power. We then measured the audio noise level while varying the transmit power level and the separation distance between the LTE UE and an audio device. As a result, a minimum separation distance of at least 25 cm is required to protect an audio device from the interference noise of an LTE UE at the maximum transmit power level of 22 dBm.

Human Laughter Generation using Hybrid Generative Models

  • Mansouri, Nadia;Lachiri, Zied
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.5
    • /
    • pp.1590-1609
    • /
    • 2021
  • Laughter is one of the most important nonverbal sounds that humans generate; it is a means of expressing emotions. The acoustic and contextual features of this specific sound differ from those of speech, and many difficulties arise when modeling them. In this work, we propose an audio laughter generation system based on unsupervised generative models: the autoencoder (AE) and its variants. The procedure combines three main sub-processes: (1) analysis, which consists of extracting the log-magnitude spectrogram from the laughter database; (2) training of the generative models; and (3) the synthesis stage, which involves an intermediate mechanism, the vocoder. To improve synthesis quality, we suggest three hybrid models (LSTM-VAE, GRU-VAE and CNN-VAE) that combine the representation-learning capacity of the variational autoencoder (VAE) with the temporal modeling ability of long short-term memory RNNs (LSTMs) and the ability of CNNs to learn invariant features. To assess the performance of the proposed audio laughter generation process, an objective evaluation (RMSE) and a perceptual audio quality test (listening test) were conducted. According to these evaluation metrics, the GRU-VAE outperforms the other VAE models.
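
The VAE at the core of these hybrid models learns a latent representation of spectrogram frames through the reparameterization trick and a reconstruction-plus-KL loss. The minimal frame-wise PyTorch sketch below shows only that core mechanism; the paper's LSTM-/GRU-/CNN-VAE hybrids replace the dense encoder and decoder with recurrent or convolutional networks and add a vocoder stage, and all dimensions here are illustrative:

```python
# Minimal frame-wise VAE sketch (reparameterization trick + ELBO-style loss).
# Dimensions are illustrative; not the paper's hybrid architectures.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameVAE(nn.Module):
    def __init__(self, n_bins=513, latent=32):
        super().__init__()
        self.enc = nn.Linear(n_bins, 256)
        self.mu = nn.Linear(256, latent)
        self.logvar = nn.Linear(256, latent)
        self.dec1 = nn.Linear(latent, 256)
        self.dec2 = nn.Linear(256, n_bins)

    def forward(self, x):                        # x: (batch, n_bins) log-magnitude frames
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        recon = self.dec2(F.relu(self.dec1(z)))
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar):
    rec = F.mse_loss(recon, x, reduction="mean")                    # spectrogram reconstruction
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL to standard normal
    return rec + kld

# Illustrative training step on random stand-in "spectrogram" frames.
model = FrameVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
frames = torch.randn(64, 513)
opt.zero_grad()
recon, mu, logvar = model(frames)
loss = vae_loss(recon, frames, mu, logvar)
loss.backward()
opt.step()
print(float(loss))
```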

Multichannel Audio Reproduction Technology based on 10.2ch for UHDTV (UHDTV를 위한 10.2 채널 기반 다채널 오디오 재현 기술)

  • Lee, Tae-Jin;Yoo, Jae-Hyoun;Seo, Jeong-Il;Kang, Kyeong-Ok;Kim, Whan-Woo
    • Journal of Broadcast Engineering
    • /
    • v.17 no.5
    • /
    • pp.827-837
    • /
    • 2012
  • As broadcasting environments change rapidly to digital, user demand for next-generation broadcasting services that surpass the current HDTV service keeps growing. Next-generation broadcasting services are progressing from 2D to 3D, from HD to UHD, and from 5.1-channel audio to more than 10-channel audio for high-quality, realistic broadcasting. In this paper, we propose a 10.2-channel based multichannel audio reproduction system for UHDTV. The 10.2-channel based system adds two side loudspeakers to enhance surround sound localization and adds two height loudspeakers and one ceiling loudspeaker to enhance elevation localization. To evaluate the proposed system, we used an APM (Auditory Process Model) for an objective localization test and also conducted a subjective localization test. The objective and subjective localization tests show that the proposed system performs statistically the same as a 22.2-channel audio system and significantly better than a 5.1-channel audio system.