• Title/Summary/Keyword: Audio Technology

Audio-Visual Fusion for Sound Source Localization and Improved Attention (음성-영상 융합 음원 방향 추정 및 사람 찾기 기술)

  • Lee, Byoung-Gi;Choi, Jong-Suk;Yoon, Sang-Suk;Choi, Mun-Taek;Kim, Mun-Sang;Kim, Dai-Jin
    • Transactions of the Korean Society of Mechanical Engineers A, v.35 no.7, pp.737-743, 2011
  • Service robots are equipped with various sensors such as vision cameras, sonar sensors, laser scanners, and microphones. Although these sensors have their own functions, some of them can be made to work together to perform more complicated functions. Audio-visual fusion is a typical and powerful combination of audio and video sensors, because audio information is complementary to visual information and vice versa. Human beings also depend mainly on visual and auditory information in their daily lives. In this paper, we conduct two studies using audio-visual fusion: one on enhancing the performance of sound localization, and the other on improving robot attention through sound localization and face detection.
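
The abstract does not detail the localization algorithm itself; as a hedged illustration, the sketch below shows GCC-PHAT, a common building block for estimating the direction of a sound source from a microphone pair. The function names and parameters are assumptions for illustration, not the authors' implementation.

```python
# GCC-PHAT time-difference-of-arrival (TDOA) estimation for one mic pair.
import numpy as np

def gcc_phat(sig: np.ndarray, ref: np.ndarray, fs: int, max_tau: float) -> float:
    """Estimate the delay (seconds) of `sig` relative to `ref`."""
    n = len(sig) + len(ref)
    S = np.fft.rfft(sig, n=n)
    R = np.fft.rfft(ref, n=n)
    cross = S * np.conj(R)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = int(fs * max_tau)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

# For microphones separated by d metres, the far-field bearing follows from
# theta = arcsin(c * tau / d) with c ~ 343 m/s; fusing that bearing with
# face-detection directions is then a weighting of two direction estimates.
```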

Towards Low Complexity Model for Audio Event Detection

  • Saleem, Muhammad;Shah, Syed Muhammad Shehram;Saba, Erum;Pirzada, Nasrullah;Ahmed, Masood
    • International Journal of Computer Science & Network Security, v.22 no.9, pp.175-182, 2022
  • In our daily life, we come across different types of information, for example in multimedia and text formats. We all need different types of information in our daily routines, such as watching or reading the news, listening to the radio, and watching different types of videos. However, we sometimes run into problems when a certain type of information is required. For example, someone listening to the radio wants jazz, but all the radio channels play pop music mixed with advertisements; the listener is stuck with pop music and gives up searching for jazz. This problem can be solved by an automatic audio classification system. Deep Learning (DL) models can make such audio classification practical, but they are expensive and difficult to deploy on edge devices such as the Nano BLE Sense or Raspberry Pi, because they usually require the computational power of a graphics processing unit (GPU). To address this, we propose a low-complexity DL model for Audio Event Detection (AED). We extract Mel-spectrograms of dimension 128×431×1 from the audio signals and apply normalization. Three data augmentation methods are applied: frequency masking, time masking, and mixup. We design a Convolutional Neural Network (CNN) with spatial dropout, batch normalization, and separable 2D convolutions, inspired by VGGNet [1], and further reduce the model size by applying float16 quantization to the trained model. Experiments were conducted on the updated dataset provided by the Detection and Classification of Acoustic Events and Scenes (DCASE) 2020 challenge. We confirm that our model achieves a validation loss of 0.33 and an accuracy of 90.34% within a 132.50 KB model size.
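
As a rough illustration of the kind of network the abstract describes, the sketch below builds a small separable-convolution CNN with batch normalization and spatial dropout over 128×431×1 Mel-spectrogram inputs, then applies post-training float16 quantization. It assumes TensorFlow/Keras; the layer counts and filter sizes are illustrative guesses, not the authors' exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_aed_model(num_classes: int) -> tf.keras.Model:
    inputs = layers.Input(shape=(128, 431, 1))        # Mel-spectrogram "image"
    x = inputs
    for filters in (16, 32, 64):                      # small VGG-style stacks
        x = layers.SeparableConv2D(filters, 3, padding="same",
                                   activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.SpatialDropout2D(0.2)(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_aed_model(num_classes=10)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Post-training float16 quantization, roughly as the abstract describes,
# to shrink the trained model for edge deployment.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()
```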

The Design of Vector Processor for MDCT/IMDCT of MPEG-II AAC (MPEG-II AAC의 MDCT/IMDCT를 위한 벡터 프로세서 설계)

  • 이강현
    • Proceedings of the IEEK Conference, 1999.06a, pp.329-332, 1999
  • Compression methods are among the most important technologies in today's multimedia society. In audio compression, a method based on the properties of the human auditory system is used: because human audibility is limited, a psychoacoustic model is applied to perceptual audio coding. MPEG-II AAC (Advanced Audio Coding) is the most advanced coding scheme for high-quality audio coding, with a compression ratio 1.4 times that of MPEG-I Layer-III. In this paper, a vector processor for the MDCT/IMDCT (Modified Discrete Cosine Transform / Inverse Modified Discrete Cosine Transform) of MPEG-II AAC is designed.
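
For reference, the MDCT/IMDCT pair that such a processor computes is written out below in a direct O(N²) form; production codecs, and the vector processor in the paper, use FFT-based fast algorithms instead, and scaling conventions vary between implementations.

```python
import numpy as np

def mdct(x: np.ndarray) -> np.ndarray:
    """Forward MDCT: 2N time samples -> N spectral coefficients."""
    N = len(x) // 2
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ x

def imdct(X: np.ndarray) -> np.ndarray:
    """Inverse MDCT: N coefficients -> 2N time-aliased samples.
    Windowed overlap-add of consecutive blocks cancels the aliasing (TDAC)."""
    N = len(X)
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return (2.0 / N) * (basis @ X)
```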

Implementation of On-site Audio Center based on AoIP

  • Lee, Jaeho;Kwon, Soonchul;Lee, Seunghyun
    • International Journal of Advanced Smart Convergence, v.6 no.2, pp.51-58, 2017
  • Recently, rapid advances in Ethernet and IP technology have brought many changes to the sound industry. AoIP-based audio transmission technology has dramatically alleviated various problems of acoustic systems, such as sound-quality deterioration over long-distance transmission and complicated wiring. However, when many distributed audio systems are connected through a single piece of AoIP equipment, a fault in that equipment makes it impossible to operate the connected systems. Moreover, AoIP equipment can only transmit audio signals; it cannot adjust the system for the acoustic environment. In this paper, AoIP equipment is installed with sound equipment on a one-to-one basis, so that these problems are solved and sound quality (reverberation, echo, delay, and EQ) can be adjusted by an AoIP-based OAC (On-site Audio Center) with a built-in DSP function. As a result, uncompressed real-time transmission through the distributed transmit/receive modules of the OAC and high-quality sound through DSP-based adjustment can be achieved. It is expected that OAC-based sound systems will become the industry standard in ubiquitous environments.

A Study on analysis of digital TV loudness (디지털 TV 방송음량에 대한 연구)

  • Lee, SangWoon;Cho, YoungSeong;Kim, JaeKyung
    • Journal of Satellite, Information and Communications, v.8 no.4, pp.105-110, 2013
  • After analog broadcasting changed to digital, the dynamic range of broadcast audio became wider. As there is no regulation of digital broadcast audio levels, the audio level of digital TV is gradually increasing, and this phenomenon is becoming more serious because of competition between broadcasters and programmes. To solve this problem, the ITU-R issued technical recommendations for digital TV audio levels. In this paper, the audio levels of domestic TV channels are measured according to the ITU-R algorithm and analyzed, and a management method is suggested.
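
The ITU-R measurement algorithm referred to here is presumably Recommendation BS.1770 (K-weighting followed by gated mean-square measurement). A minimal sketch of such a measurement, assuming the open-source pyloudnorm package and a hypothetical input file, is shown below; the paper's own measurement tooling is not specified.

```python
import soundfile as sf
import pyloudnorm as pyln

data, rate = sf.read("broadcast_clip.wav")    # hypothetical recording
meter = pyln.Meter(rate)                      # ITU-R BS.1770-4 meter
loudness = meter.integrated_loudness(data)    # integrated loudness in LUFS
print(f"Integrated loudness: {loudness:.1f} LUFS")
```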

Status of 3D Audio Technology Development for the difference of Listening Environments (청취환경 차이에 따른 3차원 오디오 기술 개발 동향)

  • Seo, Jeong-Il;Lee, Yong-Ju;Jang, In-Seon;Yu, Jae-Hyeon;Gang, Gyeong-Ok
    • Broadcasting and Media Magazine, v.13 no.1, pp.82-96, 2008
  • 3D audio technology covers the whole signal-processing chain from acquisition to reproduction, including encoding and transmission. However, the adopted technologies differ according to the audio presentation environment, because the presentation environment is the last step in delivering 3D audio to listeners. In this paper, we describe various 3D audio technologies adapted to different audio presentation environments for consuming music content.

Convolutional Neural Network based Audio Event Classification

  • Lim, Minkyu;Lee, Donghyun;Park, Hosung;Kang, Yoseb;Oh, Junseok;Park, Jeong-Sik;Jang, Gil-Jin;Kim, Ji-Hwan
    • KSII Transactions on Internet and Information Systems (TIIS), v.12 no.6, pp.2748-2760, 2018
  • This paper proposes an audio event classification method based on convolutional neural networks (CNNs). CNNs have a great advantage in distinguishing complex shapes in images. The proposed system uses audio features as the input image of a CNN: Mel-scale filter bank features are extracted from each frame and concatenated over 40 consecutive frames, and the concatenated frames are treated as one input image. The output layer of the CNN generates the probabilities of audio events (e.g., dog bark, siren, forest). The event probabilities for all images in an audio segment are accumulated, and the audio event with the highest accumulated probability is taken as the classification result. The proposed method classified thirty audio events with an accuracy of 81.5% on the UrbanSound8K, BBC Sound FX, DCASE2016, and FREESOUND datasets.
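
A sketch of the input construction the abstract describes, i.e. Mel filter-bank features concatenated over 40 consecutive frames to form one CNN input image, might look like the following; librosa and all parameter values are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np
import librosa

y, sr = librosa.load("event_clip.wav", sr=16000)      # hypothetical clip
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400,
                                     hop_length=160, n_mels=40)
log_mel = librosa.power_to_db(mel)                    # shape: (40, n_frames)

# Cut the log-Mel spectrogram into non-overlapping 40-frame "images",
# each of shape (40 mel bands, 40 frames, 1 channel), for the CNN.
images = np.stack([log_mel[:, i:i + 40][..., np.newaxis]
                   for i in range(0, log_mel.shape[1] - 39, 40)])
```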

The Noise Influence of 4G Mobile Transmitter on Audio Devices (4G 휴대 단말기 송신에 의한 오디오 잡음 영향)

  • Yun, Hye-Ju;Lee, Il-Kyoo
    • Journal of Satellite, Information and Communications, v.8 no.1, pp.31-34, 2013
  • This paper deals with the interfering audio noise caused on audio devices by an LTE (Long Term Evolution) UE (User Equipment), a 4th-generation mobile terminal. First, we established through analysis and measurement that the interfering signal of the LTE UE is determined by its transmit power. We then measured the audio noise level while varying the transmit power level and the separation distance between the LTE UE and an audio device. As a result, a minimum separation distance of 25 cm or more is required to protect an audio device from the interference noise of an LTE UE at the maximum transmit power level of 22 dBm.

Robust Audio Copyright Protection Technology to the Time Axis Attack (시간축 공격에 강인한 오디오 저작권보호 기술)

  • Bae, Kyoung-Yul
    • Journal of Intelligence and Information Systems, v.15 no.4, pp.201-212, 2009
  • Although the spread spectrum method is known as the algorithm most robust to general attacks, it has a drawback against time-axis attacks. In this paper, I propose an audio copyright protection algorithm that is robust to time-axis attacks while retaining the advantages of the spread spectrum method. Time-axis attacks include varying the audio length at the same pitch and varying the audio frequency. To detect a watermark embedded by the spread spectrum method, the detection algorithm must know the exact rate of the time-axis attack; even if there were a way to determine this rate, it would require heavy computational resources and be impractical to implement. To solve this problem, the audio signal is transformed into a time-invariant domain, and the spread spectrum watermark is embedded in that domain. The proposed algorithm therefore has the advantages of the spread spectrum method and is also robust to time-axis attacks. The time-invariant domain is obtained by arranging the audio on a log-scale time axis and then taking the Fourier transform of the signal on that axis. As a result, the algorithm obtains a time-invariant watermark signal.
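
A conceptual sketch of the detection side of this scheme is given below: the signal is resampled onto a logarithmic time axis, where a time-scale attack becomes a mere shift, so the Fourier magnitude on that axis is (nearly) invariant to the attack. This is an illustrative reading of the abstract, not the paper's actual code; embedding would apply the dual modification and map the signal back to the linear time axis.

```python
import numpy as np

def to_log_time_axis(x: np.ndarray, num_points: int = 4096) -> np.ndarray:
    """Resample x at exponentially spaced instants (log-scale time axis)."""
    t = np.geomspace(1, len(x) - 1, num_points)
    return np.interp(t, np.arange(len(x)), x)

def watermark_score(x: np.ndarray, key: int) -> float:
    """Correlate a key-dependent PN sequence with the Fourier magnitude of
    the log-time signal; the score survives time-axis scaling of x."""
    mag = np.abs(np.fft.rfft(to_log_time_axis(x)))
    rng = np.random.default_rng(key)                  # secret-key PN sequence
    pn = rng.choice([-1.0, 1.0], size=mag.shape)
    return float(np.dot(mag - mag.mean(), pn) / len(pn))
```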

Design of Music Learning Assistant Based on Audio Music and Music Score Recognition

  • Mulyadi, Ahmad Wisnu;Machbub, Carmadi;Prihatmanto, Ary S.;Sin, Bong-Kee
    • Journal of Korea Multimedia Society, v.19 no.5, pp.826-836, 2016
  • Mastering a musical instrument is not an easy task for an unskilled beginner. It requires playing every note correctly and maintaining the tempo accurately. Any piece of music comes in two forms: a music score and its rendition into audio. The proposed method assists beginning players in both aspects, employing two popular pattern recognition methods for audio-visual analysis: a support vector machine (SVM) for music score recognition and a hidden Markov model (HMM) for tracking an audio music performance. With proper synchronization of the two results, the proposed music learning assistant system can give useful feedback to self-training beginners.
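
As a hedged illustration of the HMM half of such a system, the sketch below builds a left-to-right Gaussian HMM whose states correspond to successive score notes and decodes which note the player is on from a per-frame pitch feature. hmmlearn, the note count, and the pitch values are all assumptions; the paper's exact model is not specified.

```python
import numpy as np
from hmmlearn import hmm

n_notes = 8                                  # states = notes in the score
model = hmm.GaussianHMM(n_components=n_notes, covariance_type="diag",
                        init_params="", params="")

# Left-to-right transitions: stay on the current note or advance by one.
trans = np.zeros((n_notes, n_notes))
for i in range(n_notes):
    trans[i, i] = 0.9
    trans[i, min(i + 1, n_notes - 1)] += 0.1
model.transmat_ = trans
model.startprob_ = np.eye(n_notes)[0]        # the performance starts on note 0

# One pitch feature (Hz) per frame; expected pitch per score note (C major).
model.means_ = np.array([[262.], [294.], [330.], [349.],
                         [392.], [440.], [494.], [523.]])
model.covars_ = np.full((n_notes, 1), 25.0)

# Synthetic pitch track: 10 frames per note, with measurement noise.
frames = (model.means_[np.repeat(np.arange(n_notes), 10)]
          + np.random.randn(10 * n_notes, 1) * 3.0)
states = model.predict(frames)               # Viterbi-decoded note per frame
```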