• Title/Summary/Keyword: Audio Data

A DNN-Based Personalized HRTF Estimation Method for 3D Immersive Audio

  • Son, Ji Su;Choi, Seung Ho
    • International Journal of Internet, Broadcasting and Communication / v.13 no.1 / pp.161-167 / 2021
  • This paper proposes a new personalized HRTF estimation method that is based on a deep neural network (DNN) model and improves elevation reproduction with a notch filter. A previous study proposed a DNN model that estimates the magnitude of the HRTF from anthropometric measurements [1]. However, because that method uses zero phase instead of estimating the phase, it causes internalization (i.e., inside-the-head localization) of sound when listening to spatial audio. We devise a method that estimates both the magnitude and the phase of the HRTF with the DNN model. The personalized HRIR is estimated using anthropometric measurements, including detailed data on the head, torso, shoulders, and ears, as inputs to the DNN model. The estimated HRIR is then filtered with an appropriate notch filter to improve elevation reproduction. To evaluate performance, both objective and subjective evaluations are conducted. For the objective evaluation, the root mean square error (RMSE) and the log spectral distance (LSD) between the reference HRTF and the estimated HRTF are measured. For the subjective evaluation, a MUSHRA test and a preference test are conducted. The results show that the proposed method gives listeners a more immersive audio experience than the previous methods.
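
The abstract reports RMSE and LSD between reference and estimated HRTFs but does not spell out the formulas. A minimal sketch, assuming the common definition of LSD as the root mean square of the per-bin dB difference between two magnitude responses; the paper does not say which quantity it feeds to RMSE (HRIR samples or dB magnitudes), so `rmse` here is generic. The function names and the `eps` guard are illustrative assumptions.

```python
import math


def rmse(reference, estimate):
    """Root mean square error between two equal-length sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference))


def log_spectral_distance(ref_mag, est_mag, eps=1e-12):
    """LSD in dB between two magnitude responses sampled on the same frequency bins.

    eps guards against log(0) for zero-magnitude bins (an assumption, not from the paper).
    """
    if len(ref_mag) != len(est_mag) or not ref_mag:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs_db = [20.0 * math.log10((abs(r) + eps) / (abs(e) + eps)) for r, e in zip(ref_mag, est_mag)]
    return math.sqrt(sum(d * d for d in diffs_db) / len(diffs_db))


# An estimate that is uniformly 2x too loud is off by about 6.02 dB in every bin.
reference = [1.0, 0.5, 0.25, 0.125]
estimate = [2 * m for m in reference]
print(round(log_spectral_distance(reference, estimate), 2))  # prints 6.02
```

In practice LSD is usually also averaged over all measured source directions; that outer average is left out here.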

Comparative study of data augmentation methods for fake audio detection (음성위조 탐지에 있어서 데이터 증강 기법의 성능에 관한 비교 연구)

  • KwanYeol Park;Il-Youp Kwak
    • The Korean Journal of Applied Statistics / v.36 no.2 / pp.101-114 / 2023
  • Data augmentation is an effective way to reduce model overfitting because it lets the model see the training dataset from various perspectives. In addition to image augmentation techniques such as rotation, cropping, horizontal flip, and vertical flip, occlusion-based data augmentation methods such as Cutmix and Cutout have been proposed. For models based on speech data, occlusion-based augmentation can be used after the 1D speech signal is converted into a 2D spectrogram; SpecAugment, in particular, is an occlusion-based augmentation technique for speech spectrograms. In this study, we compare data augmentation techniques that can be used for fake audio detection. Using data from the ASVspoof2017 and ASVspoof2019 challenges for fake audio detection, we trained an LCNN model on datasets augmented with the occlusion-based methods Cutout, Cutmix, and SpecAugment. All three techniques generally improved the performance of the model. Cutmix performed best on ASVspoof2017, Mixup on ASVspoof2019 LA, and SpecAugment on ASVspoof2019 PA. In addition, increasing the number of SpecAugment masks improved performance. In conclusion, the most appropriate augmentation technique depends on the situation and the data.
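
The abstract compares occlusion-based augmentation on spectrograms. A minimal pure-Python sketch of the operations it reports as best performers, assuming a spectrogram stored as a list of frequency rows: SpecAugment-style frequency and time masking (time warping omitted), a simplified CutMix that keeps the pasted box fully inside the spectrogram and mixes labels by the pasted area, and Mixup. Mask widths, the zero fill value, the uniform mixing ratio, and numeric labels (for example 1.0 for bona fide, 0.0 for spoofed) are assumptions, not the paper's settings. Cutout amounts to zeroing a single random rectangle.

```python
import random


def spec_augment(spec, num_freq_masks=2, num_time_masks=2, max_f=8, max_t=20, rng=None):
    """SpecAugment-style masking (no time warping) on a freq x time spectrogram.

    Returns a copy in which `num_freq_masks` random frequency bands and
    `num_time_masks` random time spans are set to 0.0.
    """
    rng = rng or random.Random()
    out = [list(row) for row in spec]
    n_freq, n_time = len(out), len(out[0])
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_f, n_freq))
        f0 = rng.randint(0, n_freq - width)
        for i in range(f0, f0 + width):
            out[i] = [0.0] * n_time
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_t, n_time))
        t0 = rng.randint(0, n_time - width)
        for row in out:
            for j in range(t0, t0 + width):
                row[j] = 0.0
    return out


def cutmix(spec_a, label_a, spec_b, label_b, rng=None):
    """Simplified CutMix: paste a random box from spec_b into a copy of spec_a.

    The returned label is mixed in proportion to the pasted area.
    """
    rng = rng or random.Random()
    n_freq, n_time = len(spec_a), len(spec_a[0])
    lam = rng.random()  # target fraction of spec_a to keep (uniform, i.e. Beta(1, 1))
    cut_f = int(n_freq * (1 - lam) ** 0.5)
    cut_t = int(n_time * (1 - lam) ** 0.5)
    f0 = rng.randint(0, n_freq - cut_f)
    t0 = rng.randint(0, n_time - cut_t)
    out = [list(row) for row in spec_a]
    for i in range(f0, f0 + cut_f):
        for j in range(t0, t0 + cut_t):
            out[i][j] = spec_b[i][j]
    area = (cut_f * cut_t) / (n_freq * n_time)  # actual pasted fraction
    return out, (1 - area) * label_a + area * label_b


def mixup(spec_a, label_a, spec_b, label_b, lam):
    """Mixup: convex combination of two spectrograms and of their labels."""
    mixed = [[lam * a + (1 - lam) * b for a, b in zip(ra, rb)] for ra, rb in zip(spec_a, spec_b)]
    return mixed, lam * label_a + (1 - lam) * label_b
```

Increasing `num_freq_masks` and `num_time_masks` corresponds to the "more masks" setting the abstract reports as helpful.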

The Application of CSAMT to Deep-seated Coal Seams Exploration (심부 석탄층 탐사에 있어서 CSAMT 탐사법 적용)

  • Chung, Seung-Hwan;Kim, Jung-Ho;Jeon, Jeong-Soo
    • Economic and Environmental Geology / v.23 no.1 / pp.73-79 / 1990
  • Controlled-source audio-frequency magnetotellurics (CSAMT) has the major advantages of efficiently mapping resistivity distribution and offering a relatively large depth of investigation. Moreover, CSAMT may be considered more attractive than audio-frequency magnetotellurics because of its strong, controllable signal. However, it suffers from problems such as undershoot and the near-field effect, which are hard to interpret if MT interpretation methods are applied directly. These problems arise from the very controlled source that makes CSAMT attractive, so the characteristics of the CSAMT response should be thoroughly understood before the interpretation stage. In this study, a numerical modeling program for a horizontally layered earth was developed to interpret CSAMT field data. A CSAMT field survey was run as a follow-up to a dipole-dipole resistivity survey over the same survey line at the Bongmyung coal mine. The survey used a grounded dipole source 2 km long, located 7.5 km to the south. The field CSAMT data agreed well with the calculated data, even in a geologically complex setting.
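
The abstract does not give its layered-earth modelling equations. As a hedged illustration of why the near-field effect matters, this sketch computes the standard plane-wave (Cagniard) apparent resistivity and skin depth, and applies a common rule of thumb that plane-wave interpretation needs a source-receiver separation of several skin depths. The factor of 3 and the 100 ohm-m half-space in the example are assumptions, not values from the survey; only the 7.5 km source distance comes from the abstract.

```python
import math

MU0 = 4e-7 * math.pi  # magnetic permeability of free space, H/m


def cagniard_resistivity(e_amp, h_amp, freq_hz):
    """Plane-wave (Cagniard) apparent resistivity in ohm-m from |E| in V/m and |H| in A/m."""
    omega = 2 * math.pi * freq_hz
    return (abs(e_amp) / abs(h_amp)) ** 2 / (omega * MU0)


def skin_depth(resistivity, freq_hz):
    """Plane-wave skin depth in metres, sqrt(2*rho/(omega*mu0)), about 503*sqrt(rho/f)."""
    return math.sqrt(2 * resistivity / (2 * math.pi * freq_hz * MU0))


def is_far_field(separation_m, resistivity, freq_hz, n_skin_depths=3.0):
    """Rule of thumb: plane-wave formulas hold once separation exceeds a few skin depths."""
    return separation_m > n_skin_depths * skin_depth(resistivity, freq_hz)


# Assumed 100 ohm-m half-space; 7.5 km source distance taken from the abstract.
for f in (1.0, 10.0, 100.0, 1000.0):
    print(f, round(skin_depth(100.0, f)), is_far_field(7500.0, 100.0, f))
```

Low frequencies have long skin depths, so they are the ones that fall into the transition and near-field zones, where plane-wave MT formulas break down and a source-aware model such as the one the authors developed is needed.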

Analysis of Multiple Links Services for Development of Bluetooth Based Application (블루투스 기반 어플리케이션 개발을 위한 다중 링크 서비스 분석)

  • Song, Young-Ho;Lee, Tae-Yang;Yeo, Jong-Yun;Moon, Chan-Woo;Jeong, Gu-Min
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.10 no.5 / pp.173-178 / 2010
  • In this paper, we analyze the performance of multilink Bluetooth services and design a multilink application among various Bluetooth devices. Bluetooth-based multilink services face many restrictions, which arise mainly from the SCO and ACL links that form the logical links. In forming a scatternet, only one SCO link can be used, and the number of ACL links depends on the number of RFCOMM channels. High-quality audio streaming cannot be supported over an SCO link; in that case, an ACL link must be used for audio streaming, which lowers the data rate of the other data links. We implemented multilink services to verify these considerations.
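
The abstract's argument is about bandwidth: SCO carries a fixed 64 kbps voice channel, so high-quality audio has to move to ACL, where it competes with the other data links. A toy calculation under loudly stated assumptions: 723.2 kbps as the ACL ceiling (the asymmetric DH5 rate of basic-rate Bluetooth), 328 kbps as an example compressed-audio bitrate, and equal sharing of the remaining capacity, ignoring polling, retransmission, and scatternet switching overhead. None of these numbers come from the paper.

```python
SCO_RATE_KBPS = 64.0        # one SCO voice channel
ACL_CAPACITY_KBPS = 723.2   # assumed ACL ceiling: basic-rate DH5, asymmetric direction


def fits_on_sco(audio_kbps):
    """True if an audio stream fits in a single SCO channel."""
    return audio_kbps <= SCO_RATE_KBPS


def per_link_rate_kbps(audio_kbps, n_other_links, capacity_kbps=ACL_CAPACITY_KBPS):
    """Toy model: ACL capacity left after the audio stream, split equally among other links."""
    if n_other_links <= 0:
        raise ValueError("n_other_links must be positive")
    remaining = max(capacity_kbps - audio_kbps, 0.0)
    return remaining / n_other_links


audio = 328.0  # example compressed stereo bitrate (assumption)
print(fits_on_sco(audio))             # False: the stream must go over ACL
print(per_link_rate_kbps(0.0, 3))     # per-link rate without the audio stream
print(per_link_rate_kbps(audio, 3))   # per-link rate with the audio stream on ACL
```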

The Design of Chorus DSP Chip Using Psychoacoustic Model and SOLA Algorithm (심리음향모델과 SOLA 알고리즘을 이용한 코러스 칩 설계)

  • 김태훈;박주성
    • The Journal of the Acoustical Society of Korea / v.19 no.3 / pp.11-19 / 2000
  • This research describes the implementation of a chorus-processing DSP for a karaoke system. The chorus data must be compressed so that as many choruses as possible can be stored; we apply the MPEG-1 audio algorithm for this. Because the chorus system accompanies a karaoke system that can change key and tempo, the chorus DSP must also be able to change the key and tempo of the chorus data, which we do with SOLA (Synchronized Overlap and Add). We designed a chorus DSP that compresses the chorus data and changes its key and tempo, and we verified the DSP logic on two ALTERA FLEX10K100 FPGAs. Finally, we fabricated the chorus DSP as an ASIC chip and verified its operation.
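
The abstract names SOLA but gives no parameters. A minimal, unoptimized pure-Python sketch of SOLA time-scale modification on mono float samples: each analysis frame is shifted within a small search range to the offset that maximizes normalized cross-correlation with the output tail, then crossfaded in. Frame length, hops, search range, and the linear crossfade are illustrative choices, not the chip's. Key change is shown as a time stretch followed by linear-interpolation resampling, a common way to pair SOLA with pitch shifting; the paper may combine them differently.

```python
import math


def sola_time_stretch(x, alpha, frame_len=1024, analysis_hop=256, search=128):
    """Synchronized Overlap-Add (SOLA) time-scale modification of mono samples.

    alpha > 1 lengthens the signal (slower tempo), alpha < 1 shortens it;
    pitch is left approximately unchanged.
    """
    x = list(x)
    synthesis_hop = int(round(analysis_hop * alpha))
    out = x[:frame_len]
    m = 1
    while m * analysis_hop + frame_len <= len(x):
        frame = x[m * analysis_hop : m * analysis_hop + frame_len]
        nominal = m * synthesis_hop
        best_k, best_score = 0, -math.inf
        # Find the shift k that best aligns the new frame with the existing output tail.
        for k in range(-search, search + 1):
            start = nominal + k
            overlap = len(out) - start
            if start < 0 or not 0 < overlap <= frame_len:
                continue
            num = sum(out[start + i] * frame[i] for i in range(overlap))
            e_out = sum(out[start + i] ** 2 for i in range(overlap))
            e_frm = sum(frame[i] ** 2 for i in range(overlap))
            score = num / math.sqrt(e_out * e_frm) if e_out > 0 and e_frm > 0 else 0.0
            if score > best_score:
                best_k, best_score = k, score
        # Clamp so the overlap stays within [0, frame_len] even if no shift was valid.
        start = min(max(nominal + best_k, len(out) - frame_len), len(out))
        overlap = len(out) - start
        # Linear crossfade over the overlap, then append the rest of the frame.
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)
            out[start + i] = (1 - w) * out[start + i] + w * frame[i]
        out.extend(frame[overlap:])
        m += 1
    return out


def resample_linear(x, ratio):
    """Read x at `ratio` times the original speed using linear interpolation."""
    n_out = int(len(x) / ratio)
    y = []
    for n in range(n_out):
        pos = n * ratio
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        y.append((1 - frac) * x[i] + frac * nxt)
    return y


def change_key(x, factor, **sola_kwargs):
    """Shift pitch by `factor` (e.g. 2 ** (2 / 12) for +2 semitones), keeping duration."""
    return resample_linear(sola_time_stretch(x, factor, **sola_kwargs), factor)


# Small parameters keep the pure-Python search fast on a short test tone.
tone = [math.sin(2 * math.pi * 220 * n / 8000) for n in range(4000)]
higher = change_key(tone, 2 ** (2 / 12), frame_len=256, analysis_hop=64, search=32)
slower = sola_time_stretch(tone, 1.25, frame_len=256, analysis_hop=64, search=32)
```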

A Study on the Development of Web-based Full Motion Video E-mail System using MPEG-4 (웹을 기반으로 한 MPEG-4 동영상 E-mail 시스템의 개발)

  • 고재승
    • Journal of the Korea Computer Industry Society / v.3 no.3 / pp.283-294 / 2002
  • With the worldwide use of the Internet, the time has come for web-based video e-mail systems. Because video data is very large, compression is essential for transmitting it over the web. In this paper, my colleagues and I implement a full-motion video e-mail system using MPEG-4, the international standard for audio-visual data. The system is built as a web-based ActiveX control, so it is easily accessible from the web, and it applies real-time audio-video compression. With this system, anyone can send video e-mail anywhere in the world free of charge. Its main application areas include multimedia mailing services, web-based video advertising, remote education, remote medical services, and shopping mall construction.

A Study on MOT Protocol for multimedia Service on Digital Audio Broadcasting Network (DAB망에서 멀티미디어 서비스를 위한 MOT 프로토콜 성능 최적화 방안에 관한 연구)

  • 고예윤;조규섭
    • Journal of the Institute of Convergence Signal Processing / v.4 no.2 / pp.7-11 / 2003
  • As digital technologies develop rapidly and demand for various broadband multimedia services grows, radio broadcasting is moving toward digitalization. DAB (Digital Audio Broadcasting), an alternative to existing analog radio broadcasting, is a new type of multimedia broadcasting system that supports not only high-quality audio broadcasting but also various multimedia data services. In this paper, we investigate how to optimize the performance of the MOT protocol, the standard for additional services, to support multimedia services on the DAB network. Because the performance of the MOT protocol depends on parameters such as segment size and segment repetition, we determine suitable values by simulation. According to the simulation results, a segment size of about 2 KB and a segment repetition of 4 are suitable for performance optimization.
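
The abstract tunes segment size and repetition by simulation without describing the simulator. A toy model, assuming independent residual bit errors after channel decoding, independent segment losses, and a receiver that keeps any intact copy of each segment; 2 KB is taken as 2,048 bytes, which the abstract does not confirm. It captures only the error side of the trade-off: in the real protocol, smaller segments also carry more per-segment header overhead, and each repetition multiplies airtime, neither of which is modeled here.

```python
def segment_object(body, segment_size=2048):
    """Split an MOT object body into fixed-size segments; the last may be shorter."""
    return [body[i:i + segment_size] for i in range(0, len(body), segment_size)]


def segment_loss_probability(segment_size, bit_error_rate):
    """P(a segment is corrupted), assuming independent residual bit errors."""
    return 1.0 - (1.0 - bit_error_rate) ** (8 * segment_size)


def object_success_probability(n_segments, seg_loss, repetitions):
    """P(every segment arrives intact at least once over `repetitions` sends)."""
    return (1.0 - seg_loss ** repetitions) ** n_segments


body = bytes(100 * 1024)  # hypothetical 100 KB object, e.g. a slide-show image
for size in (512, 2048, 8192):
    n = len(segment_object(body, size))
    p = object_success_probability(n, segment_loss_probability(size, 1e-5), 4)
    print(size, n, round(p, 4))
```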

An MPEG-2 AAC Encoder Chip Design Operating under 70MIPS (70MIPS 이내에서 동작하는 MPEG-2 AAC 부호화 칩 설계)

  • Kang Hee-Chul;Park Ju-Sung;Jung Kab-Ju;Park Jong-In;Choi Byung-Gab;Kim Tae-Hoon;Kim Sung-Woo
    • Journal of the Institute of Electronics Engineers of Korea SD / v.42 no.4 s.334 / pp.61-68 / 2005
  • A chip that rapidly encodes audio data into the AAC (Advanced Audio Coding) LC (Low Complexity) profile of the MPEG-2 audio standard has been designed around a 32-bit DSP core and fabricated in 0.25 µm CMOS technology. First, various optimization methods for implementing the algorithm were devised to reduce memory size and calculation cycles. An FFT (Fast Fourier Transform) hardware block was added to the DSP core to further reduce calculation cycles. The chip measures $7.20 \times 7.20\ \mathrm{mm}^2$, has about 830,000 equivalent gates, and can perform AAC encoding under 70 MIPS (million instructions per second).
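
The 70 MIPS figure is easier to read as a per-frame budget. A back-of-the-envelope sketch, assuming a 48 kHz stereo input (the abstract gives no sample rate or channel count) and the standard AAC-LC frame of 1024 samples per channel; it ignores how work is split between the DSP core and the FFT hardware block.

```python
AAC_FRAME_SAMPLES = 1024  # AAC-LC frame length, samples per channel


def instructions_per_frame(mips, sample_rate_hz, channels):
    """Instruction budget for one channel-frame of AAC-LC at a given MIPS rating."""
    channel_frames_per_second = channels * sample_rate_hz / AAC_FRAME_SAMPLES
    return mips * 1e6 / channel_frames_per_second


def instructions_per_sample(mips, sample_rate_hz, channels):
    """Average instructions available per input sample across all channels."""
    return mips * 1e6 / (sample_rate_hz * channels)


# Assumed operating point: 48 kHz stereo.
print(round(instructions_per_frame(70, 48_000, 2)))      # about 746,667
print(round(instructions_per_sample(70, 48_000, 2), 1))  # about 729.2
```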

A research on the media player transferring vibrotactile stimulation from digital sound (디지털 음원의 촉각 자극 전이를 위한 미디어 플레이어에 대한 연구)

  • Lim, Young-Hoon;Lee, Su-Jin;Jung, Jong-Hwan;Ha, Ji-Min;Whang, Min-Cheol;Park, Jun-Seok
    • 한국HCI학회:학술대회논문집 (Proceedings of the HCI Society of Korea Conference) / 2007.02a / pp.881-886 / 2007
  • This study developed a vibrotactile display system that uses Windows Media Player to convert digital audio signals into vibration. The WMPlayer10SDK, a plug-in tool for Microsoft Windows Media Player, provided the video and audio signal information, and the audio signal was converted into a vibrotactile display. The audio signal was handled in four sections by bit depth: 8-bit, 16-bit, 24-bit, and 32-bit. For each section, the frequency and vibrato scale were computed, and the data was sent through a serial port (COM1) at 38,400 bps to drive the vibration. Using this system, we developed a music suit that conveys the tactile feel of music beyond sound. The system may therefore provide a cross-modal technology for fusing the human senses.
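
The abstract says a frequency and vibration level were computed per bit-depth section and sent over COM1 at 38,400 bps, but not how. A sketch of one plausible pipeline, with every mapping hypothetical: PCM normalization assuming WAV conventions (unsigned 8-bit, signed 16/24/32-bit integers), an RMS level and zero-crossing frequency estimate as stand-ins for the paper's features, a made-up one-byte intensity code, and a serial budget assuming 8N1 framing. No actual serial I/O is performed.

```python
import math


def normalize_pcm(samples, bit_depth):
    """Scale integer PCM samples to roughly [-1.0, 1.0).

    Assumes WAV conventions: 8-bit is unsigned (offset 128), 16/24/32-bit are signed.
    """
    if bit_depth == 8:
        return [(s - 128) / 128.0 for s in samples]
    if bit_depth in (16, 24, 32):
        full_scale = float(2 ** (bit_depth - 1))
        return [s / full_scale for s in samples]
    raise ValueError("bit_depth must be 8, 16, 24 or 32")


def frame_features(frame, sample_rate):
    """RMS level and a zero-crossing estimate of the dominant frequency in Hz."""
    rms = math.sqrt(sum(v * v for v in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    freq_hz = crossings * sample_rate / (2.0 * len(frame))
    return rms, freq_hz


def to_vibration_byte(rms):
    """Hypothetical mapping of RMS level (0..1) to a motor intensity byte (0..255)."""
    return max(0, min(255, int(round(rms * 255))))


def max_updates_per_second(bytes_per_update, baud=38400, bits_per_byte=10):
    """Serial budget; 10 bits per byte assumes 8N1 framing (start + 8 data + stop)."""
    return baud / (bits_per_byte * bytes_per_update)
```

At 38,400 bps with 8N1 framing the port carries 3,840 bytes per second, so even multi-byte updates leave ample room for refreshing the vibration at audio-frame rates.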

A Study of DAB Tuner Module for ITS service (ITS서비스를 위한 DAB 튜너 모듈의 연구)

  • Kim Min-cheol;Sim Wan-ki;Kim Sang-woo;Kim Bok-ki
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.2 no.2 s.3 / pp.1-12 / 2003
  • DAB (Digital Audio Broadcasting) is a next-generation radio broadcasting system that provides CD-quality audio, various data services, and superior reception while moving. It can also present traffic information and news as text or graphics. In this paper, we design and fabricate a DAB tuner for ITS services that follows the Eureka-147 and ETSI 300 401 specifications. This small tuner can be adopted in many electronic devices, such as hi-fi audio systems, DVD players, and car audio systems. The overall performance of the tuner depends on the phase noise of the VCO, and the sensitivity of the receiving system is influenced by the LNA, the image-rejection filter, and the channel-selection filter. All our measurement results satisfy the DAB system specification, with a return loss of 9 dB, a noise figure of 6 dB for both Band III and L-band, and a sensitivity of -97 dBm.
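
The abstract quotes a 6 dB noise figure, -97 dBm sensitivity, and 9 dB return loss. Standard receiver arithmetic relates these: the input-referred noise floor is kT (about -174 dBm/Hz at 290 K) plus bandwidth plus noise figure, and the gap to the sensitivity is the SNR the demodulator is implicitly allowed. A sketch, assuming the 1.536 MHz Eureka-147 signal bandwidth as the noise bandwidth; the implied SNR and the VSWR are derived here, not figures reported in the paper.

```python
import math

KT_DBM_PER_HZ = -174.0      # thermal noise density at about 290 K
DAB_BANDWIDTH_HZ = 1.536e6  # Eureka-147 DAB signal bandwidth


def noise_floor_dbm(noise_figure_db, bandwidth_hz=DAB_BANDWIDTH_HZ):
    """Receiver noise floor referred to the input."""
    return KT_DBM_PER_HZ + 10 * math.log10(bandwidth_hz) + noise_figure_db


def implied_snr_db(sensitivity_dbm, noise_figure_db):
    """SNR at the sensitivity limit implied by a measured sensitivity and noise figure."""
    return sensitivity_dbm - noise_floor_dbm(noise_figure_db)


def vswr_from_return_loss(return_loss_db):
    """Convert return loss (dB, positive) to VSWR via |Gamma| = 10^(-RL/20)."""
    gamma = 10 ** (-return_loss_db / 20)
    return (1 + gamma) / (1 - gamma)


print(round(noise_floor_dbm(6.0), 1))        # about -106.1 dBm
print(round(implied_snr_db(-97.0, 6.0), 1))  # about 9.1 dB
print(round(vswr_from_return_loss(9.0), 2))  # about 2.1
```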
