• Title/Summary/Keyword: Audio Generation

Search Result 103, Processing Time 0.023 seconds

A Study of DAB Tuner Module for ITS service (ITS서비스를 위한 DAB 튜너 모듈의 연구)

  • Kim Min-cheol;Sim Wan-ki;Kim Sang-woo;Kim Bok-ki
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.2 no.2 s.3
    • /
    • pp.1-12
    • /
    • 2003
  • DAB(Digital audio broadcasting) is a next generation radio broadcasting system which provides CD quality audio, various data services and superior reception ability when moving. Also, it can show traffic informations and news literally or graphically. In this paper, we design and fabricate the DAB tuner for ITS service that follows Eureka-147 and ETSI 300 401 specifications. This small-sized tuner can be adopted to mny electronic equipments such as a Hi-Fi audio, DVD player, car audio system etc.. The overall performance of the tuner depends on a phase noise of VCO and the sensitivity of the receiving system is influenced by LNA, image rejection filter and channel selection filter. All our measurement results satisfy the specification for a DAB system with the return loss of 9dB, the noise figure of 6dB for both Band 111 and L-band and the sensitivity of -97dBm.

  • PDF

Improving Fidelity of Synthesized Voices Generated by Using GANs (GAN으로 합성한 음성의 충실도 향상)

  • Back, Moon-Ki;Yoon, Seung-Won;Lee, Sang-Baek;Lee, Kyu-Chul
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.10 no.1
    • /
    • pp.9-18
    • /
    • 2021
  • Although Generative Adversarial Networks (GANs) have gained great popularity in computer vision and related fields, generating audio signals independently has yet to be presented. Unlike images, an audio signal is a sampled signal consisting of discrete samples, so it is not easy to learn the signals using CNN architectures, which is widely used in image generation tasks. In order to overcome this difficulty, GAN researchers proposed a strategy of applying time-frequency representations of audio to existing image-generating GANs. Following this strategy, we propose an improved method for increasing the fidelity of synthesized audio signals generated by using GANs. Our method is demonstrated on a public speech dataset, and evaluated by Fréchet Inception Distance (FID). When employing our method, the FID showed 10.504, but 11.973 as for the existing state of the art method (lower FID indicates better fidelity).

A study on the application of residual vector quantization for vector quantized-variational autoencoder-based foley sound generation model (벡터 양자화 변분 오토인코더 기반의 폴리 음향 생성 모델을 위한 잔여 벡터 양자화 적용 연구)

  • Seokjin Lee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.43 no.2
    • /
    • pp.243-252
    • /
    • 2024
  • Among the Foley sound generation models that have recently begun to be studied, a sound generation technique using the Vector Quantized-Variational AutoEncoder (VQ-VAE) structure and generation model such as Pixelsnail are one of the important research subjects. On the other hand, in the field of deep learning-based acoustic signal compression, residual vector quantization technology is reported to be more suitable than the conventional VQ-VAE structure. Therefore, in this paper, we aim to study whether residual vector quantization technology can be effectively applied to the Foley sound generation. In order to tackle the problem, this paper applies the residual vector quantization technique to the conventional VQ-VAE-based Foley sound generation model, and in particular, derives a model that is compatible with the existing models such as Pixelsnail and does not increase computational resource consumption. In order to evaluate the model, an experiment was conducted using DCASE2023 Task7 data. The results show that the proposed model enhances about 0.3 of the Fréchet audio distance. Unfortunately, the performance enhancement was limited, which is believed to be due to the decrease in the resolution of time-frequency domains in order to do not increase consumption of the computational resources.

Audio Quality Enhancement at a Low-bit Rate Perceptual Audio Coding (저비트율로 압축된 오디오의 음질 개선 방법)

  • 서정일;서진수;홍진우;강경옥
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.6
    • /
    • pp.566-575
    • /
    • 2002
  • Low-titrate audio coding enables a number of Internet and mobile multimedia streaming service more efficiently. For the help of next-generation mobile telephone technologies and digital audio/video compression algorithm, we can enjoy the real-time multimedia contents on our mobile devices (cellular phone, PDA notebook, etc). But the limited available bandwidth of mobile communication network prohibits transmitting high-qualify AV contents. In addition, most bandwidth is assigned to transmit video contents. In this paper, we design a novel and simple method for reproducing high frequency components. The spectrum of high frequency components, which are lost by down-sampling, are modeled by the energy rate with low frequency band in Bark scale, and these values are multiplexed with conventional coded bitstream. At the decoder side, the high frequency components are reconstructed by duplicating with low frequency band spectrum at a rate of decoded energy rates. As a result of segmental SNR and MOS test, we convinced that our proposed method enhances the subjective sound quality only 10%∼20% additional bits. In addition, this proposed method can apply all kinds of frequency domain audio compression algorithms, such as MPEG-1/2, AAC, AC-3, and etc.

A study on implementing real time audio stream generation/restruction/sending system (실시간 오디오 스트림 생성/복원/전송 시스템 구현에 관한 연구)

  • 이경남;박인규
    • Proceedings of the IEEK Conference
    • /
    • 1998.10a
    • /
    • pp.1199-1202
    • /
    • 1998
  • 4채널 입력으로부터 입력되는 오디오를 압축,복원,저장, 전송하는 ㅅ스템을 설계한다. 이러한 시스템은 보안 시스템 중에서 특정 센서로부터 alarm 신호를 디지털 데이터로 변환한 후, 압축시켜 저장하고 동시에 압축된 오디오 데이터를 비디오 데이터와 통합하여 하나의 스트림으로 만들어 통신망으로 보내주는 시스템에 적용된다. 이러한 시스템의 구조를 간단히 설명하면 아날로그 음성 신호를 디지털 음성 data로 변환하기 위해 OKI사의 MSM 7570L-91이라는 ADPCM codec을 사용하였고 ADPCMcodec을 거쳐 나온 ADPCM 데이터를 64Mbyte SDRAM에 저장하였다가 FIFO를 거쳐서 통신망으로 전송을 한다. 복원은 SDRAM에 저장된 ADPCM 데이터를 MSM 7570L-01을 거쳐 아날로그 신호로 변환한 후 엠프를 거쳐 스피커로 출력을 하게 된다.

  • PDF

Sequence generation and measuring threshold of audio watermarking using sinusoidal function pattern (Sinusoidal Function Pattern을 이용한 오디오 워터마킹의 시퀀스 생성 및 Threshold 설정 방안)

  • 김태훈;김형중
    • Proceedings of the IEEK Conference
    • /
    • 2003.11b
    • /
    • pp.87-90
    • /
    • 2003
  • 본 논문에서는 정현파를 이용한 spread-spectrum watermarking 에서 비가청성을 높이기 위한 방법과 효율적인 threshold 설정 방법을 제안한다. 제안하는 방법에서는 spread-spectrum 기법을 사용할 때 계산량이 많이 요구되는 심리음향모델 계산을 피하면서도 가청잡음을 줄이기 위한 방법을 제시한다. 또한 outlier 를 이용하여 워터마크 검출에서 적절한 threshold 설정방법을 제안한다.

  • PDF

Design and Implementation of the Interface between TS Demux and MPEG-4 System in DMB terminal (DMB 단말에서 TS Demux와 MPEG-4 시스템의 인터페이스 설계 및 구현)

  • 서주희;박주희;전종구
    • Proceedings of the IEEK Conference
    • /
    • 2003.11b
    • /
    • pp.251-254
    • /
    • 2003
  • DMB is a next-generation multimedia broadcasting system that not only enables digital broadcasting services such as transmission of CD-duality audio, traffic information, and real-time stock information, but also allows reception of high-quality digital TV in high-speed driving conditions. In the DMB system, MPEG-2 TS(Transport Stream) multiplex method and MPEG-4 System SL(Sync Layer) have been selected as the delivery layer. In this paper, an efficient interface scheme between an MPEG-2 TS processing hardware and software-implemented MPEG-4 system within a DMB terminal device is proposed.

  • PDF

New Automatic Taxonomy Generation Algorithm for the Audio Genre Classification (음악 장르 분류를 위한 새로운 자동 Taxonomy 구축 알고리즘)

  • Choi, Tack-Sung;Moon, Sun-Kook;Park, Young-Cheol;Youn, Dae-Hee;Lee, Seok-Pil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.27 no.3
    • /
    • pp.111-118
    • /
    • 2008
  • In this paper, we propose a new automatic taxonomy generation algorithm for the audio genre classification. The proposed algorithm automatically generates hierarchical taxonomy based on the estimated classification accuracy at all possible nodes. The estimation of classification accuracy in the proposed algorithm is conducted by applying the training data to classifier using k-fold cross validation. Subsequent classification accuracy is then to be tested at every node which consists of two clusters by applying one-versus-one support vector machine. In order to assess the performance of the proposed algorithm, we extracted various features which represent characteristics such as timbre, rhythm, pitch and so on. Then, we investigated classification performance using the proposed algorithm and previous flat classifiers. The classification accuracy reaches to 89 percent with proposed scheme, which is 5 to 25 percent higher than the previous flat classification methods. Using low-dimensional feature vectors, in particular, it is 10 to 25 percent higher than previous algorithms for classification experiments.

A Study of 3D Sound Modeling based on Geometric Acoustics Techniques for Virtual Reality (가상현실 환경에서 기하학적 음향 기술 기반의 3차원 사운드 모델링 기술에 관한 연구)

  • Kim, Cheong Ghil
    • Journal of Satellite, Information and Communications
    • /
    • v.11 no.4
    • /
    • pp.102-106
    • /
    • 2016
  • With the popularity of smart phones and the help of high-speed wireless communication technology, high-quality multimedia contents have become common in mobile devices. Especially, the release of Oculus Rift opens a new era of virtual reality technology in consumer market. At the same time, 3D audio technology which is currently used to make computer games more realistic will soon be applied to the next generation of mobile phone and expected to offer a more expansive experience than its visual counterpart. This paper surveys concepts, algorithms, and systems for modeling 3D sound virtual environment applications. To do this, we first introduce an important design principle for audio rendering based on physics-based geometric algorithms and multichannel technologies, and introduce an audio rendering pipeline to a scene graph-based virtual reality system and a hardware architecture to model sound propagation.

PC-based Control System of Serially Connected Multi-channel Speakers (직렬연결 다채널 스피커의 PC 기반 제어 시스템)

  • Lee, Sun-Yong;Kim, Tae-Wan;Byun, Ji-Sung;Song, Moon-Vin;Chung, Yun-Mo
    • The KIPS Transactions:PartA
    • /
    • v.15A no.6
    • /
    • pp.317-324
    • /
    • 2008
  • In this paper, we propose a system which easily controls the existing serially connected multi-channel speakers in a general personal computer by using a USB(Universal Serial Bus) interface. The personal computer as a host of the USB interface analyzes a sound source and sends audio data in a real-time fashion by the use of the isochronous transmission, one of four transmission methods provided by the USB interface. In addition, a channel is assigned by means of the bulk transmission, one of four transmission methods provided by the USB interface. Transmitted data from the USB host are sent to each speaker through compression and packet generation process. Each speaker detects corresponding digital data and regenerates audio signals through DAC(Digital-to-Analog Converter). A user can easily select a sound source file and a channel by the use of a GUI environment in a personal computer.