• Title/Summary/Keyword: Audio Enhancement

Search Result 59, Processing Time 0.025 seconds

Minimum Statistics-Based Noise Power Estimation for Parametric Image Restoration

  • Yoo, Yoonjong;Shin, Jeongho;Paik, Joonki
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.3 no.2
    • /
    • pp.41-51
    • /
    • 2014
  • This paper describes a method to estimate the noise power using the minimum statistics approach, which was originally proposed for audio processing. The proposed minimum statistics-based method separates a noisy image into multiple frequency bands using the three-level discrete wavelet transform. By assuming that the output of the high-pass filter contains both signal detail and noise, the proposed algorithm extracts the region of pure noise from the high frequency band using an appropriate threshold. The region of pure noise, which is free from the signal detail part and the DC component, is well suited for minimum statistics condition, where the noise power can be extracted easily. The proposed algorithm reduces the computational load significantly through the use of a simple processing architecture without iteration with an estimation accuracy greater than 90% for strong noise at 0 to 40dB SNR of the input image. Furthermore, the well restored image can be obtained using the estimated noise power information in parametric image restoration algorithms, such as the classical parametric Wiener or ForWaRD image restoration filters. The experimental results show that the proposed algorithm can estimate the noise power accurately, and is particularly suitable for fast, low-cost image restoration or enhancement applications.

Blind Audio Source Separation Based On High Exploration Particle Swarm Optimization

  • KHALFA, Ali;AMARDJIA, Nourredine;KENANE, Elhadi;CHIKOUCHE, Djamel;ATTIA, Abdelouahab
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.5
    • /
    • pp.2574-2587
    • /
    • 2019
  • Blind Source Separation (BSS) is a technique used to separate supposed independent sources of signals from a given set of observations. In this paper, the High Exploration Particle Swarm Optimization (HEPSO) algorithm, which is an enhancement of the Particle Swarm Optimization (PSO) algorithm, has been used to separate a set of source signals. Compared to PSO algorithm, HEPSO algorithm depends on two additional operators. The first operator is based on the multi-crossover mechanism of the genetic algorithm while the second one relies on the bee colony mechanism. Both operators have been employed to update the velocity and the position of the particles respectively. Thus, they are used to find the optimal separating matrix. The proposed method enhances the overall efficiency of the standard PSO in terms of good exploration and performance. Based on many tests realized on speech and music signals supplied by the BSS demo, experimental results confirm the robustness and the accuracy of the introduced BSS technique.

Spoken-to-written text conversion for enhancement of Korean-English readability and machine translation

  • HyunJung Choi;Muyeol Choi;Seonhui Kim;Yohan Lim;Minkyu Lee;Seung Yun;Donghyun Kim;Sang Hun Kim
    • ETRI Journal
    • /
    • v.46 no.1
    • /
    • pp.127-136
    • /
    • 2024
  • The Korean language has written (formal) and spoken (phonetic) forms that differ in their application, which can lead to confusion, especially when dealing with numbers and embedded Western words and phrases. This fact makes it difficult to automate Korean speech recognition models due to the need for a complete transcription training dataset. Because such datasets are frequently constructed using broadcast audio and their accompanying transcriptions, they do not follow a discrete rule-based matching pattern. Furthermore, these mismatches are exacerbated over time due to changing tacit policies. To mitigate this problem, we introduce a data-driven Korean spoken-to-written transcription conversion technique that enhances the automatic conversion of numbers and Western phrases to improve automatic translation model performance.

Tone Quality Improvement Algorithm using Intelligent Estimation of Noise Pattern (잡음 패턴의 지능적 추정을 통한 음질 개선 알고리즘)

  • Seo, Joung-Kook;Cha, Hyung-Tai
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.15 no.2
    • /
    • pp.230-235
    • /
    • 2005
  • In this paper, we propose an algorithm that improves a tone quality of a noisy audio signal in order to enhance a performance of perceptual filter using intelligent estimation of noise pattern from a band degraded by additive noise. The proposed method doesn't use the estimated noise which is obtained from silent range. Instead new estimated noise according to the power of signal and effect of noise variation is considered for each frame. So the noisy audio signal is enhanced by the method which controls a estimation of noise Pattern effectively in a noise corruption band. To show the performance of the proposed algorithm, various input signals which had a different signal-to-noise ratio(SNR) such as $5\cal{dB},\;10\cal{dB},\;15\cal{dB}\;and\;20\cal{dB}$ were used to test the proposed algorithm. we carry out SSNR and NMR of objective measurement and MOS test of subjective measurement. An approximate improvement of $7.4\cal{dB},\;6.8\cal{dB},\;5.7\cal{dB},\;5.1\cal{dB}$ in SSNR and $15.7\cal{dB},\;15.5\cal{dB},\;15.2\cal{dB},\;14.8\cal{dB}$ in NMR is achieved with the input signals, respectively. And we confirm the enhancement of tone quality in terms of mean opinion score(MOS) test which is result of subjective measurement.

A study on the application of residual vector quantization for vector quantized-variational autoencoder-based foley sound generation model (벡터 양자화 변분 오토인코더 기반의 폴리 음향 생성 모델을 위한 잔여 벡터 양자화 적용 연구)

  • Seokjin Lee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.43 no.2
    • /
    • pp.243-252
    • /
    • 2024
  • Among the Foley sound generation models that have recently begun to be studied, a sound generation technique using the Vector Quantized-Variational AutoEncoder (VQ-VAE) structure and generation model such as Pixelsnail are one of the important research subjects. On the other hand, in the field of deep learning-based acoustic signal compression, residual vector quantization technology is reported to be more suitable than the conventional VQ-VAE structure. Therefore, in this paper, we aim to study whether residual vector quantization technology can be effectively applied to the Foley sound generation. In order to tackle the problem, this paper applies the residual vector quantization technique to the conventional VQ-VAE-based Foley sound generation model, and in particular, derives a model that is compatible with the existing models such as Pixelsnail and does not increase computational resource consumption. In order to evaluate the model, an experiment was conducted using DCASE2023 Task7 data. The results show that the proposed model enhances about 0.3 of the Fréchet audio distance. Unfortunately, the performance enhancement was limited, which is believed to be due to the decrease in the resolution of time-frequency domains in order to do not increase consumption of the computational resources.

User Perception of Olfactory Information for Video Reality and Video Classification (영상실감을 위한 후각정보에 대한 사용자 지각과 영상분류)

  • Lee, Guk-Hee;Li, Hyung-Chul O.;Ahn, Chung Hyun;Choi, Ji Hoon;Kim, Shin Woo
    • Journal of the HCI Society of Korea
    • /
    • v.8 no.2
    • /
    • pp.9-19
    • /
    • 2013
  • There has been much advancement in reality enhancement using audio-visual information. On the other hand, there is little research on provision of olfactory information because smell is difficult to implement and control. In order to obtain necessary basic data when intend to provide smell for video reality, in this research, we investigated user perception of smell in diverse videos and then classified the videos based on the collected user perception data. To do so, we chose five main questions which were 'whether smell is present in the video'(smell presence), 'whether one desire to experience the smell with the video'(preference for smell presence with the video), 'whether one likes the smell itself'(preference for the smell itself), 'desired smell intensity if it is presented with the video'(smell intensity), and 'the degree of smell concreteness'(smell concreteness). After sampling video clips of various genre which are likely to receive either high and low ratings in the questions, we had participants watch each video after which they provided ratings on 7-point scale for the above five questions. Using the rating data for each video clips, we constructed scatter plots by pairing the five questions and representing the rating scale of each paired questions as X-Y axes in 2 dimensional spaces. The video clusters and distributional shape in the scatter plots would provide important insight into characteristics of each video clusters and about how to present olfactory information for video reality.

  • PDF

A Study on the Enhancement for Satellite Digital Multimedia Broadcasting System E (위성 DMB 시스템 E의 고도화에 관한 연구)

  • Choi, Seung-Hyun;Oh, Doeck-Gil;Chang, Dae-Ig
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.44 no.3 s.357
    • /
    • pp.85-91
    • /
    • 2007
  • Satellite Digital Multimedia Broadcasting (S-DMB) is the digital convergence service of broadcasting and communication for mobility and portability. Broadcasting service of S-DMB can be taken by the mobile phone or vehicle terminals anytime and anywhere. S-DMB system is currently providing 11 video channels and 26 audio channels. As the demand of multimedia service is recently increasing, S-DMB system needs high quality and new contents service. Therefore we need to make efficient S-DMB with more channel ability and high transmission quality. In this paper, we propose new S-DMB system that can be applied to powerful channel coding scheme and hierarchical 8-PSK(8-Phase Shift Keying) modulation with Backwards Compatibility modes that simultaneously can support both current and new system. And we analyze the performance of current S-DMB system and verify a possibility of advanced S-DMB through computer simulation.

Artificial speech bandwidth extension technique based on opus codec using deep belief network (심층 신뢰 신경망을 이용한 오푸스 코덱 기반 인공 음성 대역 확장 기술)

  • Choi, Yoonsang;Li, Yaxing;Kang, Sangwon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.36 no.1
    • /
    • pp.70-77
    • /
    • 2017
  • Bandwidth extension is a technique to improve speech quality, intelligibility and naturalness, extending from the 300 ~ 3,400 Hz narrowband speech to the 50 ~ 7,000 Hz wideband speech. In this paper, an Artificial Bandwidth Extension (ABE) module embedded in the Opus audio decoder is designed using the information of narrowband speech to reduce the computational complexity of LPC (Linear Prediction Coding) and LSF (Line Spectral Frequencies) analysis and the algorithm delay of the ABE module. We proposed a spectral envelope extension method using DBN (Deep Belief Network), one of deep learning techniques, and the proposed scheme produces better extended spectrum than the traditional codebook mapping method.

Deep Learning Architectures and Applications (딥러닝의 모형과 응용사례)

  • Ahn, SungMahn
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.2
    • /
    • pp.127-142
    • /
    • 2016
  • Deep learning model is a kind of neural networks that allows multiple hidden layers. There are various deep learning architectures such as convolutional neural networks, deep belief networks and recurrent neural networks. Those have been applied to fields like computer vision, automatic speech recognition, natural language processing, audio recognition and bioinformatics where they have been shown to produce state-of-the-art results on various tasks. Among those architectures, convolutional neural networks and recurrent neural networks are classified as the supervised learning model. And in recent years, those supervised learning models have gained more popularity than unsupervised learning models such as deep belief networks, because supervised learning models have shown fashionable applications in such fields mentioned above. Deep learning models can be trained with backpropagation algorithm. Backpropagation is an abbreviation for "backward propagation of errors" and a common method of training artificial neural networks used in conjunction with an optimization method such as gradient descent. The method calculates the gradient of an error function with respect to all the weights in the network. The gradient is fed to the optimization method which in turn uses it to update the weights, in an attempt to minimize the error function. Convolutional neural networks use a special architecture which is particularly well-adapted to classify images. Using this architecture makes convolutional networks fast to train. This, in turn, helps us train deep, muti-layer networks, which are very good at classifying images. These days, deep convolutional networks are used in most neural networks for image recognition. Convolutional neural networks use three basic ideas: local receptive fields, shared weights, and pooling. By local receptive fields, we mean that each neuron in the first(or any) hidden layer will be connected to a small region of the input(or previous layer's) neurons. Shared weights mean that we're going to use the same weights and bias for each of the local receptive field. This means that all the neurons in the hidden layer detect exactly the same feature, just at different locations in the input image. In addition to the convolutional layers just described, convolutional neural networks also contain pooling layers. Pooling layers are usually used immediately after convolutional layers. What the pooling layers do is to simplify the information in the output from the convolutional layer. Recent convolutional network architectures have 10 to 20 hidden layers and billions of connections between units. Training deep learning networks has taken weeks several years ago, but thanks to progress in GPU and algorithm enhancement, training time has reduced to several hours. Neural networks with time-varying behavior are known as recurrent neural networks or RNNs. A recurrent neural network is a class of artificial neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. Early RNN models turned out to be very difficult to train, harder even than deep feedforward networks. The reason is the unstable gradient problem such as vanishing gradient and exploding gradient. The gradient can get smaller and smaller as it is propagated back through layers. This makes learning in early layers extremely slow. The problem actually gets worse in RNNs, since gradients aren't just propagated backward through layers, they're propagated backward through time. If the network runs for a long time, that can make the gradient extremely unstable and hard to learn from. It has been possible to incorporate an idea known as long short-term memory units (LSTMs) into RNNs. LSTMs make it much easier to get good results when training RNNs, and many recent papers make use of LSTMs or related ideas.