• Title/Summary/Keyword: Network Audio


Audio Event Detection Using Deep Neural Networks (깊은 신경망을 이용한 오디오 이벤트 검출)

  • Lim, Minkyu;Lee, Donghyun;Park, Hosung;Kim, Ji-Hwan
    • Journal of Digital Contents Society / v.18 no.1 / pp.183-190 / 2017
  • This paper proposes an audio event detection method using Deep Neural Networks (DNN). The proposed method applies a Feed-Forward Neural Network (FFNN) to generate output probabilities for twenty audio events in each frame. Mel-scale filter bank (FBANK) features are extracted from each frame, and five consecutive frames are combined into one vector, which serves as the input feature of the FFNN. The output layer of the FFNN produces audio event probabilities for each input feature vector. An audio event is detected when its probability exceeds a threshold for more than five consecutive frames, and the event is considered to continue as long as it is detected again within one second. The proposed method achieves 71.8% accuracy for 20 classes drawn from the UrbanSound8K and BBC Sound FX datasets.
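The frame-stacking and threshold-based detection steps described above can be sketched as follows. This is a minimal NumPy illustration; the mel-band count, threshold, and the random inputs are assumptions for demonstration, not the paper's exact configuration.

```python
import numpy as np

def stack_frames(fbank, context=5):
    """Combine `context` consecutive FBANK frames into one input vector per position."""
    n_frames, n_mels = fbank.shape
    stacked = [fbank[i:i + context].reshape(-1)
               for i in range(n_frames - context + 1)]
    return np.array(stacked)

def detect_events(probs, threshold=0.5, min_frames=5):
    """Report (start, end) spans where the per-frame event probability
    exceeds `threshold` for more than `min_frames` consecutive frames."""
    above = probs > threshold
    events, run_start = [], None
    for i, flag in enumerate(above):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start > min_frames:
                events.append((run_start, i))
            run_start = None
    if run_start is not None and len(above) - run_start > min_frames:
        events.append((run_start, len(above)))
    return events

# 40 FBANK frames with 24 mel bands -> stacked input vectors of length 5 * 24 = 120
fbank = np.random.rand(40, 24)
x = stack_frames(fbank)
print(x.shape)  # (36, 120)

# a probability track with one clear 8-frame event
probs = np.zeros(30)
probs[10:18] = 0.9
print(detect_events(probs))  # [(10, 18)]
```

In a real system the stacked vectors would be fed to the trained FFNN, whose per-event output track is what `detect_events` would threshold.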

Audio Data Hiding Based on Sample Value Modification Using Modulus Function

  • Al-Hooti, Mohammed Hatem Ali;Djanali, Supeno;Ahmad, Tohari
    • Journal of Information Processing Systems / v.12 no.3 / pp.525-537 / 2016
  • Data hiding is a wide field that helps secure network communications, and many data hiding researchers focus on improving aspects such as capacity, stego file quality, or robustness. In this paper, we use an audio file as a cover and propose a reversible steganographic method that modifies sample values using a modulus function so that the remainder of a particular value equals the secret bit to be embedded. In addition, we use a location map that records these modified sample values, because reversible data hiding must exactly recover both the secret message and the original audio file from the stego file. The experimental results, measured by a correlation algorithm, show that this method retrieves exactly the same secret message and audio file. Moreover, it significantly improves capacity, since each sample value carries one secret bit, while the quality, measured by peak signal-to-noise ratio (PSNR), signal-to-noise ratio (SNR), Pearson correlation coefficient (PCC), and Similarity Index Modulation (SIM), remains relatively high.
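A minimal sketch of the modulus-based idea, with modulus 2 so a sample's remainder carries one secret bit. The always-adjust-upward rule and the boolean location-map layout here are illustrative assumptions, not the authors' actual algorithm.

```python
import numpy as np

def embed(samples, bits):
    """Adjust each sample so that sample % 2 equals the secret bit.
    Returns the stego samples and a location map marking changed samples,
    which the receiver needs in order to restore the original audio exactly."""
    stego = samples.copy()
    location_map = np.zeros(len(bits), dtype=bool)
    for i, bit in enumerate(bits):
        if stego[i] % 2 != bit:
            stego[i] += 1          # illustrative rule: always adjust upward
            location_map[i] = True
    return stego, location_map

def extract(stego, location_map):
    """Read the secret bits back and undo the recorded modifications."""
    bits = (stego[:len(location_map)] % 2).astype(int)
    restored = stego.copy()
    restored[:len(location_map)][location_map] -= 1
    return bits, restored

cover = np.array([100, 101, 102, 103, 104], dtype=np.int64)
secret = [1, 1, 0, 0, 1]
stego, lmap = embed(cover, secret)
bits, restored = extract(stego, lmap)
print(bits.tolist())                    # [1, 1, 0, 0, 1]
print(np.array_equal(restored, cover))  # True
```

The round trip demonstrates the reversibility requirement the abstract describes: both the message and the untouched cover samples come back exactly.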

A Color Image Watermarking Method for Embedding Audio Signal

  • Kim Sang Jin;Kim Chung Hwa
    • Proceedings of the IEEK Conference / 2004.08c / pp.631-635 / 2004
  • The rapid development of digital media and communication networks urgently calls for data certification technology to protect intellectual property rights (IPR). This paper proposes a new watermarking method that embeds the content owner's audio signal in order to protect the IPR of color images. Since this method extends the existing static model and embeds a large audio signal, it has the advantage of restoring signals transformed by attacks. The watermarking consists of three basic stages: 1) encode the owner's analog audio signal using PCM and create a new 3D audio watermark; 2) interleave the 3D audio watermark by linear bit expansion; and 3) transform the Y channel of the color image into the wavelet domain and embed the interleaved audio watermark in the low-frequency band. The results demonstrate that the proposed audio signal embedding enhances robustness against lossy JPEG compression, standard image compression, and image cropping and rotation, which remove part of the image.
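Stage 2, interleaving by linear bit expansion, can be illustrated as follows, under the assumption that "linear bit expansion" means repeating each watermark bit a fixed number of times so that a majority vote can recover the bits after an attack corrupts some copies. The expansion factor is an assumption.

```python
import numpy as np

def expand_bits(bits, factor=3):
    """Repeat each watermark bit `factor` times (linear bit expansion)."""
    return np.repeat(np.asarray(bits), factor)

def recover_bits(expanded, factor=3):
    """Majority-vote each group of `factor` copies back into one bit."""
    groups = expanded.reshape(-1, factor)
    return (groups.sum(axis=1) > factor // 2).astype(int)

watermark = np.array([1, 0, 1, 1])
expanded = expand_bits(watermark)
print(expanded.tolist())  # [1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]

# flip one copy to simulate damage from an attack; the majority vote survives
expanded[1] ^= 1
print(recover_bits(expanded).tolist())  # [1, 0, 1, 1]
```

The redundancy is what lets the watermark survive the lossy operations (JPEG compression, cropping, rotation) mentioned in the abstract.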


Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

  • Liu, Min;Tang, Jun
    • Journal of Information Processing Systems / v.17 no.4 / pp.754-771 / 2021
  • In continuous-dimension emotion recognition, the parts that highlight emotional expression differ across modes, and different modes influence the emotional state to different degrees. Therefore, this paper studies the fusion of the two most important modes in emotion recognition (voice and facial expression) and proposes a bimodal emotion recognition method that combines an improved AlexNet network with an attention mechanism. After simple preprocessing of the audio and video signals, prior knowledge is first used to extract audio features. Then, facial expression features are extracted by the improved AlexNet network. Finally, a multimodal attention mechanism fuses the facial expression and audio features, and an improved loss function mitigates the missing-modality problem, improving the robustness of the model and the performance of emotion recognition. The experimental results show that the concordance correlation coefficients of the proposed model in the arousal and valence dimensions were 0.729 and 0.718, respectively, which are superior to several comparative algorithms.
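The concordance correlation coefficient (CCC) used as the evaluation metric above is defined as ρc = 2·cov(x, y) / (σx² + σy² + (μx − μy)²), which penalizes both low correlation and systematic bias between predictions and labels. A minimal NumPy version (the example arrays are illustrative):

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient between predictions x and labels y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (vx + vy + (mx - my) ** 2)

labels = np.array([0.1, 0.4, 0.6, 0.9])
print(ccc(labels, labels))            # 1.0 for perfect agreement
print(ccc(labels, labels + 0.2))      # < 1.0: Pearson r is still 1, but CCC penalizes the bias
```

This bias-sensitivity is why CCC, rather than plain Pearson correlation, is the standard metric for continuous arousal/valence prediction.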

Speech Recognition by Integrating Audio, Visual and Contextual Features Based on Neural Networks (신경망 기반 음성, 영상 및 문맥 통합 음성인식)

  • 김명원;한문성;이순신;류정우
    • Journal of the Institute of Electronics Engineers of Korea CI / v.41 no.3 / pp.67-77 / 2004
  • Recent research has focused on the fusion of audio and visual features for reliable speech recognition in noisy environments. In this paper, we propose a neural network based model for robust speech recognition that integrates audio, visual, and contextual information. The Bimodal Neural Network (BMNN) is a four-layer multi-layer perceptron, each layer of which performs a certain level of abstraction of the input features. In the BMNN, the third layer combines audio and visual features of speech to compensate for the loss of audio information caused by noise. To further improve recognition accuracy in noisy environments, we also propose post-processing based on contextual information, namely the sequential patterns of words spoken by a user. Our experimental results show that the model outperforms any single-mode model. In particular, when the contextual information is used, we obtain over 90% recognition accuracy even in noisy environments, a significant improvement over the state of the art in speech recognition. Our research demonstrates that diverse sources of information need to be integrated to improve the accuracy of speech recognition, particularly in noisy environments.
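The mid-level fusion performed by the BMNN's third layer, concatenating the audio and visual hidden representations before further processing, can be sketched as a toy forward pass. All layer sizes, the activation, and the random weights are assumptions for illustration; the paper's actual architecture and training are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

def layer(n_in, n_out):
    return rng.standard_normal((n_in, n_out)) * 0.1

# separate first layers abstract each modality on its own
W_audio, W_visual = layer(39, 32), layer(20, 32)
# the third layer sees the concatenation, letting visual features
# compensate for audio information lost to noise
W_fuse = layer(64, 16)
W_out = layer(16, 10)  # e.g. 10 word classes (assumed)

def bmnn_forward(audio_feat, visual_feat):
    h_a = relu(audio_feat @ W_audio)
    h_v = relu(visual_feat @ W_visual)
    h = relu(np.concatenate([h_a, h_v]) @ W_fuse)
    logits = h @ W_out
    e = np.exp(logits - logits.max())
    return e / e.sum()  # softmax over word classes

probs = bmnn_forward(rng.standard_normal(39), rng.standard_normal(20))
print(probs.shape)  # (10,)
```

The point of the sketch is the fusion topology: each modality is abstracted independently before a shared layer combines them.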

Towards Low Complexity Model for Audio Event Detection

  • Saleem, Muhammad;Shah, Syed Muhammad Shehram;Saba, Erum;Pirzada, Nasrullah;Ahmed, Masood
    • International Journal of Computer Science & Network Security / v.22 no.9 / pp.175-182 / 2022
  • In daily life, we come across different types of information, for example in multimedia and text formats, for common routines such as watching or reading the news, listening to the radio, and watching videos. However, problems can arise when a certain type of content is required: for example, a listener who wants jazz may find that all the radio channels play pop music mixed with advertisements, and give up searching. Such a problem can be solved by an automatic audio classification system. Deep Learning (DL) models could make this easy, but it is expensive and difficult to deploy such models on edge devices like the Nano BLE Sense or Raspberry Pi, because they usually require substantial computational power such as a graphics processing unit (GPU). To address this, we propose a low-complexity DL model for Audio Event Detection (AED). We extract Mel-spectrograms of dimension 128×431×1 from the audio signals and apply normalization. Three data augmentation methods are applied: frequency masking, time masking, and mixup. In addition, we design a Convolutional Neural Network (CNN) with spatial dropout, batch normalization, and separable 2D convolutions inspired by VGGNet [1], and reduce the model size by applying float16 quantization to the trained model. Experiments were conducted on the updated dataset provided by the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 challenge. Our model achieves a validation loss of 0.33 and an accuracy of 90.34% within a 132.50 KB model size.
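The three augmentations listed above can be sketched directly on a Mel-spectrogram array. The maximum mask widths and the Beta parameter are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(42)

def freq_mask(spec, max_width=16):
    """Zero out a random band of mel bins (frequency masking)."""
    out = spec.copy()
    width = rng.integers(1, max_width + 1)
    f0 = rng.integers(0, spec.shape[0] - width + 1)
    out[f0:f0 + width, :] = 0.0
    return out

def time_mask(spec, max_width=40):
    """Zero out a random span of time frames (time masking)."""
    out = spec.copy()
    width = rng.integers(1, max_width + 1)
    t0 = rng.integers(0, spec.shape[1] - width + 1)
    out[:, t0:t0 + width] = 0.0
    return out

def mixup(spec_a, spec_b, alpha=0.2):
    """Blend two spectrograms with a Beta-distributed weight (mixup);
    the labels would be blended with the same weight."""
    lam = rng.beta(alpha, alpha)
    return lam * spec_a + (1 - lam) * spec_b

# a 128 x 431 Mel-spectrogram, matching the paper's input dimension
spec = rng.random((128, 431))
augmented = time_mask(freq_mask(spec))
print(augmented.shape)         # (128, 431)
print((augmented == 0).any())  # True: some bins/frames were masked
```

Masking forces the network not to rely on any narrow frequency band or time span, which is particularly useful when the model itself must stay small.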

A Scalable Audio Coder for High-quality Speech and Audio Services

  • Lee, Gil-Ho;Lee, Young-Han;Kim, Hong-Kook;Kim, Do-Young;Lee, Mi-Suk
    • MALSORI / no.61 / pp.75-86 / 2007
  • In this paper, we propose a scalable audio coder that has a variable bandwidth ranging from narrowband speech to full audio bandwidth and a bit-rate from 8 to 320 kbit/s, in order to maintain quality of service (QoS) under varying network load. First, the proposed scalable coder splits the input audio into a narrowband portion up to around 4 kHz and the band above it. Next, the narrowband signals are compressed by a speech coding method compatible with an existing standard speech coder such as G.729, and the signals above the narrowband are compressed on the basis of a psychoacoustic model. Objective quality tests using the signal-to-noise ratio (SNR) and the perceptual evaluation of audio quality (PEAQ) show that the proposed scalable audio coder provides quality comparable to the MPEG-1 Layer III (MP3) audio coder.
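The band-splitting step can be sketched with a complementary low-/high-pass filter pair at roughly 4 kHz. This is a simple Butterworth illustration with SciPy, not the coder's actual filter bank; the sample rate and filter order are assumptions.

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 32000          # sample rate in Hz (an assumption for illustration)
split_hz = 4000     # narrowband/wideband boundary from the paper

b_lo, a_lo = butter(4, split_hz, btype="low", fs=fs)
b_hi, a_hi = butter(4, split_hz, btype="high", fs=fs)

def split_bands(x):
    """Split input audio into a narrowband (< ~4 kHz) path, which would go
    to the speech coder, and an upper band for the psychoacoustic coder."""
    return lfilter(b_lo, a_lo, x), lfilter(b_hi, a_hi, x)

t = np.arange(fs) / fs
tone_low = np.sin(2 * np.pi * 440 * t)     # speech-range tone
tone_high = np.sin(2 * np.pi * 10000 * t)  # upper-band tone

low_band, high_band = split_bands(tone_low)
# the 440 Hz tone lands almost entirely in the narrowband path
print(np.sum(low_band**2) > 100 * np.sum(high_band**2))  # True
```

Each band can then be coded at its own bit-rate, which is what makes the overall coder scalable with network load.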


LED Communication based Multi-hop Audio Data Transmission Network System (LED 통신 기반 멀티 홉 오디오 데이터 전송네트워크시스템)

  • Jo, Seung Wan;Le, The Dung;An, Beongku
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.6 / pp.180-187 / 2013
  • In this paper, we propose an LED communication based multi-hop audio data transmission network system. The main contributions and features of the proposed system are as follows. First, this research develops an LED communication based multi-hop transmission network system that can transmit audio data over long distances via multiple hops. Second, the developed system has the following features: the transmitter encodes the audio data in S/PDIF format and transmits it via a general LED; the relay receives the digital audio signal using a photodiode and forwards it to the receiver after error checking and amplification; and the receiver receives the encoded audio data via a photodiode and converts it to an analog audio signal by decoding and amplification. The performance of the proposed system was evaluated in a laboratory with a fluorescent light source. The results confirm that the system can provide high-quality audio transmission from transmitter to receiver over a long distance via multi-hop relays, although the transmitted audio quality differs according to the LED colors used.

Network-based Digital Crossover for Active Speakers (능동스피커를 위한 네트워크기반 디지털 크로스오버)

  • Kim, Byun-Gon;Kim, Kwan-Woong;Kim, Dae-Ik
    • The Journal of the Korea institute of electronic communication sciences / v.10 no.2 / pp.227-232 / 2015
  • Nowadays, there are many innovative products in the pro-audio market thanks to advanced IT technology, and DSP is a key technology for processing high-quality audio signals in SR (Sound Reinforcement) systems. Digital audio technology converged with IT technology can offer a new user experience. In this paper, we present a new digital crossover system for active speakers using DSP and network technology. The prototype crossover module consists of various audio processing modules such as filters, delay, and phase controls, and it also provides remote monitoring and remote control features over an internet connection.
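As a sketch of the signal path such a module implements, here is a textbook two-way Linkwitz-Riley crossover in SciPy. This is a generic illustration, not the prototype's actual DSP code; the sample rate and 2 kHz crossover point are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 48000
fc = 2000  # crossover frequency (assumed for illustration)

# Linkwitz-Riley 4th order = two cascaded 2nd-order Butterworth sections
sos_lp = np.vstack([butter(2, fc, "low", fs=fs, output="sos")] * 2)
sos_hi = np.vstack([butter(2, fc, "high", fs=fs, output="sos")] * 2)

def crossover(x):
    """Split the signal into woofer (low) and tweeter (high) feeds."""
    return sosfilt(sos_lp, x), sosfilt(sos_hi, x)

t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)
low, high = crossover(x)

# an LR4 crossover's outputs sum back to unity magnitude (allpass),
# so the recombined acoustic output stays flat
summed = low + high
rms = np.sqrt(np.mean(summed[fs // 2:] ** 2))
print(round(rms, 2))  # ~0.71, the RMS of the original unit sine
```

The LR4 topology is a common choice for active speakers precisely because the two driver feeds recombine without a dip or bump at the crossover point.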

Audio Steganography Method Using Least Significant Bit (LSB) Encoding Technique

  • Alarood, Alaa Abdulsalm;Alghamdi, Ahmed Mohammed;Alzahrani, Ahmed Omar;Alzahrani, Abdulrahman;Alsolami, Eesa
    • International Journal of Computer Science & Network Security / v.22 no.7 / pp.427-442 / 2022
  • MP3 is one of the most widely used file formats for encoding and representing audio data. One reason for this popularity is its significant ability to reduce audio file sizes compared with other encoding techniques; other reasons include ease of implementation, availability, and good technical support. Steganography is the art of shielding communication between two parties from the eyes of attackers. In steganography, a secret message in the form of a copyright mark, concealed communication, or serial number can be embedded in an innocuous file (e.g., computer code, a video film, or an audio recording), making it impossible for the wrong party to access the hidden message during data exchange. This paper describes a new steganography algorithm for encoding secret messages in MP3 audio files using an improved least significant bit (LSB) technique with high embedding capacity. Test results show that the efficiency of this technique is higher than that of other LSB techniques.
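The basic LSB idea is shown below on raw PCM samples for clarity; the paper's improved technique operates on MP3 data and differs in detail, so this is a generic illustration rather than the authors' method.

```python
import numpy as np

def embed_lsb(samples, message):
    """Hide the bits of `message` in the least significant bits of samples."""
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    stego = samples.copy()
    stego[:len(bits)] = (stego[:len(bits)] & ~1) | bits
    return stego

def extract_lsb(stego, n_bytes):
    """Read `n_bytes` of hidden message back out of the LSBs."""
    bits = (stego[:n_bytes * 8] & 1).astype(np.uint8)
    return np.packbits(bits).tobytes()

cover = np.random.default_rng(1).integers(-2000, 2000, 4096).astype(np.int16)
stego = embed_lsb(cover, b"secret")
print(extract_lsb(stego, 6))  # b'secret'

# each sample moved by at most 1 quantization step, so the change is inaudible
diff = np.abs(stego.astype(np.int64) - cover.astype(np.int64))
print(int(diff.max()) <= 1)   # True
```

Capacity here is one bit per sample; improved LSB schemes like the paper's aim to raise that capacity while keeping the distortion similarly small.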