Search | Korea Science

Convolutional Neural Network based Audio Event Classification

Lim, Minkyu;Lee, Donghyun;Park, Hosung;Kang, Yoseb;Oh, Junseok;Park, Jeong-Sik;Jang, Gil-Jin;Kim, Ji-Hwan
- KSII Transactions on Internet and Information Systems (TIIS)
- /
- v.12 no.6
- /
- pp.2748-2760
- /
- 2018
This paper proposes an audio event classification method based on convolutional neural networks (CNNs). CNN has great advantages of distinguishing complex shapes of image. Proposed system uses the features of audio sound as an input image of CNN. Mel scale filter bank features are extracted from each frame, then the features are concatenated over 40 consecutive frames and as a result, the concatenated frames are regarded as an input image. The output layer of CNN generates probabilities of audio event (e.g. dogs bark, siren, forest). The event probabilities for all images in an audio segment are accumulated, then the audio event having the highest accumulated probability is determined to be the classification result. This proposed method classified thirty audio events with the accuracy of 81.5% for the UrbanSound8K, BBC Sound FX, DCASE2016, and FREESOUND dataset.
https://doi.org/10.3837/tiis.2018.06.017 인용 PDF KSCI

Optimization of the Number of Filter in CNN Noise Attenuator (CNN 잡음감쇠기에서 필터 수의 최적화)

Lee, Haeng-Woo
- The Journal of the Korea institute of electronic communication sciences
- /
- v.16 no.4
- /
- pp.625-632
- /
- 2021
This paper studies the effect of the number of filters in the CNN (Convolutional Neural Network) layer on the performance of a noise attenuator. Speech is estimated from a noised speech signal using a 64-neuron, 16-kernel CNN filter and an error back-propagation algorithm. In this study, in order to verify the performance of the noise attenuator with respect to the number of filters, a program using Keras library was written and simulation was performed. As a result of simulation, it can be seen that this system has the smallest MSE (Mean Squared Error) and MAE (Mean Absolute Error) values when the number of filters is 16, and the performance is the lowest when there are 4 filters. And when there are more than 8 filters, it was shown that the MSE and MAE values do not differ significantly depending on the number of filters. From these results, it can be seen that about 8 or more filters must be used to express the characteristics of the speech signal.
https://doi.org/10.13067/JKIECS.2021.16.4.625 인용 PDF KSCI

A New CSR-DCF Tracking Algorithm based on Faster RCNN Detection Model and CSRT Tracker for Drone Data

Farhodov, Xurshid;Kwon, Oh-Heum;Moon, Kwang-Seok;Kwon, Oh-Jun;Lee, Suk-Hwan;Kwon, Ki-Ryong
- Journal of Korea Multimedia Society
- /
- v.22 no.12
- /
- pp.1415-1429
- /
- 2019
Nowadays object tracking process becoming one of the most challenging task in Computer Vision filed. A CSR-DCF (channel spatial reliability-discriminative correlation filter) tracking algorithm have been proposed on recent tracking benchmark that could achieve stat-of-the-art performance where channel spatial reliability concepts to DCF tracking and provide a novel learning algorithm for its efficient and seamless integration in the filter update and the tracking process with only two simple standard features, HoGs and Color names. However, there are some cases where this method cannot track properly, like overlapping, occlusions, motion blur, changing appearance, environmental variations and so on. To overcome that kind of complications a new modified version of CSR-DCF algorithm has been proposed by integrating deep learning based object detection and CSRT tracker which implemented in OpenCV library. As an object detection model, according to the comparable result of object detection methods and by reason of high efficiency and celerity of Faster RCNN (Region-based Convolutional Neural Network) has been used, and combined with CSRT tracker, which demonstrated outstanding real-time detection and tracking performance. The results indicate that the trained object detection model integration with tracking algorithm gives better outcomes rather than using tracking algorithm or filter itself.
https://doi.org/10.9717/kmms.2019.22.12.1415 인용 PDF KSCI HTML

A Study on Atmospheric Data Anomaly Detection Algorithm based on Unsupervised Learning Using Adversarial Generative Neural Network (적대적 생성 신경망을 활용한 비지도 학습 기반의 대기 자료 이상 탐지 알고리즘 연구)

Yang, Ho-Jun;Lee, Seon-Woo;Lee, Mun-Hyung;Kim, Jong-Gu;Choi, Jung-Mu;Shin, Yu-mi;Lee, Seok-Chae;Kwon, Jang-Woo;Park, Ji-Hoon;Jung, Dong-Hee;Shin, Hye-Jung
- Journal of Convergence for Information Technology
- /
- v.12 no.4
- /
- pp.260-269
- /
- 2022
In this paper, We propose an anomaly detection model using deep neural network to automate the identification of outliers of the national air pollution measurement network data that is previously performed by experts. We generated training data by analyzing missing values and outliers of weather data provided by the Institute of Environmental Research and based on the BeatGAN model of the unsupervised learning method, we propose a new model by changing the kernel structure, adding the convolutional filter layer and the transposed convolutional filter layer to improve anomaly detection performance. In addition, by utilizing the generative features of the proposed model to implement and apply a retraining algorithm that generates new data and uses it for training, it was confirmed that the proposed model had the highest performance compared to the original BeatGAN models and other unsupervised learning model like Iforest and One Class SVM. Through this study, it was possible to suggest a method to improve the anomaly detection performance of proposed model while avoiding overfitting without additional cost in situations where training data are insufficient due to various factors such as sensor abnormalities and inspections in actual industrial sites.
https://doi.org/10.22156/CS4SMB.2022.12.04.260 인용 PDF KSCI

Tracking Players in Broadcast Sports

Sudeep, Kandregula Manikanta;Amarnath, Voddapally;Pamaar, Angoth Rahul;De, Kanjar;Saini, Rajkumar;Roy, Partha Pratim
- Journal of Multimedia Information System
- /
- v.5 no.4
- /
- pp.257-264
- /
- 2018
Over the years application of computer vision techniques in sports videos for analysis have garnered interest among researchers. Videos of sports games like basketball, football are available in plenty due to heavy popularity and coverage. The goal of the researchers is to extract information from sports videos for analytics which requires the tracking of the players. In this paper, we explore use of deep learning networks for player spotting and propose an algorithm for tracking using Kalman filters. We also propose an algorithm for finding distance covered by players. Experiments on sports video datasets have shown promising results when compared with standard techniques like mean shift filters.
https://doi.org/10.9717/JMIS.2018.5.4.257 인용 PDF KSCI HTML

A Study on Hazardous Sound Detection Robust to Background Sound and Noise (배경음 및 잡음에 강인한 위험 소리 탐지에 관한 연구)

Ha, Taemin;Kang, Sanghoon;Cho, Seongwon
- Journal of Korea Multimedia Society
- /
- v.24 no.12
- /
- pp.1606-1613
- /
- 2021
Recently various attempts to control hardware through integration of sensors and artificial intelligence have been made. This paper proposes a smart hazardous sound detection at home. Previous sound recognition methods have problems due to the processing of background sounds and the low recognition accuracy of high-frequency sounds. To get around these problems, a new MFCC(Mel-Frequency Cepstral Coefficient) algorithm using Wiener filter, modified filterbank is proposed. Experiments for comparing the performance of the proposed method and the original MFCC were conducted. For the classification of feature vectors extracted using the proposed MFCC, DNN(Deep Neural Network) was used. Experimental results showed the superiority of the modified MFCC in comparison to the conventional MFCC in terms of 1% higher training accuracy and 6.6% higher recognition rate.
https://doi.org/10.9717/kmms.2021.24.12.1606 인용 PDF KSCI HTML

Dynamic Filter Pruning for Compression of Deep Neural Network. (동적 필터 프루닝 기법을 이용한 심층 신경망 압축)

Cho, InCheon;Bae, SungHo
- Proceedings of the Korean Society of Broadcast Engineers Conference
- /
- 2020.07a
- /
- pp.675-679
- /
- 2020
최근 이미지 분류의 성능 향상을 위해 깊은 레이어와 넓은 채널을 가지는 모델들이 제안되어져 왔다. 높은 분류 정확도를 보이는 모델을 제안하는 것은 과한 컴퓨팅 파워와 계산시간을 요구한다. 본 논문에서는 이미지 분류 기법에서 사용되는 딥 뉴럴 네트워크 모델에 있어, 프루닝 방법을 통해 상대적으로 불필요한 가중치를 제거함과 동시에 분류 정확도 하락을 최소로 하는 동적 필터 프루닝 방법을 제시한다. 원샷 프루닝 기법, 정적 필터 프루닝 기법과 다르게 제거된 가중치에 대해서 소생 기회를 제공함으로써 더 좋은 성능을 보인다. 또한, 재학습이 필요하지 않기 때문에 빠른 계산 속도와 적은 컴퓨팅 파워를 보장한다. ResNet20 에서 CIFAR10 데이터셋에 대하여 실험한 결과 약 50%의 압축률에도 88.74%의 분류 정확도를 보였다.
PDF

A Study on Application Method of Contour Image Learning to improve the Accuracy of CNN by Data (데이터별 딥러닝 학습 모델의 정확도 향상을 위한 외곽선 특징 적용방안 연구)

Kwon, Yong-Soo;Hwang, Seung-Yeon;Shin, Dong-Jin;Kim, Jeong-Joon
- The Journal of the Institute of Internet, Broadcasting and Communication
- /
- v.22 no.4
- /
- pp.171-176
- /
- 2022
CNN is a type of deep learning and is a neural network used to process images or image data. The filter traverses the image and extracts features of the image to distinguish the image. Deep learning has the characteristic that the more data, the better models can be made, and CNN uses a method of artificially increasing the amount of data by means of data augmentation such as rotation, zoom, shift, and flip to compensate for the weakness of less data. When learning CNN, we would like to check whether outline image learning is helpful in improving performance compared to conventional data augmentation techniques.
https://doi.org/10.7236/JIIBC.2022.22.4.171 인용 PDF KSCI HTML

Voice-to-voice conversion using transformer network (Transformer 네트워크를 이용한 음성신호 변환)

Kim, June-Woo;Jung, Ho-Young
- Phonetics and Speech Sciences
- /
- v.12 no.3
- /
- pp.55-63
- /
- 2020
Voice conversion can be applied to various voice processing applications. It can also play an important role in data augmentation for speech recognition. The conventional method uses the architecture of voice conversion with speech synthesis, with Mel filter bank as the main parameter. Mel filter bank is well-suited for quick computation of neural networks but cannot be converted into a high-quality waveform without the aid of a vocoder. Further, it is not effective in terms of obtaining data for speech recognition. In this paper, we focus on performing voice-to-voice conversion using only the raw spectrum. We propose a deep learning model based on the transformer network, which quickly learns the voice conversion properties using an attention mechanism between source and target spectral components. The experiments were performed on TIDIGITS data, a series of numbers spoken by an English speaker. The conversion voices were evaluated for naturalness and similarity using mean opinion score (MOS) obtained from 30 participants. Our final results yielded 3.52±0.22 for naturalness and 3.89±0.19 for similarity.
https://doi.org/10.13064/KSSS.2020.12.3.055 인용 PDF KSCI

A Novel RGB Channel Assimilation for Hyperspectral Image Classification using 3D-Convolutional Neural Network with Bi-Long Short-Term Memory

M. Preethi;C. Velayutham;S. Arumugaperumal
- International Journal of Computer Science & Network Security
- /
- v.23 no.3
- /
- pp.177-186
- /
- 2023
Hyperspectral imaging technology is one of the most efficient and fast-growing technologies in recent years. Hyperspectral image (HSI) comprises contiguous spectral bands for every pixel that is used to detect the object with significant accuracy and details. HSI contains high dimensionality of spectral information which is not easy to classify every pixel. To confront the problem, we propose a novel RGB channel Assimilation for classification methods. The color features are extracted by using chromaticity computation. Additionally, this work discusses the classification of hyperspectral image based on Domain Transform Interpolated Convolution Filter (DTICF) and 3D-CNN with Bi-directional-Long Short Term Memory (Bi-LSTM). There are three steps for the proposed techniques: First, HSI data is converted to RGB images with spatial features. Before using the DTICF, the RGB images of HSI and patch of the input image from raw HSI are integrated. Afterward, the pair features of spectral and spatial are excerpted using DTICF from integrated HSI. Those obtained spatial and spectral features are finally given into the designed 3D-CNN with Bi-LSTM framework. In the second step, the excerpted color features are classified by 2D-CNN. The probabilistic classification map of 3D-CNN-Bi-LSTM, and 2D-CNN are fused. In the last step, additionally, Markov Random Field (MRF) is utilized for improving the fused probabilistic classification map efficiently. Based on the experimental results, two different hyperspectral images prove that novel RGB channel assimilation of DTICF-3D-CNN-Bi-LSTM approach is more important and provides good classification results compared to other classification approaches.
https://doi.org/10.22937/IJCSNS.2023.23.3.19 인용 PDF

Search Result 59, Processing Time 0.03 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)