Search | Korea Science

Comparison of Audio Event Detection Performance using DNN (DNN을 이용한 오디오 이벤트 검출 성능 비교)

Chung, Suk-Hwan;Chung, Yong-Joo
- The Journal of the Korea institute of electronic communication sciences / v.13 no.3 / pp.571-578 / 2018
Recently, deep learning techniques have shown superior performance in various kinds of pattern recognition. However, it has been debated whether a DNN outperforms conventional machine learning techniques when classification experiments use only a small amount of training data. In this study, we compared the conventional GMM and SVM with the DNN, a deep learning technique, in audio event detection. When tested on the same data, the DNN showed superior overall performance, but the SVM was better than the DNN in segment-based F-score.
https://doi.org/10.13067/JKIECS.2018.13.3.571
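
For readers unfamiliar with the metric named in this abstract, a segment-based F-score can be sketched roughly as below. This is a minimal illustration, not the authors' evaluation code: the per-segment set representation, the function name `segment_f_score`, and the toy labels are all assumptions made for the example.

```python
def segment_f_score(reference, predicted):
    """Micro-averaged F-score over fixed-length segments.

    reference, predicted: lists of sets; element i holds the event labels
    active in segment i (a hypothetical representation for this sketch).
    """
    if len(reference) != len(predicted):
        raise ValueError("both sequences must cover the same segments")
    tp = fp = fn = 0
    for ref, pred in zip(reference, predicted):
        tp += len(ref & pred)   # labels active in both reference and prediction
        fp += len(pred - ref)   # labels predicted but absent from the reference
        fn += len(ref - pred)   # reference labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (assumed): four segments, two event classes.
reference = [{"speech"}, {"speech", "door"}, set(), {"door"}]
predicted = [{"speech"}, {"speech"}, {"door"}, {"door"}]
score = segment_f_score(reference, predicted)
```

Because counts are pooled over all segments before the ratio is taken, one long event that is detected correctly contributes many true positives, which is one reason segment-based and event-based scores can rank systems differently.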

Automatic Generation of Video Metadata for the Super-personalized Recommendation of Media

Yong, Sung Jung;Park, Hyo Gyeong;You, Yeon Hwi;Moon, Il-Young
- Journal of information and communication convergence engineering / v.20 no.4 / pp.288-294 / 2022
The media content market has been growing, as various types of content are being mass-produced owing to the recent proliferation of the Internet and digital media. In addition, platforms that provide personalized services for content consumption are emerging and competing with each other to recommend personalized content. Existing platforms rely on users entering video metadata directly. Consequently, significant amounts of time and cost are consumed in processing large amounts of data. In this study, keyframes and audio spectra based on the YCbCr color model of a movie trailer were extracted for the automatic generation of metadata. The extracted audio spectra and image keyframes were used as training data for genre recognition with deep learning. Deep learning was used to determine the genre among the video metadata, and suggestions for its utilization were proposed. A system that automatically generates metadata, built on the results of this study, will be helpful for studying recommendation systems for media super-personalization.
https://doi.org/10.56977/jicce.2022.20.4.288

Convolutional Neural Network based Audio Event Classification

Lim, Minkyu;Lee, Donghyun;Park, Hosung;Kang, Yoseb;Oh, Junseok;Park, Jeong-Sik;Jang, Gil-Jin;Kim, Ji-Hwan
- KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.6 / pp.2748-2760 / 2018
This paper proposes an audio event classification method based on convolutional neural networks (CNNs). CNNs have a great advantage in distinguishing complex shapes in images. The proposed system uses audio features as the input image of a CNN. Mel-scale filter bank features are extracted from each frame and concatenated over 40 consecutive frames; the concatenated frames are then treated as an input image. The output layer of the CNN generates probabilities of audio events (e.g., dog bark, siren, forest). The event probabilities for all images in an audio segment are accumulated, and the audio event with the highest accumulated probability is chosen as the classification result. The proposed method classified thirty audio events with an accuracy of 81.5% on the UrbanSound8K, BBC Sound FX, DCASE2016, and FREESOUND datasets.
https://doi.org/10.3837/tiis.2018.06.017
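
The accumulation step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `classify_segment`, the three labels, and the probability values are invented for the example, and the CNN that would produce the per-image probabilities is not modeled.

```python
def classify_segment(window_probs, labels):
    """Sum per-image class probabilities over a segment and pick the top class.

    window_probs: one probability list per input image (one image per block
    of concatenated frames); values here are assumed, not CNN outputs.
    """
    if not window_probs:
        raise ValueError("segment contains no input images")
    totals = [0.0] * len(labels)
    for probs in window_probs:
        for i, p in enumerate(probs):
            totals[i] += p  # accumulate evidence for class i
    best = max(range(len(labels)), key=totals.__getitem__)
    return labels[best], totals


labels = ["dog_bark", "siren", "forest"]
window_probs = [
    [0.6, 0.3, 0.1],  # first image alone would favor dog_bark
    [0.2, 0.5, 0.3],
    [0.3, 0.6, 0.1],
]
label, totals = classify_segment(window_probs, labels)
```

In this toy segment the first image favors `dog_bark`, but the accumulated totals favor `siren`, which illustrates how summing over the segment can smooth out a single misleading image.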

Analysis of deep learning-based deep clustering method (딥러닝 기반의 딥 클러스터링 방법에 대한 분석)

Hyun Kwon;Jun Lee
- Convergence Security Journal / v.23 no.4 / pp.61-70 / 2023
Clustering is an unsupervised learning method that groups data based on features such as distance metrics, using data without known labels or ground truth values. This method has the advantage of being applicable to various types of data, including images, text, and audio, without the need for labeling. Traditional clustering techniques apply dimensionality reduction methods or extract specific features before clustering. However, with the advancement of deep learning models, research has emerged on deep clustering techniques that use models such as autoencoders and generative adversarial networks to represent input data as latent vectors. In this study, we propose a deep clustering technique based on deep learning. In this approach, we use an autoencoder to transform the input data into latent vectors, then construct a vector space according to the cluster structure and perform k-means clustering. We conducted experiments on the MNIST and Fashion-MNIST datasets, using the PyTorch machine learning library as the experimental environment. The model used is a convolutional neural network-based autoencoder. The experimental results show an accuracy of 89.42% for MNIST and 56.64% for Fashion-MNIST when k is set to 10.
https://doi.org/10.33778/kcsa.2023.23.4.061
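
The final clustering stage described in this abstract, k-means applied to latent vectors, can be sketched as below. This is a minimal illustration, not the paper's implementation: the autoencoder is omitted, the 2-D `latent` points are made-up stand-ins for encoder outputs, and initializing centroids from the first k points is a simplification chosen only to keep the example deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm with centroids seeded from the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical latent vectors standing in for autoencoder outputs.
latent = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
assignment, centroids = kmeans(latent, k=2)
```

With real data, k would be set to the number of classes (10 in the abstract), and cluster indices would still need to be matched to class labels before an accuracy figure could be computed.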

Deep Learning-Based User Emergency Event Detection Algorithms Fusing Vision, Audio, Activity and Dust Sensors (영상, 음성, 활동, 먼지 센서를 융합한 딥러닝 기반 사용자 이상 징후 탐지 알고리즘)

Jung, Ju-ho;Lee, Do-hyun;Kim, Seong-su;Ahn, Jun-ho
- Journal of Internet Computing and Services / v.21 no.5 / pp.109-118 / 2020
Recently, people have been spending a lot of time inside their homes because of various diseases. For a person living alone who is injured at home or infected with a disease, it is difficult to ask others for help. In this study, an algorithm is proposed to detect emergency events, that is, situations in which a person in a single-person household needs help from others at home, such as an injury or a disease infection. The study proposes vision pattern detection algorithms using home CCTVs, audio pattern detection algorithms using artificial intelligence speakers, activity pattern detection algorithms using acceleration sensors in smartphones, and dust pattern detection algorithms using air purifiers. For cases where home CCTVs are difficult to use because of security issues, it also proposes a fusion method combining the audio, activity, and dust pattern sensors. For each algorithm, data were collected from YouTube and through experiments to measure accuracy.
https://doi.org/10.7472/jksii.2020.21.5.109

Generating Audio Adversarial Examples Using a Query-Efficient Decision-Based Attack (질의 효율적인 의사 결정 공격을 통한 오디오 적대적 예제 생성 연구)

Seo, Seong-gwan;Mun, Hyunjun;Son, Baehoon;Yun, Joobeom
- Journal of the Korea Institute of Information Security & Cryptology / v.32 no.1 / pp.89-98 / 2022
As deep learning technology has been applied to various fields, adversarial attack techniques, a security problem of deep learning models, have been actively studied. Adversarial attacks have been studied mainly in the image domain. Recently, fully decision-based attack techniques have even been developed that can attack using only the classification results of the model. In the audio domain, however, research has progressed relatively slowly. In this paper, we applied several decision-based attack techniques to the audio domain and improved on state-of-the-art attack techniques. State-of-the-art decision-based attack techniques have the disadvantage of requiring many queries for gradient approximation. We improve query efficiency by proposing a method that reduces the vector search space required for gradient approximation. Experimental results showed that the attack success rate increased by 50% and the difference between the original audio and the adversarial examples was reduced by 75%, demonstrating that our method can generate adversarial examples with smaller noise.
https://doi.org/10.13089/JKIISC.2022.32.1.89

Deep Learning based Singing Voice Synthesis Modeling (딥러닝 기반 가창 음성합성(Singing Voice Synthesis) 모델링)

Kim, Minae;Kim, Somin;Park, Jihyun;Heo, Gabin;Choi, Yunjeong
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.127-130 / 2022
This paper studies singing voice synthesis modeling using a generator loss function. It analyzes various factors that may arise when BEGAN, a deep learning algorithm optimized for image generation, is applied to the audio domain, and reports experiments conducted to derive optimal quality. We focus on the problem that the L1 loss proposed in BEGAN-based models weakens the role of the hyperparameter gamma (γ), which was defined to control the diversity and quality of generated audio samples. Experiments show that the proposed method, together with finding optimal values through tuning, can contribute to improving the quality of the synthesized singing voice.

A Sparse Target Matrix Generation Based Unsupervised Feature Learning Algorithm for Image Classification

Zhao, Dan;Guo, Baolong;Yan, Yunyi
- KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.6 / pp.2806-2825 / 2018
Unsupervised learning has shown good performance on image, video and audio classification tasks, and much progress has been made so far. It studies how systems can learn to represent particular input patterns in a way that reflects the statistical structure of the overall collection of input patterns. Many promising deep learning systems are commonly trained in a greedy layer-wise unsupervised manner. The performance of these deep learning architectures benefits from the ability of unsupervised learning to disentangle abstractions and pick out useful features. However, existing unsupervised learning algorithms are often difficult to train, partly because they require many hyperparameters. Tuning these hyperparameters is a laborious task that requires expert knowledge, rules of thumb or extensive search. In this paper, we propose a simple and effective unsupervised feature learning algorithm for image classification, which explicitly optimizes for population and lifetime sparsity. First, a sparse target matrix is built by competitive rules. Then, the sparse features are optimized by minimizing the Euclidean norm ($L_2$) error between the sparse target and the competitive layer outputs. Finally, a classifier is trained using the obtained sparse features. Experimental results show that the proposed method achieves good performance for image classification and provides discriminative features that generalize well.
https://doi.org/10.3837/tiis.2018.06.020

Deep Learning based Raw Audio Signal Bandwidth Extension System (딥러닝 기반 음향 신호 대역 확장 시스템)

Kim, Yun-Su;Seok, Jong-Won
- Journal of IKEEE / v.24 no.4 / pp.1122-1128 / 2020
Bandwidth extension refers to restoring and expanding a narrowband (NB) signal that has been damaged in the encoding and decoding process, owing to a lack of channel capacity or the characteristics of the codec installed in the mobile communication device, and converting it to a wideband (WB) signal. Bandwidth extension research has mainly focused on voice signals; methods such as SBR (Spectral Band Replication) and IGF (Intelligent Gap Filling) operate on the high bands in the frequency domain and restore missing or damaged high bands through complex feature extraction processes. In this paper, we propose an autoencoder-based deep learning model that outputs a bandwidth-extended signal. Using residual connections in one-dimensional convolutional neural networks (CNNs), it extends the bandwidth from a time-domain input signal of a fixed length without complicated pre-processing. In addition, we confirmed that the damaged high band can be restored even when the model is trained on a dataset containing various types of sound sources, including music, rather than speech alone.
https://doi.org/10.7471/ikeee.2020.24.4.1122

Noise Canceler Based on Deep Learning Using Discrete Wavelet Transform (이산 Wavelet 변환을 이용한 딥러닝 기반 잡음제거기)

Haeng-Woo Lee
- The Journal of the Korea institute of electronic communication sciences / v.18 no.6 / pp.1103-1108 / 2023
In this paper, we propose a new algorithm for attenuating background noise in acoustic signals. The algorithm improves noise attenuation performance by using a fully connected neural network (FNN) deep learning algorithm after the wavelet transform, instead of the existing adaptive filter. After the input signal is wavelet-transformed for each short-time period, noise is removed from a single noisy input audio signal by a 1024-1024-512-neuron FNN deep learning model. The transform maps the time-domain voice signal into the time-frequency domain so that the noise characteristics are well expressed, and the model effectively predicts voice in a noisy environment through supervised learning that uses the transform parameters of the clean voice signal as targets for the transform parameters of the input. To verify the performance of the proposed noise reduction system, a simulation program was written using the Tensorflow and Keras libraries and a simulation was performed. In the experiment, the proposed deep learning algorithm improved the mean square error (MSE) by 30% compared with the existing adaptive filter and by 20% compared with using the STFT (Short-Time Fourier Transform).
https://doi.org/10.13067/JKIECS.2023.18.6.1103
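
As a rough illustration of the two building blocks named in this abstract, the sketch below applies a one-level wavelet transform to a short frame and computes an MSE. It is not the paper's system: the abstract does not say which wavelet is used, so the Haar wavelet here is an assumption, the FNN is omitted entirely, and the `clean` and `noisy` frames are made-up values.

```python
import math


def haar_dwt(frame):
    """One-level orthonormal Haar DWT of an even-length frame.

    Returns (approximation, detail) coefficient lists. The Haar wavelet is
    an assumption for this sketch; the abstract does not name a wavelet.
    """
    if len(frame) % 2:
        raise ValueError("frame length must be even")
    s = math.sqrt(2.0)
    approx = [(frame[i] + frame[i + 1]) / s for i in range(0, len(frame), 2)]
    detail = [(frame[i] - frame[i + 1]) / s for i in range(0, len(frame), 2)]
    return approx, detail


def mse(estimate, target):
    """Mean square error between two equal-length sequences."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


# Toy short-time frame (assumed values).
clean = [1.0, 1.0, 2.0, 2.0]
noisy = [1.2, 0.8, 2.1, 1.9]
approx, detail = haar_dwt(noisy)
error = mse(noisy, clean)
```

In a pipeline like the one described, coefficients such as `approx` and `detail` would be the network's inputs, and an MSE of this kind, computed against the clean signal, would be the figure compared across the adaptive-filter, STFT, and wavelet variants.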
