Search | Korea Science

Audio genre classification using deep learning (딥 러닝을 이용한 오디오 장르 분류)

Shin, Seong-Hyeon;Jang, Woo-Jin;Yun, Ho-won;Park, Ho-Chong
- Proceedings of the Korean Society of Broadcast Engineers Conference
- 2016.06a
- pp.80-81
- 2016
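This paper proposes an audio genre classification technique using deep learning. Genres are defined as three classes: music, speech, and effect. Conventional GMM-based genre classification achieves lower recognition rates for music and effect than for speech, so the recognition rate differs from genre to genre. To address this problem, the paper uses deep learning to carry out finer-grained training through a high level of abstraction. The proposed method learns even subtle differences in features, which narrows the gap in recognition rates between genres, and it achieves a higher recognition rate for each genre than GMM-based audio genre classification.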

Centroid-model based music similarity with alpha divergence (알파 다이버전스를 이용한 무게중심 모델 기반 음악 유사도)

Seo, Jin Soo;Kim, Jeonghyun;Park, Jihyun
- The Journal of the Acoustical Society of Korea
- v.35 no.2
- pp.83-91
- 2016
Music-similarity computation is crucial in developing music information retrieval systems for browsing and classification. This paper overviews the recently-proposed centroid-model based music retrieval method and applies the distributional similarity measures to the model for retrieval-performance evaluation. Probabilistic distance measures (also called divergence) compute the distance between two probability distributions in a certain sense. In this paper, we consider the alpha divergence in computing distance between two centroid models for music retrieval. The alpha divergence includes the widely-used Kullback-Leibler divergence and Bhattacharyya distance depending on the values of alpha. Experiments were conducted on both genre and singer datasets. We compare the music-retrieval performance of the distributional similarity with that of the vector distances. The experimental results show that the alpha divergence improves the performance of the centroid-model based music retrieval.
https://doi.org/10.7776/ASK.2016.35.2.083
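
The abstract above notes that a single alpha-parameterized divergence covers both the Kullback-Leibler divergence and the Bhattacharyya distance. A minimal sketch of that relationship for discrete distributions follows. It assumes the Rényi form of the divergence; the abstract does not say which parameterization the paper uses, and the paper applies the measure to centroid models rather than to the toy histograms `p` and `q` below, which are made up for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance: -ln of the Bhattacharyya coefficient."""
    return -math.log(sum(math.sqrt(pi * qi) for pi, qi in zip(p, q)))


def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha (assumed form, not taken from the paper).

    Assumes p and q are the same length, sum to 1, and q is strictly positive.
    At alpha == 1 the defining limit, the KL divergence, is returned.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if abs(alpha - 1.0) < 1e-12:
        return kl_divergence(p, q)
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return math.log(s) / (alpha - 1.0)


# Hypothetical histograms, for illustration only.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# alpha = 0.5 gives twice the Bhattacharyya distance.
print(renyi_divergence(p, q, 0.5), 2 * bhattacharyya_distance(p, q))
# alpha close to 1 approaches the KL divergence.
print(renyi_divergence(p, q, 0.9999), kl_divergence(p, q))
```

Under this parameterization, `alpha = 0.5` yields exactly twice the Bhattacharyya distance and `alpha` approaching 1 yields the Kullback-Leibler divergence, so sweeping `alpha` moves between the two measures named in the abstract.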

Detecting Prominent Content in Unstructured Audio using Intensity-based Attack/release Patterns (발생/소멸 패턴을 이용한 비정형 혼합 오디오의 주성분 검출)

Kim, Samuel
- Journal of the Institute of Electronics and Information Engineers
- v.50 no.12
- pp.224-231
- 2013
Defining prominent audio content as the most informative audio content, from the users' perspective, within a given unstructured audio segment, we propose simple but robust intensity-based attack/release pattern features to detect it. We also propose a web-based annotation procedure to collect users' subjective perception, and we annotate 18 hours of video clips across various genres, such as cartoon, movie, and news. Experiments with a linear classification method whose models are trained for speech, music, and sound effect demonstrate promising results that nonetheless vary across program genres (e.g., 86.7% weighted accuracy for speech-oriented talk shows and 49.3% weighted accuracy for action movies).
https://doi.org/10.5573/ieek.2013.50.12.224

Music Genre Classification based on Deep Neural Network using Spikegram (스파이크그램을 이용한 심층 신경망 기반의 음악 장르 분류)

Yun, Ho-Won;Jang, Woo-Jin;Shin, Seong-Hyeon;Jang, Won;Cho, Hyo-Jin;Park, Ho-Chong
- Proceedings of the Korean Society of Broadcast Engineers Conference
- 2017.06a
- pp.29-30
- 2017
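This paper proposes a deep neural network based music genre classification technique that uses the spikegram, a representation that models the human auditory system. The classification targets are the 10 genres of the GTZAN dataset. The paper obtains the spikegram with a method that models how the auditory system perceives sound, and it proposes a method for extracting new feature vectors from the spikegram. The proposed method yields feature vectors suited to deep neural networks, and training the network on these feature vectors achieves higher performance than various previously used methods.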

The Content-based Genre Classification using Representative Part of Music (음악의 대표구간을 이용한 내용기반 장르 판별에 관한 연구)

Lee, Jong-In;Kim, Byeong-Man
- Proceedings of the Korean Institute of Intelligent Systems Conference
- 2008.04a
- pp.211-214
- 2008
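Some earlier studies on music genre classification had a person manually designate the main segment of a piece when selecting the segment for feature extraction. This approach classifies well, but the manual effort makes it difficult to keep applying it to newly registered music. For this reason, recent studies on music genre classification select the extraction segment automatically, but most of them extract features from a fixed segment (e.g., the 30-second segment that follows the first 30 seconds), which lowers classification accuracy. To solve this problem, this paper proposes identifying the repeated segments over the entire piece, selecting from them a single representative segment that can stand for the music, and extracting features from that representative segment for use in a genre classification system. In experiments, the method achieved a remarkable performance improvement over the existing fixed-segment method.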

Automatic Genre Classification using Music Harmonic Detection (화성정보 추출을 이용한 음악 장르분류)

Son Woo-Ram;Jung Min-Seok;An Joo-Young;Yoon Kyoung-Ro
- Proceedings of the Korean Information Science Society Conference
- 2006.06b
- pp.280-282
- 2006
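With larger storage media and the growth of digital music distribution over the Internet, the amount of music owned by individuals is increasing rapidly. For users who hold such large collections, various search and classification methods are being developed and used to improve convenience. This paper presents a method that analyzes waveform data based on digital signal processing theory, so that it can be applied independently of the encoding format, directory structure, file names, text tags, and similar metadata, and that applies pattern matching based on harmony theory to classify music by genre and, further, by mood.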

Extraction of Temporal and Spectral Features based on Spikegram for Music Genre Classification (음악 장르 분류를 위한 스파이크그램 기반의 시간 및 주파수 특성 추출 기술)

Jang, Won;Cho, Hyo-Jin;Shin, Seong-Hyeon;Park, Hochong
- Proceedings of the Korean Society of Broadcast Engineers Conference
- 2018.06a
- pp.49-50
- 2018
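This paper proposes a technique for extracting time- and frequency-based spikegram features for music genre classification. Existing music genre classification systems have mainly used Fourier-transform-based input features. The Fourier transform takes averaged frequency information per frame along the time axis and therefore has low time resolution, whereas the spikegram holds frequency information at the sample level, so high-resolution features can be extracted from it. The proposed technique extracts such time-based features and uses them, together with frequency-based features and SNR features, as the input to a deep neural network. The proposed features improve the performance of an existing spikegram-feature-based classifier that did not use time-based features, and they achieve good performance with fewer input features than other features and classifiers.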

Analysis of Correlation between Real-time Sales Ranking and Information Provided by Mobile Movie Platform: Focus on Non-descriptive Information in Google Play Store's Best-selling Movies

Nam, Sangzo
- Journal of Advanced Information Technology and Convergence
- v.9 no.2
- pp.41-54
- 2019
The cinema circuit is facing a digital, networked, and mobile age, which expands non-theater access to movies. Application platforms are positioned as the most competitive business model for providing digital content such as games, music, books, and movies. Consumers can acquire content-related information not just offline, but online as well. Therefore, item information provided by application platforms is required. This information consists of richly descriptive information, such as storyline summaries, consumer reviews, and related articles, and non-descriptive normative information, such as sales ranking, release date, genre, rental or purchase cost, domestic/foreign classification, consumer rating, number of consumer ratings, and film rating. In this study, we surveyed and statistically analyzed the correlation between real-time sales ranking and other comparable non-descriptive information.
https://doi.org/10.14801/JAITC.2019.9.2.41
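
The kind of analysis described in the abstract above, correlating a sales ranking with another non-descriptive attribute, can be sketched with a rank correlation. The abstract does not name the statistic used in the study, so Spearman's rho is an assumption here, and the numbers below are invented purely for illustration.

```python
def ranks(values):
    """Return 1-based ranks (1 = smallest value); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for two equal-length sequences without ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical data: sales rank (1 = best seller) and number of consumer ratings.
sales_rank = [1, 2, 3, 4, 5]
num_ratings = [900, 700, 750, 300, 100]

# A negative rho here means better-ranked titles tend to have more ratings.
print(spearman_rho(sales_rank, num_ratings))
```

A rank-based statistic is a natural fit because a sales ranking is ordinal, but a study could equally use a different measure; the choice here is only illustrative.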

Image Quality for TV Genre Depending on Viewers Experience (시청자 경험에 의한 TV장르별 화질)

Park, YungKyung
- Journal of Broadcast Engineering
- v.26 no.3
- pp.308-320
- 2021
Conventional image quality studies have focused on 'naturalness' and have relied on memory color. Memory colors are mainly formed for familiar objects with prior experience, and the more faithfully these memories are reflected, the more natural the reproduced image appears. In particular, the brightness and saturation of memory colors play an important role in increasing the preference for image quality as well as its naturalness. Therefore, existing image quality studies examined image quality characteristics by focusing on natural objects and people for which memory colors exist. We extracted representative images of each genre (sports, documentaries, news, entertainment and music, and movies), adjusted the brightness, contrast, and saturation of each image, and conducted an experiment to evaluate perceived quality. Based on situational context, the results indicated that genres of television content can be divided into two categories: proximate and indirect experiences. Proximate experience best characterizes outdoor sports, dramas, and nature documentaries, whose image quality was shown to correlate strongly with brightness and contrast. On the other hand, indirect experience best characterizes news, music shows, and SF/action movies. The image quality perception for indirect experiences was shown to be closely related to and optimized by contrast and saturation.
https://doi.org/10.5909/JBE.2021.26.3.308

Representative Melodies Retrieval using Waveform and FFT Analysis of Audio (오디오의 파형과 FFT 분석을 이용한 대표 선율 검색)

Chung, Myoung-Bum;Ko, Il-Ju
- Journal of KIISE:Software and Applications
- v.34 no.12
- pp.1037-1044
- 2007
Recent content-based music retrieval systems extract a representative melody from each piece of music and use it as an index to reduce search time. Existing studies have used MIDI data to extract a representative melody, which has the weakness that only MIDI data can be handled. Therefore, this paper proposes a representative melody retrieval method that uses digital signal processing and can be applied to any audio file format. First, we use the Fast Fourier Transform (FFT) to find the tempo and the nodes for representative melody retrieval. We then measure how often high values appear in the PCM data of each node. The point where high values are most concentrated is the starting point of the representative melody, and the eight nodes from the starting point form the representative melody section of the audio data. To verify the performance of the method, we chose a thousand songs and ran an experiment to extract a representative melody from each. As a result, the accuracy of the extracted representative melodies was 79.5% among the 737 songs for which a tempo was found.

Search Result 53, Processing Time 0.023 seconds
