Search | Korea Science

Image Mood Classification Using Deep CNN and Its Application to Automatic Video Generation (심층 CNN을 활용한 영상 분위기 분류 및 이를 활용한 동영상 자동 생성)

Cho, Dong-Hee;Nam, Yong-Wook;Lee, Hyun-Chang;Kim, Yong-Hyuk
- Journal of the Korea Convergence Society / v.10 no.9 / pp.23-29 / 2019
In this paper, image mood is classified into eight categories with a deep convolutional neural network, and a video is generated automatically with suitable background music. A classification model based on a multilayer perceptron (MLP) is trained on the collected image data. The MLP performs multi-class classification to predict the mood of the images used for the video, and the video is generated by matching them with pre-classified music. The model achieved 72.4% accuracy in 10-fold cross-validation and 64% accuracy, measured from the confusion matrix, in experiments on actual images. Misclassified cases were assigned to a similar mood, and it was confirmed that the music in the resulting video did not greatly mismatch the images.
https://doi.org/10.15207/JKCS.2019.10.9.023

Humming based High Quality Music Creation (허밍을 이용한 고품질 음악 생성)

Lee, Yoonjae;Kim, Sunmin
- Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2014.10a / pp.146-149 / 2014
In this paper, a humming-based automatic music creation method is described. Composing music is generally difficult for people without knowledge of music theory. However, almost anyone can produce a main melody by humming. With this motivation, melody and chord sequences are estimated by analyzing the humming. The humming is produced without a metronome. Then, based on the estimated chord sequence, accompaniment is generated using a MIDI template matched to each chord. Five genres are supported. Melody transcription is evaluated in terms of onset and pitch estimation accuracy, and a mean opinion score (MOS) evaluation is used for the created music.

Development of Algorithms for Correcting and Mapping High-Resolution Side Scan Sonar Imagery (고해상도 사이드 스캔 소나 영상의 보정 및 매핑 알고리즘의 개발)

이동진;박요섭;김학일
- Korean Journal of Remote Sensing / v.17 no.1 / pp.45-56 / 2001
To acquire seabed information, mosaic images of the seabed were generated using side scan sonar. A short-time energy function, needed for slant range correction, is proposed to obtain the height of the tow-fish from the reflected acoustic amplitudes of each ping, which yields a mosaic image without the water column. When generating the mosaic image, the maximum, last, and average values are used as the measure of a pixel, and 3-D information is preserved by using acoustic amplitudes heading in a specific direction. The mosaic is generated in two stages: a low-resolution mosaic (over 1 m/pixel) is first generated for the whole survey area, and then a high-resolution mosaic (under 0.1 m/pixel) is generated for the selected area. Rocks, ripple marks, sand waves, tidal flats, and artificial fish reefs are found in the mosaic image.
https://doi.org/10.7780/kjrs.2001.17.1.45
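
The slant range correction mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a flat seabed, a simple threshold on short-time energy to find the first bottom return, and uniformly spaced samples. All function names, the window size, and the threshold are hypothetical choices made for illustration.

```python
import math

def short_time_energy(samples, window):
    """Sum of squared amplitudes over each sliding window of a ping."""
    return [
        sum(s * s for s in samples[i:i + window])
        for i in range(len(samples) - window + 1)
    ]

def first_bottom_return(samples, window, threshold):
    """Index of the sample where windowed energy first exceeds the threshold.

    Returns None when no window exceeds the threshold (assumed rule, not
    taken from the paper).
    """
    for i, energy in enumerate(short_time_energy(samples, window)):
        if energy > threshold:
            return i + window - 1  # last sample of the triggering window
    return None

def slant_to_ground_range(slant_range, altitude):
    """Flat-seabed geometry: ground = sqrt(slant^2 - altitude^2)."""
    if slant_range < altitude:
        return None  # sample lies in the water column
    return math.sqrt(slant_range ** 2 - altitude ** 2)

def correct_ping(samples, sample_spacing_m, window=2, threshold=1.0):
    """Drop the water column and map each remaining sample to ground range."""
    idx = first_bottom_return(samples, window, threshold)
    if idx is None:
        return []
    altitude = idx * sample_spacing_m  # estimated tow-fish height
    corrected = []
    for i in range(idx, len(samples)):
        ground = slant_to_ground_range(i * sample_spacing_m, altitude)
        corrected.append((ground, samples[i]))
    return corrected
```

For example, `correct_ping([0.0] * 6 + [2.0] * 10, 0.5)` treats the first six quiet samples as water column, estimates a tow-fish height of 3.0 m, and returns the remaining samples paired with their ground ranges.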

A Study on Enhanced XR Software Platform based on Superintelligence (한류문화 전수를 위한 수퍼인텔리전스 기반 확장현실 소프트웨어 플랫폼 설계)

Ji, Sumi;Kwak, Jeonghoon;Sung, Yunsick
- Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.43-44 / 2020
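With the recent spread of K-pop culture, the Korean Wave (Hallyu) has been established as a brand, and as interest in it surges, innovation is required in the related content market. This paper concerns a superintelligence-based extended reality (XR) software platform for passing on Hallyu culture, through which Hallyu culture can be experienced and passed on. Specifically, building on an XR-based space for passing on Hallyu content, it designs and proposes an integrated platform environment that includes deep-learning-based video generation and motion analysis technology, automatic music generation technology, and Hallyu cultural data security technology. It also proposes a method to improve the platform's 3D motion classification and prediction.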
https://doi.org/10.3745/PKIPS.y2020m11a.43

Salient Region Detection Algorithm for Music Video Browsing (뮤직비디오 브라우징을 위한 중요 구간 검출 알고리즘)

Kim, Hyoung-Gook;Shin, Dong
- The Journal of the Acoustical Society of Korea / v.28 no.2 / pp.112-118 / 2009
This paper proposes a rapid salient-region detection algorithm for a music video browsing system that can be applied to mobile devices and digital video recorders (DVRs). The input music video is decomposed into music and video tracks. For the music track, the music highlight, including the chorus, is detected by structure analysis using energy-based peak position detection. Using emotional models generated by the SVM-AdaBoost learning algorithm, the music signal of each music video is automatically classified into one of the predefined emotional classes. For the video track, face scenes showing the singer or an actor/actress are detected using a boosted cascade of simple features. Finally, the salient region is generated by aligning the boundaries of the music highlight and the visual face scene. Users first select their favorite music videos on the mobile device or DVR, guided by each video's emotion information, and can then quickly browse the 30-second salient region produced by the proposed algorithm. A mean opinion score (MOS) test on a database of 200 music videos compares the detected salient region with a predefined manual part. The results show that the salient region detected by the proposed method performed much better than the predefined manual part without audiovisual processing.
https://doi.org/10.7776/ASK.2009.28.2.112

CyberSound: Sound in Virtual Worlds (사이버음향(CyberSound) - 가상세계의 음향)

김형교;이의택
- Broadcasting and Media Magazine / v.2 no.3 / pp.23-31 / 1997
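Research on computer sound generation has long been carried out in fields such as computer music, human-computer interaction, and data sonification, but it has recently become an even more important problem for providing more three-dimensional and realistic virtual environments, together with visual effects, in computer animation and virtual worlds. So far, much research has addressed the component technologies of sound itself, such as sound modeling and synthesis. However, research on technology that can process sound in an integrated way with the motion in the image, which is essential in fields such as computer animation and virtual worlds where motions or events in the image are closely tied to sound, remains at a rudimentary stage. Recently, the concept of 3D sound has been introduced to strengthen the spatial impression and sense of presence of sound, and research on its implementation is actively under way. This article introduces, under the concept of CyberSound, the principles of automatically generating sound synchronized with object movements or events in the image in computer animation and virtual reality, and of producing its 3D sound effects, and describes its outlook.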

Development of Automative Loudness Control Technique based on Audio Contents Analysis using Deep Learning (딥러닝을 이용한 오디오 콘텐츠 분석 기반의 자동 음량 제어 기술 개발)

Lee, Young Han;Cho, Choongsang;Kim, Je Woo
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2018.11a / pp.42-43 / 2018
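Since the 2016 revision of the Broadcasting Act, digital broadcast programs in Korea have been delivered with loudness matched across channels and programs using the measurement method proposed by ITU-R / EBU. In general, except for fields such as news and live relay where loudness must be adjusted in real time, programs are transmitted with their average loudness adjusted to the regulation. This paper proposes a technique for improving intelligibility at low loudness, a problem that arises when average loudness is matched uniformly. That is, as one of the techniques for controlling broadcast loudness, it proposes analyzing the audio content and varying the amount of loudness adjustment per segment, so that speech at low loudness is given relatively high loudness and background music and the like are given relatively low loudness, thereby improving intelligibility. To verify the performance of the proposed method, audio content analysis accuracy was measured and audio waveforms were analyzed, confirming that, compared with existing loudness control techniques, the method amplifies loudness in speech segments.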

A study on customized ART convergence service for non-face-to-face art education service (비대면 미술교육 서비스를 위한 맞춤형 ART 융합 서비스 연구)

Kim, Hyeong-Gyun
- Journal of Digital Convergence / v.20 no.5 / pp.275-280 / 2022
This paper proposes a technology that recommends convergence/composite art playlist content, combined with tags suited to the user's situational information and taste, for non-face-to-face art appreciation education. To implement the proposed technology, the characteristics of artworks are analyzed, and related music and artworks are matched based on the tags of the analyzed works. In addition, a technology is proposed that automatically creates content for convergence/composite art viewing playlists from the matched works.
https://doi.org/10.14400/JDC.2022.20.5.275

SAAnnot-C3Pap: Ground Truth Collection Technique of Playing Posture Using Semi Automatic Annotation Method (SAAnnot-C3Pap: 반자동 주석화 방법을 적용한 연주 자세의 그라운드 트루스 수집 기법)

Park, So-Hyun;Kim, Seo-Yeon;Park, Young-Ho
- KIPS Transactions on Software and Data Engineering / v.11 no.10 / pp.409-418 / 2022
In this paper, we propose SAAnnot-C3Pap, a semi-automatic annotation method for obtaining ground truth of a player's posture. Existing work in the music domain obtained ground truth for two-dimensional joint positions either by using OpenPose, a two-dimensional posture estimation method, or by manual labeling. However, automatic annotation methods such as OpenPose are fast but produce inaccurate results. Therefore, this paper proposes SAAnnot-C3Pap, a semi-automatic annotation method that is a compromise between the two. The proposed approach consists of three main steps: extracting postures using OpenPose, correcting the erroneous parts among the extracted parts using Supervisely, and then analyzing the results of OpenPose and Supervisely and performing a synchronization process. With the proposed method, it was possible to correct the incorrect 2D joint position detections produced by OpenPose, solve the problem of detecting two or more people, and obtain ground truth for the playing posture. In the experiment, we compare and analyze the results of OpenPose and the semi-automatic annotation method SAAnnot-C3Pap proposed in this paper. The comparison shows that the proposed method improved the posture information that had been collected incorrectly through OpenPose.
https://doi.org/10.3745/KTSDE.2022.11.10.409

Automatic Generation Subtitle Service with Kinetic Typography according to Music Sentimental Analysis (음악 감정 분석을 통한 키네틱 타이포그래피 자막 자동 생성 서비스)

Ji, Youngseo;Lee, Haram;Lim, SoonBum
- Journal of Korea Multimedia Society / v.24 no.8 / pp.1184-1191 / 2021
In a pop song, the creator's intention is communicated to the user through music and lyrics. The meaning of the lyrics is as important as the music, but in most cases lyrics are delivered to users in a static form without non-verbal cues. Static text is inefficient at conveying the emotions of music. Lyric videos with kinetic typography are increasingly available, but producing them requires expertise and a lot of time. Therefore, this system finds the emotions of the lyrics by analyzing the lyric text, and trains a deep learning model on data obtained by converting the melody into Mel-spectrogram format to find the appropriate emotions for the music. It then sets properties such as motion, font, and color according to the emotions found in the music and automatically creates a kinetic typography video. This study aims to enhance how effectively the meaning of music is conveyed through this system.
https://doi.org/10.9717/kmms.2021.24.8.1184
