• Title/Summary/Keyword: audio extraction (오디오 추출)

Audio-Visual Integration based Multi-modal Speech Recognition System (오디오-비디오 정보 융합을 통한 멀티 모달 음성 인식 시스템)

  • Lee, Sahng-Woon;Lee, Yeon-Chul;Hong, Hun-Sop;Yun, Bo-Hyun;Han, Mun-Sung
    • Proceedings of the Korea Information Processing Society Conference / 2002.11a / pp.707-710 / 2002
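  • This paper proposes a multi-modal speech recognition system that fuses audio and video information. By fusing speech features with visual features, the system recognizes human speech efficiently in noisy environments. The speech features are Mel Frequency Cepstrum Coefficients (MFCC), and the visual features are feature vectors obtained through principal component analysis. To improve the recognition rate of the visual information itself, the face region is first located using a skin color model and face shape information, and the lip region is then detected with a robust lip region extraction method. Audio-visual fusion is performed as early fusion using a modified time-delay neural network. Experiments show that fusing audio and visual information improves performance by roughly 5% to 20% over using audio information alone.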
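
As a rough illustration of the early fusion mentioned in this abstract, the sketch below concatenates per-frame audio and visual feature vectors before any classifier sees them. The function name `early_fusion`, the nearest-frame alignment rule, and the toy feature values are assumptions made for illustration; the paper's MFCC extraction, PCA, and time-delay neural network are not reproduced here.

```python
def early_fusion(audio_feats, video_feats):
    """Concatenate each audio frame with the video frame nearest in time.

    audio_feats: list of per-frame audio feature vectors (e.g. MFCCs).
    video_feats: list of per-frame visual feature vectors (e.g. PCA features).
    The two streams may have different frame rates; the audio rate is kept.
    """
    if not audio_feats or not video_feats:
        raise ValueError("both feature streams must be non-empty")
    n_audio, n_video = len(audio_feats), len(video_feats)
    fused = []
    for i, audio_vec in enumerate(audio_feats):
        # Map audio frame i onto the video timeline (hypothetical alignment rule).
        j = min(n_video - 1, i * n_video // n_audio)
        fused.append(list(audio_vec) + list(video_feats[j]))
    return fused


# Toy data: 4 audio frames (2-dim) and 2 video frames (1-dim).
audio = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
video = [[9.0], [8.0]]
for vec in early_fusion(audio, video):
    print(vec)
```

Each fused vector has the audio dimensions followed by the visual dimensions, so a single downstream model receives both modalities at once.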

The Study on the MPEG-2 Video Bitrate Control using GOP Structure (GOP구조를 이용한 MPEG2 비디오 비트율 제어에 관한 연구)

  • Kim, Sang-Dong
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.889-891 / 2005
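  • With advances in digital and communication technologies, multimedia content is growing rapidly. Such content must be deliverable in real time across diverse wired and wireless service environments. This requires compression and transmission technologies for video and audio content, which account for the largest share of multimedia content. MPEG has become the established compression standard for video and audio in many fields. Many efforts have been made to improve MPEG; in particular, numerous studies, including the test model TM5, have presented encoder models that control the bitrate mainly through vector extraction or quantization. Unlike these approaches, this paper uses the distance between frame types to find and propose a frame structure better suited to the characteristics of the video. Specifically, videos are classified by complexity and degree of change, the standard MPEG encoding quality for each class is taken as the reference, and the distance between frame types within the GOP structure is adjusted; through experiments and analysis, the paper identifies the frame structure that yields the lowest bitrate without degrading picture quality.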
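
To make the idea of "distance between frame types" concrete, the sketch below generates the display-order frame types of one GOP from two conventional parameters: the GOP length `n` and the anchor-frame distance `m`. The helper names `gop_pattern` and `frame_counts` are hypothetical, and the code only enumerates candidate structures; it does not reproduce the paper's quality or bitrate measurements.

```python
def gop_pattern(n, m):
    """Return the display-order frame types of one GOP as a string.

    n: GOP length (number of frames from one I-frame to the next).
    m: distance between consecutive anchor (I or P) frames.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")   # each GOP starts with an intra-coded frame
        elif i % m == 0:
            types.append("P")   # anchor frames every m positions
        else:
            types.append("B")   # bidirectional frames fill the gaps
    return "".join(types)


def frame_counts(pattern):
    """Count how many I, P, and B frames a pattern contains."""
    return {t: pattern.count(t) for t in "IPB"}


# Enumerate a few candidate structures for a 12-frame GOP.
for m in (1, 2, 3):
    pattern = gop_pattern(12, m)
    print(m, pattern, frame_counts(pattern))
```

A larger `m` places more B-frames between anchors, which typically lowers the bitrate at the cost of more reordering delay; which setting suits a given class of video is the question the paper studies experimentally.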

A dynamic character using watermarking technique (워터마킹을 이용한 동적캐릭터)

  • Park, Kyi-Tae;Kim, Kab-Il;Son, Young-Ik
    • Proceedings of the KIEE Conference / 2003.11c / pp.464-467 / 2003
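  • This paper proposes a dynamic character technique that applies watermarking. A sequence of action codes for arbitrary motions is inaudibly embedded in an audio signal; when the preprocessed audio file is played, the character processes the sound picked up by its microphone, extracts the hidden codes, and performs the actions they specify. For example, a robot can perform a prescribed dance in time with music in which action codes are hidden. To this end, we adapted watermarking techniques, and the proposed technique addresses problems such as noise in an analog channel that uses air as the medium, sound attenuation that depends on the distance between the robot and the sound source, and synchronization. Experiments under various conditions demonstrate the performance of the proposed technique.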

Similar Movie Contents Retrieval Using Peak Features from Audio (오디오의 Peak 특징을 이용한 동일 영화 콘텐츠 검색)

  • Chung, Myoung-Bum;Sung, Bo-Kyung;Ko, Il-Ju
    • Journal of Korea Multimedia Society / v.12 no.11 / pp.1572-1580 / 2009
  • Combing through entire video files to recognize and retrieve matching movies requires much time and memory. Most current similar-movie matching methods therefore analyze only part of each movie's video-image information. Yet these methods share a critical problem: they erroneously treat matching videos as different when the videos differ only in resolution or have merely been converted with a different codec. This paper proposes an audio-based search algorithm for identifying similar movies. The proposed method builds and searches a database of movies' spectral peak information, which remains relatively stable under changes in bit rate, codec, or sample rate. The method achieved a 92.1% search success rate on a set of 1,000 video files whose audio bit rate had been altered or that had been deliberately re-encoded with a different codec.
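
As a minimal sketch of what a spectral peak feature can look like, the code below computes the magnitude spectrum of one audio frame with a naive DFT and returns the frequencies of its strongest local maxima. The function names, the frame length, and the two-tone test signal are assumptions for illustration; the paper's actual peak definition and database layout are not given in the abstract.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 for a real-valued frame (naive O(N^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectral_peaks(frame, sample_rate, top_k=2):
    """Return the frequencies (Hz) of the top_k strongest local spectral maxima."""
    mags = magnitude_spectrum(frame)
    n = len(frame)
    # A bin is a peak candidate if it exceeds both of its neighbours.
    candidates = [k for k in range(1, len(mags) - 1)
                  if mags[k] > mags[k - 1] and mags[k] > mags[k + 1]]
    strongest = sorted(candidates, key=lambda k: mags[k], reverse=True)[:top_k]
    return sorted(k * sample_rate / n for k in strongest)


# Hypothetical test frame: two tones at 1000 Hz and 2000 Hz, sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 2000 * t / sample_rate)
         for t in range(64)]
print(spectral_peaks(frame, sample_rate))
```

Scaling the frame's amplitude leaves the peak frequencies unchanged, which hints at why peak positions make a comparatively stable fingerprint when only the encoding changes.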

Auto fitting Parameter Extraction for Digital Hearing Aids (디지털 보청기의 자동 보정 파라미터 추출)

  • 석수영;정호열;정현열
    • Journal of Korea Multimedia Society / v.3 no.5 / pp.495-505 / 2000
  • In this paper, we propose an efficient auto-fitting system for digital hearing aids that automatically adjusts the fitting parameters to the auditory characteristics of a hearing-impaired person. The fitting parameters are extracted from the person's audiogram and applied to the GM3036 chip for digital hearing aids. The characteristics of each parameter are compared with those from the theoretical 2cc graph. The proposed system was applied to 50 patients, whose satisfaction ratios were very high. These results show the effectiveness of the proposed system.

A Personal Video Event Classification Method based on Multi-Modalities by DNN-Learning (DNN 학습을 이용한 퍼스널 비디오 시퀀스의 멀티 모달 기반 이벤트 분류 방법)

  • Lee, Yu Jin;Nang, Jongho
    • Journal of KIISE / v.43 no.11 / pp.1281-1297 / 2016
  • In recent years, personal videos have grown tremendously thanks to the widespread use of smart devices and networking services that let users create and share video content easily and with few restrictions. Because videos generally contain multiple modalities and their frame data vary over time, taking both properties into account can significantly improve event detection performance. This paper proposes an event detection method that does so. High-level features are first extracted from the multiple modalities in a video and rearranged in time sequence; the association between the modalities is then learned by a DNN to produce a personal video event detector. In the proposed method, audio and image data are first synchronized and then extracted, and the result is fed into GoogLeNet and a Multi-Layer Perceptron (MLP) to extract high-level features. These features are then rearranged in time sequence, and each video is reduced to a single feature for DNN training.

An Embedding /Extracting Method of Audio Watermark Information for High Quality Stereo Music (고품질 스테레오 음악을 위한 오디오 워터마크 정보 삽입/추출 기술)

  • Bae, Kyungyul
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.21-35 / 2018
  • Since the introduction of MP3 players, CD recordings have gradually been disappearing, and the music listening environment is shifting to mobile devices. Smart devices have increased the use of music through the playback, mass storage, and search functions integrated into smartphones and tablets. When MP3 players were first introduced, compressed music was generally encoded at 128 Kbps. As demand for high quality music grew, however, content at 384 Kbps appeared. Recently, music in the FLAC (Free Lossless Audio Codec) format, which uses lossless compression, has become popular. The download services of many music sites in Korea are divided into unlimited downloads with technical protection and limited downloads without technical protection. Digital Rights Management (DRM) technology is used as the technical protection measure for unlimited downloads, but such music can be played only on authenticated devices that have DRM installed. Even music purchased by the user cannot be used on other devices. Conversely, for music that is limited in quantity but not technically protected, there is no way to take action against anyone who redistributes it, and for high quality music such as FLAC the loss is greater. In this paper, the author proposes an audio watermarking technology for copyright protection of high quality stereo music. Two kinds of information, "Copyright" and "Copy_free", are generated by using the turbo code. The two watermarks are composed of 9 bytes (72 bits). When turbo code is applied for error correction, the amount of information to be inserted increases to 222 bits. The 222-bit watermark is expanded to 1024 bits to be robust against additional errors and is finally used as the watermark inserted into stereo music.
Turbo code can recover the original data when less than 15% of the code is damaged, even if part of the code has been damaged by an attack on the watermarked content. Expanding the watermark to 1024 bits raises the probability of recovering the 222 bits from partially damaged content, which makes the watermark itself more resistant to attack. The proposed algorithm uses quantization in the DCT domain so that the watermark can be detected efficiently and the SNR is improved when stereo music is converted to mono. As a result, the average SNR exceeded 40 dB, a sound quality improvement of more than 10 dB over traditional quantization methods. This is a very significant result because a 10 dB gain corresponds to a tenfold improvement in the signal-to-noise power ratio. In addition, the watermark can be extracted from samples shorter than 1 second: it was completely extracted from music samples of less than one second in all tests of MP3 compression at a bit rate of 128 Kbps. By contrast, the conventional quantization method largely fails to extract the watermark even from 10-second samples, so the proposed method needs only about 1/10 of the sample length. In this study, the watermark embedded in music is 72 bits long, which provides sufficient capacity to embed the necessary information about the music. It is enough bits to identify the music distributed all over the world: $2^{72}$ can distinguish more than $4 \times 10^{21}$ items, so the watermark can serve as an identifier and can be used for copyright protection of high quality music services. The proposed algorithm can be used not only for high quality audio but also for developing watermarking algorithms for multimedia such as UHD (Ultra High Definition) TV and high-resolution images. In addition, with the development of digital devices, users are demanding high quality music, and artificial intelligence assistants are arriving along with high quality music and streaming services.
The results of this study can be used to protect the rights of copyright holders in these industries.
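
The abstract says only that the watermark is embedded by quantization in the DCT domain. The sketch below shows one generic way to do that, quantization index modulation (QIM) on a single DCT coefficient; it is an assumed stand-in, not the author's algorithm. The coefficient index, step size, frame values, and all function names are hypothetical, and turbo coding and the stereo-to-mono handling are left out.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a short frame (naive implementation)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def qim_embed(coeff, bit, step):
    """Snap coeff to the quantization lattice that encodes bit (0 or 1)."""
    offset = bit * step / 2.0          # bit 1 uses a lattice shifted by step/2
    return round((coeff - offset) / step) * step + offset


def qim_extract(coeff, step):
    """Decide which lattice (bit 0 or bit 1) lies closer to coeff."""
    d0 = abs(coeff - qim_embed(coeff, 0, step))
    d1 = abs(coeff - qim_embed(coeff, 1, step))
    return 0 if d0 <= d1 else 1


def embed_bit(frame, bit, index=3, step=0.1):
    """Embed one bit into DCT coefficient `index` of the frame."""
    coeffs = dct(frame)
    coeffs[index] = qim_embed(coeffs[index], bit, step)
    return idct(coeffs)


def extract_bit(frame, index=3, step=0.1):
    """Read the bit back from DCT coefficient `index` of the frame."""
    return qim_extract(dct(frame)[index], step)


# Hypothetical 8-sample frame; embed and read back both bit values.
frame = [0.3, -0.1, 0.25, 0.4, -0.2, 0.05, 0.1, -0.35]
for bit in (0, 1):
    marked = embed_bit(frame, bit)
    print(bit, extract_bit(marked))
```

With this scheme a bit survives any disturbance that moves the chosen coefficient by less than a quarter of the step size, so a larger step trades audible distortion for robustness.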

Implementation of Musical Note Generation System using Rhythm Information (리듬정보를 이용한 악보생성 시스템 구현)

  • 소두석;최재원;이종혁
    • Journal of the Korea Institute of Information and Communication Engineering / v.7 no.6 / pp.1210-1216 / 2003
  • Traditional indexing mechanisms are based on a song's metadata, such as the title and the composer. However, these systems have a major limitation: users must know the metadata of the songs they want to retrieve. To overcome this limitation, we propose a rhythm extraction system that allows users to retrieve music information efficiently from a large music database using rhythm, which is defined as a part of the music.

Relation Extraction between Image Objects using Dual Supervision (Dual Supervision 을 이용한 이미지 객체 간 관계 추출)

  • Min-Kyu Kim;Min-Soo Jang;Hee-Gook Jun;Dong-Hyuk Im
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.1244-1246 / 2023
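  • Unstructured data such as video, audio, images, and text have no data structure, so queries about their content are hard to process on the raw data alone, and conversion to structured data is required. Relation extraction predicts attributes or relations between words in a sentence and thereby represents the sentence structurally. The Dual Supervision model, a natural language processing technique, predicts relations with fewer resources than previous models by using both human-labeled and machine-labeled data. This paper applies that natural language processing model to image processing and proposes a model that structurally represents image content with fewer resources than existing methods; experiments confirm that efficient relation extraction between image objects is possible.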

A Study on Design Schemes of Extracting Control Signals for a CD-G System (디지틀 오디오용 그래픽 시스템의 실시간 제어신호 추출을 위한 설계방식 연구)

  • 이용석;정화자;김용득
    • The Journal of Korean Institute of Communications and Information Sciences / v.17 no.10 / pp.1063-1073 / 1992
  • This paper deals with a method for extracting picture signals from CD graphics with a conventional CD player, schemes for designing circuits for the effective extraction of control signals, and the implementation of such circuits using commercially available logic components, thereby achieving cost-effectiveness. This paper also presents an implementation and evaluation of the CD-G system, which requires extracting picture signals, deinterleaving the extracted signals and analyzing control commands and displaying them on a screen. The CD-G system implemented using the extraction circuit presented herein has been observed to operate well in real time.
