• Title/Summary/Keyword: 3D Audio

An Embedding /Extracting Method of Audio Watermark Information for High Quality Stereo Music (고품질 스테레오 음악을 위한 오디오 워터마크 정보 삽입/추출 기술)

  • Bae, Kyungyul
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.21-35 / 2018
  • Since the introduction of MP3 players, CD recordings have gradually been disappearing, and the music consumption environment is shifting to mobile devices. The introduction of smart devices has increased music use through the playback, mass storage, and search functions integrated into smartphones and tablets. When MP3 players were first introduced, compressed music content generally had a bitrate of 128 Kbps. However, as demand for high-quality music grew, content at 384 Kbps appeared. Recently, music in FLAC (Free Lossless Audio Codec) format, which uses lossless compression, has become popular. Download services on many Korean music sites are divided into unlimited download with technical protection and limited download without technical protection. Digital Rights Management (DRM) technology is used as the technical protection measure for unlimited download, but such content can be played only on authenticated devices that have DRM installed. Even music that the user has purchased cannot be used on other devices. Conversely, for music that is limited in quantity but not technically protected, there is no way to act against anyone who redistributes it, and for high-quality music such as FLAC the loss is greater. In this paper, the author proposes an audio watermarking technology for copyright protection of high-quality stereo music. Two kinds of watermark information, "Copyright" and "Copy_free", are encoded using turbo code. Each of the two watermarks consists of 9 bytes (72 bits). Applying turbo code for error correction increases the amount of information to be embedded to 222 bits. The 222-bit watermark is then expanded to 1024 bits for robustness against additional errors, and this is the watermark finally embedded in the stereo music. Turbo code can recover the original data when less than 15% of the code is damaged, even if part of it is corrupted by an attack on the watermarked content. Because the watermark is expanded to 1024 bits, the probability of recovering the 222 bits from partially damaged content increases, so the watermark itself is more resistant to attack. The proposed algorithm uses quantization in the DCT domain so that the watermark can be detected efficiently and the SNR is improved when stereo music is converted to mono. As a result, the average SNR exceeded 40 dB, a sound quality improvement of more than 10 dB over traditional quantization methods. This is a very significant result because a 10 dB gain corresponds to a tenfold improvement in signal-to-noise power ratio. In addition, a sample shorter than 1 second is sufficient for extracting the watermark, and the watermark was completely extracted from music samples of less than one second in all cases of MP3 compression at a bit rate of 128 Kbps. By contrast, the conventional quantization method largely fails to extract the watermark even from 10-second samples, so the proposed method succeeds with samples only 1/10 as long. In this study, the watermark embedded in the music is 72 bits long, which provides sufficient capacity to embed the necessary information for music. This is enough bits to identify music distributed all over the world: $2^{72}$ can distinguish more than $4 \times 10^{21}$ items, so the watermark can serve as an identifier and can be used for copyright protection in high-quality music services. The proposed algorithm can be applied not only to high-quality audio but also to the development of watermarking algorithms for other multimedia such as UHD (Ultra High Definition) TV and high-resolution images. In addition, with the development of digital devices, users are demanding high-quality music, and artificial intelligence assistants are arriving together with high-quality music and streaming services. The results of this study can be used to protect the rights of copyright holders in these industries.
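
The payload arithmetic and the idea of quantization-based embedding in the abstract above can be illustrated with a minimal sketch. It assumes a generic scalar quantization index modulation (QIM) scheme applied to individual transform coefficients; the abstract does not give the paper's actual quantizer, step size, or coefficient selection, so `qim_embed`, `qim_extract`, and the step value are hypothetical names and parameters chosen for illustration only.

```python
import math


def text_to_bits(text: str) -> list[int]:
    """Return the bits of an ASCII string, most significant bit first."""
    return [(byte >> shift) & 1
            for byte in text.encode("ascii")
            for shift in range(7, -1, -1)]


def db_to_power_ratio(db: float) -> float:
    """Convert a decibel difference to a power ratio."""
    return 10.0 ** (db / 10.0)


def qim_embed(coeff: float, bit: int, step: float) -> float:
    """Move coeff to the nearest multiple of step whose index parity equals bit.

    Hypothetical scalar QIM, not the quantizer used in the paper.
    """
    index = 2 * round((coeff / step - bit) / 2) + bit
    return index * step


def qim_extract(coeff: float, step: float) -> int:
    """Read the bit back as the parity of the nearest quantization index."""
    return round(coeff / step) % 2


# Each watermark string is 9 ASCII characters, i.e. 9 bytes = 72 bits.
copyright_bits = text_to_bits("Copyright")
copy_free_bits = text_to_bits("Copy_free")

# Number of distinct 72-bit identifiers: 4722366482869645213696 (about 4.7e21).
identifier_space = 2 ** 72

# A 10 dB SNR gain corresponds to a tenfold power ratio.
ratio_for_10_db = db_to_power_ratio(10.0)
```

In this sketch a perturbation smaller than half the step leaves the extracted bit unchanged, which is the usual trade-off in quantization-based watermarking: a larger step is more robust but lowers the SNR of the marked signal.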

A Study on Real-Time Loudness Metering Algorithm for Digital Broadcasting (디지털 방송용 오디오 레벨 계측 알고리즘의 실시간화 연구)

  • Park Seong-Gyoon
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.16 no.4 s.95 / pp.427-437 / 2005
  • This paper proposes a perceived audio level (loudness) metering algorithm for digital audio that can operate in real time. By analyzing the existing recommendation ITU-R BS.1387-1 for objective audio quality assessment, an FFT-based loudness metering algorithm is implemented, and a real-time version of that algorithm is devised and verified. The proposed method is based on look-up tables. To verify the proposed method, its performance and operation time are evaluated using 23 pure tones and 30 preselected digital audio samples. Compared with the original algorithm, its error is less than 2% even when the look-up table for spectral spreading has a coarse level resolution of 10 dB. The proposed algorithm takes only 1/21 of the original algorithm's measuring time. Within the proposed algorithm, the auditory pitch group energy calculation takes 1/450 of the original algorithm's time and the excitation calculation takes 1/3.57. In conclusion, the proposed algorithm is expected to be implementable in a DSP-based real-time loudness meter.

Development of ATSC3.0 based UHDTV Broadcasting System providing Ultra-high-quality Service that supports HDR/WCG Video and 3D Audio, and a Fixed UHD/Mobile HD Service (HDR/WCG 비디오와 3D 오디오를 지원하는 초고품질 방송서비스와 고정 UHD/이동 HD 방송 서비스를 제공하는 ATSC 3.0 기반 UHDTV 방송 시스템 개발)

  • Ki, Myungseok;Seok, Jinwuk;Beack, Seungkwon;Jang, Daeyoung;Lee, Taejin;Kim, Hui Yong;Oh, Hyeju;Lim, Bo-mi;Bae, Byungjun;Kim, Heung Mook;Choi, Jin Soo
    • Journal of Broadcast Engineering / v.22 no.6 / pp.829-849 / 2017
  • Driven by larger TV displays, the convergence of broadcasting and broadband, and advances in signal compression and transmission technology, terrestrial digital broadcasting has evolved into UHD broadcasting capable of simultaneously delivering fixed UHD and mobile HD services. The Korean standard for terrestrial UHDTV broadcasting is based on ATSC 3.0, the North American broadcasting standard. As its new AV codecs, the terrestrial UHDTV broadcasting standard adopts the HEVC video codec, which compresses more efficiently than AVC, and the MPEG-H 3D Audio codec for realistic audio. In addition, DASH and MMT are adopted as transmission formats instead of MPEG-2 TS to support broadband as well as broadcast networks, and ROUTE multiplexing technology is applied to provide 4K UHD and mobile HD services simultaneously. In this paper, we propose an audio/video encoder required to provide an HDR/WCG high-quality video service, a 10.2-channel/4-object 3D audio service, and a simultaneous fixed UHD and mobile HD broadcasting service based on ATSC 3.0. We also implemented a ROUTE/DASH packager, a multiplexing system, and an ATSC 3.0 LDM system for physical layer transmission and reception, and verified the service capability by applying them to a real-time broadcast environment.

Haptic Media Broadcasting (촉각방송)

  • Cha, Jong-Eun;Kim, Yeong-Mi;Seo, Yong-Won;Ryu, Je-Ha
    • Broadcasting and Media Magazine / v.11 no.4 / pp.118-131 / 2006
  • With rapid development in ultra-fast communication and digital multimedia, realistic broadcasting technology that can stimulate the five human senses, beyond the conventional audio-visual service, is emerging as a new generation of broadcasting technology. In this paper, we introduce a haptic broadcasting system and the related core system and component techniques by which viewers can 'touch and feel' objects in an audio-visual scene. The system includes haptic media acquisition and creation and content authoring. In haptic broadcasting, the haptic media can be 3-D geometry, dynamic properties, haptic surface properties, movement, and tactile information, enabling active touch and manipulation as well as passive movement following and tactile effects. In the proposed system, active haptic exploration and manipulation of a 3-D mesh, active haptic exploration of depth video, passive kinesthetic interaction, and passive tactile interaction can be provided as potential haptic interaction scenarios. Home shopping, a movie with tactile effects, and conducting education scenarios were produced to show the feasibility of the proposed system.

Design methodology of the controller circuit for a highly efficient class D Amplifiers (D급 증폭기를 위한 제어회로의 설계)

  • Lee, Jong-Kue;Song, Pil-Jae
    • Proceedings of the Korean Institute of Illuminating and Electrical Installation Engineers Conference / 2006.05a / pp.407-409 / 2006
  • This paper presents methods for designing the control circuits of a Class D amplifier to achieve peak performance. The proposed approach is based on three functional components: a carrier generator, a feedback circuit, and a dead-time circuit. First, the analog signal is applied to the controller, which outputs a 3-level PWM waveform. The controller used in this experiment is built from operational amplifiers and logic circuits. The experimental results show that the control circuit performs satisfactorily and that its output is proportional to the input audio signal, providing a satisfactory 3-level PWM pattern. With this design methodology, implementing the proposed control circuit makes it possible to build an efficient Class D amplifier using a half-bridge, full-bridge, or push-pull topology at the output stage.
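
The claim that a 3-level PWM output is proportional to the input can be checked numerically with a small sketch. It assumes one common way of generating three levels, comparing the input and its inverse against a single triangular carrier; the abstract does not specify the authors' circuit, so the function names and the carrier shape below are illustrative assumptions.

```python
def triangle(phase: float) -> float:
    """Triangular carrier in [-1, 1] for a phase in [0, 1)."""
    return 4.0 * abs(phase - 0.5) - 1.0


def three_level_pwm(sample: float, carrier: float) -> int:
    """Return +1, 0, or -1 from two comparator decisions (assumed scheme)."""
    high = 1 if sample > carrier else 0    # comparator on the input
    low = 1 if -sample > carrier else 0    # comparator on the inverted input
    return high - low


def average_output(sample: float, steps: int = 1000) -> float:
    """Average the PWM output over one carrier period for a constant input."""
    total = sum(three_level_pwm(sample, triangle((i + 0.5) / steps))
                for i in range(steps))
    return total / steps


if __name__ == "__main__":
    for level in (-0.75, -0.25, 0.0, 0.5):
        print(level, average_output(level))
```

For inputs within the carrier range, the period average tracks the input level, which is the property an output low-pass filter relies on to recover the audio signal.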

Speech Emotion Recognition Using 2D-CNN with Mel-Frequency Cepstrum Coefficients

  • Eom, Youngsik;Bang, Junseong
    • Journal of information and communication convergence engineering / v.19 no.3 / pp.148-154 / 2021
  • With the advent of context-aware computing, many attempts have been made to understand emotions. Among these, Speech Emotion Recognition (SER) is a method of recognizing a speaker's emotions from speech information. The success of SER depends on selecting distinctive 'features' and 'classifying' them appropriately. In this paper, the performance of SER using neural network models (e.g., fully connected network (FCN), convolutional neural network (CNN)) with Mel-Frequency Cepstral Coefficients (MFCC) is examined in terms of the accuracy and distribution of emotion recognition. On the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset, after tuning model parameters, a two-dimensional Convolutional Neural Network (2D-CNN) model with MFCC showed the best performance, with an average accuracy of 88.54% for five emotions (anger, happiness, calm, fear, and sadness) across male and female speakers. In addition, an examination of the distribution of emotion recognition accuracies across neural network models indicates that the 2D-CNN with MFCC can be expected to reach an overall accuracy of 75% or more.

The Effect of Visual Cues in the Identification of the English Consonants /b/ and /v/ by Native Korean Speakers (한국어 화자의 영어 양순음 /b/와 순치음 /v/ 식별에서 시각 단서의 효과)

  • Kim, Yoon-Hyun;Koh, Sung-Ryong;Valerie, Hazan
    • Phonetics and Speech Sciences / v.4 no.3 / pp.25-30 / 2012
  • This study investigated whether native Korean listeners could use visual cues to identify the English consonants /b/ and /v/. Both auditory and audiovisual tokens of minimal word pairs were used, with the target phonemes located in word-initial or word-medial position. Participants were instructed to decide which consonant they heard in $2 \times 2$ conditions: cue (audio-only, audiovisual) and location (word-initial, word-medial). Mean identification scores were significantly higher in the audiovisual than in the audio-only condition and in the word-initial than in the word-medial condition. In addition, following signal detection theory, sensitivity (d') and response bias (c) were calculated from hit rates and false alarm rates. These measures showed that the higher identification rate in the audiovisual condition was associated with an increase in sensitivity. There were no significant differences in response bias across conditions. This result suggests that native Korean speakers can use visual cues when identifying confusable non-native phonemic contrasts. Visual cues can enhance non-native speech perception.
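
The two signal detection measures named in the abstract above have standard textbook definitions under the equal-variance Gaussian model: d' = z(H) - z(F) and c = -(z(H) + z(F)) / 2, where H is the hit rate, F the false alarm rate, and z the inverse of the standard normal CDF. A minimal sketch of that calculation follows; the function name and the example rates are illustrative and are not data from the study.

```python
from statistics import NormalDist


def sdt_measures(hit_rate: float, false_alarm_rate: float) -> tuple[float, float]:
    """Return (d_prime, criterion_c) under the equal-variance Gaussian model.

    Both rates must lie strictly between 0 and 1; inv_cdf raises
    StatisticsError otherwise, so rates of exactly 0 or 1 need a correction
    before calling this function.
    """
    z = NormalDist().inv_cdf          # inverse standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa            # sensitivity
    criterion = -0.5 * (z_hit + z_fa) # response bias
    return d_prime, criterion


if __name__ == "__main__":
    # Hypothetical rates, chosen only to exercise the formulas.
    print(sdt_measures(0.8, 0.2))
```

A higher d' with an unchanged c, as reported for the audiovisual condition, means listeners discriminated the two consonants better without shifting their overall tendency to answer one way or the other.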

An Efficient Computation of FFT for MPEG/Audio Psycho-Acoustic Model (MPEG 심리음향모델의 고속 구현을 위한 효율적 FFT 연산)

  • 송건호;이근섭;박영철;윤대희
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.6 / pp.261-269 / 2004
  • In this paper, an efficient algorithm for computing the FFT in the MPEG/audio Layer Ⅲ (MP3) encoder is proposed. The proposed algorithm performs a full-band 1024-point FFT by computing 32-point FFTs of the 32 subband outputs. To reduce the aliasing caused by the analysis filter bank, an aliasing cancellation butterfly is developed. A major benefit of the proposed algorithm is the computational saving. By using the proposed algorithm, it is possible to save 40~50% of the FFT computations, which results in about a 20% reduction in the complexity of psychoacoustic model 2 (PAM-2).
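
A back-of-the-envelope count shows why replacing one 1024-point FFT with thirty-two 32-point FFTs can save roughly half the work. The sketch assumes the usual radix-2 cost model of N log2 N operations and ignores the cost of the aliasing cancellation butterflies, so it gives an upper bound on the saving rather than the paper's own accounting; the function names are illustrative.

```python
import math


def fft_cost(n: int) -> float:
    """Rough radix-2 cost model: n * log2(n) operations (an assumption)."""
    return n * math.log2(n)


def implied_fft_share(total_reduction: float, fft_saving: float) -> float:
    """Share of total complexity spent in the FFT, given both savings."""
    return total_reduction / fft_saving


full_cost = fft_cost(1024)            # one full-band 1024-point FFT
split_cost = 32 * fft_cost(32)        # thirty-two 32-point subband FFTs
saving = 1.0 - split_cost / full_cost # fraction of FFT operations saved

# If a 40~50% FFT saving yields about a 20% overall reduction, the FFT
# would account for roughly 40~50% of the model's total cost.
share_low = implied_fft_share(0.20, 0.50)
share_high = implied_fft_share(0.20, 0.40)
```

Under this model the saving is 50% before the extra butterflies are counted, which is in line with the 40~50% range stated in the abstract.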

Implementation of 3-D Audio using Korean-Type HRTF (한국형 HRTF를 이용한 입체음향 구현)

  • 김재현;정상배;양희식;한민수
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2001.11b / pp.63-67 / 2001
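  • The implementation of 3-D audio is recognized as one of the core technologies of the 21st-century multimedia content industry, and because its range of applications is very broad, investment in it is steadily increasing. This paper presents the results of a study on implementing 3-D audio using an HRTF (Head-Related Transfer Function) fitted to the standard Korean head shape, and on methods for artificially reproducing on-site (ambience) effects.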

Scope and Status of Audio Visual Interactive Services Standardization (상호대화형 오디오비주얼 서비스의 표준화 현황과 전망)

  • Hyun, D.W.;Lee, B.H.
    • Electronics and Telecommunications Trends / v.9 no.3 / pp.97-102 / 1994
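  • Interactive audiovisual (AVI) services provide the user's terminal or workstation with input/output information composed of various kinds of presentation elements such as text, graphics, photographs, audio, and video. The functions offered to the user range from simple retrieval to interactive queries, rearrangement of components, and modification of those elements. In this regard, ITU-T SG8/Q.11 is working to standardize the set of technical matters required for AVI services, such as systems, data interchange formats, and protocols. This article discusses the technical aspects of AVI services and reviews the standardization work currently in progress.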