• Title/Summary/Keyword: 오디오신호 (audio signal)


Development of a Listener Position Adaptive Real-Time Sound Reproduction System (청취자 위치 적응 실시간 사운드 재생 시스템의 개발)

  • Lee, Ki-Seung;Lee, Seok-Pil
    • The Journal of the Acoustical Society of Korea, v.29 no.7, pp.458-467, 2010
  • In this paper, a new audio reproduction system was developed in which cross-talk signals are cancelled at an arbitrary listener position. To remove the cross-talk signals adaptively according to the listener's position, the system tracks the listener position. This was done with two microphones: the listener direction was estimated from the time delay between the signals received by the two microphones. Room reverberation effects were also taken into account by means of linear prediction analysis. To cancel the cross-talk signals at the left and right ears, the acoustic paths between the sources and the ears were modeled with the KEMAR head-related transfer functions (HRTFs), which were measured on an artificial dummy head. To evaluate the usefulness of the proposed listener tracking system, cross-talk cancellation performance was measured at the estimated listener positions in terms of the channel separation ratio (CSR); a CSR of -10 dB was achieved experimentally even when the listener positions deviated somewhat. A real-time system was implemented on a floating-point digital signal processor (DSP). The average error of the estimated listener direction was 5 degrees, and the subjects perceived 80% of the stimuli as coming from the correct directions.
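The abstract estimates the listener direction from the time delay between two microphone signals. A minimal sketch of that idea is shown below, assuming a far-field source, a plain cross-correlation peak search, and the relation sin θ = cτ/d between the delay τ, the microphone spacing d and the speed of sound c. The function names, the 343 m/s speed of sound, and the example sampling rate and spacing are illustrative assumptions. The paper's linear-prediction treatment of reverberation is not reproduced here; one option would be to correlate linear-prediction residuals instead of the raw signals.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) maximising r[lag] = sum_n x[n] * y[n + lag].

    A positive result means `y` is a delayed copy of `x`."""
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i, xi in enumerate(x):
            j = i + lag
            if 0 <= j < len(y):
                acc += xi * y[j]
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag


def delay_to_angle(lag, fs, mic_spacing):
    """Far-field model sin(theta) = c * tau / d; theta in degrees from broadside."""
    tau = lag / fs
    s = SPEED_OF_SOUND * tau / mic_spacing
    s = max(-1.0, min(1.0, s))  # clamp delays that exceed the physical limit
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    fs, d = 16000, 0.2            # hypothetical sampling rate (Hz) and spacing (m)
    x = [rng.uniform(-1, 1) for _ in range(512)]
    y = [0.0] * 3 + x[:-3]        # second microphone hears the source 3 samples later
    lag = estimate_delay(x, y, max_lag=10)
    print(lag, round(delay_to_angle(lag, fs, d), 1))
```

For a 0.2 m spacing at 16 kHz the largest physically possible delay is about 0.2 × 16000 / 343 ≈ 9.3 samples, so searching up to `max_lag=10` is sufficient.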

An Embedded Watermark into Multiple Lower Bitplanes of Digital Image (디지털 영상의 다중 하위 비트플랜에 삽입되는 워터마크)

  • Rhee, Kang-Hyeon
    • Journal of the Institute of Electronics Engineers of Korea CI, v.43 no.6 s.312, pp.101-109, 2006
  • Recently, with the widespread use of the Internet and the development of related application programs, distributing and using multimedia content (text, images, video, audio, etc.) has become very easy. Digital signals can be duplicated easily, and the duplicates have the same quality as the original data, which makes it difficult to protect the original owner. Copyright protection methods such as encryption and watermarking address this problem. Digital watermarking is used to protect IP (Intellectual Property) and to authenticate the owner of multimedia content. In this paper, the proposed watermarking algorithm embeds a watermark into multiple lower bitplanes of a digital image. Both the original image and the watermark image are decomposed into bitplanes, and the watermarking operation is performed in the corresponding bitplanes. The position at which the watermark image is embedded in each bitplane serves as the watermarking key, and embedding is carried out in multiple lower bitplanes, which have no influence on human visual perception. Thus, the algorithm can represent the watermark image as multiple inherent patterns and requires only a small amount of watermark data. In experiments with a reference PSNR of 40 dB for the watermarked image, the author confirmed that the method is highly robust against JPEG, MEDIAN, and PSNR attacks but weak against NOISE, RNDDIST, ROT, SCALE, and SS attacks in the spatial domain.
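Plane-to-plane embedding into the lower bitplanes can be illustrated with simple bit substitution. This sketch assumes 8-bit grayscale images stored as lists of rows, a watermark whose pixel values fit in `n_planes` bits (bit k of a watermark pixel goes into bitplane k of the host pixel), and a key consisting only of the embedding offset. The function names are hypothetical; this is an illustration of lower-bitplane embedding, not the author's algorithm, and it makes no claim about the robustness results reported above.

```python
import math


def embed(host, mark, n_planes, offset):
    """Replace the lowest `n_planes` bitplanes of `host` with `mark`.

    Bit k of each watermark pixel overwrites bit k of the host pixel;
    `offset` = (row, col) is the secret embedding position (the key)."""
    r0, c0 = offset
    mask = (1 << n_planes) - 1
    out = [row[:] for row in host]
    for i, mrow in enumerate(mark):
        for j, m in enumerate(mrow):
            if not 0 <= m <= mask:
                raise ValueError("watermark pixel does not fit in n_planes bits")
            p = out[r0 + i][c0 + j]
            out[r0 + i][c0 + j] = (p & ~mask) | m  # clear low planes, then write mark bits
    return out


def extract(marked, shape, n_planes, offset):
    """Read the watermark back from the low bitplanes at the key position."""
    r0, c0 = offset
    h, w = shape
    mask = (1 << n_planes) - 1
    return [[marked[r0 + i][c0 + j] & mask for j in range(w)] for i in range(h)]


def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio (dB) between two equally sized images."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```

Replacing only the `n_planes` lowest bitplanes changes each pixel by at most `2**n_planes - 1` gray levels, so `psnr` stays high even though every watermarked pixel may be altered.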

Microscopic DVS based Optimization Technique of Multimedia Algorithm (Microscopic DVS 기반의 멀티미디어 알고리즘 최적화 기법)

  • Lee Eun-Seo;Kim Byung-Il;Chang Tae-Gye
    • Journal of the Institute of Electronics Engineers of Korea SP, v.42 no.4 s.304, pp.167-176, 2005
  • This paper proposes a new power minimization technique for frame-based multimedia signal processing. The technique is based on a newly proposed microscopic DVS (Dynamic Voltage Scaling) method, in which the operating frequency and supply voltage levels are controlled dynamically according to the processing requirement of each frame of multimedia data. The multimedia signal processing algorithms are also redesigned and optimized to maximize the power-saving efficiency of the microscopic DVS technique. Characterizing the mean/variance distribution of the processing load in frame-based multimedia signal processing provides the main basis both for the optimized application of microscopic DVS and for the optimization of the multimedia algorithms. The power-saving efficiency of the proposed DVS approach was tested experimentally with an MPEG-2 video decoder and an MPEG-2 AAC audio encoder on an ARM9 RISC processor. Experiments with diverse MPEG-2 video and audio files show average power-saving efficiencies of 50% and 30%, respectively. These results agree very well with the analytical derivations.
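Per-frame voltage and frequency selection can be sketched as follows. The sketch assumes a small table of hypothetical (frequency, voltage) operating points, a known cycle count for each frame, a fixed frame deadline, and the usual CMOS approximation that dynamic energy per cycle scales with V². It picks the slowest level that still meets each frame's deadline and reports the saving relative to always running at the highest level. It is not the paper's controller and ignores switching overhead and leakage.

```python
# Hypothetical discrete operating points: (frequency in MHz, supply voltage in V).
LEVELS = [(100, 1.0), (150, 1.2), (200, 1.4)]


def pick_level(cycles_m, deadline_ms, levels=LEVELS):
    """Slowest level that finishes `cycles_m` million cycles within the deadline."""
    for f, v in sorted(levels):
        if cycles_m / f * 1000 <= deadline_ms:  # Mcycles / MHz = seconds, then ms
            return f, v
    return max(levels)  # overloaded frame: run at the top level (deadline missed)


def energy(cycles_m, v):
    """Dynamic energy in arbitrary units: proportional to V^2 per cycle."""
    return v * v * cycles_m


def saving(frame_cycles_m, deadline_ms, levels=LEVELS):
    """Fractional energy saving of per-frame DVS versus a fixed top level."""
    _, v_max = max(levels)
    e_fixed = sum(energy(c, v_max) for c in frame_cycles_m)
    e_dvs = sum(energy(c, pick_level(c, deadline_ms, levels)[1]) for c in frame_cycles_m)
    return 1 - e_dvs / e_fixed
```

With a 33 ms deadline, a 3-Mcycle frame can run at 100 MHz and 1.0 V, whereas a 6-Mcycle frame needs 200 MHz and 1.4 V; frames whose load is well below the worst case are where the savings come from.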

Comprehensive analysis of deep learning-based target classifiers in small and imbalanced active sonar datasets (소량 및 불균형 능동소나 데이터세트에 대한 딥러닝 기반 표적식별기의 종합적인 분석)

  • Geunhwan Kim;Youngsang Hwang;Sungjin Shin;Juho Kim;Soobok Hwang;Youngmin Choo
    • The Journal of the Acoustical Society of Korea, v.42 no.4, pp.329-344, 2023
  • In this study, we comprehensively analyze the generalization performance of various deep learning-based active sonar target classifiers when applied to small and imbalanced active sonar datasets. To build the active sonar datasets, we use data from two oceanic experiments conducted at different times and in different ocean areas. Each sample in the datasets is a time-frequency domain image extracted from the audio signal of a contact after the detection process. For the analysis, we use 22 Convolutional Neural Network (CNN) models. The two datasets are used alternately as the training/validation dataset and the test dataset. To estimate the variance in the classifiers' outputs, each training/validation/test run is repeated 10 times. Training hyperparameters are optimized using Bayesian optimization. The results show that shallow CNN models are more robust and generalize better than most deep CNN models. These results can serve as a valuable reference for future research in deep learning-based active sonar target classification.
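Each sample is described as a time-frequency image extracted from a contact's audio signal. The abstract does not specify the transform, so this sketch assumes a short-time Fourier transform with a periodic Hann window and hypothetical `frame_len` and `hop` values. It uses a direct DFT for readability; a practical pipeline would use an FFT library.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(x, frame_len=64, hop=32):
    """Magnitude spectrogram: one row per frame, frame_len // 2 + 1 bins per row."""
    w = hann(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + i] * w[i] for i in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        frames.append(row)
    return frames


def to_db(spec, floor=1e-10):
    """Convert magnitudes to dB, flooring zeros to avoid log(0)."""
    return [[20 * math.log10(max(v, floor)) for v in row] for row in spec]
```

The resulting frames-by-bins array (usually in dB) is the kind of image that a CNN classifier takes as input.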

Automatic Speech Style Recognition Through Sentence Sequencing for Speaker Recognition in Bilateral Dialogue Situations (양자 간 대화 상황에서의 화자인식을 위한 문장 시퀀싱 방법을 통한 자동 말투 인식)

  • Kang, Garam;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems, v.27 no.2, pp.17-32, 2021
  • Speaker recognition is generally divided into speaker identification and speaker verification. Speaker recognition plays an important role in automatic voice systems, and the technology is becoming more important as portable devices, voice technology, and audio content continue to develop and expand. Previous speaker recognition studies have aimed to determine automatically who the speaker is from voice files and to improve accuracy. Speech style is an important sociolinguistic subject: it carries useful information that reveals the speaker's attitude, conversational intention, and personality, and this can be an important clue for speaker recognition. The sentence-final ending used in a speaker's utterance determines the sentence type and carries functions and information such as the speaker's intention, psychological attitude, or relationship to the listener. Because the use of sentence-final endings follows different probabilities depending on the speaker's characteristics, the types and distribution of sentence-final endings of an unidentified speaker can help recognize that speaker. However, few existing text-based speaker recognition studies have considered speech style, and adding speech-style information to signal-based speaker recognition can further improve its accuracy. Hence, this paper proposes a novel method that uses speech style, expressed as sentence-final endings, to improve the accuracy of Korean speaker recognition. To this end, a method called sentence sequencing is proposed, which generates vector values from the types and frequencies of the sentence-final endings appearing in a specific person's utterances. To evaluate the proposed method, training and performance evaluation were conducted on an actual drama script.
The proposed method can be used to improve the performance of Korean speech recognition services.
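The idea of turning the types and frequencies of sentence-final endings into a per-speaker vector can be sketched as below. The ending inventory, the type labels, suffix matching on raw text, and the cosine-similarity comparison are all hypothetical simplifications; a real system would need a Korean morphological analyzer to locate the final ending reliably, and the paper's actual sentence-sequencing vector may be built differently.

```python
import math
from collections import Counter

# Hypothetical inventory of sentence-final endings -> ending type.
# Longer suffixes come first so that "니다" is matched before "다".
ENDING_TYPES = [
    ("니다", "formal_statement"),  # e.g. 합니다, 습니다
    ("니까", "formal_question"),   # e.g. 합니까
    ("요", "polite"),              # e.g. 그래요
    ("니", "plain_question"),      # e.g. 먹었니
    ("냐", "plain_question"),
    ("자", "suggestion"),          # e.g. 가자
    ("다", "plain_statement"),     # e.g. 간다
    ("어", "casual"),              # e.g. 먹었어
    ("아", "casual"),
]
TYPES = sorted({t for _, t in ENDING_TYPES}) + ["other"]


def ending_type(sentence):
    """Classify a sentence by its final ending using naive suffix matching."""
    s = sentence.rstrip(" .?!~")
    for suffix, kind in ENDING_TYPES:
        if s.endswith(suffix):
            return kind
    return "other"


def ending_vector(sentences):
    """Relative frequency of each ending type in one speaker's utterances."""
    counts = Counter(ending_type(s) for s in sentences)
    total = sum(counts.values()) or 1
    return [counts[t] / total for t in TYPES]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)
```

Two speakers' utterance sets `a` and `b` could then be compared with `cosine(ending_vector(a), ending_vector(b))`, or the vectors could be used as extra features alongside acoustic ones.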