• Title/Summary/Keyword: 음향 정보 (acoustic information)

Voice Activity Detection using Motion and Variation of Intensity in The Mouth Region (입술 영역의 움직임과 밝기 변화를 이용한 음성구간 검출 알고리즘 개발)

  • Kim, Gi-Bak;Ryu, Je-Woong;Cho, Nam-Ik
    • Journal of Broadcast Engineering, v.17 no.3, pp.519-528, 2012
  • Voice activity detection (VAD) is generally performed by extracting features from the acoustic signal and applying a decision rule. The performance of such acoustically driven VAD algorithms depends heavily on the acoustic noise. When video signals are also available, VAD performance can be improved by using visual information, which is not affected by acoustic noise. Previous visual VAD algorithms usually use a single visual feature to detect lip activity, such as active appearance models, optical flow, or intensity variation. Based on an analysis of the weakness of each feature, we propose combining an intensity change measure with the optical flow in the mouth region, so that each compensates for the other's weakness. To minimize computational complexity, we develop simple measures that avoid statistical estimation or modeling. Specifically, the optical flow is the averaged motion vector of some grid regions, and the intensity variation is detected by simple thresholding. To extract the mouth region, we propose a simple algorithm that first detects the two eyes and then uses the intensity profile to detect the center of the mouth. Experiments show that the proposed combination of two simple measures yields higher detection rates for a given false positive rate than methods that use a single feature.
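
The two measures described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the threshold values, the assumption that per-grid motion vectors are already available, and the OR rule that combines the two cues are all hypothetical.

```python
from math import hypot

def intensity_change_ratio(prev, curr, pixel_thresh=15):
    """Fraction of mouth-region pixels whose gray level changed by more than pixel_thresh."""
    changed = total = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += 1
            if abs(c - p) > pixel_thresh:
                changed += 1
    return changed / total if total else 0.0

def mean_motion_magnitude(grid_vectors):
    """Magnitude of the motion vector averaged over the grid regions."""
    if not grid_vectors:
        return 0.0
    mean_dx = sum(dx for dx, _ in grid_vectors) / len(grid_vectors)
    mean_dy = sum(dy for _, dy in grid_vectors) / len(grid_vectors)
    return hypot(mean_dx, mean_dy)

def lip_active(prev, curr, grid_vectors, ratio_thresh=0.2, motion_thresh=1.0):
    """Assumed decision rule: flag lip activity if either cue exceeds its threshold."""
    return (intensity_change_ratio(prev, curr) > ratio_thresh
            or mean_motion_magnitude(grid_vectors) > motion_thresh)
```

Here `prev` and `curr` are gray-level mouth regions given as lists of rows, and `grid_vectors` is a list of `(dx, dy)` pairs, one per grid region.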

A Study on Real-time Implementing of Time-Scale Modification (음성 신호 시간축 변환의 실시간 구현에 관한 연구)

  • Han, Dong-Chul;Lee, Ki-Seung;Cha, Il-Hawan;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea, v.14 no.2, pp.50-61, 1995
  • A time scale modification method that yields rate-modified speech while conserving the characteristics of speech was implemented in real time using a general-purpose digital signal processor. Time scale modification changes only the speaking rate, which produces a time difference between the input signal and the modified signal and makes real-time implementation impossible. In this study, a system was implemented to remove the time difference between the input and modified signals. Speech slowed down or sped up by a physical time scale modification method, such as adjusting the motor speed of a cassette tape recorder, was used as the input signal. Physical modification that controls only the motor speed of the cassette tape player distorts the pitch period of the original speech. A real-time system was therefore implemented in which the pitch-distorted speech is restored to the original pitch by fractional-sampling pitch shifting using an FIR filter, and the resulting signal is time scale modified to match the cassette tape recorder motor speed using SOLA time-scale modification. In experiments using speech signals modified by the proposed method, results obtained with a 16-bit ADSP2101 processor and with floating-point computer simulations showed nearly the same average frame signal-to-noise ratio of about 20 dB.
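
The SOLA (synchronized overlap-add) step named in this abstract can be sketched generically as below. This is an assumption-laden illustration rather than the paper's DSP code: the frame length, analysis hop `sa`, search range `kmax`, the normalized cross-correlation criterion, and the linear cross-fade are all chosen here for clarity, and the FIR-based pitch restoration stage is not shown.

```python
import math

def sola(x, alpha, frame=200, sa=50, kmax=20):
    """Time-scale x by roughly alpha (alpha > 1 lengthens) with synchronized overlap-add."""
    ss = round(sa * alpha)  # synthesis hop
    if not kmax < ss < frame - kmax:
        raise ValueError("alpha out of range for these frame settings")
    y = [float(v) for v in x[:frame]]
    m = 1
    while m * sa + frame <= len(x):
        seg = x[m * sa : m * sa + frame]
        best_k, best_score = 0, -math.inf
        # Search the shift that best aligns the new frame with the output tail.
        for k in range(kmax + 1):
            start = m * ss + k
            ov = len(y) - start  # number of samples where y and seg overlap
            num = sum(y[start + i] * seg[i] for i in range(ov))
            den = math.sqrt(sum(y[start + i] ** 2 for i in range(ov))
                            * sum(seg[i] ** 2 for i in range(ov)))
            score = num / den if den > 0 else 0.0
            if score > best_score:
                best_k, best_score = k, score
        start = m * ss + best_k
        ov = len(y) - start
        for i in range(ov):  # linear cross-fade over the overlap region
            w = (i + 1) / (ov + 1)
            y[start + i] = (1 - w) * y[start + i] + w * seg[i]
        y.extend(seg[ov:])  # append the non-overlapping remainder
        m += 1
    return y
```

The range check on `ss` keeps every new frame overlapping the existing output by at least one sample and by less than a full frame, which is what the cross-fade assumes.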

Time- and Frequency-Domain Block LMS Adaptive Digital Filters: Part Ⅰ- Realization Structures (시간영역 및 주파수영역 블럭적응 여파기에 관한 연구 : 제1부- 구현방법)

  • Lee, Jae-Chon;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea, v.7 no.4, pp.31-53, 1988
  • In this work, we study extensively the structures and performance characteristics of block least mean-square (BLMS) adaptive digital filters (ADF's) that can be realized efficiently using the fast Fourier transform (FFT). The weights of a BLMS ADF realized using the FFT can be adjusted either in the time domain or in the frequency domain, leading to the time-domain BLMS (TBLMS) algorithm or the frequency-domain BLMS (FBLMS) algorithm, respectively. In Part Ⅰ of the paper, we first present new results on the overlap-add realization and the number-theoretic transform realization of the FBLMS ADF's. Then, we study how to incorporate different frequency weightings on the error signals and self-orthogonalization of the weight adjustment in the FBLMS ADF's, and also in the TBLMS ADF's. As a result, we show that the TBLMS ADF can also be made to converge as fast as the self-orthogonalizing FBLMS ADF. Next, based on the properties of the sectioning operations in weight adjustment, we discuss unconstrained FBLMS algorithms that save two FFT operations in both the overlap-save and overlap-add realizations. Finally, we investigate by computer simulation the effects of different parameter values and different algorithms on the convergence behavior of the FBLMS and TBLMS ADF's. In Part Ⅱ of the paper, we will analyze the convergence characteristics of the TBLMS and FBLMS ADF's.
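
For orientation, the block LMS update that the FFT-based realizations compute can be written directly in the time domain. The sketch below is a plain reference form under assumed conventions (zero initial weights, step size `mu` applied to the gradient summed over each block); it does not reproduce the paper's overlap-save, overlap-add, or number-theoretic transform structures.

```python
def block_lms(x, d, num_taps, block_len, mu):
    """Block LMS: weights stay fixed within a block and are updated once per block."""
    w = [0.0] * num_taps
    errors = []
    for start in range(0, len(x) - block_len + 1, block_len):
        grad = [0.0] * num_taps
        for n in range(start, start + block_len):
            # Tap-input vector [x[n], x[n-1], ...], zero before the signal starts.
            u = [x[n - i] if n - i >= 0 else 0.0 for i in range(num_taps)]
            e = d[n] - sum(wi * ui for wi, ui in zip(w, u))
            errors.append(e)
            for i in range(num_taps):
                grad[i] += e * u[i]
        # One weight update per block using the accumulated gradient estimate.
        w = [wi + mu * gi for wi, gi in zip(w, grad)]
    return w, errors
```

In a system-identification setting, `x` is the input to an unknown FIR system and `d` is its output; the returned weights then approximate the unknown impulse response.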

Impact Monitoring of Composite Structures using Fiber Bragg Grating Sensors (광섬유 브래그 격자 센서를 이용한 복합재 구조물의 충격 모니터링 기법 연구)

  • Jang, Byeong-Wook;Park, Sang-Oh;Lee, Yeon-Gwan;Kim, Chun-Gon;Park, Chan-Yik;Lee, Bong-Wan
    • Composites Research, v.24 no.1, pp.24-30, 2011
  • Low-velocity impact can cause various types of damage that are mostly hidden inside the laminates or occur on the opposite side. Such damage therefore cannot be easily detected by visual inspection or conventional NDT systems. If it occurs between scheduled NDT inspections, the likelihood of extensive damage or structural failure increases. For these reasons, built-in NDT systems such as real-time impact monitoring systems will be required in the near future. In this paper, we studied an impact monitoring system consisting of impact location detection and damage assessment techniques for composite flat and stiffened panels. To acquire the impact-induced acoustic signals, four multiplexed FBG sensors and a high-speed FBG interrogator were used. Neural networks and wavelet transforms were adopted to detect impacts and damage occurrence. Finally, these algorithms were implemented in MATLAB and LabVIEW to provide a user-friendly interface.

Corpus-based Korean Text-to-speech Conversion System (콜퍼스에 기반한 한국어 문장/음성변환 시스템)

  • Kim, Sang-hun;Park, Jun;Lee, Young-jik
    • The Journal of the Acoustical Society of Korea, v.20 no.3, pp.24-33, 2001
  • This paper describes a baseline implementation of a corpus-based Korean TTS system. Conventional TTS systems that use a small speech database still generate machine-like synthetic speech. To overcome this problem, we introduce a corpus-based TTS system that can generate natural synthetic speech without prosodic modifications. The corpus should contain the natural prosody of the source speech and multiple instances of each synthesis unit. To build phone-level synthesis units, we train a speech recognizer on the target speech and then perform automatic phoneme segmentation. We also detect fine pitch periods using laryngograph signals, which are used for prosodic feature extraction. For break strength allocation, four levels of break indices are assigned according to pause length and attached to phones to reflect prosodic variations at phrase boundaries. To predict break strength from text, we use statistical information on POS (part-of-speech) sequences. The best triphone sequences are selected by a Viterbi search that minimizes the accumulated Euclidean distance of the concatenation distortion. To obtain high-quality synthetic speech suitable for commercial use, we introduce a domain-specific database. Adding a domain-specific database to the general-domain database greatly improves the quality of synthetic speech in that domain. In subjective evaluation, the new Korean corpus-based TTS system shows better naturalness than the conventional demisyllable-based one.
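
The unit-selection step can be illustrated with a small Viterbi search. This is a sketch under stated assumptions: each candidate unit is represented only by hypothetical feature vectors at its left and right edges, and the cost is the accumulated Euclidean distance between the right edge of one unit and the left edge of the next. The actual features and any target costs used in the paper are not specified in the abstract.

```python
from math import dist

def select_units(candidates):
    """Pick one candidate per position minimizing accumulated concatenation distance.

    candidates[t] is a list of (left_edge, right_edge) feature tuples for position t.
    Returns (chosen indices, total cost).
    """
    cost = [0.0] * len(candidates[0])
    back = []
    for t in range(1, len(candidates)):
        new_cost, pointers = [], []
        for left, _ in candidates[t]:
            # Cost of reaching this unit from each candidate at the previous position.
            options = [cost[j] + dist(prev_right, left)
                       for j, (_, prev_right) in enumerate(candidates[t - 1])]
            j_best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[j_best])
            pointers.append(j_best)
        cost = new_cost
        back.append(pointers)
    # Trace the best path backwards from the cheapest final candidate.
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path, total
```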

A Study on Intelligent Mobility Enhancement System for the Mobility Handicapped (첨단 교통약자 보호시스템에 대한 연구)

  • Han, Woong-Gu;Shin, Kang-Won;Choi, Kee-Choo;Kim, Nam-Sun;Sohn, Sang-Hyun
    • The Journal of The Korea Institute of Intelligent Transport Systems, v.9 no.5, pp.25-37, 2010
  • This study aims to enhance the mobility rights of the transportation underprivileged, which have received relatively little attention compared with those of the general public. To this end, we proposed building an ITS (Intelligent Traffic System) and improving user satisfaction through test operation of its main system. The existing sound signal device for the visually handicapped is difficult to manage, because maintenance staff had to visit each site in person to repair it whenever it failed. Moreover, it could not provide continuous voice guidance, and visually handicapped users had to operate it themselves, either by locating and pressing the buttons installed on the traffic light pole or by using a remote controller. To address these inconveniences, we created a new type of sound signal device for the visually handicapped by applying cutting-edge wireless technology and ergonomic design that considers actual road conditions. This technology enables the device to report the status of the signal device and the traffic light to users automatically through its voice guidance system whenever they approach it. We introduced it in a couple of test areas and found that users recognized the traffic situation more conveniently and safely than with the existing sound signal device, with above-average satisfaction. In addition, we provided an LTS (Location Tracking System, a location-based service intended for elementary school students) using the existing wireless infrastructure and found that about 87% of parents were satisfied with the LTS-based service.

Study on the Design of S/PDIF IC which Can Operate without PLL (PLL없이 동작하는 S/PDIF IC 설계에 관한 연구)

  • Park, Ju-Sung;Kim, Suk-Chan;Kim, Kyoung-Soo
    • The Journal of the Acoustical Society of Korea, v.24 no.1, pp.11-20, 2005
  • In this paper, we present research on an S/PDIF (Sony Philips Digital Interface) receiver that can operate without PLL (Phase Locked Loop) circuits. Although S/PDIF receivers are used in most audio devices and audio processors today, there has been little domestic research on S/PDIF. Commercial DACs (Digital-to-Analog Converters) that can decode S/PDIF signals contain a PLL circuit. The PLL makes it possible to extract clock information from the S/PDIF digital signal and to synchronize a clock signal with the input signal. However, because it is an analog circuit, the PLL causes many difficulties in designing SOCs (Systems on Chip) in VLSI (Very Large Scale Integrated Circuits). We propose an S/PDIF receiver that has no PLL circuits and consists only of pure digital circuits. The key idea of the proposed receiver is to use the ratio between a 16 MHz basic input clock and the S/PDIF signals. By decoding hundreds of thousands of S/PDIF inputs, we showed that an S/PDIF receiver can be designed with pure digital circuits and without any analog circuits such as PLLs. We are confident that the proposed S/PDIF receiver can be used as an IP (Intellectual Property) block for the SOC design of digital circuits.
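
One way to picture PLL-free decoding is to sample the line with a fixed clock and classify pulse widths by the known ratio of sampling clock to signal rate. The sketch below decodes a biphase-mark stream (the line code used by S/PDIF) this way. It is a generic illustration, not the paper's circuit: the function names and the `samples_per_half` parameter are hypothetical, and S/PDIF preambles, which deliberately violate the code, are ignored.

```python
def run_lengths(samples):
    """Lengths of consecutive runs of identical sample values."""
    if not samples:
        return []
    runs, count = [], 1
    for prev, cur in zip(samples, samples[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs

def decode_biphase_mark(samples, samples_per_half):
    """Decode bits from oversampled biphase-mark data using pulse widths only."""
    threshold = 1.5 * samples_per_half  # midway between half-cell and full-cell widths
    runs = run_lengths(samples)
    bits, i = [], 0
    while i < len(runs):
        if runs[i] >= threshold:          # one long pulse: bit 0
            bits.append(0)
            i += 1
        elif i + 1 < len(runs) and runs[i + 1] < threshold:
            bits.append(1)                # two short pulses: bit 1
            i += 2
        else:
            break                         # unpaired short pulse: stop decoding
    return bits

def encode_biphase_mark(bits, samples_per_half=4, level=0):
    """Helper for testing: the level toggles at every cell start and mid-cell for a 1."""
    out = []
    for b in bits:
        level ^= 1
        out += [level] * samples_per_half
        if b:
            level ^= 1
        out += [level] * samples_per_half
    return out
```

Because classification depends only on pulse width relative to `samples_per_half`, the ratio need not be an integer, as long as short and long pulses fall on opposite sides of the threshold.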

Continuous Speech Recognition based on Parametric Trajectory Segmental HMM (모수적 궤적 기반의 분절 HMM을 이용한 연속 음성 인식)

  • 윤영선;오영환
    • The Journal of the Acoustical Society of Korea, v.19 no.3, pp.35-44, 2000
  • In this paper, we propose a new trajectory model for characterizing segmental features and their interaction based on a general framework of hidden Markov models. Each segment, a sequence of vectors, is represented by a trajectory of observed sequences. This trajectory is obtained by applying a new design matrix that includes transitional information on contiguous frames, and is characterized as a polynomial regression function. To apply the trajectory to the segmental HMM, the frame features are replaced with the trajectory of a given segment. We also propose the likelihood of a given segment and the estimation of trajectory parameters. The observation probability of a given segment is represented as the relation between the segment likelihood and the estimation error of the trajectories. The estimation error of a trajectory is treated as the weight of the likelihood of a given segment in a state. This weight represents the probability of how well the corresponding trajectory characterizes the segment. The proposed model can be regarded as a generalization of a conventional HMM and a parametric trajectory model. Experimental results are reported on the TIMIT corpus, and performance is shown to improve significantly over that of the conventional HMM.
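
A polynomial regression trajectory of the kind mentioned here can be fitted by least squares. The sketch below uses the standard polynomial design matrix over normalized time and returns the mean squared estimation error. It is only an illustration: the abstract's new design matrix with transitional information is not specified, so the plain polynomial form and all function names are assumptions.

```python
def design_matrix(num_frames, order):
    """Rows [1, t, t^2, ...] with time t normalized to [0, 1] across the segment."""
    if num_frames == 1:
        times = [0.0]
    else:
        times = [n / (num_frames - 1) for n in range(num_frames)]
    return [[t ** r for r in range(order + 1)] for t in times]

def solve(a, b):
    """Solve a @ X = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [[v / aug[i][i] for v in aug[i][n:]] for i in range(n)]

def fit_trajectory(frames, order=2):
    """Least-squares polynomial trajectory for a segment of feature vectors.

    Returns (coefficients[r][d], mean squared estimation error).
    """
    z = design_matrix(len(frames), order)
    dims = len(frames[0])
    # Normal equations: (Z^T Z) B = Z^T C
    ztz = [[sum(row[i] * row[j] for row in z) for j in range(order + 1)]
           for i in range(order + 1)]
    ztc = [[sum(z[n][i] * frames[n][d] for n in range(len(frames)))
            for d in range(dims)] for i in range(order + 1)]
    coeffs = solve(ztz, ztc)
    err = 0.0
    for n, frame in enumerate(frames):
        for d in range(dims):
            pred = sum(z[n][r] * coeffs[r][d] for r in range(order + 1))
            err += (frame[d] - pred) ** 2
    return coeffs, err / (len(frames) * dims)
```

The segment needs more frames than the polynomial order for the normal equations to be solvable.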

An Adaptive Time Delay Estimation Method Based on Canonical Correlation Analysis (정준형 상관 분석을 이용한 적응 시간 지연 추정에 관한 연구)

  • Lim, Jun-Seok;Hong, Wooyoung
    • The Journal of the Acoustical Society of Korea, v.32 no.6, pp.548-555, 2013
  • Source localization has numerous applications. To estimate the position of a source, the relative delay between two or more received signals of the direct signal must be determined. Although the generalized cross-correlation method is the most popular technique, an approach based on eigenvalue decomposition (EVD), which uses the eigenvector of the minimum eigenvalue, is also popular. The performance of the EVD-based method degrades at low SNR and in correlated-noise environments, because it is difficult to select a single eigenvector for the minimum eigenvalue. In this paper, we propose a new adaptive algorithm based on canonical correlation analysis (CCA) to extend the operating range to lower SNR and correlated-noise environments. The proposed algorithm uses the eigenvector corresponding to the maximum eigenvalue in the generalized eigenvalue decomposition (GEVD). The estimated eigenvector contains all the information needed for time delay estimation. Simulations with uncorrelated and correlated noise at several SNRs show that the CCA-based algorithm estimates time delays more accurately than the adaptive EVD algorithm.
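
As a point of reference for the cross-correlation baseline mentioned in this abstract, a delay can be estimated from the peak of the plain (unweighted) cross-correlation. The sketch below shows only that baseline; it is not the proposed CCA/GEVD algorithm, and the function name and lag convention are assumptions.

```python
import math

def estimate_delay(x, y, max_lag):
    """Return the lag d in [-max_lag, max_lag] that maximizes sum_n x[n] * y[n + d].

    A positive result means y is approximately x delayed by d samples.
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(x[n] * y[n + lag]
                    for n in range(len(x)) if 0 <= n + lag < len(y))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```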

Surficial Sediment Classification using Backscattered Amplitude Imagery of Multibeam Echo Sounder(300 kHz) (다중빔 음향 탐사시스템(300 kHz)의 후방산란 자료를 이용한 해저면 퇴적상 분류에 관한 연구)

  • Park, Yo-Sup;Lee, Sin-Je;Seo, Won-Jin;Gong, Gee-Soo;Han, Hyuk-Soo;Park, Soo-Chul
    • Economic and Environmental Geology, v.41 no.6, pp.747-761, 2008
  • To test acoustic remote classification of seabed sediment, we acquired ground-truth data (video, grab samples, etc.) and developed a post-processing procedure for automatic classification based on 300 kHz multibeam echo sounder (MBES) backscattering data, acquired with a Kongsberg Simrad EM3000 at Sock-Cho Port, East Sea of South Korea. The sonar signal and its classification performance were verified against geo-referenced video imagery with the aid of GIS (Geographic Information System). The depth of the research site ranged from 5 m to 22.7 m, and the backscattering amplitude ranged from -36 dB to -15 dB. The mean grain size of sediment from equidistant sampling sites (50 m interval) varied from 2.86 $\phi$ to 0.88 $\phi$. To identify the main features for seabed classification from the MBES backscattering amplitude, we evaluated the correlation between the backscattering amplitude and the properties of the sediment samples. The performance of the proposed remote seabed classification was evaluated by comparing the human expert segmentation with the results of the automatic algorithm. The cross-model perception error ratio of the automatic classification algorithm is 8.95% for rocky bottoms and 2.06% for the area with low mean grain size.