Search | Korea Science

Speech Recognition by Integrating Audio, Visual and Contextual Features Based on Neural Networks (신경망 기반 음성, 영상 및 문맥 통합 음성인식)

김명원;한문성;이순신;류정우
- Journal of the Institute of Electronics Engineers of Korea CI / v.41 no.3 / pp.67-77 / 2004
Recent research has focused on fusing audio and visual features for reliable speech recognition in noisy environments. In this paper, we propose a neural-network-based model for robust speech recognition that integrates audio, visual, and contextual information. The Bimodal Neural Network (BMNN) is a four-layer multilayer perceptron in which each layer performs a certain level of abstraction of the input features. In BMNN, the third layer combines the audio and visual features of speech to compensate for the loss of audio information caused by noise. To further improve recognition accuracy in noisy environments, we also propose post-processing based on contextual information, namely the sequential patterns of words spoken by a user. Experimental results show that our model outperforms single-modality models. In particular, when contextual information is used, recognition accuracy exceeds 90% even in noisy environments, a significant improvement over the state of the art in speech recognition. Our research demonstrates that diverse sources of information need to be integrated to improve speech recognition accuracy, particularly in noisy environments.
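The abstract describes BMNN only at the level of its layers: four layers, with audio and visual features combined in the third, followed by post-processing over word sequences. The sketch below is a minimal forward pass under those constraints, in plain Python. The layer sizes, sigmoid activations, random untrained weights, the `rescore` helper, and the bigram table standing in for "contextual information" are all hypothetical choices for illustration, not the authors' configuration or training procedure.

```python
import math
import random

def dense(x, w, b, act):
    # one fully connected layer: act(W x + b)
    return [act(sum(wi * xi for wi, xi in zip(row, x)) + bi) for row, bi in zip(w, b)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def init(n_out, n_in, rng):
    # small random weights and zero biases (untrained, illustration only)
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

class BimodalMLP:
    """Hypothetical 4-layer layout: L1 inputs, L2 per-modality hidden,
    L3 audio-visual fusion, L4 word posteriors."""

    def __init__(self, n_audio, n_visual, n_hidden, n_fused, n_words, seed=0):
        rng = random.Random(seed)
        self.wa, self.ba = init(n_hidden, n_audio, rng)
        self.wv, self.bv = init(n_hidden, n_visual, rng)
        self.wf, self.bf = init(n_fused, 2 * n_hidden, rng)
        self.wo, self.bo = init(n_words, n_fused, rng)

    def forward(self, audio, visual):
        ha = dense(audio, self.wa, self.ba, sigmoid)        # layer 2, audio branch
        hv = dense(visual, self.wv, self.bv, sigmoid)       # layer 2, visual branch
        fused = dense(ha + hv, self.wf, self.bf, sigmoid)   # layer 3: concatenate and fuse
        logits = dense(fused, self.wo, self.bo, lambda z: z)
        return softmax(logits)                              # layer 4: word posteriors

def rescore(probs, prev_word, bigram, alpha=0.5):
    # combine network posteriors with P(word | previous word) in the log domain
    scores = [math.log(p + 1e-12) + alpha * math.log(bigram.get((prev_word, w), 1e-6))
              for w, p in enumerate(probs)]
    return max(range(len(scores)), key=scores.__getitem__)

model = BimodalMLP(n_audio=12, n_visual=6, n_hidden=8, n_fused=5, n_words=3)
posteriors = model.forward([0.1] * 12, [0.2] * 6)
```

Because the weights are untrained, the posteriors carry no meaning; the point is the data flow, in which the audio and visual branches stay separate until layer 3 and the word-sequence prior is applied only after the network.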

Effects of Elastic Band-Resistive Exercise using Audio-visual Medium on Pain, Proprioceptive Sense, and Motor Function in Adult Females with Chronic Neck and Shoulder Pain (만성 목-어깨 통증이 있는 여성 성인에게 시청각 매체를 활용한 탄력밴드 저항운동이 통증, 고유수용성 감각과 운동기능에 미치는 영향)

Nam Gi Lee;Jeong-Woo Lee
- Journal of Korean Physical Therapy Science / v.31 no.1 / pp.33-45 / 2024
Background: This study aimed to investigate the effect of elastic band-resistive exercise using an audio-visual medium on pain, proprioception, and motor function in adults with chronic neck and shoulder pain. Design: One-group pretest-posttest follow-up experimental design. Method: Twenty adult women with neck and shoulder pain voluntarily participated in this study. Elastic band-resistive exercise using an audio-visual medium, including cervical flexion and extension, shoulder external rotation, and scapular retraction-protraction, was performed 5 times a week for 3 weeks. The Numerical Rating Scale and a pressure threshold tool were used to assess subjective pain level and tenderness threshold (pain); a CROM goniometer was used to assess joint position sense error (proprioception) and joint range of motion; and ImageJ software was used to assess postural alignment. Joint range of motion and postural alignment served as measures of motor function. Results: Pain intensity, pain threshold, and joint position sense error decreased significantly after the intervention, whereas joint range of motion increased significantly. Postural alignment, including forward head posture and rounded shoulders, also improved significantly after the intervention. Conclusions: We suggest that elastic band-resistive exercise delivered through an audio-visual medium can help prevent and manage pain and physical dysfunction in individuals with chronic neck and shoulder pain, and that it could support the development of online health-management education content.
https://doi.org/10.26862/jkpts.2024.03.31.1.33

Study of DRM Application for the Portable Digital Audio Device (휴대용 디지털 오디오 기기에서의 DRM 적용에 관한 연구)

Cho, Nam-Kyu;Lee, Dong-Hwi;Lee, Dong-Chun;J. Kim, Kui-Nam;Park, Sang-Min
- Convergence Security Journal / v.6 no.4 / pp.21-27 / 2006
With the spread of music sharing over high-speed internet and of portable digital audio players, the digitization of sound sources has expanded rapidly, while sales and distribution in the former offline market have stagnated. In addition, copyright infringement through illegal reproduction and distribution of digitized sound sources has become a serious problem. To address these problems, DRM technology for protecting content and copyrights began to be introduced in portable digital audio devices. However, because existing DRM was designed for environments with fast CPUs and network connectivity, it was difficult to apply directly to devices such as portable digital audio players, which have small screens, low processing speed, and limited network functionality, and onto which content is downloaded through a PC. In this study, a DRM structural model that maintains a security level similar to that of the PC environment under the limited hardware conditions of portable digital audio devices is proposed and analyzed. The proposed model, which targets dedicated portable digital audio devices, showed much better results in terms of security and usability than the DRM structure of existing portable digital audio devices.

L2 Proficiency Effect on the Acoustic Cue-Weighting Pattern by Korean L2 Learners of English: Production and Perception of English Stops

Kong, Eun Jong;Yoon, In Hee
- Phonetics and Speech Sciences / v.5 no.4 / pp.81-90 / 2013
This study explored how Korean L2 learners of English use multiple acoustic cues (VOT and F0) in perceiving and producing the voicing contrast in English alveolar stops. Thirty-four 18-year-old high-school students participated. Their English proficiency was classified as either 'high' (HEP) or 'low' (LEP) according to their high school's standardized English levels. Thirty synthesized syllables, created by crossing a 6-step VOT continuum with a 5-step F0 continuum (6 × 5 = 30), were presented as audio stimuli. Listeners judged how close each stimulus was to L2 /t/ or /d/ using a visual analogue scale. L2 /d/ and /t/ productions collected from 22 of the learners (12 HEP, 10 LEP) were acoustically analyzed by measuring VOT and F0 at vowel onset. Results showed that LEP listeners were more sensitive to F0 in the stimuli than HEP listeners, suggesting that HEP listeners were better able to inhibit less important acoustic dimensions in L2 perception. The L2 production patterns also showed a group difference: HEP speakers used the VOT dimension (the primary cue in L2) more effectively than LEP speakers. Taken together, the results show that relative cue-weighting strategies in L2 perception and production are closely related to the learner's L2 proficiency, in that more proficient learners had better control in inhibiting and enhancing the relevant acoustic parameters.
https://doi.org/10.13064/KSSS.2013.5.4.081
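The 30-stimulus design and the idea of a relative cue weight can be illustrated with a small sketch. It assumes hypothetical continuum values (VOT 0 to 50 ms, F0 200 to 240 Hz) and defines the VOT weight as the share of the absolute standardized regression coefficients of ratings on VOT and F0; this is one common way to express cue weighting, not necessarily the analysis the authors ran. Because the grid is fully crossed, VOT and F0 are uncorrelated, so single-predictor slopes equal the multiple-regression coefficients.

```python
from statistics import fmean, pstdev

def slope(x, y):
    # ordinary least squares slope of y on a single predictor x
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def relative_vot_weight(vot, f0, ratings):
    # standardized coefficients (slope * SD of predictor); SD of ratings cancels in the ratio
    b_vot = slope(vot, ratings) * pstdev(vot)
    b_f0 = slope(f0, ratings) * pstdev(f0)
    return abs(b_vot) / (abs(b_vot) + abs(b_f0))

VOT_STEPS = [0, 10, 20, 30, 40, 50]   # ms, hypothetical 6-step continuum
F0_STEPS = [200, 210, 220, 230, 240]  # Hz, hypothetical 5-step continuum
grid = [(v, f) for v in VOT_STEPS for f in F0_STEPS]  # fully crossed: 6 x 5 = 30 stimuli

# Simulated listener who relies mostly on VOT (made-up response function).
vot = [v for v, _ in grid]
f0 = [f for _, f in grid]
ratings = [0.02 * v + 0.001 * f for v, f in grid]
weight = relative_vot_weight(vot, f0, ratings)
```

With this simulated listener the VOT weight comes out near 0.96; a listener who attends more to F0, as the abstract reports for LEP learners, would yield a lower value.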

Multi-modal Detection of Anchor Shot in News Video (다중모드 특징을 사용한 뉴스 동영상의 앵커 장면 검출 기법)

Yoo, Sung-Yul;Kang, Dong-Wook;Kim, Ki-Doo;Jung, Kyeong-Hoon
- Journal of Broadcast Engineering / v.12 no.4 / pp.311-320 / 2007
In this paper, an efficient algorithm for detecting anchor shots in news video is presented. We analyzed the audio-visual characteristics of news video and propose several low-level features suited to anchor shot detection. The proposed algorithm consists of three stages: pause detection, audio cluster classification, and matching with motion activity. We use audio features together with a motion feature to improve indexing accuracy, and simulation results show that the proposed algorithm performs satisfactorily.
https://doi.org/10.5909/JBE.2007.12.4.311
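The three-stage structure can be sketched as follows, under loudly stated assumptions. Pause detection is done here with a short-time energy threshold; audio cluster labels per segment are assumed to come from some earlier clustering step; and the anchor is assumed to be the speaker of the most frequent audio cluster, accepted only where motion activity is low. The thresholds, function names, and the "most frequent cluster" heuristic are hypothetical and are not taken from the paper.

```python
from collections import Counter

def detect_pauses(energies, threshold, min_len):
    """Return (start, end) frame ranges where short-time energy stays below
    threshold for at least min_len frames."""
    pauses, start = [], None
    for i, e in enumerate(energies):
        if e < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                pauses.append((start, i))
            start = None
    if start is not None and len(energies) - start >= min_len:
        pauses.append((start, len(energies)))
    return pauses

def segments_between(pauses, n_frames):
    # audio segments are the non-pause stretches between detected pauses
    segs, prev = [], 0
    for s, e in pauses:
        if s > prev:
            segs.append((prev, s))
        prev = e
    if prev < n_frames:
        segs.append((prev, n_frames))
    return segs

def anchor_segments(segments, cluster_labels, motion, motion_threshold):
    # hypothetical rule: anchor = most frequent audio cluster AND low motion activity
    dominant = Counter(cluster_labels).most_common(1)[0][0]
    return [seg for seg, lab, m in zip(segments, cluster_labels, motion)
            if lab == dominant and m < motion_threshold]

energies = [5, 5, 0.1, 0.1, 0.1, 5, 5, 5, 0.1, 0.1, 0.1, 5, 5]
pauses = detect_pauses(energies, threshold=1.0, min_len=2)
segments = segments_between(pauses, len(energies))
anchors = anchor_segments(segments, ["A", "B", "A"], [0.1, 0.9, 0.2], motion_threshold=0.5)
```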

Intrusion detection based on the sound field variation of audible frequency band (가청 주파수대 음장 변화 측정 기반 침입 감지 기술)

Lee, Sung-Q.;Park, Kang-Ho;Yang, Woo-Seok;Kim, Jong-Dae;Kim, Dae-Sung;Kim, Ki-Hyun;Wang, Se-Myung
- Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2010.10a / pp.187-192 / 2010
In this paper, an intrusion detection technique based on variation of the sound field in the audible frequency band within a secured space is proposed. A microphone can detect changes in the sound field formed by a sound source when an obstacle or intruder is present. The sound field variation caused by an intruder results from interference of sound waves. Using numerical simulations of sound field formation, the increase or decrease in sound pressure level is analyzed for both the obstacle and the intruder. Even when the microphone is positioned behind the source, the sound pressure level can increase or decrease due to interference. A frequency response test is performed with a Gaussian white noise signal to obtain the full frequency response from 0 Hz to half the sampling frequency. Three security cases are considered: case 1 is an empty space with and without an intruder, case 2 is a space with a blocking obstacle with and without an intruder, and case 3 is a space with a side-blocking obstacle with and without an intruder. In each case, the frequency response is measured first in the secured space without an intruder and then with an intruder. The experiments show that an intruder of size $50\,\mathrm{cm} \times 50\,\mathrm{cm}$ can be successfully detected with the proposed technique. Moreover, cases 2 and 3 produce larger sound field variations, which suggests that the proposed technique could provide more reliable security sensing in real situations.

Intrusion Detection Based on the Sound Field Variation of Audible Frequency Band (가청 주파수대 음장 변화 측정 기반 침입 감지 기술)

Lee, Sung-Q;Park, Kang-Ho;Yang, Woo-Seok;Kim, Jong-Dae;Kim, Dae-Sung;Kim, Ki-Hyun;Wang, Se-Myung
- Transactions of the Korean Society for Noise and Vibration Engineering / v.21 no.3 / pp.212-219 / 2011
In this paper, an intrusion detection technique based on variation of the sound field in the audible frequency band within a secured space is proposed. A microphone can detect changes in the sound field formed by a sound source when an obstacle or intruder is present. The sound field variation caused by an intruder is mainly due to interference of sound waves. Using numerical simulations of sound field formation, the increase or decrease in sound pressure level is analyzed for both the obstacle and the intruder. Even when the microphone is positioned behind the source, the sound pressure level can increase or decrease due to interference of sound waves. A frequency response test is performed with a Gaussian white noise signal to obtain the full frequency response from 0 Hz to half the sampling frequency. Three security cases are considered: case 1 is an empty space with and without an intruder, case 2 is a space with a blocking obstacle with and without an intruder, and case 3 is a space with a side-blocking obstacle with and without an intruder. In each case, the frequency response is measured first in the secured space without an intruder and then with an intruder. The experiments show that an intruder the size of a pillar 50 cm in diameter can be successfully detected with the proposed technique. Moreover, cases 2 and 3 produce larger sound field variations, which suggests that the proposed technique could provide more reliable security in real situations.
https://doi.org/10.5050/KSNVE.2011.21.3.212
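The detection principle, comparing a frequency response measured without an intruder against one measured with an intruder, can be sketched as below. This is a minimal sketch under stated assumptions: the magnitude response is estimated as |Y(k)|/|X(k)| from a single white-noise excitation and recording using a naive DFT, ignoring edge effects, and bins 0 to N/2 cover 0 Hz to half the sampling frequency. The decision rule, a mean absolute level change above a hypothetical 3 dB threshold, is an illustrative choice rather than the criterion used in the paper, and the "intruder" is simulated as a uniform 6 dB drop.

```python
import cmath
import math
import random

def dft(x):
    # naive O(N^2) discrete Fourier transform; adequate for short illustrative signals
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def magnitude_response(excitation, recorded):
    # |H(k)| = |Y(k)| / |X(k)| for bins 0..N/2 (0 Hz up to half the sampling frequency)
    X, Y = dft(excitation), dft(recorded)
    half = len(excitation) // 2 + 1
    return [abs(Y[k]) / max(abs(X[k]), 1e-12) for k in range(half)]

def intruder_present(baseline, current, threshold_db=3.0):
    # hypothetical rule: mean absolute per-bin level change (dB) exceeds threshold_db
    changes = [20.0 * math.log10(max(c, 1e-12) / max(b, 1e-12))
               for b, c in zip(baseline, current)]
    score = sum(abs(d) for d in changes) / len(changes)
    return score > threshold_db, score

# Simulated measurement: Gaussian white-noise excitation; with an "intruder"
# the recorded signal is halved (about -6 dB in every bin).
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(64)]
empty_room = magnitude_response(noise, noise)
with_intruder = magnitude_response(noise, [0.5 * s for s in noise])
detected, score = intruder_present(empty_room, with_intruder)
```

A real room would change the response unevenly across frequency, rising in some bins and falling in others as the abstracts describe, which is why the sketch scores absolute changes rather than signed ones.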

Improved 20Mb/s CMOS Optical Receiver for Digital Audio Interfaces (디지털 오디오 인터페이스용 개선된 20Mb/s CMOS 광수신기)

Yoo, Jae-Tack;Kim, Gil-Su
- Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.3 s.357 / pp.6-11 / 2007
This paper proposes CMOS optical receivers that reduce effective area and pulse width distortion (PWD) in high-definition digital audio interfaces. To reduce effective area and PWD, the proposed receivers include a trans-impedance amplifier (TIA) with dual outputs and a level shifter with threshold convergence, respectively. The proposed circuits were fabricated in a $0.25\,\mu\mathrm{m}$ CMOS process. Measurements show an effective area of $270 \times 120\,\mu\mathrm{m}^2$ and a PWD of $\pm 3\%$ for the receiver with the dual-output TIA, and an effective area of $410 \times 140\,\mu\mathrm{m}^2$ and a PWD of $\pm 2\%$ for the receiver with the threshold-convergence level shifter.
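To put the reported figures in concrete units, the arithmetic can be sketched as below. It assumes that PWD is expressed as the signed deviation of the output pulse width relative to an ideal width equal to one bit period at 20 Mb/s; the abstract does not state its exact PWD definition, so both the formula and the choice of ideal width are assumptions.

```python
def bit_period_ns(bit_rate_bps):
    # duration of one bit in nanoseconds
    return 1e9 / bit_rate_bps

def pwd_percent(measured_width_ns, ideal_width_ns):
    # assumed definition: signed pulse-width deviation relative to the ideal width
    return 100.0 * (measured_width_ns - ideal_width_ns) / ideal_width_ns

def area_um2(width_um, height_um):
    # rectangular effective area in square micrometres
    return width_um * height_um

period = bit_period_ns(20e6)              # 50 ns per bit at 20 Mb/s
tolerance_3pct = 0.03 * period            # +/-3 % of 50 ns, about 1.5 ns
dual_tia_area = area_um2(270, 120)        # 32,400 um^2
level_shifter_area = area_um2(410, 140)   # 57,400 um^2
```

Under these assumptions, the dual-output TIA receiver keeps pulse edges within roughly ±1.5 ns at about 56% of the area of the level-shifter design, which in turn reaches ±1 ns (±2%).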

Enhanced Pre echo Control Algorithm for MPEG Audio Coders (MPEG 오디오 부호화기를 위한 향상된 프리 에코 컨트롤 알고리듬)

Lee Chang-Joon;Lee Jae-Seong;Park Young-Cheol
- Journal of Broadcast Engineering / v.11 no.2 s.31 / pp.191-199 / 2006
This paper presents an efficient pre-echo control scheme for MPEG audio coders based on psychoacoustic model II (PAM-II). Pre-echo control is the final step in calculating the masking threshold in PAM-II; its purpose is to minimize the spreading of quantization error over the processing frame. In conventional encoders, pre-echo is reduced by restricting the estimated masking threshold so that it does not exceed the threshold obtained in the previous frame. The conventional method applies pre-echo control not only to short blocks but also to long blocks, which lowers the masking threshold in long blocks and, in turn, raises the quantization noise level in those blocks. This paper proposes a more efficient pre-echo control process. Test results show a mean improvement of more than 0.4 points on the ITU-R 5-point impairment scale, especially for complex signals.
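The conventional rule described above, in which each band's estimated masking threshold may not exceed a value derived from the previous frame, can be sketched as follows. The scaling factor `max_rise` (1.0 by default, matching the abstract's "not exceed the previous frame") and the `apply_to_long` switch are hypothetical parameters. The switch only illustrates the long-block side effect the abstract attributes to the conventional method; it is not the authors' proposed algorithm, which the abstract does not detail.

```python
def pre_echo_control(thr, thr_prev, block_type, max_rise=1.0, apply_to_long=True):
    """Clamp each band's masking threshold to max_rise times the previous frame's value.

    max_rise and apply_to_long are hypothetical parameters for illustration.
    """
    if block_type == "long" and not apply_to_long:
        # leave long-block thresholds untouched (illustration only, NOT the paper's method)
        return list(thr)
    return [min(t, max_rise * p) for t, p in zip(thr, thr_prev)]

current = [10.0, 1.0, 8.0]   # estimated thresholds per band (arbitrary units)
previous = [2.0, 2.0, 2.0]   # thresholds from the previous frame
print(pre_echo_control(current, previous, "short"))  # [2.0, 1.0, 2.0]
```

Applied to a long block, the same clamp lowers the thresholds of bands 1 and 3, which is the effect the abstract identifies as raising quantization noise in long blocks.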

Preprocessing method for enhancing digital audio quality in speech communication system (음성통신망에서 디지털 오디오 신호 음질개선을 위한 전처리방법)

Song Geun-Bae;Ahn Chul-Yong;Kim Jae-Bum;Park Ho-Chong;Kim Austin
- Journal of Broadcast Engineering / v.11 no.2 s.31 / pp.200-206 / 2006
This paper presents a preprocessing method that modifies the input audio signal of a speech coder so that the signal obtained at the decoder is enhanced. To this end, we introduce a noise suppression (NS) scheme and adaptive gain control (AGC), in which the audio input and its coding error are treated as a noisy signal and noise, respectively. The coding error is suppressed from the input, and the suppressed input is then level-aligned to the original input by the subsequent AGC operation. As a result, this preprocessing redistributes the spectral energy of the music input across the spectral domain so that the preprocessed music can be coded more effectively by the subsequent coder. As a drawback, the procedure requires an additional encoding pass to calculate the coding error. However, it provides a general formulation applicable to many existing speech coders. Preference listening tests indicated that the proposed approach significantly improves perceived music quality.
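The NS followed by AGC chain can be illustrated per frequency bin. This minimal sketch assumes the magnitude spectra of the input and of its coding error are already available (obtaining the error requires the extra encoding pass noted above). The suppression step uses a simple Wiener-style gain with a hypothetical floor of 0.1; this is a swapped-in technique and may differ from the authors' NS scheme. The AGC step rescales the result so that its total energy matches the input.

```python
import math

def ns_agc(input_mag, error_mag, floor=0.1):
    # NS: Wiener-style gain per bin, treating the coding error as noise (assumed technique)
    gains = [max(1.0 - (e * e) / (x * x), floor) if x > 0 else floor
             for x, e in zip(input_mag, error_mag)]
    suppressed = [g * x for g, x in zip(gains, input_mag)]
    # AGC: rescale so the total energy matches the original input
    e_in = sum(x * x for x in input_mag)
    e_out = sum(y * y for y in suppressed)
    scale = math.sqrt(e_in / e_out) if e_out > 0 else 0.0
    return [scale * y for y in suppressed]

input_mag = [4.0, 3.0, 1.0]   # hypothetical input magnitudes per bin
error_mag = [1.0, 0.0, 0.9]   # hypothetical coding-error magnitudes per bin
output = ns_agc(input_mag, error_mag)
```

Total energy is preserved, but it shifts away from bins where the coding error is large relative to the input (bin 3 here) toward bins the coder reproduces well, which is one way to read the abstract's "redistributed all over the spectral domain."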
