Search | Korea Science

Multimodal audiovisual speech recognition architecture using a three-feature multi-fusion method for noise-robust systems

Sanghun Jeon;Jieun Lee;Dohyeon Yeo;Yong-Ju Lee;SeungJun Kim
- ETRI Journal / v.46 no.1 / pp.22-34 / 2024
Exposure to varied noisy environments impairs the recognition performance of artificial intelligence-based speech recognition technologies. Degraded-performance services can be utilized as limited systems that assure good performance in certain environments, but impair the general quality of speech recognition services. This study introduces an audiovisual speech recognition (AVSR) model robust to various noise settings, mimicking human dialogue recognition elements. The model converts word embeddings and log-Mel spectrograms into feature vectors for audio recognition. A dense spatial-temporal convolutional neural network model extracts features from log-Mel spectrograms, transformed for visual-based recognition. This approach exhibits improved aural and visual recognition capabilities. We assess the signal-to-noise ratio in nine synthesized noise environments, with the proposed model exhibiting lower average error rates. The error rate for the AVSR model using a three-feature multi-fusion method is 1.711%, compared to the general 3.939% rate. This model is applicable in noise-affected environments owing to its enhanced stability and recognition rate.
https://doi.org/10.4218/etrij.2023-0266
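
The abstract above reports evaluation across synthesized noise environments at different signal-to-noise ratios. As a rough illustration of how such test conditions are commonly built (not the authors' pipeline), the sketch below scales a noise recording so that the mixture reaches a chosen SNR. The function names, the 440 Hz tone, the Gaussian noise, and the 5 dB target are all assumptions made for the example.

```python
import math
import random


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so clean/noise power matches the target SNR, then add it.

    Returns (mixed, scaled_noise).
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    wanted_noise_power = mean_power(clean) / (10 ** (target_snr_db / 10))
    gain = math.sqrt(wanted_noise_power / mean_power(noise))
    scaled_noise = [gain * n for n in noise]
    mixed = [c + n for c, n in zip(clean, scaled_noise)]
    return mixed, scaled_noise


def measured_snr_db(clean, noise):
    """SNR in decibels between a clean signal and a noise signal."""
    return 10 * math.log10(mean_power(clean) / mean_power(noise))


# Hypothetical inputs: one second of a 440 Hz tone and seeded Gaussian noise at 16 kHz.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
mixed, scaled_noise = mix_at_snr(clean, noise, target_snr_db=5.0)
```

Repeating the call with different noise recordings and target values would give a grid of noisy test sets on which error rates can be compared.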

Implementing a Smart Space Service Testbed based on the Concept of Reconfigurable Spatial Functions (Reconfigurable Space 개념에 의한 스마트공간서비스 시나리오의 테스트베드 구현)

Cho, Yun-Jung;Kim, Sung-Ah
- 한국HCI학회:학술대회논문집 / 2009.02a / pp.855-861 / 2009
This paper presents the concept of dynamically reconfigurable space by introducing smart building components. Thanks to advances in ubiquitous computing and ICT, we can expect buildings in the near future to transform their appearance and states to perform specific functions. In other words, building space will actively reconfigure itself to accommodate users' needs once the proper technologies are acquired. Assuming that building components are not transformed through some magical process but instead change their physical states (e.g., transparency, illumination, display contents) and the functions of their embedded devices (e.g., audio, actuators, sensors), we can envision a dynamically reconfigurable smart space. To conceptualize such spaces, we critically surveyed current works of leading architects. When a room needs to serve a specific function, its components need to change their states or behave in a certain manner to create an optimum environment. Our model defines the relationships and elements that describe the mechanism of reconfigurable space. We expect this model to provide a conceptual guideline for developing smart building components based on spatial service scenarios. Consequently, future smart spaces implemented by integrating various technologies are not designed in a deterministic manner, so their spatial functions can expand without being constrained by physical existence.

Collaborative Authoring System using 3D Spatio-Temporal Space (삼차원 시.공간을 이용하는 프레젠테이션 공동저작 시스템)

이도형;성미영
- Journal of KIISE:Computing Practices and Letters / v.9 no.6 / pp.623-634 / 2003
In this paper, we propose a collaborative multimedia authoring system. Our authoring system represents a multimedia presentation in a 3D coordinate system. One axis represents the traditional timeline information (T-zone), and the other two axes represent spatial coordinates (XY-zone). Our system represents visual media objects as 3D parallelepipeds and audio media objects as cylinders. This interface allows simultaneous authoring and manipulation of both the temporal and the spatial aspects of a presentation. Using our system, users can design multimedia presentations collaboratively in the unified spatio-temporal space while freely traversing the spatial and temporal domains without changing the authoring context. In addition, we suggest an efficient concurrency control mechanism for shared objects generated by our collaborative authoring system. The mechanism is based mainly on user awareness, multiple versions, and access permissions of shared objects. It is designed to keep data consistent by minimizing collisions caused by network delays or failures, and to give users maximum responsiveness through optimistic concurrency control. The mechanism also maximizes responsiveness by refining the locking granularity and applying a different concurrency control mechanism at each level.
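
The abstract mentions optimistic concurrency control over versioned shared objects. The sketch below shows the general idea only, under the assumption that each shared object carries a single integer version; the class names, property names, and conflict handling are hypothetical and are not taken from the paper.

```python
class VersionConflict(Exception):
    """Raised when a commit is based on a stale version of the object."""


class SharedMediaObject:
    """A shared presentation object edited without locks (illustrative only)."""

    def __init__(self, properties):
        self.version = 0
        self.properties = dict(properties)

    def read(self):
        """Return (version, copy of properties) for local editing."""
        return self.version, dict(self.properties)

    def commit(self, base_version, new_properties):
        """Apply an edit only if nobody else committed since `base_version`."""
        if base_version != self.version:
            raise VersionConflict(
                f"edit based on v{base_version}, object is at v{self.version}"
            )
        self.properties = dict(new_properties)
        self.version += 1
        return self.version


# Two authors read the same clip, then both try to commit.
clip = SharedMediaObject({"x": 0, "y": 0, "start": 0.0, "duration": 5.0})
version_a, edit_a = clip.read()
version_b, edit_b = clip.read()

edit_a["x"] = 100
clip.commit(version_a, edit_a)  # first writer succeeds

edit_b["duration"] = 8.0
try:
    clip.commit(version_b, edit_b)  # second writer is working from a stale copy
    conflict = False
except VersionConflict:
    conflict = True
```

In this scheme edits never block each other; a conflict is detected only at commit time, and the losing author would re-read the object and reapply the change.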

Development of a Scalable Clustering A/V Server for the Internet Personal-Live Broadcasting (인터넷 개인 생방송을 위한 Scalable Clustering A/V Server 개발)

Lee, Sang-Moon;Kang, Sin-Jun;Min, Byung-Seok;Kim, Hag-Bae;Park, Jin-Bae
- The KIPS Transactions:PartC / v.9C no.1 / pp.107-114 / 2002
In recent years, rapid advances in computer systems and high-speed networks have popularized multimedia services among the various applications and services on the Internet. Internet live broadcasting, one such multimedia service, makes it possible to provide not only existing audio and video broadcasting services but also interactive communication, which expands the scope of applications by removing both temporal and spatial limitations. In this paper, an Internet personal live broadcasting server system is developed that allows individual users to actively create or join live broadcasting services with basic multimedia devices such as a PC camera and a sound card. As the number of broadcasters and participants increases, multiple concurrent channels are established and groups are expanded. The system should also guarantee High Availability (HA) for continuous service even in the presence of a partial failure of the cluster. Furthermore, transmission mode switching is supported to account for the network environment of the user system.
https://doi.org/10.3745/KIPSTC.2002.9C.1.107

DECODE: A Novel Method of DEep CNN-based Object DEtection using Chirps Emission and Echo Signals in Indoor Environment (실내 환경에서 Chirp Emission과 Echo Signal을 이용한 심층신경망 기반 객체 감지 기법)

Nam, Hyunsoo;Jeong, Jongpil
- The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.3 / pp.59-66 / 2021
Humans mainly recognize surrounding objects using visual and auditory information among the five senses (sight, hearing, smell, touch, taste). Recent object recognition research mainly focuses on analysis using image sensor information. In this paper, various chirp audio signals were emitted into the observation space, the echoes were collected through a 2-channel receiving sensor and converted into spectral images, and an object recognition experiment in 3D space was conducted using a deep learning-based image learning algorithm. The experiment was conducted under the noise and echoes of a general indoor environment rather than the ideal conditions of an anechoic room, and echo-based object recognition estimated the position of the object with 83% accuracy. In addition, by mapping the inference result to the observation space and a 3D sound spatial signal and outputting it as sound, it was possible to convey visual information through sound by learning 3D sound. These results suggest that object recognition research needs to use various echo information along with image information, and that this technology could be used for augmented reality through 3D sound.
https://doi.org/10.7236/JIIBC.2021.21.3.59
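
The abstract above describes emitting chirp audio signals into a space. As a minimal sketch of what such a probe signal looks like, the code below generates a linear chirp whose frequency sweeps from a start to an end value. The 2 kHz to 8 kHz sweep, the 10 ms duration, and the 48 kHz sample rate are assumptions chosen for the example and are not values from the paper.

```python
import math


def linear_chirp(f_start_hz, f_end_hz, duration_s, sample_rate_hz):
    """Samples of a unit-amplitude sine whose frequency rises linearly over time."""
    n_samples = round(duration_s * sample_rate_hz)
    sweep_rate = (f_end_hz - f_start_hz) / duration_s  # Hz per second
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz
        # Phase is the integral of the instantaneous frequency over time.
        phase = 2 * math.pi * (f_start_hz * t + 0.5 * sweep_rate * t * t)
        samples.append(math.sin(phase))
    return samples


def instantaneous_frequency(f_start_hz, f_end_hz, duration_s, t):
    """Frequency of the linear chirp at time t (seconds)."""
    return f_start_hz + (f_end_hz - f_start_hz) * t / duration_s


# Hypothetical probe: 10 ms sweep from 2 kHz to 8 kHz at a 48 kHz sample rate.
chirp = linear_chirp(2000.0, 8000.0, 0.01, 48000)
mid_frequency = instantaneous_frequency(2000.0, 8000.0, 0.01, 0.005)
```

In an echo-sensing setup, a signal like this would be played through a speaker and the recorded response converted to a spectrogram image before classification.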

Mobile Phone Guide for Cultural Heritage (문화유적지 투어를 위한 모바일 폰 가이드 시스템)

Suh, Young-Jung;Woo, Woon-Tack
- 한국HCI학회:학술대회논문집 / 2009.02a / pp.116-121 / 2009
In the design of mobile entertainment systems for historical heritage sites, it is important not only to overcome the technical challenges imposed by power requirements, computation limits, and connectivity, but also to support group experiences and consider users' preferences for situated media consumption. Cultural heritage sites provide an opportunity to entertain and educate the public through the use of mobile media. The proposed system, implemented on a Java-enabled mobile phone, provides both audio and visual content that is tailored by tracking user movement with GPS, collecting various user inputs and demographics, and allowing for socially acceptable eavesdropping via wireless networking. By designing for the spatial, personal, and social considerations of the environment, we aim to help users navigate the diverse topology of the space and consume the vast quantities of historical media.

Effective Method to Change Multimedia Scene Configuration Information Using DOM Update (DOM update를 이용한 효율적인 멀티미디어 장면 구성 정보 변경 방안)

Kim, Kyuheon;Park, JungWook;Kim, Byungchul
- Journal of Broadcast Engineering / v.18 no.1 / pp.43-58 / 2013
A rich media service is an interactive media service that can present various multimedia elements (such as video, audio, and text) at the same time. These elements can be delivered using scene description standards such as BIFS (Binary Format for Scenes) and LASeR (Lightweight Application Scene Representation). By providing scene composition information, a rich media service can support various multimedia services, so users can receive personalized services that fit their temporal and spatial options. In conventional technology, when the scene is changed by the user or the service, the mobile device deletes the existing scene configuration information and creates new scene configuration information, which is very inefficient. This paper proposes a scene change method that uses DOM (Document Object Model) updates to deliver only the dynamically changed part of the configuration.
https://doi.org/10.5909/JBE.2013.18.1.43
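
The contrast the abstract draws, replacing a whole scene description versus updating only the part that changed, can be illustrated with Python's standard `xml.dom.minidom`. The scene markup, element names, and the `apply_update` helper below are hypothetical; they are not BIFS or LASeR syntax and are not the paper's update format.

```python
from xml.dom.minidom import parseString

# Hypothetical scene description with three media elements.
SCENE_XML = (
    "<scene>"
    '<video id="v1" x="0" y="0" width="320" height="240"/>'
    '<audio id="a1" volume="0.8"/>'
    '<text id="t1" x="10" y="250">Now playing</text>'
    "</scene>"
)


def find_by_id(doc, element_id):
    """Return the first element whose id attribute matches, or None."""
    for node in doc.getElementsByTagName("*"):
        if node.getAttribute("id") == element_id:
            return node
    return None


def apply_update(doc, element_id, changes):
    """Change only the named attributes of one element, leaving the tree in place."""
    node = find_by_id(doc, element_id)
    if node is None:
        raise KeyError(element_id)
    for name, value in changes.items():
        node.setAttribute(name, str(value))
    return node


doc = parseString(SCENE_XML)
video_before = find_by_id(doc, "v1")

# The update message carries only the changed attributes of one element.
apply_update(doc, "v1", {"x": 40, "width": 160})
video_after = find_by_id(doc, "v1")
```

Because the existing node object is modified in place, the audio and text elements are never touched and nothing has to be torn down and rebuilt.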

A Study on Contents-based Retrieval using Wavelet (Wavelet을 이용한 내용기반 검색에 관한 연구)

강진석;박재필;나인호;최연성;김장형
- Journal of the Korea Institute of Information and Communication Engineering / v.4 no.5 / pp.1051-1066 / 2000
With recent advances in digital encoding technologies and computing power, large amounts of multimedia information such as images, graphics, audio, and video are widely used in multimedia systems over the Internet. As a result, diverse retrieval mechanisms are required for users to search for specific information stored in multimedia systems, and content-based retrieval is preferred over text-based keyword retrieval. In this paper, we propose a new content-based indexing and searching algorithm that aims for both high efficiency and high retrieval performance. To achieve these objectives, the proposed algorithm first classifies images through pre-processing consisting of edge extraction, range division, and multiple filtering, and then searches for target images using the spatial and textural characteristics of colors in an image, which are extracted in the pre-processing step. In addition, we describe simulation results of search requests and retrieval outputs for several company trademark images using the proposed wavelet-based, content-based retrieval algorithm.
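
The abstract does not say which wavelet or which distance measure is used. Purely as an illustration of how a wavelet decomposition can yield a compact descriptor for retrieval, the sketch below applies an unnormalized Haar transform to a row of pixel intensities and compares the coarse coefficients. The Haar choice, the L1 distance, and the sample rows are all assumptions.

```python
def haar_step(values):
    """One level of the unnormalized Haar transform: pairwise averages and differences."""
    if len(values) % 2:
        raise ValueError("length must be even")
    averages = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    details = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return averages, details


def haar_signature(values, levels):
    """Coarse averages left after `levels` Haar steps, used as a short descriptor."""
    approx = list(values)
    for _ in range(levels):
        approx, _details = haar_step(approx)
    return approx


def signature_distance(a, b):
    """L1 distance between two signatures of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical intensity rows: row_b is a slightly smoothed row_a, row_c is unrelated.
row_a = [4, 6, 10, 12, 8, 8, 2, 0]
row_b = [5, 5, 11, 11, 8, 8, 1, 1]
row_c = [0, 0, 0, 0, 12, 12, 12, 12]

sig_a = haar_signature(row_a, 2)
sig_b = haar_signature(row_b, 2)
sig_c = haar_signature(row_c, 2)
```

Fine detail is discarded by the averaging, so the two similar rows end up with the same signature while the unrelated row stays far away.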

An Improved Synthesis Method of Parametric Stereo Coding Based on Tonality Information (토널리티 정보를 기반으로 한 파라메트릭 스테레오 부호화의 개선된 합성 기법)

Lee, Tung chin;Park, Young-Cheol;Youn, Dae Hee
- Journal of the Institute of Electronics and Information Engineers / v.51 no.6 / pp.221-227 / 2014
In this paper, we propose a synthesis method that effectively suppresses the ambience that affects tonal components in the parametric stereo (PS) decoder. In the decoder, the ambience component is obtained with a decorrelation filter, and its weighting is determined by the inter-channel coherence (IC) parameter. However, because the parameters are extracted in the sub-band domain, a low IC value can be estimated even when a tonal component is dominant, which may degrade the quality of the output signal. To prevent this problem, tonality is measured in the downmixed signal, and the weighting of the ambience component is adjusted according to the measured tonality index. The performance of the proposed method was evaluated by simulations. In addition, a subjective test was performed, and the results confirmed that the proposed method offers improved quality.
https://doi.org/10.5573/ieie.2014.51.6.221
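
The abstract does not give the tonality measure or the adjustment rule. The sketch below is one plausible reading only: tonality is estimated from spectral flatness (a common tonality proxy), and an IC-derived ambience weight is scaled down as tonality rises. The `sqrt(1 - IC^2)` base weight, the `(1 - tonality)` scaling, and the example spectra are assumptions, not the paper's formulas.

```python
import math


def spectral_flatness(power_spectrum):
    """Geometric mean over arithmetic mean of band powers.

    Close to 1 for noise-like bands, close to 0 when one component dominates.
    """
    eps = 1e-12
    log_mean = sum(math.log(p + eps) for p in power_spectrum) / len(power_spectrum)
    arithmetic_mean = sum(power_spectrum) / len(power_spectrum)
    return math.exp(log_mean) / (arithmetic_mean + eps)


def tonality_index(power_spectrum):
    """Tonality in [0, 1], defined here as 1 minus spectral flatness (an assumption)."""
    return 1.0 - min(1.0, spectral_flatness(power_spectrum))


def ambience_weight(ic, tonality):
    """Hypothetical rule: weight grows as IC falls and shrinks as tonality rises."""
    base = math.sqrt(max(0.0, 1.0 - ic * ic))
    return base * (1.0 - tonality)


# Hypothetical band powers of the downmix: one noise-like band, one dominated by a tone.
noisy_band = [1.0] * 8
tonal_band = [100.0] + [0.001] * 7

low_ic = 0.2  # the same low coherence estimate in both cases
weight_noisy = ambience_weight(low_ic, tonality_index(noisy_band))
weight_tonal = ambience_weight(low_ic, tonality_index(tonal_band))
```

With the same low IC value, the noise-like band keeps most of its ambience while the tonal band receives almost none, which is the behavior the abstract describes qualitatively.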

Method of scalable video application in the advanced T-DMB (지상파 DMB 고도화 망에서의 스케일러블 비디오 부호화 기술)

Jun, Dong-San;Kwak, Sang-Min;Lim, Hyung-Soo;Choi, Hae-Chul;Kim, Jae-Gon;Lim, Jong-Soo;Hong, Jin-Woo
- Journal of the Institute of Electronics Engineers of Korea TC / v.44 no.1 / pp.1-9 / 2007
Digital Multimedia Broadcasting (DMB) is a next-generation broadcasting service that gives mobile users access to various digital multimedia content, i.e., audio, video, and data. However, due to bandwidth limitations, the spatial resolution is limited to CIF (Common Intermediate Format). Advanced Terrestrial DMB (AT-DMB) secures additional bandwidth by adopting hierarchical modulation transmission technology and provides a high data rate and quality for mobile multimedia broadcasting services with scalable video coding (SVC). This paper proposes scalable video coding technology for AT-DMB that enables high-quality mobile multimedia broadcasting services exceeding the quality and content capability of the current DMB service.
