• Title/Summary/Keyword: visual-audio

XCRAB : A Content and Annotation-based Multimedia Indexing and Retrieval System (XCRAB :내용 및 주석 기반의 멀티미디어 인덱싱과 검색 시스템)

  • Lee, Soo-Chelo;Rho, Seung-Min;Hwang, Een-Jun
    • The KIPS Transactions:PartB / v.11B no.5 / pp.587-596 / 2004
  • In recent years, a new framework has been developed that aims to provide a unified, global approach to indexing, browsing and querying various kinds of digital multimedia data such as audio, video and images. This framework partitions each media stream into smaller units based on actual physical events, and these events within each media stream can then be indexed effectively for retrieval. In this paper, we present a new approach that exploits audio, image and video features to segment and analyze audio-visual data. Integrating audio and visual analysis can overcome the weakness of previous approaches that relied on image or video analysis alone. We implement a web-based multimedia data retrieval system called XCRAB and report its experimental results.

An Optimized e-Lecture Video Search and Indexing framework

  • Medida, Lakshmi Haritha;Ramani, Kasarapu
    • International Journal of Computer Science & Network Security / v.21 no.8 / pp.87-96 / 2021
  • The demand for e-learning through video lectures is rapidly increasing because of its many advantages over traditional learning methods, and this has produced massive volumes of web-based lecture videos. Indexing and retrieving a lecture video, or a topic within one, has therefore proved to be an exceptionally challenging problem. Many techniques in the literature were either visual- or audio-based, but not both. Since the visual and audio components are equally important for content-based indexing and retrieval, the current work addresses both. A framework for automatic topic-based indexing and search based on the innate content of the lecture videos is presented. Text is extracted from the slides using the proposed Merged Bounding Box (MBB) text detector, and text is extracted from the audio component using Google Speech Recognition (GSR) technology. This hybrid approach generates the indexing keywords from the merged transcripts of both the video and audio component extractors. Search within the indexed documents is optimized using Naïve Bayes (NB) classification and K-Means clustering models, so that results are retrieved by searching only the relevant document cluster within the predefined categories rather than the whole lecture video corpus. The work is carried out on a dataset generated by assigning categories to lecture video transcripts gathered from e-learning portals. Search performance is assessed in terms of accuracy and time taken. Finally, the accuracy of the proposed indexing technique is compared with that of the established chain indexing technique, showing an improvement.
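The idea of searching only the relevant part of the corpus can be illustrated with a small sketch. This is not the authors' implementation: it assumes whitespace-tokenized transcripts, uses a from-scratch multinomial Naive Bayes classifier with Laplace smoothing to route the query to one category, and then ranks only that category's transcripts by query-term frequency. The K-Means clustering step inside each category is omitted, and the class name `CategoryRoutedIndex` and the sample data are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


class CategoryRoutedIndex:
    """Hypothetical index: route a query to one category with multinomial
    Naive Bayes, then rank only that category's transcripts."""

    def __init__(self, docs: dict[str, tuple[str, str]]):
        # docs maps doc_id -> (category, transcript text)
        self.docs = {d: (cat, Counter(text.lower().split()))
                     for d, (cat, text) in docs.items()}
        self.cat_docs = defaultdict(list)    # category -> doc ids
        self.cat_terms = defaultdict(Counter)  # category -> term counts
        for doc_id, (cat, terms) in self.docs.items():
            self.cat_docs[cat].append(doc_id)
            self.cat_terms[cat].update(terms)
        self.vocab = set()
        for terms in self.cat_terms.values():
            self.vocab.update(terms)
        self.n_docs = len(self.docs)

    def predict_category(self, query: str) -> str:
        words = query.lower().split()
        best_cat, best_score = None, -math.inf
        for cat, terms in self.cat_terms.items():
            total = sum(terms.values())
            # Log prior plus Laplace-smoothed log likelihood of each query word.
            score = math.log(len(self.cat_docs[cat]) / self.n_docs)
            for w in words:
                score += math.log((terms[w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_cat, best_score = cat, score
        return best_cat

    def search(self, query: str) -> list[str]:
        # Only documents in the predicted category are scanned.
        cat = self.predict_category(query)
        words = query.lower().split()
        scored = [(sum(self.docs[d][1][w] for w in words), d)
                  for d in self.cat_docs[cat]]
        ranked = sorted(scored, key=lambda x: (-x[0], x[1]))
        return [d for s, d in ranked if s > 0]
```

For example, with two "os" transcripts and two "db" transcripts, a query such as "sql locking" is routed to the "db" category and only those two transcripts are scored. Routing first keeps the per-query cost proportional to the size of one category rather than the whole corpus, at the price of missing relevant documents if the classifier picks the wrong category.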

Development of a Real-Time Driving Simulator for Vehicle System Development and Human Factor Study (차량 시스템 개발 및 운전자 인자 연구를 위한 실시간 차량 시뮬레이터의 개발)

  • 이승준
    • Transactions of the Korean Society of Automotive Engineers / v.7 no.7 / pp.250-257 / 1999
  • Driving simulators are used effectively for human factor studies, vehicle system development and other purposes because they can reproduce actual driving conditions in a safe and tightly controlled environment. Interactive simulation requires appropriate sensory and stimulus cues for the driver. Sensory and stimulus feedback can include visual, auditory, motion and proprioceptive cues. In this study, a fixed-base driving simulator has been developed for vehicle system development and human factor studies. The simulator consists of improved and synergistic subsystems (a real-time vehicle simulation system, a visual/audio system and a control force loading system) based on the motion-base simulator KMU DS-Ⅰ, which was developed for the design and evaluation of a full-scale driving simulator and for driver-vehicle interaction studies.

The Influence of SOA between the Visual and Auditory Stimuli with Semantic Properties on Integration of Audio-Visual Senses -Focus on the Redundant Target Effect and Visual Dominance Effect- (의미적 속성을 가진 시.청각자극의 SOA가 시청각 통합 현상에 미치는 영향 -중복 표적 효과와 시각 우세성 효과를 중심으로-)

  • Kim, Bo-Seong;Lee, Young-Chang;Lim, Dong-Hoon;Kim, Hyun-Woo;Min, Yoon-Ki
    • Science of Emotion and Sensibility / v.13 no.3 / pp.475-484 / 2010
  • This study examined the influence of the SOA (stimulus onset asynchrony) between visual and auditory stimuli on audio-visual sensory integration. Within this integration phenomenon, we examined the redundant target effect (faster and more accurate responses to a target presented in two or more modalities) and the visual dominance effect (faster and more accurate responses to a visual stimulus than to an auditory stimulus) by constructing visual and auditory unimodal target conditions and a multimodal target condition and measuring response time and accuracy. As a result, no redundant target effect was present despite the change in SOA between the visual and auditory stimuli. The auditory dominance effect appeared when the SOA between the two stimuli exceeded 100 ms. These results imply that the redundant target effect is continuously maintained even when the SOA between the two modal stimuli is altered, and also suggest that the behavioral advantage of superior information processing can be deduced only when the onsets of the auditory and visual stimuli differ by more than approximately 100 ms.

Robust Feature Extraction Based on Image-based Approach for Visual Speech Recognition (시각 음성인식을 위한 영상 기반 접근방법에 기반한 강인한 시각 특징 파라미터의 추출 방법)

  • Gyu, Song-Min;Pham, Thanh Trung;Min, So-Hee;Kim, Jing-Young;Na, Seung-You;Hwang, Sung-Taek
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.3 / pp.348-355 / 2010
  • Despite advances in speech recognition technology, speech recognition in noisy environments is still a difficult task. To address this problem, researchers have proposed visual speech recognition methods that use visual information in addition to audio information. However, visual information is also affected by noise, just as audio information is, and this visual noise degrades visual speech recognition performance. Therefore, how to extract visual feature parameters that improve visual speech recognition performance is an active area of interest. In this paper, we propose an image-based method for extracting visual feature parameters that improves the recognition performance of an HMM-based visual speech recognizer. For the experiments, we constructed an audio-visual database of 105 speakers, each of whom uttered 62 words. We applied histogram matching, lip folding, RASTA filtering, a linear mask, DCT and PCA. The experimental results show that the proposed method improves recognition performance by about 21% over the baseline method.
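Two of the listed steps, lip folding and DCT, can be sketched on a grayscale lip region of interest (ROI). This is a minimal illustration under stated assumptions, not the paper's pipeline: "lip folding" is assumed here to mean averaging the ROI with its left-right mirror image to exploit lip symmetry, the 2-D DCT is the orthonormal DCT-II, and the feature vector is simply the low-frequency top-left block of coefficients. Histogram matching, RASTA filtering, the linear mask and PCA are not shown, and the function names are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of size n x n (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] /= np.sqrt(2.0)  # scale the DC row so the matrix is orthonormal
    return m


def lip_dct_features(roi: np.ndarray, block: int = 6) -> np.ndarray:
    """Hypothetical image-based lip feature: fold, 2-D DCT, keep low frequencies."""
    rows, cols = roi.shape
    if block > min(rows, cols):
        raise ValueError("block must not exceed the ROI size")
    # Assumed "lip folding": average the ROI with its left-right mirror image.
    folded = 0.5 * (roi + roi[:, ::-1])
    # Separable 2-D DCT-II: transform columns, then rows.
    coeffs = dct_matrix(rows) @ folded @ dct_matrix(cols).T
    # Low-frequency coefficients carry most of the coarse lip shape.
    return coeffs[:block, :block].ravel()
```

Because the folded ROI is mirror-symmetric, every coefficient with an odd horizontal frequency index is zero, so folding roughly halves the number of informative coefficients before any further reduction such as PCA.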

Research on Audiovisual Type Preservation Format Selection Criteria and Recommended Formats: Focusing on Audio Types (시청각 유형 보존포맷 선정기준 및 권고포맷 연구 - 오디오 유형을 중심으로 -)

  • Hanyeok Jeon;Dongmin Yang
    • Journal of the Korean BIBLIA Society for library and Information Science / v.35 no.1 / pp.273-300 / 2024
  • In the electronic records environment, alongside discussions on ways to digitize analog records, it is important to prepare preservation strategies for each type of record produced and received electronically. In the same context, the application of a preservation format selection system aimed at the long-term preservation of data sets and audio-visual electronic records, and not only document-type records, needs to be discussed. Audiovisual records require preservation strategies suited to the characteristics of each medium, such as images, audio and video. This study establishes distinct criteria for selecting preservation formats for audio-visual electronic records through an analysis of significant properties based on a literature review, composes evaluation items for the suitability of audio-type preservation formats, and proposes recommended formats based on the results of applying them.

Dimension-Reduced Audio Spectrum Projection Features for Classifying Video Sound Clips

  • Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.25 no.3E / pp.89-94 / 2006
  • For audio indexing and targeted search of specific audio or corresponding visual content, the MPEG-7 standard has adopted a sound classification framework in which dimension-reduced Audio Spectrum Projection (ASP) features are used to train continuous hidden Markov models (HMMs) for classifying various sounds. MPEG-7 employs Principal Component Analysis (PCA) or Independent Component Analysis (ICA) for the dimension reduction. Other well-established techniques include Non-negative Matrix Factorization (NMF), Linear Discriminant Analysis (LDA) and the Discrete Cosine Transform (DCT). In this paper, we compare the performance of different dimension reduction methods with Gaussian mixture models (GMMs) and HMMs in classifying video sound clips.
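The PCA variant of the dimension reduction can be sketched as follows. This is a simplified illustration, not the normative MPEG-7 extraction: it assumes the input is already a matrix of dB-scaled spectral frames (MPEG-7 derives these from its AudioSpectrumEnvelope on log-frequency bands), normalizes each frame to unit L2 norm, computes a PCA basis with an SVD of the centered frames, and projects onto the first `k` basis vectors. The function name `spectrum_projection` is hypothetical; the resulting features would then be fed to a GMM or HMM classifier, which is not shown.

```python
import numpy as np


def spectrum_projection(frames_db: np.ndarray, k: int):
    """Project dB-scaled spectral frames onto their first k principal components.

    frames_db: shape (n_frames, n_bins), one spectrum per row, in dB.
    Returns (features of shape (n_frames, k), basis of shape (n_bins, k)).
    """
    n_frames, n_bins = frames_db.shape
    if not 1 <= k <= min(n_frames, n_bins):
        raise ValueError("k must be between 1 and min(n_frames, n_bins)")
    # Normalize each frame so overall loudness does not dominate the basis.
    norms = np.linalg.norm(frames_db, axis=1, keepdims=True)
    normalized = frames_db / np.maximum(norms, 1e-12)
    # Center the frames; the right singular vectors are the PCA basis.
    centered = normalized - normalized.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k].T  # orthonormal columns, ordered by explained variance
    return centered @ basis, basis


rng = np.random.default_rng(0)
spectra = 20 * np.log10(np.abs(rng.standard_normal((100, 32))) + 1e-3)
features, basis = spectrum_projection(spectra, k=5)
```

Swapping PCA for ICA, NMF, LDA or DCT changes only how `basis` is obtained; the projection step stays the same, which is what makes the comparison in the paper possible.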

The Effect of Visual Cues in the Identification of the English Consonants /b/ and /v/ by Native Korean Speakers (한국어 화자의 영어 양순음 /b/와 순치음 /v/ 식별에서 시각 단서의 효과)

  • Kim, Yoon-Hyun;Koh, Sung-Ryong;Valerie, Hazan
    • Phonetics and Speech Sciences / v.4 no.3 / pp.25-30 / 2012
  • This study investigated whether native Korean listeners could use visual cues to identify the English consonants /b/ and /v/. Both audio-only and audiovisual tokens of minimal word pairs, with the target phonemes in word-initial or word-medial position, were used. Participants were asked to decide which consonant they heard under 2 × 2 conditions: cue (audio-only, audiovisual) and location (word-initial, word-medial). Mean identification scores were significantly higher in the audiovisual condition than in the audio-only condition, and in the word-initial condition than in the word-medial condition. In addition, sensitivity (d') and response bias (c) were calculated from hit rates and false alarm rates according to signal detection theory. These measures showed that the higher identification rate in the audiovisual condition was associated with an increase in sensitivity. There were no significant differences in response bias across conditions. This result suggests that native Korean speakers can use visual cues when identifying confusable non-native phonemic contrasts, and that visual cues can enhance non-native speech perception.
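The signal detection measures mentioned above follow standard formulas: with Z the inverse of the standard normal CDF, hit rate H and false alarm rate F, sensitivity is d' = Z(H) − Z(F) and response bias is c = −(Z(H) + Z(F)) / 2. The sketch below computes both with Python's `statistics.NormalDist`. It assumes, for illustration only, that a "hit" means responding /v/ to a /v/ token and a "false alarm" means responding /v/ to a /b/ token; the log-linear correction for rates of exactly 0 or 1 is one common choice, not necessarily the one used in the study, and no data from the study are reproduced here.

```python
from __future__ import annotations

from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def dprime_and_c(hit_rate: float, fa_rate: float) -> tuple[float, float]:
    """Return sensitivity d' and response bias c from hit and false alarm rates."""
    if not (0.0 < hit_rate < 1.0 and 0.0 < fa_rate < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")
    z_h, z_f = _z(hit_rate), _z(fa_rate)
    d_prime = z_h - z_f        # separation between signal and noise distributions
    c = -0.5 * (z_h + z_f)     # 0 = unbiased; negative = bias toward "signal"
    return d_prime, c


def log_linear(hits: int, misses: int, fas: int, crs: int) -> tuple[float, float]:
    """Log-linear corrected rates: add 0.5 to each count and 1 to each total."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (fas + 0.5) / (fas + crs + 1)
    return hit_rate, fa_rate
```

With symmetric rates such as H = 0.8 and F = 0.2, c is zero and d' is about 1.68; raising H while F stays fixed increases d', which is the pattern the study reports for the audiovisual condition.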

'EVE-Sound™' Toolkit for Interactive Sound in Virtual Environment (가상환경의 인터랙티브 사운드를 위한 'EVE-Sound™' 툴킷)

  • Nam, Yang-Hee;Sung, Suk-Jeong
    • The KIPS Transactions:PartB / v.14B no.4 / pp.273-280 / 2007
  • This paper presents a new 3D sound toolkit called EVE-Sound™ that consists of a pre-processing tool that simplifies the environment while preserving sound effects, and a 3D sound API for real-time rendering. It is designed to allow users to interact with complex 3D virtual environments through audio-visual modalities. The EVE-Sound™ toolkit serves two types of users: high-level programmers who need an easy-to-use sound API for developing realistic, audio-visually rendered 3D applications, and researchers in the 3D sound field who want to experiment with or develop new algorithms without rewriting all the required code from scratch. An interactive virtual environment application built with a sound engine constructed using the EVE-Sound™ toolkit demonstrates real-time audio-visual rendering performance and the applicability of EVE-Sound™ for building interactive applications with complex 3D environments.