• Title/Summary/Keyword: visual/audio system


New Interactive TV Service Model based on the MPEG-4 System

  • Kim, Jongho;Jechang Jeong
    • Proceedings of the IEEK Conference / 2002.07a / pp.125-128 / 2002
  • In this paper, a new interactive TV service model is proposed. The MPEG-4 system is specified for composing and managing various object streams, including user interactions. In our proposal, the data broadcasting model supporting user interactions is designed using the MPEG-4 system. We evaluate the feasibility of the proposed service model using a simulation player. This player supports an MPEG-2 TS that carries MPEG-2 video and AC-3 audio streams as the main service, MPEG-4 system data as interactive services, and user-specific EPG information, XML data, etc. as supplementary services. The player also supports a multi-channel environment. Synchronization between audio and visual data is achieved using the DTS and PTS in the TS.
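The abstract does not describe how the player reads its timestamps, so the following is only a minimal sketch of the general mechanism: in MPEG-2 Systems, a PTS or DTS is a 33-bit counter on a 90 kHz clock, stored in a 5-byte field of the PES header with interleaved marker bits. The function names are hypothetical, and counter wraparound is ignored for brevity.

```python
PTS_CLOCK_HZ = 90_000  # PTS/DTS values count ticks of a 90 kHz clock


def encode_timestamp(ts: int, prefix: int = 0b0010) -> bytes:
    """Pack a 33-bit timestamp into the 5-byte PES field (marker bits set to 1)."""
    ts &= (1 << 33) - 1
    return bytes([
        (prefix << 4) | (((ts >> 30) & 0x07) << 1) | 1,  # 4-bit prefix, bits 32..30
        (ts >> 22) & 0xFF,                               # bits 29..22
        (((ts >> 15) & 0x7F) << 1) | 1,                  # bits 21..15
        (ts >> 7) & 0xFF,                                # bits 14..7
        ((ts & 0x7F) << 1) | 1,                          # bits 6..0
    ])


def decode_timestamp(field: bytes) -> int:
    """Recover the 33-bit timestamp from the 5-byte PES field."""
    if len(field) != 5:
        raise ValueError("a PTS/DTS field is exactly 5 bytes long")
    return (
        (((field[0] >> 1) & 0x07) << 30)
        | (field[1] << 22)
        | ((field[2] >> 1) << 15)
        | (field[3] << 7)
        | (field[4] >> 1)
    )


def av_offset_seconds(audio_pts: int, video_pts: int) -> float:
    """Positive result: the audio unit is due later than the video unit."""
    return (audio_pts - video_pts) / PTS_CLOCK_HZ
```

A player could compare such offsets against its system clock to decide whether to delay or drop a frame; that policy is outside what the abstract states.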


Implementation of the Broadcasting System for Digital Media Contents (디지털 미디어 콘텐츠 방송 시스템 구현)

  • Shin, Jae-Heung;Kim, Hong-Ryul;Lee, Sang-Cheal
    • The Transactions of The Korean Institute of Electrical Engineers / v.57 no.10 / pp.1883-1887 / 2008
  • Most digital media content is composed of video, audio, picture, and animation information. The quality with which video and audio information is recognized can vary with the receiver's characteristics or understanding, whereas visual information presented as text provides the clearest and most accurate means of recognition for people. In this paper, we propose a new broadcasting system (BSDMC) that conveys the meaning of digital media content clearly and accurately. We implement general-purpose components that display video, pictures, text, and symbols simultaneously. By simply plugging these components into an application development tool and calling them with the proper parameters, a multimedia content broadcasting system can be developed easily. The components are implemented on an object-oriented framework with a modular structure, which increases reusability and allows other applications to be developed quickly and reliably.

Lip Reading Method Using CNN for Utterance Period Detection (발화구간 검출을 위해 학습된 CNN 기반 입 모양 인식 방법)

  • Kim, Yong-Ki;Lim, Jong Gwan;Kim, Mi-Hye
    • Journal of Digital Convergence / v.14 no.8 / pp.233-243 / 2016
  • Because speech recognition degrades in noisy environments, Audio Visual Speech Recognition (AVSR) systems, which combine speech information and visual information, have been proposed since the mid-1990s, and lip reading has played a significant role in AVSR systems. This study aims to improve the recognition rate of uttered words using only lip shape detection, for an efficient AVSR system. After preprocessing for lip region detection, Convolutional Neural Network (CNN) techniques are applied for utterance period detection and lip shape feature vector extraction, and Hidden Markov Models (HMMs) are then used for recognition. The utterance period detection achieves a 91% success rate, which is higher than that of general threshold methods. In lip reading recognition, the user-dependent experiment records a recognition rate of 88.5% and the user-independent experiment 80.2%, both improvements over previous studies.

Multimedia TIAV System

  • Beknazarova, Saida Safibullayevna
    • Journal of Multimedia Information System / v.2 no.4 / pp.295-302 / 2015
  • This article discusses the features and development trends of implementing multimedia systems in various fields, substantiates the basic concepts of multimedia systems and information flow, and describes the classification and characterization of information flows and systems. It also describes the TIAV container, which is designed with all the modern features and is aimed at future trends in the field of play.

MPEG-4 BIFS Optimization for Interactive T-DMB Content (지상파 DMB 컨텐츠의 MPEG-4 BIFS 최적화 기법)

  • Cha, Kyung-Ae
    • Journal of Korea Society of Industrial Information Systems / v.12 no.1 / pp.54-60 / 2007
  • The Digital Multimedia Broadcasting (DMB) system was developed to offer high-quality multimedia content in mobile environments. The system adopts the MPEG-4 standard for the main video, audio, and other media formats. To provide interactive content, it also adopts the MPEG-4 scene description, which specifies the spatio-temporal arrangement and behaviors of individual objects. The more interactive the content, the higher the bitrate the scene description needs. However, the bandwidth available for metadata such as the scene description is limited in the mobile environment. Moreover, the DMB terminal renders each media stream according to the scene description, so the Binary Format for Scenes (BIFS) stream corresponding to the scene description must be decoded and parsed before the media data is presented. Consequently, a transmission delay in the BIFS stream delays the presentation of the whole audio-visual scene, even though the audio or video streams are encoded at a very low bitrate. This paper presents an effective optimization technique that adapts the BIFS stream to the expected bitrate without wasting bandwidth and avoids transmission delays in the initial scene description for interactive DMB content.


Visual Telephone System of Differential Task Interrupt Method (차등 태스크 인터럽트 방식의 영상단말 시스템)

  • 박배욱;정하재;오창석
    • Journal of the Korea Institute of Information and Communication Engineering / v.6 no.5 / pp.739-746 / 2002
  • In this paper, a new visual telephone system with a differential task interrupt transfer feature for real-time video phone service is presented. Because interrupts are transferred at different speeds according to how time-critical each task is, the audio and video data streams flow at a constant rate; in other words, video phone services are carried out in real time. First, the ITU-T H.32x visual telephone recommendations are analyzed. Second, the causes of the shortcomings of existing systems, such as performance and quality, are investigated. Third, a design concept and ideas for resolving them are devised. Finally, a new visual telephone system architecture for real-time video phone service is designed, which solves the existing problems by means of the differential task interrupt transfer method.

Aural-visual two-stream based infant cry recognition (Aural-visual two-stream 기반의 아기 울음소리 식별)

  • Bo, Zhao;Lee, Jonguk;Atif, Othmane;Park, Daihee;Chung, Yongwha
    • Proceedings of the Korea Information Processing Society Conference / 2021.05a / pp.354-357 / 2021
  • Infants communicate their feelings and needs to the outside world through non-verbal methods such as crying and displaying diverse facial expressions. However, inexperienced parents tend to decode these non-verbal messages incorrectly and take inappropriate actions, which might affect the bonding they build with their babies and the cognitive development of the newborns. In this paper, we propose an aural-visual two-stream based infant cry recognition system to help parents comprehend the feelings and needs of crying babies. The proposed system first extracts features from the pre-processed audio and video data using the VGGish model and a 3D-CNN model, respectively, fuses the extracted features using a fully connected layer, and finally applies a SoftMax function to classify the fused features and recognize the corresponding type of cry. The experimental results show that the proposed system exceeds 0.92 in F1-score, which is 0.08 and 0.10 higher than the single-stream aural model and the single-stream visual model, respectively.
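The fusion step described in this abstract (concatenate the two feature vectors, pass them through a fully connected layer, apply SoftMax) can be illustrated with a small sketch. Everything concrete here is an assumption for illustration: the feature dimensions, the number of cry classes, and the random weights stand in for the trained VGGish and 3D-CNN outputs and the learned layer, none of which the abstract specifies.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax (subtract the max before exponentiating)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(audio_feat, video_feat, weights, bias):
    """Concatenate both streams, then apply the dense layer and softmax."""
    fused = list(audio_feat) + list(video_feat)
    return softmax(dense(fused, weights, bias))


# Placeholder sizes and random values (assumptions, not the paper's settings).
AUDIO_DIM, VIDEO_DIM, N_CLASSES = 128, 256, 3
rng = random.Random(0)
audio_feat = [rng.gauss(0, 1) for _ in range(AUDIO_DIM)]
video_feat = [rng.gauss(0, 1) for _ in range(VIDEO_DIM)]
weights = [[rng.gauss(0, 0.01) for _ in range(AUDIO_DIM + VIDEO_DIM)]
           for _ in range(N_CLASSES)]
bias = [0.0] * N_CLASSES

probs = fuse_and_classify(audio_feat, video_feat, weights, bias)
predicted_class = max(range(N_CLASSES), key=probs.__getitem__)
```

In a real system the weights would be learned jointly with, or on top of, the two feature extractors.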

Comparison of Integration Methods of Speech and Lip Information in the Bi-modal Speech Recognition (바이모달 음성인식의 음성정보와 입술정보 결합방법 비교)

  • 박병구;김진영;최승호
    • The Journal of the Acoustical Society of Korea / v.18 no.4 / pp.31-37 / 1999
  • Bimodal speech recognition using visual and audio information has been proposed and studied to improve the performance of ASR (Automatic Speech Recognition) systems in noisy environments. Methods for integrating the two modalities are usually classified into early integration and late integration. The early integration methods include one that uses a fixed weight for the lip parameters and one that uses a variable weight according to speech SNR information. The four late integration methods are a method that uses audio and visual information independently, a method that uses the speech optimal path, a method that uses the lip optimal path, and a method that uses speech SNR information. Among these six methods, the one using a fixed weight for the lip parameters showed a better recognition rate than the others.
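The two early integration variants named in this abstract can be sketched as feature-level weighting: the lip parameters are scaled by a weight and appended to the audio features before recognition. The abstract gives no formula, so the function names, the linear SNR-to-weight mapping, and its 0 to 30 dB range are all assumptions made for illustration.

```python
def early_integrate(audio_feat, lip_feat, lip_weight):
    """Append lip parameters, scaled by lip_weight, to the audio features."""
    return list(audio_feat) + [lip_weight * x for x in lip_feat]


def snr_to_lip_weight(snr_db, low_db=0.0, high_db=30.0):
    """Hypothetical mapping: rely more on the lips as the speech SNR drops.

    Returns 1.0 at or below low_db and 0.0 at or above high_db,
    with linear interpolation in between.
    """
    clamped = min(max(snr_db, low_db), high_db)
    return 1.0 - (clamped - low_db) / (high_db - low_db)


# Fixed-weight variant: the same lip weight for every utterance.
fixed = early_integrate([1.0, 2.0], [4.0], lip_weight=0.5)

# Variable-weight variant: the weight follows the estimated speech SNR.
noisy = early_integrate([1.0, 2.0], [4.0], snr_to_lip_weight(0.0))
clean = early_integrate([1.0, 2.0], [4.0], snr_to_lip_weight(30.0))
```

The combined vector would then be passed to a single recognizer, which is what distinguishes early integration from the late integration methods that combine separate audio and visual recognition results.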


Protocol Testing Methodology of DAVIC Standard (DAVIC 표준의 프로토콜 시험 방안 연구)

  • O, Haeng-Seok;Park, Gi-Sik;Lee, Sang-Ho
    • The Transactions of the Korea Information Processing Society / v.6 no.1 / pp.203-215 / 1999
  • With the recent rapid development of data communication products and the service industry, systems that deliver multimedia services to users, such as VoD (Video on Demand) and teleshopping, are being actively developed. However, if new products and services do not conform to international standards, they will lose their competitiveness in the market. It is therefore essential to perform conformance testing that takes the properties of the related protocols into account, to ensure the interoperability of products and services. As a systematic and efficient conformance testing method for new products with respect to the main protocols of the DAVIC (Digital Audio Visual Council) standard, this paper presents a protocol test methodology and architecture covering both single protocol stack testing and related protocol stacks testing.


MultiFormat motion picture storage subsystem using DirectShow Filters for a Multichannel Visual Monitoring System (다채널 영상 감시 시스템을 위한 다중 포맷 동영상 저장 DirectShow Filter설계 및 구현)

  • 정연권;하상석;정선태
    • Proceedings of the IEEK Conference / 2002.06d / pp.113-116 / 2002
  • Windows provides DirectShow for efficient multimedia stream processing such as multimedia capture, storage, and display. Many motion picture codecs and audio codecs are now built for use within the DirectShow framework, and Windows also supports many codecs and formats (MPEG-4, H.263, WMV, WMA, ASF, etc.) in addition to many useful tools for multimedia stream processing. DirectShow can therefore be used effectively to develop Windows-based multimedia streaming applications such as visual monitoring systems, which need to store real-time video data for later retrieval. In this paper, we present our efforts to develop a DirectShow filter system that supports storing motion pictures using various motion picture codecs. Our DirectShow filter system also provides motion detection as an additional function.
