• Title/Summary/Keyword: Audio-Visual Information


Story-based Information Retrieval (스토리 기반의 정보 검색 연구)

  • You, Eun-Soon;Park, Seung-Bo
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.81-96 / 2013
  • Video information retrieval has become a very important issue because of the explosive increase in video data driven by the growth of Web content. Content-based video analysis using visual features has been the main approach to video information retrieval and browsing. Video content can be represented with content-based analysis techniques, which extract various features from audio-visual data such as frames, shots, colors, texture, or shape, and similarity between videos can be measured through such analysis. However, a movie, a typical type of video data, is organized by a story as well as by audio-visual data. When content-based analysis that uses only low-level audio-visual data is applied to information retrieval for movies, a semantic gap arises between the information people recognize as significant and the information produced by content-based analysis. The reason for this gap is that the story line of a movie is high-level information, with relationships in the content that change as the movie progresses. Information retrieval related to the story line of a movie cannot be carried out by content-based analysis techniques alone; a formal model is needed that can determine relationships among movie contents or track changes in meaning in order to retrieve story information accurately. Recently, story-based video analysis techniques using the concept of a social network have emerged for story information retrieval. These approaches represent a story through the relationships between characters in a movie, but they have several problems. First, they do not express dynamic changes in the relationships between characters as the story develops. Second, they miss deeper information, such as the emotions that indicate the identities and psychological states of the characters; emotion is essential to understanding a character's motivation, conflict, and resolution. Third, they do not take account of the events and background that contribute to the story. This paper therefore reviews the importance and weaknesses of previous video analysis methods, from content-based approaches to story analysis based on social networks, and identifies necessary elements, such as characters, background, and events, based on narrative structures introduced in the literature. We extract characters' emotional words from the script of the movie Pretty Woman using the hierarchical structure of WordNet, an extensive English thesaurus that encodes relationships between words (e.g., synonyms, hypernyms, hyponyms, antonyms), and present a method to visualize a character's emotional pattern over time. Second, a character's inner nature must be determined in advance in order to model a character arc that depicts the character's growth and development. To this end, we analyze the amount of each character's dialogue in the script and track the character's inner nature using social network concepts such as in-degree (incoming links) and out-degree (outgoing links). We also propose a method to track a character's inner nature by tracing indices such as degree, in-degree, and out-degree of the character network as the movie progresses. Finally, the spatial background where characters meet and where events take place is an important element of the story. We take advantage of the movie script to extract significant spatial backgrounds and suggest a scene map describing spatial arrangements and distances in the movie. Important places, where main characters first meet or where they stay for long periods, can be extracted through this scene map. Based on these three elements (character, event, background), we extract a variety of story-related information and evaluate the performance of the proposed method. The extracted story information can be tracked over time to detect changes in a character's emotion or inner nature, spatial movements, and conflicts and resolutions in the story.
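
The abstract describes two computable ideas: screening script words against WordNet's hypernym hierarchy to find emotional vocabulary, and tracking a character's in-degree/out-degree in a dialogue network as the story progresses. The sketch below only illustrates those ideas under assumed details (the `emotion.n.01` anchor synset and the toy dialogue pairs are illustrative, not taken from the paper); it is not the authors' implementation.

```python
# Requires: nltk (with the 'wordnet' corpus downloaded via nltk.download("wordnet")) and networkx.
from nltk.corpus import wordnet as wn
import networkx as nx

def is_emotion_word(word: str) -> bool:
    """Return True if any noun sense of `word` has 'emotion' in its hypernym chain."""
    emotion = wn.synset("emotion.n.01")          # assumed anchor synset
    for sense in wn.synsets(word, pos=wn.NOUN):
        if emotion in sense.closure(lambda s: s.hypernyms()):
            return True
    return False

# Toy dialogue log: (speaker, addressee) pairs for one segment of the script.
dialogue = [("Vivian", "Edward"), ("Edward", "Vivian"), ("Edward", "Stuckey")]
G = nx.DiGraph()
G.add_edges_from(dialogue)

print(is_emotion_word("anger"))                       # expected True with a standard WordNet install
print(G.in_degree("Edward"), G.out_degree("Edward"))  # incoming vs. outgoing dialogue links
```

Recomputing the degree indices over a sliding window of scenes, rather than over the whole script at once, is one simple way to obtain the per-progression tracking the abstract mentions.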

Analysis on the Possibility of Electronic Surveillance Society in the Intelligence Information age

  • Chung, Choong-Sik
    • Journal of Platform Technology / v.6 no.4 / pp.11-17 / 2018
  • In the smart intelligent information society, social dysfunctions such as personal information protection issues and the risk of an electronic surveillance society may come to the fore. In this paper, drawing on various existing categorizations, we classify electronic surveillance into audio surveillance, visual surveillance, location surveillance, biometric information surveillance, and data surveillance. Responding to new forms of electronic surveillance in the intelligent information society requires a change of perception different from that of the past. This begins with recognizing the importance of digital privacy and leads to the right to self-determination over personal information. Therefore, in order to respond preemptively to the dysfunctions that may arise in the intelligent information society, it is necessary to further raise civil society's awareness of the need to protect information human rights.

Some effects of audio-visual speech in perceiving Korean

  • Kim, Jee-Sun;Davis, Chris
    • Annual Conference on Human and Language Technology / 1999.10e / pp.335-342 / 1999
  • The experiments reported here investigated whether seeing a speaker's face (visible speech) affects the perception and memory of Korean speech sounds. In order to exclude the possibility of top-down, knowledge-based influences on perception and memory, the experiments tested people with no knowledge of Korean. The first experiment examined whether visible speech (Auditory and Visual, AV) assists English native speakers with no knowledge of Korean in detecting a syllable within a Korean speech phrase. A syllable was more likely to be detected within a phrase when the participants could see the speaker's face. The second experiment investigated whether English native speakers' judgments of the duration of a Korean phrase would be affected by visible speech. In the AV condition, participants' estimates of phrase duration were highly correlated with the actual durations, whereas estimates in the audio-only (AO) condition were not. The results are discussed with respect to the benefits of communication with multimodal information and future applications.

Multi-Format motion picture storage subsystem using DirectShow Filters for a Multichannel Visual Monitoring System (다채널 영상 감시 시스템을 위한 다중 포맷 동영상 저장 DirectShow Filter설계 및 구현)

  • 정연권;하상석;정선태
    • Proceedings of the IEEK Conference / 2002.06d / pp.113-116 / 2002
  • Windows provides DirectShow for efficient multimedia streaming processing such as capture, storage, and display. Many video and audio codecs are now available within the DirectShow framework, and Windows also supports numerous codecs (MPEG-4, H.263, WMV, WMA, ASF, etc.) in addition to many useful tools for multimedia streaming. DirectShow can therefore be used effectively to develop Windows-based multimedia streaming applications such as visual monitoring systems, which need to store real-time video data for later retrieval. In this paper, we present our work on a DirectShow filter system that supports the storage of motion pictures with various video codecs. Our filter system also provides motion detection as an additional function.
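
DirectShow itself is a C++/COM framework, so the snippet below is not the paper's filter. It is only a language-neutral sketch of the storage idea the abstract describes (write an incoming stream with a selectable codec and flag frames with motion), using OpenCV instead of DirectShow; the codec table and motion threshold are assumptions.

```python
import cv2

FOURCC = {"mjpg": cv2.VideoWriter_fourcc(*"MJPG"),   # illustrative codec choices
          "mp4v": cv2.VideoWriter_fourcc(*"mp4v")}

def record(source=0, codec="mjpg", out_path="monitor.avi", motion_thresh=25):
    cap = cv2.VideoCapture(source)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS) or 15.0
    writer = cv2.VideoWriter(out_path, FOURCC[codec], fps, (w, h))
    prev = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            # crude motion detection: mean absolute difference between consecutive frames
            if cv2.absdiff(gray, prev).mean() > motion_thresh:
                print("motion detected")
        prev = gray
        writer.write(frame)
    cap.release()
    writer.release()
```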

Lip Feature Extraction using Contrast of YCbCr (YCbCr 농도 대비를 이용한 입술특징 추출)

  • Kim, Woo-Sung;Min, Kyung-Won;Ko, Han-Seok
    • Proceedings of the IEEK Conference / 2006.06a / pp.259-260 / 2006
  • Since audio speech recognition is affected by noise in real environments, visual speech recognition is used to support it. For visual speech recognition, this paper proposes lip-feature extraction using two types of image segmentation and a reduced ASM. Input images are transformed into the YCbCr color space, and the lips are segmented using the contrast of Y/Cb/Cr between the lips and the face. A lip-shape model trained by PCA is then placed on the segmented lip region, and the lip features are extracted using the ASM.
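
As a rough illustration of the first stage described above (YCbCr conversion and contrast-based lip segmentation), the sketch below uses OpenCV. The Cr-Cb contrast measure and the Otsu threshold are illustrative choices, not the authors' exact method, and the PCA/ASM fitting stage is omitted.

```python
import cv2
import numpy as np

def segment_lips(bgr_face: np.ndarray) -> np.ndarray:
    ycrcb = cv2.cvtColor(bgr_face, cv2.COLOR_BGR2YCrCb)   # OpenCV orders channels Y, Cr, Cb
    y, cr, cb = cv2.split(ycrcb)
    # Lips tend to have higher Cr and lower Cb than surrounding skin,
    # so a simple Cr-Cb contrast map highlights the lip area.
    contrast = cv2.subtract(cr, cb)
    _, mask = cv2.threshold(contrast, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask  # binary mask of candidate lip pixels
```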

DECODE: A Novel Method of DEep CNN-based Object DEtection using Chirps Emission and Echo Signals in Indoor Environment (실내 환경에서 Chirp Emission과 Echo Signal을 이용한 심층신경망 기반 객체 감지 기법)

  • Nam, Hyunsoo;Jeong, Jongpil
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.3 / pp.59-66 / 2021
  • Humans recognize surrounding objects mainly through visual and auditory information among the five senses (sight, hearing, smell, touch, taste), yet most recent object recognition research focuses on analysis of image sensor information. In this paper, various chirp audio signals were emitted into the observation space, the echoes were collected through a 2-channel receiving sensor and converted into spectral images, and an object recognition experiment in 3D space was conducted using a deep-learning-based image learning algorithm. The experiment was conducted under the noise and reverberation of a typical indoor environment rather than the ideal conditions of an anechoic chamber, and object recognition through echoes was able to estimate the position of an object with 83% accuracy. In addition, by mapping the inference results onto the observation space as a 3D spatial sound signal and outputting them as sound, visual information could be conveyed through audio. This suggests that object recognition research should use diverse echo information alongside image information, and that this technology could be applied to augmented reality through 3D sound.
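
A minimal sketch of the signal side of the pipeline described above: generate a chirp emission and convert a 2-channel echo into log-power spectrogram images that a CNN could take as input. The sample rate, sweep band, and STFT parameters are assumed values, and the CNN and 3D-sound output stages are omitted.

```python
import numpy as np
from scipy.signal import chirp, spectrogram

fs = 48_000                                   # sample rate (assumed)
t = np.linspace(0, 0.05, int(fs * 0.05), endpoint=False)
emission = chirp(t, f0=4_000, f1=16_000, t1=t[-1], method="linear")  # 4-16 kHz sweep

# Pretend `echo` is what the 2-channel receiver picked up (here: just a copy of the emission).
echo = np.stack([emission, emission])         # shape (2, n_samples)

specs = []
for channel in echo:
    f, tt, Sxx = spectrogram(channel, fs=fs, nperseg=256, noverlap=128)
    specs.append(10 * np.log10(Sxx + 1e-12))  # log-power spectrogram
spec_image = np.stack(specs)                  # (2, freq_bins, time_bins) -> CNN input tensor
print(spec_image.shape)
```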

Audio Generative AI Usage Pattern Analysis by the Exploratory Study on the Participatory Assessment Process

  • Hanjin Lee;Yeeun Lee
    • Journal of the Korea Society of Computer and Information / v.29 no.4 / pp.47-54 / 2024
  • The importance of cultural arts education utilizing digital tools is increasing in terms of enhancing tech literacy, self-expression, and convergent capabilities. The creation and evaluation processes of innovative multi-modal AI provide users with expanded creative audio-visual experiences. In particular, creating music with AI offers innovative experiences in all areas, from generating musical ideas to improving lyrics, editing, and producing variations. In this study, we empirically analyzed the process of performing tasks with an audio and music generative AI platform and discussing the results with fellow learners. Through voluntary participation, 12 services and 10 types of evaluation criteria were collected and classified by usage pattern and purpose. Academic, technological, and policy implications are presented for AI-powered liberal arts education from the learners' perspective.

Design and Implementation of A Video Information Management System for Digital Libraries (디지털 도서관을 위한 동영상 정보 관리 시스템의 설계 및 구현)

  • 김현주;권재길;정재희;김인홍;강현석;배종민
    • Journal of Korea Multimedia Society / v.1 no.2 / pp.131-141 / 1998
  • Video data in multimedia documents consist of large volumes of irregular data, including audio-visual, spatio-temporal, and semantic information. In general, it is difficult to grasp the exact meaning of such video information because video data consist, on the surface, of meaningless symbols and numbers. To relieve these difficulties, it is necessary to develop an integrated manager for the complex structures of video data and to provide users of video digital libraries with easy, systematic access mechanisms to video information. This paper proposes a generic integrated video information model (GIVIM), based on an extended Dublin Core metadata system, to effectively store and retrieve video documents in digital libraries. The GIVIM is an integrated model comprising a video metadata model (VMM) and a video architecture information model (VAIM). We also present the design and implementation of a video document management system (VDMS) based on the GIVIM.
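
To make the metadata idea concrete, here is a toy record in the spirit of an extended Dublin Core description for one video. The field set, and especially the structural `shots` extension, is purely illustrative; it is not the paper's GIVIM schema.

```python
from dataclasses import dataclass, field

@dataclass
class VideoMetadata:
    # subset of the standard Dublin Core elements
    title: str
    creator: str
    date: str
    format: str
    identifier: str
    # illustrative structural extension for video architecture information
    shots: list = field(default_factory=list)   # e.g. [(start_frame, end_frame, keyframe_path)]

record = VideoMetadata(
    title="Campus tour",
    creator="Library media team",
    date="1998-05-01",
    format="video/mpeg",
    identifier="vid-0001",
    shots=[(0, 240, "key_000.jpg"), (241, 600, "key_241.jpg")],
)
```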

A Study on the Creative, Conceptual Using of Digital Technique (디지털 기법의 창조적, 개념적 활용의 유형에 관한 사례 연구 - 공간디자인 프로세스를 중심으로 -)

  • 박영태
    • Korean Institute of Interior Design Journal / no.28 / pp.158-166 / 2001
  • The e-revolution is changing methodologies all over the world: real-time delivery lets people access audio and visual material wherever and whenever they are. In the past, computers were regarded merely as tools that made work easier. However, the meaning of the computer is changing with the e-revolution: computers are no longer just what they were; they have done many things we once thought impossible and will continue to do so. This new wave encourages design educators to use computers in everything they do. For example, instead of a pencil and a drafting board, most people in the design field now work with monitors, a mouse, and a plotter, so designers need computer skills both in class and in the office. However, using computers only for visual presentation in class is not enough to keep pace with the e-revolution; we should also use them for creative and conceptual design, such as exploiting design information and applying digital techniques in the early stages of the work. The purpose of this study is to show how to work with computers in the spatial design process, especially the use of a DIS (Design Information System) and the application of digital techniques in the early stages of the work.

A Study on Lip Detection based on Eye Localization for Visual Speech Recognition in Mobile Environment (모바일 환경에서의 시각 음성인식을 위한 눈 정위 기반 입술 탐지에 대한 연구)

  • Gyu, Song-Min;Pham, Thanh Trung;Kim, Jin-Young;Taek, Hwang-Sung
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.4 / pp.478-484 / 2009
  • Automatic speech recognition (ASR) is an attractive technology in today's trend toward a more convenient life. Although many approaches have been proposed for ASR, performance is still poor in noisy environments. In the current state of the art, ASR therefore uses not only audio information but also visual information. In this paper, we present a novel lip detection method for visual speech recognition in a mobile environment. In order to apply visual information to speech recognition, we need to extract exact lip regions. Because eye detection is easier than lip detection, we first detect the positions of the left and right eyes and then roughly locate the lip region. After that, we apply the K-means clustering technique to divide that region into groups, and the two lip corners and the lip center are detected by choosing the biggest of the clustered groups. Finally, we show the effectiveness of the proposed method through experiments based on the Samsung AVSR database.
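
A loose sketch of the pipeline in the abstract, assuming an off-the-shelf Haar cascade for the eye step and scikit-learn's K-means for the clustering step; the geometry of the lip box and the cluster count are guesses, not values from the paper, and the corner/center selection stage is left to downstream code.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def rough_lip_region(gray_face: np.ndarray):
    # Eye detection anchors the search: a bundled Haar cascade is assumed here.
    cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")
    eyes = cascade.detectMultiScale(gray_face, 1.1, 5)
    if len(eyes) < 2:
        return None
    # Use the two eye boxes to place a box roughly where the mouth should be.
    (x1, y1, w1, h1), (x2, y2, w2, h2) = sorted(eyes, key=lambda e: e[0])[:2]
    eye_dist = (x2 + w2 // 2) - (x1 + w1 // 2)
    top = max(y1, y2) + int(1.1 * eye_dist)
    return gray_face[top: top + eye_dist, x1: x2 + w2]

def cluster_lip_pixels(region: np.ndarray, k: int = 3) -> np.ndarray:
    # Cluster pixel intensities; the lip cluster can then be picked out downstream.
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(region.reshape(-1, 1))
    return labels.reshape(region.shape)   # per-pixel cluster ids
```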