• Title/Summary/Keyword: video recognition

Search Result 696, Processing Time 0.024 seconds

Application of Speech Recognition with Closed Caption for Content-Based Video Segmentations

  • Son, Jong-Mok;Bae, Keun-Sung
    • Speech Sciences
    • /
    • v.12 no.1
    • /
    • pp.135-142
    • /
    • 2005
  • An important aspect of video indexing is the ability to segment video into meaningful segments, i.e., content-based video segmentation. Since the audio signal in the sound track is synchronized with image sequences in the video program, a speech signal in the sound track can be used to segment video into meaningful segments. In this paper, we propose a new approach to content-based video segmentation. This approach uses closed caption to construct a recognition network for speech recognition. Accurate time information for video segmentation is then obtained from the speech recognition process. For the video segmentation experiment for TV news programs, we made 56 video summaries successfully from 57 TV news stories. It demonstrates that the proposed scheme is very promising for content-based video segmentation.

  • PDF

Automatic Video Management System Using Face Recognition and MPEG-7 Visual Descriptors

  • Lee, Jae-Ho
    • ETRI Journal
    • /
    • v.27 no.6
    • /
    • pp.806-809
    • /
    • 2005
  • The main goal of this research is automatic video analysis using a face recognition technique. In this paper, an automatic video management system is introduced with a variety of functions enabled, such as index, edit, summarize, and retrieve multimedia data. The automatic management tool utilizes MPEG-7 visual descriptors to generate a video index for creating a summary. The resulting index generates a preview of a movie, and allows non-linear access with thumbnails. In addition, the index supports the searching of shots similar to a desired one within saved video sequences. Moreover, a face recognition technique is utilized to personalbased video summarization and indexing in stored video data.

  • PDF

Video Palmprint Recognition System Based on Modified Double-line-single-point Assisted Placement

  • Wu, Tengfei;Leng, Lu
    • Journal of Multimedia Information System
    • /
    • v.8 no.1
    • /
    • pp.23-30
    • /
    • 2021
  • Palmprint has become a popular biometric modality; however, palmprint recognition has not been conducted in video media. Video palmprint recognition (VPR) has some advantages that are absent in image palmprint recognition. In VPR, the registration and recognition can be automatically implemented without users' manual manipulation. A good-quality image can be selected from the video frames or generated from the fusion of multiple video frames. VPR in contactless mode overcomes several problems caused by contact mode; however, contactless mode, especially mobile mode, encounters with several revere challenges. Double-line-single-point (DLSP) assisted placement technique can overcome the challenges as well as effectively reduce the localization error and computation complexity. This paper modifies DLSP technique to reduce the invalid area in the frames. In addition, the valid frames, in which users place their hands correctly, are selected according to finger gap judgement, and then some key frames, which have good quality, are selected from the valid frames as the gallery samples that are matched with the query samples for authentication decision. The VPR algorithm is conducted on the system designed and developed on mobile device.

Action Recognition Method in Sports Video Shear Based on Fish Swarm Algorithm

  • Jie Sun;Lin Lu
    • Journal of Information Processing Systems
    • /
    • v.19 no.4
    • /
    • pp.554-562
    • /
    • 2023
  • This research offers a sports video action recognition approach based on the fish swarm algorithm in light of the low accuracy of existing sports video action recognition methods. A modified fish swarm algorithm is proposed to construct invariant features and decrease the dimension of features. Based on this algorithm, local features and global features can be classified. The experimental findings on the typical sports action data set demonstrate that the key details of sports action can be successfully retained by the dimensionality-reduced fusion invariant characteristics. According to this research, the average recognition time of the proposed method for walking, running, squatting, sitting, and bending is less than 326 seconds, and the average recognition rate is higher than 94%. This proves that this method can significantly improve the performance and efficiency of online sports video motion recognition.

Television and Video Viewing at Early Childhood All-day Program Settings and Teachers' Recognition of Its Effects on Young Children (영유아 기관에서의 TV·비디오시청과 교사인식)

  • Suh, Young Sook;Chun, Hye Jung
    • Korean Journal of Child Studies
    • /
    • v.26 no.6
    • /
    • pp.321-334
    • /
    • 2005
  • This research investigated television and video viewing of young children in early childhood all-day program settings and teachers' recognition of its effects on young children through the survey of 452 early childhood teachers. The results show that television and video viewing is used as a whole group activity during transition period and/or waiting time activity for children who come earlier in the morning and remain late until closing time. It means television and video viewing at early childhood settings is mainly used as a group baby sitter or pacifier. Daily viewing time is about 44.02 minutes and early childhood teachers show low recognition of their role in children's viewing habits. Young children's viewing patterns and time are differed by teachers' variables so that young children of beginning teachers at small size settings appear more viewing time. Teachers show more negative recognition of television and video viewing on young children when they are older and have higher educational level and longer education experiences. The results also show that the more teachers have positive recognition on television and video viewing, the more young children are exposed to television and video viewing in their classes.

  • PDF

A Study on the Use of Speech Recognition Technology for Content-based Video Indexing and Retrieval (내용기반 비디오 색인 및 검색을 위한 음성인식기술 이용에 관한 연구)

  • 손종목;배건성;강경옥;김재곤
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.2
    • /
    • pp.16-20
    • /
    • 2001
  • An important aspect of video program indexing and retrieval is the ability to segment video program into meaningful segments, in other words, the ability of content-based video program segmentation. In this paper, a new approach using speech recognition technology has been proposed for content-based video program segmentation. This approach uses speech recognition technique to synchronize closed caption with speech signal. Experimental results demonstrate that the proposed scheme is very promising for content-based video program segmentation.

  • PDF

Video Expression Recognition Method Based on Spatiotemporal Recurrent Neural Network and Feature Fusion

  • Zhou, Xuan
    • Journal of Information Processing Systems
    • /
    • v.17 no.2
    • /
    • pp.337-351
    • /
    • 2021
  • Automatically recognizing facial expressions in video sequences is a challenging task because there is little direct correlation between facial features and subjective emotions in video. To overcome the problem, a video facial expression recognition method using spatiotemporal recurrent neural network and feature fusion is proposed. Firstly, the video is preprocessed. Then, the double-layer cascade structure is used to detect a face in a video image. In addition, two deep convolutional neural networks are used to extract the time-domain and airspace facial features in the video. The spatial convolutional neural network is used to extract the spatial information features from each frame of the static expression images in the video. The temporal convolutional neural network is used to extract the dynamic information features from the optical flow information from multiple frames of expression images in the video. A multiplication fusion is performed with the spatiotemporal features learned by the two deep convolutional neural networks. Finally, the fused features are input to the support vector machine to realize the facial expression classification task. The experimental results on cNTERFACE, RML, and AFEW6.0 datasets show that the recognition rates obtained by the proposed method are as high as 88.67%, 70.32%, and 63.84%, respectively. Comparative experiments show that the proposed method obtains higher recognition accuracy than other recently reported methods.

AnoVid: A Deep Neural Network-based Tool for Video Annotation (AnoVid: 비디오 주석을 위한 심층 신경망 기반의 도구)

  • Hwang, Jisu;Kim, Incheol
    • Journal of Korea Multimedia Society
    • /
    • v.23 no.8
    • /
    • pp.986-1005
    • /
    • 2020
  • In this paper, we propose AnoVid, an automated video annotation tool based on deep neural networks, that automatically generates various meta data for each scene or shot in a long drama video containing rich elements. To this end, a novel meta data schema for drama video is designed. Based on this schema, the AnoVid video annotation tool has a total of six deep neural network models for object detection, place recognition, time zone recognition, person recognition, activity detection, and description generation. Using these models, the AnoVid can generate rich video annotation data. In addition, AnoVid provides not only the ability to automatically generate a JSON-type video annotation data file, but also provides various visualization facilities to check the video content analysis results. Through experiments using a real drama video, "Misaeing", we show the practical effectiveness and performance of the proposed video annotation tool, AnoVid.

Face Detection based on Video Sequence (비디오 영상 기반의 얼굴 검색)

  • Ahn, Hyo-Chang;Rhee, Sang-Burm
    • Journal of the Semiconductor & Display Technology
    • /
    • v.7 no.3
    • /
    • pp.45-49
    • /
    • 2008
  • Face detection and tracking technology on video sequence has developed indebted to commercialization of teleconference, telecommunication, front stage of surveillance system using face recognition, and video-phone applications. Complex background, color distortion by luminance effect and condition of luminance has hindered face recognition system. In this paper, we have proceeded to research of face recognition on video sequence. We extracted facial area using luminance and chrominance component on $YC_bC_r$ color space. After extracting facial area, we have developed the face recognition system applied to our improved algorithm that combined PCA and LDA. Our proposed algorithm has shown 92% recognition rate which is more accurate performance than previous methods that are applied to PCA, or combined PCA and LDA.

  • PDF

Two-Stream Convolutional Neural Network for Video Action Recognition

  • Qiao, Han;Liu, Shuang;Xu, Qingzhen;Liu, Shouqiang;Yang, Wanggan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.10
    • /
    • pp.3668-3684
    • /
    • 2021
  • Video action recognition is widely used in video surveillance, behavior detection, human-computer interaction, medically assisted diagnosis and motion analysis. However, video action recognition can be disturbed by many factors, such as background, illumination and so on. Two-stream convolutional neural network uses the video spatial and temporal models to train separately, and performs fusion at the output end. The multi segment Two-Stream convolutional neural network model trains temporal and spatial information from the video to extract their feature and fuse them, then determine the category of video action. Google Xception model and the transfer learning is adopted in this paper, and the Xception model which trained on ImageNet is used as the initial weight. It greatly overcomes the problem of model underfitting caused by insufficient video behavior dataset, and it can effectively reduce the influence of various factors in the video. This way also greatly improves the accuracy and reduces the training time. What's more, to make up for the shortage of dataset, the kinetics400 dataset was used for pre-training, which greatly improved the accuracy of the model. In this applied research, through continuous efforts, the expected goal is basically achieved, and according to the study and research, the design of the original dual-flow model is improved.