• Title/Summary/Keyword: video recognition

696 search results

A Semantic-based rate control method for motion video coding (동영상 부호화를 위한 의미 기반 Rate control 기법)

  • 이봉호;전경재;곽노윤;강태하;황병원
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.25 no.3B
    • /
    • pp.529-540
    • /
    • 2000
  • This paper presents a semantic-based rate control method built on H.263+, a very low bit rate video coding standard, for very low bit rate applications. Previous rate control methods regulate the generated bit rate by choosing optimal quantization parameters for each macroblock in a frame. In this paper, we add a pre-processing stage, consisting of semantic region recognition and priority assignment, to enhance subjective quality. The goal is to improve the subjective quality of skin-color and face regions by reallocating bit resources from less important background regions.

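A minimal sketch of the idea in this abstract, assigning a finer quantization parameter (QP) to semantically important macroblocks (skin/face) and a coarser one to background, is shown below. It is an illustration only, not the authors' H.263+ rate controller; the base QP, the priority offsets, and the skin-test thresholds are assumptions.

```python
import numpy as np

def is_skin_region(block_ycbcr):
    """Crude skin test on a YCbCr macroblock (illustrative thresholds only)."""
    cb = block_ycbcr[..., 1].mean()
    cr = block_ycbcr[..., 2].mean()
    return 77 <= cb <= 127 and 133 <= cr <= 173

def assign_macroblock_qp(frame_ycbcr, base_qp=20, mb_size=16):
    """Return a per-macroblock QP map: lower QP (more bits) for skin/face
    macroblocks, higher QP for background, keeping the budget roughly level."""
    h, w, _ = frame_ycbcr.shape
    qp_map = np.full((h // mb_size, w // mb_size), base_qp, dtype=int)
    for i in range(qp_map.shape[0]):
        for j in range(qp_map.shape[1]):
            mb = frame_ycbcr[i*mb_size:(i+1)*mb_size, j*mb_size:(j+1)*mb_size]
            if is_skin_region(mb):
                qp_map[i, j] = max(base_qp - 4, 1)    # spend extra bits here
            else:
                qp_map[i, j] = min(base_qp + 2, 31)   # borrow bits from background
    return qp_map
```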

Hardware Implementation for Stabilization of Detected Face Area (검출된 얼굴 영역 안정화를 위한 하드웨어 구현)

  • Cho, Ho-Sang;Jang, Kyoung-Hoon;Kang, Hyun-Jung;Kang, Bong-Soon
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.13 no.2
    • /
    • pp.77-82
    • /
    • 2012
  • This paper presents a hardware-implemented face region stabilization algorithm that stabilizes facial regions using the locations and sizes of faces found by a face detection system. Face detection algorithms extract facial features or patterns that indicate the presence of a face in a video source and detect faces with a classifier trained on example faces. However, the detected locations and sizes of faces vary considerably from frame to frame, even under slight shaking. To address this problem, a high-frequency reduction filter, which suppresses these variations by comparing face range information between the current and previous video frames, is implemented together with center-distance comparison and zooming operations.
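
A software analogue of the stabilization step described above is a temporal low-pass filter on the detected face box; the hardware design itself is not described at this level of detail in the abstract, so the box format and smoothing factor below are assumptions.

```python
def smooth_face_box(prev_box, new_box, alpha=0.3):
    """Low-pass filter a detected face box (x, y, w, h) against the box kept
    from the previous frame, damping frame-to-frame jitter in position/size."""
    if prev_box is None:
        return new_box
    return tuple(round((1 - alpha) * p + alpha * n)
                 for p, n in zip(prev_box, new_box))

# Usage across a video: box = None; then per frame, box = smooth_face_box(box, det)
```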

Artificial Intelligence-based Echocardiogram Video Classification by Aggregating Dynamic Information

  • Ye, Zi;Kumar, Yogan J.;Sing, Goh O.;Song, Fengyan;Ni, Xianda;Wang, Jin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.2
    • /
    • pp.500-521
    • /
    • 2021
  • Echocardiography, an ultrasound scan of the heart, is regarded as the primary physiological test for diagnosing heart disease. How an echocardiogram is interpreted depends heavily on determining the view, and some views are identified as standard views because of how clearly they present the major cardiac structures and how easily those structures can be evaluated. However, finding valid cardiac views has traditionally been a time-consuming and laborious process, because medical imaging is interpreted manually by specialists. This study therefore aims to speed up diagnosis and reduce diagnostic error by automatically identifying standard cardiac views using deep learning. Based on a new echocardiogram dataset collected from Asian patients, the research evaluates several neural network architectures adapted from video action recognition and verifies that methods which aggregate dynamic information across frames achieve stronger classification performance.
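
The abstract does not name the specific architectures evaluated, but the general approach of aggregating dynamic information with a 3D convolutional network can be sketched as follows; the layer sizes and the number of view classes are assumptions.

```python
import torch
import torch.nn as nn

class Tiny3DViewClassifier(nn.Module):
    """Minimal 3D-CNN that pools temporal (dynamic) information from a short
    echo clip; n_views is hypothetical, the paper's class list is not given."""
    def __init__(self, n_views=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1),   # grayscale ultrasound
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                       # pool over time and space
        )
        self.classifier = nn.Linear(32, n_views)

    def forward(self, clip):                 # clip: (batch, 1, frames, H, W)
        x = self.features(clip).flatten(1)
        return self.classifier(x)

# e.g. logits = Tiny3DViewClassifier()(torch.randn(2, 1, 16, 112, 112))
```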

Salt and Pepper Noise Removal using Neighborhood Pixels (이웃한 픽셀을 이용한 Salt and Pepper 잡음제거)

  • Baek, Ji-Hyeoun;Kim, Chul-Ki;Kim, Nam-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2019.05a
    • /
    • pp.217-219
    • /
    • 2019
  • With the increased use of digital video devices, research on image processing technologies has become more active. Image processing is used in many applied fields such as medical image interpretation and object recognition. Types of image noise include Gaussian noise, impulse noise, and salt-and-pepper noise. Noise is unwanted information that degrades the video, and it is mainly removed with filters. Typical noise removal methods are the median filter and the average filter. Although the median filter is effective at removing salt-and-pepper noise, its performance drops in environments with high noise density. To address this issue, this study proposes an algorithm that uses neighboring pixels to remove the noise.

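The abstract does not spell out the exact replacement rule, but a common form of such a neighbor-based filter is sketched below: pixels at the extreme values 0 or 255 are treated as noise candidates and replaced by the median of the uncorrupted pixels in their window; the window size and fallback rule are assumptions.

```python
import numpy as np

def remove_salt_pepper(img, win=1):
    """Replace suspected salt (255) / pepper (0) pixels with the median of the
    non-noisy pixels in a (2*win+1) x (2*win+1) window of a grayscale image."""
    out = img.copy()
    padded = np.pad(img, win, mode='edge')
    noisy = (img == 0) | (img == 255)
    for y, x in zip(*np.nonzero(noisy)):
        window = padded[y:y + 2*win + 1, x:x + 2*win + 1]
        good = window[(window != 0) & (window != 255)]
        if good.size:                      # use only uncorrupted neighbors
            out[y, x] = np.median(good)
        else:                              # all neighbors noisy: plain median
            out[y, x] = np.median(window)
    return out
```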

A Matching Method of Recommendations Advertisements by Extracting Immersive 360-degree Video Object (실감형 360도 영상저작물 객체 추출을 통한 추천광고 매칭방법)

  • Jang, Seyoung;Park, Byeongchan;Kim, Youngmo;Yoo, Injae;Lee, Jeacheng;Kim, Seok-Yoon
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2020.01a
    • /
    • pp.231-233
    • /
    • 2020
  • Recently, video is increasingly shot and delivered in 360-degree form, so unlike ordinary video, a method is needed to insert and expose advertisements in 360-degree video works in an appropriate and effective way. This paper therefore proposes a recommended-advertisement matching method based on object extraction from immersive 360-degree video works. The method matches advertisements to the 360-degree video work, retrieves advertisements related to the extracted objects, and automatically inserts and exposes them in the corresponding frames. Using this method, playback can either move the insertion position of the advertisement so that it appears within the user's current viewing region, or move the user's current viewpoint to the coordinates where the advertisement has been inserted.

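One concrete piece of the playback logic described above is deciding whether the inserted advertisement already lies inside the user's current viewport; the sketch below only illustrates that check (the paper does not detail its coordinate handling), and the field-of-view values are assumptions.

```python
def in_viewport(ad_yaw, ad_pitch, view_yaw, view_pitch, h_fov=90.0, v_fov=60.0):
    """Return True if an ad anchored at (ad_yaw, ad_pitch), in degrees on the
    360-degree sphere, lies inside the user's current field of view."""
    d_yaw = (ad_yaw - view_yaw + 180.0) % 360.0 - 180.0   # wrap to [-180, 180)
    d_pitch = ad_pitch - view_pitch
    return abs(d_yaw) <= h_fov / 2 and abs(d_pitch) <= v_fov / 2

# If this returns False, the player could either shift the ad insertion point
# toward (view_yaw, view_pitch) or re-center the viewport on the ad coordinates.
```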

Human Gesture Recognition Technology Based on User Experience for Multimedia Contents Control (멀티미디어 콘텐츠 제어를 위한 사용자 경험 기반 동작 인식 기술)

  • Kim, Yun-Sik;Park, Sang-Yun;Ok, Soo-Yol;Lee, Suk-Hwan;Lee, Eung-Joo
    • Journal of Korea Multimedia Society
    • /
    • v.15 no.10
    • /
    • pp.1196-1204
    • /
    • 2012
  • In this paper, a series of algorithms is proposed for controlling different kinds of multimedia content and realizing human-computer interaction with a single input device. First, human gesture recognition based on a natural user interface (NUI) is presented. Because the raw image captured by the camera is not well suited for further processing, it is converted to the YCbCr color space, and a morphological processing algorithm is used to remove unwanted noise. Boundary energy and depth information are extracted for hand detection. From the detected hand image, a PCA algorithm recognizes the hand posture, while difference images and the moment method locate the hand centroid and extract the trajectory of hand movement. Eight direction codes are defined to quantize the gesture trajectory into symbol values, and an HMM is then used to recognize gestures from these symbols. With this series of methods, multimedia content can be controlled through human gesture recognition. In extensive experiments the proposed algorithms perform well: the hand detection rate reaches 94.25%, the gesture recognition rate exceeds 92.6%, the hand posture recognition rate reaches 85.86%, and the face detection rate reaches 89.58%. With these results, many kinds of multimedia content on a computer, such as video players, MP3 players, and e-books, can be controlled effectively.
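
The eight-direction-code quantization of the hand trajectory, which produces the symbol stream for the HMM above, can be sketched as follows; the exact code numbering used in the paper is not given, so the mapping here (0 = east, counting counter-clockwise in 45-degree steps) is an assumption.

```python
import math

def direction_codes(trajectory):
    """Quantize a hand-centroid trajectory [(x0, y0), (x1, y1), ...] into
    8-direction codes: each displacement is mapped to the nearest 45-degree
    sector (0 = east, 2 = north, 4 = west, 6 = south)."""
    codes = []
    for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:]):
        angle = math.atan2(-(y1 - y0), x1 - x0)       # image y-axis points down
        sector = round(angle / (math.pi / 4)) % 8     # nearest 45-degree sector
        codes.append(sector)
    return codes

# e.g. direction_codes([(0, 0), (5, 0), (8, -3)]) -> [0, 1]
```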

Optimization of 3D ResNet Depth for Domain Adaptation in Excavator Activity Recognition

  • Seungwon SEO;Choongwan KOO
    • International Conference on Construction Engineering and Project Management
    • /
    • 2024.07a
    • /
    • pp.1307-1307
    • /
    • 2024
  • Recent research on heavy equipment has been conducted for the purposes of enhanced safety, productivity improvement, and carbon neutrality at construction sites. A sensor-based approach is being explored to monitor the location and movements of heavy equipment in real time. However, it poses significant challenges in terms of time and cost because multiple sensors must be installed on numerous pieces of heavy equipment at construction sites, and it is limited in identifying collaboration or interference between two or more machines. In light of this, vision-based deep learning approaches are being actively investigated to respond effectively to various working conditions and dynamic environments. To enhance the performance of a vision-based activity recognition model, it is essential to secure a sufficient amount of training data (i.e., video datasets collected from actual construction sites). However, due to safety and security issues at construction sites, there are limitations in collecting adequate training datasets under various situations and environmental conditions. In addition, the videos contain sequences of multiple heavy equipment activities, making it challenging to clearly distinguish the boundaries between preceding and subsequent activities. To address these challenges, this study proposes domain adaptation via vision-based transfer learning for automated excavator activity recognition using 3D ResNet (residual deep neural network). In particular, this study aims to identify the optimal depth of the 3D ResNet (i.e., the number of layers of the feature extractor) for domain adaptation through fine-tuning. To achieve this, the activity recognition performance of five 3D ResNet models with 18, 34, 50, 101, and 152 layers was evaluated using two consecutive videos containing multiple activities (5 min 33 s and 10 min 6 s) collected from actual construction sites. First, pretrained weights from large-scale datasets in other domains (e.g., humans, animals, natural phenomena), namely Kinetics-700 and Moments in Time (MiT), were utilized. Second, the five 3D ResNet models were fine-tuned using a customized dataset (14,185 clips, 60,606 s). Using the F1 score as the evaluation metric, the five models scored 0.881, 0.689, 0.740, 0.684, and 0.569, with the 18-layer model performing best. This result indicates that activity recognition models with fewer layers can be advantageous for deriving optimal weights for the target domain (i.e., excavator activities) when fine-tuning with a limited dataset. Consequently, this study identified the optimal depth of 3D ResNet that can maintain reliable performance in dynamic and complex construction sites even with a limited dataset. The proposed approach is expected to contribute to the development of decision-support systems capable of systematically managing enhanced safety, productivity improvement, and carbon neutrality in the construction industry.
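
A rough sketch of the fine-tuning setup described above, using torchvision's Kinetics-400-pretrained r3d_18 as a stand-in for the authors' Kinetics-700/MiT-pretrained 3D ResNet; the number of excavator activity classes and the learning rate are not given in the abstract and are assumed here.

```python
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18, R3D_18_Weights

NUM_ACTIVITIES = 4  # assumption: e.g. digging, swinging, loading, idling

# 3D ResNet-18 with pretrained weights from another (source) domain
model = r3d_18(weights=R3D_18_Weights.KINETICS400_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_ACTIVITIES)  # new target-domain head

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def fine_tune_step(clips, labels):
    """One fine-tuning step; clips: (batch, 3, frames, 112, 112) excavator video."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(clips), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```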

Research for user recognition data based image display technology using store display (매장 디스플레이를 활용한 사용자 인지 정보 기반 영상 표출 기술 연구)

  • Hong, Sinyou;Yang, Seungyoun;Cha, JaeSang
    • Journal of Satellite, Information and Communications
    • /
    • v.12 no.1
    • /
    • pp.54-57
    • /
    • 2017
  • In commercial facilities, it is difficult in practice to promote specific products because many different products are displayed and sold together. To promote information about the displayed products, retailers therefore tend to rely on printed material or salespeople. This study focuses on displaying advertising or promotional video based on interactive user information by utilizing the displays already used in the store, and on implementing the technology with energy saving in mind. Information about a specific product sold or displayed at a commercial facility is also valuable to the manufacturer or advertiser who wants to promote that product and related content at customer touchpoints. In this study, we maximized the advertisement effect by combining the advertisement video with the product itself on a transparent display, and implemented a user-oriented image display technology that provides an interactive service depending on whether a user is present. Through this study, we propose a new direction for maximizing the advertising and public relations effect in various commercial facilities.

Eye Location Algorithm For Natural Video-Conferencing (화상 회의 인터페이스를 위한 눈 위치 검출)

  • Lee, Jae-Jun;Choi, Jung-Il;Lee, Phill-Kyu
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.12
    • /
    • pp.3211-3218
    • /
    • 1997
  • This paper addresses an eye location algorithm, an essential step in a human face tracking system for natural video-conferencing. In current video-conferencing systems, the user's facial movements are restricted by a fixed camera, which is inconvenient for users. We propose an eye location algorithm for automatic face tracking: once the eyes are located, the positions of other facial features and the scale of the face in the image can be estimated from the inter-ocular distance. Most previous feature extraction methods for face recognition assume that an approximate face region or the location of each facial feature is already known. The algorithm proposed in this paper uses no prior information about the given image and is not sensitive to backgrounds or lighting conditions. It uses the valley representation as the primary cue for locating eyes. Experiments performed on 213 frames of 17 people show very encouraging results.

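One common way to build the valley representation mentioned above is a grayscale morphological closing minus the original image, which highlights small dark regions such as eyes; the paper's exact operator and kernel size are not stated, so this OpenCV sketch is an assumption.

```python
import cv2

def valley_map(gray, kernel_size=9):
    """Highlight dark 'valley' regions (e.g. eyes) in a grayscale image: the
    closing fills small dark spots, so closing - original is large at valleys."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    closed = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, kernel)
    return cv2.subtract(closed, gray)   # bright where the image had valleys

# Eye candidates could then be taken from thresholded blobs of the valley map:
# _, candidates = cv2.threshold(valley_map(gray), 40, 255, cv2.THRESH_BINARY)
```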

Growth and Decay of Alpha Tracks in a Large Scale Cloud Chamber after Injection of Radon

  • Wada, Shinichi;Kobayashi, Tsuneo;Katayama, Yoshiro;Iwami, Toshiaki;Kato, Tsuguhisa;Cameron, John R.
    • Proceedings of the Korean Society of Medical Physics Conference
    • /
    • 2002.09a
    • /
    • pp.275-278
    • /
    • 2002
  • The recognition of natural background radiation is important not only for radiological education but also for promoting the public's scientific view of radiation. We made a "room" on the web showing natural background radiation as part of a VRM (Virtual Radiation Museum). The "room" shows video images of the tracks of charged particles from natural background radiation, together with alpha and beta ray tracks from known sources, using a Large Scale Diffusion Cloud Chamber. The purpose of this study is to clarify the origin of one kind of track (named the A-track), which is thick, easy to recognize, and less than several centimeters long in the cloud chamber, and to explain its counting rate numerically. The study was carried out using a Large Scale Diffusion Cloud Chamber (Phywe, Germany) installed in the Niigata Science Museum. The Model RNC (Pylon Electronics, Canada) was used as the Rn-222 source; its Ra-226 activity was 111.6 Bq, calibrated according to the NIST protocol. Rn-222 gas was injected into the cloud chamber, and continuous video recording with a Digital Handycam (SONY, Japan) was carried out for 360 min after the injection. The number of alpha-ray tracks (alpha tracks) in the video images was analyzed. The growth and decay curve of the total activity of Rn-222 and its alpha-emitting progeny was calculated and compared with the alpha track counts. As a result, the alpha tracks formed by the Rn-222 injection were found to resemble A-tracks. The relationship between A-tracks in the cloud chamber and atmospheric Rn is discussed.

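The growth-and-decay curve mentioned above, the total alpha activity of Rn-222 plus its short-lived alpha-emitting progeny (Po-218 and Po-214), can be reproduced numerically by integrating the decay chain. The sketch below uses nominal half-lives and an arbitrary 1 Bq initial Rn-222 activity rather than the paper's calibrated values.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Decay constants (1/s) from nominal half-lives: Rn-222 3.82 d, Po-218 3.1 min,
# Pb-214 26.8 min, Bi-214 19.9 min; Po-214 (~164 us) is treated as instantaneous.
LAMBDA = np.log(2) / np.array([3.82 * 86400, 3.1 * 60, 26.8 * 60, 19.9 * 60])

def chain(t, n):
    """dN/dt for the chain Rn-222 -> Po-218 -> Pb-214 -> Bi-214 (-> Po-214, prompt)."""
    a_rn, a_po218, a_pb214, a_bi214 = LAMBDA * n
    return [-a_rn, a_rn - a_po218, a_po218 - a_pb214, a_pb214 - a_bi214]

t = np.linspace(0, 360 * 60, 361)                 # the 360-minute observation window
n0 = [1.0 / LAMBDA[0], 0.0, 0.0, 0.0]             # 1 Bq of Rn-222, no progeny yet
sol = solve_ivp(chain, (t[0], t[-1]), n0, t_eval=t)

# Alpha emitters: Rn-222, Po-218, and Po-214 (in equilibrium with Bi-214), so the
# total alpha activity to compare against the track counts is roughly:
alpha_activity = LAMBDA[0]*sol.y[0] + LAMBDA[1]*sol.y[1] + LAMBDA[3]*sol.y[3]
```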