• Title/Summary/Keyword: Captions


Parallel Injection Method for Improving Descriptive Performance of Bi-GRU Image Captions (Bi-GRU 이미지 캡션의 서술 성능 향상을 위한 Parallel Injection 기법 연구)

  • Lee, Jun Hee;Lee, Soo Hwan;Tae, Soo Ho;Seo, Dong Hoan
    • Journal of Korea Multimedia Society / v.22 no.11 / pp.1223-1232 / 2019
  • Injection refers to how the image feature vector is passed from the encoder into the decoder. Since the image feature vector carries object details such as color and texture, it is essential for generating image captions. However, a bidirectional decoder using the existing injection method receives the image feature vector only at the first step, so the image information vanishes along the backward sequence, making it difficult to describe the context in detail. Therefore, this paper proposes a parallel injection method to improve the descriptive performance of image captions. The proposed injection method fuses the image vector with every word embedding to preserve context. The decoder is also built with a Bidirectional Gated Recurrent Unit (Bi-GRU) to reduce its computational cost. To validate the proposed model, experiments were conducted on a standard image caption dataset; measured by BLEU and METEOR, the model improved the BLEU score by up to 20.2 points and the METEOR score by up to 3.65 points over the existing caption model, comparing favorably with recent models.
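
The key idea, fusing the image feature with the word embedding at every decoding step so the backward GRU pass also sees the image, can be illustrated with a minimal PyTorch sketch. All dimensions and names (ParallelInjectionDecoder, embed_dim, and so on) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ParallelInjectionDecoder(nn.Module):
    """Bi-GRU decoder that receives the image feature at every time step."""
    def __init__(self, vocab_size, embed_dim=256, img_dim=512, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Bi-GRU over [word embedding ; image feature] at each step
        self.bigru = nn.GRU(embed_dim + img_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, img_feat, captions):
        # img_feat: (B, img_dim) from the CNN encoder; captions: (B, T) token ids
        emb = self.embed(captions)                           # (B, T, embed_dim)
        img = img_feat.unsqueeze(1).expand(-1, emb.size(1), -1)
        fused = torch.cat([emb, img], dim=2)                 # parallel injection
        h, _ = self.bigru(fused)                             # (B, T, 2*hidden_dim)
        return self.out(h)                                   # per-step vocabulary logits
```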

A Survey about Oral Hygiene Management Attitude among People with Hearing Impairments and Speech defect in an Area (일부 지역 청각·언어장애인의 구강위생관리행태에 관한 조사 연구)

  • Lee, Jung-hee;Park, Myung-Suk
    • Journal of dental hygiene science / v.10 no.4 / pp.273-278 / 2010
  • This study surveyed people with hearing impairments and speech defects living in an area of Gyeonggi-do using a self-administered questionnaire, with interviews conducted with a deaf interpreter's help, from January 21 to February 14, 2009, in order to investigate their oral health management attitudes according to their general characteristics. The difference between males and females was statistically significant. The results were as follows. 1. Regarding the number of daily tooth brushings by gender, 62.5% of males and 34.2% of females brushed twice a day, and 25.0% and 35.4%, respectively, three times. 2. Regarding scaling experience by academic qualification, those with higher academic qualifications had scaling more regularly. 3. Regarding tooth brushing method by cohabitant, the rolling method was most common regardless of cohabitant; by residence type, 41.9% of those living with their parents brushed three times a day, while 69.2% of those living alone and 47.5% of married respondents brushed twice. 4. Regarding preferred formats for oral hygiene education, 81.1% of elementary school graduates chose multimedia materials with sign language explanation and captions; 48.6% of middle school graduates chose multimedia materials with sign language explanation and 14.3% multimedia materials with captions; 50.0% of high school graduates chose multimedia materials with sign language explanation and 17.3% multimedia materials with captions. Oral hygiene management support for people with hearing impairments and speech defects, and the development of a tailored educational program, are needed.

A Method for Reconstructing Original Images for Captions Areas in Videos Using Block Matching Algorithm (블록 정합을 이용한 비디오 자막 영역의 원 영상 복원 방법)

  • 전병태;이재연;배영래
    • Journal of Broadcast Engineering / v.5 no.1 / pp.113-122 / 2000
  • It is sometimes necessary to remove captions and recover the original images from video that has already been broadcast. When the number of images requiring such recovery is small, manual processing is possible, but as the number grows it becomes very difficult to do manually. Therefore, a method for recovering the original image in caption areas is needed. Traditional research on image restoration has focused on restoring blurred images to sharp ones using frequency filtering, or on video coding for transmission. This paper proposes a method for automatically recovering the original image using a BMA (Block Matching Algorithm). We extract information on caption regions and scene changes, which serves as prior knowledge for the recovery. From the caption detection results, we know the start and end frames of each caption and the character areas within the caption regions. The direction of recovery is decided using the scene change and caption region information (the start and end frames of the captions). According to that direction, we recover the original image by performing block matching for the character components in the extracted caption region. Experimental results show that stationary scenes with little camera or object motion are recovered well, and that scenes with motion against complex backgrounds are also recovered.
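
A rough NumPy sketch of the block-matching recovery step: for each block inside the caption region, a caption-free reference frame (chosen from before the caption's start frame or after its end frame, per the recovery direction) is searched for the best match, scored only on non-character pixels so the overlaid text does not bias the result. Block size, search range, and the assumption that the reference frame and character mask are already given are all illustrative.

```python
import numpy as np

def recover_block(frame, ref, char_mask, y, x, bs=8, search=8):
    """Recover the bs x bs block at (y, x) of `frame` from a caption-free
    reference frame `ref`, matching only non-character pixels."""
    valid = char_mask[y:y+bs, x:x+bs] == 0      # pixels not covered by text
    if not valid.any():
        return
    cur = frame[y:y+bs, x:x+bs].astype(int)
    best, best_pos = np.inf, None
    for dy in range(-search, search + 1):       # exhaustive search window
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + bs > ref.shape[0] or xx + bs > ref.shape[1]:
                continue
            sad = np.abs(cur - ref[yy:yy+bs, xx:xx+bs].astype(int))[valid].sum()
            if sad < best:
                best, best_pos = sad, (yy, xx)
    if best_pos is not None:                    # copy the best-matching block in
        yy, xx = best_pos
        frame[y:y+bs, x:x+bs] = ref[yy:yy+bs, xx:xx+bs]
```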


Caption Detection and Recognition for Video Image Information Retrieval (비디오 영상 정보 검색을 위한 문자 추출 및 인식)

  • 구건서
    • Journal of the Korea Computer Industry Society / v.3 no.7 / pp.901-914 / 2002
  • In this paper, we propose an efficient automatic caption detection and localization method, together with caption recognition using an FE-MCBP (Feature Extraction based Multichained BackPropagation) neural network, for content-based video retrieval. Frames are sampled from the video at a fixed time interval, and key frames are selected by a gray-scale histogram method. For each key frame, segmentation is performed and caption lines are detected using a line-scan method; finally, individual characters are separated. Speed and efficiency are improved by performing color segmentation with a local-maximum analysis method before line scanning. Caption detection is the first stage of multimedia database organization; detected captions serve as input to the text recognition system, and recognized captions can then be searched by content-based retrieval.
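
The key-frame selection step described above can be illustrated with a small sketch: frames sampled at a fixed interval are kept as key frames whenever their gray-scale histogram differs enough from the last key frame. The bin count and threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def gray_histogram(frame, bins=64):
    # normalized gray-scale histogram of one frame
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h / h.sum()

def select_key_frames(frames, threshold=0.2):
    """Return indices of frames whose histogram differs from the previous
    key frame by more than `threshold` (L1 distance)."""
    keys, prev = [], None
    for i, f in enumerate(frames):
        h = gray_histogram(f)
        if prev is None or np.abs(h - prev).sum() > threshold:
            keys.append(i)
            prev = h
    return keys
```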


A Method for Character Segmentation using MST(Minimum Spanning Tree) (MST를 이용한 문자 영역 분할 방법)

  • Chun, Byung-Tae;Kim, Young-In
    • Journal of the Korea Society of Computer and Information / v.11 no.3 / pp.73-78 / 2006
  • Conventional caption extraction methods use inter-frame differences or color segmentation over the whole image. Because these methods depend heavily on heuristics, they require a priori knowledge of the captions to be extracted, and they are difficult to implement. In this paper, we propose a method that uses few heuristics and a simplified algorithm. We use topographical features of characters to extract character points, and use an MST (Minimum Spanning Tree) to extract candidate caption regions. Character regions are then determined by testing several conditions and verifying the candidate regions. Experimental results show a candidate region extraction rate of 100% and a character region extraction rate of 98.2%; caption areas in complex images are also extracted well.
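
The MST grouping step might look like the following SciPy sketch: character points are joined by a minimum spanning tree, edges longer than a threshold are cut, and the remaining connected components become candidate caption regions. The distance threshold is an illustrative assumption.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

def candidate_regions(points, max_edge=20.0):
    """Group character points into candidate caption regions.
    points: (N, 2) array of extracted character-point coordinates."""
    dist = squareform(pdist(points))            # dense pairwise distances
    mst = minimum_spanning_tree(dist).toarray() # tree over all points
    mst[mst > max_edge] = 0                     # cut edges longer than threshold
    n, labels = connected_components(mst != 0, directed=False)
    return [points[labels == k] for k in range(n)]
```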


Context-Awareness Cat Behavior Captioning System (반려묘의 상황인지형 행동 캡셔닝 시스템)

  • Chae, Heechan;Choi, Yoona;Lee, Jonguk;Park, Daihee;Chung, Yongwha
    • Journal of Korea Multimedia Society / v.24 no.1 / pp.21-29 / 2021
  • With the recent increase in the number of households raising pets, various engineering studies on pets have been under way. The ultimate goal of this line of work is to automatically generate context-aware captions that express a cat's implicit intentions from its behavior and sound, by embedding mature pet behavior detection technology as a basic component of video captioning research. As a pilot study toward this goal, this paper proposes a high-level captioning system that uses optical-flow, RGB, and sound information from cat videos. The proposed system uses video datasets collected in an actual breeding environment: feature vectors are extracted from the video and sound, then passed through a hierarchical LSTM encoder and decoder to identify the cat's behavior and its implicit intentions and to learn to generate context-aware captions. The performance of the proposed system was verified experimentally using video data collected in an environment where cats are actually raised.
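
The multimodal, hierarchical encoding described above might be sketched as follows: per-frame RGB, optical-flow, and sound features are fused by concatenation, a lower LSTM summarizes each segment, and an upper LSTM summarizes the segment sequence into a video embedding for a caption decoder. All dimensions and the fusion-by-concatenation choice are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Two-level LSTM encoder over fused multimodal frame features."""
    def __init__(self, rgb_d=512, flow_d=512, snd_d=128, hid=256):
        super().__init__()
        self.seg_enc = nn.LSTM(rgb_d + flow_d + snd_d, hid, batch_first=True)
        self.vid_enc = nn.LSTM(hid, hid, batch_first=True)

    def forward(self, rgb, flow, snd):
        # rgb/flow/snd: (B, S, F, d) per-frame features, S segments of F frames
        B, S, F, _ = rgb.shape
        fused = torch.cat([rgb, flow, snd], dim=-1).view(B * S, F, -1)
        _, (h, _) = self.seg_enc(fused)           # lower level: frames -> segment
        seg_states = h[-1].view(B, S, -1)
        _, (h2, _) = self.vid_enc(seg_states)     # upper level: segments -> video
        return h2[-1]                             # video embedding for the decoder
```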

Decomposition of Fe-EDTA in Nuclear Waste Water by using Underwater discharge Plasma

  • Kim, Jin-Kil;Lee, Han-Yong;Kang, Duk-Won;Uhm, Han-Sup
    • Proceedings of the Korean Radioactive Waste Society Conference / 2004.06a / pp.336-336 / 2004
  • EDTA contained in decontamination wastes can cause complexation of radioactive cations resulting from its various treatment processes, such as chemical precipitation and ion exchange. It may also account for elevated leachability and higher mobility of cationic contaminants from conditioned wastes, such as waste immobilized in cement or other matrices. Therefore, chelated and unchelated EDTA species must be converted into environmentally safe materials. (omitted)


Generate Korean image captions using LSTM (LSTM을 이용한 한국어 이미지 캡션 생성)

  • Park, Seong-Jae;Cha, Jeong-Won
    • 한국어정보학회:학술대회논문집 / 2017.10a / pp.82-84 / 2017
  • In this paper, we construct a dataset for training Korean image captioning and propose a deep learning model that generates captions. To create the Korean data, the English MS COCO captions were translated into Korean and then revised. The captioning model encodes each image into a 512-dimensional feature using a CNN, and the encoded feature is fed to an LSTM to generate the caption. On the resulting Korean MS COCO data, experiments were run at the word (eojeol), morpheme, and semantic-morpheme levels; the morpheme-level model performed best and achieved performance comparable to the English model.
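
The CNN-to-LSTM pipeline in this abstract, encoding the image into a 512-dimensional feature and conditioning an LSTM on it, might look like the following sketch. The ResNet-18 backbone and the image-as-first-token conditioning are illustrative assumptions; the paper specifies only a CNN encoder and an LSTM decoder.

```python
import torch
import torch.nn as nn
from torchvision import models

class CNNLSTMCaptioner(nn.Module):
    """Encode an image to a 512-d feature, then generate tokens with an LSTM."""
    def __init__(self, vocab_size, feat_dim=512, hidden=512):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, feat_dim)
        self.cnn = backbone
        self.embed = nn.Embedding(vocab_size, feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, images, tokens):
        feat = self.cnn(images).unsqueeze(1)       # (B, 1, 512) image feature
        emb = self.embed(tokens)                   # (B, T, 512) morpheme tokens
        x = torch.cat([feat, emb], dim=1)          # image acts as the first "word"
        h, _ = self.lstm(x)
        return self.out(h[:, :-1])                 # position i predicts token i
```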


Caption Detection Algorithm Using Temporal Information in Video (동영상에서 시간 영역 정보를 이용한 자막 검출 알고리듬)

  • 권철현;신청호;김수연;박상희
    • The Transactions of the Korean Institute of Electrical Engineers D / v.53 no.8 / pp.606-610 / 2004
  • A novel caption-text detection and recognition algorithm using the temporal nature of video is proposed in this paper. A text registration technique locates the temporal and spatial positions of captions in video from accumulated frame-difference information. Experimental results show that the proposed method is effective and robust, and a high processing speed is achieved since no time-consuming operations are involved.
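
A small NumPy sketch of the temporal cue used above: caption pixels stay put while the scene changes, so pixels that remain stable across accumulated frame differences mark candidate caption areas. Static background also survives this test, which is why the method combines it with text registration; the thresholds here are illustrative assumptions.

```python
import numpy as np

def caption_pixel_mask(frames, diff_thresh=10, stable_ratio=0.9):
    """frames: (N, H, W) gray-scale frames spanning the caption's duration.
    Returns a boolean mask of pixels that stay static across the sequence."""
    diffs = np.abs(np.diff(frames.astype(int), axis=0))   # accumulated differences
    stable = (diffs < diff_thresh).mean(axis=0)           # fraction of stable steps
    return stable >= stable_ratio                         # temporally static pixels
```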

A Study on Safety Information Provision for Workers using Virtual Reality-based Construction Site (Virtual Reality 기반의 가상 공사현장 구축을 통한 작업자 안전정보 제공 방안)

  • Park, Junwon;Lee, Sang-Ho;Kim, Sung-Hoon;Won, Jeong-Hun;Yoon, Young-Cheol
    • Journal of the Korean Society of Safety / v.35 no.1 / pp.45-52 / 2020
  • The construction industry has a higher industrial accident rate than other domestic industries. To reduce this rate, research on worker safety education combined with new technologies such as IT has increased. To provide workers with safety information, this study develops a VR (Virtual Reality)-based construction site using BIM (Building Information Modeling) data. The target structures and geographical features are included in the VR-based site, and construction machinery and worker models are created with a game engine. For effective delivery of safety information, video clips with captions matching each work process were produced with appropriate screen direction; they are linked to the correct worker operations to improve engagement, sense of reality, and interest. From this scenario, a 3D VR-based construction site that can be experienced through VR equipment was created, and on the same platform safety information was provided through the captioned video clips. Although real construction sites involve varying field conditions and workers of inconsistent expertise and experience, the developed VR-based safety information provision is expected to reduce the incomplete factors leading to construction accidents by improving workers' perception of workplace safety.