• Title/Summary/Keyword: Scene text

Search Result 117

Design of an Online Digital Storytelling System for Making a Cartoon Sketch into a Motion Picture (만화적 스케치의 동영상화를 이용한 온라인 디지털 스토리텔링 시스템 설계)

  • 남양희;이상곤
    • Journal of Korea Multimedia Society
    • /
    • v.5 no.4
    • /
    • pp.434-440
    • /
    • 2002
  • This paper proposes a digital storytelling system in which a user's simple online sketch is animated in 3D. To help users focus on their story development, the proposed system provides sketch guidelines that convey 3D scene structure, and publishes the final result as animated scenes with story text unfolding over time. The system is web-based, so the authoring process can also be shared with others.


Character Recognition Using Projection in Video (비디오에서 프로젝션을 이용한 문자 인식)

  • Baek, Jeong-Uk;Shin, Seong-Yoon;Rhee, Yang-Won
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2009.10a
    • /
    • pp.196-197
    • /
    • 2009
  • In video, character recognition is performed on key frames generated by scene-change detection, using projection profiles. Individual characters are separated by vertical projection. Each Hangul syllable is decomposed into its initial (cho-sung), medial (jung-sung), and final (jong-sung) phonemes, which are classified into six pattern types by horizontal projection. Phonemes are then projected in the horizontal, vertical, diagonal, and reverse-diagonal directions and recognized using this 4-direction projection together with location information.
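The vertical-projection segmentation step the abstract describes can be sketched as follows; this is an illustrative reconstruction, not the authors' code:

```python
import numpy as np

def vertical_projection_segments(binary):
    """Split a binary text-line image into character segments.

    Columns whose vertical projection (sum of foreground pixels) is zero
    are treated as gaps between characters.
    """
    profile = binary.sum(axis=0)          # vertical projection per column
    segments, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a character segment begins
        elif count == 0 and start is not None:
            segments.append((start, x))    # segment ends at a blank column
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments
```

The same idea, applied along axis 1, gives the horizontal projection used for phoneme-type classification.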


A Comparison of Deep Neural Network based Scene Text Detection with YOLO and EAST (이미지 속 문자열 탐지에 대한 YOLO와 EAST 신경망의 성능 비교)

  • Park, Chan-Yong;Lee, Gyu-Hyun;Lim, Young-Min;Jeong, Seung-Dae;Cho, Young-Heuk;Kim, Jin-Wook
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.05a
    • /
    • pp.422-425
    • /
    • 2021
  • In this paper, we apply the YOLO and EAST neural networks, which have recently been used in many fields, to the problem of detecting text strings in images, and compare and analyze their performance. YOLO models up to v3 were known to perform poorly at detecting text regions in images, but we confirm that the recently released YOLOv4 and YOLOv5 show excellent performance in detecting Korean and English text strings in images of various forms, and we expect them to be widely used in text recognition in the future.
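The abstract reports a performance comparison between detectors; a standard way to score text detectors such as YOLO and EAST against ground truth is IoU-based matching. The sketch below assumes axis-aligned `(x1, y1, x2, y2)` boxes and a 0.5 threshold, which are common conventions rather than details from the paper:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def detection_recall(predictions, ground_truth, threshold=0.5):
    """Fraction of ground-truth text boxes matched by some prediction."""
    hits = sum(1 for g in ground_truth
               if any(iou(p, g) >= threshold for p in predictions))
    return hits / len(ground_truth) if ground_truth else 0.0
```

Running both detectors over the same labeled images and comparing recall (and precision) at a fixed IoU threshold is the usual basis for comparisons like the one described.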

A Study on the OCR of Korean Sentences Using Deep Learning (딥러닝을 활용한 한글문장 OCR연구)

  • Park, Sun-Woo
    • Annual Conference on Human and Language Technology
    • /
    • 2019.10a
    • /
    • pp.470-474
    • /
    • 2019
  • To improve Korean OCR performance, we used deep learning models to improve the character-recognition stage. In this paper, we generated Korean sentence image data for training deep learning models using fonts and dictionary data, and experimented with various model combinations to raise OCR performance on Korean sentences. The deep learning models follow the STR (Scene Text Recognition) framework, with 24 model combinations across the transformation, feature-extraction, sequence, and prediction modules. In our OCR experiments, the combination best suited to Korean sentences used a transformation module together with BiLSTM and attention in the sequence and prediction modules, outperforming the other combinations. Compared with previous Korean OCR research, this work extends the scope from the character level to the sentence level and uses the types of data frequently found in real document images, which raises its applicability to practical applications.
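The 24-combination grid can be illustrated as below. The concrete module options are assumptions drawn from the common four-stage STR framework, not values confirmed by the paper:

```python
from itertools import product

# Four STR modules named in the abstract; the option lists are illustrative.
MODULES = {
    "transformation": ["None", "TPS"],
    "extraction": ["VGG", "RCNN", "ResNet"],
    "sequence": ["None", "BiLSTM"],
    "prediction": ["CTC", "Attn"],
}

def all_combinations():
    """Enumerate every model combination across the four modules."""
    return list(product(*MODULES.values()))

def best_combination_candidates():
    """Filter to combinations matching the abstract's reported winner:
    a transformation module plus BiLSTM sequence and attention prediction."""
    return [c for c in all_combinations()
            if c[0] != "None" and c[2] == "BiLSTM" and c[3] == "Attn"]
```

With these assumed options the grid has 2 x 3 x 2 x 2 = 24 entries, matching the count the abstract reports.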


FMM: Fusion media middleware for actual feeling service (실감 서비스 제공을 위한 융합 미디어 미들웨어)

  • Lee, Ji-Hye;Yoon, Yong-Ik
    • Journal of Korea Multimedia Society
    • /
    • v.13 no.2
    • /
    • pp.308-315
    • /
    • 2010
  • In the Web 2.0 environment, internet users actively exchange user-generated content (UGC). As content-sharing sites have grown, the amount of content produced by non-experts has increased, but such content is typically simple recorded media. To add a sense of realism, such as effects and actions, to non-experts' content, we propose Fusion Media Middleware (FMM). FMM increases user satisfaction by providing this sense of realism, turning simple content into richer media with emotional impact. FMM classifies the input media into scenes based on MPEG-7, and adds realism by inserting effects such as sound, images, and text among the classified scenes. Using the BSD code of MPEG-21, FMM links the input media with the effects, and through BSD-code mapping it controls the synchronization between media and effects. With FMM, non-experts' content gains value as multimedia with a sense of realism, and FMM creates a new flow of media circulation.
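The scene/effect synchronization FMM performs can be sketched as a minimal data model; the names and fields below are illustrative assumptions, not the middleware's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Effect:
    kind: str        # e.g. "sound", "image", "text"
    at: float        # offset (seconds) from the start of its scene

@dataclass
class Scene:
    start: float     # scene boundaries, e.g. from MPEG-7-style segmentation
    end: float
    effects: list = field(default_factory=list)

def attach_effect(scenes, effect_kind, timestamp):
    """Attach an effect to whichever scene contains the given timestamp,
    storing the effect's offset relative to the scene start (the role the
    abstract assigns to MPEG-21 BSD-code mapping)."""
    for scene in scenes:
        if scene.start <= timestamp < scene.end:
            scene.effects.append(Effect(effect_kind, timestamp - scene.start))
            return scene
    return None
```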

Study on Extracting Filming Location Information in Movies Using OCR for Developing Customized Travel Content (맞춤형 여행 콘텐츠 개발을 위한 OCR 기법을 활용한 영화 속 촬영지 정보 추출 방안 제시)

  • Park, Eunbi;Shin, Yubin;Kang, Juyoung
    • The Journal of Bigdata
    • /
    • v.5 no.1
    • /
    • pp.29-39
    • /
    • 2020
  • Purpose The atmosphere of respect for individual tastes that has spread throughout society has changed consumption trends. As a result, the travel industry also sees customized travel as a new trend reflecting consumers' personal tastes. In particular, there is growing interest in 'film-induced tourism', one area of the travel industry. We hope to satisfy individuals' motivation for traveling sparked while watching movies with customized travel proposals, which we expect to be a catalyst for the continued development of the film-induced tourism industry. Design/methodology/approach In this study, we implemented an OCR-based methodology for extracting and suggesting the filming-location information that viewers want to visit. First, we extract a scene from a movie selected by the user using OpenCV, a real-time image-processing library. We then detect the location of text in the scene image using the EAST model, a deep-learning-based text-area detection model. The detected images are preprocessed using OpenCV built-in functions to increase recognition accuracy. Finally, after converting the text in the images into machine-readable form using Tesseract, an optical character recognition engine, the Google Maps API returns the actual location information. Significance This research is significant in that it provides personalized tourism content using fourth-industrial-revolution technology, in addition to existing film tourism. It could be used in the development of film-induced tourism packages with travel agencies in the future. It also implies the possibility of being used for inbound tourism from abroad as well as outbound travel.
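The four-stage pipeline described under Design/methodology/approach can be sketched with each stage injected as a function, so that the real components (OpenCV frame extraction, the EAST detector, Tesseract, a geocoding API) could be slotted in. Everything here is an illustrative skeleton, not the study's code:

```python
def filming_location_pipeline(frame, detect_text, preprocess, recognize, geocode):
    """Chain the four stages the study describes:
    text-area detection -> preprocessing -> OCR -> geocoding."""
    regions = detect_text(frame)                           # e.g. EAST boxes
    texts = [recognize(preprocess(frame, r)) for r in regions]
    return [geocode(t) for t in texts if t]                # skip empty OCR output
```

Dependency injection keeps the pipeline testable without model weights or API keys; in production the stubs would be replaced by `cv2.dnn` EAST inference, Tesseract, and a Google Maps client.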

Automatic Text Extraction from News Video using Morphology and Text Shape (형태학과 문자의 모양을 이용한 뉴스 비디오에서의 자동 문자 추출)

  • Jang, In-Young;Ko, Byoung-Chul;Kim, Kil-Cheon;Byun, Hye-Ran
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.8 no.4
    • /
    • pp.479-488
    • /
    • 2002
  • In recent years the amount of digital video has risen dramatically to keep pace with the increasing use of the Internet, so an automated method is needed for indexing digital video databases. Textual information, both superimposed text and embedded scene text, appearing in a digital video can be a crucial clue for video indexing. In this paper, a new method is presented to extract both superimposed and embedded scene text from a freeze-frame of news video. The algorithm consists of the following three steps. In the first step, a color image is converted into a gray-level image and contrast stretching is applied to enhance the contrast of the input image. A modified local adaptive thresholding is then applied to the contrast-stretched image. The second step comprises three processes: eliminating text-like components by applying erosion, dilation, and (OpenClose+CloseOpen)/2 morphological operations; maintaining text components using the (OpenClose+CloseOpen)/2 operation with a new Geo-correction method; and subtracting the two resulting images to further eliminate false-positive components. In the third, filtering step, the characteristics of each component are used, such as the ratio of the number of pixels in each candidate component to the number of its boundary pixels, and the ratio of the minor to the major axis of its bounding box. Acceptable results have been obtained with the proposed method on 300 news images, with a recognition rate of 93.6%. Our method also performs well on various kinds of images when the size of the structuring element is adjusted.
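The (OpenClose+CloseOpen)/2 operation from the second step can be sketched with plain NumPy grey-scale morphology over a 3x3 structuring element; this is a minimal reconstruction under that assumed element size, not the authors' implementation:

```python
import numpy as np

def _neighborhoods(img):
    """Stack the nine 3x3-shifted copies of the image (edge-padded)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def dilation(img):
    return _neighborhoods(img).max(axis=0)   # grey-scale dilation

def erosion(img):
    return _neighborhoods(img).min(axis=0)   # grey-scale erosion

def opening(img):
    return dilation(erosion(img))

def closing(img):
    return erosion(dilation(img))

def open_close_average(img):
    """The (OpenClose + CloseOpen) / 2 smoothing used in the second step."""
    return (closing(opening(img)) + opening(closing(img))) / 2.0
```

Averaging the two orderings suppresses both small bright and small dark artifacts while preserving larger structures such as character strokes.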

The Development of Real-time Video Associated Data Service System for T-DMB (T-DMB 실시간 비디오 부가데이터 서비스 시스템 개발)

  • Kim Sang-Hun;Kwak Chun-Sub;Kim Man-Sik
    • Journal of Broadcast Engineering
    • /
    • v.10 no.4 s.29
    • /
    • pp.474-487
    • /
    • 2005
  • T-DMB (Terrestrial Digital Multimedia Broadcasting) adopted the MPEG-4 BIFS (Binary Format for Scenes) Core2D scene-description profile and graphics profile as the standard for its video associated data service. Using BIFS, objects such as text, still images, circles, and polygons can be overlaid on the main display of the receiver according to properties designated on the broadcasting side, and clickable buttons and website links can be attached to desired objects. A variety of interactive data services can therefore be provided through BIFS. In this paper, we implement a real-time video associated data service system for T-DMB. Our system places emphasis on real-time data service driven by user operation, and on interworking and stability with our previously developed video encoder. The system consists of a BIFS Real-time System, an Automatic Stream Control System, and a Receiving Monitoring System. Its basic functions are designed to reflect T-DMB programs and the characteristics of the program-production environment as a top priority. The developed system was used in a BIFS trial service via KBS T-DMB, and it is expected to be used in the main T-DMB service after improvements such as strengthening system stability.

A Semiotic Analysis of Spirited Away (애니메이션(센과 치히로의 행방불명)에 대한 기호학적분석)

  • Lee Yun-Hui
    • Broadcasting and Media Magazine
    • /
    • v.10 no.1
    • /
    • pp.99-112
    • /
    • 2005
  • Christian Metz, the precursor of cine-semiology, considered cinema a language in the sense that it is a set of messages grounded in a given matter of expression, and a signifying practice characterized by specific codifications. According to Metz, film forms a structured network produced by the interweaving of cinematic codes, within which cinematic subcodes represent specific usages of a particular code. For Metz, cinematic language is the totality of cinematic codes and subcodes, and the history of cinema is the trace of the competition, incorporation, and exclusion of subcodes. He also suggested that a filmic text is not just a list of codes in effect, but a process of constant displacement and deformation of codes. Following Metz's textual-analysis methodology, I investigated the formal configuration of Hayao Miyazaki's animation, Spirited Away. It is interesting to trace the interweaving of cinematic codes in Spirited Away, i.e. codes of lighting, color, movement, and auteurism, across the animation. I focused on the first scene at the bridge to Yubaba's bathhouse, analyzing each cinematic code and the subcode applied. The first bridge scene is carefully constructed to highlight the confrontation between Chihiro (with Haku) and the bathhouse. The bathhouse is not just a building; it represents the powerful witch, Yubaba, yet to appear on the scene, and functions as an antipode to Chihiro. In each shot, every subcode within the codes of framing, direction, angle, color, lighting, and movement is used to maximize the contrast between the dominant bathhouse and the feeble 10-year-old girl. In Spirited Away, the subcodes within each cinematic code are constantly competing and displacing each other to augment the antithesis between the characters and develop the narrative. In line with Metz's argument that film constitutes a quasi-linguistic practice as a pluricodic medium, Spirited Away communicates with spectators through the combination and displacement of these cinematic codes and subcodes.

A Study on Extraction of Text Regions Using Shape Analysis of Text in Natural Scene Images (자연영상에서 문자의 형태 분석을 이용한 문자영역 추출에 관한 연구)

  • Yang, Jae-Ho;Han, Hyun-Ho;Kim, Ki-Bong;Lee, Sang-Hun
    • Journal of the Korea Convergence Society
    • /
    • v.9 no.11
    • /
    • pp.61-68
    • /
    • 2018
  • In this paper, we propose a text-detection method that uses image enhancement and analysis of character shape to detect text in natural images acquired in everyday life. The proposed method emphasizes object boundaries using an unsharp mask in order to improve the detection rate of regions that should be recognized as text in a natural image. Using the enhanced object boundaries, candidate text regions are detected with Maximally Stable Extremal Regions (MSER). To identify the regions that actually contain text among the candidates, the shape of each region is analyzed, and non-text regions lacking character-like characteristics are removed, which increases the detection rate for actual text regions. For an objective evaluation, we compare the detection rate and accuracy of text regions against existing methods. Experimental results show that the proposed method improves both the detection rate and the accuracy of text regions over existing text-detection methods.
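The shape analysis used to filter MSER candidates can be sketched as a predicate over simple region statistics; the thresholds below are illustrative assumptions, not the paper's values:

```python
def is_text_like(region, min_aspect=0.1, max_aspect=10.0,
                 min_fill=0.1, max_fill=0.95):
    """Keep candidate regions whose bounding-box aspect ratio and fill
    ratio (foreground pixels / box area) look like characters.

    `region` is a (width, height, pixel_count) tuple.
    """
    w, h, pixels = region
    if w == 0 or h == 0:
        return False                  # degenerate box: never a character
    aspect = w / h
    fill = pixels / (w * h)
    return (min_aspect <= aspect <= max_aspect
            and min_fill <= fill <= max_fill)
```

A solid block (fill near 1.0) or an extremely elongated strip (extreme aspect ratio) is rejected, since characters contain internal background and stay within moderate proportions.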