• Title/Summary/Keyword: Visual comprehension


A Novel Two-Stage Training Method for Unbiased Scene Graph Generation via Distribution Alignment

  • Dongdong Jia;Meili Zhou;Wei WEI;Dong Wang;Zongwen Bai
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.12
    • /
    • pp.3383-3397
    • /
    • 2023
  • Scene graphs serve as semantic abstractions of images and play a crucial role in enhancing visual comprehension and reasoning. However, the performance of Scene Graph Generation is often compromised when working with biased data in real-world situations. While many existing systems focus on a single stage of learning for both feature extraction and classification, some employ Class-Balancing strategies, such as Re-weighting, Data Resampling, and Transfer Learning from head to tail. In this paper, we propose a novel approach that decouples the feature extraction and classification phases of the scene graph generation process. For feature extraction, we leverage a transformer-based architecture and design an adaptive calibration function specifically for predicate classification. This function enables us to dynamically adjust the classification scores for each predicate category. Additionally, we introduce a Distribution Alignment technique that effectively balances the class distribution after the feature extraction phase reaches a stable state, thereby facilitating the retraining of the classification head. Importantly, our Distribution Alignment strategy is model-independent and does not require additional supervision, making it applicable to a wide range of SGG models. Using the scene graph diagnostic toolkit on Visual Genome and several popular models, we achieved significant improvements over the previous state-of-the-art methods with our model. Compared to the TDE model, our model improved mR@100 by 70.5% for PredCls, by 84.0% for SGCls, and by 97.6% for SGDet tasks.
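  The abstract does not give the form of the adaptive calibration function. As a rough illustration of the general idea (adjusting per-predicate classification scores once the feature extractor is stable), the sketch below uses a common stand-in: subtracting a scaled log class prior from each score. The function name `calibrate_scores`, the parameter `tau`, and the toy predicate counts are all assumptions made for illustration, not the authors' method.

```python
import math

def calibrate_scores(logits, class_counts, tau=1.0):
    """Subtract tau * log(prior) from each logit.

    Hypothetical stand-in for the paper's calibration function,
    which the abstract does not specify.
    """
    total = sum(class_counts)
    priors = [count / total for count in class_counts]
    return [z - tau * math.log(p) for z, p in zip(logits, priors)]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Toy example: three predicates with a long-tailed training distribution.
predicates = ["on", "holding", "parked on"]
counts = [9000, 900, 100]   # assumed training counts, head to tail
logits = [2.0, 1.5, 1.2]    # assumed raw scores from a biased classifier

raw_choice = predicates[argmax(logits)]
calibrated = calibrate_scores(logits, counts, tau=1.0)
calibrated_choice = predicates[argmax(calibrated)]
print(raw_choice, "->", calibrated_choice)
```

  With `tau=0` the scores are left unchanged; larger values shift predictions further toward rare predicates, which is the kind of trade-off a calibration step has to tune.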

Visualization of Linear Algebra concepts with Sage and GeoGebra (Sage와 GeoGebra를 이용한 선형대수학 개념의 Visual-Dynamic 자료 개발과 활용)

  • Lee, Sang-Gu;Jang, Ji-Eun;Kim, Kyung-Won
    • Communications of Mathematical Education
    • /
    • v.27 no.1
    • /
    • pp.1-17
    • /
    • 2013
  • This work began from observing how recent students understand Linear Algebra. To help them grasp its concepts, we added visualization tools and developed most of the tools needed to teach a Linear Algebra class. Visualizing Linear Algebra concepts not only aids understanding but also arouses students' interest in the subject, which in turn encourages them to experiment with the concepts and discover results on their own. Visual materials should therefore be prepared thoroughly and should let students study the visual objects themselves, rather than merely look at static pictures. To this end, we selected GeoGebra, which is well suited to dynamic visualization, and Sage for algebraic computation. We found that this combination is appropriate for implementing visualizations, and we used it to produce a variety of visual materials, a new kind of visual object, for undergraduate mathematics classes. We visualized as many of the important concepts of Linear Algebra as we could, following the order of the textbook. Beyond offering static visual data for understanding, we prepared an environment in which students can study visual objects and create new knowledge. We found that our experience with Sage and GeoGebra visualizations in our Linear Algebra class can be effectively adapted to other university mathematics classes. We expect this contribution to have a positive effect on school mathematics education as well as on other university lectures.

A Study on the set-up of Time Range for Typology of Space Observation Characters (공간주시특성의 유형화를 위한 시간범위설정에 관한 연구)

  • Kim, Jong-Ha;Jung, Jae-Young
    • Korean Institute of Interior Design Journal
    • /
    • v.21 no.4
    • /
    • pp.87-95
    • /
    • 2012
  • This study analyzes which spatial elements users pay more attention to, in terms of visual perception, when observing the lobby of a public space. It focuses on the process of classifying observation characteristics into types. During observation, the subjects became interested in the contextual cues for space perception and in the detailed features that drew their attention. By understanding and classifying these observation characteristics, I could analyze how the subjects observed the space. First, over nine successive observations, each subject observed for 0.32 seconds to gain visual perception of the space, but spent another 0.39 seconds exploring for another observation target or roaming the space. In other words, subjects in the public-space lobby selected for this experiment spent more time exploring the space than concentrating on a single point in it. Second, I analyzed the typology process through the time range. Because the subjects' frequency varied with how the time range was set, the time range used to analyze observation characteristics needs to be set more objectively. Third, when observation characteristics were analyzed in 10-second time ranges, concentration in the beginning and the middle accounted for 25% and concentration in the beginning and the final for 41.7%; when concentration in the beginning alone is added, 75% of the subjects concentrated at the beginning of the observation time. Fourth, type 3, categorized as "concentration in the beginning and the middle," is the group to which 47.1 percent of the subjects belong; each subject concentrated 1.1 times in the beginning and 2.1 times in the final, which showed that concentration in the final was 1.75 times as high as in the beginning.


Analysis on Video Image Effect in <장한가>, China's Performing Arts Work of Cultural Tourism (중국의 문화관광 공연작품 <장한가>에 나타난 영상이미지 효과 분석)

  • Yook, Jung-Hak
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.6
    • /
    • pp.77-85
    • /
    • 2013
  • This study analyzes the effects that video images have on the performance of <장한가>, staged in Seo-an and billed as China's first large-scale historical dance drama. It focuses on which video images are used, and to what effect, in presenting the specific themes and materials of <장한가>. An image is the 'reflection of an object', as in movies, television, dictionaries, and so on, and the term covers a wide range. The root of the word 'image' lies in imitari, and it refers specifically to a mental, visual representation. In other words, 'video image' can be regarded as a combination of two synonymous words, 'video' and 'image'. Video is not merely understood in terms of traditional art genres, such as literary value, theatrical quality, and the artistry of the scenario; it is a whole product that integrates the original functions of all kinds of art and connects them to the subtle image-making of human beings. The effects of video images in <장한가> are as follows: first, an expressive effect of connotative meaning, reflecting the spirit of the age and its culture; second, imaginary identification; third, scene transformation; fourth, dramatic interest through immersion; and last but not least, a visual effect arising from the dimensionality of the performance.

Hybrid Learning for Vision-and-Language Navigation Agents (시각-언어 이동 에이전트를 위한 복합 학습)

  • Oh, Suntaek;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.9
    • /
    • pp.281-290
    • /
    • 2020
  • The Vision-and-Language Navigation (VLN) task is a complex intelligence problem that requires both visual and language comprehension skills. In this paper, we propose a new learning model for vision-and-language navigation agents. The model adopts hybrid learning that combines imitation learning based on demonstration data with reinforcement learning based on action rewards. It can therefore address both the tendency of imitation learning to be biased toward the demonstration data and the relatively low data efficiency of reinforcement learning. In addition, the proposed model uses a novel path-based reward function designed to solve the problem of existing goal-based reward functions. We demonstrate the high performance of the proposed model through various experiments using the Matterport3D simulation environment and the R2R benchmark dataset.
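  How the two learning signals are combined is not stated in the abstract. One common way to mix them, shown here only as a minimal sketch, is a weighted sum of an imitation term (cross-entropy against the demonstrated action) and a REINFORCE-style policy-gradient term. The function names, the `il_weight` parameter, and the toy numbers are assumptions for illustration; the paper's path-based reward is not modeled, and `reward` is just a placeholder scalar.

```python
import math

def imitation_loss(action_probs, expert_action):
    """Cross-entropy against the demonstrated (expert) action."""
    return -math.log(action_probs[expert_action])

def reinforce_loss(action_probs, sampled_action, reward, baseline=0.0):
    """REINFORCE-style surrogate loss: -(reward - baseline) * log pi(a)."""
    return -(reward - baseline) * math.log(action_probs[sampled_action])

def hybrid_loss(action_probs, expert_action, sampled_action, reward, il_weight=0.5):
    """Weighted sum of both terms; the weighting scheme is an assumption."""
    il = imitation_loss(action_probs, expert_action)
    rl = reinforce_loss(action_probs, sampled_action, reward)
    return il_weight * il + (1.0 - il_weight) * rl

# Toy step: the policy's distribution over three navigation actions.
probs = [0.7, 0.2, 0.1]
loss = hybrid_loss(probs, expert_action=0, sampled_action=1, reward=1.0, il_weight=0.5)
print(round(loss, 4))
```

  Setting `il_weight=1.0` reduces this to pure imitation learning and `il_weight=0.0` to pure reinforcement learning, which makes the trade-off between demonstration bias and data efficiency explicit.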

Reaction Test Platform and Application by Auditory and Visual Stimulus for Language Learning Ability Improvement (언어 학습 능력 향상을 위한 청각 및 시각 자극에 대한 반응속도 측정 플랫폼과 응용)

  • Lee, Hye-Ran;Beak, Seung-Hyun
    • Journal of Internet Computing and Services
    • /
    • v.11 no.1
    • /
    • pp.77-84
    • /
    • 2010
  • Children who have a language disorder have difficulty expressing their reactions to sound and visual stimuli, so it is very hard to tell whether they recognize an external stimulus. To address this problem, we present sound and visual stimuli through the Audio and Visual Stimulus and Reaction Meter System, measure response time, and have the children choose among stimuli. In addition, repeated study based on the results can help them improve their response time, so that they recognize and choose stimuli faster and without aversion to external stimuli. This should keep them from feeling uncomfortable and isolated because of their unfamiliarity with external stimuli.

A Narrative Strategy of Storytelling Advertising Videos: Heineken's Case

  • Byun, Chan-Bok
    • Culinary science and hospitality research
    • /
    • v.22 no.1
    • /
    • pp.9-18
    • /
    • 2016
  • The purpose of this paper was to explore the narrative strategy of storytelling advertisement videos for the beer brand Heineken. Heineken was one of the most active advertisers and had made very impressive ad videos. The author selected the five story-driven advertisement videos most frequently watched by Internet viewers: "The Insider", "Odyssey 2011", "Heineken lip gross", "Italy Activation Milan AC vs. Real Madrid", and "the Match". The five selected videos have a 90-second running time. The target videos were watched repeatedly, and the expected key image cuts and key verbal copies were captured as well. To categorize the narrative structure and key copies of each video, Fog, Budtz & Yakaboylu's four-element model of storytelling and Gustav Freytag's three-act structure or five-stage model of a plot were used as underlying theories. Most of the ad videos had clear boundaries between the stages of the plot and used emotional appeals, including humor and sexual appeals. This paper found that the target videos used visual rhetoric to enhance viewers' persuasion and comprehension. It also revealed that the target videos used football matches as a form of emotional engagement to bond ad viewers closely with Heineken.

The Role of Syntactic Cues in Pronoun Referential Resolution: The Effects of Number Cue and Gender Cue (대명사의 통사단서가 참조해결과정에 미치는 효과: 대명사의 수 단서와 성별 단서)

  • Lee Jae-Ho
    • Korean Journal of Cognitive Science
    • /
    • v.15 no.3
    • /
    • pp.25-33
    • /
    • 2004
  • Two experiments were conducted to investigate the effects of two syntactic cues in pronoun referential resolution: a number cue (plural or singular) and a gender cue (unambiguous or ambiguous). Using a self-paced sentence reading task for pronoun sentences and a lexical decision task for antecedents, Experiment 1 showed that the reading time for a plural pronoun ('they') was faster than for a singular pronoun ('he' or 'she'), but the lexical decision time did not differ with either the number cue or the gender cue. In Experiment 2, using RSVP for pronoun sentences and a lexical decision task for antecedents, the results showed that the lexical decision time differed for the gender cue only. These results suggest that the syntactic cues of a pronoun strongly influence referential resolution in discourse comprehension.


A Visual Study of the Phonemic Awareness (음소인지에 관한 시각적 연구)

  • Park, Heesuk
    • Journal of Digital Contents Society
    • /
    • v.16 no.2
    • /
    • pp.219-225
    • /
    • 2015
  • This experimental study aims to understand Korean subjects' phonemic awareness of English minimal pairs. For the experiment, English listening comprehension tests were designed using minimal pairs and administered to the subjects, and the test results were analyzed with the help of spectrograms. The results showed three important things. First, the subjects have difficulty understanding and distinguishing English vowel minimal pairs. Second, among the English vowel minimal pairs, they had particular difficulty distinguishing between /ə:/ and /ɔ:/. Third, the subjects could recognize the semivowel /w/ in words without any difficulty. In addition, I tried to analyze the results using spectrograms, which help to educate students effectively.

A Study on the Symbolism of Costumes Appeared in Aflred Hitchcock′s Film (알프레드 히스콕의 스릴러 영활에 나타난 복식의 상징성)

  • 이효진;류근영
    • The Research Journal of the Costume Culture
    • /
    • v.9 no.2
    • /
    • pp.259-276
    • /
    • 2001
  • Hitchcock, "a master of the thriller" and a "leading figure of the thriller", was famous for his working style. He never started filming until he had completed a perfect conception in his mind, with a filming plan that included a picture of every detailed shot. Film costumes were no exception. He placed more weight on expressing symbolic meaning, by recreating costumes to fit the subject of the drama, than on the function of reflecting contemporary fashion; he also had actors wear costumes examined in advance in order to shape a definite visual character in each work. This research intends not only to examine the symbolic and expressive means of dress through the costumes in Hitchcock's thriller films, but also to broaden understanding by grasping the importance of film costume and dress image in film. Hitchcock made about 55 films, mostly thrillers, from Number 13 (1922), an unfinished work, to Family Plot (1976), his final work. This research examined works from the second half of his career (after 1950) that are generally familiar to the public, such as Rear Window (1954), Vertigo (1958), Psycho (1960), Torn Curtain (1966), Topaz (1969), and Frenzy (1972). In conclusion, we find that costumes in thriller films, as in other films, express the characters' nature, social rank, economic level, and personality. In particular, however, the costumes of Hitchcock's thriller films can be said to contain five kinds of characteristic factors.
