• Title/Summary/Keyword: Visual task

Image classification and captioning model considering a CAM-based disagreement loss

  • Yoon, Yeo Chan;Park, So Young;Park, Soo Myoung;Lim, Heuiseok
    • ETRI Journal / v.42 no.1 / pp.67-77 / 2020
  • Image captioning has received significant interest in recent years, and notable results have been achieved. Most previous approaches have focused on generating visual descriptions from images, whereas a few approaches have exploited visual descriptions for image classification. This study demonstrates that a good performance can be achieved for both description generation and image classification through an end-to-end joint learning approach with a loss function, which encourages each task to reach a consensus. When given images and visual descriptions, the proposed model learns a multimodal intermediate embedding, which can represent both the textual and visual characteristics of an object. The performance can be improved for both tasks by sharing the multimodal embedding. Through a novel loss function based on class activation mapping, which localizes the discriminative image region of a model, we achieve a higher score when the captioning and classification model reaches a consensus on the key parts of the object. Using the proposed model, we established a substantially improved performance for each task on the UCSD Birds and Oxford Flowers datasets.
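The abstract above relies on class activation mapping (CAM), which weights the final convolutional feature maps by a class's classifier weights to localize the discriminative region. The following is a minimal NumPy sketch of that standard CAM computation. The `disagreement` function is a hypothetical illustration of how two maps could be compared; it is an assumption and not the loss defined in the paper, whose exact form the abstract does not give.

```python
import numpy as np


def class_activation_map(features, weights, class_idx):
    """Standard CAM: weighted sum of feature maps for one class.

    features: array of shape (K, H, W), the last convolutional feature maps.
    weights:  array of shape (C, K), classifier weights applied after
              global average pooling.
    """
    # Contract the channel axis: sum_k weights[class_idx, k] * features[k]
    return np.tensordot(weights[class_idx], features, axes=([0], [0]))


def normalize_map(m, eps=1e-8):
    """Shift a map to be non-negative and scale it to sum to about 1."""
    m = m - m.min()
    return m / (m.sum() + eps)


def disagreement(map_a, map_b):
    """Hypothetical disagreement score (an assumption, not the paper's loss):
    mean squared difference between the two normalized maps."""
    a = normalize_map(map_a)
    b = normalize_map(map_b)
    return float(np.mean((a - b) ** 2))
```

Under this sketch, two models that attend to the same region produce a score near zero, and the score grows as their maps diverge.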

Multiple Object-Based Design Model for Quality Improvement of User Interface (사용자 인터페이스 품질 향상을 위한 다중 객체 기반 설계 모델)

  • Kim Jeong-Ok;Lee Sang-Young
    • The KIPS Transactions:PartD / v.12D no.7 s.103 / pp.957-964 / 2005
  • With the rapid growth of the web environment, user interface design needs to support complex interactions between humans and computers. In this paper, we suggest an object modeling method for quality improvement of the user interface. We propose four object modeling phases for business events, namely business event object modeling, task object modeling, transaction object modeling, and form object modeling, to enhance the visual cohesion of the UI. As a result, these four phases enhance the visual cohesion of the user interface prototype. We found that the visual cohesion of business events becomes stronger and that unskilled designers can develop a qualified user interface prototype. The method also improves understanding of business tasks and reduces the number of prototype system development iterations.

Evaluation of fatigue by Analysis of Relation between Subjective Rating Score and Working Performance with Color Temperature (주관평가와 작업수행도의 상관관계 분석에 의한 조명 색온도에서의 피로도 평가)

  • 양희경;고한우;김묘향;임석기;윤용현
    • Science of Emotion and Sensibility / v.4 no.2 / pp.63-68 / 2001
  • This study evaluates fatigue during an error-correction task performed on a monitor under color temperatures of 2700 K, 4000 K, and 6500 K. Subjective feelings of visual fatigue, mental fatigue, and concentration were collected by questionnaire. At 2700 K, visual and mental fatigue levels were the lowest, the concentration level was the highest, and working performance was the best. At 6500 K, the mental fatigue level was the highest and the concentration level was the lowest. At 4000 K, the visual fatigue level was the highest and working performance showed the worst ratio of correct answers. These results indicate that 2700 K is the best color temperature for performing the error-correction task.

Visual Telephone System of Differential Task Interrupt Method (차등 태스크 인터럽트 방식의 영상단말 시스템)

  • 박배욱;정하재;오창석
    • Journal of the Korea Institute of Information and Communication Engineering / v.6 no.5 / pp.739-746 / 2002
  • In this paper, a new visual telephone system with a differential task interrupt transfer feature for real-time video phone service is presented. Because interrupts are transferred at different speeds according to how time-critical each task is, the audio and video data streams can be kept flowing at a constant rate; in other words, video phone services are carried out in real time. First, the ITU-T H.32x visual telephone recommendations are analyzed. Second, the causes of the shortcomings of existing systems, such as performance and quality, are investigated. Third, design concepts and ideas that resolve them are devised. Finally, a new visual telephone system architecture for real-time video phone service is designed, which solves the existing problems by means of the differential task interrupt transfer method.

Effects of Correlated Color Temperature of LED Light Sources and a Fluorescent Light Source on Visual Performance (LED광원과 형광광원의 상관색온도가 시작업 성능에 미치는 영향)

  • Baik, Seung-Heon;Jeong, In-Young;Shin, Hwa-Young;Kim, Jeong-Tai
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers / v.23 no.1 / pp.18-26 / 2009
  • Recently, from an environmental conservation point of view, the need for high-efficiency lighting systems using LED (light-emitting diode) light sources has increased. However, LED light sources applied without regard to their color and pattern cause visual discomfort to occupants. The objective of this study is not only to evaluate task performance under different correlated color temperature conditions, but also to furnish preliminary data concerning the purpose and users of interior spaces. For this purpose, two types of LED light sources and a fluorescent light source were selected. Thirty undergraduate students served as participants. Two different task worksheets were used to evaluate accuracy and duration. For the error modification task, the LED light sources yielded 2.5% higher accuracy and 17.1% shorter duration than the fluorescent light source. For the reading task, a 20.6% decrease was observed with the LED light sources compared with the fluorescent light source.

Event-Related Potentials During the Visual Go/NoGo Task in Drug-Naive Boys with Attention-Deficit/Hyperactivity Disorder (약물 복용력이 없는 주의력결핍 과잉행동장애 남아의 시각적 Go/NoGo 과제 수행결과 및 수행시의 사건관련전위)

  • Kim, Kun-Woo;Lee, Jung-Sun;Park, Su-Bin;Hong, Jin-Pyo;Kim, Seong-Yoon;K.Yoo, Han-Ik
    • Journal of the Korean Academy of Child and Adolescent Psychiatry / v.20 no.2 / pp.61-67 / 2009
  • Objectives: The purpose of this study was to examine the performance and electrophysiological characteristics of drug-naive children with attention-deficit/hyperactivity disorder (ADHD) during the Go/NoGo task. Methods: Twenty-three boys with ADHD and 18 age-matched normal boys were recruited at a child psychiatric outpatient clinic in Seoul. All subjects were assessed by the Kiddie Schedules for Affective Disorders and Schizophrenia Present and Lifetime version. The investigator also assessed all subjects using the ADHD Rating Scale-IV (ADHDRS). Event-related potentials were recorded from 8 scalp electrodes during the visual Go/NoGo task. Results: Children with ADHD showed a larger mean of standard deviation of response time during the Go/NoGo task than normal children. The temporal N200 and P300 amplitudes were larger in children with ADHD relative to controls. The parietal N200 and P300 latencies were more prolonged in children with ADHD compared to normal controls. Conclusion: These results suggest that psychotropic-naive children with ADHD may have more variable performance ability, more difficulty in discriminating visual stimuli, and slower information processing speed than their normal age-matched counterparts.

Improving visual relationship detection using linguistic and spatial cues

  • Jung, Jaewon;Park, Jongyoul
    • ETRI Journal / v.42 no.3 / pp.399-410 / 2020
  • Detecting visual relationships in an image is important in image understanding. It enables higher-level image understanding tasks, such as predicting the next scene and understanding what occurs in an image. A visual relationship comprises a subject, a predicate, and an object, and is related to visual, language, and spatial cues. The predicate explains the relationship between the subject and object and can be grouped into categories such as prepositions and verbs. A large visual gap can exist even among visual relationships that share the same predicate. This study improves upon a previous study, which uses language cues with two losses and a spatial cue that includes only individual information, by adding relative information on the subject and object. An architectural limitation is demonstrated and overcome so that all zero-shot visual relationships can be detected. A new problem is discovered, and an explanation of how it decreases performance is provided. Experiments are conducted on the VRD and VG datasets, and a significant improvement over previous results is obtained.
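To make the idea of "relative information on the subject and object" concrete, here is a small sketch of relative spatial features computed from two bounding boxes. The feature set (normalized center offsets, log size ratios, and intersection over union) and the function names are assumptions chosen for illustration; the abstract does not specify which relative features the authors use.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def relative_spatial_features(subj, obj):
    """Hypothetical relative spatial cue between a subject box and an object box.

    Both boxes are (x1, y1, x2, y2) with positive width and height.
    Returns [dx, dy, log width ratio, log height ratio, IoU].
    """
    sw, sh = subj[2] - subj[0], subj[3] - subj[1]
    ow, oh = obj[2] - obj[0], obj[3] - obj[1]
    scx, scy = (subj[0] + subj[2]) / 2, (subj[1] + subj[3]) / 2
    ocx, ocy = (obj[0] + obj[2]) / 2, (obj[1] + obj[3]) / 2
    dx = (ocx - scx) / sw  # horizontal offset in units of subject width
    dy = (ocy - scy) / sh  # vertical offset in units of subject height
    return [dx, dy, math.log(ow / sw), math.log(oh / sh), iou(subj, obj)]
```

Unlike per-box coordinates, these values depend only on how the two boxes relate to each other, so they are unchanged when both boxes are translated together.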

Survey of Visual Search Performance Models to Evaluate Accuracy and Speed of Visual Search Tasks

  • Kee, Dohyung
    • Journal of the Ergonomics Society of Korea / v.36 no.3 / pp.255-265 / 2017
  • Objective: This study aims to survey visual search performance models for assessing and predicting individuals' visual tasks in everyday life and at industrial sites. Background: Visual search is one of the most frequently performed and critical activities in everyday life and work. Visual search performance models are needed when designing or assessing visual tasks. Method: This study was based mainly on a survey of the literature in ergonomics-related journals and on web searches. The keywords used in the survey included visual search, visual search performance, and visual search model. Results: On the basis of the purposes, development methods, and results of the models, this study categorized visual search performance models into six groups: probability-based models, SATO models, visual lobe-based models, computer vision models, neural network-based models, and detection time models. Major models in each category were presented with their advantages and disadvantages. Of the two factors characterizing visual tasks, accuracy and speed, more models adopted accuracy as the dependent variable. Conclusion: This study reviewed and summarized various visual search performance models. Application: The results can be used as a reference or tool when assessing visual tasks.

A Development of Façade Dataset Construction Technology Using Deep Learning-based Automatic Image Labeling (딥러닝 기반 이미지 자동 레이블링을 활용한 건축물 파사드 데이터세트 구축 기술 개발)

  • Gu, Hyeong-Mo;Seo, Ji-Hyo;Choo, Seung-Yeon
    • Journal of the Architectural Institute of Korea Planning & Design / v.35 no.12 / pp.43-53 / 2019
  • The construction industry has made great strides in the past decades by utilizing computer programs, including CAD. However, compared to other manufacturing sectors, labor productivity is low because of the high proportion of knowledge-based tasks performed by workers in addition to simple repetitive tasks. Therefore, the efficiency of workers' knowledge-based tasks should be improved by having computers recognize visual information. A computer needs a large amount of training data, such as that provided by the ImageNet project, to recognize visual information. This study aims to propose a building facade dataset that is efficiently constructed by quickly collecting building facade data through a portal site's road view and automatically labeling it using deep learning, as part of constructing image datasets for visual recognition by computers. Using the proposed method, we constructed a dataset for a part of Dongseong-ro, Daegu Metropolitan City, and analyzed its utility and reliability. The results confirmed that the computer could extract significant facade information from the portal site road view by recognizing the visual information in building facade images. Additionally, as a contribution to verifying the feasibility of constructing building image datasets, this study suggests the possibility of securing quantitative and qualitative facade design knowledge by extracting it from any facade in the world.

Visual servoing by a fuzzy reasoning method (퍼지추론에 의한 시각적 구동방법)

  • 김태원;서일홍;오상록
    • 제어로봇시스템학회:학술대회논문집 / 1991.10a / pp.984-989 / 1991
  • In this paper, a novel visual servoing method is proposed for eye-in-hand robots by employing a self-organizing fuzzy controller. To this end, a new Jacobian is defined that is a function not of the relative position of the object but of the image features only. Instead of obtaining an analytic form of the proposed Jacobian, a self-organizing fuzzy controller is then proposed to alleviate difficulties in real-time implementation. To show its validity, the proposed method is applied to a 2-dimensional visual servoing task.
