• Title/Summary/Keyword: attention mechanism

Research on Pairwise Attention Reinforcement Model Using Feature Matching (특징 매칭을 이용한 페어와이즈 어텐션 강화 모델에 대한 연구)

  • Joon-Shik Lim;Yeong-Seok Ju
    • Journal of IKEEE / v.28 no.3 / pp.390-396 / 2024
  • Vision Transformer (ViT) learns relationships between patches, but it may overlook important features such as color, texture, and boundaries, which can result in performance limitations in fields like medical imaging or facial recognition. To address this issue, this study proposes the Pairwise Attention Reinforcement (PAR) model. The PAR model takes both the training image and a reference image as input into the encoder, calculates the similarity between the two images, and matches the attention score maps of images with high similarity, reinforcing the matching areas of the training image. This process emphasizes important features between images and allows even subtle differences to be distinguished. In experiments using clock-drawing test data, the PAR model achieved a Precision of 0.9516, Recall of 0.8883, F1-Score of 0.9166, and an Accuracy of 92.93%. The proposed model showed a 12% performance improvement compared to API-Net, which uses the pairwise attention approach, and demonstrated a 2% performance improvement over the ViT model.
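
The pairwise reinforcement idea in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the use of cosine similarity on image embeddings, the `threshold` and `alpha` parameters, the multiplicative boost, and the toy vectors are all hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def reinforce_attention(train_map, ref_map, similarity, threshold=0.8, alpha=0.5):
    """Boost the training image's attention where the reference map is also high.

    Assumption: reinforcement is applied only to sufficiently similar pairs,
    and the result is renormalized so the scores still sum to 1.
    """
    if similarity < threshold:
        return list(train_map)  # dissimilar pair: leave the map unchanged
    boosted = [t * (1.0 + alpha * r) for t, r in zip(train_map, ref_map)]
    total = sum(boosted)
    return [b / total for b in boosted]

# Hypothetical embeddings and flattened attention score maps.
train_embedding = [0.9, 0.1, 0.4]
ref_embedding = [0.8, 0.2, 0.5]
train_attention = [0.5, 0.3, 0.2]
ref_attention = [0.6, 0.1, 0.3]

sim = cosine_similarity(train_embedding, ref_embedding)
reinforced = reinforce_attention(train_attention, ref_attention, sim)
```

In this toy case the two embeddings are highly similar, so the first region, which both maps rate highly, gains a larger share of the attention after renormalization.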

Region of Interest Detection Based on Visual Attention and Threshold Segmentation in High Spatial Resolution Remote Sensing Images

  • Zhang, Libao;Li, Hao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.8 / pp.1843-1859 / 2013
  • The continuous increase in the spatial resolution of remote sensing images poses a great challenge to image analysis and processing. Traditional prior knowledge-based region detection and target recognition algorithms for processing high resolution remote sensing images generally employ a global searching solution, which results in prohibitive computational complexity. In this paper, a more efficient region of interest (ROI) detection algorithm based on visual attention and threshold segmentation (VA-TS) is proposed, wherein a visual attention mechanism is used to avoid applying image segmentation and feature detection to the entire image. The input image is subsampled to decrease the amount of data and the discrete moment transform (DMT) feature is extracted to provide a finer description of the edges. The feature maps are combined with weights according to the number of "strong points" and "salient points". A threshold segmentation strategy is employed to obtain more accurate region of interest shape information with very low computational complexity. Experimental statistics show that the proposed algorithm is computationally efficient and provides more visually accurate detection results. The calculation time is only about 0.7% of that of the traditional Itti model.
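
The two steps named in this abstract, weighted combination of feature maps followed by threshold segmentation, can be sketched as below. This is a rough illustration under stated assumptions: the abstract does not say which thresholding rule is used, so Otsu's method is swapped in here, and the map values and the weights (standing in for counts of "strong points" and "salient points") are made up.

```python
def fuse_maps(maps, weights):
    """Weighted average of equally sized 2D feature maps."""
    total = sum(weights)
    height, width = len(maps[0]), len(maps[0][0])
    return [[sum(wt * m[y][x] for wt, m in zip(weights, maps)) / total
             for x in range(width)] for y in range(height)]

def otsu_threshold(values, levels=256):
    """Otsu's method: pick the level that maximizes between-class variance."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]                 # pixels with value <= t
        if w0 == 0:
            continue
        w1 = total - w0               # pixels with value > t
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Hypothetical 2x3 feature maps with values in 0..255.
edge_map = [[10, 20, 200], [15, 220, 210]]
intensity_map = [[30, 20, 180], [25, 200, 230]]
weights = [3, 1]  # assumed counts of "strong" and "salient" points

fused = fuse_maps([edge_map, intensity_map], weights)
quantized = [[int(round(v)) for v in row] for row in fused]
threshold = otsu_threshold([v for row in quantized for v in row])
roi_mask = [[v > threshold for v in row] for row in quantized]
```

Because thresholding is a single pass over the histogram, this step stays cheap even for large images, which is consistent with the low computational cost the abstract emphasizes.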

MLSE-Net: Multi-level Semantic Enriched Network for Medical Image Segmentation

  • Di Gai;Heng Luo;Jing He;Pengxiang Su;Zheng Huang;Song Zhang;Zhijun Tu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.9 / pp.2458-2482 / 2023
  • Medical image segmentation techniques based on convolutional neural networks tend to over-extract features, which causes parameter redundancy and poor target localization and, in turn, less accurate segmentation results for assisting doctors in diagnosis. In this paper, we propose a multi-level semantic-rich encoding-decoding network, which consists of a Pooling-Conv-Former (PCFormer) module and a Cbam-Dilated-Transformer (CDT) module. The PCFormer module is used to tackle the issue of parameter explosion in the conventional transformer and to compensate for the feature loss in the down-sampling process. In the CDT module, the Cbam attention module is adopted to highlight the feature regions by implicitly blending the intersection of attention mechanisms, and the Dilated convolution-Concat (DCC) module is designed as a parallel concatenation of multiple atrous convolution blocks to explicitly expand the receptive field. In addition, a MultiHead Attention-DwConv-Transformer (MDTransformer) module is used to clearly distinguish the target region from the background region. Extensive experiments on medical image segmentation with the Glas, SIIM-ACR, ISIC and LGG datasets demonstrate that the proposed network outperforms existing advanced methods in terms of both objective evaluation and subjective visual performance.
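
The "parallel concatenation of multiple atrous convolution blocks" can be pictured with a one-dimensional toy example. This is only a sketch of the general technique: the kernel, the dilation rates (1, 2, 3), and the zero padding are assumptions, and the paper's DCC module operates on 2D feature maps with learned weights.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1D dilated (atrous) convolution with zero padding."""
    k = len(kernel)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - k // 2) * dilation  # taps spread out by the dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out

def parallel_dilated_concat(signal, kernel, dilations):
    """Run one branch per dilation rate and stack the outputs as channels."""
    return [dilated_conv1d(signal, kernel, d) for d in dilations]

def receptive_field(kernel_size, dilation):
    """Span covered by a single dilated kernel."""
    return dilation * (kernel_size - 1) + 1

impulse = [0, 0, 0, 1, 0, 0, 0]
branches = parallel_dilated_concat(impulse, [1, 1, 1], dilations=(1, 2, 3))
fields = [receptive_field(3, d) for d in (1, 2, 3)]
```

With the same three-tap kernel, the branches cover spans of 3, 5, and 7 samples, so the receptive field grows without adding parameters.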

CAttNet: A Compound Attention Network for Depth Estimation of Light Field Images

  • Dingkang Hua;Qian Zhang;Wan Liao;Bin Wang;Tao Yan
    • Journal of Information Processing Systems / v.19 no.4 / pp.483-497 / 2023
  • Depth estimation is one of the most complicated and difficult problems in light field processing. In this paper, a compound attention convolutional neural network (CAttNet) is proposed to extract depth maps from light field images. To make more effective use of the sub-aperture images (SAIs) of the light field and reduce the redundancy in SAIs, we use a compound attention mechanism to weight the channel and spatial dimensions of the feature map after extracting the primary features, so that the network can more efficiently select the required view and the important area within the view. We modified various layers of feature extraction to extract features more efficiently and usefully without adding parameters. By exploring the characteristics of the light field, we increased the network depth and optimized the network structure to reduce the adverse impact of this change. CAttNet can efficiently utilize the correlations and features of different SAIs to generate a high-quality light field depth map. The experimental results show that CAttNet has advantages in both accuracy and time.
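
Weighting a feature map along its channel and spatial dimensions can be sketched as follows. This is a simplified illustration, not CAttNet itself: it assumes each channel stands for one view, derives the weights from plain averages passed through a sigmoid, and omits the learned layers a real attention module would use.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_weights(features):
    """One weight per channel, from its global average (no learned layers)."""
    weights = []
    for channel in features:
        count = len(channel) * len(channel[0])
        mean = sum(sum(row) for row in channel) / count
        weights.append(sigmoid(mean))
    return weights

def spatial_weights(features):
    """One weight per (y, x) position, from the mean across channels."""
    c, h, w = len(features), len(features[0]), len(features[0][0])
    return [[sigmoid(sum(features[k][y][x] for k in range(c)) / c)
             for x in range(w)] for y in range(h)]

def compound_attention(features):
    """Apply channel weighting first, then spatial weighting."""
    cw = channel_weights(features)
    scaled = [[[cw[k] * v for v in row] for row in channel]
              for k, channel in enumerate(features)]
    sw = spatial_weights(scaled)
    return [[[sw[y][x] * v for x, v in enumerate(row)]
             for y, row in enumerate(channel)] for channel in scaled]

# Hypothetical feature map: 2 channels (views), each 2x2.
features = [
    [[0.1, 0.2], [0.0, 0.1]],  # channel 0: weak responses
    [[1.0, 2.0], [0.5, 3.0]],  # channel 1: strong responses
]
cw = channel_weights(features)
attended = compound_attention(features)
```

The channel step decides which view matters more, and the spatial step then decides which positions inside the views matter, mirroring the "required view" and "important area within the view" in the abstract.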

Attention-based deep learning framework for skin lesion segmentation (피부 병변 분할을 위한 어텐션 기반 딥러닝 프레임워크)

  • Afnan Ghafoor;Bumshik Lee
    • Smart Media Journal / v.13 no.3 / pp.53-61 / 2024
  • This paper presents a novel M-shaped encoder-decoder architecture for skin lesion segmentation, achieving better performance than existing approaches. The proposed architecture utilizes the left and right legs to enable multi-scale feature extraction and is further enhanced by integrating an attention module within the skip connection. The image is partitioned into four distinct patches, facilitating enhanced processing within the encoder-decoder framework. A pivotal aspect of the proposed method is to focus more on critical image features through an attention mechanism, leading to refined segmentation. Experimental results highlight the effectiveness of the proposed approach, demonstrating superior accuracy, precision, and Jaccard Index compared to existing methods.

The Effect of Consistency between Represented Location of the Cue and the Target on Attention Mechanism (단서자극과 표적자극의 표상된 위치의 일치성이 주의기제의 작용에 미치는 영향)

  • Seo, Jun-Ho;Li, Hyung-Chul O.
    • Korean Journal of Cognitive Science / v.20 no.4 / pp.481-506 / 2009
  • The purpose of the present research was to examine whether the attention mechanism employs the physical or the represented location of the cue and target. To achieve this, we employed the paradigms of facilitation of response and inhibition of return. In the experiments, valid and invalid conditions were defined by the position consistency of the cue and the target in terms of either physical or represented location. We used an auditory cue and a visual target in Experiment 1, and a visual cue and an auditory target in Experiment 2. As a result, in Experiment 1, an effect of facilitation of response in the valid condition was found when the valid/invalid conditions were defined in terms of represented location. In Experiment 2, an effect of facilitation of response in the valid condition was found when the valid/invalid conditions were defined in terms of represented location. In all the other conditions, no effect was found when the conditions were defined in terms of physical location. No effects of inhibition of return were found in Experiment 2. These results imply the possibility that the attention mechanism operates based on objects' represented location rather than on their physical location. More importantly, the present research suggests that future experiments on facilitation of response and inhibition of return need to separate the represented location from the physical location of the target and the cue.

An end-to-end synthesis method for Korean text-to-speech systems (한국어 text-to-speech(TTS) 시스템을 위한 엔드투엔드 합성 방식 연구)

  • Choi, Yeunju;Jung, Youngmoon;Kim, Younggwan;Suh, Youngjoo;Kim, Hoirin
    • Phonetics and Speech Sciences / v.10 no.1 / pp.39-48 / 2018
  • A typical statistical parametric speech synthesis (text-to-speech, TTS) system consists of separate modules, such as a text analysis module, an acoustic modeling module, and a speech synthesis module. This causes two problems: 1) expert knowledge of each module is required, and 2) errors generated in each module accumulate as they pass through the subsequent modules. An end-to-end TTS system could avoid such problems by synthesizing voice signals directly from an input string. In this study, we implemented an end-to-end Korean TTS system using Google's Tacotron, which is an end-to-end TTS system based on a sequence-to-sequence model with an attention mechanism. We used 4392 utterances spoken by a Korean female speaker, an amount that corresponds to 37% of the dataset Google used for training Tacotron. Our system obtained a mean opinion score (MOS) of 2.98 and a degradation mean opinion score (DMOS) of 3.25. We discuss the factors that affected the training of the system. Experiments demonstrate that the post-processing network needs to be designed with the output language and input characters in mind, and that, depending on the amount of training data, the maximum value of n for the n-grams modeled by the encoder should be kept small enough.
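
The attention step in a sequence-to-sequence model can be sketched in a few lines. This is a generic illustration, not Tacotron's implementation: dot-product scoring is used here for brevity as a stand-in for whatever scoring function the real system uses, and the encoder and decoder vectors are toy values.

```python
import math

def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return attention weights over the inputs and the resulting context vector."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Hypothetical encoder outputs for three input characters.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]

weights, context = attend(decoder_state, encoder_states)
```

At each output step the decoder recomputes these weights, so it can focus on the input characters relevant to the audio frame being generated instead of relying on hand-built alignment modules.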

Mechanism and Application Methodology of Mental Practice (정신 연습의 기전과 적용 방법)

  • Kim Jong-soon;Lee Keun-heui;Bae Sung-soo
    • The Journal of Korean Physical Therapy / v.15 no.2 / pp.75-84 / 2003
  • The purpose of this study was to review the mechanism and application methodology of mental practice. Mental practice is the symbolic rehearsal of physical activity in the absence of any gross muscular movements. Humans have the ability to generate mental correlates of perceptual and motor events without any triggering external stimulus, a function known as imagery. Practice produces both internal and external sensory consequences, which are thought to be essential for learning to occur. It is for this reason that mental practice, the rehearsal of a skill in imagination rather than by overt physical activity, has intrigued theorists, especially those interested in cognitive processes. Several studies in sport psychology have shown that mental practice can be effective in optimizing the execution of movements in athletes and can help novice learners in the incremental acquisition of new skilled behaviors. Many theories have been proposed to explain the positive effect of mental practice on skill learning and performance. The most tenable are symbolic learning theory, psychoneuromuscular theory, Paivio's theory, regional cerebral blood flow theory, motivation theory, modeling theory, mental and muscle movement nodes theory, insight theory, selective attention theory, and attention-arousal set theory. The factors that influence the effects of mental practice are the application form, the application period, the length of each mental practice session, the number of repetitions, and the presence of physical practice.

A Knowledge-Based Machine Vision System for Automated Industrial Web Inspection

  • Cho, Tai-Hoon;Jung, Young-Kee;Cho, Hyun-Chan
    • International Journal of Fuzzy Logic and Intelligent Systems / v.1 no.1 / pp.13-23 / 2001
  • Most current machine vision systems for industrial inspection were developed with one specific task in mind. Hence, these systems are inflexible in the sense that they cannot easily be adapted to other applications. In this paper, a general vision system framework has been developed that can be easily adapted to a variety of industrial web inspection problems. The objective of this system is to automatically locate and identify "defects" on the surface of the material being inspected. This framework is designed to be robust, to be flexible, and to be as computationally simple as possible. To assure robustness, this framework employs a combined strategy of top-down and bottom-up control, hierarchical defect models, and uncertain reasoning methods. To make this framework flexible, a modular Blackboard framework is employed. To minimize computational complexity, the system incorporates a simple multi-thresholding segmentation scheme, a fuzzy-logic focus-of-attention mechanism for scene analysis operations, and a partitioning of knowledge that allows concurrent parallel processing during recognition.

A Study of Efficiency Information Filtering System using One-Hot Long Short-Term Memory

  • Kim, Hee sook;Lee, Min Hi
    • International Journal of Advanced Culture Technology / v.5 no.1 / pp.83-89 / 2017
  • In this paper, we propose an extended method of one-hot Long Short-Term Memory (LSTM) and evaluate its performance on a spam filtering task. Most traditional methods proposed for spam filtering use word occurrences to represent spam or non-spam messages, and all syntactic and semantic information is ignored. A major issue appears when spam and non-spam messages share many common words and noise words, which makes it challenging for the system to assign the correct spam or non-spam label. Unlike previous studies on information filtering, which use only word occurrence and word context as in probabilistic models, we apply a neural network-based approach to train the filter for better performance. In addition to the one-hot representation, using term weights with an attention mechanism allows the classifier to focus on the words that are most likely to appear in the spam and non-spam collections. As a result, we obtained some improvement over the performance of the previous methods. We find that using region embedding and pooling features on top of the LSTM, along with the attention mechanism, allows the system to learn a better document representation for filtering tasks in general.
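
Combining a one-hot representation with attention-based term weights can be sketched as below. This is an illustrative assumption, not the paper's model: the vocabulary, the per-term scores (which a trained model would learn), and the softmax weighting are all hypothetical, and the LSTM, region embedding, and pooling layers are omitted.

```python
import math

def one_hot(token, vocab):
    """Vector with a 1 at the token's vocabulary index and 0 elsewhere."""
    return [1.0 if word == token else 0.0 for word in vocab]

def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attended_document_vector(tokens, vocab, term_scores):
    """Attention-weighted sum of the one-hot vectors of a message's tokens."""
    weights = softmax([term_scores[t] for t in tokens])
    doc = [0.0] * len(vocab)
    for weight, token in zip(weights, tokens):
        for i, value in enumerate(one_hot(token, vocab)):
            doc[i] += weight * value
    return weights, doc

# Hypothetical vocabulary and term scores.
vocab = ["free", "prize", "meeting", "today"]
term_scores = {"free": 2.0, "prize": 2.0, "meeting": -1.0, "today": 0.0}
message = ["free", "prize", "today"]

weights, doc_vector = attended_document_vector(message, vocab, term_scores)
```

A plain bag-of-words vector would treat "today" the same as "free"; the attention weights let the informative terms dominate the document representation.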