• Title/Summary/Keyword: 어텐션

Search Result 121, Processing Time 0.022 seconds

Application and Evaluation of the Attention U-Net Using UAV Imagery for Corn Cultivation Field Extraction (무인기 영상 기반 옥수수 재배필지 추출을 위한 Attention U-NET 적용 및 평가)

  • Shin, Hyoung Sub;Song, Seok Ho;Lee, Dong Ho;Park, Jong Hwa
    • Ecology and Resilient Infrastructure
    • /
    • v.8 no.4
    • /
    • pp.253-265
    • /
    • 2021
  • In this study, crop cultivation filed was extracted by using Unmanned Aerial Vehicle (UAV) imagery and deep learning models to overcome the limitations of satellite imagery and to contribute to the technological development of understanding the status of crop cultivation field. The study area was set around Chungbuk Goesan-gun Gammul-myeon Yidam-li and orthogonal images of the area were acquired by using UAV images. In addition, study data for deep learning models was collected by using Farm Map that modified by fieldwork. The Attention U-Net was used as a deep learning model to extract feature of UAV in this study. After the model learning process, the performance evaluation of the model for corn cultivation extraction was performed using non-learning data. We present the model's performance using precision, recall, and F1-score; the metrics show 0.94, 0.96, and 0.92, respectively. This study proved that the method is an effective methodology of extracting corn cultivation field, also presented the potential applicability for other crops.

A Study on Utilization of Vision Transformer for CTR Prediction (CTR 예측을 위한 비전 트랜스포머 활용에 관한 연구)

  • Kim, Tae-Suk;Kim, Seokhun;Im, Kwang Hyuk
    • Knowledge Management Research
    • /
    • v.22 no.4
    • /
    • pp.27-40
    • /
    • 2021
  • Click-Through Rate (CTR) prediction is a key function that determines the ranking of candidate items in the recommendation system and recommends high-ranking items to reduce customer information overload and achieve profit maximization through sales promotion. The fields of natural language processing and image classification are achieving remarkable growth through the use of deep neural networks. Recently, a transformer model based on an attention mechanism, differentiated from the mainstream models in the fields of natural language processing and image classification, has been proposed to achieve state-of-the-art in this field. In this study, we present a method for improving the performance of a transformer model for CTR prediction. In order to analyze the effect of discrete and categorical CTR data characteristics different from natural language and image data on performance, experiments on embedding regularization and transformer normalization are performed. According to the experimental results, it was confirmed that the prediction performance of the transformer was significantly improved when the L2 generalization was applied in the embedding process for CTR data input processing and when batch normalization was applied instead of layer normalization, which is the default regularization method, to the transformer model.

Contactless User Identification System using Multi-channel Palm Images Facilitated by Triple Attention U-Net and CNN Classifier Ensemble Models

  • Kim, Inki;Kim, Beomjun;Woo, Sunghee;Gwak, Jeonghwan
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.3
    • /
    • pp.33-43
    • /
    • 2022
  • In this paper, we propose an ensemble model facilitated by multi-channel palm images with attention U-Net models and pretrained convolutional neural networks (CNNs) for establishing a contactless palm-based user identification system using conventional inexpensive camera sensors. Attention U-Net models are used to extract the areas of interest including hands (i.e., with fingers), palms (i.e., without fingers) and palm lines, which are combined to generate three channels being ped into the ensemble classifier. Then, the proposed palm information-based user identification system predicts the class using the classifier ensemble with three outperforming pre-trained CNN models. The proposed model demonstrates that the proposed model could achieve the classification accuracy, precision, recall, F1-score of 98.60%, 98.61%, 98.61%, 98.61% respectively, which indicate that the proposed model is effective even though we are using very cheap and inexpensive image sensors. We believe that in this COVID-19 pandemic circumstances, the proposed palm-based contactless user identification system can be an alternative, with high safety and reliability, compared with currently overwhelming contact-based systems.

A Thoracic Spine Segmentation Technique for Automatic Extraction of VHS and Cobb Angle from X-ray Images (X-ray 영상에서 VHS와 콥 각도 자동 추출을 위한 흉추 분할 기법)

  • Ye-Eun, Lee;Seung-Hwa, Han;Dong-Gyu, Lee;Ho-Joon, Kim
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.1
    • /
    • pp.51-58
    • /
    • 2023
  • In this paper, we propose an organ segmentation technique for the automatic extraction of medical diagnostic indicators from X-ray images. In order to calculate diagnostic indicators of heart disease and spinal disease such as VHS(vertebral heart scale) and Cobb angle, it is necessary to accurately segment the thoracic spine, carina, and heart in a chest X-ray image. A deep neural network model in which the high-resolution representation of the image for each layer and the structure converted into a low-resolution feature map are connected in parallel was adopted. This structure enables the relative position information in the image to be effectively reflected in the segmentation process. It is shown that learning performance can be improved by combining the OCR module, in which pixel information and object information are mutually interacted in a multi-step process, and the channel attention module, which allows each channel of the network to be reflected as different weight values. In addition, a method of augmenting learning data is presented in order to provide robust performance against changes in the position, shape, and size of the subject in the X-ray image. The effectiveness of the proposed theory was evaluated through an experiment using 145 human chest X-ray images and 118 animal X-ray images.

A Korean menu-ordering sentence text-to-speech system using conformer-based FastSpeech2 (콘포머 기반 FastSpeech2를 이용한 한국어 음식 주문 문장 음성합성기)

  • Choi, Yerin;Jang, JaeHoo;Koo, Myoung-Wan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.3
    • /
    • pp.359-366
    • /
    • 2022
  • In this paper, we present the Korean menu-ordering Sentence Text-to-Speech (TTS) system using conformer-based FastSpeech2. Conformer is the convolution-augmented transformer, which was originally proposed in Speech Recognition. Combining two different structures, the Conformer extracts better local and global features. It comprises two half Feed Forward module at the front and the end, sandwiching the Multi-Head Self-Attention module and Convolution module. We introduce the Conformer in Korean TTS, as we know it works well in Korean Speech Recognition. For comparison between transformer-based TTS model and Conformer-based one, we train FastSpeech2 and Conformer-based FastSpeech2. We collected a phoneme-balanced data set and used this for training our models. This corpus comprises not only general conversation, but also menu-ordering conversation consisting mainly of loanwords. This data set is the solution to the current Korean TTS model's degradation in loanwords. As a result of generating a synthesized sound using ParallelWave Gan, the Conformer-based FastSpeech2 achieved superior performance of MOS 4.04. We confirm that the model performance improved when the same structure was changed from transformer to Conformer in the Korean TTS.

Comparative Analysis of Self-supervised Deephashing Models for Efficient Image Retrieval System (효율적인 이미지 검색 시스템을 위한 자기 감독 딥해싱 모델의 비교 분석)

  • Kim Soo In;Jeon Young Jin;Lee Sang Bum;Kim Won Gyum
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.12
    • /
    • pp.519-524
    • /
    • 2023
  • In hashing-based image retrieval, the hash code of a manipulated image is different from the original image, making it difficult to search for the same image. This paper proposes and evaluates a self-supervised deephashing model that generates perceptual hash codes from feature information such as texture, shape, and color of images. The comparison models are autoencoder-based variational inference models, but the encoder is designed with a fully connected layer, convolutional neural network, and transformer modules. The proposed model is a variational inference model that includes a SimAM module of extracting geometric patterns and positional relationships within images. The SimAM module can learn latent vectors highlighting objects or local regions through an energy function using the activation values of neurons and surrounding neurons. The proposed method is a representation learning model that can generate low-dimensional latent vectors from high-dimensional input images, and the latent vectors are binarized into distinguishable hash code. From the experimental results on public datasets such as CIFAR-10, ImageNet, and NUS-WIDE, the proposed model is superior to the comparative model and analyzed to have equivalent performance to the supervised learning-based deephashing model. The proposed model can be used in application systems that require low-dimensional representation of images, such as image search or copyright image determination.

Development and Efficacy Validation of an ICF-Based Chatbot System to Enhance Community Participation of Elderly Individuals with Mild Dementia in South Korea (우리나라 경도 치매 노인의 지역사회 참여 증진을 위한 ICF 기반 Decision Tree for Chatbot 시스템 개발과 효과성 검증)

  • Haewon Byeon
    • Journal of Advanced Technology Convergence
    • /
    • v.3 no.3
    • /
    • pp.17-27
    • /
    • 2024
  • This study focuses on the development and evaluation of a chatbot system based on the International Classification of Functioning, Disability, and Health (ICF) framework to enhance community participation among elderly individuals with mild dementia in South Korea. The study involved 12 elderly participants who were living alone and had been diagnosed with mild dementia, along with 15 caregivers who were actively involved in their daily care. The development process included a comprehensive needs assessment, system design, content creation, natural language processing using Transformer Attention Algorithm, and usability testing. The chatbot is designed to offer personalized activity recommendations, reminders, and information that support physical, social, and cognitive engagement. Usability testing revealed high levels of user satisfaction and perceived usefulness, with significant improvements in community activities and social interactions. Quantitative analysis showed a 92% increase in weekly community activities and an 84% increase in social interactions. Qualitative feedback highlighted the chatbot's user-friendly interface, relevance of suggested activities, and its role in reducing caregiver burden. The study demonstrates that an ICF-based chatbot system can effectively promote community participation and improve the quality of life for elderly individuals with mild dementia. Future research should focus on refining the system and evaluating its long-term impact.

Semi-supervised domain adaptation using unlabeled data for end-to-end speech recognition (라벨이 없는 데이터를 사용한 종단간 음성인식기의 준교사 방식 도메인 적응)

  • Jeong, Hyeonjae;Goo, Jahyun;Kim, Hoirin
    • Phonetics and Speech Sciences
    • /
    • v.12 no.2
    • /
    • pp.29-37
    • /
    • 2020
  • Recently, the neural network-based deep learning algorithm has dramatically improved performance compared to the classical Gaussian mixture model based hidden Markov model (GMM-HMM) automatic speech recognition (ASR) system. In addition, researches on end-to-end (E2E) speech recognition systems integrating language modeling and decoding processes have been actively conducted to better utilize the advantages of deep learning techniques. In general, E2E ASR systems consist of multiple layers of encoder-decoder structure with attention. Therefore, E2E ASR systems require data with a large amount of speech-text paired data in order to achieve good performance. Obtaining speech-text paired data requires a lot of human labor and time, and is a high barrier to building E2E ASR system. Therefore, there are previous studies that improve the performance of E2E ASR system using relatively small amount of speech-text paired data, but most studies have been conducted by using only speech-only data or text-only data. In this study, we proposed a semi-supervised training method that enables E2E ASR system to perform well in corpus in different domains by using both speech or text only data. The proposed method works effectively by adapting to different domains, showing good performance in the target domain and not degrading much in the source domain.

A Generalized Adaptive Deep Latent Factor Recommendation Model (일반화 적응 심층 잠재요인 추천모형)

  • Kim, Jeongha;Lee, Jipyeong;Jang, Seonghyun;Cho, Yoonho
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.1
    • /
    • pp.249-263
    • /
    • 2023
  • Collaborative Filtering, a representative recommendation system methodology, consists of two approaches: neighbor methods and latent factor models. Among these, the latent factor model using matrix factorization decomposes the user-item interaction matrix into two lower-dimensional rectangular matrices, predicting the item's rating through the product of these matrices. Due to the factor vectors inferred from rating patterns capturing user and item characteristics, this method is superior in scalability, accuracy, and flexibility compared to neighbor-based methods. However, it has a fundamental drawback: the need to reflect the diversity of preferences of different individuals for items with no ratings. This limitation leads to repetitive and inaccurate recommendations. The Adaptive Deep Latent Factor Model (ADLFM) was developed to address this issue. This model adaptively learns the preferences for each item by using the item description, which provides a detailed summary and explanation of the item. ADLFM takes in item description as input, calculates latent vectors of the user and item, and presents a method that can reflect personal diversity using an attention score. However, due to the requirement of a dataset that includes item descriptions, the domain that can apply ADLFM is limited, resulting in generalization limitations. This study proposes a Generalized Adaptive Deep Latent Factor Recommendation Model, G-ADLFRM, to improve the limitations of ADLFM. Firstly, we use item ID, commonly used in recommendation systems, as input instead of the item description. Additionally, we apply improved deep learning model structures such as Self-Attention, Multi-head Attention, and Multi-Conv1D. We conducted experiments on various datasets with input and model structure changes. The results showed that when only the input was changed, MAE increased slightly compared to ADLFM due to accompanying information loss, resulting in decreased recommendation performance. However, the average learning speed per epoch significantly improved as the amount of information to be processed decreased. When both the input and the model structure were changed, the best-performing Multi-Conv1d structure showed similar performance to ADLFM, sufficiently counteracting the information loss caused by the input change. We conclude that G-ADLFRM is a new, lightweight, and generalizable model that maintains the performance of the existing ADLFM while enabling fast learning and inference.

Application of spatiotemporal transformer model to improve prediction performance of particulate matter concentration (미세먼지 예측 성능 개선을 위한 시공간 트랜스포머 모델의 적용)

  • Kim, Youngkwang;Kim, Bokju;Ahn, SungMahn
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.329-352
    • /
    • 2022
  • It is reported that particulate matter(PM) penetrates the lungs and blood vessels and causes various heart diseases and respiratory diseases such as lung cancer. The subway is a means of transportation used by an average of 10 million people a day, and although it is important to create a clean and comfortable environment, the level of particulate matter pollution is shown to be high. It is because the subways run through an underground tunnel and the particulate matter trapped in the tunnel moves to the underground station due to the train wind. The Ministry of Environment and the Seoul Metropolitan Government are making various efforts to reduce PM concentration by establishing measures to improve air quality at underground stations. The smart air quality management system is a system that manages air quality in advance by collecting air quality data, analyzing and predicting the PM concentration. The prediction model of the PM concentration is an important component of this system. Various studies on time series data prediction are being conducted, but in relation to the PM prediction in subway stations, it is limited to statistical or recurrent neural network-based deep learning model researches. Therefore, in this study, we propose four transformer-based models including spatiotemporal transformers. As a result of performing PM concentration prediction experiments in the waiting rooms of subway stations in Seoul, it was confirmed that the performance of the transformer-based models was superior to that of the existing ARIMA, LSTM, and Seq2Seq models. Among the transformer-based models, the performance of the spatiotemporal transformers was the best. The smart air quality management system operated through data-based prediction becomes more effective and energy efficient as the accuracy of PM prediction improves. The results of this study are expected to contribute to the efficient operation of the smart air quality management system.