• Title/Summary/Keyword: pre-trained model

Search results: 295

A Multi-task Self-attention Model Using Pre-trained Language Models on Universal Dependency Annotations

  • Kim, Euhee
    • Journal of the Korea Society of Computer and Information / v.27 no.11 / pp.39-46 / 2022
  • In this paper, we propose a multi-task model that simultaneously predicts general-purpose tasks such as part-of-speech tagging, lemmatization, and dependency parsing on the UD Korean Kaist v2.3 corpus. The proposed model combines the self-attention mechanism of the BERT model with a graph-based Biaffine attention mechanism, fine-tuning the multilingual BERT and two Korean-specific BERT models, KR-BERT and KoBERT. The performance of the proposed model is then compared and analyzed across the multilingual BERT and the two Korean-specific BERT language models; a sketch of this kind of architecture follows below.
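
A minimal sketch of the kind of architecture described above, assuming the HuggingFace `transformers` API: a shared pre-trained encoder feeding a POS-tagging head and a graph-based biaffine arc scorer. The checkpoint name, the 17-tag UD POS inventory, the arc dimensions, and the omission of the lemmatization head are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class Biaffine(nn.Module):
    """Graph-based biaffine arc scorer (Dozat & Manning style)."""
    def __init__(self, dim: int):
        super().__init__()
        self.W = nn.Parameter(torch.zeros(dim + 1, dim))  # +1 row for the bias term

    def forward(self, head, dep):
        # head, dep: (batch, seq, dim) -> arc scores: (batch, seq, seq)
        ones = torch.ones_like(head[..., :1])
        return torch.cat([head, ones], dim=-1) @ self.W @ dep.transpose(1, 2)

class MultiTaskParser(nn.Module):
    def __init__(self, name="bert-base-multilingual-cased", n_pos=17, arc_dim=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)   # shared pre-trained encoder
        hidden = self.encoder.config.hidden_size
        self.pos_head = nn.Linear(hidden, n_pos)         # POS-tagging head
        self.arc_head = nn.Linear(hidden, arc_dim)       # "head word" representation
        self.arc_dep = nn.Linear(hidden, arc_dim)        # "dependent" representation
        self.biaffine = Biaffine(arc_dim)                # dependency arc scorer

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        return self.pos_head(h), self.biaffine(self.arc_head(h), self.arc_dep(h))
```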

Iceberg-Ship Classification in SAR Images Using Convolutional Neural Network with Transfer Learning

  • Choi, Jeongwhan
    • Journal of Internet Computing and Services / v.19 no.4 / pp.35-44 / 2018
  • Monitoring with Synthetic Aperture Radar (SAR) supports maritime safety by detecting floating icebergs. However, there are limits to distinguishing icebergs from ships in SAR images. A Convolutional Neural Network (CNN) is used to separate the two classes, and the goal of this paper is to increase the accuracy of identifying icebergs in SAR images. Log loss is used as the performance metric. The two-layer CNN model proposed in the research of C. Bentes et al. [1] is used as a benchmark model and compared with a four-layer CNN model trained with data augmentation. Finally, the performance of the final CNN model, which builds on the pre-trained VGG-16 model, is compared with the previous models. This paper shows how to improve the benchmark model and proposes the final CNN model; a transfer-learning sketch in this spirit follows below.
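
A minimal transfer-learning sketch of the VGG-16 approach mentioned above, assuming torchvision's pretrained weights; the two-class output, the frozen convolutional base, and the learning rate are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained VGG-16 and freeze its convolutional base.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in model.features.parameters():
    p.requires_grad = False

# Replace the final classifier layer: two classes, iceberg vs. ship.
model.classifier[6] = nn.Linear(4096, 2)

criterion = nn.CrossEntropyLoss()   # equivalent to log loss over the two classes
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
```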

A Methodology on Estimating the Product Life Cycle Cost using Artificial Neural Networks in the Conceptual Design Phase (개념 설계 단계에서 인공 신경망을 이용한 제품의 Life Cycle Cost평가 방법론)

  • 서광규;박지형
    • Journal of the Korean Society for Precision Engineering / v.21 no.9 / pp.85-94 / 2004
  • As over 70% of the total life cycle cost (LCC) of a product is committed at the early design stage, designers are in a position to substantially reduce the LCC of the products they design by giving due consideration to the life-cycle implications of their design decisions. During the early design stages there may be competing concepts with dramatic differences; in addition, detailed information is scarce and decisions must be made quickly. Thus, both the overhead of developing parametric LCC models for a wide range of concepts and the lack of detailed information make the application of traditional LCC models impractical. If a traditional LCC method is to be incorporated into the very early design stages, a different approach is needed. This paper explores an approximate method for providing a preliminary LCC. Learning algorithms trained on the known characteristics of existing products can allow the LCC of new products to be approximated quickly during the conceptual design phase without the overhead of defining new LCC models. Artificial neural networks are trained to generalize product attributes and LCC data from pre-existing LCC studies. Product designers then query the trained network with new high-level product attribute data to quickly obtain an LCC estimate for a new product concept. Foundations for the learning LCC approach are established, and an application is provided. A minimal sketch of this train-then-query workflow follows below.
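
A minimal sketch of the train-then-query workflow described above, assuming scikit-learn's MLPRegressor; the attribute columns, sample values, and network size are illustrative placeholders, not data from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: rows = existing products,
# columns = high-level attributes (e.g., mass, material index, complexity).
X_train = np.array([[1.2, 3, 0.7], [0.8, 1, 0.4], [2.5, 4, 0.9], [1.9, 2, 0.6]])
y_train = np.array([410.0, 220.0, 760.0, 530.0])   # known LCC values from prior studies

scaler = StandardScaler().fit(X_train)
model = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=5000, random_state=0)
model.fit(scaler.transform(X_train), y_train)       # generalize attribute -> LCC mapping

# Query with a new product concept's attributes for a quick LCC estimate.
x_new = np.array([[1.5, 3, 0.8]])
print(model.predict(scaler.transform(x_new)))
```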

Korean Machine Reading Comprehension for Patent Consultation Using BERT (BERT를 이용한 한국어 특허상담 기계독해)

  • Min, Jae-Ok;Park, Jin-Woo;Jo, Yu-Jeong;Lee, Bong-Gun
    • KIPS Transactions on Software and Data Engineering / v.9 no.4 / pp.145-152 / 2020
  • MRC (machine reading comprehension) is an AI NLP task that predicts the answer to a user's query by understanding a relevant document, and it can be used in automated consultation services such as chatbots. Recently, the BERT (Pre-training of Deep Bidirectional Transformers for Language Understanding) model, which shows high performance in various fields of natural language processing, is applied in two phases: first, pre-training on large corpora of each domain, and second, fine-tuning the model to solve each NLP task as a prediction problem. In this paper, we build a patent MRC dataset and show how to construct patent-consultation training data for the MRC task. We also propose a method to improve MRC performance using a Patent-BERT model pre-trained on a patent consultation corpus, together with language-processing steps suited to machine learning on patent counseling data. Experimental results show that the proposed method improves the model's ability to answer patent counseling queries. A minimal extractive-MRC sketch follows below.
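
A minimal extractive-MRC sketch, assuming the HuggingFace `transformers` question-answering head; the multilingual checkpoint stands in for the authors' Patent-BERT, and the query/context pair is a placeholder. With a fine-tuned checkpoint, the decoded span is the predicted answer.

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

name = "bert-base-multilingual-cased"   # placeholder; not the authors' Patent-BERT
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)

question = "심사청구 기간은 언제까지인가?"                      # placeholder consultation query
context = "특허출원에 대한 심사청구는 출원일부터 3년 이내에 하여야 한다."  # placeholder document

inputs = tok(question, context, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)                        # start/end logits over context tokens
start = int(out.start_logits.argmax())
end = int(out.end_logits.argmax())
print(tok.decode(inputs["input_ids"][0][start : end + 1]))   # predicted answer span
```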

MalEXLNet: A semantic analysis and detection method of malware API sequences based on the EXLNet model

  • Xuedong Mao;Yuntao Zhao;Yongxin Feng;Yutao Hu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.10 / pp.3060-3083 / 2024
  • With continuous advancements in malicious-code polymorphism and obfuscation techniques, the performance of traditional machine-learning detection methods for malware variants has gradually declined. Additionally, conventional pre-trained models could not adequately capture the contextual semantic information of malicious code or appropriately represent polysemous words. To enhance the efficiency of malware variant detection, this paper proposes MalEXLNet, an intelligent semantic analysis and detection architecture for malware. The architecture leverages malware API call sequences and employs an improved pre-training model for semantic vector representation, effectively utilizing the semantic information of API call sequences. It constructs a hybrid deep learning model, CBAM+AttentionBiLSTM, for training and classification prediction. Furthermore, incorporating the KMeansSMOTE algorithm balances small-sample data, ensuring the model maintains robust performance in detecting variants from rare malware families (a balancing sketch follows below). Comparative experiments on the generalized datasets Ember and Catak show that the proposed MalEXLNet architecture achieves excellent performance in malware classification and detection, with accuracies of 98.85% and 94.46% on the two datasets, and macro-averaged and micro-averaged metrics exceeding 98% and 92%, respectively.
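
A minimal sketch of the class-balancing step named above, assuming `imbalanced-learn`'s KMeansSMOTE; the synthetic feature matrix stands in for encoded API-call sequences.

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import KMeansSMOTE

# Synthetic stand-in for an imbalanced malware-family dataset.
X, y = make_classification(n_samples=2000, n_classes=2, weights=[0.95, 0.05],
                           n_informative=8, random_state=0)
print("before:", Counter(y))

# Cluster the feature space with k-means, then oversample minority
# samples inside the clusters where they occur.
X_bal, y_bal = KMeansSMOTE(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_bal))                 # rare family oversampled
```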

Leveraging Reinforcement Learning for LLM-based Automated Software Vulnerability Repair (강화 학습을 활용한 대형 언어 모델 기반 자동 소프트웨어 취약점 패치 생성)

  • Woorim Han;Miseon Yu;Yunheung Paek
    • Annual Conference of KIPS / 2024.10a / pp.290-293 / 2024
  • Software vulnerabilities impose a significant burden on developers, particularly in debugging and maintenance. Automated software vulnerability repair has emerged as a promising solution to mitigate these challenges. Recent advances have introduced learning-based approaches that take vulnerable functions and their Common Weakness Enumeration (CWE) types as input and generate repaired functions as output. These approaches typically fine-tune large pre-trained language models to produce vulnerability patches, with performance evaluated using Exact Match (EM) and CodeBLEU metrics to assess similarity to ground-truth patches. However, current methods rely on teacher forcing during fine-tuning, where the model is trained with ground-truth inputs, while during inference its inputs are generated by the model itself, leading to exposure bias. Additionally, models are trained with the cross-entropy loss function but evaluated with discrete, non-differentiable metrics, resulting in a mismatch between the training objective and the test objective. This mismatch can yield inconsistent results, as the model is not directly optimized to improve test-time performance metrics. To address these discrepancies, we propose the use of reinforcement learning (RL) to optimize patch generation. By directly using the CodeBLEU score as a reward signal during training, our approach encourages the generation of higher-quality patches that align more closely with the evaluation metrics, thereby improving overall performance. A REINFORCE-style sketch of this reward setup follows below.
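
A REINFORCE-style sketch of using a sequence-level reward during fine-tuning, in the spirit of the approach above. The seq2seq checkpoint is a placeholder, the reward function is a stub standing in for a real CodeBLEU implementation, and the baseline-free update is a simplification of practical RL fine-tuning.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "Salesforce/codet5-small"        # placeholder seq2seq patch generator
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

def codebleu_reward(pred: str, gold: str) -> float:
    # Stub: a real implementation would compute CodeBLEU against the gold patch.
    return float(pred.strip() == gold.strip())

def reinforce_step(vuln_code: str, gold_patch: str):
    enc = tok(vuln_code, return_tensors="pt", truncation=True)
    sampled = model.generate(**enc, do_sample=True, max_new_tokens=64)  # sample a patch
    pred = tok.decode(sampled[0], skip_special_tokens=True)
    reward = codebleu_reward(pred, gold_patch)
    # REINFORCE without a baseline: scale the sample's NLL by its reward,
    # so high-reward patches become more likely under the policy.
    loss = model(**enc, labels=sampled).loss * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```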

Development of Deep Learning AI Model and RGB Imagery Analysis Using Pre-sieved Soil (입경 분류된 토양의 RGB 영상 분석 및 딥러닝 기법을 활용한 AI 모델 개발)

  • Kim, Dongseok;Song, Jisu;Jeong, Eunji;Hwang, Hyunjung;Park, Jaesung
    • Journal of The Korean Society of Agricultural Engineers / v.66 no.4 / pp.27-39 / 2024
  • Soil texture is determined by the proportions of sand, silt, and clay within the soil, which influence characteristics such as porosity, water retention capacity, electrical conductivity (EC), and pH. Traditional classification of soil texture requires significant sample preparation, including oven drying to remove organic matter and moisture, a process that is both time-consuming and costly. This study explores an alternative method by developing an AI model capable of predicting soil texture from images of pre-sieved soil samples using computer vision and deep learning technologies. Soil samples collected from agricultural fields were pre-processed using sieve analysis, and images of each sample were acquired in a controlled studio environment using a smartphone camera. Color distribution ratios based on the RGB values of the images were analyzed using the OpenCV library in Python (a sketch of this step follows below). A convolutional neural network (CNN) model, built on PyTorch, was enhanced using Digital Image Processing (DIP) techniques and then trained across nine distinct conditions to evaluate its robustness and accuracy. The model achieved an accuracy of over 80% in classifying the images of pre-sieved soil samples, as validated by confusion-matrix analysis and F1 scores, demonstrating its potential to replace traditional experimental methods for soil texture classification. By utilizing an easily accessible tool, significant time and cost savings can be expected compared to traditional methods.
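
A minimal sketch of the RGB color-distribution analysis mentioned above, assuming OpenCV and NumPy; the image path is a placeholder, and per-channel mean ratios are one simple choice of color features.

```python
import cv2
import numpy as np

img = cv2.imread("soil_sample.jpg")            # placeholder path; OpenCV loads BGR
if img is None:
    raise FileNotFoundError("soil_sample.jpg")

img = img.astype(np.float64)
b, g, r = img[..., 0], img[..., 1], img[..., 2]
total = b + g + r + 1e-9                       # avoid division by zero on black pixels
ratios = {"R": float(np.mean(r / total)),
          "G": float(np.mean(g / total)),
          "B": float(np.mean(b / total))}
print(ratios)                                  # color-distribution features per image
```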

LEARNING PERFORMANCE AND DESIGN OF AN ADAPTIVE CONTROL FUNCTION GENERATOR: CMAC (Cerebellar Model Arithmetic Controller)

  • Choe, Dong-Yeop;Hwang, Hyeon
    • 한국기계연구소 소보 / s.19 / pp.125-139 / 1989
  • As an adaptive control function generator, CMAC (Cerebellar Model Arithmetic or Articulated Controller) based learning control has drawn great attention as a way to realize rather robust real-time manipulator control under various uncertainties. There remain, however, inherent problems to be solved in applying the CMAC to robot motion control or to the perception of sensory information. To apply the CMAC to various unmodeled or modeled systems more efficiently, it is necessary to analyze the effects of the CMAC control parameters on the trained net. Although CMAC control parameters such as the size of the quantizing block, the learning gain, the input offset, and the ranges of the input variables play a key role in learning performance and system memory requirements, they have not been fully investigated yet. These parameters should be determined, of course, considering the shape of the desired function to be trained and the learning algorithms applied. In this paper, the interrelation of these parameters with learning performance is investigated under the basic learning schemes presented by the authors. Since a purely analytic approach appears very difficult, if not impossible, for this purpose, various simulations were performed with pre-specified functions and their results analyzed. A general step-by-step design guide was set up according to the simulation results; a small CMAC sketch follows below.
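
A minimal one-dimensional CMAC sketch illustrating the quantizing blocks and learning gain discussed above; the table size, block count, gain, and the sine training target are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class CMAC1D:
    def __init__(self, n_blocks=8, n_cells=64, x_min=0.0, x_max=1.0, gain=0.5):
        self.n_blocks, self.n_cells, self.gain = n_blocks, n_cells, gain
        self.x_min, self.x_max = x_min, x_max
        self.w = np.zeros(n_cells)                   # weight table (system memory)

    def _active_cells(self, x):
        # Quantize x; each input activates a band of n_blocks overlapping cells,
        # which sets the generalization width of the trained net.
        span = self.n_cells - self.n_blocks
        q = int((x - self.x_min) / (self.x_max - self.x_min) * span)
        return np.arange(q, q + self.n_blocks)

    def predict(self, x):
        return self.w[self._active_cells(x)].sum()

    def train(self, x, target):
        idx = self._active_cells(x)
        err = target - self.w[idx].sum()
        self.w[idx] += self.gain * err / self.n_blocks   # distribute the correction

cmac = CMAC1D()
rng = np.random.default_rng(0)
for _ in range(200):                                 # train on sin(2*pi*x) over [0, 1]
    x = rng.random()
    cmac.train(x, np.sin(2 * np.pi * x))
print(cmac.predict(0.25))                            # approaches sin(pi/2) = 1.0
```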


A Study on Automatic Classification of Profanity Sentences of Elementary School Students Using BERT (BERT를 활용한 초등학교 고학년의 욕설문장 자동 분류방안 연구)

  • Shim, Jaekwoun
    • Journal of Creative Information Culture / v.7 no.2 / pp.91-98 / 2021
  • As the amount of time elementary school students spend online has increased due to COVID-19, the volume of posts, comments, and chats they write has grown, and problems such as hurting others' feelings or using swear words are occurring. Netiquette is taught in elementary school, but training time is insufficient, and it is difficult to expect changes in student behavior; technical support through natural language processing is therefore needed. In this study, an experiment was conducted to automatically filter profanity sentences by applying a pre-trained language model to sentences written by elementary school students. Chat logs of 4th-6th grade elementary school students were collected on an online learning platform, and the pre-trained language model was fine-tuned on general sentences and profanity sentences. In the experiment, the resulting profanity classifier achieved a precision of 75%, suggesting that with sufficiently supplemented training data the approach could be applied to online platforms used by elementary school students. A fine-tuning sketch of this kind of classifier follows below.
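
A minimal sketch of fine-tuning a pre-trained encoder as a binary profanity classifier, assuming the HuggingFace `transformers` API; the Korean checkpoint name, the two example sentences, and the single gradient step are illustrative placeholders.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "klue/bert-base"                          # placeholder Korean checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["오늘 수업 재밌었어", "야 이 바보야"]      # placeholder general/profanity pair
labels = torch.tensor([0, 1])                    # 0 = general, 1 = profanity

batch = tok(texts, padding=True, return_tensors="pt")
loss = model(**batch, labels=labels).loss        # cross-entropy over the two classes
loss.backward()
optimizer.step()
```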

Zero-shot Korean Sentiment Analysis with Large Language Models: Comparison with Pre-trained Language Models

  • Soon-Chan Kwon;Dong-Hee Lee;Beak-Cheol Jang
    • Journal of the Korea Society of Computer and Information / v.29 no.2 / pp.43-50 / 2024
  • This paper evaluates the Korean sentiment analysis performance of large language models such as GPT-3.5 and GPT-4 using a zero-shot approach facilitated by the ChatGPT API, comparing them to pre-trained Korean models such as KoBERT. Through experiments utilizing various Korean sentiment analysis datasets in fields like movies, gaming, and shopping, the effectiveness of these models is validated. The results reveal that the LMKor-ELECTRA model displayed the highest performance based on F1-score, while GPT-4 achieved particularly high accuracy and F1-scores on the movie and shopping datasets. This indicates that large language models can perform effectively in Korean sentiment analysis without prior training on specific datasets, suggesting their potential for zero-shot learning. However, relatively lower performance on some datasets highlights the limitations of the zero-shot methodology. This study explores the feasibility of using large language models for Korean sentiment analysis, providing significant implications for future research in this area. A minimal zero-shot prompting sketch follows below.
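
A minimal zero-shot prompting sketch of the kind of evaluation described above, assuming the `openai` Python client with an API key in the environment; the review sentence and the one-word-label instruction are illustrative placeholders, not the authors' prompts.

```python
from openai import OpenAI

client = OpenAI()                                 # reads OPENAI_API_KEY from the environment

review = "배송이 빨랐고 품질도 기대 이상이에요."       # placeholder shopping review

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Classify the sentiment of the Korean sentence as "
                    "'positive' or 'negative'. Answer with one word."},
        {"role": "user", "content": review},
    ],
)
print(resp.choices[0].message.content)            # expected: positive
```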