• Title/Summary/Keyword: Trained Model

Language Model Adaptation Based on Topic Probability of Latent Dirichlet Allocation

  • Jeon, Hyung-Bae; Lee, Soo-Young
    • ETRI Journal, v.38 no.3, pp.487-493, 2016
  • Two new methods are proposed for unsupervised adaptation of a language model (LM) with a single sentence for automatic transcription tasks. In the training phase, training documents are clustered by latent Dirichlet allocation (LDA), and a domain-specific LM is trained for each cluster. In the test phase, an adapted LM is formed as a linear mixture of the trained domain-specific LMs. Unlike previous adaptation methods, the proposed methods fully utilize the trained LDA model to estimate the weight values assigned to the domain-specific LMs; the clustering and weight-estimation algorithms of the trained LDA model are therefore reliable. In continuous speech recognition benchmark tests, the proposed methods outperform other unsupervised LM adaptation methods based on latent semantic analysis, non-negative matrix factorization, and LDA with n-gram counting.
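
To make the adaptation recipe concrete, here is a minimal, self-contained sketch of the general idea (not the authors' implementation): LDA clusters the training documents, toy unigram LMs stand in for the domain-specific LMs, and the LDA topic posterior of a single test sentence supplies the mixture weights.

```python
# Sketch of LDA-based LM mixture adaptation; toy unigram LMs for illustration.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

train_docs = ["stocks fell sharply today", "the team won the match",
              "markets rallied on earnings", "the striker scored twice"]
vec = CountVectorizer()
X = vec.fit_transform(train_docs)

# 1) Cluster training documents into topics with LDA.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# 2) Train one domain-specific (here: unigram) LM per topic, weighting each
#    document by its topic posterior.
doc_topics = lda.transform(X)                  # shape: (docs, topics)
topic_word = doc_topics.T @ X.toarray() + 1.0  # add-one smoothing
domain_lms = topic_word / topic_word.sum(axis=1, keepdims=True)

# 3) At test time, infer topic weights for a single sentence and form the
#    adapted LM as a linear mixture of the trained domain-specific LMs.
test = vec.transform(["the match stocks"])
weights = lda.transform(test)[0]               # mixture weights from LDA
adapted_lm = weights @ domain_lms              # adapted unigram LM
print(dict(zip(vec.get_feature_names_out(), adapted_lm.round(3))))
```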

A Survey on Deep Learning-based Pre-Trained Language Models (딥러닝 기반 사전학습 언어모델에 대한 이해와 현황)

  • Sangun Park
    • The Journal of Bigdata, v.7 no.2, pp.11-29, 2022
  • Pre-trained language models are among the most important and widely used tools for natural language processing tasks. Because they have been pre-trained on large corpora, high performance can be expected even after fine-tuning on a small amount of data. Since the elements necessary for implementation, such as a pre-trained tokenizer and a deep learning model with pre-trained weights, are distributed together, the cost and development time of natural language processing have been greatly reduced. Transformer variants are the most representative pre-trained language models providing these advantages, and they are also being actively used in other fields such as computer vision and audio. To make it easier for researchers to understand pre-trained language models and apply them to natural language processing tasks, this paper defines the language model and the pre-trained language model, and discusses the development of pre-trained language models, focusing on representative Transformer variants.
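
As a concrete illustration of the workflow the survey describes, the sketch below loads a pre-trained tokenizer and model that are distributed together and runs a single fine-tuning step. The Hugging Face Transformers library and the bert-base-uncased checkpoint are assumptions for illustration, not something the paper specifies.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Tokenizer and model (with pre-trained weights) are distributed together.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # new head; body keeps pre-trained weights

# One fine-tuning step on a toy labeled example.
batch = tokenizer(["a great movie"], return_tensors="pt", padding=True)
labels = torch.tensor([1])
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
print(float(loss))
```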

A Study of Fine Tuning Pre-Trained Korean BERT for Question Answering Performance Development (사전 학습된 한국어 BERT의 전이학습을 통한 한국어 기계독해 성능개선에 관한 연구)

  • Lee, Chi Hoon; Lee, Yeon Ji; Lee, Dong Hee
    • Journal of Information Technology Services, v.19 no.5, pp.83-91, 2020
  • Language models such as BERT have become an important factor in deep learning-based natural language processing. Pre-training transformer-based language models is computationally expensive, since they consist of deep and wide attention-based architectures and require huge amounts of training data. Hence, it has become standard practice to fine-tune large pre-trained language models released by Google or other companies that can afford the resources and cost. There are various techniques for fine-tuning language models, and this paper examines three of them: data augmentation, hyperparameter tuning, and partially reconstructing the network. For data augmentation, we use no-answer augmentation and back-translation. Useful combinations of hyperparameters are identified through a series of experiments. Finally, we add GRU and LSTM layers on top of the pre-trained BERT model to boost performance. By fine-tuning a pre-trained Korean language model with these methods, we push the F1 score from the baseline up to 89.66. Moreover, some failed attempts provide important lessons and point to promising directions for further work.
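
A minimal sketch of one of the three techniques, reconstructing part of the network by stacking a recurrent layer on top of a pre-trained BERT encoder before the QA span head. The checkpoint name and layer sizes are illustrative assumptions, not the paper's exact configuration (the paper uses a pre-trained Korean BERT).

```python
import torch.nn as nn
from transformers import AutoModel

class BertGruForQA(nn.Module):
    def __init__(self, checkpoint="bert-base-multilingual-cased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(checkpoint)
        hidden = self.bert.config.hidden_size
        # Bidirectional GRU re-encodes BERT's token representations.
        self.gru = nn.GRU(hidden, hidden // 2, batch_first=True,
                          bidirectional=True)
        self.qa_head = nn.Linear(hidden, 2)  # start/end logits

    def forward(self, input_ids, attention_mask):
        seq = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state
        seq, _ = self.gru(seq)
        start_logits, end_logits = self.qa_head(seq).split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)
```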

Medical Image Classification using Pre-trained Convolutional Neural Networks and Support Vector Machine

  • Ahmed, Ali
    • International Journal of Computer Science & Network Security, v.21 no.6, pp.1-6, 2021
  • Recently, pre-trained convolutional neural networks (CNNs) have been widely applied to medical image classification. These models can be utilized in three different ways: for feature extraction, by reusing the architecture of the pre-trained model, and by training some layers while freezing others. In this study, a pre-trained ResNet18 CNN is used for feature extraction, and a multi-class support vector machine is used as the main classifier. Our proposed classification method was evaluated on the Kvasir and PH2 medical image datasets. The overall accuracy was 93.38% and 91.67% for the Kvasir and PH2 datasets, respectively. The classification results and performance of our proposed method outperformed several related methods in this area of study.
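
The feature-extraction pipeline described above can be sketched as follows, with random tensors standing in for the Kvasir/PH2 images; the torchvision and scikit-learn APIs are tooling assumptions, not the paper's stated implementation.

```python
import torch
import numpy as np
from torchvision.models import resnet18, ResNet18_Weights
from sklearn.svm import SVC

backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # drop the ImageNet head; keep features
backbone.eval()

def extract(images):                # images: (N, 3, 224, 224) tensor
    with torch.no_grad():
        return backbone(images).numpy()   # (N, 512) feature vectors

# Placeholder data standing in for medical images and their class labels.
X_train = extract(torch.randn(32, 3, 224, 224))
y_train = np.random.randint(0, 8, size=32)

# Multi-class SVM (one-vs-rest) as the main classifier.
clf = SVC(kernel="rbf", decision_function_shape="ovr").fit(X_train, y_train)
X_test = extract(torch.randn(4, 3, 224, 224))
print(clf.predict(X_test))
```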

A Protein-Protein Interaction Extraction Approach Based on Large Pre-trained Language Model and Adversarial Training

  • Tang, Zhan; Guo, Xuchao; Bai, Zhao; Diao, Lei; Lu, Shuhan; Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS), v.16 no.3, pp.771-791, 2022
  • Protein-protein interaction (PPI) extraction from raw text is important for revealing the molecular mechanisms of biological processes. With the rapid growth of the biomedical literature, manually extracting PPIs has become increasingly time-consuming and laborious. Therefore, automatic PPI extraction from the raw literature through natural language processing technology has attracted the attention of many researchers. We propose a PPI extraction model based on a large pre-trained language model and adversarial training. It enhances the learning of semantic and syntactic features using BioBERT pre-trained weights, which are built on large-scale domain corpora, and adversarial perturbations are applied to the embedding layer to improve the robustness of the model. Experimental results showed that the proposed model achieved the highest F1 scores (83.93% and 90.31%) on the two corpora with large sample sizes, AIMed and BioInfer, respectively, compared with previous methods. It also achieved comparable performance on three corpora with small sample sizes: HPRD50, IEPA, and LLL.
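
The abstract does not give the exact perturbation scheme, but embedding-layer adversarial training is commonly done in the FGM style sketched below; treat the step function and its hyperparameters as illustrative assumptions. `model` is assumed to be any Transformers classifier, e.g. BioBERT with a task head.

```python
import torch

def adversarial_step(model, batch, labels, optimizer, epsilon=1.0):
    emb = model.get_input_embeddings().weight
    # 1) Clean forward/backward pass.
    loss = model(**batch, labels=labels).loss
    loss.backward()
    # 2) Perturb embeddings along the gradient direction (FGM).
    grad = emb.grad.detach()
    delta = epsilon * grad / (grad.norm() + 1e-8)
    emb.data.add_(delta)
    # 3) Adversarial forward/backward pass accumulates robust gradients.
    adv_loss = model(**batch, labels=labels).loss
    adv_loss.backward()
    # 4) Restore the embeddings, then update all parameters.
    emb.data.sub_(delta)
    optimizer.step()
    optimizer.zero_grad()
    return float(loss), float(adv_loss)
```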

Robust Sentiment Classification of Metaverse Services Using a Pre-trained Language Model with Soft Voting

  • Haein Lee; Hae Sun Jung; Seon Hong Lee; Jang Hyun Kim
    • KSII Transactions on Internet and Information Systems (TIIS), v.17 no.9, pp.2334-2347, 2023
  • Metaverse services generate text data, a form of ubiquitous-computing data, in real time, and analyzing user emotions from these data is an important task for such services. This study classifies user sentiments using deep learning and pre-trained language models based on the transformer architecture. Whereas previous studies collected data from a single platform, the current study incorporated review data matching the keyword "Metaverse" from both the YouTube and Google Play Store platforms for broader applicability. As a result, the Bidirectional Encoder Representations from Transformers (BERT) and Robustly optimized BERT approach (RoBERTa) models combined with a soft voting mechanism achieved the highest accuracy, 88.57%. In addition, the area under the curve (AUC) score of the ensemble comprising RoBERTa, BERT, and A Lite BERT (ALBERT) was 0.9458. These results demonstrate that ensembles built around the RoBERTa model perform well, so the RoBERTa model can be applied on platforms that provide metaverse services. The findings contribute to the advancement of natural language processing techniques in metaverse services, which are increasingly important in digital platforms and virtual environments. Overall, this study provides empirical evidence that sentiment analysis using deep learning and pre-trained language models is a promising approach to improving user experiences in metaverse services.
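
Soft voting itself is simple to state in code: average the predicted class probabilities of the individual classifiers and take the argmax. The sketch below assumes Transformers-style fine-tuned classifiers (e.g. BERT, RoBERTa, ALBERT) that return logits.

```python
import torch

def soft_vote(models, batch):
    # Each model returns logits; convert to probabilities and average them.
    probs = [torch.softmax(m(**batch).logits, dim=-1) for m in models]
    avg = torch.stack(probs).mean(dim=0)    # soft-voted class probabilities
    return avg.argmax(dim=-1)               # ensemble prediction

# Usage (assuming fine-tuned models and a tokenized batch):
#   preds = soft_vote([bert, roberta, albert], batch)
```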

Performance Evaluation of a Dynamic Inverse Model with EnergyPlus Model Simulation for Building Cooling Loads (건물냉방부하에 대한 동적 인버스 모델링기법의 EnergyPlus 건물모델 적용을 통한 성능평가)

  • Lee, Kyoung-Ho; Braun, James E.
    • Korean Journal of Air-Conditioning and Refrigeration Engineering, v.20 no.3, pp.205-212, 2008
  • This paper describes the application of an inverse building model to a calibrated forward building model built with the EnergyPlus program. Typically, inverse models are trained on measured data; in this study, however, an inverse building model was trained on data generated by an EnergyPlus model of an actual office building. The EnergyPlus model was calibrated using field data for the building. A training data set for the month of July was generated from the EnergyPlus model to train the inverse model, and the cooling-load predictions of the trained inverse model were then tested against another EnergyPlus data set for the month of August. Predicted cooling loads showed good agreement with the EnergyPlus cooling loads, with a root-mean-square error of 4.11%. In addition, different control strategies with dynamic cooling-setpoint variation were simulated using the inverse model, and peak and daily cooling loads were compared across the dynamic simulations.
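
The train-on-July, test-on-August protocol can be sketched as follows; the feature set, regressor, and synthetic "EnergyPlus" data below are illustrative stand-ins, not the paper's inverse model.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Placeholder simulator outputs: [outdoor temp, solar, hour] -> cooling load,
# one sample per hour for a 31-day month (744 hours).
X_july = rng.uniform([20, 0, 0], [35, 1000, 23], size=(744, 3))
y_july = 5 * X_july[:, 0] + 0.1 * X_july[:, 1] + rng.normal(0, 5, 744)
X_aug = rng.uniform([20, 0, 0], [35, 1000, 23], size=(744, 3))
y_aug = 5 * X_aug[:, 0] + 0.1 * X_aug[:, 1] + rng.normal(0, 5, 744)

# Train the inverse (data-driven) model on July, test on August.
inverse_model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                             random_state=0).fit(X_july, y_july)
pred = inverse_model.predict(X_aug)
rmse_pct = 100 * np.sqrt(np.mean((pred - y_aug) ** 2)) / y_aug.mean()
print(f"RMSE: {rmse_pct:.2f}% of mean load")
```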

KorPatELECTRA: A Pre-trained Language Model for Korean Patent Literature to improve performance in the field of natural language processing (Korean Patent ELECTRA)

  • Jang, Ji-Mo; Min, Jae-Ok; Noh, Han-Sung
    • Journal of the Korea Society of Computer and Information, v.27 no.2, pp.15-23, 2022
  • In the patent field, NLP (Natural Language Processing) is a challenging task due to the linguistic specificity of patent literature, so there is an urgent need for a language model optimized for Korean patent literature. Recently, there have been continuous attempts in NLP to build pre-trained language models for specific domains to improve performance on tasks in those fields. Among them, ELECTRA is a pre-trained language model released by Google after BERT; it uses a new method called RTD (Replaced Token Detection) to increase training efficiency. This paper proposes KorPatELECTRA, pre-trained on a large amount of Korean patent literature. Optimal pre-training was achieved by preprocessing the training corpus according to the characteristics of patent literature and applying a patent-specific vocabulary and tokenizer. To confirm its performance, KorPatELECTRA was tested on NER (Named Entity Recognition), MRC (Machine Reading Comprehension), and patent classification tasks using actual patent data, and it achieved the best performance on all three tasks compared with general-purpose language models.
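
The RTD objective ELECTRA trains with can be demonstrated with a public general-purpose discriminator checkpoint (KorPatELECTRA itself is the paper's model and is not publicly assumed here): the discriminator predicts, per token, whether that token was replaced by a generator.

```python
import torch
from transformers import ElectraTokenizerFast, ElectraForPreTraining

tok = ElectraTokenizerFast.from_pretrained("google/electra-small-discriminator")
disc = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")

# A corrupted sentence: "jumps" has been replaced with "eats".
corrupted = "the quick brown fox eats over the lazy dog"
inputs = tok(corrupted, return_tensors="pt")
logits = disc(**inputs).logits          # > 0 means "predicted replaced"
flags = (logits[0] > 0).int().tolist()
print(list(zip(tok.convert_ids_to_tokens(inputs.input_ids[0]), flags)))
```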

Robust Tracking Control Based on Intelligent Sliding-Mode Model-Following Position Controllers for PMSM Servo Drives

  • El-Sousy, Fayez F.M.
    • Journal of Power Electronics, v.7 no.2, pp.159-173, 2007
  • In this paper, an intelligent sliding-mode position controller (ISMC) is proposed for achieving favorable decoupling control and high-precision position tracking in permanent-magnet synchronous motor (PMSM) servo drives. The intelligent position controller consists of a sliding-mode position controller (SMC) in the position feedback loop and an online-trained fuzzy-neural-network model-following controller (FNNMFC) in the feedforward loop. It combines the merits of the SMC, with its robust characteristics, and the FNNMFC, with its online learning ability for periodic command tracking of a PMSM servo drive. The theoretical analysis of the sliding-mode position controller is presented for a second-order PID switching surface that is insensitive to parameter uncertainties and external load disturbances. To realize high dynamic performance in disturbance rejection and tracking, an online-trained FNNMFC is proposed; its connective weights and membership functions are trained online according to the model-following error between the outputs of the reference model and the PMSM servo drive system. The FNNMFC generates an adaptive control signal that is added to the SMC output to attain robust model-following characteristics under different operating conditions, regardless of parameter uncertainties and load disturbances. Computer simulations demonstrate the effectiveness of the proposed controller and confirm that the ISMC provides robust performance and precise tracking of the reference model regardless of load disturbances and PMSM parameter uncertainties.
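
As a toy sketch of the SMC component only: a sliding-mode law with a PID-type switching surface s = Kp*e + Ki*∫e + Kd*ė, using tanh instead of sign to soften chattering. The gains, the smoothing, and the first-order plant are illustrative assumptions unrelated to the paper's PMSM model or the FNNMFC.

```python
import numpy as np

def smc_pid(e, e_int, e_dot, Kp=5.0, Ki=5.0, Kd=0.5, eta=2.0, phi=0.1):
    s = Kp * e + Ki * e_int + Kd * e_dot        # PID switching surface
    return -eta * np.tanh(s / phi)              # smoothed switching control

# Toy first-order plant tracking a step reference.
x, ref, dt, e_int, e_prev = 0.0, 1.0, 1e-3, 0.0, 0.0
for _ in range(5000):
    e = x - ref
    e_int += e * dt
    u = smc_pid(e, e_int, (e - e_prev) / dt)
    e_prev = e
    x += (-x + u) * dt                          # plant: dx/dt = -x + u
print(round(x, 3))                              # settles near the reference
```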

Investigation on the nonintrusive multi-fidelity reduced-order modeling for PWR rod bundles

  • Kang, Huilun; Tian, Zhaofei; Chen, Guangliang; Li, Lei; Chu, Tianhui
    • Nuclear Engineering and Technology, v.54 no.5, pp.1825-1834, 2022
  • Performing high-fidelity computational fluid dynamics (HF-CFD) to predict the flow and heat transfer of the coolant in a reactor core is expensive, especially in scenarios that require extensive parameter searches, such as uncertainty analysis and design optimization. This work investigates a multi-fidelity reduced-order model (MF-ROM) for PWR rod bundle simulation. First, basis vectors and basis-vector coefficients of the high-fidelity and low-fidelity CFD results are extracted separately by the proper orthogonal decomposition (POD) approach. Second, a surrogate model is trained to map the relationship between the coefficients extracted from the different fidelity levels. In the prediction stage, the coefficients of the low-fidelity data under new operating conditions are extracted using the obtained POD basis vectors; the trained surrogate model then regresses the high-fidelity coefficients from the low-fidelity ones, and the predicted high-fidelity field is reconstructed as the product of the extracted basis vectors and the regressed coefficients. The effectiveness of the MF-ROM is evaluated on a flow and heat transfer problem in PWR fuel rod bundles. Two data-driven algorithms, Kriging and an artificial neural network (ANN), are trained as surrogate models to reconstruct the complex flow and heat transfer field downstream of the mixing vanes. The results show good agreement between the fields reconstructed with the trained MF-ROM and the high-fidelity CFD results, while the former requires only the computational burden of the low-fidelity simulation. The ANN model performs slightly better than the Kriging model when a large number of POD basis vectors is used for regression. Moreover, the results demonstrate the suitability of the proposed MF-ROM for initializing high-fidelity simulations to accelerate complex computations.
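
The nonintrusive MF-ROM workflow reduces to a few linear-algebra steps, sketched below with random matrices standing in for the CFD snapshot data; the MLP surrogate is an illustrative stand-in for the paper's ANN regressor, in an assumed configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_snap, n_lf, n_hf, r = 40, 200, 2000, 5   # snapshots, grid sizes, POD rank
S_lf = rng.normal(size=(n_lf, n_snap))     # low-fidelity snapshot matrix
S_hf = rng.normal(size=(n_hf, n_snap))     # high-fidelity snapshot matrix

# 1) POD bases via truncated SVD (columns are the POD basis vectors).
U_lf = np.linalg.svd(S_lf, full_matrices=False)[0][:, :r]
U_hf = np.linalg.svd(S_hf, full_matrices=False)[0][:, :r]

# 2) Project snapshots to get coefficients; learn the LF -> HF coefficient map.
a_lf = (U_lf.T @ S_lf).T                   # (n_snap, r)
a_hf = (U_hf.T @ S_hf).T
surrogate = MLPRegressor(hidden_layer_sizes=(64,), max_iter=5000,
                         random_state=0).fit(a_lf, a_hf)

# 3) Prediction: run only the cheap LF simulation for a new case, project,
#    regress the HF coefficients, and reconstruct the HF field.
s_lf_new = rng.normal(size=(n_lf, 1))
a_new = (U_lf.T @ s_lf_new).T
field_hf = U_hf @ surrogate.predict(a_new).T   # reconstructed HF field
print(field_hf.shape)                          # (n_hf, 1)
```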