• Title/Summary/Keyword: contextualized task


Comparative study of text representation and learning for Persian named entity recognition

  • Pour, Mohammad Mahdi Abdollah; Momtazi, Saeedeh
    • ETRI Journal / v.44 no.5 / pp.794-804 / 2022
  • Transformer models have had a great impact on natural language processing (NLP) in recent years by enabling highly effective and efficient contextualized language models. Recent studies have applied transformer-based language models to various NLP tasks, including Persian named entity recognition (NER). However, for complex tasks such as NER, it is difficult to determine which contextualized embedding will produce the best representation. Given the lack of comparative studies on combining different contextualized pretrained models with sequence-modeling classifiers, we conducted a comparative study of different classifiers and embedding models. In this paper, we tune different transformer-based language models with different classifiers and evaluate them on the Persian NER task, analyzing the impact of text representation and text classification methods on Persian NER performance. We train and evaluate the models on three Persian NER datasets: MoNa, Peyma, and Arman. Experimental results show that XLM-R with a linear layer and a conditional random field (CRF) layer performs best. This model achieved phrase-based F-measures of 70.04, 86.37, and 79.25 and word-based F-measures of 78, 84.02, and 89.73 on the MoNa, Peyma, and Arman datasets, respectively. These results represent state-of-the-art performance on the Persian NER task.
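The abstract reports both phrase-based and word-based F-measures. The sketch below shows one common way to compute each from BIO-tagged sequences; the paper's actual evaluation script is not described here, so the BIO scheme, the function names (`bio_to_spans`, `phrase_f1`, `word_f1`), and the token-level definition are illustrative assumptions, not the authors' method.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            if tag != "O":
                # B- tag, or a stray I- tag treated as the start of a new entity
                start, etype = i, tag[2:]
    return spans


def f1(tp: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall, guarding against empty sets."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def phrase_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1: a prediction counts only if span and type match exactly."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    return f1(len(g & p), len(p), len(g))


def word_f1(gold: List[str], pred: List[str]) -> float:
    """Token-level F1 over entity tokens, ignoring B-/I- prefixes (one possible definition)."""
    gold_types = [t[2:] if t != "O" else "O" for t in gold]
    pred_types = [t[2:] if t != "O" else "O" for t in pred]
    tp = sum(1 for g, p in zip(gold_types, pred_types) if g == p != "O")
    n_pred = sum(1 for p in pred_types if p != "O")
    n_gold = sum(1 for g in gold_types if g != "O")
    return f1(tp, n_pred, n_gold)


# Hypothetical example: the model truncates a two-token person name.
gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "O", "O", "B-LOC"]
print(round(phrase_f1(gold, pred), 3))  # 0.5
print(round(word_f1(gold, pred), 3))    # 0.8
```

The example shows why the two metrics can diverge: a partially recognized entity earns no credit at the phrase level but still contributes correct tokens at the word level.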

The Relationship between Children's Reading Ability of Environmental Print, Vocabulary and Print Concepts (유아의 환경인쇄물 읽기능력과 어휘력 및 인쇄물 개념 간의 관계)

  • Lee, Shin Hee; Kim, Myung Soon; Son, Seung Hee
    • Korean Journal of Child Studies / v.34 no.3 / pp.75-92 / 2013
  • This study investigated the differences and relationships among environmental print reading ability, vocabulary, and print concepts in children aged 3 and 4. The participants were 90 children who could not read letters. The Children's Reading Ability of Environmental Print Scale (Son, 2012), the Receptive and Expressive Vocabulary Test (Kim et al., 2009), and Concepts About Print (Kim & Kim, 2004) were used in this study. The collected data were analyzed using t-tests and Pearson's correlation analysis. The results showed that, among illiterate Korean children aged 3 to 4 years, scores on the environmental print reading tasks were positively correlated with vocabulary and print concepts.

Improving Abstractive Summarization by Training Masked Out-of-Vocabulary Words

  • Lee, Tae-Seok; Lee, Hyun-Young; Kang, Seung-Shik
    • Journal of Information Processing Systems / v.18 no.3 / pp.344-358 / 2022
  • Text summarization is the task of producing a shorter version of a long document while accurately preserving the main content of the original text. Abstractive summarization generates novel words and phrases with a language generation method, using text transformation and pre-embedded word information. However, newly coined words and out-of-vocabulary (OOV) words degrade automatic summarization performance because they are not learned during pre-training. In this study, we demonstrated improved summarization quality by using BERT's contextualized embeddings with OOV masking. In addition, by explicitly providing precise pointing and an optional copy instruction along with the BERT embedding, we achieved higher accuracy than the baseline model. The model achieved a ROUGE-1 score (a recall-based word-generation metric) of 55.11 and a ROUGE-L score (a word-order-based metric) of 39.65.
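To make the two reported metrics concrete, the following is a minimal Python sketch of ROUGE-1 recall (clipped unigram overlap) and ROUGE-L (based on the longest common subsequence, which is what makes it sensitive to word order). It assumes whitespace tokenization with no stemming; the abstract does not state which ROUGE implementation or variant (recall vs. F-score) produced the reported numbers, and the function names here are hypothetical.

```python
from collections import Counter
from typing import List


def rouge1_recall(reference: List[str], candidate: List[str]) -> float:
    """ROUGE-1 recall: clipped unigram overlap divided by reference length."""
    ref_counts, cand_counts = Counter(reference), Counter(candidate)
    overlap = sum(min(c, cand_counts[w]) for w, c in ref_counts.items())
    return overlap / len(reference) if reference else 0.0


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f(reference: List[str], candidate: List[str], beta: float = 1.0) -> float:
    """ROUGE-L F-score combining LCS-based recall and precision."""
    lcs = lcs_length(reference, candidate)
    if lcs == 0:
        return 0.0
    recall = lcs / len(reference)
    precision = lcs / len(candidate)
    return (1 + beta ** 2) * recall * precision / (recall + beta ** 2 * precision)


# Hypothetical example with a one-word substitution.
reference = "the cat sat on the mat".split()
candidate = "the cat lay on the mat".split()
print(round(rouge1_recall(reference, candidate), 3))  # 0.833
print(lcs_length(reference, candidate))               # 5
print(round(rouge_l_f(reference, candidate), 3))      # 0.833
```

Published ROUGE scores are usually reported as percentages (for example, 0.5511 as 55.11), and toolkits may add stemming or other tokenization, so numbers from this sketch are not directly comparable to the paper's.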