• Title/Summary/Keyword: Korean language training

Integration of WFST Language Model in Pre-trained Korean E2E ASR Model

  • Junseok Oh;Eunsoo Cho;Ji-Hwan Kim
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.6 / pp.1692-1705 / 2024
  • In this paper, we present a method that integrates a Grammar Transducer as an external language model to improve the accuracy of a pre-trained Korean end-to-end (E2E) automatic speech recognition (ASR) model. The E2E ASR model uses the Connectionist Temporal Classification (CTC) loss function to derive hypothesis sentences from input audio. However, this approach has a limitation inherent to CTC: it does not directly capture language information from transcript data. To overcome this limitation, we propose a fusion approach that combines a clause-level n-gram language model, converted into a Weighted Finite-State Transducer (WFST), with the E2E ASR model. This approach improves the model's accuracy and enables domain adaptation using only additional text data, avoiding further intensive training of the large pre-trained ASR model. This is particularly advantageous for Korean, a low-resource language that faces significant challenges due to the limited availability of speech data and ASR models. First, we validate the efficacy of training the n-gram model at the clause level by comparing its inference accuracy with that of the E2E ASR model combined with language models trained on smaller lexical units. We then demonstrate that our approach achieves higher domain adaptation accuracy than Shallow Fusion, a previously proposed method for combining an external language model with an E2E ASR model without additional training.
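
The abstract benchmarks the WFST fusion against Shallow Fusion. As a rough illustration of that baseline only (not the authors' WFST method), the sketch below rescores a hypothetical n-best list with the log-linear score log P_ASR(y|x) + lambda * log P_LM(y). The toy bigram table, the romanized tokens, and the parameter name `lm_weight` are invented assumptions; in practice Shallow Fusion is usually applied at every beam-search step rather than to a finished n-best list.

```python
import math

# Hypothetical toy bigram LM (log-probabilities); all values are invented for illustration.
BIGRAM_LOGP = {
    ("<s>", "seoul"): math.log(0.4),
    ("seoul", "station"): math.log(0.5),
    ("<s>", "seol"): math.log(0.01),
    ("seol", "station"): math.log(0.1),
}
UNSEEN_LOGP = math.log(1e-4)  # crude floor for unseen bigrams (no real smoothing)


def lm_logp(words):
    """Sum bigram log-probabilities over the word sequence, starting from <s>."""
    total, prev = 0.0, "<s>"
    for w in words:
        total += BIGRAM_LOGP.get((prev, w), UNSEEN_LOGP)
        prev = w
    return total


def shallow_fusion_score(asr_logp, words, lm_weight=0.5):
    """Log-linear fusion: log P_ASR(y|x) + lm_weight * log P_LM(y)."""
    return asr_logp + lm_weight * lm_logp(words)


def rescore(nbest, lm_weight=0.5):
    """Pick the (asr_logp, words) hypothesis with the highest fused score."""
    return max(nbest, key=lambda h: shallow_fusion_score(h[0], h[1], lm_weight))


if __name__ == "__main__":
    # The acoustic scores slightly prefer the misspelling "seol"; the LM flips the decision.
    nbest = [(-1.0, ["seol", "station"]), (-1.2, ["seoul", "station"])]
    print(rescore(nbest, lm_weight=0.0)[1])  # ['seol', 'station']
    print(rescore(nbest, lm_weight=0.5)[1])  # ['seoul', 'station']
```

With `lm_weight=0` the acoustic score alone decides; raising it lets the external LM override the acoustic preference, which is why the weight is normally tuned on held-out data.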

A Protein-Protein Interaction Extraction Approach Based on Large Pre-trained Language Model and Adversarial Training

  • Tang, Zhan;Guo, Xuchao;Bai, Zhao;Diao, Lei;Lu, Shuhan;Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.3 / pp.771-791 / 2022
  • Protein-protein interaction (PPI) extraction from original text is important for revealing the molecular mechanisms of biological processes. With the rapid growth of the biomedical literature, manually extracting PPIs has become increasingly time-consuming and laborious. Automatic PPI extraction from raw literature using natural language processing has therefore attracted considerable research attention. We propose a PPI extraction model based on a large pre-trained language model and adversarial training. It enhances the learning of semantic and syntactic features using BioBERT pre-trained weights, which are built on large-scale domain corpora, and applies adversarial perturbations to the embedding layer to improve the model's robustness. Experimental results showed that, compared with previous methods, the proposed model achieved the highest F1 scores on the two corpora with large sample sizes: AIMed (83.93%) and BioInfer (90.31%). It also achieved comparable performance on three corpora with small sample sizes: HPRD50, IEPA, and LLL.
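
The abstract says adversarial perturbations are applied to the embedding layer but does not name the attack. Assuming an FGM-style perturbation (a common choice for text models), the sketch below scales the loss gradient with respect to an embedding vector to L2 norm `epsilon` and adds it to the embedding. The linear toy loss, its analytic gradient, and all numbers are hypothetical stand-ins for BioBERT's classification loss and its backpropagated gradient.

```python
import math


def linear_loss(w, e, y):
    """Toy loss -y * (w . e); a hypothetical stand-in for the real classification loss."""
    return -y * sum(wi * ei for wi, ei in zip(w, e))


def grad_wrt_embedding(w, y):
    """Analytic gradient of linear_loss with respect to the embedding e."""
    return [-y * wi for wi in w]


def fgm_perturbation(grad, epsilon=1.0):
    """FGM-style perturbation: r_adv = epsilon * g / ||g||_2 (zero vector if g == 0)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def adversarial_training_loss(w, e, y, epsilon=1.0):
    """Clean loss plus loss on the perturbed embedding; training would minimize this sum."""
    r_adv = fgm_perturbation(grad_wrt_embedding(w, y), epsilon)
    e_adv = [ei + ri for ei, ri in zip(e, r_adv)]
    return linear_loss(w, e, y) + linear_loss(w, e_adv, y)
```

Because the perturbation follows the gradient, it raises the toy loss (for w = [3, 4], e = [1, 1], y = 1, epsilon = 0.5, from -7.0 to about -4.5); minimizing the clean and perturbed terms together is what pushes the model toward robustness against such small embedding shifts.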

High-Quality Multimodal Dataset Construction Methodology for ChatGPT-Based Korean Vision-Language Pre-training (ChatGPT 기반 한국어 Vision-Language Pre-training을 위한 고품질 멀티모달 데이터셋 구축 방법론)

  • Jin Seong;Seung-heon Han;Jong-hun Shin;Soo-jong Lim;Oh-woog Kwon
    • Annual Conference on Human and Language Technology / 2023.10a / pp.603-608 / 2023
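  • This study examines the need to build a large-scale vision-language multimodal dataset for training Korean Vision-Language Pre-training models. Korean vision-language multimodal datasets are currently scarce, and high-quality data is difficult to obtain. This study therefore uses machine translation to translate foreign-language (English) vision-language data into Korean and, building on this, proposes a dataset construction methodology that uses generative AI. Among various caption generation methods, we propose a new method that uses ChatGPT to automatically generate natural, high-quality Korean captions. This ensures better caption quality than existing machine translation methods, and multiple translation results are ensembled to construct the multimodal dataset effectively. In addition, this study introduces Caption Projection Consistency, a semantic-similarity-based evaluation method, compares the English-Korean caption projection performance of various translation systems, and presents criteria for evaluating it. Finally, this study presents a new methodology for building Korean image-text multimodal datasets with ChatGPT and demonstrates English-Korean caption projection performance superior to that of representative machine translators. Our study thus points to a way of automatically building large volumes of the scarce high-quality Korean data, and we expect this method to contribute to improving the performance of deep-learning-based Korean Vision-Language Pre-training models.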

An analysis of English as a foreign language learners' perceptual confusions and phonemic awareness of English fricatives

  • KyungA Lee
    • Phonetics and Speech Sciences / v.15 no.3 / pp.37-44 / 2023
  • This study investigates perceptual confusions of English fricatives among 121 Korean elementary school English as a foreign language (EFL) learners who had been learning English for relatively short periods. The objective is to examine how they perceive English fricative consonants and to provide educational guidelines. Two sets of English fricative identification tasks, one for voiceless and one for voiced fricatives, were administered to participants in a High Variability Phonetic Training (HVPT) setting. Their phonemic awareness of the fricatives was visualized in perceptual confusion maps via multidimensional scaling analysis. The findings are discussed in terms of the influence of the learners' L1 (Korean) linguistic features and in comparison with L1 learners of English. The learners' phonemic awareness patterns are then compared with the relative importance of each contrast for speech intelligibility, based on a functional load hierarchy. The results indicated that Korean elementary EFL learners recognized English fricatives in a manner largely similar to L1 learners, suggesting that their acquisition is progressing. Additionally, the findings demonstrated that the young EFL learners possess sufficient phonemic awareness for most high functional load segments but have some difficulty with one high and one low functional load pair. The findings offer suggestions for diagnosing language learners' phonemic awareness, thereby aiding the development of practical guidelines for language instructional design and helping educators make informed decisions about teaching priorities in L2 classes.

A Study on the Development of Occupational Purpose Korean Language Curriculum for Foreign Deck Crews (외국인 갑판부선원을 위한 직업목적 한국어 교육과정 개발에 관한 연구)

  • Park, Kyeung-Eun;Park, Jin-Soo;Ha, Weon-Jae
    • Journal of Navigation and Port Research / v.42 no.4 / pp.253-266 / 2018
  • This study aims to develop an occupational Korean language curriculum for foreign seafarers working on Korean coastal vessels. In recent years, the employment of foreign seafarers has increased significantly to make up for a shortage of Korean seafarers. As the number of mixed-crew vessels has grown, communication and cross-cultural awareness among different nationalities have emerged as important issues. Accordingly, under the 'Guidelines of foreign seafarers management', foreign seafarers are required to undergo Korean language training to help them adapt to Korean vessels. However, the training period is very short, and no systematically developed textbooks exist. It is therefore essential to develop a curriculum and textbooks that teach foreign seafarers the fundamental Korean dialogues they need for daily life and work on board. This study used the DACUM method to derive core tasks from the various jobs and tasks performed on board. First, existing Korean language teaching materials for foreign crews are analyzed. Second, a job analysis committee is organized based on that analysis. Third, a list of crew tasks is prepared through a committee workshop. Fourth, a questionnaire survey of seafarers working on coastal vessels is conducted to determine the importance and frequency of each task. Finally, a core Korean language curriculum is developed for foreign deck crews.

Hypernetwork-based Natural Language Sentence Generation by Word Relation Pattern Learning (단어 간 관계 패턴 학습을 통한 하이퍼네트워크 기반 자연 언어 문장 생성)

  • Seok, Ho-Sik;Bootkrajang, Jakramate;Zhang, Byoung-Tak
    • Journal of KIISE:Software and Applications / v.37 no.3 / pp.205-213 / 2010
  • We introduce a natural language sentence generation (NLG) method based on learning word-association patterns. Existing NLG methods either assume inherent grammar rules or use template-based methods. In contrast, the presented method learns word-association patterns using only word co-occurrence, without additional information such as tagging. We employ the hypernetwork method to analyze and represent the word-association patterns. As training progresses, model complexity increases. After each training phase, natural language sentences are generated using the learned hyperedges, and the number of grammatically plausible sentences increases after each phase. By comparing the diversity of grammatical rules in the training corpora with that in the generated sentences, we confirm that the proposed method has the potential to learn the grammatical properties of the training corpora.
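
The hypernetwork idea can be illustrated, in much-simplified form, by treating each run of `order` consecutive words as a hyperedge weighted by its co-occurrence count and generating text by chaining hyperedges whose prefixes overlap. This is an assumption for illustration only: the paper's actual hyperedge sampling, weight learning, and phase-by-phase growth in model complexity are not reproduced, and the two-sentence corpus is invented.

```python
from collections import Counter


def extract_hyperedges(sentences, order=3):
    """Count each run of `order` consecutive words as a weighted hyperedge.
    Uses only raw co-occurrence, with no tagging or grammar rules."""
    edges = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - order + 1):
            edges[tuple(words[i:i + order])] += 1
    return edges


def generate(edges, seed, order=3, max_len=10):
    """Greedily extend `seed` (at least order-1 words) with the heaviest hyperedge
    whose first order-1 words match the current tail. Ties go to the
    lexicographically larger hyperedge so the output is deterministic."""
    words = list(seed)
    while len(words) < max_len:
        tail = tuple(words[-(order - 1):])
        candidates = [(count, edge) for edge, count in edges.items() if edge[:-1] == tail]
        if not candidates:
            break  # no hyperedge continues this tail
        _, best = max(candidates)
        words.append(best[-1])
    return words


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "the cat ate the fish"]
    edges = extract_hyperedges(corpus, order=3)
    print(" ".join(generate(edges, ["the", "cat"], order=3)))
```

Seeded with "the cat", this greedy chain reproduces "the cat sat on the mat"; with a larger corpus, chaining overlapping hyperedges from different sentences is what allows novel word sequences to appear.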

Korean Machine Reading Comprehension for Patent Consultation Using BERT (BERT를 이용한 한국어 특허상담 기계독해)

  • Min, Jae-Ok;Park, Jin-Woo;Jo, Yu-Jeong;Lee, Bong-Gun
    • KIPS Transactions on Software and Data Engineering / v.9 no.4 / pp.145-152 / 2020
  • Machine reading comprehension (MRC) is an NLP task that predicts the answer to a user's query by understanding a relevant document, and it can be used in automated consultation services such as chatbots. The BERT (Pre-training of Deep Bidirectional Transformers for Language Understanding) model, which has recently shown high performance across many areas of natural language processing, is trained in two phases: the first pre-trains the model on large-scale data from each domain, and the second fine-tunes it to solve each downstream NLP task. In this paper, we build a patent MRC dataset and show how to construct patent consultation training data for the MRC task. We also propose a method to improve MRC performance using a Patent-BERT model pre-trained on a patent consultation corpus, together with a language processing algorithm suited to machine learning on patent counseling data. Experimental results show that the proposed method improves performance in answering patent counseling queries.
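
The abstract does not describe how answers are decoded. Assuming the standard extractive setup used when fine-tuning BERT for MRC, where the model emits one start logit and one end logit per token, the answer span can be selected as sketched below. The logits, the tokens, and the `max_answer_len` value are hypothetical.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return ((start, end), score) maximizing start_logits[s] + end_logits[e]
    subject to s <= e < s + max_answer_len. Assumes non-empty logits;
    the earliest span wins ties."""
    best, best_score = None, float("-inf")
    for s, s_score in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best, best_score


def extract_answer(tokens, start_logits, end_logits, max_answer_len=30):
    """Map the best (start, end) token span back to answer text."""
    (s, e), _ = best_span(start_logits, end_logits, max_answer_len)
    return " ".join(tokens[s:e + 1])


if __name__ == "__main__":
    tokens = ["file", "within", "twelve", "months"]
    start = [0.1, 2.0, 0.5, 0.0]
    end = [0.0, 0.3, 1.5, 3.0]
    print(extract_answer(tokens, start, end))  # within twelve months
```

Capping the span length keeps the search at O(n * max_answer_len) instead of O(n^2); real implementations typically also mask out question and special tokens before choosing a span.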

Intervention Efficacy of Mother Training on Social Reciprocity for Children with Autism (자폐아동을 위한 어머니 훈련 프로그램이 가정에서의 사회적 상호작용에 미치는 효과)

  • Won, Dae-Young;Seung, Hye-Kyeung;Elder, Jennifer
    • Child Health Nursing Research / v.11 no.4 / pp.444-455 / 2005
  • Purpose: This study examined the efficacy of parent training interventions to facilitate social reciprocity and language development in children with autism. Methods: The social interaction behaviors of mothers and children over time were compared using a single-subject experimental design. Five children diagnosed with autism and their mothers participated in the study. The participants were recruited from U city, Korea. Using training videotapes and demonstrations, the mothers were trained in how to facilitate social interaction with their children and promote language development. Following the training, data were collected three times per week by videotaping mother-child interactions in their homes. Results: Four of the five mothers showed increased use of imitation with animation and expectant waiting after the intervention compared with the baseline sessions, and the children showed noticeable increases in initiation of interaction, vocalizations, and verbal production after their mothers received the training intervention. Conclusion: The results of this study demonstrate the efficacy of mother training for improving the social interactions of children with autism. Replicating this study with more participants and comparing intervention and control groups would yield additional important information. This intervention clearly shows promise and has implications for clinical practice.

Development of the Guidelines on the VTS English Competency Test

  • 최승희;장은규
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2022.06a / pp.249-250 / 2022
  • The purpose of this paper is to propose the development of Guidelines on a VTS English language competency test, in accordance with IALA Guideline 1132 (VTS Voice Communications and Phraseology). As a foundation for improving VTSOs' communication capabilities throughout their career lifecycle in terms of training, accreditation, and revalidation, developing a VTS-specific language testing system with explicit evaluation criteria is becoming increasingly critical. To facilitate discussion, the paper makes a range of suggestions to be considered in developing the Guidelines on the VTS English competency test.

The Effects of Reading Pronunciation Training of Korean Phonological Process Words for Chinese Learners (중국인 학습자의 우리말 음운변동 단어의 읽기 발음 훈련효과)

  • Lee, Yu-Ra;Kim, Soo-Jin
    • Phonetics and Speech Sciences / v.1 no.1 / pp.77-86 / 2009
  • This study examines the effect of a combined intervention program on four Korean learners whose first language is Chinese, focusing on their acquisition of reading pronunciation for Korean phonological process words and on their acquisition of each phonological process rule. The training program combines a multisensory Auditory, Visual, and Kinesthetic (AVK) approach, a holistic approach, and a metalinguistic approach. The training goal is to evaluate how accurately the learners read words involving the phonological processes of fortisization, nasalization, lateralization, and the intermediate sound /ㅅ/ (/ʃiot/). We also assess how they read untrained words that involve these four processes. The intervention effects are analyzed using a multiple-probe-across-subjects design. The results indicate that the combined intervention of phonological process rule explanation and word activities was effective for all four Chinese subjects across every word type. The study has two implications: first, it demonstrates the effect of Korean pronunciation intervention concretely; second, it shows how to evaluate phonological processes and how to train learners of Korean.
