• Title/Summary/Keyword: Language Training

Performance Comparison and Error Analysis of Korean Bio-medical Named Entity Recognition (한국어 생의학 개체명 인식 성능 비교와 오류 분석)

  • Jae-Hong Lee
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.4
    • /
    • pp.701-708
    • /
    • 2024
  • The advent of transformer architectures in deep learning has been a major breakthrough in natural language processing research. Named entity recognition is a branch of natural language processing and an important research area for tasks such as information retrieval. It is also important in the biomedical field, but the lack of Korean biomedical corpora for training has limited the development of Korean clinical research using AI. In this study, we built a new biomedical corpus for Korean biomedical named entity recognition and selected language models pre-trained on a large Korean corpus for transfer learning. We compared the selected language models by F1-score and by per-tag recognition rate, and analyzed their errors. In terms of recognition performance, KlueRoBERTa performed relatively well. The error analysis of the tagging process shows that recognition of Disease is excellent, while recognition of Body and Treatment is relatively low. This is due to over-segmentation and under-segmentation, which fail to properly categorize entity names based on context; a more precise morphological analyzer and a richer lexicon will be needed to compensate for the incorrect tagging.
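
The per-tag F1 comparison described in this abstract can be illustrated with a minimal sketch. It assumes a BIO tagging scheme and exact-match span scoring, neither of which is stated in the abstract; the function names `extract_spans` and `per_tag_f1` are hypothetical. Only the tag names Disease and Body come from the source.

```python
from collections import defaultdict

def extract_spans(tags):
    """Return a set of (label, start, end) spans from a BIO tag sequence."""
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues_span = tag.startswith("I-") and label == tag[2:]
        if continues_span:
            continue
        if label is not None:
            spans.add((label, start, i))
        start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # Assumption: a stray I- tag with no matching B- is treated as outside.
    return spans

def per_tag_f1(gold_tags, pred_tags):
    """Exact-match, span-level F1 for each entity label."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    counts = defaultdict(lambda: [0, 0, 0])  # [tp, fp, fn] per label
    for span in pred:
        counts[span[0]][0 if span in gold else 1] += 1
    for span in gold - pred:
        counts[span[0]][2] += 1
    scores = {}
    for label, (tp, fp, fn) in counts.items():
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

# Toy example: the predicted Body span is cut short (under-segmentation).
gold = ["B-Disease", "I-Disease", "O", "B-Body", "I-Body"]
pred = ["B-Disease", "I-Disease", "O", "B-Body", "O"]
scores = per_tag_f1(gold, pred)
```

In this toy case the truncated Body span counts as both a false positive and a false negative, which is one way a segmentation error can lower a tag's score under exact-match scoring.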

Development of Sensor and Block expandable Teaching-Aids-robot (센서 및 블록 확장 가능한 교구용 보조 로봇 개발)

  • Sim, Hyun;Lee, Hyeong-Ok
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.12 no.2
    • /
    • pp.345-352
    • /
    • 2017
  • In this paper, we design and implement an educational robot system for an embedded environment that supports Scratch-based education and meets user demands for robot education in actual school settings. It is developed to enable hands-on training in sensor information processing, software design, and programming practice, which form the basis of robotic systems. The system is an Arduino Uno based product using the ATmega328 core, with a debugging environment based on Arduino Sketch, firmware written in C, and support for Windows, Linux, and Mac OS X. In operation, the system receives control commands from the server over Bluetooth and drives the various sensors of the educational robot. The curriculum includes a Scratch program and Bluetooth communication, which enables real-time Scratch training. The system also provides smartphone apps and is designed to be extended to education in languages such as C and Python. Teachers at the school site used the developed products and presented performance processing results satisfying the missionary needs of the missionaries.

Automatic Generation of Concatenate Morphemes for Korean LVCSR (대어휘 연속음성 인식을 위한 결합형태소 자동생성)

  • 박영희;정민화
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.4
    • /
    • pp.407-414
    • /
    • 2002
  • In this paper, we present a method that automatically generates concatenate-morpheme-based language models to improve the performance of Korean large vocabulary continuous speech recognition. The focus is on reducing recognition errors of monosyllable morphemes, which occupy 54% of the training text corpus and are more frequently misrecognized. The knowledge-based method using POS patterns has disadvantages, such as the difficulty of writing rules and the production of many low-frequency concatenate morphemes. The proposed method automatically selects morpheme pairs from training text data based on measures such as frequency, mutual information, and unigram log likelihood. Experiments were performed using a 7M-morpheme text corpus and a 20K-morpheme lexicon. The frequency measure, with a constraint on the number of morphemes used for concatenation, produces the best result, reducing monosyllables from 54% to 30%, bigram perplexity from 117.9 to 97.3, and MER from 21.3% to 17.6%.
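
As a rough illustration of selecting morpheme pairs by frequency and mutual information, the sketch below scores adjacent pairs in a toy corpus. It is not the authors' implementation: the function names, the `min_freq` and `top_k` parameters, and the toy tokens are assumptions. It also omits the unigram log likelihood measure and the constraint on the number of morphemes per concatenation mentioned in the abstract.

```python
import math
from collections import Counter

def score_pairs(sentences):
    """Map each adjacent morpheme pair to (frequency, pointwise mutual information)."""
    unigrams, bigrams = Counter(), Counter()
    for morphemes in sentences:
        unigrams.update(morphemes)
        bigrams.update(zip(morphemes, morphemes[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    scores = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a, p_b = unigrams[a] / n_uni, unigrams[b] / n_uni
        scores[(a, b)] = (count, math.log2(p_ab / (p_a * p_b)))
    return scores

def select_pairs(scores, min_freq=2, top_k=1):
    """Keep the top_k most frequent pairs that occur at least min_freq times."""
    candidates = [(pair, freq) for pair, (freq, _) in scores.items() if freq >= min_freq]
    candidates.sort(key=lambda item: (-item[1], item[0]))  # frequency, then pair
    return [pair for pair, _ in candidates[:top_k]]

# Toy corpus of already-segmented sentences (placeholder tokens, not real data).
corpus = [["ga", "da"], ["ga", "da", "yo"], ["ga", "yo"]]
pair_scores = score_pairs(corpus)
selected = select_pairs(pair_scores)
```

Selected pairs would then be merged into single lexicon entries before the language model is retrained; ranking by the PMI value instead of frequency only requires changing the sort key.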

A Study on the Analysis for Learning Difficulties of Foreign Students in University of South Korea - Focusing on Chinese Foreign Students - (한국 대학에서 유학생이 겪는 학습의 어려움 분석 - 중국인 유학생을 중심으로 -)

  • Lee, Eun-Hwa;Cho, Yong-Gae;Kim, Nan-Hee
    • Journal of Fisheries and Marine Sciences Education
    • /
    • v.26 no.6
    • /
    • pp.1261-1277
    • /
    • 2014
  • Recently, many universities in South Korea have been actively recruiting foreign students, and the number of foreign students studying in Korea has increased steadily. Programs that support their adaptation to university and their academic success are growing; however, because of the language barrier, adaptation does not seem easy for students who enter their major after finishing Korean language training, which serves as an incubator. Moreover, it is difficult in practice to expect individual guidance from professors because of shortages of educational, administrative, and financial support. Managing foreign students has therefore become a social issue in South Korea. This study aimed to analyze learning difficulties from the viewpoint of Chinese international students. To this end, we analyzed a focus group interview with 16 Chinese foreign students, together with data collected through reflective journals, using the NVivo program. According to the focus group interview, the learning difficulties of Chinese foreign students fall into 4 categories: personal, environmental, educational content, and educational method aspects. These are subdivided into 11 subcategories, and the learning support required for each type of difficulty was identified. These findings are expected to offer suggestions for developing learning support plans for Chinese foreign students.

A study on effective diction training in choral communication (합창 커뮤니케이션에서 효과적인 딕션 훈련을 위한 연구)

  • Kim, Hyung-il
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.3
    • /
    • pp.237-245
    • /
    • 2021
  • The purpose of this study is to propose an effective diction training technique that conductors can use in choral communication. In a chorus, the phonology of the language used in the lyrics influences the diction. Therefore, Korean lyrics must be pronounced according to Korean phonology. In spoken language, accuracy of pronunciation is important, but when expressing lyrics in a song, both vocalization and diction are important. In particular, a chorus is sung by many people, so if the diction is not accurate, the lyrics will not be delivered properly. In this study, the diction of lyrics frequently used in actual Korean choral songs was systematically analyzed according to Korean phonological rules. The results show that the main factor that makes choral diction difficult is the phenomenon of phonological fluctuations in Korean. In particular, phonological fluctuations often occurred when pronouncing the final sound and when two consonants were combined. A follow-up study intends to contribute to the development of choral communication by presenting a systematic choral diction based on Korean phonology.

Sentence Filtering Dataset Construction Method about Web Corpus (웹 말뭉치에 대한 문장 필터링 데이터 셋 구축 방법)

  • Nam, Chung-Hyeon;Jang, Kyung-Sik
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.11
    • /
    • pp.1505-1511
    • /
    • 2021
  • Pre-trained models that perform well on various natural language processing tasks have the advantage of learning the linguistic patterns of sentences from a large corpus during training, allowing each token in the input sentence to be represented with an appropriate feature vector. One method of constructing the corpus required to train a pre-trained model is collection with a web crawler. However, sentences on the web follow various patterns, so they may contain unnecessary words in part or all of the sentence. In this paper, we propose a dataset construction method for filtering sentences that contain unnecessary words, using neural network models on a corpus collected from the web. As a result, we construct a dataset containing a total of 2,330 sentences. We also evaluated the performance of neural network models on the constructed dataset, and the BERT model showed the highest performance with an accuracy of 93.75%.

Whom does Harry's Magic Power Benefit?: Imperialistic Ideas of Children in The Harry Potter Books ("누구를 위한 마법능력인가?" -『해리 포터』와 영국 제국주의 아동관)

  • Park, Sojin
    • Journal of English Language & Literature
    • /
    • v.55 no.1
    • /
    • pp.3-24
    • /
    • 2009
  • The Harry Potter series is considered to represent the multicultural aspect of contemporary British society and to show critical perspectives on racism. This series, however, also includes many elements of British imperialism. This paper examines the ideas about education and Harry's role in relation to British imperialism. One of the main ideas prevalent in 19th century British boys' public schools was that people's blood origin is the most important element in determining their characteristics, ability and moral qualities. The students' inherited capacity and their family background are more highly regarded than their secondary learning and training. This reflects a 19th century concept that ultimately, inborn quality makes 'a hero', a truth presented in the educational policies of Hogwarts. Hogwarts' educational policies and systems can also be related to 'developmentalism', which defines children as imperfect, in-progress and incomplete, thus needing proper training and discipline. As this concept functioned to justify the control of children while educating them, Hogwarts adopts diverse controlling devices and oppressive policies, which are mainly justified in the name of education. On the one hand, child characters are controlled and oppressed by the school authorities; on the other hand, some of the students, such as Harry, have magic powers remarkable enough to resist adult authority and even to save the magic society from the evil power. Harry plays the dual roles that British boys of the Empire were assigned by their society: they are important heirs who must conquer the 'evil' or 'barbarous' world but need to be obedient to a 'good' authority to achieve the mission. Harry's magic power and self-discipline ultimately contribute to fulfilling Dumbledore's mission, which mirrors 19th century British boys' roles as the heirs of the British Empire.

The Development of an ADDIE Based Instructional Model for ELT in Early Childhood Education

  • MARIAM, Nuzhat;NAM, Chang-woo
    • Educational Technology International
    • /
    • v.20 no.1
    • /
    • pp.25-55
    • /
    • 2019
  • The core purpose of this study is to develop and validate an ADDIE-based instructional model for English Language Teaching (ELT) in early childhood classrooms in Bangladesh, as an aid for teachers to reconstruct their knowledge and experience more strategically and to design and implement their instruction in a more structured way. The study is developmental in nature and is divided into five phases: Phase I, review of existing methods and instructional strategies; Phase II, instructional model development; Phase III, Delphi 1st round; Phase IV, Delphi 2nd round; and Phase V, model validation. After the relevant literature and existing strategies were reviewed in Phase I, the first version of the instructional model was built in Phase II. In Phases III and IV, two rounds of Delphi were conducted in which experts from the areas relevant to this study reviewed the first version, and the final version of the instructional model was gradually developed. Finally, the instructional model for English teachers in early childhood classrooms in Bangladesh was validated by the same Delphi panelists in Phase V. For each phase of ADDIE, the instructional model elaborates 1) representative key points, 2) activities prescribed for the instructors, and 3) supporting strategies. Both the conceptual and procedural models are included for clearer identification of the whole process. The study also provides recommendations for instructors and practitioners on choosing the instructional model, such as conducting a prior needs analysis, incorporating teacher training programs, training students, continuing to research effective teaching techniques and tools, and being open to change. In addition, the study acknowledges its limitations, such as not being able to consider psychological factors because of time constraints. Finally, the study points out areas that welcome further research.

Part-of-speech Tagging for Hindi Corpus in Poor Resource Scenario

  • Modi, Deepa;Nain, Neeta;Nehra, Maninder
    • Journal of Multimedia Information System
    • /
    • v.5 no.3
    • /
    • pp.147-154
    • /
    • 2018
  • Natural language processing (NLP) is an emerging research area that studies how machines can be used to perceive and alter text written in natural languages. Different tasks can be performed on natural languages by analyzing them through annotation tasks such as parsing, chunking, part-of-speech tagging, and lexical analysis. These annotation tasks depend on the morphological structure of the particular natural language. The focus of this work is part-of-speech tagging (POS tagging) for the Hindi language. Part-of-speech tagging, also known as grammatical tagging, is the process of assigning a grammatical category to each word of a given text. These categories can be noun, verb, time, date, number, etc. Hindi is the most widely used and official language of India. It is also among the top five most spoken languages of the world. For English and other languages, a diverse range of POS taggers is available, but these POS taggers cannot be applied to Hindi, as Hindi is one of the most morphologically rich languages. Furthermore, there is a significant difference between the morphological structures of these languages. Thus, in this work, a POS tagger system is presented for the Hindi language. It uses a hybrid approach that combines probability-based and rule-based approaches. For tagging known words, a unigram probability model is used, whereas for tagging unknown words various lexical and contextual features are used. Various finite state automata are constructed to demonstrate the different rules, and regular expressions are then used to implement these rules. A tagset is also prepared for this task, which contains 29 standard part-of-speech tags. The tagset also includes two unique tags, i.e., a date tag and a time tag. These date and time tags support all possible formats. Regular expressions are used to implement all pattern-based tags such as time, date, number, and special symbols. The aim of the presented approach is to increase the correctness of automatic Hindi POS tagging while limiting the need for a large human-made corpus. The hybrid approach uses a probability-based model to increase automatic tagging and a rule-based model to limit the need for an already trained corpus. The approach is based on a very small labeled training set (around 9,000 words) and yields a best precision of 96.54% and an average precision of 95.08%. It also yields a best accuracy of 91.39% and an average accuracy of 88.15%.
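
The hybrid idea in this abstract (a unigram model for known words, regular expressions for pattern-based tags) can be sketched as follows. This is an illustrative sketch only: the tag names, the regular expressions, the `default` fallback, and the romanized toy words are assumptions, and the paper's lexical and contextual features for unknown words are replaced here by a simple default tag.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern rules; the paper's actual patterns are not given in the abstract.
PATTERN_RULES = [
    ("DATE", re.compile(r"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}")),
    ("TIME", re.compile(r"\d{1,2}:\d{2}(:\d{2})?")),
    ("NUM", re.compile(r"\d+([.,]\d+)*")),
]

def train_unigram(tagged_words):
    """Map each known word to its most frequent tag in the labeled data."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_tokens(tokens, model, default="NN"):
    """Tag known words with the unigram model, then try pattern rules, then fall back."""
    tagged = []
    for token in tokens:
        if token in model:
            tagged.append((token, model[token]))
            continue
        for name, pattern in PATTERN_RULES:
            if pattern.fullmatch(token):
                tagged.append((token, name))
                break
        else:
            # Assumption: unknown, non-pattern words receive a default tag.
            tagged.append((token, default))
    return tagged

# Toy labeled data with romanized placeholder words.
model = train_unigram([("ram", "NNP"), ("ghar", "NN"), ("gaya", "VM"), ("ghar", "NN")])
result = tag_tokens(["ram", "12/05/2018", "10:30", "42", "xyz"], model)
```

The rules are tried in order, so a more specific pattern such as a date is checked before the generic number pattern.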

A Study on Dataset Generation Method for Korean Language Information Extraction from Generative Large Language Model and Prompt Engineering (생성형 대규모 언어 모델과 프롬프트 엔지니어링을 통한 한국어 텍스트 기반 정보 추출 데이터셋 구축 방법)

  • Jeong Young Sang;Ji Seung Hyun;Kwon Da Rong Sae
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.11
    • /
    • pp.481-492
    • /
    • 2023
  • This study explores how to build a Korean dataset to extract information from text using generative large language models. In modern society, mixed information circulates rapidly, and effectively categorizing and extracting it is crucial to the decision-making process. However, there is still a lack of Korean datasets for training. To overcome this, this study attempts to extract information using text-based zero-shot learning using a generative large language model to build a purposeful Korean dataset. In this study, the language model is instructed to output the desired result through prompt engineering in the form of "system"-"instruction"-"source input"-"output format", and the dataset is built by utilizing the in-context learning characteristics of the language model through input sentences. We validate our approach by comparing the generated dataset with the existing benchmark dataset, and achieve 25.47% higher performance compared to the KLUE-RoBERTa-large model for the relation information extraction task. The results of this study are expected to contribute to AI research by showing the feasibility of extracting knowledge elements from Korean text. Furthermore, this methodology can be utilized for various fields and purposes, and has potential for building various Korean datasets.
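
The four-part prompt layout named in this abstract ("system", "instruction", "source input", "output format") could be assembled along the following lines. The section markers, the function name `build_prompt`, and the example texts are assumptions made for illustration; the abstract does not give the actual prompt wording, and no model call is shown.

```python
def build_prompt(system, instruction, source_input, output_format):
    """Join the four prompt sections in a fixed order with labeled headers."""
    sections = [
        ("system", system),
        ("instruction", instruction),
        ("source input", source_input),
        ("output format", output_format),
    ]
    return "\n\n".join(f"[{name}]\n{text.strip()}" for name, text in sections)

# Hypothetical example texts for a relation extraction request.
prompt = build_prompt(
    system="You are an information extraction assistant for Korean text.",
    instruction="Extract the relation between the two marked entities.",
    source_input="<sentence to analyze goes here>",
    output_format='Return JSON with keys "subject", "object", "relation".',
)
```

Keeping the sections in a fixed order makes it easy to vary only the source input when generating many dataset entries.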