• Title/Summary/Keyword: Computer Vocabulary Learning (컴퓨터 어휘 학습)


Automatic Generation of Multiple-Choice Questions Based on Statistical Language Model (통계 언어모델 기반 객관식 빈칸 채우기 문제 생성)

  • Park, Youngki
    • Journal of The Korean Association of Information Education, v.20 no.2, pp.197-206, 2016
  • Fill-in-the-blank questions with multiple choices are widely used in classrooms to check whether students understand what is being taught. Although many algorithms have been proposed for generating this type of question, most of them focus on preparing sentences with blanks rather than on generating the multiple choices. In this paper, we propose a novel algorithm for generating multiple choices, given a sentence with a blank. Because the algorithm is based on a statistical language model, it produces relatively unbiased results and makes the level of difficulty easy to adjust. Experimental results show that our approach automatically produces multiple choices similar to those written by exam writers.
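
A minimal sketch of the distractor-ranking idea above, assuming a Laplace-smoothed bigram model; the toy corpus and candidate list are placeholders, not the authors' data:

```python
from collections import Counter

# Minimal sketch: rank candidate fillers for a blank with a
# Laplace-smoothed bigram language model (toy corpus).
corpus = "the cat sat on the mat the dog sat on the rug".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(prev, word, alpha=1.0):
    """Smoothed P(word | prev)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)

def rank_candidates(left_context, candidates):
    """Order candidate words for the blank by model probability."""
    prev = left_context.split()[-1]
    return sorted(candidates, key=lambda w: bigram_prob(prev, w), reverse=True)

# "the cat sat on the ___": the answer plus potential distractors.
print(rank_candidates("the cat sat on the", ["mat", "rug", "dog", "sat"]))
```

Distractors can then be drawn from candidates ranked just below the correct answer, which is one simple way to adjust difficulty.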

Automatic Text Summarization Based on a Selective Copy Mechanism for Addressing OOV (미등록 어휘에 대한 선택적 복사를 적용한 문서 자동요약)

  • Lee, Tae-Seok; Seon, Choong-Nyoung; Jung, Youngim; Kang, Seung-Shik
    • Smart Media Journal, v.8 no.2, pp.58-65, 2019
  • Automatic text summarization is the process of shortening a text document by either extraction or abstraction. The abstraction approach, inspired by deep learning methods that scale to large document collections, has been applied in recent work. Abstractive text summarization relies on pre-generated word embedding information. Low-frequency but salient words, such as terminology, are seldom included in dictionaries; this is the so-called out-of-vocabulary (OOV) problem. OOV words deteriorate the performance of neural encoder-decoder models. To address OOV words in abstractive text summarization, we propose a copy mechanism that facilitates copying new words from the input document into the generated summary sentences. Unlike previous studies, the proposed approach combines accurate pointing information with a selective copy mechanism based on bidirectional RNNs and bidirectional LSTMs. In addition, a neural network gate model that estimates the generation probability and a loss function that optimizes the entire abstraction model are applied. The dataset was constructed from a collection of abstracts and titles of journal articles. Experimental results demonstrate that both ROUGE-1 (based on word recall) and ROUGE-L (based on the longest common subsequence) of the proposed encoder-decoder model improved, to 47.01 and 29.55, respectively.
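
The copy mechanism described above is broadly in the pointer-generator family; a simplified single-decoder-step sketch in PyTorch is shown below. The dimensions, names, and mixing scheme are generic assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

# Simplified single decoder step of a pointer-generator-style copy
# mechanism: a learned gate p_gen mixes the vocabulary distribution with
# the attention (copy) distribution over source-token positions.
class CopyGate(nn.Module):
    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        self.vocab_proj = nn.Linear(hidden_size * 2, vocab_size)
        self.gate = nn.Linear(hidden_size * 2, 1)

    def forward(self, decoder_state, context, attn_weights, src_ids):
        # decoder_state, context: (batch, hidden); attn_weights: (batch, src_len)
        features = torch.cat([decoder_state, context], dim=-1)
        p_vocab = torch.softmax(self.vocab_proj(features), dim=-1)
        p_gen = torch.sigmoid(self.gate(features))            # (batch, 1)
        p_final = p_gen * p_vocab
        # Add copy probabilities onto the vocabulary ids of the source tokens.
        p_final = p_final.scatter_add(1, src_ids, (1 - p_gen) * attn_weights)
        return p_final

batch, hidden, vocab, src_len = 2, 8, 20, 5
model = CopyGate(hidden, vocab)
probs = model(torch.randn(batch, hidden), torch.randn(batch, hidden),
              torch.softmax(torch.randn(batch, src_len), dim=-1),
              torch.randint(0, vocab, (batch, src_len)))
print(probs.shape, probs.sum(dim=-1))  # each row sums to 1
```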

Summarization of Korean Dialogues through Dialogue Restructuring (대화문 재구조화를 통한 한국어 대화문 요약)

  • Eun Hee Kim; Myung Jin Lim; Ju Hyun Shin
    • Smart Media Journal, v.12 no.11, pp.77-85, 2023
  • After COVID-19, communication through online platforms has increased, leading to the accumulation of massive amounts of conversational text data. With the growing importance of summarizing this data to extract meaningful information, deep learning-based abstractive summarization has been actively researched. However, compared to structured texts such as news articles, conversational data often contains missing or transformed information, and its unique characteristics require consideration from multiple perspectives. In particular, vocabulary omissions and unrelated expressions in a conversation can hinder effective summarization. Therefore, in this study, we restructured the dialogues considering the characteristics of Korean conversational data, fine-tuned a pre-trained text summarization model based on KoBART, and improved summarization performance through a refining operation that removes redundant elements from the summary. We combined two restructuring methods: reordering sentences by the order of utterances, and extracting a central speaker and restructuring the conversation around that speaker. As a result, the ROUGE-1 score improved by about 4 points. This study demonstrates that our dialogue-restructuring approach, which considers the characteristics of dialogue, enhances Korean conversation summarization performance.
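
The restructuring step might look roughly like the following sketch: utterances are kept in speaking order and a "central" speaker (chosen here simply as the most frequent speaker) is marked before the text is handed to the fine-tuned KoBART summarizer. The field names and marker token are illustrative only:

```python
from collections import Counter

# Hypothetical sketch of the dialogue-restructuring step. The dialogue
# records and the "[중심]" (central-speaker) marker are placeholders.
dialogue = [
    {"speaker": "A", "text": "내일 회의 몇 시야?"},
    {"speaker": "B", "text": "오후 2시에 시작해."},
    {"speaker": "A", "text": "장소는 그대로지?"},
    {"speaker": "B", "text": "응, 3층 회의실이야."},
]

def restructure(turns):
    # Pick the most frequent speaker as the central one.
    counts = Counter(t["speaker"] for t in turns)
    central = counts.most_common(1)[0][0]
    # Keep utterance order, but mark the central speaker explicitly so the
    # summarizer can anchor the summary on them.
    lines = [f'{"[중심]" if t["speaker"] == central else ""}{t["speaker"]}: {t["text"]}'
             for t in turns]
    return "\n".join(lines)

model_input = restructure(dialogue)
print(model_input)  # this string is then fed to the fine-tuned KoBART model
```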

A Design and Implementation of Music & Image Retrieval Recommendation System based on Emotion (감성기반 음악·이미지 검색 추천 시스템 설계 및 구현)

  • Kim, Tae-Yeun; Song, Byoung-Ho; Bae, Sang-Hyun
    • Journal of the Institute of Electronics Engineers of Korea CI, v.47 no.1, pp.73-79, 2010
  • Affective computing enables computers to process human emotion through learning and adaptation, and thereby makes interaction between humans and computers more efficient. Music and images, which appeal to hearing and sight, are experienced in a short time yet leave a lasting impression, and understanding and interpreting human emotion through them is a key to successful marketing. In this paper, we design a system that matches music and images to a user's emotion keyword (irritability, gloom, calmness, joy). The proposed system is defined over these four emotional states. It uses an emotion ontology for music and images to retrieve normalized music and images, and it extracts image feature information and measures similarity to obtain the desired results. At the same time, image emotion-recognition information is classified and mapped onto a single space through paired correspondence analysis and factor analysis. Experimental results show that the proposed system achieved an 82.4% matching rate over the four emotional states.
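
A toy sketch of the keyword-filtered retrieval idea: items tagged with one of the four emotional states are filtered by the query emotion and ranked by cosine similarity between feature vectors. The data and vectors are placeholders, not the paper's ontology or analysis:

```python
import math

# Illustrative emotion-keyword retrieval over toy items. Each item is
# tagged with one of the four states and carries a feature vector.
items = [
    {"name": "song1.mp3",  "emotion": "joy",      "features": [0.9, 0.1, 0.3]},
    {"name": "image1.jpg", "emotion": "calmness", "features": [0.2, 0.8, 0.5]},
    {"name": "song2.mp3",  "emotion": "joy",      "features": [0.7, 0.2, 0.4]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(emotion, query_vec, k=5):
    # Filter by the query emotion, then rank by feature similarity.
    matches = [it for it in items if it["emotion"] == emotion]
    return sorted(matches, key=lambda it: cosine(it["features"], query_vec),
                  reverse=True)[:k]

print(retrieve("joy", [1.0, 0.0, 0.5]))
```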

Korean Part-Of-Speech Tagging by using Head-Tail Tokenization (Head-Tail 토큰화 기법을 이용한 한국어 품사 태깅)

  • Suh, Hyun-Jae; Kim, Jung-Min; Kang, Seung-Shik
    • Smart Media Journal, v.11 no.5, pp.17-25, 2022
  • Korean part-of-speech taggers decompose a compound morpheme into unit morphemes and attach part-of-speech tags. This has the disadvantage that parts of speech are over-classified in detail and complex word types are generated depending on the purpose of the tagger. When a part-of-speech tagger is used for keyword extraction in deep learning-based language processing, decomposing compound particles and verb endings is not required. In this study, the part-of-speech tagging problem is simplified by a Head-Tail tokenization technique that divides each word into only two types of tokens, a lexical morpheme part and a grammatical morpheme part, which solves the problem of excessively decomposed morphemes. Part-of-speech tagging was attempted with a statistical technique and a deep learning model on the Head-Tail tokenized corpus, and the accuracy of each model was evaluated. Tagging was implemented with the TnT tagger, a statistics-based part-of-speech tagger, and a Bi-LSTM tagger, a deep learning-based part-of-speech tagger; both were trained on the Head-Tail tokenized corpus to measure tagging accuracy. As a result, the Bi-LSTM tagger performed part-of-speech tagging with a high accuracy of 99.52%, compared to 97.00% for the TnT tagger.
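
The Head-Tail split itself can be illustrated with a toy longest-suffix match against a small list of particles and endings; a real system would use a proper lexicon, with the TnT or Bi-LSTM tagger then operating on the resulting two-token units:

```python
# Illustrative Head-Tail split: each eojeol (space-delimited word) is
# divided into a lexical "head" and a grammatical "tail" by matching the
# longest known suffix. TAILS is a toy stand-in for a real lexicon of
# particles and verb endings.
TAILS = ["은", "는", "이", "가", "을", "를", "에서", "에게", "었다", "였다"]

def head_tail(eojeol):
    for tail in sorted(TAILS, key=len, reverse=True):
        if eojeol.endswith(tail) and len(eojeol) > len(tail):
            return eojeol[: -len(tail)], tail
    return eojeol, ""  # no grammatical tail found

for token in "학생이 학교에서 책을 읽었다".split():
    head, tail = head_tail(token)
    print(f"{token} -> head: {head}, tail: {tail or '-'}")
```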

The Influence and Impact of Syntactic-Grammatical Knowledge on the Phonetic Outputs of a 'Reading Machine' (통사문법적 지식이 '독서기계'의 음성출력에 미치는 영향과 중요성)

  • Hong, Sungshim
    • The Journal of the Convergence on Culture Technology, v.6 no.4, pp.225-230, 2020
  • This paper highlights the influence and importance of syntactic-grammatical knowledge on "the reading machine" that appears in Jackendoff (1999). Because his research lacked detailed testing and implementation, this paper tests an extensive array of data using a component of Google Translate, currently freely and widely available on the internet. Although dated, Jackendoff's paper, "Why can't Computers use English?", argues that syntactic-grammatical knowledge plays a key role in the outputs of computers and computer-based reading machines. The current research implements tests of his thought-provoking examples in order to find out whether Google Translate can handle the same problems two decades or so later. As a result, it is argued that in the field of NLP, I-language in the sense of Chomsky (1986, 1995, etc.) is real, and that syntactic, grammatical, and categorial knowledge is essential to the faculty of language. This paper therefore reaffirms that, when it comes to human language, even the most advanced "machine" is still no match for the human faculty of language and its syntactic-grammatical knowledge.
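
Jackendoff's point can be made concrete with a toy example: a reading machine cannot choose the correct pronunciation of a heteronym without knowing its syntactic category. The pronunciation table below is hand-written and purely illustrative:

```python
# Toy illustration: a reading machine must know a word's syntactic
# category before it can pick the right pronunciation. The pronunciation
# strings below are hand-written placeholders.
HETERONYMS = {
    ("read", "VB"):   "/ri:d/",     # base verb: "I will read it"
    ("read", "VBD"):  "/red/",      # past tense: "I read it yesterday"
    ("record", "NN"): "/'rekord/",  # noun: stress on the first syllable
    ("record", "VB"): "/re'kord/",  # verb: stress on the second syllable
}

def pronounce(word, pos):
    """Return a pronunciation for (word, POS); fall back to the spelling."""
    return HETERONYMS.get((word.lower(), pos), word)

# Without a syntactic analysis there is no way to choose between the two:
print(pronounce("read", "VB"), "vs.", pronounce("read", "VBD"))
```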

Deep learning-based speech recognition for Korean elderly speech data including dementia patients (치매 환자를 포함한 한국 노인 음성 데이터 딥러닝 기반 음성인식)

  • Jeonghyeon Mun; Joonseo Kang; Kiwoong Kim; Jongbin Bae; Hyeonjun Lee; Changwon Lim
    • The Korean Journal of Applied Statistics, v.36 no.1, pp.33-48, 2023
  • In this paper, we consider automatic speech recognition (ASR) for Korean speech data in which elderly persons speak a random sequence of words, such as animals and vegetables, for one minute. Most of the speakers are over 60 years old, and some of them are dementia patients. The goal is to compare deep learning-based ASR models for such data and to find models with good performance. ASR is a technology by which computers recognize spoken words and convert them into written text. Recently, many deep learning models with good performance have been developed for ASR. Training data for such models mostly consist of sentences, and the speakers in the data can usually pronounce words accurately. However, in our data, most of the speakers are over the age of 60 and often pronounce words incorrectly. Moreover, the speakers say a random series of words, not sentences, for one minute. Therefore, pre-trained models based on typical training data may not be suitable for our data, and hence we train deep learning-based ASR models from scratch on our data. We also apply some data augmentation methods because of the small data size.
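
One common family of augmentation methods for small ASR datasets is SpecAugment-style masking of the input spectrogram; the sketch below is a generic illustration and not necessarily the augmentation the authors applied:

```python
import numpy as np

# SpecAugment-style augmentation sketch: mask random frequency bands and
# time spans of a (freq, time) spectrogram to make the model robust to
# missing spectral and temporal information.
rng = np.random.default_rng(0)

def spec_augment(spec, n_freq_masks=1, n_time_masks=1, max_width=8):
    spec = spec.copy()
    n_freq, n_time = spec.shape
    for _ in range(n_freq_masks):
        w = rng.integers(1, max_width + 1)
        f0 = rng.integers(0, max(1, n_freq - w))
        spec[f0:f0 + w, :] = 0.0  # mask a frequency band
    for _ in range(n_time_masks):
        w = rng.integers(1, max_width + 1)
        t0 = rng.integers(0, max(1, n_time - w))
        spec[:, t0:t0 + w] = 0.0  # mask a time span
    return spec

spectrogram = rng.random((80, 200))  # e.g., 80 mel bins x 200 frames
augmented = spec_augment(spectrogram)
print(augmented.shape, int((augmented == 0).sum()), "masked cells")
```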