• Title/Summary/Keyword: Korean Language Model

Korean Broadcast News Transcription Using Morpheme-based Recognition Units

  • Kwon, Oh-Wook;Alex Waibel
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.1E
    • /
    • pp.3-11
    • /
    • 2002
  • Broadcast news transcription is one of the hardest tasks in speech recognition because broadcast speech signals vary widely in speech quality, channel, and background conditions. We developed a Korean broadcast news speech recognizer. We used a morpheme-based dictionary and language model to reduce the out-of-vocabulary (OOV) rate. We concatenated morpheme pairs of short length or high frequency in order to reduce the insertion and deletion errors caused by short morphemes. We used a lexicon with multiple pronunciations to reflect inter-morpheme pronunciation variations without severe modification of the search tree. By using the merged morphemes as recognition units, we achieved an OOV rate of 1.7% with a 64k vocabulary, comparable to European languages. We implemented a hidden Markov model-based recognizer with vocal tract length normalization and online speaker adaptation by maximum likelihood linear regression. Experimental results showed that the recognizer yielded a 21.8% morpheme error rate for anchor speech and 31.6% for mostly noisy reporter speech.
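The merging of short or high-frequency morpheme pairs into larger recognition units can be pictured with a minimal sketch. This is an illustrative reconstruction, not the authors' algorithm: the length threshold, the frequency threshold, the '+' joiner, and the toy romanized morpheme sequences are all assumptions.

```python
from collections import Counter

def merge_frequent_pairs(sentences, max_len=2, min_count=2):
    """Greedily concatenate adjacent morpheme pairs that are short or frequent,
    so the merged units can serve as recognition units (illustrative sketch)."""
    pair_counts = Counter()
    for morphemes in sentences:
        pair_counts.update(zip(morphemes, morphemes[1:]))

    merged_sentences = []
    for morphemes in sentences:
        out, i = [], 0
        while i < len(morphemes):
            if i + 1 < len(morphemes):
                pair = (morphemes[i], morphemes[i + 1])
                short = len(pair[0]) <= max_len and len(pair[1]) <= max_len
                frequent = pair_counts[pair] >= min_count
                if short or frequent:
                    out.append(pair[0] + "+" + pair[1])  # merged recognition unit
                    i += 2
                    continue
            out.append(morphemes[i])
            i += 1
        merged_sentences.append(out)
    return merged_sentences

# Toy romanized morpheme sequences (placeholder data, not the paper's corpus).
corpus = [["o", "neul", "nal", "ssi", "ga", "malg", "da"],
          ["o", "neul", "nyu", "seu", "ib", "ni", "da"]]
print(merge_frequent_pairs(corpus))
```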

The Pragmatics of Automatic Query Expansion Based on Search Results of Natural Language Queries (탐색결과에 근거한 자연어질의 자동확장 및 응용에 관한 연구 고찰)

  • 노정순
    • Journal of the Korean Society for Information Management
    • /
    • v.16 no.2
    • /
    • pp.49-80
    • /
    • 1999
  • This study analyzes research on automatic query modification, expansion, and combination based on the search results of natural language queries, and gives a conceptual framework for the factors affecting the effectiveness of relevance feedback. Operational and experimental systems based on the vector space model, the binary independence model, and the inference net model are reviewed. The effectiveness of query expansion is found to be affected by the conceptual model, the algorithms for weighting terms and documents and for selecting the query terms to be added, the number of relevant and non-relevant documents used and of terms added in relevance feedback, the query length, and the type and size of the databases, among other factors.
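As an illustration of vector-space relevance feedback of the kind reviewed above, the sketch below applies the classic Rocchio update. The alpha/beta/gamma weights and the toy term-weight vectors are assumptions for the example, not values from the study.

```python
import numpy as np

def rocchio_expand(query_vec, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio query modification in the vector space model."""
    rel_centroid = np.mean(relevant, axis=0) if len(relevant) else 0.0
    nonrel_centroid = np.mean(non_relevant, axis=0) if len(non_relevant) else 0.0
    expanded = alpha * query_vec + beta * rel_centroid - gamma * nonrel_centroid
    return np.clip(expanded, 0.0, None)  # keep term weights non-negative

# Toy term-weight vectors over a 4-term vocabulary (placeholder data).
q = np.array([1.0, 0.0, 0.0, 0.5])
rel = np.array([[0.8, 0.6, 0.0, 0.4], [0.9, 0.5, 0.1, 0.3]])
nonrel = np.array([[0.0, 0.1, 0.9, 0.0]])
print(rocchio_expand(q, rel, nonrel))
```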

Opacity and Presupposition Inheritance in Belief Contexts

  • Kim, Kyoung-Ae
    • Language and Information
    • /
    • v.3 no.2
    • /
    • pp.67-83
    • /
    • 1999
  • This paper attempts to account for the problems of intensional opacity of referring expressions and of presupposition inheritance in belief contexts from a discourse perspective. I discuss Jaszczolt's discourse model, based on DRT, for belief reports. Jaszczolt analyzes referring expressions in terms of three readings (de re, de dicto1, and de dicto2) and represents the differences between them in the DRSs via different anchoring modes: external anchoring, formal anchoring, and non-anchoring. I propose an extended model to account for presupposition inheritance in belief contexts and analyze Korean data based on this model. The differences in presupposition inheritance and in the DRS representations induced by the different complement types, ...ko (mitta) and ...kesul (mitta), are discussed.

What L2 Learners' Processing Strategy Reveals about the Modal System in Japanese: A Cue-based Analytical Perspective

  • Tamaji, Mizuho;Horie, Kaoru
    • Proceedings of the Korean Society for Language and Information Conference
    • /
    • 2007.11a
    • /
    • pp.471-480
    • /
    • 2007
  • Japanese does not exhibit the deontic-epistemic polysemy that is attested across typologically diverse languages. Hence, in Japanese linguistics, it has been debated which of the two types of modality is more prototypical. This study brings Chinese learners' acquisition data for Japanese modality to bear on this question, using the Competition Model (Bates and MacWhinney 1981). The Competition Model notion of 'cues' as a processing strategy adopted by learners reveals the continuity/discontinuity between these two modality domains.

Research on a Model of Extracting Persons' Information Based on Statistic Method and Conceptual Knowledge

  • Wei, XiangFeng;Jia, Ning;Zhang, Quan;Zang, HanFen
    • Proceedings of the Korean Society for Language and Information Conference
    • /
    • 2007.11a
    • /
    • pp.508-514
    • /
    • 2007
  • In order to extract important information about a person from text, an extraction model is proposed. The person's name is recognized with a maximum entropy statistical model and a training corpus. The sentences surrounding the person's name are then analyzed against a conceptual knowledge base. The three main elements of events (domain, situation, and background) are also extracted from these sentences to construct the structure of events about the person.
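The name-recognition step can be sketched with a logistic-regression token classifier, which is equivalent to a maximum entropy model; the feature set, the toy training sentences, and the PERSON/O labels below are assumptions for illustration and are not taken from the paper.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def token_features(tokens, i):
    """Simple contextual features for one token (illustrative feature set)."""
    return {
        "word": tokens[i],
        "is_title": tokens[i].istitle(),
        "prev": tokens[i - 1] if i > 0 else "<BOS>",
        "next": tokens[i + 1] if i + 1 < len(tokens) else "<EOS>",
    }

# Placeholder training sentences with PERSON/O tags (not the paper's corpus).
train = [
    (["Professor", "Wei", "proposed", "a", "model"], ["O", "PERSON", "O", "O", "O"]),
    (["The", "model", "was", "tested", "by", "Zhang"], ["O", "O", "O", "O", "O", "PERSON"]),
]
X, y = [], []
for tokens, tags in train:
    for i, tag in enumerate(tags):
        X.append(token_features(tokens, i))
        y.append(tag)

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000)  # multinomial logistic regression = maxent
clf.fit(vec.fit_transform(X), y)

test = ["Jia", "extracted", "events"]
feats = [token_features(test, i) for i in range(len(test))]
print(list(zip(test, clf.predict(vec.transform(feats)))))
```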

The Statistical Relationship between Linguistic Items and Corpus Size (코퍼스 빈도 정보 활용을 위한 적정 통계 모형 연구: 코퍼스 규모에 따른 타입/토큰의 함수관계 중심으로)

  • 양경숙;박병선
    • Language and Information
    • /
    • v.7 no.2
    • /
    • pp.103-115
    • /
    • 2003
  • In recent years, many organizations have been constructing their own large corpora in pursuit of corpus representativeness. However, there is no reliable guideline as to how large a corpus should be when it is compiled, especially for Korean corpora. In this study, we have devised a new statistical model based on ARIMA (Autoregressive Integrated Moving Average) for predicting the relationship between linguistic items (the number of types) and corpus size (the number of tokens), overcoming the major flaws of several previous studies on this issue. Finally, we illustrate that the ARIMA model presented is valid, accurate, and reliable. We are confident that this study can contribute to solving some inherent problems of corpus linguistics, such as corpus predictability, corpus representativeness, and linguistic comprehensiveness.
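A minimal sketch of the general idea: observe the number of distinct types as the token count grows and fit an ARIMA model to that growth series. The statsmodels library, the synthetic Zipf-distributed token stream, the slice size, and the (1, 1, 1) order are assumptions for illustration, not the data or parameters reported in the paper.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic token stream standing in for a real corpus (placeholder data).
rng = np.random.default_rng(0)
tokens = rng.zipf(1.5, size=20000).astype(str)

# Number of distinct types observed after each 1,000-token slice.
window = 1000
seen, type_counts = set(), []
for start in range(0, len(tokens), window):
    seen.update(tokens[start:start + window])
    type_counts.append(len(seen))

# Fit an ARIMA model to the type-growth series and forecast further growth.
model = ARIMA(type_counts, order=(1, 1, 1)).fit()
print(model.forecast(steps=5))  # predicted type counts for the next five slices
```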

YOLOv5 in ESL: Object Detection for Engaging Learning (ESL의 YOLOv5: 참여 학습을 위한 객체 감지)

  • John Edward Padilla;Kang-Hee Lee
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2023.07a
    • /
    • pp.45-46
    • /
    • 2023
  • In order to improve and promote immersive learning experiences for English as a Second Language (ESL) students, the deployment of a YOLOv5 model for object detection in videos is proposed. The procedure includes collecting annotated datasets, preparing the data, and then fine-tuning a model using the YOLOv5 framework. The study's major objective is to integrate the trained model into ESL instruction in order to analyze the effectiveness of applying AI in the field.
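A minimal sketch of running an already fine-tuned YOLOv5 model on video frames via torch.hub, assuming the public ultralytics/yolov5 hub repository and OpenCV are available. The weight file classroom_objects.pt and the video file esl_lesson.mp4 are hypothetical names, not assets from the study.

```python
import cv2
import torch

# Load a fine-tuned YOLOv5 model (the weight file name is hypothetical).
model = torch.hub.load("ultralytics/yolov5", "custom", path="classroom_objects.pt")

cap = cv2.VideoCapture("esl_lesson.mp4")  # hypothetical lesson video
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame[:, :, ::-1])     # BGR -> RGB before inference
    detections = results.pandas().xyxy[0]  # boxes, confidences, class names
    print(detections[["name", "confidence"]].head())
cap.release()
```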

Sign Language Translation Using Deep Convolutional Neural Networks

  • Abiyev, Rahib H.;Arslan, Murat;Idoko, John Bush
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.2
    • /
    • pp.631-653
    • /
    • 2020
  • Sign language is a natural, visually oriented and non-verbal communication channel between people that facilitates communication through facial/bodily expressions, postures and a set of gestures. It is mainly used for communication with people who are deaf or hard of hearing. In order to understand such communication quickly and accurately, the design of a successful sign language translation system is considered in this paper. The proposed system includes object detection and classification stages. Firstly, the Single Shot MultiBox Detector (SSD) architecture is utilized for hand detection; then a deep learning structure based on Inception v3 plus a Support Vector Machine (SVM), which combines feature extraction and classification stages, is proposed to translate the detected hand gestures. A sign language fingerspelling dataset is used for the design of the proposed model. The obtained results and comparative analysis demonstrate the efficiency of using the proposed hybrid structure in sign language translation.
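The feature-extraction-plus-SVM classification stage can be sketched as below, assuming torchvision's Inception v3 as a frozen feature extractor and scikit-learn's SVC as the classifier. The random hand-crop arrays and the letter labels are placeholders for an actual fingerspelling dataset, and the sketch omits the SSD detection stage.

```python
import numpy as np
import torch
from sklearn.svm import SVC
from torchvision.models import Inception_V3_Weights, inception_v3

# Frozen Inception v3 backbone used purely as a feature extractor.
weights = Inception_V3_Weights.DEFAULT
backbone = inception_v3(weights=weights)
backbone.fc = torch.nn.Identity()  # expose the 2048-d pooled features
backbone.eval()
preprocess = weights.transforms()

def extract_features(crops):
    """Map detected hand crops (H x W x 3 uint8 arrays) to 2048-d vectors."""
    batch = torch.stack([preprocess(torch.from_numpy(c).permute(2, 0, 1)) for c in crops])
    with torch.no_grad():
        return backbone(batch).numpy()

# Placeholder hand crops and fingerspelling labels (random stand-in data).
crops = [np.random.randint(0, 255, (300, 300, 3), dtype=np.uint8) for _ in range(8)]
labels = ["A", "B", "A", "C", "B", "A", "C", "B"]

svm = SVC(kernel="rbf")  # classification stage
svm.fit(extract_features(crops), labels)
print(svm.predict(extract_features(crops[:2])))
```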

Development of Block-based Code Generation and Recommendation Model Using Natural Language Processing Model (자연어 처리 모델을 활용한 블록 코드 생성 및 추천 모델 개발)

  • Jeon, In-seong;Song, Ki-Sang
    • Journal of The Korean Association of Information Education
    • /
    • v.26 no.3
    • /
    • pp.197-207
    • /
    • 2022
  • In this paper, we develop a machine learning based block code generation and recommendation model intended to reduce learners' cognitive load during coding education. The model learns, via a natural language processing model and fine-tuning, the block code a learner has assembled in a block programming environment, and then generates and recommends selectable blocks for the next step. To develop the model, the training dataset was produced by pre-processing 50 block-code programs from 'Entry', a popular block programming language website. After dividing the pre-processed blocks into training, validation, and test datasets, we developed block code generation models based on LSTM, Seq2Seq, and GPT-2. In the performance evaluation, the GPT-2 model outperformed the LSTM and Seq2Seq models on the BLEU and ROUGE scores, which measure sentence similarity. The outputs generated by the GPT-2 model show that performance was relatively similar across the BLEU and ROUGE scores, except when the number of blocks was 1 or 17.
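A minimal sketch of how a GPT-2 language model might be used to suggest the next block, assuming the Hugging Face transformers library and a whitespace-separated textual encoding of block sequences. The block token names are hypothetical, and the stock "gpt2" checkpoint stands in for the fine-tuned model described in the paper.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# A fine-tuned checkpoint would normally be loaded here; "gpt2" is a stand-in.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def recommend_next_blocks(block_sequence, num_suggestions=3):
    """Generate candidate continuations for a partial block program."""
    prompt = " ".join(block_sequence)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=8,
        do_sample=True,
        top_k=50,
        num_return_sequences=num_suggestions,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Strip the prompt tokens and keep only the newly generated continuation.
    return [tokenizer.decode(o[inputs["input_ids"].shape[1]:], skip_special_tokens=True)
            for o in outputs]

# Hypothetical textual encoding of Entry/Scratch-style blocks.
print(recommend_next_blocks(["when_run_clicked", "move_10_steps", "repeat_10"]))
```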

A Problem Based Teaching and Learning Model for Scratch Programming Education (문제 중심 학습을 적용한 스크래치 프로그래밍 교수 학습 모형)

  • Bae, HakJin;Lee, EunKyoung;Lee, YoungJun
    • The Journal of Korean Association of Computer Education
    • /
    • v.12 no.3
    • /
    • pp.11-22
    • /
    • 2009
  • Scratch, one of the educational programming languages, provides a media-rich programming environment and an easy interface to users. It supports the Korean language and is widely used in programming classes in elementary and middle schools. However, programming imposes a cognitive load on young students, because the programming process is a complex problem-solving procedure that requires logical and abstract thinking abilities. Therefore, we developed a problem based Scratch programming teaching and learning model to enhance learners' intrinsic motivation and to maximize the effect of using Scratch as an educational programming language. The developed problem based teaching and learning model takes elementary students' characteristics into account. It was implemented in fifth grade elementary school classes, and the educational effects of the model were analysed. The developed model was helpful in enhancing students' problem solving potential and logical thinking abilities.
