• Title/Summary/Keyword: 오류 어휘 (error vocabulary)


Mapping Heterogenous Ontologies for the HLP Applications - Sejong Semantic Classes and KorLexNoun 1.5 - (인간언어공학에의 활용을 위한 이종 개념체계 간 사상 - 세종의미부류와 KorLexNoun 1.5 -)

  • Bae, Sun-Mee;Im, Kyoung-Up;Yoon, Ae-Sun
    • Korean Journal of Cognitive Science
    • /
    • v.21 no.1
    • /
    • pp.95-126
    • /
    • 2010
  • This study proposes a bottom-up, inductive manual mapping methodology for integrating two heterogeneous fine-grained ontologies that were built by top-down, deductive methodologies, namely the Sejong semantic classes (SJSC) and the upper nodes of KorLexNoun 1.5 (KLN), for HLP applications. It also discusses the various problems in the mapping process caused by the heterogeneity of the two language resources and proposes solutions. The mapping methodology uses terminal nodes of SJSC and Least Upper Bounds (LUBs) of KLN as basic mapping units. The mapping procedure is as follows: first, mapping candidate groups are decided by the lexical correlation between the synsets of KLN and the noun senses of the Sejong Noun Dictionary (SJND), which are classified according to SJSC. Secondly, the meanings of the candidate groups are precisely disambiguated using the linguistic information provided by the two ontologies, i.e. the hierarchical structures, the definitions, and the examples. Thirdly, the level of the LUB is determined by applying the appropriate predicates and definitions of SJSC to the upper, lower, and sister nodes of the candidate LUB. Fourthly, the possibility of mapping to a terminal node of SJSC is judged by comparing the hierarchical relations of the two ontologies. Finally, incorrect synsets of KLN and terminological candidate groups are excluded from the mapping. This study actively uses the various linguistic information described in each ontology to establish the mapping criteria, which is indeed the advantage of fine-grained manual mapping. Using the proposed methodology, 6,487 LUBs are mapped to 474 terminal and non-terminal nodes of SJSC, excluding multiply mapped nodes, and 88,255 nodes of KLN are mapped when all lower-level nodes of the mapped LUBs are included. The total mapping coverage is 97.91% of KLN synsets. This result can be applied in many elaborate syntactic and semantic analyses for Korean language processing.
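The candidate-grouping step above reduces to intersecting lemma overlap with SJSC class labels and then taking a least upper bound in the KLN hypernym hierarchy. The following is a minimal Python sketch of that step over toy in-memory structures; the names (`Synset`, `lub`, `candidate_groups`) and the single-parent hypernym chain are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch: propose one KLN LUB per SJSC class from lemma overlap.
from collections import defaultdict

class Synset:
    """A KLN synset: an ID, a set of lemmas, and one hypernym link."""
    def __init__(self, sid, lemmas, hypernym=None):
        self.sid, self.lemmas, self.hypernym = sid, set(lemmas), hypernym

def ancestors(synset):
    """The synset itself plus its hypernym chain, nearest first."""
    chain = []
    while synset is not None:
        chain.append(synset)
        synset = synset.hypernym
    return chain

def lub(synsets):
    """Least Upper Bound: the deepest common ancestor of the synsets."""
    common = set(ancestors(synsets[0]))
    for s in synsets[1:]:
        common &= set(ancestors(s))
    return max(common, key=lambda n: len(ancestors(n)), default=None)

def candidate_groups(kln_synsets, sjnd_senses):
    """Group KLN synsets by the SJSC class of SJND senses sharing a
    lemma, then propose one LUB per class as a mapping candidate."""
    by_class = defaultdict(list)
    for lemma, sjsc_class in sjnd_senses:   # (noun sense lemma, SJSC class)
        for s in kln_synsets:
            if lemma in s.lemmas:
                by_class[sjsc_class].append(s)
    return {c: lub(ss) for c, ss in by_class.items()}
```

The disambiguation, level-fixing, and exclusion steps described in the abstract would then filter these candidates using the definitions and predicates of the two ontologies.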


Development of a Java Compiler for Verification System of DTV Contents (DTV 콘텐츠 검증 시스템을 위한 Java 컴파일러의 개발)

  • Son, Min-Sung;Park, Jin-Ki;Lee, Yang-Sun
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2007.05a
    • /
    • pp.1487-1490
    • /
    • 2007
  • With the launch of digital satellite broadcasting, the era of full-scale data broadcasting has begun, and demand for interactive data-broadcasting content is growing rapidly. The authoring tools and verification systems needed to develop such interactive content, however, remain at a very rudimentary level, even though, given the nature of broadcasting, an error in content can escalate into a serious on-air accident. In response to these DTV content development needs, our team is developing an interactive content verification system that allows developers to build content, and broadcasters or institutions to verify it, smoothly. The verification system consists of a Java compiler, a debugger, middleware, a virtual machine, and an IDE. The Java compiler presented in this paper compiles data-broadcasting Java applications (Xlets) within the verification system and produces binary class files that can be emulated or debugged at runtime. To this end, the compiler takes *.java files as input, performs lexical and syntax analysis, and builds an AST (Abstract Syntax Tree) through SDT (syntax-directed translation). A class linker then traverses the generated AST and extends it by linking in dynamically loaded files. In the semantic analysis phase, the extended AST is checked, for example to confirm that every referenced name is used validly, and is transformed and annotated with additional information to produce an ST (Semantic Tree). In the code generation phase, the ST is translated into Bytecode according to predefined patterns.
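For readers unfamiliar with the phase ordering just described (lexing, SDT parsing, class linking, semantic analysis, pattern-based code generation), the following Python sketch strings the stages together with trivial stubs; every function body is a hypothetical placeholder, not the authors' compiler.

```python
# Schematic pipeline: each stub stands in for a full compiler phase.
def lex(source):                  # lexical analysis: source text -> tokens
    return source.split()

def parse(tokens):                # syntax analysis + SDT -> AST (stub)
    return {"node": "CompilationUnit", "children": tokens}

def link_classes(ast):            # class linker: link dynamically loaded
    return ast                    # files and extend the AST (stub)

def analyze(ast):                 # semantic analysis: check that referenced
    st = dict(ast)                # names are valid, then annotate the AST
    st["checked"] = True          # into a Semantic Tree (stub)
    return st

def generate(st):                 # pattern-based Bytecode emission (stub);
    return b"\xca\xfe\xba\xbe"    # real .class files start with this magic

def compile_xlet(java_source):
    """Run Xlet source through the five phases and return a binary
    class image for the emulator or runtime debugger."""
    tokens = lex(java_source)
    ast = link_classes(parse(tokens))
    return generate(analyze(ast))

print(compile_xlet("public class HelloXlet {}").hex())  # -> cafebabe
```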


Analysis of Korean Spontaneous Speech Characteristics for Spoken Dialogue Recognition (대화체 연속음성 인식을 위한 한국어 대화음성 특성 분석)

  • Park, Young-Hee;Chung, Min-Hwa
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.3
    • /
    • pp.330-338
    • /
    • 2002
  • Spontaneous speech is ungrammatical and shows serious phonological variations, which make it extremely difficult to recognize compared with read speech. In this paper, for conversational speech recognition, we analyze transcriptions of real conversational speech and classify the characteristics of conversational speech from the speech recognition perspective. Reflecting these features, we build a baseline system for conversational speech recognition. The characteristics fall into long durations of silence, disfluencies, and phonological variations, each grouped by similar features. To deal with them, we first update the silence model and add a filled-pause model and a garbage model; second, we add multiple phonetic transcriptions to the lexicon for the most frequent phonological variations. In our experiments, the baseline morpheme error rate (MER) is 31.65%; we obtain MER reductions of 2.08% from the silence and garbage models, 0.73% from the filled-pause model, and 0.73% from the phonological variations. Finally, we reach a 27.92% MER for conversational speech recognition, which will serve as a baseline for further study.
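The lexicon- and model-side changes described above can be pictured with a small sketch: dedicated filler entries backed by the silence, filled-pause, and garbage models, plus extra phonetic transcriptions appended to frequent words. The word and phone strings below are illustrative assumptions, not the paper's actual inventory.

```python
# Toy pronunciation lexicon: one frequent word with a canonical form and
# an observed variant, plus filler entries handled by dedicated models.
lexicon = {
    "그리고": [["k", "ɯ", "ɾ", "i", "k", "o"],    # canonical (illustrative)
               ["k", "ɯ", "ɾ", "i", "g", "u"]],   # frequent variant
}

fillers = {
    "<sil>":  "silence-model",       # updated silence model
    "<fp>":   "filled-pause-model",  # hesitations such as 음, 어
    "<garb>": "garbage-model",       # out-of-vocabulary noise
}

def expand_lexicon(lexicon, variants):
    """Append extra phonetic transcriptions for frequent variations."""
    for word, prons in variants.items():
        lexicon.setdefault(word, []).extend(prons)
    return lexicon
```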

Examine the Features of Evidence Based Instruction in Elementary Mathematics Teacher's Guidebook For Students with Math Learning Disabilities and Students with Underachievement - Only about Number and Operations (초등 수학 교사용지도서의 학습장애 학생 및 학습부진학생을 위한 증거기반교수 요인 포함수준 분석 - 수와 연산 영역을 중심으로)

  • Kim, Byeong-Ryong
    • Journal of Elementary Mathematics Education in Korea
    • /
    • v.20 no.2
    • /
    • pp.353-370
    • /
    • 2016
  • This study examined elementary mathematics teacher's guidebooks to determine the inclusion level of 11 critical features of evidence-based instruction. The inclusion level of each feature was interpreted as 'Low', 'Middle', or 'High'. The results are as follows. First, the overall inclusion level of the features in the guidebooks is 'Middle'. The inclusion level for the 1st through 4th grades was 'Middle', but for the 5th and 6th grades it was 'Low'. Second, the inclusion level of the features 'Clarity of Objective', 'Single Concepts and Skill Taught', 'Use of Manipulatives and Representation', 'Explicit Instruction', 'Provision of Examples for new concepts and skill', 'Adequate Independent Practice Opportunities', and 'Progress Monitoring' was 'Middle', while the inclusion level of 'Review of Prerequisite Mathematical Skills', 'Error correction and Corrective Feedback', and 'Instruction of Strategies' was 'Low'. The results are discussed.

Comparison of Korean Classification Models' Korean Essay Score Range Prediction Performance (한국어 학습 모델별 한국어 쓰기 답안지 점수 구간 예측 성능 비교)

  • Cho, Heeryon;Im, Hyeonyeol;Yi, Yumi;Cha, Junwoo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.3
    • /
    • pp.133-140
    • /
    • 2022
  • We investigate the performance of deep learning-based Korean language models on the task of predicting the score range of Korean essays written by foreign students. We construct a data set containing a total of 304 essays, which include essays discussing the criteria for choosing a job ('job'), the conditions of a happy life ('happ'), the relationship between money and happiness ('econ'), and the definition of success ('succ'). These essays were labeled according to four letter grades (A, B, C, and D), and a total of eleven essay score range prediction experiments were conducted (i.e., five for predicting the score range of 'job' essays, five for predicting the score range of 'happ' essays, and one for predicting the score range of mixed-topic essays). Three deep learning-based Korean language models, KoBERT, KcBERT, and KR-BERT, were fine-tuned using various training data. In addition, two traditional probabilistic machine learning classifiers, naive Bayes and logistic regression, were also evaluated. Experiment results show that the deep learning-based Korean language models performed better than the two traditional classifiers, with KR-BERT performing best at 55.83% overall average prediction accuracy. A close second was KcBERT (55.77%), followed by KoBERT (54.91%). The performances of the naive Bayes and logistic regression classifiers were 52.52% and 50.28%, respectively. Due to the scarcity of training data and the imbalance in class distribution, the overall prediction performance was not high for any of the classifiers. Moreover, the classifiers' vocabularies did not explicitly capture the error features that would help in correctly grading the Korean essays. By overcoming these two limitations, we expect the score range prediction performance to improve.
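As a rough illustration of the fine-tuning setup, the sketch below wires a 4-way (A to D) sequence classifier on top of a pretrained Korean BERT with the Hugging Face transformers API; the checkpoint ID, hyperparameters, and the two toy essays are assumptions, not the paper's exact configuration.

```python
# Minimal fine-tuning sketch for 4-way (A-D) essay score-range prediction.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

class EssayDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True,
                             max_length=512, return_tensors="pt")
        self.labels = torch.tensor(labels)          # 0..3 for grades A..D
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

model_id = "beomi/kcbert-base"                      # assumed KcBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id,
                                                           num_labels=4)

texts = ["직업을 고를 때 가장 중요한 것은 ...",     # toy essays, not the
         "행복한 삶의 조건은 ..."]                  # study's actual data
labels = [0, 2]                                      # grades A and C

trainer = Trainer(model=model,
                  args=TrainingArguments(output_dir="out",
                                         num_train_epochs=5,
                                         per_device_train_batch_size=8),
                  train_dataset=EssayDataset(texts, labels, tokenizer))
trainer.train()
```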

The narrative inquiry on Korean Language Learners' Korean proficiency and Academic adjustment in College Life (학문 목적 한국어 학습자의 한국어 능력과 학업 적응에 관한 연구)

  • Cheong Yeun Sook
    • Journal of the International Relations & Interdisciplinary Education
    • /
    • v.4 no.1
    • /
    • pp.57-83
    • /
    • 2024
  • This study investigated the impact of foreign exchange students' scores on the Test of Proficiency in Korean (TOPIK) on their academic adaptation. Seven students were recruited with Institutional Review Board (IRB) approval, and their interviews were analyzed in six stages using a comprehensive analysis procedure based on pragmatic eclecticism (Lee & Kim, 2014). As a result, the factors influencing the academic adaptation of Korean language learners for academic purposes were categorized into three dimensions: academic, daily life, and psychological-emotional. On the academic front, interviewees pointed out difficulties in adapting to specialized terminology and to studying in their majors, as well as significant challenges with Chinese characters and Sino-Korean words. From the daily-life perspective, even participants holding advanced TOPIK scores faced difficulties adapting to university life, emphasizing the necessity of practical expressions and an extensive vocabulary for proper adjustment to life in Korea. Lastly, within the psychological-emotional dimension, despite being advanced TOPIK holders, they experienced considerable stress in conversations or presentations with Koreans, and their lack of socio-cultural and everyday cultural knowledge led to linguistic errors and contributed to psychological-emotional difficulties despite their proficiency in Korean. Based on these narratives, the study concludes that promoting the academic adaptation of Korean language learners requires providing continued opportunities for Korean language learning, with efforts directed toward enhancing learners' academic proficiency in their majors, improving Korean fluency, and fostering interpersonal relationships within the academic community. The researchers also suggest implementing various extracurricular activities tailored to foreign learners.

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.71-88
    • /
    • 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character from sequential input data. N-gram models have been widely used, but they cannot model the correlation between input units efficiently since they are probabilistic models based on the frequency of each unit in the training set. Recently, as deep learning algorithms have developed, recurrent neural network (RNN) models and long short-term memory (LSTM) models have been widely used for neural language modeling (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependencies between objects that are entered into the model sequentially (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally includes a huge number of words or morphemes, the dictionary becomes very large and increases model complexity. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, with highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to cause errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that makes up Korean text. We construct language models using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was done on Old Testament texts using the deep learning package Keras on top of Theano. After pre-processing, the dataset included 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed input vectors of 20 consecutive characters, with the following 21st character as the output. In total, 1,023,411 input-output pairs were included in the dataset, divided into training, validation, and test sets in a 70:15:15 proportion. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss evaluated on the validation set, the perplexity evaluated on the test set, and the training time of each model. As a result, all optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, clearly superior to those of the stochastic gradient algorithm. The stochastic gradient algorithm also took the longest training time for both the 3- and 4-LSTM models. On average, the 4-LSTM-layer model took 69% longer to train than the 3-LSTM-layer model, while its validation loss and perplexity were not significantly improved and even became worse under specific conditions. On the other hand, when comparing the automatically generated sentences, the 4-LSTM-layer model tended to generate sentences closer to natural language than the 3-LSTM model. Although there were slight differences in the completeness of the generated sentences between the models, sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost perfect grammatically. The results of this study are expected to be widely used for Korean language processing in the fields of language processing and speech recognition, which form the basis of artificial intelligence systems.
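As a concrete picture of the model just described, here is a minimal Keras sketch of the 3-LSTM-layer variant (a 74-symbol inventory and a 20-phoneme window predicting the 21st symbol); the layer width, the Adam choice for this snippet, and the random toy batch are illustrative assumptions (the paper compares SGD, Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam, and ran Keras on Theano rather than today's TensorFlow backend).

```python
# Minimal phoneme-level LSTM language model: 20 one-hot phonemes in,
# a distribution over the next phoneme out.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB, CONTEXT = 74, 20          # symbol inventory and window size

model = keras.Sequential([
    layers.LSTM(256, return_sequences=True,
                input_shape=(CONTEXT, VOCAB)),  # one-hot phoneme window
    layers.LSTM(256, return_sequences=True),
    layers.LSTM(256),                           # 3 LSTM layers; add one more
                                                # for the 4-layer variant
    layers.Dense(VOCAB, activation="softmax"),  # next-phoneme distribution
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy")

# Toy batch: 8 random one-hot windows and next-phoneme targets.
x = np.eye(VOCAB)[np.random.randint(0, VOCAB, (8, CONTEXT))]
y = np.eye(VOCAB)[np.random.randint(0, VOCAB, 8)]
model.fit(x, y, epochs=1, verbose=0)
```

Test-set perplexity for such a model is simply the exponential of the average cross-entropy loss over held-out next-phoneme predictions.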