• Title/Summary/Keyword: 어휘정보 (lexical information)

Automatic Error Correction System for Erroneous SMS Strings (SMS 변형된 문자열의 자동 오류 교정 시스템)

  • Kang, Seung-Shik;Chang, Du-Seong
    • Journal of KIISE:Software and Applications, v.35 no.6, pp.386-391, 2008
  • Spoken-style word errors that violate grammatical or orthographic rules occur frequently in communication environments such as mobile phones and messengers. These unexpected errors cause problems for language processing systems in applications such as speech recognition and text-to-speech translation. In this paper, we propose and implement a system that automatically corrects ill-formed words and word-spacing errors in SMS sentences, which have been major causes of poor accuracy. We experimented with three methods of constructing the word correction dictionary and evaluated their results: (1) manual construction of error words from a vocabulary list of ill-formed communication language, (2) automatic construction of an error dictionary from a manually constructed corpus, and (3) a context-dependent method of automatically constructing the error dictionary.
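
As a rough illustration of the dictionary-based correction described in this abstract, the sketch below applies a small error-to-correction dictionary to whitespace-separated tokens. The dictionary entries and the function name `correct_tokens` are hypothetical. The paper's actual dictionaries, its word-spacing correction, and its context-dependent method are not reproduced here.

```python
# Hypothetical error dictionary: ill-formed SMS token -> standard form.
# These entries are illustrative only and do not come from the paper.
ERROR_DICT = {
    "u": "you",
    "gr8": "great",
    "thx": "thanks",
}


def correct_tokens(sentence, error_dict):
    """Replace each token found in error_dict; leave other tokens unchanged."""
    tokens = sentence.split()
    # Lookup is case-insensitive; unknown tokens pass through as written.
    return " ".join(error_dict.get(tok.lower(), tok) for tok in tokens)


if __name__ == "__main__":
    print(correct_tokens("thx u are gr8", ERROR_DICT))
```

A plain lookup like this corresponds most closely to methods (1) and (2), which differ only in how the dictionary is built. A context-dependent variant would need to condition the lookup on neighboring tokens.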

Context-sensitive Spelling Error Correction using Eojeol N-gram (어절 N-gram을 이용한 문맥의존 철자오류 교정)

  • Kim, Minho;Kwon, Hyuk-Chul;Choi, Sungki
    • Journal of KIISE, v.41 no.12, pp.1081-1089, 2014
  • Context-sensitive spelling-error correction methods are broadly classified into rule-based methods and statistical methods, and the latter are often preferred in research. Statistical methods treat context-sensitive spelling errors as a word-sense disambiguation problem: they disambiguate, according to the context, a correction vocabulary pair that consists of a correction target word and a replacement candidate word. This paper proposes integrating an eojeol (word-phrase) n-gram model into a conventional probability model that uses correction vocabulary pairs, which resulted from a previous study by this research team, in order to improve its performance. Two integrated models are presented: one interpolates the sentence probabilities calculated by each model, and the other applies the models sequentially. Both integrated models exhibit relatively high accuracy and reproducibility compared with the conventional model or a model that uses only an n-gram.
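
The interpolation idea mentioned in this abstract can be sketched as follows. The abstract does not say which form of interpolation the authors use, so simple linear interpolation with a weight `lam` is assumed here. The function names and the example probabilities are hypothetical.

```python
def interpolate(p_model_a, p_model_b, lam=0.5):
    """Linear interpolation of two probabilities: lam * P_a + (1 - lam) * P_b.

    Assumption: the abstract only says the two sentence probabilities are
    interpolated; the linear form and the weight are illustrative choices.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * p_model_a + (1.0 - lam) * p_model_b


def choose_candidate(scores, lam=0.5):
    """Pick the candidate with the highest interpolated probability.

    scores maps each candidate word to a pair
    (probability from model A, probability from model B).
    """
    return max(scores, key=lambda w: interpolate(scores[w][0], scores[w][1], lam))


if __name__ == "__main__":
    # Made-up probabilities for two replacement candidates.
    example = {"candidate_1": (0.2, 0.1), "candidate_2": (0.1, 0.5)}
    print(choose_candidate(example, lam=0.5))
```

In practice the weight would be tuned on held-out data. The sequential alternative named in the abstract would instead apply one model first and consult the second only afterwards.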

Comparison of Readability between Documents in the Community Question-Answering (질의응답 커뮤니티에서 문서 간 이독성 비교)

  • Mun, Gil-Seong
    • The Journal of the Korea Contents Association, v.20 no.10, pp.25-34, 2020
  • Community question-answering services are one of the main sources of information and knowledge on the Web. The quality of information in question and answer documents is determined by the clarity of the question and the relevance of the answers, and readability is a key factor in evaluating that quality. This study measures the quality of documents used in a community question-answering service. To this end, we compare the frequency of occurrence by vocabulary level in community documents and measure the readability index of documents by the author's institution. To measure the readability index, we used the Dale-Chall formula, which is calculated from vocabulary level and sentence length. The results show that the vocabulary used in answers is more difficult than that in questions and that sentences in answers are longer. A gap in readability between questions and answers is also found across authoring institutions. The results of this study can be used as basic data for improving online counseling services.
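
For reference, the sketch below computes the standard (new) Dale-Chall score as defined for English text, which combines the percentage of difficult words with the average sentence length. How the study adapted the word list or coefficients for Korean documents is not stated in the abstract, so this is only the textbook English form. The function name and the counts in the example are hypothetical.

```python
def dale_chall_score(num_words, num_sentences, num_difficult_words):
    """New Dale-Chall raw score for English text.

    score = 0.1579 * (percent difficult words) + 0.0496 * (words per sentence),
    plus 3.6365 when more than 5% of the words are difficult.
    "Difficult" means not on the Dale-Chall list of familiar words.
    """
    if num_words <= 0 or num_sentences <= 0:
        raise ValueError("num_words and num_sentences must be positive")
    pct_difficult = 100.0 * num_difficult_words / num_words
    avg_sentence_len = num_words / num_sentences
    score = 0.1579 * pct_difficult + 0.0496 * avg_sentence_len
    if pct_difficult > 5.0:
        score += 3.6365  # adjustment applied above the 5% threshold
    return score


if __name__ == "__main__":
    # Made-up counts: 100 words, 5 sentences, 10 difficult words.
    print(round(dale_chall_score(100, 5, 10), 4))
```

Both terms move in the direction the abstract reports: harder vocabulary and longer sentences each raise the score, so answers would score as less readable than questions.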

A Development of the Automatic Predicate-Argument Analyzer for Construction of Semantically Tagged Korean Corpus (한국어 의미 표지 부착 말뭉치 구축을 위한 자동 술어-논항 분석기 개발)

  • Cho, Jung-Hyun;Jung, Hyun-Ki;Kim, Yu-Seop
    • The KIPS Transactions:PartB, v.19B no.1, pp.43-52, 2012
  • Semantic role labeling analyzes the semantic relationships between elements in a sentence and, like word sense disambiguation, is considered one of the most important areas of semantic analysis in natural language processing. However, owing to the lack of relevant linguistic resources, research on Korean semantic role labeling has not been sufficiently developed. In this paper, we propose an automatic predicate-argument analyzer as a first step toward constructing a Korean PropBank, a resource that has been widely used in semantic role labeling. The analyzer has two main components: a semantic lexical dictionary and an automatic predicate-argument extractor. The dictionary holds the case frame information of verbs, and the extractor is a module that decides the semantic class of each argument of a specific predicate in a syntactically annotated corpus. The analyzer developed in this research will help in constructing the Korean PropBank and will ultimately play a major role in Korean semantic role labeling.

Brain activation areas associated with L1 and L2 vocabulary retrieval and language switching (모국어와 외국어 어휘 산출과 언어 switch 에 따른 뇌 활성화 영역)

  • 남기춘;이동훈;김동휘;문양호
    • Proceedings of the Korean Society for Cognitive Science Conference, 2002.05a, pp.203-207, 2002
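  • This study used fMRI to investigate the cerebral regions involved when Korean speakers produce words in their native language, Korean, and in a foreign language, English. It also examined which brain regions are involved when producing words in a single language and when retrieving words while switching frequently between the two languages. The participants were university students who had begun learning the foreign language through formal education at around age 12 and who are commonly classified as late learners. Each participant took part in all three experiments. The task was to look at a picture, retrieve the corresponding name, and say it aloud. Event-related fMRI was used in Experiments 1, 2, and 3. In Experiment 1, participants viewed pictures and produced the corresponding Korean words and loanwords. Significant activation was observed in language-related brain regions, including Wernicke's area, Broca's area, the SMA, and the SMG. In Experiment 2, pictures not used in Experiment 1 were presented, and participants retrieved the English names and loanword names of the pictures. Producing English words activated regions similar to those activated when producing native-language words. In particular, loanword production activated the regions that are commonly activated during native-language and English word production. Native-language and English word production differed in that the activated area was larger overall during foreign-language production, and in that foreign-language word production showed more activation slightly below Broca's area, whereas native-language word production showed more activation in the prefrontal region. Experiment 3 also used pictures that were not used in Experiments 1 and 2. The notable result of Experiment 3 was that, when language switching occurred, prefrontal activation was large in addition to activation of the traditional language areas. This is interpreted as indicating that producing words while switching between languages probably placed heavy demands on the information-selection process of the prefrontal cortex. Overall, it seems reasonable to conclude that producing native-language words, foreign-language words, and loanwords activates both shared language areas and language-specific areas.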

A Relationship Between Korean EFL Learners' Working Memory Capacity, English Vocabulary Size, and Listening Competence (한국인 영어 학습자의 작업 기억 용량과 영어 어휘 수준 및 듣기 능력 관계 연구)

  • Yi, Koeon;Choi, Sunhee
    • Journal of the Korea Convergence Society, v.12 no.12, pp.365-370, 2021
  • The current study investigates the relationship between working memory capacity, vocabulary size, and listening competence of Korean EFL (English as a Foreign Language) learners. Thirty English education majors from a university in Korea were recruited. The backward digit span and operation span tasks were used to measure the participants' working memory capacity, while the Listening Vocabulary Level Test (LVLT) and the Michigan English Test (MET) were employed to measure their vocabulary size and listening proficiency in English, respectively. The correlational analyses revealed that the bigger one's working memory storage was, the better the person processed incoming input. However, no statistically significant correlation was found between working memory capacity, English vocabulary size, and listening proficiency, possibly due to the small sample size and the homogeneity of the subjects.

A Study on the Identification and Classification of Relation Between Biotechnology Terms Using Semantic Parse Tree Kernel (시맨틱 구문 트리 커널을 이용한 생명공학 분야 전문용어간 관계 식별 및 분류 연구)

  • Choi, Sung-Pil;Jeong, Chang-Hoo;Chun, Hong-Woo;Cho, Hyun-Yang
    • Journal of the Korean Society for Library and Information Science, v.45 no.2, pp.251-275, 2011
  • In this paper, we propose a novel kernel, called a semantic parse tree kernel, that extends the parse tree kernel previously studied for extracting protein-protein interactions (PPIs), which has shown prominent results. One drawback of the existing parse tree kernel is that it can degrade the overall performance of PPI extraction: because its simple comparison mechanism handles only the surface forms of the constituent words, the kernel function may produce a kernel value for two sentences that is lower than their actual similarity. The new kernel computes the lexical semantic similarity as well as the syntactic similarity between the parse trees of two target sentences. To calculate lexical semantic similarity, it incorporates context-based word sense disambiguation that outputs WordNet synsets, which in turn can be generalized to more abstract concepts. In the experiments, we introduced two new parameters, tree kernel decay factors and degrees of lexical concept abstraction, which can accelerate the optimization of PPI extraction performance, in addition to the conventional SVM regularization factor. Through these multi-strategy experiments, we confirmed the pivotal role of the newly applied parameters. Additionally, the experimental results showed that the semantic parse tree kernel is superior to conventional kernels, especially in PPI classification tasks.

The Effect of the Orthographic and Phonological Priming in Korean Visual Word Recognition (한국어 시각 단어재인과정에서 음운정보와 표기정보의 역할)

  • Tae, Jini;Lee, ChangHwan;Lee, Yoonhyoung
    • Korean Journal of Cognitive Science, v.26 no.1, pp.1-26, 2015
  • The purpose of this study was to examine whether phonological information or orthographic information plays the major role in visual word recognition. To do so, we used a non-word lexical decision task (LDT) in Experiment 1 and masked priming tasks in Experiments 2 and 3. The results of Experiment 1 showed that reaction times and error rates were affected by the orthographic characteristics of the non-word stimuli: the orthographically similar non-word condition showed longer reaction times and higher error rates than the control condition. In Experiments 2 and 3, the participants performed masked priming lexical decision tasks in two SOA conditions (60 ms and 150 ms). The results of both experiments showed that priming with an orthographically identical first syllable facilitated lexical decisions on the target words, whereas neither pseudo-homophone priming nor priming with a phonologically identical first syllable did. The dual route hypothesis (Coltheart et al., 2001), which assumes that orthographic information rather than phonological information is the major source for visual word recognition, fits well with the results of the current study.

A Basic Study of Verbs List for Vocabulary Learning Based on Augmented Reality (증강현실 기반 어휘 지도에서 동사 목록에 대한 기초 연구)

  • Hwang, BoMyung;Kwon, SoonBok;Kim, SeonJong;Shin, BeomJoo
    • 재활복지, v.21 no.2, pp.233-246, 2017
  • The present study is a basic study for applying Augmented Reality (AR) to verb teaching for children with language developmental disorders, and it examines the validity of a list of verbs from the early stage of development. To confirm the validity of the verb list, the appropriateness of the verbs was evaluated by three professors holding KSLP (Korean Speech-Language Pathologist) certification who work in a university department of Speech-Language Pathology. The first motion validity test was conducted by showing motions implemented in AR to eight master's students majoring in Speech-Language Pathology, having them record the verbs that came to mind, and evaluating the conformity of their responses. The second motion validity test was conducted with 87 undergraduates majoring in Speech-Language Pathology, who viewed the motions in AR and marked on 5-point Likert scales the degree to which the motions conformed to the relevant verbs. Descriptive statistics of the results were computed using SPSS 21.0. Through this process, thirty verbs were selected as having content validity. The results suggest that when an AR-based communication system is applied, objects and backgrounds that complement insufficient movements and aid motion recognition should also be provided. In future studies, the 3D images of the AR-based communication system will be improved, and content validity will be verified with typically developing children and children with language developmental disorders.

Word Sense Disambiguation of Predicate using Semi-supervised Learning and Sejong Electronic Dictionary (세종 전자사전과 준지도식 학습 방법을 이용한 용언의 어의 중의성 해소)

  • Kang, Sangwook;Kim, Minho;Kwon, Hyuk-chul;Oh, Jyhyun
    • KIISE Transactions on Computing Practices, v.22 no.2, pp.107-112, 2016
  • The Sejong Electronic (machine-readable) Dictionary, developed by the 21st Century Sejong Plan, contains systematically organized information on Korean words. It helps to solve problems encountered in the electronic formatting of the still commonly used hard-copy dictionary. The Sejong Electronic Dictionary, however, has limitations related to sentence structure and selection-restricted nouns. This paper discusses the limitations of word-sense disambiguation (WSD) that uses the subcategorization information provided by the Sejong Electronic Dictionary and generalized selection-restricted nouns from the Korean Lexico-semantic network. An alternative method that uses semi-supervised learning, the chi-square test, and other means to make WSD decisions is presented.