• Title/Abstract/Keywords: Lexical Dictionary

A Structural Analysis of Dictionary Text for the Construction of Lexical Data Base

  • 최병진
    • Korean Society for Language and Information: Language and Information
    • Vol. 6, No. 2
    • pp. 33-55
    • 2002
  • This research aims to transform the definition text of an English-English-Korean Dictionary (EEKD), encoded in EST files for publishing, into a structured format for a Lexical Data Base (LDB). Building an LDB is time-consuming and expensive. To save the time and effort needed to build new lexical information, this study extracts useful linguistic information from an existing printed dictionary. The paper describes how lexical information is extracted from a printed dictionary (EEKD) and structured as a lexical resource. The extracted information is represented in XML, which can be transformed into other representations to meet different application requirements.

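
A minimal sketch of the structuring step, assuming a hypothetical tab-separated entry format: headword, part of speech, then senses separated by semicolons and written as "English definition = Korean gloss". The EEKD's real EST markup and the paper's XML schema are not described in the abstract. The element names entry, headword, pos, sense, def, and gloss are illustrative only.

```python
import xml.etree.ElementTree as ET


def entry_to_xml(line: str) -> ET.Element:
    """Convert one hypothetical tab-separated dictionary entry into an XML element."""
    headword, pos, senses = line.rstrip("\n").split("\t")
    entry = ET.Element("entry")
    ET.SubElement(entry, "headword").text = headword
    ET.SubElement(entry, "pos").text = pos
    # Assumed sense format: "English definition = Korean gloss", senses split by ";".
    for n, sense in enumerate(senses.split(";"), start=1):
        definition, gloss = (part.strip() for part in sense.split("="))
        sense_el = ET.SubElement(entry, "sense", n=str(n))
        ET.SubElement(sense_el, "def").text = definition
        ET.SubElement(sense_el, "gloss", lang="ko").text = gloss
    return entry


if __name__ == "__main__":
    line = "abandon\tverb\tto leave behind = 버리다; to give up = 포기하다"
    print(ET.tostring(entry_to_xml(line), encoding="unicode"))
```

The output is ordinary XML, so it can be re-serialized or transformed, for example with XSLT, into whatever representation a downstream application needs.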

Automatic Construction Method of Unknown Word Lexical Dictionary

  • 황명권;윤병수;정일용;김판구
    • Korea Information Processing Society: Conference Proceedings
    • Korea Information Processing Society 2008 Spring Conference
    • pp. 3-6
    • 2008
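  • This study is part of research on semantic information retrieval. It aims to automatically build a lexical dictionary for unknown words (UWs), meaning words not defined in existing dictionaries, which have been a major obstacle to semantic document retrieval. A UW is treated as any word not defined in WordNet, an existing English lexical dictionary. Words related to each UW are extracted from input web documents, and their degree of semantic relatedness is measured with probabilistic and semantic methods. The paper describes only the method for automatically constructing the UW Lexical Dictionary and includes no quantitative or objective evaluation. However, the authors report that results extracted from several documents, used to check the method's usefulness, suggest the approach is semantically meaningful and likely to be valuable.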
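
The two steps in this abstract can be sketched as follows: first flag the words missing from WordNet, then score how strongly other words relate to each one. This illustration rests on stated assumptions and is not the authors' method. A small Python set stands in for WordNet, and the documents are pre-tokenized. Pointwise mutual information (PMI) over document-level co-occurrence serves as one example of a probabilistic relatedness measure.

```python
import math
from collections import Counter


def unknown_words(docs, lexicon):
    """Return tokens that occur in the documents but are missing from the lexicon."""
    return {w for doc in docs for w in doc if w not in lexicon}


def pmi_scores(docs, target):
    """PMI (base 2) between `target` and every word sharing a document with it."""
    n = len(docs)
    doc_freq = Counter(w for doc in docs for w in set(doc))
    co_freq = Counter()
    for doc in docs:
        vocab = set(doc)
        if target in vocab:
            co_freq.update(vocab - {target})  # count each co-occurring word once per doc
    p_target = doc_freq[target] / n
    return {
        w: math.log2((c / n) / (p_target * doc_freq[w] / n))
        for w, c in co_freq.items()
    }


if __name__ == "__main__":
    # Toy data; the lexicon set is only a stand-in for WordNet.
    docs = [
        ["blockchain", "ledger", "bitcoin"],
        ["blockchain", "ledger", "network"],
        ["network", "router"],
        ["bitcoin", "price", "market"],
    ]
    lexicon = {"ledger", "network", "router", "price", "market"}
    print(sorted(unknown_words(docs, lexicon)))
    print(sorted(pmi_scores(docs, "blockchain").items(), key=lambda kv: -kv[1]))
```

On this toy data, "ledger" always appears with "blockchain" and scores highest (PMI 1.0). Words that co-occur with it no more than chance would predict score 0.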

A Computational Model for Lexical Acquisition in Korean

  • 유원희;박기남;류기곤;임희석;남기춘
    • Korean Phonetic Society: Conference Proceedings
    • Proceedings of the 2007 Joint Conference of the Korean Phonetic Society and the Korean Speech Science Society
    • pp. 135-137
    • 2007
  • This study implemented and tested a computational model of lexical processing that hybridizes the full-list model and the decomposition model, applying it to lexical acquisition in Korean, one of the early stages of human lexical processing. Through experiments and training, the model simulated how lexical items are acquired from linguistic input, and the results suggest a theoretical basis for the order in which certain grammatical categories are acquired. The model also automatically produced a full-list dictionary and a decomposition dictionary, which provide evidence for inferring the type of mental lexicon in the human brain.

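
The two routes the model combines can be illustrated with a toy lookup. The whole eojeol is first looked up in a full-list dictionary, and only if that fails is it split into a stem and an ending. This hypothetical sketch is not the study's learning model. The dictionaries are toy sets, and splits are tried only at syllable boundaries, which ignores contracted forms such as 갔다 (가 + 았 + 다).

```python
def recognize(eojeol, full_list, stems, endings):
    """Hybrid lookup: whole-form route first, then the stem + ending route."""
    if eojeol in full_list:
        return ("full-list", (eojeol,))
    # Decomposition route: try every syllable boundary, longest stem first.
    for i in range(len(eojeol) - 1, 0, -1):
        stem, ending = eojeol[:i], eojeol[i:]
        if stem in stems and ending in endings:
            return ("decomposition", (stem, ending))
    return ("unknown", ())


if __name__ == "__main__":
    full_list = {"했다"}              # toy set of whole forms stored as units
    stems = {"먹", "잡"}              # toy stem dictionary
    endings = {"었다", "는다", "고"}  # toy ending dictionary
    for word in ["했다", "먹었다", "잡는다", "읽었다"]:
        print(word, recognize(word, full_list, stems, endings))
```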

Automatic Lexical Knowledge Acquisition Using Morpheme Information and Clustering Techniques

  • 유원희;서태원;임희석
    • The Journal of Korean Association of Computer Education
    • Vol. 13, No. 1
    • pp. 65-73
    • 2010
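  • Building lexical knowledge for natural language processing by hand, in a supervised fashion, has clear limitations. To overcome them, this paper proposes an unsupervised model for automatic lexical knowledge acquisition. The model acquires lexical knowledge from an input word list through three steps: vectorization, clustering, and lexical knowledge acquisition. The paper shows how the amount of acquired lexical knowledge changes as the model's parameters vary. It also presents part of the resulting lexical knowledge dictionary to illustrate its characteristics. In the experiments, the clusters of lexical category knowledge, one type of acquired knowledge, converged to a stable number. This confirms that electronic dictionaries requiring lexical knowledge can feasibly be built automatically. A lexical dictionary was also built that reflects characteristics of Korean and includes left and right syntactic information.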

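
The vectorization and clustering steps can be sketched roughly as below, under assumptions that go beyond the abstract. The input is assumed to be already segmented into (root, following morpheme) pairs, and each root is represented only by the morphemes to its right. Roots are then grouped by single-link clustering with a cosine-similarity threshold. The abstract does not specify the paper's actual features, clustering algorithm, or parameter values.

```python
import math
from collections import Counter, defaultdict


def vectorize(pairs):
    """Map each root to a count vector of the morphemes that follow it in an eojeol."""
    vectors = defaultdict(Counter)
    for root, suffix in pairs:
        vectors[root][suffix] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.5):
    """Single-link clustering: roots with cosine >= threshold end up in one cluster."""
    roots = sorted(vectors)
    parent = {r: r for r in roots}

    def find(r):  # union-find root lookup with path halving
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for i, a in enumerate(roots):
        for b in roots[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = defaultdict(set)
    for r in roots:
        groups[find(r)].add(r)
    return sorted(groups.values(), key=min)


if __name__ == "__main__":
    pairs = [("학교", "에"), ("학교", "도"), ("학교", "의"),
             ("학생", "도"), ("학생", "의"), ("학생", "에"),
             ("먹", "었다"), ("먹", "는다"), ("먹", "고"),
             ("읽", "었다"), ("읽", "는다"), ("읽", "고")]
    print(cluster(vectorize(pairs), threshold=0.5))
```

With this toy input, the nominal roots 학교 and 학생 share the particles 에, 도, and 의 and fall into one cluster. The verbal roots 먹 and 읽 share the endings 었다, 는다, and 고 and fall into another. Here `threshold` acts as a tunable parameter: changing it changes how many clusters result.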

Extraction of Thematic Roles from Dictionary Definitions

  • McHale, Michael L.; Myaeng, Sung H.
    • Korean Society for Language and Information: Conference Proceedings
    • Language, Information and Computation: Selected Papers from the 11th Pacific Asia Conference on Language, Information and Computation, Seoul
    • pp. 137-146
    • 1996
  • Our research goal has been the development of a domain-independent natural language processing (NLP) system suitable for information retrieval. As part of that research, we have investigated ways to automatically extend the semantics of a lexicon derived from machine-readable lexical sources. This paper details the extraction of thematic roles derived from lexical patterns in a machine-readable dictionary.

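
Pattern-based extraction of this kind can be sketched with regular expressions over definition text. The patterns and role names below are hypothetical examples chosen for illustration. The abstract does not list the paper's actual lexical patterns.

```python
import re

# Hypothetical pattern -> thematic-role table (illustrative, not from the paper).
PATTERNS = [
    (re.compile(r"\bwith (?:a|an|the) (\w+)"), "instrument"),
    (re.compile(r"\bfrom (?:a|an|the) (\w+)"), "source"),
    (re.compile(r"\b(?:into|onto) (?:a|an|the) (\w+)"), "goal"),
]


def thematic_roles(definition):
    """Return (role, filler) pairs matched by the lexical patterns, in table order."""
    text = definition.lower()
    roles = []
    for pattern, role in PATTERNS:
        for match in pattern.finditer(text):
            roles.append((role, match.group(1)))
    return roles


if __name__ == "__main__":
    print(thematic_roles("to cut something with a knife"))
    print(thematic_roles("to pour liquid from a bottle into a glass"))
```

For instance, a verb definition such as "to cut something with a knife" yields an instrument role filled by "knife".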

The Influence of Lexical Factors on Verbal Eojeol Recognition: Evidence from L1 Korean Speakers and L2 Korean Learners

  • 김영주;이선진;이은하;남기춘;전현애;이선영
    • Journal of Korean Language Education
    • Vol. 29, No. 3
    • pp. 25-53
    • 2018
  • This study examined the influence of lexical factors on the recognition of verbal Eojeols. Forty-five L2 Korean learners and twenty-two native Korean speakers performed Eojeol decision tasks. Their reaction times were analyzed against lexical factors such as 'number of strokes', 'number of consonants and vowels', 'number of syllables', 'number of morphemes', 'whole Eojeol frequency', 'root frequency', 'first-syllable-sharing frequency', and 'number of dictionary meanings'. 'Whole Eojeol frequency' was the strongest predictor of Eojeol recognition reaction time for both native speakers and L2 learners, which supports the full-list model. The other lexical factors that affected reaction time in L2 learners differed according to proficiency level.

Constructing Ontology based on Korean Parts of Speech and Applying to Vehicle Services

  • 차시호;류민우
    • Journal of the Korea Society of Digital Industry and Information Management
    • Vol. 17, No. 4
    • pp. 103-108
    • 2021
  • A knowledge graph improves search results by exploiting semantic information drawn from various resources. Because of this advantage, it has recently come to be regarded as a core technology for AI-based services. However, the knowledge collected from various service domains takes the form of plain text, so it is essential to analyze that text and understand its meaning. Various lexical dictionaries have recently been proposed alongside knowledge graphs. Most of them, however, are defined in languages other than Korean and therefore cannot be used to provide Korean-language knowledge services. To address this problem, this paper proposes an ontology based on Korean parts of speech. The proposed ontology uses the nine Korean parts of speech and links word classes to one another semantically, so that words and their meanings can be interpreted. The paper also examines several scenarios for applying the proposed ontology to vehicle services.
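
The sketch below links words to the nine Korean parts of speech: noun, pronoun, numeral, particle, verb, adjective, determiner, adverb, and interjection. It then uses class-to-class relations to answer a question about a word pair. The relation names (hasPOS, modifies, attachesTo), the in-memory triple set, and the example words are hypothetical. The abstract does not give the paper's actual ontology schema.

```python
# The nine parts of speech of Korean school grammar, glossed in English.
KOREAN_POS = {
    "noun", "pronoun", "numeral", "particle", "verb",
    "adjective", "determiner", "adverb", "interjection",
}

# Hypothetical class-to-class relations between parts of speech.
CLASS_RELATIONS = {
    ("determiner", "modifies", "noun"),
    ("adverb", "modifies", "verb"),
    ("adverb", "modifies", "adjective"),
    ("particle", "attachesTo", "noun"),
    ("particle", "attachesTo", "pronoun"),
}


def add_word(graph, word, pos):
    """Record a (word, hasPOS, class) triple after validating the class name."""
    if pos not in KOREAN_POS:
        raise ValueError(f"unknown part of speech: {pos}")
    graph.add((word, "hasPOS", pos))


def can_modify(graph, modifier, head):
    """True if the classes of the two words are linked by a 'modifies' relation."""
    pos_of = {s: o for s, p, o in graph if p == "hasPOS"}
    return (pos_of.get(modifier), "modifies", pos_of.get(head)) in CLASS_RELATIONS


if __name__ == "__main__":
    graph = set()
    for word, pos in [("새", "determiner"), ("차량", "noun"),
                      ("빨리", "adverb"), ("달리다", "verb")]:
        add_word(graph, word, pos)
    print(can_modify(graph, "새", "차량"))     # determiner + noun
    print(can_modify(graph, "빨리", "달리다"))  # adverb + verb
    print(can_modify(graph, "새", "달리다"))    # determiner + verb
```

In a vehicle-service setting, the same lookup could accept 새 차량 ("new vehicle") as a well-formed determiner + noun phrase. This scenario is illustrative and is not taken from the paper.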

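A Study on the Accuracy of Lexical Definitions in Chinese-Korean Dictionaries Based on a Chinese Corpus: Focusing on Vocabulary in the D3H1 Region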

  • 곽준화
    • Journal of Chinese Studies
    • No. 65
    • pp. 23-38
    • 2020
  • A dictionary is the most important tool for any learner of Chinese to confirm the meaning and usage of words, so the accuracy of headword definitions is crucial. Using a Chinese corpus and Baidu, this study examines the accuracy and adequacy of headword definitions in a Chinese-Korean dictionary. The scope of the study is the 3,000 words in the D3H1 region. The main problems found in this region fall into three categories: problems with the lexical definition itself, missing definitions, and other problems. The D3H1 region contains 719 low-frequency words in total, and the definitions of 54 headwords are inaccurate or inappropriate. The study investigates and analyzes the problems of these 54 words in detail.

A Machine Learning Approach to Korean Language Stemming

  • Cho, Se-hyeong
    • Journal of Korean Institute of Intelligent Systems
    • Vol. 11, No. 6
    • pp. 549-557
    • 2001
  • Morphological analysis and POS tagging require a dictionary for the language at hand, so with this approach a language cannot be analyzed without a dictionary. Difficulties also arise when a significant portion of the vocabulary is new or unknown. This paper explores the possibility of learning the morphology of an agglutinative language, in particular Korean, without any prior lexical knowledge of the language. The learning is unsupervised: there is neither an instructor to guide the learner's output nor a tagged corpus. The approach has two main characteristics. First, it uses only a raw corpus, with no tags attached and no dictionary. Second, unlike many theoretically ungrounded heuristics, it is based on widely accepted statistical methods. The method is currently applied only to Korean, but because it is essentially language-neutral, it can easily be adapted to other agglutinative languages.

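
Branching entropy is one common way to find a stem boundary using only a raw, untagged corpus. After each prefix of a word, it measures the entropy of the distribution of units that follow that prefix in the corpus, and it cuts where this uncertainty peaks. The sketch below applies the technique with Hangul syllables as units. It is a substitute chosen for illustration, since the abstract does not say which statistical method the paper uses.

```python
import math
from collections import Counter, defaultdict


def build_followers(corpus):
    """Count, for every proper prefix in the corpus, which syllable follows it."""
    followers = defaultdict(Counter)
    for word in corpus:
        for i in range(1, len(word)):
            followers[word[:i]][word[i]] += 1
    return followers


def entropy(counts):
    """Shannon entropy (bits) of a frequency table; 0.0 for an empty table."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def stem(word, followers):
    """Cut after the prefix with the highest branching entropy (shortest on ties)."""
    scores = [(entropy(followers.get(word[:i], Counter())), i) for i in range(1, len(word))]
    if not scores:
        return word
    best = max(h for h, _ in scores)
    if best <= 0.0:
        return word  # no prefix branches, so there is no evidence of a boundary
    return word[:min(i for h, i in scores if h == best)]


if __name__ == "__main__":
    corpus = ["먹었다", "먹는다", "먹고", "먹어서", "잡았다", "잡는다", "잡고"]
    followers = build_followers(corpus)
    for word in ["먹었다", "잡는다", "잡아라", "가다"]:
        print(word, stem(word, followers))
```

On this toy corpus, the unseen form 잡아라 is still cut as 잡, because the prefix 잡 branches into several different syllables elsewhere in the corpus. The unseen 가다 stays unsplit because nothing in the corpus starts with 가.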