• Title/Summary/Keyword: word dictionary


The Review about the Development of Korean Linguistic Inquiry and Word Count (언어적 특성을 이용한 '심리학적 한국어 글분석 프로그램(KLIWC)' 개발 과정에 대한 고찰)

  • Lee Chang H.;Sim Jung-Mi;Yoon Aesun
    • Korean Journal of Cognitive Science
    • /
    • v.16 no.2
    • /
    • pp.93-121
    • /
    • 2005
  • A substantial body of research has used linguistic style as a dependent measure in psychological studies. This study was conducted to develop a Korean text analysis program (KLIWC) based on the English text analysis program LIWC (Linguistic Inquiry and Word Count), reflecting Korean linguistic characteristics and language-related culture. Linguistic tagging made it possible to analyze agglutinative phrases composed of many morphemes, and a base-form dictionary and inflection rules were built. In addition, face-saving words and emotional words were included as analysis variables. The development process and the characteristics of Korean text analysis are reviewed, and future directions for improving the program are discussed.
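
As a rough illustration of dictionary-based category counting with a base-form dictionary and inflection rules, the following minimal sketch may help. It is not the KLIWC implementation: the category dictionary, the suffix rules, and the function names are all hypothetical, and English toy words stand in for Korean morphemes to keep the example short.

```python
from collections import Counter

# Hypothetical base-form dictionary: base form -> analysis categories.
CATEGORY_DICT = {
    "happy": {"emotion", "positive"},
    "sad": {"emotion", "negative"},
    "please": {"face_saving"},
}

# Hypothetical inflection rules, tried in order: (suffix, replacement).
INFLECTION_RULES = [("ily", "y"), ("ness", ""), ("ly", ""), ("s", "")]


def base_form(token):
    """Reduce a token to a dictionary base form using the suffix rules."""
    token = token.lower()
    if token in CATEGORY_DICT:
        return token
    for suffix, replacement in INFLECTION_RULES:
        if token.endswith(suffix):
            candidate = token[: -len(suffix)] + replacement
            if candidate in CATEGORY_DICT:
                return candidate
    return token  # no rule produced a known base form


def category_rates(text):
    """Return, per category, the share of tokens that fall in it."""
    tokens = text.split()
    counts = Counter()
    for tok in tokens:
        base = base_form(tok.strip(".,!?"))
        for category in CATEGORY_DICT.get(base, ()):
            counts[category] += 1
    total = len(tokens) or 1
    return {category: n / total for category, n in counts.items()}
```

Calling `category_rates("Please smile happily")` reduces "happily" to "happy" through the first rule and reports each matched category as a fraction of the three tokens.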


Corpus-Based Ontology Learning for Semantic Analysis (의미 분석을 위한 말뭉치 기반의 온톨로지 학습)

  • 강신재
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.9 no.1
    • /
    • pp.17-23
    • /
    • 2004
  • This paper proposes to determine word senses in Korean language processing by corpus-based ontology learning. Our approach is a hybrid method. First, we apply the previously-secured dictionary information to select the correct senses of some ambiguous words with high precision, and then use the ontology to disambiguate the remaining ambiguous words. The mutual information between concepts in the ontology was calculated before using the ontology as knowledge for disambiguating word senses. If mutual information is regarded as a weight between ontology concepts, the ontology can be treated as a graph with weighted edges, and then we locate the least weighted path from one concept to the other concept. In our practical machine translation system, our word sense disambiguation method achieved a 9% improvement over methods which do not use ontology for Korean translation.
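
The graph step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' system: it assumes the ontology is given as a dictionary of non-negative edge weights (how those weights are derived from mutual information is not stated in the abstract), and the toy concepts, weights, and function names are hypothetical. The least weighted path is found with Dijkstra's algorithm.

```python
import heapq


def least_weighted_path(graph, source, target):
    """Dijkstra's algorithm over {node: {neighbor: weight}}, weights >= 0.

    Returns (total_weight, path); (inf, []) if target is unreachable.
    """
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    return float("inf"), []


def pick_sense(graph, candidate_senses, context_concept):
    """Choose the candidate sense closest to the context concept."""
    return min(
        candidate_senses,
        key=lambda sense: least_weighted_path(graph, sense, context_concept)[0],
    )


# Hypothetical toy ontology with weighted edges in both directions.
ONTOLOGY = {
    "bank/finance": {"money": 0.2, "institution": 0.5},
    "bank/river": {"water": 0.3},
    "money": {"bank/finance": 0.2, "deposit": 0.4},
    "water": {"bank/river": 0.3, "river": 0.2},
    "institution": {"bank/finance": 0.5},
    "deposit": {"money": 0.4},
    "river": {"water": 0.2},
}
```

With this toy graph, `pick_sense(ONTOLOGY, ["bank/finance", "bank/river"], "deposit")` selects the financial sense, because only that sense has a path to "deposit".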


Target Word Selection for English-Korean Machine Translation System using Multiple Knowledge (다양한 지식을 사용한 영한 기계번역에서의 대역어 선택)

  • Lee, Ki-Young;Kim, Han-Woo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.11 no.5 s.43
    • /
    • pp.75-86
    • /
    • 2006
  • Target word selection is one of the most important and difficult tasks in English-Korean machine translation, and it directly affects the translation accuracy of machine translation systems. In this paper, we present a new approach to selecting a Korean target word for an English noun with translation ambiguities, using multiple knowledge sources: verb frame patterns, sense vectors based on collocations, statistical Korean local context information, and co-occurring POS information. Verb frame patterns constructed from a dictionary and a corpus play an important role in resolving the sparseness problem of collocation data. A sense vector is the set of collocation data observed when an English word with target selection ambiguity is translated to a specific Korean target word. Statistical Korean local context information is N-gram information generated from a Korean corpus. Co-occurring POS information is a statistically significant POS clue that appears with the ambiguous word. The experiment showed promising results for diverse sentences from web documents.
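
To make the idea of collocation-based sense vectors concrete, here is a minimal sketch under strong assumptions. It covers only one of the knowledge sources named in the abstract, scores candidates by a plain sum of collocate counts (the paper's actual scoring is not given here), and uses hypothetical data and names throughout.

```python
from collections import Counter

# Hypothetical sense vectors for the English noun "bank":
# Korean target word -> counts of English collocates seen with that translation.
SENSE_VECTORS = {
    "은행": Counter({"money": 5, "loan": 3, "account": 4}),
    "둑": Counter({"river": 6, "water": 2, "flood": 1}),
}


def select_target(context_words, sense_vectors):
    """Pick the target word whose sense vector best overlaps the context."""
    def score(target):
        vector = sense_vectors[target]
        # Counter returns 0 for collocates it has never seen.
        return sum(vector[word] for word in context_words)

    return max(sense_vectors, key=score)
```

For the context `["open", "account", "loan"]`, the financial translation scores 7 and the riverbank translation scores 0, so `select_target` returns "은행".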


A Study on the Inter-constructive Design Dictionary through the Internet. (인터넷을 통한 상호구축적 디자인 용어사전의 연구)

  • 김태균
    • Archives of design research
    • /
    • v.14 no.4
    • /
    • pp.25-33
    • /
    • 2001
  • With increasing access to the Internet, the number of designers who rely on it for design information is rising. A common dictionary of design terminology therefore needs to be built and shared among designers, and the Internet is a very useful medium for doing so. However, because related terminology grows rapidly through interaction among designers, providing such information unilaterally does not take full advantage of the Internet's features. Providing data on the Internet rather than through traditional books thus requires in-depth study of database structure and appropriate interface design. This study presents a design-terms database model that uses the Internet's capacity for information to be built up spontaneously through user interaction, departing from a model that conveys information unilaterally. The report summarizes and analyzes various models and suggests a classification system in accordance with users' learning cognition. Problems with existing dictionaries of design terminology are identified, and new methods addressing them are explored. In short, this report proposes a user-oriented, inter-constructive database model that emphasizes openness and interactivity by allowing text to be changed online and encouraging users to participate in building the design dictionary.


A Spelling Error Correction Model in Korean Using a Correction Dictionary and a Newspaper Corpus (교정사전과 신문기사 말뭉치를 이용한 한국어 철자 오류 교정 모델)

  • Lee, Se-Hee;Kim, Hark-Soo
    • The KIPS Transactions:PartB
    • /
    • v.16B no.5
    • /
    • pp.427-434
    • /
    • 2009
  • With the rapid evolution of the Internet and mobile environments, texts containing spelling errors such as newly coined words and abbreviated words are widely used. These spelling errors make it difficult to develop NLP (natural language processing) applications because they decrease the readability of texts. To resolve this problem, we propose a spelling error correction model using a spelling error correction dictionary and a newspaper corpus. The proposed model has the advantage that the cost of data construction is low because it uses a newspaper corpus, which is easy to obtain, as a training corpus. In addition, it does not require additional external modules such as a morphological analyzer or a word-spacing error correction system, because it uses a simple string matching method based on a correction dictionary. In experiments with a newspaper corpus and a short message corpus collected from real mobile phones, the proposed model showed good performance on various evaluation measures (a mis-correction rate of 7.3%, an F1-measure of 97.3%, and a false positive rate of 1.1%).
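
The dictionary-lookup part of such a model can be sketched as below. This is only an illustration of string matching against a correction dictionary: it matches whole tokens with a longest-match-first policy, leaves out whatever role the newspaper corpus plays in the actual model, and uses hypothetical English entries and names.

```python
# Hypothetical correction dictionary: erroneous form -> standard form.
CORRECTIONS = {
    "u": "you",
    "gr8": "great",
    "thx": "thanks",
    "c u": "see you",
}


def correct(text, corrections, max_len=3):
    """Replace dictionary entries in text, preferring the longest match.

    max_len is the longest entry length, in tokens, that is tried.
    """
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest span first so "c u" wins over "u" alone.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in corrections:
                out.append(corrections[phrase])
                i += n
                break
        else:
            out.append(tokens[i])  # no entry matched; keep the token
            i += 1
    return " ".join(out)
```

Matching on whole tokens avoids rewriting the "u" inside a word such as "but"; a character-level matcher would need its own safeguard against that kind of false positive.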

Database metadata standardization processing model using web dictionary crawling (웹 사전 크롤링을 이용한 데이터베이스 메타데이터 표준화 처리 모델)

  • Jeong, Hana;Park, Koo-Rack;Chung, Young-suk
    • Journal of Digital Convergence
    • /
    • v.19 no.9
    • /
    • pp.209-215
    • /
    • 2021
  • Data quality management is an important issue these days, and data quality can be improved by providing consistent metadata. This study presents algorithms that facilitate standard word dictionary management for consistent metadata management. The algorithms automate synonym management for database metadata through web dictionary crawling. They also improve data accuracy by resolving the homonym disambiguation issues that may arise during web dictionary crawling. The proposed algorithm increases the reliability of metadata quality compared to existing manual management, and it can reduce the time spent registering and managing synonym data. Further research on a partially automated data standardization model is needed, based on a detailed understanding of which data standardization tasks can be automated.

Expansion of Word Representation for Named Entity Recognition Based on Bidirectional LSTM CRFs (Bidirectional LSTM CRF 기반의 개체명 인식을 위한 단어 표상의 확장)

  • Yu, Hongyeon;Ko, Youngjoong
    • Journal of KIISE
    • /
    • v.44 no.3
    • /
    • pp.306-313
    • /
    • 2017
  • Named entity recognition (NER) seeks to locate and classify named entities in text into pre-defined categories such as names of persons, organizations, locations, and expressions of time. Recently, many state-of-the-art NER systems have been implemented with bidirectional LSTM CRFs. Deep learning models based on long short-term memory (LSTM) generally depend on word representations as input. In this paper, we propose an approach that expands the word representation using pre-trained word embeddings, part-of-speech (POS) tag embeddings, syllable embeddings, and named entity dictionary feature vectors. Our experiments show that the proposed approach creates useful word representations as input to bidirectional LSTM CRFs. Our final representation performs 8.05%p higher than baseline NER systems that use only the pre-trained word embedding vector.
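
One simple way to picture an expanded word representation is to join the individual feature vectors end to end. The sketch below assumes plain concatenation and a one-hot style dictionary feature; the abstract does not spell out how the vectors are combined, and the entity types, dictionary entries, and function names are hypothetical.

```python
# Hypothetical entity types and named entity dictionary.
ENTITY_TYPES = ["PER", "ORG", "LOC"]
NE_DICT = {"Seoul": {"LOC"}, "Samsung": {"ORG"}}


def dictionary_features(word):
    """One slot per entity type: 1.0 if the dictionary lists the word under it."""
    types = NE_DICT.get(word, set())
    return [1.0 if t in types else 0.0 for t in ENTITY_TYPES]


def expand_representation(word, word_emb, pos_emb, syllable_emb):
    """Concatenate the embeddings with the dictionary feature vector."""
    return (
        list(word_emb)
        + list(pos_emb)
        + list(syllable_emb)
        + dictionary_features(word)
    )
```

The resulting vector would be the per-token input to the sequence model; its length is the sum of the lengths of the four parts.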

A Study on the Transformation and Issue of the Japanese-Chinese Word 'Library' (화제한어 '도서관' 명칭의 변용과 쟁점에 관한 연구)

  • Hee-Yoon Yoon
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.57 no.1
    • /
    • pp.23-44
    • /
    • 2023
  • The word library (図書館) is a Japanese translation of the Western library or Bibliothek coined in the mid-Meiji period. The word has been adopted in China (图书馆), Taiwan (圖書館), Korea (도서관), and Vietnam (Đồ thư quán), all of which belong to the Chinese-character cultural sphere. When, then, and by whom was the term library first introduced in Japan and China? In Japan, the enlightenment thinker Fukuzawa's 『Seiyo Jijo, 1866』 is regarded as the first document to introduce the Western library, and in China, the article published in 『Qing Yi Bao, 1896』 by the reformist thinker Liang Qichao is cited as the first example. This study therefore traced and verified when and by whom the word library first appeared, focusing on modern dictionaries, books, translations, papers, and newspaper articles introduced in both countries. The results show that the theory of introduction by Fukuzawa in 1866 is wrong, because Western libraries are described in various terms in many diaries and dictionaries, including Motoki's 『An English Japanese Dictionary of the Spoken Language, 1814』. Likewise, in China, the theory of introduction by Liang Qichao in 1896 does not hold, because the term library first appeared in Ryu Jeong-dam's 『A Dictionary of Loan Words and Hybrid Words in Chinese, 1884』. In the same context, it is necessary to trace and debate the history of the first use of the term library in Korea and the name of the first library in Korea, established by the Busan Branch of the Japan Hongdo Association in 1901.

Cognitive Dictionaries Inferred from Word Associations (인지어휘 유형개념)

  • Tieszen, Helen R.
    • Korean Journal of Child Studies
    • /
    • v.5
    • /
    • pp.47-52
    • /
    • 1984
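  • A cognitive dictionary refers to the classification and analysis of response words in word association according to cognitive type. The concept of the cognitive dictionary is discussed in light of McNeill's research on language development: children's semantic development begins with sentence-form expressions of their own making and proceeds to the use of words. Moran, meanwhile, found that across the world young children's cognitive dictionaries rest mainly on the enactive (action-based) properties of words, which agrees with the theories of Piaget and Bruner on the origins of language. Moran's further cognitive dictionary concepts are reflected in word association relations, including Bruner's iconic relation (ikonic representation), the functional relation (functional representation), and the logical relation.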


Extension Sejong Electronic Dictionary Using Word Embedding (워드 임베딩을 이용한 세종 전자사전 확장)

  • Park, Da-Sol;Cha, Jeong-Won
    • 한국어정보학회:학술대회논문집
    • /
    • 2016.10a
    • /
    • pp.75-78
    • /
    • 2016
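  • This paper presents a method for extending the Sejong Electronic Dictionary using word embeddings and synonyms. For words that do not appear in the Sejong Electronic Dictionary, the system achieved 32.19% on semantic category assignment and 51.14% on extended semantic category assignment. By also assigning semantic categories to new words that had none, the proposed method was shown to help extend the semantic category vocabulary of the Sejong Electronic Dictionary.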
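
A minimal sketch of embedding-based category assignment follows. It assumes a nearest-neighbor rule with cosine similarity, which may differ from the paper's procedure, and it omits the synonym resource entirely; the toy vectors, labels, and function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_category(vector, labeled, embeddings):
    """Give a new word the category of its most similar labeled word.

    labeled: word -> semantic category for words already in the dictionary.
    embeddings: word -> embedding vector.
    """
    nearest = max(labeled, key=lambda word: cosine(vector, embeddings[word]))
    return labeled[nearest]


# Hypothetical two-dimensional embeddings and dictionary labels.
EMBEDDINGS = {"dog": [1.0, 0.1], "cat": [0.9, 0.2], "car": [0.1, 1.0]}
LABELED = {"dog": "animal", "car": "vehicle"}
```

Here `assign_category(EMBEDDINGS["cat"], LABELED, EMBEDDINGS)` returns "animal", since the vector for "cat" lies much closer to "dog" than to "car".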
