• Title/Summary/Keyword: Korean nouns

Lexical Mismatches between English and Korean: with Particular Reference to Polysemous Nouns and Verbs

  • Lee, Yae-Sheik
    • Language and Information / v.4 no.1 / pp.43-65 / 2000
  • With the flourishing development of computational linguistics, research on the meanings of individual words has resumed. Polysemous words in particular have come into focus, since their multiple senses pose a real challenge to linguists and computer scientists. This paper addresses three questions about the treatment of polysemous nouns and verbs in English and Korean. First, what types of information should be represented in the individual lexical entries for polysemous words? Second, how do the corresponding polysemous lexical entries differ between the two languages? Third, what does a mental lexicon look like with regard to polysemous lexical entries? For the first and second questions, Pustejovsky's (1995) Generative Lexicon Theory (hereafter GLT) is discussed in detail, with the main focus on developing an alternative way of representing (polysemous) lexical entries. For the third question, the mapping between concepts and their lexicalizations is briefly discussed. Furthermore, a conceptual graph around the concept 'bake' is depicted in terms of Sowa (2000).

Analysis Disambiguation of Compound Nouns by Using the Semantic Information of Nouns in Korean (명사의 의미 정보를 이용한 복합명사 분석의 중의성 해소)

  • Kang, Yu-Hwan;Jang, Cheon-Young;Seo, Young-Hoon
    • Annual Conference on Human and Language Technology / 2002.10e / pp.171-175 / 2002
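  • Affix handling is an important problem in compound noun analysis: when a compound noun contains an affix, it can be analyzed into several ambiguous forms, and it can also give rise to unregistered (out-of-vocabulary) words. Affix dictionary information alone does not allow efficient analysis, so additional information is needed. This paper proposes a method that uses the semantic information of nouns to resolve the analysis ambiguity that affixes introduce into compound nouns. The noun semantic information is the semantic hierarchy information of a thesaurus, consisting of the top-level class and the four levels below it. Semantic combination information is first obtained for noun+suffix forms, and then for the unit nouns of compound nouns that contain a suffix. The semantic combination information obtained in this way is added to the dictionary, and when an affix causes an ambiguous analysis, the combination information between nouns is used to select the correct analysis candidate.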

Intelligent Wordcloud Using Text Mining (텍스트 마이닝을 이용한 지능적 워드클라우드)

  • Kim, Yeongchang;Ji, Sangsu;Park, Dongseo;Lee, Choong Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2019.05a / pp.325-326 / 2019
  • This paper proposes an intelligent word cloud that improves on the existing approach of building word clouds from noun frequencies obtained by text mining. We propose a method for visualizing word clouds that also cover other parts of speech, such as verbs, by adding newly coined words and the like to the dictionary used for noun extraction in text mining. In the experiment, the KoNLP package was used to extract noun frequencies, and 80 new words that it did not support were added manually after examining their frequency.

Proper Noun Embedding Model for the Korean Dependency Parsing

  • Nam, Gyu-Hyeon;Lee, Hyun-Young;Kang, Seung-Shik
    • Journal of Multimedia Information System / v.9 no.2 / pp.93-102 / 2022
  • Dependency parsing is the problem of deciding the syntactic relations between words in a sentence. Recent deep learning models for dependency parsing rely on word representations in a continuous vector space. However, proper nouns that rarely appear in the training corpus tend to be mislabeled, because out-of-vocabulary (OOV) words are difficult to express in a continuous vector space. To solve the OOV problem in dependency parsing, we explored proper noun embedding methods that differ in the embedding unit. Before representing words in a continuous vector space, we replace the proper nouns with a special token and train contextual features for them with a multi-layer bidirectional LSTM. Two models, one syllable-based and one morpheme-based, are proposed for proper noun embedding, and dependency parsing performance improves more with the ensemble model than with either the syllable or the morpheme embedding model alone. The experimental results showed that our ensemble model improved UAS by 1.69%p and LAS by 2.17%p over the Malt parser, which uses the same arc-eager approach.
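
The replacement step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the input is already morpheme-analyzed into (form, tag) pairs, that proper nouns carry the Sejong-style tag `NNP`, and that the special token is spelled `<PROPN>` (a hypothetical name). The embedding and LSTM training stages are not shown.

```python
from typing import List, Tuple

PROPER_NOUN_TAG = "NNP"   # assumption: Sejong-style tag for proper nouns
PLACEHOLDER = "<PROPN>"   # hypothetical spelling of the special token


def mask_proper_nouns(tagged: List[Tuple[str, str]]) -> List[str]:
    """Replace every proper-noun morpheme with one shared placeholder token.

    All rare or unseen proper nouns then map to the same vocabulary entry,
    so the downstream embedding layer never meets them as OOV items.
    """
    return [PLACEHOLDER if tag == PROPER_NOUN_TAG else form
            for form, tag in tagged]


# Toy analysis of "서울에 간다"; the tags are supplied by hand for illustration.
sentence = [("서울", "NNP"), ("에", "JKB"), ("가", "VV"), ("ㄴ다", "EF")]
masked = mask_proper_nouns(sentence)
```

Here `masked` holds the placeholder in place of "서울" and leaves the other morphemes unchanged.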

English-Korean Transfer Dictionary Extension Tool in English-Korean Machine Translation System (영한 기계번역 시스템의 영한 변환사전 확장 도구)

  • Kim, Sung-Dong
    • KIPS Transactions on Software and Data Engineering / v.2 no.1 / pp.35-42 / 2013
  • Developing an English-Korean machine translation system requires building up information about both languages, and the amount of information in the English-Korean transfer dictionary is especially critical to translation quality. Newly created words are out-of-vocabulary words and appear untranslated in the output sentence, which lowers translation quality. Compound nouns also complicate lexical and syntactic analysis, and they are difficult to translate accurately when the transfer dictionary lacks information about them. To improve the quality of English-Korean machine translation, we must continuously expand the transfer dictionary by collecting out-of-vocabulary words and frequently used compound nouns. This paper proposes a method for expanding the transfer dictionary, which consists of constructing a corpus from internet newspapers, extracting words that are not in the existing dictionary along with frequently used compound nouns, attaching meanings to the extracted words, and integrating them into the transfer dictionary. We also develop a tool that supports this expansion. Expanding the dictionary is critical to improving the machine translation system but requires much human effort. The developed tool can be useful for continuously expanding the transfer dictionary, and so it is expected to contribute to better translation quality.

Korean Compound Noun Decomposition and Semantic Tagging System using User-Word Intelligent Network (U-WIN을 이용한 한국어 복합명사 분해 및 의미태깅 시스템)

  • Lee, Yong-Hoon;Ock, Cheol-Young;Lee, Eung-Bong
    • The KIPS Transactions:PartB / v.19B no.1 / pp.63-76 / 2012
  • We propose a Korean compound noun semantic tagging system that uses statistical compound noun decomposition and semantic relation information extracted from a lexical semantic network (U-WIN) and dictionary definitions. The system consists of three phases: compound noun decomposition, semantic constraint, and semantic tagging. In compound noun decomposition, the best candidates are selected using noun location frequencies extracted from the Sejong corpus; nouns are then re-decomposed for the semantic constraint phase, and foreign nouns are restored. The semantic constraint phase finds possible semantic combinations using origin information in the dictionary and a Naive Bayes classifier, in order to decrease the computation time and increase the accuracy of semantic tagging. The semantic tagging phase calculates the semantic similarity between the decomposed nouns and decides the semantic tags. We constructed an experimental data set of 40,717 semantically tagged compound nouns of more than 3 characters from the Standard Korean Language Dictionary. In the experiments, the accuracy of compound noun decomposition is 99.26% and the accuracy of semantic tagging is 95.38%.
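
A frequency-based decomposition step of the general kind mentioned in this abstract might look like the sketch below. It is a simplified stand-in, not the system described: it scores candidate splits with plain unigram counts rather than the position-specific frequencies the abstract refers to, and the dictionary `NOUN_COUNTS` with its numbers is invented toy data.

```python
import math
from functools import lru_cache
from typing import List, Optional

# Hypothetical toy counts; a real system would derive these from a corpus.
NOUN_COUNTS = {"정보": 120, "검색": 80, "시스템": 95, "정보검색": 10}
TOTAL = sum(NOUN_COUNTS.values())


def decompose(compound: str) -> Optional[List[str]]:
    """Return the highest-scoring split into dictionary nouns, or None.

    The score of a split is the sum of log relative frequencies of its parts.
    """

    @lru_cache(maxsize=None)
    def best(start: int):
        if start == len(compound):
            return (0.0, ())          # empty remainder: neutral score
        candidates = []
        for end in range(start + 1, len(compound) + 1):
            piece = compound[start:end]
            if piece in NOUN_COUNTS:
                rest = best(end)
                if rest is not None:
                    score = math.log(NOUN_COUNTS[piece] / TOTAL) + rest[0]
                    candidates.append((score, (piece,) + rest[1]))
        return max(candidates) if candidates else None

    result = best(0)
    return list(result[1]) if result is not None else None


parts = decompose("정보검색시스템")
```

With these toy counts the three-way split into "정보", "검색", and "시스템" outscores the two-way split that keeps "정보검색" whole, because "정보검색" is rare in the toy dictionary. A compound containing a piece that is not in the dictionary yields `None`.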

An Analysis of the Terms used in the Information Boards of Geosites at Jeonbuk West Coast National Geopark (전북 서해안권 국가지질공원 지질명소 안내 표지판에 사용된 용어 분석)

  • Shin, Young-Jun;Cho, Kyu-Seong
    • Journal of the Korean earth science society / v.41 no.1 / pp.40-47 / 2020
  • The purpose of this study was to analyze the terms used in the Information Boards of Geosites at Jeonbuk West Coast National Geopark. Nouns were extracted from the terms used in the Information Boards, listed on the basis of the Standard Korean Language Dictionary, a glossary of earth and the data for the development of textbooks according to the 2015 revision of curriculum, and classified into eight types. Seventy-one nouns (10.8%) of the extracted terms were not listed in any glossary. Most of these terms were compound words formed by combining [noun]+[noun] or [noun]+[affix], which made them hard to comprehend. In addition, two hundred fifty-six nouns (46%) of the terms were identified as jargon used in specific disciplines. It is therefore strongly suggested that, when National Geopark Information Boards are created, terms containing academic jargon be explained with annotations so that general visitors and students can understand them without difficulty.

Korean Noun Extractor using Occurrence Patterns of Nouns and Post-noun Morpheme Sequences (한국어 명사 출현 특성과 후절어를 이용한 명사추출기)

  • Park, Yong-Hyun;Hwang, Jae-Won;Ko, Young-Joong
    • Journal of KIISE:Software and Applications / v.37 no.12 / pp.919-927 / 2010
  • As the performance of mobile devices has recently improved, demand for information retrieval has grown on mobile devices as well as on PCs. If a mobile device with little memory uses a traditional language analysis tool to extract nouns from Korean texts, the analysis places a heavy burden on the device. As a result, the need for language analysis tools suited to mobile devices is increasing. This paper therefore proposes a new method for noun extraction that uses post-noun morpheme sequences and noun patterns obtained from a large corpus. The proposed noun extractor needs a dictionary of only 146KB and achieves an $F_1$-measure of 0.86; its noun dictionary is only 4% of the size of that of the existing noun extractor with a POS tagger. In addition, it easily extracts nouns from unknown words because its dependence on noun dictionaries is low.
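
The idea of finding nouns through the morphemes that follow them can be illustrated with a deliberately naive sketch. It is not the proposed extractor: the list `POST_NOUN_SEQUENCES` is a tiny hand-picked assumption rather than sequences learned from a corpus, and the sketch does not use noun occurrence patterns at all, so it will over-generate on real text (for example, on words that merely end in the same syllable as a particle).

```python
from typing import List

# Assumption: a tiny hand-picked sample of post-noun sequences (particles).
POST_NOUN_SEQUENCES = ("에서", "으로", "은", "는", "이", "가", "을", "를", "에", "의")


def noun_candidates(sentence: str) -> List[str]:
    """Strip the longest matching post-noun sequence from each space-separated word.

    Words that end in none of the listed sequences are skipped entirely.
    """
    suffixes = sorted(POST_NOUN_SEQUENCES, key=len, reverse=True)
    candidates = []
    for word in sentence.split():
        for suffix in suffixes:
            # Require a non-empty stem so a bare particle is not reported.
            if word.endswith(suffix) and len(word) > len(suffix):
                candidates.append(word[:-len(suffix)])
                break
    return candidates


found = noun_candidates("학교에서 친구를 만났다")
```

In this example "학교" and "친구" are recovered, and the verb form "만났다" is skipped because it ends in none of the listed sequences. No noun dictionary is consulted, which is the property the abstract emphasizes for handling unknown words.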

An LSTM Method for Natural Pronunciation Expression of Foreign Words in Sentences (문장에 포함된 외국어의 자연스러운 발음 표현을 위한 LSTM 방법)

  • Kim, Sungdon;Jung, Jaehee
    • KIPS Transactions on Software and Data Engineering / v.8 no.4 / pp.163-170 / 2019
  • The Korean language has postpositions such as eul, reul, yi, ga, wa, and gwa, which attach to nouns and add meaning to the sentence. When foreign notations or abbreviations are included in sentences, the postposition appropriate to the pronunciation of the foreign word may not be used. Sometimes, to keep the sentence natural, two postpositions are written with one in parentheses, as in "eul(reul)", so that either is acceptable. This study finds examples of unnatural postpositions in Korean sentences that contain foreign words and proposes a method for choosing natural postpositions by learning the final consonant pronunciation of nouns. The proposed method uses a recurrent neural network model to express postpositions connected to foreign words naturally, and it is validated through training and testing. In the future, it will be useful for composing well-formed sentences in machine translation by supplying natural postpositions for English abbreviations or new foreign words included in Korean sentences.
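
For nouns already written in Hangul, the choice between paired postpositions follows from whether the last syllable ends in a final consonant, and that can be read directly from the Unicode code point. The sketch below shows only this rule-based case; it is not the recurrent neural network described in the abstract. Words in Latin script, such as English abbreviations, fall outside it because their pronunciation is not encoded in the spelling, which is the gap the abstract's method targets. The function names and the role labels are hypothetical.

```python
HANGUL_FIRST = 0xAC00   # "가", first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3    # "힣", last precomposed Hangul syllable
FINALS_PER_BLOCK = 28   # 27 final consonants plus the "no final" slot

# (form after a final consonant, form after a vowel)
PAIRS = {
    "object": ("을", "를"),   # eul / reul
    "subject": ("이", "가"),  # yi / ga
    "and": ("과", "와"),      # gwa / wa
}


def has_final_consonant(syllable: str) -> bool:
    """True if a precomposed Hangul syllable ends in a final consonant."""
    code = ord(syllable)
    if not HANGUL_FIRST <= code <= HANGUL_LAST:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    return (code - HANGUL_FIRST) % FINALS_PER_BLOCK != 0


def attach_postposition(noun: str, role: str) -> str:
    """Append the variant of the postposition that matches the noun's last syllable."""
    after_consonant, after_vowel = PAIRS[role]
    chosen = after_consonant if has_final_consonant(noun[-1]) else after_vowel
    return noun + chosen
```

For example, `attach_postposition("책", "object")` gives "책을" because "책" ends in a consonant, while `attach_postposition("사과", "object")` gives "사과를". Passing a word such as "AI" raises `ValueError`, which marks exactly the cases where a pronunciation model is needed.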