• Title/Summary/Keyword: Compound words

A Deterministic Method for Structural Analysis of Compound Words in Japanese

  • Han, Dongli;Ito, Takeshi;Furugori, Teiji
    • Proceedings of the Korean Society for Language and Information Conference / 2002.02a / pp.79-91 / 2002
  • Structural analysis of compound words is a necessary and important process in natural language processing. Proposed here is a corpus- and statistics-based method for the structural analysis of compound words in Japanese. We determine the structure of a compound word by using an Internet corpus and calculating the strength of word association among its constituent words. Experiments with 5-, 6-, 7-, and 8-kanji compound words show that our method works well and that its performance is better than that of other comparable studies.
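To make the idea of deriving structure from association strength concrete, here is a minimal sketch. It is not the authors' algorithm: the splitting rule (cut at the weakest adjacent link, then recurse), the function name `bracket`, and the toy scores are all assumptions chosen for illustration. In practice the scores would come from corpus statistics.

```python
from typing import Dict, List, Tuple, Union

# A tree is either a single word or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def bracket(words: List[str], assoc: Dict[Tuple[str, str], float]) -> Tree:
    """Build a binary structure by cutting at the weakest adjacent link.

    `assoc` maps an adjacent word pair to a hypothetical association
    score; pairs missing from the table count as 0.0.
    """
    if len(words) == 1:
        return words[0]
    # Strength of the link between each pair of neighbouring words.
    links = [assoc.get((words[i], words[i + 1]), 0.0)
             for i in range(len(words) - 1)]
    # Cut after the weakest link (the leftmost one when there is a tie).
    cut = links.index(min(links)) + 1
    return (bracket(words[:cut], assoc), bracket(words[cut:], assoc))


# Toy scores, invented for this example only.
scores = {
    ("natural", "language"): 5.0,
    ("language", "processing"): 3.0,
    ("processing", "system"): 1.0,
}
tree = bracket(["natural", "language", "processing", "system"], scores)
```

With these toy scores the weakest link is the last one, so the structure is left-branching: `((("natural", "language"), "processing"), "system")`.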

Segmentation of Korean Compound Nouns Using Semantic Category Analysis of Unregistered Nouns (미등록어의 의미 범주 분석을 이용한 복합명사 분해)

  • Kang Yu-Hwan;Seo Young-Hoon
    • Journal of Information Technology Applications and Management / v.11 no.4 / pp.95-102 / 2004
  • This paper proposes a method of segmenting compound nouns that include unregistered nouns into a correct combination of unit nouns, using characteristics of personal names, loanwords, and location names. A Korean personal name is generally composed of three syllables, only a relatively small number of syllables are used as family names, and the combination of the second and third syllables is somewhat restricted. Also, many personal names appear with clue words in compound nouns. Most loanwords have one or more syllables that cannot appear in Korean words, or have syllable sequences that differ from those of usual Korean words. Location names are generally used with clue words designating districts in compound nouns. Using these characteristics to analyze compound nouns not only makes segmentation more accurate but also helps natural language systems use the semantic categories of those unregistered nouns. Experimental results show that the precision of our method is approximately 98% on average. The precision of personal name and loanword recognition is about 94% and about 92%, respectively.

Effective Thematic Words Extraction from a Book using Compound Noun Phrase Synthesis Method

  • Ahn, Hee-Jeong;Kim, Kee-Won;Kim, Seung-Hoon
    • Journal of the Korea Society of Computer and Information / v.22 no.3 / pp.107-113 / 2017
  • Most online bookstores provide users with bibliographic information about books rather than concrete information such as thematic words and atmosphere. In particular, thematic words help a user to understand books and cast a wide net. In this paper, we propose an efficient method for extracting thematic words from book text by applying a compound noun and noun phrase synthesis method. Compound nouns represent the characteristics of a book in more detail than single nouns. The proposed method extracts thematic words from book text by recognizing two types of noun phrases: a single noun, and a compound noun formed by combining single nouns. The recognized single nouns, compound nouns, and noun phrases are scored with TF-IDF weights and extracted as main words. In addition, this paper suggests a method that calculates the frequencies of subject, object, and other roles separately in the TF-IDF calculation, rather than just summing the frequencies of all nouns. Experiments are carried out in the field of economic management, and the extracted thematic words are verified through a survey and a book search. Nine of the 10 experimental results used in this study indicate that the thematic words extracted by the proposed method are more effective for understanding the content. It is also confirmed that the thematic words extracted by the proposed method yield better book search results.
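The TF-IDF scoring step can be sketched as follows. This is plain textbook TF-IDF over already-extracted terms, not the role-separated variant the abstract proposes; the function name `tf_idf` and the toy term lists are assumptions for illustration. Compound nouns are simply treated as single terms.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Score each term of each document with tf * idf.

    tf  = term count / number of terms in the document
    idf = log(number of documents / number of documents with the term)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical terms extracted from two book texts.
books = [
    ["interest rate", "market", "market"],
    ["market", "inflation"],
]
weights = tf_idf(books)
```

Here "market" occurs in every document, so its idf is log(1) and its weight is zero, while the compound noun "interest rate" is specific to the first book and receives a positive weight.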

Compound Noun Decomposition by using Syllable-based Embedding and Deep Learning (음절 단위 임베딩과 딥러닝 기법을 이용한 복합명사 분해)

  • Lee, Hyun Young;Kang, Seung Shik
    • Smart Media Journal / v.8 no.2 / pp.74-79 / 2019
  • Traditional compound noun decomposition algorithms often have difficulty decomposing compound nouns into separate nouns when an unregistered unit noun is included. These traditional approaches struggle with such cases because it is impossible to register all existing unit nouns, such as proper nouns, coined words, and foreign words, in the dictionary in advance. To solve this problem, this paper defines compound noun decomposition as a tag sequence labeling problem and proposes a decomposition method that uses syllable-unit embedding and deep learning. To recognize unregistered unit nouns without constructing a unit noun dictionary, compound nouns are decomposed into unit nouns using an LSTM and a linear-chain CRF, with each syllable that constitutes a compound noun represented in a continuous vector space.
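Framing decomposition as tag sequence labeling means each syllable receives a tag, and the segmentation is read back from the tags. The sketch below shows only that encoding and decoding step. The B/I tag set, the function names, and the example word are assumptions (the abstract does not name its tag set), and the LSTM and CRF model that would predict the tags is not implemented here.

```python
from typing import List


def to_tags(units: List[str]) -> List[str]:
    """Encode a segmentation as one tag per syllable.

    "B" marks the first syllable of a unit noun, "I" any later syllable.
    """
    tags = []
    for unit in units:
        if not unit:
            continue  # skip empty strings defensively
        tags.extend(["B"] + ["I"] * (len(unit) - 1))
    return tags


def from_tags(syllables: List[str], tags: List[str]) -> List[str]:
    """Decode per-syllable tags back into unit nouns."""
    units = []
    for syllable, tag in zip(syllables, tags):
        if tag == "B" or not units:
            units.append(syllable)   # start a new unit noun
        else:
            units[-1] += syllable    # extend the current unit noun
    return units


# Hypothetical example: a compound split into two three-syllable units.
# Each precomposed Hangul syllable is one character in a Python str.
gold = ["대학생", "선교회"]
tags = to_tags(gold)
restored = from_tags(list("".join(gold)), tags)
```

A trained tagger would replace `to_tags` at prediction time: it would emit the tag sequence for an unseen compound, and `from_tags` would turn that sequence into unit nouns without consulting a dictionary.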

Modifiers and Compound Sentences Processing of a Korean-Japanese Machine Translation System (한국어-일본어 기계번역 시스템의 수식어 처리와 중문처리)

  • Joo, I.S.;Paik, M.H.;Jin, J.H.;Lim, S.T.;Lim, I.C.
    • Proceedings of the KIEE Conference / 1987.07b / pp.1046-1049 / 1987
  • This paper proposes a Korean-Japanese machine translation system that processes unregistered words, modifiers, and compound sentences. In morphological analysis, unregistered words are processed by an unregistered-word processing algorithm. Modifiers are processed by consulting noun attributes and grammar rules. The compound sentence processing algorithm determines whether a sentence that includes commas is a compound sentence. The system runs on IBM-PC/AT DOS using Prolog-1.

Automatic Recognition of Translation Phrases Enclosed with Parenthesis in Korean-English Mixed Documents (한영 혼용문에서 괄호 안 대역어구의 자동 인식)

  • Lee, Jae-Sung;Seo, Young-Hoon
    • The KIPS Transactions:PartB / v.9B no.4 / pp.445-452 / 2002
  • In Korean-English mixed documents, translated technical words are usually accompanied by their full forms or original words enclosed in parentheses. In this paper, a collective method is presented to recognize and extract these translation phrases using a base translation dictionary. To handle title words and translation words that are not registered in the dictionary, a phonetic similarity matching method, a translation partial matching method, and a compound word matching method are newly proposed. The result of each method was measured by F-measure (with alpha set to 0.4): exact matching of dictionary terms, used as the baseline, scored 23.8%; the hybrid of translation partial matching and phonetic similarity matching scored 75.9%; and the compound word matching method combined with the hybrid method scored 77.3%, which is 3.25 times the baseline score.
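The abstract reports an F-measure with alpha set to 0.4 but does not give the formula. The sketch below assumes van Rijsbergen's alpha-weighted form, F = 1 / (alpha/P + (1 - alpha)/R), which may differ from what the authors used; the function name and the sample precision and recall values are hypothetical.

```python
def f_measure(precision: float, recall: float, alpha: float = 0.4) -> float:
    """Alpha-weighted F-measure: 1 / (alpha / P + (1 - alpha) / R).

    alpha = 0.5 gives the usual harmonic mean (F1); alpha below 0.5
    puts more weight on recall. Returns 0.0 if either input is 0.
    """
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# Hypothetical precision and recall, not taken from the paper.
score = f_measure(precision=0.8, recall=0.7)

# The "3.25 times" figure follows from the two reported scores.
ratio = 77.3 / 23.8  # about 3.25
```

Under this assumed definition, an alpha of 0.4 means recall counts slightly more than precision in the combined score.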

Processing of Korean Compounds with Saisios (사이시옷이 단어 재인에 미치는 영향)

  • Bae, Sung-Bong;Yi, Kwang-Oh
    • Korean Journal of Cognitive Science / v.23 no.3 / pp.349-366 / 2012
  • Two experiments were conducted to examine the processing of Korean compounds in relation to saisios. Saisios is a letter interposed between constituents when a phonological change takes place on the onset of the first syllable of the second constituent. This saisios rule is often violated by writers, resulting in many words having two spellings: one with saisios and one without. Of the two spellings, some words are more familiar with saisios, some are usually spelled without it, and some are balanced. In Experiment 1, using the go/no-go lexical decision task, participants were asked to judge compounds with and without saisios. Saisios-dominant words (나뭇잎 > 나무잎) were responded to faster when they appeared with saisios, whereas the opposite was true for words that usually appear without saisios (북엇국 < 북어국). In Experiment 2, we presented participants with compound words that were balanced on saisios. The results showed that words without saisios were responded to faster than words with saisios. To summarize, the results of Experiments 1 and 2 were consistent with the APPLE model. Some problems related to the saisios rule are discussed in terms of the reading process.

A Study of the New Chinese Words Under the Influence of Culture Content (문화 콘텐츠 영향의 신조 중국어 고찰)

  • Meng, Xiang-Shan;Lee, Kwang-Ho
    • Journal of Korea Entertainment Industry Association / v.13 no.8 / pp.131-142 / 2019
  • This paper examines and analyzes new Chinese words that have emerged as a result of cultural content. The development of the Korean entertainment industry has created a Korean wave around the world. Through this, many Korean words, Internet vocabulary, and cultural concepts have begun to enter China. Among them are many new words that have appeared on the Chinese Internet because of cultural content. As the number of Korean fans and Korean learners increases, new words on the Internet are widely used. The new Chinese words influenced by Korean cultural content are considered an important part of new Chinese vocabulary. To recognize and understand them accurately, six categories of new Chinese words were first analyzed: figurative meaning, substitution, borrowing of foreign words, abbreviation, compounding, and derivation. This categorization also applies to the Chinese words influenced by cultural content. There are three types of new Internet words from Korean cultural content: new words in Chinese characters, new words in alphabetic letters, and extended meanings. New words were also analyzed through their acquisition of new meanings, and specific news titles and songs were taken for each category. The influence of cultural content was confirmed through the new Chinese words. These new Chinese words are expected to enrich Chinese vocabulary and to help facilitate communication. They are often used in public media and in everyday life. We should recognize the existence of these new Chinese words and have an accurate perception of them.

English-Korean Transfer Dictionary Extension Tool in English-Korean Machine Translation System (영한 기계번역 시스템의 영한 변환사전 확장 도구)

  • Kim, Sung-Dong
    • KIPS Transactions on Software and Data Engineering / v.2 no.1 / pp.35-42 / 2013
  • Developing an English-Korean machine translation system requires constructing information about the two languages, and the amount of information in the English-Korean transfer dictionary is especially critical to translation quality. Newly created words are out-of-vocabulary words, and they appear untranslated in the output sentence, which decreases translation quality. Compound nouns also make lexical and syntactic analysis complex, and they are difficult to translate accurately because the transfer dictionary lacks information about them. To improve the quality of English-Korean machine translation, we must continuously expand the English-Korean transfer dictionary by collecting out-of-vocabulary words and frequently used compound nouns. This paper proposes a method for expanding the transfer dictionary, which consists of constructing a corpus from Internet newspapers, extracting the words that are not in the existing dictionary and the frequently used compound nouns, attaching meanings to the extracted words, and integrating them into the transfer dictionary. We also develop a tool that supports the expansion of the transfer dictionary. Expanding the dictionary is critical to improving the machine translation system but requires much human effort. The developed tool can be useful for continuously expanding the transfer dictionary and is therefore expected to contribute to better translation quality.
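The extraction step of this pipeline can be sketched as follows. This is only an illustration of the general idea, not the paper's tool: the function name `find_candidates`, the regular-expression tokenizer, the `min_count` threshold, and the use of frequent adjacent word pairs as a stand-in for compound noun detection (no part-of-speech tagging) are all assumptions.

```python
import re
from collections import Counter
from typing import Iterable, Set, Tuple


def find_candidates(
    corpus: Iterable[str], dictionary: Set[str], min_count: int = 2
) -> Tuple[Set[str], Set[str]]:
    """Return (out-of-vocabulary words, frequent two-word sequences)."""
    # Lowercase each sentence and keep alphabetic tokens only.
    sentences = [re.findall(r"[a-z]+", s.lower()) for s in corpus]

    # Words that the existing dictionary does not contain.
    oov = {w for sent in sentences for w in sent if w not in dictionary}

    # Adjacent word pairs seen at least `min_count` times serve as a
    # crude proxy for frequently used compound nouns.
    pairs = Counter(p for sent in sentences for p in zip(sent, sent[1:]))
    frequent = {" ".join(p) for p, n in pairs.items() if n >= min_count}
    return oov, frequent


# Toy corpus and dictionary, invented for this example.
news = [
    "The blockchain startup raised funds.",
    "A blockchain startup hired staff.",
]
known = {"the", "a", "startup", "raised", "funds", "hired", "staff"}
oov_words, compounds = find_candidates(news, known)
```

The candidates returned here would still need meanings attached by a person before being merged into the transfer dictionary, which is the manual step the abstract says the tool is meant to support.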

A Comparative Study on New Words of Korean and Chinese According to Changes in Popular Culture Contents (대중문화 콘텐츠 변화에 따른 한중 신조어 비교 연구)

  • Meng, Xiang-Shan;Lee, Kwang-Ho
    • Journal of Korea Entertainment Industry Association / v.14 no.6 / pp.125-137 / 2020
  • The purpose of this study is to analyze new words in Korean and Chinese based on changes in popular culture. As China and Korea have communicated increasingly closely in recent years, their languages have influenced each other. Many new Korean and Chinese words have been found to share the same linguistic characteristics. New words are considered new developments of a language. They are welcomed and widely used by young people in Korea and China. Therefore, in terms of the communicative function of languages, it is worthwhile to understand new words in Korean and Chinese from the perspective of academic research. This study takes Chinese words created in 2018 as its research object. First, a morphological and semantic comparison of Chinese words created in 2018 with those created in 2017 is carried out to extract the characteristic indicators of the 2018 words, with emphasis on compound words, abbreviations, substitutions, patterns, and rhetorical expressions. Second, the morphological similarities and differences between these Chinese words and Korean words created in 2018 are analyzed. Finally, after sample classification and comparison, the characteristics of new Chinese and Korean words and the mechanism of their interaction under mutual influence are identified. According to the study, the majority of the new words are created on the basis of existing words. Thus, it is important to explore the morphology of new words as a standard language.