• Title/Summary/Keyword: Compound noun

The Incredible Shrinking Noun Phrase: Ongoing Change in Japanese Word Formation

  • Kevin Heffernan;Yusuke Imanishi
    • Asia Pacific Journal of Corpus Research / v.4 no.1 / pp.1-23 / 2023
  • The Japanese language, as a typical agglutinating language, permits large noun phrases (NPs) containing ten or more morphemes. In this paper, we argue that the nature of the NP in Japanese is changing. Our data are drawn from the Balanced Corpus of Contemporary Written Japanese. We conduct a series of apparent-time studies of ongoing changes in complex NPs. We first examine the length of compound nouns, followed by the usage of bound suffixes. We then examine ongoing changes in complex NPs that contain genitive case markers. Finally, we examine noun incorporation. All of our studies show a trend towards shorter, less complex NPs. Furthermore, our results suggest that the usage rate of material that modifies the noun inside the NP (compound nouns, bound nouns, NPs containing genitive case, noun incorporation) is decreasing over time, whereas the usage rate of modifying material outside the NP (positional phrases, relative clauses) is increasing over time. We conclude by suggesting that our results reflect a diachronic change of decreasing synthetic morphology and increasing analytic morphology. We end by pointing out the implications of this work for our understanding of syntheticity and analyticity.

A Method Of Compound Noun Phrase Indexing for Resolving Syntactic Diversity (구문 다양성 해소를 위한 복합명사구 색인 방법)

  • Cho, Min-Hee;Jeong, Do-Heon
    • The Journal of the Korea Contents Association / v.11 no.3 / pp.467-476 / 2011
  • A compound noun phrase (CNP) is an important factor in semantic information processing because the meaning of a CNP is less ambiguous than that of a single word. However, a CNP can be expressed in various forms even when it expresses the same meaning. This is called syntactic diversity, and it makes it difficult for an information system to recognize that the senses are identical. To resolve syntactic diversity, we propose an indexing method for compound noun phrases. The main purpose is to produce an identical index term for the various forms of CNPs that have the same meaning. The research proceeds as follows. First, we build a rule template and use it to extract CNPs from a set of domestic research papers. Because a CNP generally has a unique meaning, we then propose synthesis rules for index terms and apply them to the CNPs extracted in the previous step. For an objective performance evaluation, the HANTEC 2.0 test set was used and the results were compared with a baseline model. The experiment and evaluation confirm that the proposed indexing method can improve retrieval precision and the overall performance of information retrieval.

Korean Compound Noun Decomposition using Noun Bigram Model (명사 bigram 모델을 이용한 한국어 복합명사 분해)

  • Kang, Min-Kyu;Kang, Seung-Shik
    • Annual Conference on Human and Language Technology / 2010.10a / pp.9-14 / 2010
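  • This paper presents a method for decomposing compound nouns using spacing bigrams of nouns and single-noun information. Because compound nouns may be written either with or without spaces, selecting candidates from spacing bigrams greatly reduces both decomposition time and the number of candidates, and candidates for long, many-syllable compound nouns can be assembled quickly by chaining bigrams. When there are multiple decomposition candidates, the best one is selected by computing the bigram probabilities between nouns.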
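
The selection step in this abstract (choosing among several decomposition candidates by noun-to-noun bigram probability) can be illustrated with a minimal sketch. It is not the authors' implementation: the noun list, the counts, the add-alpha smoothing, and the function names are all hypothetical, and candidates are generated here from a plain noun dictionary rather than from spacing bigrams.

```python
from math import log

# Hypothetical toy resources (not from the paper): known single nouns,
# how often each noun occurs, and how often two nouns occur as a spaced pair.
NOUNS = {"대학", "대학생", "생선", "선교회", "교회"}
UNIGRAMS = {"대학": 30, "대학생": 10, "생선": 5, "선교회": 4, "교회": 12}
BIGRAMS = {("대학생", "선교회"): 4}


def segmentations(word):
    """Yield every split of `word` into a sequence of known nouns."""
    if not word:
        yield []
        return
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in NOUNS:
            for rest in segmentations(word[i:]):
                yield [head] + rest


def score(parts, alpha=0.1):
    """Sum of add-alpha smoothed log bigram probabilities over adjacent nouns."""
    total = 0.0
    for left, right in zip(parts, parts[1:]):
        pair = BIGRAMS.get((left, right), 0)
        total += log((pair + alpha) / (UNIGRAMS.get(left, 0) + alpha * len(NOUNS)))
    return total


def decompose(word):
    """Return the best-scoring multi-noun split, or the word itself if none exists."""
    candidates = [c for c in segmentations(word) if len(c) > 1]
    return max(candidates, key=score) if candidates else [word]


print(decompose("대학생선교회"))
```

With these toy counts the two candidates are 대학생 + 선교회 and 대학 + 생선 + 교회, and the attested bigram decides between them.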

Homonym Disambiguation based on Mutual Information and Sense-Tagged Compound Noun Dictionary (상호정보량과 복합명사 의미사전에 기반한 동음이의어 중의성 해소)

  • Heo, Jeong;Seo, Hee-Cheol;Jang, Myung-Gil
    • Journal of KIISE:Software and Applications / v.33 no.12 / pp.1073-1089 / 2006
  • The goal of Natural Language Processing (NLP) is to make a computer understand a natural language and to deliver the meanings of natural language to humans. Word Sense Disambiguation (WSD) is a very important technology for achieving this goal. In this paper, we describe a technology for automatic homonym disambiguation using both Mutual Information (MI) and a Sense-Tagged Compound Noun Dictionary. Previous research using word definitions in a dictionary suffered from data sparseness because it relied on exact word matching. Our work overcomes this problem by using MI, which is an association measure between words. To reflect language features, the rate of word-pairs with MI values, sense frequency, and site of word definitions are used as weights in our system. We constructed a Sense-Tagged Compound Noun Dictionary for high-frequency compound nouns and used it to resolve homonym senses. The experimental data for testing and evaluating our system were constructed from QA (Question Answering) test data consisting of about 200 query sentences and answer paragraphs. We performed four types of experiments. When only MI was used, the experiment showed a precision of 65.06%. When we used the weighted values, we achieved a precision of 85.35%, and when we used the Sense-Tagged Compound Noun Dictionary, we achieved a precision of 88.82%.
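
To make the role of MI as an association measure concrete, here is a minimal sketch that scores each sense of a homonym by the average pointwise mutual information between the context words and the words in that sense's definition. The counts, the example senses, the averaging, and the function names are assumptions for illustration. The weighting scheme and the Sense-Tagged Compound Noun Dictionary described in the abstract are not modeled.

```python
from math import log2


def pmi(pair_count, x_count, y_count, total):
    """Pointwise mutual information log2(p(x, y) / (p(x) p(y))) from raw counts.

    Unseen pairs are scored 0.0 (no evidence) instead of minus infinity;
    this is a simplification chosen for the sketch.
    """
    if pair_count == 0:
        return 0.0
    return log2((pair_count * total) / (x_count * y_count))


def best_sense(context, senses, pair_counts, word_counts, total):
    """Pick the sense whose definition words have the highest average PMI with the context."""
    def sense_score(definition_words):
        scores = [
            pmi(pair_counts.get(frozenset((c, w)), 0), word_counts[c], word_counts[w], total)
            for c in context
            for w in definition_words
        ]
        return sum(scores) / len(scores)

    return max(senses, key=lambda name: sense_score(senses[name]))


# Hypothetical co-occurrence statistics for the English homonym "bank".
word_counts = {"deposit": 50, "money": 200, "loan": 100, "water": 300, "shore": 80}
pair_counts = {
    frozenset(("deposit", "money")): 30,
    frozenset(("deposit", "loan")): 10,
    frozenset(("deposit", "water")): 1,
}
senses = {"finance": ["money", "loan"], "river": ["water", "shore"]}

print(best_sense(["deposit"], senses, pair_counts, word_counts, total=10_000))
```

Because PMI rewards related words that merely co-occur, it does not require the context to contain the exact words of a dictionary definition, which is the sparseness problem the abstract describes.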

A Method for Compound Noun Extraction to Improve Accuracy of Keyword Analysis of Social Big Data

  • Kim, Hyeon Gyu
    • Journal of the Korea Society of Computer and Information / v.26 no.8 / pp.55-63 / 2021
  • Since social big data often includes new words or proper nouns, statistical morphological analysis methods based on the frequency of occurrence of each word have been widely used to process them properly. However, these methods do not properly recognize compound nouns, which lowers the accuracy of keyword extraction. This paper presents a method to extract compound nouns in keyword analysis of social big data. The proposed method creates a candidate group of compound nouns by combining the words obtained through the morphological analysis step, and extracts compound nouns by examining their frequency of appearance in a given review. Two algorithms are proposed according to the method of constructing the candidate group, and the performance of each algorithm is expressed and compared with formulas. The comparison result is verified through experiments on real data collected online, and the results also show that the proposed method is suitable for real-time processing.
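
The general idea of combining analyzer output into candidates and filtering them by frequency can be sketched as follows. This is only an illustration under stated assumptions: candidates are limited to adjacent word pairs, the threshold `min_count` and the function name `extract_compounds` are hypothetical, and neither of the paper's two candidate-construction algorithms is reproduced.

```python
from collections import Counter


def extract_compounds(token_lists, min_count=2, sep=""):
    """Count adjacent word pairs and keep those seen at least `min_count` times.

    `token_lists` stands in for the output of a morphological analyzer:
    one list of nouns per sentence or review.
    """
    counts = Counter()
    for tokens in token_lists:
        counts.update(zip(tokens, tokens[1:]))
    return {left + sep + right: n for (left, right), n in counts.items() if n >= min_count}


# Hypothetical analyzer output for three short reviews.
reviews = [
    ["배달", "음식", "맛"],
    ["배달", "음식", "추천"],
    ["음식", "맛"],
]

print(extract_compounds(reviews))
```

A single pass over the token lists is enough here, which is one reason a frequency-based filter of this kind can fit real-time processing.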

Segmentation of Korean Compound Nouns Using Semantic Category Analysis of Unregistered Nouns (미등록어의 의미 범주 분석을 이용한 복합명사 분해)

  • Kang Yu-Hwan;Seo Young-Hoon
    • Journal of Information Technology Applications and Management / v.11 no.4 / pp.95-102 / 2004
  • This paper proposes a method of segmenting compound nouns that include unregistered nouns into a correct combination of unit nouns, using characteristics of person names, loanwords, and location names. A Korean person name is generally composed of three syllables, only a relatively small number of syllables are used as family names, and the combination of the second and third syllables is somewhat restricted. In addition, many person names appear with clue words in compound nouns. Most loanwords have one or more syllables that cannot appear in native Korean words, or have syllable sequences that differ from those of usual Korean words. Location names are generally used with clue words designating districts in compound nouns. Using these characteristics to analyze compound nouns not only makes segmentation more accurate but also helps natural language systems use the semantic categories of those unregistered nouns. Experimental results show that the precision of our method is approximately 98% on average. The precision of person name recognition and loanword recognition is about 94% and about 92%, respectively.

Design and Implementation of the Compound Noun Segmentation Algorithm Based on Statistical Information

  • Kim, Chang-Geun;Tack, Han-Ho
    • International Journal of Fuzzy Logic and Intelligent Systems / v.4 no.3 / pp.306-310 / 2004
  • This paper proposes a reverse segmentation algorithm that uses affix information and preference pattern information of Korean compound nouns. The structure of Korean compound nouns is mostly derived from Chinese characters, and it includes some preference patterns, which are used as segmentation rules in this paper. To evaluate the accuracy of the proposed algorithm, an experiment was performed with 36,061 compound nouns. The experiment yielded 99.3% correct segmentation and showed favorable results in a comparison with other algorithms. In particular, most of the four-syllable and five-syllable compound nouns were segmented correctly.

A Study on the Similarity of Compound Nouns and Noun Phrases in Sentences (문장의 복합명사와 명사구의 유사정도에 대한 고찰)

  • 이태영
    • Proceedings of the Korean Society for Information Management Conference / 1999.08a / pp.43-46 / 1999
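  • We studied the degree of similarity between sentences and the identification of similar groups among noun phrases and compound words. Similarity was measured for noun phrases through the substitution or omission of morphemes, and for sentences through whole and partial matches between clauses. Cases with a similarity of 50% or higher were accepted as similar.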
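
As a rough illustration of a 50% similarity threshold over morpheme sequences, the sketch below compares two noun phrases given as morpheme lists. The abstract does not give the measure used, so the sketch substitutes the ratio from Python's standard `difflib.SequenceMatcher`. The morpheme lists and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def morpheme_similarity(a, b):
    """Similarity in [0, 1] between two morpheme lists (difflib ratio, a stand-in measure)."""
    return SequenceMatcher(None, a, b).ratio()


def is_similar(a, b, threshold=0.5):
    """Treat two phrases as similar when their similarity is at least `threshold`."""
    return morpheme_similarity(a, b) >= threshold


# Hypothetical morpheme sequences: the second phrase omits one morpheme of the first.
full = ["정보", "검색", "시스템"]
shortened = ["정보", "시스템"]
unrelated = ["자료", "관리"]

print(is_similar(full, shortened), is_similar(full, unrelated))
```

Omitting or replacing a single morpheme leaves most of the sequence matched, so such variants stay above the threshold while unrelated phrases do not.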

An Analysis of Korean Word Spacing Errors Made by Chinese Learners (중국인 한국어 학습자의 글쓰기에 나타난 띄어쓰기 오류 양상 및 지도 방향)

  • Wang, Yuan
    • Korean Educational Research Journal / v.40 no.1 / pp.59-79 / 2019
  • The purpose of this study is to analyze, through questionnaires and interviews, spacing errors in Chinese students' Korean writing and to propose changes to the teaching methods used for Chinese learners by analyzing the causes of the errors. An analysis of the learners' writing samples found a total of 148 spacing errors. The rate of errors made by combining separate words (77.6%) is much higher than the rate of errors made by placing a space within a compound word (22.4%). Among the error types, "noun + noun," "adnominal (form) + dependent noun," and postpositional particle errors occur most frequently. In this paper, we propose directions for teaching spacing, from both the deductive and the inductive side, for nouns and particles.

A Framework for Semantic Interpretation of Noun Compounds Using Tratz Model and Binary Features

  • Zaeri, Ahmad;Nematbakhsh, Mohammad Ali
    • ETRI Journal / v.34 no.5 / pp.743-752 / 2012
  • Semantic interpretation of the relationship between noun compound (NC) elements has been a challenging issue due to the lack of contextual information, the unbounded number of combinations, and the absence of a universally accepted categorization system. Current models require a huge corpus of data to extract contextual information, which limits their usage in many situations. In this paper, a new semantic relations interpreter for NCs based on lightweight binary features, some of which are novel, is proposed. In addition, the interpreter uses a new feature selection method. By developing these new features and techniques, the proposed method removes the need for any huge corpora. Implementing this method in a modular, plugin-based framework and training it on the largest and most current fine-grained data set shows that its accuracy is better than that of previously reported methods that utilize large corpora. This improvement in accuracy, together with superior efficiency, is achieved not only by improving the old features with such techniques as semantic scattering and sense collocation, but also by using various novel features and a maximum entropy classifier. It is also shown that the accuracy of the maximum entropy classifier is higher than that of other classifiers, such as a support vector machine, Naïve Bayes, and a decision tree.