• Title/Summary/Keyword: Compound Word

An n-gram-based Indexing Method for Effective Retrieval of Hangul Texts (한글 문서의 효과적인 검색을 위한 n-gram 기반의 색인 방법)

  • 이준호;안정수;박현주;김명호
    • Journal of the Korean Society for Information Management / v.13 no.1 / pp.47-63 / 1996
  • Conventional automatic indexing methods for Hangul texts fall into two groups: one extracts index terms by removing non-indexable segments from word-phrases, and the other generates index terms from the morphemes of word-phrases. The former suffers from the word-boundary problem when documents contain many compound nouns. The latter can overcome the word-boundary problem by extracting simple nouns, but incurs substantial overhead in building the large amount of linguistic knowledge that the indexing procedure requires. In this paper, we propose a new indexing method based on n-grams. This method alleviates the problems of previous indexing methods related to word boundaries and linguistic knowledge. We also compare the effectiveness of the n-gram-based indexing method with that of the previous ones.
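
The n-gram idea can be illustrated with a minimal Python sketch. It assumes overlapping syllable bigrams (n = 2) taken from each whitespace-separated word-phrase; the paper's actual choice of n, term weighting, and retrieval model are not reproduced, and `ngram_index_terms` is a hypothetical name. It shows why n-grams sidestep the word-boundary problem: a query written with spaces still matches a compound noun written without them.

```python
from collections import Counter


def ngram_index_terms(text: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams in each whitespace-separated word-phrase.

    Word-phrases shorter than n are indexed whole so they still yield a term.
    """
    terms: Counter = Counter()
    for phrase in text.split():
        if len(phrase) < n:
            terms[phrase] += 1
            continue
        for i in range(len(phrase) - n + 1):
            terms[phrase[i:i + n]] += 1
    return terms


doc_terms = ngram_index_terms("정보검색시스템 개발")
query_terms = ngram_index_terms("검색 시스템")
# Every query bigram also occurs in the document, even though the document
# writes the compound noun without spaces.
print(set(query_terms) <= set(doc_terms))  # True
```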

Nominal Compound Analysis Using Statistical Information and WordNet (통계정보와 WordNet을 이용한 복합명사 분석)

  • Lyu, Min-Hong;Ra, Dong-Yul;Jang, Myung-Gil
    • Annual Conference on Human and Language Technology / 2000.10d / pp.33-40 / 2000
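  • We regard a structure of a compound noun as a set of modification relations among its constituent nouns. The goal of this paper is to identify the correct structure among the possible structures of a given compound noun. To this end, we use statistics-based analysis techniques, which have recently become popular. First, we developed a statistical model suited to our compound noun analysis problem. This model assigns a probability to each possible structure of the compound noun being analyzed, and the structure with the highest probability is then selected as the structure of the compound noun. Data sparseness is a persistent problem for statistics-based techniques; to address it, we use WordNet, a conceptual hierarchy.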
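
The selection step described above (score every possible structure, keep the most probable) can be sketched as follows. This is not the authors' statistical model: it assumes head-final compounds, scores a structure as the product of hypothetical modifier-head probabilities, and omits the WordNet-based treatment of data sparseness. The function names and the toy probabilities are illustrative only.

```python
from typing import Callable, FrozenSet, List, Tuple

Relation = Tuple[int, int]  # (index of modifier noun, index of head noun)


def structures(i: int, j: int) -> List[FrozenSet[Relation]]:
    """Enumerate modification structures for nouns i..j (inclusive).

    Assumes head-final compounds: a bracketed part is headed by its rightmost noun,
    so joining part [i..k] to part [k+1..j] adds the relation (k, j).
    """
    if i == j:
        return [frozenset()]
    result = []
    for k in range(i, j):
        for left in structures(i, k):
            for right in structures(k + 1, j):
                result.append(left | right | {(k, j)})
    return result


def best_structure(nouns: List[str], prob: Callable[[str, str], float]) -> FrozenSet[Relation]:
    """Return the structure whose relations have the largest product of probabilities."""
    def score(structure: FrozenSet[Relation]) -> float:
        total = 1.0
        for m, h in structure:
            total *= prob(nouns[m], nouns[h])
        return total
    return max(structures(0, len(nouns) - 1), key=score)


# Hypothetical modifier-head probabilities; unseen pairs get a small default.
toy = {("자연", "언어"): 0.8, ("언어", "처리"): 0.5, ("자연", "처리"): 0.1}
nouns = ["자연", "언어", "처리"]
best = best_structure(nouns, lambda m, h: toy.get((m, h), 0.01))
print(sorted(best))  # [(0, 1), (1, 2)], i.e. ((자연 언어) 처리)
```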

A Reverse Segmentation Algorithm of Compound Nouns Using Affix Information and Preference Pattern (접사정보 및 선호패턴을 이용한 복합명사의 역방향 분해 알고리즘)

  • Ryu, Bang;Baek, Hyun-Chul;Kim, Sang-Bok
    • Journal of Korea Multimedia Society / v.7 no.3 / pp.418-426 / 2004
  • This paper proposes a reverse segmentation algorithm for Korean compound nouns that uses affix information and preference-pattern information. Korean compound nouns are mostly derived from Chinese characters, and their structure includes preference patterns, which this paper uses as segmentation rules. To evaluate the accuracy of the proposed algorithm, an experiment was performed on 36,061 compound nouns. The algorithm segmented 99.3% of them correctly and compared favorably with other algorithms in a comparative experiment; in particular, most four- and five-syllable compound nouns were segmented successfully.
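
The abstract does not spell out the affix rules or preference patterns, so the following sketch shows only a generic right-to-left longest-match segmenter of the kind a reverse segmentation algorithm starts from. `reverse_segment`, the four-syllable length limit, the single-syllable fallback, and the toy noun dictionary are all assumptions for illustration.

```python
from typing import List, Set


def reverse_segment(compound: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Split a compound noun right to left, preferring the longest dictionary match.

    If no dictionary word ends at the current position, a single syllable is
    split off so the loop always makes progress.
    """
    segments: List[str] = []
    end = len(compound)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            piece = compound[end - size:end]
            if size == 1 or piece in dictionary:
                segments.append(piece)
                end -= size
                break
    segments.reverse()
    return segments


toy_nouns = {"정보", "검색", "시스템"}  # hypothetical dictionary
print(reverse_segment("정보검색시스템", toy_nouns))  # ['정보', '검색', '시스템']
```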

Concept-based Compound Keyword Extraction (개념기반 복합키워드 추출방법)

  • Lee, Sangkon;Lee, Taehun
    • The Journal of Korean Association of Computer Education / v.6 no.2 / pp.23-31 / 2003
  • In general, people use a keyword or phrase as the name of a field or as a subject term for a document. This paper focuses on keyword extraction. First, we examine the fact that authors suggest keywords that do not occur as content words in the text, and we present generation rules for combining compound keywords based on the concepts in lexical information. We also present a new importance measure that filters out useless keywords unrelated to a document's content. To verify the validity of the extraction results, we collected titles and abstracts from research papers on natural language and/or speech processing, and obtained 96% precision for the top-ranked extraction results.

Two Statistical Models for Automatic Word Spacing of Korean Sentences (한글 문장의 자동 띄어쓰기를 위한 두 가지 통계적 모델)

  • 이도길;이상주;임희석;임해창
    • Journal of KIISE:Software and Applications / v.30 no.3_4 / pp.358-371 / 2003
  • Automatic word spacing is the process of determining the correct boundaries between words in a sentence that contains spacing errors. It is very important for improving readability and for conveying the accurate meaning of a text to the reader. Previous statistical approaches to automatic word spacing do not consider the previous spacing state and therefore inevitably estimate inaccurate probabilities. In this paper, we propose two statistical word spacing models that solve this problem. The proposed models are based on the observation that automatic word spacing can be regarded as a classification problem, like POS tagging. By generalizing hidden Markov models, the models can consider a broader context and estimate more accurate probabilities. We evaluated the proposed models under a wide range of experimental conditions to compare them with the current state of the art, and we also provide a detailed error analysis of our models. The experimental results show that the proposed models achieve a syllable-unit accuracy of 98.33% and an eojeol-unit precision of 93.06% under an evaluation method that takes compound nouns into account.
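
Treating spacing as classification means giving each syllable a label, much as POS tagging labels each word. The sketch below shows only that data representation, assuming a binary tag that records whether a space follows a syllable; the paper's generalized HMMs, which would predict these tags for unspaced input, are not reproduced, and both function names are hypothetical.

```python
from typing import List, Tuple


def to_tagged(sentence: str) -> List[Tuple[str, int]]:
    """Convert a correctly spaced sentence into (syllable, tag) training pairs.

    Tag 1 means a space follows the syllable; tag 0 means it does not.
    """
    text = sentence.strip()
    pairs = []
    for i, ch in enumerate(text):
        if ch == " ":
            continue
        space_follows = i + 1 < len(text) and text[i + 1] == " "
        pairs.append((ch, 1 if space_follows else 0))
    return pairs


def from_tagged(pairs: List[Tuple[str, int]]) -> str:
    """Rebuild the spaced sentence that a sequence of (syllable, tag) pairs encodes."""
    return "".join(ch + (" " if tag else "") for ch, tag in pairs).rstrip()


pairs = to_tagged("아버지가 방에 들어가신다")
print([tag for _, tag in pairs])  # [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
print(from_tagged(pairs))         # 아버지가 방에 들어가신다
```

At inference time, a spacing model would predict one tag per syllable of an unspaced string, and `from_tagged` would render the spaced result.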

Korean Compound Noun Decomposition and Semantic Tagging System using User-Word Intelligent Network (U-WIN을 이용한 한국어 복합명사 분해 및 의미태깅 시스템)

  • Lee, Yong-Hoon;Ock, Cheol-Young;Lee, Eung-Bong
    • The KIPS Transactions:PartB / v.19B no.1 / pp.63-76 / 2012
  • We propose a Korean compound noun semantic tagging system that uses statistical compound noun decomposition together with semantic relation information extracted from a lexical semantic network (U-WIN) and dictionary definitions. The system consists of three phases: compound noun decomposition, semantic constraint, and semantic tagging. In the compound noun decomposition phase, the best candidates are selected using noun location frequencies extracted from the Sejong corpus; the system then re-decomposes nouns for the semantic constraint phase and restores foreign nouns. The semantic constraint phase finds possible semantic combinations using origin information in the dictionary and a Naive Bayes classifier, in order to reduce computation time and increase the accuracy of semantic tagging. The semantic tagging phase calculates the semantic similarity between the decomposed nouns and determines their semantic tags. We constructed an experimental data set of 40,717 semantically tagged compound nouns, each longer than three characters, from the Standard Korean Language Dictionary. In the experiments, compound noun decomposition achieved an accuracy of 99.26% and semantic tagging an accuracy of 95.38%.
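
The abstract says decomposition candidates are ranked by noun location frequencies from the Sejong corpus but gives no formula. The sketch below assumes one plausible reading: enumerate every split of the compound into known nouns and score each split by the product of per-position relative frequencies. The position labels, the function names, and the frequency table are hypothetical, and the semantic constraint and tagging phases are not covered.

```python
from typing import Dict, Iterator, List, Set, Tuple


def segmentations(s: str, nouns: Set[str], max_len: int = 4) -> Iterator[List[str]]:
    """Yield every split of s into dictionary nouns (left to right)."""
    if not s:
        yield []
        return
    for size in range(1, min(max_len, len(s)) + 1):
        head = s[:size]
        if head in nouns:
            for rest in segmentations(s[size:], nouns, max_len):
                yield [head] + rest


def position(i: int, n: int) -> str:
    """Hypothetical position label of the i-th of n pieces."""
    if n == 1:
        return "whole"
    if i == 0:
        return "first"
    return "last" if i == n - 1 else "middle"


def best_decomposition(compound: str, position_freq: Dict[Tuple[str, str], float]) -> List[str]:
    """Pick the split whose pieces have the highest product of position frequencies."""
    nouns = {noun for noun, _ in position_freq}

    def score(pieces: List[str]) -> float:
        total = 1.0
        for i, piece in enumerate(pieces):
            total *= position_freq.get((piece, position(i, len(pieces))), 0.0)
        return total

    candidates = list(segmentations(compound, nouns))
    return max(candidates, key=score) if candidates else [compound]


# Toy (noun, position) -> relative frequency table; the numbers are made up.
freq = {
    ("국제", "first"): 0.6, ("국제경제", "first"): 0.1,
    ("경제", "middle"): 0.3, ("경제학", "last"): 0.5, ("학", "last"): 0.4,
}
print(best_decomposition("국제경제학", freq))  # ['국제', '경제학']
```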

HMM-based Korean Named Entity Recognition (HMM에 기반한 한국어 개체명 인식)

  • Hwang, Yi-Gyu;Yun, Bo-Hyun
    • The KIPS Transactions:PartB / v.10B no.2 / pp.229-236 / 2003
  • Named entity recognition is an indispensable process for question answering and information extraction systems. This paper presents an HMM-based named entity (NE) recognition method that uses the construction principles of compound words. In Korean, many named entities can be decomposed into more than one word. Moreover, there are contextual relationships among the nouns within an NE, and between an NE and its surrounding words. We classify words as a word that is an NE in itself, a word inside an NE, and/or a word adjacent to an NE, and train an HMM on these NE-related word types and parts of speech. The proposed named entity recognition (NER) system uses a trigram HMM to account for the variable length of NEs. However, the trigram HMM suffers from a serious data sparseness problem, which we address with multi-level back-offs. Experimental results show that our NER system achieves an F-measure of 87.6% on economic articles.
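
The multi-level back-off idea (fall back from trigram to bigram to unigram statistics when a longer context was never observed) can be sketched as below. This assumes a fixed back-off factor in the style of "stupid backoff", which yields scores rather than normalized probabilities and is not necessarily the paper's scheme; the class name and the placeholder symbols, which stand in for the paper's NE-related word types and parts of speech, are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


class BackoffTrigram:
    """Trigram scorer that backs off to bigram and then unigram relative frequencies.

    Uses a fixed back-off factor `alpha` (as in "stupid backoff"), so the scores
    are not normalized probabilities.
    """

    def __init__(self, sequences: Iterable[Sequence[str]], alpha: float = 0.4):
        self.alpha = alpha
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.bi_ctx, self.tri_ctx = Counter(), Counter()
        for seq in sequences:
            padded = ["<s>", "<s>"] + list(seq)  # sentence-start padding
            for i in range(2, len(padded)):
                u, v, w = padded[i - 2], padded[i - 1], padded[i]
                self.uni[w] += 1
                self.bi[(v, w)] += 1
                self.bi_ctx[v] += 1
                self.tri[(u, v, w)] += 1
                self.tri_ctx[(u, v)] += 1
        self.total = sum(self.uni.values())

    def score(self, w: str, u: str, v: str) -> float:
        """Score symbol w given the two preceding symbols u, v."""
        if self.tri[(u, v, w)] > 0:
            return self.tri[(u, v, w)] / self.tri_ctx[(u, v)]
        if self.bi[(v, w)] > 0:
            return self.alpha * self.bi[(v, w)] / self.bi_ctx[v]
        return self.alpha ** 2 * self.uni[w] / self.total if self.total else 0.0


model = BackoffTrigram([["A", "B", "C"], ["A", "B", "D"], ["B", "C"]])
print(model.score("C", "A", "B"))    # trigram seen: 1/2 = 0.5
print(model.score("D", "<s>", "B"))  # backs off to bigram: 0.4 * 1/3
```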

A Study on Ma Je Kai Shi(麻帝核試) (麻帝核試의 硏究)

  • 김진구
    • The Research Journal of the Costume Culture / v.5 no.4 / pp.6-11 / 1997
  • The purpose of this study was to identify and trace the origin of 麻帝核試, which appears in Kei Rim Yu Sa (鷄林類事). A comparative linguistic analysis was employed. The results revealed that madi (마디) survives as a dialect word for məri [머리 (頭), "head"] in Kyung Sang Province. Thus, the dialect word madi (마디) is considered a survival of 마디 (麻帝) of Koryo. Words similar to 核試 of Koryo were found in Hebrew and Japanese: Hebrew k-u-tsi(zi) means locks of hair, and Japanese ku-shi (くし) has several meanings: comb, head, and the hair of the head. The word 麻帝核試 of Koryo is a compound of madi (麻帝), "head", and kəshi (그시, 核試), "locks of hair" (hair of the head). 核試 of Koryo, Japanese ku shi (くし), and Hebrew k-u-tsi(zi) are closely related to one another. The word ku shi(si) (그시, 核試) was derived from Hebrew k-u-tsi(zi), and Japanese ku shi (くし) originated from 核試 of Koryo. Korean ku shi(si) (그시, 核試) is a transliteration of Hebrew k-u-tsi(zi), and Japanese ku shi (くし) is a transliteration of Korean ku shi (그시, 核試).

A Method for Compound Noun Extraction to Improve Accuracy of Keyword Analysis of Social Big Data

  • Kim, Hyeon Gyu
    • Journal of the Korea Society of Computer and Information / v.26 no.8 / pp.55-63 / 2021
  • Because social big data often include new words and proper nouns, statistical morphological analysis methods, which rely on the frequency of occurrence of each word, have been widely used to process them. However, these methods do not properly recognize compound nouns, which lowers the accuracy of keyword extraction. This paper presents a method for extracting compound nouns in keyword analysis of social big data. The proposed method builds a candidate set of compound nouns by combining the words obtained in the morphological analysis step, and extracts compound nouns by examining how often the candidates appear in a given review. Two algorithms are proposed, differing in how the candidate set is constructed, and the performance of each is expressed as a formula and compared analytically. The comparison is verified through experiments on real data collected online, and the results also show that the proposed method is suitable for real-time processing.
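
One way to read the candidate-generation step is sketched below: join consecutive nouns from the morphological analysis into candidates, then keep those that occur often enough in the review. The function name, the `max_parts` and `min_count` thresholds, the choice to ignore spaces when counting (so that 배터리 수명 and 배터리수명 are treated alike), and the hand-written noun list standing in for analyzer output are all assumptions, and neither of the paper's two algorithms is reproduced exactly.

```python
from typing import Dict, List


def extract_compound_nouns(review: str, nouns: List[str],
                           max_parts: int = 3, min_count: int = 2) -> Dict[str, int]:
    """Join runs of consecutive nouns into candidates and keep the frequent ones.

    `nouns` is the noun sequence produced for `review` by a morphological
    analyzer. Spaces are ignored when counting, so a candidate matches whether
    or not the writer put a space inside it.
    """
    text = review.replace(" ", "")
    found: Dict[str, int] = {}
    for start in range(len(nouns)):
        for size in range(2, max_parts + 1):
            parts = nouns[start:start + size]
            if len(parts) < size:  # ran past the end of the noun list
                break
            candidate = "".join(parts)
            count = text.count(candidate)  # non-overlapping occurrences
            if count >= min_count:
                found[candidate] = count
    return found


review = "배터리 수명이 길어요. 배터리수명 최고. 배터리 수명 만족"
# Hand-written stand-in for morphological analyzer output (hypothetical).
nouns = ["배터리", "수명", "배터리", "수명", "최고", "배터리", "수명", "만족"]
print(extract_compound_nouns(review, nouns))  # {'배터리수명': 3}
```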

Effects of FIN-TECH use motivation on User Attitude and Word Of Mouth Intention: Focus on a Innovation Resistance Tendency and Type of Message (Rational, Emotional) (핀테크 이용 동기에 따른 이용자 태도와 구전의도의 관계 - 혁신저항과 메시지 유형의 조절효과 -)

  • Seol, Sang-Cheol;Jung, Sung-Gwang;Choi, Woo-Young
    • Management & Information Systems Review / v.36 no.5 / pp.195-222 / 2017
  • Today's economy is increasingly marked by convergence between different industries, as the boundaries between all areas blur and give rise to innovations such as mobile and social network services. FinTech is a new technology that combines the advantages of mobile and the Internet with the technology revolution to handle financial and IT tasks easily; the word itself is a compound of "finance" and "technology". The purpose of this study is to investigate the overall structural relationship between FinTech use motivation (usefulness, enjoyment), user attitude, and word-of-mouth intention. In addition, we investigated how the relationships among FinTech use motivation, user attitude, and word-of-mouth intention change according to innovation resistance tendency, and according to message type (rational vs. emotional). The main results of this study are as follows. First, both usefulness and enjoyment as motivations for using FinTech have a positive effect on user attitude, and user attitude in turn has a positive effect on word-of-mouth intention. Second, the relationships among FinTech use motivation, user attitude, and word-of-mouth intention were found to differ according to consumers' innovation resistance. Third, these relationships were also found to differ according to message type (rational vs. emotional). The study concludes with a summary of the results, their implications and limitations, and directions for future research.