• Title/Summary/Keyword: lexical approach

75 search results

Automatic Construction of Korean Two-level Lexicon using Lexical and Morphological Information (어휘 및 형태 정보를 이용한 한국어 Two-level 어휘사전 자동 구축)

  • Kim, Bogyum;Lee, Jae Sung
    • KIPS Transactions on Software and Data Engineering / v.2 no.12 / pp.865-872 / 2013
  • Two-level morphological analysis is a rule-based approach: morphological transformations are handled by rules, and words are analyzed using morpheme connection information in a lexicon. The approach is language-independent, and a Korean two-level system was also developed, but its practical use was limited because it relied on a very small, manually built lexicon, and it also suffered from an over-generation problem. In this paper, we propose a method for automatically constructing a Korean two-level lexicon for PC-KIMMO from a morpheme-tagged corpus, together with a method that uses lexical information and sub-tags to reduce over-generation. Experiments showed that the proposed method reduced over-generation by 68% compared with the previous method and improved performance from 39% to 65% in f-measure.
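
The lexicon-plus-connection-rules idea in the abstract above can be illustrated with a minimal sketch (not the paper's PC-KIMMO implementation): a toy lexicon maps Korean morphemes to sub-tags, and a connection table licenses which sub-tag pairs may be adjacent, which is how sub-tag information curbs over-generation. All entries and tags here are illustrative assumptions.

```python
# Toy lexicon: surface morpheme -> (lemma, sub-tag). Illustrative only.
LEXICON = {
    "먹": ("먹다", "VV"),  # verb stem "eat"
    "었": ("었", "EP"),    # past-tense pre-final ending
    "다": ("다", "EF"),    # sentence-final ending
}
# Morpheme connection information: which sub-tags may follow which.
CONNECT = {("VV", "EP"), ("EP", "EF"), ("VV", "EF")}

def analyze(word):
    """Return every segmentation of `word` licensed by LEXICON and CONNECT."""
    results = []
    def walk(rest, path):
        if not rest:
            results.append(path)
            return
        for i in range(1, len(rest) + 1):
            chunk = rest[:i]
            if chunk in LEXICON:
                lemma, tag = LEXICON[chunk]
                # Reject analyses whose adjacent sub-tags are not connectable.
                if not path or (path[-1][1], tag) in CONNECT:
                    walk(rest[i:], path + [(lemma, tag)])
    walk(word, [])
    return results

print(analyze("먹었다"))  # [[('먹다', 'VV'), ('었', 'EP'), ('다', 'EF')]]
```

Without the `CONNECT` check, every lexicon-covered segmentation would be emitted, which is exactly the over-generation the paper's sub-tags are meant to suppress.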

A Semantic-Based Feature Expansion Approach for Improving the Effectiveness of Text Categorization by Using WordNet (문서범주화 성능 향상을 위한 의미기반 자질확장에 관한 연구)

  • Chung, Eun-Kyung
    • Journal of the Korean Society for Information Management / v.26 no.3 / pp.261-278 / 2009
  • Identifying optimal feature sets is crucial to improving the effectiveness of text categorization (TC). In this study, feature-expansion experiments were conducted using author-provided keyword sets and article titles from typical scientific journal articles, with WordNet, a lexical database of English, as the expansion tool. Given this data set and lexical tool, the study showed that feature expansion along the synonym relationship was significantly effective at improving TC results: when feature sets were expanded with synonyms, the effectiveness of TC improved considerably regardless of word sense disambiguation.
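
Synonym-based feature expansion of the kind described above can be sketched as follows; the synonym table is a hand-made stand-in for WordNet lookups, and all terms are illustrative.

```python
# Stand-in for WordNet synonym lookups: term -> set of synonyms.
SYNONYMS = {
    "categorization": {"classification", "sorting"},
    "keyword": {"key word", "index term"},
}

def expand_features(features):
    """Return the feature set plus the synonyms of every member."""
    expanded = set(features)
    for term in features:
        expanded |= SYNONYMS.get(term, set())
    return expanded

print(sorted(expand_features({"keyword", "corpus"})))
# ['corpus', 'index term', 'key word', 'keyword']
```

A real system would query WordNet synsets here; the point of the sketch is only that expansion enlarges the feature vector before classification.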

Korean Part-of-Speech Tagging System Using Resolution Rules for Individual Ambiguous Word (어절별 중의성 해소 규칙을 이용한 혼합형 한국어 품사 태깅 시스템)

  • Park, Hee-Geun;Ahn, Young-Min;Seo, Young-Hoon
    • Journal of KIISE: Computing Practices and Letters / v.13 no.6 / pp.427-431 / 2007
  • In this paper we describe a Korean part-of-speech tagging approach that uses resolution rules for individual ambiguous words together with statistical information. The approach resolves lexical ambiguities through common rules, rules for individual ambiguous words, and a statistical model. Common rules cover idioms and phrases in common use, including phrases composed of main and auxiliary verbs. To enhance tagging accuracy, we built resolution rules for each word that has several distinct morphological analyses. Each rule may refer to the morphemes, morphological tags, and/or word senses not only of the ambiguous word itself but also of the words around it. An HMM-based statistical approach is then applied to ambiguous words that the rules do not resolve. Experiments show that the approach achieves high accuracy and broad coverage.

A Multi-Resolution Approach to Restaurant Named Entity Recognition in Korean Web

  • Kang, Bo-Yeong;Kim, Dae-Won
    • International Journal of Fuzzy Logic and Intelligent Systems / v.12 no.4 / pp.277-284 / 2012
  • Named entity recognition (NER) techniques can play a crucial role in extracting information from the web. Although NER systems with relatively high performance have been developed through careful manipulation of terms in a statistical model, term mismatches often degrade their performance because the strings of all candidate entities are not known a priori. Despite the importance of lexical-level term mismatches, however, most NER approaches developed to date use only the term string itself and simple term-level features, and do not exploit the semantic features of terms, which can handle term variations effectively. As a solution, we propose matching the semantic concepts of term units in restaurant named entities (NEs), where the units are generated automatically from multiple resolutions of a semantic tree. In a test experiment, we applied our restaurant NER scheme to 49,153 nouns from Korean restaurant web pages. The scheme achieved an average accuracy of 87.89% on the test data, considerably better than the 78.70% accuracy of the baseline system.

A Focus Account for Contrastive Reduplication: Prototypicality and Contrastivity

  • Lee, Bin-Na;Lee, Chung-Min
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.259-267 / 2007
  • This paper examines the phenomenon of Contrastive Reduplication (CR) in English in relation to the notion of contrastive focus (CF). CR differs from other reduplicative patterns in that, rather than a general intensive function, the reduplicated form denotes a more prototypical, default meaning of the lexical item, yielding a semantic contrast with the meaning of the non-reduplicated word; CR is thus in concordance with CF under the concept of contrastivity. Much previous work on CF, however, took a semantic approach that associated contrastivity with the construction of a set of alternatives. We claim that a recent discourse-pragmatic account is better suited to explaining the vague contrast in the informativeness of CR. Zimmermann's (2006) Contrastive Focus Hypothesis characterizes contrastivity in terms of the speaker's assumptions about the hearer's expectations of the focused element. This approach can be adapted to CR and recovers the possible subsets of meaning of a reduplicated form in a more refined way, showing contrastivity in informativeness. Similar set-limiting phenomena in various other languages are also introduced.


Effects of Corpus Use on Error Identification in L2 Writing

  • Yoshiho Satake
    • Asia Pacific Journal of Corpus Research / v.4 no.1 / pp.61-71 / 2023
  • This study examines the effects of data-driven learning (DDL), an approach that employs corpora for inductive language-pattern learning, on error identification in second language (L2) writing. The data consist of error-identification instances from fifty-five participants, compared across different reference materials: the Corpus of Contemporary American English (COCA), dictionaries, and no reference materials. There are three significant findings. First, COCA was effective for identifying collocational and form-related errors, thanks to inductive inference drawn from multiple example sentences. Second, dictionaries were beneficial for identifying lexical errors, where meaning information was helpful. Finally, participants often took a strategic approach, identifying many simple errors without reference materials; while this strategy maximized error identification, it also led to mislabeling correct expressions as errors. The author concludes that strategic selection of reference materials can significantly enhance the effectiveness of error identification in L2 writing. A corpus offers advantages such as easy access to target phrases and frequency information, features that are especially useful given that most errors were collocational and form-related. The findings suggest that teachers should guide learners to use appropriate reference materials effectively to identify errors by error type.

Generalized Command Mode Finite Element Method Toolbox in CEMTool

  • Ahn, Choon-Ki;Kwon, Wook-Hyun
    • Institute of Control, Robotics and Systems: Conference Proceedings / 2003.10a / pp.1349-1353 / 2003
  • CEMTool is a command-style design and analysis package for scientific and technological algorithms, built on a matrix-based computation language. In this paper, we present a compiler-based approach to implementing a command-mode generalized PDE solver in CEMTool. In contrast to the existing MATLAB PDE Toolbox, the proposed FEM package can handle combinations of reserved words such as "laplace" and "convect", and border lines and boundary conditions can be assigned very easily. With the introduction of a lexical analyzer and a parser, our FEM toolbox can handle general boundary conditions and the various PDEs represented by combinations of equations, so PDEs need not be classified as elliptic, hyperbolic, or parabolic. Consequently, the new FEM toolbox overcomes some disadvantages of the existing MATLAB PDE Toolbox.


Sentence-Chain Based Seq2seq Model for Corpus Expansion

  • Chung, Euisok;Park, Jeon Gue
    • ETRI Journal / v.39 no.4 / pp.455-466 / 2017
  • This study focuses on a method of sequential data augmentation to alleviate data sparseness, presenting corpus expansion techniques that enhance the coverage of a language model. Recent recurrent neural network studies show that a seq2seq model can be applied to language generation: it can generate new sentences from given input sentences. We present a corpus expansion method based on a sentence-chain seq2seq model. For training, sentence chains are used as triples: the first two sentences in a triple feed the encoder of the seq2seq model, while the last sentence becomes the target sequence for the decoder. Using only internal resources, evaluation shows an improvement of approximately 7.6% in relative perplexity over a baseline language model of Korean text. In addition, compared with a previous study, the sentence-chain approach reduces the size of the training data by 38.4% while generating 1.4 times as many n-grams, with superior performance on English text.
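
The sentence-chain triple construction described above (two sentences in, one sentence out) can be sketched as follows; the function and data are illustrative, not the authors' code.

```python
def make_triples(sentences):
    """Build seq2seq training pairs from a document's sentence sequence:
    each consecutive triple (s1, s2, s3) yields one pair, with s1 + s2 as
    the encoder input and s3 as the decoder target."""
    pairs = []
    for i in range(len(sentences) - 2):
        source = sentences[i] + " " + sentences[i + 1]
        target = sentences[i + 2]
        pairs.append((source, target))
    return pairs

doc = ["A b.", "C d.", "E f.", "G h."]
print(make_triples(doc))
# [('A b. C d.', 'E f.'), ('C d. E f.', 'G h.')]
```

A trained model fed such pairs can then be sampled to generate novel continuation sentences, which is how the expanded corpus is produced.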

A study of English affixes: Concentrated on the affixes -en and -ing (영어의 접사 연구: 접사 -en, -ing 를 중심으로)

  • Park, Soon-Bong
    • English Language & Literature Teaching / v.15 no.3 / pp.301-314 / 2009
  • This study explores how the affixes -en and -ing influence the theta-roles of the verbs to which they attach; both affixes often appear in English synthetic compounds. The results are as follows. First, the affixes -en and -ing link the theta-role realized in the subject of the verb to the noun that follows, which is proposed as the Theta-linking Principle. Second, in synthetic compounds containing -en and -ing, the left element must not be the subject of the verb, which is the Synthetic Compound Constraint. The affix -er, in turn, links thematic roles of the sentential subject, such as Agent and Instrument. This study thus examines the function of these affixes from a lexical-functional point of view.


Ranking Translation Word Selection Using a Bilingual Dictionary and WordNet

  • Kim, Kweon-Yang;Park, Se-Young
    • Journal of the Korean Institute of Intelligent Systems / v.16 no.1 / pp.124-129 / 2006
  • This paper presents a method of ranking translation-word selections for Korean verbs based on lexical knowledge contained in a bilingual Korean-English dictionary and WordNet, both easily obtainable knowledge resources. We focus on deciding which translation of a target word is most appropriate, using a measure of semantic relatedness computed through the 45 extended relations between the possible translations of the target word and indicative clue words that serve as predicate arguments in the source-language text. To reduce the weight of possibly unwanted senses, we rank the possible senses of each translation word by measuring the semantic similarity between the translation word and its near-synonyms. We report an average accuracy of 51% on ten ambiguous Korean verbs. The evaluation suggests that our approach outperforms the default baseline and previous work.
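
The ranking idea above, ordering candidate translations by their semantic relatedness to clue words, can be sketched with a toy relatedness table standing in for WordNet-based scores; all words and scores here are illustrative assumptions.

```python
# Stand-in for a WordNet-derived relatedness measure:
# (candidate translation, clue word) -> score. Illustrative values only.
RELATED = {
    ("eat", "food"): 0.9,
    ("eat", "meal"): 0.8,
    ("take", "food"): 0.3,
}

def rank_translations(candidates, clues):
    """Order candidate translations by total relatedness to the clue words."""
    def score(cand):
        return sum(RELATED.get((cand, clue), 0.0) for clue in clues)
    return sorted(candidates, key=score, reverse=True)

print(rank_translations(["take", "eat"], ["food", "meal"]))  # ['eat', 'take']
```

In the paper's setting, the clue words are the predicate arguments found in the source sentence, and the relatedness scores come from relations in the bilingual dictionary and WordNet rather than a fixed table.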