• Title/Summary/Keyword: Korean Linguistic Feature

Search results: 43

A Character Identification Method using Postpositions for Animate Nouns in Korean Novels (한국어 소설에서 유정명사용 조사 기반의 인물 추출 기법)

  • Park, Taekeun;Kim, Seung-Hoon
    • Journal of Information Technology Services / v.15 no.3 / pp.115-125 / 2016
  • Novels include a wide variety of character names, depending on the genre, the spatio-temporal setting, and the nationality of the characters. Moreover, characters and their names are products of the author's imagination. As a result, no proper noun dictionary can include every character name that has been or will be created by authors. In addition, since Korean has no capitalization, character names in Korean are harder to detect than those in English. Fortunately, Korean has postpositions, such as "-ege" and "hante", that attach to nouns denoting sentient beings or animate objects. We call these animate postpositions in this paper. In a previous study, the authors manually selected character names by consulting both Wikipedia and dictionaries of well-known people after applying a Korean morpheme analyzer, a proper noun dictionary, postpositions (e.g., "-ga", "-eun", "-neun", "-eui", and "-ege"), and titles (e.g., "buin"), in order to extract social networks from three novels translated into or written in Korean. However, that study does not report the precision, recall, or F-measure of character identification. In this paper, we evaluate the quantitative contribution of animate postpositions to character identification in novels in terms of precision, recall, and F-measure. The results show that animate postpositions are a valuable and powerful tool for identifying characters, without a proper noun dictionary, in novels translated into or written in Korean.
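
The idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes whitespace-separated Eojeols, plain suffix matching instead of a morpheme analyzer, and a hand-made gold set of names. The function names and the sample sentence are hypothetical.

```python
# Animate postpositions named in the abstract: "-ege" and "hante".
ANIMATE_POSTPOSITIONS = ("에게", "한테")


def extract_candidates(text):
    """Return noun candidates that carry an animate postposition."""
    candidates = set()
    for eojeol in text.split():
        token = eojeol.strip(".,!?\"'")  # drop surrounding punctuation
        for post in ANIMATE_POSTPOSITIONS:
            # Require a non-empty stem in front of the postposition.
            if token.endswith(post) and len(token) > len(post):
                candidates.add(token[: -len(post)])
                break
    return candidates


def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F-measure over two sets of names."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical two-sentence sample with three characters.
text = "철수는 영희에게 편지를 썼다. 영희는 민수한테 그 편지를 보여 주었다."
gold = {"철수", "영희", "민수"}

predicted = extract_candidates(text)
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(sorted(predicted), precision, recall, f1)
```

In this sample the sketch finds only the two names that occur with an animate postposition, so recall is lower than precision. On real text, suffix matching would also pick up common animate nouns that are not names, which is one reason a fuller pipeline would need further filtering.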

A Robust Pattern-based Feature Extraction Method for Sentiment Categorization of Korean Customer Reviews (강건한 한국어 상품평의 감정 분류를 위한 패턴 기반 자질 추출 방법)

  • Shin, Jun-Soo;Kim, Hark-Soo
    • Journal of KIISE:Software and Applications / v.37 no.12 / pp.946-950 / 2010
  • Many sentiment categorization systems based on machine learning use morphological analyzers to extract linguistic features from sentences. However, morphological analyzers generally do not perform well in the customer review domain because online customer reviews contain many spacing and spelling errors. The poor performance of these underlying analyzers degrades the performance of the sentiment categorization systems built on them. To resolve this problem, we propose a feature extraction method based on simple longest matching of Eojeol (a Korean spacing unit) and phoneme patterns. The two kinds of patterns are automatically constructed from a large POS (part-of-speech) tagged corpus. Eojeol patterns consist of Eojeols that include content words such as nouns and verbs. Phoneme patterns consist of the leading consonant and vowel pairs of predicate words such as verbs and adjectives, because spelling errors seldom occur in leading consonants and vowels. To evaluate the proposed method, we implemented a sentiment categorization system using an SVM (Support Vector Machine) as the machine learner. In experiments with Korean customer reviews, the sentiment categorization system using the proposed method outperformed one using a morphological analyzer as the feature extractor.
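
The two ingredients named in this abstract, phoneme patterns built from leading consonant and vowel pairs and longest matching against a pattern set, can be sketched as follows. This is an illustration under assumptions, not the authors' system: the pattern set is hand-made rather than built from a tagged corpus, and `strip_finals` and `longest_match` are hypothetical helper names. The syllable arithmetic relies on the standard Unicode layout of precomposed Hangul syllables (base U+AC00, 28 final-consonant slots per leading consonant and vowel pair).

```python
HANGUL_BASE = 0xAC00  # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3  # last precomposed Hangul syllable
FINALS_PER_PAIR = 28  # final-consonant slots per (lead, vowel) pair


def strip_finals(word):
    """Keep only the leading consonant and vowel of each Hangul syllable."""
    out = []
    for ch in word:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            offset = code - HANGUL_BASE
            # Rounding down to a multiple of 28 clears the final consonant.
            code = HANGUL_BASE + (offset // FINALS_PER_PAIR) * FINALS_PER_PAIR
        out.append(chr(code))
    return "".join(out)


def longest_match(text, patterns):
    """Greedy left-to-right longest matching of known patterns in text."""
    max_len = max(map(len, patterns), default=0)
    found, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i : i + size]
            if piece in patterns:
                found.append(piece)
                i += size
                break
        else:
            i += 1  # no pattern starts here, so skip one character
    return found


# A correct spelling and a misspelling map to the same phoneme pattern.
print(strip_finals("괜찮다"), strip_finals("괜찬다"))

# Hypothetical pattern set applied to a review fragment with a spacing error.
patterns = {"배송", "배송이", "빠르"}
print(longest_match("배송이빠르다", patterns))
```

Because matching runs over the raw character string, a missing space between Eojeols does not prevent the patterns from being found, and dropping final consonants makes the predicate pattern tolerant of the misspelling shown.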

Processing Three Types of Korean Cleft Constructions in a Typed Feature Structure Grammar (유형화된 자질문법에서의 한국어 분열구문의 전산학적 처리)

  • Kim, Jong-Bok;Yang, Jae-Hyung
    • Korean Journal of Cognitive Science / v.20 no.1 / pp.1-28 / 2009
  • The expression KES, one of the most commonly used words in Korean, has various usages. It is also used to express English-like cleft constructions. Korean seems to employ at least three different types of cleft constructions: predicational, identificational, and eventual. This paper provides a constraint-based analysis of these three types of Korean cleft constructions and implements the analysis in the LKB (Linguistic Knowledge Building) system to check its feasibility. In particular, the paper shows how a typed feature structure grammar, couched in HPSG, can provide a robust basis for parsing Korean cleft constructions.

A Reconsideration on the Efficiency of the Extended Projection Principle (데이터분석을 통한 확대투사원리의 효율성 제고)

  • Joo, Chi-Woon
    • Journal of the Korea Society of Computer and Information / v.16 no.10 / pp.219-228 / 2011
  • The main concern of this paper is to suggest an alternative view of the basic notion of the Extended Projection Principle (henceforth, EPP), which has changed slightly since its initial appearance. In early generative grammar the EPP depended on Case and theta-role, whereas under minimalism it was reduced to the categorial feature [D]. Various data, such as Locative Inversion constructions, there-expletive constructions, and sentences related to binding theory, are examined to suggest a plausible alternative. In conclusion, the analysis of these data shows that the SPEC position of the inflectional clause should be filled with a maximally projected lexical item.

Korean Nominal Bank, Using Language Resources of Sejong Project (세종계획 언어자원 기반 한국어 명사은행)

  • Kim, Dong-Sung
    • Language and Information / v.17 no.2 / pp.67-91 / 2013
  • This paper describes the Korean Nominal Bank, a project that provides argument structure for instances of predicative nouns in the Sejong parsed corpus. We use the language resources of the Sejong project so that the same set of data is annotated with successive levels of annotation, since a new kind of language resource project could otherwise yield information that is processed separately and in isolation. We base the annotation scheme on the Sejong electronic dictionary, the semantically tagged corpus, and the syntactically analyzed corpus. Our work also draws on deep linguistic knowledge of the syntax-semantics interface in general. We consider semantic theories including the Frame Semantics of Fillmore (1976), the argument structure of Grimshaw (1990), and the argument alternations of Levin (1993) and Levin and Rappaport Hovav (2005). Various syntactic theories are needed to explain various sentence types, including empty categories, raising, and left (or right) dislocation. We also need an account of idiosyncratic lexical features such as collocation.

The Variable Acquisition of Discourse Marker Use in Korean American Speakers of English

  • Lee, Hi-Kyoung
    • English Language & Literature Teaching / v.11 no.2 / pp.1-18 / 2005
  • This study is a preliminary investigation of the nature of discourse marker acquisition in Korean American speakers of English. Discourse markers are of interest because they are not an aspect of language taught through formal instruction either to native or non-native speakers. Therefore, discourse marker use serves as indirect evidence of face-to-face interaction with native speakers and an indicator of integration. In this light, the present study examines the presence of discourse markers in Korean Americans. The markers chosen for analysis were you know, like, and I mean. The data consist of spontaneous speech elicited from interviews. Sociolinguistic variables such as age, sex, and generation (i.e., 1st, 1.5, 2nd) were examined. Results show that there appears to be interaction between the variables and discourse marker use. While all speakers showed variable acquisition of markers, younger, female, and 1.5 generation speakers were found to use discourse markers more than other speakers. Although discourse marker use is optional and thus not a linguistic feature that must be necessarily acquired, it is clear that use is pervasive and acquired differentially by English speakers irrespective of whether they are native or not.

THE SEMANTIC AND PRAGMATIC NATURE OF HONORIFIC AGREEMENT IN KOREAN: A CONSTRAINT-BASED APPROACH

  • Park, Byung-Soo
    • Language and Information / v.2 no.1 / pp.116-156 / 1998
  • This paper takes an HPSG approach to agreement phenomena involving Korean honorific expressions. It is shown that the theoretical devices developed by the constraint-based theory of HPSG can be fruitfully used to capture the interactions between syntactic constraints and semantic or pragmatic factors in Korean honorific agreement. HPSG's semantic feature 'referential index' plays a key role in describing these multiple interactions. The constraint-based theory of agreement proves successful in accounting for what may be called 'inconsistent' honorific agreement as well as 'consistent' regular honorific usage. However, this paper acknowledges its limits. Recognizing an important distinction between basic and 'coercive' honorific expressions, it argues that a syntactic-semantic-pragmatic approach such as the present one can only be applied to basic honorific agreement. Being sociolinguistic in nature, coercive honorific agreement is perhaps not amenable to formal linguistic investigation.

Modelling Duration In Text-to-Speech Systems

  • Chung Hyunsong
    • MALSORI / no.49 / pp.159-174 / 2004
  • This paper reviews and discusses the development of the durational component of prosody modelling in text-to-speech conversion of spoken English and Korean, showing the strengths and weaknesses of each approach. The possibility of integrating linguistic feature effects into the duration modelling of TTS systems is also investigated. The paper claims that current approaches to language timing synthesis still require an understanding of how segmental duration is affected by context. Three modelling approaches are discussed: sequential rule systems, Classification and Regression Tree (CART) models, and Sums-of-Products (SoP) models. The CART and SoP models perform well in predicting segment duration in English, whereas this is not the case for SoP modelling of spoken Korean.

Processing Korean Passives for Database Semantics (데이터베이스 의미론을 위한 한국어 피동형의 전산적 처리)

  • 홍정하;최승철;이기용
    • Proceedings of the Korean Society for Cognitive Science Conference / 2000.06a / pp.411-418 / 2000
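  • Hausser (1999) and Lee (1999a, 1999c) proposed Database Semantics, which handles the meaning of natural language by means of a database management system (DBMS). In particular, Lee (1999c) showed that various linguistic representation formats, such as trees, logical formulas, and feature structures, can be converted into tables, the representation format of a relational DBMS, and thereby suggested that a relational DBMS can be introduced into Database Semantics. Meanwhile, the Database Semantics model presented in Lee (2000) proposed a Linguistic Information Processing System (LIPS) that connects the DBMS and the end-user. The LIPS processes the linguistic data entered by the user, passes the analysis results to the DBMS, and returns to the user the information extracted from the database built in this way. The purpose of this paper is to implement the LIPS and the DBMS, the core components of Database Semantics, so that the Korean passive forms with '이, 히, 리, 기' (-i, -hi, -li, -ki) can be processed computationally.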
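
As a rough illustration of how a feature structure might be stored in relational tables, the sketch below flattens a nested structure into (node, attribute, value) rows and loads them into an in-memory SQLite database. The abstract does not describe the actual table design, so the schema, the `flatten` helper, the attribute names (PRED, VOICE, SUBJ, CASE), and the sample structure are all assumptions made for this example.

```python
import sqlite3
from itertools import count


def flatten(fs, rows, ids, node=None):
    """Flatten a nested feature structure into (node, attr, value) rows."""
    node = next(ids) if node is None else node
    for attr, value in fs.items():
        if isinstance(value, dict):
            child = next(ids)
            # An embedded structure is stored as a reference to its node id.
            rows.append((node, attr, f"#{child}"))
            flatten(value, rows, ids, child)
        else:
            rows.append((node, attr, str(value)))
    return rows


# Hypothetical feature structure for a sentence with the passive verb 잡히다.
feature_structure = {
    "PRED": "잡히다",
    "VOICE": "passive",
    "SUBJ": {"PRED": "도둑", "CASE": "nom"},
}

rows = flatten(feature_structure, [], count(0))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fs (node INTEGER, attr TEXT, value TEXT)")
conn.executemany("INSERT INTO fs VALUES (?, ?, ?)", rows)

# Look up the voice of the top-level node with an ordinary SQL query.
voice = conn.execute(
    "SELECT value FROM fs WHERE node = 0 AND attr = 'VOICE'"
).fetchone()[0]
print(rows)
print(voice)
```

Storing embedded structures as node references keeps every row flat, so arbitrarily deep structures fit in a single three-column table at the cost of one extra lookup per level of nesting.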

Mother culture interference on EFL writing (외국어로서의 영작문에 있어서 모문화의 간섭)

  • Choe, Yong-Jae
    • English Language & Literature Teaching / no.3 / pp.1-12 / 1997
  • Errors in EFL writing are very often attributable to learners' inadequate understanding of the target culture. Most of these errors are very hard to identify because the sentences are grammatically correct even though the meaning is not what was intended. EFL learners almost habitually equate the meaning and usage of a linguistic item when it is present in both the native and the target languages. However, seemingly identical items in the two languages sometimes prove to be distinct because of cultural differences. Some expressions in the native language are neither socially acceptable nor meaningful in the target language. Moreover, out of sheer ignorance, a learner may use a target item the way it would be used in the native language. For instance, the primary feature of the term "friend" in Korean is [+same age group], so a young Korean man is not supposed to call his teacher a friend. This paper aims to clarify patterns of college-level writing errors caused by interference from the mother culture.
