• Title/Abstract/Keywords: computational linguistics


Implementation of Pronoun Readings in English: A Categorial Grammar Approach.

  • Lee, Yong-Hun
    • 한국영어학회지:영어학
    • /
    • Vol. 1, No. 4
    • /
    • pp.609-627
    • /
    • 2001
  • Pronouns are frequently used in English, and resolving them is important for capturing the meaning of sentences. This paper provides a computational implementation of pronoun readings in English, based on Chierchia's (1988) Binding Theory in Categorial Grammar. A CCG-like system is devised to implement his ideas, in which syntactic phenomena are represented by the functor-argument relations of categories. These relations trigger the resolution algorithms, so that reflexives and pronominals are resolved succinctly. In sum, this paper gives an efficient resolution algorithm for English pronouns within Categorial Grammar.

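The functor-argument relation mentioned in the abstract above can be illustrated with the two basic application rules of Categorial Grammar. This is only a minimal sketch: the `Functor` type, the `combine` function, and the three-word lexicon are hypothetical names chosen for illustration, and the sketch covers plain forward and backward application, not the paper's pronoun-resolution algorithm.

```python
from typing import NamedTuple, Optional, Union


class Functor(NamedTuple):
    """A complex category: yields `result` after consuming `arg`."""
    result: "Category"
    slash: str  # "/" seeks the argument to the right, "\\" to the left
    arg: "Category"


Category = Union[str, Functor]


def combine(left: Category, right: Category) -> Optional[Category]:
    """Apply forward (X/Y Y => X) or backward (Y X\\Y => X) application."""
    if isinstance(left, Functor) and left.slash == "/" and left.arg == right:
        return left.result
    if isinstance(right, Functor) and right.slash == "\\" and right.arg == left:
        return right.result
    return None  # the two categories do not combine


NP, S = "NP", "S"
VP = Functor(S, "\\", NP)   # S\NP: needs a subject NP on its left
TV = Functor(VP, "/", NP)   # (S\NP)/NP: needs an object NP on its right

# Toy lexicon (an assumption for this example).
lexicon = {"John": NP, "Mary": NP, "admires": TV}

verb_phrase = combine(lexicon["admires"], lexicon["Mary"])
sentence = combine(lexicon["John"], verb_phrase)
```

In a resolution system of the kind the abstract describes, each successful combination would be the natural place to trigger a binding check; that step is not shown here.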

An Algorithm for Predicting the Relationship between Lemmas and Corpus Size

  • Yang, Dan-Hee;Gomez, Pascual Cantos;Song, Man-Suk
    • ETRI Journal
    • /
    • Vol. 22, No. 2
    • /
    • pp.20-31
    • /
    • 2000
  • Much research on natural language processing (NLP), computational linguistics and lexicography has relied on linguistic corpora. In recent years, many organizations around the world have been constructing their own large corpora to achieve corpus representativeness and/or linguistic comprehensiveness. However, there is no reliable guideline as to how large machine-readable corpus resources should be in order to develop practical NLP software and/or complete dictionaries for human and computational use. To shed new light on this issue, we reveal the flaws of several previous studies that aim to predict corpus size, especially those using pure regression or curve-fitting methods. To overcome these flaws, we devise a new mathematical tool, a piecewise curve-fitting algorithm, and then suggest how to determine the tolerance error of the algorithm for good prediction, using a specific corpus. Finally, we illustrate experimentally that the algorithm presented is valid, accurate and very reliable. We are confident that this study can contribute to solving some inherent problems of corpus linguistics, such as corpus predictability, compiling methodology, corpus representativeness and linguistic comprehensiveness.

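The abstract above does not spell out the piecewise curve-fitting algorithm, so the following is only a minimal sketch of how a tolerance-driven piecewise fit could work. It assumes a greedy scheme that extends a straight-line segment until its worst residual exceeds the tolerance. The function names, the greedy strategy, and the toy numbers are assumptions for illustration, not the authors' method; `xs` and `ys` could stand for, say, log corpus size and log lemma count.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx  # assumes the x values are not all identical
    return mean_y - b * mean_x, b


def max_abs_error(xs, ys, a, b):
    """Largest absolute residual of the line a + b * x on the points."""
    return max(abs(a + b * x - y) for x, y in zip(xs, ys))


def piecewise_fit(xs, ys, tolerance):
    """Greedily grow each segment until its worst residual exceeds tolerance.

    Returns a list of (x_start, x_end, a, b) tuples. Assumes xs is sorted
    and strictly increasing.
    """
    segments = []
    start, n = 0, len(xs)
    while start < n - 1:
        end = start + 2  # a segment needs at least two points
        a, b = fit_line(xs[start:end], ys[start:end])
        while end < n:
            cand_a, cand_b = fit_line(xs[start:end + 1], ys[start:end + 1])
            worst = max_abs_error(xs[start:end + 1], ys[start:end + 1], cand_a, cand_b)
            if worst > tolerance:
                break
            a, b = cand_a, cand_b
            end += 1
        segments.append((xs[start], xs[end - 1], a, b))
        start = end - 1  # adjacent segments share their boundary point
    return segments


# Made-up data whose slope changes at x = 3.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.0, 2.0, 4.0, 6.0, 7.0, 8.0, 9.0]
segments = piecewise_fit(xs, ys, tolerance=0.01)

# Extrapolate beyond the data with the last segment only.
_, _, a_last, b_last = segments[-1]
prediction = a_last + b_last * 8.0
```

Extrapolating from the last segment alone is one simple design choice: it lets the most recent growth regime drive the prediction instead of a single curve fitted to the whole range.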

A Term Importance-based Approach to Identifying Core Citations in Computational Linguistics Articles

  • Kang, In-Su
    • 한국컴퓨터정보학회논문지
    • /
    • Vol. 22, No. 9
    • /
    • pp.17-24
    • /
    • 2017
  • Core citation recognition is the task of identifying the influential articles among the prior articles that a scholarly article cites. Previous approaches have employed citing-text occurrence information, textual similarities between the citing and cited articles, etc. This study proposes a term-based approach to core citation recognition, which exploits the importance of individual terms appearing in in-text citations to calculate an influence strength for each cited article. Term importance is computed from several kinds of frequency information: term frequency (tf) in the in-text citations, tf in the citing article, inverse sentence frequency in the citing article, and inverse document frequency in a collection of articles. Experiments using an existing test set of computational linguistics articles show that the term-based approach performs comparably with the previous approaches. The proposed technique could easily be extended by employing other term units such as n-grams and phrases, or by using new term-importance formulae.
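One way the frequency statistics listed above could be combined into an influence strength is sketched below. The product `tf * isf * idf` and the names `inverse_frequency` and `influence_strength` are assumptions made for illustration; the abstract does not give the actual formula, and tf in the citing article is left out for brevity. All token lists are toy data.

```python
import math
from collections import Counter


def inverse_frequency(term, units):
    """log(N / n_t) over a list of token lists; 0.0 if the term never occurs."""
    containing = sum(1 for unit in units if term in unit)
    return math.log(len(units) / containing) if containing else 0.0


def influence_strength(citation_sentences, article_sentences, collection):
    """Sum tf * isf * idf over the terms in one cited article's in-text citations."""
    tf = Counter(tok for sent in citation_sentences for tok in sent)
    return sum(
        count
        * inverse_frequency(term, article_sentences)  # inverse sentence frequency
        * inverse_frequency(term, collection)         # inverse document frequency
        for term, count in tf.items()
    )


# Sentences of the citing article (toy data).
article_sentences = [
    ["we", "extend", "parsing", "model"],
    ["we", "use", "data"],
    ["parsing", "model", "improves"],
    ["we", "thank", "reviewers"],
]
# A small document collection, one token list per article (toy data).
collection = [
    ["we", "extend", "parsing", "model", "use", "data", "improves", "thank", "reviewers"],
    ["we", "use", "data", "corpus"],
    ["we", "data", "results"],
]

# In-text citation sentences for two cited articles, A and B.
score_a = influence_strength([article_sentences[0]], article_sentences, collection)
score_b = influence_strength([article_sentences[1]], article_sentences, collection)
```

With these toy inputs, article A is cited with rarer terms than article B and therefore receives the higher score, which is the intuition behind ranking cited articles by term importance.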

A Transformation-Based Learning Method on Generating Korean Standard Pronunciation

  • Kim, Dong-Sung;Roh, Chang-Hwa
    • 한국언어정보학회:학술대회논문집
    • /
    • 한국언어정보학회, 2007 Regular Conference
    • /
    • pp.241-248
    • /
    • 2007
  • In this paper, we propose a Transformation-Based Learning (TBL) method for generating Korean standard pronunciation. Previous studies of phonological processing have focused on the application of phonological rules and on finite-state automata (Johnson 1984; Kaplan and Kay 1994; Koskenniemi 1983; Bird 1995). In Korean computational phonology, earlier studies have built pronunciation generation systems based on phonological rules (Lee et al. 2005; Lee 1998). This study instead suggests a corpus-based, data-oriented rule-learning method for generating Korean standard pronunciation. To replace rule-based generation with corpus-based generation, we built a corpus that aligns each input with its pronunciation. We conducted an experiment on generating the standard pronunciation with the TBL algorithm, based on this aligned corpus.

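The general shape of transformation-based learning on an aligned corpus can be sketched as follows. This is a toy illustration under stated assumptions, not the system described in the abstract: the single rule template ("change `old` to `new` when the next symbol is `nxt`"), the romanized symbols, and the four-word corpus are all hypothetical. The loop itself follows the usual TBL recipe: start from a baseline, repeatedly pick the rule with the largest net error reduction, apply it, and stop when no rule helps.

```python
def apply_rule(rule, seq):
    """Rewrite `old` as `new` wherever the following symbol is `nxt`."""
    old, new, nxt = rule
    return [
        new if sym == old and i + 1 < len(seq) and seq[i + 1] == nxt else sym
        for i, sym in enumerate(seq)
    ]


def count_errors(predicted, gold):
    """Number of aligned positions where prediction and gold differ."""
    return sum(p != g for pred, ref in zip(predicted, gold) for p, g in zip(pred, ref))


def learn_rules(inputs, gold, max_rules=10):
    """Greedy TBL loop over aligned (input, pronunciation) symbol sequences."""
    current = [list(seq) for seq in inputs]  # baseline: pronounce as spelled
    rules = []
    for _ in range(max_rules):
        # Propose one candidate rule for every position that is still wrong.
        candidates = {
            (cur[i], ref[i], cur[i + 1])
            for cur, ref in zip(current, gold)
            for i in range(len(cur) - 1)
            if cur[i] != ref[i]
        }
        before = count_errors(current, gold)
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):
            after = count_errors([apply_rule(rule, seq) for seq in current], gold)
            if before - after > best_gain:
                best_rule, best_gain = rule, before - after
        if best_rule is None:
            break  # no rule reduces the error any further
        rules.append(best_rule)
        current = [apply_rule(best_rule, seq) for seq in current]
    return rules


def transcribe(seq, rules):
    """Apply the learned rules in the order they were learned."""
    for rule in rules:
        seq = apply_rule(rule, seq)
    return seq


# Toy aligned corpus in a made-up romanization (nasalization of "k").
inputs = [
    ["g", "u", "k", "m", "u", "l"],
    ["m", "eo", "k", "n", "eu", "n"],
    ["h", "a", "k", "n", "yeo", "n"],
    ["g", "u", "k", "eo"],
]
gold = [
    ["g", "u", "ng", "m", "u", "l"],
    ["m", "eo", "ng", "n", "eu", "n"],
    ["h", "a", "ng", "n", "yeo", "n"],
    ["g", "u", "k", "eo"],
]
rules = learn_rules(inputs, gold)
```

Because each rule is scored by net gain, a rule that fixes some positions but breaks others (for example by rewriting the "k" in the last word) would be penalized for the damage it does.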

Using Collective Citing Sentences to Recognize Cited Text in Computational Linguistics Articles

  • Kang, In-Su
    • 한국컴퓨터정보학회논문지
    • /
    • Vol. 21, No. 11
    • /
    • pp.85-91
    • /
    • 2016
  • This paper proposes a collective approach to cited text recognition that exploits a set of citing texts from different articles citing the same article. First, the proposed method uses the group of citing texts to gather highly ranked sentences from the cited article, creating collective information about probable cited sentences. This collective information is then used to determine the final cited sentences among the highly ranked sentences returned by similarity-based cited text recognition. Experiments were conducted on a data set of research articles from the computational linguistics domain. Evaluation results show that the proposed method can improve the performance of similarity-based baseline approaches.
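The two-stage idea described above can be sketched as a simple voting scheme. Everything specific here is an assumption for illustration: Jaccard overlap stands in for the similarity measure, and the `k` and `min_votes` parameters, the function names, and the token lists are hypothetical. The abstract does not say how the collective information is actually built or applied.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard overlap between two token lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def top_k(citing, cited_sentences, k):
    """Indices of the k cited sentences most similar to one citing text."""
    ranked = sorted(
        range(len(cited_sentences)),
        key=lambda i: jaccard(citing, cited_sentences[i]),
        reverse=True,
    )
    return ranked[:k]


def collective_recognition(citing_texts, cited_sentences, k=2, min_votes=2):
    """Keep a top-ranked sentence only if enough citing texts also rank it highly."""
    per_text = [top_k(c, cited_sentences, k) for c in citing_texts]
    votes = Counter(i for ranked in per_text for i in ranked)  # collective information
    return [[i for i in ranked if votes[i] >= min_votes] for ranked in per_text]


# Sentences of the cited article (toy data).
cited_sentences = [
    ["we", "propose", "a", "parser"],
    ["the", "parser", "uses", "beam", "search"],
    ["results", "improve", "on", "the", "treebank"],
    ["code", "is", "released", "online"],
]
# Citing texts taken from three different citing articles (toy data).
citing_texts = [
    ["beam", "search", "parser", "released", "online"],
    ["beam", "search", "parser"],
    ["a", "parser", "with", "beam", "search"],
]
result = collective_recognition(citing_texts, cited_sentences)
```

In this toy run the first citing text ranks sentence 3 highly on its own, but no other citing text agrees, so the collective vote filters it out.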

21세기 세종계획 관용표현 전자사전 구축에 대하여 (On the Development of a Computational Lexical Database of Idiomatic Expressions in the Framework of the 21st Century Sejong Project)

  • 박만규;이선웅;나윤희;이광호
    • 한국정보과학회 언어공학연구회:학술대회논문집(한글 및 한국어 정보처리)
    • /
    • 한국정보과학회언어공학연구회, 13th Conference on Hangul and Korean Information Processing (2001)
    • /
    • pp.334-340
    • /
    • 2001
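  • This paper describes the construction of the Sejong Project electronic dictionary of idiomatic expressions, which is being attempted for the first time this year. Once completed, the dictionary will be the first work to record comprehensive information on idiomatic expressions (morphological, syntactic, semantic, and pragmatic), and it will be a large-scale dictionary of 40,000 headwords that also covers the conventional expressions commonly found in real language data. The dictionary records not only the morphosyntactic composition of idiomatic expressions and their distributional properties, but also the presence or absence of arguments, their structure, patterns of particle attachment, modifier constraints on fixed nouns, patterns of lexical and syntactic variation, constraints on prefinal endings, constraints on final endings, and constraints on sentence type. It also includes information on the thematic role and selectional restrictions of each argument, along with various other semantic and pragmatic information and etymological notation. This paper explains the notation format for each kind of information explicitly, one by one.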


채식주의자: 랭귀지 모델 접근 (A Language Model Approach to "The Vegetarian")

  • 김재준;권준혁;김유래;박명관;송상헌
    • 한국정보과학회 언어공학연구회:학술대회논문집(한글 및 한국어 정보처리)
    • /
    • 한국정보과학회언어공학연구회, 29th Conference on Hangul and Korean Information Processing (2017)
    • /
    • pp.260-263
    • /
    • 2017
  • This paper aims to broaden the range of possible analyses of the Korean-language novel "The Vegetarian" by using computational linguistics tools. A language model, of the kind commonly used for bigram analysis in corpus linguistics, is applied to this Man Booker International Prize-winning novel, and its characteristics are investigated by comparing it with the English-language novel "A Little Life".

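A bigram language model of the kind the abstract refers to can be sketched in a few lines. The add-one smoothing, the `<unk>` handling, the function names, and the toy sentences are all assumptions for illustration; the abstract does not say which toolkit or smoothing method the authors used. Comparing perplexities is one plausible way to contrast two texts with such a model.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their histories over tokenized sentences."""
    histories, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        histories.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    # Everything that can be predicted: seen words, end marker, unknown word.
    vocab = {tok for sent in sentences for tok in sent} | {"</s>", "<unk>"}
    return histories, bigrams, vocab


def perplexity(model, sentences):
    """Per-token perplexity under an add-one smoothed bigram model."""
    histories, bigrams, vocab = model
    log_prob, n = 0.0, 0
    for sent in sentences:
        mapped = [tok if tok in vocab else "<unk>" for tok in sent]
        tokens = ["<s>"] + mapped + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            p = (bigrams[(prev, cur)] + 1) / (histories[prev] + len(vocab))
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)


# Toy training text and two test sentences.
train = [["she", "ate", "rice"], ["she", "ate", "meat"], ["he", "ate", "rice"]]
model = train_bigram(train)

familiar = perplexity(model, [["she", "ate", "rice"]])
unfamiliar = perplexity(model, [["rice", "she", "meat"]])
```

A lower perplexity means the test text looks more like the training text at the bigram level, so `familiar` comes out lower than `unfamiliar` here.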

일치와 이동 (Agree and Move)

  • 박승혁
    • 한국영어학회지:영어학
    • /
    • Vol. 1, No. 4
    • /
    • pp.561-585
    • /
    • 2001
  • It has been claimed recently that the two computational operations Move and Agree of Chomsky (2000, 2001a) should be separated into distinct and independent operations. According to this view, Move is an “operation that applies only to meet an EPP-feature of a functional category.” It is also claimed under that analysis that “a candidate for Move is simply a syntactic object with phonetic content.” The purpose of this short paper is to show that the operation Move should still be viewed as composite; hence it must have the operation Agree as one of its prerequisites. We argue that the EPP feature of T may not be analyzed as an independent feature that triggers overt displacement in syntax. Under Chomsky's (2000, 2001a) theory, displacement in syntax must require the probe-goal (P, G) association before the actual movement takes place. It is shown in this paper that in order for an element $\beta$ to raise to the [Spec, T] position, the $\varphi$-features of T must establish a (P, G) relation with those of $\beta$ prior to movement. In short, Move requires Agree, the EPP feature being dependent on the minimal $\varphi$-feature [person] of nominals.


Agreement and Movement

  • Lee, Hong-Bae
    • 한국영어학회지:영어학
    • /
    • Vol. 1, No. 1
    • /
    • pp.145-162
    • /
    • 2001
  • The operation Move is defined in Chomsky (1999, 2000) as a composite operation consisting of three components: Agree, Identify and Merge, taking Agree as a necessary condition for Move. I therefore call this definition of Move the Agree-based Move. In this paper, I argue that the Agree-based approach to Move cannot be maintained; I claim that the Selection-based approach to Move, in which the EPP-feature is analyzed as an s-selectional property of a head, offers a more natural account of the sentences under consideration. I believe that the three components of Move as defined in (6) happen to co-occur in the derivation of certain sentences, just as the components of the composite transformation called Passivization do in the derivation of a passive sentence like “the city was destroyed by the enemy.” On the basis of these observations, I conclude that Agree and Move should be regarded as separate computational operations; the task of Agree is to erase uninterpretable features of both probe and goal, and that of Move is to satisfy the EPP-feature, which should be taken as an s-selectional feature.
