• Title/Abstract/Keywords: free word-order language

Search results: 16; processing time: 0.023 s

Two-Path Language Modeling Considering Word Order Structure of Korean

  • 신중휘;박재현;이정태;임해창
    • The Journal of the Acoustical Society of Korea / Vol. 27, No. 8 / pp.435-442 / 2008
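  • The n-gram model is well suited to languages such as English, whose word order is grammatically constrained, but not to Korean, whose word order is relatively free. Previous work proposed a twoply HMM that reflects the difficulty of modeling word order between eojeols (Korean spacing units), but it failed to capture the word-order structure between adjacent eojeols. To reflect the word-order characteristics of adjacent eojeols that occur between predicate morphemes, this paper defines a segment unit that combines two eojeols and proposes a two-path language model that estimates probabilities differently depending on context within that segment unit. In experiments, the proposed two-path language model reduced perplexity by 25.68% compared with the previous Korean language model, and by 94.03% at predicate morphemes, the boundaries where eojeols combine.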
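
The abstract above reports its results as perplexity reductions. As a rough illustration of that metric only, the following sketch computes the perplexity of an add-one-smoothed bigram model and the relative reduction between two perplexity values. This is not the paper's two-path model; the function names, the smoothing choice, and the reading of "25.68%" as a relative reduction are all assumptions made for this example.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams over tokenized sentences with boundary markers."""
    histories, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        histories.update(padded[:-1])               # every token that has a successor
        bigrams.update(zip(padded[:-1], padded[1:]))
    # Predictable symbols: all training words plus the end-of-sentence marker.
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return histories, bigrams, vocab


def perplexity(sentences, histories, bigrams, vocab):
    """Perplexity under add-one smoothing (an assumption, chosen for brevity)."""
    log_prob, n = 0.0, 0
    v = len(vocab)
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, cur in zip(padded[:-1], padded[1:]):
            p = (bigrams[(prev, cur)] + 1) / (histories[prev] + v)
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)


def relative_reduction(baseline_ppl, new_ppl):
    """Percentage by which new_ppl is lower than baseline_ppl."""
    return 100.0 * (baseline_ppl - new_ppl) / baseline_ppl


# Hypothetical toy corpus, used only to exercise the functions.
corpus = [["a", "b"]]
h, b, vocab = train_bigram(corpus)
toy_ppl = perplexity(corpus, h, b, vocab)
```

Lower perplexity means the model assigns higher probability to the test text, so a reduction figure compares a new model against a baseline on the same data.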

A Simple Syntax for Complex Semantics

  • Lee, Kiyong
    • Korean Society for Language and Information: Conference Proceedings / Korean Society for Language and Information, 2002, Language, Information, and Computation: Proceedings of The 16th Pacific Asia Conference / pp.2-27 / 2002
  • As part of a long-range project that aims at establishing database-theoretic semantics as a model of computational semantics, this presentation focuses on the development of a syntactic component for processing strings of words or sentences to construct semantic data structures. For design and modeling purposes, the present treatment will be restricted to the analysis of some problematic constructions of Korean involving semi-free word order, conjunction and temporal anchoring, and adnominal modification and antecedent binding. The present work relies heavily on Hausser's (1999, 2000) SLIM theory of language, which is based on surface compositionality, time-linearity, and two other conditions on natural language processing. Time-linear syntax for natural language has been shown to be conceptually simple and computationally efficient. The associated semantics is complex, however, because it must deal with situated language involving interactive multi-agents. Nevertheless, by processing input word strings in a time-linear mode, the syntax can incrementally construct the necessary semantic structures for relevant queries and valid inferences. The fragment of Korean syntax will be implemented in Malaga, a C-type implementation language that was enriched for both programming and debugging purposes and that was made particularly suitable for implementing Left-Associative Grammar. This presentation will show how the system of syntactic rules with constraining subrules processes Korean sentences in a step-by-step time-linear manner to incrementally construct semantic data structures that mainly specify relations with their argument, temporal, and binding structures.


Combinatory Categorial Grammar for Korean

  • 한성국;박찬곤
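    • Korean Institute of Information Scientists and Engineers, Language Engineering Research Group: Conference Proceedings (Hangul and Korean Information Processing) / Korean Institute of Information Scientists and Engineers, Language Engineering Research Group, 1990, 2nd Conference on Hangul and Korean Information Processing / pp.164-171 / 1990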
  • A commutative productive category is proposed as an extension to the current CCG for the syntactic analysis of free word order languages like Korean. Introducing this sort of category is quite natural for the categorial lexicon and functional operations. We present the theoretical basis of the productive category and examine its linguistic applicability through typical syntactic structures of Korean.


Analysis and Computational Processing of Quantifier Floating in Korean

  • 이진복;박종철
    • Journal of the Korean Society for Language and Information: Language and Information / Vol. 7, No. 1 / pp.1-22 / 2003
  • Quantifier floating is a much-studied phenomenon in natural languages in which quantifying expressions may appear in positions other than their original prenominal one. It is especially prominent in languages such as Korean that allow more or less free word order. We find that, in addition to what is described in the literature, there are other remarkable regularities in the way the language allows quantifiers to “float” with respect to various constructions including coordination, relative clauses, and embedded clauses. These regularities are captured syntactically in a combinatory categorial grammar (CCG) framework for Korean. We also show how to derive semantic representations for Korean quantifier floating in the same CCG framework.


Language and Symbolic Reference in Whitehead's Philosophy

  • 문창옥
    • Lingua Humanitatis (인문언어) / Vol. 6 / pp.147-166 / 2004
  • Whitehead's discussion of language is not to be found in any one book or article. It is interwoven with his discussion of many other questions. He was, however, greatly concerned with the problem of symbolism in general and the uses of language. He regards language, spoken or written, as an instrument devised by men to aid them in their adjustment to the environment in which they live. Language is used for many specific purposes in the process of this adjustment. Words are employed not only to refer to data and to express emotions. They may also be used to record experiences, and thoughts about these experiences. Words also function as instruments in the organization of experiences as they are considered in retrospect. Thus words free us from the bondage of the immediate. Whitehead's theory of meaning is implicit in his discussion of the functions of language. According to him, the human mind is functioning symbolically when some components of its experience elicit consciousness, beliefs, emotions, and usages, respecting other components of its experiences. The former set of components are the 'symbols', and the latter set constitute the 'meaning' of the symbols. Whitehead points out that one word may have several meanings, i.e. refer to several different data. Thus, in order to understand the meaning to which a word refers, it is sometimes very important to appreciate the system of thought within which a person is operating. Further, Whitehead's discussion of language includes a number of cogent warnings about the deficiencies of language, and hence the need for great care in the use of words. In fact, language developed gradually. For the most part we have created words designed to deal with practical problems. Attention focuses on the prominent features in a situation, in particular the changing aspects of things. With reference to such data our words are relatively adequate. However, this issues in an unfortunate superficiality. The enduring, the subtle, the complex and the general aspects of the universe do not have adequate verbal representation. For this reason, Whitehead's position concerning the uses of language in speculative philosophy is stated with pungent directness. The uncritical trust in the adequacy of language is one of the main errors to which philosophy is liable. Since ordinary language does not do justice to the generalities, profundities and complexities of life, it is obvious that philosophy requires new words and phrases, or at least the revision of familiar words and phrases. Proceeding to develop the theme, Whitehead contends that words and phrases must be stretched towards a generality foreign to their ordinary usage. In the same vein, Whitehead refers to the need to realize that language, which is the tool of philosophy, needs to be redesigned just as in physical science available physical apparatus needs to be redesigned. But even these words and phrases, stretched or redesigned, are never completely adequate in philosophical speculations. They are, in his opinion, merely a great improvement over ordinary language or the language of science, mathematics or symbolic logic.


Romanian-Lexicon-Based Sentiment Analysis for Assesing Teachers' Activity

  • Barila, Adina;Danubianu, Mirela;Gradinaru, Bogdanel
    • International Journal of Computer Science & Network Security / Vol. 22, No. 10 / pp.43-50 / 2022
  • Students' feedback is important for measuring and improving teaching performance. Many teacher performance evaluation systems are based on responses to closed questions, but free-text answers can contain useful information that should be explored. In this paper we present a lexicon-based sentiment analysis to explore students' text feedback. The data was collected from a system for the evaluation of teachers by students, developed and used in our university. The students' comments are in Romanian, so we built a Romanian sentiment word lexicon. We used it to categorize the feedback text as positive, negative or neutral. In addition, we added a new polarity, indifferent, in order to categorize blank and "I don't answer" responses.
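
A minimal sketch of the kind of lexicon-based classification the abstract above describes, with the four polarities it names. The word lists are a tiny hypothetical lexicon invented for this example, not the authors' lexicon, and the scoring rule (positive word count minus negative word count), the "I don't answer" string match, and the `classify` function name are likewise assumptions.

```python
# Hypothetical miniature lexicon; the paper's actual Romanian lexicon is not reproduced here.
POSITIVE = {"bun", "excelent", "clar"}
NEGATIVE = {"slab", "plictisitor", "confuz"}

# Responses treated as "no answer" (assumed forms; the abstract mentions blank
# and "I don't answer" responses).
NO_ANSWER = {"", "i don't answer"}


def classify(feedback):
    """Return 'positive', 'negative', 'neutral', or 'indifferent' for one response."""
    text = feedback.strip().lower()
    if text in NO_ANSWER:
        return "indifferent"
    # Split on whitespace and drop surrounding punctuation from each token.
    words = [w.strip(".,!?;:") for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Keeping "indifferent" separate from "neutral" distinguishes students who wrote nothing from students who wrote something without sentiment-bearing words.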

Syntactic and Semantic Disambiguation for Interpretation of Numerals in the Information Retrieval

  • 문유진
    • Journal of the Korea Society of Computer and Information / Vol. 14, No. 8 / pp.65-71 / 2009
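  • Natural language processing is essential for efficiently searching the vast amount of information produced by information retrieval on the World Wide Web. This paper proposes a disambiguation technique for determining the meaning of numerals in text. The technique uses a context-free grammar together with chart parsing to interpret the affixes associated with numeral strings, and it is designed to determine meaning systematically on the basis of N-gram words. In addition, the system uses a POS tagger so that the constraints on trigram words are recognized automatically, which makes numeral interpretation progressively more efficient. In experiments with the proposed numeral interpretation system, the frequency-proportional method achieved a precision of 86.3% and the condition-proportional method achieved a precision of 82.8%.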