• Title/Summary/Keyword: Collocations

Extracting Collocations Using Entropy in Korean (엔트로피를 이용한 한국어 연어 추출)

  • 박경미;송만석
    • Proceedings of the Korean Information Science Society Conference / 2002.04b / pp.451-453 / 2002
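  • Collocations are word sequences that habitually occur together. Because it is more efficient to process them as a unit than to split them into individual words, they serve as useful information in machine translation, speech recognition, and similar tasks. To extract such collocations, this paper considers two cases. First, treating a collocation as a word sequence that appears frequently in a corpus, word sequences whose entropy is at or above a threshold are extracted as collocations. Second, to extract collocations that are subject to syntactic constraints, the entropy of a word that constrains the word preceding or following it is computed, and if that entropy is below a threshold, the word sequence containing the word is extracted as a collocation. The experiments were carried out on the POS-tagged HANTCE corpus; with the first method, word sequences with an entropy of 2 or more were also used to derive discontiguous collocations.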

  • PDF
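The abstract above does not say which distribution the entropy is taken over, so the following is only a minimal sketch under stated assumptions: entropy is measured in bits over the words that immediately follow a sequence, and the helper names (`entropy_bits`, `follower_counts`, `select_by_entropy`), the `min_count` cutoff, and the 1-bit threshold for the second case are hypothetical. It illustrates the two cases (high entropy around a frequent sequence, low entropy after a constraining word), not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def entropy_bits(counts):
    """Shannon entropy, in bits, of a Counter of frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def follower_counts(tokens, n):
    """Map each n-gram (as a tuple) to a Counter of the words that follow it."""
    followers = defaultdict(Counter)
    for i in range(len(tokens) - n):
        followers[tuple(tokens[i:i + n])][tokens[i + n]] += 1
    return followers


def select_by_entropy(tokens, n, keep, min_count=2):
    """Return (n-gram, entropy) pairs whose follower entropy satisfies `keep`."""
    selected = []
    for ngram, counts in follower_counts(tokens, n).items():
        if sum(counts.values()) < min_count:
            continue  # too rare to estimate a distribution (assumed cutoff)
        h = entropy_bits(counts)
        if keep(h):
            selected.append((ngram, h))
    return selected


# Toy corpus: "a b" is followed by four different words; "a" is always followed by "b".
tokens = "a b c a b d a b e a b f".split()

# Case 1: sequences whose following context is varied (entropy >= 2 bits).
free_sequences = select_by_entropy(tokens, n=2, keep=lambda h: h >= 2.0)

# Case 2: words that tightly constrain the next word (entropy < 1 bit, assumed threshold).
constraining = select_by_entropy(tokens, n=1, keep=lambda h: h < 1.0)
```

On a POS-tagged corpus the tokens would be word/tag pairs rather than bare strings, and the same functions could be run over the reversed token list to measure the entropy of preceding words.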

'Because of Doing' and 'Because of Happening': A Corpus-based Analysis of Korean Causal Conjunctives, -nula(ko) and -nun palamey

  • Oh, Sang-Suk
    • Language and Information / v.8 no.2 / pp.131-147 / 2004
  • This paper investigates the two Korean causal conjunctive suffixes, -nula(ko) and -nun palamey, based on corpus linguistic analysis. Many of the linguistic accounts available, both in pedagogical reference works and in the linguistics literature, provide incomplete analyses of these suffixes because they rely on fabricated linguistic data. Using naturally occurring linguistic data, this paper examines the syntactic and semantic structures of the two causal suffixes through three areas of corpus linguistic analysis: token frequencies, collocations, and semantic prosody. An analysis based on concordance data reveals that the two causal connectives, -nula(ko) and -nun palamey, have more differences than similarities in terms of syntactic and semantic constraints. The idiosyncratic structures of the two suffixes are discussed in terms of the same-subject condition, verb selection, the same-agent condition, the synchronicity condition, and negative semantic prosody.

  • PDF

Parallels between Korean Verbs and Nouns in Subcategorization (한국어 동사와 명사사이의 하위범주화에 있어서의 평행성)

  • 노용균
    • Language and Information / v.1 / pp.27-65 / 1997
  • Nouns in the Korean language are subcategorized for various frames (called SUBCAT lists) in much the same way as verbs are. Assuming a monostratal grammar and building on analyses of various 'little elements' as clitics, such as the ones given by No (1991), Chae (1995, 1996), and Oh (1991), I delineate the ranges of SUBCAT lists for Korean verbs and nouns and show that the two word classes have heavily overlapping frames. Twenty-five SUBCAT lists are identified for verbs and twenty-four for nouns, of which twenty-three find associated lexical items in both. By way of justification, I offer analyses of noun-verb collocations in terms of the new five-valued syntactic feature COLLOC along with SUBCAT, which subsume 'light verb' constructions. It is hoped that this work will give clear syntactic underpinnings to those who are concerned with practical lexicography.

  • PDF

Semantic Prosody and Meaning Equivalence: Is Korean pin konggan Equivalent to ‘Empty Space’ or ‘Blank Space’? (의미운률과 의미 등가성: ‘빈 공간’은 ‘empty space’인가 ‘blank space’인가?)

  • 조의연
    • Korean Journal of English Language and Linguistics / v.3 no.4 / pp.589-609 / 2003
  • The purpose of this paper is to show that lexical equivalence in translation can be achieved when it is based on the semantic prosodies of lexical items. This paper examines the semantic prosodies of two seemingly synonymous English adjectives, ‘empty’ and ‘blank’, on the basis of the corpus given in Cobuild English Collocations on CD-ROM and proposes that they differ in terms of spatial dimensions. Thus, when the Korean equivalent pin, derived from the verb pita, is translated into English, the syntagmatic phraseological environments of the Korean adjective must be taken into account to attain equivalence between the source and target languages. The relevant Korean corpus data were taken from the 21st Century Sejong Plan (2002). Out of 12 examples of pin konggan, five appear to be equivalent to ‘blank’ and seven to ‘empty.’ This five-to-seven ratio in usage indicates that the equivalence problem concerning the lexical item pin is not a trivial matter in translation.

  • PDF

Practical Target Word Selection Using Collocation in English to Korean Machine Translation (영한번역 시스템에서 연어 사용에 의한 실용적인 대역어 선택)

  • 김성묵
    • Journal of Korea Society of Industrial Information Systems / v.5 no.2 / pp.56-61 / 2000
  • The quality of English-to-Korean machine translation depends on how well it handles target word selection for verbs, which are highly ambiguous. Verb sense disambiguation can be done using collocations, but constructing verb collocations requires considerable effort and expense, so existing methods should be examined from a practical viewpoint. This paper describes a practical method of target word selection that uses existing collocations and a semantic distance computed from minimal semantic features of nouns.

  • PDF

The Study on the Model of Extracting Collocations from Corpus in Korean Using the Statistical Tools (통계 기법을 이용한 연어 추출 모형 연구)

  • Ahn, Sung-Min
    • Annual Conference on Human and Language Technology / 2010.10a / pp.162-165 / 2010
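  • Among the kinds of phrasal information that arise from co-occurrence, research on collocations can contribute substantially to the development of applied linguistics. A collocation is a phrasal construction with a high co-occurrence probability in which the lexical items stand in a restricted combinatory relation. Research on such collocational constructions is attracting growing interest, particularly in fields such as machine translation and lexicography. This study proposes using several statistical techniques, including the t-test, mutual information, and conditional probability, to extract collocations. We examine how collocation extraction changes when each technique is applied and look for the most appropriate technique, with the aim of suggesting a direction for future collocation extraction.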

  • PDF
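The three measures named in the abstract above can be sketched for adjacent word pairs as follows. This is an illustration using the common textbook formulations (t-score with the variance approximated by the observed bigram probability, pointwise mutual information in bits, and the conditional probability of the second word given the first), not the model from the paper; the function name `bigram_scores` and the toy sentence are assumptions.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Score each adjacent word pair with t-score, PMI, and P(w2 | w1)."""
    n = len(tokens)  # corpus size, used as the sample size for all estimates
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), c12 in bigrams.items():
        p12 = c12 / n                                   # observed bigram probability
        expected = (unigrams[w1] / n) * (unigrams[w2] / n)  # probability under independence
        scores[(w1, w2)] = {
            # t-score: (observed - expected) / sqrt(s^2 / n), with s^2 ~ p12
            "t": (p12 - expected) / math.sqrt(p12 / n),
            # pointwise mutual information in bits
            "pmi": math.log2(p12 / expected),
            # conditional probability of w2 given w1 (approximate: w1's count
            # includes a corpus-final occurrence that starts no bigram)
            "cond_prob": c12 / unigrams[w1],
        }
    return scores


tokens = "strong tea strong tea weak tea strong coffee".split()
scores = bigram_scores(tokens)

# Rank candidate pairs by one measure; swapping the key shows how the ranking changes.
ranked_by_t = sorted(scores, key=lambda pair: scores[pair]["t"], reverse=True)
```

Comparing the rankings produced by each key is one simple way to see how the choice of measure changes which pairs are extracted: PMI tends to favor rare pairs, while the t-score favors frequent ones.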

A Corpus-based Lexical Analysis of the Speech Texts: A Collocational Approach

  • Kim, Nahk-Bohk
    • English Language & Literature Teaching / v.15 no.3 / pp.151-170 / 2009
  • Recently, speech texts have been increasingly used for English education because of their various advantages as language teaching and learning materials. The purpose of this paper is to analyze speech texts using a corpus-based lexical approach and to suggest some productive methods that use English speaking or writing as the main resource for the course, along with actual classroom adaptations. First, this study shows that a speech corpus has some unique features, such as different selections of pronouns, nouns, and lexical chunks, in comparison with a general corpus. Next, from a collocational perspective, the study demonstrates that the speech corpus contains a wide variety of the collocations and lexical chunks that a number of linguists describe (Lewis, 1997; McCarthy, 1990; Willis, 1990). In other words, the speech corpus suggests not only that speech texts have considerable lexical potential that could be exploited to facilitate chunk-learning, but also that learners are not very likely to unlock this potential autonomously. Based on this result, teachers can develop a learners' corpus and use it by chunking the speech text. This new approach of adapting speech samples as important materials for developing college students' speaking or writing ability should be implemented as shown in samplers. Finally, to foster learners' productive skills more communicatively, a few practical suggestions are made, such as chunking and windowing chunks of speech and presentation, and the pedagogical implications are discussed.

  • PDF

Java API Pattern Extraction and Recommendation using Collocation Analysis (연어 관계 분석을 통한 Java API 패턴 추출 및 추천 방법)

  • Kwon, Chanwoo;Hwang, Sangwon;Nam, Youngkwang
    • Journal of KIISE / v.44 no.11 / pp.1165-1177 / 2017
  • Many developers rely on specific APIs to develop software. To learn how a particular API is used, a developer can consult the website that provides the API or search the web. However, the site that provides the API does not necessarily offer guidance on how to use it, and in many cases the guidance is only partial. In this paper, we propose a novel system, JACE (Java AST collocation-pattern extractor), as a method to reuse commonly used code as a supplement. JACE extracts API call nodes and collocation patterns and analyzes the relations between the collocations to extract significant API patterns from source code. The following experiment was performed to verify the accuracy of a defined pattern: 794 open source projects were analyzed to extract about 15M API call nodes. Then, an Eclipse plug-in test program was used to retrieve patterns for the top 10 classes of API call nodes. Finally, the code search results from the reference pages of the API classes and from Searchcode [1] were compared with the test program results.

Target Word Selection for English-Korean Machine Translation System using Multiple Knowledge (다양한 지식을 사용한 영한 기계번역에서의 대역어 선택)

  • Lee, Ki-Young;Kim, Han-Woo
    • Journal of the Korea Society of Computer and Information / v.11 no.5 s.43 / pp.75-86 / 2006
  • Target word selection is one of the most important and difficult tasks in English-Korean machine translation, and it affects the translation accuracy of machine translation systems. In this paper, we present a new approach to selecting a Korean target word for an English noun with translation ambiguities using multiple knowledge sources: verb frame patterns, sense vectors based on collocations, statistical Korean local context information, and co-occurring POS information. Verb frame patterns constructed from a dictionary and a corpus play an important role in resolving the sparseness problem of collocation data. A sense vector is the set of collocation data observed when an English word with target selection ambiguity is translated to a specific Korean target word. Statistical Korean local context information is N-gram information generated from a Korean corpus. The co-occurring POS information is a statistically significant POS clue that appears with the ambiguous word. The experiment showed promising results for diverse sentences from web documents.

  • PDF

The Impact of an Ontological Knowledge Representation on Information Retrieval: An Evaluation Study of OCLC's FRBR-Based FictionFinder (정보검색에 온톨로지 지식 표현이 미치는 영향에 대한 연구: OCLC의 FRBR기반 FictionFinder의 평가를 중심으로)

  • Cho, Myung-Dae
    • Journal of the Korean Society for Information Management / v.25 no.2 / pp.183-198 / 2008
  • With the aim of enriching existing catalogues with FRBR (Functional Requirements for Bibliographic Records), this paper evaluates the impact of a bibliographic ontology on overall system performance in the field of literature. To do this, OCLC's FictionFinder (http://fictionfinder.oclc.org) was selected and qualitatively evaluated. In this study, 40 university seniors evaluated the following three aspects using the 'transferring thoughts onto paper' method: 1) In which ways is this FRBR-aware bibliographic ontology helpful? 2) Are the things it initially set out to help with actually being helped? 3) Would users seeking one work in particular also see all other related works? In conclusion, this study revealed that, as Cutter claimed in his second rule of the library, collocations give added value to users, and the ontology overall provides a better interface and greater usefulness. It also revealed that evaluating a system with a qualitative methodology helps to build a full picture of the system and to grasp the information needs of users while the system is being developed. Qualitative evaluations, therefore, could be used as indicators for the evaluation of any information retrieval system.