• Title/Summary/Keyword: corpus linguistics

Collocation Networks and Covid-19 in Letters to the Editor: A Malaysian Case Study

  • Joharry, Siti Aeisha;Turiman, Syamimi
    • Asia Pacific Journal of Corpus Research / v.1 no.1 / pp.1-30 / 2020
  • The present study examines the language used to talk about the global coronavirus pandemic during a three-month period of the movement control order in Malaysia. More specifically, a corpus of online letters to the editor of a popular national newspaper was collected during the time in which the official quarantine instruction was initiated, yielding a total of 303 online letters written by Malaysians, which were analyzed using corpus linguistics techniques. For this purpose, the latest version of #LancsBox, 5.0 (Brezina et al., 2020), is used to analyze patterns of language surrounding the portrayal of Covid-19 and to visualize them as collocation networks. The findings present 25 statistically significant collocates whose relationships reveal what the letters are about and thus reflect how Malaysians perceive and receive news about the pandemic during this time. Recurring topics and expressions include describing the virus through metaphorical language (Covid-19 does not discriminate), preparing for an economic fallout (Prihatin Economic Stimulus Package), and a preference for describing Covid-19 as a pandemic (impacts of the Covid19 pandemic) rather than an outbreak (first/second/third wave of the outbreak). The implications of the study resonate with the findings of Azizan et al. (2020), where constructions of positive discourse among Malaysian writers may reflect the culture and society that make up the nation.
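The idea of a statistically ranked collocate, as used in the abstract above, can be sketched in a few lines. This is a minimal illustration only: the abstract does not say which association measure or window size was used in #LancsBox, so the mutual information (MI) formula, the `span` and `min_freq` parameters, and the toy token list below are all assumptions.

```python
import math
from collections import Counter


def collocates(tokens, node, span=5, min_freq=2):
    """Rank words co-occurring with `node` within +/- `span` tokens by MI score.

    MI = log2(observed / expected), with
    expected = f(node) * f(word) * window_size / corpus_size.
    This is one common formulation, assumed here for illustration.
    """
    n = len(tokens)
    freq = Counter(tokens)
    co_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Count every token inside the window around this occurrence of the node.
        for j in range(max(0, i - span), min(n, i + span + 1)):
            if j != i:
                co_freq[tokens[j]] += 1
    scores = {}
    for word, observed in co_freq.items():
        if observed < min_freq:
            continue  # skip rare pairs, which inflate MI
        expected = freq[node] * freq[word] * (2 * span) / n
        scores[word] = math.log2(observed / expected)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, not taken from the study's corpus.
toy = "the covid pandemic hit the economy and the covid pandemic spread".split()
for word, mi in collocates(toy, "covid", span=1):
    print(f"{word}\t{mi:.3f}")
```

A collocation network is then a graph whose nodes are words and whose edges link each node word to its top-ranked collocates; repeating the call for each collocate extends the network one step further.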

Semantic Prosody and Meaning Equivalence: Is Korean pin konggan Equivalent to ‘Empty Space’ or ‘Blank Space’? (의미운률과 의미 등가성: ‘빈 공간’은 ‘empty space’인가 ‘blank space’인가?)

  • 조의연
    • Korean Journal of English Language and Linguistics / v.3 no.4 / pp.589-609 / 2003
  • The purpose of this paper is to show that lexical equivalency in translation can be achieved when it is based on the semantic prosodies of lexical items. This paper examines the semantic prosodies of two seemingly synonymous English adjectives, ‘empty’ and ‘blank’, on the basis of the corpus given in Cobuild English Collocations on CD-ROM and proposes that they differ in terms of spatial dimensions. Thus, when the Korean equivalent pin, derived from the verb pita, is translated into English, the syntagmatic phraseological environments of the Korean adjective must be taken into account to attain equivalency between the source and target languages. The relevant Korean corpus data were taken from the 21st Century Sejong Plan (2002). Out of 12 examples of pin konggan, five appear to be equivalent to ‘blank’ and seven to ‘empty.’ This five-to-seven split in usage indicates that the equivalency problem concerning the lexical item pin is not a trivial matter in translation.

A Study on the Lexicalization of {Geuraegajigo} Based on the Spontaneous Speech Corpus (자유 발화 자료에 나타난 {그래가지고}의 접속 부사화)

  • Ha, Youngwoo;Shin, Jiyoung
    • Korean Linguistics / v.64 / pp.195-223 / 2014
  • The aim of this paper is to study the morphemization of {Geuraegajigo} based on a spontaneous speech corpus. For this purpose, the distributions, the semantic functions, and the intonational phrase patterns of the connective {Geuraegajigo} have been analyzed based on the corpus. The results are as follows. First, coalescence accompanying the morphemization process was found, resulting in many variants. Second, {Geuraegajigo} has three functions: [Direct/Indirect interrelationship], [Enumerate conjunction], and [Discourse marker]. This semantic and functional diversity has many similarities with that of conjunctive adverbs. Lastly, the intonational phrase (IP) patterns of {Geuraegajigo} accord with those of conjunctive adverbs. In particular, the discourse-strategic IP pattern is connected with the short variant type. In conclusion, {Geuraegajigo} has finished turning into a conjunctive adverb through morphemization.

The Statistical Relationship between Linguistic Items and Corpus Size (코퍼스 빈도 정보 활용을 위한 적정 통계 모형 연구: 코퍼스 규모에 따른 타입/토큰의 함수관계 중심으로)

  • 양경숙;박병선
    • Language and Information / v.7 no.2 / pp.103-115 / 2003
  • In recent years, many organizations have been constructing their own large corpora to achieve corpus representativeness. However, there is no reliable guideline as to how large a corpus should be, especially for Korean corpora. In this study, we have devised a new statistical model, ARIMA (Autoregressive Integrated Moving Average), for predicting the relationship between linguistic items (the number of types) and corpus size (the number of tokens), overcoming the major flaws of several previous studies on this issue. Finally, we illustrate that the ARIMA model presented is valid, accurate and very reliable. We are confident that this study can contribute to solving some inherent problems of corpus linguistics, such as corpus predictability, corpus representativeness and linguistic comprehensiveness.

GNI Corpus Version 1.0: Annotated Full-Text Corpus of Genomics & Informatics to Support Biomedical Information Extraction

  • Oh, So-Yeon;Kim, Ji-Hyeon;Kim, Seo-Jin;Nam, Hee-Jo;Park, Hyun-Seok
    • Genomics & Informatics / v.16 no.3 / pp.75-77 / 2018
  • Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. A text corpus for this journal, annotated with various levels of linguistic information, would be a valuable resource, as information extraction requires syntactic, semantic, and higher levels of natural language processing. In this study, we publish our new corpus, GNI Corpus version 1.0, extracted and annotated from full texts of Genomics & Informatics using an NLTK (Natural Language Toolkit)-based text mining script. This preliminary version of the corpus could be used as a training and testing set for a system that serves a variety of functions in future biomedical text mining.

A Study on the Diachronic Evolution of Ancient Chinese Vocabulary Based on a Large-Scale Rough Annotated Corpus

  • Yuan, Yiguo;Li, Bin
    • Asia Pacific Journal of Corpus Research / v.2 no.2 / pp.31-41 / 2021
  • This paper presents a quantitative analysis of the diachronic evolution of ancient Chinese vocabulary by constructing a large-scale, roughly annotated corpus and computing statistics over it. The texts from Si Ku Quan Shu (a collection of ancient Chinese books) are automatically segmented to obtain ancient Chinese vocabulary with time information, which is used to compute statistics on word frequency, standardized type/token ratio, and the proportions of monosyllabic and disyllabic words. The data analysis yields four findings. First, the high-frequency words in ancient Chinese are stable to a certain extent. Second, there is no obvious trend toward disyllabic words in ancient Chinese vocabulary. Third, the Northern and Southern Dynasties (420-589 AD) and the Yuan Dynasty (1271-1368 AD) are probably the two periods with the most abundant vocabulary in ancient Chinese. Finally, the high-frequency words unique to each dynasty are mainly official titles with real power. This study breaks away from the qualitative methods used in traditional research on the history of the Chinese language and instead uses quantitative methods to draw macroscopic conclusions from a large-scale corpus.
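The standardized type/token ratio mentioned in the abstract above is usually computed by averaging the plain type/token ratio over consecutive equal-sized chunks, so that texts of different lengths can be compared. The sketch below assumes that definition; the chunk size of 1000 tokens is a conventional default and is not taken from the paper, and the function name `sttr` is hypothetical.

```python
def sttr(tokens, chunk_size=1000):
    """Mean type/token ratio over consecutive full chunks of `chunk_size` tokens.

    A trailing partial chunk is ignored so every ratio uses the same denominator.
    """
    ratios = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        ratios.append(len(set(chunk)) / chunk_size)
    if not ratios:
        raise ValueError("text is shorter than one chunk")
    return sum(ratios) / len(ratios)


# Toy data: two full chunks of three tokens; the trailing "e" is dropped.
toy = ["a", "b", "a", "a", "c", "d", "e"]
print(sttr(toy, chunk_size=3))
```

For a diachronic comparison like the one described, the same function would be applied separately to the segmented tokens of each period.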

Corpus Linguistics as Necessary Concept for Korean Lexicography (뭉치 언어학 : 사전 편찬의 필수적 개념)

  • Lee, Sang-Sup
    • Annual Conference on Human and Language Technology / 1989.10a / pp.73-76 / 1989
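  • Existing Korean dictionaries have a fatal flaw in that they are not based on substantive empirical research into Korean as a natural language. Corpus linguistics (뭉치 언어학), recently developed and applied in Europe, has benefited from the rapid advance of computers and makes it possible to devise methods for processing large volumes of natural language data from multiple angles. For example, the COBUILD project at the University of Birmingham in the UK succeeded in compiling an English dictionary based on an entirely new concept. Believing that Korean dictionaries can likewise be compiled by adopting corpus linguistic methods, the author presents examples from a small corpus that the author compiled.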

-eullanjira Construction of the Southwestern Dialect in Korea (서남방언의 '-을란지라' 구문 연구)

  • KIM, Ji-eun
    • Korean Linguistics / v.74 / pp.1-24 / 2017
  • This paper investigated the -eullanjira sentence as a kind of construction in the Southwestern dialect of Korean. Five informants were selected to form the main corpus of -eullanjira. By analyzing the corpus, its semantic, syntactic and morphological characteristics were identified. First, a construction grammar view was adopted to capture the semantic and syntactic characteristics of -eullanjira. The construction of -eullanjira was established as "Xdo Yeullanjira Z". Syntactically, -do was found to be a common auxiliary particle, which allowed nouns, adverbs, verbs and adjectives to appear in the position of X, while only verbs and adjectives could appear in the position of Y. Subject-honorific, causative and passive prefinal endings could coexist with Y, while tense and modal prefinal endings could not. Z was an embedded clause, which had the semantic feature [-DOUBT], meaning 'it should be done undoubtedly'. The formation of -eullanjira was next examined both diachronically and synchronically. It was found that Middle Korean had a conjunctive ending corresponding to -eullanjira, namely -landai. Finally, -eullanjira was newly analyzed as [[-eulla-]+[-n-ji-ra]].

KKMA : A Tool for Utilizing Sejong Corpus based on Relational Database (꼬꼬마 : 관계형 데이터베이스를 활용한 세종 말뭉치 활용 도구)

  • Lee, Dong-Joo;Yeon, Jong-Heum;Hwang, In-Beom;Lee, Sang-Goo
    • Journal of KIISE:Computing Practices and Letters / v.16 no.11 / pp.1046-1050 / 2010
  • Corpora are widely used as a fundamental resource for various purposes in linguistic studies. There are several large corpora in Korea, such as the Sejong corpus. However, it is hard to find tools for utilizing such large corpora. In this paper, we propose a method of utilizing the Sejong corpus based on a relational database. We designed a relational database schema to store the corpus and implemented a Web-based application so that many researchers can easily access and utilize the Sejong corpus.
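Storing an annotated corpus in a relational database, as the abstract above describes, could look roughly like the sketch below. The abstract does not give KKMA's actual schema, so the table names, columns, and the `build_db` and `tag_counts` helpers are all hypothetical, and the sample rows are toy data rather than Sejong corpus content.

```python
import sqlite3

# Hypothetical two-table schema: one row per sentence, one row per tagged morpheme.
SCHEMA = """
CREATE TABLE sentence (
    id   INTEGER PRIMARY KEY,
    text TEXT NOT NULL
);
CREATE TABLE morpheme (
    id          INTEGER PRIMARY KEY,
    sentence_id INTEGER NOT NULL REFERENCES sentence(id),
    position    INTEGER NOT NULL,
    form        TEXT NOT NULL,
    tag         TEXT NOT NULL
);
CREATE INDEX idx_morpheme_form ON morpheme(form);
"""


def build_db(rows):
    """Load (text, [(form, tag), ...]) pairs into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for sent_id, (text, morphemes) in enumerate(rows, start=1):
        conn.execute("INSERT INTO sentence (id, text) VALUES (?, ?)", (sent_id, text))
        conn.executemany(
            "INSERT INTO morpheme (sentence_id, position, form, tag) VALUES (?, ?, ?, ?)",
            [(sent_id, pos, form, tag) for pos, (form, tag) in enumerate(morphemes)],
        )
    return conn


def tag_counts(conn, form):
    """Return how often `form` occurs with each tag, most frequent first."""
    cur = conn.execute(
        "SELECT tag, COUNT(*) FROM morpheme WHERE form = ? "
        "GROUP BY tag ORDER BY COUNT(*) DESC, tag",
        (form,),
    )
    return cur.fetchall()


# Toy rows with illustrative tags.
conn = build_db([
    ("빈 공간", [("비", "VA"), ("ㄴ", "ETM"), ("공간", "NNG")]),
    ("공간 활용", [("공간", "NNG"), ("활용", "NNG")]),
])
print(tag_counts(conn, "공간"))
```

Indexing the `form` column is the design choice that makes frequency and concordance lookups fast on a large corpus, which is the main benefit of a database over scanning flat files.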

Using Corpora for the Study of Word-Formation: A Case Study in English Negative Prefixation

  • Kwon, Heok-Seung
    • Korean Journal of English Language and Linguistics / v.1 no.3 / pp.369-386 / 2001
  • This paper will show that traditional approaches to the derivation of different negative words have been of an essentially hypothetical nature, based on either linguists' intuitions or rather scant evidence, and that native-speaker dictionary entries show meaning potentials (rather than meanings) which are in fact linguistic and cognitive prototypes. The purpose of this paper is to demonstrate that using a large corpus of natural language can provide better answers to questions about word-formation (i.e., with particular reference to negative prefixation) than any other source of information.
