• Title/Summary/Keyword: corpus linguistics

Search Result 77, Processing Time 0.023 seconds

Prosodic features and discourse functions of discourse marker 'mak'('막') ('막'의 운율적 특성과 담화적 기능)

  • Song, Inseong
    • Korean Linguistics
    • /
    • v.65
    • /
    • pp.211-236
    • /
    • 2014
  • The aim of this study is to investigate categorical characteristics of 'mak' and their discourse functions through analyzed the prosodic features of 'mak'. The previous studies of 'mak' focused on grammatical or semantic characteristics, but this study focuses on the prosodic features of 'mak' based on speech data. As a result, adverb 'mak' and discourse marker 'mak' are distinguished from prosodic boundary, duration, pause and sort of number tonal patterns. Functions of discourse marker 'mak' is as follows: Maintenance of utterance, Attention, Delay, Expression negative manner. These functions have salient prosodic features related to their functions. Consequently prosodic features are important to analyze categorical characteristics and to establish functions of 'mak'.

Building an RST-tagged Corpus and its Classification Scheme for Korean News Texts (한국어 수사구조 분류체계 수립 및 주석 코퍼스 구축)

  • Noh, Eunchung;Lee, Yeonsoo;Kim, YeonWoo;Lee, Do-Gil
    • Annual Conference on Human and Language Technology
    • /
    • 2016.10a
    • /
    • pp.33-38
    • /
    • 2016
  • 수사구조는 텍스트의 각 구성 성분이 맺고 있는 관계를 의미하며, 필자의 의도는 논리적인 구조를 통해서 독자에게 더 잘 전달될 수 있다. 따라서 독자의 인지적 효과를 극대화할 수 있도록 수사구조를 고려하여 단락과 문장 구조를 구성하는 것이 필요하다. 그럼에도 불구하고 지금까지 수사구조에 기초한 한국어 분류체계를 만들거나 주석 코퍼스를 설계하려는 시도가 없었다. 본 연구에서는 기존 수사구조 이론을 기반으로, 한국어 보도문 형식에 적합한 30개 유형의 분류체계를 정제하고 최소 담화 단위별로 태깅한 코퍼스를 구축하였다. 또한 구축한 코퍼스를 토대로 중심문장을 비롯한 문장 구조의 특징과 분포 비율, 신문기사의 장르적 특성 등을 살펴봄으로써 텍스트에서 응집성의 실현 양상과 구문상의 특징을 확인하였다. 본 연구는 한국어 담화 구문에 적합한 수사구조 분류체계를 설계하고 이를 이용한 주석 코퍼스를 최초로 구축하였다는 점에서 의의를 갖는다.

  • PDF

Using Small Corpora of Critiques to Set Pedagogical Goals in First Year ESP Business English

  • Wang, Yu-Chi;Davis, Richard Hill
    • Asia Pacific Journal of Corpus Research
    • /
    • v.2 no.2
    • /
    • pp.17-29
    • /
    • 2021
  • The current study explores small corpora of critiques written by Chinese and non-Chinese university students and how strategies used by these writers compare with high-rated L1 students. Data collection includes three small corpora of student writing; 20 student critiques in 2017, 23 student critiques from 2018, and 23 critiques from the online Michigan MICUSP collection at the University of Michigan. The researchers employ Text Inspector and Lexical Complexity to identify university students' vocabulary knowledge and awareness of syntactic complexity. In addition, WMatrix4® is used to identify and support the comparison of lexical and semantic differences among the three corpora. The findings indicate that gaps between Chinese and non-Chinese writers in the same university classes exist in students' knowledge of grammatical features and interactional metadiscourse. In addition, critiques by Chinese writers are more likely to produce shorter clauses and sentences. In addition, the mean value of complex nominal and coordinate phrases is smaller for Chinese students than for non-Chinese and MICUSP writers. Finally, in terms of lexical bundles, Chinese student writers prefer clausal bundles instead of phrasal bundles, which, according to previous studies, are more often found in texts of skilled writers. The current study's findings suggest incorporating implicit and explicit instruction through the implementation of corpora in language classrooms to advance skills and strategies of all, but particularly of Chinese writers of English.

The Parallel Corpus Approach to Building the Syntactic Tree Transfer Set in the English-to- Vietnamese Machine Translation

  • Dien Dinh;Ngan Thuy;Quang Xuan;Nam Chi
    • Proceedings of the IEEK Conference
    • /
    • summer
    • /
    • pp.382-386
    • /
    • 2004
  • Recently, with the machine learning trend, most of the machine translation systems on over the world use two syntax tree sets of two relevant languages to learn syntactic tree transfer rules. However, for the English-Vietnamese language pair, this approach is impossible because until now we have not had a Vietnamese syntactic tree set which is correspondent to English one. Building of a very large correspondent Vietnamese syntactic tree set (thousands of trees) requires so much work and take the investment of specialists in linguistics. To take advantage from our available English-Vietnamese Corpus (EVC) which was tagged in word alignment, we choose the SITG (Stochastic Inversion Transduction Grammar) model to construct English- Vietnamese syntactic tree sets automatically. This model is used to parse two languages at the same time and then carry out the syntactic tree transfer. This English-Vietnamese bilingual syntactic tree set is the basic training data to carry out transferring automatically from English syntactic trees to Vietnamese ones by machine learning models. We tested the syntax analysis by comparing over 10,000 sentences in the amount of 500,000 sentences of our English-Vietnamese bilingual corpus and first stage got encouraging result $(analyzed\;about\;80\%)[5].$ We have made use the TBL algorithm (Transformation Based Learning) to carry out automatic transformations from English syntactic trees to Vietnamese ones based on that parallel syntactic tree transfer set[6].

  • PDF

The Method for the Construction of POS Tagged Corpus based on Morpheme Ready Made Dictionary and RDBMS (형태소 기분석 사전과 DBMS를 이장한 형태소 분석 말뭉치 구축의 한 방법)

  • Cho, Jin-Hyun;Kang, Beom-Mo
    • Annual Conference on Human and Language Technology
    • /
    • 2001.10d
    • /
    • pp.33-40
    • /
    • 2001
  • 본 논문은 1999년도에 구축된 '150만 세종 형태소 분석 말뭉치'를 바탕으로 형태소 기분석 사전을 구축하고, 이를 토대로 후처리의 수작업을 고려한 반자동 태거를 구축하는 방법론에 대해 연구한 것이다. 분석말뭉치 구축에 있어 기존 자동 태거에 의한 자동 태깅의 문제점을 분석하고, 이미 구축된 형태분석 말뭉치를 이용해 후처리 작업이 보다 용이한 1차 가공말뭉치를 구축하는 반자동 태거의 개발과 그 방법론을 제시하는데 목적을 두고 있다. 이와 같은 논의에 따라 분석 말뭉치의 구축을 위한 태거는 일반적인 언어 처리를 위한 태거와는 다르다는 점을 주장하였고, 태거에 전적으로 의존하는 태깅 방식보다는 수작업의 편의를 제공할 수 있는 태깅 방식이 필요함을 강조하였다. 본 연구에서 제안된 반자동 태거는 전체적인 태깅 성공률과 정확도가 기존의 태거에 비해 떨어지지만 정확한 단일 분석 결과를 텍스트의 장르에 따른 편차 없이 50% 이상으로 산출하고, 해결이 어려운 어절 유형에 대해서 완전히 작업자의 판단에 맡김으로써 오류의 가능성을 줄인다. 또한 분석 어절에 대해 여러 표지를 부착함으로써 체계적이고 단계적인 후처리 작업이 가능하도록 하였다.

  • PDF

A Study of the Realization of Speech Act and Teaching-learning Contents of Korean Speculative Expressions (한국어 추측 표현의 화행 실현 양상과 교수학습 내용 연구)

  • Jeong, Mi-Jin
    • Korean Linguistics
    • /
    • v.76
    • /
    • pp.187-211
    • /
    • 2017
  • The purpose of this study is to investigate the speech act realization of speculative expressions and to present their teaching-learning contents. It is hard for Korean learners to use speculative expressions appropriately because there are various similar expressions and their meaning is distinctive in detail. This study describes speech act realizations of '-는 것 같다, -을까, -나 보다, -을걸'. All these forms have the meaning of speculations, so they are mainly used to present uncertain information or thoughts of speaker. But they show distinctive aspects. '-는 것 같다' is mainly used to present contents contrary to their counterparts' opinions or irritating for their counterparts. It is used as polite forms because it conveys meanings of uncertainty. Especially in these contexts, it performs the refusal speech acts. '-을까' has the characteristic feature in the complex forms such as '뭐랄까', '뭐라고 할까' and it performs request speech acts more frequently than '-는 것 같다'. Also it is used to express the speakers' opinions contrary to their counterparts'. '-나 보다' expresses speaker's speculations based on hearer's conditions or his speech, so it is used to respond to hearer actively and express interests unlike other speculative expressions. '-을걸' isn't used to perform request, to express interests to hearer. However, it is mainly used when speaker has the contrary assumptions or expectations to hearer's. Based on the analyze, this study presents and grades teaching-learning contents of speculative expressions.

Pragmatic Strategies of Self (Other) Presentation in Literary Texts: A Computational Approach

  • Khafaga, Ayman Farid
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.2
    • /
    • pp.223-231
    • /
    • 2022
  • The application of computer software into the linguistic analysis of texts proves useful to arrive at concise and authentic results from large data texts. Based on this assumption, this paper employs a Computer-Aided Text Analysis (CATA) and a Critical Discourse Analysis (CDA) to explore the manipulative strategies of positive/negative presentation in Orwell's Animal Farm. More specifically, the paper attempts to explore the extent to which CATA software represented by the three variables of Frequency Distribution Analysis (FDA), Content Analysis (CA), and Key Word in Context (KWIC) incorporate with CDA decipher the manipulative purposes beyond positive presentation of selfness and negative presentation of otherness in the selected corpus. The analysis covers some CDA strategies, including justification, false statistics, and competency, for positive self-presentation; and accusation, criticism, and the use of ambiguous words for negative other-presentation. With the application of CATA, some words will be analyzed by showing their frequency distribution analysis as well as their contextual environment in the selected text to expose the extent to which they are employed as strategies of positive/negative presentation in the text under investigation. Findings show that CATA software contributes significantly to the linguistic analysis of large data texts. The paper recommends the use and application of the different CATA software in the stylistic and corpus linguistics studies.

PPEditor: Semi-Automatic Annotation Tool for Korean Dependency Structure (PPEditor: 한국어 의존구조 부착을 위한 반자동 말뭉치 구축 도구)

  • Kim Jae-Hoon;Park Eun-Jin
    • The KIPS Transactions:PartB
    • /
    • v.13B no.1 s.104
    • /
    • pp.63-70
    • /
    • 2006
  • In general, a corpus contains lots of linguistic information and is widely used in the field of natural language processing and computational linguistics. The creation of such the corpus, however, is an expensive, labor-intensive and time-consuming work. To alleviate this problem, annotation tools to build corpora with much linguistic information is indispensable. In this paper, we design and implement an annotation tool for establishing a Korean dependency tree-tagged corpus. The most ideal way is to fully automatically create the corpus without annotators' interventions, but as a matter of fact, it is impossible. The proposed tool is semi-automatic like most other annotation tools and is designed to edit errors, which are generated by basic analyzers like part-of-speech tagger and (partial) parser. We also design it to avoid repetitive works while editing the errors and to use it easily and friendly. Using the proposed annotation tool, 10,000 Korean sentences containing over 20 words are annotated with dependency structures. For 2 months, eight annotators have worked every 4 hours a day. We are confident that we can have accurate and consistent annotations as well as reduced labor and time.

Dependency Grammar and the Parsing of Chinese Sentences

  • Lai, Bong-Ycung-Tom;Huang, Changning
    • Proceedings of the Korean Society for Language and Information Conference
    • /
    • 1994.02a
    • /
    • pp.63-72
    • /
    • 1994
  • Dependency Grammar has been used by Iinguists as the basis of the syntactic components of their grammar formalisms. It has also been used in natural langauge parsing. In China, attempts have been made to use this grammar formalism to parse Chinese sentences using corpus based techniques. This paper reviews the properties of Dependency Grammar as embodied in four axioms for the well-formedness conditions for dependency structures. It is shown that allowing mul tiple governors as done by some followers of this formalism is unnecessary. The practice of augmenting Dependency Grammar with functional labels is discussed in the light of building functional structures when the sentence is parsed. This will also facilitate semantic interpretion.retion.

  • PDF

Vowel Fundamental Frequency in Manner Differentiation of Korean Stops and Affricates

  • Jang, Tae-Yeoub
    • Speech Sciences
    • /
    • v.7 no.1
    • /
    • pp.217-232
    • /
    • 2000
  • In this study, I investigate the role of post-consonantal fundamental frequency (F0) as a cue for automatic distinction of types of Korean stops and affricates. Rather than examining data obtained by restricting contexts to a minimum to prevent the interference of irrelevant factors, a relatively natural speaker independent speech corpus is analysed. Automatic and statistical approaches are adopted to annotate data, to minimise speaker variability, and to evaluate the results. In spite of possible loss of information during those automatic analyses, statistics obtained suggest that vowel F0 is a useful cue for distinguishing manners of articulation of Korean non-continuant obstruents having the same place of articulation, especially of lax and aspirated stops and affricates. On the basis of the statistics, automatic classification is attempted over the relevant consonants in a specific context where the micro-prosodic effects appear to be maximised. The results confirm the usefulness of this effect in application for Korean phone recognition.

  • PDF