• Title/Summary/Keyword: Corpus Frequency

Characteristics of Intermediate/Advanced Korean Inter-Englishes: A Corpus-Linguistic Analysis. (우리나라 중.상급학습자 영어의 특징 : 말뭉치 언어학적 분석)

  • 안성호;이영미
    • Korean Journal of English Language and Linguistics / v.4 no.1 / pp.83-102 / 2004
  • The purpose of this paper is to identify some major characteristics of intermediate-to-advanced Korean learners' English by corpus-linguistically analyzing their essays in comparison with native speakers' essays. We construct a corpus of CBT TOEFL essays by Korean learners, NNS1 (94,076 words in 402 texts), and its sub-corpus, NNS2 (14,291 words in 45 texts), as well as a corpus of model essays written or meticulously edited by native speakers, NS (14,833 words in 35 texts). We compare NNS1 and NNS2 with NS, and with some other corpora, in terms of high-frequency words, and show that Korean learners' writing has more features of informal writing than of formal writing. This accords with the reports in Granger (1998) that EFL writing by advanced European learners is characterized by informality.

  • PDF
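The corpora compared in the abstract above differ greatly in size (94,076 versus 14,833 words), so raw word counts are not directly comparable. A minimal sketch of one common normalization, frequency per 1,000 tokens, is given here. The function name, the simple regular-expression tokenizer, and the sample sentence are assumptions for illustration and are not taken from the paper.

```python
import re
from collections import Counter


def per_thousand(text, top_n=3):
    """Return the top_n most frequent words with their rate per 1,000 tokens."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return [(word, 1000 * n / len(tokens)) for word, n in counts.most_common(top_n)]


# Hypothetical sample text, not drawn from any of the corpora.
print(per_thousand("I think I can", top_n=1))
```

Normalized rates of this kind allow high-frequency word lists from a large learner corpus and a small reference corpus to be set side by side.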

A Study on the optimal text corpus for company names (한국어최적상호명코퍼스설계에관한연구)

  • Lee, Sun-Jung
    • Journal of the Korea Computer Industry Society / v.5 no.5 / pp.747-754 / 2004
  • In this paper, we derive an optimal corpus that represents the characteristics of a baseline corpus consisting of 1,566,943 unique company names from a directory assistance service (114). Two kinds of optimal solutions are considered. The first is the phonetically balanced corpus (PBC), which is the minimum set that includes all possible triphones in the baseline corpus. The second is the phonetically distributed corpus (PDC), which is a minimum set that represents the frequency characteristics of triphones in the baseline corpus. We obtain 8,699 words as the PBC and 16,783 words (similarity measure R = 0.92) as the PDC. These corpora can be used for the development of speech recognition and speech synthesis.

  • PDF
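Selecting a phonetically balanced corpus, as described in the abstract above, amounts to choosing a small set of names that together cover every triphone. The abstract does not say how the set was computed. The sketch below uses a greedy set-cover heuristic, which is an assumption on our part and yields a small covering set, not a guaranteed minimum. The function names and the toy lexicon are hypothetical.

```python
def triphones(phones):
    """Return the set of consecutive phone triples in a phone sequence."""
    return {tuple(phones[i:i + 3]) for i in range(len(phones) - 2)}


def greedy_balanced_subset(lexicon):
    """Greedily pick names until every triphone in the lexicon is covered.

    lexicon: dict mapping a name to its phone sequence (list of str).
    Returns the selected names in the order they were picked.
    """
    coverage = {name: triphones(phones) for name, phones in lexicon.items()}
    uncovered = set().union(*coverage.values())
    selected = []
    while uncovered:
        # Pick the name covering the most still-uncovered triphones;
        # sorting first makes ties resolve alphabetically.
        best = max(sorted(coverage), key=lambda n: len(coverage[n] & uncovered))
        selected.append(best)
        uncovered -= coverage[best]
    return selected


# Hypothetical toy lexicon with made-up phone sequences.
toy = {
    "a": ["k", "a", "n", "a"],
    "b": ["k", "a", "n"],
    "c": ["a", "n", "a"],
    "d": ["s", "o", "l"],
}
print(greedy_balanced_subset(toy))
```

On the toy lexicon, name "a" already covers the triphones of "b" and "c", so only "a" and "d" are needed.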

Studying the frequencies of sentence patterns for a sentence pattern dictionary (문형 사전을 위한 문형 빈도 조사)

  • Kim Yu-Mi
    • Korean Journal of Cognitive Science / v.16 no.2 / pp.123-140 / 2005
  • The purpose of this paper is to examine the frequency and usage of sentence patterns appearing in electronic dictionaries used in Korean language education, in order to design an automatic sentence pattern checker. First, the concept of a sentence pattern is defined, and sentence patterns are classified into sentence structure patterns and sentential expression patterns. We then analyze how both kinds of pattern are expressed in the Korean Learner's Corpus. The Learner's Corpus is divided into the Standard Corpus, which covers what all learners of Korean must learn, and the Errors Corpus, which is built from learners' writing. From this research, we find out how frequently the sentential patterns are used in the Standard Corpus, which consists of Korean texts, and how they are used in the Errors Corpus, which was constructed from Korean learners' writing. Finally, having described the sentential patterns in the electronic dictionary of sentential patterns, we determine the optimum search speed for a sentential pattern.

  • PDF

A Study on the Voice Onset Time of English Voiceless Stops in the Buckeye Corpus (벅아이 코퍼스를 이용한 영어 무성파열음의 VOT 연구)

  • Yoon, Kyu-Chul
    • Phonetics and Speech Sciences / v.4 no.2 / pp.33-40 / 2012
  • The purpose of this paper is to investigate the voice onset time (VOT) of the English voiceless stops [p, t, k] found in the Buckeye Corpus of Conversational Speech [1]. Three young female speakers were chosen for this study and their VOT values were semi-automatically extracted along with other factors. The factors used for the analysis were place of articulation, location in word, syllabic stress, content word or not, word frequency calculated from the corpus, and the speech rate expressed in syllables per second. Results showed that, for the three places of articulation of each speaker, all the factors had a statistically significant effect on the VOT values. This paper has significance in that the materials used for the analysis were from a corpus of spontaneous natural English speech.

A Comparison of the Constructions Make / Take a Decision in Malaysian English with the Supervarieties

  • Christina Sook Beng Ong
    • Asia Pacific Journal of Corpus Research / v.4 no.1 / pp.43-59 / 2023
  • This study aims to compare the structures of light verb constructions (LVCs) taking decision as the deverbal noun in Malaysian English, British English and American English. A general corpus made up of Internet forum threads from Lowyat.Net was created to represent Malaysian English, while the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA) were used to represent the supervarieties. The light verbs make and take are found to head the deverbal noun decision. Differences are observed in the use of articles. LVCs without an article are the most frequent in Malaysian English, while LVCs in the supervarieties prefer the indefinite article. The high occurrence of LVCs without articles in Malaysian English can be attributed to the influence of Malaysian substrate languages. Findings also show that descriptive adjectives are the most frequently used modifiers in all three varieties of English. This suggests that the standard LVC structure, comprising a light verb, the indefinite article, and a deverbal noun, is no longer rigidly adhered to even among native speakers of English.

Relative Clauses in a Modern Diachronic Corpus of Singapore English

  • Lee, Kit Mun
    • Asia Pacific Journal of Corpus Research / v.1 no.1 / pp.31-60 / 2020
  • This paper investigates changes in relativization in Singapore English broadsheet newspapers from 1993 to 2016. One of the first diachronic studies in Singapore English (SgE), it also explores corresponding data from the diachronic Siena-Bologna (SiBol) news corpus. As SgE is in the endonormative stabilization phase in Schneider's (2007) Dynamic Model of postcolonial Englishes, divergence from British English (BrE) is to be expected. In this study, the dataset is a new Singapore English Newspaper (SEN) corpus compiled from local news articles in 1993, 2005 and 2016, and the corpus tool employed is Sketch Engine. The results reveal changes in relativization practices in SEN over the given period, many of which follow a pattern similar to those identified in SiBol, albeit at varying rates of change. The most significant of these is a sharp decline in the which relativizer in restrictive relative clauses with non-animate antecedents, complemented by a rise in that. The change has been so rapid that although which relative clauses were more common than that clauses in 1993, that has subsequently overtaken which in both corpora. One shift in SEN that differs from SiBol is the increase in the frequency of non-restrictive relative clauses in SgE. The likely motivators for the changes in the two varieties are identified as colloquialization, densification and prescriptivism. The effect each of these factors could have had on the varieties is discussed, as well as the implications of the findings for our understanding of the evolutionary status of SgE as a postcolonial variety.

Frequency of grammar items for Korean substitution of /u/ for /o/ in the word-final position (어말 위치 /ㅗ/의 /ㅜ/ 대체 현상에 대한 문법 항목별 출현빈도 연구)

  • Yoon, Eunkyung
    • Phonetics and Speech Sciences / v.12 no.1 / pp.33-42 / 2020
  • This study examined the substitution of /u/ for /o/ (e.g., pyəllo [pyəllu]) in Korean as a function of grammar items, based on a speech corpus. Korean /o/ and /u/ share the vowel feature [+rounded], but are distinguished in terms of tongue height. However, researchers have reported that a merger of Korean /o/ and /u/ is in progress, making them indistinguishable. Thus, in this study, the frequency with which underlying /o/ is phonetically realized as /u/ was calculated for each grammar item in The Korean Corpus of Spontaneous Speech (Seoul Corpus 2015), a large corpus of speech from a total of 40 speakers from Seoul or Gyeonggi-do. It was confirmed that word-final /o/ in linking endings, particles, and adverbs was replaced by /u/ in approximately 50% of cases, whereas in nominal items it was replaced at a frequency of less than 5%. Among high-frequency items, the special particle "-do[du]" (59.6%) and the linking ending "-go[gu]" (43.5%) showed high rates of substitution. Observing Korean pronunciation in real life provides deep insight into its theoretical implications in terms of speech recognition.

A Study on the Use of Genitive Particle '의': Focusing on the analysis of Korean Learners Corpus (한국어 학습자의 관형격 조사 '의' 사용 양상 연구: 학습자 말뭉치 분석을 중심으로)

  • Ji-Young Sim;Soo-Hyun Lee
    • Journal of the Korean Society of Industry Convergence / v.26 no.3 / pp.433-442 / 2023
  • The purpose of this study is to reveal Korean learners' usage patterns of the genitive particle '의' according to semantic classification, so that the results can be referred to in determining the contents and methods of related education. The study adopts a quantitative analysis using the learner corpus established by the National Institute of Korean Language. The analysis shows that, as proficiency increases, the overall frequency of '의' increases and the number of senses used increases. However, the frequency of errors also increases with it. As for the usage pattern of each sense, the meaning of 'ownership, belonging' is the most frequent, followed by 'acting entity', 'kinship, social relations', and 'relationship(area)'. In conclusion, the meanings of 'acting subjects' and 'relationships(area)' need to be supplemented with explicit education. Other meanings need to be discussed, and decisions should be made in consideration of learning purpose and proficiency.

A Reconsideration of Asymmetries of Bracketing Paradoxes in English Derivation: a Corpus-based Approach

  • Kim, Jin-hyung
    • Journal of English Language & Literature / v.55 no.3 / pp.475-495 / 2009
  • In this paper, I discuss some asymmetries of bracketing paradoxes from a corpus-based perspective. Through a critical examination of previous analyses of bracketing paradoxes, it is demonstrated that the cases of apparent asymmetry are consistently accounted for when combined with frequency-based parsability in morphological processing. Based on relative frequency, this paper argues that bracketing paradoxes are well attested when their immediate bases are frequent and productive enough to be accessed as a unit and stored as such in memory. This is an extension of Hay (2002), which conducted a comprehensive survey of differential frequency effects in suffix pairs. The frequency-based approach to bracketing paradoxes adopted in this paper can challenge conventional formal theory by assuming a major role for language use, and it has the potential to significantly advance our understanding of the asymmetries observed in real language use.

Phonological processes of vowels from orthographic to pronounced words in the Buckeye Corpus by sex and age groups

  • Yang, Byunggon
    • Phonetics and Speech Sciences / v.10 no.2 / pp.25-31 / 2018
  • This paper investigated the phonological processes of monophthongs and diphthongs in the pronounced words in the Buckeye Corpus and compared the frequency distribution of these processes by sex and age groups, to provide linguists and phoneticians with a clearer understanding of spoken English. Both orthographic and pronounced words were extracted from the transcribed label scripts of the Buckeye Corpus using R. Next, the phonological processes of monophthongs and diphthongs in the orthographic and pronounced labels were tabulated using R scripts, and a frequency distribution by vowel process type, as well as by sex and age group, was created. The results revealed that 95% of the orthographic words contained the same number of syllables, whereas 5% had different numbers of vowels, which suggests that speakers tend to preserve vowels in spontaneous speech. In addition, deletion processes were preferred in natural speech. Most vowel deletions occurred in unstressed syllables. Chi-square tests were performed to test for dependence in the distribution of phonological process types between male and female groups and between young and old groups. The results showed a very strong correlation. This finding indicates that vowel processes occurred in approximately the same pattern in natural and spontaneous speech data regardless of sex and age, as well as whether or not the vowel processes were identical. Based on these results, the author concludes that an analysis of phonological processes in spontaneous speech corpora can greatly enhance practical understanding of spoken English.
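
The abstract above reports chi-square tests on the distribution of process types across speaker groups. The paper performed its analysis in R. The Python sketch below only illustrates how the Pearson chi-square statistic for a contingency table is computed. The function name and the counts are hypothetical and are not the paper's data.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    table: list of rows, each a list of observed counts of equal length.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows are speaker groups, columns are process types.
stat, dof = chi_square_statistic([[10, 20], [20, 40]])
print(stat, dof)
```

When the rows are exactly proportional, as in this made-up table, the statistic is zero, which corresponds to two groups showing the same distribution of process types.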