• Title/Summary/Keyword: spoken corpus analysis

Search Result 37

Modal Auxiliary Verbs in Japanese EFL Learners' Conversation: A Corpus-based Study

  • Nakayama, Shusaku
    • Asia Pacific Journal of Corpus Research / v.2 no.1 / pp.23-34 / 2021
  • This research examines Japanese non-native speakers' (JNNS) modal auxiliary verb use from two perspectives: frequency of use and preferences for modalities. Additionally, error analysis is carried out to identify errors in modal use common among JNNSs. Their modal use is compared to that of English native speakers within a spoken dialogue corpus which is part of the International Corpus Network of Asian Learners' English. Research findings show at a statistically significant level that, compared to native speakers, JNNSs underuse past forms of modals and infrequently convey epistemic modality, indicating the possibility that JNNSs fail to express their opinions or thoughts indirectly when needed or to convey politeness appropriately. Error analysis identifies the following three types of common errors: (1) the use of incorrect tenses of modal verb phrases, (2) the use of inflected verb forms after modals, and (3) the non-use of main verbs after modals. The first type of error arises largely because JNNSs have not mastered how to express past meanings of modals. The second and third types of errors seem to be due to first language transfer into second language acquisition and JNNSs' overgeneralization of the subject-verb agreement rules to modals respectively.

Acquisition and Development of particles of Beginner Level Korean Language Learners (초급 한국어 학습자의 조사 습득 및 발달 연구)

  • 이승연;이유경;최은지;이선영
    • Language Facts and Perspectives / v.48 / pp.505-541 / 2019
  • This research aims to analyze Korean language learners' spoken corpus to reveal their acquisition order and development patterns of particles. To this end, we collected free conversation data of beginning level Korean language learners over five months and constructed a corpus. It was confirmed that particle acquisition takes place over four stages based on the frequency of particle use and its accuracy. The stages of development were first 'ey, un/nun, i/ka(nominative), ul/lul', second 'eyse, hako(conjunction), to, hako(adverbial)', third '(u)lo, pota, man, eykey, kkaci, puthe, kkeyse, ui', and fourth 'hanthey, (i)na(conjunction), wa/kwa(conjunction), kkey, (i)lang(adverbial), eykeyse, mata, wa/kwa(adverbial), (i)na(auxiliary particle), pakkey, (i)lang(conjunction)'. Based on these findings, the characteristics of particle use by beginning level learners are as follows. First, case markers develop first. Second, the accuracy of each particle's use tends to decrease slightly over time. Third, the frequency of some particles was observed to increase suddenly and then decrease again during a certain period. Fourth, the order in which most, but not all, particles appeared seemed to be related to the order in which they are introduced in textbooks. This research has implications for grammar education when establishing a Korean language education curriculum or developing a grammar syllabus.

Phonological processes of consonants from orthographic to pronounced words in the Buckeye Corpus

  • Yang, Byunggon
    • Phonetics and Speech Sciences / v.11 no.4 / pp.55-62 / 2019
  • This paper investigates the phonological processes of consonants in pronounced words in the Buckeye Corpus and compares the frequency distribution of these processes to provide a clearer understanding of conversational English for linguists and teachers. Both orthographic and pronounced words were extracted from the transcribed label scripts of the Buckeye Corpus. Next, the phonological processes of consonants in the orthographic and pronounced labels were tabulated separately by onsets and codas, and a frequency distribution by consonant process types was examined. The results showed that the majority of the onset clusters were pronounced as the same sounds in the Buckeye Corpus. The participants in the corpus were presumed to speak semiformally. In addition, the onsets have fewer deletions than the codas, which might be related to the information weight of the syllable components. Moreover, there is a significant association and strong positive correlation between the phonological processes of the onsets and codas in men and women. This paper concludes that an analysis of phonological processes in spontaneous speech corpora can contribute to a practical understanding of spoken English. Further studies comparing the current phonological process data with those of other languages would be desirable to establish universal patterns in phonological processes.

Teaching Grammar for Spoken Korean to English-speaking Learners: Reported Speech Marker '-dae'. (영어권 학습자를 위한 한국어 구어 문법 교육 - 보고 표지 '-대'를 중심으로 -)

  • Kim, Young A;Cho, In Jung
    • Journal of Korean language education / v.23 no.1 / pp.1-23 / 2012
  • The development of corpora in recent years has attracted increased research on spoken Korean. Nevertheless, these research outcomes are yet to be meaningfully and adequately reflected in Korean language textbooks. The reported speech marker '-dae' is one of the areas that need more attention. This study investigates whether textbooks explain '-dae' clearly enough to English-speaking learners to prevent confusion and misuse. Based on a contrastive analysis of Korean and English, this study argues three points. Firstly, '-dae' should be introduced to Korean learners as an independent sentence ender rather than a contracted form of '-dago hae'. Secondly, it is necessary to teach English-speaking learners that '-dae' is not equivalent to the English reported speech form. It functions more or less as a third person marker in Korean. Learners should be informed that '-dae' is used for what would be plain statements in English, if those statements are hearsay but the source of information does not need to be specified. This is a very distinctive difference between Korean and English and should be emphasized in class when '-dae' is taught. Thirdly, '-dae' should be introduced before indirect speech constructions, because it is mainly used in simple statements and the frequency of '-dae' is very high in spoken Korean.

Phoneme distribution and phonological processes of orthographic and pronounced phrasal words in light of syllable structure in the Seoul Corpus (음절구조로 본 서울코퍼스의 글 어절과 말 어절의 음소분포와 음운변동)

  • Yang, Byunggon
    • Phonetics and Speech Sciences / v.8 no.3 / pp.1-9 / 2016
  • This paper investigated the phoneme distribution and phonological processes of orthographic and pronounced phrasal words in light of syllable structure in the Seoul Corpus in order to provide linguists and phoneticians with a clearer understanding of the Korean language system. To achieve this goal, the phrasal words were extracted from the transcribed label scripts of the Seoul Corpus using Praat. Following this, the onsets, peaks, codas and syllable types of the phrasal words were analyzed using an R script. Results revealed that k0 was most frequently used as an onset in both orthographic and pronounced phrasal words. Also, aa was the most favored vowel in the Korean syllable peak with fewer phonological processes in its pronounced form. The total proportion of all diphthongs according to the frequency of the peaks in the orthographic phrasal words was 8.8%, which was almost double that found in the pronounced phrasal words. For the codas, nn accounted for 34.4% of the total pronounced phrasal words and was the varied form. From syllable type classification of the Corpus, CV appeared to be the most frequent type followed by CVC, V, and VC from the orthographic forms. Overall, onsets were more prevalent in pronunciation than codas. From the results, this paper concluded that an analysis of phoneme distribution and phonological processes in light of syllable structure can contribute greatly to the understanding of the phonology of spoken Korean.

Analysis of the Timing of Spoken Korean Using a Classification and Regression Tree (CART) Model

  • Chung, Hyun-Song;Huckvale, Mark
    • Speech Sciences / v.8 no.1 / pp.77-91 / 2001
  • This paper investigates the timing of Korean spoken in a news-reading speech style in order to improve the naturalness of durations used in Korean speech synthesis. Each segment in a corpus of 671 read sentences was annotated with 69 segmental and prosodic features so that the measured duration could be correlated with the context in which it occurred. A CART model based on the features showed a correlation coefficient of 0.79 with an RMSE (root mean squared prediction error) of 23 ms between actual and predicted durations in reserved test data. These results are comparable with recent published results in Korean and similar to results found in other languages. An analysis of the classification tree shows that phrasal structure has the greatest effect on the segment duration, followed by syllable structure and the manner features of surrounding segments. The place features of surrounding segments only have small effects. The model has application in Korean speech synthesis systems.
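The two evaluation figures quoted in this abstract, a correlation coefficient and an RMSE between actual and predicted durations, can be illustrated with a minimal sketch. The duration values below are invented for illustration and are not taken from the paper, and the function names are hypothetical; the sketch only shows how such metrics are commonly computed, not how the authors computed them.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Made-up segment durations in milliseconds (assumption, for illustration only).
actual_ms = [60.0, 85.0, 120.0, 45.0]
predicted_ms = [70.0, 80.0, 100.0, 50.0]

error_ms = rmse(actual_ms, predicted_ms)
r = pearson_r(actual_ms, predicted_ms)
print(f"RMSE = {error_ms:.1f} ms, r = {r:.2f}")
```

Reporting both values is useful because the RMSE gives the typical size of the error in milliseconds, while the correlation shows how well the predictions track the ordering and spread of the measured durations.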


English Predicate Inversion: Towards Data-driven Learning

  • Kim, Jong-Bok;Kim, Jin-Young
    • Journal of English Language & Literature / v.56 no.6 / pp.1047-1065 / 2010
  • English inversion constructions are not only hard for non-native speakers to learn but also difficult to teach, mainly because of their intriguing grammatical and discourse properties. This paper addresses grammatical issues in learning or teaching the so-called 'predicate inversion (PI)' construction (e.g., "Equally important in terms of forest depletion is the continuous logging of the forests"). In particular, we chart the grammatical (distributional, syntactic, semantic, pragmatic) properties of the PI construction and argue for data-driven teaching of English grammar. To depart from the armchair style of grammar teaching (relying on author-made simple sentences), our teaching method introduces data-driven teaching. A total of 25 university students in a grammar-related class together analyzed the British Component of the International Corpus of English (ICE-GB), containing about one million words distributed across a variety of textual categories. We have identified a total of 290 PI sentences (206 from spoken and 87 from written texts). The preposed syntactic categories of the PI involve five main types: AdvP, PP, VP(ed/ing), NP, AP, and so, all of which function as the complement of the copula. In terms of discourse, our findings support Birner and Ward's (1998) observation that these preposed phrases represent more familiar information than the postposed subject. The corpus examples gave us three possible types. The preposed element is discourse-old whereas the postposed one is discourse-new, as in "Putting wire mesh over a few bricks is a good idea." Both preposed and postposed elements can also be discourse-new, as in "But a fly in the ointment is inflation." These two elements can also be discourse-old, as in "Racing with him on the near-side is Rinus." The dominant occurrence of the PI in the spoken texts also supports the view that the balance (or scene-setting) in information structure is the main trigger for the use of the PI construction. After being exposed to the real data and to an in-depth syntactic and information-structure analysis of the PI construction, the students in the class showed a far clearer understanding of the construction in question and realized that grammar does not stand by itself but interacts tightly with other important grammatical components such as information structure. The study points toward grammar teaching that is both data-driven and interactive.

Phonological processes of consonants from orthographic to pronounced words in the Seoul Corpus

  • Yang, Byunggon
    • Phonetics and Speech Sciences / v.12 no.2 / pp.1-7 / 2020
  • This paper investigates the phonological processes of consonants in pronounced words in the Seoul Corpus, and compares the frequency distribution of these processes to provide a clearer understanding of conversational Korean to linguists and teachers. To this end, both orthographic and pronounced words were extracted from the transcribed label scripts of the Seoul Corpus. Next, the phonological processes of consonants in the orthographic and pronounced forms were tabulated separately after syllabifying the onsets and codas, and major consonantal processes were examined. First, the results showed that the majority of the orthographic consonants' sounds were pronounced the same way as their pronounced forms. Second, more than three quarters of the onsets were pronounced as the same forms, while approximately half of the codas were pronounced as variants. Third, the majority of different onset and coda symbols were primarily caused by deletions and insertions. Finally, the five phonological process types accounted for only 12.4% of the total possible procedures. Based on these results, this paper concludes that an analysis of phonological processes in spontaneous speech corpora can improve the practical understanding of spoken Korean. Future studies ought to compare the current phonological process data with those of other languages to establish universal patterns in phonological processes.

Modality in Korean Learners' Spoken Interlanguage

  • Park, Hyeson
    • English Language & Literature Teaching / v.18 no.1 / pp.197-216 / 2012
  • This study examines the spoken interlanguage of Korean learners of English, focusing on the distribution of modal verbs and devices of epistemic modality. (Semi-)spontaneous speech data were collected from four students participating in a self-organized study group for seven months, which produced a corpus of about 55,000 words. The data analysis reveals the following: 1) The frequency of the modal verbs produced by the learners was lower than that of native speakers: 1.99 vs. 2.32 tokens per 100 words. The range of the modal verbs used by the learners was also very limited, with over-reliance on can (43%). 2) The grammatical categories of the devices marking epistemic modality were in the order of adverbs, lexical verbs, and modal verbs, with a high frequency of a few items in each category. 3) Lexical items conveying certainty and modals of obligation were preferred over markers of weaker commitment, resulting in speech characterized by firmer assertions and a more authoritative tone, a potential cause of pragmatic failure. 4) A weak developmental change was observed in the frequency of modal verbs, but not in their functions, over the seven-month period of data collection. L1 influence, L2 proficiency, mode of communication, and instruction effects are discussed as possible variables involved in the distribution patterns observed.
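The "tokens per 100 words" figure reported in this abstract is a normalized frequency, which makes corpora of different sizes comparable. A minimal sketch of that calculation follows. The modal inventory, the tokenizer, and the sample sentence are all assumptions made for illustration; the abstract does not specify the study's own modal list or tokenization.

```python
import re

# Hypothetical inventory of core English modals (assumption, not the study's list).
MODALS = {"can", "could", "may", "might", "must",
          "shall", "should", "will", "would"}

def modal_rate_per_100(text):
    """Return the number of modal-verb tokens per 100 word tokens."""
    # Naive tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in MODALS)
    return 100.0 * hits / len(tokens)

# Invented sample utterance, for illustration only.
sample = "I can go but I should stay because it might rain"
rate = modal_rate_per_100(sample)
print(f"{rate:.2f} modal tokens per 100 words")
```

A real study would also need to separate modal uses from homographs (for example, "can" as a noun or "will" as a noun), which this naive word match does not attempt.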


Study of Emotion in Speech (감정변화에 따른 음성정보 분석에 관한 연구)

  • 장인창;박미경;김태수;박면웅
    • Proceedings of the Korean Society of Precision Engineering Conference / 2004.10a / pp.1123-1126 / 2004
  • Recognizing emotion in speech requires a large spoken-language corpus that covers not only different emotional states but also individual languages. In this paper, we focus on how speech signals change across emotions. We compare speech features such as formants and pitch across four emotions (normal, happiness, sadness, anger). In Korean, pitch data on monophthongs changed with each emotion. We therefore suggest analysis techniques suited to using these features to recognize emotions in Korean.
