The Korean Corpus of Spontaneous Speech

  • Received : 2015.04.20
  • Accepted : 2015.06.09
  • Published : 2015.06.30

Abstract

This paper describes the development of the Korean Corpus of Spontaneous Speech, also called the Seoul Corpus. The corpus contains audio recordings of interview-style spontaneous speech from 40 native speakers of Seoul Korean. The talkers are divided into four age groups: teens, twenties, thirties, and forties. Each age group has ten talkers, five male and five female. The method used to elicit and record the speech is described. The corpus, which contains around 220,000 phrasal words, was phonemically labeled, with boundaries marked for Korean phrasal words and utterances, which were additionally romanized. In a test of labeling consistency, inter-labeler agreement on phoneme identification was 98.1% and the mean deviation in boundary placement was 9.04 ms. The corpus will be made available for free to the research community in March 2015.
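The two consistency figures reported above (percentage agreement on phoneme identity and mean deviation in boundary placement) can be illustrated with a minimal sketch. The abstract does not state the exact procedure, so this assumes that the two labelers' segments are already aligned one-to-one and that "mean deviation" means the mean absolute time difference between corresponding boundaries. The function names and the toy data are hypothetical and are not taken from the corpus.

```python
from statistics import mean


def phoneme_agreement(labels_a, labels_b):
    """Percentage of aligned segments that both labelers gave the same phoneme label.

    Assumption: the two label sequences are already aligned one-to-one.
    """
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("label sequences must be non-empty and aligned one-to-one")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


def mean_boundary_deviation_ms(bounds_a, bounds_b):
    """Mean absolute difference, in milliseconds, between corresponding boundaries.

    Boundary times are given in seconds. Assumption: "mean deviation" is the
    mean absolute difference; the paper may define it differently.
    """
    if not bounds_a or len(bounds_a) != len(bounds_b):
        raise ValueError("boundary sequences must be non-empty and aligned one-to-one")
    return mean(abs(a - b) * 1000.0 for a, b in zip(bounds_a, bounds_b))


# Hypothetical toy data: five aligned segments and four shared boundaries.
labeler_a = ["k", "a", "m", "s", "a"]
labeler_b = ["k", "a", "n", "s", "a"]
bounds_a = [0.050, 0.120, 0.200, 0.310]
bounds_b = [0.055, 0.110, 0.200, 0.325]

agreement = phoneme_agreement(labeler_a, labeler_b)
deviation = mean_boundary_deviation_ms(bounds_a, bounds_b)
print(f"agreement: {agreement:.1f}%, mean boundary deviation: {deviation:.2f} ms")
```

In practice the label sequences from two labelers rarely align one-to-one, so a real comparison would first need an alignment step (for example, matching segments by time overlap), which this sketch leaves out.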
