• Title/Summary/Keyword: end-to-end speech translation

English-Korean speech translation corpus (EnKoST-C): Construction procedure and evaluation results

  • Jeong-Uk Bang; Joon-Gyu Maeng; Jun Park; Seung Yun; Sang-Hun Kim
    • ETRI Journal / v.45 no.1 / pp.18-27 / 2023
  • We present EnKoST-C, an English-Korean speech translation corpus. End-to-end model training for speech translation often suffers from a lack of parallel data, that is, speech in the source language paired with equivalent text in the target language. Most publicly available speech translation corpora were developed for European languages, and there is currently no public corpus for English-Korean end-to-end speech translation. We therefore built EnKoST-C from TED Talks. In the process, we enhanced the sentence alignment approach by using subtitle time information and bilingual sentence embedding information. The result is a 559-hour English-Korean speech translation corpus. The proposed sentence alignment approach achieved an F-measure of 0.96. We also report the baseline performance of an English-Korean speech translation model trained on EnKoST-C. EnKoST-C is freely available on a Korean government open data hub site.
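
The abstract names two alignment signals, subtitle timing and bilingual sentence embeddings, without giving the scoring rule. The sketch below shows one way such signals could be combined and how an F-measure over aligned pairs is computed. The weighting `alpha`, the `threshold`, the greedy best-match strategy, and the toy vectors are all assumptions made for illustration; they are not taken from the paper.

```python
import math

def time_iou(a, b):
    """Intersection-over-union of two (start, end) intervals in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0

def align(src, tgt, alpha=0.5, threshold=0.6):
    """Greedily pair each source segment with its best-scoring target segment.

    src, tgt: lists of dicts with 'span' (start, end) and 'vec' (embedding).
    alpha and threshold are hypothetical tuning parameters.
    """
    pairs = []
    for i, s in enumerate(src):
        scored = [
            (alpha * time_iou(s["span"], t["span"])
             + (1 - alpha) * cosine(s["vec"], t["vec"]), j)
            for j, t in enumerate(tgt)
        ]
        if not scored:
            continue
        best, j = max(scored)
        if best >= threshold:
            pairs.append((i, j))
    return pairs

def f_measure(predicted, gold):
    """Harmonic mean of precision and recall over aligned index pairs."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Toy data: two English subtitle segments and two Korean ones.
english = [{"span": (0.0, 2.0), "vec": [1.0, 0.0]},
           {"span": (2.0, 5.0), "vec": [0.0, 1.0]}]
korean = [{"span": (0.1, 2.1), "vec": [0.9, 0.1]},
          {"span": (2.1, 5.0), "vec": [0.1, 0.9]}]
pairs = align(english, korean)
```

In practice the vectors would come from a bilingual sentence encoder, and a one-to-many alignment would likely be needed where subtitles split sentences differently in the two languages.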

Development of Korean-to-English and English-to-Korean Mobile Translator for Smartphone (스마트폰용 영한, 한영 모바일 번역기 개발)

  • Yuh, Sang-Hwa; Chae, Heung-Seok
    • Journal of the Korea Society of Computer and Information / v.16 no.3 / pp.229-236 / 2011
  • In this paper, we present lightweight English-to-Korean and Korean-to-English mobile translators for smartphones. For natural translation and higher translation quality, the translation engines combine Translation Memory (TM) with a rule-based translation engine. To maximize usability, we added an Optical Character Recognition (OCR) engine as the front end and a Text-to-Speech (TTS) engine as the back end of the mobile translators. Measured with the BLEU and NIST evaluation metrics, the translation quality of our E-K and K-E mobile translators reaches 72.4% and 77.7% of that of Google's translators, respectively. This shows that the quality of our mobile translators approaches that of server-based machine translation, which indicates their commercial usefulness.

Some Notational Problems of the translation of Japanese stops [k, t] and affricates [ts, tʃ] into Korean (일본어 파열음 [k, t]과 파찰음 [ts, tʃ]의 국어 표기상의 문제점)

  • Lee, Young-Hee
    • Proceedings of the KSPS conference / 2007.05a / pp.187-192 / 2007
  • The purpose of this paper is to show that the current notation of Japanese proper names in Korean has some problems: it cannot represent the difference between voiced and voiceless sounds. The paper also proposes a more accurate notation that is coherent and efficient. After introducing some general background on the phonemes of Japanese, I measured the Voice Onset Time of the stops [k, t] at the beginning, in the middle, and at the end of a word, and compared the spectrograms of affricates with those of fricatives. In conclusion, Japanese voiceless [k, t, tʃ] should be written as [ㅋ, ㅌ, ㅊ], voiced [g, d, dʒ] as [ㄱ, ㄷ, ㅈ], and the affricate [ts] as [ㅊ] in Korean.

Visual Voice Activity Detection and Adaptive Threshold Estimation for Speech Recognition (음성인식기 성능 향상을 위한 영상기반 음성구간 검출 및 적응적 문턱값 추정)

  • Song, Taeyup; Lee, Kyungsun; Kim, Sung Soo; Lee, Jae-Won; Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.34 no.4 / pp.321-327 / 2015
  • In this paper, we propose an algorithm for robust Visual Voice Activity Detection (VVAD) for enhanced speech recognition. In conventional VVAD algorithms, the motion of the lip region is found by applying optical flow or chaos-inspired measures to detect visual speech frames. Optical flow-based VVAD is difficult to adopt in driving scenarios because of its computational complexity. While invariant to illumination changes, the chaos theory-based VVAD method is sensitive to translational motion caused by the driver's head movements. The proposed Local Variance Histogram (LVH) is robust to pixel intensity changes caused by both illumination and translation. For improved performance under environmental changes, we also adopt a novel threshold estimation method based on the change in total variance. Experimental results show that the proposed VVAD algorithm is robust in various driving situations.
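
The abstract does not define the Local Variance Histogram, so the following is only one plausible reading, not the authors' formulation: compute the intensity variance inside each small tile of a frame, then histogram those variances. It illustrates why such a feature can tolerate a uniform brightness offset (variance ignores additive shifts) and a change of position (a histogram discards where each tile was). The tile size, bin count, `max_var`, and function names are hypothetical.

```python
def block_variances(frame, block=2):
    """Variance of pixel intensities in each non-overlapping block x block tile."""
    height, width = len(frame), len(frame[0])
    variances = []
    for r in range(0, height - block + 1, block):
        for c in range(0, width - block + 1, block):
            vals = [frame[r + dr][c + dc]
                    for dr in range(block) for dc in range(block)]
            mean = sum(vals) / len(vals)
            variances.append(sum((v - mean) ** 2 for v in vals) / len(vals))
    return variances

def variance_histogram(frame, block=2, bins=4, max_var=100.0):
    """Histogram of tile variances; values above max_var fall in the last bin."""
    hist = [0] * bins
    for v in block_variances(frame, block):
        idx = min(int(v / max_var * bins), bins - 1)
        hist[idx] += 1
    return hist

# Toy 4x4 grayscale frame with one textured tile (top right).
frame = [[10, 10, 50, 60],
         [10, 10, 70, 80],
         [10, 10, 10, 10],
         [10, 10, 10, 10]]

brighter = [[p + 20 for p in row] for row in frame]  # uniform illumination offset
shifted = frame[2:] + frame[:2]                      # textured tile moved down

base_hist = variance_histogram(frame)
```

Here `variance_histogram(brighter)` and `variance_histogram(shifted)` both equal `base_hist`. A real detector would compare such histograms across consecutive lip-region frames and apply a threshold, which the paper estimates adaptively.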

A Study on Verification of Back TranScription(BTS)-based Data Construction (Back TranScription(BTS)기반 데이터 구축 검증 연구)

  • Park, Chanjun; Seo, Jaehyung; Lee, Seolhwa; Moon, Hyeonseok; Eo, Sugyeong; Lim, Heuiseok
    • Journal of the Korea Convergence Society / v.12 no.11 / pp.109-117 / 2021
  • Recently, speech-based interfaces have been increasingly used for human-computer interaction (HCI). Accordingly, interest in post-processors that correct errors in speech recognition results is also increasing. However, building a sequence-to-sequence (S2S) speech recognition post-processor requires a great deal of human labor for data construction. To alleviate this limitation of existing construction methodologies, a new data construction method called Back TranScription (BTS) was proposed. BTS combines TTS and STT technology to create a pseudo-parallel corpus. This methodology eliminates the need for a phonetic transcriber and can automatically generate vast amounts of training data, reducing cost. Extending the existing BTS research, this paper verifies through experiments that data should be constructed with text style and domain in mind rather than without any criteria.
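
The BTS round trip described above (text to speech, then speech back to text) can be sketched as a small pipeline. The `tts` and `stt` callables are placeholders that a caller would supply; the toy stand-ins below are hypothetical and only imitate recognition errors by dropping case and commas. No real TTS or STT library API is assumed.

```python
from typing import Callable, Iterable, List, Tuple

def back_transcribe(
    sentences: Iterable[str],
    tts: Callable[[str], bytes],
    stt: Callable[[bytes], str],
) -> List[Tuple[str, str]]:
    """Build (noisy_source, clean_target) pairs for a post-processor.

    Each clean sentence is synthesized to audio and transcribed back; the
    transcription carries recognizer-style errors, and the original sentence
    serves as the correction target.
    """
    pairs = []
    for gold in sentences:
        audio = tts(gold)        # clean text -> synthetic speech
        hypothesis = stt(audio)  # synthetic speech -> noisy transcript
        pairs.append((hypothesis, gold))
    return pairs

# Hypothetical stand-ins so the sketch runs without any speech engine.
def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")

def toy_stt(audio: bytes) -> str:
    return audio.decode("utf-8").lower().replace(",", "")

corpus = back_transcribe(["Hello, world", "Good morning"], toy_tts, toy_stt)
```

Following the paper's conclusion, the input sentences would be selected by text style and domain rather than sampled without criteria.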

Syllabus Design and Pronunciation Teaching

  • Amakawa, Yukiko
    • Proceedings of the KSPS conference / 2000.07a / pp.235-240 / 2000
  • In the age of global communication, more human exchange takes place at the grass-roots level. In the old days, language policy and language planning were based on one nation-state with one language. But high waves of globalization have extended the flow of human exchange beyond national borders on a daily basis. Under such circumstances, homogeneity in Japan may no longer allow Japanese people to speak and communicate only in Japanese and only with Japanese people. In Japan, an advisory report was made to the Ministry of Education in June 1996 about what education should be like in the 21st century. In this report, the introduction of English at public elementary schools was proposed for the first time, and a basic policy of English instruction at the elementary school level was revealed. Under this policy, English instruction is not required at the elementary school level, but each school may choose to introduce English into its curriculum starting April 2002. Colin Baker (1996) indicates the age of three as the threshold dividing a child becoming bilingual naturally from becoming bilingual by formal instruction. There is a movement towards making second language acquisition more naturalistic in an educational setting, developing communicative competence in a more or less formal way. Drawing on the lesson of the Canadian immersion success, Genesee (1987) stresses the importance of early language instruction. From a psycholinguistic perspective, it is clear that most children acquire basic communication skills in their first language apparently effortlessly and without systematic and formal instruction during the first six or seven years of life. This innate capacity diminishes with age, thereby making language learning increasingly difficult. The author, being a returnee, experienced considerable difficulty acquiring L2, and especially achieving native-like competence.
There will be many hurdles to overcome before Japanese students are able to reach at least a communicative level in English. It has been said that English is taught not to pass the college entrance examination but to communicate. However, the Japanese college entrance examination still makes students focus more on the grammar-translation method. This is expected to shift to a more communication-oriented approach. Japan does not have to aim at becoming an officially bilingual country, but at least communicative English should be taught at every level in school. Mito College is a small two-year co-ed college in Japan. Students at Mito College are generally not good at English. The college has only one department, for business and economics, and English is required for all freshmen. It is necessary for me to make my classes enjoyable and attractive so that students can at least be motivated to learn English. My major target is communicative English, so that students may be prepared to use English in various business settings. As an experiment to introduce more communicative English, the author has designed the following syllabus. This program aims at training students to speak and enjoy English. A 90-minute class (only one 90-minute session per week is most common in Japanese colleges) is divided into two halves. The first half trains students orally using the Graded Direct Method. The second half uses different materials each time so that students can learn and enjoy English culture and language simultaneously. There are no quizzes or examinations in my one-academic-year program. However, all students are required to write an original English poem by the end of the spring semester. Two to six students work together in a group on one poem. Students coming to Mito College have among the lowest English levels in all of Japan. However, an attached example of a poem written by one group shows that students can improve their creativity as long as they are kept encouraged.
At the end of the fall semester, all students are then required to give an original 3-minute English speech individually. An example from that speech contest will be presented at the Convention in Seoul.
