Search | Korea Science

English-Korean speech translation corpus (EnKoST-C): Construction procedure and evaluation results

Jeong-Uk Bang;Joon-Gyu Maeng;Jun Park;Seung Yun;Sang-Hun Kim
- ETRI Journal / v.45 no.1 / pp.18-27 / 2023
We present an English-Korean speech translation corpus named EnKoST-C. End-to-end model training for speech translation often suffers from a lack of parallel data, that is, speech in the source language paired with equivalent text in the target language. Most publicly available speech translation corpora were developed for European languages, and there is currently no public corpus for English-Korean end-to-end speech translation. We therefore built EnKoST-C from TED Talks. In the process, we enhanced the sentence alignment approach using subtitle time information and bilingual sentence embeddings. The result is a 559-hour English-Korean speech translation corpus. The proposed sentence alignment approach achieved an F-measure of 0.96. We also report the baseline performance of an English-Korean speech translation model trained on EnKoST-C. EnKoST-C is freely available on a Korean government open data hub site.
https://doi.org/10.4218/etrij.2021-0336
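The abstract above mentions aligning sentences with subtitle time information and bilingual sentence embeddings, and evaluating the alignment with an F-measure. The following is only a minimal sketch of how two such signals could be combined and scored; the function names, the linear weighting, and the `weight` parameter are assumptions made for illustration and are not taken from the paper.

```python
import math

def time_overlap(a, b):
    """Overlap ratio of two (start, end) subtitle intervals, in seconds."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    union = max(a[1], b[1]) - min(a[0], b[0])
    return max(inter, 0.0) / union if union > 0 else 0.0

def cosine(u, v):
    """Cosine similarity of two sentence embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def alignment_score(src, tgt, weight=0.5):
    """Hypothetical score: weighted sum of time overlap and embedding similarity."""
    return (weight * time_overlap(src["span"], tgt["span"])
            + (1 - weight) * cosine(src["vec"], tgt["vec"]))

def f_measure(predicted, reference):
    """F-measure of predicted alignment pairs against reference pairs."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(reference)
    return 2 * precision * recall / (precision + recall)
```

A candidate pair would be accepted when `alignment_score` exceeds some threshold; how the threshold and weight are chosen is left open here.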

A Survey of Machine Translation and Parts of Speech Tagging for Indian Languages

Khedkar, Vijayshri;Shah, Pritesh
- International Journal of Computer Science & Network Security / v.22 no.4 / pp.245-253 / 2022
Begun by IBM in 1954, machine translation has expanded immensely, particularly in recent years. Machine translation can be broken into seven main steps: token generation, morphological analysis, lexeme identification, part-of-speech tagging, chunking, parsing, and word disambiguation. Morphological analysis plays a major role in translating Indian languages, since it supports accurate part-of-speech taggers and word sense disambiguation. The paper presents the machine translation methods that different researchers have used for Indian languages, along with their performance and drawbacks. It then concentrates on part-of-speech (POS) tagging for Marathi using methods such as rule-based tagging, unigram, bigram, and more. The study concludes that part-of-speech tagging is a major step in machine translation, and that for the Marathi language the Hidden Markov Model gives the best part-of-speech tagging results, with an accuracy of 93%, which may improve further depending on the dataset.
https://doi.org/10.22937/IJCSNS.2022.22.4.31 KSCI
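The abstract above reports that a Hidden Markov Model gave the best POS tagging results. As a rough illustration of how an HMM tagger decodes a sentence, here is a minimal Viterbi sketch. The tag set, the English toy words, and every probability below are invented for the example and do not come from the paper or from any Marathi dataset.

```python
def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-8):
    """Return the most probable tag sequence for `words` under a toy HMM."""
    if not words:
        return []
    # best[i][t] = (probability of the best path ending in tag t at word i, previous tag)
    best = [{t: (start_p.get(t, floor) * emit_p[t].get(words[0], floor), None)
             for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            column[t] = max(
                (best[-1][p][0] * trans_p[p].get(t, floor) * emit_p[t].get(word, floor), p)
                for p in tags
            )
        best.append(column)
    # Backtrace from the most probable final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return list(reversed(path))

# Toy model: all numbers are assumptions made for illustration.
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.7, "VERB": 0.3}
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.6, "VERB": 0.4}}
EMIT = {"NOUN": {"dogs": 0.6, "bark": 0.1}, "VERB": {"dogs": 0.05, "bark": 0.7}}

print(viterbi(["dogs", "bark"], TAGS, START, TRANS, EMIT))
```

In practice the start, transition, and emission probabilities would be estimated from a tagged corpus rather than written by hand.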

A new approach technique on Speech-to-Speech Translation (신호의 복원된 위상 공간을 이용한 오디오 상황 인지)

Le, Thanh Hien;Lee, Sung-young;Lee, Young-Koo
- Proceedings of the Korea Information Processing Society Conference / 2009.11a / pp.239-240 / 2009
We live in a flat world in which globalization fosters communication, travel, and trade among more than 150 countries and thousands of languages. To surmount the barriers among these languages, translation is required; speech-to-speech translation automates the process. Thanks to recent advances in automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS), a system can now translate speech in a source language into speech in a target language, and vice versa, at affordable cost. The process has three phases: the source speech is first transcribed into (a set of) source-language text (ASR), the source text is then translated into target text (MT), and finally the target speech is synthesized from the target text (TTS).
https://doi.org/10.3745/PKIPS.y2009m11a.239
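The three-phase process described in the abstract above (ASR, then MT, then TTS) can be sketched as a simple function composition. The stub components below are placeholders invented for the example; a real system would plug in actual recognition, translation, and synthesis engines.

```python
from typing import Callable

def make_cascade(
    asr: Callable[[bytes], str],
    mt: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Compose ASR, MT, and TTS into one speech-to-speech function."""
    def translate_speech(audio: bytes) -> bytes:
        source_text = asr(audio)      # phase 1: source speech -> source text
        target_text = mt(source_text) # phase 2: source text -> target text
        return tts(target_text)       # phase 3: target text -> target speech
    return translate_speech

# Hypothetical stubs: "audio" is just UTF-8 text so the flow can be traced.
def stub_asr(audio: bytes) -> str:
    return audio.decode("utf-8")

def stub_mt(text: str) -> str:
    return {"hello": "annyeonghaseyo"}.get(text, text)

def stub_tts(text: str) -> bytes:
    return text.encode("utf-8")

pipeline = make_cascade(stub_asr, stub_mt, stub_tts)
print(pipeline(b"hello"))
```

Keeping the three stages behind plain callables makes it easy to swap any one of them without touching the others.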

A Speech Translation System for Hotel Reservation (호텔예약을 위한 음성번역시스템)

구명완;김재인;박상규;김우성;장두성;홍영국;장경애;김응인;강용범
- The Journal of the Acoustical Society of Korea / v.15 no.4 / pp.24-31 / 1996
In this paper, we present KT-STS (Korea Telecom Speech Translation System), a speech translation system for hotel reservation. KT-STS is a speech-to-speech translation system that translates a spoken utterance in Korean into one in Japanese. The system is designed around the task of hotel reservation (dialogues between a Korean customer and a hotel reservation desk in Japan). It consists of a Korean speech recognition system, a Korean-to-Japanese machine translation system, and a Korean speech synthesis system. The Korean speech recognition system is an HMM (hidden Markov model)-based, speaker-independent, continuous speech recognizer with a vocabulary of about 300 words. A bigram language model serves as the forward language model, and a dependency grammar serves as the backward language model. For machine translation, we use dependency grammar and the direct transfer method. The Korean speech synthesizer uses demiphones as the synthesis unit, together with the method of periodic waveform analysis and reallocation. KT-STS runs in nearly real time on a SPARC20 workstation with one TMS320C30 DSP board. In speech recognition tests, we achieved a word recognition rate of 94.68% and a sentence recognition rate of 82.42%. In Korean-to-Japanese translation tests, we achieved a translation success rate of 100%. We also conducted an international joint experiment in which our system was connected over a leased line with another system developed by KDD in Japan.
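The abstract above states that a bigram language model is used as the forward language model of the recognizer. The sketch below shows one generic way a bigram model can be estimated and used to score word sequences. The add-one smoothing, the function names, and the English example sentences are assumptions for illustration; the paper does not describe its estimation method here.

```python
from collections import Counter
import math

def train_bigram(sentences):
    """Count histories and bigrams over sentences padded with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        histories.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
    vocab = {w for words in sentences for w in words} | {"</s>"}
    return histories, bigrams, len(vocab)

def log_prob(words, model):
    """Log-probability of a word sequence with add-one smoothing (an assumption)."""
    histories, bigrams, vocab_size = model
    padded = ["<s>"] + words + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (histories[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )

# Invented toy training data in the hotel reservation spirit.
model = train_bigram([["book", "a", "room"], ["book", "a", "table"]])
print(log_prob(["book", "a", "room"], model) > log_prob(["room", "a", "book"], model))
```

A recognizer would use such scores to prefer word sequences that resemble the training dialogues over acoustically similar but unlikely ones.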

Korean-Japanese Speech Translation System for Hotel Reservation - Korean front desk side - (한-일 호텔예약 음성번역 시스템 - 한국 프론트데스트 측 -)

이영직;김영섬;김회린;류준형;이정철;한남용;안영목;최운천;최운천
- Proceedings of the Acoustical Society of Korea Conference / 1995.06a / pp.204-207 / 1995
Recently, ETRI developed a Korean-Japanese speech translation system for the Korean front desk side of a hotel reservation task. The system consists of three sub-systems, responsible for speech recognition, machine translation, and speech synthesis, respectively. This paper introduces the background of the system development and describes the functions of the sub-systems.

Probabilistic Part-Of-Speech Determination for Efficient English-Korean Machine Translation (효율적 영한기계번역을 위한 확률적 품사결정)

Kim, Sung-Dong;Kim, Il-Min
- The KIPS Transactions:PartB / v.17B no.6 / pp.459-466 / 2010
Natural language processing faces several ambiguity problems, and English-Korean machine translation in particular must resolve them at each translation step. This paper focuses on resolving the part-of-speech ambiguity of English words in order to improve the efficiency of English analysis, as part of an effort to develop a practical English-Korean machine translation system. To be integrated into a machine translation system, part-of-speech determination must be fast and accurate. This paper proposes probabilistic models for part-of-speech determination, built from the Penn Treebank corpus. In experiments, we report the performance of the part-of-speech determination models and the efficiency improvement that the proposed method brings to the machine translation system.
https://doi.org/10.3745/KIPSTB.2010.17B.6.459 KSCI

A Study on the Korean Parts-of-Speech for Korean-English Machine Translation (기계번역용 한국어 품사에 관한 연구)

송재관;박찬곤
- Journal of the Korea Society of Computer and Information / v.5 no.4 / pp.48-54 / 2000
This paper classifies Korean parts of speech for Korean-English machine translation and investigates the morphological characteristics of each part of speech. Standard Korean grammar classifies parts of speech by semantic, functional, and formal characteristics. Having many rules makes the grammar structure and the part-of-speech classification difficult to understand, so preprocessing is needed in machine translation. This paper classifies Korean parts of speech by a single rule. The parts of speech proposed here have the same syntactic roles and the same parts of speech as an English dictionary, and they express the structure of Korean sentences. They also allow the target language to be generated by pattern matching in Korean-English translation.

A Model of English Part-Of-Speech Determination for English-Korean Machine Translation (영한 기계번역에서의 영어 품사결정 모델)

Kim, Sung-Dong;Park, Sung-Hoon
- Journal of Intelligence and Information Systems / v.15 no.3 / pp.53-65 / 2009
Part-of-speech determination is necessary for resolving part-of-speech ambiguity in English-Korean machine translation. Part-of-speech ambiguity causes high parsing complexity and makes accurate translation difficult. To solve the problem, the ambiguity must be resolved after lexical analysis and before parsing. This paper proposes the CatAmRes model, which resolves part-of-speech ambiguity, and compares its performance with that of other part-of-speech tagging methods. The CatAmRes model determines the part of speech using the probability distribution obtained from Bayesian network training together with statistical information, both based on the Penn Treebank corpus. The proposed model consists of a Calculator and a POSDeterminer. The Calculator computes the degree of appropriateness of each part of speech, and the POSDeterminer determines the part of speech of the word based on the calculated values. In the experiment, we measure the performance using sentences from the WSJ, Brown, and IBM corpora.
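The abstract above describes a two-part design: a Calculator that scores how appropriate each candidate part of speech is, and a POSDeterminer that picks one from those scores. The sketch below mirrors only that split. The scoring formula (lexical probability times a previous-tag context probability), the function names, and all numbers are assumptions for illustration; the actual CatAmRes model relies on Bayesian network training, which is not reproduced here.

```python
def calculate(word, prev_tag, lexical_p, context_p, floor=1e-6):
    """Hypothetical Calculator: appropriateness score for each candidate tag."""
    return {
        tag: p * context_p.get((prev_tag, tag), floor)
        for tag, p in lexical_p[word].items()
    }

def determine(scores):
    """Hypothetical POSDeterminer: choose the tag with the highest score."""
    return max(scores, key=scores.get)

# Invented toy probabilities using Penn Treebank style tag names.
LEXICAL = {"book": {"NN": 0.8, "VB": 0.2}}
CONTEXT = {
    ("TO", "VB"): 0.9, ("TO", "NN"): 0.05,
    ("DT", "NN"): 0.9, ("DT", "VB"): 0.01,
}

print(determine(calculate("book", "TO", LEXICAL, CONTEXT)))  # after "to"
print(determine(calculate("book", "DT", LEXICAL, CONTEXT)))  # after "the"
```

Separating scoring from selection means the decision rule can change (for example, keeping the top two tags for the parser) without retraining the scores.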

A New Morphological Analysis for the Spoken Language Translation System (음성언어 번역 시스템을 위한 새로운 형태소 분석)

양승원;김재훈
- The Journal of the Acoustical Society of Korea / v.18 no.4 / pp.17-22 / 1999
It is difficult to integrate speech processing systems and a machine translation system into a spoken language translation system because each system uses its own data and basic processing unit. We therefore need a common I/O unit that is used throughout the whole system. In this paper, we propose the pseudo-morpheme as the interface between the speech processing systems and the language translation system, and we implement a morphological analysis system for pseudo-morphemes. A speech processing system using the pseudo-morpheme obtains better results than systems using the phrase or the general morpheme, so the quality of the whole spoken language translation system can be improved. The analysis ratio of our implemented system is 98.9%, which is similar to that of common morphological analysis systems.

Effects of Name Agreement and Word Frequency on the English-Korean Word Translation Task (영어-한국어 단어번역과제에서 이름-일치도와 단어빈도의 효과)

Koo, Min-Mo;Nam, Ki-Chun
- MALSORI / no.61 / pp.31-48 / 2007
This study investigated the roles of name agreement and word frequency in the English-Korean word translation task. Using low-frequency homonyms with low name agreement as stimuli, Experiment 1 revealed that the name agreement of the materials is a determinant that can modulate the time taken to translate English words into their Korean equivalents. In contrast, Experiment 2, which used low-frequency homonyms with high name agreement as stimuli, showed that the name agreement of the materials does not play a decisive role in the translation task. In Experiment 3, we confirmed that the frequency effects observed in the previous two experiments do arise during lexical access. Our findings suggest that the word frequencies of the materials strongly influence English-Korean word translation times, and that homonyms are represented independently of each other at the lexeme level.
