http://dx.doi.org/10.15207/JKCS.2021.12.11.109

A Study on Verification of Back TranScription (BTS)-based Data Construction

Park, Chanjun (Department of Computer Science and Engineering, Korea University)
Seo, Jaehyung (Department of Computer Science and Engineering, Korea University)
Lee, Seolhwa (Department of Computer Science and Engineering, Korea University)
Moon, Hyeonseok (Department of Computer Science and Engineering, Korea University)
Eo, Sugyeong (Department of Computer Science and Engineering, Korea University)
Lim, Heuiseok (Department of Computer Science and Engineering, Korea University)
Publication Information
Journal of the Korea Convergence Society / v.12, no.11, 2021, pp. 109-117
Abstract
Recently, the use of speech-based interfaces is increasing as a means of human-computer interaction (HCI). Accordingly, interest in post-processors that correct errors in speech recognition results is also growing. However, building a sequence-to-sequence (S2S) based speech recognition post-processor requires a great deal of human labor for data construction. To alleviate this limitation of the existing construction methodology, a new data construction method called Back TranScription (BTS) was proposed. BTS refers to a technique that combines text-to-speech (TTS) and speech-to-text (STT) technology to create a pseudo parallel corpus: a gold sentence is synthesized into speech, re-transcribed by a recognizer, and the noisy transcription is paired with the original text. This methodology eliminates the role of the phonetic transcriptor and can automatically generate vast amounts of training data, reducing construction cost. Extending the existing BTS research, this paper verifies through experiments that data should be constructed in consideration of text style and domain, rather than without any criteria.
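The pipeline can be sketched minimally as follows. This is an illustrative sketch only: the `tts` and `stt` callables stand in for whatever speech engines are used, and their names and signatures are assumptions, not details given in the paper.

```python
from typing import Callable, List, Tuple

def build_bts_corpus(
    sentences: List[str],
    tts: Callable[[str], bytes],   # text  -> synthesized audio (placeholder engine)
    stt: Callable[[bytes], str],   # audio -> recognized text   (placeholder engine)
) -> List[Tuple[str, str]]:
    """Build a pseudo parallel corpus for training an S2S post-processor.

    Each pair is (noisy STT output, original gold text): the post-processor
    learns to map ASR-style errors back to the intended sentence.
    """
    corpus: List[Tuple[str, str]] = []
    for gold in sentences:
        audio = tts(gold)    # synthesize the gold sentence into speech
        noisy = stt(audio)   # re-transcribe it, introducing recognition errors
        corpus.append((noisy, gold))
    return corpus
```

Because the gold sentence is known before synthesis, no human transcription step is needed, and corpus size is bounded only by TTS/STT throughput.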
Keywords
Machine translation; Back TranScription; Parallel corpus; Speech recognition; Deep learning; Language convergence;