• Title/Summary/Keyword: Multilingual Speech Recognition

Search Result 4, Processing Time 0.016 seconds

A Study on the Multilingual Speech Recognition using International Phonetic Language (IPA를 활용한 다국어 음성 인식에 관한 연구)

  • Kim, Suk-Dong;Kim, Woo-Sung;Woo, In-Sung
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.12 no.7
    • /
    • pp.3267-3274
    • /
    • 2011
  • Recently, speech recognition technology has dramatically developed, with the increase in the user environment of various mobile devices and influence of a variety of speech recognition software. However, for speech recognition for multi-language, lack of understanding of multi-language lexical model and limited capacity of systems interfere with the improvement of the recognition rate. It is not easy to embody speech expressed with multi-language into a single acoustic model and systems using several acoustic models lower speech recognition rate. In this regard, it is necessary to research and develop a multi-language speech recognition system in order to embody speech comprised of various languages into a single acoustic model. This paper studied a system that can recognize Korean and English as International Phonetic Language (IPA), based on the research for using a multi-language acoustic model in mobile devices. Focusing on finding an IPA model which satisfies both Korean and English phonemes, we get 94.8% of the voice recognition rate in Korean and 95.36% in English.

Building robust Korean speech recognition model by fine-tuning large pretrained model (대형 사전훈련 모델의 파인튜닝을 통한 강건한 한국어 음성인식 모델 구축)

  • Changhan Oh;Cheongbin Kim;Kiyoung Park
    • Phonetics and Speech Sciences
    • /
    • v.15 no.3
    • /
    • pp.75-82
    • /
    • 2023
  • Automatic speech recognition (ASR) has been revolutionized with deep learning-based approaches, among which self-supervised learning methods have proven to be particularly effective. In this study, we aim to enhance the performance of OpenAI's Whisper model, a multilingual ASR system on the Korean language. Whisper was pretrained on a large corpus (around 680,000 hours) of web speech data and has demonstrated strong recognition performance for major languages. However, it faces challenges in recognizing languages such as Korean, which is not major language while training. We address this issue by fine-tuning the Whisper model with an additional dataset comprising about 1,000 hours of Korean speech. We also compare its performance against a Transformer model that was trained from scratch using the same dataset. Our results indicate that fine-tuning the Whisper model significantly improved its Korean speech recognition capabilities in terms of character error rate (CER). Specifically, the performance improved with increasing model size. However, the Whisper model's performance on English deteriorated post fine-tuning, emphasizing the need for further research to develop robust multilingual models. Our study demonstrates the potential of utilizing a fine-tuned Whisper model for Korean ASR applications. Future work will focus on multilingual recognition and optimization for real-time inference.

A Study on the Multilingual Speech Recognition for On-line International Game (온라인 다국적 게임을 위한 다국어 혼합 음성 인식에 관한 연구)

  • Kim, Suk-Dong;Kang, Heung-Soon;Woo, In-Sung;Shin, Chwa-Cheul;Yoon, Chun-Duk
    • Journal of Korea Game Society
    • /
    • v.8 no.4
    • /
    • pp.107-114
    • /
    • 2008
  • The requests for speech-recognition for multi-language in field of game and the necessity of multi-language system, which expresses one phonetic model from many different kind of language phonetics, has been increased in field of game industry. Here upon, the research regarding development of multi-national language system which can express speeches, that is consist of various different languages, into only one lexical model is needed. In this paper is basic research for establishing integrated system from multi-language lexical model, and it shows the system which recognize Korean and English speeches into IPA(International Phonetic Alphabet). We focused on finding the IPA model which is satisfied with Korean and English phoneme one simutaneously. As a result, we could get the 90.62% of Korean speech-recognition rate, also 91.71% of English speech-recognition rate.

  • PDF

A Study on the Continuous Speech Recognition for the Automatic Creation of International Phonetics (국제 음소의 자동 생성을 활용한 연속음성인식에 관한 연구)

  • Kim, Suk-Dong;Hong, Seong-Soo;Shin, Chwa-Cheul;Woo, In-Sung;Kang, Heung-Soon
    • Journal of Korea Game Society
    • /
    • v.7 no.2
    • /
    • pp.83-90
    • /
    • 2007
  • One result of the trend towards globalization is an increased number of projects that focus on natural language processing. Automatic speech recognition (ASR) technologies, for example, hold great promise in facilitating global communications and collaborations. Unfortunately, to date, most research projects focus on single widely spoken languages. Therefore, the cost to adapt a particular ASR tool for use with other languages is often prohibitive. This work takes a more general approach. We propose an International Phoneticizing Engine (IPE) that interprets input files supplied in our Phonetic Language Identity (PLI) format to build a dictionary. IPE is language independent and rule based. It operates by decomposing the dictionary creation process into a set of well-defined steps. These steps reduce rule conflicts, allow for rule creation by people without linguistics training, and optimize run-time efficiency. Dictionaries created by the IPE can be used with the speech recognition system. IPE defines an easy-to-use systematic approach that can obtained 92.55% for the recognition rate of Korean speech and 89.93% for English.

  • PDF