• Title/Summary/Keyword: Whisper

Building robust Korean speech recognition model by fine-tuning large pretrained model (대형 사전훈련 모델의 파인튜닝을 통한 강건한 한국어 음성인식 모델 구축)

  • Changhan Oh;Cheongbin Kim;Kiyoung Park
    • Phonetics and Speech Sciences / v.15 no.3 / pp.75-82 / 2023
  • Automatic speech recognition (ASR) has been revolutionized by deep learning-based approaches, among which self-supervised learning methods have proven particularly effective. In this study, we aim to improve the performance of OpenAI's Whisper model, a multilingual ASR system, on Korean. Whisper was pretrained on a large corpus (around 680,000 hours) of web speech data and has demonstrated strong recognition performance for major languages. However, it struggles with languages such as Korean, which were not major languages in its training data. We address this issue by fine-tuning the Whisper model on an additional dataset of about 1,000 hours of Korean speech. We also compare its performance against a Transformer model trained from scratch on the same dataset. Our results indicate that fine-tuning significantly improved Whisper's Korean speech recognition in terms of character error rate (CER), and that the improvement grew with model size. However, the model's performance on English deteriorated after fine-tuning, which underscores the need for further research on robust multilingual models. Our study demonstrates the potential of a fine-tuned Whisper model for Korean ASR applications. Future work will focus on multilingual recognition and optimization for real-time inference.

Madness, the Smile, and Transnational Connections in "A Whisper in the Dark"

  • Jin, Seongeun
    • American Studies / v.44 no.1 / pp.137-154 / 2021
  • Because of the success of her novel Little Women (1869), Louisa May Alcott is generally known as a writer of sentimental fiction. However, her thrillers demonstrate her keen insights into domestic and international issues. Alcott's so-called "left hand" shows her stances on political and historical issues in America as well as in Europe and Asia. In particular, Alcott's support for women against social prejudice is metaphorically portrayed in "A Whisper in the Dark" (1861). In the story, Alcott also displays her knowledge of the drug trade and of the cultural effects of white male colonizers exploiting other peoples and countries around the globe, issues that she had learned about from neighboring intellectuals and newspapers. In this paper, I examine Alcott's radical views on gender equality, chauvinistic attitudes, and transnational politics in the mid-nineteenth century.

Development of Learning Program using Chinese Whispers Game (Broken Telephone Game) for Systematic Assessment and Reporting of Patients and Exploration on Learners' Experiences (속삭임게임을 활용한 체계적 환자사정 및 보고 교육프로그램의 개발 및 학습자 경험탐색)

  • Jung, Hyun-Jung
    • Journal of Korea Entertainment Industry Association / v.13 no.6 / pp.143-153 / 2019
  • Recognizing a patient's deterioration in time to save lives depends on sound patient assessment and reporting, yet this task is mainly delegated to nursing students or inexperienced nurses. In a whisper game, the first player selects a word, phrase, or sentence and whispers it to a team member, who passes it on; at the end, the team checks how much the original message changed during transmission. The purpose of this study was to develop a whisper game program in which learners relay information about the children shown on the DVD used in the pediatric advanced life support course. After four rounds of the game, the experiences of 31 fourth-year nursing students were explored by analyzing their reflective journals. The results showed three themes: learning motivation, metacognitive ability, and situated contextual learning. Repeated practice through the whisper game was identified as a fresh and interesting learning method that lets nursing students reflect metacognitively on the process of assessing patients and conveying information in context, and it is therefore expected to be widely used.

Exploring the feasibility of fine-tuning large-scale speech recognition models for domain-specific applications: A case study on Whisper model and KsponSpeech dataset

  • Jungwon Chang;Hosung Nam
    • Phonetics and Speech Sciences / v.15 no.3 / pp.83-88 / 2023
  • This study investigates the fine-tuning of large-scale Automatic Speech Recognition (ASR) models, specifically OpenAI's Whisper model, for domain-specific applications using the KsponSpeech dataset. The primary research questions address the effectiveness of targeted lexical item emphasis during fine-tuning, its impact on domain-specific performance, and whether the fine-tuned model can maintain generalization capabilities across different languages and environments. Experiments were conducted using two fine-tuning datasets: Set A, a small subset emphasizing specific lexical items, and Set B, consisting of the entire KsponSpeech dataset. Results showed that fine-tuning with targeted lexical items increased recognition accuracy and improved domain-specific performance, with generalization capabilities maintained when fine-tuned with a smaller dataset. For noisier environments, a trade-off between specificity and generalization capabilities was observed. This study highlights the potential of fine-tuning using minimal domain-specific data to achieve satisfactory results, emphasizing the importance of balancing specialization and generalization for ASR models. Future research could explore different fine-tuning strategies and novel technologies such as prompting to further enhance large-scale ASR models' domain-specific performance.

The Effects of the Methods of Disguised Voice on the Aural Decision (위장 발화 방법의 차이가 청취 판단에 미치는 영향)

  • Song Min-Chang;Shin Jiyoung;Kang SunMee
    • MALSORI / no.46 / pp.25-35 / 2003
  • This study deals with disguised voice (or voice disguise) in the field of forensic phonetics. In particular, we studied how the method of voice disguise affects aural decisions. Within the area of nonelectronic, deliberate voice disguise, the methods include lowered pitch, pinched nostrils, falsetto, and whisper. Ten Seoul speakers (5 male, 5 female) recorded 16 sentences. In the aural test, 30 subjects listened to normal and disguised voices and were asked to decide whether or not the speakers were the same. The results showed that speaker verification was more difficult for falsetto and whisper than for lowered pitch and pinched nostrils.

Digital enhancement of pronunciation assessment: Automated speech recognition and human raters

  • Miran Kim
    • Phonetics and Speech Sciences / v.15 no.2 / pp.13-20 / 2023
  • This study explores the potential of automated speech recognition (ASR) in assessing English learners' pronunciation. We employed ASR technology, acknowledged for its impartiality and consistent results, to analyze speech audio files, including synthesized speech, both native-like English and Korean-accented English, and speech recordings from a native English speaker. Through this analysis, we establish baseline values for the word error rate (WER). These were then compared with those obtained for human raters in perception experiments that assessed the speech productions of 30 first-year college students before and after taking a pronunciation course. Our sub-group analyses revealed positive training effects for Whisper, an ASR tool, and human raters, and identified distinct human rater strategies in different assessment aspects, such as proficiency, intelligibility, accuracy, and comprehensibility, that were not observed in ASR. Despite such challenges as recognizing accented speech traits, our findings suggest that digital tools such as ASR can streamline the pronunciation assessment process. With ongoing advancements in ASR technology, its potential as not only an assessment aid but also a self-directed learning tool for pronunciation feedback merits further exploration.

A Study of Data Augmentation and Auto Speech Recognition for the Elderly (한국어 노인 음성 데이터 증강 및 인식 연구)

  • Keon Hee Kim;Seoyoon Park;Hansaem Kim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.56-60 / 2023
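  • Conventional speech recognition has focused on young and middle-aged adults, but as population aging accelerates, the need for research on elderly speech is growing. However, elderly speech datasets are not yet as plentiful as those for young and middle-aged adults. To help address this shortage, this study investigates a methodology for augmenting scarce elderly speech data. To this end, we analyzed the features of elderly speech and augmented the data by synthesizing the "frequency" and "speech rate" features onto general adult speech. We then fine-tuned the Whisper small model and measured the character error rate (CER) on elderly speech, finding that using the augmented dataset together with the existing elderly dataset was the most effective approach.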

Development and Utilization of Speech Recognition Service for Ship Radio Communication (선박무선통신 음성인식 서비스 개발 및 활용)

  • Kwang-Il Kim;Sang-Lok Yoo
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2023.11a / pp.236-237 / 2023
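  • Ship radio communication equipment is essential for exchanging the safety information, vessel traffic monitoring and control information, and port arrival and departure information that ships need for navigation, so ship officers must always listen carefully to radio communications. In this study, we collected and trained on 500 hours of actual ship voice communication data, and used the Wav2Vec and Whisper models to develop and put into practical use a speech recognition model for Korean and English (maritime English). The model's performance was 94.5% on a character error rate (CER) basis, and it is expected to be applicable to various fields related to ship operation in the future.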

Design of Interrogator for Airspace Surveillance in Multilateration Systems (항공용 다변측정 감시시스템 적용을 위한 질문기 설계)

  • Koh, Young-Mok;Kim, Su-Hong
    • Journal of Advanced Navigation Technology / v.19 no.2 / pp.108-115 / 2015
  • Multilateration systems are used to provide the position of aircraft in flight or on airport runways. In a multilateration system, the interrogator is an important transmitter used to interrogate aircraft in the surveillance airspace according to an appropriate scenario. The Whisper-Shout interrogation sequence, one of the key functions of the interrogator, can control airport traffic density when intruder aircraft enter the surveillance airspace. As a result, the chance of collision between aircraft can be reduced, and the multilateration system can obtain a highly accurate location for each incoming aircraft. In this paper, we developed an interrogator that can transmit Mode A/C and Mode S interrogations, similar to an existing secondary surveillance radar. With an appropriately controlled Whisper-Shout sequence in the interrogator, multilateration systems can avoid the synchronous garbling and FRUIT phenomena caused by receiving multiple responses from a number of aircraft.

Invisible Messenger: A System to Whisper in a Person's Ear Remotely by integrating Visual Tracking and Speaker Array

  • Mizoguchi, Hiroshi;Kanamori, Tomohiko;Okabe, Kosuke;Hiraoka, Kazuyuki;Tanaka, Masaru;Shigehara, Takaomi;Mishima, Taketoshi
    • Proceedings of the IEEK Conference / 2002.07c / pp.1897-1900 / 2002
  • This paper proposes a novel computer-human interface named Invisible Messenger. It integrates face detection and tracking with speaker array signal processing. With the speaker array, an acoustic focus can be formed at an arbitrary location measured by the face tracking. The proposed system can thus whisper in a person's ear as if an invisible virtual messenger were standing beside them. Going beyond speculative discussion, the authors have implemented a working prototype based on the proposed idea, which this paper also describes. To confirm the effectiveness of the idea, the authors conducted experiments using the implemented system. The experimental results demonstrate the effectiveness of the proposed idea.
