Search | Korea Science

Exploring the feasibility of fine-tuning large-scale speech recognition models for domain-specific applications: A case study on Whisper model and KsponSpeech dataset

Jungwon Chang;Hosung Nam
- Phonetics and Speech Sciences / v.15 no.3 / pp.83-88 / 2023
This study investigates the fine-tuning of large-scale Automatic Speech Recognition (ASR) models, specifically OpenAI's Whisper model, for domain-specific applications using the KsponSpeech dataset. The primary research questions address the effectiveness of targeted lexical item emphasis during fine-tuning, its impact on domain-specific performance, and whether the fine-tuned model can maintain generalization capabilities across different languages and environments. Experiments were conducted using two fine-tuning datasets: Set A, a small subset emphasizing specific lexical items, and Set B, consisting of the entire KsponSpeech dataset. Results showed that fine-tuning with targeted lexical items increased recognition accuracy and improved domain-specific performance, with generalization capabilities maintained when fine-tuned with a smaller dataset. For noisier environments, a trade-off between specificity and generalization capabilities was observed. This study highlights the potential of fine-tuning using minimal domain-specific data to achieve satisfactory results, emphasizing the importance of balancing specialization and generalization for ASR models. Future research could explore different fine-tuning strategies and novel technologies such as prompting to further enhance large-scale ASR models' domain-specific performance.
https://doi.org/10.13064/KSSS.2023.15.3.083

Fine-tuning Neural Network for Improving Video Classification Performance Using Vision Transformer (Vision Transformer를 활용한 비디오 분류 성능 향상을 위한 Fine-tuning 신경망)

Kwang-Yeob Lee;Ji-Won Lee;Tae-Ryong Park
- Journal of IKEEE / v.27 no.3 / pp.313-318 / 2023
This paper proposes a fine-tuned neural network to improve the performance of Vision Transformer-based video classification. The need for real-time, deep-learning-based video analysis has recently grown. Existing CNN models used for image classification have difficulty capturing the relationship between consecutive frames. We compare and analyze Vision Transformer and Non-local neural network models, both of which use an attention mechanism, to identify the most suitable model. In addition, we propose an optimal fine-tuning neural network model by applying various fine-tuning methods as a form of transfer learning. In the experiments, the model was trained on the UCF101 dataset, and its performance was then verified by applying transfer learning to the UTA-RLDD dataset.
https://doi.org/10.7471/ikeee.2023.27.3.313

A Study on Efficiency Improvement by Fine Tuning of Power Plant Control (제어시스템 튜닝에 의한 발전소 효율향상에 관한 연구)

Kim, Ho-Yol;Kim, Byoung-Chul;Byun, Seung-Hyun
- The Transactions of The Korean Institute of Electrical Engineers / v.61 no.10 / pp.1496-1501 / 2012
Fine tuning of a control system is essential not only for stable operation but also for efficient operation of a power plant. There have been very few studies on how control system tuning changes efficiency, so it was not clear whether efficiency improves once fine tuning has stabilized the control, or by how much. An accurate algorithm for measuring plant efficiency was newly introduced and implemented; it measures the integrated fuel flow and electrical output (MW) and calculates the mean efficiency over a given time. As a result, stable operation after fine tuning of the control parameters for the major controlled variables yielded higher efficiency than unstable operation such as cycling or oscillation. Plant efficiency was monitored during various tests and tuning runs to confirm how much it changes with tuning of the power plant control system. The results indicate that efficiency can be improved in stable operation by fine tuning the control system.
https://doi.org/10.5370/KIEE.2012.61.10.1496
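
The abstract above describes computing a mean efficiency from integrated fuel flow and integrated MW output over a given time. A minimal sketch of that kind of calculation follows. It is not the authors' algorithm: the trapezoidal integration, the function names, and the use of a fuel heating value (`lhv_mj_per_kg`) to convert fuel mass to energy are all assumptions made for illustration.

```python
from typing import Sequence


def trapezoid(t: Sequence[float], y: Sequence[float]) -> float:
    """Integrate samples y over time stamps t using the trapezoidal rule."""
    return sum(
        (t[i + 1] - t[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(t) - 1)
    )


def mean_efficiency(
    t_s: Sequence[float],
    fuel_kg_per_s: Sequence[float],
    power_mw: Sequence[float],
    lhv_mj_per_kg: float,
) -> float:
    """Mean efficiency over the window: energy out divided by fuel energy in.

    lhv_mj_per_kg is an assumed fuel heating value (hypothetical parameter).
    """
    if not (len(t_s) == len(fuel_kg_per_s) == len(power_mw)) or len(t_s) < 2:
        raise ValueError("need at least two aligned samples")
    fuel_kg = trapezoid(t_s, fuel_kg_per_s)    # integrated fuel flow, kg
    energy_out_mj = trapezoid(t_s, power_mw)   # MW * s = MJ
    energy_in_mj = fuel_kg * lhv_mj_per_kg     # fuel energy, MJ
    if energy_in_mj <= 0:
        raise ValueError("fuel energy must be positive")
    return energy_out_mj / energy_in_mj
```

Averaging over integrated quantities, rather than averaging instantaneous ratios, keeps short oscillations in fuel flow or output from dominating the result.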

Comparing the performance of Supervised Fine-tuning, Reinforcement Learning, and Chain-of-Hindsight with Llama and OPT models (Llama, OPT 모델을 활용한 Supervised Fine Tuning, Reinforcement Learning, Chain-of-Hindsight 성능 비교)

Hyeon Min Lee;Seung Hoon Na;Joon Ho Lim;Tae Hyeong Kim;Hwi Jung Ryu;Du Seong Chang
- Annual Conference on Human and Language Technology / 2023.10a / pp.217-221 / 2023
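In recent years, advances in Large Language Models (LLMs) have driven major leaps in artificial intelligence research. These models show outstanding performance on complex natural language processing tasks. In particular, language models that apply Supervised Fine Tuning, Reinforcement Learning, Chain-of-Hindsight, and similar methods for human alignment have attracted attention. In this paper, we apply the three instruction-learning methods mentioned above, Supervised Fine Tuning, Reinforcement Learning, and Chain-of-Hindsight, to the Llama and OPT models, and measure and compare their performance.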

Fine-tuning BERT Models for Keyphrase Extraction in Scientific Articles

Lim, Yeonsoo;Seo, Deokjin;Jung, Yuchul
- Journal of Advanced Information Technology and Convergence / v.10 no.1 / pp.45-56 / 2020
Despite extensive research, performance enhancement of keyphrase (KP) extraction remains a challenging problem in modern informatics. Recently, deep learning-based supervised approaches have exhibited state-of-the-art accuracies with respect to this problem, and several of the previously proposed methods utilize Bidirectional Encoder Representations from Transformers (BERT)-based language models. However, few studies have investigated the effective application of BERT-based fine-tuning techniques to the problem of KP extraction. In this paper, we consider the aforementioned problem in the context of scientific articles by investigating the fine-tuning characteristics of two distinct BERT models - BERT (i.e., base BERT model by Google) and SciBERT (i.e., a BERT model trained on scientific text). Three different datasets (WWW, KDD, and Inspec) comprising data obtained from the computer science domain are used to compare the results obtained by fine-tuning BERT and SciBERT in terms of KP extraction.
https://doi.org/10.14801/JAITC.2020.10.1.45

A Study of Fine Tuning Pre-Trained Korean BERT for Question Answering Performance Development (사전 학습된 한국어 BERT의 전이학습을 통한 한국어 기계독해 성능개선에 관한 연구)

Lee, Chi Hoon;Lee, Yeon Ji;Lee, Dong Hee
- Journal of Information Technology Services / v.19 no.5 / pp.83-91 / 2020
Language models such as BERT have been an important factor in deep learning-based natural language processing. Pre-training transformer-based language models is computationally expensive, since they consist of deep and broad architectures with attention-based layers and require huge amounts of training data. Hence, it has become necessary to fine-tune large pre-trained language models trained by Google or other companies that can afford the resources and cost. There are various techniques for fine-tuning language models; this paper examines three: data augmentation, hyperparameter tuning, and partial reconstruction of the neural network. For data augmentation, we use no-answer augmentation and back-translation. We also identify some useful hyperparameter combinations through a number of experiments. Finally, we add GRU and LSTM networks to the pre-trained BERT model to boost performance. We fine-tune the pre-trained Korean language model using the methods above and raise the F1 score from the baseline to 89.66. Moreover, some failed attempts provide important lessons and point to directions for further work.
https://doi.org/10.9716/KITS.2020.19.5.083

Fine-tuning of Attention-based BART Model for Text Summarization (텍스트 요약을 위한 어텐션 기반 BART 모델 미세조정)

Ahn, Young-Pill;Park, Hyun-Jun
- Journal of the Korea Institute of Information and Communication Engineering / v.26 no.12 / pp.1769-1776 / 2022
Automatically summarizing long sentences is an important technique, and the BART model is one of the most widely used models for the summarization task. In general, a summarization model for a specific domain is produced by fine-tuning: a language model trained on a large dataset is re-trained to fit the domain. Fine-tuning is usually done by changing the number of nodes in the last fully connected layer. In this paper, however, we propose a fine-tuning method that adds an attention layer, an approach that has recently been applied to various models with good performance. To evaluate the proposed method, we conducted various experiments, such as stacking more layers and fine-tuning without skip connections. As a result, the BART model using two attention layers with a skip connection achieves the best score.
https://doi.org/10.6109/jkiice.2022.26.12.1769

Privacy-Preserving Language Model Fine-Tuning Using Offsite Tuning (프라이버시 보호를 위한 오프사이트 튜닝 기반 언어모델 미세 조정 방법론)

Jinmyung Jeong;Namgyu Kim
- Journal of Intelligence and Information Systems / v.29 no.4 / pp.165-184 / 2023
Recently, deep learning analysis of unstructured text data using language models such as Google's BERT and OpenAI's GPT has shown remarkable results in various applications. Most language models learn generalized linguistic information from pre-training data and then update their weights for downstream tasks through a fine-tuning process. However, concerns have been raised that privacy may be violated when these language models are used: data privacy may be violated when the data owner provides large amounts of data to the model owner for fine-tuning. Conversely, when the model owner discloses the entire model to the data owner, the structure and weights of the model are exposed, which may violate the privacy of the model. The concept of offsite tuning has recently been proposed to fine-tune language models while protecting privacy in such situations. However, that study does not provide a concrete way to apply the proposed methodology to text classification models. In this study, we propose a concrete method that applies offsite tuning with an additional classifier to protect the privacy of both the model and the data when performing multi-class classification fine-tuning on Korean documents. To evaluate the proposed methodology, we conducted experiments on about 200,000 Korean documents from five major fields (ICT, electrical, electronic, mechanical, and medical) provided by AIHub, and found that the proposed plug-in model outperforms the zero-shot model and the offsite model in classification accuracy.
https://doi.org/10.13088/jiis.2023.29.4.165

All Digital DLL with Three Phase Tuning Stages (3단 구성의 디지털 DLL 회로)

Park, Chul-Woo;Kang, Jin-Ku
- Journal of IKEEE / v.6 no.1 s.10 / pp.21-29 / 2002
This paper describes a high-resolution DLL (Delay Locked Loop) built entirely from digital circuits. The proposed architecture is based on three stages of coarse, fine, and ultra-fine phase tuning blocks, each of which has a phase detector, a selection block, and a delay line. The first stage, the ultra-fine phase tuning block, achieves high resolution using a vernier delay line. The second and third stages, the coarse and fine tuning blocks, tune the phase margin of the unit delay using a delay line and are similar to each other. The design was simulated in 0.35 µm CMOS technology under a 3.3 V supply using the HSPICE simulator. The simulation results show that the phase resolution can be as fine as 10 ps over an operating range of 250 MHz to 800 MHz.
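
The abstract above attributes the high resolution to a vernier delay line. As general background, a vernier line steps by the difference between two slightly different cell delays rather than by a single cell delay. The sketch below illustrates only that principle; the function names and the cell delays in the usage note are hypothetical and are not taken from the paper.

```python
from typing import Optional


def vernier_resolution_ps(tau_slow_ps: int, tau_fast_ps: int) -> int:
    """Effective step of a vernier delay line: the difference of two cell delays."""
    if tau_slow_ps <= tau_fast_ps:
        raise ValueError("slow cell delay must exceed fast cell delay")
    return tau_slow_ps - tau_fast_ps


def stages_to_align(
    phase_error_ps: int, tau_slow_ps: int, tau_fast_ps: int, max_stages: int
) -> Optional[int]:
    """Return the first stage count whose accumulated step covers the phase error.

    Returns None if the error exceeds the range of the line.
    """
    step = vernier_resolution_ps(tau_slow_ps, tau_fast_ps)
    for n in range(max_stages + 1):
        if n * step >= phase_error_ps:
            return n
    return None
```

With assumed cell delays of 110 ps and 100 ps, the step is 10 ps, far finer than either cell alone, and a 35 ps phase error is covered after four stages.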

Wide-Band Fine-Resolution DCO with an Active Inductor and Three-Step Coarse Tuning Loop

Pu, Young-Gun;Park, An-Soo;Park, Joon-Sung;Moon, Yeon-Kug;Kim, Su-Ki;Lee, Kang-Yoon
- ETRI Journal / v.33 no.2 / pp.201-209 / 2011
This paper presents a wide-band fine-resolution digitally controlled oscillator (DCO) with an active inductor using an automatic three-step coarse and gain tuning loop. To control the frequency of the DCO, the transconductance of the active inductor is tuned digitally. To cover the wide tuning range, a three-step coarse tuning scheme is used. In addition, the DCO gain needs to be calibrated digitally to compensate for gain variations. The DCO tuning range is 58% at 2.4 GHz, and the power consumption is 6.6 mW from a 1.2 V supply voltage. The effective frequency resolution is 0.14 kHz. The phase noise of the DCO output at 2.4 GHz is -120.67 dBc/Hz at 1 MHz offset.
https://doi.org/10.4218/etrij.11.0110.0209

