Search | Korea Science

Building robust Korean speech recognition model by fine-tuning large pretrained model (대형 사전훈련 모델의 파인튜닝을 통한 강건한 한국어 음성인식 모델 구축)

Changhan Oh;Cheongbin Kim;Kiyoung Park
- Phonetics and Speech Sciences / v.15 no.3 / pp.75-82 / 2023
Automatic speech recognition (ASR) has been revolutionized by deep learning-based approaches, among which self-supervised learning methods have proven particularly effective. In this study, we aim to enhance the performance of OpenAI's Whisper model, a multilingual ASR system, on the Korean language. Whisper was pretrained on a large corpus (around 680,000 hours) of web speech data and has demonstrated strong recognition performance for major languages. However, it faces challenges in recognizing languages such as Korean, which was not a major language in its training data. We address this issue by fine-tuning the Whisper model with an additional dataset comprising about 1,000 hours of Korean speech. We also compare its performance against a Transformer model trained from scratch on the same dataset. Our results indicate that fine-tuning the Whisper model significantly improved its Korean speech recognition capabilities in terms of character error rate (CER). Specifically, the performance improved with increasing model size. However, the Whisper model's performance on English deteriorated after fine-tuning, emphasizing the need for further research to develop robust multilingual models. Our study demonstrates the potential of utilizing a fine-tuned Whisper model for Korean ASR applications. Future work will focus on multilingual recognition and optimization for real-time inference.
https://doi.org/10.13064/KSSS.2023.15.3.075
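The abstract above reports results as character error rate (CER). As a rough illustration only, CER is commonly computed as the character-level edit distance between a reference and a hypothesis transcript, divided by the reference length. The sketch below assumes that definition, treats each Unicode code point (for Korean, typically one syllable block) as a character, and ignores spaces by default; the paper's own scoring conventions are not stated in the abstract, so these choices and the function names `edit_distance` and `cer` are assumptions.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str, ignore_spaces: bool = True) -> float:
    """Character error rate: edit distance divided by reference length.

    Dropping spaces before scoring is an assumption made for this sketch;
    evaluation setups differ on how whitespace is handled.
    """
    if ignore_spaces:
        ref = ref.replace(" ", "")
        hyp = hyp.replace(" ", "")
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted syllable out of six reference characters.
    print(cer("음성 인식 모델", "음성 인식 모댈"))
```

Because the denominator is the reference length, CER can exceed 1.0 when the hypothesis contains many insertions.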

The Need for Paradigm Shift in Semantic Similarity and Semantic Relatedness : From Cognitive Semantics Perspective (의미간의 유사도 연구의 패러다임 변화의 필요성-인지 의미론적 관점에서의 고찰)

Choi, Youngseok;Park, Jinsoo
- Journal of Intelligence and Information Systems / v.19 no.1 / pp.111-123 / 2013
Semantic similarity/relatedness measures between two concepts play an important role in research on system integration and database integration. Moreover, current research on keyword recommendation and tag clustering depends strongly on this kind of semantic measure. For this reason, many researchers in various fields, including computer science and computational linguistics, have tried to improve methods for calculating semantic similarity/relatedness. The study of similarity between concepts aims to discover how a computational process can model the way a human determines the relationship between two concepts. Most research on calculating semantic similarity uses ready-made reference knowledge, such as a semantic network or dictionary, to measure concept similarity. The topological method calculates relatedness or similarity between concepts based on various forms of semantic network, including a hierarchical taxonomy. This approach assumes that the semantic network reflects human knowledge well. The nodes in a network represent concepts, and ways to measure the conceptual similarity between two nodes are also regarded as ways to determine the conceptual similarity of two words (i.e., two nodes in a network). Topological methods can be categorized as node-based or edge-based, also called the information content approach and the conceptual distance approach, respectively. The node-based approach calculates similarity between concepts based on how much information the two concepts share in terms of a semantic network or taxonomy, while the edge-based approach estimates the distance between the nodes that correspond to the concepts being compared. Both approaches assume that the semantic network is static. That is, the topological approach does not consider changes in the semantic relations between concepts in a semantic network.
However, as information and communication technologies make it easier to share knowledge among people, the semantic relations between concepts in a semantic network may change. To explain this change, we adopt cognitive semantics. The basic assumption of cognitive semantics is that humans judge semantic relations based on their cognition and understanding of concepts. This cognition and understanding is called 'world knowledge.' World knowledge can be categorized as personal knowledge and cultural knowledge. Personal knowledge is knowledge gained from personal experience; everyone can have different personal knowledge of the same concept. Cultural knowledge is knowledge shared by people who live in the same culture or use the same language. People in the same culture have a common understanding of specific concepts. Cultural knowledge can be the starting point for discussing changes in semantic relations. If the culture shared by people changes for some reason, their cultural knowledge may also change. Today's society and culture are changing at a fast pace, and the change of cultural knowledge is not a negligible issue in research on semantic relationships between concepts. In this paper, we propose future directions for research on semantic similarity. In other words, we discuss how research on semantic similarity can reflect changes in semantic relations caused by changes in cultural knowledge. We suggest three directions for future research on semantic similarity. First, the research should include versioning and update methodologies for semantic networks. Second, dynamically generated semantic networks can be used to calculate semantic similarity between concepts. If researchers can develop a methodology to extract a semantic network from a given knowledge base in real time, this approach can solve many problems related to changes in semantic relations. Third, a statistical approach based on corpus analysis can be an alternative to methods that use a semantic network. We believe that these proposed research directions can be a milestone for research on semantic relations.
https://doi.org/10.13088/jiis.2013.19.1.111
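The edge-based (conceptual distance) approach described in the abstract above can be sketched on a toy taxonomy. Everything in this example is hypothetical: the `PARENT` taxonomy, the function names, and the choice of 1 / (1 + path length) as the similarity score are illustrative assumptions, not the measures evaluated in the paper. Note that the taxonomy is a fixed dictionary, which is exactly the static assumption the paper questions.

```python
from collections import deque

# Hypothetical toy taxonomy as child -> parent links (not from the paper).
PARENT = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}


def build_graph(parent):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, par in parent.items():
        graph.setdefault(child, set()).add(par)
        graph.setdefault(par, set()).add(child)
    return graph


def path_length(graph, a, b):
    """Number of edges on the shortest path between a and b (breadth-first search)."""
    if a not in graph or b not in graph:
        raise KeyError("both concepts must be nodes in the network")
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # the two concepts are not connected


def edge_similarity(graph, a, b):
    """Map path length to a similarity in (0, 1]; unconnected concepts score 0."""
    dist = path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


GRAPH = build_graph(PARENT)

if __name__ == "__main__":
    print(edge_similarity(GRAPH, "dog", "cat"))      # siblings under "mammal"
    print(edge_similarity(GRAPH, "dog", "sparrow"))  # related only through "animal"
```

Under this scoring, concepts that are closer in the taxonomy receive a higher value, and updating `PARENT` and rebuilding the graph is the simplest form of the network versioning the authors call for.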

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
- Journal of Intelligence and Information Systems / v.24 no.3 / pp.21-44 / 2018
In recent years, the rapid development of internet technology and the popularization of smart devices have resulted in massive amounts of text data. These text data are produced and distributed through various media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization. This problem has attracted the interest of many researchers seeking to manage this huge amount of information. It also calls for professionals capable of classifying relevant information, and hence text classification was introduced. Text classification is a challenging task in modern data analysis, in which a text document must be assigned to one or more predefined categories or classes. In the text classification field, various techniques are available, such as K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, when dealing with huge amounts of text data, model performance and accuracy become a challenge. The performance of a text classification model can vary according to the types of words used in the corpus and the types of features created for classification. Most previous attempts have proposed a new algorithm or modified an existing one, and this kind of research can be said to have reached its limits for further improvement. In this study, rather than proposing a new algorithm or modifying an existing one, we focus on finding a way to modify the use of data. It is widely known that classifier performance is influenced by the quality of the training data on which the classifier is built. Real-world datasets usually contain noise, and this noisy data can affect the decisions made by classifiers built from them.
In this study, we consider that data from different domains, that is, heterogeneous data, might have noise-like characteristics that can be utilized in the classification process. To build a classifier, a machine learning algorithm is applied on the assumption that the characteristics of the training data and the target data are the same or very similar. However, in the case of unstructured data such as text, the features are determined by the vocabulary included in the documents. If the viewpoints of the training data and the target data differ, the features may also differ between the two. In this study, we attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Data coming from various kinds of sources are likely to be formatted differently. This causes difficulties for traditional machine learning algorithms, because they were not developed to recognize different types of data representation at one time and to combine them in the same generalization. Therefore, to utilize heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning. However, unlabeled data may degrade the performance of the document classifier. Therefore, we further propose a method called Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA) to select only the documents that contribute to improving the accuracy of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision making. In this paper, three different types of real-world data sources were used: news, Twitter, and blogs.
https://doi.org/10.13088/jiis.2018.24.3.021

A Study of Keyword Spotting System Based on the Weight of Non-Keyword Model (비핵심어 모델의 가중치 기반 핵심어 검출 성능 향상에 관한 연구)

Kim, Hack-Jin;Kim, Soon-Hyub
- The KIPS Transactions:PartB / v.10B no.4 / pp.381-388 / 2003
This paper presents a method of applying weights to garbage class clustering and the Filler model to improve the performance of a keyword spotting system, together with a time-saving method for a dialogue speech processing system that calculates keyword transition probabilities through speech analysis of task domain users. The key point of the method is to group phonemes with phonetic similarities, which is more effective for detecting similar phoneme groups than individual phonemes; the paper proposes five phoneme groups obtained from an analysis of spoken sentences used in Korean morphology and in a stock-trading speech processing system. In addition, task-subject Filler model weights are added to the phoneme groups, and the keyword transition probabilities in consecutive spoken sentences are calculated and applied to the system to reduce processing time. To evaluate the performance of the proposed system, a corpus of 4,970 sentences was built for the task domain and a test was conducted with five subjects in their twenties and thirties. As a result, the figure of merit (FOM) with weights on the proposed five phoneme groups is 85%, which is reported as better than the seven phoneme groups of Yapanel [1] at 88.5% and slightly poorer than LVCSR at 89.8%. In calculation time, the proposed method takes 0.70 seconds compared with 0.72 seconds for the seven phoneme groups. Lastly, a time-saving test confirmed that 0.04 to 0.07 seconds are saved when the keyword transition probability is applied.
https://doi.org/10.3745/KIPSTB.2003.10B.4.381

Beneficial effect of Combination with Korean Red Ginseng, Gastrodia Rhizoma and Polygoni Multiflori on Cholesterol and Erectile Dysfunction in Hyperlipidemia rats (홍삼, 천마, 적하수오 병용투여에 의한 고지혈증 랫드에서의 콜레스테롤 및 발기부전 개선효과)

Lee, Yun Jung;Kho, Min Chul;Tan, Rui;Lee, Jae Yun;Hwang, Jin Seok;Cha, Jeong Dan;Choi, Kyung Min;Kang, Dae Gill
- The Korea Journal of Herbology / v.30 no.6 / pp.69-75 / 2015
Objectives: This study was designed to investigate the effects of a combination of Korean Red Ginseng (Panax ginseng C.A. Meyer), Gastrodia Rhizoma (Gastrodia elata Blume) and Polygoni Multiflori Radix (Polygonum multiflorum Thunberg) on metabolic disorders, including cholesterol and erectile dysfunction, in hyperlipidemic rats. Methods: Animals were divided into six groups: control with normal diet, high fat/cholesterol diet (HFCD), fluvastatin, Korean Red Ginseng treated (KRG), and combination treated (Korean Red Ginseng, Gastrodia Rhizoma and Polygoni Multiflori Radix; 1:1:1 for KGP1 and 2:1:1 for KGP2). The experimental groups initially received the HFCD for 10 weeks and were then treated orally with fluvastatin, KRG, KGP1 or KGP2 during the final 6 weeks. Erectile function was determined by measuring intracavernosal pressure (ICP) and maximal arterial pressure (MAP) after electrical stimulation of the cavernosal nerve. Results: KGP2 decreased the levels of total cholesterol and LDL cholesterol in the sera of HFCD rats without changes in body weight. KRG, KGP1 and KGP2, but not fluvastatin, a synthetic HMG-CoA reductase inhibitor, decreased C-reactive protein (CRP) levels. KRG, KGP1 and KGP2 significantly increased the ICP, ICP/MAP ratio, and area under the curve (AUC) compared with those of normal rats. Morphometric analyses showed that KRG, KGP1 and KGP2 increased the volume of smooth muscle and the regular arrangement of collagen fibers in the corpus cavernosum of HFCD rats. The penile expression of eNOS was increased by KRG, KGP1 and KGP2. Conclusions: Based on these results, we suggest that the combination of Korean Red Ginseng, Gastrodia Rhizoma and Polygoni Multiflori may improve hyperlipidemia, through regulation of lipid profiles, and erectile dysfunction in rats.
https://doi.org/10.6116/kjh.2015.30.6.69. 인용 PDF KSCI KPUBS HTML

Behavioral Changes of Rats following Cingulate or Other Cortical Damages (대상회전 기타 피질이 손상된 흰쥐들의 행동 변화)

Kim, Chung-Chin;Kim, Jong-Kyu;Kim, Myung-Suk
- The Korean Journal of Physiology / v.2 no.2 / pp.83-92 / 1968
A study was planned to evaluate the effects of removal of the cingulate cortex upon the occurrence of behaviors commonly displayed by the rat, and to compare the effects of cingulectomy with those of removal of the parietal, parieto-occipital, or occipital regions. The subjects were 54 male albino rats (Holtzman strain, body weight 200~330 g), including 14 rats in which the cingulate gyri between the splenium and genu of the corpus callosum were bilaterally ablated by suction (cingulate group), 9 animals whose parietal cortices (chiefly area 7) were partially removed (parietal group), 9 rats whose parietal and occipital regions (chiefly areae 7 & 17) were removed (parieto-occipital group), 13 animals in which the occipital cortices (chiefly area 17) were removed bilaterally (occipital group), and 9 normal rats (normal control group). Eighteen observation cages, each of which housed one subject and was provided with food and water ad lib., were arranged in 6 rows on a rack, and the behavior of each subject was scanned by an observer at a distance of 1.5 m from the rack. The observer scanned the first and second rows 6 times in 1 min, then proceeded to the 3rd and 4th rows, scanning for another 1 min, and finally to the 5th and 6th rows. The speed of scanning was such that behavioral observations of all 18 rats were completed in 3 min, each subject receiving 6 observations. The scanning was repeated every 3 min for 18 min, which constituted one observation session and was followed by a 72-minute recess. The whole procedure was repeated through 24 hours, so that a total of 576 behavioral observations were made on each subject in 16 observation sessions. The behaviors checked were sleeping, lying, lying and sniffing, standing, standing and sniffing, exploring, eating, drinking, grooming (including washing, licking, and scratching), and others. The results obtained were as follows: 1. The cingulate group ate significantly more often than the normal control, the parietal, and the parieto-occipital groups. 2. Exploration was significantly less frequent in the cingulate group than in the normal control, the parietal, and the occipital groups. In the cingulate group, there was a significant negative correlation between the occurrence of eating and exploratory activity. 3. General activity, judged from the value obtained by adding the occurrences of exploration, eating, drinking, grooming, and standing and sniffing, was significantly increased in the cingulate group compared with that of any other group, including the normal control. 4. Though the difference was not statistically significant, the cingulate group slept least often among all the animal groups tested. 5. The parieto-occipital group tended to groom less, and the parietal group to eat less often, than the normal control group, but the differences were not significant. There were no significant differences among the groups, except the cingulate group, as regards the other behaviors analyzed. Based on the above results, it was inferred that the cingulate cortex exerts an inhibitory influence upon the occurrence of eating and general activity, while it tends to facilitate the occurrence of sleep.

Superovulation Response after Follicular Wave Synchronization with Follicular Aspiration by Ultrasonography in HanWoo I. Effect of Follicular Aspiration on Ovarian Response Following Superovulation (과배란 처치시 우세난포 조절에 의한 한우 수정란 생산성 향상에 관한 연구 I. 우세난포 처리에 따른 난소반응)

이병천;이동원;신수정;박종임;황우석
- Journal of Embryo Transfer / v.14 no.3 / pp.203-210 / 1999
In this study, the effect of dominant follicle aspiration on the superovulatory response in HanWoo was investigated. The presence or absence of a dominant follicle was determined by morphological examination. The dominant follicle was aspirated 48 hr before the onset of superovulation treatment using a 6.5 MHz convex probe connected with a carrier, and superovulation was induced by FSH (Super-Ov Tyrer, Texas, U.S.A) administered twice a day s.c. over 4 days in a decreasing regimen. In 13 HanWoo scanned daily to determine the presence and growth of the dominant follicle, the average diameter of the dominant follicle was 15.4 mm and the average diameter of the corpora lutea was 18.7 mm on the day of follicular aspiration. In the experiment on follicle removal by ultrasound-guided aspiration, the ovarian response was significantly enhanced in animals superovulated after aspiration of a dominant follicle compared with animals superovulated without aspiration of a dominant follicle. Donors with an aspirated dominant follicle yielded more corpora lutea ($14.4 \pm 4.7$ vs $8.6 \pm 3.4$) and transferable embryos ($8.9 \pm 4.2$ vs $5.4 \pm 2.7$) than controls. In cows in which the dominant follicle had been aspirated under sonographic control 2 days before superovulation, the numbers of corpora lutea and transferable embryos were significantly higher than in animals superovulated in the presence of a dominant follicle ($14.4 \pm 4.7$ vs $6.9 \pm 2.7$; $8.9 \pm 4.2$ vs $3.3 \pm 1.6$). Seven days after artificial insemination, following dominant follicle aspiration and superovulation treatment, embryos were collected by uterine flushing and their quality was evaluated by morphological criteria. Sixteen embryos of excellent and good grade were transferred into 8 recipient cows. Six pregnancies were identified at 60 and 120 days of gestation by rectal palpation. In conclusion, the present study showed that 1) the presence or absence of a dominant follicle significantly affects superovulatory responses, and 2) ultrasound-guided follicular aspiration of the dominant follicle followed by superovulation treatment provides an accurate procedure to increase ovarian responses in HanWoo.

Effect of Different Feeding Ratios of Whole Crop Barley Silage on the Embryo Production in Hanwoo Donors

Son, Dong-Soo;Choe, Chang-Yong;Cho, Sang-Rae;Kim, Nam-Tae;Kim, Hyun-Jong;Yeon, Seong-Heum;Ryu, Il-Sun;Son, Jun-Kyu;Choi, Sun-Ho;Kim, Ill-Hwa
- Journal of Embryo Transfer / v.24 no.4 / pp.265-269 / 2009
The purpose of this study was to determine the effect of different feeding ratios of whole crop barley silage on embryo production in Hanwoo donors. All donors were fed a basal 2.5 kg of concentrate daily. Donors were divided into three groups according to forage: hay 70% and rice straw 30% (control, n = 21), whole crop barley silage 80% and rice straw 20% (T1, n = 25), and whole crop barley silage 60% and rice straw 40% (T2, n = 23), fed on the basis of TDN 6.70 / BW 500 kg. All Hanwoo donors received a CIDR together with injections of 1 mg estradiol benzoate and 50 mg progesterone ($P_4$, Day 0). Four days later, they were superovulated with 28 mg FSH given IM twice daily in decreasing doses over 4 days. The donors then received 2 doses of $PGF_2{\alpha}$ (25 and 15 mg) with the 5th and 6th injections of FSH on Day 6. The CIDRs were withdrawn at the 6th FSH injection, and the donors received $100\;{\mu}g$ GnRH 36 h after the second $PGF_2{\alpha}$ injection. The donors were artificially inseminated twice, at 8 and 24 h after GnRH, and embryos were recovered 7 or 8 days after the 1st insemination. The flush rate of the donors following positive superovulation responses did not differ among groups (76.2~96.0%, p>0.05). The number of corpora lutea (CL) at embryo recovery also did not differ among groups (10.6~14.0, p>0.05). Furthermore, the mean numbers of total ova (9.4, 10.5 and 12.0) and transferable embryos (5.3, 12.0 and 6.5) did not differ significantly among the control, T1 and T2 groups, respectively (p>0.05). However, mean serum $P_4$ concentrations in the T1 (64.2 ng/ml) and T2 groups (55.7 ng/ml) were higher than in the control group (43.3 ng/ml, p<0.01), while serum cholesterol concentrations in the control (105.8 mg/dl) and T2 groups ($96.9\;{\pm}\;mg/dl$) were significantly lower than in the T1 group (121.1 mg/dl, p<0.05). In conclusion, whole crop barley silage can be fed as a good substitute for hay forage for Hanwoo donors. Furthermore, a ratio of whole crop barley silage 60% and rice straw 40% might be more beneficial for embryo production.

Classification of nasal places of articulation based on the spectra of adjacent vowels (모음 스펙트럼에 기반한 전후 비자음 조음위치 판별)

Jihyeon Yun;Cheoljae Seong
- Phonetics and Speech Sciences / v.15 no.1 / pp.25-34 / 2023
This study examined the utility of the acoustic features of vowels as cues for the place of articulation of Korean nasal consonants. In the acoustic analysis, spectral and temporal parameters were measured at the 25%, 50%, and 75% time points in the vowels neighboring nasal consonants in samples extracted from a spontaneous Korean speech corpus. Using these measurements, linear discriminant analyses were performed and classification accuracies for the nasal place of articulation were estimated. The analyses were applied separately for vowels following and preceding a nasal consonant to compare the effects of progressive and regressive coarticulation in terms of place of articulation. The classification accuracies ranged between approximately 50% and 60%, implying that acoustic measurements of vowel intervals alone are not sufficient to predict or classify the place of articulation of adjacent nasal consonants. However, given that these results were obtained for measurements at the temporal midpoint of vowels, where they are expected to be the least influenced by coarticulation, the present results also suggest the potential of utilizing acoustic measurements of vowels to improve the recognition accuracy of nasal place. Moreover, the classification accuracy for nasal place was higher for vowels preceding the nasal sounds, suggesting the possibility of higher anticipatory coarticulation reflecting the nasal place.
https://doi.org/10.13064/KSSS.2023.15.1.025

Analysis on the English Translation of The First Chosen Educational Ordinance, Manual of Education of Koreans (1913), and Manual of Education in Chosen 1920 (1920) Using Text Mining Analytics (텍스트 마이닝(Text mining) 기법을 활용한 『제1차조선교육령』과 『조선교육요람』(1913, 1920)의영어번역본 분석)

Jinyoung Tak;Eunjoo Kwak;Silo Chin;Minjoo Shon;Dongmie Kim
- The Journal of the Convergence on Culture Technology / v.9 no.6 / pp.309-317 / 2023
The purpose of this paper is to investigate how Japan tried to dominate Chosen through educational policies by analyzing three official English texts published by the Japanese Government-General of Korea: the First Chosen Educational Ordinance declared in 1911, the Manual of Education of Koreans (1913), and the Manual of Education in Chosen 1920 (1920). To this end, the present study carried out a corpus-based diachronic analysis rather than a qualitative analysis. Using text analytics such as Word Cloud and CONCOR, this paper derived the following results: First, the First Chosen Educational Ordinance (1911) includes overall educational regulations, curriculum, and operations of schools. Second, the Manual of Education of Koreans (1913) contains the educational medium and contents on how to educate. Finally, it can be proposed that the Manual of Education in Chosen 1920 (1920) contains the specific implementation of education and the subject of education.
https://doi.org/10.17703/JCCT.2023.9.6.309
