Search | Korea Science

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

Sohee Han; Jisub Um; Hoirin Kim
- Phonetics and Speech Sciences
- v.16 no.1
- pp.67-76
- 2024
Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, to the point where it can closely imitate natural human speech. In particular, TTS models that offer diverse voice characteristics and personalized speech are widely used in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. In this paper, we therefore propose a one-shot multi-speaker TTS system that ensures acoustic diversity and synthesizes personalized voices by generating speech from the utterances of unseen target speakers. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. The proposed approach covers not only an English one-shot multi-speaker TTS but also a Korean one. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. In the objective evaluation, the proposed English and Korean one-shot multi-speaker TTS systems achieved prediction MOS (P-MOS) values of 2.54 and 3.74, respectively. These results indicate that the proposed model improves on the baseline models in both naturalness and speaker similarity.
https://doi.org/10.13064/KSSS.2024.16.1.067

Text-to-speech with linear spectrogram prediction for quality and speed improvement (음질 및 속도 향상을 위한 선형 스펙트로그램 활용 Text-to-speech)

Yoon, Hyebin
- Phonetics and Speech Sciences
- v.13 no.3
- pp.71-78
- 2021
Most neural-network-based speech synthesis models use neural vocoders to convert mel-scaled spectrograms into high-quality, human-like voices. However, neural vocoders combined with mel-scaled spectrogram prediction models demand considerable memory and time during training and suffer from slow inference in environments without a GPU. This problem does not arise in linear spectrogram prediction models, which do not use neural vocoders, but these models suffer from low voice quality. As a solution, this paper proposes a Tacotron 2 and Transformer-based linear spectrogram prediction model that produces high-quality speech without a neural vocoder. Experiments suggest that this model can serve as the foundation of a high-quality text-to-speech model with fast inference speed.
https://doi.org/10.13064/KSSS.2021.13.3.071

The User Perception in ASMR Marketing Content through Social Media Text-Mining: ASMR Product Review Content vs ASMR How-to Content (텍스트 마이닝을 활용한 ASMR 콘텐츠 분야에 따른 소비자 인식 및 구전효과 차이점 분석: ASMR 제품리뷰 및 ASMR How-to 콘텐츠 중심으로)

Tran, Hung Chuong; Choi, Jae Won
- The Journal of Information Systems
- v.30 no.4
- pp.1-20
- 2021
Purpose: Autonomous Sensory Meridian Response (ASMR) is rapidly growing in popularity and increasingly appears in marketing. Beyond TV commercials, ASMR is also growing fast in one-person media, and many brands and social media influencers use it in their marketing content. The purpose of this study is to measure consumers' perceptions of products in ASMR marketing content and to compare the communication effect of ASMR content creators who make product reviews with that of creators who make how-to content, within the same macro influencer tier (YouTubers with 10,000-100,000 subscribers). Design/methodology/approach: We selected ASMRtists who produce product review content and how-to content, and collected text comments from 200 tech-device review videos and beauty-fashion videos. A total of 52,833 text comments were analyzed using the LDA topic modeling algorithm and social network analysis. Findings: The results show that ASMR triggers are effective at capturing viewers' attention. In the tech-device review field, viewers also focus on the product itself, such as its performance and purchase, although many topics relate to reactions to ASMR sounds, triggers, and relaxation. In the beauty-fashion field, topics mainly concern reactions to ASMR triggers and responses to the ASMRtist, while the remaining topics cover makeup, fashion, products, and purchases. According to the LDA results, many viewers comment that they feel more comfortable watching marketing content that uses ASMR. This suggests that ASMR marketing content performs well in terms of viewing experience, so applying ASMR may draw more consumer interest. The social network analysis showed that product review ASMRtists have higher communication effectiveness than how-to ASMRtists in the same tier.
As an influencer marketing strategy, this study provides information for establishing an efficient advertising strategy using influencers who create ASMR content.
https://doi.org/10.5859/KAIS.2021.30.4.1

Development of Novel Disaster Pictogram Emergency Alert Technology for Hearing Impaired (청각장애인을 위한 재난안전 픽토그램 긴급알림 전달 기술 개발)

Yong-Yook Kim; Hyun-Chul Kim; Beom-Jun Cho
- Journal of the Society of Disaster Information
- v.19 no.1
- pp.76-83
- 2023
Purpose: In emergencies such as earthquakes, heavy rains, typhoons, or fires, when quick delivery of emergency alerts is crucial, the hearing impaired are the most disadvantaged and vulnerable when alerts are delivered only as audio or text. They cannot perceive auditory information, and many have difficulty understanding text-based alerts quickly. Method: An alert system that delivers pictograms for specific disaster situations was devised. A novel approach based on artificial intelligence was then studied so that the pictogram for a specific disaster situation can be chosen instantly once a disaster alert is issued in text. Result: A disaster alert system that delivers pictograms for specific disaster situations was developed, and a novel method was suggested for automatic delivery. Conclusion: A system that instantly delivers disaster alert information as pictograms has been developed. It improves alert delivery to people made vulnerable to disasters by hearing impairment, allowing them to understand the disaster situation immediately through visual information.
https://doi.org/10.15683/kosdi.2023.3.31.076

Fast Skew Detection of Document Images by Extraction of Center Points of Blank Lines (공백행의 중심점 추출에 의한 고속 문서 기울기 검출)

Jeong, Jae-Yeong; Kim, Mun-Hyeon
- Journal of KIISE:Software and Applications
- v.26 no.11
- pp.1342-1349
- 1999
In this paper, we propose a fast algorithm for estimating the skew angle of linearly skewed document images. The method is based on the fact that a blank line of uniform thickness lies between two adjacent text lines and that the slope of this blank line reflects the skew of the document. First, we apply a simple morphological operation (dilation) to the image to separate blank-line regions from text-line regions, then sample the image vertically at regular intervals and detect the center points of all blank lines along each sampled vertical line. Next, we calculate the slope between neighboring center points that lie on the same blank line. The slopes calculated over the entire image are accumulated in a histogram, and the slope with the highest frequency is taken as the estimated skew of the document image. In the experiments, we applied the algorithm to horizontally written documents in a variety of formats, including hand-printed and machine-printed text.
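
The steps in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the image is already binarized and dilated (1 = text, 0 = blank), and the function names, the `step` sampling interval, the `max_jump` pairing tolerance, and the two-decimal histogram bins are all hypothetical choices made for the example. Slopes are in image coordinates, with y increasing downward.

```python
import math
from collections import Counter

def blank_line_centers(column):
    """Center y of each blank run (0s) that has text (1s) both above and below."""
    centers, start = [], None
    for y, pixel in enumerate(column):
        if pixel == 0 and start is None:
            start = y
        elif pixel == 1 and start is not None:
            if start > 0:  # skip a run that touches the top edge
                centers.append((start + y - 1) / 2)
            start = None
    # A run that touches the bottom edge is never closed, so it is skipped.
    return centers

def estimate_skew_slope(image, step=10, max_jump=3.0):
    """Histogram-peak slope between neighboring blank-line center points."""
    width = len(image[0])
    columns = [blank_line_centers([row[x] for row in image])
               for x in range(0, width, step)]
    votes = Counter()
    for left, right in zip(columns, columns[1:]):
        if not right:
            continue
        for y0 in left:
            # Assumption: the nearest center in the next sampled column
            # belongs to the same blank line if it is within max_jump pixels.
            y1 = min(right, key=lambda y: abs(y - y0))
            if abs(y1 - y0) <= max_jump:
                votes[round((y1 - y0) / step, 2)] += 1
    return votes.most_common(1)[0][0] if votes else None

# Synthetic page: 6-pixel text bands and 4-pixel blank lines that drift
# down by 1 pixel every 10 columns.
image = [[1 if (y - x // 10) % 10 < 6 else 0 for x in range(60)]
         for y in range(40)]
slope = estimate_skew_slope(image, step=10)
angle_deg = math.degrees(math.atan(slope))
```

Voting on the most frequent slope, rather than averaging, keeps a few mismatched center-point pairs from shifting the estimate.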

Tax Judgment Analysis and Prediction using NLP and BiLSTM (NLP와 BiLSTM을 적용한 조세 결정문의 분석과 예측)

Lee, Yeong-Keun; Park, Koo-Rack; Lee, Hoo-Young
- Journal of Digital Convergence
- v.19 no.9
- pp.181-188
- 2021
Research on AI-based legal services, which aim to make difficult legal fields easier to understand and more predictable, is growing in importance. In this study, based on decisions of the Tax Tribunal in the field of tax law, a model was built through self-learning on collected and processed data; it answers user queries with predicted results, and its accuracy was verified. The proposed model collects information on tax decisions and extracts useful data through web crawling, then generates word vectors by applying the Word2Vec-style FastText algorithm to the output optimized through NLP. A total of 11,103 cases from 2017 to 2019 were collected and classified, and the model was verified at 70% accuracy. The model may prove useful across various legal systems and serve as prior research for more efficient applications.
https://doi.org/10.14400/JDC.2021.19.9.181

Recommendation System for Research Field of R&D Project Using Machine Learning (머신러닝을 이용한 R&D과제의 연구분야 추천 서비스)

Kim, Yunjeong; Shin, Donggu; Jung, Hoekyung
- Journal of the Korea Institute of Information and Communication Engineering
- v.25 no.12
- pp.1809-1816
- 2021
Identifying the latest research trends from national R&D project data, and producing and using meaningful information from it, requires automatic classification technology in the national R&D information service, so we conducted research on automatically classifying and recommending research fields. About 450,000 national R&D project records from 2013 to 2020 were collected and used for training and evaluation. A model was selected after data pre-processing, analysis, and performance analysis on the valid portion of the collected data. The performance of Word2vec, GloVe, and fastText was compared in order to derive the optimal model combination. In the experiment, the accuracy for the subcategories alone, which are used as essential items of task information, was 90.11%. The model is expected to be applicable to automatic classification studies of other classification systems whose hierarchical structure is similar to that of the national science and technology standard classification of research fields.
https://doi.org/10.6109/jkiice.2021.25.12.1809

Approximate Top-k Labeled Subgraph Matching Scheme Based on Word Embedding (워드 임베딩 기반 근사 Top-k 레이블 서브그래프 매칭 기법)

Choi, Do-Jin; Oh, Young-Ho; Bok, Kyoung-Soo; Yoo, Jae-Soo
- The Journal of the Korea Contents Association
- v.22 no.8
- pp.33-43
- 2022
Labeled graphs are used to represent entities, their relationships, and their structures in real data such as knowledge graphs and protein interactions. With the rapid development of IT and the explosive increase in data, there is a need for subgraph matching technology that provides the information a user is interested in. In this paper, we propose an approximate Top-k labeled subgraph matching scheme that considers both the semantic similarity of labels and differences in graph structure. The proposed scheme uses a learning model based on FastText to capture the semantic similarity of labels. In addition, a label similarity graph (LSG) is used for approximate subgraph matching by calculating similarity values between labels in advance. The LSG overcomes the limitation of existing schemes, in which subgraph expansion is possible only when labels match exactly. The scheme supports structural similarity for a query graph by searching up to 2 hops. Based on the similarity values, it returns k subgraph matching results. We conduct various performance evaluations to show the superiority of the proposed scheme.
https://doi.org/10.5392/JKCA.2022.22.08.033
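
The idea of precomputing label similarities into a label similarity graph can be illustrated with a small sketch. It is not the authors' scheme: the two-dimensional vectors below are made-up stand-ins for FastText embeddings, and `build_lsg`, `candidate_labels`, and the 0.8 threshold are hypothetical names and values chosen for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def build_lsg(embeddings, threshold):
    """Precompute an undirected graph linking labels whose similarity >= threshold."""
    labels = sorted(embeddings)
    lsg = {label: {} for label in labels}
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            similarity = cosine(embeddings[a], embeddings[b])
            if similarity >= threshold:
                lsg[a][b] = similarity
                lsg[b][a] = similarity
    return lsg

def candidate_labels(lsg, label):
    """Labels that may stand in for `label` during matching, with their scores."""
    return {label: 1.0, **lsg.get(label, {})}

# Toy vectors (assumed for illustration; real ones would come from a trained model).
embeddings = {
    "physician": [1.0, 0.1],
    "doctor": [0.9, 0.2],
    "protein": [0.0, 1.0],
}
lsg = build_lsg(embeddings, threshold=0.8)
matches = candidate_labels(lsg, "doctor")
```

Because the similarities are computed once up front, a matcher can expand a subgraph through a vertex labeled "physician" when the query asks for "doctor" by a dictionary lookup instead of recomputing embeddings at query time.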

Keyword Network Visualization for Text Summarization and Comparative Analysis (문서 요약 및 비교분석을 위한 주제어 네트워크 가시화)

Kim, Kyeong-rim; Lee, Da-yeong; Cho, Hwan-Gue
- Journal of KIISE
- v.44 no.2
- pp.139-147
- 2017
Most of the information on the Internet is textual. One of the main topics in the large-scale document analysis required in the "big data" era is therefore the development of automated understanding systems for textual data, and automatic keyword extraction for text summarization and abstraction is a typical research problem. However, simply listing a few keywords is insufficient to reveal the complex semantic structure of general texts. In this paper, we develop a text-visualization method that constructs a graph by computing the degree of relation among keywords selected from the target text. Two construction models that provide the edge relations are proposed for computing this degree of relation: the influence-interval model and the word-distance model. The graph visualized from the keyword-derived edge relations is more flexible and useful for displaying the meaning structure of the target text, and this abstract graph enables fast and easy understanding of the text. Our experiment showed that the proposed abstract-graph model is superior to a keyword list for attaining a semantic and comparative understanding of text.
https://doi.org/10.5626/JOK.2017.44.2.139
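
A distance-based keyword graph of the kind this abstract describes might look like the sketch below. The abstract does not define the word-distance model, so the rule used here is an assumption: two different keywords are linked each time they occur within `window` tokens of each other, and the edge weight counts those occurrences. The function name and the sample sentence are hypothetical.

```python
from collections import defaultdict

def keyword_graph(tokens, keywords, window=4):
    """Weighted edges between distinct keywords that occur within `window` tokens."""
    positions = [(i, token) for i, token in enumerate(tokens) if token in keywords]
    weights = defaultdict(int)
    for index, (i, first) in enumerate(positions):
        for j, second in positions[index + 1:]:
            if j - i > window:
                break  # positions are sorted, so later keywords are farther away
            if first != second:
                weights[tuple(sorted((first, second)))] += 1
    return dict(weights)

tokens = ("graph models summarize text and a graph helps "
          "readers compare text quickly").split()
edges = keyword_graph(tokens, {"graph", "text", "compare"}, window=4)
```

On this sample, "graph" and "text" are linked three times while "compare" is linked once to each of them, so a visualization could draw the graph-text edge thicker than the other two.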

Fast Video Detection Using Temporal Similarity Extraction of Successive Spatial Features (연속하는 공간적 특징의 시간적 유사성 검출을 이용한 고속 동영상 검색)

Cho, A-Young; Yang, Won-Keun; Cho, Ju-Hee; Lim, Ye-Eun; Jeong, Dong-Seok
- The Journal of Korean Institute of Communications and Information Sciences
- v.35 no.11C
- pp.929-939
- 2010
The growth of multimedia technology drives the development of video detection for large database management and illegal copy detection. To meet this demand, this paper proposes a fast video detection method for large databases. The algorithm uses spatial features based on the gray value distribution of frames and temporal features based on a temporal similarity map. We form the video signature from the extracted spatial and temporal features and carry out a stepwise matching method. Performance was evaluated in terms of accuracy, extraction and matching time, and signature size, using original videos and modified versions with brightness changes, lossy compression, and text/logo overlays. We present the empirical parameter selection and the experimental results for a simple matching method that uses only the spatial feature, and compare the results with existing algorithms. According to the experimental results, the proposed method performs well in accuracy, processing time, and signature size. The proposed fast detection algorithm is therefore suitable for video detection on large databases.
