Search | Korea Science

Visual analysis of attention-based end-to-end speech recognition (어텐션 기반 엔드투엔드 음성인식 시각화 분석)

Lim, Seongmin;Goo, Jahyun;Kim, Hoirin
Phonetics and Speech Sciences / v.11 no.1 / pp.41-49 / 2019
An end-to-end speech recognition model consisting of a single integrated neural network was recently proposed. The end-to-end model does not require multiple training steps, and its structure is easy to understand. However, it is difficult to understand how the model recognizes speech internally. In this paper, we visualized and analyzed an attention-based end-to-end model to elucidate its internal mechanisms. We compared the acoustic model of a BLSTM-HMM hybrid model with the encoder of the end-to-end model, and visualized both using t-SNE to examine the differences between neural network layers. As a result, we were able to delineate the difference between the acoustic model and the end-to-end model encoder. Additionally, we analyzed the decoder of the end-to-end model from a language-model perspective. Finally, we found that improving the decoder of the end-to-end model is necessary to achieve higher performance.
https://doi.org/10.13064/KSSS.2019.11.1.041
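The abstract does not state which attention scoring function the model uses. The sketch below is a hedged illustration of what one column of an attention map represents, and it assumes plain dot-product attention. For a single decoder step, it scores every encoder frame, normalizes the scores with a softmax, and forms the context vector. Stacking the weights over decoder steps gives the attention map that is usually visualized. The function name is hypothetical.

```python
import math


def attention_step(query, encoder_states):
    """One decoder step of dot-product attention (an assumed scoring function).

    Returns (weights, context): weights[i] is how strongly this decoder step
    attends to encoder frame i; context is the weighted sum of encoder states.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(encoder_states[0])
    context = [sum(w * state[d] for w, state in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context


weights, context = attention_step([2.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Attention-based ASR systems often use richer scorers, such as location-aware attention. Only the softmax-and-weighted-sum structure shown here is shared across these variants.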

Hi, KIA! Classifying Emotional States from Wake-up Words Using Machine Learning (Hi, KIA! 기계 학습을 이용한 기동어 기반 감성 분류)

Kim, Taesu;Kim, Yeongwoo;Kim, Keunhyeong;Kim, Chul Min;Jun, Hyung Seok;Suk, Hyeon-Jeong
Science of Emotion and Sensibility / v.24 no.1 / pp.91-104 / 2021
This study explored whether users' emotional states can be identified from the wake-up words "Hi, KIA!" using a machine learning algorithm, in the context of the voice user interface of passenger cars. We targeted four emotional states, namely excited, angry, desperate, and neutral, and created a total of 12 emotional scenarios in the context of car driving. Nine college students participated and recorded sentences as guided by the visualized scenarios. The wake-up words were extracted from the whole sentences, resulting in two data sets. We used the soundgen package and the svmRadial method of the caret package in open-source R code to extract acoustic features of the recorded voices, and performed machine learning-based analysis to determine the predictability of the modeled algorithm. For all nine participants and the four emotional categories, we compared the accuracy obtained with the wake-up words (60.19%; range 22% to 81%) with that obtained with the whole sentences (41.51%). Individual differences in accuracy and sensitivity were noticeable, while the selected features remained relatively constant. This study provides empirical evidence for the potential application of wake-up words to emotion-driven user experience in communication between users and artificial intelligence systems.
https://doi.org/10.14695/KJSOS.2021.24.1.91
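caret's svmRadial method fits a support vector machine with a Gaussian radial basis function kernel through the kernlab package. kernlab parameterizes that kernel as k(x, x') = exp(-sigma * ||x - x'||^2), and svmRadial tunes sigma together with the cost C. The Python sketch below computes that kernel and the Gram matrix an SVM would be trained on. It is only an illustration, not the study's R code, and the feature vectors are made-up placeholders.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel in kernlab's parameterization: exp(-sigma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sigma * sq_dist)


def gram_matrix(samples, sigma):
    """Pairwise kernel values; the SVM dual problem is solved over this matrix."""
    return [[rbf_kernel(a, b, sigma) for b in samples] for a in samples]


# Hypothetical per-utterance acoustic feature vectors (illustrative values only)
features = [[0.21, 0.6], [0.25, 0.7], [0.12, 0.3]]
K = gram_matrix(features, sigma=0.5)
```

A larger sigma makes the kernel decay faster with distance, so each training utterance influences a smaller neighbourhood of feature space.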

Diagnosis of Valve Internal Leakage for Ship Piping System using Acoustic Emission Signal-based Machine Learning Approach (선박용 밸브의 내부 누설 진단을 위한 음향방출신호의 머신러닝 기법 적용 연구)

Lee, Jung-Hyung
Journal of the Korean Society of Marine Environment & Safety / v.28 no.1 / pp.184-192 / 2022
Valve internal leakage is caused by damage to the internal parts of the valve and can result in accidents and shutdowns of the piping system. This study investigated the feasibility of real-time leak detection using the acoustic emission (AE) signals generated in the piping system during internal leakage of a butterfly valve. Datasets of raw time-domain AE signals were systematically collected and postprocessed for each operation mode of the valve. The aim was to develop a data-driven model that detects and classifies internal leakage with machine learning algorithms. The study sought to determine whether leak detection can be treated as a classification problem by applying two classification algorithms: a support vector machine (SVM) and a convolutional neural network (CNN). Performance differed depending on the algorithm and the dataset used. The SVM-based binary classification models, which rely on features extracted from the data, achieved an overall accuracy of 83% to 90%. For the multi-class classification model, however, accuracy dropped to 66%. By contrast, the CNN-based classification model achieved an accuracy of 99.85%, superior to that of any of the SVM-based models. The results revealed that the SVM classification model requires effective feature extraction from the AE signals to improve multi-class accuracy. Moreover, CNN-based classification can be a promising approach for detecting both leakage and valve opening, as long as the performance of the processor does not degrade.
https://doi.org/10.7837/kosomes.2022.28.1.184
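The abstract does not list the features fed to the SVM. Conventional AE waveform parameters include peak amplitude, RMS, signal energy, and ring-down counts (upward crossings of a detection threshold). The sketch below computes these for one signal window as an assumed feature set. The threshold value and the function name are hypothetical.

```python
import math


def ae_features(signal, threshold):
    """Classic AE waveform parameters for one window (an assumed feature set)."""
    peak = max(abs(x) for x in signal)
    energy = sum(x * x for x in signal)  # discrete, unnormalized energy
    rms = math.sqrt(energy / len(signal))
    # Ring-down counts: number of upward crossings of the detection threshold.
    counts = sum(1 for prev, cur in zip(signal, signal[1:]) if prev < threshold <= cur)
    return {"peak": peak, "rms": rms, "energy": energy, "counts": counts}


features = ae_features([0.0, 2.0, 0.0, 2.0, 0.0], threshold=1.0)
```

A vector of such parameters per window is the kind of hand-crafted input an SVM needs. A CNN, by contrast, can consume the raw waveform or its spectrogram directly.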

Analysis and verification of the characteristic of a compact free-flooded ring transducer made of single crystals (압전단결정을 이용한 소형 free-flooded ring 트랜스듀서의 성능 특성 예측 및 검증)

Im, Jongbeom;Yoon, Hongwoo;Kwon, Byungjin;Kim, Kyungseop;Lee, Jeongmin
The Journal of the Acoustical Society of Korea / v.41 no.3 / pp.278-286 / 2022
In this study, a 33-mode Free-Flooded Ring (FFR) transducer was designed using the piezoelectric single crystal PIN-PMN-PT, which has high piezoelectric constants and a high electromechanical coupling coefficient. To confirm that a compact FFR transducer achieves high transmitting sensitivity at low frequencies, the design was compared with a commercial FFR transducer based on piezoelectric ceramics. To obtain broadband characteristics, a segmented piezoelectric ring structure with inserted inactive elements was applied. An oil-filled structure was adopted to minimize changes in the acoustic characteristics of the ring transducer. Acoustic tests verified that the transmitting voltage response, underwater impedance, and beam pattern agreed well with the finite element simulation results. The difference in transmitting voltage response between the measured and simulated results is about 1.3 dB in the cavity mode and about 0.3 dB in the radial mode. The fabricated FFR transducer had a higher transmitting voltage response than the commercial transducer, while its diameter was about 17% smaller. This study confirmed the feasibility of a single-crystal FFR transducer with a compact size and high-power characteristics. It also confirmed the effectiveness of simulation-based performance prediction.
https://doi.org/10.7776/ASK.2022.41.3.278

A study on statistical characteristics of time-varying underwater acoustic communication channel influenced by surface roughness (수면 거칠기에 따른 수면 경로의 시변 통신채널 통계적 특성 분석)

In-Seong Hwang;Kang-Hoon Choi;Jee Woong Choi
The Journal of the Acoustical Society of Korea / v.42 no.6 / pp.491-499 / 2023
Scattering by sea surface roughness degrades communication performance by causing frequency spread in communication signals and time variation in communication channels. An experiment was performed in a tank at the Hanyang University Ocean Acoustics Lab to compare how the time variation of the underwater acoustic communication channel differs with surface roughness. Artificial surface roughness was created in the tank, and communication signals with three bandwidths (8 kHz, 16 kHz, and 32 kHz) were used. The measured surface roughness was converted into a Rayleigh parameter, which served as the roughness parameter. The time-varying channel characteristics of the surface path were then statistically analyzed using the Doppler spread and the correlation time. For the Doppler spread of the surface path, the weighted root-mean-square Doppler spread ($w_f\sigma_\nu$) was used, which corrects for the effects of the carrier frequency and bandwidth of the communication signal. The total-channel correlation was simulated from the correlation time of the surface path and the energy ratio between the direct and surface paths, and then compared with the measured total-channel correlation time. Based on the time-varying characteristics of the sea surface path as a function of surface roughness, we propose a method for efficient communication signal design in an arbitrary marine environment.
https://doi.org/10.7776/ASK.2023.42.6.491
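The carrier-frequency and bandwidth weighting $w_f$ is specific to the study and is not given in the abstract. The sketch below therefore computes only the conventional, unweighted RMS Doppler spread $\sigma_\nu$: the square root of the power-weighted second central moment of a sampled Doppler power spectrum. The spectrum values are illustrative and the function name is hypothetical.

```python
import math


def rms_doppler_spread(freqs_hz, power):
    """Unweighted RMS Doppler spread of a sampled Doppler power spectrum S(nu).

    mean shift  nu_bar  = sum(nu * S) / sum(S)
    spread      sigma_nu = sqrt(sum((nu - nu_bar)^2 * S) / sum(S))
    """
    total = sum(power)
    mean = sum(f * p for f, p in zip(freqs_hz, power)) / total
    var = sum((f - mean) ** 2 * p for f, p in zip(freqs_hz, power)) / total
    return math.sqrt(var)


spread = rms_doppler_spread([-1.0, 0.0, 1.0], [1.0, 0.0, 1.0])  # equal power at +/-1 Hz
```

A rougher surface broadens the surface-path Doppler spectrum, which increases $\sigma_\nu$ and shortens the channel correlation time.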

Underwater Target Localization Using the Interference Pattern of Broadband Spectrogram Estimated by Three Sensors (3개 센서의 광대역 신호 스펙트로그램에 나타나는 간섭패턴을 이용한 수중 표적의 위치 추정)

Kim, Se-Young;Chun, Seung-Yong;Kim, Ki-Man
The Journal of the Acoustical Society of Korea / v.26 no.4 / pp.173-181 / 2007
In this paper, we propose a moving-target localization algorithm using acoustic spectrograms. A time-versus-frequency spectrogram provides information about the trajectory of a moving underwater target. For a source at a sufficiently long range from a receiver, the broadband striation patterns seen in the spectrogram represent the mutual interference between modes reflected by the surface and the bottom. The slope of the maximum-intensity striation is influenced by the waveguide invariant parameter $\beta$ and by the distance between the target and the sensor. When more than two sensors are used to measure the radiated noise of a moving ship, the slope and frequency of the maximum-intensity striation depend on the distance between the target and each receiver. Treating two sensors as fixed points, we form a circle of Apollonius: the set of all points whose distances from two fixed points are in a constant ratio. When three sensors are used, two such circles form an intersection point, and the coordinates of this point can be taken as the estimated position of the target. Simulations using an acoustic propagation program were performed to evaluate the proposed localization algorithm.
https://doi.org/10.7776/ASK.2007.26.4.173
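The geometric step of the method can be sketched as follows, assuming the range ratios between sensor pairs have already been estimated from the striation patterns. That estimation step is not shown. Each ratio defines a circle of Apollonius, and intersecting the circles from sensor pairs (1, 2) and (2, 3) yields candidate target positions. Two circles generally meet at two points, so a real system needs extra information to choose between them. The function names and the sensor layout are hypothetical.

```python
import math

Point = tuple[float, float]


def apollonius_circle(a: Point, b: Point, k: float) -> tuple[Point, float]:
    """Circle of all points P with |PA| / |PB| = k (k > 0, k != 1).

    Center C = (A - k^2 B) / (1 - k^2), radius r = k |AB| / |1 - k^2|.
    """
    if k <= 0 or math.isclose(k, 1.0):
        raise ValueError("k must be positive and != 1 (k == 1 gives a line)")
    k2 = k * k
    center = ((a[0] - k2 * b[0]) / (1 - k2), (a[1] - k2 * b[1]) / (1 - k2))
    radius = k * math.dist(a, b) / abs(1 - k2)
    return center, radius


def circle_intersections(c1: Point, r1: float, c2: Point, r2: float) -> list[Point]:
    """Intersection points of two circles (empty list if they do not meet)."""
    d = math.dist(c1, c2)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)  # distance from c1 to the common chord
    h = math.sqrt(max(r1 * r1 - a * a, 0.0))   # half-length of the common chord
    ux, uy = (c2[0] - c1[0]) / d, (c2[1] - c1[1]) / d
    px, py = c1[0] + a * ux, c1[1] + a * uy
    return [(px - h * uy, py + h * ux), (px + h * uy, py - h * ux)]


def localize(s1: Point, s2: Point, s3: Point, k12: float, k23: float) -> list[Point]:
    """Candidate target positions from range ratios r1/r2 = k12 and r2/r3 = k23."""
    c_a, r_a = apollonius_circle(s1, s2, k12)
    c_b, r_b = apollonius_circle(s2, s3, k23)
    return circle_intersections(c_a, r_a, c_b, r_b)


# Sensors at (0, 0), (10, 0), (0, 10); a target at (3, 4) gives these range ratios.
candidates = localize((0.0, 0.0), (10.0, 0.0), (0.0, 10.0),
                      5 / math.sqrt(65), math.sqrt(65) / math.sqrt(45))
```

Both circles pass through the true position by construction, so one of the two candidates coincides with (3, 4) up to floating-point error.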

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

Sohee Han;Jisub Um;Hoirin Kim
Phonetics and Speech Sciences / v.16 no.1 / pp.67-76 / 2024
Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, to the point where it can closely imitate natural human speech. In particular, TTS models offering diverse voice characteristics and personalized speech are widely used in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper, we propose a one-shot multi-speaker TTS system that ensures acoustic diversity and synthesizes personalized voices by generating speech from utterances of unseen target speakers. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on a pre-trained RawNet3, extracts speaker-specific voice features. Furthermore, the proposed approach covers both English and Korean one-shot multi-speaker TTS. We evaluated the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. In the objective evaluation, the proposed English and Korean one-shot multi-speaker TTS obtained prediction MOS (P-MOS) values of 2.54 and 3.74, respectively. These results indicate that the proposed model improves on the baseline models in terms of both naturalness and speaker similarity.
https://doi.org/10.13064/KSSS.2024.16.1.067
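The abstract does not say how the RawNet3 speaker embedding enters FastSpeech2. One common pattern in multi-speaker TTS is to project the utterance-level speaker embedding to the encoder's hidden size and add it to every encoder output frame. The sketch below shows that pattern with plain lists. The projection matrix, the dimensions, and the function names are hypothetical, and this is not necessarily the authors' design.

```python
def project(weight, vector):
    """Matrix-vector product mapping the speaker embedding to the hidden size."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weight]


def condition_on_speaker(encoder_out, speaker_emb, weight):
    """Add the projected speaker embedding to every encoder output frame.

    encoder_out: list of frames, each a list of hidden_size floats
    speaker_emb: utterance-level embedding (e.g. from a speaker encoder)
    weight: hypothetical hidden_size x emb_size projection matrix
    """
    s = project(weight, speaker_emb)
    if any(len(frame) != len(s) for frame in encoder_out):
        raise ValueError("projected embedding size must match the hidden size")
    return [[h + e for h, e in zip(frame, s)] for frame in encoder_out]


conditioned = condition_on_speaker([[0, 0], [1, 1]], [1, 2, 3], [[1, 0, 0], [0, 0, 1]])
```

Because the same vector is added to every frame, the speaker identity is applied uniformly across the utterance. A common alternative is to concatenate the embedding to each frame and then apply a linear layer.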

Cavitation signal detection based on time-series signal statistics (시계열 신호 통계량 기반 캐비테이션 신호 탐지)

Haesang Yang;Ha-Min Choi;Sock-Kyu Lee;Woojae Seong
The Journal of the Acoustical Society of Korea / v.43 no.4 / pp.400-405 / 2024
When cavitation noise occurs at ship propellers, the level of underwater radiated noise increases abruptly. This can be a critical threat because it raises the probability of detection, particularly for naval vessels. Therefore, assessing cavitation signals accurately and promptly is crucial for improving the survivability of submarines. Traditionally, cavitation occurrence has been determined mainly by checking whether acoustic/vibration levels measured by sensors exceed a certain threshold, or by using the Detection of Envelope Modulation On Noise (DEMON) method. However, these techniques rely on a physical understanding of cavitation phenomena and on subjective criteria based on user experience, and they involve multiple procedures. This motivates the development of techniques for early automatic recognition of cavitation signals. In this paper, we propose an algorithm that automatically detects cavitation occurrence. It is based on simple statistical features that reflect cavitation characteristics, extracted from acoustic signals measured by sensors attached to the hull. The performance of the proposed technique is evaluated as a function of the number of sensors and the model test conditions. It was confirmed that cavitation occurrence can be determined when the model is sufficiently trained on the cavitation characteristics reflected in signals measured by a single sensor.
https://doi.org/10.7776/ASK.2024.43.4.400
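The abstract does not name the statistics used, so the following is an assumed example. Impulsive cavitation bursts are generally expected to raise the kurtosis and crest factor of a signal window relative to steady broadband noise. The sketch computes RMS, crest factor, and kurtosis per non-overlapping window and flags windows whose kurtosis exceeds a threshold. The threshold and function names are hypothetical. The study itself learns the decision from training data rather than applying a fixed rule like this one.

```python
import math


def window_stats(x):
    """Simple per-window statistics (an assumed feature set)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    return {
        "rms": rms,
        "crest_factor": peak / rms if rms > 0 else 0.0,
        "kurtosis": m4 / (m2 * m2) if m2 > 0 else 0.0,  # about 3 for Gaussian noise
    }


def flag_windows(signal, window, kurtosis_threshold=4.0):
    """Indices of non-overlapping windows whose kurtosis exceeds the threshold."""
    flagged = []
    for start in range(0, len(signal) - window + 1, window):
        if window_stats(signal[start:start + window])["kurtosis"] > kurtosis_threshold:
            flagged.append(start // window)
    return flagged


# A steady oscillation followed by a window containing one impulsive burst
sig = [1.0, -1.0] * 4 + [0.0] * 7 + [10.0]
```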

A Study on Performance Evaluation of Hidden Markov Network Speech Recognition System (Hidden Markov Network 음성인식 시스템의 성능평가에 관한 연구)

오세진;김광동;노덕규;위석오;송민규;정현열
Journal of the Institute of Convergence Signal Processing / v.4 no.4 / pp.30-39 / 2003
In this paper, we evaluated the performance of an HM-Net (Hidden Markov Network) speech recognition system on Korean speech databases. Acoustic models were constructed using HM-Nets, a modification of HMMs (Hidden Markov Models), which are widely used statistical modeling methods. HM-Nets are built by splitting states in the contextual and temporal domains with the PDT-SSS (Phonetic Decision Tree-based Successive State Splitting) algorithm, a modification of the original SSS algorithm. In particular, contextual-domain state splitting adopts a phonetic decision tree to effectively represent context information that does not appear in the training speech data. Temporal-domain state splitting is carried out to effectively represent the duration information of each phoneme, after which the optimal triphone model network is constructed. Phoneme and word recognition used the one-pass Viterbi beam search algorithm with phone-pair and word-pair grammars, respectively. Sentence recognition used a multi-pass search algorithm with n-gram language models. A tree-structured lexicon was used to reduce the number of nodes by sharing common prefixes among words. The performance of the HM-Net speech recognition system was evaluated under various recognition conditions. The experiments confirmed that it achieves substantially better recognition performance than the previously introduced recognition system.
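A tree-structured lexicon is a prefix tree (trie) over each word's phone sequence, so words that share initial phones also share search nodes. The sketch below builds such a tree and counts its nodes so they can be compared with a flat lexicon, in which every phone of every word is a separate node. The phone transcriptions are made-up English examples, not entries from the study's Korean lexicon, and the function name is hypothetical.

```python
def build_lexicon_tree(lexicon):
    """Build a prefix tree over phone sequences; return (root, node_count)."""
    root = {}
    nodes = 0
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            if ph not in node:
                node[ph] = {}
                nodes += 1  # a node is created only where the prefix is not shared
            node = node[ph]
        node.setdefault("#words", []).append(word)  # word identity at the word end
    return root, nodes


lexicon = {
    "cat": ["k", "ae", "t"],
    "cap": ["k", "ae", "p"],
    "cab": ["k", "ae", "b"],
    "dog": ["d", "ao", "g"],
}
tree, tree_nodes = build_lexicon_tree(lexicon)
flat_nodes = sum(len(p) for p in lexicon.values())
```

Here the three words beginning with "k ae" share two nodes, so the tree needs 8 nodes instead of 12. The decoder also evaluates the shared prefix only once per hypothesis, which is where the search-time savings come from.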

A Study of the Vibration Characteristics of a Haptic Vibrator for Horizontal and Vertical Magnetization (수평 및 수직 착자에 대한 햅틱 진동자의 진동특성에 관한 연구)

Ko, Dong Shin;Hur, Deog Jae;Park, Tae Won;Lee, Jai Hyuk;Lee, Sung Su
Transactions of the Korean Society of Mechanical Engineers A / v.39 no.4 / pp.415-421 / 2015
This paper describes a step-by-step procedure for setting the design parameters of a haptic vibrator, together with a magnetization method for improving its performance and reducing its size. Magnetization was studied by comparing the electromagnetic force produced by horizontal and vertical magnetization. The theoretical results indicated that horizontal magnetization yields better performance. The systematic step-by-step procedure for establishing the design parameters was verified by testing the characteristics of a fabricated prototype. The vibration response function analysis and the electric field analysis were performed using a decoupled analytical method, and their results agreed well with the test results. The design parameters contributing to product reliability included the spring height, the welding position, and the coil position. The sensitivity of the electromagnetic field and the change in performance were analyzed with respect to these design parameters. As a result, we proposed a design method for implementing a reliability-based, high-performance haptic vibrator.
https://doi.org/10.3795/KSME-A.2015.39.4.415
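The abstract does not give the analytical vibration model. As an assumed simplification, a haptic vibrator can be idealized as a single-degree-of-freedom mass-spring-damper driven by the electromagnetic force. The sketch computes that system's undamped natural frequency and the magnitude of its displacement frequency response (receptance). The parameter values and function names are illustrative assumptions, not the paper's.

```python
import math


def natural_frequency_hz(k, m):
    """Undamped natural frequency f_n = sqrt(k / m) / (2*pi) of a spring-mass system."""
    return math.sqrt(k / m) / (2 * math.pi)


def receptance_magnitude(k, m, c, f_hz):
    """|X/F| in m/N for m*x'' + c*x' + k*x = F*sin(2*pi*f*t)."""
    w = 2 * math.pi * f_hz
    return 1.0 / math.hypot(k - m * w * w, c * w)


# Illustrative values: 1000 N/m spring, 1 g moving mass, light damping
fn = natural_frequency_hz(1000.0, 0.001)
peak = receptance_magnitude(1000.0, 0.001, 0.05, fn)
```

In this idealization, a design parameter that changes the spring stiffness or the moving mass shifts the resonance. Damping sets how sharp the response peak is at that frequency.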
