• Title/Summary/Keyword: acoustic features


How to Retrieve Music using Mood Tags in a Folksonomy

  • Chang Bae Moon;Jong Yeol Lee;Byeong Man Kim
    • Journal of Web Engineering / v.20 no.8 / pp.2335-2360 / 2021
  • A folksonomy is a classification system in which volunteers collaboratively create and manage tags to annotate and categorize content. Folksonomies pose several problems for retrieving music by tags, including problems related to synonyms, different tagging levels, and neologisms. To address the synonym problem, we introduced a mood vector with 12 possible moods, each represented by a numeric value, as an internal tag. This allows both the moods of music pieces and mood tags to be represented internally by numeric values, which can then be used to retrieve music pieces. To determine the mood vector of a music piece, 12 regressors predicting the possibility of each mood from acoustic features were built using Support Vector Regression. To map a tag to its mood vector, the relationship between the moods of a piece of music and its mood tags was investigated using tagging data retrieved from Last.fm, a website that allows users to search for and stream music. To evaluate retrieval performance, music pieces on Last.fm annotated with at least one mood tag were used as a test set. When calculating precision and recall, music pieces annotated with synonyms of a given query tag were treated as relevant. Experiments on this real-world data set illustrate the utility of internal tagging of music, and our approach offers a practical solution to the problem caused by synonyms.
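
    The internal-tagging idea above — one regressor per mood dimension, then retrieval by vector similarity — can be sketched roughly as follows. The feature dimensions, synthetic data, and cosine-similarity ranking are illustrative assumptions, not the paper's actual configuration:

    ```python
    # Hedged sketch: one SVR per mood dimension (12 total), following the
    # paper's internal-tag idea; features and annotations are synthetic.
    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    N_MOODS = 12

    X = rng.normal(size=(50, 20))          # 50 pieces, 20 acoustic features
    Y = rng.random(size=(50, N_MOODS))     # per-mood possibility in [0, 1]

    # Train 12 independent regressors, one per mood dimension.
    regressors = [SVR(kernel="rbf").fit(X, Y[:, m]) for m in range(N_MOODS)]

    def mood_vector(features):
        """Predict a 12-dimensional internal mood vector for one piece."""
        f = np.atleast_2d(features)
        return np.array([r.predict(f)[0] for r in regressors])

    def retrieve(query_vec, library_vecs, k=3):
        """Rank pieces by cosine similarity to a query tag's mood vector."""
        sims = library_vecs @ query_vec / (
            np.linalg.norm(library_vecs, axis=1) * np.linalg.norm(query_vec))
        return np.argsort(-sims)[:k]
    ```

    With this internal representation, synonymous tags that map to nearby mood vectors naturally retrieve similar pieces, which is the mechanism the abstract describes.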

Simulation Techniques for Mid-Frequency Vibro-Acoustics: Virtual Tools for Real Problems

  • Desmet, Wim;Pluymers, Bert;Atak, Onur;Bergen, Bart;Deckers, Elke;Huijssen, Koos;Van Genechten, Bert;Vergote, Karel;Vandepitte, Dirk
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2010.05a / pp.49-49 / 2010
  • The most commonly used numerical modelling techniques for acoustics and vibration are element based techniques, such as the finite element and boundary element method. Due to the huge computational efforts involved, the use of these deterministic techniques is practically restricted to low-frequency applications. For high-frequency modelling, probabilistic techniques such as SEA are well established. However, there is still a wide mid-frequency range for which no adequate and mature prediction techniques are available. In this frequency range, the computational efforts of conventional element based techniques become prohibitively large, while the basic assumptions of the probabilistic techniques are not yet valid. In recent years, a vast amount of research has been initiated in a quest for an adequate solution to the current mid-frequency problem. One family of research methods focuses on novel deterministic approaches with an enhanced convergence rate and computational efficiency compared to the conventional element based methods, in order to shift the practical frequency limitation towards the mid-frequency range. Amongst those techniques, a wave based prediction technique using an indirect Trefftz approach is being developed at the K.U.Leuven - Noise and Vibration Research group. This paper starts with an outline of the major features of the mid-frequency modelling challenge and provides a short overview of the current research activities in response to this challenge. Next, the basic concepts of the wave based technique and its hybrid coupling with finite element schemes are described. Various validations on two- and three-dimensional acoustic, elastic, poro-elastic and vibro-acoustic examples are given to illustrate the potential of the method and its beneficial performance compared to conventional element based methods. A closing part shares some views on the open issues and future research directions.


Automatic speech recognition using acoustic Doppler signals (초음파 도플러를 이용한 음성 인식)

  • Lee, Ki-Seung
    • The Journal of the Acoustical Society of Korea / v.35 no.1 / pp.74-82 / 2016
  • In this paper, a new automatic speech recognition (ASR) method is proposed in which ultrasonic Doppler signals are used instead of conventional speech signals. The proposed method has advantages over conventional speech-based ASR, including robustness against acoustic noise and user comfort associated with the non-contact sensor. In the proposed method, a 40 kHz ultrasonic signal is radiated toward the mouth and the reflected ultrasonic signals are received. The frequency shift caused by the Doppler effect is used to implement ASR. The proposed method employs multi-channel ultrasonic signals acquired from various locations, unlike the previous method, in which a single-channel ultrasonic signal was used. Principal Component Analysis (PCA) coefficients were used as the ASR features, and a left-to-right hidden Markov model (HMM) was adopted. To verify the feasibility of the proposed ASR, speech recognition experiments were carried out on 60 isolated Korean words obtained from six speakers. The results showed that the overall word recognition rates were comparable with conventional speech-based ASR methods and that the proposed method outperformed the conventional single-channel ASR method. In particular, an average recognition rate of 90 % was maintained in noisy environments.
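
    A rough sketch of the front end described above — framing the multi-channel Doppler signals and reducing each frame to PCA coefficients that would feed a left-to-right HMM (not shown) — with purely illustrative sizes; the sample rate, frame lengths, and channel count are assumptions, not the paper's values:

    ```python
    # Hedged sketch of the feature pipeline; all sizes are illustrative.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    FS = 8000            # assumed rate after demodulating the 40 kHz carrier
    FRAME, HOP = 256, 128

    signal = rng.normal(size=(4, FS))   # 4 ultrasonic channels, 1 s of data

    def frames(x, frame, hop):
        """Slice a 1-D signal into overlapping frames (rows)."""
        n = 1 + (len(x) - frame) // hop
        return np.stack([x[i * hop : i * hop + frame] for i in range(n)])

    # Stack per-channel frames side by side, then keep a few PCA coefficients.
    F = np.hstack([frames(ch, FRAME, HOP) for ch in signal])  # (n_frames, 1024)
    pca = PCA(n_components=12).fit(F)
    features = pca.transform(F)          # (n_frames, 12) HMM observations
    ```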

Differentiation of Vocal Cyst and Polyp by High-Pitched Phonation Characteristics (성대낭종과 성대폴립 간의 고음발성 양상의 차이)

  • Lee, Jong-Ik;Jeong, Go-Eun;Kim, Seong-Tae;Kim, Sang-Yeon;Nam, Soon-Yuhl;Kim, Sang-Yoon;Roh, Jong-Lyel;Choi, Seung-Ho
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.23 no.1 / pp.48-51 / 2012
  • Background and Objectives : A vocal fold cyst is generally treated by surgical resection, unlike a vocal fold polyp, which is first treated by conservative management. A decrease in mucosal waves is known as the main diagnostic criterion of a vocal fold cyst, but it is sometimes difficult to differentiate a cyst from a polyp by endoscopic examination alone. The purpose of this study is to identify objective features of vocal cysts and polyps on the basis of voice analysis for proper differential diagnosis, especially at high-pitched phonation. Materials and Method : Voice analysis was performed in 15 vocal fold cyst patients and 42 vocal fold polyp patients. Parameters of perceptual assessment, acoustic and aerodynamic measures, and the voice range profile were compared between the two groups. Results : Compared with vocal fold polyp patients, vocal fold cyst patients showed significantly reduced maximum phonation time on acoustic and aerodynamic analysis, and a narrowed frequency range and low maximum frequency on voice range profile analysis. A maximum frequency of 381 Hz was established as the cut-off value for differential diagnosis between cyst and polyp (ROC analysis; sensitivity 60 %, specificity 68 %). Conclusion : Voice analysis is helpful for the differential diagnosis of vocal fold cyst and polyp, especially when it is difficult to distinguish a cyst from a polyp by endoscopic examination in the clinical setting. The decreased maximum frequency in vocal fold cysts supports incomplete high-pitched phonation and falsetto register in vocal fold cyst patients due to decreased mucosal waves, compared with vocal fold polyp patients.
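
    The reported cut-off can be applied as a simple decision rule. The sketch below uses fabricated toy measurements (not the study's data) only to show how the 381 Hz threshold and the sensitivity/specificity calculation fit together:

    ```python
    # Hedged sketch: classify cyst vs. polyp by the 381 Hz maximum-frequency
    # cut-off; the measurements below are fabricated toy values.
    import numpy as np

    max_freq = np.array([300, 350, 395, 420, 450, 500, 370, 360])  # Hz
    is_cyst  = np.array([1,   1,   1,   0,   0,   0,   0,   1], dtype=bool)

    CUTOFF = 381.0                  # Hz, from the paper's ROC analysis
    pred_cyst = max_freq < CUTOFF   # cysts show a lower maximum frequency

    sensitivity = (pred_cyst & is_cyst).sum() / is_cyst.sum()
    specificity = (~pred_cyst & ~is_cyst).sum() / (~is_cyst).sum()
    ```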


Characteristics of Tide-induced Flow and its Effect on Pollutant Patterns Near the Ocean Outfall of Wastewater Treatment Plants in Jeju Island in Late Spring (제주도 하수처리장 해양방류구 인근해역의 늦은 봄철 조류 특성과 조석잔차류에 의한 오염물질의 분포 특성)

  • KIM, JUN-TECK;HONG, JI-SEOK;MOON, JAE-HONG;KIM, SANG-HYUN;KIM, TAE-HOON;KIM, SOO-KANG
    • The Sea: Journal of the Korean Society of Oceanography / v.26 no.2 / pp.63-81 / 2021
  • In this study, we investigated the tide-induced flow patterns near the ocean outfalls of the Jeju and Bomok Wastewater Treatment Plants (WTP) in Jeju Island using Acoustic Doppler Current Profiler (ADCP) measurements and a numerical experiment in which a passive tracer was inserted into a regional ocean model. In late spring of 2018, the ADCP measurements showed that tidal currents dominate the flow patterns in the outfall regions compared to the non-tidal components. According to harmonic analysis, the tides are of mixed diurnal and semi-diurnal type but predominantly semi-diurnal, with stronger oscillations at the Jeju WTP than at the Bomok WTP. The tidal currents oscillate parallel to the isobaths in both regions, but the direction of rotation differs: anti-clockwise at the Jeju WTP and clockwise at the Bomok WTP. Of particular interest is the finding that the residual current mainly flows toward the coastline across the isobaths, especially at the outfall of the Bomok WTP. Our model successfully captures the features of the tidal currents observed near the outfalls in both regions and indicates possible persistent pollutant accumulation along the coasts of Bomok.
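
    The harmonic analysis step mentioned above amounts to a least-squares fit of standard tidal constituent frequencies to the current record. A minimal sketch, using the standard M2 (semi-diurnal) and K1 (diurnal) periods and a synthetic record in place of the ADCP data:

    ```python
    # Hedged sketch of tidal harmonic analysis by least squares; the
    # "record" is synthetic and only M2 and K1 are fitted.
    import numpy as np

    HOURS = np.arange(0, 15 * 24, 0.5)   # 15-day record, 30-min samples
    OMEGA = {"M2": 2 * np.pi / 12.4206,  # rad/hour, standard periods
             "K1": 2 * np.pi / 23.9345}

    # Synthetic current: strong semi-diurnal plus weaker diurnal signal.
    u = (0.40 * np.cos(OMEGA["M2"] * HOURS + 0.3)
         + 0.10 * np.cos(OMEGA["K1"] * HOURS))

    # Design matrix: mean flow plus a cos/sin pair per constituent.
    A = np.column_stack(
        [np.ones_like(HOURS)]
        + [f(w * HOURS) for w in OMEGA.values() for f in (np.cos, np.sin)])
    coef, *_ = np.linalg.lstsq(A, u, rcond=None)

    residual = u - A @ coef                       # non-tidal residual
    amps = {name: np.hypot(coef[1 + 2 * i], coef[2 + 2 * i])
            for i, name in enumerate(OMEGA)}      # constituent amplitudes
    ```

    Comparing the fitted amplitudes per constituent is what supports statements like "mixed but predominantly semi-diurnal"; the residual after removing the fit is the basis for the residual-current analysis.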

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

  • Sohee Han;Jisub Um;Hoirin Kim
    • Phonetics and Speech Sciences / v.16 no.1 / pp.67-76 / 2024
  • Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, reaching a level where it can closely imitate natural human speech. In particular, TTS models offering varied voice characteristics and personalized speech are widely utilized in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper, we propose a one-shot multi-speaker TTS system that can ensure acoustic diversity and synthesize a personalized voice by generating speech from unseen target speakers' utterances. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. Furthermore, the proposed approach includes not only an English one-shot multi-speaker TTS but also a Korean one. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. The objective evaluation of the proposed English and Korean one-shot multi-speaker TTS showed a predicted MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that our proposed model improves over the baseline models in terms of both naturalness and speaker similarity.

Cavitation signal detection based on time-series signal statistics (시계열 신호 통계량 기반 캐비테이션 신호 탐지)

  • Haesang Yang;Ha-Min Choi;Sock-Kyu Lee;Woojae Seong
    • The Journal of the Acoustical Society of Korea / v.43 no.4 / pp.400-405 / 2024
  • When cavitation noise occurs in ship propellers, the level of underwater radiated noise increases abruptly, which can be a critical threat factor because it increases the probability of detection, particularly for naval vessels. Accurately and promptly assessing cavitation signals is therefore crucial for improving the survivability of submarines. Traditionally, techniques for determining cavitation occurrence have mainly relied on checking whether acoustic/vibration levels measured by sensors exceed a certain threshold, or on the Detection of Envelope Modulation On Noise (DEMON) method. However, these techniques rely on a physical understanding of the cavitation phenomenon and on subjective, experience-based criteria involving multiple procedures, which motivates the development of techniques for early automatic recognition of cavitation signals. In this paper, we propose an algorithm that automatically detects cavitation occurrence based on simple statistical features, reflecting cavitation characteristics, extracted from acoustic signals measured by sensors attached to the hull. The performance of the proposed technique is evaluated for different numbers of sensors and model test conditions. We confirmed that, by sufficiently training on the cavitation characteristics reflected in signals measured by a single sensor, the occurrence of cavitation signals can be determined.
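
    A minimal sketch of the idea: simple per-frame statistics (here RMS level and excess kurtosis, chosen because cavitation is loud and impulsive) with fixed thresholds standing in for the trained detector. The signals and thresholds below are synthetic illustrations, not the model-test data or the paper's features:

    ```python
    # Hedged sketch: flag frames whose level and impulsiveness both
    # exceed thresholds; signals and thresholds are illustrative.
    import numpy as np

    def frame_stats(x):
        """Per-frame RMS and excess kurtosis of a 1-D acoustic frame."""
        rms = np.sqrt(np.mean(x ** 2))
        z = (x - x.mean()) / x.std()
        kurt = np.mean(z ** 4) - 3.0
        return rms, kurt

    def is_cavitating(x, rms_thr=1.5, kurt_thr=1.0):
        """Simple two-statistic detector for one frame."""
        rms, kurt = frame_stats(x)
        return rms > rms_thr and kurt > kurt_thr

    rng = np.random.default_rng(2)
    quiet = rng.normal(0.0, 1.0, 4096)           # non-cavitating noise
    bursts = np.where(rng.random(4096) < 0.02,   # sparse impulsive bursts
                      rng.normal(0.0, 12.0, 4096), 0.0)
    loud = rng.normal(0.0, 1.0, 4096) + bursts   # cavitation-like frame
    ```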

Visualization of Korean Speech Based on the Distance of Acoustic Features (음성특징의 거리에 기반한 한국어 발음의 시각화)

  • Pok, Gou-Chol
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.13 no.3 / pp.197-205 / 2020
  • The Korean language has the characteristic that the pronunciation of phoneme units such as vowels and consonants is fixed and the pronunciation associated with a notation does not change, so foreign learners can approach Korean rather easily. However, when one pronounces words, phrases, or sentences, the pronunciation varies widely and becomes complex at syllable boundaries, and the association of notation and pronunciation no longer holds. Consequently, it is very difficult for foreign learners to learn standard Korean pronunciation. Despite these difficulties, systematic analysis of pronunciation errors in Korean words is believed to be possible, based on the observation that, unlike other languages including English, the relationship between Korean notation and pronunciation can be described by a set of firm rules without exceptions. In this paper, we propose a visualization framework which shows the differences between standard and erroneous pronunciations as quantitative measures on the computer screen. Previous research only shows color representations and 3D graphics of speech properties, or an animated view of the changing shapes of the lips and mouth cavity. Moreover, the features used in such analyses are only point data, such as the average over a speech range. In this study, we propose a method which can directly use the time-series data instead of summarized or distorted data. This is realized by a deep learning-based technique which combines a self-organizing map, a variational autoencoder, and a Markov model, and we achieved a clear performance enhancement compared to the method using point-based data.

Automatic detection and severity prediction of chronic kidney disease using machine learning classifiers (머신러닝 분류기를 사용한 만성콩팥병 자동 진단 및 중증도 예측 연구)

  • Jihyun Mun;Sunhee Kim;Myeong Ju Kim;Jiwon Ryu;Sejoong Kim;Minhwa Chung
    • Phonetics and Speech Sciences / v.14 no.4 / pp.45-56 / 2022
  • This paper proposes an optimal methodology for automatically diagnosing and predicting the severity of chronic kidney disease (CKD) using patients' utterances. In patients with CKD, the voice changes due to weakening of the respiratory and laryngeal muscles and vocal fold edema. Previous studies have phonetically analyzed the voices of patients with CKD, but no studies have been conducted to classify their voices. In this paper, the utterances of patients with CKD were classified using a variety of utterance types (sustained vowel, sentence, general sentence), feature sets [handcrafted features, the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS), CNN-extracted features], and classifiers (SVM, XGBoost). A total of 1,523 utterances, 3 hours, 26 minutes, and 25 seconds long in total, were used. F1-scores of 0.93 for automatic disease diagnosis, 0.89 for the 3-class problem, and 0.84 for the 5-class problem were achieved. The highest performance was obtained with the combination of general sentence utterances, the handcrafted feature set, and XGBoost. The result suggests that general sentence utterances, which can reflect all of a speaker's speech characteristics, and an appropriate feature set extracted from them are adequate for the automatic classification of CKD patients' utterances.
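
    The classification setup can be sketched as below, with sklearn's GradientBoostingClassifier standing in for XGBoost and random vectors standing in for the 88 eGeMAPS functionals per utterance; everything here is synthetic and only illustrates the pipeline shape, not the paper's data or scores:

    ```python
    # Hedged sketch of the severity-classification pipeline; data is
    # synthetic, and GradientBoostingClassifier stands in for XGBoost.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(3)
    X = rng.normal(size=(300, 88))    # 88 eGeMAPS functionals per utterance
    y = rng.integers(0, 3, size=300)  # severity class 0 / 1 / 2
    X += y[:, None] * 1.0             # synthetic class-dependent shift

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0,
                                              stratify=y)
    clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
    macro_f1 = f1_score(y_te, clf.predict(X_te), average="macro")
    ```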

LOFAR/DEMON grams compression method for passive sonars (수동소나를 위한 LOFAR/DEMON 그램 압축 기법)

  • Ahn, Jae-Kyun;Cho, Hyeon-Deok;Shin, Donghoon;Kwon, Taekik;Kim, Gwang-Tae
    • The Journal of the Acoustical Society of Korea / v.39 no.1 / pp.38-46 / 2020
  • LOw Frequency Analysis and Recording (LOFAR) and Detection of Envelope Modulation On Noise (DEMON) grams are bearing-time-frequency plots of underwater acoustic signals that visualize features for passive sonar. These grams are characterized by tonal components, for which conventional data coding methods are not suitable. In this work, a novel LOFAR/DEMON gram compression algorithm based on a binary map and prediction methods is proposed. We first generate a binary map, from which the prediction for each frequency bin is determined, and then divide a frame into several macro blocks. For each macro block, we apply intra and inter prediction modes and compute residuals. We then perform prediction for the active bins in the binary map and quantize the residuals for entropy coding. By transmitting the binary map and prediction modes, the decoder can reconstruct the grams using the same process. Simulation results show that the proposed algorithm provides significantly better compression performance on LOFAR and DEMON grams than conventional data coding methods.
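
    The main ingredients — a binary map of tonal bins, frame-to-frame prediction of active bins, and residual quantization — can be sketched on a synthetic gram as follows. Block partitioning, intra modes, and the entropy coder itself are omitted, and all sizes and thresholds are illustrative assumptions:

    ```python
    # Hedged sketch: binary tonal map + closed-loop inter prediction
    # + residual quantization on a synthetic LOFAR-style gram.
    import numpy as np

    rng = np.random.default_rng(4)
    gram = rng.normal(30.0, 2.0, size=(64, 256))  # frames x freq bins, dB
    gram[:, [40, 97, 180]] += 25.0                # persistent tonal lines

    # Binary map of tonal (active) bins, shared by encoder and decoder.
    binary_map = gram > gram.mean() + 3 * gram.std()
    Q = 0.5                                       # quantizer step, dB

    # Closed-loop inter prediction: each active bin is predicted from its
    # own reconstructed value one frame back, so quantization error
    # stays bounded instead of drifting.
    recon = np.empty_like(gram)
    recon[0] = gram[0]                            # first frame sent as-is
    q_resid = np.zeros(gram.shape, dtype=np.int32)  # for entropy coding
    for t in range(1, len(gram)):
        q = np.round((gram[t] - recon[t - 1]) / Q).astype(np.int32)
        q_resid[t] = np.where(binary_map[t], q, 0)
        recon[t] = np.where(binary_map[t],
                            recon[t - 1] + q_resid[t] * Q,
                            gram[t])              # inactive bins assumed exact
    ```

    Because the tonal lines change slowly from frame to frame, the quantized residuals are small and heavily skewed toward zero, which is what makes them cheap to entropy-code.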