Integrated Search | Korea Science

Multi-band multi-scale DenseNet with dilated convolution for background music separation

허운행;김혜미;권오욱
- The Journal of the Acoustical Society of Korea, Vol. 38, No. 6, pp. 697-702, 2019
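We propose a multi-band multi-scale DenseNet with dilated convolution that separates the background music signal from the mixed signal of broadcast content. Dilated convolution makes it easier to learn context information at various scales of the spectrogram. In computer simulations, the proposed architecture improved the signal-to-distortion ratio (SDR) by 0.15 dB and 0.27 dB at signal-to-noise ratios (SNR) of 0 dB and -10 dB, respectively.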
https://doi.org/10.7776/ASK.2019.38.6.697
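
The role of dilation described in the abstract above can be illustrated with a minimal sketch. This is not the authors' network. It is a plain 1-D dilated convolution in Python, and the function name `dilated_conv1d` and the toy input are assumptions made here. It shows how a larger dilation widens the context each output sample sees without adding kernel weights.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D dilated convolution (cross-correlation form).

    Hypothetical helper for illustration only; the paper's layers are 2-D
    and operate on spectrograms inside a DenseNet.
    """
    # Number of input samples covered by one output sample.
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for k, weight in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart.
            acc += weight * signal[start + k * dilation]
        out.append(acc)
    return out


# Toy input: a linear ramp standing in for one row of a spectrogram.
frame = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
kernel = [1.0, 0.0, -1.0]

dense = dilated_conv1d(frame, kernel, dilation=1)  # each output spans 3 samples
wide = dilated_conv1d(frame, kernel, dilation=2)   # each output spans 5 samples
```

With the same three-tap kernel, a dilation of 2 spans five input samples instead of three, which is the property that lets stacked dilated layers cover several scales of context.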

Experimental Phonetic Study of Kyungsang and Cholla Dialect Using Power Spectrum and Laryngeal Fiberscope

김현기;이은영;홍기환
- Speech Sciences, Vol. 9, No. 2, pp. 25-47, 2002
Human language activity in the information society has driven the development of communication systems between humans and machines. The aim of this study was to analyze dialectal speech in Korea. One hundred Kyungsang and one hundred Cholla informants participated in this study. A CSL and a flexible laryngeal fiberscope were used to analyze the acoustic and glottal gestures of all the vowels and consonants. Test words were presented on picture cards and letter cards, which contained each vowel and each consonant, respectively. The dialogue between the examiner and the informants was recorded in a question-and-answer manner. The acoustic results for the two dialects were as follows: Kyungsang and Cholla informants showed neutralization between /e/ and /$\varepsilon$/. However, the apertures of Kyungsang vowels /i, w, u, o/ were higher than those of Cholla vowels. The /wi/ and /$\varepsilon$/ of Kyungsang diphthong vowels were shown as simple vowels /i/ and /$\varepsilon$/ in Cholla dialect. The VOT of Cholla dialect was longer than that of Kyungsang dialect. The fricative frequency of Kyungsang dialect was about 1000 Hz higher than that of Cholla dialect. The glottal widths on fiberscopic images showed that the consonant durations of Kyungsang and Cholla dialects were correlated with the acoustic durations on the spectrogram.

Perturbation and Perceptual Analysis of Pathological Sustained Vowels according to Signal Typing

이지연;최성희;;한민수;최홍식
- Phonetics and Speech Sciences, Vol. 2, No. 2, pp. 109-115, 2010
In this paper, we investigate signal typing based on the visual impression of distinctive spectrograms. Pathological voices are classified into signal type 1, 2, 3, or 4 in order to estimate perturbation parameters and to assign perceptual ratings based on the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V). The results suggest that perturbation analysis can be applied only to type 1 and type 2 signals, and that the perceptual ratings of overall grade increase with each signal type. Good inter-rater reliability was shown among three raters. We recommend that pathological voices be labeled with both the signal type and the CAPE-V rating in order to describe their characteristics definitively.

Determination of Arsenic in Korean Human Liver and of Manganese and Copper in Vitamin Preparations by Neutron Activation Analysis

오수창
- Journal of Pharmaceutical Investigation, Vol. 4, No. 4, pp. 17-25, 1974
1. Neutron activation analysis of arsenic contained in Korean human liver was studied from the viewpoint of forensic chemistry, using 12 corpses. A sample of 1 g was irradiated for 30 min in a neutron flux of $1.2 \times 10^{12}$ n/cm$^2$/sec, followed by nitric-sulfuric acid digestion and then by Gutzeit separation. Radioactivity was detected with a scintillation counter. The arsenic content in the liver was found to be $0.01\ \mu$g/g to $0.15\ \mu$g/g. 2. A rapid and convenient method for the radiochemical determination of minerals by neutron activation analysis was established. After neutron irradiation of the standard solution of Cu and Mn in a pneumatic tube (neutron flux: $1.2 \times 10^{12}$ n/cm$^2$/sec), Cu and Mn were determined by estimating the ratio of the widths under the energy peak area in the $\gamma$-ray spectrogram. When the standard solution of Mn and Cu was irradiated for 15 min to 18 h, the recovery test showed relative errors of 5.1% and 4.5% for copper and manganese, respectively.

Comparative Analysis for General and Estrus-related Vocalizations in Sows

전중환;연성찬;장홍희
- Journal of Animal Science and Technology, Vol. 47, No. 1, pp. 133-140, 2005
The aim of this study was to divide the vocalizations of sows into general vocalizations (GVs) and estrus-related vocalizations (EVs) and to identify their phonetic characteristics. Ten sows (Landrace) were recorded using digital video recorders twice daily (06:00-08:00 h and 17:00-19:00 h) during the anestrus and estrus periods. The GVs and EVs were divided based on the shapes of the spectrum and spectrogram. The GVs and EVs were identified as 5 and 3 types, respectively. Pitch, formant 1, formant 2, and formant 3 were not significantly different between GVs and EVs (P > 0.05), whereas intensity (P < 0.001), duration (P < 0.05), and formant 4 (P < 0.01) were significantly different. Three parameter groups (Group I: formant vector alone; Group II: formant vector + parameters from time signal; Group III: formant vector + parameters from time signal - parameters eliminated by backward stepwise discriminant analysis) were compared by discriminant function analysis. The classification system adopted in Group II showed a higher discrimination rate than those in the other groups (Group I: 76.1%, Group II: 88.1%, Group III: 87.3%). These results suggest that EVs are present and that intensity, formant 2, and formant 4 are available parameters for discrimination of EVs in sows.
https://doi.org/10.5187/JAST.2005.47.1.133

Improvement of Vocal Detection Accuracy Using Convolutional Neural Networks

You, Shingchern D.;Liu, Chien-Hung;Lin, Jia-Wei
- KSII Transactions on Internet and Information Systems (TIIS), Vol. 15, No. 2, pp. 729-748, 2021
Vocal detection is one of the fundamental steps in music information retrieval. Typically, the detection process consists of feature extraction and classification steps. Recently, neural networks have been shown to outperform traditional classifiers. In this paper, we report our study on how to improve detection accuracy further by carefully choosing the parameters of the deep network model. Through experiments, we conclude that a feature-classifier model is still better than an end-to-end model. The recommended model uses a spectrogram as the input plane, and the classifier is an 18-layer convolutional neural network (CNN). With this arrangement, compared with the existing literature, the proposed model improves the accuracy from 91.8% to 94.1% on the Jamendo dataset. Because accuracy on this dataset already exceeds 90%, the improvement of 2.3 percentage points is difficult to achieve and therefore valuable. If even higher accuracy is required, ensemble learning may be used. The recommended setting is a majority vote among seven of the proposed models. Doing so increases the accuracy by about 1.1% on the Jamendo dataset.
https://doi.org/10.3837/tiis.2021.02.019
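
The majority vote among seven models mentioned in the abstract above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `majority_vote`, the 0/1 label encoding (1 = vocal, 0 = non-vocal), and the toy predictions are all assumptions.

```python
from collections import Counter


def majority_vote(per_model_labels):
    """Combine per-frame labels from several models by majority vote.

    `per_model_labels` is a list with one label sequence per model; all
    sequences must have the same length. Hypothetical helper for illustration.
    """
    voted = []
    # zip(*...) groups the labels that all models gave to the same frame.
    for frame_labels in zip(*per_model_labels):
        label, _count = Counter(frame_labels).most_common(1)[0]
        voted.append(label)
    return voted


# Toy predictions from seven models for three audio frames (made-up values).
predictions = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
]
ensemble = majority_vote(predictions)
```

With an odd number of models and two classes, a tie cannot occur, which is one plausible reason to vote with seven models rather than six or eight.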

The Correlation between Speech Intelligibility and Acoustic Measurements in Children with Speech Sound Disorders

강은영
- Journal of the Korean Society of Integrative Medicine, Vol. 6, No. 4, pp. 191-206, 2018
Purpose: This study investigated the correlation between speech intelligibility and acoustic measurements of speech sounds produced by children with speech sound disorders and by children without any diagnosed speech sound disorder. Methods: A total of 60 children with and without speech sound disorders were the subjects of this study. Speech samples were obtained by having the subjects speak meaningful words. Acoustic measurements were analyzed on a spectrogram using the Multi-speech 3700 program. Speech intelligibility was determined according to a listener's perceptual judgment. Results: Children with speech sound disorders had significantly lower speech intelligibility than those without speech sound disorders. The intensity of the vowel /u/, the duration of the vowel /${\omega}$/, and the second formant of the vowel /${\omega}$/ were significantly different between the two groups. There was no difference in voice onset time between the groups. There was a correlation between acoustic measurements and speech intelligibility. Conclusion: The results of this study showed that the speech intelligibility of children with speech sound disorders was affected by intensity, word duration, and formant frequency. Clinical results should be complemented with acoustic measurements in addition to the evaluation of speech intelligibility.
https://doi.org/10.15268/ksim.2018.6.4.191

Study on data augmentation methods for deep neural network-based audio tagging

김범준;문현기;박성욱;박영철
- The Journal of the Acoustical Society of Korea, Vol. 37, No. 6, pp. 475-482, 2018
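This paper studies data augmentation methods for DNN (Deep Neural Network)-based audio tagging. The system converts the audio signal into a mel-spectrogram and uses it as the input to a deep neural network for audio tagging. To address the problems that arise when only a small amount of training data is available, the training data were augmented using methods such as time stretching, pitch shifting, dynamic range compression, and block mixing. The optimal parameters and the optimal combination of the augmentation techniques were identified through audio tagging simulations.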
https://doi.org/10.7776/ASK.2018.37.6.475
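
A mel-spectrogram, as used for the network input in the abstract above, places its frequency bands on the mel scale. The sketch below shows one commonly used Hz-to-mel conversion (the HTK-style formula) and how band edges follow from it. The abstract does not say which mel formula, frequency range, or band count the authors used, so the formula choice, the helper `mel_band_edges`, and the numbers below are assumptions for illustration.

```python
import math


def hz_to_mel(freq_hz):
    """HTK-style mel scale: roughly linear below 1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 2 edge frequencies (Hz), equally spaced in mel.

    Hypothetical helper: triangular mel filters are usually built from
    three consecutive edges each.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


# Assumed example range: 0 Hz to 8 kHz split into 4 mel bands.
edges = mel_band_edges(0.0, 8000.0, 4)
```

Because the edges are equally spaced in mel, the bands become wider in Hz as frequency increases, so low frequencies are resolved more finely than high ones.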

Human Laughter Generation using Hybrid Generative Models

Mansouri, Nadia;Lachiri, Zied
- KSII Transactions on Internet and Information Systems (TIIS), Vol. 15, No. 5, pp. 1590-1609, 2021
Laughter is one of the most important nonverbal sounds that humans generate. It is a means of expressing emotions. The acoustic and contextual features of this specific sound are different from those of speech, and many difficulties arise during the modeling process. In this work, we propose an audio laughter generation system based on unsupervised generative models: the autoencoder (AE) and its variants. The procedure combines three main sub-processes: (1) the analysis stage, which consists of extracting the log magnitude spectrogram from the laughter database; (2) the training of the generative models; and (3) the synthesis stage, which involves an intermediate mechanism, the vocoder. To improve the synthesis quality, we suggest hybrid models (LSTM-VAE, GRU-VAE and CNN-VAE) that combine the representation learning capacity of the variational autoencoder (VAE) with the temporal modelling ability of a long short-term memory RNN (LSTM) and the ability of the CNN to learn invariant features. To assess the performance of the proposed audio laughter generation process, an objective evaluation (RMSE) and a perceptual audio quality test (listening test) were conducted. According to these evaluation metrics, the GRU-VAE outperforms the other VAE models.
https://doi.org/10.3837/tiis.2021.05.001
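
The analysis stage (log magnitude spectrum) and the objective metric (RMSE) named in the abstract above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the abstract gives no frame length, window, or logarithm base, so the 8-sample frame, the missing window, the natural logarithm, and the helper names `log_magnitude_frame` and `rmse` are all chosen here for illustration.

```python
import cmath
import math


def log_magnitude_frame(frame, eps=1e-10):
    """Log magnitude of the DFT of one real-valued frame (bins 0..N/2).

    A direct O(N^2) DFT, used here only to keep the sketch dependency-free.
    `eps` avoids log(0) for empty bins.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)
        )
        spectrum.append(math.log(abs(coeff) + eps))
    return spectrum


def rmse(reference, estimate):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    return math.sqrt(sum(squared) / len(squared))


# Toy frame: a cosine that completes exactly 2 cycles in 8 samples.
frame = [math.cos(2.0 * math.pi * 2.0 * t / 8.0) for t in range(8)]
spec = log_magnitude_frame(frame)
peak_bin = max(range(len(spec)), key=spec.__getitem__)
```

Stacking such frames over time gives a log magnitude spectrogram, and `rmse` applied to an original and a reconstructed spectrogram gives one objective score of the kind the abstract reports.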

Automatic Generation Subtitle Service with Kinetic Typography according to Music Sentimental Analysis

지영서;이하람;임순범
- Journal of Korea Multimedia Society, Vol. 24, No. 8, pp. 1184-1191, 2021
In a pop song, the creator's intention is communicated to the user through music and lyrics. The meaning of the lyrics is as important as the music, but in most cases lyrics are delivered to users in a static form without non-verbal cues. Providing lyrics in a static text format is inefficient in conveying the emotions of the music. Recently, lyric videos with kinetic typography have been increasingly provided, but producing them requires expertise and a lot of time. Therefore, in this system, the emotions of the lyrics are found by analyzing the lyric text, and a deep learning model is trained on data obtained by converting the melody into Mel-spectrogram format in order to find the appropriate emotions for the music. The system sets properties such as motion, font, and color using the emotions found in the music, and automatically creates a kinetic typography video. In this study, we tried to enhance the effect of conveying the meaning of music through this system.
https://doi.org/10.9717/kmms.2021.24.8.1184
