• Title/Summary/Keyword: 음합성

Search Result 333, Processing Time 0.026 seconds

Wavelet-based Pitch Detector for 2.4 kbps Harmonic-CELP Coder (2.4 kbps 하모닉-CELP 코더를 위한 웨이블렛 피치 검출기)

  • 방상운;이인성;권오주
    • The Journal of the Acoustical Society of Korea / v.22 no.8 / pp.717-726 / 2003
  • This paper presents a method for designing a wavelet-based pitch detector for a 2.4 kbps Harmonic-CELP coder and for achieving effective waveform interpolation by deciding the window shape in the transition region. A waveform interpolation coder encodes one pitch-period-sized segment of speech, a prototype segment, for each frame and generates a smooth waveform interpolation between the prototype segments for voiced frames. However, when pitch halving or doubling occurs, harmonic synthesis of the prototype waveforms between the previous frame and the current frame causes not only waveform errors but also discontinuity at the frame boundary. In addition, because the waveform interpolation coder synthesizes the excitation waveform in the transition region by overlap-add with a triangular window, Harmonic-CELP fails to model abruptly increasing speech, and the synthesized waveform increases only linearly. First, to detect the pitch period precisely, we use a hybrid first-stage pitch detector and increase the precision with a second-stage ACF pitch detector. Next, to modify the excitation window, we detect the onset and offset of the frame using GCI. As a result, pitch doubling is removed, the pitch error rate decreases by 5.4% compared with ACF and by 2.66% compared with the wavelet detector, and the MOS improves by 0.13 in the transition region.
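
The abstract names an ACF (autocorrelation) pitch detector as the second stage but gives no implementation details. The following is a minimal sketch of plain autocorrelation pitch estimation on a single frame, not the authors' hybrid wavelet/ACF detector. The function name `acf_pitch`, the 60 to 400 Hz search range, and the test signal are assumptions made for illustration.

```python
import math


def acf_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the pitch (Hz) of one frame from its autocorrelation peak.

    Returns 0.0 when the frame is silent or no positive peak is found.
    The search range f_min..f_max is an assumed typical speech range.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return 0.0

    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), n - 1)

    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Biased autocorrelation, normalized by the frame energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_val:
            best_lag, best_val = lag, r
    return sample_rate / best_lag if best_lag else 0.0


# Illustrative input: a 40 ms frame of a 100 Hz sine sampled at 8 kHz.
sr = 8000
frame = [math.sin(2 * math.pi * 100.0 * i / sr) for i in range(320)]
print(acf_pitch(frame, sr))
```

For the sine above the estimate should land close to 100 Hz. The biased autocorrelation shrinks as the lag grows, which makes a peak at twice the true period less likely to win; that reduces, but does not eliminate, the pitch doubling the abstract describes.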

Comparison of Korean Real-time Text-to-Speech Technology Based on Deep Learning (딥러닝 기반 한국어 실시간 TTS 기술 비교)

  • Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology / v.7 no.1 / pp.640-645 / 2021
  • A deep learning-based end-to-end TTS system consists of a Text2Mel module that generates a spectrogram from text and a vocoder module that synthesizes speech signals from the spectrogram. Recently, by applying deep learning technology to TTS systems, the intelligibility and naturalness of synthesized speech have improved to a level comparable to human speech. However, inference for synthesizing speech is very slow compared to conventional methods. Inference speed can be improved by applying a non-autoregressive method, which generates speech samples in parallel, independent of previously generated samples. In this paper, we introduce FastSpeech, FastSpeech 2, and FastPitch as Text2Mel technologies, and Parallel WaveGAN, Multi-band MelGAN, and WaveGlow as vocoder technologies that apply the non-autoregressive method, and we implement them to verify whether they can run in real time. Experimental results show that, according to the measured RTF, all of the presented methods are capable of real-time processing. The trained models are about tens to hundreds of megabytes in size, except WaveGlow, so they can be applied to embedded environments where memory is limited.
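
The abstract reports RTF values without defining the measure. Under the usual definition, the real-time factor is the synthesis time divided by the duration of the audio produced, so a value below 1.0 means faster than real time. The following is a minimal sketch of that measurement; `real_time_factor` and `dummy_synthesize` are hypothetical names, and the dummy synthesizer merely stands in for a real Text2Mel plus vocoder pipeline.

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Return synthesis time divided by the duration of the generated audio."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    if len(samples) == 0:
        raise ValueError("synthesizer returned no audio")
    audio_seconds = len(samples) / sample_rate
    return elapsed / audio_seconds


def dummy_synthesize(text):
    # Hypothetical stand-in for a TTS model: 0.1 s of silence per
    # character at an assumed sample rate of 22,050 Hz.
    return [0.0] * (2205 * len(text))


rtf = real_time_factor(dummy_synthesize, "안녕하세요", 22050)
print(f"RTF = {rtf:.6f} (real time if below 1.0)")
```

In practice the measurement would be averaged over many sentences and would exclude model loading time, since a single call is sensitive to warm-up effects.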

Intonation Conversion using the Other Speaker's Excitation Signal (他話者의 勵起信號를 이용한 抑揚變換)

  • Lee, Ki-Young;Choi, Chang-Seok;Choi, Kap-Seok;Lee, Hyun-Soo
    • The Journal of the Acoustical Society of Korea / v.14 no.4 / pp.21-28 / 1995
  • This paper presents an intonation conversion method as a basic study on converting original speech into artificially intoned speech. The method uses the other speaker's excitation signals as intonation information and, as vocal features, the original vocal tract spectra warped to the other speaker's spectra using DTW; the intonation-converted speech signals are synthesized through the short-time inverse Fourier transform (STIFT) of their product. To evaluate the intonation-converted speech, we collect Korean single vowels and sentences spoken by 30 males and compare fundamental frequency contours, spectrograms, distortion measures, and MOS test results between the original speech and the converted speech. The results show that this method can convert speech into speech with the other speaker's intonation.
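
The abstract does not give the authors' DTW formulation, local distance, or path constraints. The following is a minimal sketch of textbook dynamic time warping with an absolute-difference distance on scalar sequences. In the paper's setting each element would be a spectral frame and `dist` a spectral distance; `dtw_path` and its arguments are illustrative names.

```python
def dtw_path(a, b, dist=lambda x, y: abs(x - y)):
    """Align sequences a and b; return (total cost, list of (i, j) index pairs)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # match
                                 cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1])      # b advances alone

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


total, path = dtw_path([1, 2, 3], [1, 2, 2, 3])
print(total, path)
```

Each pair `(i, j)` in the returned path says that frame `i` of the first sequence is aligned with frame `j` of the second, which is the mapping needed to warp one speaker's spectra onto the other's time axis.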

Lipase-Catalyzed Synthesis of Structured Lipids with Capric and Conjugated Linoleic Acid in a Stirred-Batch Type Reactor (대두유로부터 Lipase를 이용한 재구성 지질의 합성 및 특성)

  • 신정아;이기택
    • Journal of the Korean Society of Food Science and Nutrition / v.33 no.7 / pp.1175-1179 / 2004
  • Structured lipid (SL) was produced from soybean oil at a molar ratio of 1:2:2 (soybean oil:capric acid:CLA) using Chirazyme L-2 lipase (4% by weight of total substrates). The reaction was conducted for 24 hr at 55°C in a 1 L stirred-batch type reactor. SL-soybean oil contained 4.9 mol% capric acid and 4.1 mol% CLA. The iodine value of SL-soybean oil was lower than that of soybean oil due to the incorporated capric acid. The tocopherol content of SL-soybean oil was 18.2 mg/100 g. SL-soybean oil appeared more yellowish than soybean oil. Reverse-phase HPLC showed that SL-triacylglycerol species containing capric acid accounted for about 12.6 area%.

Turtle Neck Syndrome Posture Correction Service Using CNN-based Learning Model (CNN기반의 학습모델을 활용한 거북목 증후군 자세 교정 시스템)

  • Han, Ji-Ye;Park, Jin-Ho
    • The Journal of the Korea Contents Association / v.20 no.7 / pp.47-55 / 2020
  • With the increased use of smart devices, the incidence of turtle neck syndrome has risen. Turtle neck syndrome is a posture in which the head sits forward of the torso because the front neck muscles have lengthened and the upper muscles have shortened, and correcting everyday posture habits is more effective than surgery or medication. This paper therefore proposes a system that detects postures that can cause turtle neck syndrome and warns the user in real time. Image data of correct posture and turtle neck posture are collected to create a CNN-based learning model. Using only a webcam (built-in camera), the sitting posture captured by the camera is checked in real time by the learning model, and if a turtle neck posture is detected, the system sounds a warning and prompts the correct posture. The system can lead people to correct their everyday posture habits, helping to treat turtle neck syndrome and prevent more serious conditions such as neck disc problems.

Flexural strength of high-strength concrete filled steel tube columns strengthened by carbon fiber sheets (탄소섬유쉬트로 보강한 고강도 콘크리트 충전강관(CFT) 기둥의 휨내력에 관한 연구)

  • Park, Jai-Woo;Hong, Young-Kyun;Hong, Gi-Soup
    • Journal of the Earthquake Engineering Society of Korea / v.12 no.1 / pp.21-28 / 2008
  • CFT (Concrete Filled Steel Tube) columns have become popular in high-rise building construction because of both their composite effect and their economic advantage. However, previous studies have pointed out that current practice in CFT columns may lead to local buckling of the steel tube at the critical sections of the columns right after yielding. To resolve this problem, the TR-CFT (Transversely Reinforced Concrete Filled Steel Tube) column is proposed, which controls or at least delays local buckling at the critical section by wrapping the CFT column with carbon fiber sheet. The proposed column system is verified in this paper by observing the experimental performance of TR-CFT columns with high-strength concrete and comparing it with the analytical prediction. It is also shown that current design code provisions such as ACI-318, which do not appropriately account for the confining effect on the concrete filled in the steel tube, may be overly conservative.

Bird sounds classification by combining PNCC and robust Mel-log filter bank features (PNCC와 robust Mel-log filter bank 특징을 결합한 조류 울음소리 분류)

  • Badi, Alzahra;Ko, Kyungdeuk;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.38 no.1 / pp.39-46 / 2019
  • In this paper, combining features is proposed as a way to enhance the classification accuracy of sounds in noisy environments using a CNN (Convolutional Neural Network) structure. A robust log Mel-filter bank using a Wiener filter and PNCCs (Power Normalized Cepstral Coefficients) are extracted to form a 2-dimensional feature that is used as input to the CNN structure. An eBird database is used to classify 43 bird species in their natural environment. To evaluate the performance of the combined features in noisy environments, the database is augmented with 3 types of noise at 4 different SNRs (Signal to Noise Ratios): 20 dB, 10 dB, 5 dB, and 0 dB. The combined feature is compared to the log Mel-filter bank with and without the Wiener filter and to the PNCCs. The combined feature is shown to outperform the other features in clean environments, with a 1.34% increase in overall average accuracy. Additionally, the accuracy in noisy environments at the 4 SNR levels increases by 1.06% and 0.65% for shop and schoolyard noise backgrounds, respectively.
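
The abstract says the database is augmented with noise at fixed SNRs but does not describe the mixing procedure. The following is a minimal sketch of one common approach: scale the noise so that the ratio of signal power to noise power matches the target SNR, then add it to the clean signal. The names `mix_at_snr` and `snr_db`, and the synthetic tone and Gaussian noise, are assumptions for illustration and do not come from the paper.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB."""
    return 10.0 * math.log10(power(signal) / power(noise))


def mix_at_snr(signal, noise, target_db):
    """Scale noise to the target SNR relative to signal and add it."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    # Required noise power: P_signal / 10^(SNR/10).
    scale = math.sqrt(power(signal) / (power(noise) * 10.0 ** (target_db / 10.0)))
    return [s + scale * n for s, n in zip(signal, noise)]


# Synthetic stand-ins: a 1 kHz tone at 16 kHz and seeded Gaussian noise.
random.seed(0)
clean = [math.sin(2 * math.pi * 1000.0 * i / 16000) for i in range(16000)]
noise = [random.gauss(0.0, 1.0) for _ in range(16000)]

for target in (20, 10, 5, 0):  # the four SNR levels named in the abstract
    noisy = mix_at_snr(clean, noise, target)
    residual = [m - c for m, c in zip(noisy, clean)]
    print(target, round(snr_db(clean, residual), 3))
```

With recorded background noise, a segment of the same length as the clip would be cut from the noise recording before calling `mix_at_snr`.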

Acceleration signal-based haptic texture recognition according to characteristics of object surface material using conformer model (Conformer 모델을 이용한 물체 표면 재료의 특성에 따른 가속도 신호 기반 햅틱 질감 인식)

  • Hyoung-Gook Kim;Dong-Ki Jeong;Jin-Young Kim
    • The Journal of the Acoustical Society of Korea / v.42 no.3 / pp.214-220 / 2023
  • In this paper, we propose a method to improve texture recognition performance from haptic acceleration signals representing the texture characteristics of object surface materials by using a Conformer model that combines the advantages of a convolutional neural network and a transformer. In the proposed method, three-axis acceleration signals generated by impact sound and vibration while a person contacts the surface of the object material with a tool such as a stylus are combined into one-dimensional acceleration data, and the logarithmic Mel-spectrogram is extracted from the haptic acceleration signal in the same way as from an audio signal. Then, the Conformer is applied to the extracted logarithmic Mel-spectrogram to learn the main local and global frequency features for recognizing the texture of various object materials. Experiments on the Lehrstuhl für Medientechnik (LMT) haptic texture dataset, which consists of 60 materials, showed that the proposed method recognizes the texture of object surface materials more effectively than existing methods.

Relationships of Serum Leptin Levels with Bone Metabolism in the Childhood Obesity (소아 비만에서 Leptin과 골대사의 연관성)

  • Kim, Eun Young;Rho, Young il;Yang, Eun Seok;Moon, Kyung Rae;Park, Sang Kee;Park, Yeong Bong;Lee, Young Hwa
    • Pediatric Gastroenterology, Hepatology & Nutrition / v.9 no.2 / pp.226-232 / 2006
  • Purpose: The aim of this study was to evaluate the influence of leptin on biochemical markers of bone metabolism in childhood obesity. Methods: A total of 50 male children (25 obese and 25 controls) were recruited from the pediatric outpatient clinic at the Chosun University Hospital from November 1st 2005 to May 30th 2006. BMI, body fat percentage, serum leptin, bone-specific alkaline phosphatase (B-ALP), C-terminal propeptide of type 1 collagen (CICP), and total deoxypyridinoline crosslinks (total DPD) were measured. The correlations of leptin with BMI, body fat percentage, B-ALP, CICP, and total DPD were analyzed by Pearson's correlation. In a multiple stepwise regression analysis, leptin corrected for body weight was evaluated for correlation with biochemical markers of bone formation and resorption, respectively. Results: The leptin levels of the obese group were significantly higher than those of the control group (p=0.012). In the obese group, the leptin level was significantly positively correlated with the BMI (r=0.551, p=0.01) and the percentage of body fat (r=0.584, p=0.018). Among the bone markers in the obese group, B-ALP (r=-0.613, p=0.026) and CICP (r=-0.583, p=0.037) were negatively correlated with leptin. B-ALP (r=-0.728, p=0.007) and CICP (r=-0.684, p=0.014) were negatively correlated with leptin when corrected for body weight. In the control group, bone markers were not correlated with leptin. In the multiple stepwise regression analyses, there was a negative correlation between leptin and B-ALP (Y=-39.653X+356.341, p=0.026) and between leptin and CICP (Y=-13.437X+116.013, p=0.037) in the obese group. Conclusion: Leptin was a significant factor in bone formation but not in bone resorption in childhood obesity.
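
The abstract reports Pearson correlation coefficients and regression lines of the form Y = aX + b. The following is a minimal sketch of how a Pearson coefficient and a one-predictor least-squares line are computed. It does not reproduce the study's multiple stepwise regression, and the sample values are synthetic placeholders, not the study's measurements. `pearson_r` and `linear_fit` are illustrative names.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic placeholder data with a perfectly linear negative relation.
x = [1.0, 2.0, 3.0, 4.0]
y = [8.0, 6.0, 4.0, 2.0]
print(pearson_r(x, y), linear_fit(x, y))
```

A negative slope and a negative r, as in the reported B-ALP and CICP results, both indicate that the marker falls as leptin rises; the p-values in the abstract additionally require a significance test that this sketch omits.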

The Analysis of affection on electromagnetic wave for U-healthcare Remote Diagnosis System (최적의 U-헬스케어용 원격진료서비스 시스템에 대한 전자파적합성 분석)

  • Jeoung, Eui-Bung;Lee, You-Yub;Song, Je-Ho
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.11 / pp.5442-5446 / 2012
  • A u-healthcare remote diagnosis system is proposed so that patients with chronic diseases and medically vulnerable groups can check their health systematically, and to support an optimal environment for improving quality of life. The system measures thoracic sounds in the chest wirelessly. An experiment demonstrates that the system, which uses radio frequency, is not affected by electromagnetic waves, and confirms that the system does not affect doctors or patients.