• Title/Summary/Keyword: Speech Synthesis

Real data-based active sonar signal synthesis method (실데이터 기반 능동 소나 신호 합성 방법론)

  • Yunsu Kim;Juho Kim;Jongwon Seok;Jungpyo Hong
    • The Journal of the Acoustical Society of Korea / v.43 no.1 / pp.9-18 / 2024
  • Active sonar systems are becoming more important as underwater targets grow quieter and ambient noise rises with increasing maritime traffic. However, the low signal-to-noise ratio of the echo signal, caused by multipath propagation, clutter, ambient noise, and reverberation, makes it difficult to identify underwater targets with active sonar. Attempts have been made to apply data-driven methods such as machine learning or deep learning to improve underwater target recognition, but it is difficult to collect enough training data due to the nature of sonar datasets. Methods based on mathematical modeling have mainly been used to compensate for insufficient active sonar data, but they have limitations in accurately simulating complex underwater phenomena. Therefore, in this paper, we propose a sonar signal synthesis method based on a deep neural network. To apply the neural network model to sonar signal synthesis, the proposed method adapts the attention-based encoder and decoder, the main modules of the Tacotron model widely used in speech synthesis, to sonar signals. Training the proposed model on a dataset collected by placing a simulated target in an actual marine environment makes it possible to synthesize signals that are more similar to actual signals. To verify the performance of the proposed method, a Perceptual Evaluation of Audio Quality test was conducted, and the score difference from the actual signal was within -2.3 across a total of four different environments. These results show that the active sonar signal generated by the proposed method approximates the actual signal.

An Implementation of Acoustic Echo Canceller Using Adaptive Filtering in Modulated Lapped Transform Domain (Modulated Lapped Transform 영역에서 적응 필터링을 이용한 음향 반향 제거기의 구현)

  • 백수진;박규식
    • The Journal of the Acoustical Society of Korea / v.22 no.6 / pp.425-433 / 2003
  • An acoustic echo canceller (AEC) is a signal processing system for removing unwanted echo signals in teleconferencing and hands-free communication. The least mean square (LMS) algorithm is one of the adaptive echo cancellation algorithms and has been the most attractive because of its simplicity and robustness. However, the convergence properties of the LMS algorithm degrade with highly correlated input signals such as speech. For this reason, transform-domain adaptive filtering was introduced: the colored input samples are first decorrelated with an orthogonal transform matrix such as the DCT or DFT, and LMS adaptive filtering is then applied. In this paper, we propose an MLT-domain adaptive echo canceller based on the MLT (Modulated Lapped Transform) orthogonal transform matrix. The proposed algorithm achieves high decorrelation efficiency and fast convergence by using a modulated lapped transform of size 2N×N instead of an N×N unitary transform such as the DCT, DFT, or Hadamard transform, and it is applied to an acoustic echo cancellation system. In computer simulations with both synthetic and real speech, the proposed MLT-domain adaptive echo canceller shows approximately twice the convergence speed and a 20 to 30 dB ERLE improvement over the DCT frequency-domain acoustic echo cancellation system.
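
A minimal sketch of the baseline that the abstract above starts from: a plain time-domain LMS echo canceller, not the proposed MLT-domain method. The function name `lms_echo_canceller`, the four-tap echo path, and the step size are illustrative assumptions. The far-end signal here is white noise, which is the easy case for LMS; correlated input such as speech would converge more slowly, which is the motivation for transform-domain variants.

```python
import random

def lms_echo_canceller(far_end, mic, num_taps=4, step=0.05):
    """Adapt FIR weights so the filtered far-end signal matches the echo in mic."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps          # most recent far-end samples, newest first
    errors = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate           # residual echo after cancellation
        # LMS update: move each weight along the instantaneous gradient estimate.
        weights = [w + step * e * h for w, h in zip(weights, history)]
        errors.append(e)
    return weights, errors

random.seed(0)
echo_path = [0.5, -0.3, 0.2, 0.1]       # hypothetical room impulse response
far_end = [random.uniform(-1.0, 1.0) for _ in range(5000)]
# Microphone signal: far-end signal convolved with the echo path (no near-end talker).
mic = [
    sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]
w, errors = lms_echo_canceller(far_end, mic)
```

After adaptation, `w` should be close to `echo_path` and the tail of `errors` close to zero. A transform-domain version would apply an orthogonal transform to `history` and normalize each bin's step size before the same update.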

Automatic Generation of Fashion Image Dataset by Using Progressive Growing GAN (PG-GAN을 이용한 패션이미지 데이터 자동 생성)

  • Kim, Yanghee;Lee, Chanhee;Whang, Taesun;Kim, Gyeongmin;Lim, Heuiseok
    • Journal of Internet of Things and Convergence / v.4 no.2 / pp.1-6 / 2018
  • Techniques for generating new sample data from high-dimensional data such as images have been used in various ways for speech synthesis, image conversion, and image restoration. This paper adopts Progressive Growing of Generative Adversarial Networks (PG-GANs) as the implementation model to generate high-resolution images and to enhance the variation of the generated images, and applies it to fashion image data. PG-GANs let the generator and discriminator learn progressively at the same time, continuously adding new layers, starting from low-resolution images, to produce high-resolution images. We also propose a mini-batch discrimination method to increase the diversity of the generated data, and a Sliced Wasserstein Distance (SWD) evaluation method, instead of the existing MS-SSIM, to evaluate the GAN model.

Design of Programmable SC Filter (프로그램 가능한 SC Filter의 설계)

  • 이병수;이종악
    • The Journal of Korean Institute of Communications and Information Sciences / v.11 no.3 / pp.172-178 / 1986
  • The recent interest in the design of switched-capacitor (SC) filters is motivated by the fact that such filters can be fully integrated using standard metal-oxide-semiconductor processing technology. This is achieved by replacing all the resistors in the active RC filter network with switched capacitors. The voltage gain of an SC filter depends only on capacitance ratios, and these ratios can be obtained and maintained with high accuracy. Therefore, a switched capacitor is known to be much better than a resistor in its temperature and linearity characteristics. This paper proposes a programmable SC filter and shows that $\omega_0$, Q, and G of this circuit can be controlled by a digital signal. Experiments show that the SC filter retains low sensitivities but cannot avoid a small influence from parasitic capacitance. Because the transfer characteristic of the SC filter varies with the sampling frequency and the resistor array, the SC filtering technique can be applied to digital processing, speech analysis and synthesis, and so on.
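
The dependence on capacitance ratios and sampling frequency described in the abstract above can be illustrated with the textbook switched-capacitor approximation R_eq = 1 / (f_clk · C), which holds when the clock is much faster than the signal. This is a sketch under that assumption, not the paper's circuit; the function names and component values are hypothetical.

```python
import math

def sc_equivalent_resistance(f_clk_hz, c_switched_f):
    """Average resistance emulated by a capacitor switched at f_clk (textbook approximation)."""
    return 1.0 / (f_clk_hz * c_switched_f)

def sc_corner_frequency_hz(f_clk_hz, c_switched_f, c_integrating_f):
    """Corner frequency of a first-order RC section whose resistor is a switched capacitor."""
    r_eq = sc_equivalent_resistance(f_clk_hz, c_switched_f)
    # Algebraically this reduces to f_clk * (C_switched / C_integrating) / (2 * pi):
    # only the capacitor ratio and the clock rate matter.
    return 1.0 / (2.0 * math.pi * r_eq * c_integrating_f)

# Illustrative values: 100 kHz clock, 1 pF switched capacitor, 10 pF integrating capacitor.
f0 = sc_corner_frequency_hz(100e3, 1e-12, 10e-12)
f0_double_clock = sc_corner_frequency_hz(200e3, 1e-12, 10e-12)
f0_scaled_caps = sc_corner_frequency_hz(100e3, 2e-12, 20e-12)
```

Doubling the clock doubles the corner frequency, while scaling both capacitors by the same factor leaves it unchanged, which is why such a filter can be tuned digitally through the clock or a switched component array.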

Sinusoidal Modeling of Polyphonic Audio Signals Using Dynamic Segmentation Method (동적 세그멘테이션을 이용한 폴리포닉 오디오 신호의 정현파 모델링)

  • 장호근;박주성
    • The Journal of the Acoustical Society of Korea / v.19 no.4 / pp.58-68 / 2000
  • This paper proposes a sinusoidal modeling method for polyphonic audio signals. Sinusoidal modeling, which has been applied successfully to speech and monophonic signals, cannot be applied directly to polyphonic signals because a single window size for sinusoidal analysis cannot be determined for the entire signal. In addition, for a high-quality synthesized signal, transient parts such as attacks, which determine the timbre of a musical instrument, should be preserved. In this paper, a multiresolution filter bank is designed that splits the input signal into six octave-spaced subbands without aliasing, and sinusoidal modeling is applied to each subband signal. To alleviate the smearing of transients in sinusoidal modeling, a dynamic segmentation method is applied to the subbands; it determines the analysis-synthesis frame size adaptively to fit the time-frequency characteristics of each subband signal. An improved dynamic segmentation method is also proposed that handles transients better and reduces computation. Simulation results for various polyphonic audio signals show that the suggested sinusoidal modeling can model polyphonic audio signals without loss of perceptual quality.

Development of Voice Activity Detection Algorithm for Elderly Voice based on the Higher Order Differential Energy Operator (고차 미분에너지 기반 노인 음성에서의 음성 구간 검출 알고리즘 연구)

  • Lee, JiYeoun
    • Journal of Digital Convergence / v.14 no.11 / pp.249-255 / 2016
  • Since elderly voices include a lot of noise caused by physiological changes in respiration, phonation, and resonance, the performance of convergence health-care equipment such as speech recognition, synthesis, and analysis programs deteriorates when operated with elderly voices. Therefore, research is needed to enable health-care instruments to operate with elderly voices. In this study, a voice activity detection method using a symmetric higher-order differential energy operator (SHODEO) was developed and compared with the auto-correlation function (ACF) and the average magnitude difference function (AMDF). It was confirmed to perform better than the other methods in voice interval detection. The voice activity detection method will be applied to a voice interface for the elderly to improve the accessibility of smart devices.
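
A minimal sketch of the two baseline measures named in the abstract above, the ACF and the AMDF, applied to one synthetic periodic frame. It does not implement SHODEO. The function names, the 40-sample period, and the lag search range are illustrative assumptions.

```python
import math

def acf(frame, lag):
    """Unnormalized auto-correlation of the frame at the given lag."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

def amdf(frame, lag):
    """Average magnitude difference between the frame and its lagged copy."""
    n_terms = len(frame) - lag
    return sum(abs(frame[n] - frame[n + lag]) for n in range(n_terms)) / n_terms

# Synthetic voiced-like frame: a sinusoid with a period of 40 samples.
period = 40
frame = [math.sin(2.0 * math.pi * n / period) for n in range(400)]

# Search a lag range that contains one period only, to avoid picking a multiple.
lags = range(20, 70)
best_acf_lag = max(lags, key=lambda k: acf(frame, k))    # ACF peaks at the period
best_amdf_lag = min(lags, key=lambda k: amdf(frame, k))  # AMDF dips at the period
```

A strong ACF peak or a deep AMDF valley at some lag suggests a periodic (voiced) frame; a detector built on these measures would compare that peak or valley against a threshold, frame by frame.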

Optimum Pattern Synthesis for a Microphone Array (마이크로폰 어레이를 위한 최적 패턴 형성)

  • Chang, Byoung-Kun;Kwon, Tae-Neung;Byun, Youn-Shik
    • The Journal of the Acoustical Society of Korea / v.16 no.1 / pp.47-53 / 1997
  • This paper concerns an efficient approach to forming the beam pattern of a microphone array to deal with broadband signals such as speech in a teleconference. A numerical method is proposed to find the updated locations of sidelobes in order to equalize the sidelobes via perturbation of array parameters such as array weights or microphone spacing. The microphone array is thus optimized in a Dolph-Chebyshev sense such that directional or background noise incident within the array's visible range is eliminated efficiently. It is shown that perturbing the microphone spacing yields an optimum pattern more appropriate for broadband signals than perturbing the array weights. Also, a novel method is proposed to find a beam pattern that is robust with respect to sidelobes in a scanning situation. Computer simulation results are presented.
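
A minimal sketch of the quantity that the sidelobe equalization in the abstract above acts on: the beam pattern of a linear array and its peak sidelobe level. It only evaluates a uniformly weighted, half-wavelength-spaced array and does not implement the paper's perturbation method. The function names, the 8-element array, and the grid resolution are illustrative assumptions.

```python
import cmath
import math

def array_factor_db(weights, u, spacing_wavelengths=0.5):
    """Normalized far-field response in dB of a linear array at u = sin(theta)."""
    phase_step = 2.0 * math.pi * spacing_wavelengths * u
    response = sum(w * cmath.exp(1j * phase_step * n) for n, w in enumerate(weights))
    magnitude = abs(response) / sum(weights)
    return 20.0 * math.log10(max(magnitude, 1e-12))  # floor avoids log(0) at nulls

def peak_sidelobe_db(weights, first_null_u, steps=2000):
    """Largest response between the first null and endfire (u = 1)."""
    grid = [first_null_u + (1.0 - first_null_u) * i / steps for i in range(steps + 1)]
    return max(array_factor_db(weights, u) for u in grid)

num_mics = 8
uniform = [1.0] * num_mics
# For uniform weights at half-wavelength spacing, the first null is at u = 2 / N.
psl = peak_sidelobe_db(uniform, first_null_u=2.0 / num_mics)
mainlobe = array_factor_db(uniform, 0.0)
```

With uniform weights the first sidelobe sits at roughly -13 dB and the outer sidelobes fall off. A Dolph-Chebyshev-style design instead makes all sidelobes equal at a chosen level, which is the goal the abstract pursues by perturbing weights or spacing.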

Physiologic Phonetics for Korean Stop Production (한국어 자음생성의 생리음성학적 특성)

  • Hong, Ki-Hwan;Yang, Yoon-Soo
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.17 no.2 / pp.89-97 / 2006
  • The stop consonants in Korean are classified into three types according to the manner of articulation: unaspirated (UA), slightly aspirated (SA), and heavily aspirated (HA) stops. Both the UA and the HA types are always voiceless in any environment. Generally, the voice onset time (VOT) can be measured spectrographically from the release of the consonant burst to the onset of the following vowel. The VOT of the UA type is within 20 msec of the burst, about 40-50 msec in the SA type, and 50-70 msec in the HA type. There have been many efforts to clarify the properties that differentiate these manner categories. Umeda et al.$^{1)}$ found that the fundamental frequency at voice onset after both the UA and HA consonants was higher than that after the SA consonants, and that the voice onset times were longest in the HA type, followed by the SA and UA types. Han et al.$^{2)}$ reported in their speech synthesis and perception studies that the SA and UA stops differed primarily in terms of a gradual versus a relatively rapid intensity build-up of the following vowel after the stop release. Lee et al.$^{3)}$ measured both the intraoral and subglottal air pressure and found that the subglottal pressure was higher for the HA stop than for the other two stops. They also compared the dynamic pattern of the subglottal pressure slope for the three categories and found that the HA stop showed the most rapid increase in subglottal pressure in the time period immediately before the stop release. Kagaya$^{4)}$ reported fiberscopic and acoustic studies of the Korean stops. He mentioned that the UA type may be characterized by a completely adducted state of the vocal folds, stiffened vocal folds, and an abrupt decrease in stiffness near the voice onset, while the HA type may be characterized by an extensively abducted state of the vocal folds and heightened subglottal pressure. On the other hand, none of these positive gestures are observed for the SA type.
Hong et al.$^{5)}$ studied the electromyographic activity of the thyroarytenoid and posterior cricoarytenoid (PCA) muscles during stop production. They reported a marked and early activation of the PCA muscle associated with a steep reactivation of the thyroarytenoid muscle before voice onset in the production of the HA consonants. For the production of the UA consonants, little or no activation of the PCA muscle and the earliest and most marked reactivation of the thyroarytenoid muscle were characteristic. For the SA consonants, they reported a more moderate activation of the PCA muscle than for the UA consonants, and the least and latest reactivation of the thyroarytenoid muscle. Hong et al.$^{6)}$ observed the vibratory movements of the vocal fold edges in terms of laryngeal gestures for the different types of stop consonants. The movements of the vocal fold edges were evaluated using high-speed digital images. EGG signals and acoustic waveforms were also evaluated and related to the vibratory movements of the vocal fold edges during stop production.

Cyclosporin A-induced Gingival Overgrowth is Closely Associated with Regulation Collagen Synthesis by the Beta Subunit of Prolyl 4-hydroxylase and Collagen Degradation by Testican 1-mediated Matrix Metalloproteinase-2 Expression

  • Park, Seong-Hee;Kim, Jae-Yoen;Kim, Hyun-Jeong;Park, Kwang-Kyun;Cho, Kyoo-Sung;Choi, Seong-Ho;Chung, Won-Yoon
    • International Journal of Oral Biology / v.33 no.4 / pp.205-211 / 2008
  • Gingival overgrowth can cause dental occlusion and seriously interfere with mastication, speech, and dental hygiene. It is observed in 25 to 81% of renal transplant patients treated with cyclosporine A (CsA). CsA-induced gingival overgrowth (CIGO) is caused by quantitative alteration of the extracellular matrix components, particularly collagen. However, the molecular mechanisms involved in the pathogenesis of CIGO remain poorly understood, despite intense clinical and laboratory investigations. The aim of the present work is to identify differentially expressed genes closely associated with CIGO. Human gingival fibroblasts were isolated by primary explant culture of gingival tissues from five healthy subjects (HGFs) and two patients with the CIGO (CIGO-HGFs). The proliferative activity of CsA-treated HGFs and CIGO-HGFs was examined using the MTT assay. The identification of differentially expressed genes in CsA-treated CIGO-HGF was performed by differential display reverse transcriptase-polymerase chain reaction (RT-PCR) followed by DNA sequencing. CsA significantly increased the proliferation of two HGFs and two CIGO-HGFs, whereas three HGFs were not affected. Seven genes, including the beta subunit of prolyl 4-hydroxylase (P4HB) and testican 1, were upregulated by CsA in a highly proliferative CIGO-HGF. The increased P4HB and testican-1 mRNA levels were confirmed in CsA-treated CIGO-HGFs by semiquantitative RT-PCR. Furthermore, CsA increased type I collagen mRNA levels and suppressed MMP-2 mRNA levels, which are regulated by P4HB and testican-1, respectively. These results suggest that CsA may induce gingival overgrowth through the upregulation of P4HB and testican-1, resulting in the accumulation of extracellular matrix components.

A Research Review of High-technology AAC Intervention for Individuals with Disabilities (장애인을 위한 하이-테크놀로지 보완·대체의사소통체계 실험 연구 동향 분석)

  • Song, Jaeok;Jeon, Byung-un
    • 재활복지 / v.20 no.4 / pp.203-228 / 2016
  • The purpose of this study was to identify recent trends in high-tech AAC intervention studies for individuals with disabilities. Electronic database searches were completed to identify studies published between 2009 and 2016, and 46 studies were identified for inclusion in this review. The studies were classified by participants, research design, intervention settings, independent variables, dependent variables, communication skills by high-tech device, and type of high-tech AAC device. Across these studies, intervention was provided to a total of 126 participants. Most participants were aged 6-11, and the most common diagnosis was autism spectrum disorder. The most common study designs were the multiple probe design and the multiple treatment design. The majority of studies implemented interventions in a special education school (classroom) setting, and the majority compared the effects of high-tech and low-tech AAC device interventions. The majority of targeted behavioral outcomes were communication skills. The tablet PC was the device most frequently used for intervention in both domestic and foreign studies. The most common software was 'My talky' in domestic studies and 'Proloquo2Go' in foreign studies. The synthesis of evidence describing the views of users and providers and the implementation of high-tech AAC devices can provide valuable data to inform intervention studies and functional outcome measures. Suggestions for future research are discussed.