• Title/Summary/Keyword: 음성레벨 (speech level)


Optimizing Wavelet in Noise Canceler by Deep Learning Based on DWT (DWT 기반 딥러닝 잡음소거기에서 웨이블릿 최적화)

  • Won-Seog Jeong;Haeng-Woo Lee
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.1
    • /
    • pp.113-118
    • /
    • 2024
  • In this paper, we propose an optimal wavelet for a system that cancels background noise in acoustic signals. The system performs the Discrete Wavelet Transform (DWT) instead of the conventional Short-Time Fourier Transform (STFT) and then improves noise cancellation performance through deep learning. The DWT functions as a multi-resolution band-pass filter bank: it obtains the transform parameters by time-shifting the mother wavelet at each level and using several scaled versions of it. The noise cancellation performance of several wavelets was tested to select the mother wavelet best suited to analyzing speech. To verify the performance of the noise cancellation system for various wavelets, a simulation program was written using the TensorFlow and Keras libraries, and simulation experiments were performed for the four most commonly used wavelets. In the experiments, the Haar and Daubechies wavelets showed the best noise cancellation performance, with a significantly lower mean square error (MSE) than the other wavelets.
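To make the multi-resolution idea concrete, the following is a minimal sketch of a Haar DWT in plain Python. It is not the authors' simulation program: the function names (`haar_dwt`, `haar_idwt`, `haar_wavedec`) are hypothetical, the deep learning stage is omitted, and only the Haar wavelet is shown because its filters are simple enough to write by hand.

```python
import math

SCALE = 1 / math.sqrt(2)  # orthonormal Haar filter coefficient


def haar_dwt(signal):
    """One Haar DWT level: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) * SCALE for a, b in pairs]  # low-pass branch
    detail = [(a - b) * SCALE for a, b in pairs]  # high-pass branch
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the signal from one level of coefficients."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SCALE, (a - d) * SCALE])
    return out


def haar_wavedec(signal, levels):
    """Multi-level decomposition: re-apply the DWT to the approximation."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)  # details[0] is the finest scale
    return approx, details


if __name__ == "__main__":
    x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
    a, d = haar_dwt(x)
    print(haar_idwt(a, d))  # reconstructs x up to rounding error
```

Each level halves the number of approximation coefficients, so the detail lists cover progressively coarser scales. A denoising network in the spirit of the abstract would operate on these coefficients before the inverse transform.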

A study on the characteristics and pathogenicity of Aeromonas veronii isolated from infected goldfish (Carassius auratus) (피부 궤양이 발생한 금붕어(Carassius auratus)에서 분리한 Aeromonas veronii의 특성 및 병원성 분석)

  • Hyeon Ki Jung;Min Su Kim;Sok Ho Kim;Min Soon Choi
    • Journal of fish pathology
    • /
    • v.37 no.1
    • /
    • pp.79-88
    • /
    • 2024
  • Aeromonas spp. infections have been reported to cause significant economic losses not only in the ornamental fish industry but also in aquaculture. Between December 2022 and January 2023, an Aeromonas infection occurred in goldfish in Korea, and a gram-negative bacterium was isolated from the skin and internal organs of infected goldfish (Carassius auratus). The isolate was identified as Aeromonas veronii using 16S rDNA-targeted oligonucleotide primers, and its characteristics were further confirmed by enterotoxin gene analysis, an infection experiment, and antibiotic resistance testing. In the in vivo pathogenicity test, the isolate caused 100% mortality in challenged goldfish within one week of experimental injection. PCR analysis targeting three enterotoxin-encoding genes identified the cytotoxic enterotoxin gene (act) in the A. veronii isolate. The antimicrobial susceptibility pattern showed that the isolate was susceptible to most antimicrobial agents tested but resistant to ampicillin, imipenem, meropenem, and clindamycin.

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.71-88
    • /
    • 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character based on sequential input data. N-gram models have been widely used, but they cannot model the correlation between input units efficiently because they are probabilistic models based on the frequency of each unit in the training set. Recently, with the development of deep learning algorithms, recurrent neural network (RNN) and long short-term memory (LSTM) models have been widely used for neural language modeling (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependencies between objects that are entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts need to be decomposed into words or morphemes. However, because a training set of sentences generally includes a huge number of words or morphemes, the dictionary becomes very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, for morphologically rich languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to make errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that makes up Korean text. We construct the language model using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. The simulation study was conducted on Old Testament texts using the deep learning package Keras based on Theano. 
After pre-processing the texts, the dataset included 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed input vectors of 20 consecutive characters, each paired with the following 21st character as the output. In total, 1,023,411 input-output pairs were included in the dataset, and we divided them into training, validation, and test sets in the proportion 70:15:15. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss evaluated on the validation set, the perplexity evaluated on the test set, and the training time of each model. All the optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, which were clearly superior to those of the stochastic gradient algorithm. The stochastic gradient algorithm also took the longest to train for both the 3- and 4-layer LSTM models. On average, the 4-layer LSTM model took 69% longer to train than the 3-layer LSTM model, yet its validation loss and perplexity were not significantly better and were even worse under some conditions. On the other hand, when comparing the automatically generated sentences, the 4-layer LSTM model tended to generate sentences closer to natural language than the 3-layer LSTM model. Although there were slight differences in the completeness of the generated sentences between the models, sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost grammatically perfect. The results of this study are expected to be widely used for Korean language processing and speech recognition, which form the basis of artificial intelligence systems.
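The data preparation described in this abstract (phoneme-level symbols, 20-symbol inputs with the 21st symbol as the target, a 70:15:15 split, and perplexity as the test metric) can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the helper names are hypothetical, Unicode NFD normalization is assumed as one possible way to split Hangul syllables into consonant and vowel letters, and the split is taken in order without shuffling. The LSTM itself is omitted.

```python
import math
import unicodedata


def to_phonemes(text):
    """Split Hangul syllables into jamo (assumption: Unicode NFD decomposition)."""
    return list(unicodedata.normalize("NFD", text))


def make_windows(symbols, window=20):
    """Pair every run of `window` symbols with the symbol that follows it."""
    return [
        (symbols[i:i + window], symbols[i + window])
        for i in range(len(symbols) - window)
    ]


def split_70_15_15(pairs):
    """Ordered 70:15:15 split into training, validation, and test sets."""
    n = len(pairs)
    n_train = n * 70 // 100  # integer arithmetic avoids float rounding
    n_val = n * 15 // 100
    return (
        pairs[:n_train],
        pairs[n_train:n_train + n_val],
        pairs[n_train + n_val:],
    )


def perplexity(true_symbol_probs):
    """exp of the mean negative log-probability assigned to the true symbols."""
    nll = -sum(math.log(p) for p in true_symbol_probs) / len(true_symbol_probs)
    return math.exp(nll)


if __name__ == "__main__":
    symbols = to_phonemes("한글 " * 20)
    train, val, test = split_70_15_15(make_windows(symbols))
    print(len(symbols), len(train), len(val), len(test))
```

A model that assigns probability 1/4 to every true symbol has a perplexity of 4, which is why lower perplexity indicates a sharper next-symbol prediction.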

Perceptive evaluation of Korean native speakers on the polysemic sentence final ending produced by Chinese Korean learners (KFL중국인학습자들의 한국어 동형다의 종결어미 발화문에 대한 원어민화자의 지각 평가 양상)

  • Yune, Youngsook
    • Phonetics and Speech Sciences
    • /
    • v.12 no.4
    • /
    • pp.27-36
    • /
    • 2020
  • The aim of this study is to investigate the perceptual aspects of the polysemic sentence-final ending "-(eu)lgeol" produced by Chinese learners of Korean. "-(Eu)lgeol" has two different meanings, a guess and a regret, and these meanings are expressed by different prosodic features of the last syllable of "-(eu)lgeol". To examine how native Korean speakers perceive "-(eu)lgeol" sentences produced by Chinese learners of Korean, and which prosodic variable is most salient for the semantic discrimination of "-(eu)lgeol" at the perceptual level, we performed a perceptual experiment. The analysed material consisted of four Korean sentences containing "-(eu)lgeol", of which two expressed guesses and the other two expressed regret. Twenty-five native Korean speakers participated in the perceptual experiment. Participants were asked to mark whether the "-(eu)lgeol" sentences they listened to were (1) definitely regrets, (2) probably regrets, (3) ambiguous, (4) probably guesses, or (5) definitely guesses, based on the prosodic features of the last syllable of "-(eu)lgeol". The analysed prosodic variables were sentence boundary tones, slopes of boundary tones, pitch difference between the sentence-final and penultimate syllables, and pitch levels of boundary tones. The results show that all the analysed prosodic variables are significantly correlated with the semantic discrimination of "-(eu)lgeol" and that, among these variables, the pitch difference between the sentence-final and penultimate syllables plays the most salient role.

Evaluation of a signal segregation by FDBM (FDBM의 음원분리 성능평가)

  • Lee, Chai-Bong
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.8 no.12
    • /
    • pp.1793-1802
    • /
    • 2013
  • Various approaches for sound source segregation have been proposed. Among these, the frequency domain binaural model (FDBM) has the advantages of low computational load and effective howling cancellation. A binaural hearing assistance system based on FDBM has been proposed; this system can enhance the desired signal based on directivity information. Although FDBM has been evaluated in terms of signal-to-noise ratio (SNR) and the coherence function, the evaluation results do not always agree with human impressions. These evaluation methods provide physical measures and do not take human perception into account. Considering that a binaural hearing assistance system is one of the major applications, the quality of the segregated sound should be kept sufficiently high. In this paper, the signal segregation performance of FDBM is evaluated by three objective methods, namely SNR, coherence, and Perceptual Evaluation of Speech Quality (PESQ), to discuss the characteristics of FDBM in sound source segregation. The simulation results show that FDBM improves the quality of the left and right channel signals to an equivalent level. The results also suggest that PESQ may provide a more useful measure than SNR and coherence for the segregation performance of FDBM. The PESQ results show the effects of the segregation parameters and indicate appropriate parameters under the given conditions.
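Of the three measures named above, SNR is the simplest to state in code. The sketch below assumes a time-aligned clean reference and treats the difference between the segregated output and that reference as noise; the function name `snr_db` is hypothetical and the paper's exact SNR definition is not given in the abstract. PESQ is a standardized perceptual algorithm (ITU-T P.862) and is not reproduced here.

```python
import math


def snr_db(reference, processed):
    """SNR in dB, treating (processed - reference) as the residual noise."""
    if len(reference) != len(processed):
        raise ValueError("signals must have the same length")
    signal_power = sum(r * r for r in reference)
    noise_power = sum((p - r) ** 2 for r, p in zip(reference, processed))
    if noise_power == 0:
        return math.inf  # identical signals: no residual noise
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]
    segregated = [1.1, -0.9, 1.1, -0.9]
    print(round(snr_db(clean, segregated), 2))
```

Because this measure weights all sample errors equally, two outputs with the same SNR can sound quite different, which is consistent with the abstract's motivation for adding a perceptual measure.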

A Statistical Prediction Model of Speakers' Intentions in a Goal-Oriented Dialogue (목적지향 대화에서 화자 의도의 통계적 예측 모델)

  • Kim, Dong-Hyun;Kim, Hark-Soo;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications
    • /
    • v.35 no.9
    • /
    • pp.554-561
    • /
    • 2008
  • A technique for predicting the user's intention can be used as a post-processing method for reducing the search space of an automatic speech recognizer. A technique for predicting the system's intention can be used as a pre-processing method for generating flexible sentences. To satisfy these practical needs, we propose a statistical model that predicts speakers' intentions, generalized into pairs of a speech act and a concept sequence. In contrast to the previous model, which uses simple n-gram statistics of speech acts, the proposed model represents the dialogue history of the current utterance as a feature set spanning various linguistic levels (i.e., n-grams of speech act and concept sequence pairs, clue words, and state information of a domain frame). The proposed model then predicts the intention of the next utterance by using the feature set as input to conditional random fields (CRFs). In an experiment in a schedule management domain, the proposed model showed a precision of 76.25% in predicting the user's speech act and 64.21% in predicting the user's concept sequence. It also showed a precision of 88.11% in predicting the system's speech act and 87.19% in predicting the system's concept sequence. In addition, the proposed model showed a 29.32% higher average precision than the previous model.

A Study on Contact Center Evaluation Model Using AHP and Content Analysis (AHP와 내용분석을 이용한 컨택센터 평가 모델 연구)

  • Ryu, Ki-Dong;Kim, Woo-Je
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.5
    • /
    • pp.106-116
    • /
    • 2018
  • Recently, the contact center has become increasingly important as the customer contact point for business-to-consumer (B2C) operations. In particular, an Internet Protocol (IP)-based contact center is built on a complex information system in order to accommodate various customer channels in addition to the telephone and to respond in real time. Until now, however, evaluations of contact centers have focused on customer service-based research on inbound contact centers. These evaluations measured contact center performance with a focus on indicators that have traditionally influenced customer satisfaction, such as response rates and service levels. There is insufficient research on the service characteristics that a contact center should have and on evaluation models for its information systems. The role of information systems is becoming more important as contact centers move from TDM-driven digital phone systems to IP-based systems that accommodate a variety of digital channels other than voice calls. In particular, as offline branches decrease with the spread of the Internet and mobile phones, non-face-to-face customer response has become important, and the contact center has come to influence the enterprise. Therefore, we developed an evaluation model that covers not only customer service but also information system and business aspects using the analytic hierarchy process (AHP), and verified the model through empirical cases. In particular, content analysis was used to ensure the objectivity of the AHP evaluation items.
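As a rough illustration of how AHP turns pairwise judgments into criterion weights, the sketch below uses the row geometric-mean approximation and Saaty's consistency ratio. The three criteria mirror the aspects named in the abstract, but the comparison values are invented for illustration and the function names are hypothetical; the paper's actual hierarchy and judgments are not reproduced.

```python
import math

# Saaty's random consistency index by matrix size (commonly tabulated values).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Priority weights from a pairwise comparison matrix (row geometric means)."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """CR = CI / RI; values below about 0.1 are usually considered acceptable."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    lambda_max = sum(
        sum(matrix[i][j] * weights[j] for j in range(n)) / weights[i]
        for i in range(n)
    ) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


if __name__ == "__main__":
    # Hypothetical judgments: customer service vs. information system vs. business.
    judgments = [
        [1.0, 2.0, 6.0],
        [1 / 2, 1.0, 3.0],
        [1 / 6, 1 / 3, 1.0],
    ]
    w = ahp_weights(judgments)
    print([round(x, 3) for x in w], round(consistency_ratio(judgments, w), 3))
```

The example matrix is perfectly consistent, so its consistency ratio is essentially zero; real expert judgments usually are not, which is where the ratio becomes informative.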

QoS Guarantee for Service Classes based on Performance Analysis of Cross-Layer Retransmission Scheme (다 계층 재전송 방식 성능 분석을 통한 서비스별 QoS 보장 기법)

  • Go, Kwang-Chun;Lee, Hyun-Jin;Kim, Jae-Hyun;Choo, Sang-Min
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.35 no.2A
    • /
    • pp.95-104
    • /
    • 2010
  • In wireless communication systems, a variety of retransmission algorithms are used to improve users' quality of service (QoS). However, the system may be inefficient because the retransmission algorithms operate independently of other layers, and QoS can be degraded by unnecessary packet retransmissions. To solve these problems, cross-layer retransmission schemes have been widely studied. However, before cross-layer retransmission schemes can be applied to a wireless communication system, it must be verified that their performance meets the QoS requirements of each service class. This paper therefore proposes a mathematical model for analyzing the performance of cross-layer retransmission schemes and derives both the suitable retransmission scheme and the optimal retransmission parameter for each service class. The proposed model selects the MCS level based on channel state information, and the performance analysis is comparatively easy when HARQ, ARQ, and AMC schemes are combined. The model also enables analysis of the packet transmission delay. To demonstrate the analytical model, this paper derives the suitable retransmission scheme and the optimal retransmission parameter for delay-sensitive services in a WiMAX system. The proposed analytical model can also be used to analyze the performance of other wireless communication systems such as LTE and WLAN.
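The trade-off behind choosing a retransmission parameter per service class can be illustrated with a much simpler model than the one the paper proposes. The sketch below assumes a single-layer truncated ARQ with independent transmission attempts and a fixed packet error rate; the function names, the `slot_delay` parameter, and the example numbers are all hypothetical, and HARQ combining and AMC are not modeled.

```python
def truncated_arq(per, max_tx, slot_delay=1.0):
    """Residual loss, mean transmissions, and mean delay for truncated ARQ.

    per: packet error rate of one attempt (assumed independent across attempts)
    max_tx: maximum number of transmissions, including the first one
    slot_delay: assumed time cost of one transmission attempt
    """
    if not 0 <= per < 1:
        raise ValueError("per must be in [0, 1)")
    if max_tx < 1:
        raise ValueError("max_tx must be at least 1")
    residual_loss = per ** max_tx  # every attempt failed
    # Attempt k+1 happens only if the first k attempts failed: sum of per**k.
    expected_tx = (1 - per ** max_tx) / (1 - per)
    return residual_loss, expected_tx, expected_tx * slot_delay


def min_tx_for_loss(per, target_loss):
    """Smallest retransmission limit whose residual loss meets the target."""
    if not 0 <= per < 1 or target_loss <= 0:
        raise ValueError("need 0 <= per < 1 and target_loss > 0")
    max_tx = 1
    while per ** max_tx > target_loss:
        max_tx += 1
    return max_tx


if __name__ == "__main__":
    print(truncated_arq(per=0.5, max_tx=3))
    print(min_tx_for_loss(per=0.5, target_loss=0.2))
```

Raising the limit lowers residual loss but raises mean delay, so a delay-sensitive class would typically accept a smaller limit than a loss-sensitive one under this simplified model.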