• Title/Summary/Keyword: speech process


Analysis and Interpretation of Intonation Contours of Slovene

  • Ales Dobnikar
    • Proceedings of the KSPS conference / 1996.10a / pp.542-547 / 1996
  • Prosodic characteristics of natural speech, especially intonation, often convey the specific feelings of the speaker at the time of the utterance, with relatively large variation in speaking style over the same text. We analyzed a speech corpus recorded with ten Slovene speakers. The observed intonation contours were interpreted for the purpose of modeling the intonation contour in the synthesis process. Based on the results of this analysis, we devised a scheme for modeling the intonation contour for different types of intonation units. The scheme uses a superpositional approach, which defines the intonation contour as the sum of a global component (over the intonation unit) and local components (at accented syllables or syntactic boundaries). A near-to-natural intonation contour was obtained by rule, using only the text of the utterance as input.

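The superpositional scheme described above can be sketched as the sum of a global declination line and local accent components. The specific shapes below (linear declination, Gaussian accent bumps, a fixed accent width) are illustrative assumptions, not the paper's actual rule inventory:

```python
import math

def intonation_contour(n_frames, accents, base=120.0, decl=-20.0):
    """Superpositional F0 model: the contour is the sum of a global
    declination component over the intonation unit and local rise-fall
    components centred on accented syllables.
    `accents` is a list of (frame_index, peak_hz) pairs -- hypothetical
    parameters for illustration only."""
    contour = []
    for t in range(n_frames):
        # global component: linear declination across the unit
        f0 = base + decl * t / max(n_frames - 1, 1)
        # local components: Gaussian-shaped accent bumps (width 5 frames)
        for centre, peak in accents:
            f0 += peak * math.exp(-((t - centre) ** 2) / (2 * 5.0 ** 2))
        contour.append(f0)
    return contour
```

With one accent at frame 10, the contour peaks there and declines overall from start to end, matching the global-plus-local decomposition.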

On a Pitch Alteration Method Compensated with the Spectrum for High Quality Speech Synthesis (스펙트럼 보상된 고음질 합성용 피치 변경법)

  • 문효정
    • Proceedings of the Acoustical Society of Korea Conference / 1995.06a / pp.123-126 / 1995
  • Waveform coding is concerned with preserving the wave shape of the speech signal through a redundancy reduction process. In speech synthesis, high-quality waveform coding is mainly used for synthesis by analysis. However, because its parameters are not separated into excitation and vocal tract parameters, waveform coding is difficult to apply to synthesis by rule. In this paper, we propose a new pitch alteration method that changes the pitch period in waveform coding by scaling the time axis and compensating the spectrum. This time-frequency domain method preserves the phase components of the waveform and incurs spectrum distortion of 2.5% or less for a 50% pitch change.

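Time-axis scaling, the first half of the method described above, can be illustrated on a single pitch period. Spectrum compensation is deliberately omitted, so this is only a sketch of the time-domain step, not the paper's full method:

```python
def scale_pitch_period(frame, ratio):
    """Time-axis scaling of one pitch period by linear interpolation.
    ratio > 1 lengthens the period (lower pitch), ratio < 1 shortens it.
    Spectrum compensation (restoring the original spectral envelope
    after scaling) would follow this step and is not shown."""
    n_out = max(int(round(len(frame) * ratio)), 1)
    out = []
    for i in range(n_out):
        # map output sample i back to a fractional input position
        pos = i * (len(frame) - 1) / max(n_out - 1, 1)
        lo = int(pos)
        hi = min(lo + 1, len(frame) - 1)
        frac = pos - lo
        out.append(frame[lo] * (1 - frac) + frame[hi] * frac)
    return out
```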

A Neural Speech Processing Algorithm for Multielectrode Cochlear Implant System (신경회로망을 이용한 다중 전극 와우각 이식 시스템용 음성처리 알고리즘)

  • Choi, Jin-Young;Cho, Jin-Ho;Lee, Kuhn-Il
    • Journal of Biomedical Engineering Research / v.11 no.1 / pp.83-88 / 1990
  • A new speech processing algorithm using neural networks is proposed. Input data are transformed into the frequency domain and processed by a neural network with 22 output neurons arranged on the Bark scale, since the Bark scale is similar to the frequency characteristics of the human cochlea. The network is a multilayer perceptron trained with the error back-propagation learning algorithm to reproduce the characteristics of the cochlea. The trained network performs the functions of the human cochlea, including automatic gain control, compression, and equalization. Simulation results show that the proposed speech processing algorithm performs well in automatic gain control, compression, and equalization.

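The 22 Bark-scaled output neurons correspond roughly to the critical bands of the cochlea. A minimal sketch of mapping frequencies onto 22 electrode channels using Zwicker's Bark formula; the uniform band-assignment rule is an assumption for illustration, not the paper's trained network:

```python
import math

def bark(freq_hz):
    """Zwicker's Bark-scale mapping of frequency (Hz) to critical-band rate."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

def band_for_electrode(freq_hz, n_electrodes=22, max_bark=22.0):
    """Assign a frequency to one of 22 electrode channels spaced
    uniformly on the Bark scale (hypothetical assignment rule)."""
    b = bark(freq_hz)
    idx = int(b / max_bark * n_electrodes)
    return min(idx, n_electrodes - 1)
```

Because the Bark scale is near-linear below ~500 Hz and logarithmic above, low frequencies get finer channel resolution, mirroring the cochlea.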

Pitch Detection Using Variable LPF

  • Hong KEUM
    • Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.963-970 / 1994
  • In speech signal processing, it is very important to detect the pitch exactly. The pitch extraction algorithms proposed so far are not sufficient to detect the fine pitch structure of the speech signal. We therefore propose a new algorithm that takes advantage of G-peak extraction. The method finds the MZCI (maximum zero-crossing interval), defined with respect to the cut-off bandwidth of a variable LPF (low-pass filter), and detects the pitch period of voiced signals. The algorithm performs robustly, with a gross error rate of 3.63% even in a 0 dB SNR environment; the gross error rate for clean speech is only 0.18%. It also processes the entire course of the signal at high speed.

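A simplified reading of the zero-crossing idea can be sketched as follows: measure the maximum interval between positive-going zero crossings of an already low-pass-filtered signal. This omits the variable-LPF and G-peak machinery of the actual algorithm:

```python
def pitch_period_from_zero_crossings(signal, fs):
    """Estimate the pitch period (seconds) as the maximum interval
    between successive positive-going zero crossings of a signal that
    has already been low-pass filtered -- a simplified stand-in for
    the MZCI idea, not the paper's full algorithm."""
    crossings = [i for i in range(1, len(signal))
                 if signal[i - 1] < 0.0 <= signal[i]]
    if len(crossings) < 2:
        return None  # not enough crossings to measure an interval
    max_interval = max(b - a for a, b in zip(crossings, crossings[1:]))
    return max_interval / fs
```

On a clean 100 Hz sinusoid the estimate recovers the 10 ms period exactly; real speech needs the low-pass stage to suppress formant-induced extra crossings.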

A Study on the Lexicalization of {Geuraegajigo} Based on the Spontaneous Speech Corpus (자유 발화 자료에 나타난 {그래가지고}의 접속 부사화)

  • Ha, Youngwoo;Shin, Jiyoung
    • Korean Linguistics / v.64 / pp.195-223 / 2014
  • The aim of this paper is to study the morphemization of {Geuraegajigo} based on a spontaneous speech corpus. For this purpose, the distributions, semantic functions, and intonational phrase patterns of the connective {Geuraegajigo} were analyzed in the corpus. The results are as follows. First, coalescence accompanying the morphemization process was found, resulting in many variants. Second, it has three functions, [direct/indirect interrelationship], [enumerative conjunction], and [discourse marker], and this semantic/functional diversity shows many similarities with conjunctive adverbs. Lastly, the intonational phrase patterns of {Geuraegajigo} accord with those of conjunctive adverbs; in particular, the discourse-strategic IP pattern is associated with the short variant type. In conclusion, {Geuraegajigo} has completed its change into a conjunctive adverb through morphemization.

Segmentation of continuous Korean Speech Based on Boundaries of Voiced and Unvoiced Sounds (유성음과 무성음의 경계를 이용한 연속 음성의 세그먼테이션)

  • Yu, Gang-Ju;Sin, Uk-Geun
    • The Transactions of the Korea Information Processing Society / v.7 no.7 / pp.2246-2253 / 2000
  • In this paper, we show that the performance of blind segmentation into phoneme boundaries can be enhanced by exploiting knowledge of Korean syllabic structure and of the regions of voiced/unvoiced sounds. The proposed method consists of three processes: extracting candidate phoneme boundaries, detecting boundaries between voiced and unvoiced sounds, and selecting the final phoneme boundaries. Candidate phoneme boundaries are extracted by a clustering method based on the similarity between two adjacent clusters, where the similarity measure is the ratio of the probability densities of the adjacent clusters. To detect the boundaries of voiced/unvoiced sounds, we first compute the power density spectrum of the speech signal in the 0-400 Hz band; the points where the variation of this power density spectrum exceeds a threshold are chosen as voiced/unvoiced boundaries. The final phoneme boundaries consist of all candidate boundaries in voiced regions and a limited number of candidate boundaries in unvoiced regions. Experiments showed about a 40% decrease in insertion rate compared with the baseline blind segmentation method.

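The voiced/unvoiced detection step described above can be sketched with a naive DFT of the 0-400 Hz band, flagging frame transitions where the low-band power changes sharply. The frame layout and threshold here are illustrative assumptions:

```python
import math

def low_band_power(frame, fs, f_max=400.0):
    """Power of the 0..f_max Hz band of one frame via a naive DFT
    over the low-frequency bins only."""
    n = len(frame)
    k_max = int(f_max * n / fs)
    power = 0.0
    for k in range(k_max + 1):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power += re * re + im * im
    return power / n

def vuv_boundaries(frames, fs, threshold):
    """Frame indices where the frame-to-frame low-band power change
    exceeds a threshold -- a simplified stand-in for the paper's
    voiced/unvoiced boundary detector."""
    powers = [low_band_power(f, fs) for f in frames]
    return [i for i in range(1, len(powers))
            if abs(powers[i] - powers[i - 1]) > threshold]
```

Voiced frames concentrate energy below 400 Hz (fundamental and low harmonics), so the band power drops abruptly at a voiced-to-unvoiced transition.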

A Comparative Study of Spoken and Written Sentence Production in Adults with Fluent Aphasia (유창성 실어증 환자의 구어와 문어 문장산출 능력 비교)

  • Ha, Ji-Wan;Pyun, Sung-Bom;Hwang, Yu Mi;Yi, Hoyoung;Sim, Hyun Sub
    • Phonetics and Speech Sciences / v.5 no.3 / pp.103-111 / 2013
  • Traditionally it has been assumed that written abilities are completely dependent on phonology; spoken and written language skills in aphasic patients have therefore been expected to show similar types of impairment. However, a number of recent studies have reported findings that support the orthographic autonomy hypothesis. The purpose of this study was to examine whether fluent aphasic patients show a discrepancy between speaking and writing skills, thereby identifying whether the two skills are realized through independent processes. To this end, the K-FAST speaking and writing tasks of 30 aphasic patients were compared. In addition, 16 aphasic patients who were capable of producing sentences in both speaking and writing were compared on their performance at each phase of the sentence production process. The subjects exhibited different performance between speaking and writing, with statistically significant differences at the positional and phonological encoding phases. The results suggest that written language is more likely to be produced via independent routes, without mediation of the spoken language production process, from a certain phase of sentence production onward.

Voice Recognition Performance Improvement using a convergence of Voice Energy Distribution Process and Parameter (음성 에너지 분포 처리와 에너지 파라미터를 융합한 음성 인식 성능 향상)

  • Oh, Sang-Yeob
    • Journal of Digital Convergence / v.13 no.10 / pp.313-318 / 2015
  • Traditional speech enhancement methods can distort the speech spectrum, because errors in estimating the residual noise, or invalid noise estimates, lower speech recognition performance. In this paper, we propose a speech detection method that converges the speech energy distribution process with speech energy parameters. The proposed method reduces the influence of noise on the received features so as to maximize the speech energy. In addition, log-energy features of intervals with small values are adjusted relative to regions with large energy so that they resemble the log-energy features of the noisy speech signal, which reduces the mismatch between the training and recognition environments. Recognition experiments confirmed improved performance compared with the conventional method: in a car noise environment, the pause hit rate reached 97.1% and 97.3% accuracy in the low-SNR region (0 dB and 5 dB) and 98.3% and 98.6% in the high-SNR region (10 dB and 15 dB).
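As a rough sketch of energy-based speech detection in the spirit of this abstract: flag frames whose log energy rises above the observed floor by a margin. The margin parameter and floor rule are assumptions for illustration, not the paper's convergence scheme:

```python
import math

def log_energy(frame):
    """Log energy of one frame; the small floor avoids log(0) on silence."""
    e = sum(s * s for s in frame)
    return math.log(max(e, 1e-10))

def detect_speech_frames(frames, margin=2.0):
    """Flag frames whose log energy exceeds the minimum observed log
    energy by a margin -- a minimal sketch of energy-distribution-based
    speech detection (margin is a hypothetical parameter)."""
    energies = [log_energy(f) for f in frames]
    floor = min(energies)
    return [e > floor + margin for e in energies]
```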

Implementation of Encoder/Decoder to Support SNN Model in an IoT Integrated Development Environment based on Neuromorphic Architecture (뉴로모픽 구조 기반 IoT 통합 개발환경에서 SNN 모델을 지원하기 위한 인코더/디코더 구현)

  • Kim, Hoinam;Yun, Young-Sun
    • Journal of Software Assessment and Valuation / v.17 no.2 / pp.47-57 / 2021
  • Neuromorphic technology has been proposed to complement the shortcomings of existing artificial intelligence technology by mimicking the structure and computational processes of the human brain in hardware, and NA-IDE has been proposed for developing neuromorphic hardware-based IoT applications. To implement an SNN model in NA-IDE, commonly used input data must first be transformed for use in the SNN model. In this paper, we implement an encoder component, based on a neural coding method, that converts image data into a spike train signal for use as SNN input, and a decoder component that converts the spike train signal produced by the SNN model back into image data. If the decoder uses the same parameters as the encoding process, it can regenerate static data similar to the original. The proposed encoder and decoder can be used to transform and regenerate input data in fields such as image-to-image and speech-to-speech processing.
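A common neural coding choice for such an encoder/decoder pair is rate coding, sketched below. The probabilistic firing rule and parameter names are assumptions for illustration, not necessarily the NA-IDE components' actual implementation:

```python
import random

def encode_rate(pixels, n_steps, seed=0):
    """Rate-code pixel intensities (0.0-1.0) into spike trains: at each
    time step, a pixel fires with probability equal to its intensity."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for _ in range(n_steps)]
            for p in pixels]

def decode_rate(spike_trains):
    """Recover intensities as each train's firing rate; with the same
    n_steps used for encoding, this approximates the original data."""
    return [sum(train) / len(train) for train in spike_trains]
```

Longer spike trains trade latency for reconstruction fidelity, which is why the decoder should share the encoder's parameters, as the abstract notes.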

The Effect of Word Frequency and Neighborhood Density on Spoken Word Segmentation in Korean (단어 빈도와 음절 이웃 크기가 한국어 명사의 음성 분절에 미치는 영향)

  • Song, Jin-Young;Nam, Ki-Chun;Koo, Min-Mo
    • Phonetics and Speech Sciences / v.4 no.2 / pp.3-20 / 2012
  • The purpose of this study was to investigate whether a segmentation unit for a Korean noun is a 'syllable' and whether the process of segmenting spoken words occurs at the lexical level. A syllable monitoring task was administered which required participants to detect an auditorily presented target from visually presented words. In Experiment 1, syllable neighborhood density of high frequency words which can be segmented into both CV-CVC and CVC-VC were controlled. The syllable effect and the neighborhood density effect were significant, and the syllable effect emerged differently depending on the syllable neighborhood density. Similar results were obtained in Experiment 2 where low frequency words were used. The significance of word frequency effect on syllable effect was also examined. The results of Experiments 1 and 2 indicated that the segmentation unit for a Korean noun is indeed a 'syllable', and this process can occur at the lexical level.