• Title/Summary/Keyword: Speech feature


Voice Conversion Using Linear Multivariate Regression Model and LP-PSOLA Synthesis Method (선형다변회귀모델과 LP-PSOLA 합성방식을 이용한 음성변환)

  • 권홍석;배건성
    • The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.15-23 / 2001
  • This paper presents a voice conversion technique that modifies the utterance of a source speaker so that it sounds as if it were spoken by a target speaker. Feature parameter conversion methods that transform the vocal tract and prosodic characteristics between the source and target speakers are described. The vocal tract characteristics are transformed by modifying the LPC cepstral coefficients using Linear Multivariate Regression (LMR). Prosodic transformation is done by changing the average pitch period between speakers, and it is applied to the residual signal using the LP-PSOLA scheme. Experimental results show that speech transformed by the LMR and LP-PSOLA synthesis method retains many characteristics of the target speaker.
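As a rough illustration of the regression step only, the sketch below fits an affine least-squares map from source feature vectors to target feature vectors and applies it to a new frame. It assumes the source and target frames have already been time-aligned, and it uses a plain least-squares fit with a bias term; the function names `solve`, `fit_lmr`, and `convert` are hypothetical and are not taken from the paper.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    `a` is n x n, `b` is n x k (lists of rows); returns x as n x k.
    """
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; need more varied frames")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_lmr(source, target):
    """Least-squares fit of target ~ [source, 1] @ W (assumed affine model)."""
    xs = [list(x) + [1.0] for x in source]  # append a bias input
    d, k = len(xs[0]), len(target[0])
    # Normal equations: (X^T X) W = X^T Y
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    xty = [[sum(x[i] * y[j] for x, y in zip(xs, target)) for j in range(k)]
           for i in range(d)]
    return solve(xtx, xty)


def convert(w, x):
    """Map one source feature vector to the target speaker's feature space."""
    xa = list(x) + [1.0]
    return [sum(xa[i] * w[i][j] for i in range(len(xa)))
            for j in range(len(w[0]))]


# Toy aligned frames (2-D features standing in for cepstral vectors).
source = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
target = [[2 * a + 1, a - b + 0.5] for a, b in source]
w = fit_lmr(source, target)
converted = convert(w, [3, 2])
```

In practice the feature vectors would be LPC cepstral coefficients per frame, and the converted vectors would be passed on to the synthesis stage.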


A Study on the Characteristic of Color use Scheme in Luis Barragan's Architecture (루이스 바라간 건축의 색채사용 특성에 관한 연구)

  • Yoo, Yeon-Sook
    • Journal of the Korea Furniture Society / v.24 no.4 / pp.416-425 / 2013
  • Luis Barragan's architecture has a creative character in which the Mexican environment and traditional culture are brought to completion through color. The color in his work thus enriches our emotions. In his Pritzker Prize speech he said, "My architecture is autobiographical..." As his architectural philosophy shows, he valued sentimental architecture over theoretical systems, and his works are impressions of empirical factors shaped by intuition. "Color is a complement to the architecture. It serves to enlarge or reduce a space. It's also useful for adding that touch of magic a place needs," stated Barragan. In shaping space, Barragan drew on color in the same way as an architectural component, according it a spatial function and expressive value. He allied it with light, deeming it a crucial vehicle for conveying the emotive attributes of a site. The capacity of color to express sensitivity and sensuality within an architectural space is linked to its psycho-physiological qualities. From this viewpoint, color in Barragan's work is not only a result of Mexican regional color but also one of the most important tools for realizing sentimental architecture. This study therefore focuses on analyzing the various meanings of color in Barragan's architecture, such as poetic and habitable structure.


An Experimental Phonetic study of Perception of native Korean speakers on English and German $/\int/$ (한국인의 외국어 $/\int/$음에 대한 실험음성학적 연구)

  • Lee Sook-hyang;Kang Hyunsook
    • MALSORI / no.40 / pp.1-12 / 2000
  • This paper investigated how $/\int/$ in English and German is perceived and interpreted in loanwords in Korean. $/\int/$ in these languages does not show a one-to-one correspondence in Korean: $/\int/$ in the coda position in English and German is perceived as [swi] in Korean, while $/\int/$ in the onset position is perceived as [syu]. This paper examined the phonetic characteristics of $/\int/$ in English and German through acoustic analysis and attempted to determine which factor could explain this surface distribution of [swi] and [syu]: a phonological factor (onset vs. coda) or a phonetic factor (coarticulation). Two acoustic features of $/\int/$ in English and German were examined: the duration and the energy peak frequency of the frication noise. German $/\int/$ perceived as [swi] in Korean showed a higher energy peak frequency and longer duration than that perceived as [syu]. English $/\int/$ perceived as [swi] also showed longer duration than that perceived as [syu], but the energy peak frequency behaved differently. English $/\int/$ showed coarticulation with the preceding vowel rather than being affected by its position in the syllable. This paper concludes that 1) the phonetic characteristics used when $/\int/$ in English and German is adopted into Korean are the duration and energy peak frequency of its frication noise, and 2) duration is used before energy peak frequency, which can serve as an enhancing feature.


A Robust Korean Spoken Language Parsing Based on Core Concept (핵심개념 기반의 강건한 한국어 대화체 파싱)

  • No, Seo-Yeong;Jeong, Cheon-Yeong;Seo, Yeong-Hun
    • The Transactions of the Korea Information Processing Society / v.6 no.8 / pp.2113-2123 / 1999
  • The partially free word order of Korean makes a grammar represented as a CFG too large, because the grammar has to contain every possible word ordering. Spoken language is also difficult to parse, because spontaneous speech has special features such as meaningless words and repetitions. In this paper, we therefore define the 'Core-Concept' as the element necessary for parsing and describe the grammar using only Core-Concepts. By selecting only the Core-Concepts described in the grammar as parsing elements, we prevent the grammar from becoming very large and reduce the additional parsing burden. With this strategy, we show that the simplified grammar offers a more efficient way to obtain correct results. Experiments show that our parsing strategy achieves an average success rate of 98% or higher in producing correct parses.


Discriminative Feature Selection for G.723-based Speech Recognition (G.723기반의 음성인식을 위한 변별적인 음성 특징 벡터 선정)

  • 이규환;정민화
    • Proceedings of the Korean Information Science Society Conference / 2000.04b / pp.387-389 / 2000
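  • With advances in information and communications, telephone use has grown and various multimedia functions have been added to telephones, so the need for speech recognition is steadily increasing. With current technology, however, speech recognition performance does not meet people's expectations. This study investigates a method for reducing speech recognition time over networks that use G.723 and for obtaining better recognition performance at the same feature dimension. To minimize distortion over the channel, a typical vocoder quantizes and transmits LSP parameters, which are known to be stable under quantization. The transmitted quantized LSP parameters normally pass through a decoder; in this study, the quantized LSP parameters are used directly for speech recognition, which saves the time needed to extract speech feature parameters after speech synthesis and avoids the distortion introduced by synthesis. We apply to G.723 a method that dynamically adjusts the dimension of the speech feature vector by ordering the feature vector elements according to a discriminative criterion. By selecting a dimension appropriate to the recognition task from the ordered feature elements, we confirmed that recognition performance can be maintained or improved while the dimension is reduced. In particular, we confirmed that recognition performance can also be improved over network communication, and that processing time is greatly reduced compared with the conventional approach of recognizing synthesized speech.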


Gesture-Based Emotion Recognition by 3D-CNN and LSTM with Keyframes Selection

  • Ly, Son Thai;Lee, Guee-Sang;Kim, Soo-Hyung;Yang, Hyung-Jeong
    • International Journal of Contents / v.15 no.4 / pp.59-64 / 2019
  • In recent years, emotion recognition has been an interesting and challenging topic. Compared to facial expressions and the speech modality, gesture-based emotion recognition has not received much attention, with only a few efforts using traditional hand-crafted methods. These approaches incur major computational costs and offer few opportunities for improvement, as most of the scientific community now bases its research on deep learning techniques. In this paper, we propose an end-to-end deep learning approach for classifying emotions based on bodily gestures. In particular, the informative keyframes are first extracted from raw videos as input to the 3D-CNN deep network. The 3D-CNN exploits the short-term spatiotemporal information of gesture features from the selected keyframes, and the convolutional LSTM networks learn long-term features from the 3D-CNN's output features. The experimental results on the FABO dataset exceed most of the traditional methods' results and achieve state-of-the-art results among deep learning-based techniques for gesture-based emotion recognition.

Two Wheeler Recognition Using the Correlation Coefficient for Histogram of Oriented Gradients to Apply Intelligent Wheelchair (지능형 휠체어 적용을 위한 기울기 히스토그램의 상관계수를 이용한 도로위의 이륜차 인식)

  • Kim, Bum-Koog;Park, Sang-Hee;Lee, Yeung-Hak;Lee, Gang-Hwa
    • Journal of Biomedical Engineering Research / v.32 no.4 / pp.336-344 / 2011
  • This article describes a new recognition algorithm that uses the correlation coefficient to help an intelligent wheelchair avoid collisions for elderly or disabled people. The correlation coefficient can be used to represent the relationship between two different areas. The algorithm has three steps. First, we extract an edge vector using the Histogram of Oriented Gradients (HOG), which includes gradient information and a unique magnitude for each cell. From this result, the correlation coefficients are calculated between one cell and the others. Second, the correlation coefficients are used as weighting factors for normalizing the HOG cells. Finally, these features are used to classify or detect the variable and complicated shapes of two wheelers using the AdaBoost algorithm. In this paper, we propose a new feature vector, calculated per weighted cell unit, to classify multiple view-based shapes: frontal, rear and side views ($60^{\circ}$, $90^{\circ}$ and mixed angle). Our experimental results show that a two wheeler detection system based on the proposed approach achieves higher detection accuracy than the method using traditional features, in a similar detection time.
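The correlation step can be sketched as follows. The abstract does not specify how the coefficients are combined into weights, so this example assumes, purely for illustration, that each cell's weight is its mean Pearson correlation with every other cell; `pearson`, `cell_weights`, and `weighted_hog` are hypothetical names, and the classifier stage is omitted.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length histograms."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    # A flat histogram has no defined correlation; treat it as 0.
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def cell_weights(cells):
    """Assumed weighting: mean correlation of each cell with all other cells."""
    weights = []
    for i, cell in enumerate(cells):
        others = [pearson(cell, o) for j, o in enumerate(cells) if j != i]
        weights.append(sum(others) / len(others) if others else 0.0)
    return weights


def weighted_hog(cells):
    """Scale every bin of each cell histogram by that cell's weight."""
    return [[w * b for b in cell]
            for cell, w in zip(cells, cell_weights(cells))]


# Three toy 3-bin cell histograms.
cells = [[1, 2, 3], [2, 4, 6], [3, 2, 1]]
weights = cell_weights(cells)
features = weighted_hog(cells)
```

The weighted cell histograms would then be concatenated into one feature vector per detection window before being passed to a classifier.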

Using Korean Phonetic Alphabet (KPA) in Teaching English Stop Sounds to Koreans

  • Jo, Un-Il
    • Proceedings of the KSPS conference / 2000.07a / pp.165-165 / 2000
  • At the phoneme level, English stop sounds are classified by the feature of 'voicing': voiceless and voiced (p/b, t/d, k/g). But when realized, a voiceless stop is not always the same sound. For example, the two 'p' sounds in 'people' are different. The former is pronounced with much aspiration, while the latter is pronounced without it. This allophonic difference between [$p^h$] and [p] of the English phoneme /p/ can be explained well to Koreans because in Korean these two sounds exist as two different phonemes (/ㅍ/ and /ㅃ/ respectively). But difficulties lie in teaching the English voiced stop sounds (/b, d, g/) to Koreans because in Korean voiced stops do not exist as phonemes but as allophones of lenis sounds (/ㅂ, ㄷ, ㄱ/). For example, the narrow transcription of '바보' (a fool) is [baboo]. In word-initial position, Korean lenis stops are pronounced voiceless and even with slight aspiration, while in intervocalic environments they become voiced. That is, in Korean voiced stops do not occur independently, nor do they have their own letters. To explain all this more effectively to Koreans, it is very helpful to use the Korean Phonetic Alphabet (KPA), which was devised by Dr. LEE Hyunbok (a professor of phonetics at Seoul National Univ. and chairman of the Phonetic Society of Korea). (omitted)


A Splog Detection System Using Support Vector Systems (지지벡터기계를 이용한 스팸 블로그(Splog) 판별 시스템)

  • Lee, Song-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.1 / pp.163-168 / 2011
  • Blogs are an easy way to publish information, engage in discussions, and form communities on the Internet. Recently, several varieties of spam blog have appeared whose purpose is to host ads or raise the PageRank of target sites. Our purpose is to develop a system that automatically detects these spam blogs (splogs) among blogs on the Web. After the HTML is removed from the blogs, they are tagged by a part-of-speech (POS) tagger. Words and their POS tags are used as the feature type. Among these features, we select useful ones with the $\chi^2$ statistic and train the SVM with the selected features. Our system achieved an F1 measure of 90.5% on the SPLOG data set.
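The feature-selection step can be illustrated with the standard chi-square statistic for a 2x2 contingency table (feature present or absent against splog or ordinary blog). This is a minimal sketch under assumptions: the toy `word/POS` features, the label strings, and the names `chi_square` and `rank_features` are hypothetical, and SVM training on the selected features is left out.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square for a 2x2 table.

    a: positive docs containing the feature, b: negative docs containing it,
    c: positive docs without it,           d: negative docs without it.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def rank_features(docs, labels, positive="splog"):
    """Return features sorted by decreasing chi-square (ties by name)."""
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for feats, y in zip(docs, labels):
        # Count document frequency, so each feature counts once per document.
        (pos_df if y == positive else neg_df).update(set(feats))
    scores = {}
    for f in set(pos_df) | set(neg_df):
        a, b = pos_df[f], neg_df[f]
        scores[f] = chi_square(a, b, n_pos - a, n_neg - b)
    return sorted(scores, key=lambda f: (-scores[f], f))


# Toy documents already reduced to word/POS features.
docs = [
    ["buy/NN", "cheap/JJ"],
    ["buy/NN", "now/RB"],
    ["trip/NN", "photo/NN"],
    ["photo/NN", "now/RB"],
]
labels = ["splog", "splog", "blog", "blog"]
ranked = rank_features(docs, labels)
top_features = ranked[:2]  # keep the k highest-scoring features
```

A feature that appears equally often in both classes scores 0 and falls to the bottom of the ranking, while a feature confined to one class scores highest.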

GMM-Based Gender Identification Employing Group Delay (Group Delay를 이용한 GMM기반의 성별 인식 알고리즘)

  • Lee, Kye-Hwan;Lim, Woo-Hyung;Kim, Nam-Soo;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.26 no.6 / pp.243-249 / 2007
  • We propose an effective voice-based gender identification method using group delay (GD). Generally, features for speech recognition are composed of magnitude information rather than phase information. In our approach, we examine the difference in GD, which is a derivative of the Fourier transform phase, between male and female speakers. We also propose a novel feature fusion scheme based on a combination of GD and magnitude information such as mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC) coefficients, reflection coefficients and formants. The experimental results indicate that GD is effective in discriminating gender and that performance is significantly improved when the proposed feature fusion technique is applied.
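Group delay is conventionally defined as the negative derivative of the Fourier transform phase with respect to frequency. For a frame $x[n]$ with transform $X$, and with $Y$ the transform of $n\,x[n]$, it can be computed without phase unwrapping as $\tau = \mathrm{Re}(Y/X)$. The sketch below applies that identity with a direct DFT; the function names are hypothetical, and the GMM classifier and feature fusion described in the abstract are not covered.

```python
import cmath


def dft(x):
    """Direct O(N^2) discrete Fourier transform of a real or complex list."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def group_delay(x, eps=1e-12):
    """Group delay in samples at each DFT bin, via tau = Re(Y / X)."""
    big_x = dft(x)
    big_y = dft([t * v for t, v in enumerate(x)])  # transform of n * x[n]
    out = []
    for xk, yk in zip(big_x, big_y):
        power = abs(xk) ** 2
        # Re(Y / X) = (Xr*Yr + Xi*Yi) / |X|^2; undefined where |X| is ~0.
        out.append((xk.real * yk.real + xk.imag * yk.imag) / power
                   if power > eps else 0.0)
    return out


# A unit impulse delayed by 3 samples has a group delay of 3 at every bin.
gd = group_delay([0, 0, 0, 1, 0, 0, 0, 0])
```

For real speech frames a windowed frame would replace the toy impulse, and an FFT routine would replace the direct DFT for speed.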