• Title/Summary/Keyword: phoneme (음소)

Search Results: 529

TRACE: An Interactive Neural Network Model for Speech Recognition (TRACE : 상호작용 음성인식 Neural Network 모델)

  • 김명원
    • Information and Communications Magazine
    • /
    • v.7 no.3
    • /
    • pp.16-26
    • /
    • 1990
  • This paper describes the TRACE model, a neural network for speech recognition. Using a trace structure similar to the blackboard architecture of the HEARSAY speech-understanding system, the TRACE model recognizes speech as the outcome of interactions among information-processing units connected by excitatory and inhibitory links. The model consists of three layers: a feature layer, a phoneme layer, and a word layer. Units not only interact across layers but also compete within each layer, which allows the model to handle context effects, segmentation, and noise in speech recognition.
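The interaction-plus-competition dynamics described above can be sketched as a single interactive-activation update step. This is a minimal illustration in the spirit of TRACE-style models, not the paper's implementation; the unit inventory, weights, decay rate, and activation bounds below are all illustrative assumptions.

```python
# Minimal sketch of one interactive-activation update: units receive
# excitation from consistent units in adjacent layers and inhibition from
# competitors in the same layer. All weights here are illustrative.

def update_activations(acts, excite, inhibit, decay=0.1, floor=-0.2, ceil=1.0):
    """One synchronous update of unit activations.

    acts    : dict unit -> current activation
    excite  : dict unit -> list of (source_unit, weight) excitatory inputs
    inhibit : dict unit -> list of competitor units in the same layer
    """
    new_acts = {}
    for unit, a in acts.items():
        net = sum(w * max(acts[src], 0.0) for src, w in excite.get(unit, []))
        net -= sum(0.2 * max(acts[c], 0.0) for c in inhibit.get(unit, []))
        # Scale input by the distance to the activation bound, then decay.
        if net > 0:
            a_new = a + net * (ceil - a) - decay * a
        else:
            a_new = a + net * (a - floor) - decay * a
        new_acts[unit] = min(ceil, max(floor, a_new))
    return new_acts

# Two competing phoneme units fed by feature units: /b/ gets stronger support
# (two consistent features) than /p/ (one), and the two inhibit each other.
acts = {"feat_voiced": 0.8, "feat_labial": 0.7, "/b/": 0.0, "/p/": 0.0}
excite = {"/b/": [("feat_voiced", 0.3), ("feat_labial", 0.3)],
          "/p/": [("feat_labial", 0.3)]}
inhibit = {"/b/": ["/p/"], "/p/": ["/b/"]}
for _ in range(10):
    acts = update_activations(acts, excite, inhibit)
```

After a few iterations the better-supported phoneme unit dominates its competitor, which is how such models resolve ambiguous input in context.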


Text-Independent Speaker Verification Based on MLP Cohort Model (MLP 군집 모델에 기반한 어구독립 화자증명)

  • 이태승;최호진
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2000.10b
    • /
    • pp.434-436
    • /
    • 2000
  • This paper presents a method for implementing the conventional probabilistic speaker cohort model with an MLP (multi-layer perceptron), together with a modified model that resolves the problems of the original cohort model. The cohort model is important in practical environments where speaker-enrollment time matters. The recognition unit used in this study is the continuant, extracted from the sustained portions of various phoneme sequences, so the system adopts a text-independent scheme that is not restricted to specific phrases at the enrollment and verification stages.
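The cohort idea above, independently of how the scores are produced, amounts to normalizing the claimed speaker's score by the scores of a cohort of similar speakers. A hedged sketch follows; the plain numbers stand in for MLP outputs, and the threshold and scores are illustrative, not from the paper.

```python
import math

# Sketch of cohort score normalization for speaker verification: accept the
# claimed identity only if its score clearly exceeds the cohort's. The
# log-scores and the threshold below are illustrative assumptions.

def cohort_normalized_score(claimed_score, cohort_scores):
    """Likelihood-ratio-style score: claimed log-score minus log of the
    cohort's mean likelihood."""
    cohort_mean = sum(math.exp(s) for s in cohort_scores) / len(cohort_scores)
    return claimed_score - math.log(cohort_mean)

def verify(claimed_score, cohort_scores, threshold=0.5):
    return cohort_normalized_score(claimed_score, cohort_scores) > threshold

# A genuine trial: the claimed-speaker score sits well above the cohort.
accept = verify(-1.0, [-3.0, -2.5, -3.5])
# An impostor trial: the claimed score is close to the cohort scores.
reject = verify(-2.8, [-3.0, -2.5, -3.5])
```

Normalizing against a cohort makes the decision threshold less sensitive to recording conditions, which is what makes the approach practical when enrollment time is short.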


Design and Implementation of the Language Processor for Educational TTS Platform (음성합성 플랫폼을 위한 언어처리부의 설계 및 구현)

  • Lee, Sang-Ho
    • Proceedings of the KSPS conference
    • /
    • 2005.11a
    • /
    • pp.219-222
    • /
    • 2005
  • This paper describes the design and implementation of the language-processing module for a Korean TTS system. The implemented module performs morphological analysis, part-of-speech tagging, and grapheme-to-phoneme conversion, and outputs the most appropriate phoneme sequence for a given sentence together with the part of speech of each phoneme. The program is written in standard C and was verified to run on both Windows and Linux. A morpheme dictionary was built from a manually POS-tagged corpus of 45,000 eojeols (Korean word phrases); assuming every word is registered in the dictionary, the eojeol-level error rate on a test set of 488 sentences was 3.25%.
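The three-stage pipeline described above (morphological analysis, POS tagging, grapheme-to-phoneme conversion) can be sketched with toy stand-ins for each stage. The real system is written in C with a learned dictionary and Korean pronunciation rules; everything below, including the lexicon entries and phoneme strings, is a hypothetical illustration of the staging only.

```python
# Toy sketch of a TTS language-processing pipeline: each stage is a trivial
# stand-in for the real component named in its comment.

def analyze_morphemes(sentence):
    # Stand-in for morphological analysis: whitespace tokenization.
    return sentence.split()

def tag_pos(morphemes, lexicon):
    # Stand-in for statistical POS tagging: dictionary lookup.
    return [(m, lexicon.get(m, "UNK")) for m in morphemes]

def to_phonemes(tagged, g2p_rules):
    # Stand-in for pronunciation conversion: per-token lookup.
    return [(g2p_rules.get(tok, tok), pos) for tok, pos in tagged]

lexicon = {"hello": "IC", "world": "NNG"}              # illustrative tags
g2p_rules = {"hello": "HH EH L OW", "world": "W ER L D"}

pron = to_phonemes(tag_pos(analyze_morphemes("hello world"), lexicon),
                   g2p_rules)
# Each pronunciation is paired with its POS tag, mirroring the paper's
# output of a phoneme sequence plus the part of speech of each unit.
```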


A Comparative Study on the Working Memory and the Phonological Awareness between Children with Multi-cultural Families and General Families (다문화아동과 일반아동의 작업기억 및 음운인식 능력 비교 연구)

  • Park, Yoo Rin;Kwon, Do Ha
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.13 no.11
    • /
    • pp.5025-5032
    • /
    • 2012
  • The purpose of this study is to compare working memory and phonological awareness between children from multicultural families and children from general families. The subjects were 15 multicultural and 15 general primary school students attending grades 1-3 in D city. Working memory was tested with K-TTFC-2, a standardized tool; phonological awareness was assessed with a phoneme-awareness test appropriate to the subjects' ages. Data were analyzed with t-tests and frequency tests in SPSS. The results were as follows. First, the two groups differed significantly in working memory, with particularly significant differences in chapters 1 and 4. Second, the two groups also differed significantly in phonological awareness. Third, the phonological-awareness comparison showed differences in sound matching (word-medial coda), substituting the middle sound in monosyllabic words, and phoneme switching. These results can serve as fundamental data for developing therapy materials that take into account the working memory and phonological awareness of children from multicultural families.

A Study on the Diphone Recognition of Korean Connected Words and Eojeol Reconstruction (한국어 연결단어의 이음소 인식과 어절 형성에 관한 연구)

  • ;Jeong, Hong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.4
    • /
    • pp.46-63
    • /
    • 1995
  • This thesis describes an unlimited-vocabulary connected-speech recognition system using Time Delay Neural Networks (TDNN). The recognition unit is the diphone, which includes the transition between two phonemes; there are 329 diphone units. Recognition of Korean connected speech consists of three parts: feature extraction from the input speech signal, diphone recognition, and post-processing. In the feature-extraction section, the diphone intervals in the input speech signal are extracted, and 16th-order filter-bank coefficients are computed as feature vectors for each frame in the diphone interval. Diphone recognition is organized as a three-stage hierarchical structure and is carried out using 30 Time Delay Neural Networks; in particular, the structure of the TDNN is modified to increase the recognition rate. In the post-processing section, misrecognized diphone strings are corrected using phoneme-transition and phoneme-confusion probabilities, and eojeols (Korean words or phrases) are formed by combining the recognized diphones.
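The building block of a TDNN is a unit that sees each frame together with a few delayed copies, applying the same weights at every time step. The sketch below illustrates that weight sharing across time; the frame values, weights, and dimensions are illustrative assumptions, not the paper's 16 filter-bank coefficients or its 329-diphone inventory.

```python
import math

# Sketch of a single time-delay unit: one output per valid window position,
# computed with the same weights at every time step (shared across time).

def time_delay_unit(frames, weights, bias=0.0):
    """frames  : list of feature vectors, one per time step
    weights : list of weight vectors, one per delay (context = len(weights))
    Returns one squashed activation per valid window position."""
    delays = len(weights)
    outputs = []
    for t in range(len(frames) - delays + 1):
        net = bias
        for d in range(delays):
            net += sum(w * x for w, x in zip(weights[d], frames[t + d]))
        outputs.append(math.tanh(net))  # squashing nonlinearity
    return outputs

# 5 frames of 2-dimensional features with a context window of 3 frames
# yields 3 output positions:
frames = [[0.1, 0.2], [0.3, 0.1], [0.5, 0.4], [0.2, 0.2], [0.0, 0.1]]
weights = [[0.5, -0.2], [0.3, 0.3], [-0.1, 0.4]]
outs = time_delay_unit(frames, weights)
```

Because the weights are shared over time, the unit responds to an acoustic pattern regardless of exactly when it occurs in the diphone interval, which is what makes TDNNs suited to transition units like diphones.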


The Relationship Between Children's Reading Ability of Environmental Print and Phonological Awareness (유아의 환경인쇄물 읽기 능력과 음운론적 인식 능력 간의 관계)

  • Kim, Hyo Jin;Son, Seung Hee;Rha, Jong Hae
    • Korean Journal of Childcare and Education
    • /
    • v.9 no.6
    • /
    • pp.107-127
    • /
    • 2013
  • The purpose of this study was to investigate differences in children's reading ability of environmental print and phonological awareness by age, and the relationship between the two abilities. The subjects were 90 children, 3 to 4 years of age. The Children's Reading Abilities of Environmental Print Scale (CRAEPS) developed by Son (2012) and the Phonological Awareness Scale (PAS) revised by Choi (2007) were used to measure the two abilities. The results were as follows. First, 4-year-olds performed significantly better than 3-year-olds on the environmental-print reading tasks. Also, 4-year-olds scored significantly higher than 3-year-olds in syllable counting, syllable deletion, and phoneme substitution. Second, children's scores on the environmental-print reading tasks were positively correlated with phonological awareness. In other words, 3-year-olds who could read environmental print better got higher scores in syllable counting, and 4-year-olds who could read environmental print better got higher scores in syllable counting, syllable deletion, and phoneme substitution.

Vocabulary Recognition Performance Improvement using a convergence of Bayesian Method for Parameter Estimation and Bhattacharyya Algorithm Model (모수 추정을 위한 베이시안 기법과 바타차랴 알고리즘을 융합한 어휘 인식 성능 향상)

  • Oh, Sang-Yeob
    • Journal of Digital Convergence
    • /
    • v.13 no.10
    • /
    • pp.353-358
    • /
    • 2015
  • A vocabulary recognition system built on a standard vocabulary shows degraded recognition for words outside the standard vocabulary or similar to it. In such cases, reconstructing the system to add or extend the vocabulary range is one way to solve the problem. This paper proposes a recognition model that combines the Bhattacharyya algorithm with a speech-recognition learning model built using Bayesian methods, which reflect parameter estimation in the scalability of the model configuration. The standard model is corrected on the basis of phoneme characteristics, using Bayesian methods to estimate the parameters of the phoneme data and the Bhattacharyya algorithm for similar models. The recognition model configured with the Bhattacharyya algorithm was evaluated, and the proposed method achieved a recognition rate of 97.3% and a learning time of 1.2 seconds.
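The role of the Bhattacharyya measure above is to quantify how similar two statistical phoneme models are. A minimal sketch for one-dimensional Gaussian models follows; the paper's actual model parameters are not given in the abstract, so the means and variances below are illustrative.

```python
import math

# Bhattacharyya distance between two 1-D Gaussian models (mean, variance).
# It is 0 for identical distributions and grows as the models diverge,
# which is what makes it usable for flagging similar phoneme models.

def bhattacharyya_gauss(mu1, var1, mu2, var2):
    term_mean = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    term_var = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return term_mean + term_var

same = bhattacharyya_gauss(0.0, 1.0, 0.0, 1.0)  # identical models
near = bhattacharyya_gauss(0.0, 1.0, 0.5, 1.0)  # slightly shifted mean
far = bhattacharyya_gauss(0.0, 1.0, 3.0, 1.0)   # clearly separated models
```

The mean term penalizes separation of the model centers while the variance term penalizes mismatched spreads, so the distance captures both ways two phoneme models can differ.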

Pronunciation Education in Elementary School English Education (초등영어교육에 있어서 발음교육)

  • 박매란
    • Proceedings of the KSPS conference
    • /
    • 1997.07a
    • /
    • pp.257-257
    • /
    • 1997
  • In the field of language acquisition there is still much debate among scholars as to whether starting language learning as early as the age at which elementary English education begins in Korea is actually beneficial in terms of learning efficiency or language proficiency. According to Piaget's theory of cognitive development, pronunciation is precisely the area least amenable to conscious manipulation, so it is considered desirable to begin speech-centered foreign-language education at around age 10-11, before the "formal operational stage" in which conscious manipulation becomes possible. Elementary English education focused on listening and speaking should therefore sustain interest through sensory activities, play, games, songs, and chants, while training learners from the beginning, through continuous listening and speaking practice, in the stress-timed rhythm that characterizes English. Total Physical Response, the teaching method devised by James Asher, also has the advantage of raising motivation through engaging activities without the burden of early speaking. Moreover, it is a method that plays an important role in early learning at the listening stage: learners do not simply memorize the foreign language they hear, but are led to immediate recognition, and by repeating such recognition training they eventually reach retention. Phonetically, in the third grade, the starting point of elementary English education, pupils learn that English consonants and vowels differ from their Korean counterparts at the segmental level; at the suprasegmental level, it is important to provide accurate pronunciation models and train pupils to produce accurate pronunciation so that they naturally acquire the differences in stress and rhythm. On the other hand, given that the sixth national curriculum sets "fluency" rather than "accuracy" as the goal of English education, it is worth asking whether accurate pronunciation must be in place from the very start; it is a common experience that accurate pronunciation develops in parallel with growing proficiency in the language. In the end, as the elementary English curriculum itself states, "...it is important to instill familiarity with and confidence in English, and to continuously maintain interest in and attention to English," so what matters most is the development of affective learning activities that sustain interest and attention, rather than purely intellectual ones.


Time-Synchronization Method for Dubbing Signal Using SOLA (SOLA를 이용한 더빙 신호의 시간축 동기화)

  • 이기승;지철근;차일환;윤대희
    • Journal of Broadcast Engineering
    • /
    • v.1 no.2
    • /
    • pp.85-95
    • /
    • 1996
  • The purpose of this paper is to propose a dubbed-signal time-synchronization technique based on the SOLA (Synchronized Overlap-Add) method, which has been widely used to modify the time scale of speech signals. In broadcast audio-recording environments, the high level of background noise makes dubbing necessary. Since the time difference between the original and the dubbed signal ranges up to about 200 milliseconds, processing is required to synchronize the dubbed signal with the corresponding image. The proposed method finds the starting point of the dubbed signal using the short-time energy of the two signals. LPC cepstrum analysis and DTW (Dynamic Time Warping) are then applied to synchronize the phoneme positions of the two signals. After determining the matched points by the minimum mean-square error between the original and dubbed LPC cepstra, the SOLA method is applied to the dubbed signal to maintain phase consistency. The effectiveness of the proposed method is verified by comparing the waveforms and spectrograms of the original signal and the time-synchronized dubbed signal.
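The DTW step above aligns two feature sequences that cover the same utterance at slightly different rates. A minimal sketch of the classic dynamic-programming recursion follows; the one-dimensional "features" are illustrative stand-ins for the LPC cepstra the paper uses.

```python
# Sketch of dynamic time warping (DTW) between two feature sequences: the
# alignment step applied to the original and dubbed signals. The 1-D values
# below are illustrative; the paper aligns LPC cepstrum vectors.

def dtw_distance(a, b):
    """Classic DP over a cost matrix with (match, insert, delete) moves."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])           # local distance
            D[i][j] = cost + min(D[i - 1][j],         # insertion
                                 D[i][j - 1],         # deletion
                                 D[i - 1][j - 1])     # match
    return D[n][m]

# The dubbed take traces the same contour as the original, only stretched
# in time, so DTW finds a low-cost alignment despite the length mismatch:
original = [0.0, 1.0, 2.0, 1.0, 0.0]
dubbed = [0.0, 0.0, 1.0, 2.0, 2.0, 1.0, 0.0]
aligned_cost = dtw_distance(original, dubbed)
```

Backtracking through the same matrix would yield the frame-to-frame alignment from which the matched phoneme positions are read off before SOLA adjusts the time scale.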


Application and Technology of Voice Synthesis Engine for Music Production (음악제작을 위한 음성합성엔진의 활용과 기술)

  • Park, Byung-Kyu
    • Journal of Digital Contents Society
    • /
    • v.11 no.2
    • /
    • pp.235-242
    • /
    • 2010
  • Unlike past instruments that merely synthesized sounds and tones, voice synthesis engines for music production have reached the level of creating music as if an actual artist were singing, using samples of human voices naturally connected at the level of individual phonemes within the frequency range. The voice synthesis engine is not limited to music production; it is changing the cultural paradigm through secondary creations of new musical forms, including character concerts, media productions, albums, and mobile services. With current voice synthesis engine technology, users input pitch, lyrics, and musical-expression parameters through a score editor, and the engine mixes and connects voice samples drawn from a database to make it sing. The new musical forms derived from this development of computer music have had a substantial cultural impact. Accordingly, this paper examines specific case studies and the underlying synthesis technologies so that users can understand voice synthesis engines more easily, and thereby contributes to a greater variety of music production.