• Title/Abstract/Keywords: voice database

Search results: 96 (processing time: 0.023 s)

Voice Recognition-Based on Adaptive MFCC and Deep Learning for Embedded Systems

  • 배현수;이호진;이석규
    • 제어로봇시스템학회논문지 / Vol. 22, No. 10 / pp. 797-802 / 2016
  • This paper proposes a novel voice recognition method based on an adaptive MFCC and deep learning for embedded systems. To enhance the recognition ratio of the proposed voice recognizer, ambient noise mixed into the voice signal has to be eliminated. However, noise filtering processes, which may damage voice data, diminish the recognition ratio. In this paper, a filter has been designed for the frequency range of the voice signal, and imposed weights are used to reduce data deterioration. In addition, a deep learning algorithm, which does not require a database in the recognition algorithm, has been adapted for embedded systems, which inherently have only small amounts of memory. The experimental results suggest that the proposed deep learning algorithm and an HMM voice recognizer, both utilizing the proposed adaptive MFCC algorithm, achieve a better recognition ratio in a noisy environment than they do with conventional MFCC algorithms.

Graphic Environment & Database for Utility Management in CIM

  • 김동훈;송준엽
    • 산업공학 / Vol. 7, No. 3 / pp. 227-237 / 1994
  • In this study, a graphic environment for system monitoring is designed that can efficiently manage monitoring data. System information is also implemented in a database for reliability, and utility management software is developed to monitor systems using the graphic environment and an RDBMS (Relational DataBase Management System). In particular, system status information is presented in the form of animation, graphs, values, icons, and voice messages. Status data and general basic system information can be updated at any time and reported from the database in indexed form.


Acoustic Characteristics of the Voices of Korean Normal Adults by Gender on MDVP

  • 김재옥
    • 말소리와 음성과학 / Vol. 1, No. 4 / pp. 147-157 / 2009
  • The purpose of this study is to develop a normal voice database and to analyze the acoustic characteristics of Korean adults' voices by gender using MDVP. Eight categories in the 34 parameters of MDVP were analyzed in the voices of 170 normal Korean adults, recorded on the vowel /a/. Among them, the Fundamental Frequency Parameters and Frequency Perturbation Parameters differed significantly by gender. In addition, the Fundamental Frequency Parameters of our data were remarkably different from the data suggested in the MDVP program, which is currently used in clinics. Therefore, the data obtained in the current study can be used effectively as standard MDVP parameter values for diagnosing voice disorders in Korean adults.


Voice Portal based on SMS Authentication at CTI Module Implementation by Speech Recognition

  • 오세일;김봉현;고진환;박원배
    • 한국정보처리학회: Conference Proceedings / Proceedings of the 한국정보처리학회 2001 Spring Conference (Part 2) / pp. 1177-1180 / 2001
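  • Voice Portal services, which let users listen to Internet information over the telephone, are gaining popularity. In a Voice Portal service, the user speaks a request for the desired information to a Speech Recognition System and hears that information as speech over the telephone. Such a service requires an SMS (Short Message Service) server module that performs the authentication procedure, a CTI (Computer Telephony Integration) module that provides the interface between the PSTN and the database server, a VoiceXML module between the CTI server and the WWW (World Wide Web), and searching modules for retrieving information. This paper designs and implements a CTI module based on speech recognition technology. In addition, it adopts SMS authentication based on a random one-time password as the authentication method, with the aim of providing a more stable service.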


Feature Selection-based Voice Transformation

  • 이기승
    • 한국음향학회지 / Vol. 31, No. 1 / pp. 39-50 / 2012
  • A voice transformation (VT) method that can make the utterance of a source speaker mimic that of a target speaker is described. Speaker individuality transformation is achieved by altering three feature parameters: the LPC cepstrum, pitch period, and gain. The main objective of this study is to construct an optimal sequence of features selected from a target speaker's database, so as to maximize both the correlation probabilities between the transformed and the source features and the likelihood of the transformed features with respect to the target model. A set of two-pass conversion rules is proposed, in which the feature parameters are selected from a database in the first pass and the optimal sequence of those feature parameters is constructed in the second pass. The conversion rules were developed using a statistical approach that employed a maximum likelihood criterion. In constructing an optimal sequence of the features, a hidden Markov model (HMM) was employed to find the most likely combination of the features with respect to the target speaker's model. The effectiveness of the proposed transformation method was evaluated using objective tests and informal listening tests. We confirmed that the proposed method leads to perceptually preferred results compared with conventional methods.

Greeting, Function, and Music: How Users Chat with Voice Assistants

  • Wang, Ji;Zhang, Han;Zhang, Cen;Xiao, Junjun;Lee, Seung Hee
    • 감성과학 / Vol. 23, No. 2 / pp. 61-74 / 2020
  • The voice user interface has become a commercially viable and widespread interaction mechanism with the development of voice assistants. Despite the popularity of voice assistants, the academic community does not fully understand what, when, and how users chat with them. Chatting with a voice assistant is crucial, as it defines how a user will seek the help of the assistant in the future. This study aims to cover the essence and construct of conversational AI, to develop a classification method for user utterances, and, most importantly, to understand what, when, and how Chinese users chat with voice assistants. We collected user utterances from the real conventional database of a commercial voice assistant, NetEase Sing in China. We also identified different utterance categories on the basis of previous studies and real usage conditions and annotated the utterances with 17 labels. Furthermore, we found that the top three reasons for using voice assistants in China are the following: (1) greeting, (2) function, and (3) music. Chinese users like to interact with voice assistants at night, from 7 PM to 10 PM, and they are polite toward the assistants. The overall percentage of negative feedback utterances is less than 6%, which is considerably low. These findings appear to be useful in voice interaction designs for intelligent hardware.

Singing Voice Synthesis Using HMM Based TTS and MusicXML

  • 칸 나지브 울라;이정철
    • 한국컴퓨터정보학회논문지 / Vol. 20, No. 5 / pp. 53-63 / 2015
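  • Singing voice synthesis is the generation of a singing voice by computer from given lyrics and a musical score. HMM-based speech synthesizers, widely used in text-to-speech systems, have recently been applied to singing voice synthesis as well. However, existing implementations require collecting and training on a large singing voice database, which makes them difficult to build. In addition, existing commercial singing voice synthesis systems use a piano-roll score representation that is unfamiliar to general users, so a user interface supporting an easy-to-read standard score format is needed to make learning songs more convenient. To address these problems, this paper proposes a method that generates a singing voice by using the HMM models of an existing read-speech synthesizer and modifying the HMM model parameters through pitch and duration control methods suited to singing. It also proposes a way to implement a singing voice synthesis system that uses a MusicXML-based score editor for entering notes and lyrics as the front end and an HMM-based text-to-speech synthesizer as the back end. The singing voices synthesized with the proposed method were evaluated, and the results confirmed the feasibility of the approach.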

A Study On Male-To-Female Voice Conversion

  • 최정규;김재민;한민수
    • 한국음향학회: Conference Proceedings / Proceedings of the 한국음향학회 2000 Summer Conference, Vol. 19, No. 1 / pp. 115-118 / 2000
  • Voice conversion technology is essential for TTS systems because constructing a speech database takes much effort. In this paper, male-to-female voice conversion technology in a Korean LPC TTS system is studied. In general, the parameters for voice color conversion are categorized into acoustic and prosodic parameters. This paper adopts LSF (Line Spectral Frequency) as the acoustic parameter, and pitch period and duration as the prosodic parameters. For the voice conversion, the pitch period is shortened by half, the duration is shortened by 25%, and the LSFs are shifted linearly. The synthesized speech is then post-filtered by a bandpass filter. The proposed algorithm is simpler than other algorithms, for example, VQ- and neural-net-based methods, and it does not even require estimating formant information. The MOS (Mean Opinion Score) test shows 2.25 for naturalness and 3.2 for female closeness. In conclusion, by using the proposed algorithm, a male-to-female voice conversion system can be simply implemented with relatively successful results.
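
The parameter changes named in this abstract (pitch period halved, duration cut by 25%, LSFs shifted) can be illustrated with a minimal sketch. This is not the authors' implementation: the `FrameParams` container, the `male_to_female` function, and the `lsf_shift` value are hypothetical, and "shifted linearly" is read here as adding a constant offset, since the abstract does not give the amount or exact form. The bandpass post-filter is omitted.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class FrameParams:
    """Per-frame synthesis parameters (hypothetical container)."""
    pitch_period: float   # pitch period, e.g. in samples
    duration: float       # segment duration, e.g. in ms
    lsf: List[float]      # line spectral frequencies in radians, within (0, pi)


def male_to_female(frame: FrameParams, lsf_shift: float = 0.05) -> FrameParams:
    """Apply the three changes described in the abstract.

    lsf_shift is an assumed value; the abstract does not state the shift.
    """
    # Assumption: a constant upward offset, capped just below pi.
    shifted = [min(f + lsf_shift, math.pi - 1e-6) for f in frame.lsf]
    return FrameParams(
        pitch_period=frame.pitch_period * 0.5,  # halved period doubles F0
        duration=frame.duration * 0.75,         # shortened by 25%
        lsf=shifted,
    )
```

Halving the pitch period doubles the fundamental frequency, which is the step that moves a male voice toward a typical female pitch range.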


Design and Implementation of Vocal Interface-Inventory Management System

  • 박세진;권철홍
    • 대한음성학회: Conference Proceedings / Proceedings of the 대한음성학회 November 2002 Conference / pp. 119-122 / 2002
  • This paper focuses on building a database of commercial stocks using XML syntax and examines a way of building a system, combining XML and XSLT, that provides connectivity to client-server databases through vocal means. The use of XSLT has several advantages. Most importantly, it can transform one type of data into different formats. A vocal interface reduces the space and time limits imposed on users who are off the premises when they need an instant connection to their database. In this way, users can check information on stock lists without being constrained by those limits. PCs, PDAs, and cellular phones are some examples of mobile connections. VoiceXML is used to create vocal applications. With VoiceXML services, users can gain immediate access to data through voice input and the DTMF signals of the telephone.


An Experimental Study on Barging-In Effects for Speech Recognition Using Three Telephone Interface Boards

  • Park, Sung-Joon;Kim, Ho-Kyoung;Koo, Myoung-Wan
    • 음성과학 / Vol. 8, No. 1 / pp. 159-165 / 2001
  • In this paper, we conduct an experiment on speech recognition systems with barging-in and non-barging-in utterances. Barging-in capability, which lets a user speak voice commands while a voice announcement is playing, is one of the important elements of practical speech recognition systems. It can be realized by echo cancellation techniques based on the LMS (least-mean-square) algorithm. We use three kinds of telephone interface boards with barging-in capability, made by Dialogic Company, Natural MicroSystems Company, and Korea Telecom, respectively. A speech database was built using these three boards, and we carry out a comparative recognition experiment with it.
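
The abstract notes that barging-in can be realized by LMS-based echo cancellation. The following is a minimal sketch of that general technique, not the method used on any of the three boards. The function names, tap count, step size `mu`, and the simulated echo path are all illustrative assumptions. The filter learns how the played announcement leaks into the microphone signal and subtracts that estimate, leaving the caller's speech in the residual.

```python
import random


def simulate_echo(far_end, echo_path):
    """Convolve the announcement signal with a hypothetical echo path."""
    mic = []
    for n in range(len(far_end)):
        mic.append(sum(h * far_end[n - k]
                       for k, h in enumerate(echo_path) if n - k >= 0))
    return mic


def lms_echo_cancel(far_end, mic, num_taps=8, mu=0.05):
    """Return (residual, weights) after LMS adaptation.

    far_end: samples of the announcement being played (reference input)
    mic:     samples picked up on the line (echo plus any caller speech)
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps          # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate       # what remains after removing the echo
        # LMS update: step along the instantaneous gradient estimate.
        weights = [w + mu * error * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


if __name__ == "__main__":
    rng = random.Random(0)
    announcement = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    mic = simulate_echo(announcement, [0.5, -0.3, 0.1])
    residual, weights = lms_echo_cancel(announcement, mic)
    print("learned taps:", [round(w, 3) for w in weights[:3]])
```

In a barging-in setting, the residual rather than the raw microphone signal would be passed to the recognizer, so that the announcement itself is not recognized as a command.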
