• Title/Abstract/Keywords: phonetic data

200 search results (processing time: 0.019 seconds)

A Study on Verification of Back TranScription (BTS)-based Data Construction

  • 박찬준;서재형;이설화;문현석;어수경;임희석
    • Journal of the Korea Convergence Society / Vol. 12, No. 11 / pp.109-117 / 2021
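  • The use of voice-based interfaces as a means of human-computer interaction (HCI) has been increasing recently. Accordingly, interest in post-processors that correct errors in speech recognition results is also growing. However, building a sequence-to-sequence (S2S) speech recognition post-processor requires a great deal of human labor for data construction. To mitigate the limitations of existing construction methodologies, Back TranScription (BTS), a new data construction methodology for speech recognition post-processors, was recently proposed. BTS is a technique that combines text-to-speech (TTS) and speech-to-text (STT) technology to generate a pseudo-parallel corpus. Because this methodology removes the role of the phonetic transcriptor and can generate a vast amount of training data automatically, it reduces the time and cost of data construction. Extending the existing BTS research, this paper verifies through experiments that data should be constructed with speech style and domain taken into account rather than without any criteria.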
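
The BTS round trip described in the abstract above can be sketched in a few lines: clean text is synthesized to speech and transcribed back, and each (recognizer output, original text) pair becomes one training example for the post-processor. This is only an illustration under stated assumptions. `synthesize` and `transcribe` are hypothetical stand-ins for real TTS and STT systems, and the toy implementations below are not taken from the paper.

```python
from typing import Callable, Iterable, List, Tuple


def build_bts_corpus(
    sentences: Iterable[str],
    synthesize: Callable[[str], bytes],  # hypothetical TTS: text -> audio
    transcribe: Callable[[bytes], str],  # hypothetical STT: audio -> text
) -> List[Tuple[str, str]]:
    """Return (source, target) pairs: STT output as source, clean text as target."""
    pairs = []
    for text in sentences:
        audio = synthesize(text)
        noisy = transcribe(audio)
        pairs.append((noisy, text))
    return pairs


# Toy stand-ins (assumptions) so the sketch runs without any speech engine.
def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_stt(audio: bytes) -> str:
    # Imitate typical recognizer output: lower-cased, punctuation dropped.
    text = audio.decode("utf-8").lower()
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


corpus = build_bts_corpus(["Hello, world!"], fake_tts, fake_stt)
print(corpus)  # [('hello world', 'Hello, world!')]
```

Passing the TTS and STT steps in as callables keeps the pairing logic independent of whichever speech engines are actually used.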

Implementation of Voice Control on PDA using the Text Independent Vocabulary Recognizer

  • 곽상훈;최승호;신도성;김진영
    • Journal of the Korean Society of Phonetic Sciences and Speech Technology: Malsori / No. 43 / pp.57-72 / 2002
  • Speech recognition technology has a wide range of applications, and it is now spreading into mobile computing, where communication equipment is highly mobile. In particular, recognition in the internet environment is rapidly moving into the mobile environment. In these environments, users want faster data transmission and lighter portable equipment for data access; the PDA (Personal Digital Assistant) is such a device. In this paper, we therefore designed a triphone-based text independent vocabulary recognizer to implement speech control. The text independent vocabulary recognizer is based on the state joint algorithm with decision trees


Speaker Adaptation using ICA-based Feature Transformation

  • 박만수;김회린
    • Journal of the Korean Society of Phonetic Sciences and Speech Technology: Malsori / No. 43 / pp.127-136 / 2002
  • Speaker adaptation is generally used to reduce speaker differences in speech recognition. In this work, we focus on features suited to linear regression-based speaker adaptation. These are obtained by feature transformation based on independent component analysis (ICA), and the transformation matrix is learned from speaker-independent training data. When the amount of data is small, however, the ICA-based transformation matrix estimated from a new speaker's utterances needs to be adjusted. To cope with this problem, we propose a smoothing method: a linear interpolation between the speaker-independent (SI) feature transformation matrix and the speaker-dependent (SD) feature transformation matrix. We observed that the proposed technique is effective in improving adaptation performance.
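
The smoothing step named in the abstract above, a linear interpolation between the SI and SD transformation matrices, can be illustrated as follows. This is a minimal sketch: the function name, the parameter `weight`, and the example matrices are assumptions for illustration, and the abstract does not say how the interpolation weight is chosen.

```python
from typing import List

Matrix = List[List[float]]


def smooth_transform(w_si: Matrix, w_sd: Matrix, weight: float) -> Matrix:
    """Element-wise interpolation: weight * W_SD + (1 - weight) * W_SI."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        [weight * sd + (1.0 - weight) * si for si, sd in zip(row_si, row_sd)]
        for row_si, row_sd in zip(w_si, w_sd)
    ]


# Hypothetical 2x2 matrices; real feature transforms would be much larger.
w_si = [[1.0, 0.0], [0.0, 1.0]]
w_sd = [[3.0, 2.0], [2.0, 3.0]]
smoothed = smooth_transform(w_si, w_sd, 0.25)
print(smoothed)  # [[1.5, 0.5], [0.5, 1.5]]
```

A small `weight` keeps the result close to the SI matrix, which is the behavior one would want when only a little adaptation data is available.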


Design and Implementation of Vocal Interface-Inventory Management System

  • 박세진;권철홍
    • Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings / Proceedings of the November 2002 Conference / pp.119-122 / 2002
  • This paper focuses on building a database of commercial stock using XML syntax and examines how to build a system that combines XML and XSLT to provide connectivity to client-server databases by voice. The use of XSLT has several advantages; most importantly, it can transform data from one format into others. A vocal interface eases the space and time limits imposed on users who are off the premises and need an instant connection to their database. In this way, users can check information on stock lists without being constrained by those limits. PCs, PDAs, and cellular phones are some examples of devices for mobile connection. Vocal applications are created with VoiceXML. In VoiceXML services, users gain immediate access to data through voice input and the DTMF signals of the telephone.


Three-Stage Framework for Unsupervised Acoustic Modeling Using Untranscribed Spoken Content

  • Zgank, Andrej
    • ETRI Journal / Vol. 32, No. 5 / pp.810-818 / 2010
  • This paper presents a new framework for integrating untranscribed spoken content into the acoustic training of an automatic speech recognition system. Untranscribed spoken content plays a very important role for under-resourced languages because producing manually transcribed speech databases is still a very expensive and time-consuming task. We propose two new methods as part of the training framework. The first method focuses on combining initial acoustic models using a data-driven metric. The second method proposes an improved acoustic training procedure based on unsupervised transcriptions, in which word endings are modified by broad phonetic classes. The training framework was applied to baseline acoustic models using untranscribed spoken content from parliamentary debates. We include three types of acoustic models in the evaluation: baseline, reference content, and framework content models. The best overall result, a word error rate of 18.02%, was achieved with the third type. This result demonstrates a statistically significant improvement over the baseline and reference acoustic models.
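
The abstract above reports its result as a word error rate (WER). As a reminder of what that figure measures, here is a minimal sketch of the standard definition: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name and the example sentences are illustrative assumptions, and the paper's own scoring tool is not specified here.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed part of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) against six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.2%}")
```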

Classification of Asthma Disease Using Thoracic Data

  • 문인섭;최형기;이철희;박기영;김종교
    • Journal of the Korean Society of Phonetic Sciences and Speech Technology: Malsori / No. 49 / pp.135-144 / 2004
  • In this paper, we study the classification of thoracic sounds as normal or abnormal (asthma) by analyzing sounds recorded with a thoracic sound detection system. The thoracic sound detection system stores thoracic sounds and analyzes the data. The waveform of thoracic sound resembles noise and is generated systematically by inhalation and exhalation. Therefore, to identify asthma sounds among thoracic sounds, we discriminate between normal and abnormal cases using the level crossing rate (LCR) and the spectrogram energy rate.
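
One of the two features named in the abstract above, the level crossing rate, can be sketched as the fraction of consecutive sample pairs that lie on opposite sides of a chosen amplitude level. The normalization, the function name, and the example signal are assumptions for illustration; the paper's exact definition and threshold are not given here.

```python
from typing import Sequence


def level_crossing_rate(samples: Sequence[float], level: float) -> float:
    """Fraction of consecutive sample pairs that straddle `level`."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    crossings = 0
    for a, b in zip(samples, samples[1:]):
        # Opposite signs relative to the level mean the signal crossed it.
        # Samples lying exactly on the level are not counted in this sketch.
        if (a - level) * (b - level) < 0:
            crossings += 1
    return crossings / (len(samples) - 1)


# Hypothetical signal: four crossings of level 0.5 among five sample pairs.
signal = [0.0, 1.0, -1.0, 1.0, -1.0, 0.2]
lcr = level_crossing_rate(signal, 0.5)
print(lcr)  # 0.8
```

With `level` set to zero this reduces to the familiar zero crossing rate.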


The Validation of Speech Recognition Performance Change according to the kind and established distance of the Microphone

  • 김연화;이광현;최대림;김봉완;이용주
    • Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings / Proceedings of the October 2003 Conference / pp.141-143 / 2003
  • Speech recognition performance depends on various factors. One of them is the characteristics and placement distance of the microphone used when speech data is collected. In the present experiment, test speech databases are therefore created for each microphone type and placement distance. Acoustic models are then built from these databases, and each acoustic model is evaluated on the data to determine how recognition performance depends on the microphone and its distance.


An Investigation for Design and Implementation of an Integrated Data Management System of Various Speech Corpora

  • 황경훈;정창원;김영일;김봉완;이용주
    • Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings / Proceedings of the October 2003 Conference / pp.69-72 / 2003
  • In this paper, we investigate various factors relevant to the design and implementation of an integrated management system for various speech corpora. The purpose is to manage, in one integrated system, the various kinds of speech corpora needed for speech research, including corpora constructed in different data formats. In addition, we consider ways to let users search effectively for speech corpora that meet the conditions they want and to add newly constructed corpora easily. To achieve this goal, we design a global schema that integrates the management of newly added information without changing the old speech corpora, and construct a web-based integrated management system based on the schema that can be accessed without temporal or spatial restrictions. We also show the steps by which these can be implemented and, examining the system, describe related topics for future study.


Recognition of Korean Connected Digit Telephone Speech Using the Training Data Based Temporal Filter

  • 정성윤;배건성
    • Journal of the Korean Society of Phonetic Sciences and Speech Technology: Malsori / No. 53 / pp.93-102 / 2005
  • The performance of a speech recognition system generally degrades in a telephone environment because of distortions caused by background noise and various channel characteristics. In this paper, data-driven temporal filters are investigated to improve the performance of a specific recognition task such as telephone speech. Three different temporal filtering methods are presented with recognition results for Korean connected-digit telephone speech. Filter coefficients are derived from the cepstral-domain feature vectors using principal component analysis. According to the experimental results, the proposed temporal filtering method shows slightly better performance than the previous ones.
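
The general idea of deriving temporal filter coefficients with principal component analysis, as mentioned in the abstract above, can be sketched like this: sliding windows over the time trajectory of one cepstral coefficient are treated as data vectors, and the first principal component of those windows is used as an FIR filter. This is an assumption-laden illustration, not any of the paper's three methods. The function names, the power-iteration solver, and the example trajectory are all hypothetical.

```python
import math
from typing import List, Sequence


def pca_filter_coefficients(
    trajectory: Sequence[float], taps: int, iterations: int = 200
) -> List[float]:
    """First principal component of all length-`taps` windows of the trajectory."""
    windows = [list(trajectory[i:i + taps]) for i in range(len(trajectory) - taps + 1)]
    if len(windows) < 2:
        raise ValueError("trajectory is too short for the requested number of taps")
    n = len(windows)
    mean = [sum(w[k] for w in windows) / n for k in range(taps)]
    centered = [[w[k] - mean[k] for k in range(taps)] for w in windows]
    # Sample covariance matrix of the windows (taps x taps).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(taps)]
        for a in range(taps)
    ]
    # Power iteration converges toward the dominant eigenvector of the covariance.
    vec = [1.0 / (k + 1) for k in range(taps)]
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(taps)) for a in range(taps)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            raise ValueError("trajectory has no variance")
        vec = [x / norm for x in nxt]
    return vec


def apply_filter(trajectory: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Filter the trajectory with the coefficients (valid positions only)."""
    taps = len(coeffs)
    return [
        sum(c * x for c, x in zip(coeffs, trajectory[i:i + taps]))
        for i in range(len(trajectory) - taps + 1)
    ]


# Hypothetical trajectory of one cepstral coefficient over 11 frames.
trajectory = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5, 0.0, 0.5, 1.0]
coeffs = pca_filter_coefficients(trajectory, taps=3)
filtered = apply_filter(trajectory, coeffs)
print(len(coeffs), len(filtered))
```

In practice the windows would be pooled over many training utterances, and a linear algebra library would replace the hand-written power iteration.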


A Pronunciation Analysis on Korean Point-of-Interest Data

  • 김선희
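    • Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings / Proceedings of the 2007 Joint Conference with the Korean Association of Speech Sciences / pp.91-94 / 2007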
  • This paper aims to analyze the pronunciation of Korean Point-of-Interest (POI) data, which consist of 224 sound files, from a phonological point of view, adapting the notion of the prosodic word within the framework of Intonational Phonology. Each POI word is broken down into prosodic words, which are defined as the minimal sequence of segments that can be produced as one Accentual Phrase (AP). The pronunciation of each POI word is then analyzed in terms of its prosodic words. The results show that, in most cases, a prosodic word is realized as one AP; that, in some cases, two prosodic words are pronounced as one AP; and that no cases are found where three prosodic words are realized as one AP.
