• Title/Summary/Keyword: Text-to-speech

Search Result 505, Processing Time 0.025 seconds

A Language Model based on VCCV of Sentence Speech Recognition (문장 음성 인식을 위한 VCCV기반의 언어 모델)

  • 박선희;홍광석
    • Proceedings of the IEEK Conference
    • /
    • 2003.07e
    • /
    • pp.2419-2422
    • /
    • 2003
  • To improve performance of sentence speech recognition systems, we need to consider perplexity of language model and the number of words of dictionary for increasing vocabulary size. In this paper, we propose a language model of VCCV units for sentence speech recognition. For this, we choose VCCV units as a processing units of language model and compare it with clauses and morphemes. Clauses and morphemes have many vocabulary and high perplexity. But VCCV units have small lexicon size and limited vocabulary. An advantage of VCCV units is low perplexity. This paper made language model using bigram about given text. We calculated perplexity of each language processing unit. The perplexity of VCCV units is lower than morpheme and clause.

  • PDF

Sentence design for speech recognition database

  • Zu Yiqing
    • Proceedings of the KSPS conference
    • /
    • 1996.10a
    • /
    • pp.472-472
    • /
    • 1996
  • The material of database for speech recognition should include phonetic phenomena as much as possible. At the same time, such material should be phonetically compact with low redundancy[1, 2]. The phonetic phenomena in continuous speech is the key problem in speech recognition. This paper describes the processing of a set of sentences collected from the database of 1993 and 1994 "People's Daily"(Chinese newspaper) which consist of news, politics, economics, arts, sports etc.. In those sentences, both phonetic phenometla and sentence patterns are included. In continuous speech, phonemes always appear in the form of allophones which result in the co-articulary effects. The task of designing a speech database should be concerned with both intra-syllabic and inter-syllabic allophone structures. In our experiments, there are 404 syllables, 415 inter-syllabic diphones, 3050 merged inter-syllabic triphones and 2161 merged final-initial structures in read speech. Statistics on the database from "People's Daily" gives and evaluation to all of the possible phonetic structures. In this sentence set, we first consider the phonetic balances among syllables, inter-syllabic diphones, inter-syllabic triphones and semi-syllables with their junctures. The syllabic balances ensure the intra-syllabic phenomena such as phonemes, initial/final and consonant/vowel. the rest describes the inter-syllabic jucture. The 1560 sentences consist of 96% syllables without tones(the absent syllables are only used in spoken language), 100% inter-syllabic diphones, 67% inter-syllabic triphones(87% of which appears in Peoples' Daily). There are rougWy 17 kinds of sentence patterns which appear in our sentence set. By taking the transitions between syllables into account, the Chinese speech recognition systems have gotten significantly high recognition rates[3, 4]. The following figure shows the process of collecting sentences. [people's Daily Database] -> [segmentation of sentences] -> [segmentation of word group] -> [translate the text in to Pin Yin] -> [statistic phonetic phenomena & select useful paragraph] -> [modify the selected sentences by hand] -> [phonetic compact sentence set]

  • PDF

힐베르트의 '수학 문제'에 관하여

  • 한경혜
    • Journal for History of Mathematics
    • /
    • v.16 no.4
    • /
    • pp.33-44
    • /
    • 2003
  • This article shows Hilbert's Paris address within the context of his career and the mathematics of his day. And the text of Hilbert's speech will be considered in some detail, particularly those parts of it that reflects his vision of mathematics most clearly. At the end it is attempted to show the limitations of the standard view of Hilbert as an advocate of a formalist approach to mathematics.

  • PDF

Table Structure Recognition in Images for Newspaper Reader Application for the Blind (시각 장애인용 신문 구독 프로그램을 위한 이미지에서 표 구조 인식)

  • Kim, Jee Woong;Yi, Kang;Kim, Kyung-Mi
    • Journal of Korea Multimedia Society
    • /
    • v.19 no.11
    • /
    • pp.1837-1851
    • /
    • 2016
  • Newspaper reader mobile applications using text-to-speech (TTS) function enable blind people to read newspaper contents. But, tables cannot be easily read by the reader program because most of the tables are stored as images in the contents. Even though we try to use OCR (Optical character reader) programs to recognize letters from the table images, it cannot be simply applied to the table reading function because the table structure is unknown to the readers. Therefore, identification of exact location of each table cell that contains the text of the table is required beforehand. In this paper, we propose an efficient image processing algorithm to recognize all the cells in tables by identifying columns and rows in table images. From the cell location data provided by the table column and row identification algorithm, we can generate table structure information and table reading scenarios. Our experimental results with table images found commonly in newspapers show that our cell identification approach has 100% accuracy for simple black and white table images and about 99.7% accuracy for colored and complicated tables.

A Development of Administrative Affairs Supporting System using Call Control Mode of CTI (CTI 호출 제어 방식을 이용한 행정 업무 지원 시스템의 개발)

  • 최준기;조성범;정상수;이상정
    • Journal of the Korea Society of Computer and Information
    • /
    • v.4 no.2
    • /
    • pp.46-60
    • /
    • 1999
  • Recently, CTI (Computer Telephony Integration) technology has been widely applied to various area such as video conference, file transfer, voice mail, automatic message transfer and automatic redial, integrated messaging and network fax. In this paper, an administrative affairs supporting system using call control mode of CTI is designed. To improve inefficient processing of job due to heavy calling from entrance candidates during entrance examination of a college, the system is developed. The database of the system is desigend using object modeling technique. Also, the automatic calling and response system using CTI call control mode is implemented. Especially, to interface with voice of candidates who ask whether they pass or fail the entrance examination of the college, TTS(Text To Speech) module is developed.

  • PDF

UA Tree-based Reduction of Speech DB in a Large Corpus-based Korean TTS (대용량 한국어 TTS의 결정트리기반 음성 DB 감축 방안)

  • Lee, Jung-Chul
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.7
    • /
    • pp.91-98
    • /
    • 2010
  • Large corpus-based concatenating Text-to-Speech (TTS) systems can generate natural synthetic speech without additional signal processing. Because the improvements in the natualness, personality, speaking style, emotions of synthetic speech need the increase of the size of speech DB, it is necessary to prune the redundant speech segments in a large speech segment DB. In this paper, we propose a new method to construct a segmental speech DB for the Korean TTS system based on a clustering algorithm to downsize the segmental speech DB. For the performance test, the synthetic speech was generated using the Korean TTS system which consists of the language processing module, prosody processing module, segment selection module, speech concatenation module, and segmental speech DB. And MOS test was executed with the a set of synthetic speech generated with 4 different segmental speech DBs. We constructed 4 different segmental speech DB by combining CM1(or CM2) tree clustering method and full DB (or reduced DB). Experimental results show that the proposed method can reduce the size of speech DB by 23% and get high MOS in the perception test. Therefore the proposed method can be applied to make a small sized TTS.

Vocabulary Coverage Improvement for Embedded Continuous Speech Recognition Using Knowledgebase (지식베이스를 이용한 임베디드용 연속음성인식의 어휘 적용률 개선)

  • Kim, Kwang-Ho;Lim, Min-Kyu;Kim, Ji-Hwan
    • MALSORI
    • /
    • v.68
    • /
    • pp.115-126
    • /
    • 2008
  • In this paper, we propose a vocabulary coverage improvement method for embedded continuous speech recognition (CSR) using knowledgebase. A vocabulary in CSR is normally derived from a word frequency list. Therefore, the vocabulary coverage is dependent on a corpus. In the previous research, we presented an improved way of vocabulary generation using part-of-speech (POS) tagged corpus. We analyzed all words paired with 101 among 152 POS tags and decided on a set of words which have to be included in vocabularies of any size. However, for the other 51 POS tags (e.g. nouns, verbs), the vocabulary inclusion of words paired with such POS tags are still based on word frequency counted on a corpus. In this paper, we propose a corpus independent word inclusion method for noun-, verb-, and named entity(NE)-related POS tags using knowledgebase. For noun-related POS tags, we generate synonym groups and analyze their relative importance using Google search. Then, we categorize verbs by lemma and analyze relative importance of each lemma from a pre-analyzed statistic for verbs. We determine the inclusion order of NEs through Google search. The proposed method shows better coverage for the test short message service (SMS) text corpus.

  • PDF

Automatic Speech Recognition Research at Fujitsu (후지쯔에 있어서의 음성 자동인식의 현상과 장래)

  • Nara, Yasuhiro;Kimura, Shinta;Loken-Kim, K.H.
    • The Journal of the Acoustical Society of Korea
    • /
    • v.10 no.1
    • /
    • pp.82-91
    • /
    • 1991
  • The history of automatic speech recognition research, and current and future speech products at Fujitsu are introduced here. The speech recognition research at Fujitsu started in 1970. Our research efforts have results in the production of a speaker dependent 12,000 word discrete / connected word recognizer(F2360), and a speaker independent 17 word discrete word recognizer(F2355L/S). Currently, we are working on a larger vocabulary speech recognizer, in which an input utterance will be matched with networks representing possible phonemic variations. Its application to text input is also discussed.

  • PDF

On-Line Linear Combination of Classifiers Based on Incremental Information in Speaker Verification

  • Huenupan, Fernando;Yoma, Nestor Becerra;Garreton, Claudio;Molina, Carlos
    • ETRI Journal
    • /
    • v.32 no.3
    • /
    • pp.395-405
    • /
    • 2010
  • A novel multiclassifier system (MCS) strategy is proposed and applied to a text-dependent speaker verification task. The presented scheme optimizes the linear combination of classifiers on an on-line basis. In contrast to ordinary MCS approaches, neither a priori distributions nor pre-tuned parameters are required. The idea is to improve the most accurate classifier by making use of the incremental information provided by the second classifier. The on-line multiclassifier optimization approach is applicable to any pattern recognition problem. The proposed method needs neither a priori distributions nor pre-estimated weights, and does not make use of any consideration about training/testing matching conditions. Results with Yoho database show that the presented approach can lead to reductions in equal error rate as high as 28%, when compared with the most accurate classifier, and 11% against a standard method for the optimization of linear combination of classifiers.

Development of Automatic Creating Web-Site Tool for the Blind (시각장애인용 웹사이트 자동생성 툴 개발)

  • Baek, Hyeun-Ki;Ha, Tai-Hyun
    • Journal of Digital Contents Society
    • /
    • v.8 no.4
    • /
    • pp.467-474
    • /
    • 2007
  • This paper documents the design and implementation of an automatic creating web-site tool for the blind to build their own homepage by using both voice recognition and voice mixed technology with equal ease as the non-disabled. The blind can make voice mails, schedules, address lists and bookmarks by making use of the tool. It also facilitates communication between the non-disabled with the help of their information management system. This tool converts basic commands into voice recognition, also making an offer of text-to-speech which supports voice output. In the end, the tool will remove the blind's social isolation, allowing them to enjoy the information age like the non-disabled.

  • PDF