• Title/Summary/Keyword: word database


Automatic Speech Database Verification Method Based on Confidence Measure

  • Kang Jeomja;Jung Hoyoung;Kim Sanghun
    • MALSORI
    • /
    • no.51
    • /
    • pp.71-84
    • /
    • 2004
  • In this paper, we propose an automatic speech database verification method (also called automatic verification) based on a confidence measure for large speech databases. This method verifies the consistency between a given transcription and the speech using the confidence measure. The automatic verification process consists of two stages: a word-level likelihood computation stage and a multi-level likelihood ratio computation stage. In the word-level likelihood computation stage, we calculate the word-level likelihood using the Viterbi decoding algorithm and produce the segment information. In the multi-level likelihood ratio computation stage, we calculate the word-level and phone-level likelihood ratios based on a confidence measure with an anti-phone model. With automatic verification we achieved about 61% error reduction, and the verification time was reduced from one month of manual work to one or two days.

  • PDF
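
Below is a minimal Python sketch of the likelihood-ratio confidence idea described in the abstract above: each phone segment is scored by the duration-normalized difference between its forced-alignment log-likelihood and an anti-phone log-likelihood, and words whose scores fall below a threshold are flagged for re-checking. The function names, the two-level thresholds, and the input segment format are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of likelihood-ratio confidence scoring for transcription
# verification; the segment layout and anti-phone scores are hypothetical
# stand-ins for a real forced-alignment front end.

def phone_confidence(phone_loglik, anti_loglik, n_frames):
    """Duration-normalized log-likelihood ratio for one phone segment."""
    return (phone_loglik - anti_loglik) / max(n_frames, 1)

def word_confidence(phone_scores):
    """Average the phone-level ratios to get a word-level confidence."""
    return sum(phone_scores) / len(phone_scores)

def verify_utterance(segments, phone_thr=-2.0, word_thr=-1.0):
    """segments: list of (word, [(loglik, anti_loglik, n_frames), ...]).
    Returns words whose confidence suggests a transcription mismatch."""
    suspects = []
    for word, phones in segments:
        if not phones:
            continue
        scores = [phone_confidence(l, a, n) for l, a, n in phones]
        if word_confidence(scores) < word_thr or min(scores) < phone_thr:
            suspects.append(word)
    return suspects
```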

Automatic Synthesis Method Using Prosody-Rich Database (대용량 운율 음성데이타를 이용한 자동합성방식)

  • 김상훈
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.08a
    • /
    • pp.87-92
    • /
    • 1998
  • In general, synthesis unit databases have been constructed by recording isolated words. In that case, each word boundary shows a typical prosodic pattern such as a falling intonation or pre-boundary lengthening. To obtain natural synthetic speech from such a database, the original speech must be artificially distorted, but that artificial processing rather results in unnatural, unintelligible synthetic speech because of the excessive prosodic modification of the speech signal. To overcome these problems, we gathered thousands of sentences for the synthesis database. To build phone-level synthesis units, we trained a speech recognizer on the recorded speech and then segmented the phone boundaries automatically. In addition, we used a laryngograph for epoch detection. From the automatically generated synthesis database, we choose the best phone and concatenate it directly without any prosody processing. To select the best phone among multiple candidates, we use prosodic information such as the break strength of word boundaries, phonetic context, cepstrum, pitch, energy, and phone duration. A pilot test gave some positive results.

  • PDF
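
The selection step described in the abstract above can be illustrated with a small sketch: each candidate phone carries prosodic features, and the candidate with the smallest weighted distance to the target context is concatenated. The feature set, weights, and data layout below are assumptions for illustration, not the cost function used in the paper.

```python
# Hedged sketch of prosody-based unit selection: the candidate phone with
# the lowest weighted feature distance to the target context wins.
# Feature names and weights are illustrative assumptions.

TARGET_WEIGHTS = {"break": 2.0, "pitch": 1.0, "energy": 0.5, "duration": 0.5}

def target_cost(candidate, target):
    """Weighted absolute difference over prosodic features."""
    return sum(w * abs(candidate[f] - target[f])
               for f, w in TARGET_WEIGHTS.items())

def select_unit(candidates, target):
    """Pick the candidate phone whose prosody best matches the target."""
    return min(candidates, key=lambda c: target_cost(c, target))

# Example: choose between two recorded units for a phrase-final target.
units = [
    {"break": 3, "pitch": 180.0, "energy": 0.7, "duration": 0.12},
    {"break": 1, "pitch": 210.0, "energy": 0.9, "duration": 0.08},
]
target = {"break": 3, "pitch": 175.0, "energy": 0.7, "duration": 0.11}
best = select_unit(units, target)
```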

The Speech Database for Large Scale Word Recognizer (Large scale word recognizer를 위한 음성 database - POW)

  • 임연자
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1995.06a
    • /
    • pp.291-294
    • /
    • 1995
  • This paper describes the POW algorithm and the resulting POW set for a large scale word recognizer. To build a speech database for a large scale word recognizer, every possible phonological phenomenon must be included in the POW set. In addition, the distribution of phonological phenomena in the POW set should be similar to that of the target population. To this end, we propose a new algorithm for extracting a POW set with the following three properties: 1. it must include every phonological phenomenon occurring in the population; 2. it must consist of a minimal set of words; 3. the distribution of phonological phenomena in the POW set must be similar to that of the population. We extracted the 5,000 highest-frequency word forms from a Korean text corpus of about three million word forms and derived a Korean POW set from them.

  • PDF
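
A greedy set-cover style sketch of the first two properties listed above (cover every phonological phenomenon with as few words as possible) is shown below; matching the population's distribution of phenomena, the third property, is omitted for brevity, and phenomena_of() is a hypothetical stand-in for a Korean phonological analyzer.

```python
# Hedged sketch of greedy phonologically optimized word (POW) selection:
# pick words that cover the most uncovered phonological phenomena first.
# phenomena_of() is a hypothetical analyzer; the paper's own algorithm
# also balances the distribution of phenomena, which this sketch omits.

def select_pow_set(word_freq, phenomena_of):
    """word_freq: {word: frequency}; phenomena_of(word) -> set of phenomena.
    Returns a small word list covering every phenomenon in the vocabulary."""
    uncovered = set()
    for w in word_freq:
        uncovered |= phenomena_of(w)
    chosen = []
    while uncovered:
        # Prefer words covering many phenomena; break ties by frequency.
        best = max(word_freq,
                   key=lambda w: (len(phenomena_of(w) & uncovered),
                                  word_freq[w]))
        gain = phenomena_of(best) & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen
```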

The Extraction of Effective Index Database from Voice Database and Information Retrieval (음성 데이터베이스로부터의 효율적인 색인데이터베이스 구축과 정보검색)

  • Park Mi-Sung
    • Journal of Korean Library and Information Science Society
    • /
    • v.35 no.3
    • /
    • pp.271-291
    • /
    • 2004
  • Information service providers such as digital libraries are increasingly asked to offer services over unstructured multimedia databases containing image, voice, and VOD/AOD data. This study examines components for voice processing, including a word-phrase generator, a syllable recoverer, a morphological analyzer, and a corrector. The suggested voice processing technique transforms a voice database into a text database and then extracts an index database from the text database. On top of this, the study suggests an information retrieval model that uses the extracted index database for voice full-text information retrieval.

  • PDF
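
A minimal sketch of the index-database step is given below: transcribed documents are tokenized into an inverted index, and a query is answered by intersecting posting sets. The trivial whitespace tokenizer stands in for the word-phrase generation, syllable recovery, and morphological analysis stages mentioned in the abstract.

```python
# Hedged sketch of building an index database from transcribed text and
# answering a query against it. tokenize() is a placeholder for the
# paper's morphological analysis pipeline.

from collections import defaultdict

def tokenize(text):
    return text.lower().split()

def build_index(docs):
    """docs: {doc_id: transcribed_text} -> {term: set of doc_ids}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = index.get(terms[0], set()).copy()
    for t in terms[1:]:
        result &= index.get(t, set())
    return result
```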

Data Input and Output of Unstructured Data of Large Capacity (대용량 비정형 데이터 자료 입력 및 출력)

  • Sim, Kyu-Cheol;Kang, Byung-Jun;Kim, Kyung-Hwan;Jung, Hoe-Kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2013.05a
    • /
    • pp.613-615
    • /
    • 2013
  • Requests to provide services over word processor files through XML have recently been increasing. In this paper, we provide a system in which a word processor file (HWP, MS-Office) is converted to an XML file, and the data the user entered in the word processor is extracted and stored in a database according to an XML mapping file created by the user. The required data can then be retrieved from the database into previously created word processor forms, so that the application program can generate a word processing document as output.

  • PDF
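
The data-input side of such a system can be sketched roughly as follows, assuming the word processor file has already been converted to XML: a user-defined mapping ties XML element paths to database columns, and the extracted values are inserted into a table. The element paths, table schema, and the use of SQLite here are illustrative assumptions only.

```python
# Hedged sketch of driving database input from a converted XML document
# via a mapping of XML paths to table columns. The element paths, table
# name, and schema are illustrative assumptions, not the paper's format.

import sqlite3
import xml.etree.ElementTree as ET

MAPPING = {            # XML path -> database column (assumed example)
    "./title": "title",
    "./author": "author",
    "./body/summary": "summary",
}

def extract_fields(xml_text, mapping=MAPPING):
    root = ET.fromstring(xml_text)
    return {col: (root.findtext(path) or "") for path, col in mapping.items()}

def store(conn, fields, table="documents"):
    cols = ", ".join(fields)
    marks = ", ".join("?" for _ in fields)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
    conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                 list(fields.values()))

conn = sqlite3.connect(":memory:")
doc = "<doc><title>Report</title><author>Kim</author><body><summary>Text</summary></body></doc>"
store(conn, extract_fields(doc))
conn.commit()
```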

Speech Recognition in the Car Noise Environment (자동차 소음 환경에서 음성 인식)

  • 김완구;차일환;윤대희
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.30B no.2
    • /
    • pp.51-58
    • /
    • 1993
  • This paper describes the development of a speaker-dependent isolated word recognizer applied to voice dialing in a car noise environment. For this purpose, several methods to improve performance under such conditions are evaluated using a database collected in a small car moving at 100 km/h. The main features of the recognizer are as follows. Endpoint detection errors can be reduced by using the magnitude of the signal inverse-filtered by an AR model of the background noise, and remaining errors can be compensated by using variants of the DTW algorithm. To remove the noise, an autocorrelation subtraction method is used with the constraint that the residual energy obtained by linear predictive analysis should be positive. By using a noise-robust distance measure, distortion of the feature vector is minimized. The speech recognizer is implemented on the Motorola DSP56001 (a 24-bit general-purpose digital signal processor). The recognition database is composed of 50 Korean names spoken by 3 male speakers. The recognition error rate of the system is reduced to 4.3% using a single reference pattern for each word and to 1.5% using two reference patterns for each word.

  • PDF
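
A rough sketch of the autocorrelation subtraction idea is shown below: the noise autocorrelation estimated from a non-speech region is subtracted from each frame's autocorrelation before further analysis. The paper constrains the residual energy from linear predictive analysis to stay positive; the sketch uses the simpler condition that the zero-lag term stays positive and backs off the subtraction otherwise, which is an assumption rather than the paper's exact rule.

```python
# Hedged sketch of autocorrelation subtraction for noise-robust analysis.
# The positive-residual-energy constraint from the paper is approximated
# here by keeping the zero-lag autocorrelation term positive.

import numpy as np

def autocorr(frame, order):
    r = np.correlate(frame, frame, mode="full")
    mid = len(frame) - 1
    return r[mid:mid + order + 1]

def denoise_autocorr(frame, noise_r, order=12, alpha=1.0):
    r = autocorr(frame, order)
    # Back off the subtraction if the energy term would go non-physical.
    for scale in (alpha, 0.5 * alpha, 0.0):
        r_clean = r - scale * noise_r[:order + 1]
        if r_clean[0] > 0:
            return r_clean
    return r

# Usage: noise_r estimated from a leading noise-only segment.
rng = np.random.default_rng(0)
noise = 0.1 * rng.standard_normal(4000)
speech_frame = np.sin(0.1 * np.arange(240)) + 0.1 * rng.standard_normal(240)
noise_r = autocorr(noise[:240], 12)
r_hat = denoise_autocorr(speech_frame, noise_r)
```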

Construction of Full-Text Database and Implementation of Service Environment for Electronic Theses and Dissertations (학위논문 전문데이터베이스 구축 및 서비스환경 구현)

  • Lee, Kyi-Ho;Kim, Jin-Suk;Yoon, Wha-Muk
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.1
    • /
    • pp.41-49
    • /
    • 2000
  • From the middle of the 1990s, most universities in Korea have asked their students to submit not only the original printed copies but also Electronic Theses and Dissertations (ETD) for master's and doctoral degrees. The ETD submitted by the students are usually produced with various word processors such as MS-Word, LaTeX, and HWP. Since there is no standard ETD format to merge these different formats yet, it is difficult to construct an integrated database that provides full-text service. In this paper, we transform three different ETD formats into a unified one, construct a full-text database, and implement a full-text retrieval system for effective search in the Internet environment.

  • PDF
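
The conversion-and-indexing pipeline can be sketched as below: format-specific converters (placeholders here for the MS-Word, LaTeX, and HWP handling) all emit the same record shape, and the records are loaded into a full-text index. SQLite's FTS5 extension is used purely as an illustrative stand-in for whatever retrieval engine the actual system employed, and assumes an SQLite build with FTS5 enabled.

```python
# Hedged sketch of unifying heterogeneous thesis formats into one record
# shape and indexing them for full-text search. The converters are
# placeholders; SQLite FTS5 stands in for the paper's retrieval system.

import sqlite3

def convert_msword(path):  return {"title": path, "body": "parsed MS-Word text"}
def convert_latex(path):   return {"title": path, "body": "parsed LaTeX text"}
def convert_hwp(path):     return {"title": path, "body": "parsed HWP text"}

CONVERTERS = {".doc": convert_msword, ".tex": convert_latex, ".hwp": convert_hwp}

def to_unified(path):
    """Dispatch on file extension and return one unified record shape."""
    ext = path[path.rfind("."):].lower()
    return CONVERTERS[ext](path)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE etd USING fts5(title, body)")
for p in ["thesis1.doc", "thesis2.tex", "thesis3.hwp"]:
    rec = to_unified(p)
    conn.execute("INSERT INTO etd (title, body) VALUES (?, ?)",
                 (rec["title"], rec["body"]))
hits = conn.execute("SELECT title FROM etd WHERE etd MATCH 'parsed'").fetchall()
```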

Utilization of Multimedia Data (멀티미디어 데이터의 활용)

  • Hwang, Hui-Jeong
    • Digital Contents
    • /
    • no.11 s.42
    • /
    • pp.64-68
    • /
    • 1996
  • Up to last month, we covered the various HTML tags needed to build a homepage and the points to keep in mind when using them. This month, we move away from the dry HTML tags and look at how to use more interesting and more versatile multimedia data in a homepage. Next month, we will introduce how to use Microsoft Internet Assistant for Word to easily build a homepage from data created in Word without entering HTML tags.

  • PDF

Sub-word Based Offline Handwritten Farsi Word Recognition Using Recurrent Neural Network

  • Ghadikolaie, Mohammad Fazel Younessy;Kabir, Ehsanolah;Razzazi, Farbod
    • ETRI Journal
    • /
    • v.38 no.4
    • /
    • pp.703-713
    • /
    • 2016
  • In this paper, we present a segmentation-based method for offline Farsi handwritten word recognition. Although most segmentation-based systems suffer from segmentation errors within the first stages of recognition, using the inherent features of the Farsi writing script, we have segmented the words into sub-words. Instead of using a single complex classifier with many (N) output classes, we have created N simple recurrent neural network classifiers, each having only true/false outputs with the ability to recognize sub-words. Through the extraction of the number of sub-words in each word, and labeling the position of each sub-word (beginning/middle/end), many of the sub-word classifiers can be pruned, and a few remaining sub-word classifiers can be evaluated during the sub-word recognition stage. The candidate sub-words are then joined together and the closest word from the lexicon is chosen. The proposed method was evaluated using the Iranshahr database, which consists of 17,000 samples of Iranian handwritten city names. The results show the high recognition accuracy of the proposed method.
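
A compact sketch of the pruning-and-matching idea is given below: lexicon entries whose sub-word count does not match the segmentation are skipped, each remaining true/false classifier scores its own sub-word, and the best-scoring lexicon word is returned. Plain callables stand in for the paper's recurrent neural network classifiers, and the scoring scheme is an illustrative assumption.

```python
# Hedged sketch of sub-word based lexicon matching: prune lexicon entries
# by sub-word count, score each sub-word with its own true/false
# classifier, and return the best-scoring word. The callables below are
# stand-ins for the paper's recurrent neural network classifiers.

def recognize(sub_word_images, lexicon, classifiers):
    """sub_word_images: list of segmented sub-word images (any features).
    lexicon: {word: [sub_word_label, ...]}.
    classifiers: {sub_word_label: callable(image) -> score in [0, 1]}."""
    n = len(sub_word_images)
    best_word, best_score = None, float("-inf")
    for word, labels in lexicon.items():
        if len(labels) != n:          # prune by sub-word count
            continue
        score = sum(classifiers[lab](img)
                    for lab, img in zip(labels, sub_word_images))
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```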