Integrated Search | Korea Science

한국어 공통 음성 DB구축 및 오류 검증 (Common Speech Database Collection and Validation for Communications)

이수종;김상훈;이영직
- 대한음성학회지:말소리 / No. 46 / pp. 145-157 / 2003
This paper briefly introduces the Korean common speech database, a project started in 2002 to construct a large-scale speech database. The project aims to support the R&D environment for speech technology in industry, encouraging domestic speech industries and stimulating the domestic speech technology market. In the first year, the resulting common speech database consists of 25 kinds of databases covering various recording conditions such as telephone, PC, and VoIP. The speech database is expected to be widely used for speech recognition, speech synthesis, and speaker identification. Although the database was originally corrected manually, it still retains unknown errors and human errors. To minimize the errors in the database, we tried to find them based on recognition errors and classified them into several kinds. To be more effective than the typical recognition technique, we will develop an automatic error detection method. In the future, we will try to construct new databases reflecting the needs of companies and universities.

음성 DB 부가 정보 기술방안 표준화를 위한 제안 (Standardization for Annotation Information Description of Speech Database)

김상훈;이영직;한민수
- 대한음성학회지:말소리 / No. 47 / pp. 109-120 / 2003
This paper presents the speech database standardization activities at ETRI. Recently, with government support, ETRI and SiTEC have been gathering a large speech corpus for domestic speech-related companies. First, because knowledge of speech database specifications has not been shared, the distributed speech databases have different formats. A common format is therefore needed as soon as possible, and ETRI and SiTEC are trying to find a better representation format for speech databases. Second, we introduce a new method for describing the annotation information of a speech database. As a structured description method, an XML-based description will be applied to represent the metadata of the speech database. It will be revised continuously through the speech technology standard forum during this year.

자동차용 음성 DB 구축 시스템 개발 (Database Collection System for the Automotive Environment)

권오일
- 음성과학 / Vol. 9, No. 3 / pp. 61-73 / 2002
We collect a Korean database that can be used to train a speech recognition engine for an automotive environment. This paper describes the overall trends in Korean database collection, suggests a database collection method for the speech recognition system of a car kit, and explains several conditions for collecting the database in automotive environments. Finally, we explain an effective method of Korean database collection in the automobile, the results of the database collection, and the software devised for it.

음성 DB의 메타데이타 표준화 (Meta-data Standardization of Speech Database)

김상훈
- 대한음성학회:학술대회논문집 / 대한음성학회 October 2003 Conference Proceedings / pp. 61-64 / 2003
In this paper, we introduce a new method for describing the annotation information of a speech database. As a structured description method, an XML-based description, standardized by W3C, will be applied to represent the metadata of the speech database. It will be revised continuously through the speech technology standard forum during this year.

고품질 내장형 음성합성 시스템을 위한 음성합성 DB구현 (The implementation of database for high quality Embedded Text-to-speech system)

권오일
- 대한전자공학회논문지SP / Vol. 42, No. 4 / pp. 103-110 / 2005
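The speech database is one of the most important components of a TTS system. In particular, an embedded TTS system requires a smaller database than a server-based TTS system. For this reason, compression and statistical reduction of the speech synthesis data are very important in embedded TTS systems. However, such compression and statistical reduction degrade the quality of the synthesized speech. This paper proposes a method for building the data for a high-quality embedded TTS system and verifies the synthesized speech quality through MOS tests.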

만 3-5세 유아의 한국어 음성 데이터베이스 구축 (Speech Database for 3-5 years old Korean Children)

유재권;이경옥;이경미
- 한국콘텐츠학회논문지 / Vol. 12, No. 4 / pp. 52-59 / 2012
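Children's language abilities develop rapidly between the ages of 3 and 5. Providing varied experiences suited to children's language development requires content developed for that stage. Developing such content calls for speech interfaces suited to young children, but no database targeting young children had been built for Korean. In this paper, to design objective and accurate speech data collection for Korean children aged 3 to 5, we built a speech database after selecting words appropriate to the developmental stage and identifying the types of behavioral characteristics in which young children differ from adults. The words were selected from the MCDI-K in two steps, and each child uttered each word three times. The collected speech data were tokenized into files per child and per word and built into a database. We plan to transfer the Korean children's speech database as a technology via a web page, and we expect it to play a part in developing varied content beneficial to children's language development.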
https://doi.org/10.5392/JKCA.2012.12.04.052

'Hanmal' Korean Language Diphone Database for Speech Synthesis

Chung, Hyun-Song
- 음성과학 / Vol. 12, No. 1 / pp. 55-63 / 2005
This paper introduces the 'Hanmal' Korean language diphone database for speech synthesis, which has been publicly available on the MBROLA web site since 1999 but has never been properly published in a journal. The diphone database is compatible with the MBROLA programme of high-quality multilingual speech synthesis systems. The paper explains the usefulness of the diphone database and describes its phonetic and phonological structure, showing the process of creating the text corpus. A machine-readable Korean SAMPA convention for the control data input to the MBROLA application is also suggested. Diphone concatenation and prosody manipulation are performed using the MBR-PSOLA algorithm. A set of segment duration models can be applied to the diphone synthesis of Korean.

통신망환경 한국어 공통음성 DB 구축 (Common Speech Database Collection for Telecommunications)

김상훈;박문환;김현숙
- 대한음성학회:학술대회논문집 / 대한음성학회 May 2003 Conference Proceedings / pp. 23-26 / 2003
This paper presents a common speech database collection for telecommunication applications. Over the three-year project, we will construct very large-scale speech and text databases for speech recognition, speech synthesis, and speaker identification. The common speech database takes into account various communication environments and the distribution of speakers by sex, age, and region. It consists of Korean continuous digits, isolated words, and sentences that reflect Korean phonetic coverage. In addition, it covers various speaking styles such as read speech, dialogue speech, and semi-spontaneous speech. Thanks to the common speech databases, duplicated investment of resources across the Korean speech industry can be avoided. This encourages domestic speech industries and stimulates the domestic speech technology market.

고품질 음성합성을 위한 합성 DB 구축 (Speech Database Design and Structuring for High Quality TTS)

강동규;이승훈;류원호
- 대한음성학회:학술대회논문집 / 대한음성학회 November 2002 Conference Proceedings / pp. 33-36 / 2002
As telematics services, which integrate information technologies, approach commercialization, the necessity and importance of speech technology are growing rapidly. Speech technology occupies an important position in a telematics service because it announces the start of the service and the retrieved results. Such a service must provide highly accurate speech recognition and natural synthesis of human speech in a driving environment, and this is especially true for paid services. High-quality TTS requires a speech synthesis technique that builds an optimal synthesis database and uses it efficiently. In this paper, we describe the design of the phonetically balanced sentences used for the speech database, the selection of a speaker suitable for the service, the methods for extracting accurate phoneme boundaries, and the factors taken into consideration in the prosody extraction stage. Finally, we present a real case that has been commercially implemented.

대용량 운율 음성데이타를 이용한 자동합성방식 (Automatic Synthesis Method Using Prosody-Rich Database)

김상훈
- 한국음향학회:학술대회논문집 / 한국음향학회 1998, 15th Workshop on Speech Communication and Signal Processing (KSCSP 98, Vol. 15, No. 1) / pp. 87-92 / 1998
In general, synthesis unit databases have been constructed by recording isolated words. In that case, each word boundary has a typical prosodic pattern such as falling intonation or pre-boundary lengthening. To obtain natural synthetic speech from this kind of database, the original speech must be artificially distorted. However, that artificial process instead results in unnatural, unintelligible synthetic speech because of excessive prosodic modification of the speech signal. To overcome these problems, we gathered thousands of sentences for the synthesis database. To make phone-level synthesis units, we trained a speech recognizer on the recorded speech and then segmented phone boundaries automatically. In addition, we used a laryngograph for epoch detection. From the automatically generated synthesis database, we chose the best phone and concatenated it directly without any prosody processing. To select the best phone among multiple candidates, we used prosodic information such as the break strength of word boundaries, phonetic context, cepstrum, pitch, energy, and phone duration. The pilot test gave some positive results.
