Search | Korea Science

An Implementation of Speech DB Gathering System Using VoiceXML (VoiceXML을 이용한 음성 DB 수집 시스템 구현)

Kim Dong-Hyun;Roh Yong-Wan;Hong Kwang-Seok
- Journal of Internet Computing and Services / v.6 no.1 / pp.39-50 / 2005
A speech database (DB) is a basic requirement for research in phonetics, speech recognition, speech synthesis, and related fields. The quantity and quality of the speech DB determine the performance of the system being developed, so the speech DB is an extremely important factor. With the recent development of telephone service technologies such as voice portals, the need to collect telephone speech DBs has grown. Existing IVR-based telephone speech DB collection systems were written in C/C++ or with proprietary development tools, which makes it difficult to reuse resources across application services and demands considerable labor and time. VoiceXML, by contrast, is a tag-based language built on XML with a simple, easy grammar. It can therefore be written with little effort, which reduces labor and time. VoiceXML also makes it easy to change the contents of the DB being collected, which is an advantage when gathering a variety of telephone speech DBs. In this paper, we introduce a telephone speech DB gathering system, the most important factor in developing speech information processing technology.
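Illustrative sketch (not taken from the paper): the claim that VoiceXML makes it easy to change the contents of the DB can be pictured as generating the dialog from a plain list of prompts. The Python below builds a small VoiceXML 2.0 document with one `<record>` field per prompt and a final `<submit>`. The function name, the field names `utt1`, `utt2`, ..., the timing values, and the upload URL are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_collection_dialog(prompts, submit_url):
    """Return a VoiceXML 2.0 document that records one utterance per prompt.

    prompts: list of strings the caller is asked to read aloud.
    submit_url: hypothetical server endpoint that receives the recordings.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "collect"})
    names = []
    for i, text in enumerate(prompts, start=1):
        name = f"utt{i}"  # hypothetical field name for the i-th recording
        names.append(name)
        rec = ET.SubElement(form, "record", {
            "name": name,
            "beep": "true",         # play a tone before recording starts
            "maxtime": "10s",       # assumed upper bound per utterance
            "finalsilence": "2s",   # assumed silence that ends the recording
            "type": "audio/x-wav",
        })
        prompt = ET.SubElement(rec, "prompt")
        prompt.text = f"Please say: {text}"
    # After all fields are filled, post the recordings to the server.
    block = ET.SubElement(form, "block")
    ET.SubElement(block, "submit", {
        "next": submit_url,
        "method": "post",
        "enctype": "multipart/form-data",
        "namelist": " ".join(names),
    })
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_collection_dialog(["zero", "one", "two"],
                                  "http://example.invalid/upload"))
```

Collecting a different DB then only requires a different `prompts` list; the dialog structure stays the same.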

Activities of Speech DB construction out of Countries (해외 음성 DB 구축 동향)

이용주
- Proceedings of the Acoustical Society of Korea Conference / 1995.06a / pp.253-260 / 1995
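Collecting, editing, and distributing large volumes of varied speech data that can be shared across speech information processing research is very important. For researchers and developers, such data can be used to develop and evaluate algorithms for analysis, synthesis, recognition, and so on; for users of speech recognition and synthesis systems, it allows the performance of those systems to be evaluated objectively. To inform an approach to building domestic speech DBs efficiently, this paper surveys construction trends in other countries, organized in detail by institution, type, and field.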

A Multi-speaker Speech Synthesis System Using X-vector (x-vector를 이용한 다화자 음성합성 시스템)

Jo, Min Su;Kwon, Chul Hong
- The Journal of the Convergence on Culture Technology / v.7 no.4 / pp.675-681 / 2021
With the recent growth of the AI speaker market, demand is increasing for speech synthesis technology that enables natural conversation with users, and with it the need for a multi-speaker speech synthesis system that can generate voices of various tones. Synthesizing natural speech requires training on a large-capacity, high-quality speech DB. However, collecting a high-quality, large-capacity speech database uttered by many speakers is very difficult in terms of recording time and cost. It is therefore necessary to train the speech synthesis system on a speech DB that covers a very large number of speakers with only a small amount of training data per speaker, and a technique for naturally expressing the tone and prosody of multiple speakers is required. In this paper, we propose a technology for constructing a speaker encoder by applying the deep learning-based x-vector technique used in speaker recognition, and for synthesizing a new speaker's tone from a small amount of data through the speaker encoder. In the multi-speaker speech synthesis system, the module that synthesizes a mel-spectrogram from input text is Tacotron2, and the vocoder that generates the synthesized speech is WaveNet with a mixture of logistic distributions applied. The x-vector extracted from the trained speaker embedding neural network is added to Tacotron2 as an input to express the desired speaker's tone.
https://doi.org/10.17703/JCCT.2021.7.4.675
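Illustrative sketch (not taken from the paper): the abstract does not say how the x-vector is fed into Tacotron2. One simple arrangement, assumed here purely for illustration, is to repeat the fixed-length speaker vector at every encoder time step and concatenate it to the encoder output, so the decoder sees the speaker identity in every frame. The function name and the toy dimensions are hypothetical.

```python
def condition_on_speaker(encoder_outputs, x_vector):
    """Append the same speaker vector to every encoder frame.

    encoder_outputs: list of T frames, each a list of D floats.
    x_vector: list of S floats (fixed-length speaker embedding).
    Returns T frames, each of length D + S.
    """
    return [list(frame) + list(x_vector) for frame in encoder_outputs]


# Toy values: 3 time steps, 2 encoder features, 4-dimensional speaker vector.
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
speaker = [1.0, 0.0, -1.0, 0.5]
conditioned = condition_on_speaker(frames, speaker)
print(len(conditioned), len(conditioned[0]))
```

Swapping `speaker` for another speaker's vector changes the conditioning without touching the text-side input, which is the property a multi-speaker system relies on.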

신약설계를 위한 화합물 DB-chemical Database for Drug Design-

Lee, Seong-Gwang;No, Gyeong-Tae
- Journal of Scientific & Technological Knowledge Infrastructure / s.5 / pp.41-50 / 2001
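Chemical structure DBs can be classified in various ways according to their purpose: similarity search DBs for finding compounds with similar pharmacological effects, reaction DBs for organic synthesis, property DBs that collect physical properties obtained by experiment or calculation, and activity DBs that collect biological assay data. From the standpoint of designing new drugs, such a chemical DB is best designed in an integrated way so that it can serve all of these purposes.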

Effects of Composite Floor Slab on Seismic Performance of Welded Steel Moment Connections (철골모멘트 용접접합부의 내진성능에 미치는 합성슬래브의 영향)

Lee, Cheol Ho;Jung, Jong Hyun;Kim, Jeong Jae
- Journal of Korean Society of Steel Construction / v.26 no.5 / pp.385-396 / 2014
Traditionally, domestic steel design and construction practice has provided extra shear studs to moment frame beams even when they are designed as non-composite beams. In the 1994 Northridge earthquake, connection damage initiating from the beam bottom flange side was prevalent. The upward shift of the neutral axis due to composite action between the steel beam and the floor deck was speculated to be one of the critical causes. In this study, full-scale seismic testing was conducted to investigate the side effects of composite action in steel seismic moment frames. Specimen PN700-C, designed following the domestic connection and floor deck details, exhibited a significant upward shift of the neutral axis under sagging (positive) moment, which produced high strain demand on the bottom flange, and showed poor seismic performance because of brittle fracture of the beam bottom flange at 3% story drift. Specimen DB700-C, designed with an RBS connection and with details that minimize floor composite action, exhibited superior seismic performance without any fracture or concrete crushing, almost identical to its bare steel counterpart (specimen DB700-NC). The results of this study clearly indicate that the beams and connections in seismic steel moment frames should be constructed to minimize the composite action of the floor deck where possible.
https://doi.org/10.7781/kjoss.2014.26.5.385

Common Speech Database Collection (공통음성 DB 구축)

Kim Sanghum;Oh Seungshin;Jung Ho-Young;Jeong Hyung-Bae;Kim Jeong-Se
- Proceedings of the Acoustical Society of Korea Conference / spring / pp.21-24 / 2002
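This paper describes the common speech DB construction being carried out by the ETRI Speech Information Research Center. Over a total of three years (November 2001 to October 2004), speech DBs for various uses such as speech recognition, speech synthesis, and speaker recognition are to be collected, and in 2002, the first year, a total of 14 kinds of speech DB are planned. The common speech DB was designed to take into account various channels (microphone, headset, VoIP, wired and wireless telephone networks), regions, genders, and speaking environments (office, subway, road, etc.). The utterance targets are digits, words, and sentences, and the speaking styles include spontaneous, conversational, and read speech. This paper describes the contents, construction plan, and construction schedule of the 14 kinds of common speech DB.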

Development and Evaluation of Automatic Pothole Detection Using Fully Convolutional Neural Networks (완전 합성곱 신경망을 활용한 자동 포트홀 탐지 기술의 개발 및 평가)

Chun, Chanjun;Shim, Seungbo;Kang, Sungmo;Ryu, Seung-Ki
- The Journal of The Korea Institute of Intelligent Transport Systems / v.17 no.5 / pp.55-64 / 2018
In this paper, we propose automatic pothole detection based on fully convolutional neural networks; potholes directly cause accidents that endanger drivers and damage vehicles. First, the training DB is collected with a camera installed in a vehicle driving on the road, and the model is trained as a semantic segmentation task using fully convolutional neural networks. To obtain robust performance in dark environments, we augmented the training DB by varying brightness, yielding a total of 30,000 training images. In addition, an evaluation DB of 450 images was created to verify the performance of the proposed automatic pothole detection, and four experts evaluated each image. As a result, the proposed pothole detection showed robust performance for missing.
https://doi.org/10.12815/kits.2018.17.5.55
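Illustrative sketch (not taken from the paper): brightness augmentation can be as simple as multiplying every pixel by a factor and clipping to the valid range. The abstract does not give the actual procedure or factors, so the function names, the 8-bit grayscale representation, and the factor values below are assumptions for this example.

```python
def adjust_brightness(image, factor):
    """Scale 8-bit pixel values by `factor` and clip to the range 0..255.

    image: list of rows, each a list of integer intensities.
    """
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def augment_by_brightness(image, factors=(0.25, 0.5, 1.0)):
    """Return one copy of the image per brightness factor (assumed values)."""
    return [adjust_brightness(image, f) for f in factors]


# Toy 2x3 grayscale image.
image = [[0, 100, 200],
         [40, 80, 120]]
for variant in augment_by_brightness(image):
    print(variant)
```

Factors below 1.0 darken the image, which is the direction relevant to the dark-environment robustness the abstract mentions; in a segmentation setting the label mask would be reused unchanged for each variant.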

Standardization of XML based Meta-data for Industrial Speech Databases (산업용 음성 DB 메타데이터 표준화)

Joo, Young-Hee;Hong, Ki-Hyung
- Proceedings of the KSPS conference / 2005.11a / pp.211-214 / 2005
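This paper introduces the current status of, and standardization activities for, XML-based metadata for industrial speech DBs. Building an industrial speech DB requires much time and cost, and developing high-quality speech processing systems (recognition/synthesis/verification) requires as much speech data as possible. To facilitate the sharing and reuse of speech DBs built by different organizations, standardization of industrial speech DB metadata began with requirements analysis in September 2004, and a draft was completed in March 2005. The draft standard defines the structure of speech DB metadata in XML; it does not cover standardization targets other than structure, such as speech file names, speaker identifiers, and phoneme symbols. ETRI and SiTEC [5] have already proposed a draft standard for XML-based metadata structure and content, but the structure proposed in [5] is flat and has drawbacks such as redundant content. To remedy this, the speech DB data model was designed in an object-oriented manner.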

Web based VAD using HUVOIS solution (웹으로 운용하는 음성인식 무인자동교환시스템)

KIM HEE-KYUNG
- Proceedings of the Acoustical Society of Korea Conference / autumn / pp.47-48 / 2004
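In the speech market, VoiceXML has recently made it possible to write scenarios simply, so that a variety of applications can be developed and offered as services more easily. HUVOIS-VAD is a speech recognition/synthesis VAD system whose scenarios are written in VXML on HUVOIS, a speech recognition/synthesis platform developed in-house by KT, and it is installed and in operation at a number of sites inside and outside the company. In particular, because the operation system is built on the Web, operators can inspect the operational DB and configure and manage the recognition dictionary from anywhere, while users can manage their personal DB directly and use it together with the company DB as a personal VAD. This paper describes the Web-based HUVOIS-VAD system.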

Performance change of defect classification model of rotating machinery according to noise addition and denoising process (노이즈 추가와 디노이징 처리에 따른 회전 기계설비의 결함 분류 모델 성능 변화)

Se-Hoon Lee;Sung-Soo Kim;Bi-gun Cho
- Proceedings of the Korean Society of Computer Information Conference / 2023.07a / pp.1-2 / 2023
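In this study, noise resembling that found at industrial sites was added to laboratory data collected under controlled environmental conditions. For each noise type and SNR, three kinds of image were generated (STFT log spectrogram, Mel-spectrogram, and CWT spectrogram), and the performance of a CNN defect classification model taking each image as input was examined. When mixing at SNRs of 0 dB or above, where the original data dominates, the classification results did not differ greatly from those on the original data. When mixing at SNRs of 0 dB or below, where the noise dominates, performance dropped by about 26% for STFT images at -20 dB. In addition, after denoising with Wiener filtering, the noise was removed effectively and classification performance improved.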
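Illustrative sketch (not taken from the paper): mixing noise into a clean signal at a chosen SNR usually means scaling the noise so that 10·log10(P_signal / P_noise) equals the target value in dB. The abstract does not describe the authors' mixing procedure; the function names and the toy signals below are assumptions for this example.

```python
import math


def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def scale_noise_for_snr(signal, noise, snr_db):
    """Scale `noise` so the signal-to-noise power ratio equals `snr_db` (dB)."""
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [gain * n for n in noise]


def mix_at_snr(signal, noise, snr_db):
    """Return signal plus noise, with the noise scaled to the target SNR."""
    scaled = scale_noise_for_snr(signal, noise, snr_db)
    return [s + n for s, n in zip(signal, scaled)]


def measure_snr_db(signal, noise):
    """SNR in dB between a signal and a noise sequence."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


# Toy data: a sine wave as the "clean" signal and a deterministic pseudo-noise.
signal = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noise = [((t * 37) % 11 - 5) / 5 for t in range(100)]

for target in (0, -20):
    scaled = scale_noise_for_snr(signal, noise, target)
    print(target, round(measure_snr_db(signal, scaled), 6))
```

At 0 dB the scaled noise carries the same power as the signal; at -20 dB it carries 100 times the signal power, which is the regime where the abstract reports the largest drop.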

Search Result 87, Processing Time 0.024 seconds
