• Title/Summary/Keyword: 음성데이터베이스


Faster User Enrollment for Neural Speaker Verification Systems (신경망 기반 화자증명 시스템에서 더욱 향상된 사용자 등록속도)

  • Lee, Tae-Seung;Park, Sung-Won;Hwang, Byong-Won
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2003.10a / pp.1021-1026 / 2003
  • While multilayer perceptrons (MLPs) have great potential for speaker verification, they suffer from slow learning. To appeal to users, MLP-based speaker verification systems must achieve a reasonable enrollment speed, which depends entirely on fast learning of the MLPs. Two previous studies addressed the problem of real-time enrollment on such systems, and each met its objective. In this paper, the two methods are combined and applied to such a system, on the assumption that each operates on a different optimization principle. Experiments with an MLP-based speaker verification system using the combined method on a real speech database verify the feasibility of the combination.


Rapid Speaker Adaptation Based on Eigenvoice Using Weight Distribution Characteristics (가중치 분포 특성을 이용한 Eigenvoice 기반 고속화자적응)

  • 박종세;김형순;송화전
    • The Journal of the Acoustical Society of Korea / v.22 no.5 / pp.403-407 / 2003
  • Recently, the eigenvoice approach has been widely used for rapid speaker adaptation. However, even with the eigenvoice approach, the performance improvement obtained with a very small amount of adaptation data is small compared with that obtained with a somewhat larger amount, because it is difficult to estimate the eigenvoice weights reliably. In this paper, we propose a rapid speaker adaptation method based on eigenvoice that uses the weight distribution characteristics to improve performance with small amounts of adaptation data. In experiments on a vocabulary-independent word recognition task (using the PBW 452 database), the weight threshold method alleviates the relatively low performance with very small amounts of adaptation data. When a single adaptation word is used, the word error rate is reduced by about 9-18% with the weight threshold method.

Sound event classification using deep neural network based transfer learning (깊은 신경망 기반의 전이학습을 이용한 사운드 이벤트 분류)

  • Lim, Hyungjun;Kim, Myung Jong;Kim, Hoirin
    • The Journal of the Acoustical Society of Korea / v.35 no.2 / pp.143-148 / 2016
  • Deep neural networks, which effectively capture the characteristics of data, have been widely used in various applications. However, the available sound databases are often too small to train a deep neural network properly, which results in overfitting. In this paper, we propose a transfer learning framework that can effectively train a deep neural network even with insufficient sound event data by employing abundant speech or music data. A series of experimental results verifies that the proposed method performs significantly better than a baseline deep neural network trained only on the small sound event data.

Design and Implementation of A Web Based Medical Image System for Telemedicine (원격진료를 위한 인터넷 기반의 의료영상시스템 설계 및 구현)

  • Lee, Su-Jin;Kim, Moon-Hae
    • Proceedings of the Korea Information Processing Society Conference / 2002.11a / pp.813-816 / 2002
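  • With the rapid spread of computers and the emergence of multimedia, information such as text, images, voice, audio, and video has come to be digitized and then stored, processed, and transmitted by computer, which has demanded considerable changes in working practices in the medical field as well. Along with this rapid opening-up of the medical field, the development of high-speed information and communication is giving rise to a further demand: telemedicine. Meeting it urgently requires building a comprehensive multimedia medical information system that integrates multimedia technology, database technology for storing large volumes of information, and high-speed broadband technology. For these reasons, this paper analyzes the requirements of, designs, and implements a medical image system for telemedicine that lets medical staff at hospitals and clinics exchange medical images and data in order to check patients' treatment or examination results and to seek expert advice. The system has a client/server architecture and provides the main functions of image acquisition and output; image storage in DICOM, the international standard format for medical images; image analysis such as MCA (Multi Channel Analyzer) and ROI (Region Of Interest); and various image processing operations such as filtering and image zoom in/out and rotation. It has an intuitive, icon-centered interface so that users can operate it conveniently and easily.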


Identification and Detection of Emotion Using Probabilistic Output SVM (확률출력 SVM을 이용한 감정식별 및 감정검출)

  • Cho, Hoon-Young;Jung, Gue-Jun
    • The Journal of the Acoustical Society of Korea / v.25 no.8 / pp.375-382 / 2006
  • This paper addresses how to identify emotional information in speech signals and how to detect a specific emotion from them. For the emotion identification and detection tasks, we use long-term acoustic feature parameters and select the optimal parameters using a feature selection technique based on the F-score. We transform the conventional SVM into a probabilistic output SVM for our emotion identification and detection system. We propose three approximation methods for the log-likelihoods in a hypothesis test and compare their performance. Experimental results on the SUSAS database show the effectiveness of both the feature selection and the probabilistic output SVM in the emotion identification task. The proposed methods detected anger with 91.3% correctness.
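
The abstract above names F-score-based feature selection without defining it. As a rough illustration only, the sketch below assumes the common two-class F-score (between-class separation of the feature means divided by the sum of the within-class sample variances); the paper's exact definition may differ, and the function names `f_score` and `rank_features` are hypothetical.

```python
def f_score(pos, neg):
    """Two-class F-score of one feature (assumed definition, not taken from the paper).

    pos, neg: lists of that feature's values in the two classes (>= 2 values each).
    """
    n_pos, n_neg = len(pos), len(neg)
    mean_pos = sum(pos) / n_pos
    mean_neg = sum(neg) / n_neg
    mean_all = (sum(pos) + sum(neg)) / (n_pos + n_neg)
    # Numerator: how far each class mean lies from the overall mean.
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Denominator: sum of the unbiased sample variances within each class.
    within = (sum((x - mean_pos) ** 2 for x in pos) / (n_pos - 1)
              + sum((x - mean_neg) ** 2 for x in neg) / (n_neg - 1))
    return between / within


def rank_features(x_pos, x_neg):
    """Return feature indices sorted from most to least discriminative."""
    n_features = len(x_pos[0])
    scores = [f_score([row[i] for row in x_pos], [row[i] for row in x_neg])
              for i in range(n_features)]
    return sorted(range(n_features), key=lambda i: scores[i], reverse=True)


# Toy data: feature 0 separates the classes, feature 1 does not.
x_pos = [[1.0, 0.0], [1.2, 1.0], [0.8, 2.0]]
x_neg = [[3.0, 0.1], [3.2, 1.1], [2.8, 1.9]]
ranking = rank_features(x_pos, x_neg)
```

Keeping only the top-ranked indices from `ranking` gives a reduced feature set. A feature with zero variance in both classes would cause a division by zero here and would need separate handling.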

A Study on the Optimization of State Tying Acoustic Models using Mixture Gaussian Clustering (혼합 가우시안 군집화를 이용한 상태공유 음향모델 최적화)

  • Ann, Tae-Ock
    • Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.6 / pp.167-176 / 2005
  • This paper describes how to optimize the decision-tree-based state tying model, one of the acoustic models used for speech recognition, by reducing the number of mixture Gaussians in the output probability distribution. State tying modeling uses a finite set of questions, which can incorporate phonological knowledge, together with likelihood-based decision criteria. The recognition rate can be improved by increasing the number of mixture Gaussians in the output probability distribution. In this paper, we reduce the number of mixture Gaussians at the point of highest recognition rate by clustering the Gaussians. The Bhattacharyya and Euclidean distances are used as the distance measures for clustering. A new Gaussian is created by calculating the mean and variance of the pair with the smallest distance, so its parameters are derived from those of the Gaussians it replaces. Experiments were performed on the STOCKNAME (1,680) database. The results show that the proposed method with the Bhattacharyya distance measure maintains a recognition rate of $97.2\%$ while reducing the ratio of the number of mixture Gaussians by $1.0\%$, and the method with the Euclidean distance measure maintains a recognition rate of $96.9\%$ while reducing the ratio by $1.0\%$. The methods can therefore optimize the state tying model.
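
To make the clustering step in the abstract above concrete, here is a minimal sketch of one merge iteration. It assumes diagonal-covariance Gaussians and a weight-preserving moment-matching merge; the abstract does not state the exact merge formula, so `merge` is an assumption, and the dictionary layout and function names are hypothetical.

```python
import math
from itertools import combinations


def bhattacharyya(g1, g2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    dist = 0.0
    for m1, v1, m2, v2 in zip(g1["mean"], g1["var"], g2["mean"], g2["var"]):
        avg_var = 0.5 * (v1 + v2)
        dist += 0.125 * (m1 - m2) ** 2 / avg_var              # mean-separation term
        dist += 0.5 * math.log(avg_var / math.sqrt(v1 * v2))  # variance-mismatch term
    return dist


def merge(g1, g2):
    """Merge two weighted Gaussians by moment matching (an assumed merge rule)."""
    w = g1["weight"] + g2["weight"]
    a, b = g1["weight"] / w, g2["weight"] / w
    mean = [a * m1 + b * m2 for m1, m2 in zip(g1["mean"], g2["mean"])]
    # New variance = weighted second moment minus the squared new mean.
    var = [a * (v1 + m1 ** 2) + b * (v2 + m2 ** 2) - m ** 2
           for m1, v1, m2, v2, m in zip(g1["mean"], g1["var"],
                                        g2["mean"], g2["var"], mean)]
    return {"weight": w, "mean": mean, "var": var}


def merge_closest(gaussians):
    """Replace the closest pair (needs at least two components) by its merge."""
    i, j = min(combinations(range(len(gaussians)), 2),
               key=lambda p: bhattacharyya(gaussians[p[0]], gaussians[p[1]]))
    rest = [g for k, g in enumerate(gaussians) if k not in (i, j)]
    return rest + [merge(gaussians[i], gaussians[j])]


mixture = [
    {"weight": 0.3, "mean": [0.0], "var": [1.0]},
    {"weight": 0.3, "mean": [0.1], "var": [1.0]},
    {"weight": 0.4, "mean": [5.0], "var": [1.0]},
]
reduced = merge_closest(mixture)  # the two components near 0 are merged
```

Calling `merge_closest` repeatedly shrinks the mixture one component at a time. Swapping `bhattacharyya` for a Euclidean distance between the means would give the second variant the abstract mentions.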

A study on end-to-end speaker diarization system using single-label classification (단일 레이블 분류를 이용한 종단 간 화자 분할 시스템 성능 향상에 관한 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea / v.42 no.6 / pp.536-543 / 2023
  • Speaker diarization, which labels "who spoke when?" in speech with multiple speakers, has been studied with deep neural network-based end-to-end methods in order to handle overlapping speech and to optimize diarization models. Most deep neural network-based end-to-end speaker diarization systems treat the task as a multi-label classification problem that predicts the labels of all speakers active in each frame of speech. However, the performance of a multi-label model varies greatly depending on the threshold setting. This paper studies a speaker diarization system using single-label classification so that diarization can be performed without thresholds. The proposed model converts the speaker labels into a single label and estimates the labels from the model output. To account for speaker label permutations during training, the proposed model uses a combination of Permutation Invariant Training (PIT) loss and cross-entropy loss. In addition, how to add residual connection structures to the model is studied so that deep diarization models can be trained effectively. The experiments used simulated noise data for two speakers generated from the LibriSpeech database. Compared with the baseline model in terms of Diarization Error Rate (DER), the proposed method can label without a threshold and improves performance by about 20.7%.
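
The abstract above does not spell out how speaker labels are converted into a single label. One plausible reading, shown here purely as an assumption, is a powerset-style encoding: each combination of active speakers in a frame becomes one class, so the frame decision is an argmax rather than a per-speaker threshold. All function names are hypothetical.

```python
def to_single_label(active):
    """Map per-speaker activity bits, e.g. (1, 0), to one class index.

    For two speakers: 0 = silence, 1 = speaker 1 only,
    2 = speaker 2 only, 3 = overlap.
    """
    return sum(bit << i for i, bit in enumerate(active))


def to_multi_label(cls, n_speakers):
    """Inverse mapping: class index back to per-speaker activity bits."""
    return tuple((cls >> i) & 1 for i in range(n_speakers))


def decode_frame(class_scores, n_speakers):
    """Pick the highest-scoring class for a frame; no threshold is involved."""
    best = max(range(len(class_scores)), key=class_scores.__getitem__)
    return to_multi_label(best, n_speakers)


# A frame whose "overlap" class has the highest score decodes to both speakers active.
frame_decision = decode_frame([0.1, 0.2, 0.1, 0.6], n_speakers=2)
```

With this encoding a standard cross-entropy loss over the classes applies directly. The number of classes grows as 2^n with the number of speakers, which is manageable for the two-speaker setting the abstract describes.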

Home Health Care Service Using Routine Vital Sign Checkup and Electronic Health Questionnaires (주기적인 생리변수 측정과 전자건강설문을 이용한 재택건강관리서비스)

  • 박승훈;우응제;이광호;김종철
    • Journal of Biomedical Engineering Research / v.22 no.5 / pp.469-477 / 2001
  • In this paper, we describe a home health care service using electronic health questionnaires and routine checkups of vital signs including ECG (Electrocardiography), blood pressure, and SpO$_2$ (Oxygen Saturation). The system is intended for patients at home with chronic diseases, discharged patients, or healthy people seeking to prevent disease. The service requires a home health care terminal and a PC with an Internet connection installed at the patient's home. The remote health care management center is equipped with a vital-sign and questionnaire interpreter as well as database, Web, and notification servers with UMS (Unified Messaging System). Participating physicians can access the servers at the center at any time using a Web browser running on a PC available to them. These components are linked together through various kinds of data and voice communication channels including PSTN (Public Switched Telephone Network), CATV (Community Antenna TV), the Internet, and mobile communication networks. Following the physician's directions, the patient uses the home health care terminal to collect vital signs and fill out the questionnaire. When the terminal automatically transmits these data to the management center, the data interpreter and servers at the center process the information following the protocol implemented on the system. Physicians can retrieve and review the data for their patients and send their diagnostic reports back to the center. The UMS at the center delivers the physician's recommendation to the corresponding patient through the notification server. Patients can also retrieve and review their own records as well as the diagnostic reports from physicians. The system provides a new way of collecting diagnostic information and delivering doctors' recommendations to patients at home for their health management. Future work is needed on new technology for measuring and interpreting various vital signs.


Design and Implementation of user centric pavilion information guide system based on commercial mobile device (모바일 기기 기반 사용자 중심형 전시관 정보 안내 시스템의 설계 및 구현)

  • Yun Hyun-Joo;Bu So-Young;Choi Yoo-Joo
    • Journal of the Korea Society of Computer and Information / v.11 no.2 s.40 / pp.187-199 / 2006
  • This paper proposes the design of a user-centric pavilion information guide system based on a mobile device such as a PDA, whose interface elements are composed as the user wants. The proposed guide system is convenient to carry because it runs on a PDA, and the liquid browser system it uses displays all the data on a small screen, in contrast with other browser systems. In addition, the interface elements can be recomposed through user interaction. The system can effectively provide detailed information about the exhibited objects as various media data such as text, image, voice, music, and video. The proposed system consists of a media database server, a mobile system control server, and a mobile system interface that accepts user interaction and displays the information. The components are networked over TCP/IP and use XML (Extensible Markup Language) and Java 2 Micro Edition so that data can be updated. This makes the system widely usable, since it can be loaded on any mobile device.


Transfer Dictionary for A Token Based Transfer Driven Korean-Japanese Machine Translation (토큰기반 변환중심 한일 기계번역을 위한 변환사전)

  • Yang Seungweon
    • Journal of Korea Society of Industrial Information Systems / v.9 no.3 / pp.64-70 / 2004
  • Korean and Japanese have the same sentence structure because they belong to the same language family, so transfer-driven machine translation is the most efficient way to translate between them. This paper introduces a method for creating a transfer dictionary for Token Based Transfer Driven Korean-Japanese Machine Translation (TB-TDMT). If the transfer dictionaries are well constructed, the needless effort of traditional parsing can be avoided by performing shallow parsing. The semi-parser builds a dependency tree containing the minimum information needed by the output generation module. We constructed the transfer dictionaries using a corpus obtained from the ETRI spoken language database. Our system was tested with 900 utterances collected from the travel planning domain. The success rate of our system is $92\%$ in the restricted testing environment and $81\%$ in the unrestricted testing environment.
