• 제목/요약/키워드: Text-independent

검색결과 237건 처리시간 0.028초

음소별 GMM을 이용한 화자식별 (Speaker Identification using Phonetic GMM)

  • 권석봉;김회린
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2003년도 10월 학술대회지
    • /
    • pp.185-188
    • /
    • 2003
  • In this paper, we construct phonetic GMM for text-independent speaker identification system. The basic idea is to combine of the advantages of baseline GMM and HMM. GMM is more proper for text-independent speaker identification system. In text-dependent system, HMM do work better. Phonetic GMM represents more sophistgate text-dependent speaker model based on text-independent speaker model. In speaker identification system, phonetic GMM using HMM-based speaker-independent phoneme recognition results in better performance than baseline GMM. In addition to the method, N-best recognition algorithm used to decrease the computation complexity and to be applicable to new speakers.

  • PDF

문장 독립 화자 인증을 위한 세그멘트 단위 혼합 계층 심층신경망 (Segment unit shuffling layer in deep neural networks for text-independent speaker verification)

  • 허정우;심혜진;김주호;유하진
    • 한국음향학회지
    • /
    • 제40권2호
    • /
    • pp.148-154
    • /
    • 2021
  • 문장 독립 화자 인증 연구에서는 일반화 성능 향상을 위해 문장 정보와 독립적인 화자 특징을 추출하는 것이 필수적이다. 그렇지만 심층 신경망은 학습 데이터에 의존적이므로, 동일한 시계열 정보를 반복 학습할 경우, 화자 정보를 학습하는 대신 문장 정보에 과적합 될 수 있다. 본 논문에서는 이러한 과적합을 방지하기 위해 시간 축으로 입력층 혹은 은닉층을 분할 및 무작위 재배열하여 시계열 정보의 순서를 뒤섞는 세그멘트 단위 혼합 계층을 제안한다. 세그멘트 단위 혼합 계층은 입력층 뿐만 아니라 은닉층에도 적용이 가능하므로, 입력층에서의 일반화 기법에 비해 효과적이라 알려진 은닉층에서의 일반화 기법으로 활용이 가능하며, 기존의 데이터 증강 방법과 동시에 적용할 수도 있다. 뿐만아니라, 세그멘트의 단위 크기를 조절하여 혼합의 정도를 조절할 수도 있다. 본 논문에서는 제안한 방법을 적용하여 문장 독립 화자 인증 성능이 개선됨을 확인하였다.

모음 검출을 통한 텍스트 독립 화자인식에 관한 연구 (A Study on the Text-Independent Speaker Recognition from the Vowel Extraction)

  • 김에녹;복혁규;김형래
    • 전자공학회논문지B
    • /
    • 제31B권10호
    • /
    • pp.82-91
    • /
    • 1994
  • In this thesis, we perform the experiment of speaker recognition by identifying vowels in the pronounciation of each speaker. In detail, we extract the vowels from the pronounciation of each speaker first. From it, we check the frequency energgy of 29 channels. After changing these into fuzzy values, we employ the fuzzy inference to recognize the speaker by text-dependent and text-independent methods. For this experiment, an algorithm of extracting vowels is developed, and newly introduced parameter is the frequency energy of the 29 channels computed from the extracted vowels. It shows the features of each speakers better than existing parameters. The advanced point of this paramter is to use the reference pattern only without the help of any codebook. As a rewult, test-dependent method showed about 95.5% rate of recognition, and text-independent method showed about 94.2% rate of recognition.

  • PDF

성량제한을 적용한 어구독립 화자증명 성능향상 방안 (On a Method Which Improves Text Independent Speaker Verification Performance through Limiting Speech Production Loudness)

  • 이태승;최호진
    • 한국정보과학회:학술대회논문집
    • /
    • 한국정보과학회 2001년도 가을 학술발표논문집 Vol.28 No.2 (2)
    • /
    • pp.457-459
    • /
    • 2001
  • 지속음(continuants) 단위로 화자간 차이를 식별하는 어구독립 화자증명(text-independent speaker verification) 방식에서 입력음성의 성량을 제한하여 보다 높은 인식률을 달성할 수 있는 화자인식 방법을 제안한다.

  • PDF

Guiding Practical Text Classification Framework to Optimal State in Multiple Domains

  • Choi, Sung-Pil;Myaeng, Sung-Hyon;Cho, Hyun-Yang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제3권3호
    • /
    • pp.285-307
    • /
    • 2009
  • This paper introduces DICE, a Domain-Independent text Classification Engine. DICE is robust, efficient, and domain-independent in terms of software and architecture. Each module of the system is clearly modularized and encapsulated for extensibility. The clear modular architecture allows for simple and continuous verification and facilitates changes in multiple cycles, even after its major development period is complete. Those who want to make use of DICE can easily implement their ideas on this test bed and optimize it for a particular domain by simply adjusting the configuration file. Unlike other publically available tool kits or development environments targeted at general purpose classification models, DICE specializes in text classification with a number of useful functions specific to it. This paper focuses on the ways to locate the optimal states of a practical text classification framework by using various adaptation methods provided by the system such as feature selection, lemmatization, and classification models.

Text-independent Speaker Identification by Bagging VQ Classifier

  • Kyung, Youn-Jeong;Park, Bong-Dae;Lee, Hwang-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • 제20권2E호
    • /
    • pp.17-24
    • /
    • 2001
  • In this paper, we propose the bootstrap and aggregating (bagging) vector quantization (VQ) classifier to improve the performance of the text-independent speaker recognition system. This method generates multiple training data sets by resampling the original training data set, constructs the corresponding VQ classifiers, and then integrates the multiple VQ classifiers into a single classifier by voting. The bagging method has been proven to greatly improve the performance of unstable classifiers. Through two different experiments, this paper shows that the VQ classifier is unstable. In one of these experiments, the bias and variance of a VQ classifier are computed with a waveform database. The variance of the VQ classifier is compared with that of the classification and regression tree (CART) classifier[1]. The variance of the VQ classifier is shown to be as large as that of the CART classifier. The other experiment involves speaker recognition. The speaker recognition rates vary significantly by the minor changes in the training data set. The speaker recognition experiments involving a closed set, text-independent and speaker identification are performed with the TIMIT database to compare the performance of the bagging VQ classifier with that of the conventional VQ classifier. The bagging VQ classifier yields improved performance over the conventional VQ classifier. It also outperforms the conventional VQ classifier in small training data set problems.

  • PDF

가변어휘 인식기를 이용한 PDA상에서의 음성제어 구현 (Implementation of Voice Control on PDA using the Text Independent Vocabulary Recognizer)

  • 곽상훈;최승호;신도성;김진영
    • 대한음성학회지:말소리
    • /
    • 제43호
    • /
    • pp.57-72
    • /
    • 2002
  • The technology of speech recognition has a wide field of application. The range of such technology is spreading into mobile computing having the large amount of movement for communication equipments at the present time. Particularly, recognition in internet environment is rapidly moving into mobile environment. Because of these environments, users want the faster speed of data transmission and the lighter portable equipment for data access. That is PDA(Personal Digital Assistant). Therefore, we designed a triphone-based text independent vocabulary recognizer for the implementation of speech control in this paper. The text independent vocabulary recognizer is based on the state .joint algorithm with decision trees

  • PDF

Text-Independent Speaker Verification Using Variational Gaussian Mixture Model

  • Moattar, Mohammad Hossein;Homayounpour, Mohammad Mehdi
    • ETRI Journal
    • /
    • 제33권6호
    • /
    • pp.914-923
    • /
    • 2011
  • This paper concerns robust and reliable speaker model training for text-independent speaker verification. The baseline speaker modeling approach is the Gaussian mixture model (GMM). In text-independent speaker verification, the amount of speech data may be different for speakers. However, we still wish the modeling approach to perform equally well for all speakers. Besides, the modeling technique must be least vulnerable against unseen data. A traditional approach for GMM training is expectation maximization (EM) method, which is known for its overfitting problem and its weakness in handling insufficient training data. To tackle these problems, variational approximation is proposed. Variational approaches are known to be robust against overtraining and data insufficiency. We evaluated the proposed approach on two different databases, namely KING and TFarsdat. The experiments show that the proposed approach improves the performance on TFarsdat and KING databases by 0.56% and 4.81%, respectively. Also, the experiments show that the variationally optimized GMM is more robust against noise and the verification error rate in noisy environments for TFarsdat dataset decreases by 1.52%.

텔레매틱스 환경에서 화자인증을 이용한 VoIP기반 음성 보안통신 (VoIP-Based Voice Secure Telecommunication Using Speaker Authentication in Telematics Environments)

  • 김형국;신동
    • 한국ITS학회 논문지
    • /
    • 제10권1호
    • /
    • pp.84-90
    • /
    • 2011
  • 본 논문은 텔레매틱스 환경에서 문장독립형 화자인증을 이용한 VoIP 음성 보안통신기술을 제안한다. 보안통신을 위해 송신측에서는 화자의 음성정보로부터 생성된 공개키를 통해 음성 패킷을 암호화하여 수신측에 전송함으로써 중간자 공격에 대항한다. 수신측에서는 수신된 암호화된 음성패킷을 복호화한 후에 추출된 음성 특징과 송신측으로부터 수신받은 음성키를 비교하여 화자인증을 수행한다. 제안된 방식에서는 Gaussian Mixture Model(GMM)-supervector를 Bayesian information criterion (BIC) 방식과 Mahalanobis distance (MD) 방식을 이용한 Support Vector Machine (SVM) 커널에 적용하여 문장독립형 화자인증 정확도를 향상시켰다.

On the Study of Textual Classics and Artistic Creation - Taking Buddhist Art Dunhuang Grottoes as an Example

  • Liu Tingting
    • International Journal of Advanced Culture Technology
    • /
    • 제11권3호
    • /
    • pp.205-210
    • /
    • 2023
  • Stone cave paintings are continuous interactions as independent mediums in places such as text, images and stone cave architecture. Unlike Buddha statues, the narrative of the text always fascinates and guides the viewer to the timeliness of the image, that is, the narrative. In particular, in Buddhist art, Buddha statues are never simple images, and murals are never simple paintings. Before the Tang Dynasty, most unknown artists were artisans, and many artists still worked on murals in temples and palaces, and independent paintings such as scrolls and sides became an important form of painting after the Tang Dynasty, changing the mechanism of painting creation. In this paper, the graphic creation process prioritizes dedication and service, but we can still feel the creativity of the painters strongly. The historical resources of how to paint these paintings, the clues to the copies, and the precursor to the foreground, encourage the painters to constantly try to resemble each other and discover problems...Therefore, in this paper, it was confirmed that reinvention and creativity are very important, and that Dunhuang Buddhist art is the basis for artists' creation and the source of vitality.