Search | Korea Science

An Analysis on the Phoneme Duration Modeling For the Trainable TTS System (Trainable TTS System을 위한 음운 지속시간 모델링)

Seo Jiln;Lee Yanghee
- Proceedings of the Acoustical Society of Korea Conference / autumn / pp.109-112 / 2001
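To achieve natural speech synthesis in a Korean trainable TTS system, this paper statistically analyzes how phoneme duration varies with phonemic environment and part of speech. The analysis uses a sentence speech corpus of 400 sentences (6,220 eojeols; 43,701 phonemes in total: 23,899 consonants and 19,802 vowels) uttered by a single male speaker and annotated with phoneme-level segmentation, phoneme labels, spacing between eojeols, and per-phoneme part-of-speech tags for each eojeol. To predict phoneme duration more precisely, we also statistically analyze the relationships among the features that affect normalized duration, using a regression tree on normalized phoneme durations from which the effect of each phoneme's intrinsic duration has been removed. The results show that features representing grammatical properties are highly correlated with one another. When similar features have correlations close to 1, removing the ones with low predictive power did not affect the duration variation. Based on this, we present a method for selecting a minimal feature set by analyzing the correlations among features when grammatically similar features are modeled with a regression tree, and we demonstrate that the resulting regression tree model of normalized duration outperforms the regression tree model of raw duration.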
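
The abstract does not give the normalization formula. As a minimal sketch, assuming the intrinsic duration of a phoneme is removed with a per-phoneme z-score (an assumption, not the authors' stated method), normalized durations could be computed as follows. The function name and sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_durations(samples):
    """Return a z-score per sample, computed within each phoneme class.

    samples: list of (phoneme, duration_ms) pairs.
    Assumption: "intrinsic duration" is modeled by the per-phoneme mean
    and standard deviation; the source abstract does not specify this.
    """
    by_phoneme = defaultdict(list)
    for phoneme, duration in samples:
        by_phoneme[phoneme].append(duration)

    # Per-phoneme mean and population standard deviation.
    stats = {p: (mean(d), pstdev(d)) for p, d in by_phoneme.items()}

    normalized = []
    for phoneme, duration in samples:
        mu, sigma = stats[phoneme]
        # Guard against phonemes with a single sample or constant duration.
        normalized.append((duration - mu) / sigma if sigma > 0 else 0.0)
    return normalized


# Hypothetical toy data: two phonemes with different typical durations.
samples = [("a", 80.0), ("a", 120.0), ("k", 40.0), ("k", 60.0)]
z = normalize_durations(samples)
```

After this step, a long vowel and a short stop are on the same scale, so a regression tree fitted to `z` models contextual effects rather than phoneme identity.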

Applying feature normalization based on pole filtering to short-utterance speech recognition using deep neural network (심층신경망을 이용한 짧은 발화 음성인식에서 극점 필터링 기반의 특징 정규화 적용)

Han, Jaemin;Kim, Min Sik;Kim, Hyung Soon
- The Journal of the Acoustical Society of Korea / v.39 no.1 / pp.64-68 / 2020
In a conventional speech recognition system using Gaussian Mixture Model-Hidden Markov Model (GMM-HMM), the cepstral feature normalization method based on pole filtering was effective in improving the performance of recognition of short utterances in noisy environments. In this paper, the usefulness of this method for the state-of-the-art speech recognition system using Deep Neural Network (DNN) is examined. Experimental results on AURORA 2 DB show that the cepstral mean and variance normalization based on pole filtering improves the recognition performance of very short utterances compared to that without pole filtering, especially when there is a large mismatch between the training and test conditions.
https://doi.org/10.7776/ASK.2020.39.1.064
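
For orientation, plain per-utterance cepstral mean and variance normalization (CMVN) can be sketched as below. This sketch deliberately omits the pole-filtering step that the paper studies, and the function name and toy feature values are hypothetical.

```python
from statistics import mean, pstdev


def cmvn(frames, eps=1e-8):
    """Per-utterance cepstral mean and variance normalization.

    frames: list of per-frame cepstral vectors (lists of equal length).
    Each coefficient is shifted to zero mean and scaled to unit variance
    over the utterance. Pole filtering is NOT applied here.
    """
    dims = list(zip(*frames))            # one tuple per cepstral coefficient
    means = [mean(d) for d in dims]
    stds = [pstdev(d) for d in dims]
    return [
        [(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


# Hypothetical 3-frame utterance with 2 cepstral coefficients per frame.
utterance = [[1.0, 10.0], [3.0, 14.0], [5.0, 18.0]]
normalized = cmvn(utterance)
```

With very short utterances the mean and variance are estimated from few frames, which is the situation the abstract targets.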

Performance Improvement of Speech Recognition System Based on Speaker Normalization Through Linear Warping Function (선형워핑함수의 화자정규화에 의한 음성 인식시스템의 성능향상)

Choi, Seok-Yong;Chung, Kyoung-Yong;Lee, Jung-Hyun
- Proceedings of the Korea Information Processing Society Conference / 2000.10b / pp.879-882 / 2000
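Speaker-dependent speech recognition systems are known to perform better than speaker-independent systems when the training data can sufficiently model the acoustic variation among speakers. Speaker normalization techniques reduce inter-speaker variation by modifying the spectrum of the input speech. Recent successful speaker normalization algorithms integrate speaker-specific frequency warping into the signal-processing stage, but these algorithms do not use all of the acoustic features contained in the input speech. This paper uses three formant frequencies as the speaker's acoustic features and proposes a speaker normalization method that uses linear regression to define the warping function from the collected formant frequencies. Using this method improved recognition performance.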
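
A linear warping function fitted by least squares can be sketched as follows. The abstract does not say what the formants are regressed against, so this sketch assumes a set of reference formants; the function names and formant values are hypothetical.

```python
def fit_linear_warp(speaker_formants, reference_formants):
    """Fit reference ~= slope * speaker + intercept by ordinary least squares.

    Assumption: the warping function maps a speaker's formant frequencies
    onto reference formant frequencies; the abstract does not specify
    the regression target.
    """
    n = len(speaker_formants)
    mx = sum(speaker_formants) / n
    my = sum(reference_formants) / n
    sxx = sum((x - mx) ** 2 for x in speaker_formants)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(speaker_formants, reference_formants))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def warp(freq_hz, slope, intercept):
    """Apply the fitted linear warping to a frequency in Hz."""
    return slope * freq_hz + intercept


# Hypothetical F1..F3 values; the reference is exactly 1.1x the speaker.
speaker = [600.0, 1300.0, 2600.0]
reference = [660.0, 1430.0, 2860.0]
slope, intercept = fit_linear_warp(speaker, reference)
```

For this toy data the fit recovers a slope near 1.1 and an intercept near 0, so `warp(1000.0, slope, intercept)` is close to 1100 Hz.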

Gait-based Human Identification System using Eigenfeature Regularization and Extraction (고유특징 정규화 및 추출 기법을 이용한 걸음걸이 바이오 정보 기반 사용자 인식 시스템)

Lee, Byung-Yun;Hong, Sung-Jun;Lee, Hee-Sung;Kim, Eun-Tai
- Journal of the Korean Institute of Intelligent Systems / v.21 no.1 / pp.6-11 / 2011
In this paper, we propose a gait-based human identification system using eigenfeature regularization and extraction (ERE). First, a gait feature for human identification, called the gait energy image (GEI), is generated from walking sequences acquired from a camera sensor. In the training phase, a regularized transformation matrix is obtained by applying ERE to the gallery GEI dataset, and the gallery GEI dataset is projected onto the eigenspace to obtain gallery features. In the testing phase, the probe GEI dataset is projected onto the eigenspace created in the training phase, and the identity is determined with a nearest neighbor classifier. Experiments are carried out on the CASIA gait dataset A to evaluate the performance of the proposed system. Experimental results show that the proposed system outperforms previous works in terms of correct classification rate.
https://doi.org/10.5391/JKIIS.2011.21.1.6
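
The GEI and nearest neighbor steps in the abstract can be sketched as below. The ERE projection is omitted, so matching here happens directly in pixel space, which is a simplification. All names and the tiny 2x2 silhouettes are hypothetical.

```python
def gait_energy_image(silhouettes):
    """Average aligned binary silhouettes (0/1 pixels) over a gait sequence."""
    n = len(silhouettes)
    rows, cols = len(silhouettes[0]), len(silhouettes[0][0])
    return [
        [sum(s[r][c] for s in silhouettes) / n for c in range(cols)]
        for r in range(rows)
    ]


def flatten(image):
    """Turn a 2D image into a flat feature vector."""
    return [v for row in image for v in row]


def nearest_neighbor(probe, gallery):
    """Return the gallery label whose feature vector is closest to probe.

    gallery: dict mapping label -> flat feature vector.
    Squared Euclidean distance is used; the ERE projection is skipped.
    """
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return min(gallery, key=lambda label: sq_dist(probe, gallery[label]))


# Hypothetical two-frame sequence of 2x2 silhouettes.
frames = [[[0, 1], [1, 1]], [[0, 1], [0, 1]]]
gei = gait_energy_image(frames)

gallery = {
    "subject_a": [0.0, 1.0, 0.5, 1.0],
    "subject_b": [1.0, 0.0, 1.0, 0.0],
}
identity = nearest_neighbor(flatten(gei), gallery)
```

In the system described above, both the gallery and probe GEIs would first be projected onto the learned eigenspace, and the nearest neighbor search would run on those projected features.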

Normalized Recognition Method using Characteristic Vector of Speech Signal (음성의 특징벡터를 사용한 정규화 인식수법)

Choi, Jae-Seung
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2011.10a / pp.616-618 / 2011
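This paper proposes a recognition algorithm for speech recognition that extracts feature vectors from speech. The proposed method normalizes the human voice and performs recognition with a time-delay neural network. The network is trained on the input speech for a fixed period and then recognizes newly input speech. Experiments confirm the effectiveness of the algorithm in terms of speech recognition rate.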

New Shot Boundary Detection Method Using Normalization (정규화를 이용한 새로운 샷 경계 검출 방법)

Shin, Seong-Yoon;Baik, Seong-Eun;Pyo, Seong-Bae;Rhee, Yang-Won
- KSCI Review / v.15 no.1 / pp.197-201 / 2007
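Video segmentation, also called shot boundary detection, is the task of analyzing the content contained in media such as images, text, and audio by feature and classifying it by level, so that a video can be represented in a hierarchical, structured form. This paper detects shot boundaries using a local $\chi^2$-histogram comparison method, which is more robust to camera and object motion, yields more accurate results, and retains sufficient spatial information. We also present a normalization method that modifies the log function and constant used in image processing for contrast enhancement and applies them to the difference values. Finally, we present a shot boundary detection algorithm that detects boundaries based on the characteristics of general shots and abrupt shots.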
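
The histogram comparison and log normalization can be sketched as follows. The abstract does not give the formulas, so this sketch assumes one common chi-square form and the standard log transform `c * log(1 + d)`; it also compares whole-frame histograms rather than local blocks. All names and values are hypothetical.

```python
import math


def chi_square_distance(hist_a, hist_b):
    """Chi-square difference between two histograms with the same bins.

    Assumption: the symmetric form sum((a - b)^2 / (a + b)) is used;
    other variants exist and the abstract does not say which one applies.
    """
    total = 0.0
    for a, b in zip(hist_a, hist_b):
        if a + b > 0:                    # skip bins that are empty in both
            total += (a - b) ** 2 / (a + b)
    return total


def log_normalize(difference, scale=1.0):
    """Compress a difference value with a log transform.

    Assumption: the standard contrast-enhancement form
    scale * log(1 + d); the paper's modified function and constant
    are not given in the abstract.
    """
    return scale * math.log(1.0 + difference)


# Hypothetical 3-bin histograms of consecutive frames.
same = chi_square_distance([4, 4, 2], [4, 4, 2])   # identical frames
cut = chi_square_distance([8, 2, 0], [0, 2, 8])    # very different frames
```

A detector would compare the normalized differences of consecutive frames against a threshold; choosing that threshold is outside this sketch.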

Keypoint Detection Using Normalized Higher-Order Scale Space Derivatives (스케일 공간 고차 미분의 정규화를 통한 특징점 검출 기법)

Park, Jongseung;Park, Unsang
- Journal of KIISE / v.42 no.1 / pp.93-96 / 2015
The SIFT method is well known for its robustness against various image transformations and is widely used for image retrieval and matching. It extracts keypoints using scale space analysis, unlike conventional keypoint detection methods that depend only on the image space. The SIFT method has also been extended to use higher-order scale space derivatives to increase the number of keypoints detected. Detecting these additional keypoints was shown to provide a performance gain in image retrieval experiments. Herein, a sigma-based normalization method for keypoint detection using higher-order scale space derivatives is introduced.
https://doi.org/10.5626/JOK.2015.42.1.93

Nonlinear Shape Normalization Algorithms for Gray-Scale Handwritten Hangul Images (명도 한글 글씨 영상에서의 비선형 형태 정규화 알고리즘)

Kim, Sang-Yup;Kim, Dae-In;Lee, Seong-Whan
- Annual Conference on Human and Language Technology / 1996.10a / pp.98-104 / 1996
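Nonlinear shape normalization is generally used to compensate for the shape distortions that occur in handwritten characters, and the methods proposed to date operate on binary images. However, most existing character recognition systems binarize the gray-scale character images captured by a scanner, so the information loss and added noise caused by binarization accumulate in the nonlinear shape normalization step, which makes good feature extraction results difficult to expect. This study proposes nonlinear shape normalization methods for gray-scale images that minimize the information loss due to binarization and effectively compensate for the various shape distortions that occur in handwritten characters. To verify their performance objectively, the proposed methods were evaluated on criteria such as processing time and complexity. Experiments on a variety of gray-scale handwritten Hangul data confirmed that, compared with nonlinear shape normalization on binary images, the proposed methods are very effective at improving the quality of heavily distorted handwritten Hangul data.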

An Amplitude Warping Approach to Intra-Speaker Normalization for Speech Recognition (음성인식에서 화자 내 정규화를 위한 진폭 변경 방법)

Kim Dong-Hyun;Hong Kwang-Seok
- Journal of Internet Computing and Services / v.4 no.3 / pp.9-14 / 2003
Vocal tract normalization is a successful method for improving the accuracy of inter-speaker normalization. In this paper, we present an intra-speaker warping factor estimation based on pitch-altered utterances. The feature space distributions of untransformed speech from an intra-speaker pitch-altered utterance vary because of acoustic differences in the speech produced by the glottis and vocal tract. Utterance variation is of two types: frequency variation and amplitude variation. Among inter-speaker normalization methods, vocal tract normalization is a frequency normalization. We therefore also have to consider amplitude variation, and it may be possible to determine the amplitude warping factor by calculating the inverse ratio of input to reference pitch. In the recognition results, the error rate is reduced by between 0.4% and 2.3% for digit and word decoding.

Facial Feature Extraction using Nasal Masks from 3D Face Image (코 형상 마스크를 이용한 3차원 얼굴 영상의 특징 추출)

김익동;심재창
- Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.4 / pp.1-7 / 2004
This paper proposes a new method for facial feature extraction that can be used to normalize face images for 3D face recognition. 3D images are much less sensitive to the illumination source than intensity images, which makes it possible to recognize individual people. However, input face images may vary in pose through rotation, panning, and tilting. If these variations are not considered, incorrect features may be extracted, and the face recognition system then produces poor matches. It is therefore necessary to normalize an input image in size and orientation. Face image normalization generally uses geometrical facial features such as the nose, eyes, and mouth. In particular, the nose is the most prominent feature in a 3D face image. This paper therefore describes a nose feature extraction method using 3D nasal masks that are similar to the real nasal shape.

Search Result 357, Processing Time 0.024 seconds
