• Title/Summary/Keyword: vocal tract length normalization

Search Result 39, Processing Time 0.035 seconds

A Study on Speaker Normalization using VTN

  • 손창희;손종목;배건성
    • Proceedings of the IEEK Conference / 2001.09a / pp.499-502 / 2001
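  • In this study, to reduce the degradation in speech recognition performance caused by vocal tract lengths that differ from speaker to speaker, we applied VTN (Vocal Tract Normalization) to a speech recognition system and evaluated recognition performance through address recognition experiments. We also ran recognition experiments in which VTN and CMN were applied together. To reflect inter-speaker differences in vocal tract length, the experiments applied a filter-bank-based linear warping method over 13 warping factors. Compared with the baseline recognition system, applying VTN reduced the WER (Word Error Rate) by 1.24%. Applying CMN and VTN together reduced the WER by 0.33% compared with the baseline, but increased it by 0.91% compared with VTN alone.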
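The filter-bank-based linear warping mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the warping range (0.88 to 1.12), the 24 filters, the 4 kHz band edge, and all function names are assumptions; only the count of 13 warping factors and the linear form of the warp come from the abstract.

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to the mel scale (common 2595*log10 formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(num_filters, f_low, f_high):
    """Filter center frequencies (Hz), equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(num_filters)]

def warp_linear(centers_hz, alpha, f_max):
    """Linear warp f -> alpha * f, clipped at the band edge f_max."""
    return [min(alpha * f, f_max) for f in centers_hz]

def warping_factors(lo=0.88, hi=1.12, count=13):
    """Evenly spaced candidate warping factors (range is an assumption)."""
    step = (hi - lo) / (count - 1)
    return [round(lo + i * step, 4) for i in range(count)]

factors = warping_factors()
centers = mel_center_frequencies(num_filters=24, f_low=0.0, f_high=4000.0)
# One warped filter bank per candidate factor.
warped = {alpha: warp_linear(centers, alpha, 4000.0) for alpha in factors}
```

How the best factor is chosen for each speaker is not described in the abstract, so the sketch stops at generating the candidate filter banks.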


Improving the Effectiveness of Information Retrieval Using Data Fusion Method in the Vector and Neural Network Model

  • 최성환
    • Proceedings of the Korean Society for Information Management Conference / 2001.08a / pp.137-142 / 2001
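  • This paper proposes a query expansion method that, from a data fusion perspective, uses the vector model and the neural network model to combine multiple sources of evidence: term weights, term discrimination value, entropy, and co-occurrence similarity. In the experiments, expanding queries in combination with the cosine-normalized weighting algorithm or the document-length-normalized weighting algorithm produced a larger improvement in average precision than combining with an unnormalized weighting algorithm that simply uses document frequency and inverse document frequency. In addition, when queries were expanded using various co-occurrence-based similarity measures, expansion based on cosine co-occurrence similarity outperformed the other co-occurrence similarities in both the vector model and the neural network model.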
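As a rough illustration of what cosine normalization of term weights means, the sketch below builds frequency-times-inverse-document-frequency weights and scales each document vector to unit length. The exact weighting formulas used in the paper are not given in the abstract, so the `log(N / df)` form and the function names here are assumptions.

```python
import math
from collections import Counter

def idf_table(docs):
    """Inverse document frequency log(N / df) for every term in the collection."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / df[term]) for term in df}

def cosine_normalized_weights(doc, idf):
    """Term weights for one document, divided by the vector's Euclidean norm."""
    tf = Counter(doc)
    raw = {term: tf[term] * idf.get(term, 0.0) for term in tf}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {term: (w / norm if norm > 0 else 0.0) for term, w in raw.items()}

docs = [["a", "b", "b"], ["a", "c"], ["b", "c", "c", "d"]]
idf = idf_table(docs)
weights = [cosine_normalized_weights(doc, idf) for doc in docs]
```

After normalization every non-empty document vector has length 1, so long documents no longer receive larger weights simply because they contain more term occurrences.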


Quantization Based Speaker Normalization for DHMM Speech Recognition System

  • 신옥근
    • The Journal of the Acoustical Society of Korea / v.22 no.4 / pp.299-307 / 2003
  • There have been many studies on speaker normalization, which aims to minimize the effect of a speaker's vocal tract length on the recognition performance of speaker-independent speech recognition systems. In this paper, we propose a simple linear-warping speaker normalization method built on a vector quantizer, motivated by the observation that vector quantizers can be used successfully for speaker verification. We first generate an optimal codebook that serves as the basis of the speaker normalization, and then extract the warping factor of an unknown speaker by comparing the speaker's feature vectors with the codebook. Finally, the extracted warping factor is used to linearly warp the Mel-scale filter bank used in the MFCC calculation. To test the proposed method, a series of recognition experiments was conducted on a discrete HMM with thirteen monosyllabic Korean number utterances. The results showed that the word error rate can be reduced by about 29%, and that the proposed warping factor extraction method is useful because it is simpler than other line-search warping methods.
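One plausible reading of "comparing the feature vectors and the codebook" is to pick the warping factor whose warped features have the lowest average quantization distortion against the codebook. The sketch below illustrates that idea only; it is not taken from the paper, the squared-Euclidean distortion is an assumption, and the feature vectors for each candidate factor are assumed to be computed elsewhere.

```python
def quantization_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    total = 0.0
    for v in vectors:
        total += min(sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook)
    return total / len(vectors)

def select_warping_factor(features_by_alpha, codebook):
    """Return the candidate factor whose features fit the codebook best."""
    return min(
        features_by_alpha,
        key=lambda alpha: quantization_distortion(features_by_alpha[alpha], codebook),
    )

# Toy data: a two-codeword codebook and features for three candidate factors.
codebook = [(0.0, 0.0), (1.0, 1.0)]
features_by_alpha = {
    0.9: [(0.5, 0.5), (0.4, 0.6)],
    1.0: [(0.1, 0.0), (0.9, 1.0)],
    1.1: [(0.3, 0.2), (0.7, 0.6)],
}
best = select_warping_factor(features_by_alpha, codebook)
```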

An analysis of emotional English utterances using the prosodic distance between emotional and neutral utterances

  • Yi, So-Pae
    • Phonetics and Speech Sciences / v.12 no.3 / pp.25-32 / 2020
  • An analysis of emotional English utterances covering 7 emotions (calm, happy, sad, angry, fearful, disgust, surprised) was conducted by measuring the prosodic distance between 672 emotional and 48 neutral utterances. Applying a technique proposed for an automatic evaluation model of English pronunciation to the present study of emotional utterances, we used the Euclidean distance of 3 prosodic elements (F0, intensity, and duration) extracted from emotional and neutral utterances. This paper further extended the analytical methods to include Euclidean distance normalization, z-score, and z-score normalization, resulting in 4 groups of measurement schemes (sqrF0, sqrINT, sqrDUR; norsqrF0, norsqrINT, norsqrDUR; sqrzF0, sqrzINT, sqrzDUR; norsqrzF0, norsqrzINT, norsqrzDUR). All of the results from the perceptual and acoustical analyses of emotional utterances consistently indicated that, among the 4 groups, norsqrF0, norsqrINT, and norsqrDUR, which normalize the Euclidean measurement, were more effective. According to the effect size based on the estimated distance between emotional utterances and their neutral counterparts, the greatest acoustical change in prosodic information due to emotion appeared in F0, followed by duration and intensity in descending order. Tukey's post hoc test revealed 4 homogeneous subsets (calm
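The building blocks named in the abstract (Euclidean distance, a normalized variant, and z-scores) can be sketched as below. The abstract does not define the individual schemes, so dividing by the number of points as the "normalization", the use of the population standard deviation, and the assumption that both contours are already time-aligned to the same length are all illustrative choices rather than the paper's definitions.

```python
import math
from statistics import mean, pstdev

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length contours."""
    if len(a) != len(b):
        raise ValueError("contours must have the same number of points")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def length_normalized_distance(a, b):
    """Euclidean distance divided by the number of points (assumed normalization)."""
    return euclidean_distance(a, b) / len(a)

def zscore(values):
    """Standardize a contour to zero mean and unit standard deviation."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("contour is constant; z-score is undefined")
    return [(v - m) / s for v in values]

# Hypothetical F0 contours (Hz) for a neutral and an emotional utterance.
neutral_f0 = [110.0, 120.0, 115.0, 105.0]
emotional_f0 = [150.0, 190.0, 170.0, 140.0]

raw_distance = euclidean_distance(emotional_f0, neutral_f0)
z_distance = euclidean_distance(zscore(emotional_f0), zscore(neutral_f0))
```

The z-scored distance removes differences in overall level and range, so it reflects only differences in contour shape.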

Contactless Biometric Using Thumb Image

  • Lim, Naeun;Han, Jae Hyun;Lee, Eui Chul
    • KIPS Transactions on Software and Data Engineering / v.5 no.12 / pp.671-676 / 2016
  • With fintech recently in the limelight, simple payment using biometrics on smartphones has become widely used. In this paper, we propose a new contactless biometric method that uses a thumb image and, unlike previous biometrics such as fingerprint, iris, and vein recognition, requires no additional sensors. Our method uses length, width, and skin texture information as features. To obtain them, illumination normalization, skin region segmentation, size normalization, and alignment are performed sequentially on the captured thumb image. A correlation coefficient is then calculated as the similarity measure. To analyze recognition accuracy, genuine and imposter matchings were performed. As a result, we confirmed an FAR of 1.68% at an FRR of 1.55%. Because the distribution of imposter matching scores is close to a normal distribution, our method has the advantage of a low FAR. That is, because 0% FAR can be achieved at an FRR of 15%, the proposed method is adequate for 1:1 matching in payment verification.
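The similarity measure and the two error rates quoted above can be illustrated with a short sketch. It assumes the Pearson correlation coefficient as the similarity score and a simple accept-if-score-at-least-threshold rule; the scores below are made-up toy values, not data from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def far_frr(genuine_scores, imposter_scores, threshold):
    """FAR: share of imposter scores accepted. FRR: share of genuine scores rejected."""
    far = sum(s >= threshold for s in imposter_scores) / len(imposter_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr

# Toy matching scores (hypothetical).
genuine = [0.90, 0.80, 0.95, 0.60]
imposter = [0.10, 0.30, 0.65, 0.20]

strict = far_frr(genuine, imposter, threshold=0.7)
lenient = far_frr(genuine, imposter, threshold=0.5)
```

Raising the threshold trades a lower FAR for a higher FRR, which is the trade-off behind the abstract's "0% FAR at an FRR of 15%" operating point.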

Financial Market Prediction and Improving the Performance Based on Large-scale Exogenous Variables and Deep Neural Networks

  • Cheon, Sung Gil;Lee, Ju Hong;Choi, Bum Ghi;Song, Jae Won
    • Smart Media Journal / v.9 no.4 / pp.26-35 / 2020
  • Predicting future stock prices has long been an active research topic. However, unlike general time-series data, financial time-series data pose obstacles to prediction such as non-stationarity, long-term dependence, and non-linearity. In addition, when the range of candidate data is wide, variable selection by humans is limited, so the model should be able to extract variables automatically. In this paper, we propose a 'sliding time step normalization' method that can normalize non-stationary data, an LSTM autoencoder that compresses all of the variables, and 'moving transfer learning', which divides the data into periods and performs transfer learning. The experiments show that using as many variables as possible through the neural network outperforms using only 100 major financial variables, and that using 'sliding time step normalization' to normalize the non-stationarity of the data in all sections is effective in improving performance. 'Moving transfer learning' is shown to improve performance over long test intervals by evaluating the model and performing transfer learning in the test interval at each step.

High Speed Wind Tunnel Test on the Aerodynamic Load Characteristics of Rocket Nozzle

  • Ra, Seung-Ho;Ok, Ho-Nam;Kim, In-Sun;Choi, Seong-Wook
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.32 no.9 / pp.35-40 / 2004
  • A high-speed wind tunnel test of a rocket model was performed to investigate the effect of skirt configuration on the aerodynamic load characteristics of the nozzle. The test parameters were the length and diffusing angle of the skirt. The test results showed that the gimbal actuator power could be reduced to 1/10 of that required without a skirt. The normalized test results are proposed as a database for skirt design.

Sequence Families with Good Correlation Properties, Viewed through Bent Functions and Bent Sequences

  • 정하봉
    • Review of KIISC / v.2 no.3 / pp.41-49 / 1992
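  • As is well known, sequences with good correlation properties are indispensable in radar systems, ranging systems, spread spectrum communication systems, and the code-division multiple access (CDMA) communication systems now in the spotlight. The correlation of sequences can be divided into autocorrelation and cross-correlation, depending on whether it is the correlation of a sequence with itself or between different sequences, and into periodic correlation and aperiodic correlation, depending on whether the sequences are periodic. Here, saying that sequences have good correlation means that the maximum magnitudes of the normalized autocorrelation coefficient of a sequence and of the cross-correlation coefficient between sequences are small relative to the sequence length. This paper surveys several sequence families with good periodic correlation, focusing on bent sequences, which are one family of periodic binary sequences, and on the bent functions that underlie their construction.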
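The two notions in this abstract, periodic correlation and bent functions, can be made concrete with a small sketch. It uses two standard facts rather than anything specific to the paper: the length-7 sequence 1110100 is a maximal-length sequence whose out-of-phase periodic autocorrelation is -1, and a Boolean function of n variables (n even) is bent when every Walsh transform value has magnitude 2^(n/2). The function names and the example function x1x2 + x3x4 are illustrative choices.

```python
def periodic_correlation(a, b, shift):
    """Periodic correlation of two +1/-1 sequences of equal length at a given shift."""
    n = len(a)
    return sum(a[i] * b[(i + shift) % n] for i in range(n))

def walsh_spectrum(f, n):
    """Walsh transform of a Boolean function f on n-bit integers."""
    size = 2 ** n
    spectrum = []
    for a in range(size):
        total = 0
        for x in range(size):
            dot = bin(a & x).count("1") % 2  # inner product a.x over GF(2)
            total += (-1) ** (f(x) ^ dot)
        spectrum.append(total)
    return spectrum

def is_bent(f, n):
    """True if every Walsh coefficient has magnitude 2^(n/2)."""
    if n % 2 != 0:
        return False
    target = 2 ** (n // 2)
    return all(abs(w) == target for w in walsh_spectrum(f, n))

def example_f(x):
    """f(x1, x2, x3, x4) = x1*x2 XOR x3*x4, a classic bent function."""
    x1, x2, x3, x4 = (x >> 3) & 1, (x >> 2) & 1, (x >> 1) & 1, x & 1
    return (x1 & x2) ^ (x3 & x4)

# Map bits to +1/-1 (0 -> +1, 1 -> -1) and compute the periodic autocorrelation.
bits = [1, 1, 1, 0, 1, 0, 0]
seq = [1 - 2 * b for b in bits]
autocorr = [periodic_correlation(seq, seq, s) for s in range(len(seq))]
```

Dividing each correlation value by the sequence length gives the normalized coefficient the abstract refers to; for this sequence the out-of-phase values are -1/7.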


Two-Dimensional Shape Description of Objects using The Contour Fluctuation Ratio

  • 김민기
    • Journal of Korea Multimedia Society / v.5 no.2 / pp.158-166 / 2002
  • In this paper, we propose a contour shape description method that uses the CFR (contour fluctuation ratio) feature. The CFR is the ratio of the line length to the curve length of a contour segment. The line length is the distance between the two end points of a contour segment, and the curve length is the sum of the distances between all pairs of adjacent points on the segment. Because each CFR is computed from a contour segment, the segments must be rotation and scale invariant. We obtain such segments by using interleaved contour segments whose length is proportional to the entire contour length and which are generated from all the points on the contour. The CFR can describe local or global features of the contour shape depending on the unit length of the contour segment. We therefore describe the shape of an object with a feature vector that represents the distribution of CFRs, and calculate similarity by comparing the feature vectors of segments with corresponding unit lengths. We implemented the proposed method and tested it on 165 rotated and scaled fish images of fifteen types. The experimental results show that the proposed method is not only invariant to rotation and scale but also superior to the NCCH and TRP methods in clustering power.
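The CFR definition given in the abstract translates directly into a few lines of code. The sketch below computes it for a single contour segment given as a list of (x, y) points; the function name and the error handling for degenerate segments are assumptions, and segment extraction and the histogram of CFRs are not shown.

```python
import math

def contour_fluctuation_ratio(points):
    """CFR of one contour segment: end-to-end distance over path length."""
    if len(points) < 2:
        raise ValueError("a contour segment needs at least two points")
    line_length = math.dist(points[0], points[-1])
    curve_length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    if curve_length == 0:
        raise ValueError("all points coincide; CFR is undefined")
    return line_length / curve_length

straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
corner = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
scaled_corner = [(3.0 * x, 3.0 * y) for x, y in corner]

cfr_straight = contour_fluctuation_ratio(straight)
cfr_corner = contour_fluctuation_ratio(corner)
cfr_scaled = contour_fluctuation_ratio(scaled_corner)
```

A straight segment gives a CFR of 1, and the value falls toward 0 as the segment bends more. Because it is a ratio of two lengths, scaling the segment leaves it unchanged.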


Korean isolated word recognizer using new time alignment method of speech signal

  • Nam, Myeong-U;Park, Gyu-Hong;No, Seung-Yong
    • Journal of the Institute of Electronics Engineers of Korea SP / v.38 no.5 / pp.567-575 / 2001
  • This paper proposes a new method for obtaining fixed-size parameters from voice signals of different lengths. The performance of a speech recognizer depends on how the similarity (the distance between patterns) of parameters extracted from voice signals is compared. However, variation in the voice signal and differences in speaking rate make it difficult to extract fixed-size parameters from the signal. The proposed method represents the parameters as a spectrogram and then normalizes them to a fixed size using the two-dimensional DCT (Discrete Cosine Transform). To validate the method, parameters extracted from a 32-channel auditory filter bank (which estimates auditory nerve firing probabilities) were processed by the two-dimensional DCT and used as the input to a neural network. For comparison, we also used a conventional method for solving the time alignment problem. The results show better performance and faster recognition than the conventional method in both speaker-dependent and speaker-independent isolated word recognition.
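The idea of turning a variable-length spectrogram into a fixed-size parameter block with a two-dimensional DCT can be sketched as follows. This is an illustration under assumptions, not the paper's procedure: it uses an unnormalized DCT-II scaled by 1/N, keeps only a low-order block of coefficients, and the block size (3 by 4) and function names are arbitrary choices.

```python
import math

def dct_1d(x, num_coeffs):
    """First num_coeffs DCT-II coefficients of x, scaled by 1/len(x)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)) / n
        for k in range(num_coeffs)
    ]

def fixed_size_features(spectrogram, keep_time, keep_freq):
    """2-D DCT of a frames-by-channels matrix, truncated to keep_time x keep_freq."""
    if keep_time > len(spectrogram) or keep_freq > len(spectrogram[0]):
        raise ValueError("cannot keep more coefficients than input samples")
    # DCT along the frequency axis of every frame.
    rows = [dct_1d(frame, keep_freq) for frame in spectrogram]
    # DCT along the time axis for each retained frequency coefficient.
    cols = [dct_1d([row[v] for row in rows], keep_time) for v in range(keep_freq)]
    # Reorder so the result is indexed as [time_coeff][freq_coeff].
    return [[cols[v][u] for v in range(keep_freq)] for u in range(keep_time)]

# Two toy spectrograms with 8 channels but different numbers of frames.
short_utt = [[1.0] * 8 for _ in range(5)]
long_utt = [[1.0] * 8 for _ in range(9)]

short_feat = fixed_size_features(short_utt, keep_time=3, keep_freq=4)
long_feat = fixed_size_features(long_utt, keep_time=3, keep_freq=4)
```

Both utterances produce a 3 by 4 block regardless of their length, which is what allows the result to feed a fixed-input classifier such as the neural network mentioned in the abstract.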
