Search | Korea Science

GMM based Speaker Identification using Pitch Information (피치 정보를 이용한 GMM 기반의 화자 식별)

Park Taesun;Hahn Minsoo
- MALSORI / no.47 / pp.121-129 / 2003
This paper describes the use of pitch information for speaker identification. The recognition system is GMM-based and uses a speech database of 4 connected Korean digits. The mean of the pitch period in voiced sections of speech is shown to be useful for discriminating between speakers. Using this feature with a Gaussian mixture model in the speaker identification system gave a marked improvement, up to 6% over the baseline Gaussian mixture model.
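As a rough illustration of the scoring step in a GMM-based speaker identifier, the sketch below evaluates a diagonal-covariance mixture log-likelihood for each enrolled speaker and returns the best-scoring one. The abstract does not say how the mean pitch period is combined with the other features; appending it as an extra feature dimension is an assumption made here, and all names and numbers (`SPEAKER_MODELS`, `spk_a`, `spk_b`, the toy parameters) are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Sum over frames of log sum_k w_k N(x | mean_k, var_k)."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(comps)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total

def identify(frames, speaker_models):
    """Return the speaker whose GMM gives the highest total log-likelihood."""
    return max(
        speaker_models,
        key=lambda name: gmm_log_likelihood(frames, speaker_models[name]),
    )

# Toy models: each component is (weight, mean, variance).
# Assumption: the last dimension holds the mean pitch period in ms.
SPEAKER_MODELS = {
    "spk_a": [(0.5, [0.0, 8.0], [1.0, 0.5]), (0.5, [1.0, 8.5], [1.0, 0.5])],
    "spk_b": [(0.5, [0.0, 5.0], [1.0, 0.5]), (0.5, [1.0, 5.5], [1.0, 0.5])],
}

test_frames = [[0.2, 5.1], [0.9, 5.4], [0.5, 5.2]]
print(identify(test_frames, SPEAKER_MODELS))
```

In this toy setup the two speakers differ only in the pitch dimension, so the decision is driven entirely by that feature. A real system would train the mixtures (for example with EM) rather than hand-set them.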

Speaker-Independent Isolated Word Recognition Using A Modified ISODATA Method (Modified ISODATA 방법을 이용한 불특정화자 단독어 인식)

Hwang, U-Geun;An, Tae-Ok;Lee, Hyeong-Jun
- The Journal of the Acoustical Society of Korea / v.6 no.4 / pp.31-43 / 1987
As a study on speaker-independent isolated word recognition, a Modified ISODATA clustering method is proposed. This method simplifies the outlier processing and the splitting procedure of the conventional ISODATA algorithm, and eliminates the lumping procedure. With this method, cluster centers can be found precisely and automatically. When the method was applied to 11 digits spoken by 10 males and 4 females, its recognition rate of $84.42\%$ for K=4 was better than the $82.5\%$ of the latest Modified K-means. From these results, we conclude that this method is the best at finding cluster centers precisely.

Recognition of Handwritten Digits Based on Neural Network and Fuzzy Inference (신경회로망과 퍼지 추론에 의한 필기체 숫자 인식)

Ko, Chang-Ryong
- Journal of the Korea Society of Computer and Information / v.16 no.10 / pp.63-71 / 2011
We present a method that corrects the recognition results of neural networks using fuzzy inference for handwritten digits with large deformations, and we verified the method experimentally. The neural networks take a long time to train and recognize 100% of the training patterns, but they do not perform well on the test patterns. We therefore apply fuzzy inference as the correction method. As a result, the recognition rate of the neural networks improved from 89.6% to 90.2%, and the false recognition rate fell from 10.4% to 9.8%. The approach especially reduced false recognition of the digits 3 and 5. We used the density of the digit to derive the fuzzy membership function in this experiment. However, because handwritten digits have varied input patterns, better recognition should be obtainable by extracting more varied characteristics and applying composite fuzzy inference. We also propose applying fuzzy inference to the matching of the input pattern, rather than applying fuzzy inference strictly.
https://doi.org/10.9708/jksci.2011.16.10.063

Line-Segment Feature Analysis Algorithm for Handwritten-Digits Data Reduction (필기체 숫자 데이터 차원 감소를 위한 선분 특징 분석 알고리즘)

Kim, Chang-Min;Lee, Woo-Beom
- KIPS Transactions on Software and Data Engineering / v.10 no.4 / pp.125-132 / 2021
As the layers of an artificial neural network deepen and the dimension of the input data increases, learning and recognition in the neural network (NN) require a large number of arithmetic operations at high speed. This study therefore proposes a data dimensionality reduction method that reduces the dimension of the NN input data. The proposed Line-segment Feature Analysis (LFA) algorithm applies a gradient-based edge detection algorithm using median filters to analyze the line-segment features of the objects in an image. For the extracted edge image, the eigenvalues corresponding to eight kinds of line segments are calculated using 3×3 or 5×5 detection filters whose coefficient values include [0, 1, 2, 4, 8, 16, 32, 64, and 128]. Two one-dimensional 256-sized data are produced by accumulating identical response values from the eigenvalues calculated with each detection filter, and the two data elements are added up. Two LFA256 data are merged to produce the 512-sized LFA512 data. In a comparative experiment on handwritten-digit recognition using the PCA technique and the AlexNet model, LFA256 and LFA512 achieved recognition rates of 98.7% and 99%, respectively.
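The abstract does not give the filter layout or the exact accumulation rule, so the following is only a loose sketch of the general idea of encoding each edge pixel's neighbourhood with power-of-two weights and accumulating the responses into 256 bins. The clockwise arrangement in `WEIGHTS_3X3`, the rule that only edge pixels contribute, and the function name `lfa_histogram` are all assumptions for illustration, not the authors' implementation.

```python
# Hypothetical 3x3 detection filter: the eight neighbours get power-of-two
# weights and the centre gets 0, so every neighbourhood maps to a code 0..255.
WEIGHTS_3X3 = [
    [1, 2, 4],
    [128, 0, 8],
    [64, 32, 16],
]

def lfa_histogram(edge, weights=WEIGHTS_3X3):
    """Accumulate neighbourhood codes of a binary edge image into 256 bins."""
    rows, cols = len(edge), len(edge[0])
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if not edge[r][c]:
                continue  # assumption: only edge pixels produce a response
            code = sum(
                weights[dr + 1][dc + 1] * edge[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            )
            hist[code] += 1
    return hist

# A 5x5 binary edge image containing one horizontal line segment.
edge_image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

hist = lfa_histogram(edge_image)
print(hist[136])  # left (128) + right (8) neighbours set on the three interior pixels
```

Whatever the image size, the output is a fixed-length 256-element vector, which is the sense in which this kind of encoding reduces the input dimension.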
https://doi.org/10.3745/KTSDE.2021.10.4.125

A study on the Speaker Recognition using the Pitch (피치계수를 이용한 화자인식에 관한 연구)

김에녹
- Journal of the Korea Computer Industry Society / v.2 no.4 / pp.471-480 / 2001
In this study, we perform a speaker recognition experiment by identifying vowels in the pronunciation of each speaker using the Adaptive Resonance Theory 2 (ART2) model. Five adult males and five adult females pronounce the digits 0 to 9. We first extract the vowels from each speaker's pronunciation, and then extract characteristic coefficients through a pitch detection algorithm, an LPC analysis, and an LPC cepstral analysis to generate the input patterns for ART2. The experimental results showed that pitch coefficients performed somewhat better than LPC or LPC cepstral coefficients.

A Study on Spoken Digits Analysis and Recognition (숫자음 분석과 인식에 관한 연구)

김득수;황철준
- Journal of Korea Society of Industrial Information Systems / v.6 no.3 / pp.107-114 / 2001
This paper describes connected digit recognition for Korean that takes acoustic features into account. The recognition rate for connected digits is usually lower than for word recognition. Therefore, speech feature parameters and acoustic features are employed to build robust digit models, and we confirmed the effect of considering acoustic features through recognition experiments. We used the KLE 4-connected-digit database and 19 continuous-distribution HMMs as PLUs (Phoneme-Like Units) based on phonetic rules. We tested two cases in the recognition experiments. In the first case, we used the usual method of constructing phoneme models with Mel-Cepstrum and Regressive Coefficients. In the second case, we used expanded feature parameters and acoustic features to construct the phoneme models. In both cases, we employed OPDP (One Pass Dynamic Programming) and FSA (Finite State Automata) for the recognition tests. When applying FSN for recognition, we applied various acoustic features. As a result, we obtained a recognition rate of 55.4% for Mel-Cepstrum and 67.4% for Mel-Cepstrum with Regressive Coefficients. We also obtained a recognition rate of 74.3% for the expanded feature parameters and 75.4% when applying acoustic features. Since applying acoustic features gave better results than the former method, we conclude that the suggested method is effective for connected digit recognition in Korean.

Isolated Digit and Command Recognition in Car Environment (자동차 환경에서의 단독 숫자음 및 명령어 인식)

양태영;신원호;김지성;안동순;이충용;윤대희;차일환
- The Journal of the Acoustical Society of Korea / v.18 no.2 / pp.11-17 / 1999
This paper proposes an observation probability smoothing technique to improve the robustness of a speech recognizer based on the discrete hidden Markov model (DHMM). In addition, appropriate noise-robust processing for the car environment is suggested on the basis of experimental results. Noisy speech is often mislabeled during the vector quantization process. To reduce the effects of such mislabelings, the proposed technique increases the observation probability of similar codewords. For noise-robust processing in the car environment, liftering on the distance measure of feature vectors, high-pass filtering, and spectral subtraction are examined. Recognition experiments were performed on 14 isolated words consisting of Korean digits and command words. The database was recorded in both a stopped car and a running car. The recognition rates of the baseline recognizer were 97.4% in the stopped condition and 59.1% in the running condition. Using the proposed observation probability smoothing technique, liftering, high-pass filtering, and spectral subtraction, the recognition rates rose to 98.3% in the stopped condition and 88.6% in the running condition.
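The abstract does not state the smoothing formula, so the sketch below shows only one plausible way to raise the observation probability of similar codewords: each codeword's probability mass is spread over nearby codewords using a distance-based kernel. The Gaussian-style kernel, the `scale` parameter, and both function names are assumptions for illustration.

```python
import math

def smoothing_weights(codebook, scale=1.0):
    """Row-normalised similarities: weights[m][k] is the share of codeword m's
    probability mass that is moved to codeword k."""
    weights = []
    for cm in codebook:
        row = [math.exp(-math.dist(cm, ck) ** 2 / scale) for ck in codebook]
        norm = sum(row)
        weights.append([w / norm for w in row])
    return weights

def smooth_observation_probs(b, weights):
    """b[j][k] = P(codeword k | state j). Returns a smoothed table of the same
    shape; each row still sums to 1 because every weights row sums to 1."""
    n_codes = len(weights)
    return [
        [sum(b_j[m] * weights[m][k] for m in range(n_codes)) for k in range(n_codes)]
        for b_j in b
    ]

# Toy 2-D codebook: codewords 0 and 1 are close together, codeword 2 is far away.
codebook = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
# One state whose training data only ever produced codeword 0.
b = [[1.0, 0.0, 0.0]]

smoothed = smooth_observation_probs(b, smoothing_weights(codebook))
print([round(p, 4) for p in smoothed[0]])
```

After smoothing, the neighbouring codeword 1 receives substantial probability while the distant codeword 2 stays near zero, so a noisy frame mislabeled as codeword 1 no longer yields a zero observation probability.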

Study on Relationship Between Spatial-Perceptual Ability and Driving-Related Situation Awareness (공간지각 능력에 따른 운전-관련 상황의 재인 및 예측에 관한 연구)

Bia Kim;Jaesik Lee
- Korean Journal of Culture and Social Issue / v.11 no.4 / pp.83-95 / 2005
The purpose of the present study was to investigate the relationship between spatial-perceptual ability and several aspects of driving-related situation awareness (in particular, recognition and prediction). Video clips of real driving were used in both the recognition and prediction tasks, and a digit calculation task performed while driving in a simulator served as a task integrating recognition and prediction. The results showed that subjects with higher spatial-perceptual ability performed better than those with lower spatial-perceptual ability in the recognition task (especially in terms of sensitivity, measured as d' from signal detection theory), in the prediction task, and in digit calculation performance.
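For reference, the sensitivity index d' from signal detection theory is the difference between the z-transformed hit rate and false-alarm rate. The sketch below computes it with Python's standard library; the example rates are made up and are not data from the study.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1, because the inverse
    normal CDF is undefined at 0 and 1.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical recognition-task rates for one participant.
print(round(d_prime(0.8, 0.2), 2))  # 1.68
```

A larger d' means old and new driving scenes are discriminated better, independently of any overall bias toward answering "yes".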

Speech Data Collection for korean Speech Recognition (한국어 음성인식을 위한 음성 데이터 수집)

Park, Jong-Ryeal;Kwon, Oh-Wook;Kim, Do-Yeong;Choi, In-Jeong;Jeong, Ho-Young;Un, Chong-Kwan
- The Journal of the Acoustical Society of Korea / v.14 no.4 / pp.74-81 / 1995
This paper describes the development of speech databases for the Korean language, which were constructed at the Communications Research Laboratory at KAIST. The procedure and environment used to construct the speech databases are presented in detail, along with the phonetic and linguistic properties of the databases. The databases were intended for use in designing and evaluating speech recognition algorithms. They consist of five different sets of speech content: trade-related continuous speech with 3,000 words, variable-length connected digits, 75 phoneme-balanced isolated words, 500 isolated Korean provincial names, and Korean A-set words.

Recognition of isolated digits using Predictive RBF Network (Predictive RBFN을 이용한 단독 숫자음 인식)

Han Hag-Yong;Kim Sang-Berm;Kim Joo-Sung;Kim Soo-Hoon;Hur Kang-In
- Proceedings of the Acoustical Society of Korea Conference / autumn / pp.71-76 / 1999
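The predictive RBFN (Radial Basis Function Network) proposed in this paper is a hybrid structure that combines an HMM with a neural network. The network uses probability distribution parameters estimated by the HMM to determine the outputs of the hidden-layer activation functions, and only the connection weights between the hidden layer and the output layer are trained within the network. The HMM-estimated probability distribution parameters were used in the predictive RBFN in two ways. The first assigns as many hidden-layer units as the number of mixtures in each HMM state; the second assigns as many hidden-layer units as the number of HMM mixtures $\times$ the number of output distributions. In the experiments, the predictive RBFN gave results $4.5\%$ to $6.5\%$ lower than those of the other methods, but it required fewer training iterations than other neural networks and greatly shortened the total training time.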