Integrated Search | Korea Science

A Study on the Impact of Speech Data Quality on Speech Recognition Models

Yeong-Jin Kim;Hyun-Jong Cha;Ah Reum Kang
- 한국컴퓨터정보학회논문지 / Vol. 29, No. 1 / pp. 41-49 / 2024
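Speech recognition technology is advancing steadily and is widely used across many fields. To examine how speech data quality affects speech recognition models, this study split the data into the full dataset and the subset with the top 70% SNR, transcribed each with Seamless M4T and Google Cloud Speech-to-Text, and evaluated the transcriptions using Levenshtein distance. In the experiments, Seamless M4T scored 13.6 on the high-SNR (signal-to-noise ratio) data, lower than its score of 16.6 on the full dataset. Google Cloud Speech-to-Text, however, scored 8.3 on the full dataset, lower than its score on the high-SNR data. This suggests that using high-SNR data has an effect when training a new speech recognition model, and that the Levenshtein distance algorithm can serve as one of the metrics for evaluating speech recognition models.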
https://doi.org/10.9708/jksci.2024.29.01.041
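
The evaluation described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the `(utterance, snr)` tuple layout, and the choice of character-level distance are assumptions, since the abstract does not say whether the reported scores are character-level, word-level, or averaged.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def top_percent_by_snr(samples, percent=70):
    """Keep the `percent` of samples with the highest SNR.

    `samples` is a hypothetical list of (utterance_id, snr_db) pairs.
    """
    ranked = sorted(samples, key=lambda s: s[1], reverse=True)
    return ranked[: len(ranked) * percent // 100]
```

A lower distance between the reference transcript and the model output means a closer match, which is consistent with the abstract treating lower scores as better.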

방송 축구 영상으로부터 3차원 애니메이션 변환을 위한 축구 선수 동작 인식 (Pose Recognition of Soccer Players for Three Dimensional Animation)

장원철;남시욱;김재희
- 대한전자공학회: Conference Proceedings / Proceedings of the 대한전자공학회 2000 Fall Conference (4) / pp. 33-36 / 2000
To create a more realistic soccer game derived from TV images, we are developing an image synthesis system that generates a 3D image sequence from TV images. We propose a method for team and pose recognition of players in TV images. The approach comprises a camera calibration method, a team recognition method, and a pose recognition method. To locate a player on the field, a field model is constructed and the player's field position is obtained by a transformation based on four feature points. To recognize a player's team, we compute the RGB mean values and standard deviations of the player's region in the TV images. Finally, to recognize a player's pose, the system computes the player's velocity and height-to-width ratio. Experimental results are included to evaluate the performance of team and pose recognition.
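
The team-recognition step in this abstract (per-channel RGB means and standard deviations) can be sketched as below. The nearest-reference classification rule, the function names, and the sample pixel values are assumptions for illustration; the abstract does not state how the color statistics are compared.

```python
import math
from statistics import mean, pstdev


def color_signature(pixels):
    """Per-channel mean and standard deviation of (R, G, B) pixels.

    Returns a 6-tuple: (mean_r, mean_g, mean_b, std_r, std_g, std_b).
    """
    channels = list(zip(*pixels))
    means = tuple(mean(c) for c in channels)
    stds = tuple(pstdev(c) for c in channels)
    return means + stds


def nearest_team(signature, team_signatures):
    """Assumed rule: pick the team whose reference signature is
    closest in Euclidean distance."""
    return min(team_signatures,
               key=lambda team: math.dist(signature, team_signatures[team]))
```

In use, one reference signature per team would be built from labeled player regions, and each detected player would be assigned to the closest one.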

BRISK 기반의 눈 영상을 이용한 사람 인식 (Person Recognition using Ocular Image based on BRISK)

김민기
- 한국멀티미디어학회논문지 / Vol. 19, No. 5 / pp. 881-889 / 2016
The ocular region has recently emerged as a biometric trait that overcomes the limitations of iris recognition when high user cooperation cannot be expected, because acquiring an ocular image, unlike an iris image, requires neither high user cooperation nor close-range capture. This study proposes a new method for ocular image recognition based on BRISK (binary robust invariant scalable keypoints). It uses the distance ratio of the two nearest neighbors to improve the accuracy of detecting corresponding keypoint pairs, and it uses a geometric constraint to eliminate incorrect keypoint pairs. Experiments to evaluate the validity of the proposed method were performed on the MMU public database. The person recognition rates on the left and right ocular image datasets were 91.1% and 90.6%, respectively. This is about 5% higher accuracy than the SIFT-based method, which has been widely used in biometrics.
https://doi.org/10.9717/kmms.2016.19.5.881
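
The nearest-neighbor distance-ratio check mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions: descriptors are modeled as Python integers holding the bits of a binary descriptor, Hamming distance is assumed as the metric (the usual choice for binary descriptors such as BRISK), and the 0.8 threshold is a placeholder rather than a value from the paper.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def ratio_test_matches(query, train, max_ratio=0.8):
    """Keep a match only when the best neighbor is clearly closer
    than the second best (distance-ratio test).

    Returns a list of (query_index, train_index) pairs.
    """
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted((hamming(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) < 2:
            continue  # the ratio needs two neighbors
        (best, best_idx), (second, _) = ranked[0], ranked[1]
        if best < max_ratio * second:
            matches.append((qi, best_idx))
    return matches
```

Ambiguous keypoints, whose two nearest neighbors are almost equally distant, are dropped before any geometric constraint is applied.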

잡음하의 음성인식을 위한 스펙트럴 보상과 주파수 가중 HMM (A Frequency Weighted HMM with Spectral Compensation for Noisy Speech Recognition)

이광석
- 한국정보통신학회논문지 / Vol. 5, No. 3 / pp. 443-449 / 2001
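Speech recognition in noisy environments is a critical bottleneck for speech recognition in real-world settings, and research to address it is ongoing. This study proposes a frequency-weighted HMM to address the noise-handling problems of the HMM, the most widely used approach in speech recognition, and examines its performance through recognition experiments. The results show that MCE-$\mu$ and MCE-$\rho$, used together with SS processing, are the most noise-robust methods.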

Face Recognition by Using FP-ICA Based on Secant Method

Cho, Yong-Hyun
- International Journal of Fuzzy Logic and Intelligent Systems / Vol. 5, No. 2 / pp. 131-135 / 2005
This paper proposes an efficient face recognition method using independent component analysis (ICA) derived from a fixed-point (FP) algorithm based on the secant method. The secant method avoids the costly derivative computation required by the FP algorithm based on Newton's method. The proposed ICA was applied to recognize 20 Yale face images of $324\times324$ pixels. The experimental results show that the proposed ICA is superior to PCA not only in the restoration performance of the basis images but also in the recognition performance on the training and test images. Among the similarity measures, the negative angle yields a better recognition ratio than the city-block and Euclidean distances.
https://doi.org/10.5391/IJFIS.2005.5.2.131
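
Two ideas from this abstract can be illustrated generically. The sketch below shows a plain secant iteration, which replaces Newton's derivative with a finite-difference slope, and the three similarity measures named in the abstract. It is not the paper's FP-ICA update rule; the function names are hypothetical, and "negative angle" is interpreted here as the negated angle between two feature vectors, which is an assumption.

```python
import math


def secant_root(f, x0, x1, tol=1e-10, max_iter=50):
    """Find a root of f with the secant method (no derivative needed)."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:
            break  # flat secant line; cannot continue
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    return x1


def city_block(u, v):
    """L1 distance (smaller means more similar)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """L2 distance (smaller means more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def negative_angle(u, v):
    """Negated angle between u and v (larger means more similar)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norms))  # guard against rounding
    return -math.acos(cosine)
```

Unlike the two distances, the angle ignores vector magnitude, so two feature vectors that differ only by a scale factor are treated as identical.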

컴퓨터 비젼 시스템에 의한 인쇄악보의 인식과 연주 (The recognition of Printed Music Score and Performance Using Computer Vision system)

이명우;최종수
- 대한전자공학회논문지 / Vol. 22, No. 5 / pp. 10-16 / 1985
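This paper discusses a computer vision system that captures an image of a printed music score with a CCTV camera, feeds it to a microcomputer, recognizes the image, and plays the tune through a speaker. An additive projection method is applied to feature extraction and recognition of the score image, and recognition covers staff lines, bars, and notes among the elements of a score. The paper also presents the preprocessing and noise removal steps that must be considered when handling real score images, and builds a simple hardware system that sounds the recognized notes. The results show that the score can be played with a good recognition rate.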
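
A projection-based detector of the kind this abstract describes can be sketched as follows for staff lines. This is an illustration only: the binary image layout (lists of 0/1 rows, 1 meaning ink), the function names, and the 0.8 fill threshold are assumptions, not details from the paper.

```python
def row_projection(image):
    """Sum the ink pixels in each row of a binary image."""
    return [sum(row) for row in image]


def staff_line_rows(image, min_fill=0.8):
    """Rows whose ink count covers at least `min_fill` of the width.

    Staff lines span almost the full width, so they stand out as
    peaks in the row projection.
    """
    width = len(image[0])
    return [y for y, total in enumerate(row_projection(image))
            if total >= min_fill * width]
```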

Recognition of Car License Plate using Kohonen Algorithm

Lim, Eun-Kyoung;Yang, Hwang-Kyu;Kwang Baek kim
- 대한전자공학회: Conference Proceedings / 대한전자공학회 2000 ITC-CSCC -2 / pp. 785-788 / 2000
A car license plate recognition system broadly consists of extracting the number plate and recognizing it. In this paper, we extract the number plate region using a thresholding method as a preprocessing step. Computing the density within a given mask provides a clue to candidate regions whose density ratio matches the properties of a number plate captured under the best conditions. To recognize the text on the number plate, its contour is extracted by applying the Kohonen algorithm in a localized region. The algorithm reduces noise around the contour. The recognition system combining the density computation and the Kohonen algorithm shows high performance in a real system for car number plates.

Optimised ML-based System Model for Adult-Child Actions Recognition

Alhammami, Muhammad;Hammami, Samir Marwan;Ooi, Chee-Pun;Tan, Wooi-Haw
- KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 2 / pp. 929-944 / 2019
Many critical applications require accurate real-time human action recognition. However, capturing and pre-processing image data, calculating features, and classification all pose hurdles because they consume significant storage and computation resources. To circumvent these hurdles, this paper presents a machine learning (ML) based recognition system model that uses reduced data-structure features obtained by projecting the real 3D skeleton modality onto a virtual 2D space. The MMU VAAC dataset is used to test the proposed ML model. The results show a high accuracy of 97.88%, which is only slightly lower than the accuracy obtained with the original 3D modality-based features, but with a 75% reduction ratio compared with using the RGB modality. These results motivate implementing the proposed recognition model on an embedded system platform in the future.
https://doi.org/10.3837/tiis.2019.02.024

얼굴 특징점 추적을 통한 사용자 감성 인식 (Emotion Recognition based on Tracking Facial Keypoints)

이용환;김흥준
- 반도체디스플레이기술학회지 / Vol. 18, No. 1 / pp. 97-101 / 2019
Understanding and classifying human emotion plays an important role in human-machine communication systems. This paper proposes a novel emotion recognition method that extracts facial keypoints and can understand and classify human emotion, using an Active Appearance Model and the proposed classification model of facial features. The existing appearance model captures variations in expression, which the proposed classification model evaluates according to changes in facial expression. The proposed method classifies four basic emotions (normal, happy, sad, and angry). To evaluate its performance, we assess the success ratio on common datasets and achieve a best accuracy of 93% and an average of 82.2% in facial emotion recognition. The results show that the proposed method performs well in emotion recognition compared with existing schemes.

다중퍼셉트론을 이용한 자동차 번호판의 최적 입출력 노드의 비율 결정에 관한 연구 (Recognition of characters on car number plate and best recognition ratio among their layers using Multi-layer Perceptron)

이의철;이왕헌
- 한국전자통신학회논문지 / Vol. 11, No. 1 / pp. 73-80 / 2016
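License plate recognition is used to track hit-and-run vehicles, measure traffic volume, investigate traffic accidents, and track vehicle crime as the number of vehicles grows. In real traffic environments, input images are easily disturbed by snow, rain, and day/night lighting changes, and depending on the vehicle's heading relative to the camera's viewing direction at the moment of capture, even the same plate yields geometrically distorted images. To address these problems of camera-based plate recognition, this study proposes using a homography to transform a geometrically distorted image back to the original image, and separating characters using projection histograms. The separated images were recognized as letters and digits with a multilayer perceptron, and in particular the optimal ratio of the input, hidden, and output layers was derived experimentally.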
https://doi.org/10.13067/JKIECS.2016.11.1.73
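
The character-separation step in this abstract (projection histograms) can be sketched as below, assuming the plate image has already been rectified and binarized. The image layout (lists of 0/1 rows, 1 meaning ink) and the function names are hypothetical, and splitting at completely empty columns is a simplifying assumption.

```python
def column_projection(image):
    """Sum the ink pixels in each column of a binary image."""
    return [sum(column) for column in zip(*image)]


def segment_characters(image):
    """Split the plate into characters at empty columns.

    Returns half-open (start, end) column ranges, one per character.
    """
    projection = column_projection(image)
    segments, start = [], None
    for x, total in enumerate(projection):
        if total > 0 and start is None:
            start = x                      # a character begins
        elif total == 0 and start is not None:
            segments.append((start, x))    # a character ends
            start = None
    if start is not None:
        segments.append((start, len(projection)))
    return segments
```

Each returned column range could then be cropped and resized to the fixed input size of the classifier.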
