• Title/Abstract/Keywords: Recognition and Performance

Search results: 3,800 items (processing time: 0.032 s)

Transformer-based transfer learning and multi-task learning for improving the performance of speech emotion recognition

  • 박순찬;김형순
    • 한국음향학회지 / Vol. 40, No. 5 / pp. 515-522 / 2021
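  • Training data for speech emotion recognition is hard to obtain in sufficient quantity because emotion labeling is difficult. To improve speech emotion recognition performance, this paper applies transfer learning to a Transformer-based model using large-scale training data for speech recognition. It also proposes a method that exploits contextual information without separate decoding, through multi-task learning with speech recognition. Speech emotion recognition experiments on the IEMOCAP dataset achieve a weighted accuracy of 70.6 % and an unweighted accuracy of 71.6 %, showing that the proposed method is effective in improving speech emotion recognition performance.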
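
The abstract above reports both a weighted and an unweighted accuracy. A minimal sketch of the two metrics follows, assuming the convention common in speech emotion recognition work: weighted accuracy is the overall fraction of correct predictions, and unweighted accuracy is the mean of the per-class recalls. The abstract does not define the terms, so this convention is an assumption, and the labels below are hypothetical toy data, not IEMOCAP results.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / total[c] for c in total) / len(total)


# Hypothetical toy labels: three "neutral" utterances and one "angry" one.
y_true = ["neutral", "neutral", "neutral", "angry"]
y_pred = ["neutral", "neutral", "neutral", "sad"]

print(weighted_accuracy(y_true, y_pred))    # 3 of 4 samples are correct
print(unweighted_accuracy(y_true, y_pred))  # mean of recalls 1.0 and 0.0
```

On imbalanced data the two numbers diverge, which is why both are usually reported.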

Selecting Good Speech Features for Recognition

  • Lee, Young-Jik;Hwang, Kyu-Woong
    • ETRI Journal / Vol. 18, No. 1 / pp. 29-41 / 1996
  • This paper describes a method for selecting suitable features for speech recognition using an information-theoretic measure. Conventional speech recognition systems heuristically choose a subset of frequency components, cepstrum, mel-cepstrum, energy, and their time differences of speech waveforms as their speech features. However, these systems cannot perform well if the selected features are not suitable for speech recognition. Since the recognition rate is the only performance measure of a speech recognition system, it is hard to judge how suitable the selected features are. To solve this problem, it is essential to analyze the features themselves and measure how good they are. Good speech features should contain all of the class-related information and as little class-irrelevant variation as possible. In this paper, we suggest a method to measure the class-related information and the amount of class-irrelevant variation based on Shannon's information theory. Using this method, we compare the mel-scaled FFT, cepstrum, mel-cepstrum, and wavelet features of the TIMIT speech data. The results show that, among these features, the mel-scaled FFT is the best feature for speech recognition according to the proposed measure.
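
The abstract does not spell out the authors' measure. As a rough illustration of how class-related information can be quantified with Shannon's theory, the sketch below estimates the mutual information between a quantized feature and a class label from paired samples. The function name and the toy data are hypothetical, and this generic estimator is a stand-in, not the measure proposed in the paper.

```python
import math
from collections import Counter


def mutual_information(features, labels):
    """Estimate I(X; C) in bits from paired discrete samples."""
    n = len(features)
    joint = Counter(zip(features, labels))
    count_x = Counter(features)
    count_c = Counter(labels)
    mi = 0.0
    for (x, c), count in joint.items():
        p_xc = count / n
        p_x = count_x[x] / n
        p_c = count_c[c] / n
        mi += p_xc * math.log2(p_xc / (p_x * p_c))
    return mi


# Hypothetical quantized feature values for two balanced classes.
labels = ["a", "a", "b", "b"]
informative = [0, 0, 1, 1]    # determines the class completely
uninformative = [0, 1, 0, 1]  # statistically independent of the class

print(mutual_information(informative, labels))
print(mutual_information(uninformative, labels))
```

A feature that fully determines a balanced two-class label carries one bit of class information, while an independent feature carries none.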


Speaker Identification Using Augmented PCA in Unknown Environments

  • 유하진
    • 대한음성학회지:말소리 / No. 54 / pp. 73-83 / 2005
  • The goal of our research is to build a text-independent speaker identification system that can be used in any condition without any additional adaptation process. The performance of speaker recognition systems can be severely degraded under unknown, mismatched microphone and noise conditions. In this paper, we show that PCA (principal component analysis) can improve performance in such situations. We also propose an augmented PCA process, which augments the original feature vectors with class-discriminative information before the PCA transformation and selects the best direction for each pair of highly confusable speakers. The proposed method reduced the recognition error by a relative 21%.
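
For readers unfamiliar with the plain PCA step mentioned above, here is a minimal sketch that finds the leading principal direction of a small data set by power iteration on the sample covariance matrix and projects a vector onto it. It illustrates ordinary PCA only; the augmented PCA process and the pairwise direction selection from the paper are not reproduced, and the function names and data are hypothetical.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (mean, direction) for the leading PCA direction of `rows`."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Asymmetric start vector; power iteration needs a start that is not
    # orthogonal to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate data or unlucky start; keep last v
            break
        v = [x / norm for x in w]
    return mean, v


def project(row, mean, direction):
    """Coordinate of `row` along `direction` after mean removal."""
    return sum((x - m) * u for x, m, u in zip(row, mean, direction))


# Hypothetical 2-D feature vectors lying on the line y = x.
data = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mean, direction = first_principal_component(data)
print(direction)
print(project([3.0, 3.0], mean, direction))
```

For data on the line y = x the recovered direction is the unit vector along that line, so a single coordinate preserves all of the variance.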


An Improved Recognition Technique for Bar Code Images Using Upsampling

  • 안희준;도딴뚜안
    • 한국통신학회논문지 / Vol. 41, No. 8 / pp. 911-913 / 2016
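  • Image-based barcode recognition systems are increasingly widely used, but the recognition rate drops markedly when the effective resolution of the captured barcode region is low. This paper proposes a sub-pixel-level synchronization method based on upsampling that improves the recognition rate even at low effective resolution. In experiments on the standard ITF-18 format, recognition rate increases of 66% and 100% over the existing method were confirmed for VGA ($640 \times 480$) and CIF ($320 \times 240$) images, respectively.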
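
The abstract does not describe the upsampling or synchronization steps in detail. The sketch below only illustrates the general idea under stated assumptions: a one-dimensional scanline of pixel intensities is upsampled by linear interpolation, and bar edges are then located on the finer grid by thresholding. Linear interpolation, the threshold rule, and all names and values here are assumptions, not the authors' method.

```python
def upsample_linear(samples, factor):
    """Insert `factor - 1` linearly interpolated values between neighbours."""
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(samples[-1])
    return out


def crossing_positions(samples, factor, threshold):
    """Positions, in original pixel units, of the first upsampled sample
    on the far side of each threshold crossing."""
    up = upsample_linear(samples, factor)
    return [i / factor for i in range(1, len(up))
            if (up[i - 1] < threshold) != (up[i] < threshold)]


# Hypothetical scanline: one bright bar that is a single pixel wide.
scanline = [0, 10, 0]
print(upsample_linear(scanline, 2))
print(crossing_positions(scanline, 4, 4))
```

With a factor of 4 the edges are located on a quarter-pixel grid, which is the kind of sub-pixel resolution a decoder can use to align bar widths.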

Analyzing DNN Model Performance Depending on Backbone Network

  • 박천수
    • 반도체디스플레이기술학회지 / Vol. 22, No. 2 / pp. 128-132 / 2023
  • Recently, with the development of deep learning technology, research on pedestrian attribute recognition using deep neural networks has been actively conducted. Existing pedestrian attribute recognition techniques can be categorized as global-based, regional-area-based, visual attention-based, sequential prediction-based, or newly designed loss function-based, depending on how pedestrian attributes are detected. The performance of these techniques is known to vary greatly depending on the type of backbone network used in the deep neural network model. Therefore, in this paper, several backbone networks are applied to a baseline pedestrian attribute recognition model and the resulting performance changes are analyzed. The analysis uses Resnet34, Resnet50, Resnet101, Swin-tiny, and Swinv2-tiny, which are representative backbone networks in fields such as image classification and object detection. Furthermore, this paper analyzes the change in time complexity when running inference with each backbone network on a CPU and a GPU.


On Effective Dual-Channel Noise Reduction for Speech Recognition in Car Environment

  • Ahn, Sung-Joo;Kang, Sun-Mee;Ko, Han-Seok
    • 음성과학 / Vol. 11, No. 1 / pp. 43-52 / 2004
  • This paper concerns an effective dual-channel noise reduction method for improving speech recognition performance in a car environment. While various single-channel methods have already been developed and dual-channel methods have been studied to some extent, their effectiveness in real environments, such as in cars, has not yet been formally shown to reach an acceptable performance level. Our aim is to remedy the low performance of the single- and dual-channel noise reduction methods. This paper proposes an effective dual-channel noise reduction method based on a high-pass filter and front-end processing with the eigendecomposition method. We experimented with a real multi-channel car database and compared the results with respect to the microphone arrangements. From the analysis and results, we show that the enhanced eigendecomposition method combined with a high-pass filter significantly improves speech recognition performance in a dual-channel environment.


A Study on Quality Management in Small and Medium Enterprises

  • 박노국
    • 대한안전경영과학회지 / Vol. 8, No. 1 / pp. 131-144 / 2006
  • This study examined the quality management systems adopted by small and medium enterprises in Kangwon province to enhance their competitiveness. Variance analysis was performed on several questionnaire answers. Motives for acquiring accreditation, such as product export, adjustment to international trends, enhancement of brand/product recognition, a change in the CEO's mindset, and management innovation, differed significantly among business types. Changes in mindset after accreditation included setting quality as the company's first priority, enhanced recognition of compliance with in-house standards and regulations, and employee performance with the recognition of quality. Among the problems in maintaining the accreditations were difficulties in maintaining recognition of the company's quality management, increased labor to maintain the ISO 9000 enforcement team, and the financial burden of keeping the accreditation. Quality recognition after accreditation improved significantly in the same three areas: setting quality as the company's first priority, recognition of compliance with in-house standards and regulations, and employee performance with the recognition of quality.

Performance Comparison of Text-Independent Speaker Recognizer Using VQ and GMM

  • 김성종;정훈;정익주
    • 음성과학 / Vol. 7, No. 2 / pp. 235-244 / 2000
  • This paper focuses on implementing a text-independent speaker recognizer using the VQ and GMM algorithms and on studying the characteristics of speaker recognizers that adopt these two algorithms. Because it is difficult to ascertain theoretically the effect the two algorithms have on the speaker recognizer, we performed recognition experiments using various parameters. The experiments showed that the GMM algorithm had better recognition performance than the VQ algorithm in the following respects. The GMM performed better with small amounts of training data, and its recognition rate changed only slightly as the kind of feature vector and the length of the input data varied. On the whole, the GMM showed better recognition performance than the VQ.
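
To make the difference between the two scoring rules concrete, the sketch below scores a sequence of frames against a VQ codebook (average distortion to the nearest codeword, lower is better) and against a Gaussian mixture (average log-likelihood, higher is better). Scalar features, the model parameters, and the speaker names are hypothetical simplifications; real systems use multidimensional feature vectors and trained models, and nothing here reflects the paper's experimental setup.

```python
import math


def vq_distortion(frames, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    return sum(min((x - c) ** 2 for c in codebook) for x in frames) / len(frames)


def gmm_log_likelihood(frames, weights, means, variances):
    """Average log-likelihood of 1-D frames under a Gaussian mixture."""
    total = 0.0
    for x in frames:
        density = sum(
            w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(density)
    return total / len(frames)


# Hypothetical scalar "features" and two hypothetical speaker models.
frames = [0.1, 0.9, 1.1]
speakers = {"speaker_a": [0.0, 1.0], "speaker_b": [5.0, 6.0]}

# VQ decision: the speaker whose codebook gives the smallest distortion.
vq_choice = min(speakers, key=lambda s: vq_distortion(frames, speakers[s]))

# GMM decision: the speaker whose mixture gives the largest likelihood.
gmm_choice = max(
    speakers,
    key=lambda s: gmm_log_likelihood(frames, [0.5, 0.5], speakers[s], [1.0, 1.0]),
)
print(vq_choice, gmm_choice)
```

The VQ score uses only the nearest codeword, whereas the GMM score weighs every component by its probability, which is one intuitive reason a GMM can be more robust when training data is scarce.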


A Performance Analysis of the Face Recognition Based on PCA/LDA on Distance Measures

  • 송영준;김영길;안재형
    • 한국산학기술학회논문지 / Vol. 6, No. 3 / pp. 249-254 / 2005
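  • This paper compares and analyzes recognition performance according to the similarity measure used in PCA/LDA-based face recognition. A total of 14 distance measures were applied to the ORL face database, and performance was compared separately for PCA and PCA/LDA. For PCA, the Manhattan distance and the Weighted SSE distance gave good recognition rates, whereas for PCA/LDA the Angle-based distance and the Modified SSE distance gave good recognition rates. In addition, compared with PCA, PCA/LDA maintains a high recognition rate while reducing the number of dimensions used for similarity comparison, so face recognition that combines PCA/LDA with the Angle-based distance measure can be highly competitive in both computational cost and recognition rate.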
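
The choice of distance measure can change which gallery face is judged nearest. The sketch below shows this with two of the measures named above on hypothetical two-dimensional projected features. The angle-based distance is written here as one minus the cosine similarity, which is one common form and an assumption about the paper's exact definition; the gallery, query, and function names are also hypothetical.

```python
import math


def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def angle_distance(a, b):
    """Assumed form of an angle-based distance: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest(query, gallery, distance):
    """Return the label of the gallery vector closest to `query`."""
    return min(gallery, key=lambda item: distance(query, item[1]))[0]


# Hypothetical projected features for two enrolled identities.
gallery = [("A", [1.0, 0.0]), ("B", [0.0, 10.0])]
query = [0.5, 1.0]

print(nearest(query, gallery, manhattan))       # magnitude-sensitive choice
print(nearest(query, gallery, angle_distance))  # direction-only choice
```

The Manhattan distance is sensitive to vector magnitude, while the angle-based distance compares direction only, so the two rules disagree on this query.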


Recognition of Human Typing Pattern Using Neural Network

  • 배중기;김병환;이상규
    • 대한전기학회:학술대회논문집 / 대한전기학회 2006 Conference Proceedings, Information and Control Division / pp. 449-451 / 2006
  • With the increasing danger of personal information being exposed, a technique is needed to protect personal information by identifying non-users when it is exposed. This study constructs a neural network recognizer for developing an economical and effective user protection system. For this, time variables describing user typing patterns were collected from a pattern extraction device. Non-user patterns were generated by varying the standard deviation of the collected time variables. The recognition performance increased as the standard deviation increased, and a higher recognition was achieved at 2.5. In addition, five types of training data were generated, and the recognition performance was examined as a function of the number of non-user patterns. As the number of non-user patterns increased, the recognition error, quantified by the root mean square error (RMSE), decreased. The smallest RMSE was obtained with type 5 and 90 non-user patterns. Overall, the type 3 model yielded the highest recognition accuracy. In particular, a perfect recognition of 100% was achieved at 45 non-user patterns.
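
The recognition error above is quantified by the RMSE. A minimal sketch of that metric follows, assuming the recognizer outputs a score per pattern and that targets are coded as 1 for the user and 0 for a non-user. The coding and the sample values are hypothetical and are not taken from the paper.

```python
import math


def rmse(targets, outputs):
    """Root mean square error between target values and model outputs."""
    squared = [(t - o) ** 2 for t, o in zip(targets, outputs)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical targets (1 = user, 0 = non-user) and recognizer outputs.
targets = [1, 0, 1, 0]
outputs = [1, 0, 0, 0]  # one user pattern is rejected

print(rmse(targets, outputs))
```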
