• Title/Abstract/Keywords: recognition task

613 search results (processing time: 0.022 s)

Development of Emotion Recognition Model Using Audio-video Feature Extraction Multimodal Model

  • 김종구;권장우
    • 융합신호처리학회논문지 / Vol. 24, No. 4 / pp.221-228 / 2023
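  • Physical and mental changes caused by emotions can affect a variety of behaviors, such as driving and learning. Recognizing these emotions is therefore an important task, since it can be applied across many industries, for example to detect and manage dangerous emotions while driving. In this paper, we carried out emotion recognition research by implementing a multimodal model that uses both audio and video data, which belong to different domains. Using the RAVDESS dataset, we extracted the audio from the video data and obtained audio features with a 2D-CNN-based model, while video features were extracted with a SlowFast feature extractor. The proposed multimodal model integrates the audio and video feature vectors to perform emotion recognition. We also implemented multimodal models using two widely used approaches, summing the result scores of each model and voting, and compared their performance with that of the proposed method.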
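The three ways of combining modalities mentioned in this abstract (integrating feature vectors, summing per-model scores, and voting) can be illustrated with a minimal sketch. The label set, the function names, and the score values are hypothetical and are not taken from the paper; in the paper the fused feature vector would be passed to a trained classifier, which is omitted here.

```python
from collections import Counter

# Hypothetical emotion labels; illustrative only, not the paper's label set.
LABELS = ["neutral", "happy", "sad", "angry"]


def argmax(scores):
    """Return the index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def fuse_features(audio_vec, video_vec):
    """Feature-level fusion: concatenate the two feature vectors."""
    return list(audio_vec) + list(video_vec)


def fuse_scores(audio_scores, video_scores):
    """Score-level fusion: add per-class scores and pick the best class."""
    summed = [a + v for a, v in zip(audio_scores, video_scores)]
    return LABELS[argmax(summed)]


def fuse_votes(per_model_scores):
    """Voting: each model votes for its top class; ties go to the first-seen label."""
    votes = [LABELS[argmax(s)] for s in per_model_scores]
    return Counter(votes).most_common(1)[0][0]


# Made-up class scores from three hypothetical models.
audio_scores = [0.1, 0.6, 0.2, 0.1]
video_scores = [0.2, 0.3, 0.4, 0.1]
third_scores = [0.1, 0.2, 0.6, 0.1]

fused_vector = fuse_features([1.0, 2.0], [3.0])
summed_label = fuse_scores(audio_scores, video_scores)
voted_label = fuse_votes([audio_scores, video_scores, third_scores])
```

Score summing keeps each model's confidence, whereas voting only keeps each model's top choice, so the two can disagree on the same inputs, as they do for the made-up scores above.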

A Survey of Face Recognition Techniques

  • Jafri, Rabia;Arabnia, Hamid R.
    • Journal of Information Processing Systems / Vol. 5, No. 2 / pp.41-68 / 2009
  • Face recognition presents a challenging problem in the field of image analysis and computer vision, and as such has received a great deal of attention over the last few years because of its many applications in various domains. Face recognition techniques can be broadly divided into three categories based on the face data acquisition methodology: methods that operate on intensity images; those that deal with video sequences; and those that require other sensory data such as 3D information or infra-red imagery. In this paper, an overview of some of the well-known methods in each of these categories is provided and some of the benefits and drawbacks of the schemes mentioned therein are examined. Furthermore, a discussion outlining the incentive for using face recognition, the applications of this technology, and some of the difficulties plaguing current systems with regard to this task has also been provided. This paper also mentions some of the most recent algorithms developed for this purpose and attempts to give an idea of the state of the art of face recognition technology.

Covariance-based Recognition Using Machine Learning Model

  • Osman, Hassab Elgawi
    • 한국방송∙미디어공학회:학술대회논문집 / 한국방송공학회 2009 IWAIT / pp.223-228 / 2009
  • We propose an on-line machine learning approach for object recognition, where new images are continuously added and the recognition decision is made without delay. The random forest (RF) classifier has been extensively used as a generative model for classification and regression applications. We extend this technique to the task of building an incremental component-based detector. First, we employ an object descriptor model based on a bag of covariance matrices to represent an object region, then run our on-line RF learner to select object descriptors and learn an object classifier. Object recognition experiments are provided to verify the effectiveness of the proposed approach. Results demonstrate that the proposed model yields object recognition performance comparable to the benchmark standard RF, AdaBoost, and SVM classifiers.

ADD-Net: Attention Based 3D Dense Network for Action Recognition

  • Man, Qiaoyue;Cho, Young Im
    • 한국컴퓨터정보학회논문지 / Vol. 24, No. 6 / pp.21-28 / 2019
  • In recent years, with the development of artificial intelligence and the success of deep models, deep learning has been deployed across all fields of computer vision. Action recognition, an important branch of human perception and computer vision research, has attracted increasing attention. It is a challenging task because of the complexity of human movement and because the same movement may be performed by multiple individuals. A human action appears in a video as a sequence of consecutive image frames, so action recognition requires more computational power than processing static images, and simply applying a CNN cannot achieve the desired results. Recently, attention models have achieved good results in computer vision and natural language processing. In particular, for video action classification, adding an attention model makes it more effective to focus on motion features and improves performance. It also intuitively explains which part the model attends to when making a particular decision, which is very helpful in real applications. In this paper, we propose a 3D dense convolutional network based on an attention mechanism (ADD-Net) for recognizing human motion behavior in video.

Variational autoencoder for prosody-based speaker recognition

  • Starlet Ben Alex;Leena Mary
    • ETRI Journal / Vol. 45, No. 4 / pp.678-689 / 2023
  • This paper describes a novel end-to-end deep generative model-based speaker recognition system using prosodic features. The usefulness of variational autoencoders (VAE) in learning the speaker-specific prosody representations for the speaker recognition task is examined herein for the first time. The speech signal is first automatically segmented into syllable-like units using vowel onset points (VOP) and energy valleys. Prosodic features, such as the dynamics of duration, energy, and fundamental frequency (F0), are then extracted at the syllable level and used to train/adapt a speaker-dependent VAE from a universal VAE. The initial comparative studies on VAEs and traditional autoencoders (AE) suggest that the former can efficiently learn speaker representations. Investigations on the impact of gender information in speaker recognition also point out that gender-dependent impostor banks lead to higher accuracies. Finally, the evaluation on the NIST SRE 2010 dataset demonstrates the usefulness of the proposed approach for speaker recognition.

Extraversion and Recognition for Emotional Words: Effects of Valence, Frequency, and Task-difficulty

  • 강은주
    • 인지과학 / Vol. 25, No. 4 / pp.385-416 / 2014
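  • To investigate how memory performance for emotional words differs with the personality trait of extraversion, this study applied signal detection analysis to examine memory discriminability and recognition response bias. Participants performed an emotion category judgment task on emotional words presented at encoding and then took a recognition test. In addition, to examine the interaction between task difficulty and personality in word recognition, two experiments were conducted with different intervals between encoding and retrieval. In the low-difficulty task with a short retention interval (5 minutes; Study I), individuals lower in extraversion showed better memory performance (higher d'), particularly for low-frequency words, and recognition response bias did not differ by extraversion. Notably, the higher the extraversion, the greater the tendency toward overconfidence after false recognition. In the high-difficulty task with a long retention interval (one month; Study II), memory performance was poor overall and did not differ by extraversion; however, only for high-frequency positive words, individuals higher in extraversion applied a much more liberal response criterion (higher hit rate and higher false alarm rate), and they also tended to be overconfident in their recognition of these positive words. These results show that when memory performance declines, individuals high in extraversion become more vulnerable in internal control processes, and that this personality difference can affect the recognition criterion for positive words and the confidence in recognition responses. That is, when memory traces are weak, individuals high in extraversion may show memory reporting and confidence biases specific to words with positive valence.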
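The signal detection measures named in this abstract can be made concrete with a short sketch. It assumes the standard equal-variance Gaussian model, in which discriminability is d' = z(H) - z(F) and the response criterion is c = -(z(H) + z(F)) / 2, where H is the hit rate and F the false alarm rate. The abstract does not say which bias index the author used, so `c` is an assumption here, and the rates below are made up for illustration.

```python
from statistics import NormalDist


def sdt_measures(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion) under the equal-variance Gaussian model.

    Both rates must lie strictly between 0 and 1, because the inverse
    normal CDF is undefined at 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa              # larger = better old/new discrimination
    criterion = -0.5 * (z_hit + z_fa)   # negative = liberal, positive = conservative
    return d_prime, criterion


# Made-up rates: a liberal responder has both a high hit rate and a
# high false alarm rate, which yields a negative criterion.
d_liberal, c_liberal = sdt_measures(0.9, 0.5)
d_neutral, c_neutral = sdt_measures(0.8, 0.2)
```

Under this model, a pattern of high hits together with high false alarms shifts the criterion toward negative values without necessarily changing d', which is why the two measures are reported separately.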

Deep recurrent neural networks with word embeddings for Urdu named entity recognition

  • Khan, Wahab;Daud, Ali;Alotaibi, Fahd;Aljohani, Naif;Arafat, Sachi
    • ETRI Journal / Vol. 42, No. 1 / pp.90-100 / 2020
  • Named entity recognition (NER) continues to be an important task in natural language processing because it is featured as a subtask and/or subproblem in information extraction and machine translation. In Urdu language processing, it is a very difficult task. This paper proposes various deep recurrent neural network (DRNN) learning models with word embedding. Experimental results demonstrate that they improve upon current state-of-the-art NER approaches for Urdu. The DRNN models evaluated include forward and bidirectional extensions of the long short-term memory and back propagation through time approaches. The proposed models consider both language-dependent features, such as part-of-speech tags, and language-independent features, such as the "context windows" of words. The effectiveness of the DRNN models with word embedding for NER in Urdu is demonstrated using three datasets. The results reveal that the proposed approach significantly outperforms previous conditional random field and artificial neural network approaches. The best f-measure values achieved on the three benchmark datasets using the proposed deep learning approaches are 81.1%, 79.94%, and 63.21%, respectively.
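As a rough illustration of the "context windows" feature mentioned in this abstract, the sketch below builds a fixed-size window of neighboring tokens around each word. The window size, the padding token, and the function name are assumptions; the abstract does not state how the authors construct their windows.

```python
def context_windows(tokens, size=2, pad="<PAD>"):
    """Return, for each token, the list of tokens within `size` positions of it.

    Positions that fall outside the sentence are filled with `pad`, so every
    window has exactly 2 * size + 1 entries with the target token in the middle.
    """
    padded = [pad] * size + list(tokens) + [pad] * size
    return [padded[i:i + 2 * size + 1] for i in range(len(tokens))]


# Placeholder tokens; any tokenized sentence would work the same way.
windows = context_windows(["w1", "w2", "w3"], size=1)
```

Each window can then be mapped to word embeddings and fed to a sequence model, which keeps the feature language-independent because it uses only token positions.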

The Recognition of Librarians about Roles of Regional Central Library

  • 김홍렬
    • 한국도서관정보학회지 / Vol. 40, No. 1 / pp.115-132 / 2009
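  • A regional central library should be operated in a way that reflects the characteristics of its region, and its functions and tasks should be prioritized and carried out incrementally. From this perspective, this study surveyed librarians' perceptions of the roles and directions of regional central libraries, in order to properly understand regional central library policy and to collect basic data that can be consulted when establishing a region's library policy or deciding the content of cooperative projects. The results show that librarians regarded setting library development directions and policies and external library cooperation as the most urgent tasks for regional central libraries, while legal deposit and preservation of materials and information services for local residents received relatively low ratings. In addition, cooperation on educational and cultural programs and interlibrary loan ranked highest as cooperative tasks to be prioritized in the future, whereas shared preservation was identified as a relatively low-rated cooperative project.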

Rainfall Recognition from Road Surveillance Videos Using TSN

  • ;현종환;최호진
    • 한국대기환경학회지 / Vol. 34, No. 5 / pp.735-747 / 2018
  • Rainfall depth is important meteorological information. Generally, high spatial resolution rainfall data such as road-level rainfall data are more beneficial. However, it is expensive to set up sufficient Automatic Weather Systems to get the road-level rainfall data. In this paper, we propose to use deep learning to recognize rainfall depth from road surveillance videos. To achieve this goal, we collect a new video dataset and propose a procedure to calculate refined rainfall depth from the original meteorological data. We also propose to utilize the differential frame as well as the optical flow image for better recognition of rainfall depth. Under the Temporal Segment Networks framework, the experimental results show that the combination of the video frame and the differential frame is a superior solution for rainfall depth recognition. The final model achieves high performance in the single-location low-sensitivity classification task and reasonable accuracy in the higher-sensitivity classification task for both the single-location and the multi-location cases.
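The "differential frame" input mentioned in this abstract can be sketched as a per-pixel difference between two consecutive grayscale frames. The abstract does not define the exact operation, so the absolute difference used here is an assumption, and the tiny frames are made-up values rather than data from the paper.

```python
def differential_frame(prev_frame, next_frame):
    """Per-pixel absolute difference between two equally sized grayscale frames.

    Frames are lists of rows of integer intensities. Static background
    cancels out, while moving content such as rain streaks remains.
    """
    return [
        [abs(b - a) for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_frame, next_frame)
    ]


# Two made-up 2x3 frames: only two pixels change between them.
frame_t0 = [[10, 10, 10], [20, 20, 20]]
frame_t1 = [[10, 50, 10], [20, 20, 5]]
diff = differential_frame(frame_t0, frame_t1)
```

Unlike optical flow, this difference needs no motion estimation step, so it is cheap to compute for every pair of frames.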

Improving classification of low-resource COVID-19 literature by using Named Entity Recognition

  • Lithgow-Serrano, Oscar;Cornelius, Joseph;Kanjirangat, Vani;Mendez-Cruz, Carlos-Francisco;Rinaldi, Fabio
    • Genomics & Informatics / Vol. 19, No. 3 / pp.22.1-22.5 / 2021
  • Automatic document classification for highly interrelated classes is a demanding task that becomes more challenging when there is little labeled data for training. Such is the case of the coronavirus disease 2019 (COVID-19) clinical repository (a repository of classified and translated academic articles related to COVID-19 and relevant to clinical practice), where a 3-way classification scheme is being applied to COVID-19 literature. During the 7th Biomedical Linked Annotation Hackathon (BLAH7), we performed experiments to explore the use of named-entity recognition (NER) to improve the classification. We processed the literature with OntoGene's Biomedical Entity Recogniser (OGER) and used the resulting identified Named Entities (NE) and their links to major biological databases as extra input features for the classifier. We compared the results with a baseline model without the OGER-extracted features. In these proof-of-concept experiments, we observed a clear gain on COVID-19 literature classification. In particular, the NE's origin was useful for classifying document types and the NE's type for clinical specialties. Due to the limitations of the small dataset, we can only conclude that our results suggest that NER would benefit this classification task. In order to accurately estimate this benefit, further experiments with a larger dataset would be needed.