• Title/Summary/Keyword: temporal feature

A Video Expression Recognition Method Based on Multi-mode Convolution Neural Network and Multiplicative Feature Fusion

  • Ren, Qun
    • Journal of Information Processing Systems / v.17 no.3 / pp.556-570 / 2021
  • Existing video expression recognition methods focus mainly on extracting spatial features from expression images and tend to ignore the dynamic features of video sequences. To address this problem, a multi-mode convolutional neural network method is proposed to improve the performance of facial expression recognition in video. First, OpenFace 2.0 is used to detect face images in the video, and two deep convolutional neural networks are used to extract spatiotemporal expression features: a spatial convolutional neural network extracts the spatial features of each static expression image, while a temporal convolutional neural network extracts dynamic features from the optical flow of multiple expression images. The spatiotemporal features learned by the two networks are then fused by multiplication. Finally, the fused features are fed into a support vector machine for facial expression classification. Experimental results show that the recognition accuracy of the proposed method reaches 64.57% and 60.89% on the RML and BAUM-1s datasets, respectively, outperforming the comparison methods.
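
A minimal sketch (not from the paper) of the final fusion-and-classification step described above: element-wise multiplication of the two CNN feature streams followed by an SVM. The feature arrays, dimensions, and number of classes are illustrative assumptions; scikit-learn is assumed for the classifier.

```python
import numpy as np
from sklearn.svm import SVC

# Assume these come from the two CNN streams (shapes are illustrative):
# spatial_feats:  (n_samples, d) features from the spatial CNN (static frames)
# temporal_feats: (n_samples, d) features from the temporal CNN (optical flow)
rng = np.random.default_rng(0)
spatial_feats = rng.normal(size=(200, 128))
temporal_feats = rng.normal(size=(200, 128))
labels = rng.integers(0, 6, size=200)          # 6 expression classes (illustrative)

# Multiplicative fusion: element-wise product of the two feature streams
fused = spatial_feats * temporal_feats

# SVM classifier on the fused spatiotemporal features
clf = SVC(kernel="rbf", C=1.0)
clf.fit(fused, labels)
pred = clf.predict(fused)
print("train accuracy:", (pred == labels).mean())
```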

Depth tracking of occluded ships based on SIFT feature matching

  • Yadong Liu;Yuesheng Liu;Ziyang Zhong;Yang Chen;Jinfeng Xia;Yunjie Chen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.4 / pp.1066-1079 / 2023
  • Multi-target tracking based on a detector (tracking-by-detection) is an active and important research topic in target tracking. It consists of two closely related processes: target detection, which localizes the exact position of each target, and target tracking, which monitors the temporal and spatial changes of the target. As detectors have improved, tracking performance has reached a new level. A persistent problem in target tracking is recovering a target after it has been occluded during tracking. To address this problem, this paper proposes a DeepSORT model based on SIFT features to improve ship tracking. Unlike previous feature extraction networks, the SIFT algorithm does not require pre-training to learn target characteristics and can therefore be applied to ship tracking quickly. We also improve and test the matching method of our model to find a balance between tracking accuracy and tracking speed. Experiments show that the model achieves favorable results.
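
A minimal sketch (not from the paper) of SIFT keypoint extraction and descriptor matching between two detection patches, assuming OpenCV's `cv2.SIFT_create` is available; in a DeepSORT-style tracker a score like this could stand in for learned appearance features when associating detections across frames.

```python
import cv2

def sift_match_score(patch_a, patch_b, ratio=0.75):
    """Return the number of good SIFT matches between two grayscale uint8 patches."""
    sift = cv2.SIFT_create()
    _, des_a = sift.detectAndCompute(patch_a, None)
    _, des_b = sift.detectAndCompute(patch_b, None)
    if des_a is None or des_b is None:
        return 0
    # Brute-force matching with Lowe's ratio test to filter ambiguous matches
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des_a, des_b, k=2)
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < ratio * m[1].distance]
    return len(good)
```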

Multimodal audiovisual speech recognition architecture using a three-feature multi-fusion method for noise-robust systems

  • Sanghun Jeon;Jieun Lee;Dohyeon Yeo;Yong-Ju Lee;SeungJun Kim
    • ETRI Journal / v.46 no.1 / pp.22-34 / 2024
  • Exposure to varied noisy environments impairs the recognition performance of artificial intelligence-based speech recognition technologies. Degraded-performance services can be deployed as limited systems that assure good performance only in certain environments, but this impairs the general quality of speech recognition services. This study introduces an audiovisual speech recognition (AVSR) model that is robust to various noise settings, mimicking the elements of human dialogue recognition. The model converts word embeddings and log-Mel spectrograms into feature vectors for audio recognition. A dense spatial-temporal convolutional neural network extracts features from log-Mel spectrograms transformed for visual-based recognition. This approach exhibits improved aural and visual recognition capabilities. We assess the signal-to-noise ratio in nine synthesized noise environments, with the proposed model exhibiting lower average error rates. The error rate for the AVSR model using the three-feature multi-fusion method is 1.711%, compared with the general rate of 3.939%. The model is applicable in noise-affected environments owing to its enhanced stability and recognition rate.
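
A minimal sketch (not from the paper) of computing the log-Mel spectrogram features that the abstract names as one of the model's inputs, assuming the librosa library; the sample rate, FFT, and hop parameters are illustrative.

```python
import numpy as np
import librosa

def log_mel_features(wav_path, n_mels=80, sr=16000):
    """Load audio and return a (frames, n_mels) log-Mel spectrogram."""
    y, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                         n_fft=400, hop_length=160)
    log_mel = librosa.power_to_db(mel, ref=np.max)   # log compression
    return log_mel.T                                  # time-major for sequence models
```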

Modeling of Data References with Temporal Locality and Popularity Bias (시간 지역성과 인기 편향성을 가진 데이터 참조의 모델링)

  • Hyokyung Bahn
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.6 / pp.119-124 / 2023
  • This paper proposes a new reference model that can represent data access with both temporal locality and popularity bias. Among existing reference models, the LRU-stack model can express temporal locality, the property that more recently referenced data has a higher probability of being referenced again, but it cannot account for differences in data popularity. Conversely, the independent reference model can reflect the different popularity of data, but it cannot model changes in data reference trends over time. The reference model presented in this paper overcomes the limitations of these two models and reflects both the popularity bias of data and its change over time. This paper also examines the relationship between cache replacement algorithms and the reference model, and shows the optimality of the proposed model.
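
A minimal sketch (not from the paper) of a synthetic reference-string generator that mixes the two ingredients the abstract contrasts: Zipf-like popularity (the independent reference model side) and recency-based re-reference (the LRU-stack side). The mixing weight `alpha`, the Zipf exponent, and the stack-depth distribution are invented for illustration and are not the paper's model.

```python
import random

def generate_references(n_items=1000, n_refs=10000, alpha=0.5, zipf_s=1.0):
    """Generate a reference string mixing popularity bias and temporal locality."""
    # Zipf-like popularity weights over items (independent-reference-model side)
    weights = [1.0 / (rank ** zipf_s) for rank in range(1, n_items + 1)]
    items = list(range(n_items))
    stack = []                     # LRU stack: most recently used at the front
    refs = []
    for _ in range(n_refs):
        if stack and random.random() < alpha:
            # Temporal locality: re-reference something near the top of the LRU stack
            depth = min(len(stack), 1 + int(random.expovariate(0.1)))
            item = stack[random.randrange(depth)]
        else:
            # Popularity bias: draw an item from the Zipf-like distribution
            item = random.choices(items, weights=weights, k=1)[0]
        if item in stack:
            stack.remove(item)
        stack.insert(0, item)      # move to the top of the stack (most recent)
        refs.append(item)
    return refs
```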

An Attention-based Temporal Network for Parkinson's Disease Severity Rating using Gait Signals

  • Huimin Wu;Yongcan Liu;Haozhe Yang;Zhongxiang Xie;Xianchao Chen;Mingzhi Wen;Aite Zhao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.10 / pp.2627-2642 / 2023
  • Parkinson's disease (PD) is a typical chronic neurodegenerative disease associated with reduced dopamine levels, which can disrupt motor activity and cause gait disturbance of varying degrees relevant to PD severity in patients. Because current clinical PD diagnosis is a complex, time-consuming, and challenging task that relies on physicians' subjective evaluation of visual observations, gait disturbance has been extensively explored to enable automatic PD diagnosis and severity rating and to provide auxiliary information for physicians' decisions, using gait data from various acquisition devices. Among these, wearable sensors have the advantage of flexibility, since they do not restrict the wearer's range of activity in this application scenario. In this paper, an attention-based temporal network (ATN) is designed for the time-series structure of gait data (vertical ground reaction force signals) from foot sensor systems, to learn the discriminative differences related to PD severity levels hidden in the sequential data. The structure of the proposed method is inspired by the Transformer network, given its success in extracting temporal information, and contains three modules: a preprocessing module to map intra-moment features, a feature extractor that computes complex gait characteristics of the whole signal sequence in the temporal dimension, and a classifier for the final decision about PD severity. Experiments conducted on the public PDgait dataset of VGRF signals verify the proposed model's validity and show promising classification performance compared with several existing methods.
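
A minimal sketch (not the paper's ATN) of the three-module layout the abstract describes: a per-moment feature mapping, a Transformer-style temporal encoder, and a severity classifier, assuming PyTorch. The sensor count, layer sizes, and number of severity classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GaitSeverityNet(nn.Module):
    """Toy attention-based temporal network: per-moment projection,
    Transformer encoder over time, then mean-pooled classification."""
    def __init__(self, n_sensors=18, d_model=64, n_classes=3):
        super().__init__()
        self.proj = nn.Linear(n_sensors, d_model)        # intra-moment feature mapping
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)        # severity decision

    def forward(self, x):                                # x: (batch, time, n_sensors)
        h = self.encoder(self.proj(x))                   # (batch, time, d_model)
        return self.head(h.mean(dim=1))                  # pool over time -> logits

logits = GaitSeverityNet()(torch.randn(4, 100, 18))      # shape (4, 3)
```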

A New Tempo Feature Extraction Based on Modulation Spectrum Analysis for Music Information Retrieval Tasks

  • Kim, Hyoung-Gook
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.6 no.2 / pp.95-106 / 2007
  • This paper proposes an effective tempo feature extraction method for music information retrieval. The tempo information is modeled by narrow-band temporal modulation components, which are decomposed into a modulation spectrum via joint frequency analysis. In the implementation, the tempo feature is extracted directly from the modified discrete cosine transform coefficients output by a partial MP3 (MPEG-1 Layer 3) decoder. Different features are then extracted from the amplitudes of the modulation spectrum and applied to different music information retrieval tasks. The logarithmic-scale modulation frequency coefficients are employed in automatic music emotion classification and music genre classification, and the classification precision of both systems improves significantly. The bit vectors derived from the adaptive modulation spectrum are used in an audio fingerprinting task and are shown to achieve high robustness in that application. The experimental results on these tasks validate the effectiveness of the proposed tempo feature.
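
A minimal sketch (not from the paper) of the general modulation-spectrum idea: take subband envelopes over time and apply a second FFT along the frame axis. Here the subband envelopes come from an ordinary magnitude spectrogram rather than partial-MP3 MDCT coefficients, which is an assumption made for illustration; all parameters are illustrative.

```python
import numpy as np

def modulation_spectrum(signal, frame=1024, hop=512, mod_len=64):
    """Return (n_subbands, n_mod_bins) modulation-spectrum amplitudes
    for a 1-D audio signal (assumed longer than `frame` samples)."""
    n_frames = 1 + (len(signal) - frame) // hop
    window = np.hanning(frame)
    # Subband envelopes: magnitude spectrogram, shape (frames, frequency bins)
    spec = np.abs(np.array([np.fft.rfft(window * signal[i*hop:i*hop+frame])
                            for i in range(n_frames)]))
    # Modulation spectrum: FFT along the time (frame) axis of each subband envelope
    n = min(mod_len, spec.shape[0])
    mod = np.abs(np.fft.rfft(spec[:n, :], axis=0))       # (n_mod_bins, n_subbands)
    return mod.T
```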

Scalable Hybrid Recommender System with Temporal Information (시간 정보를 이용한 확장성 있는 하이브리드 Recommender 시스템)

  • Ullah, Farman;Sarwar, Ghulam;Kim, Jae-Woo;Moon, Kyeong-Deok;Kim, Jin-Tae;Lee, Sung-Chang
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.12 no.2 / pp.61-68 / 2012
  • Recommender systems have gained much popularity among researchers and are applied in a number of applications. The exponential growth of users and products poses key challenges for recommender systems, which mostly suffer from scalability and accuracy problems; the accuracy of a recommender system tends to be inversely related to its scalability. In this paper we propose a context-aware hybrid recommender system that uses matrix reduction for the hybrid model and a clustering technique for prediction of item features. Our approach uses user item-feature ratings, user demographic information, and context information (specific time and day) to improve scalability and accuracy. The algorithm produces better results because it reduces the dimensionality of the item-feature matrix using different reduction techniques, incorporates user demographic information, constructs a context-aware hybrid user model, clusters similar users offline, finds the nearest neighbors, predicts item features, and recommends the top-N items.
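
A minimal sketch (not from the paper) of the pipeline the abstract outlines: reduce the item-feature rating matrix, cluster users offline, find nearest neighbors within a cluster, and recommend top-N items. It assumes scikit-learn and a toy rating matrix; demographic and context features are omitted for brevity.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
ratings = rng.integers(0, 6, size=(300, 40)).astype(float)   # users x item features (toy)

# 1) Dimensionality reduction of the item-feature matrix
reduced = TruncatedSVD(n_components=10, random_state=0).fit_transform(ratings)

# 2) Offline clustering of similar users
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(reduced)

# 3) Nearest neighbors within the target user's cluster
user = 0
members = np.where(clusters == clusters[user])[0]
knn = NearestNeighbors(n_neighbors=min(6, len(members))).fit(reduced[members])
_, idx = knn.kneighbors(reduced[[user]])
neighbors = members[idx[0][1:]]                               # drop the user itself

# 4) Predict item features from neighbors' ratings and recommend the top-N
pred = ratings[neighbors].mean(axis=0)
top_n = np.argsort(pred)[::-1][:5]
print("top-5 recommended item features:", top_n)
```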

A New Temporal Filtering Method for Improved Automatic Lipreading (향상된 자동 독순을 위한 새로운 시간영역 필터링 기법)

  • Lee, Jong-Seok;Park, Cheol-Hoon
    • The KIPS Transactions:PartB / v.15B no.2 / pp.123-130 / 2008
  • Automatic lipreading recognizes speech by observing the movement of a speaker's lips. It has recently received attention as a method of compensating for the performance degradation of acoustic speech recognition in acoustically noisy environments. One of the important issues in automatic lipreading is defining and extracting salient features from the recorded images. In this paper, we propose a feature extraction method that uses a new filtering technique to obtain improved recognition performance. The proposed method eliminates frequency components that are too slow or too fast compared to the relevant speech information by applying a band-pass filter to the temporal trajectory of each pixel in the images containing the lip region; features are then extracted by principal component analysis. We show through speaker-independent recognition experiments that the proposed method produces improved performance in both clean and visually noisy conditions.
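
A minimal sketch (not from the paper) of band-pass filtering each pixel's temporal trajectory and then applying PCA, assuming SciPy and scikit-learn; the filter order and passband edges are illustrative, not the paper's values.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.decomposition import PCA

def lip_features(frames, fps=30.0, low=0.5, high=8.0, n_components=20):
    """frames: (T, H, W) grayscale lip-region sequence -> (T, n_components) features."""
    T, H, W = frames.shape
    pixels = frames.reshape(T, H * W).astype(float)      # one temporal trajectory per pixel
    # Band-pass filter along the time axis of every pixel trajectory
    b, a = butter(4, [low / (fps / 2), high / (fps / 2)], btype="band")
    filtered = filtfilt(b, a, pixels, axis=0)
    # Dimensionality reduction of the filtered trajectories
    return PCA(n_components=n_components).fit_transform(filtered)
```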

A Reliable Protocol for Real-time Monitoring in Industrial Wireless Sensor Networks (산업 무선 센서 네트워크에서 실시간 모니터링을 위한 신뢰성 향상 기법)

  • Oh, Seungmin;Jung, Kwansoo
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.10 no.5 / pp.424-434 / 2017
  • In industrial wireless sensor networks, many applications require integrated QoS support. This paper proposes a reliable protocol for real-time monitoring in industrial wireless sensor networks. Retransmission is a well-known way to recover from transmission failures; however, it may introduce delays that violate the real-time requirement. The proposed protocol exploits the broadcasting feature of wireless networks and the concept of temporal opportunity: opportunities to relay data packets are shared through broadcasting, and the temporal opportunity concept maximizes the number of candidate relays in each communication. Simulation results show that the proposed protocol is superior to existing real-time protocols in terms of real-time service and reliability.
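
A minimal sketch (not the paper's protocol) of the general idea behind deadline-aware opportunistic relaying: a packet is broadcast, every neighbor that overhears it becomes a candidate, and the best candidate that can still meet the remaining time budget forwards it. The topology, per-hop delay, and loss probability are invented for illustration.

```python
import random

def opportunistic_relay(neighbors, deadline_ms, hop_delay_ms=10, loss=0.2):
    """Pick a relay among neighbors that overheard the broadcast and can still
    deliver within the deadline. Returns the chosen relay id, or None if the
    real-time requirement cannot be met."""
    # Broadcast: each neighbor overhears the packet independently
    candidates = [n for n in neighbors if random.random() > loss]
    # Temporal opportunity: keep candidates whose remaining path delay fits the deadline
    feasible = [n for n in candidates
                if hop_delay_ms * n["hops_to_sink"] <= deadline_ms - hop_delay_ms]
    if not feasible:
        return None
    # Prefer the candidate closest to the sink (largest forwarding progress)
    return min(feasible, key=lambda n: n["hops_to_sink"])["id"]

neighbors = [{"id": i, "hops_to_sink": random.randint(1, 6)} for i in range(5)]
print(opportunistic_relay(neighbors, deadline_ms=50))
```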

Speech detection from broadcast contents using multi-scale time-dilated convolutional neural networks (다중 스케일 시간 확장 합성곱 신경망을 이용한 방송 콘텐츠에서의 음성 검출)

  • Jang, Byeong-Yong;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.11 no.4 / pp.89-96 / 2019
  • In this paper, we propose a deep learning architecture that can effectively detect speech segments in broadcast content. We also propose a multi-scale time-dilated layer for learning the temporal changes of feature vectors. We implement several comparison models to verify the performance of the proposed model and calculate the frame-by-frame F-score, precision, and recall. Both the proposed model and the comparison models are trained on the same training data: 32 hours of Korean broadcast data composed of various genres (drama, news, documentary, and so on). The proposed model shows the best performance on the Korean broadcast data with an F-score of 91.7%, and also achieves the highest performance on British and Spanish broadcast data with F-scores of 87.9% and 92.6%, respectively. These results show that the proposed model can improve speech detection performance by learning the temporal changes of the feature vectors.
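
A minimal sketch (not the paper's exact layer) of a multi-scale time-dilated convolution block in PyTorch: parallel 1-D convolutions with different dilation rates over the time axis, concatenated so the model sees several temporal scales at once. Channel sizes and dilation rates are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleTimeDilatedBlock(nn.Module):
    """Parallel time-dilated 1-D convolutions over (batch, features, time)."""
    def __init__(self, in_ch=40, out_ch=32, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(in_ch, out_ch, kernel_size=3, dilation=d, padding=d)
            for d in dilations                      # padding=d keeps the time length fixed
        ])
        self.act = nn.ReLU()

    def forward(self, x):                           # x: (batch, in_ch, time)
        return self.act(torch.cat([b(x) for b in self.branches], dim=1))

frames = torch.randn(2, 40, 300)                    # e.g. 40-dim features, 300 frames
print(MultiScaleTimeDilatedBlock()(frames).shape)   # torch.Size([2, 128, 300])
```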