• Title/Summary/Keyword: Deep Visual Features

Speech Emotion Recognition Using 2D-CNN with Mel-Frequency Cepstrum Coefficients

  • Eom, Youngsik;Bang, Junseong
    • Journal of information and communication convergence engineering / v.19 no.3 / pp.148-154 / 2021
  • With the advent of context-aware computing, many attempts have been made to understand emotions. Among these, Speech Emotion Recognition (SER) recognizes a speaker's emotions from speech information. The success of SER depends on selecting distinctive features and classifying them in an appropriate way. In this paper, the performance of SER using neural network models (e.g., a fully connected network (FCN) and a convolutional neural network (CNN)) with Mel-Frequency Cepstral Coefficients (MFCC) is examined in terms of the accuracy and distribution of emotion recognition. On the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset, after tuning model parameters, a two-dimensional Convolutional Neural Network (2D-CNN) model with MFCC showed the best performance, with an average accuracy of 88.54% over five emotions (anger, happiness, calm, fear, and sadness) spoken by men and women. In addition, an examination of the distribution of emotion recognition accuracies shows that the 2D-CNN with MFCC can be expected to achieve an overall accuracy of 75% or more.
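
The abstract's pipeline, MFCCs fed to a small 2D-CNN, can be sketched as below. This is a minimal illustration, not the authors' architecture: the librosa parameters, layer widths, and pooling are assumptions; only the MFCC input and the five-class output follow the abstract.

```python
import librosa
import torch
import torch.nn as nn

def mfcc_image(wav_path, n_mfcc=40):
    """Load audio and compute an MFCC map shaped (1, n_mfcc, time)."""
    y, sr = librosa.load(wav_path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return torch.from_numpy(mfcc).float().unsqueeze(0)

class Emotion2DCNN(nn.Module):
    """Illustrative 2D-CNN over MFCC maps; 5 emotion classes as in the abstract."""
    def __init__(self, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),  # fixed-size summary of variable-length audio
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):                  # x: (batch, 1, n_mfcc, time)
        return self.classifier(self.features(x).flatten(1))
```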

Bottleneck-based Siam-CNN Algorithm for Object Tracking

  • Lim, Su-Chang;Kim, Jong-Chan
    • Journal of Korea Multimedia Society / v.25 no.1 / pp.72-81 / 2022
  • Visual object tracking is one of the most fundamental problems in the field of computer vision: it localizes the target object with a bounding box in each frame of a video. In this paper, a custom CNN is designed to extract strong and diverse object features, and it is constructed as a Siamese network for use as a feature extractor. The input images are passed through convolution blocks composed of bottleneck layers, which emphasize the features. The feature maps of the target object and the search area, extracted by the Siamese network, are fed into a local proposal network, which estimates the object area from the feature maps. The performance of the tracking algorithm was evaluated on the OTB2013 dataset, using the success plot and precision plot as evaluation metrics. In the experiments, the algorithm achieved 0.611 on the success plot and 0.831 on the precision plot.
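
A minimal sketch of the two ideas named in the abstract, a bottleneck convolution block and a shared (Siamese) feature extractor, is given below. The channel sizes are assumptions, and a plain cross-correlation between template and search features stands in for the proposal stage (batch size 1 assumed).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Bottleneck(nn.Module):
    """1x1 reduce -> 3x3 -> 1x1 expand, with a residual connection."""
    def __init__(self, ch, mid):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(),
            nn.Conv2d(mid, mid, 3, padding=1), nn.BatchNorm2d(mid), nn.ReLU(),
            nn.Conv2d(mid, ch, 1), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return F.relu(x + self.body(x))

class SiamBackbone(nn.Module):
    """One shared extractor applied to both the template and the search image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            Bottleneck(64, 16), Bottleneck(64, 16),
        )

    def forward(self, template, search):
        z = self.net(template)  # (1, C, Hz, Wz) -- batch size 1 assumed
        x = self.net(search)    # (1, C, Hx, Wx)
        # Cross-correlation response map; the peak marks the target location.
        return F.conv2d(x, z)
```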

Task Planning Algorithm with Graph-based State Representation

  • Seongwan Byeon;Yoonseon Oh
    • The Journal of Korea Robotics Society / v.19 no.2 / pp.196-202 / 2024
  • The ability to understand a given environment and plan a sequence of actions leading to a goal state is crucial for personal service robots. With recent advances in deep learning, numerous studies have proposed methods for state representation in planning. However, previous works lack explicit information about the relationships between objects when the state observation is converted into a single visual embedding containing all state information. In this paper, we introduce a graph-based state representation that incorporates both object and relationship features. To leverage these advantages for the task planning problem, we propose a Graph Neural Network (GNN)-based subgoal prediction model, which can extract rich information about objects and their interconnected relationships from a given state graph. Moreover, a search-based algorithm is integrated with the pre-trained subgoal prediction model and a state transition module to explore diverse states and find a proper sequence of subgoals. The proposed method is trained on a synthetic task dataset collected in a simulation environment and demonstrates a higher success rate with fewer additional searches than baseline methods.
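
A minimal sketch of a graph-based state and one round of GNN message passing for subgoal prediction is shown below. The feature sizes, the single GRU update, and the subgoal vocabulary size are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class GraphSubgoalNet(nn.Module):
    """One message-passing round over an object graph, then a subgoal head."""
    def __init__(self, node_dim=32, edge_dim=8, n_subgoals=10):
        super().__init__()
        self.msg = nn.Linear(2 * node_dim + edge_dim, node_dim)
        self.upd = nn.GRUCell(node_dim, node_dim)
        self.head = nn.Linear(node_dim, n_subgoals)

    def forward(self, nodes, edges, edge_index):
        # nodes: (N, node_dim) object features; edges: (E, edge_dim)
        # edge_index: (2, E) long tensor of (source, target) node indices
        src, dst = edge_index
        m = torch.relu(self.msg(torch.cat([nodes[src], nodes[dst], edges], -1)))
        agg = torch.zeros_like(nodes).index_add_(0, dst, m)  # sum incoming messages
        nodes = self.upd(agg, nodes)                         # update node states
        return self.head(nodes.mean(dim=0))                  # graph-level subgoal logits
```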

Two person Interaction Recognition Based on Effective Hybrid Learning

  • Ahmed, Minhaz Uddin;Kim, Yeong Hyeon;Kim, Jin Woo;Bashar, Md Rezaul;Rhee, Phill Kyu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.2 / pp.751-770 / 2019
  • Action recognition is an essential task in computer vision due to its variety of prospective applications, such as security surveillance, machine learning, and human-computer interaction. The availability of more video data than ever before and the strong performance of deep convolutional neural networks also make it essential for action recognition in video. Unfortunately, limited hand-crafted video features and the scarcity of benchmark datasets make it challenging to address the multi-person action recognition task in video data. In this work, we propose a deep convolutional neural network-based Effective Hybrid Learning (EHL) framework for two-person interaction classification in video data. Our approach exploits a pre-trained network model (VGG16 from the University of Oxford Visual Geometry Group) and extends Faster R-CNN (a region-based convolutional neural network and state-of-the-art object detector). We combine a semi-supervised learning method with an active learning method to improve overall performance. Numerous types of two-person interactions exist in the real world, which makes this a challenging task. In our experiments, we consider a limited number of actions, such as hugging, fighting, linking arms, talking, and kidnapping, in two environments: simple and complex. We show that our trained model with an active semi-supervised learning architecture gradually improves performance. In the simple environment, using an Intelligent Technology Laboratory (ITLab) dataset from Inha University, accuracy increased to 95.6%, and in the complex environment it reached 81%. Compared to supervised learning methods, our method reduces data-labeling time on the ITLab dataset. We also conduct extensive experiments on human action recognition benchmarks such as the UT-Interaction and HMDB51 datasets and obtain better performance than state-of-the-art approaches.
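
The active semi-supervised loop the abstract describes can be sketched as below: confident predictions on unlabeled clips become pseudo-labels, while the least confident ones are queried for human annotation. This is a hedged illustration only; the thresholds are assumptions, the ImageNet-pretrained VGG16 head stands in for a fine-tuned interaction classifier, and the Faster R-CNN detection stage is omitted.

```python
import torch
from torchvision.models import vgg16, VGG16_Weights

# Pre-trained VGG16 backbone (stand-in for a fine-tuned interaction classifier).
model = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).eval()

def split_unlabeled(frames, query_thresh=0.6, pseudo_thresh=0.95):
    """Route unlabeled samples: human query vs. pseudo-label vs. leave aside."""
    with torch.no_grad():
        probs = torch.softmax(model(frames), dim=1)
    conf, pred = probs.max(dim=1)
    to_query = conf < query_thresh     # least confident -> send to annotator
    to_pseudo = conf >= pseudo_thresh  # most confident -> keep as pseudo-label
    return to_query, to_pseudo, pred
```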

Visual Model of Pattern Design Based on Deep Convolutional Neural Network

  • Jingjing Ye;Jun Wang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.2 / pp.311-326 / 2024
  • The rapid development of neural network technology has enabled big-data-driven models to reproduce the texture effects of complex objects. Owing to limitations in complex scenes, it is necessary to establish custom template matching and apply it to many fields of computer vision. Dependence on high-quality, small labeled-sample databases is weak, and machine learning systems that rely on deep feature connections perform the task of texture-effect inference relatively poorly. A neural-network-based style transfer algorithm collects and preserves pattern data, and extracts and modernizes pattern features; through this algorithmic model, the texture and color of patterns are more easily rendered and displayed digitally. In this paper, based on the texture-effect reasoning of custom template matching, the 3D visualization of the target is transformed into a 3D model. The similarity between the scene to be inferred and the user-defined template is calculated from a user-defined template of multi-dimensional external feature labels. A convolutional neural network is adopted to optimize the external area of the object, improving the sampling quality and computational performance of the sample pyramid structure. The results indicate that the proposed algorithm can accurately capture the salient target, suppress more noise, and improve the visualization results. The proposed deep convolutional neural network optimization algorithm shows good speed, data accuracy, and robustness. It can adapt to the computation of more task scenes, display redundant vision-related information in image conversion, and further improve the computational efficiency and accuracy of convolutional networks, which is of high significance for research on image information conversion.
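
One plausible reading of the custom template matching described above is scoring a scene against a user-defined template by the similarity of their deep features; a minimal sketch follows. The VGG16 backbone, global average pooling, and cosine similarity are all assumptions made for illustration, not the paper's method.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# Convolutional part of a pre-trained VGG16 as a generic deep-feature extractor.
backbone = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features.eval()

def template_similarity(scene, template):
    """Cosine similarity between pooled deep features of scene and template."""
    with torch.no_grad():
        fs = backbone(scene).mean(dim=(2, 3))     # (1, C) pooled scene features
        ft = backbone(template).mean(dim=(2, 3))  # (1, C) pooled template features
    return F.cosine_similarity(fs, ft).item()
```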

Do Galaxy Mergers Enhance Star Formation Rate in Nearby Galaxies?

  • Lim, Gu;Im, Myungshin;Choi, Changsu;Yoon, Yongmin
    • The Bulletin of The Korean Astronomical Society / v.42 no.1 / pp.50.1-50.1 / 2017
  • We present our study of the correlation between star formation rate (SFR) and the merging activity of nearby galaxies ($d < 150$ Mpc). Our study uses 265 UV-selected galaxies that are not classified as AGN. The UV selection is made using the GALEX Atlas of Galaxies (Gil de Paz+07) and the updated UV catalog of nearby galaxies (Bai+15). We use deep R-band optical images reaching a $1\sigma$ surface brightness detection limit of $\sim 27\,\mathrm{mag/arcsec^2}$ to classify merger features by visual inspection. We also estimate the unobscured SFR ($SFR_{NUV}$) and obscured SFR ($SFR_{W4}$) using the near-UV continuum and 22-micron mid-IR luminosity, respectively, as indicators of star-forming activity. The fraction of galaxies with merger features in each SFR bin is obtained to see how the merger fraction ($F_m$) changes as a function of SFR. As a result, for 203 late-type galaxies (LTGs), we find that the merger fraction increases from ~8% up to 50% with $SFR_{W4}$, while for 229 LTGs $SFR_{NUV}$ shows a relatively constant merger fraction (~18%). For early-type galaxies (ETGs), we find no significant correlation between $F_m$ and SFR (for both $SFR_{NUV}$ and $SFR_{W4}$). This result suggests that mergers are a main driver of the star-forming activity of UV-bright galaxies, especially obscured late types.
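
The binned merger-fraction measurement described above reduces to a simple computation, sketched below under assumed inputs: a per-galaxy SFR array and a boolean merger flag from the visual inspection.

```python
import numpy as np

def merger_fraction(log_sfr, is_merger, bin_edges):
    """Fraction of galaxies flagged as mergers in each SFR bin."""
    frac = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        sel = (log_sfr >= lo) & (log_sfr < hi)
        frac.append(is_merger[sel].mean() if sel.any() else np.nan)
    return np.array(frac)
```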

The Grid System of Women's Jeogori in Joseon Dynasty

  • Han, Eun-Hye
    • Journal of the Korean Society of Costume / v.62 no.6 / pp.200-217 / 2012
  • The purpose of this research is to examine the specificity of grids in order to define the characteristics of clothing styles of the Joseon Dynasty period. The significance of examining this specificity lies in identifying, one by one, the arbitrary types of grid features involved in structuring the Jeogori of the period. The Visual Linguistic Theory was introduced as a methodological tool to precisely analyze the characteristics of grids in the deep structure of the Joseon-period Jeogori. This theory examines sample distribution, the distribution of samples by quality, and the distribution of types of ploidy features. The results are as follows. The grid systems of the Jeogori consisted of diverse proportion systems reaching 86 cases, that is, sequence systems composed of multi-functional, multi-combined bodies. Most ornamental grids had feature angles distributed in a range of $2-20^{\circ}$, showing a common preference for low-sloped diagonal lines or small curvature. Although the preference for certain feature angles was prominent, the feature angles used were generally distributed evenly among diverse angles, showing the characteristic of separation. Therefore, Jeogori makers of the Joseon Dynasty period can be considered to have experimented with many proportion systems to express their aesthetics. In conclusion, based on the examination of feature distributions and the related methods of allocating ploidy features, the O-type accounted for 66%, and thus the Jeogori was identified as characterized by the O-type. Therefore, the Jeogori of the Joseon Dynasty period was identified as consisting of O-type fractal structures, formative structures unique to Korea.

Blurred Image Enhancement Techniques Using Stack-Attention

  • Park Chae Rim;Lee Kwang Ill;Cho Seok Je
    • KIPS Transactions on Software and Data Engineering / v.12 no.2 / pp.83-90 / 2023
  • Blurred images are an important factor in lowering image recognition rates in computer vision. Blur mainly occurs when the camera is unstable or out of focus, or when an object in the scene moves quickly during the exposure time. Blurred images greatly degrade visual quality and weaken visibility, and this phenomenon occurs frequently despite the continuous development of digital camera technology. In this paper, we modify building modules based on a deep multi-patch neural network, designed with convolutional neural networks, to capture the details of input images, and apply attention techniques to focus on the objects in blurred images in various ways and enhance the image. Weights are measured and assigned at different scales to differentiate varying degrees of blur, and the image is restored from coarse to fine levels, adjusting global and local regions sequentially. This method shows excellent results: it recovers degraded image quality, enables efficient object detection and feature extraction, and complements color constancy.
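
A minimal sketch of an attention-weighted deblurring stage of the kind the abstract describes is given below: a learned spatial mask reweights feature maps so the network concentrates on blurred regions, and the stage predicts a residual correction. The block design and channel counts are assumptions, not the paper's module.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Reweight feature maps with a learned spatial attention mask in [0, 1]."""
    def __init__(self, ch):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(ch, ch // 4, 1), nn.ReLU(),
            nn.Conv2d(ch // 4, 1, 1), nn.Sigmoid(),
        )

    def forward(self, x):        # x: (B, C, H, W)
        return x * self.mask(x)  # emphasize attended spatial regions

class DeblurStage(nn.Module):
    """One coarse-to-fine stage: encode, attend, predict a residual image."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.att = SpatialAttention(ch)
        self.dec = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, blurred):
        return blurred + self.dec(self.att(self.enc(blurred)))
```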

Correlation between galaxy mergers and AGN activity

  • Hong, Ju-Eun;Im, Myung-Shin
    • The Bulletin of The Korean Astronomical Society / v.37 no.1 / pp.47.2-47.2 / 2012
  • Using deep images taken with the Maidanak 1.5 m telescope, the McDonald 2.1 m telescope, the Canada-France-Hawaii Telescope, and the du Pont 2.5 m telescope, we investigated the fraction of merging galaxies among the hosts of 39 AGNs brighter than M = -22 mag and nearer than z = 0.3. We found that 16 to 17 of the 39 AGN host galaxies show evidence of mergers, such as tidal tails and shells, via careful visual inspection. We also studied the merging fraction of a control sample, SDSS Stripe 82 early-type galaxies whose surface brightness limit and bulge magnitudes are similar to those of the AGN sample. We found that the merging fraction of the AGN sample is higher than that of the early-type galaxy sample over the whole range of bulge magnitude. This result implies that AGN activity may be correlated with merging. We also investigated the detailed morphology of the merging features. At least ~1/4 of the control sample galaxies classified as tidal or tidal+dust show shell structures. On the other hand, only one (5.9%) of the AGN sample classified as mergers shows a shell structure, and almost all merging AGNs show tidal tail features. Given that tidal tails may mark the early stage of merging and shells the late stage, this result suggests that AGNs might evolve into early-type galaxies after merging.

Correlation between galaxy mergers and AGN activity

  • Hong, Ju-Eun;Im, Myung-Shin
    • The Bulletin of The Korean Astronomical Society / v.36 no.2 / pp.79-79 / 2011
  • Using deep images taken with the Maidanak 1.5 m telescope, the McDonald 2.1 m telescope, and the Canada-France-Hawaii Telescope, we investigated the fraction of merging galaxies among the hosts of 26 AGNs brighter than M = -22.2 mag and nearer than z = 0.2. We found that 9 to 12 of the 26 AGN host galaxies show evidence of mergers, such as tidal tails and shells, via visual inspection. We also studied the merging fraction of a control sample, SDSS Stripe 82 galaxies whose surface brightness limit and magnitudes are similar to those of the AGN sample. We found that the merging fraction of the AGN sample is higher than that of the normal galaxy sample. This result implies that AGN activity may be correlated with merging. We also investigated the detailed morphology of the merging features. About ~1/4 of the control sample galaxies classified as tidal or tidal+dust show shell structures. On the other hand, only one of the AGN sample shows a shell structure; almost all merging AGNs show tidal tail features. Given that tidal tails may mark the early stage of merging and shells the late stage, this result implies that AGNs may evolve into early-type galaxies after merging.
