Title/Summary/Keyword: Visual Classification


Multi-Modal based ViT Model for Video Data Emotion Classification (영상 데이터 감정 분류를 위한 멀티 모달 기반의 ViT 모델)

  • Yerim Kim;Dong-Gyu Lee;Seo-Yeong Ahn;Jee-Hyun Kim
    • Proceedings of the Korean Society of Computer Information Conference / 2023.01a / pp.9-12 / 2023
  • Recently, in video content, not only the message of the video itself but also the emotion conveyed through the form of the message affects the psychological state of viewers. Accordingly, research on classifying the emotions of video content is being actively conducted, and this paper proposes a multi-modal video emotion classification model that classifies videos from YouTube, one of the most popular video streaming platforms, into seven emotion categories, extracting audio and image data separately from each video and using them for training. Pre-trained VGG (Visual Geometry Group) and ViT (Vision Transformer) models are used as the audio and image classification models respectively; after training, the two are merged using the merging method proposed in this paper and compared. Unlike existing video emotion classification approaches, this paper classifies emotions without recognizing the speaker in the video, achieving a maximum accuracy of 48%.
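
The abstract describes late fusion of a pretrained VGG audio branch and a ViT image branch. The sketch below illustrates that idea in PyTorch, assuming torchvision's pretrained models and treating log-mel spectrograms as 3-channel 224×224 images; averaging the two branches' logits is a stand-in, since the paper's actual merge method is not detailed in the abstract.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16, vit_b_16

class LateFusionEmotionClassifier(nn.Module):
    """Illustrative audio+image fusion: VGG on spectrograms, ViT on frames."""
    def __init__(self, num_emotions=7):
        super().__init__()
        self.audio_net = vgg16(weights="IMAGENET1K_V1")      # spectrogram branch
        self.audio_net.classifier[6] = nn.Linear(4096, num_emotions)
        self.image_net = vit_b_16(weights="IMAGENET1K_V1")   # frame branch
        self.image_net.heads.head = nn.Linear(768, num_emotions)

    def forward(self, spectrogram, frame):
        # Average the branch logits as a simple placeholder for the
        # paper's merge method.
        return (self.audio_net(spectrogram) + self.image_net(frame)) / 2

model = LateFusionEmotionClassifier()
logits = model(torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224))
print(logits.argmax(dim=1))  # index into the 7 emotion categories
```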


The Development and Application of Biotop Value Assessment Tool(B-VAT) Based on GIS to Measure Landscape Value of Biotop (GIS 기반 비오톱 경관가치 평가도구(B-VAT)의 개발 및 적용)

  • Cho, Hyun-Ju;Ra, Jung-Hwa;Kwon, Oh-Sung
    • Journal of Korean Society of Rural Planning / v.18 no.4 / pp.13-26 / 2012
  • The purpose of this study is to select a study area, which will be developed into Daegu Science Park as a national industrial complex, to assess its landscape value based on a biotop classification with polygons of different forms, and to develop and computerize a GIS-based Biotop Value Assessment Tool (B-VAT). The results are as follows. First, according to the biotop classification based on an analysis of preliminary data, a field study, and a literature review, a total of 13 biotop groups, such as forest biotop groups, and a total of 63 biotop types were identified. Second, building on prior research on landscape value assessment models for biotops, we developed the biotop value assessment tool using the Visual Basic programming language on ArcGIS. The first application of B-VAT classified 19 types, including riverside forest (BE), into the first grade; 12 types, including artificial plantation (ED), into the second grade; and 12, 2, and 18 types into the third, fourth, and fifth grades respectively. Also, in the second evaluation based on these results, we delineated a total of 31 areas of special meaning for landscape conservation (1a, 1b) and 34 areas of meaning for landscape conservation (2a, 2b, 2c). The biotop type classification and landscape value evaluation suggested in this study will help to scientifically understand the landscape value of a target site before undertaking reckless development, and will provide important preliminary data for restoring landscapes damaged by development and for managing landscape planning in the future. In particular, we expect that the GIS-based B-VAT will help overcome the limited applicability of current value evaluation models, which are based on complicated algorithms, and will greatly improve convenience and accessibility, while saving time and improving accuracy compared to hand-counting. However, this study was limited to the aesthetic-visual part of biotop assessment; future research should therefore conduct a comprehensive assessment that also covers conservation and recreation.
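
As a rough illustration of what a GIS-based value assessment tool does (the paper's tool is written in Visual Basic on ArcGIS), the following Python sketch grades biotop polygons with geopandas; the column names and the weighted-sum scoring rule are hypothetical, not the paper's assessment model.

```python
import geopandas as gpd

def grade_biotops(shp_path):
    """Assign five landscape-value grades to biotop polygons.
    'naturalness' and 'rarity' (0-1 scores) are invented attributes."""
    gdf = gpd.read_file(shp_path)
    score = 0.6 * gdf["naturalness"] + 0.4 * gdf["rarity"]
    # Bin the continuous score into grades 1 (highest value) to 5.
    gdf["grade"] = 5 - (score * 5).clip(0, 4.999).astype(int)
    return gdf

# graded = grade_biotops("biotops.shp"); graded.plot(column="grade")
```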

Design of User Concentration Classification Model by EEG Analysis Based on Visual SCPT

  • Park, Jin Hyeok;Kang, Seok Hwan;Lee, Byung Mun;Kang, Un Gu;Lee, Young Ho
    • Journal of the Korea Society of Computer and Information / v.23 no.11 / pp.129-135 / 2018
  • In this study, we designed a model that can measure a user's level of concentration by measuring and analyzing the EEG data of subjects performing a Continuous Performance Test (CPT) based on visual stimuli. The study focused on alpha and beta waves, which, among the various brain waves, are closely related to concentration. There is much research, and there are many services, aimed at enhancing not only concentration but brain activity in general; however, high cost and complex procedures remain formidable barriers to routine use by ordinary people. Therefore, this study designed the model around a reasonably priced portable EEG measurement device and a Visual Continuous Performance Test that we developed as a simplified version of the existing CPT. This study aims to measure a subject's concentration level objectively through a simple and affordable means, EEG analysis. Concentration is also closely related to various brain diseases such as dementia, depression, and ADHD; we therefore believe that the proposed model can be useful not only for improving concentration but also for brain disease prediction and monitoring research. In addition, combining this model with Brain-Computer Interface technology could create greater synergy in various fields.
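
As a sketch of the kind of EEG analysis the abstract refers to, the snippet below estimates alpha- and beta-band power with SciPy and derives a simple concentration index; the beta/(alpha+beta) ratio is a common heuristic and an assumption here, not the paper's exact scoring rule.

```python
import numpy as np
from scipy.signal import welch

def band_power(eeg, fs, lo, hi):
    """Power of one EEG channel in a frequency band via Welch's PSD."""
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)
    mask = (freqs >= lo) & (freqs <= hi)
    return np.trapz(psd[mask], freqs[mask])

def concentration_index(eeg, fs=256):
    # Heuristic: more beta (13-30 Hz) relative to alpha (8-13 Hz)
    # is read as higher concentration.
    alpha = band_power(eeg, fs, 8, 13)
    beta = band_power(eeg, fs, 13, 30)
    return beta / (alpha + beta)

print(concentration_index(np.random.randn(256 * 10)))  # 10 s synthetic signal
```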

Aural-visual two-stream based infant cry recognition (Aural-visual two-stream 기반의 아기 울음소리 식별)

  • Bo, Zhao;Lee, Jonguk;Atif, Othmane;Park, Daihee;Chung, Yongwha
    • Proceedings of the Korea Information Processing Society Conference / 2021.05a / pp.354-357 / 2021
  • Infants communicate their feelings and needs to the outside world through non-verbal methods such as crying and displaying diverse facial expressions. However, inexperienced parents tend to decode these non-verbal messages incorrectly and take inappropriate actions, which might affect the bond they build with their babies and the cognitive development of the newborns. In this paper, we propose an aural-visual two-stream based infant cry recognition system to help parents comprehend the feelings and needs of crying babies. The proposed system first extracts features from the pre-processed audio and video data using the VGGish model and a 3D-CNN model respectively, fuses the extracted features using a fully connected layer, and finally applies a softmax function to classify the fused features and recognize the corresponding type of cry. The experimental results show that the proposed system exceeds 0.92 in F1-score, which is 0.08 and 0.10 higher than the single-stream aural model and the single-stream visual model, respectively.
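
The fusion step the abstract describes (VGGish and 3D-CNN features joined by a fully connected layer, then softmax) can be sketched in PyTorch as below; the visual feature size and the number of cry classes are placeholders, since the abstract does not state them.

```python
import torch
import torch.nn as nn

class TwoStreamCryClassifier(nn.Module):
    """Fuses precomputed aural and visual embeddings with one FC layer."""
    def __init__(self, aural_dim=128, visual_dim=512, num_cry_types=5):
        # 128 matches VGGish embeddings; visual_dim and num_cry_types
        # are illustrative assumptions.
        super().__init__()
        self.fuse = nn.Linear(aural_dim + visual_dim, num_cry_types)

    def forward(self, aural_feat, visual_feat):
        fused = torch.cat([aural_feat, visual_feat], dim=1)
        return torch.softmax(self.fuse(fused), dim=1)

model = TwoStreamCryClassifier()
probs = model(torch.randn(1, 128), torch.randn(1, 512))  # per-class probabilities
```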

Visual Evaluation according to Changes in Length of Pants and Width of Hem Line of Wide Pants (와이드팬츠의 바지 길이와 바지 부리 폭 변화에 따른 시각적 평가)

  • Lee, Jung-Jin
    • Journal of the Korea Fashion and Costume Design Association / v.17 no.3 / pp.159-168 / 2015
  • In this study, a visual evaluation of wide pants with varying pants lengths and hem-line widths was conducted to provide data that can enhance the wearing-image effects in the production of wide pants. A total of 9 stimuli were prepared by varying the length of the pants and the width of the hem line. They were then evaluated on a seven-point rating scale by 40 fashion students. The data were analyzed by factor analysis, ANOVA, Scheffé's test, and the MCA method. The results of the study are as follows: 1. According to the factor analysis, the components of visual evaluation depending on pants length and hem-line width were divided into five factors: individuality, body correction, modesty, body length, and cuteness. 2. In the visual evaluation depending on changes in pants length, no significant differences were found in any of the five factors. 3. In the visual evaluation depending on changes in both pants length and hem-line width, hem-line widths of 60 and 100 revealed a significant difference in body correction, and a hem-line width of 80 revealed significant differences in body correction and body length. 4. No interaction effects between pants length and hem-line width were found in any of the five factors. According to the multiple classification analysis (MCA) of the factors without interaction effects, pants length had more effect on the visual image in body correction, body length, and cuteness, while the remaining factors were influenced more by the hem-line width.
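
For readers unfamiliar with the statistics used above, the snippet below runs the core test (a one-way ANOVA across stimulus groups) on synthetic seven-point ratings; the data are invented, and SciPy offers no built-in Scheffé post-hoc test, so only the ANOVA step is shown.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic ratings of one visual-image factor for three hem-line widths
# (the study's real data: 40 raters x 9 stimuli on a 7-point scale).
hem60, hem80, hem100 = (rng.integers(1, 8, 40) for _ in range(3))

f, p = stats.f_oneway(hem60, hem80, hem100)
print(f"one-way ANOVA: F={f:.2f}, p={p:.3f}")
# A significant p would then be followed by a Scheffé post-hoc test,
# as in the study.
```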


Efficient Object Classification Scheme for Scanned Educational Book Image (교육용 도서 영상을 위한 효과적인 객체 자동 분류 기술)

  • Choi, Young-Ju;Kim, Ji-Hae;Lee, Young-Woon;Lee, Jong-Hyeok;Hong, Gwang-Soo;Kim, Byung-Gyu
    • Journal of Digital Contents Society / v.18 no.7 / pp.1323-1331 / 2017
  • Although copyright has grown into a large-scale business, many problems persist, especially in image copyright. In this study, we propose an automatic object extraction and classification system for scanned educational book images that combines document image processing with intelligent information technology such as deep learning. First, the proposed technique removes noise components and then performs visual attention assessment-based region separation. We then carry out a grouping operation based on the extracted block areas and categorize each block as a picture or a character area. Finally, the caption area is extracted by searching around each classified picture area. The performance evaluation shows an average accuracy of 83% in the extraction of the image and caption areas; for image region detection alone, up to 97% accuracy is achieved.
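
A minimal OpenCV sketch of the block-level stage of such a pipeline follows: denoise, binarize, merge characters into blocks, and label each block by simple geometry. The size thresholds are invented heuristics; the paper itself uses visual attention assessment and deep learning for this step.

```python
import cv2
import numpy as np

def extract_blocks(page_path):
    """Split a scanned page into blocks labeled 'picture' or 'text'."""
    gray = cv2.imread(page_path, cv2.IMREAD_GRAYSCALE)
    gray = cv2.medianBlur(gray, 3)                     # noise removal
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Dilate so nearby characters merge into block-level regions.
    merged = cv2.dilate(binary, np.ones((9, 9), np.uint8), iterations=2)
    contours, _ = cv2.findContours(merged, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    blocks = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        # Invented heuristic: large regions count as pictures.
        kind = "picture" if w * h > 40000 and h > 100 else "text"
        blocks.append((x, y, w, h, kind))
    return blocks
```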

A Recognition Framework for Facial Expression by Expression HMM and Posterior Probability (표정 HMM과 사후 확률을 이용한 얼굴 표정 인식 프레임워크)

  • Kim, Jin-Ok
    • Journal of KIISE: Computing Practices and Letters / v.11 no.3 / pp.284-291 / 2005
  • I propose a framework for detecting, recognizing, and classifying facial features based on learned expression patterns. The framework recognizes facial expressions using PCA and an expression HMM (EHMM), a Hidden Markov Model (HMM) approach that represents both the spatial information and the temporal dynamics of time-varying visual expression patterns. Because low-level spatial feature extraction is fused with the temporal analysis, a unified spatio-temporal HMM approach to the common detection, tracking, and classification problems is effective. The proposed recognition framework works by applying the posterior probability between current visual observations and previous visual evidence. Consequently, the framework shows accurate and robust recognition results on simple expressions as well as on the six basic facial expression patterns. The method allows us to perform a set of important tasks such as facial-expression recognition, HCI, and key-frame extraction.
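
A compact way to realize the classification rule described above (pick the expression model that best explains the observed feature sequence) is one HMM per expression class, as sketched here with hmmlearn; with equal class priors, the highest posterior coincides with the highest model likelihood.

```python
import numpy as np
from hmmlearn import hmm

def train_models(sequences_by_class, n_states=3):
    """Fit one Gaussian HMM per expression class.
    sequences_by_class: {label: [arrays of shape (T_i, n_features)]}"""
    models = {}
    for label, seqs in sequences_by_class.items():
        m = hmm.GaussianHMM(n_components=n_states, n_iter=50)
        m.fit(np.vstack(seqs), lengths=[len(s) for s in seqs])
        models[label] = m
    return models

def classify(models, seq):
    # Equal priors: argmax posterior == argmax log-likelihood.
    return max(models, key=lambda label: models[label].score(seq))
```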

A new approach to classify barred galaxies based on the potential map

  • Lee, Yun Hee;Park, Myeong-Gu;Ann, Hong Bae;Kim, Taehyun;Seo, Woo-Young
    • The Bulletin of The Korean Astronomical Society / v.44 no.1 / pp.33.3-33.3 / 2019
  • Automatic yet reliable methods to find and classify barred galaxies are going to be more important in the era of large galaxy surveys. Here, we introduce a new approach to classifying barred galaxies by analyzing the butterfly pattern that Buta & Block (2001) reported as a bar signature on the potential map. We make the pattern easier to find by transforming the ratio map from Cartesian to polar coordinates. Our volume-limited sample consists of 1698 spiral galaxies brighter than Mr = -15.2 with z < 0.01 from the Sloan Digital Sky Survey/DR7, visually classified by Ann et al. (2015). We compared the classification results obtained by four different methods: visual inspection, ellipse fitting, Fourier analysis, and our new method. For the same sample, we obtain different bar fractions of 63%, 48%, 36%, and 56% by visual inspection, ellipse fitting, Fourier analysis, and our new approach, respectively. Although the automatic classifications detect visually determined, strongly barred galaxies with a concordance of 74% to 86%, the automatically selected barred galaxies contain different amounts of weak bars. We find a different dependence of the bar fraction on Hubble type for strong and weak bars: SBs are preponderant in early-type spirals, whereas SABs are in late-type spirals. Moreover, the ellipse fitting method often misses strongly barred galaxies among bulge-dominated galaxies. These findings explain why previous works showed contradictory dependences of the bar fraction on host galaxy properties. Our new method has the highest agreement with visual inspection in terms of both the individual classifications and the overall bar fraction. In addition, we find another signature on the ratio map that lets us divide barred galaxies into two new classes that are probably related to the age of the bar.
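
The coordinate transform at the heart of the method (remapping the ratio map so the butterfly pattern lines up along the angle axis) can be sketched with OpenCV's polar warp; the input here is random data standing in for a real force-ratio map.

```python
import cv2
import numpy as np

def to_polar(ratio_map):
    """Remap a ratio map from Cartesian to polar coordinates, so a bar's
    butterfly pattern becomes a periodic signal along the angle axis."""
    h, w = ratio_map.shape
    center = (w / 2, h / 2)
    return cv2.warpPolar(ratio_map.astype(np.float32), (w, h), center,
                         min(center), cv2.WARP_POLAR_LINEAR)

polar = to_polar(np.random.rand(256, 256))  # placeholder for a galaxy map
```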


Adversarial Example Detection Based on Symbolic Representation of Image (이미지의 Symbolic Representation 기반 적대적 예제 탐지 방법)

  • Park, Sohee;Kim, Seungjoo;Yoon, Hayeon;Choi, Daeseon
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.5 / pp.975-986 / 2022
  • Deep learning is attracting great attention, showing excellent performance in image processing, but it is vulnerable to adversarial attacks that cause a model to misclassify through perturbation of the input data. Adversarial examples generated by adversarial attacks are perturbed so minimally that they are difficult to identify, so the visual features of the images are generally unchanged. Unlike deep learning models, people are not fooled by adversarial examples, because they classify images based on such visual features. This paper proposes an adversarial attack detection method using Symbolic Representation, which captures visual and symbolic features such as the color and shape of an image. We detect an adversarial example by comparing the Symbolic Representation converted from the classification result for the input image with the Symbolic Representation extracted from the input image itself. Measuring performance on adversarial examples generated by various attack methods, the detection rate differed depending on the attack target and method, but reached up to 99.02% for a specific targeted attack.
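
A toy version of the detection idea, assuming OpenCV-style HSV images (uint8, hue 0-179): each class label carries an expected symbolic attribute, and a prediction whose input image violates it is flagged. The attribute table and the hue-only representation are illustrative; the paper's Symbolic Representation covers richer features such as shape.

```python
import numpy as np

EXPECTED_HUE = {"stop_sign": (0, 20), "grass": (35, 85)}  # hypothetical table

def dominant_hue(hsv_image):
    """Most frequent hue in an OpenCV-style HSV image (uint8, hue 0-179)."""
    return np.bincount(hsv_image[..., 0].ravel(), minlength=180).argmax()

def is_suspicious(hsv_image, predicted_label):
    # Flag the input when the model's predicted class and the image's
    # visible symbolic feature (dominant color) disagree.
    lo, hi = EXPECTED_HUE[predicted_label]
    return not (lo <= dominant_hue(hsv_image) <= hi)
```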

COVID-19 Diagnosis from CXR images through pre-trained Deep Visual Embeddings

  • Khalid, Shahzaib;Syed, Muhammad Shehram Shah;Saba, Erum;Pirzada, Nasrullah
    • International Journal of Computer Science & Network Security / v.22 no.5 / pp.175-181 / 2022
  • COVID-19 is an acute respiratory syndrome that affects the host's breathing and respiratory system. The first case of the novel disease was reported in 2019; it created a state of emergency across the whole world and was declared a global pandemic within months of the first case, bringing elements of a socioeconomic crisis globally. The emergency has made it imperative for professionals to take the necessary measures to diagnose the disease early. The conventional diagnosis for COVID-19 is Polymerase Chain Reaction (PCR) testing. However, in many rural societies these tests are not available or take a long time to provide results. Hence, we propose a COVID-19 classification system based on machine learning and transfer learning models. The proposed approach identifies individuals with COVID-19 and distinguishes them from healthy individuals with the help of Deep Visual Embeddings (DVE). Five state-of-the-art models, VGG-19, ResNet50, Inceptionv3, MobileNetv3, and EfficientNetB7, were used in this study along with five different pooling schemes to perform deep feature extraction. In addition, the features were normalized using standard scaling, and 4-fold cross-validation was used to validate performance over multiple versions of the validation data. The best results of 88.86% UAR, 88.27% specificity, 89.44% sensitivity, 88.62% accuracy, 89.06% precision, and 87.52% F1-score were obtained using ResNet-50 with average pooling and logistic regression with class weighting as the classifier.
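
The best-performing configuration reported above (ResNet-50 embeddings with average pooling, standard scaling, class-weighted logistic regression, 4-fold cross-validation) can be reproduced in outline as follows; the image array here is random placeholder data, not CXR images.

```python
import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Deep visual embeddings: ResNet50 without its top, global average pooling.
extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def embed(images):  # images: (n, 224, 224, 3) float array in [0, 255]
    return extractor.predict(preprocess_input(images), verbose=0)

X = embed(np.random.rand(16, 224, 224, 3) * 255)   # placeholder "CXR" batch
y = np.tile([0, 1], 8)                             # 0 = healthy, 1 = COVID-19
clf = make_pipeline(StandardScaler(),
                    LogisticRegression(class_weight="balanced", max_iter=1000))
print(cross_val_score(clf, X, y, cv=4))            # 4-fold cross-validation
```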