• Title/Summary/Keyword: VGG16

Two person Interaction Recognition Based on Effective Hybrid Learning

  • Ahmed, Minhaz Uddin; Kim, Yeong Hyeon; Kim, Jin Woo; Bashar, Md Rezaul; Rhee, Phill Kyu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.2 / pp.751-770 / 2019
  • Action recognition is an essential task in computer vision, with prospective applications in security surveillance, machine learning, and human-computer interaction. The unprecedented availability of video data and the strong performance of deep convolutional neural networks also make them a natural fit for action recognition in video. Unfortunately, the limitations of hand-crafted video features and the scarcity of benchmark datasets make multi-person action recognition in video a challenging task. In this work, we propose a deep convolutional neural network-based Effective Hybrid Learning (EHL) framework for two-person interaction classification in video. Our approach exploits a pre-trained network model (VGG16, from the University of Oxford Visual Geometry Group) and extends Faster R-CNN (region-based convolutional neural network), a state-of-the-art object detector. We combine a semi-supervised learning method with an active learning method to improve overall performance. Numerous types of two-person interactions exist in the real world, which makes this a challenging task. In our experiments, we consider a limited number of actions, such as hugging, fighting, linking arms, talking, and kidnapping, in two environments: simple and complex. We show that our trained model with an active semi-supervised learning architecture gradually improves performance. In a simple environment using the Intelligent Technology Laboratory (ITLab) dataset from Inha University, accuracy reached 95.6%; in a complex environment, it reached 81%. Compared to supervised learning methods, our method reduces data-labeling time for the ITLab dataset. We also conduct extensive experiments on human action recognition benchmarks, such as the UT-Interaction and HMDB51 datasets, and obtain better performance than state-of-the-art approaches.
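
As a rough sketch of the backbone pairing this abstract describes, the VGG16 convolutional trunk can serve as the feature extractor inside torchvision's Faster R-CNN. This is a minimal illustration, not the authors' EHL pipeline; the anchor settings and the class count (five interactions plus background) are assumptions.

```python
import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator

# Pre-trained VGG16 convolutional trunk as the detection backbone.
backbone = torchvision.models.vgg16(weights="IMAGENET1K_V1").features
backbone.out_channels = 512  # VGG16's last conv block emits 512 channels

anchors = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                          aspect_ratios=((0.5, 1.0, 2.0),))

# num_classes assumed: five interaction classes plus background.
model = FasterRCNN(backbone, num_classes=6, rpn_anchor_generator=anchors)

model.eval()
with torch.no_grad():
    detections = model([torch.rand(3, 480, 640)])  # boxes, labels, scores
```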

Application of Deep Learning-Based Nuclear Medicine Lung Study Classification Model (딥러닝 기반의 핵의학 폐검사 분류 모델 적용)

  • Jeong, Eui-Hwan; Oh, Joo-Young; Lee, Ju-Young; Park, Hoon-Hee
    • Journal of radiological science and technology / v.45 no.1 / pp.41-47 / 2022
  • The purpose of this study is to apply deep learning models that can distinguish lung perfusion from lung ventilation images in nuclear medicine, and to evaluate their image classification ability. Image data pre-processing was performed in the following order: image matrix size adjustment, min-max normalization, image center position adjustment, train/validation/test data set splitting, and data augmentation. The convolutional neural network (CNN) architectures VGG-16, ResNet-18, Inception-ResNet-v2, and SE-ResNeXt-101 were used. Model evaluation combined classification performance metrics, class activation maps (CAM), and a statistical image evaluation method. On the classification performance metrics, SE-ResNeXt-101 and Inception-ResNet-v2 tied for the highest performance. In the CAM analysis, the cardiac and right-lung regions were highly activated for lung perfusion images, while the upper-lung and neck regions were highly activated for lung ventilation images. The statistical image evaluation showed a meaningful difference between SE-ResNeXt-101 and Inception-ResNet-v2. These results confirm the applicability of CNN models to lung scintigraphy classification. We expect this work to serve as baseline data for research on new artificial intelligence models and to support stable image management in clinical practice.
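
A minimal sketch of the stated pre-processing order, assuming a 224×224 target matrix and a 70/15/15 split (neither is given in the abstract); the center-position adjustment step is omitted here.

```python
import numpy as np
from PIL import Image
from sklearn.model_selection import train_test_split

def preprocess(img: np.ndarray, size: int = 224) -> np.ndarray:
    # Min-max normalization to [0, 1] (guard against constant images).
    lo, hi = float(img.min()), float(img.max())
    img = (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img, np.float32)
    # Matrix size adjustment to the network input (224 is an assumption).
    resized = Image.fromarray((img * 255).astype(np.uint8)).resize((size, size))
    return np.asarray(resized, dtype=np.float32) / 255.0

# Train/validation/test split (ratios assumed).
# X: array of scans, y: 0 = perfusion, 1 = ventilation.
# X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.3, stratify=y)
# X_va, X_te, y_va, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5, stratify=y_tmp)
```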

A Novel Approach to COVID-19 Diagnosis Based on Mel Spectrogram Features and Artificial Intelligence Techniques

  • Alfaidi, Aseel; Alshahrani, Abdullah; Aljohani, Maha
    • International Journal of Computer Science & Network Security / v.22 no.9 / pp.195-207 / 2022
  • COVID-19 has remained one of the most serious health crises in recent history, resulting in the tragic loss of lives and significant economic impacts worldwide. The difficulty of controlling COVID-19 poses a threat to the global health sector. Given that Artificial Intelligence (AI) has improved research methods and solved problems across diverse fields of study, AI algorithms have also proven effective in disease detection and early diagnosis. In particular, acoustic features offer a promising prospect for the early detection of respiratory diseases. Motivated by these observations, this study proposes a speech-based diagnostic model to aid in COVID-19 diagnosis. The methodology uses speech signals from confirmed positive and negative cases of COVID-19 to extract features through the pre-trained Visual Geometry Group (VGG-16) model applied to Mel spectrogram images. A K-means algorithm then selects effective features, and a Genetic Algorithm-Support Vector Machine (GA-SVM) classifier classifies the cases. The experimental findings demonstrate that the proposed methodology can classify COVID-19 and non-COVID-19 speakers of varying ages and different languages. Because it relies on deep features followed by dimensionality reduction, it produces better and more consistent performance than the handcrafted features used in previous studies.
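
The front end described here, a Mel spectrogram pushed through pre-trained VGG-16 to obtain deep features, might look like the sketch below. The file name, sample rate, and resizing details are assumptions, and the K-means feature selection and GA-SVM stages are not reproduced.

```python
import librosa
import numpy as np
import torch
import torchvision

# Mel spectrogram in dB, scaled to [0, 1].
y, sr = librosa.load("speech_sample.wav", sr=16000)  # hypothetical file / rate
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)
mel_db = (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8)

# Replicate to 3 channels and resize to VGG-16's expected 224x224 input.
x = torch.tensor(mel_db, dtype=torch.float32)[None, None]
x = torch.nn.functional.interpolate(x, size=(224, 224)).repeat(1, 3, 1, 1)

vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").eval()
with torch.no_grad():
    deep_features = vgg.features(x).flatten(1)  # input to feature selection / SVM
```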

Humming: Image Based Automatic Music Composition Using DeepJ Architecture (허밍: DeepJ 구조를 이용한 이미지 기반 자동 작곡 기법 연구)

  • Kim, Taehun; Jung, Keechul; Lee, Insung
    • Journal of Korea Multimedia Society / v.25 no.5 / pp.748-756 / 2022
  • Thanks to the match between AlphaGo and Lee Sedol, machine learning has received worldwide attention and huge investment. Improved computing performance has greatly contributed to big data processing and the development of neural networks. Artificial intelligence not only imitates human beings in many fields, but sometimes even seems to surpass human capabilities. Although human creations are still considered superior, several artificial intelligences continue to challenge human creativity. Some creative outcomes produced by AI are as good as those produced by human beings, and are sometimes indistinguishable from them, because neural networks can learn and reproduce the common features contained in big data. To examine whether artificial intelligence can express the inherent characteristics of different arts, this paper proposes a new neural network model called Humming. It is an experimental model that combines VGG16, which extracts image features, and DeepJ's architecture, which excels at creating music in various genres. Experiments on a dataset we produced show meaningful and valid results. When the amount of data was increased, however, the results changed: the neural network produced similar patterns of music even for different classes of images, which was not what we were aiming for. Nevertheless, these attempts are significant as a starting point for feature transfer, which will be studied further.
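
One way to realize the image side of this combination is to take a VGG16 fc7 embedding as the conditioning vector handed to a DeepJ-style generator, as in this sketch; the image path is hypothetical and the generator itself is not reproduced.

```python
import torch
import torchvision
from torchvision import transforms
from PIL import Image

prep = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").eval()
img = prep(Image.open("scene.jpg").convert("RGB")).unsqueeze(0)  # hypothetical image
with torch.no_grad():
    f = vgg.avgpool(vgg.features(img)).flatten(1)
    embedding = vgg.classifier[:5](f)  # 4096-d fc7 activation
# `embedding` would condition the music generator in place of DeepJ's style input.
```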

Towards Low Complexity Model for Audio Event Detection

  • Saleem, Muhammad; Shah, Syed Muhammad Shehram; Saba, Erum; Pirzada, Nasrullah; Ahmed, Masood
    • International Journal of Computer Science & Network Security / v.22 no.9 / pp.175-182 / 2022
  • In daily life, we encounter many types of information, in formats such as multimedia and text, through routine activities like reading the news, listening to the radio, and watching videos. Problems arise, however, when a specific type of content is required: a listener who wants jazz, for example, may find that every radio channel plays pop music mixed with advertisements and give up searching. An automatic audio classification system can solve this problem. Deep Learning (DL) models can perform such classification, but they are expensive and difficult to deploy on edge devices such as the Nano BLE Sense or Raspberry Pi, because they typically require the computational power of a graphics processing unit (GPU). To address this problem, we propose a low-complexity DL model for Audio Event Detection (AED). We extract Mel spectrograms of dimension 128×431×1 from the audio signals and apply normalization. Three data augmentation methods are applied: frequency masking, time masking, and mixup. We then design a convolutional neural network (CNN) with spatial dropout, batch normalization, and separable 2D convolutions, inspired by VGGNet [1]. Finally, we reduce the model size by applying float16 quantization to the trained model. Experiments were conducted on the updated dataset provided by the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 challenge. Our model achieves a validation loss of 0.33 and an accuracy of 90.34% within a model size of 132.50 KB.
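
The float16 size reduction mentioned above follows the standard TensorFlow Lite post-training quantization recipe, sketched here with a hypothetical model path.

```python
import tensorflow as tf

# Trained separable-conv CNN (hypothetical path).
model = tf.keras.models.load_model("aed_cnn.h5")

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # store weights as float16
tflite_model = converter.convert()

with open("aed_cnn_f16.tflite", "wb") as f:
    f.write(tflite_model)  # compact model for edge deployment
```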

Sasang Constitution Classification using Convolutional Neural Network on Facial Images (콘볼루션 신경망 기반의 안면영상을 이용한 사상체질 분류)

  • Ahn, Ilkoo; Kim, Sang-Hyuk; Jeong, Kyoungsik; Kim, Hoseok; Lee, Siwoo
    • Journal of Sasang Constitutional Medicine / v.34 no.3 / pp.31-40 / 2022
  • Objectives Sasang constitutional medicine is a traditional Korean medicine that classifies humans into four constitutions in consideration of individual differences in physical, psychological, and physiological characteristics. In this paper, we propose a method to classify Taeeum person (TE) vs. Non-Taeeum person (NTE), Soeum person (SE) vs. Non-Soeum person (NSE), and Soyang person (SY) vs. Non-Soyang person (NSY) using a convolutional neural network with facial images only. Methods Based on the VGG16 convolutional neural network architecture, transfer learning was carried out on the facial images of 3738 subjects to classify TE vs. NTE, SE vs. NSE, and SY vs. NSY, with data augmentation techniques used to increase classification performance. Results The classification performance for TE vs. NTE, SE vs. NSE, and SY vs. NSY was 77.24%, 85.17%, and 80.18% by F1 score, and 80.02%, 85.96%, and 72.76% by area under the precision-recall curve (PR-AUC), respectively. Conclusions Soeum persons appear to have the facial features most distinct from the other constitutions, as the SE classifier performed best, followed by Taeeum and Soyang. The experimental results show that it is possible to classify constitutions with facial images alone. Performance is expected to increase with additional data such as BMI or personality questionnaires.
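
The transfer-learning setup described, a frozen VGG16 trunk with a new binary head per constitution-versus-rest task, can be sketched as follows; the augmentation choices and learning rate are assumptions.

```python
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms

# Augmentation for facial images (flip/rotation are assumptions).
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Freeze the pre-trained trunk; re-head for one binary task (e.g. TE vs. NTE).
model = torchvision.models.vgg16(weights="IMAGENET1K_V1")
for p in model.features.parameters():
    p.requires_grad = False
model.classifier[6] = nn.Linear(4096, 2)

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```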

Avocado Classification and Shipping Prediction System based on Transfer Learning Model for Rational Pricing (합리적 가격결정을 위한 전이학습모델기반 아보카도 분류 및 출하 예측 시스템)

  • Seong-Un Yu; Seung-Min Park
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.2 / pp.329-335 / 2023
  • The avocado, a late-ripening fruit selected as a superfood by Time magazine, is one of the foods with a large gap between local prices and domestic distribution prices. Automating the avocado sorting process would make lower prices possible by reducing labor costs at various stages. In this paper, we aim to build an optimal classification model by creating an avocado dataset through web crawling and applying a number of deep learning-based transfer learning models. Experiments were conducted by substituting each transfer learning model in turn, training on splits of the produced dataset, and fine-tuning the models' hyperparameters. Given an avocado image, the resulting model classifies its ripeness with an accuracy of over 99%. We therefore propose a dataset and algorithm that can reduce manpower and increase accuracy in avocado production and distribution.
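
One plausible reading of substituting several transfer-learning models is a loop over ImageNet backbones with a fresh classification head, as sketched below; the backbone list, the three ripeness classes, and the input size are assumptions.

```python
import tensorflow as tf

def build(backbone_fn, n_classes=3):  # ripeness classes assumed
    base = backbone_fn(include_top=False, weights="imagenet",
                       input_shape=(224, 224, 3), pooling="avg")
    base.trainable = False  # transfer learning: freeze pre-trained weights
    inputs = tf.keras.Input((224, 224, 3))
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(
        base(inputs, training=False))
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

candidates = {
    "VGG16": tf.keras.applications.VGG16,
    "MobileNetV2": tf.keras.applications.MobileNetV2,
    "ResNet50": tf.keras.applications.ResNet50,
}
models = {name: build(fn) for name, fn in candidates.items()}
# Each model is then fit on the train split and compared on the held-out split;
# per-backbone preprocess_input is omitted here for brevity.
```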

Enhancing Alzheimer's Disease Classification using 3D Convolutional Neural Network and Multilayer Perceptron Model with Attention Network

  • Enoch A. Frimpong; Zhiguang Qin; Regina E. Turkson; Bernard M. Cobbinah; Edward Y. Baagyere; Edwin K. Tenagyei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.11 / pp.2924-2944 / 2023
  • Alzheimer's disease (AD) is a neurological condition recognized as one of the primary causes of memory loss. AD currently has no cure, so an efficient, high-precision model for timely detection of the disease is essential; when AD is detected early, treatment is more likely to be successful. The most frequently used indicators for AD identification are the Mini-Mental State Examination (MMSE) and the Clinical Dementia Rating (CDR). However, using these indicators as ground-truth labels can be imprecise for AD detection. Researchers have proposed several computer-aided frameworks; lately, supervised models are the most widely used. In this study, we propose a novel 3D Convolutional Neural Network-Multilayer Perceptron (3D CNN-MLP) based model for AD classification. The model uses an attention mechanism to automatically extract relevant features from magnetic resonance images (MRI) and generate probability maps, which serve as input to the MLP classifier. Three MRI scan categories were considered: AD dementia patients, Mild Cognitive Impairment (MCI) patients, and Normal Control (NC) healthy subjects. The performance of the model is assessed by comparison against a basic CNN, VGG16, DenseNet, and other state-of-the-art works; these models were adjusted to fit the 3D images before the comparison was done. Our model exhibited excellent classification performance, with an accuracy of 91.27% for AD vs. NC, 80.85% for MCI vs. NC, and 87.34% for AD vs. MCI.
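
A minimal PyTorch sketch of the described pipeline: 3D convolutional features weighted by an attention-derived probability map and classified by an MLP. Channel widths, depths, and input resolution are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class CNN3DAttnMLP(nn.Module):
    """3D conv features -> voxel-wise attention map -> attention-pooled MLP head."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
        )
        self.attn = nn.Conv3d(32, 1, 1)  # per-voxel attention logits
        self.mlp = nn.Sequential(
            nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, n_classes),
        )

    def forward(self, x):                 # x: (B, 1, D, H, W)
        f = self.features(x)              # (B, 32, d, h, w)
        w = torch.sigmoid(self.attn(f))   # probability-map-style weights
        pooled = (f * w).flatten(2).sum(-1) / (w.flatten(2).sum(-1) + 1e-6)
        return self.mlp(pooled)           # (B, n_classes)

logits = CNN3DAttnMLP()(torch.rand(2, 1, 64, 64, 64))  # e.g. AD vs. NC
```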

Development of a Digital Otoscope-Stethoscope Healthcare Platform for Telemedicine (비대면 원격진단을 위한 디지털 검이경 청진기 헬스케어 플랫폼 개발)

  • Su Young Choi; Hak Yi; Chanyong Park; Subin Joo; Ohwon Kwon; Dongkyu Lee
    • Journal of Biomedical Engineering Research / v.45 no.3 / pp.109-117 / 2024
  • We developed a device that integrates a digital otoscope and a stethoscope for telemedicine. The integrated device was used to collect tympanic membrane images and cardiac auscultation data. Data accumulated on the platform server can support real-time, AI-assisted diagnosis of heart and eardrum diseases. Public data from Kaggle were used for deep learning. Among the various deep learning models compared, MobileNetV2 showed superior performance in analyzing tympanic membrane data, and VGG16 excelled in analyzing cardiac data. The classification algorithm achieved an accuracy of 89.9% on eardrum data and 100% on heart sound data. These results demonstrate that the platform makes it possible to diagnose diseases without the limitations of time and place.
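
A sketch of the per-modality model choice reported above: each ImageNet backbone has its final layer swapped for the binary clinical task. Rendering heart sounds as spectrogram images before classification is an assumption.

```python
import torch.nn as nn
import torchvision

def make_classifier(name: str, n_classes: int) -> nn.Module:
    # Swap the final layer of each ImageNet backbone for the clinical task.
    if name == "mobilenet_v2":
        m = torchvision.models.mobilenet_v2(weights="IMAGENET1K_V1")
        m.classifier[1] = nn.Linear(m.last_channel, n_classes)
    elif name == "vgg16":
        m = torchvision.models.vgg16(weights="IMAGENET1K_V1")
        m.classifier[6] = nn.Linear(4096, n_classes)
    else:
        raise ValueError(name)
    return m

# Per the abstract: MobileNetV2 for tympanic-membrane images, VGG16 for
# heart sounds (assumed here to be rendered as spectrogram images first).
eardrum_model = make_classifier("mobilenet_v2", n_classes=2)
heart_model = make_classifier("vgg16", n_classes=2)
```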

Exploring the Feasibility of Neural Networks for Criminal Propensity Detection through Facial Features Analysis

  • Amal Alshahrani; Sumayyah Albarakati; Reyouf Wasil; Hanan Farouquee; Maryam Alobthani; Someah Al-Qarni
    • International Journal of Computer Science & Network Security / v.24 no.5 / pp.11-20 / 2024
  • While artificial neural networks are adept at identifying patterns, they can struggle to distinguish actual correlations from false associations between extracted facial features and criminal behavior within the training data. These associations may not indicate causal connections. Socioeconomic factors, ethnicity, or even chance occurrences in the data can influence both facial features and criminal activity; consequently, an artificial neural network might identify linked features without understanding the underlying cause. This raises concerns about incorrect linkages and the potential misclassification of individuals based on features unrelated to criminal tendencies. To address this challenge, we propose a novel region-based training approach for artificial neural networks focused on criminal propensity detection. Instead of relying solely on overall facial recognition, the network systematically analyzes each facial feature in isolation. This fine-grained approach enables the network to identify which specific features hold the strongest correlations with criminal activity within the training data. By focusing on these key features, the network can be optimized for more accurate and reliable criminal propensity prediction. This study examines the effectiveness of various algorithms for criminal propensity classification: we evaluate two YOLO versions, YOLOv5 and YOLOv8, alongside VGG-16. Our findings indicate that YOLO achieved the highest accuracy (0.93) in classifying criminal and non-criminal facial features. While these results are promising, we acknowledge the need for further research on bias and misclassification in criminal justice applications.
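
Fine-tuning YOLOv8 in classification mode goes through the ultralytics package, roughly as below; the dataset path and training budget are assumptions.

```python
from ultralytics import YOLO

# YOLOv8 classification variant; the dataset folder follows the usual
# train/val ImageFolder layout (path is hypothetical).
model = YOLO("yolov8n-cls.pt")
model.train(data="faces_dataset", epochs=50, imgsz=224)

metrics = model.val()  # classification metrics on the validation split
print(metrics.top1)    # top-1 accuracy
```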