• Title/Summary/Keyword: Kaggle Dataset

Search Result 27, Processing Time 0.025 seconds

Multiple image classification using label mapping (레이블 매핑을 이용한 다중 이미지 분류)

  • Jeon, Seung-Je;Lee, Dong-jun;Lee, DongHwi
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.367-369
    • /
    • 2022
  • In this paper, the predicted results were confirmed by label mapping for each class while implementing multi-class image classification to confirm accurate results for images in which the trained model failed classification. A CNN model was constructed and trained using Kaggle's Intel Image Classification dataset, and the mapped label values of multiple classes of images and the values classified by the model were compared by label mapping the images of the test dataset.

  • PDF

Analysis of YouTube Trending Video Dataset by Country and Category (YouTube 인기 급상승 동영상 데이터셋의 국가별-카테고리별 분석)

  • Jung, Jimin;Kim, Seungjin;Jung, Sungwook;Lee, Dongyun
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.209-211
    • /
    • 2022
  • YouTube, a video platform used by millions of people worldwide, provides a rapidly growing video service. This study aims to understand the characteristics and cultural differences of each country using the Kaggle dataset, one of the public datasets, and to show the usefulness of the public dataset. For this purpose, we analyze data from 11 countries, 15 categories, and about 1.1 million trending videos. This study adopts Python to obtain the number of videos by category for data analysis, the selection period of videos rapidly increasing in popularity, and the ratio of unique videos. In the future, based on machine learning, we plan to research to help diagnose individual videos and establish channel operation plans and strategies by predicting the selection possibility and selection period based on machine learning.

  • PDF

On Building the Solar Dataset Form using the Kaggle Platform: The applicability of Machine Learning (캐글 플랫폼 활용한 태양광 데이터셋 형태 구축: 머신 러닝의 적용 가능성)

  • Ko, Ju-won;Park, Jung-jin;Park, Jin-woo;Oh, Do-hee;Kim, Mincheol
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.255-258
    • /
    • 2022
  • As environmental pollution continues, attention on renewable energy is on the constant rise in recent days. Although various kinds of renewable energy such as solar, wind power and biomass energy have been generated in Jeju, opening and analyzing cases on related data seem insufficient. Therefore, this study is being conducted to deduce the variables which have high relation with solar panel&s output and to understand machine learning methods that can be applied to solar power generation data by utilizing Kaggle platform, which is actively used by a number of scientists. Then, it is planned to propose a form of solar power generation dataset by researching machine learning methods that could be applied to the data. To be specific, analyzing solar power generation data with the Kaggle platform, this study will provide complements on gathering solar power data in Jeju. This study is anticipated to be utilized on data analysis for developing the solar power industry in Jeju. That is, this study is expected to reveal the room for improvement inherent in existing open datasets in Jeju, so that they could be constructed in a suitable form for machine learning for AI analytics. Through this process, a method to increase efficiency of solar power generation is anticipated to be prepared.

  • PDF

Smart Mirror for Facial Expression Recognition Based on Convolution Neural Network (컨볼루션 신경망 기반 표정인식 스마트 미러)

  • Choi, Sung Hwan;Yu, Yun Seop
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2021.05a
    • /
    • pp.200-203
    • /
    • 2021
  • This paper introduces a smart mirror technology that recognizes a person's facial expressions through image classification among several artificial intelligence technologies and presents them in a mirror. 5 types of facial expression images are trained through artificial intelligence. When someone looks at the smart mirror, the mirror recognizes my expression and shows the recognized result in the mirror. The dataset fer2013 provided by kaggle used the faces of several people to be separated by facial expressions. For image classification, the network structure is trained using convolution neural network (CNN). The face is recognized and presented on the screen in the smart mirror with the embedded board such as Raspberry Pi4.

  • PDF

An Improved Deep Learning Method for Animal Images (동물 이미지를 위한 향상된 딥러닝 학습)

  • Wang, Guangxing;Shin, Seong-Yoon;Shin, Kwang-Weong;Lee, Hyun-Chang
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2019.01a
    • /
    • pp.123-124
    • /
    • 2019
  • This paper proposes an improved deep learning method based on small data sets for animal image classification. Firstly, we use a CNN to build a training model for small data sets, and use data augmentation to expand the data samples of the training set. Secondly, using the pre-trained network on large-scale datasets, such as VGG16, the bottleneck features in the small dataset are extracted and to be stored in two NumPy files as new training datasets and test datasets. Finally, training a fully connected network with the new datasets. In this paper, we use Kaggle famous Dogs vs Cats dataset as the experimental dataset, which is a two-category classification dataset.

  • PDF

Improvement Method of Classification Rate in ML Antivirus systems using Kaggle Datasets (캐글 데이터셋을 이용한 머신러닝 악성코드 분류시스템에서 분류정확도 향상방법)

  • Kim, Kyungshin
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2019.07a
    • /
    • pp.49-52
    • /
    • 2019
  • 머신러닝을 이용한 악성코드 분류 시스템의 대부분이 캐글 데이터셋 10,868건을 사용하여 분류의 정확도를 측정한다. 이 데이터셋에 포함된 바이러스 바이트코드에는 미확인(undefined)필드라는 부분이 과도하게 존재한다. 캐글 데이터셋 특정 Label의 미확인필드 포함도는 75%가 넘는 경우도 존재한다. 이 경우 미확인 필드를 어떻게 처리하느냐가 시스템의 성능에 가장 큰 영향을 끼친다. 본 연구에서는 이러한 캐글 데이터셋의 미확인필드 처리방법을 제시하고 그에 따른 분류 정확도를 연구하였다. 다양한 처리방법에 대한 정확도를 측정하여 제안한 방식의 타당성을 증명하였다.

  • PDF

Feature Selection and Hyper-Parameter Tuning for Optimizing Decision Tree Algorithm on Heart Disease Classification

  • Tsehay Admassu Assegie;Sushma S.J;Bhavya B.G;Padmashree S
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.2
    • /
    • pp.150-154
    • /
    • 2024
  • In recent years, there are extensive researches on the applications of machine learning to the automation and decision support for medical experts during disease detection. However, the performance of machine learning still needs improvement so that machine learning model produces result that is more accurate and reliable for disease detection. Selecting the hyper-parameter that could produce the possible maximum classification accuracy on medical dataset is the most challenging task in developing decision support systems with machine learning algorithms for medical dataset classification. Moreover, selecting the features that best characterizes a disease is another challenge in developing machine-learning model with better classification accuracy. In this study, we have proposed an optimized decision tree model for heart disease classification by using heart disease dataset collected from kaggle data repository. The proposed model is evaluated and experimental test reveals that the performance of decision tree improves when an optimal number of features are used for training. Overall, the accuracy of the proposed decision tree model is 98.2% for heart disease classification.

Multi-type Image Noise Classification by Using Deep Learning

  • Waqar Ahmed;Zahid Hussain Khand;Sajid Khan;Ghulam Mujtaba;Muhammad Asif Khan;Ahmad Waqas
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.7
    • /
    • pp.143-147
    • /
    • 2024
  • Image noise classification is a classical problem in the field of image processing, machine learning, deep learning and computer vision. In this paper, image noise classification is performed using deep learning. Keras deep learning library of TensorFlow is used for this purpose. 6900 images images are selected from the Kaggle database for the classification purpose. Dataset for labeled noisy images of multiple type was generated with the help of Matlab from a dataset of non-noisy images. Labeled dataset comprised of Salt & Pepper, Gaussian and Sinusoidal noise. Different training and tests sets were partitioned to train and test the model for image classification. In deep neural networks CNN (Convolutional Neural Network) is used due to its in-depth and hidden patterns and features learning in the images to be classified. This deep learning of features and patterns in images make CNN outperform the other classical methods in many classification problems.

COVID-19: Improving the accuracy using data augmentation and pre-trained DCNN Models

  • Saif Hassan;Abdul Ghafoor;Zahid Hussain Khand;Zafar Ali;Ghulam Mujtaba;Sajid Khan
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.7
    • /
    • pp.170-176
    • /
    • 2024
  • Since the World Health Organization (WHO) has declared COVID-19 as pandemic, many researchers have started working on developing vaccine and developing AI systems to detect COVID-19 patient using Chest X-ray images. The purpose of this work is to improve the performance of pre-trained Deep convolution neural nets (DCNNs) on Chest X-ray images dataset specially COVID-19 which is developed by collecting from different sources such as GitHub, Kaggle. To improve the performance of Deep CNNs, data augmentation is used in this study. The COVID-19 dataset collected from GitHub was containing 257 images while the other two classes normal and pneumonia were having more than 500 images each class. There were two issues whike training DCNN model on this dataset, one is unbalanced and second is the data is very less. In order to handle these both issues, we performed data augmentation such as rotation, flipping to increase and balance the dataset. After data augmentation each class contains 510 images. Results show that augmentation on Chest X-ray images helps in improving accuracy. The accuracy before and after augmentation produced by our proposed architecture is 96.8% and 98.4% respectively.

RDNN: Rumor Detection Neural Network for Veracity Analysis in Social Media Text

  • SuthanthiraDevi, P;Karthika, S
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.12
    • /
    • pp.3868-3888
    • /
    • 2022
  • A widely used social networking service like Twitter has the ability to disseminate information to large groups of people even during a pandemic. At the same time, it is a convenient medium to share irrelevant and unverified information online and poses a potential threat to society. In this research, conventional machine learning algorithms are analyzed to classify the data as either non-rumor data or rumor data. Machine learning techniques have limited tuning capability and make decisions based on their learning. To tackle this problem the authors propose a deep learning-based Rumor Detection Neural Network model to predict the rumor tweet in real-world events. This model comprises three layers, AttCNN layer is used to extract local and position invariant features from the data, AttBi-LSTM layer to extract important semantic or contextual information and HPOOL to combine the down sampling patches of the input feature maps from the average and maximum pooling layers. A dataset from Kaggle and ground dataset #gaja are used to train the proposed Rumor Detection Neural Network to determine the veracity of the rumor. The experimental results of the RDNN Classifier demonstrate an accuracy of 93.24% and 95.41% in identifying rumor tweets in real-time events.