• Title/Summary/Keyword: Random Forest Classifier

Activity Type Detection Of Random Forest Model Using UWB Radar And Indoor Environmental Measurement Sensor (UWB 레이더와 실내 환경 측정 센서를 이용한 랜덤 포레스트 모델의 재실활동 유형 감지)

  • Park, Jin Su;Jeong, Ji Seong;Yang, Chul Seung;Lee, Jeong Gi
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.899-904 / 2022
  • As falling birth rates and rising life expectancy turn the world into an aging society, systems for managing the health of the elderly population are needed. Occupancy and activity type in particular are being studied widely for smart home care services that support indoor health management. In this paper, we propose a random forest model for smart home care services that classifies activity type as well as occupancy status from indoor temperature, humidity, CO2, and fine dust values together with UWB radar positioning. In the experiment, indoor environment and occupant positioning data are measured at 2-second intervals using three sensors that measure indoor temperature and humidity, CO2, and fine dust, plus two UWB radars. After outliers and missing values are corrected, the measured data is split into an 80% training set and a 20% test set, and the random forest model is evaluated in terms of variable importance, accuracy, sensitivity, and specificity.
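The evaluation steps named in this abstract (an 80/20 split followed by accuracy, sensitivity, and specificity) can be sketched in plain Python. This is a minimal illustration for the binary occupied/unoccupied case only, not the authors' code; the function names `split_80_20` and `binary_metrics`, the fixed seed, and the sample labels are assumptions made for the example.

```python
import random

def split_80_20(rows, seed=0):
    """Shuffle rows and split them into 80% training and 20% test data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed is an assumption, for repeatability
    cut = len(shuffled) * 8 // 10
    return shuffled[:cut], shuffled[cut:]

def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    return accuracy, sensitivity, specificity

# Hypothetical labels: 1 = occupied, 0 = unoccupied.
train, test = split_80_20(range(10))
acc, sens, spec = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                                 [1, 1, 0, 0, 0, 0, 0, 1])
```

For the multi-class activity-type task, sensitivity and specificity would typically be computed per class in a one-vs-rest fashion.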

A Study on Classification Models for Predicting Bankruptcy using XAI (XAI 를 활용한 기업 부도예측 분류모델 연구)

  • Kim, Jihong;Moon, Nammee
    • Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.571-573 / 2022
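  • Financial institutions have recently been strengthening differentiated services by drawing on their accumulated financial big data. Investing in corporate clients requires more precise corporate analysis. This study took 95 financial features for 6,819 Taiwanese companies and carried out data preprocessing, including handling class imbalance and standardizing the data. Nine classification models, including logistic regression, SVM, K-NN, naive Bayes, decision tree, and random forest, were trained on this data with 5-fold cross-validation, and their performance was compared. The best-performing classification model was then selected, and explainable artificial intelligence (XAI) was applied to it to explain and analyze the reasons behind its predictions. Through this study, we aim to present a system that automates the full life cycle of a classification prediction model, from data preprocessing to the explanation of model predictions.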
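The 5-fold cross-validation mentioned above partitions the samples into five folds and holds each out once for validation. The sketch below shows only the index bookkeeping, under the assumption of an unshuffled, unstratified split; the abstract does not say how the folds were formed, and with imbalanced classes a stratified split is a common choice. The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

# 6,819 is the number of companies reported in the abstract.
folds = list(k_fold_indices(6819, k=5))
```

Each model would be trained five times, once per `train` list, and its scores averaged over the five `validation` lists.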

A New Method for Engagement Analysis in Online Game using ECG (심전도를 이용한 온라인 게임 몰입 상태 분석 방법)

  • Kim, Young-Jin;Kang, Hang-Bong
    • Proceedings of the Korea Information Processing Society Conference / 2014.04a / pp.975-977 / 2014
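  • This paper presents a system that uses the electrocardiogram (ECG) to measure a user's state of engagement in a digital environment. Subjects report through a questionnaire whether they were engaged in each time unit, and the relationship between these reports and the measured ECG is analyzed with a classifier trained using random forest.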

Detecting Fake Job Recruitment with a Machine Learning Approach (머신 러닝 접근 방식을 통한 가짜 채용 탐지)

  • Taghiyev Ilkin;Jae Heung Lee
    • Smart Media Journal / v.12 no.2 / pp.36-41 / 2023
  • With the advent of applicant tracking systems, online recruitment has become more popular, and recruitment fraud has become a serious problem. This research aims to develop a reliable model for detecting recruitment fraud in online recruitment environments, in order to reduce financial losses and enhance privacy. The main contribution of this paper is an automated methodology that leverages insights from exploratory data analysis to distinguish fraudulent job postings from legitimate ones. Using EMSCAD, a recruitment fraud dataset provided by Kaggle, we trained and evaluated various machine learning models based on single classifiers and ensemble classifiers, and found that an ensemble classifier, the random forest classifier, performed best, with an accuracy of 98.67% and an F1 score of 0.81.
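An accuracy near 99% alongside an F1 score near 0.8 is a pattern that typically appears when the positive class is rare. The sketch below illustrates the arithmetic with hypothetical confusion-matrix counts; they are not taken from the paper or from EMSCAD, and the function name `accuracy_and_f1` is an assumption.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1

# Hypothetical counts: 50 fraudulent postings among 1,000.
accuracy, f1 = accuracy_and_f1(tp=40, fp=10, fn=10, tn=940)
```

Because true negatives dominate the accuracy but do not enter the F1 score, the two metrics can diverge widely on imbalanced data.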

Object Classification Using Point Cloud and True Ortho-image by Applying Random Forest and Support Vector Machine Techniques (랜덤포레스트와 서포트벡터머신 기법을 적용한 포인트 클라우드와 실감정사영상을 이용한 객체분류)

  • Seo, Hong Deok;Kim, Eui Myoung
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.37 no.6 / pp.405-416 / 2019
  • With advances in information and communication technology, data is being produced and processed ever faster. For classifying objects with machine learning, a field of artificial intelligence, the data required for training can now be collected easily thanks to the development of internet and geospatial information technology. In the field of geospatial information, machine learning is also being applied to classify or recognize objects using images and point clouds. In this study, the existing digital map version 1.0 was used to avoid constructing training data manually, and a technique for classifying roads, buildings, and vegetation using images and point clouds was proposed. Experiments showed that, with a true ortho-image containing only RGB (Red, Green, Blue) bands, roads, buildings, and vegetation could be classified when their colors were clearly distinguishable. When the objects to be classified had similar colors, however, classification was poor. To overcome this limitation, random forest and support vector machine techniques were applied after band fusion of the true ortho-image and a normalized digital surface model, and roads, buildings, and vegetation were classified with more than 85% accuracy.

Deep Learning based Scrapbox Accumulated Status Measuring

  • Seo, Ye-In;Jeong, Eui-Han;Kim, Dong-Ju
    • Journal of the Korea Society of Computer and Information / v.25 no.3 / pp.27-32 / 2020
  • In this paper, we propose an algorithm to measure the accumulated status of scrap boxes in which metal scraps are collected. Measuring the accumulated status is defined as a multi-class classification problem, and the deep learning method classifies the accumulated status using only an image of the scrap box. Training was carried out by transfer learning, with NASNet-A as the deep learning model. To improve the accuracy of the model, we combined a Random Forest classifier with the trained NASNet-A and refined the result through post-processing. Testing on 4,195 samples collected in the field showed 55% accuracy when only NASNet-A was applied, while the proposed method, NASNet with Random Forest, improved the accuracy to 88%.

Modeling and Selecting Optimal Features for Machine Learning Based Detections of Android Malwares (머신러닝 기반 악성 안드로이드 모바일 앱의 최적특징점 선정 및 모델링 방안 제안)

  • Lee, Kye Woong;Oh, Seung Taek;Yoon, Young
    • Proceedings of the Korea Information Processing Society Conference / 2019.05a / pp.164-167 / 2019
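  • As Android's share among mobile operating systems has grown, most mobile malware threats now arise on Android. However, as both benign and malicious apps evolve, approaches that judge maliciousness from a single type of feature, such as permissions, have run into validity problems; this paper seeks to overcome them by extracting diverse features and applying machine learning. Five types of features required for execution are extracted from APK files with the static analysis tool Androguard to characterize the training data. We also present three modeling methods based on the extracted key features. The first builds a model on a combination of 132 features hand-picked by security experts. The second takes the 8,004 features that make up the top 99% by occurrence frequency across the 7,000 apps in the training data, uses a random forest classifier to select the 300 with the highest feature importance, and builds a model on those. The last combines multiple models trained on the 300 features into a single weighted voting model. In the end, the weighted voting model, an ensemble algorithm model, improved accuracy to 97 percent and the false positive rate to 1.6%.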
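The second and third methods in this abstract rest on two simple operations: keeping the features with the highest importance scores, and combining several models' predictions by weighted vote. The sketch below illustrates both in plain Python. The function names, feature names, importance scores, and weights are all hypothetical; the abstract does not state how the voting weights were chosen.

```python
def top_k_features(importances, k):
    """Return the k feature names with the highest importance scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]

def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across models."""
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)

# Hypothetical importance scores, e.g. as produced by a random forest.
selected = top_k_features({"perm_sms": 0.4, "api_call": 0.1, "intent": 0.3}, k=2)

# Three hypothetical models vote on one app; the first model carries more weight.
label = weighted_vote(["malicious", "benign", "benign"], [0.6, 0.25, 0.25])
```

With these weights the single heavier model outvotes the two lighter ones, which is the difference between weighted voting and a plain majority vote.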

Investigating Opinion Mining Performance by Combining Feature Selection Methods with Word Embedding and BOW (Bag-of-Words) (속성선택방법과 워드임베딩 및 BOW (Bag-of-Words)를 결합한 오피니언 마이닝 성과에 관한 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of Digital Convergence / v.17 no.2 / pp.163-170 / 2019
  • Over the past decade, the growth of the Web has explosively increased the volume of data. Feature selection is an important step in extracting valuable data from a large amount of data. This study proposes a novel opinion mining model that combines feature selection (FS) methods with word embedding (Word2vec) and BOW (Bag-of-Words). The FS methods adopted for this study are CFS (Correlation-based FS) and IG (Information Gain). To select an optimal FS method, a number of classifiers were used, ranging from LR (logistic regression), NN (neural network), and NBN (naive Bayesian network) to RF (random forest), RS (random subspace), and ST (stacking). Empirical results with electronics and kitchen datasets showed that LR and ST classifiers combined with IG applied to BOW features yield the best performance in opinion mining. Results with laptop and restaurant datasets revealed that the RF classifier using IG applied to Word2vec features performs best in opinion mining.
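Information Gain, one of the two FS methods named above, scores a feature by how much knowing its value reduces the entropy of the class labels: IG = H(labels) - H(labels | feature). The sketch below is a minimal version for discrete features, not the authors' implementation; the function names and the toy sentiment data are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Toy data: does a word's presence (1) or absence (0) predict review sentiment?
labels = ["pos", "pos", "neg", "neg"]
informative = information_gain([1, 1, 0, 0], labels)    # word tracks the label exactly
uninformative = information_gain([1, 0, 1, 0], labels)  # word is unrelated to the label
```

Features are then ranked by this score and the highest-scoring ones are kept for the classifier.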

Applicability of Image Classification Using Deep Learning in Small Area : Case of Agricultural Lands Using UAV Image (딥러닝을 이용한 소규모 지역의 영상분류 적용성 분석 : UAV 영상을 이용한 농경지를 대상으로)

  • Choi, Seok-Keun;Lee, Soung-Ki;Kang, Yeon-Bin;Seong, Seon-Kyeong;Choi, Do-Yeon;Kim, Gwang-Ho
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.38 no.1 / pp.23-33 / 2020
  • High-resolution images can now be acquired easily with UAVs (Unmanned Aerial Vehicles), making it possible to observe small areas and produce spatial information at low cost. In particular, the generation of cover maps for crop production areas is being actively studied for monitoring the agricultural environment. Comparisons of classification performance among RF (Random Forest), SVM (Support Vector Machine) and CNN (Convolutional Neural Network) have shown that deep learning classification has many advantages in image classification. In particular, land cover classification of satellite images gains in accuracy and classification time from satellite image data sets and pre-trained parameters. However, UAV images differ from satellite images in characteristics such as spatial resolution, which makes these methods difficult to apply. To solve this problem, we studied deep learning algorithms that can be used to analyze agricultural lands in Korea, using UAV data sets of areas with small-scale composite cover. In this study, we applied DeepLab V3+, FC-DenseNet (Fully Convolutional DenseNets) and FRRN-B (Full-Resolution Residual Networks), state-of-the-art semantic image classification algorithms, to a UAV data set. As a result, DeepLab V3+ and FC-DenseNet achieved an overall accuracy of 97% and a Kappa coefficient of 0.92, higher than conventional classification methods. These results demonstrate the applicability of cover classification using UAV images of small areas.

Stock Price Direction Prediction Using Convolutional Neural Network: Emphasis on Correlation Feature Selection (합성곱 신경망을 이용한 주가방향 예측: 상관관계 속성선택 방법을 중심으로)

  • Kyun Sun Eo;Kun Chang Lee
    • Information Systems Review / v.22 no.4 / pp.21-39 / 2020
  • Recently, deep learning has shown high performance in various applications such as pattern analysis and image classification. Stock market forecasting, known as an especially difficult task in machine learning research, is an area where many researchers are verifying the effectiveness of deep learning techniques. This study proposed a deep learning Convolutional Neural Network (CNN) model to predict the direction of stock prices. We then used a feature selection method to improve the performance of the model, and compared the performance of machine learning classifiers against the CNN. The classifiers used in this study are Logistic Regression, Decision Tree, Neural Network, Support Vector Machine, Adaboost, Bagging, and Random Forest. The results confirmed that, when feature selection was applied, the CNN outperformed the other classifiers. The results show that the CNN model effectively predicted the stock price direction by analyzing the embedded values of the financial data.