• Title/Summary/Keyword: random forest classifier (랜덤포레스트 분류기)

Modeling and Selecting Optimal Features for Machine Learning Based Detections of Android Malwares (머신러닝 기반 안드로이드 모바일 악성 앱의 최적 특징점 선정 및 모델링 방안 제안)

  • Lee, Kye Woong;Oh, Seung Taek;Yoon, Young
    • KIPS Transactions on Software and Data Engineering / v.8 no.11 / pp.427-432 / 2019
  • In this paper, we propose three approaches to modeling Android malware. The first relies on human security experts to select feature sets meticulously. The second selects the 300 most important features from among the top 99% of features by occurrence rate. The third combines multiple models and identifies malware through weighted voting. In addition, we applied a novel method that eliminates permission information, which used to be regarded as a critical factor for distinguishing malware. With our carefully generated feature sets and weighted voting by the ensemble algorithm, we reached a highest malware detection accuracy of 97.8%. We also verified that discarding the permission information led to improvements in the false positive and false negative rates.
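The weighted voting mentioned in the abstract above can be illustrated with a minimal sketch. The function name `weighted_vote`, the labels, and the weights are hypothetical; the paper's actual models and weighting scheme are not given in this listing.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting models have the largest total weight.

    predictions: one predicted label per model (hypothetical models).
    weights: one non-negative weight per model, e.g. a validation accuracy.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    if not predictions:
        raise ValueError("need at least one prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # On a tie, the label that was seen first wins.
    return max(totals, key=totals.get)
```

With predictions `["malware", "benign", "benign"]` and weights `[0.9, 0.4, 0.4]`, the single strong model outweighs the two weaker ones and the call returns `"malware"`.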

Hand Gesture Recognition from Kinect Sensor Data (키넥트 센서 데이터를 이용한 손 제스처 인식)

  • Cho, Sun-Young;Byun, Hye-Ran;Lee, Hee-Kyung;Cha, Ji-Hun
    • Journal of Broadcast Engineering / v.17 no.3 / pp.447-458 / 2012
  • We present a method to recognize hand gestures using skeletal joint data obtained from Microsoft's Kinect sensor. To represent the observation sequence of skeletons, we propose a combined feature of multi-angle histograms that is robust to orientation variations. By combining multiple angle histograms with various angular quantization levels, the proposed feature efficiently represents the orientation variations of gestures that can occur depending on the person or environment. Representing gestures as a combination of multi-angle histograms and classifying them with a random decision forest improves recognition performance. We conduct experiments on a hand gesture dataset obtained from a Kinect sensor and show that our method outperforms other methods in recognition performance.
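As a rough illustration of combining angle histograms at several quantization levels, the sketch below histograms the movement direction of one 2D joint trajectory at 4, 8, and 16 bins and concatenates the results. The function names, the 2D simplification, and the bin counts are assumptions; the paper's exact feature definition is not given in this listing.

```python
import math


def angle_histogram(angles, bins):
    """Normalized histogram of angles (radians) over [0, 2*pi) with `bins` bins."""
    hist = [0] * bins
    width = 2 * math.pi / bins
    for a in angles:
        # Wrap into [0, 2*pi); the final modulo guards against rounding at the edge.
        idx = int((a % (2 * math.pi)) / width) % bins
        hist[idx] += 1
    total = sum(hist) or 1
    return [count / total for count in hist]


def multi_angle_feature(points, levels=(4, 8, 16)):
    """Concatenate direction histograms computed at several quantization levels.

    points: sequence of (x, y) positions of one joint over time (assumed input).
    """
    angles = [
        math.atan2(y1 - y0, x1 - x0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    feature = []
    for bins in levels:
        feature.extend(angle_histogram(angles, bins))
    return feature
```

Coarse levels tolerate small orientation differences, while fine levels keep more directional detail; a feature vector of this kind could then be passed to a random forest classifier.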

A Method for Identifying Nicknames of a User based on User Behavior Patterns in an Online Community (온라인 커뮤니티 사용자의 행동 패턴을 고려한 동일 사용자의 닉네임 식별 기법)

  • Park, Sang-Hyun;Park, Seog
    • Journal of KIISE / v.45 no.2 / pp.165-174 / 2018
  • An online community is a virtual group whose members share their interests and hobbies anonymously under nicknames, unlike social network services. However, online communities suffer from malicious users, such as those who write offensive content, and from data fragmentation, in which the data of the same user are spread across different nicknames. In addition, nicknames are changed frequently in online communities, which makes users difficult to identify. To remedy these problems, this paper proposes behavior pattern feature vectors for users that take online community characteristics into account, proposes a new implicit behavior pattern called the relationship pattern, and identifies the nicknames of the same user with a Random Forest classifier. Experimental results on collected real-world online community data demonstrate that the proposed behavior patterns and classifier can identify the same users at a meaningful level.

Smartphone Addiction Detection Based Emotion Detection Result Using Random Forest (랜덤 포레스트를 이용한 감정인식 결과를 바탕으로 스마트폰 중독군 검출)

  • Lee, Jin-Kyu;Kang, Hyeon-Woo;Kang, Hang-Bong
    • Journal of IKEEE / v.19 no.2 / pp.237-243 / 2015
  • Recently, eight out of ten people in Korea have a smartphone, and the number of smartphone applications has grown. As a result, smartphone addiction has become a social issue. In particular, many people with smartphone addiction cannot control themselves, and some do not realize that they are addicted. Many studies, mostly surveys such as the S-measure, have been conducted to diagnose smartphone addiction. In this paper, we propose a way to detect smartphone addiction based on ECG and eye gaze. We measure ECG signals with the Shimmer and eye gaze signals with the smart eye while the subjects watch an emotional video. We extract features from the S-transform of the ECG, and 12 features from the eye gaze signals (pupil diameter, gaze distance, eye blinking). The classifiers are trained using Random Forest and detect smartphone addiction from the ECG and eye gaze signals. We compared the detection results with the S-measure results surveyed before the test. The accuracy was 87.89% with ECG and 60.25% with eye gaze.

Comparison of resampling methods for dealing with imbalanced data in binary classification problem (이분형 자료의 분류문제에서 불균형을 다루기 위한 표본재추출 방법 비교)

  • Park, Geun U;Jung, Inkyung
    • The Korean Journal of Applied Statistics / v.32 no.3 / pp.349-374 / 2019
  • A class imbalance problem arises when one class outnumbers the other by a large proportion in binary data. Approaches such as transforming the training data have been studied to solve this imbalance problem. In this study, we compared resampling methods, one family of methods for dealing with imbalance in classification problems, seeking a way to detect the minority class more effectively. Through simulation, a total of 20 over-sampling, under-sampling, and combined over- and under-sampling methods were compared. Logistic regression, support vector machine, and random forest models, which are commonly used in classification problems, were used as classifiers. The simulation results showed that the random under-sampling (RUS) method had the highest sensitivity with an accuracy over 0.5. The next most sensitive method was the adaptive synthetic sampling approach, an over-sampling method. This indicates that the RUS method is suitable for finding minority class values. Results on several real data sets were similar to those of the simulation.
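Random under-sampling, the method the abstract above found most sensitive, can be sketched in a few lines: every class is cut down to the size of the smallest one by sampling without replacement. The function name and the fixed seed are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict


def random_under_sample(samples, labels, seed=0):
    """Randomly drop samples so every class has as many as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    n_min = min(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Sampling without replacement; the minority class is kept whole.
        for sample in rng.sample(group, n_min):
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels
```

The trade-off is that majority-class information is discarded, which tends to raise sensitivity to the minority class at some cost in overall accuracy.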

Development of Type 2 Prediction Prediction Based on Big Data (빅데이터 기반 2형 당뇨 예측 알고리즘 개발)

  • Hyun Sim;HyunWook Kim
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.5 / pp.999-1008 / 2023
  • Early prediction of chronic diseases such as diabetes is an important issue, and improving the accuracy of diabetes prediction is especially important. Various machine learning and deep learning-based methodologies are being introduced for diabetes prediction, but these techniques require large amounts of data to outperform other methodologies, and their training cost is high because of complex data models. In this study, we aim to verify the claim that a DNN using the Pima dataset and k-fold cross-validation reduces the efficiency of diabetes diagnosis models. Machine learning classification methods such as decision trees, SVM, random forests, logistic regression, KNN, and various ensemble techniques were used to determine which algorithm produces the best prediction results. After training and testing all classification models, the proposed system achieved its best results with the XGBoost classifier combined with the ADASYN method, with an accuracy of 81%, an F1 score of 0.81, and an AUC of 0.84. Additionally, a domain adaptation method was implemented to demonstrate the versatility of the proposed system. An explainable AI approach using the LIME and SHAP frameworks was implemented to understand how the model predicts the final outcome.

A Study on Pre-evaluation of Tree Species Classification Possibility of CAS500-4 Using RapidEye Satellite Imageries (농림위성 활용 수종분류 가능성 평가를 위한 래피드아이 영상 기반 시험 분석)

  • Kwon, Soo-Kyung;Kim, Kyoung-Min;Lim, Joongbin
    • Korean Journal of Remote Sensing / v.37 no.2 / pp.291-304 / 2021
  • Updating a forest type map is essential for sustainable forest resource management and monitoring to cope with climate change and various environmental problems. To meet the need for efficient, wide-area forestry remote sensing, the CAS500-4 (Compact Advanced Satellite 500-4; the agriculture and forestry satellite) project has been confirmed, with launch scheduled for 2023. Before CAS500-4 is launched and put to use, this study aimed to pre-evaluate the feasibility of satellite-based tree species classification using RapidEye, which has specifications similar to those of CAS500-4. The study area was the Chuncheon forest management complex in Gangwon-do. Spectral information was extracted from the growing-season image, and GLCM texture information was derived from the NIR bands of the growing and non-growing seasons. Both were used for classification with the random forest machine learning method. Tree species were classified into nine classes: coniferous trees (Korean red pine, Korean pine, Japanese larch), broad-leaved trees (Mongolian oak, Oriental cork oak, East Asian white birch, Korean Castanea, and other broad-leaved trees), and mixed forest. Finally, the classification accuracy was calculated by comparing the forest type map with the classification results. The accuracy was 39.41% when only spectral information was used and 69.29% when both spectral and texture information were used. In future work, the applicability of CAS500-4 can be improved by incorporating additional variables that more effectively reflect the ecological characteristics of vegetation.
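The GLCM (gray-level co-occurrence matrix) texture information mentioned above can be illustrated with a small sketch that counts pixel pairs at a fixed offset and derives the contrast statistic. The function names, the single directional offset, and the choice of contrast are assumptions for illustration; the abstract does not state which offsets or texture statistics the study used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized, non-symmetric co-occurrence matrix for the offset (dx, dy).

    image: 2D list of integer gray levels in range(levels).
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, matrix))
    if total == 0:
        return matrix
    return [[value / total for value in row] for row in matrix]


def contrast(p):
    """GLCM contrast: sum of p[i][j] * (i - j)**2, larger for rougher texture."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))
```

A band with uniform rows gives a contrast of 0 for the horizontal offset, while alternating gray levels give a higher value, which is the kind of signal that can separate canopy textures with similar spectra.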

Exploring the Performance of Synthetic Minority Over-sampling Technique (SMOTE) to Predict Good Borrowers in P2P Lending (P2P 대부 우수 대출자 예측을 위한 합성 소수집단 오버샘플링 기법 성과에 관한 탐색적 연구)

  • Costello, Francis Joseph;Lee, Kun Chang
    • Journal of Digital Convergence / v.17 no.9 / pp.71-78 / 2019
  • This study aims to identify good borrowers within the context of P2P lending. P2P lending is a growing platform that allows individuals to lend to and borrow from each other. Borrower credit risk is inherent in any loan and needs to be considered before lending. In the context of P2P lending specifically, traditional models fall short, so this study aimed to rectify this and to explore the problem of class imbalance seen in credit risk data sets. This study implemented an over-sampling technique known as the Synthetic Minority Over-sampling Technique (SMOTE). To test our approach, we implemented five benchmark classifiers: support vector machines, logistic regression, k-nearest neighbor, random forest, and deep neural network. The data sample was retrieved from the publicly available LendingClub dataset. The proposed SMOTE approach yielded significantly improved results in comparison with the benchmark classifiers. These results should help actors engaged in P2P lending make better-informed decisions when selecting potential borrowers, eliminating the higher risks present in P2P lending.
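The core idea of SMOTE named in the abstract above is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbors. The sketch below shows that step only; the function name, the default `k`, and the seed are illustrative assumptions, and it is not the authors' pipeline.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the base point, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the connecting segment.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic
```

Because new points lie on segments between existing minority samples, the minority region is filled in rather than duplicated, which is the usual motivation for preferring SMOTE over plain replication.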

Fraud Detection System Model Using Generative Adversarial Networks and Deep Learning (생성적 적대 신경망과 딥러닝을 활용한 이상거래탐지 시스템 모형)

  • Ye Won Kim;Ye Lim Yu;Hong Yong Choi
    • Information Systems Review / v.22 no.1 / pp.59-72 / 2020
  • Artificial intelligence is evolving from an intractable concept into a familiar tool. Following this trend, the financial sector is also looking to improve existing systems, including the Fraud Detection System (FDS). It is becoming difficult to detect sophisticated cyber financial fraud with the original rule-based FDS, because payment environments have diversified and the number of electronic financial transactions has increased. To overcome the limitations of the present FDS, this paper proposes three types of artificial intelligence models: Generative Adversarial Network (GAN), Deep Neural Network (DNN), and Convolutional Neural Network (CNN). GAN shows how the data imbalance problem can be alleviated, while DNN and CNN show how abnormal financial trading patterns can be detected precisely. In conclusion, among the experiments in this paper, WGAN shows the greatest improvement on the data imbalance problem, and the DNN model is comparatively more effective for fraud classification.

Wafer bin map failure pattern recognition using hierarchical clustering (계층적 군집분석을 이용한 반도체 웨이퍼의 불량 및 불량 패턴 탐지)

  • Jeong, Joowon;Jung, Yoonsuh
    • The Korean Journal of Applied Statistics / v.35 no.3 / pp.407-419 / 2022
  • The semiconductor fabrication process is complex and time-consuming. Errors sometimes occur in the process, which result in defective dies on the wafer bin map (WBM). We can detect a faulty WBM by finding patterns formed by the dies. Searching WBMs for failures manually takes a long time because of the enormous number of WBMs. In this paper, we suggest a two-step approach to discover the probable pattern on the WBMs. The first step separates the normal WBMs from the defective WBMs. We adapt hierarchical clustering for de-noising, which performs this task well when the minimum number of points and the cutting height are tuned carefully. A WBM declared faulty moves on to the next step. In the second step, we classify the patterns among the defective WBMs. For this purpose, we extract features from the WBM, and a machine learning algorithm then classifies the pattern. We use a real WBM data set (WM-811K) released by Taiwan Semiconductor Manufacturing Company.