Search | Korea Science

Finding a plan to improve recognition rate using classification analysis

Kim, SeungJae; Kim, SungHwan
- International Journal of Advanced Smart Convergence, v.9 no.4, pp.184-191, 2020
With the emergence of the Fourth Industrial Revolution, the core technologies expected to lead it, such as artificial intelligence (AI), big data, and the Internet of Things (IoT), have become topics of broad public interest. In particular, there are growing attempts to build new models through big data analysis of data collected in a specific field, and to use those models to infer and predict new values and present visions for the future. To obtain reliable and sophisticated statistics from big data analysis, it is necessary to analyze the meaning of each variable, the correlations between variables, and multicollinearity. If the data is classified in a way that does not match the hypothesis being tested from the outset, the results will be unreliable even if the analysis itself is performed well. In other words, before big data analysis, the data must be classified appropriately for the purpose of the analysis. Therefore, in this study, data is classified using the decision tree and random forest techniques, two classification methods from machine learning, which is a core technology for implementing AI. By evaluating how well the data is classified, we seek a way to improve the classification and analysis rate of the data.
https://doi.org/10.7236/IJASC.2020.9.4.184
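The decision tree technique can be made concrete with a short example. Below is a minimal sketch, not the authors' code, of how a CART-style tree chooses a split. It scans thresholds on one numeric feature and keeps the threshold with the lowest weighted Gini impurity. The function names and toy data are hypothetical. A random forest would repeat this search on bootstrap samples, using random subsets of features.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Return (threshold, weighted_gini) of the best split x <= t on one feature.

    threshold is None when no split improves on the parent node's impurity.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, gini(ys))  # baseline: impurity of the unsplit node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical toy data: one feature, two classes
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_split(xs, ys))  # (6.5, 0.0)
```

A weighted impurity of 0.0 means the split separates the two classes perfectly. In a full tree, the same search is applied recursively to each child node.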

Classification Model of Types of Crime based on Random-Forest Algorithms and Monitoring Interface Design Factors for Real-time Crime Prediction (실시간 범죄 예측을 위한 랜덤포레스트 알고리즘 기반의 범죄 유형 분류모델 및 모니터링 인터페이스 디자인 요소 제안)

Park, Joonyoung; Chae, Myungsu; Jung, Sungkwan
- KIISE Transactions on Computing Practices, v.22 no.9, pp.455-460, 2016
Recently, as felonies such as robbery and sexual violence have become more severe, the importance of crime prediction and prevention has been emphasized. Accurate and prompt crime prediction and prevention require both a highly accurate crime classification model based on past criminal records and a well-designed system interface. However, previous studies analyzing crime factors have had limited accuracy because of the difficulty of data preprocessing. In addition, existing crime monitoring systems merely present a vast amount of crime analysis results and thus fail to provide users with functions for more effective monitoring. In this paper, we propose a random-forest-based classification model for types of crime, along with system design factors for real-time crime prediction. Our experiments show that the proposed classification model is more accurate than models that use only criminal records. Based on an analysis of existing crime monitoring systems, we also designed and developed a system for real-time crime monitoring.
https://doi.org/10.5626/KTCP.2016.22.9.455

Clustering and classification to characterize daily electricity demand (시간단위 전력사용량 시계열 패턴의 군집 및 분류분석)

Park, Dain; Yoon, Sanghoo
- Journal of the Korean Data and Information Science Society, v.28 no.2, pp.395-406, 2017
The purpose of this study is to identify patterns in daily electricity demand through clustering and classification. The study used hourly data collected by KPX (Korea Power Exchange) between 2008 and 2012. Because electricity demand data is time series data, the time trend was removed before identifying daily demand patterns. We considered k-means clustering, Gaussian mixture model clustering, and functional clustering to find the optimal clustering method. Classification analysis was conducted to understand the relationship between the clusters and external factors: day of the week, holidays, and weather. The data was divided into training and test data. The training data consisted of the external factors and cluster labels from 2008 to 2011, and the test data consisted of the daily external factors for 2012. Decision tree, random forest, support vector machine, and naive Bayes classifiers were used. Gaussian model-based clustering combined with random forest showed the best prediction performance when the number of clusters was 8.
https://doi.org/10.7465/jkdi.2017.28.2.395
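The k-means clustering step can be sketched in plain Python, treating each day as one vector of hourly demand. This is a hypothetical illustration, not the authors' implementation, and it rests on three assumptions. First, the time trend has already been removed. Second, the first k profiles serve as initial centroids, which keeps the result reproducible. Third, the profiles are toy 4-value vectors rather than real 24-hour KPX data.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of profiles."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(profiles, k, max_iter=100):
    """Lloyd's algorithm on daily profiles; returns (labels, centroids)."""
    # Assumption: deterministic initialization from the first k profiles
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(max_iter):
        # Assignment step: each day goes to its nearest centroid
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in profiles]
        # Update step: each centroid becomes the mean of its member days
        for j in range(k):
            members = [p for p, lab in zip(profiles, new_labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = mean_vector(members)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids

# Hypothetical detrended demand profiles (4 time points per day)
profiles = [[1, 1, 5, 5], [9, 9, 2, 2], [1, 2, 5, 4],
            [8, 9, 2, 3], [1, 1, 4, 5], [9, 8, 3, 2]]
labels, centroids = kmeans(profiles, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The resulting labels play the role of the "cluster number" in the training data. A classifier is then trained to predict these labels from day-of-week, holiday, and weather variables.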

Analysis of Malware Group Classification with eXplainable Artificial Intelligence (XAI기반 악성코드 그룹분류 결과 해석 연구)

Kim, Do-yeon; Jeong, Ah-yeon; Lee, Tae-jin
- Journal of the Korea Institute of Information Security & Cryptology, v.31 no.4, pp.559-571, 2021
As computers have become more prevalent, attackers have distributed more malware to ordinary users. Research on malware detection continues to this day, and in recent years research on AI-based malware detection and analysis has attracted particular attention. However, AI algorithms have the drawback that they cannot explain why they detect and classify a sample as malware. XAI techniques have emerged to overcome these limitations of AI and make it practical, because XAI can provide a basis for judging the AI's final outcome. In this paper, we performed malware group classification using XGBoost and Random Forest and interpreted the results with SHAP. Both classification models achieved a high classification accuracy of about 99%. Comparing the top 20 API features identified through XAI with the main APIs used by the malware showed that the results could be interpreted and understood to a meaningful degree. Building on this, we plan to conduct research that directly improves AI reliability.
https://doi.org/10.13089/JKIISC.2021.31.4.559
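SHAP attributes a prediction to input features using Shapley values. For a very small model, those values can be computed exactly by enumerating every coalition of features. The sketch below does that under stated assumptions: features outside a coalition are replaced by a baseline value, and the "malware score" model over three API-call counts is entirely hypothetical. It is not the `shap` library and not the paper's pipeline. Real SHAP explainers approximate or specialize this computation for tree ensembles such as XGBoost and Random Forest.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Assumption: a feature "absent" from a coalition takes its baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical score over three API-call counts (illustrative only)
def toy_model(z):
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[2]

x = [3.0, 1.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 6) for v in phi])  # [7.5, 1.0, 1.5]
```

The values sum to model(x) - model(baseline) = 10. The interaction term 0.5 * z0 * z2 is split equally between features 0 and 2. Ranking features by such contributions is what produces a "top 20 API features" list.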

Ensemble Learning for Underwater Target Classification (수중 표적 식별을 위한 앙상블 학습)

Seok, Jongwon
- Journal of Korea Multimedia Society, v.18 no.11, pp.1261-1267, 2015
The problem of underwater target detection and classification has attracted substantial attention and has been studied by many researchers for both military and non-military purposes. The problem is complicated by various environmental conditions. In this paper, we study classifier ensemble methods for active sonar target classification to improve classification performance. In general, classifier ensemble methods are useful for classifiers with relatively large variance, such as decision trees and neural networks. Bagging, Random selection samples, Random subspace, and Rotation forest were selected as the classifier ensemble methods. Classification tests were carried out with the four ensemble methods based on 31 neural network classifiers, and their performances were compared.
https://doi.org/10.9717/kmms.2015.18.11.1261
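Two of the ensemble methods, Bagging and Random subspace, can be sketched together. Each member is trained on a bootstrap sample of the rows, optionally restricted to a random subset of features, and the ensemble predicts by majority vote. This is an assumption-laden illustration, not the paper's setup. A hypothetical nearest-centroid learner stands in for the neural network classifiers, and the toy "target"/"clutter" data is invented. Only the ensemble size of 31 comes from the abstract. Rotation forest is not shown.

```python
import random
from collections import Counter

def train_nearest_centroid(X, y, features):
    """Hypothetical weak base learner: class centroids on a feature subset."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    return features, centroids

def predict_one(model, x):
    """Predict the class whose centroid is nearest on the model's features."""
    features, centroids = model
    return min(centroids, key=lambda c: sum(
        (x[f] - m) ** 2 for f, m in zip(features, centroids[c])))

def train_ensemble(X, y, n_models=31, n_features=None, seed=0):
    """Bagging (bootstrap rows) plus optional random subspace (feature subsets)."""
    rng = random.Random(seed)
    d = len(X[0])
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        features = (sorted(rng.sample(range(d), n_features))
                    if n_features else list(range(d)))
        models.append(train_nearest_centroid(
            [X[i] for i in idx], [y[i] for i in idx], features))
    return models

def predict(models, x):
    """Majority vote over all ensemble members."""
    votes = Counter(predict_one(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical sonar-like toy data with three features
X = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [5, 5, 5], [5, 6, 5], [6, 5, 5]]
y = ["clutter"] * 3 + ["target"] * 3
models = train_ensemble(X, y, n_models=31, n_features=2, seed=1)
print(predict(models, [0.5, 0.5, 0.0]), predict(models, [5.5, 5.5, 5.0]))
```

With `n_features=None`, the code reduces to plain bagging. Random subspace adds diversity between members, which is why both methods suit high-variance base learners.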

Simple hypotheses testing for the number of trees in a random forest

Park, Cheol-Yong
- Journal of the Korean Data and Information Science Society, v.21 no.2, pp.371-377, 2010
In this study, we propose two informal hypothesis tests that may be useful in determining the number of trees in a random forest used for classification. The first test declares a case 'easy' if the hypothesis that the two most popular classes have equal probabilities is rejected. The second test declares a case 'hard' if the hypothesis is rejected that the relative difference, or margin of victory, between the probabilities of the two most popular classes is greater than or equal to some small number, say 0.05. We propose to continue generating trees until all (or all but a small fraction) of the training cases are declared easy or hard. The advantage of combining the second test with the first is that far fewer trees are needed before stopping than with the first test alone. Under the first test alone, all (or all but a small fraction) of the training cases must be declared easy.
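The two informal tests can be sketched under explicit assumptions, because the abstract does not give the exact test statistics. This sketch uses normal approximations, a one-sided 5% level, and the plain margin p1 - p2 rather than a relative difference.

- Test 1 is a sign test of p1 = p2 on the votes for the two most popular classes.
- Test 2 rejects "margin >= delta" when the estimated margin is significantly below delta, using the multinomial variance (p1 + p2 - (p1 - p2)^2) / n.

The names `classify_case` and `enough_trees` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.95)  # assumed one-sided 5% critical value

def classify_case(votes, n_trees, delta=0.05, z=Z):
    """Label one training case 'easy', 'hard', or 'undecided' from tree votes.

    votes: dict mapping class -> number of trees voting for that class.
    The normal approximations are assumptions, not the paper's exact tests.
    """
    top = sorted(votes.values(), reverse=True) + [0, 0]
    n1, n2 = top[0], top[1]
    # Test 1 (sign test): H0 p1 == p2, conditional on n1 + n2 top-two votes
    m = n1 + n2
    if m > 0 and (n1 - n2) / sqrt(m) > z:
        return "easy"
    # Test 2: H0 margin p1 - p2 >= delta; reject if margin is clearly smaller
    p1, p2 = n1 / n_trees, n2 / n_trees
    d = p1 - p2
    se = sqrt(max(p1 + p2 - d * d, 0.0) / n_trees)
    if se > 0 and (d - delta) / se < -z:
        return "hard"
    return "undecided"

def enough_trees(all_votes, n_trees, max_undecided=0.0):
    """Stop growing the forest once (nearly) every case is easy or hard."""
    undecided = sum(classify_case(v, n_trees) == "undecided" for v in all_votes)
    return undecided <= max_undecided * len(all_votes)

print(classify_case({"a": 90, "b": 10}, 100))       # easy
print(classify_case({"a": 1000, "b": 1000}, 2000))  # hard
```

A tied case stays "undecided" with few trees and becomes "hard" only once enough trees have been grown. The combined rule therefore stops earlier than waiting for every case to become "easy".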

Imbalanced Data Improvement Techniques Based on SMOTE and Light GBM (SMOTE와 Light GBM 기반의 불균형 데이터 개선 기법)

Han, Young-Jin; Joe, In-Whee
- KIPS Transactions on Computer and Communication Systems, v.11 no.12, pp.445-452, 2022
Imbalanced class distributions are common in the digital world and are especially significant in cybersecurity, where abnormal activity hidden in imbalanced data must be found and the underlying problems solved. A system capable of tracking patterns in all transactions is needed. However, machine learning on imbalanced data, in which abnormal patterns are typically rare, can ignore the minority classes or perform poorly on them, and the resulting predictive models can be biased. In this paper, we address imbalanced datasets by combining the Synthetic Minority Oversampling Technique (SMOTE) with the Light GBM algorithm to predict target variables and improve accuracy. The experimental results were compared with those of logistic regression, decision tree, KNN, Random Forest, and XGBoost. Accuracy and recall were similar across the algorithms. In precision, however, Random Forest reached 80.76% and Light GBM 97.16%, and in F1-score, Random Forest reached 84.67% and Light GBM 91.96%. The experiment confirmed that Light GBM performed on par with the five comparison algorithms or up to 16% better.
https://doi.org/10.3745/KTCCS.2022.11.12.445
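SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a plain-Python illustration of that idea, not the authors' pipeline or the `imbalanced-learn` implementation. It assumes a random choice of base sample on each draw, a fixed seed, and Euclidean distance. The toy minority points are hypothetical. In practice, the oversampled training set would then be passed to a classifier such as Light GBM.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    For a randomly chosen minority sample, pick one of its k nearest minority
    neighbours and place a new point at a random position on the segment
    between the two.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, minority[j])),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

# Hypothetical minority-class points (two features)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote(minority, n_new=6, k=2)
print(len(synthetic))  # 6
```

Each synthetic point lies on a segment between two real minority samples. This enlarges the minority region without simply duplicating rows, which is the design choice that distinguishes SMOTE from random oversampling.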

Accuracy Evaluation of Supervised Classification by Using Morphological Attribute Profiles and Additional Band of Hyperspectral Imagery (초분광 영상의 Morphological Attribute Profiles와 추가 밴드를 이용한 감독분류의 정확도 평가)

Park, Hong Lyun; Choi, Jae Wan
- Journal of Korean Society for Geospatial Information Science, v.25 no.1, pp.9-17, 2017
Hyperspectral imagery is used for land cover classification, with principal component analysis and minimum noise fraction applied to reduce data dimensionality and noise. Recently, studies have been carried out on supervised classification using various features that carry spectral information and spatial characteristics. In this study, principal component bands and the normalized difference vegetation index (NDVI) were used in supervised land cover classification. NDVI was added to exploit information in the hyperspectral imagery that the principal component bands do not capture, with the aim of increasing classification accuracy. In addition, extended attribute profiles (EAP) generated with morphological filters were used as input data. The random forest algorithm, a representative supervised classification algorithm, was used as the classifier. The classification accuracies obtained with various EAP-based features were compared. Two areas were selected for the experiments, and quantitative evaluation was performed using reference data. The proposed algorithm achieved the highest classification accuracies among the compared algorithms, 85.72% and 91.14%. Further research is needed to develop supervised classification algorithms and additional input datasets that improve the accuracy of land cover classification using hyperspectral imagery.
https://doi.org/10.7319/kogsis.2017.25.1.009
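NDVI is defined as (NIR - Red) / (NIR + Red). With hyperspectral data, one common approach is to pick the bands whose centre wavelengths are closest to a red and a near-infrared target. The sketch below assumes 670 nm and 800 nm as those targets; the paper does not state which bands it used. It then appends NDVI to each pixel's feature vector, as an extra band alongside principal component bands. The wavelengths, reflectances, and function names are hypothetical.

```python
def nearest_band(wavelengths, target_nm):
    """Index of the band whose centre wavelength is closest to target_nm."""
    return min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - target_nm))

def ndvi(pixel, wavelengths, red_nm=670.0, nir_nm=800.0):
    """NDVI = (NIR - Red) / (NIR + Red) for one pixel's reflectance spectrum.

    red_nm and nir_nm are assumed band centres, not values from the paper.
    """
    red = pixel[nearest_band(wavelengths, red_nm)]
    nir = pixel[nearest_band(wavelengths, nir_nm)]
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total

def stack_ndvi(pc_features, pixels, wavelengths):
    """Append NDVI as an extra input band to per-pixel feature vectors."""
    return [list(f) + [ndvi(p, wavelengths)] for f, p in zip(pc_features, pixels)]

# Hypothetical 4-band spectrum of a vegetated pixel
wavelengths = [450.0, 550.0, 670.0, 800.0]
pixel = [0.05, 0.08, 0.04, 0.50]
print(round(ndvi(pixel, wavelengths), 3))  # 0.852
```

Vegetation reflects strongly in the near infrared and absorbs red light, so its NDVI is high. This is the extra class-separating information that the principal component bands may not carry explicitly.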

Comparison of machine learning algorithms for regression and classification of ultimate load-carrying capacity of steel frames

Kim, Seung-Eock; Vu, Quang-Viet; Papazafeiropoulos, George; Kong, Zhengyi; Truong, Viet-Hung
- Steel and Composite Structures, v.37 no.2, pp.193-209, 2020
This paper compares the efficiency of five machine learning (ML) methods for regression and classification of the Ultimate Load Factor (ULF) of nonlinear inelastic steel frames: Deep Learning (DL), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), and Gradient Tree Boosting (GTB). For this purpose, two-story, six-story, and twenty-story space frames are considered. An advanced nonlinear inelastic analysis is carried out on the steel frames to generate datasets for training the ML methods. In each dataset, the input variables are the geometric features of the W-sections and the output variable is the ULF of the frame. The five ML methods are compared in terms of the mean squared error (MSE) for the regression models and the accuracy for the classification models. Moreover, the ULF distribution curve is calculated for each frame and the strength failure probability is estimated. The GTB method is found to have the best efficiency in both regression and classification of ULF, regardless of the number of training samples and the space frame considered.
https://doi.org/10.12989/scs.2020.37.2.193
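The evaluation quantities named in the abstract can be sketched as three small functions:

- MSE for the regression models.
- Accuracy for the classification models.
- An empirical strength failure probability computed from a sample of ULF values.

The failure criterion ULF < 1.0, meaning the frame cannot carry the applied load, is an assumption made for illustration, and all sample values are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error, used to compare the regression models."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of correctly predicted labels, used for the classifiers."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def failure_probability(ulf_samples, limit=1.0):
    """Empirical P(ULF < limit); limit = 1.0 is an assumed failure criterion."""
    return sum(u < limit for u in ulf_samples) / len(ulf_samples)

# Hypothetical predicted ULF values for four sampled frame designs
ulf_samples = [0.9, 1.1, 1.3, 0.95]
print(mse([1.0, 2.0], [1.5, 2.0]))        # 0.125
print(failure_probability(ulf_samples))   # 0.5
```

Once a surrogate model is trained, the same failure-probability estimate can be computed from many cheap model predictions instead of running a full nonlinear inelastic analysis for every sample.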

Developing Degenerative Arthritis Patient Classification Algorithm based on 3D Walking Video (3차원 보행 영상 기반 퇴행성 관절염 환자 분류 알고리즘 개발)

Kang, Tea-Ho; Sung, Si-Yul; Han, Sang-Hyeok; Park, Dong-Hyun; Kang, Sungwoo
- Journal of Korean Society of Industrial and Systems Engineering, v.46 no.3, pp.161-169, 2023
Degenerative arthritis is a common joint disease that affects many elderly people and is typically diagnosed through radiography. However, the need for remote diagnosis is increasing because knee pain and walking disorders caused by degenerative arthritis make face-to-face treatment difficult. This study collects three-dimensional joint coordinates in real time using the Azure Kinect DK and derives six gait features, selected through visualization and verified with one-way ANOVA. A random forest classifier trained on these features classified degenerative arthritis with an accuracy of 97.52%, and the basis of the model's classifications was examined feature by feature. Overall, this study not only compensates for the shortcomings of existing diagnostic methods but also builds a high-accuracy prediction model from statistically verified gait features and provides detailed prediction results.
https://doi.org/10.11627/jksie.2023.46.3.161
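The one-way ANOVA used to verify the gait features compares between-group and within-group variance. The F statistic is (SS_between / (k - 1)) / (SS_within / (n - k)). The sketch below computes it for one feature across groups such as patients and controls. The group values are hypothetical standardized numbers, not data from the study. The p-value would come from the F distribution with (k - 1, n - k) degrees of freedom; `scipy.stats.f_oneway`, for example, returns both the statistic and the p-value.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical standardized values of one gait feature
patients = [1.0, 2.0, 3.0]
controls = [4.0, 5.0, 6.0]
print(one_way_anova_f(patients, controls))  # (13.5, 1, 4)
```

A large F means the feature differs far more between groups than within them. Features like this are the natural candidates to pass to the random forest classifier.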
