• Title/Summary/Keyword: Decision Tree analysis

Performance Analysis of Opinion Mining using Word2vec (Word2vec을 이용한 오피니언 마이닝 성과분석 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Proceedings of the Korea Contents Association Conference / 2018.05a / pp.7-8 / 2018
  • This study analyzes Word2vec-based machine learning classifiers for opinion mining tasks. BOW (Bag-of-Words) was adopted as the benchmark method. Using Word2vec and BOW as feature extraction methods, we applied the Laptop and Restaurant datasets to LR, DT, SVM, and RF classifiers. The results showed that Word2vec feature extraction yields better performance.

A Study on the Insider Behavior Analysis Using Machine Learning for Detecting Information Leakage (정보 유출 탐지를 위한 머신 러닝 기반 내부자 행위 분석 연구)

  • Kauh, Janghyuk;Lee, Dongho
    • Journal of Korea Society of Digital Industry and Information Management / v.13 no.2 / pp.1-11 / 2017
  • In this paper, we design and implement the PADIL (Prediction And Detection of Information Leakage) system, which predicts and detects insider information leakage behavior by analyzing network traffic and applying a variety of machine learning methods. We define a five-level information leakage model (Reconnaissance, Scanning, Access and Escalation, Exfiltration, Obfuscation) with reference to the cyber kill-chain model. To perform machine learning for information leakage detection, the PADIL system extracts various features by analyzing network traffic, extracts behavioral features by comparing the traffic with personal profile information, and extracts information leakage level features. We tested various machine learning methods; the DecisionTree algorithm showed excellent performance in information leakage detection, and we showed that performance can be further improved by fine feature selection.

The Identification of the Characteristics of Cancer Patients Who Defected to Other Medical Institutions (타 의료기관으로 이탈한 암환자의 특성 파악)

  • Cha, Jae-Bin;Nam, Jung-He;Ahn, Sung-Sik
    • The Korean Journal of Health Service Management / v.7 no.1 / pp.1-9 / 2013
  • This study intends to identify the characteristics of cancer in-patients and those of cancer patients who defected to other medical institutions based on the summary of hospital discharge information of a university hospital for the purpose of improving work efficiency and maximizing the number of patients. The study used data on cancer patients registered in the database of C University Hospital in Gyeonggi Province for a period of one year between January 1 and December 31. The analysis results suggest that the commonalities of the cancer patients who defected to other medical institutions include no specific job, old age, and hospitalization through emergency room. In conclusion, hospitals need to identify the characteristics of cancer patients classified as patients who are prone to defect and the defection factors through this prediction model.

A Detailed Analysis of Classifier Ensembles for Intrusion Detection in Wireless Network

  • Tama, Bayu Adhi;Rhee, Kyung-Hyune
    • Journal of Information Processing Systems / v.13 no.5 / pp.1203-1212 / 2017
  • Intrusion detection systems (IDSs) are crucial given the overwhelming increase in attacks on computing infrastructure. They intelligently detect malicious activity and predict future attack patterns based on classification analysis using machine learning and data mining techniques. This paper thoroughly evaluates classifier ensembles for IDSs in IEEE 802.11 wireless networks. Two ensemble techniques, voting and stacking, are employed to combine three base classifiers: decision tree (DT), random forest (RF), and support vector machine (SVM). We use the area under the ROC curve (AUC) as the performance metric. Finally, we conduct two statistical significance tests to evaluate the performance differences among classifiers.
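
The voting ensemble and AUC metric named in this abstract can be illustrated with a minimal sketch. It assumes soft voting (averaging the base classifiers' scores); the abstract does not say whether the authors used soft or hard voting, and stacking is not shown. The function names and all score values are hypothetical.

```python
from statistics import mean


def soft_vote(score_lists):
    """Average the per-sample positive-class scores of several classifiers."""
    return [mean(sample_scores) for sample_scores in zip(*score_lists)]


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = attack, 0 = normal traffic.
labels = [1, 1, 1, 0, 0, 0]
dt_scores = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0]   # a tree giving hard 0/1 outputs
rf_scores = [0.8, 0.6, 0.4, 0.45, 0.3, 0.1]
svm_scores = [0.9, 0.7, 0.3, 0.2, 0.4, 0.1]

ensemble_scores = soft_vote([dt_scores, rf_scores, svm_scores])

for name, scores in [("DT", dt_scores), ("RF", rf_scores),
                     ("SVM", svm_scores), ("Vote", ensemble_scores)]:
    print(f"{name}: AUC = {auc(labels, scores):.3f}")
```

The pairwise definition of AUC used here is quadratic in the number of samples, which is acceptable for a sketch but not for large traffic datasets.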

Machine Learning based Seismic Response Prediction Methods for Steel Frame Structures (기계학습 기반 강 구조물 지진응답 예측기법)

  • Lee, Seunghye;Lee, Jaehong
    • Journal of Korean Association for Spatial Structures / v.24 no.2 / pp.91-99 / 2024
  • In this paper, machine learning models were applied to predict the seismic response of steel frame structures. Both geometric and material nonlinearities were considered in the structural analysis, and nonlinear inelastic dynamic analysis was performed. The ground acceleration response of the El Centro earthquake was applied to obtain the displacement of the top floor, which was used as the dataset for the machine learning methods. Learning was performed using two methods: Decision Tree and Random Forest, and their efficiency was demonstrated through application to 2-story and 6-story 3-D steel frame structure examples.

A Study on the Remain Life with Aging in 22kV CV cable (22kV 전력케이블의 열화 판정에 관한 연구)

  • Lee, Kwan-Woo;Mok, Young-Soo;Kim, Bo-Kyeong;Park, Bok-Ki;Park, Dae-Hee
    • Proceedings of the KIEE Conference / 2003.10a / pp.19-21 / 2003
  • In this paper, we study remaining-life assessment of underground cables in the live-line state. Like all equipment, underground cables wear out and age over time, so their design life is generally set at 30 years. CV cables in South Korea have now been installed for about 30 years, which makes life estimation an important issue. The cables studied here are 22 kV CV cables that have been installed for about 10 years. Measurements used the live-line DC leakage method, applying 50 V at the neutral point, and the data analyzed were collected over 5 years. The results confirm that insulation resistance varies with the season, driven by humidity, seasonal effects, and load current. As the cables aged, the decrease in insulation resistance could be described by a Weibull distribution. Cables in which water trees had developed aged very quickly. According to the numerical analysis, cables without water trees that have been in service for 10 years have about 10 to 20 years of remaining life.

Comparison of machine learning algorithms for regression and classification of ultimate load-carrying capacity of steel frames

  • Kim, Seung-Eock;Vu, Quang-Viet;Papazafeiropoulos, George;Kong, Zhengyi;Truong, Viet-Hung
    • Steel and Composite Structures / v.37 no.2 / pp.193-209 / 2020
  • In this paper, the efficiency of five Machine Learning (ML) methods consisting of Deep Learning (DL), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), and Gradient Tree Boosting (GTB) for regression and classification of the Ultimate Load Factor (ULF) of nonlinear inelastic steel frames is compared. For this purpose, a two-story, a six-story, and a twenty-story space frame are considered. An advanced nonlinear inelastic analysis is carried out for the steel frames to generate datasets for the training of the considered ML methods. In each dataset, the input variables are the geometric features of W-sections and the output variable is the ULF of the frame. The five ML methods are compared in terms of the mean-squared-error (MSE) for the regression models and the accuracy for the classification models. Moreover, the ULF distribution curve is calculated for each frame and the strength failure probability is estimated. It is found that the GTB method has the best efficiency in both regression and classification of ULF regardless of the number of training samples and the space frames considered.
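
The two comparison metrics named in this abstract, MSE for regression and accuracy for classification, can be sketched as follows. The ULF values and the class labels ("safe", "fail") are hypothetical illustrations and do not come from the paper.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between targets and predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical regression targets: ultimate load factors of four frames.
ulf_true = [1.0, 1.5, 2.0, 2.5]
ulf_pred = [1.0, 1.0, 2.0, 3.0]

# Hypothetical classification targets for the same four frames.
class_true = ["safe", "fail", "safe", "safe"]
class_pred = ["safe", "fail", "fail", "safe"]

print("MSE:", mean_squared_error(ulf_true, ulf_pred))
print("Accuracy:", accuracy(class_true, class_pred))
```

Lower MSE and higher accuracy indicate a better model, so the two metrics point in opposite directions when ranking methods.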

Using a Hybrid Model of DEA and Decision Tree Algorithm C5.0 to Evaluate the Efficiency of Ports (DEA와 의사결정 나무(C5.0)의 하이브리드 모델을 사용한 항만의 효율성 평가)

  • Hong, Han-Kook;Leem, Byung-hak;Kim, Sam-Moon
    • The Journal of the Korea Contents Association / v.19 no.7 / pp.99-109 / 2019
  • Data Envelopment Analysis (DEA), a non-parametric productivity analysis tool, has become an accepted approach for assessing efficiency in a wide range of fields. Despite its extensive applications, some features of DEA remain bothersome. For example, DEA is good at estimating the "relative" efficiency of a DMU (Decision Making Unit): it tells us only how well we are doing compared with our peers, not compared with a "theoretical maximum." Thus, to measure the efficiency of a new DMU, we have to develop an entirely new DEA with the data of previously used DMUs, and we cannot predict the efficiency level of the new DMU without another DEA analysis. We aim to show that DEA can be used to evaluate the efficiency of ports and suggest a methodology that overcomes this limitation through a hybrid analysis combining DEA with C5.0. With the proposed methodology, classification rules generated by C5.0 can classify any new port without perturbing the previously existing evaluation structure.
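
The limitation described in this abstract, that DEA efficiency is relative to the current peer group, can be seen in a minimal sketch. It assumes the simplest case of one input and one output under constant returns to scale, where the efficiency score reduces to each unit's output/input ratio divided by the best ratio in the group. The port names and figures are hypothetical, and the C5.0 rule-generation step of the paper is not reproduced here.

```python
def relative_efficiency(dmus):
    """Score each DMU against the best output/input ratio in the group.

    dmus maps a name to an (input, output) pair with positive values.
    The best-performing DMU always scores exactly 1.0.
    """
    if not dmus:
        raise ValueError("at least one DMU is required")
    for name, (inp, out) in dmus.items():
        if inp <= 0 or out <= 0:
            raise ValueError(f"DMU {name!r} needs positive input and output")
    ratios = {name: out / inp for name, (inp, out) in dmus.items()}
    best = max(ratios.values())
    return {name: ratio / best for name, ratio in ratios.items()}


# Hypothetical ports: (input, e.g. berths; output, e.g. throughput).
ports = {"A": (10.0, 50.0), "B": (20.0, 80.0), "C": (8.0, 24.0)}
scores = relative_efficiency(ports)

# Adding a new, better port changes every existing score.
scores_with_new = relative_efficiency({**ports, "D": (5.0, 40.0)})

for name in ports:
    print(f"{name}: {scores[name]:.3f} -> {scores_with_new[name]:.3f}")
```

Port A is fully efficient in the first group but not in the second, which is why a classifier trained on earlier DEA results is attractive for scoring a new unit without re-running the whole analysis.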

A Study on Using Social Big Data for Expanding Analytical Knowledge - Domestic Big Data supply-demand expectation - (분석지의 확장을 위한 소셜 빅데이터 활용연구 - 국내 '빅데이터' 수요공급 예측 -)

  • Kim, Jung-Sun;Kwon, Eun-Ju;Song, Tae-Min
    • Knowledge Management Research / v.15 no.3 / pp.169-188 / 2014
  • Big data appears to be changing the knowledge management systems and methods of enterprises to a large extent. Furthermore, how unstructured data such as images, video, sensor data, and text are utilized may determine decisions on expanding the knowledge management of an enterprise or government. In this light, this paper attempts to build a prediction model of demand and supply for the big data market of Korea through a data mining decision tree, using text big data generated over 3 years on the web and SNS, in order to expand the forms of knowledge management. The results indicate that the big data market of Korea is a government-led market focused on H/W and storage. Further, demanders of big data were found to place importance on attribute factors including interest, quickness, and economics, whereas suppliers place importance on innovation and growth. These results show that the factors affecting acceptance of big data technology differ between suppliers and demanders. This article may provide a basic method for studies on expanding the analytical forms of enterprises and connecting them with management activities.

Reliability Centered Maintenance (보전에 중점을 둔 신뢰성)

  • 김환중
    • Proceedings of the Korea Society of Information Technology Applications Conference / 2002.11a / pp.199-204 / 2002
  • Reliability Centered Maintenance (RCM) was initially developed for the commercial aviation industry in the late 1960s and is now equally applicable to a variety of equipment other than aircraft. RCM is a method for establishing a preventive maintenance program that efficiently and effectively achieves the required safety and availability levels of equipment and structures. RCM uses a decision logic tree to identify applicable and effective preventive maintenance requirements for equipment and structures. The end result of working through the decision logic is a judgement as to the necessity of performing a maintenance task. In this paper, we provide guiding principles based on IEC 60300-3-11 for RCM analysis methods and for the operational methods of structures and equipment.
