• Title/Abstract/Keywords: Random forest algorithm

Phishing Detection Methodology Using Web Sites Heuristic

  • 이진이;박두호;이창훈
    • 정보처리학회논문지:컴퓨터 및 통신 시스템 / Vol. 4, No. 10 / pp. 349-360 / 2015
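  • As the number of web users grows, phishing attacks are steadily increasing. Responding effectively to the variety of phishing attacks requires a correct understanding of them and the ability to apply appropriate countermeasures. To this end, this paper defines the phishing attack procedure as two phases, an access-inducement phase and an attack-execution phase, and analyzes the types of phishing attack that occur in each phase. This analysis can raise awareness of phishing attacks and help prevent damage from them in advance. Based on the analysis, the paper also presents countermeasures for each phishing type. The proposed countermeasures use website features suited to each phase. To assess their validity, heuristic-based malicious site classification models are built using the proposed feature extraction method, and the accuracy of each model is verified. In conclusion, the proposed approach provides a foundation for strengthening anti-phishing technology and website security.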

Effects of a Newly Designed Pelvic Belt Orthosis on Functional Mobility of Adults with Post-Stroke Hemiparesis

  • Cho, Byeong-Mo;Zarayeneh, Neda;Suh, Sang C.
    • 대한통합의학회지 / Vol. 8, No. 4 / pp. 125-131 / 2020
  • Purpose: Lower extremity orthoses have been used as a conservative method to restore gait in stroke patients. The purpose of this study is to examine how a newly designed pelvic belt orthosis can improve gait ability and dynamic balance in adults with hemiparesis after stroke. Methods: Twenty-two patients with post-stroke hemiparesis participated in this study. They were randomly assigned to two groups, 10 to the experimental group and the remaining 12 to the control group. The control group received conventional physical therapy and occupational therapy. The experimental group received the identical therapy protocols and wore the pelvic belt orthosis during the post-intervention measurement. The independent variables were group, gender, age, height, MAS, lesion side, and cause; the dependent variables were gait speed, cadence, step length, stride length, and dynamic balance. The GAITRite system was used to measure spatiotemporal gait parameters, and the Balance System SD was used to measure dynamic balance. The data were analyzed using R version 3.3.1. Random forest, a boosting algorithm, and a MANOVA test were used to determine the effects of the independent variables on the dependent variables. Results: The independent variable "group" had the highest importance, approximately 25.42 (%IncMSE), a value three times that of the second most important predictor, "height." Conclusion: The results support the hypothesis that the pelvic belt orthosis can be used effectively to improve gait ability and balance in patients with post-stroke hemiparesis.
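The %IncMSE figure reported above is a permutation-style importance: shuffle one predictor and see how much the mean squared error grows. The sketch below illustrates that idea in a simplified, model-agnostic form; it is not the study's R analysis. The stand-in `predict` function, the toy data, and every name are assumptions for illustration. In R's randomForest package, where the %IncMSE label commonly appears, the measure is computed on out-of-bag samples per tree and normalized, which this sketch does not reproduce.

```python
import random


def mse(predict, rows, targets):
    """Mean squared error of a prediction function over a dataset."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_increase_in_mse(predict, rows, targets, col, seed=0):
    """Increase in MSE after shuffling one feature column.

    A larger increase suggests the model relies more on that feature.
    """
    rng = random.Random(seed)
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    # Rebuild each row with the shuffled value in place of the original one.
    permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
    return mse(predict, permuted, targets) - mse(predict, rows, targets)


# Toy data (made up): column 0 drives the target, column 1 is irrelevant.
data_rng = random.Random(42)
rows = [[data_rng.uniform(0, 10), data_rng.uniform(0, 10)] for _ in range(50)]
targets = [2.0 * r[0] + data_rng.gauss(0, 0.5) for r in rows]


def predict(row):
    """Hypothetical fitted model standing in for a trained random forest."""
    return 2.0 * row[0]


increase_col0 = permutation_increase_in_mse(predict, rows, targets, col=0)
increase_col1 = permutation_increase_in_mse(predict, rows, targets, col=1)
```

Shuffling the informative column raises the error sharply, while shuffling the irrelevant column leaves it unchanged, which is the kind of contrast behind a ranking such as "group" ahead of "height."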

Application of Machine Learning Techniques for Problematic Smartphone Use

  • 김우성;한준희
    • 아태비즈니스연구 / Vol. 13, No. 3 / pp. 293-309 / 2022
  • Purpose - The purpose of this study is to explore the possibility of predicting the degree of smartphone overdependence from mobile phone usage patterns. Design/methodology/approach - This study analyzed the "problematic smartphone use survey" conducted by the Korea Internet and Security Agency (KISA). The survey consists of 180 questions, and data were collected from 29,712 participants. Based on the smartphone usage pattern data obtained through the questionnaire, the smartphone addiction level was predicted using machine learning techniques: k-NN, gradient boosting, XGBoost, CatBoost, AdaBoost, and random forest. Findings - First, while various factors together influence the smartphone overdependence level, the results show that all of the machine learning techniques predict it well. In particular, the study focuses on features that can be obtained from smartphone log data (without psychological factors), which means that the results can serve as a basis for diagnostic programs that detect problematic smartphone use. Second, the results show that information on users' age, marital status, and smartphone usage patterns can be used as predictors of whether users are addicted to smartphones. Other demographic characteristics such as sex or region did not appear to significantly affect smartphone overdependence levels. Research implications or Originality - Although some studies predict the smartphone overdependence level using machine learning techniques, they only present algorithm performance based on survey data. In this study, questions that have more influence on the smartphone overdependence level are identified using the information gain measure, and the performance of the algorithms is compared across those questions. The results show that the smartphone overdependence level can be predicted with less information if questions about smartphone use are chosen appropriately.
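The information gain measure mentioned above scores a question by how much knowing its answer reduces the entropy of the class label. A minimal sketch follows; the toy answers and labels are made up for illustration and are not drawn from the survey.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(answers, labels):
    """Entropy of the labels minus the weighted entropy after splitting by answer."""
    total = len(labels)
    groups = {}
    for answer, label in zip(answers, labels):
        groups.setdefault(answer, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: 1 = overdependent, 0 = not.
labels = [1, 1, 1, 0, 0, 0]
heavy_use = ["yes", "yes", "yes", "no", "no", "no"]  # separates the classes
region = ["A", "B", "A", "B", "A", "B"]              # nearly uninformative

ig_heavy_use = information_gain(heavy_use, labels)
ig_region = information_gain(region, labels)
```

Ranking questions by this score and keeping only the top few is one way to test whether a shorter questionnaire still predicts well.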

Data anomaly detection for structural health monitoring of bridges using shapelet transform

  • Arul, Monica;Kareem, Ahsan
    • Smart Structures and Systems / Vol. 29, No. 1 / pp. 93-103 / 2022
  • With the wider availability of sensor technology through easily affordable sensor devices, several Structural Health Monitoring (SHM) systems have been deployed to monitor vital civil infrastructure. Continuous monitoring provides valuable information about the health of the structure that can support decisions on retrofits and other structural modifications. However, when the sensors are exposed to harsh environmental conditions, the data measured by SHM systems tend to be affected by multiple anomalies caused by faulty or broken sensors. Given a deluge of high-dimensional data collected continuously over time, research into using machine learning methods to detect anomalies is a topic of great interest to the SHM community. This paper contributes to this effort by proposing a relatively new time series representation named "Shapelet Transform" in combination with a Random Forest classifier to autonomously identify anomalies in SHM data. The shapelet transform is a unique time series representation based solely on the shape of the time series data. Considering the individual characteristics unique to every anomaly, the application of this transform yields a new shape-based feature representation that can be combined with any standard machine learning algorithm to detect anomalous data with no manual intervention. For the present study, the anomaly detection framework consists of three steps: identifying unique shapes from anomalous data, using these shapes to transform the SHM data into a local-shape space, and training machine learning algorithms on the transformed data to identify anomalies. The efficacy of this method is demonstrated by the identification of anomalies in acceleration data from an SHM system installed on a long-span bridge in China. The results show that multiple data anomalies in SHM data can be automatically detected with high accuracy using the proposed method.
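The shapelet transform described above maps each time series to its distances from a set of short reference shapes. The sketch below shows one common formulation, the minimum Euclidean distance over all sliding windows. The abstract does not give the authors' exact distance or normalization, so the functions and toy signals here are illustrative assumptions.

```python
from math import sqrt


def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any same-length window."""
    m = len(shapelet)
    if m > len(series):
        raise ValueError("shapelet is longer than the series")
    best = float("inf")
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best


def shapelet_transform(dataset, shapelets):
    """Turn each series into a feature vector of distances to every shapelet."""
    return [[shapelet_distance(series, sh) for sh in shapelets] for series in dataset]


# Toy signals (made up): a spike-like anomaly shape and two short records.
spike = [0.0, 5.0, 0.0]
dataset = [
    [0.1, 0.0, 0.0, 5.0, 0.0, 0.1],  # contains the spike shape
    [0.1, 0.0, 0.1, 0.0, 0.1, 0.0],  # no spike
]
features = shapelet_transform(dataset, [spike])
```

A small distance means the record contains something close to the reference shape. The resulting feature vectors could then be passed to any standard classifier, such as the random forest named in the abstract.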

Boosting the Performance of the Predictive Model on the Imbalanced Dataset Using SVM Based Bagging and Out-of-Distribution Detection

  • 김종훈;오하영
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 11, No. 11 / pp. 455-464 / 2022
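  • Datasets generated in manufacturing processes have two main characteristics: severe imbalance in the target class and the continual occurrence of out-of-distribution (OoD) samples. Class imbalance can be addressed with SMOTE and various sampling strategies. OoD detection, however, has so far been handled only in the domain of artificial neural networks. Neural networks to which OoD detection can be applied do not achieve satisfactory performance on manufacturing process datasets. The reason is that manufacturing process datasets are very small and very noisy compared with the image and text datasets that neural networks typically handle. Overfitting of neural networks is also cited as a cause of their degraded performance on manufacturing datasets. This study therefore attempts to combine the SVM algorithm with OoD detection, which had not been tried before. A bagging algorithm is also incorporated into the modeling to improve the precision of the predictive model.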

Projecting the spatial-temporal trends of extreme climatology in South Korea based on optimal multi-model ensemble members

  • Mirza Junaid Ahmad;Kyung-sook Choi
    • 한국수자원학회:학술대회논문집 / 한국수자원학회 2023 Conference / p. 314 / 2023
  • Extreme climate events can have a large impact on human life by hampering social, environmental, and economic development. Global circulation models (GCMs) are widely used numerical models for understanding anticipated future climate change. However, different GCMs can project different future climates because of structural differences, varying initial boundary conditions, and assumptions about physical phenomena. The multi-model ensemble (MME) approach can reduce the uncertainties associated with the different GCM outcomes. In this study, a comprehensive rating metric was used to select the best-performing GCMs out of 11 CMIP5 and 13 CMIP6 GCMs, according to their skill, in terms of four temporal and five spatial performance indices, in replicating 21 extreme climate indices during the baseline period (1975-2017) in South Korea. The MME data were derived by averaging the simulations from all selected GCMs and from the three top-ranked GCMs. The random forest (RF) algorithm was also used to derive the MME data from the three top-ranked GCMs. The RF-derived MME data of the three top-ranked GCMs showed the highest performance in simulating the baseline extreme climate and were subsequently used to project the future extreme climate indices under both the representative concentration pathway (RCP) and the shared socioeconomic pathway (SSP) scenarios. The extreme cold and warming indices had declining and increasing trends, respectively, and most extreme precipitation indices had increasing trends over the period 2031-2100. Compared with the other scenarios, RCP8.5 showed drastic changes in future extreme climate indices. The coasts in the east, south, and west had stronger warming than the rest of the country, while mountain areas in the north experienced more extreme cold. While extreme cold climatology gradually declined from north to south, extreme warming climatology continuously grew from coastal to inland and northern mountainous regions. The results showed that the socially, environmentally, and agriculturally important regions of South Korea were at increased risk of facing the detrimental impacts of extreme climatology.

Machine Learning Algorithm for Estimating Ink Usage

  • 권세욱;현영주;태현철
    • 산업경영시스템학회지 / Vol. 46, No. 1 / pp. 23-31 / 2023
  • Research and interest in sustainable printing are increasing in the packaging printing industry. Currently, the amount of ink required for each job is predicted from the experience and intuition of field workers. If more ink is produced than necessary, the remainder cannot be reused and is discarded, which harms both the company's productivity and the environment. Machine learning models can now be used to address this problem. This study compares machine learning models for predicting ink usage. A linear model such as multiple regression analysis cannot capture the nonlinear relationships among the variables involved in packaging printing, which limits how accurately it can predict the amount of ink needed. This study therefore builds several prediction models based on CART (Classification and Regression Tree): Decision Tree, Random Forest, Gradient Boosting Machine, and XGBoost. Model accuracy is assessed with K-fold cross-validation, and root mean squared error, mean absolute error, and R-squared are used as evaluation metrics. Among these models, XGBoost has the highest prediction accuracy and can reduce wasted ink by 2,134 g per job. The study thus demonstrates the potential of machine learning to improve productivity and protect the environment.
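The evaluation described above relies on K-fold cross-validation and three error metrics. The following sketch spells out those definitions. The fold-splitting helper, the mean-value baseline model, and the toy ink weights are assumptions for illustration, not the study's data or models.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Toy ink weights in grams (made up); the "model" just predicts the training mean.
y = [100.0, 120.0, 90.0, 110.0, 130.0, 95.0]
fold_rmse = []
for train, test in k_fold_indices(len(y), k=3):
    baseline = sum(y[i] for i in train) / len(train)
    fold_rmse.append(rmse([y[i] for i in test], [baseline] * len(test)))
mean_rmse = sum(fold_rmse) / len(fold_rmse)
```

Averaging a metric over the folds, as in `mean_rmse`, gives one score per model, so candidates such as a decision tree and XGBoost can be compared on the same held-out data.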

A study on the impact on predicted soil moisture based on machine learning-based open-field environment variables

  • 정광훈;이명훈
    • 스마트미디어저널 / Vol. 12, No. 10 / pp. 47-54 / 2023
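  • As global warming makes it increasingly important to understand sudden climate change and agricultural productivity, soil moisture prediction is emerging as a key topic in agriculture. Soil moisture strongly affects crop growth and health, and proper management and accurate prediction are key to improving agricultural productivity and managing resources. For these reasons, soil moisture prediction is attracting considerable attention in agriculture and environmental science. In this paper, open-field environmental data were collected from a pilot field and analyzed with the random forest machine learning algorithm. The correlations between the data features and soil moisture were derived, and measured soil moisture values were compared with predicted values, confirming a prediction accuracy of about 92%. If future work adds crop growth variables to the soil moisture prediction, it is expected to benefit productivity and resource efficiency, for example by raising crop quality and improving water management efficiency, through accurate control of key information such as crop growth rate as a function of soil moisture and appropriate irrigation timing.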

Performance Evaluation of Machine Learning Algorithms for Cloud Removal of Optical Imagery: A Case Study in Cropland

  • 박소연;곽근호;안호용;박노욱
    • 대한원격탐사학회지 / Vol. 39, No. 5_1 / pp. 507-519 / 2023
  • Multi-temporal optical images have been utilized for time-series monitoring of croplands. However, the presence of clouds imposes limitations on image availability, often requiring a cloud removal procedure. This study assesses the applicability of various machine learning algorithms for effective cloud removal in optical imagery. We conducted comparative experiments by focusing on two key variables that significantly influence the predictive performance of machine learning algorithms: (1) land-cover types of training data and (2) temporal variability of land-cover types. Three machine learning algorithms, including Gaussian process regression (GPR), support vector machine (SVM), and random forest (RF), were employed for the experiments using simulated cloudy images in paddy fields of Gunsan. GPR and SVM exhibited superior prediction accuracy when the training data had the same land-cover types as the cloud region, and GPR showed the best stability with respect to sampling fluctuations. In addition, RF was the least affected by the land-cover types and temporal variations of training data. These results indicate that GPR is recommended when the land-cover type and spectral characteristics of the training data are the same as those of the cloud region. On the other hand, RF should be applied when it is difficult to obtain training data with the same land-cover types as the cloud region. Therefore, the land-cover types in cloud areas should be taken into account for extracting informative training data along with selecting the optimal machine learning algorithm.

Clinicoradiological Characteristics in the Differential Diagnosis of Follicular-Patterned Lesions of the Thyroid: A Multicenter Cohort Study

  • Jeong Hoon Lee;Eun Ju Ha;Da Hyun Lee;Miran Han;Jung Hyun Park;Ji-hoon Kim
    • Korean Journal of Radiology / Vol. 23, No. 7 / pp. 763-772 / 2022
  • Objective: Preoperative differential diagnosis of follicular-patterned lesions is challenging. This multicenter cohort study investigated the clinicoradiological characteristics relevant to the differential diagnosis of such lesions. Materials and Methods: From June to September 2015, 4787 thyroid nodules (≥ 1.0 cm) with a final diagnosis of benign follicular nodule (BN, n = 4461), follicular adenoma (FA, n = 136), follicular carcinoma (FC, n = 62), or follicular variant of papillary thyroid carcinoma (FVPTC, n = 128) collected from 26 institutions were analyzed. The clinicoradiological characteristics of the lesions were compared among the different histological types using multivariable logistic regression analyses. The relative importance of the characteristics that distinguished histological types was determined using a random forest algorithm. Results: Compared to BN (as the control group), the distinguishing features of follicular-patterned neoplasms (FA, FC, and FVPTC) were patient's age (odds ratio [OR], 0.969 per 1-year increase), lesion diameter (OR, 1.054 per 1-mm increase), presence of solid composition (OR, 2.255), presence of hypoechogenicity (OR, 2.181), and presence of halo (OR, 1.761) (all p < 0.05). Compared to FA (as the control), FC differed with respect to lesion diameter (OR, 1.040 per 1-mm increase) and rim calcifications (OR, 17.054), while FVPTC differed with respect to patient age (OR, 0.966 per 1-year increase), lesion diameter (OR, 0.975 per 1-mm increase), macrocalcifications (OR, 3.647), and non-smooth margins (OR, 2.538) (all p < 0.05). The five important features for the differential diagnosis of follicular-patterned neoplasms (FA, FC, and FVPTC) from BN are maximal lesion diameter, composition, echogenicity, orientation, and patient's age. The most important features distinguishing FC and FVPTC from FA are rim calcifications and macrocalcifications, respectively. Conclusion: Although follicular-patterned lesions have overlapping clinical and radiological features, the distinguishing features identified in our large clinical cohort may provide valuable information for preoperative distinction between them and decision-making regarding their management.
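The odds ratios above are reported per unit increase. In a logistic regression model the log-odds are linear in each predictor, so a larger change raises the per-unit ratio to the corresponding power. The short sketch below works that arithmetic for two of the reported values. The 10-unit changes are chosen here only for illustration, and the calculation assumes the linear log-odds relationship holds across that range.

```python
def scaled_odds_ratio(or_per_unit, units):
    """Odds ratio implied by a change of `units`, given the per-unit odds ratio.

    With log-odds linear in the predictor, the odds ratio for k units
    is the per-unit ratio raised to the power k.
    """
    return or_per_unit ** units


# Reported per-unit odds ratios (follicular-patterned neoplasms vs. BN).
or_diameter_per_mm = 1.054
or_age_per_year = 0.969

or_diameter_10mm = scaled_odds_ratio(or_diameter_per_mm, 10)  # roughly 1.69
or_age_10yr = scaled_odds_ratio(or_age_per_year, 10)          # roughly 0.73
```

Read this way, a lesion 10 mm larger has roughly 1.7 times the odds of being a neoplasm, and a patient 10 years older has roughly 0.73 times the odds, other predictors held fixed.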