• Title/Summary/Keyword: forest machine

Search Result 755, Processing Time 0.023 seconds

Application and Comparison of Data Mining Technique to Prevent Metal-Bush Omission (메탈부쉬 누락예방을 위한 데이터마이닝 기법의 적용 및 비교)

  • Sang-Hyun Ko;Dongju Lee
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.46 no.3
    • /
    • pp.139-147
    • /
    • 2023
  • The metal bush assembling process is a process of inserting and compressing a metal bush that serves to reduce the occurrence of noise and stable compression in the rotating section. In the metal bush assembly process, the head diameter defect and placement defect of the metal bush occur due to metal bush omission, non-pressing, and poor press-fitting. Among these causes of defects, it is intended to prevent defects due to omission of the metal bush by using signals from sensors attached to the facility. In particular, a metal bush omission is predicted through various data mining techniques using left load cell value, right load cell value, current, and voltage as independent variables. In the case of metal bush omission defect, it is difficult to get defect data, resulting in data imbalance. Data imbalance refers to a case where there is a large difference in the number of data belonging to each class, which can be a problem when performing classification prediction. In order to solve the problem caused by data imbalance, oversampling and composite sampling techniques were applied in this study. In addition, simulated annealing was applied for optimization of parameters related to sampling and hyper-parameters of data mining techniques used for bush omission prediction. In this study, the metal bush omission was predicted using the actual data of M manufacturing company, and the classification performance was examined. All applied techniques showed excellent results, and in particular, the proposed methods, the method of mixing Random Forest and SA, and the method of mixing MLP and SA, showed better results.

A Method for Generating Malware Countermeasure Samples Based on Pixel Attention Mechanism

  • Xiangyu Ma;Yuntao Zhao;Yongxin Feng;Yutao Hu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.2
    • /
    • pp.456-477
    • /
    • 2024
  • With information technology's rapid development, the Internet faces serious security problems. Studies have shown that malware has become a primary means of attacking the Internet. Therefore, adversarial samples have become a vital breakthrough point for studying malware. By studying adversarial samples, we can gain insights into the behavior and characteristics of malware, evaluate the performance of existing detectors in the face of deceptive samples, and help to discover vulnerabilities and improve detection methods for better performance. However, existing adversarial sample generation methods still need help regarding escape effectiveness and mobility. For instance, researchers have attempted to incorporate perturbation methods like Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), and others into adversarial samples to obfuscate detectors. However, these methods are only effective in specific environments and yield limited evasion effectiveness. To solve the above problems, this paper proposes a malware adversarial sample generation method (PixGAN) based on the pixel attention mechanism, which aims to improve adversarial samples' escape effect and mobility. The method transforms malware into grey-scale images and introduces the pixel attention mechanism in the Deep Convolution Generative Adversarial Networks (DCGAN) model to weigh the critical pixels in the grey-scale map, which improves the modeling ability of the generator and discriminator, thus enhancing the escape effect and mobility of the adversarial samples. The escape rate (ASR) is used as an evaluation index of the quality of the adversarial samples. The experimental results show that the adversarial samples generated by PixGAN achieve escape rates of 97%, 94%, 35%, 39%, and 43% on the Random Forest (RF), Support Vector Machine (SVM), Convolutional Neural Network (CNN), Convolutional Neural Network and Recurrent Neural Network (CNN_RNN), and Convolutional Neural Network and Long Short Term Memory (CNN_LSTM) algorithmic detectors, respectively.

Clinicoradiological Characteristics in the Differential Diagnosis of Follicular-Patterned Lesions of the Thyroid: A Multicenter Cohort Study

  • Jeong Hoon Lee;Eun Ju Ha;Da Hyun Lee;Miran Han;Jung Hyun Park;Ji-hoon Kim
    • Korean Journal of Radiology
    • /
    • v.23 no.7
    • /
    • pp.763-772
    • /
    • 2022
  • Objective: Preoperative differential diagnosis of follicular-patterned lesions is challenging. This multicenter cohort study investigated the clinicoradiological characteristics relevant to the differential diagnosis of such lesions. Materials and Methods: From June to September 2015, 4787 thyroid nodules (≥ 1.0 cm) with a final diagnosis of benign follicular nodule (BN, n = 4461), follicular adenoma (FA, n = 136), follicular carcinoma (FC, n = 62), or follicular variant of papillary thyroid carcinoma (FVPTC, n = 128) collected from 26 institutions were analyzed. The clinicoradiological characteristics of the lesions were compared among the different histological types using multivariable logistic regression analyses. The relative importance of the characteristics that distinguished histological types was determined using a random forest algorithm. Results: Compared to BN (as the control group), the distinguishing features of follicular-patterned neoplasms (FA, FC, and FVPTC) were patient's age (odds ratio [OR], 0.969 per 1-year increase), lesion diameter (OR, 1.054 per 1-mm increase), presence of solid composition (OR, 2.255), presence of hypoechogenicity (OR, 2.181), and presence of halo (OR, 1.761) (all p < 0.05). Compared to FA (as the control), FC differed with respect to lesion diameter (OR, 1.040 per 1-mm increase) and rim calcifications (OR, 17.054), while FVPTC differed with respect to patient age (OR, 0.966 per 1-year increase), lesion diameter (OR, 0.975 per 1-mm increase), macrocalcifications (OR, 3.647), and non-smooth margins (OR, 2.538) (all p < 0.05). The five important features for the differential diagnosis of follicular-patterned neoplasms (FA, FC, and FVPTC) from BN are maximal lesion diameter, composition, echogenicity, orientation, and patient's age. The most important features distinguishing FC and FVPTC from FA are rim calcifications and macrocalcifications, respectively. Conclusion: Although follicular-patterned lesions have overlapping clinical and radiological features, the distinguishing features identified in our large clinical cohort may provide valuable information for preoperative distinction between them and decision-making regarding their management.

Comparison of Solar Power Generation Forecasting Performance in Daejeon and Busan Based on Preprocessing Methods and Artificial Intelligence Techniques: Using Meteorological Observation and Forecast Data (전처리 방법과 인공지능 모델 차이에 따른 대전과 부산의 태양광 발전량 예측성능 비교: 기상관측자료와 예보자료를 이용하여)

  • Chae-Yeon Shim;Gyeong-Min Baek;Hyun-Su Park;Jong-Yeon Park
    • Atmosphere
    • /
    • v.34 no.2
    • /
    • pp.177-185
    • /
    • 2024
  • As increasing global interest in renewable energy due to the ongoing climate crisis, there is a growing need for efficient technologies to manage such resources. This study focuses on the predictive skill of daily solar power generation using weather observation and forecast data. Meteorological data from the Korea Meteorological Administration and solar power generation data from the Korea Power Exchange were utilized for the period from January 2017 to May 2023, considering both inland (Daejeon) and coastal (Busan) regions. Temperature, wind speed, relative humidity, and precipitation were selected as relevant meteorological variables for solar power prediction. All data was preprocessed by removing their systematic components to use only their residuals and the residual of solar data were further processed with weighted adjustments for homoscedasticity. Four models, MLR (Multiple Linear Regression), RF (Random Forest), DNN (Deep Neural Network), and RNN (Recurrent Neural Network), were employed for solar power prediction and their performances were evaluated based on predicted values utilizing observed meteorological data (used as a reference), 1-day-ahead forecast data (referred to as fore1), and 2-day-ahead forecast data (fore2). DNN-based prediction model exhibits superior performance in both regions, with RNN performing the least effectively. However, MLR and RF demonstrate competitive performance comparable to DNN. The disparities in the performance of the four different models are less pronounced than anticipated, underscoring the pivotal role of fitting models using residuals. This emphasizes that the utilized preprocessing approach, specifically leveraging residuals, is poised to play a crucial role in the future of solar power generation forecasting.

A Hybrid Multi-Level Feature Selection Framework for prediction of Chronic Disease

  • G.S. Raghavendra;Shanthi Mahesh;M.V.P. Chandrasekhara Rao
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.12
    • /
    • pp.101-106
    • /
    • 2023
  • Chronic illnesses are among the most common serious problems affecting human health. Early diagnosis of chronic diseases can assist to avoid or mitigate their consequences, potentially decreasing mortality rates. Using machine learning algorithms to identify risk factors is an exciting strategy. The issue with existing feature selection approaches is that each method provides a distinct set of properties that affect model correctness, and present methods cannot perform well on huge multidimensional datasets. We would like to introduce a novel model that contains a feature selection approach that selects optimal characteristics from big multidimensional data sets to provide reliable predictions of chronic illnesses without sacrificing data uniqueness.[1] To ensure the success of our proposed model, we employed balanced classes by employing hybrid balanced class sampling methods on the original dataset, as well as methods for data pre-processing and data transformation, to provide credible data for the training model. We ran and assessed our model on datasets with binary and multivalued classifications. We have used multiple datasets (Parkinson, arrythmia, breast cancer, kidney, diabetes). Suitable features are selected by using the Hybrid feature model consists of Lassocv, decision tree, random forest, gradient boosting,Adaboost, stochastic gradient descent and done voting of attributes which are common output from these methods.Accuracy of original dataset before applying framework is recorded and evaluated against reduced data set of attributes accuracy. The results are shown separately to provide comparisons. Based on the result analysis, we can conclude that our proposed model produced the highest accuracy on multi valued class datasets than on binary class attributes.[1]

Building battery deterioration prediction model using real field data (머신러닝 기법을 이용한 납축전지 열화 예측 모델 개발)

  • Choi, Keunho;Kim, Gunwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.243-264
    • /
    • 2018
  • Although the worldwide battery market is recently spurring the development of lithium secondary battery, lead acid batteries (rechargeable batteries) which have good-performance and can be reused are consumed in a wide range of industry fields. However, lead-acid batteries have a serious problem in that deterioration of a battery makes progress quickly in the presence of that degradation of only one cell among several cells which is packed in a battery begins. To overcome this problem, previous researches have attempted to identify the mechanism of deterioration of a battery in many ways. However, most of previous researches have used data obtained in a laboratory to analyze the mechanism of deterioration of a battery but not used data obtained in a real world. The usage of real data can increase the feasibility and the applicability of the findings of a research. Therefore, this study aims to develop a model which predicts the battery deterioration using data obtained in real world. To this end, we collected data which presents change of battery state by attaching sensors enabling to monitor the battery condition in real time to dozens of golf carts operated in the real golf field. As a result, total 16,883 samples were obtained. And then, we developed a model which predicts a precursor phenomenon representing deterioration of a battery by analyzing the data collected from the sensors using machine learning techniques. As initial independent variables, we used 1) inbound time of a cart, 2) outbound time of a cart, 3) duration(from outbound time to charge time), 4) charge amount, 5) used amount, 6) charge efficiency, 7) lowest temperature of battery cell 1 to 6, 8) lowest voltage of battery cell 1 to 6, 9) highest voltage of battery cell 1 to 6, 10) voltage of battery cell 1 to 6 at the beginning of operation, 11) voltage of battery cell 1 to 6 at the end of charge, 12) used amount of battery cell 1 to 6 during operation, 13) used amount of battery during operation(Max-Min), 14) duration of battery use, and 15) highest current during operation. Since the values of the independent variables, lowest temperature of battery cell 1 to 6, lowest voltage of battery cell 1 to 6, highest voltage of battery cell 1 to 6, voltage of battery cell 1 to 6 at the beginning of operation, voltage of battery cell 1 to 6 at the end of charge, and used amount of battery cell 1 to 6 during operation are similar to that of each battery cell, we conducted principal component analysis using verimax orthogonal rotation in order to mitigate the multiple collinearity problem. According to the results, we made new variables by averaging the values of independent variables clustered together, and used them as final independent variables instead of origin variables, thereby reducing the dimension. We used decision tree, logistic regression, Bayesian network as algorithms for building prediction models. And also, we built prediction models using the bagging of each of them, the boosting of each of them, and RandomForest. Experimental results show that the prediction model using the bagging of decision tree yields the best accuracy of 89.3923%. This study has some limitations in that the additional variables which affect the deterioration of battery such as weather (temperature, humidity) and driving habits, did not considered, therefore, we would like to consider the them in the future research. However, the battery deterioration prediction model proposed in the present study is expected to enable effective and efficient management of battery used in the real filed by dramatically and to reduce the cost caused by not detecting battery deterioration accordingly.

Monitoring Ground-level SO2 Concentrations Based on a Stacking Ensemble Approach Using Satellite Data and Numerical Models (위성 자료와 수치모델 자료를 활용한 스태킹 앙상블 기반 SO2 지상농도 추정)

  • Choi, Hyunyoung;Kang, Yoojin;Im, Jungho;Shin, Minso;Park, Seohui;Kim, Sang-Min
    • Korean Journal of Remote Sensing
    • /
    • v.36 no.5_3
    • /
    • pp.1053-1066
    • /
    • 2020
  • Sulfur dioxide (SO2) is primarily released through industrial, residential, and transportation activities, and creates secondary air pollutants through chemical reactions in the atmosphere. Long-term exposure to SO2 can result in a negative effect on the human body causing respiratory or cardiovascular disease, which makes the effective and continuous monitoring of SO2 crucial. In South Korea, SO2 monitoring at ground stations has been performed, but this does not provide spatially continuous information of SO2 concentrations. Thus, this research estimated spatially continuous ground-level SO2 concentrations at 1 km resolution over South Korea through the synergistic use of satellite data and numerical models. A stacking ensemble approach, fusing multiple machine learning algorithms at two levels (i.e., base and meta), was adopted for ground-level SO2 estimation using data from January 2015 to April 2019. Random forest and extreme gradient boosting were used as based models and multiple linear regression was adopted for the meta-model. The cross-validation results showed that the meta-model produced the improved performance by 25% compared to the base models, resulting in the correlation coefficient of 0.48 and root-mean-square-error of 0.0032 ppm. In addition, the temporal transferability of the approach was evaluated for one-year data which were not used in the model development. The spatial distribution of ground-level SO2 concentrations based on the proposed model agreed with the general seasonality of SO2 and the temporal patterns of emission sources.

Spatial Downscaling of Ocean Colour-Climate Change Initiative (OC-CCI) Forel-Ule Index Using GOCI Satellite Image and Machine Learning Technique (GOCI 위성영상과 기계학습 기법을 이용한 Ocean Colour-Climate Change Initiative (OC-CCI) Forel-Ule Index의 공간 상세화)

  • Sung, Taejun;Kim, Young Jun;Choi, Hyunyoung;Im, Jungho
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.5_1
    • /
    • pp.959-974
    • /
    • 2021
  • Forel-Ule Index (FUI) is an index which classifies the colors of inland and seawater exist in nature into 21 gradesranging from indigo blue to cola brown. FUI has been analyzed in connection with the eutrophication, water quality, and light characteristics of water systems in many studies, and the possibility as a new water quality index which simultaneously contains optical information of water quality parameters has been suggested. In thisstudy, Ocean Colour-Climate Change Initiative (OC-CCI) based 4 km FUI was spatially downscaled to the resolution of 500 m using the Geostationary Ocean Color Imager (GOCI) data and Random Forest (RF) machine learning. Then, the RF-derived FUI was examined in terms of its correlation with various water quality parameters measured in coastal areas and its spatial distribution and seasonal characteristics. The results showed that the RF-derived FUI resulted in higher accuracy (Coefficient of Determination (R2)=0.81, Root Mean Square Error (RMSE)=0.7784) than GOCI-derived FUI estimated by Pitarch's OC-CCI FUI algorithm (R2=0.72, RMSE=0.9708). RF-derived FUI showed a high correlation with five water quality parameters including Total Nitrogen, Total Phosphorus, Chlorophyll-a, Total Suspended Solids, Transparency with the correlation coefficients of 0.87, 0.88, 0.97, 0.65, and -0.98, respectively. The temporal pattern of the RF-derived FUI well reflected the physical relationship with various water quality parameters with a strong seasonality. The research findingssuggested the potential of the high resolution FUI in coastal water quality management in the Korean Peninsula.

Prediction of patent lifespan and analysis of influencing factors using machine learning (기계학습을 활용한 특허수명 예측 및 영향요인 분석)

  • Kim, Yongwoo;Kim, Min Gu;Kim, Young-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.147-170
    • /
    • 2022
  • Although the number of patent which is one of the core outputs of technological innovation continues to increase, the number of low-value patents also hugely increased. Therefore, efficient evaluation of patents has become important. Estimation of patent lifespan which represents private value of a patent, has been studied for a long time, but in most cases it relied on a linear model. Even if machine learning methods were used, interpretation or explanation of the relationship between explanatory variables and patent lifespan was insufficient. In this study, patent lifespan (number of renewals) is predicted based on the idea that patent lifespan represents the value of the patent. For the research, 4,033,414 patents applied between 1996 and 2017 and finally granted were collected from USPTO (US Patent and Trademark Office). To predict the patent lifespan, we use variables that can reflect the characteristics of the patent, the patent owner's characteristics, and the inventor's characteristics. We build four different models (Ridge Regression, Random Forest, Feed Forward Neural Network, Gradient Boosting Models) and perform hyperparameter tuning through 5-fold Cross Validation. Then, the performance of the generated models are evaluated, and the relative importance of predictors is also presented. In addition, based on the Gradient Boosting Model which have excellent performance, Accumulated Local Effects Plot is presented to visualize the relationship between predictors and patent lifespan. Finally, we apply Kernal SHAP (SHapley Additive exPlanations) to present the evaluation reason of individual patents, and discuss applicability to the patent evaluation system. This study has academic significance in that it cumulatively contributes to the existing patent life estimation research and supplements the limitations of existing patent life estimation studies based on linearity. It is academically meaningful that this study contributes cumulatively to the existing studies which estimate patent lifespan, and that it supplements the limitations of linear models. Also, it is practically meaningful to suggest a method for deriving the evaluation basis for individual patent value and examine the applicability to patent evaluation systems.

Preliminary Inspection Prediction Model to select the on-Site Inspected Foreign Food Facility using Multiple Correspondence Analysis (차원축소를 활용한 해외제조업체 대상 사전점검 예측 모형에 관한 연구)

  • Hae Jin Park;Jae Suk Choi;Sang Goo Cho
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.1
    • /
    • pp.121-142
    • /
    • 2023
  • As the number and weight of imported food are steadily increasing, safety management of imported food to prevent food safety accidents is becoming more important. The Ministry of Food and Drug Safety conducts on-site inspections of foreign food facilities before customs clearance as well as import inspection at the customs clearance stage. However, a data-based safety management plan for imported food is needed due to time, cost, and limited resources. In this study, we tried to increase the efficiency of the on-site inspection by preparing a machine learning prediction model that pre-selects the companies that are expected to fail before the on-site inspection. Basic information of 303,272 foreign food facilities and processing businesses collected in the Integrated Food Safety Information Network and 1,689 cases of on-site inspection information data collected from 2019 to April 2022 were collected. After preprocessing the data of foreign food facilities, only the data subject to on-site inspection were extracted using the foreign food facility_code. As a result, it consisted of a total of 1,689 data and 103 variables. For 103 variables, variables that were '0' were removed based on the Theil-U index, and after reducing by applying Multiple Correspondence Analysis, 49 characteristic variables were finally derived. We build eight different models and perform hyperparameter tuning through 5-fold cross validation. Then, the performance of the generated models are evaluated. The research purpose of selecting companies subject to on-site inspection is to maximize the recall, which is the probability of judging nonconforming companies as nonconforming. As a result of applying various algorithms of machine learning, the Random Forest model with the highest Recall_macro, AUROC, Average PR, F1-score, and Balanced Accuracy was evaluated as the best model. Finally, we apply Kernal SHAP (SHapley Additive exPlanations) to present the selection reason for nonconforming facilities of individual instances, and discuss applicability to the on-site inspection facility selection system. Based on the results of this study, it is expected that it will contribute to the efficient operation of limited resources such as manpower and budget by establishing an imported food management system through a data-based scientific risk management model.