• Title/Summary/Keyword: Light GBM

Search Result 86, Processing Time 0.028 seconds

A Study on the Prediction of Nitrogen Oxide Emissions in Rotary Kiln Process using Machine Learning (머신러닝 기법을 이용한 로터리 킬른 공정의 질소산화물 배출예측에 관한 연구)

  • Je-Hyeung Yoo;Cheong-Yeul Park;Jae Kwon Bae
    • Journal of Industrial Convergence
    • /
    • v.21 no.7
    • /
    • pp.19-27
    • /
    • 2023
  • As the secondary battery market expands, the process of producing laterite ore using the rotary kiln and electric furnace method is expanding worldwide. As ESG management expands, the management of air pollutants such as nitrogen oxides in exhaust gases is strengthened. The rotary kiln, one of the main facilities of the pyrometallurgy process, is a facility for drying and preliminary reduction of ore, and it generate nitrogen oxides, thus prediction of nitrogen oxide is important. In this study, LSTM for regression prediction and LightGBM for classification prediction were used to predict and then model optimization was performed using AutoML. When applying LSTM, the predicted value after 5 minutes was 0.86, MAE 5.13ppm, and after 40 minutes, the predicted value was 0.38 and MAE 10.84ppm. As a result of applying LightGBM for classification prediction, the test accuracy rose from 0.75 after 5 minutes to 0.61 after 40 minutes, to a level that can be used for actual operation, and as a result of model optimization through AutoML, the accuracy of the prediction after 5 minutes improved from 0.75 to 0.80 and from 0.61 to 0.70. Through this study, nitrogen oxide prediction values can be applied to actual operations to contribute to compliance with air pollutant emission regulations and ESG management.

Force-mediated proinvasive matrix remodeling driven by tumor-associated mesenchymal stem-like cells in glioblastoma

  • Lim, Eun-Jung;Suh, Yongjoon;Kim, Seungmo;Kang, Seok-Gu;Lee, Su-Jae
    • BMB Reports
    • /
    • v.51 no.4
    • /
    • pp.182-187
    • /
    • 2018
  • In carcinoma, cancer-associated fibroblasts participate in force-mediated extracellular matrix (ECM) remodeling, consequently leading to invasion of cancer cells. Likewise, the ECM remodeling actively occurs in glioblastoma (GBM) and the consequent microenvironmental stiffness is strongly linked to migration behavior of GBM cells. However, in GBM the stromal cells responsible for force-mediated ECM remodeling remain unidentified. We show that tumor-associated mesenchymal stem-like cells (tMSLCs) provide a proinvasive matrix condition in GBM by force-mediated ECM remodeling. Importantly, CCL2-mediated Janus kinase 1 (JAK1) activation increased phosphorylation of myosin light chain 2 in tMSLCs and led to collagen assembly and actomyosin contractility. Collectively, our findings implicate tMSLCs as stromal cells providing force-mediated proinvasive ECM remodeling in the GBM microenvironment, and reminiscent of fibroblasts in carcinoma.

Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models

  • Oh Beom Kwon;Solji Han;Hwa Young Lee;Hye Seon Kang;Sung Kyoung Kim;Ju Sang Kim;Chan Kwon Park;Sang Haak Lee;Seung Joon Kim;Jin Woo Kim;Chang Dong Yeo
    • Tuberculosis and Respiratory Diseases
    • /
    • v.86 no.3
    • /
    • pp.203-215
    • /
    • 2023
  • Background: Surgical resection is the standard treatment for early-stage lung cancer. Since postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to evaluate the predictive performance of linear regression and machine learning models. Methods: We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data: and set III, machine learning models imputing the missing data. Six machine learning models, the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM) were implemented. The forced expiratory volume in 1 second measured 6 months after surgery was defined as the outcome. Five-fold cross-validation was performed for hyperparameter tuning of the machine learning models. The dataset was split into training and test datasets at a 70:30 ratio. Implementation was done after dataset splitting in set III. Predictive performance was evaluated by R2 and mean squared error (MSE) in the three sets. Results: A total of 1,487 patients were included in sets I and III and 896 patients were included in set II. In set I, the R2 value was 0.27 and in set II, LightGBM was the best model with the highest R2 value of 0.5 and the lowest MSE of 154.95. In set III, LightGBM was the best model with the highest R2 value of 0.56 and the lowest MSE of 174.07. Conclusion: The LightGBM model showed the best performance in predicting postoperative lung function.

IoT Attack Detection Using PCA and Machine Learning (주성분 분석과 기계학습을 이용한 사물인터넷 공격 탐지)

  • Lee, Ji-Gu;Lee, Soo-Jin
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2022.07a
    • /
    • pp.245-246
    • /
    • 2022
  • 최근 IoT 환경에서 기계학습을 이용한 공격 탐지 모델의 연구가 활발히 진행되고 있으며, 탐지 정확도도 점차 향상되고 있다. 하지만, IoT 환경의 특징인 저 사양 하드웨어, 고차원의 특징, 방대한 트래픽 등으로 인해 탐지성능이 저하되는 문제가 있다. 따라서 본 논문에서는 MQTT(Message Queuing Telementry Transport) 프로토콜 기반의 IoT 환경에서 수집된 데이터셋을 대상으로 주성분 분석(Principal Component Analysis)과 LightGBM을 이용하여 데이터셋 차원을 감소시키고, 공격 클래스를 분류하였다. 실험결과 원본 데이터셋 차원을 주성분 3개(약 9%)로 감소시켰음에도 모든 특징(33개)을 사용한 실험결과와 거의 유사한 성능을 보였다. 또한 기존 연구의 특징 선택을 통한 탐지 모델과 비교하였을 때도 분류성능이 더 우수한 것으로 나타났다.

  • PDF

머신러닝 기반 KOSDAQ 시장의 관리종목 지정 예측 연구

  • Yun, Yang-Hyeon;Kim, Tae-Gyeong;Kim, Su-Yeong;Park, Yong-Gyun
    • 한국벤처창업학회:학술대회논문집
    • /
    • 2021.11a
    • /
    • pp.185-187
    • /
    • 2021
  • 관리종목 지정 제도는 상장 기업 내 기업의 부실화를 경고하여 기업에게는 회생 기회를 주고, 투자자들에게는 투자 위험을 경고하기 위한 시장규제 제도이다. 본 연구는 관리종목과 비관리종목의 기업의 재무 데이터를 표본으로 하여 관리종목 지정 예측에 대한 연구를 진행하였다. 분석에 쓰인 분석 방법은 로지스틱 회귀분석, 의사결정나무, 서포트 벡터 머신, 소프트 보팅, 랜덤 포레스트, LightGBM이며 분류 정확도가 82.73%인 LightGBM이 가장 우수한 예측 모형이었으며 분류 정확도가 가장 낮은 예측 모형은 정확도가 71.94%인 의사결정나무였다. 대체적으로 앙상블을 이용한 학습 모형이 단일 학습 모형보다 예측 성능이 높았다.

  • PDF

Potential of multispectral imaging for maturity classification and recognition of oriental melon

  • Seongmin Lee;Kyoung-Chul Kim;Kangjin Lee;Jinhwan Ryu;Youngki Hong;Byeong-Hyo Cho
    • Korean Journal of Agricultural Science
    • /
    • v.50 no.3
    • /
    • pp.485-496
    • /
    • 2023
  • In this study, we aimed to apply multispectral imaging (713 - 920 nm, 10 bands) for maturity classification and recognition of oriental melons grown in hydroponic greenhouses. A total of 20 oriental melons were selected, and time series multispectral imaging of oriental melons was 7 - 9 times for each sample from April 21, 2023, to May 12, 2023. We used several approaches, such as Savitzky-Golay (SG), standard normal variate (SNV), and Combination of SG and SNV (SG + SNV), for pre-processing the multispectral data. As a result, 713 - 759 nm bands were preprocessed with SG for the maturity classification of oriental melons. Additionally, a Light Gradient Boosting Machine (LightGBM) was used to train the recognition model for oriental melon. R2 of recognition model were 0.92, 0.91 for the training and validation sets, respectively, and the F-scores were 96.6 and 79.4% for the training and testing sets, respectively. Therefore, multispectral imaging in the range of 713 - 920 nm can be used to classify oriental melons maturity and recognize their fruits.

The Prediction of Export Credit Guarantee Accident using Machine Learning (기계학습을 이용한 수출신용보증 사고예측)

  • Cho, Jaeyoung;Joo, Jihwan;Han, Ingoo
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.83-102
    • /
    • 2021
  • The government recently announced various policies for developing big-data and artificial intelligence fields to provide a great opportunity to the public with respect to disclosure of high-quality data within public institutions. KSURE(Korea Trade Insurance Corporation) is a major public institution for financial policy in Korea, and thus the company is strongly committed to backing export companies with various systems. Nevertheless, there are still fewer cases of realized business model based on big-data analyses. In this situation, this paper aims to develop a new business model which can be applied to an ex-ante prediction for the likelihood of the insurance accident of credit guarantee. We utilize internal data from KSURE which supports export companies in Korea and apply machine learning models. Then, we conduct performance comparison among the predictive models including Logistic Regression, Random Forest, XGBoost, LightGBM, and DNN(Deep Neural Network). For decades, many researchers have tried to find better models which can help to predict bankruptcy since the ex-ante prediction is crucial for corporate managers, investors, creditors, and other stakeholders. The development of the prediction for financial distress or bankruptcy was originated from Smith(1930), Fitzpatrick(1932), or Merwin(1942). One of the most famous models is the Altman's Z-score model(Altman, 1968) which was based on the multiple discriminant analysis. This model is widely used in both research and practice by this time. The author suggests the score model that utilizes five key financial ratios to predict the probability of bankruptcy in the next two years. Ohlson(1980) introduces logit model to complement some limitations of previous models. Furthermore, Elmer and Borowski(1988) develop and examine a rule-based, automated system which conducts the financial analysis of savings and loans. Since the 1980s, researchers in Korea have started to examine analyses on the prediction of financial distress or bankruptcy. Kim(1987) analyzes financial ratios and develops the prediction model. Also, Han et al.(1995, 1996, 1997, 2003, 2005, 2006) construct the prediction model using various techniques including artificial neural network. Yang(1996) introduces multiple discriminant analysis and logit model. Besides, Kim and Kim(2001) utilize artificial neural network techniques for ex-ante prediction of insolvent enterprises. After that, many scholars have been trying to predict financial distress or bankruptcy more precisely based on diverse models such as Random Forest or SVM. One major distinction of our research from the previous research is that we focus on examining the predicted probability of default for each sample case, not only on investigating the classification accuracy of each model for the entire sample. Most predictive models in this paper show that the level of the accuracy of classification is about 70% based on the entire sample. To be specific, LightGBM model shows the highest accuracy of 71.1% and Logit model indicates the lowest accuracy of 69%. However, we confirm that there are open to multiple interpretations. In the context of the business, we have to put more emphasis on efforts to minimize type 2 error which causes more harmful operating losses for the guaranty company. Thus, we also compare the classification accuracy by splitting predicted probability of the default into ten equal intervals. When we examine the classification accuracy for each interval, Logit model has the highest accuracy of 100% for 0~10% of the predicted probability of the default, however, Logit model has a relatively lower accuracy of 61.5% for 90~100% of the predicted probability of the default. On the other hand, Random Forest, XGBoost, LightGBM, and DNN indicate more desirable results since they indicate a higher level of accuracy for both 0~10% and 90~100% of the predicted probability of the default but have a lower level of accuracy around 50% of the predicted probability of the default. When it comes to the distribution of samples for each predicted probability of the default, both LightGBM and XGBoost models have a relatively large number of samples for both 0~10% and 90~100% of the predicted probability of the default. Although Random Forest model has an advantage with regard to the perspective of classification accuracy with small number of cases, LightGBM or XGBoost could become a more desirable model since they classify large number of cases into the two extreme intervals of the predicted probability of the default, even allowing for their relatively low classification accuracy. Considering the importance of type 2 error and total prediction accuracy, XGBoost and DNN show superior performance. Next, Random Forest and LightGBM show good results, but logistic regression shows the worst performance. However, each predictive model has a comparative advantage in terms of various evaluation standards. For instance, Random Forest model shows almost 100% accuracy for samples which are expected to have a high level of the probability of default. Collectively, we can construct more comprehensive ensemble models which contain multiple classification machine learning models and conduct majority voting for maximizing its overall performance.

A Study on Searching for Export Candidate Countries of the Korean Food and Beverage Industry Using Node2vec Graph Embedding and Light GBM Link Prediction (Node2vec 그래프 임베딩과 Light GBM 링크 예측을 활용한 식음료 산업의 수출 후보국가 탐색 연구)

  • Lee, Jae-Seong;Jun, Seung-Pyo;Seo, Jinny
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.4
    • /
    • pp.73-95
    • /
    • 2021
  • This study uses Node2vec graph embedding method and Light GBM link prediction to explore undeveloped export candidate countries in Korea's food and beverage industry. Node2vec is the method that improves the limit of the structural equivalence representation of the network, which is known to be relatively weak compared to the existing link prediction method based on the number of common neighbors of the network. Therefore, the method is known to show excellent performance in both community detection and structural equivalence of the network. The vector value obtained by embedding the network in this way operates under the condition of a constant length from an arbitrarily designated starting point node. Therefore, it has the advantage that it is easy to apply the sequence of nodes as an input value to the model for downstream tasks such as Logistic Regression, Support Vector Machine, and Random Forest. Based on these features of the Node2vec graph embedding method, this study applied the above method to the international trade information of the Korean food and beverage industry. Through this, we intend to contribute to creating the effect of extensive margin diversification in Korea in the global value chain relationship of the industry. The optimal predictive model derived from the results of this study recorded a precision of 0.95 and a recall of 0.79, and an F1 score of 0.86, showing excellent performance. This performance was shown to be superior to that of the binary classifier based on Logistic Regression set as the baseline model. In the baseline model, a precision of 0.95 and a recall of 0.73 were recorded, and an F1 score of 0.83 was recorded. In addition, the light GBM-based optimal prediction model derived from this study showed superior performance than the link prediction model of previous studies, which is set as a benchmarking model in this study. The predictive model of the previous study recorded only a recall rate of 0.75, but the proposed model of this study showed better performance which recall rate is 0.79. The difference in the performance of the prediction results between benchmarking model and this study model is due to the model learning strategy. In this study, groups were classified by the trade value scale, and prediction models were trained differently for these groups. Specific methods are (1) a method of randomly masking and learning a model for all trades without setting specific conditions for trade value, (2) arbitrarily masking a part of the trades with an average trade value or higher and using the model method, and (3) a method of arbitrarily masking some of the trades with the top 25% or higher trade value and learning the model. As a result of the experiment, it was confirmed that the performance of the model trained by randomly masking some of the trades with the above-average trade value in this method was the best and appeared stably. It was found that most of the results of potential export candidates for Korea derived through the above model appeared appropriate through additional investigation. Combining the above, this study could suggest the practical utility of the link prediction method applying Node2vec and Light GBM. In addition, useful implications could be derived for weight update strategies that can perform better link prediction while training the model. On the other hand, this study also has policy utility because it is applied to trade transactions that have not been performed much in the research related to link prediction based on graph embedding. The results of this study support a rapid response to changes in the global value chain such as the recent US-China trade conflict or Japan's export regulations, and I think that it has sufficient usefulness as a tool for policy decision-making.

Store Sales Prediction Using Gradient Boosting Model (그래디언트 부스팅 모델을 활용한 상점 매출 예측)

  • Choi, Jaeyoung;Yang, Heeyoon;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.2
    • /
    • pp.171-177
    • /
    • 2021
  • Through the rapid developments in machine learning, there have been diverse utilization approaches not only in industrial fields but also in daily life. Implementations of machine learning on financial data, also have been of interest. Herein, we employ machine learning algorithms to store sales data and present future applications for fintech enterprises. We utilize diverse missing data processing methods to handle missing data and apply gradient boosting machine learning algorithms; XGBoost, LightGBM, CatBoost to predict the future revenue of individual stores. As a result, we found that using median imputation onto missing data with the appliance of the xgboost algorithm has the best accuracy. By employing the proposed method, fintech enterprises and customers can attain benefits. Stores can benefit by receiving financial assistance beforehand from fintech companies, while these corporations can benefit by offering financial support to these stores with low risk.

A Study on Fraud Detection in the C2C Used Trade Market Using Doc2vec

  • Lim, Do Hyun;Ahn, Hyunchul
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.3
    • /
    • pp.173-182
    • /
    • 2022
  • In this paper, we propose a machine learning model that can prevent fraudulent transactions in advance and interpret them using the XAI approach. For the experiment, we collected a real data set of 12,258 mobile phone sales posts from Joonggonara, a major domestic online C2C resale trading platform. Characteristics of the text corresponding to the post body were extracted using Doc2vec, dimensionality was reduced through PCA, and various derived variables were created based on previous research. To mitigate the data imbalance problem in the preprocessing stage, a complex sampling method that combines oversampling and undersampling was applied. Then, various machine learning models were built to detect fraudulent postings. As a result of the analysis, LightGBM showed the best performance compared to other machine learning models. And as a result of SHAP, if the price is unreasonably low compared to the market price and if there is no indication of the transaction area, there was a high probability that it was a fraudulent post. Also, high price, no safe transaction, the more the courier transaction, and the higher the ratio of 0 in the price also led to fraud.