Search | Korea Science

Predicting Administrative Issue Designation in KOSDAQ Market Using Machine Learning Techniques (머신러닝을 활용한 코스닥 관리종목지정 예측)

Chae, Seung-Il;Lee, Dong-Joo
- Asia-Pacific Journal of Business / v.13 no.2 / pp.107-122 / 2022
Purpose - This study aims to develop machine learning models to predict administrative issue designation in the KOSDAQ market using financial data. Design/methodology/approach - Applying four classification techniques (logistic regression, support vector machine, random forest, and gradient boosting) to a matched sample of 536 firms over an eight-year period, the authors develop prediction models and explore their practicality. Findings - The resulting four binary selection models show satisfactory overall classification performance on various measures, including AUC (area under the receiver operating characteristic curve), accuracy, F1-score, and top quartile lift, while the ensemble models (random forest and gradient boosting) outperform the others on most measures. Research implications or Originality - Although assessing a firm's potential for administrative issue designation is critical information for investors and financial institutions, detailed empirical investigation has lagged behind. The current research fills this gap in the literature by proposing parsimonious prediction models based on a few financial variables and validating the applicability of the models.
https://doi.org/10.32599/apjb.13.2.202206.107
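
Of the measures named in the abstract above, top quartile lift is the least standard. The sketch below shows one common definition, assumed here rather than taken from the paper: the positive rate among the 25% of firms with the highest predicted scores, divided by the overall positive rate. The function name and the toy data are hypothetical.

```python
def top_quartile_lift(y_true, scores):
    """Lift in the top 25% of cases ranked by predicted score.

    Lift = (positive rate in the top-scored quartile) / (overall positive rate).
    This definition is an assumption; the paper may compute it differently.
    """
    if len(y_true) != len(scores) or not y_true:
        raise ValueError("y_true and scores must be non-empty and of equal length")
    overall_rate = sum(y_true) / len(y_true)
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no positive cases")
    # Rank cases from highest to lowest predicted score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, len(ranked) // 4)
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    return top_rate / overall_rate


# Toy data invented for illustration: 8 firms, 2 of them designated (label 1).
labels = [1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.3, 0.5, 0.05]
lift = top_quartile_lift(labels, scores)
```

A lift above 1 means the model concentrates designated firms in its highest-scored quartile more than random ranking would.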

Calibration of Portable Particulate Matter-Monitoring Device using Web Query and Machine Learning

Loh, Byoung Gook;Choi, Gi Heung
- Safety and Health at Work / v.10 no.4 / pp.452-460 / 2019
Background: Monitoring and control of PM2.5 are being recognized as key to addressing health issues attributed to PM2.5. The availability of low-cost PM2.5 sensors has made it possible to introduce a number of portable PM2.5 monitors based on light scattering to the consumer market at an affordable price. The accuracy of light scattering-based PM2.5 monitors depends significantly on the method of calibration. A static calibration curve is the most popular calibration method for low-cost PM2.5 sensors, particularly because of its ease of application. The drawback of this approach, however, is its lack of accuracy. Methods: This study discusses the calibration of a low-cost PM2.5-monitoring device (PMD) to improve its accuracy and reliability for practical use. The proposed method is based on construction of a PM2.5 sensor network using the Message Queuing Telemetry Transport (MQTT) protocol and web query of reference measurement data available at a government-authorized PM monitoring station (GAMS) in the Republic of Korea. Four machine learning (ML) algorithms (support vector machine, k-nearest neighbors, random forest, and extreme gradient boosting) were used as regression models to calibrate the PMD measurements of PM2.5. The performance of each ML algorithm was evaluated using stratified K-fold cross-validation, and a linear regression model was used as a reference. Results: Based on the performance of the ML algorithms used, regression of the PMD output to the PM2.5 concentration data available from the GAMS through web query was effective. The extreme gradient boosting algorithm showed the best performance, with a mean coefficient of determination (R²) of 0.78 and a standard error of 5.0 ㎍/㎥, corresponding to an 8% increase in R² and a 12% decrease in root mean square error in comparison with the linear regression model. A minimum calibration period of 100 hours was found to be required to calibrate the PMD to its full capacity.
The proposed calibration method requires the PMD to be located in the vicinity of a GAMS. As the number of PMDs participating in the sensor network increases, however, calibrated PMDs can be used as reference devices for nearby PMDs that require calibration, forming a calibration chain through the MQTT protocol. Conclusions: Calibration of a low-cost PMD, based on construction of a PM2.5 sensor network using the MQTT protocol and web query of reference measurement data available at a GAMS, significantly improves the accuracy and reliability of the PMD, thereby making practical use of the low-cost PMD possible.
https://doi.org/10.1016/j.shaw.2019.08.002
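
The linear regression reference model mentioned in the abstract above can be illustrated with a minimal sketch: fit a line from raw device readings to reference-station values, then score held-out readings with R² and root mean square error. All numbers and function names are hypothetical, a single hold-out split stands in for the K-fold cross-validation the paper reports, and no MQTT or web-query step is shown.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Invented readings: raw device output vs. reference-station values.
train_raw, train_ref = [10.0, 20.0, 30.0, 40.0], [14.0, 26.0, 34.0, 46.0]
test_raw, test_ref = [15.0, 35.0], [20.0, 40.0]

slope, intercept = fit_line(train_raw, train_ref)
calibrated = [slope * v + intercept for v in test_raw]

rmse_raw = rmse(test_ref, test_raw)           # error before calibration
rmse_calibrated = rmse(test_ref, calibrated)  # error after calibration
r2_calibrated = r2_score(test_ref, calibrated)
```

Comparing `rmse_raw` with `rmse_calibrated` on held-out data is the same kind of comparison the abstract reports between the linear baseline and the ML regressors.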

Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models

Oh Beom Kwon;Solji Han;Hwa Young Lee;Hye Seon Kang;Sung Kyoung Kim;Ju Sang Kim;Chan Kwon Park;Sang Haak Lee;Seung Joon Kim;Jin Woo Kim;Chang Dong Yeo
- Tuberculosis and Respiratory Diseases / v.86 no.3 / pp.203-215 / 2023
Background: Surgical resection is the standard treatment for early-stage lung cancer. Since postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to evaluate the predictive performance of linear regression and machine learning models. Methods: We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data; and set III, machine learning models imputing the missing data. Six machine learning models were implemented: the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM). The forced expiratory volume in 1 second measured 6 months after surgery was defined as the outcome. Five-fold cross-validation was performed for hyperparameter tuning of the machine learning models. The dataset was split into training and test datasets at a 70:30 ratio. Implementation was done after dataset splitting in set III. Predictive performance was evaluated by R² and mean squared error (MSE) in the three sets. Results: A total of 1,487 patients were included in sets I and III, and 896 patients were included in set II. In set I, the R² value was 0.27. In set II, LightGBM was the best model, with the highest R² value of 0.5 and the lowest MSE of 154.95. In set III, LightGBM was the best model, with the highest R² value of 0.56 and the lowest MSE of 174.07. Conclusion: The LightGBM model showed the best performance in predicting postoperative lung function.
https://doi.org/10.4046/trd.2022.0048
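
Set III in the abstract above combines a 70:30 split with imputation of missing data. One common way to order these steps, assumed here rather than confirmed by the abstract, is to split first and then fit the imputation on the training portion only, so that no test-set information leaks into training. Mean imputation, the helper names, and the toy rows are all hypothetical choices for this sketch.

```python
import random
from statistics import mean


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split them into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_mean_imputer(train_rows):
    """Compute per-column means from observed (non-None) training values only."""
    n_features = len(train_rows[0])
    return [
        mean(row[j] for row in train_rows if row[j] is not None)
        for j in range(n_features)
    ]


def impute(rows, means):
    """Replace each missing value (None) with the corresponding training mean."""
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


# Invented toy rows with two numeric features; None marks a missing value.
rows = [
    [2.1, 80.0], [None, 75.0], [1.8, None], [2.5, 90.0], [2.0, 70.0],
    [None, 85.0], [1.9, 65.0], [2.3, None], [2.2, 88.0], [1.7, 60.0],
]

train_rows, test_rows = train_test_split(rows, test_fraction=0.3, seed=0)
train_means = fit_mean_imputer(train_rows)     # statistics come from training data only
train_filled = impute(train_rows, train_means)
test_filled = impute(test_rows, train_means)   # test set reuses the training means
```

The same pattern applies to more elaborate imputers: whatever is learned from the data is learned on the training split and only applied to the test split.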

Development of The Irregular Radial Pulse Detection Algorithm Based on Statistical Learning Model (통계적 학습 모형에 기반한 불규칙 맥파 검출 알고리즘 개발)

Bae, Jang-Han;Jang, Jun-Su;Ku, Boncho
- Journal of Biomedical Engineering Research / v.41 no.5 / pp.185-194 / 2020
Arrhythmia is usually diagnosed from the electrocardiogram (ECG) signal; however, the ECG is difficult to measure and requires expert help to analyze. The radial pulse, on the other hand, can be measured easily in daily life, and could be a suitable bio-signal for the recent contactless ("untact") paradigm as well as an extensible signal for pulse-pattern-based diagnosis in Korean medicine. In this study, we developed an irregular radial pulse detection algorithm based on a learning model and considered its applicability to arrhythmia screening. A total of 1432 pulse waves, including irregular pulse data, were used in the experiment. Three data sets were prepared with minimal preprocessing to avoid heuristic feature extraction. As classification algorithms, elastic net logistic regression, random forest, and extreme gradient boosting were applied to each data set, and irregular pulse detection performance was estimated using the area under the receiver operating characteristic curve based on 10-fold cross-validation. The extreme gradient boosting method outperformed the others, with classification accuracy reaching 99.7%. The results confirmed that the proposed algorithm could be used for arrhythmia screening. To develop a fusion technology integrating Western and Korean medicine, future research will need to address arrhythmia subtype classification from the perspective of Korean medicine.
https://doi.org/10.9718/JBER.2020.41.5.185

Estimation of Ground-level PM10 and PM2.5 Concentrations Using Boosting-based Machine Learning from Satellite and Numerical Weather Prediction Data (부스팅 기반 기계학습기법을 이용한 지상 미세먼지 농도 산출)

Park, Seohui;Kim, Miae;Im, Jungho
- Korean Journal of Remote Sensing / v.37 no.2 / pp.321-335 / 2021
Particulate matter (PM10 and PM2.5, with a diameter less than 10 and 2.5 ㎛, respectively) can be absorbed by the human body and adversely affect human health. Although most PM monitoring is based on ground-based observations, these are limited to point-based measurement sites, which leads to uncertainty in PM estimation for regions without observation sites. This spatial limitation can be overcome by using satellite data. In this study, we developed a machine learning-based retrieval algorithm for ground-level PM10 and PM2.5 concentrations using aerosol parameters from the Geostationary Ocean Color Imager (GOCI) satellite and various meteorological parameters from a numerical weather prediction model from January to December 2019. Gradient Boosted Regression Trees (GBRT) and Light Gradient Boosting Machine (LightGBM) were used to estimate PM concentrations. Model performance was examined for two feature sets: all input parameters (Feature set 1) and a subset of input parameters without meteorological and land-cover parameters (Feature set 2). Both models showed higher accuracy (about 10% higher in R²) with Feature set 1 than with Feature set 2. The GBRT model using Feature set 1 was chosen as the final model for further analysis (PM10: R² = 0.82, nRMSE = 34.9%; PM2.5: R² = 0.75, nRMSE = 35.6%). The spatial distribution of the seasonal and annual-averaged PM concentrations was similar to in-situ observations, except for the northeastern part of China with bright surface reflectance. Their spatial distribution and seasonal changes were well matched with in-situ measurements.
https://doi.org/10.7780/kjrs.2021.37.2.11

Prediction of patent lifespan and analysis of influencing factors using machine learning (기계학습을 활용한 특허수명 예측 및 영향요인 분석)

Kim, Yongwoo;Kim, Min Gu;Kim, Young-Min
- Journal of Intelligence and Information Systems / v.28 no.2 / pp.147-170 / 2022
Although the number of patents, one of the core outputs of technological innovation, continues to increase, the number of low-value patents has also increased sharply. Efficient evaluation of patents has therefore become important. Estimation of patent lifespan, which represents the private value of a patent, has been studied for a long time, but in most cases it has relied on a linear model. Even where machine learning methods were used, interpretation or explanation of the relationship between explanatory variables and patent lifespan was insufficient. In this study, patent lifespan (number of renewals) is predicted based on the idea that patent lifespan represents the value of the patent. For the research, 4,033,414 patents that were applied for between 1996 and 2017 and finally granted were collected from the USPTO (US Patent and Trademark Office). To predict patent lifespan, we use variables that reflect the characteristics of the patent, the patent owner, and the inventor. We build four different models (Ridge Regression, Random Forest, Feed Forward Neural Network, Gradient Boosting Models) and perform hyperparameter tuning through 5-fold Cross Validation. Then, the performance of the generated models is evaluated, and the relative importance of predictors is also presented. In addition, based on the Gradient Boosting Model, which has excellent performance, an Accumulated Local Effects Plot is presented to visualize the relationship between predictors and patent lifespan. Finally, we apply Kernel SHAP (SHapley Additive exPlanations) to present the reasons for the evaluation of individual patents, and discuss applicability to the patent evaluation system.
This study is academically meaningful in that it contributes cumulatively to existing research on patent lifespan estimation and addresses the limitations of studies based on linear models. It is also practically meaningful in that it suggests a method for deriving the evaluation basis for individual patent value and examines its applicability to patent evaluation systems.
https://doi.org/10.13088/jiis.2022.28.2.147

Estimation of lightweight aggregate concrete characteristics using a novel stacking ensemble approach

Kaloop, Mosbeh R.;Bardhan, Abidhan;Hu, Jong Wan;Abd-Elrahman, Mohamed
- Advances in nano research / v.13 no.5 / pp.499-512 / 2022
This study investigates the efficiency of ensemble machine learning for predicting the characteristics of lightweight-aggregate concrete (LWC). A stacking ensemble (STEN) approach was proposed to estimate the dry density (DD) and 28-day compressive strength (Fc-28) of LWC using two meta-models, random forest regressor (RFR) and extra tree regressor (ETR), and two novel ensemble models, STEN-RFR and STEN-ETR, were constructed. Four standalone machine learning models (artificial neural network, gradient boosting regression, K neighbor regression, and support vector regression) were used to compare the performance of the proposed models. For this purpose, a total of 140 LWC mixtures with 21 influencing parameters, for producing LWC with a density less than 1000 kg/m³, were used. Based on the experimental results with multiple performance criteria, it can be concluded that the proposed STEN-ETR model can be used to estimate the DD and Fc-28 of LWC. Moreover, the STEN-ETR approach was found to be an effective technique for predicting the DD and Fc-28 of LWC with minimal prediction error. In the validation phase, the accuracy of the proposed STEN-ETR model in predicting DD and Fc-28 was found to be 96.79% and 81.50%, respectively. In addition, the cement, water-cement ratio, silica fume, and expanded-glass aggregate variables were found to be significant in modeling the DD and Fc-28 of LWC.
https://doi.org/10.12989/anr.2022.13.5.499

Assessment of concrete macrocrack depth using infrared thermography

Bae, Jaehoon;Jang, Arum;Park, Min Jae;Lee, Jonghoon;Ju, Young K.
- Steel and Composite Structures / v.43 no.4 / pp.501-509 / 2022
Cracks are common defects in concrete structures. Thus far, crack inspection has been performed manually using contact inspection methods. This manpower-dependent approach inevitably increases cost and work hours. Various non-contact studies have been conducted to overcome such difficulties, but they have focused on developing a methodology for non-contact inspection or on local quantitative detection of crack width or length on concrete surfaces. Crack depth, however, can also affect the safety of concrete structures. In particular, although macrocrack depth is structurally critical, it is difficult to determine with existing methods. Therefore, an experimental investigation based on non-contact infrared thermography and multivariate machine learning was performed in this study to estimate the hidden macrocrack depth. To consider practical applications for inspection, an experiment was conducted that simulated the piloting of an unmanned aerial vehicle equipped with infrared thermography equipment. The crack depths (10-60 mm) were comparatively evaluated using linear regression, gradient boosting, and random forest (AI regression methods).
https://doi.org/10.12989/scs.2022.43.4.501

Classification of Soil Creep Hazard Class Using Machine Learning (기계학습기법을 이용한 땅밀림 위험등급 분류)

Lee, Gi Ha;Le, Xuan-Hien;Yeon, Min Ho;Seo, Jun Pyo;Lee, Chang Woo
- Journal of Korean Society of Disaster and Security / v.14 no.3 / pp.17-27 / 2021
In this study, classification models were built using machine learning techniques to classify soil creep risk into three classes from A to C (A: risk, B: moderate, C: good). Six machine learning techniques were used: K-Nearest Neighbor, Support Vector Machine, Logistic Regression, Decision Tree, Random Forest, and Extreme Gradient Boosting. Their classification accuracy was analyzed using the nationwide soil creep field survey data from 2019 and 2020. In the classification accuracy analysis, all six methods showed excellent accuracy of 0.9 or more. The methods trained on numerical data performed better than the methods based on character data from the field survey evaluation table. Moreover, the methods trained on the data group reflecting expert opinion (R1~R4) had higher accuracy than those trained on the field survey evaluation score data group (C1~C4). Machine learning can be used as a tool for predicting soil creep if high-quality data are continuously secured and updated in the future.
https://doi.org/10.21729/ksds.2021.14.3.17

Ensemble Learning-Based Prediction of Good Sellers in Overseas Sales of Domestic Books and Keyword Analysis of Reviews of the Good Sellers (앙상블 학습 기반 국내 도서의 해외 판매 굿셀러 예측 및 굿셀러 리뷰 키워드 분석)

Do Young Kim;Na Yeon Kim;Hyon Hee Kim
- KIPS Transactions on Software and Data Engineering / v.12 no.4 / pp.173-178 / 2023
As Korean literature spreads around the world, its position in the overseas publishing market has become important. As demand in the overseas publishing market continues to grow, it is essential to predict future book sales and analyze the characteristics of books that have been highly favored by overseas readers in the past. In this study, we proposed an ensemble learning-based prediction model and analyzed the characteristics of books published overseas over the past 5 years that were classified as good sellers (cumulative sales of more than 5,000 copies). We applied five ensemble learning models, i.e., XGBoost, Gradient Boosting, Adaboost, LightGBM, and Random Forest, and compared them with other machine learning algorithms, i.e., Support Vector Machine, Logistic Regression, and Deep Learning. Our experimental results showed that the ensemble algorithms outperform the other approaches in handling imbalanced data. In particular, the LightGBM model obtained an AUC value of 99.86%, the best prediction performance. Among the features used for prediction, the most important is the author's number of overseas publications, and the second most important is publication in the countries with the largest publication markets. The number of evaluation participants is also an important feature. In addition, text mining was performed on reviews of the four best-selling books among the good sellers. Many reviews focused on stories, characters, and writers, and support for translation appears to be needed, as the keyword "translation" appears frequently in low-rated reviews.
https://doi.org/10.3745/KTSDE.2023.12.4.173

