Search | Korea Science

Store Sales Prediction Using Gradient Boosting Model (그래디언트 부스팅 모델을 활용한 상점 매출 예측)

Choi, Jaeyoung;Yang, Heeyoon;Oh, Hayoung
Journal of the Korea Institute of Information and Communication Engineering / v.25 no.2 / pp.171-177 / 2021
Rapid developments in machine learning have led to diverse applications not only in industry but also in daily life. Applications of machine learning to financial data have also attracted interest. Herein, we apply machine learning algorithms to store sales data and present future applications for fintech enterprises. We use several missing-data processing methods and apply the gradient boosting algorithms XGBoost, LightGBM, and CatBoost to predict the future revenue of individual stores. We found that median imputation of missing data combined with the XGBoost algorithm gives the best accuracy. With the proposed method, both fintech enterprises and their customers can benefit: stores can receive financial assistance in advance from fintech companies, while these companies can offer financial support to stores at low risk.
https://doi.org/10.6109/jkiice.2021.25.2.171
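
The abstract above names median imputation as the best-performing way to handle missing data, but gives no implementation details. As a minimal sketch of that one step, assuming missing months are marked with `None` and using made-up sales figures (the function name `impute_median` and the data are hypothetical, not taken from the paper):

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical monthly sales for one store; None marks a missing month.
monthly_sales = [120.0, None, 95.0, 130.0, None, 110.0]
imputed = impute_median(monthly_sales)
print(imputed)
```

In a pipeline like the one the abstract describes, the imputed series would then be passed to a gradient boosting regressor such as XGBoost; that training step is not shown here.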

A Study on Fraud Detection in the C2C Used Trade Market Using Doc2vec

Lim, Do Hyun;Ahn, Hyunchul
Journal of the Korea Society of Computer and Information / v.27 no.3 / pp.173-182 / 2022
In this paper, we propose a machine learning model that can prevent fraudulent transactions in advance and interpret them using the XAI approach. For the experiment, we collected a real data set of 12,258 mobile phone sales posts from Joonggonara, a major domestic online C2C resale trading platform. Text features of each post body were extracted using Doc2vec, dimensionality was reduced through PCA, and various derived variables were created based on previous research. To mitigate the data imbalance problem in the preprocessing stage, a combined sampling method that uses both oversampling and undersampling was applied. Then, various machine learning models were built to detect fraudulent postings. In the analysis, LightGBM showed the best performance among the machine learning models compared. SHAP analysis indicated that a post was more likely to be fraudulent if the price was unreasonably low compared to the market price or if no transaction area was indicated. A high price, the absence of a safe transaction option, more courier transactions, and a higher ratio of zeros in the price were also associated with fraud.
https://doi.org/10.9708/jksci.2022.27.03.173
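
The abstract above mentions combining oversampling and undersampling to reduce class imbalance, without stating the exact procedure. One simple way such a combination can work is sketched below, under the assumption that both classes are resampled at random toward a common size; the function `hybrid_resample`, the target size, and the toy post IDs are all hypothetical.

```python
import random


def hybrid_resample(majority, minority, target_size, seed=0):
    """Randomly undersample the majority class and oversample the minority
    class so that both contain `target_size` items.

    Assumes len(minority) <= target_size <= len(majority).
    """
    if not len(minority) <= target_size <= len(majority):
        raise ValueError("target_size must lie between the two class sizes")
    rng = random.Random(seed)
    # Undersampling: draw without replacement from the larger class.
    majority_part = rng.sample(majority, target_size)
    # Oversampling: keep every minority item, then add random duplicates.
    extra = target_size - len(minority)
    minority_part = list(minority) + rng.choices(minority, k=extra)
    return majority_part, minority_part


# Toy example: 100 normal posts and 10 fraudulent posts, identified by ID.
normal_posts = list(range(100))
fraud_posts = list(range(100, 110))
normal_sample, fraud_sample = hybrid_resample(normal_posts, fraud_posts, 40)
print(len(normal_sample), len(fraud_sample))
```

Random duplication is only one form of oversampling; synthetic methods exist as well, and the abstract does not say which variant the authors used.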

Prediction of Vertical Sea Water Temperature Profile in the East Sea Based on Machine Learning and XBT Data

Kim, Young-Joo;Lee, Soo-Jin;Kim, Young-Won
Journal of the Korea Society of Computer and Information / v.27 no.11 / pp.47-55 / 2022
Recently, research on predicting sea water temperature with artificial intelligence models has been actively conducted in Korea. However, most studies of the seas around the Korean peninsula focus on predicting sea surface temperature. Unlike previous studies, this study predicted the vertical sea water temperature profile of the East Sea, which is very important in submarine operations and anti-submarine warfare, using XBT (eXpendable Bathythermograph) data and machine learning models (RandomForest, XGBoost, LightGBM). The models were trained on XBT data measured from the sea surface to a depth of 200 m in a specific area of the East Sea, and prediction accuracy was evaluated using MAE (Mean Absolute Error) and vertical sea water temperature profile graphs.
https://doi.org/10.9708/jksci.2022.27.11.047
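
The evaluation metric named in the abstract above, MAE, is the average of the absolute differences between observed and predicted values. A minimal sketch of computing it over one vertical profile follows; the depths and temperatures are invented for illustration and are not data from the paper.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical profile: temperature (deg C) at five depths down to 200 m.
depths_m = [0, 50, 100, 150, 200]
observed_c = [18.0, 12.5, 8.0, 5.0, 3.0]
predicted_c = [17.5, 13.0, 8.5, 4.0, 3.0]

mae = mean_absolute_error(observed_c, predicted_c)
print(f"MAE over {len(depths_m)} depths: {mae:.2f} deg C")
```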

Vacant House Prediction and Important Features Exploration through Artificial Intelligence: In Case of Gunsan (인공지능 기반 빈집 추정 및 주요 특성 분석)

Lim, Gyoo Gun;Noh, Jong Hwa;Lee, Hyun Tae;Ahn, Jae Ik
Journal of Information Technology Services / v.21 no.3 / pp.63-72 / 2022
The extinction crisis facing local cities, driven by population concentration in the capital region, directly increases the number of vacant houses in those cities. According to the population and housing census, the number of vacant houses in Gunsan-si rose continuously from 2015 to 2019. Because Gunsan-si suffers from both the doughnut effect and industrial decline, its vacant house problem appears to be worsening. This study aims to lay the foundation for a system that can predict and respond to buildings at high risk of becoming vacant by implementing a data-driven machine learning model for vacant house prediction. Methodologically, the study compares three machine learning models that differ in their input data. The first model is trained on building register, individual declared land value, house price, and socioeconomic data. The second model uses the same data plus POI (Point of Interest) data. The third model uses the same data as the second but excludes water usage and electricity usage data. The second model shows the best performance in terms of F1-score. Overall, the tree ensemble methods (Random Forest, Gradient Boosting Machine, XGBoost, and LightGBM) perform best. Additionally, model complexity can be reduced by eliminating independent variables whose correlation coefficient with vacant house status is below 0.1 in absolute value. Finally, this study proposes XGBoost- and LightGBM-based machine learning models, which can handle missing values, as the final vacant house prediction models.
https://doi.org/10.9716/KITS.2022.21.3.063
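
The feature-elimination rule in the abstract above (drop variables whose correlation with vacant house status is below 0.1 in absolute value) can be sketched as follows. This assumes the Pearson correlation coefficient, which the abstract does not specify; the feature names and values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_features(features, target, threshold=0.1):
    """Keep only features whose |correlation| with the target meets the threshold."""
    return {
        name: column
        for name, column in features.items()
        if abs(pearson(column, target)) >= threshold
    }


# Hypothetical data for six buildings; 1 means the house is vacant.
vacant = [1, 0, 1, 0, 1, 0]
features = {
    "water_usage": [2, 30, 5, 28, 1, 33],
    "building_age": [40, 10, 35, 12, 50, 8],
    "floor_count": [1, 1, 2, 2, 3, 3],  # uncorrelated with the target here
}
selected = select_features(features, vacant)
print(sorted(selected))
```

A filter like this only captures linear, one-variable-at-a-time relationships, so it reduces model complexity at the risk of discarding features that matter only in combination.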

Data Quality Assessment and Improvement for Water Level Prediction of the Han River (한강 수위 예측을 위한 데이터 품질 진단 및 개선)

Ji-Hyun Choi;Jin-Yeop Kang;Hyun Ahn
Journal of Advanced Navigation Technology / v.27 no.1 / pp.133-138 / 2023
As a side effect of recent rapid climate change and global warming, the frequency and scale of flood disasters are increasing worldwide. In Korea, the water level of the Han River is a major management target for preventing flood disasters in Seoul, the capital. In this paper, to improve machine learning-based water level prediction for the Han River, we comprehensively assess the quality of the related dataset and propose data preprocessing methods to improve it. Specifically, we improve the dataset in terms of completeness, validity, and accuracy through missing value processing and cross-correlation analysis. In addition, we conduct a performance evaluation using random forest and LightGBM to analyze the effect of the proposed data improvement method on water level prediction performance.
https://doi.org/10.12673/jant.2023.27.1.133

A Study on Predicting Student Dropout in College: The Importance of Early Academic Performance (전문대학 학생의 학업중단 예측에 관한 연구: 초기 학업 성적의 중요성)

Sangjo Oh;JiHwan Sim
Journal of Industrial Convergence / v.22 no.2 / pp.23-32 / 2024
This study used a minimal set of demographic variables and students' first-semester GPA to predict the final academic status of students at a vocational college in Seoul. The results from XGBoost and LightGBM models revealed that these variables significantly impacted the prediction of students' dismissal. This suggests that early academic performance could be an important indicator of potential academic dropout. Additionally, the study confirmed that the number of academic years required for an associate degree at the vocational college may influence final academic status, indicating that the duration of study is a crucial factor in students' decisions to discontinue their studies. The study attempted to model dropout without relying on psychological, social, or economic factors, focusing solely on academic achievement. This is expected to aid in the development of an early warning system for preventing academic dropout in the future.
https://doi.org/10.22678/JIC.2024.22.2.023

A Study on the Prediction Model for Analysis of Water Quality in Gwangju Stream using Machine Learning Algorithm (머신러닝 학습 알고리즘을 이용한 광주천 수질 분석에 대한 예측 모델 연구)

Yu-Jeong Jeong;Jung-Jae Lee
The Journal of the Korea institute of electronic communication sciences / v.19 no.3 / pp.531-538 / 2024
As the importance of the water quality environment is increasingly emphasized, the water quality index used to improve the urban rivers of Gwangju Metropolitan City is an important factor affecting the aquatic ecosystem and requires accurate prediction. In this paper, the XGBoost and LightGBM machine learning algorithms were compared on water quality inspection items from the downstream Pyeongchon Bridge and upstream BanghakBr_Gwangjucheon1 water systems, which are important points of Gwangju Stream. Following statistical verification, three water quality indicators, total nitrogen (TN), nitrate (NO3), and ammonia (NH3), were predicted, and the performance of the predictive models was evaluated using RMSE, a regression model evaluation index. When individual models were implemented for each water system and compared after cross-validation, the XGBoost model showed excellent predictive ability.
https://doi.org/10.13067/JKIECS.2024.19.3.531

Model Interpretation through LIME and SHAP Model Sharing (LIME과 SHAP 모델 공유에 의한 모델 해석)

Yong-Gil Kim
The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.2 / pp.177-184 / 2024
As data grows rapidly, all kinds of complex ensemble and deep learning algorithms are used to achieve the highest accuracy. It is sometimes unclear how these models predict, classify, recognize, and track unknown data. Explaining this behavior has been, and will continue to be, a goal of intensive research and development in the data science community. A variety of factors, such as a lack of data, imbalanced data, or biased data, can affect the decisions made by learning models. Many methods are gaining traction for such interpretation. LIME and SHAP, two state-of-the-art open-source explainability techniques, are now commonly used. However, their outputs can differ. In this context, this study introduces a technique that couples LIME and SHAP, and demonstrates how it can be used to analyze the decisions made by LightGBM and Keras models when classifying transactions as fraudulent on the IEEE CIS dataset.
https://doi.org/10.7236/JIIBC.2024.24.2.177

Study on Predicting the Designation of Administrative Issue in the KOSDAQ Market Based on Machine Learning Based on Financial Data (머신러닝 기반 KOSDAQ 시장의 관리종목 지정 예측 연구: 재무적 데이터를 중심으로)

Yoon, Yanghyun;Kim, Taekyung;Kim, Suyeong
Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.17 no.1 / pp.229-249 / 2022
This paper investigates machine learning models for predicting the designation of administrative issues in the KOSDAQ market through various techniques. When a company in the Korean stock market is designated as an administrative issue, the market treats the event itself as negative information, causing losses to the company and its investors. The purpose of this study is to evaluate alternative methods for developing an artificial intelligence service that can detect the possibility of designation early from companies' financial ratios and help investors manage portfolio risk. The independent variables are 21 financial ratios representing profitability, stability, activity, and growth. Financial data were sampled for both administrative issue and non-administrative issue stocks from 2011 to 2020, the period in which K-IFRS was applied. Logistic regression analysis, decision tree, support vector machine, random forest, and LightGBM are used to predict the designation of administrative issues. According to the results, LightGBM is the best prediction model with 82.73% classification accuracy, and the decision tree has the lowest classification accuracy at 71.94%. An examination of the top three variables by importance in the decision tree-based learning models shows that the financial variables common to each model are ROE (Net profit) and Capital stock turnover ratio, which are relatively important variables in designating administrative issues. In general, the ensemble learning models had higher predictive performance than the single learning models.

Radio Frequency-based Drone Detection and Classification Using Discrete Fourier Transform and LightGBM

Ki-Hyeon Sung;Soo-Jin Lee
Journal of the Korea Society of Computer and Information / v.29 no.10 / pp.59-68 / 2024
In this study, we propose an efficient model that can detect and classify drones and related devices based on radio frequency signals. To increase its applicability on the battlefield, the proposed model was designed to be lightweight while ensuring rapid detection and high detection accuracy. Data preprocessing was performed by applying the Discrete Fourier Transform (DFT), which is faster than the Hilbert-Huang Transform (HHT). We adopted LightGBM as the learning model because it can be easily used by non-professionals and offers excellent performance in terms of classification speed and accuracy. The CardRF dataset was used to verify the performance of the proposed model. In the experiments, the accuracy of 3-class classification for detecting and classifying drones, WiFi devices, and Bluetooth devices was 99.63% when the number of sample points was set to 100k and 99.40% when it was set to 500k during DFT preprocessing. In the 10-class classification of 6 drones, 2 Bluetooth devices, and 2 WiFi devices, the accuracy was 95.65% for 100k and 96.83% for 500k, confirming significantly improved detection performance compared to previous studies.
https://doi.org/10.9708/jksci.2024.29.10.059
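
The abstract above uses the DFT as the preprocessing step before classification. To illustrate what that transform produces, the sketch below computes DFT magnitudes directly from the definition for a short synthetic tone. The signal, its length, and the idea of using magnitudes as classifier inputs are assumptions made for illustration; the abstract does not say how features were derived from the transform.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each DFT bin, computed from the definition in O(n^2) time."""
    n = len(samples)
    magnitudes = []
    for k in range(n):
        acc = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        magnitudes.append(abs(acc))
    return magnitudes


# Synthetic stand-in for an RF capture: a pure cosine sitting in bin 5.
n = 64
tone_bin = 5
signal = [math.cos(2 * math.pi * tone_bin * t / n) for t in range(n)]

spectrum = dft_magnitudes(signal)
# A real signal's spectrum is symmetric, so only the first half is inspected.
half = spectrum[: n // 2]
peak_bin = max(range(len(half)), key=half.__getitem__)
print(peak_bin)
```

For captures with 100k or 500k sample points, as in the abstract, a fast Fourier transform routine such as `numpy.fft.fft` would be used instead of this quadratic-time loop.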
