Search | Korea Science

Application of Random Over Sampling Examples(ROSE) for an Effective Bankruptcy Prediction Model (효과적인 기업부도 예측모형을 위한 ROSE 표본추출기법의 적용)

Ahn, Cheolhwi;Ahn, Hyunchul
- The Journal of the Korea Contents Association / v.18 no.8 / pp.525-535 / 2018
In a classification problem, if one class occurs far more frequently than the others, data imbalance arises and distorts machine learning. Corporate bankruptcy prediction often suffers from data imbalance because the proportion of insolvent companies is generally very low, whereas the proportion of solvent companies is very high. Mitigating this problem requires a proper sampling technique. Until now, oversampling techniques, which adjust the class distribution of a data set by sampling the minority class with replacement, have been widely used. However, they carry a risk of overfitting. Against this background, this study proposes applying the ROSE (Random Over Sampling Examples) technique, introduced by Menardi and Torelli in 2014, for effective corporate bankruptcy prediction. ROSE creates new training samples by synthesizing them from the existing training samples, so it improves the prediction accuracy of classifiers while avoiding the risk of overfitting. Specifically, our study proposes combining ROSE with the support vector machine (SVM), which is often regarded as one of the best binary classifiers. We applied the proposed method to a real-world bankruptcy prediction case from a major Korean bank and compared its performance with that of other sampling techniques. Experimental results showed that ROSE improved the prediction accuracy of SVM in bankruptcy prediction compared with the other techniques, with statistical significance. These results suggest that ROSE can be a good alternative for resolving data imbalance in prediction problems in social science areas beyond bankruptcy prediction.
https://doi.org/10.5392/JKCA.2018.18.08.525
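The contrast this abstract draws between sampling the minority class with replacement and synthesizing new samples can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy minority rows, and the Silverman-style bandwidth factor are assumptions made for illustration, and the second function only mimics the smoothed-bootstrap idea behind ROSE.

```python
import random
import statistics


def random_oversample(minority, n_new, rng):
    """Plain oversampling: copy existing minority rows, drawn with replacement."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def rose_like_oversample(minority, n_new, rng, shrink=1.0):
    """Smoothed bootstrap: pick a minority row, then draw a new point from a
    Gaussian centred on it, so synthetic rows are not exact duplicates."""
    n = len(minority)
    d = len(minority[0])
    # Per-feature spread of the minority class.
    sds = [statistics.pstdev([row[j] for row in minority]) for j in range(d)]
    # Silverman-style bandwidth factor (an assumption of this sketch).
    h = shrink * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
    synthetic = []
    for _ in range(n_new):
        seed_row = rng.choice(minority)
        synthetic.append([rng.gauss(seed_row[j], h * sds[j]) for j in range(d)])
    return synthetic


# Hypothetical minority-class rows (two features each), for illustration only.
rng = random.Random(0)
minority = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [4.0, 15.0]]
duplicates = random_oversample(minority, 6, rng)
synthetic = rose_like_oversample(minority, 6, rng)
```

Every row in `duplicates` is a copy of an original row, which is where the overfitting risk comes from, while the rows in `synthetic` are scattered around the originals.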

A Method of Bank Telemarketing Customer Prediction based on Hybrid Sampling and Stacked Deep Networks (혼성 표본 추출과 적층 딥 네트워크에 기반한 은행 텔레마케팅 고객 예측 방법)

Lee, Hyunjin
- Journal of Korea Society of Digital Industry and Information Management / v.15 no.3 / pp.197-206 / 2019
Telemarketing has been increasingly used in finance as offline channels have been reduced. To select telemarketing target customers, various machine learning techniques have emerged that aim to maximize the effect at minimum cost. However, two problems remain: class imbalance, in which customers for whom marketing succeeded are far fewer than those for whom it failed, and a recall rate that is lower than the accuracy. In this paper, we propose a method that solves the class imbalance problem and increases the recall rate to improve efficiency. A hybrid sampling method is applied to balance the classes in the data, and a stacked deep network is applied to improve recall and precision as well as accuracy. The proposed method is applied to actual bank telemarketing data. In the comparison experiment, accuracy, recall, and precision were all higher than those of conventional methods.
https://doi.org/10.17662/ksdim.2019.15.3.197
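Why recall can lag behind accuracy on imbalanced data, as this abstract notes, is easy to see with a small worked example in Python. The counts below are hypothetical and are not taken from the paper; `binary_metrics` is an illustrative helper, not part of any library.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, recall, precision) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, recall, precision


# Hypothetical campaign: 100 customers, only 10 of whom respond (class 1).
y_true = [1] * 10 + [0] * 90
# A classifier that finds 2 of the 10 responders and raises 1 false alarm.
y_pred = [1] * 2 + [0] * 8 + [1] * 1 + [0] * 89

accuracy, recall, precision = binary_metrics(y_true, y_pred)
```

Here the classifier is right on 91 of 100 customers yet finds only 2 of the 10 responders, so a high accuracy hides a low recall.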

Anomaly Detection Model Based on Semi-Supervised Learning Using LIME: Focusing on Semiconductor Process (LIME을 활용한 준지도 학습 기반 이상 탐지 모델: 반도체 공정을 중심으로)

Kang-Min An;Ju-Eun Shin;Dong Hyun Baek
- Journal of Korean Society of Industrial and Systems Engineering / v.45 no.4 / pp.86-98 / 2022
Recently, many studies have sought to improve quality by applying machine learning models to semiconductor manufacturing process data. However, in the semiconductor manufacturing process the ratio of good products is much higher than that of defective products, so data imbalance is a serious problem for machine learning. In addition, because the data used in machine learning have a very large number of features, it is important to extract only the important features in order to increase accuracy and usability. This study proposes an anomaly detection methodology that learns effectively despite the data imbalance and high dimensionality of semiconductor process data. The methodology first applies the SMOTE method and the RFECV method and then applies the LIME algorithm. It analyzes the classification result of the anomaly classification model, detects the cause of the anomaly, and identifies the semiconductor process that requires action. The applicability and feasibility of the proposed methodology were confirmed through case applications.
https://doi.org/10.11627/jksie.2022.45.4.086

Classification Abnormal temperatures based on Meteorological Environment using Random forests (랜덤포레스트를 이용한 기상 환경에 따른 이상기온 분류)

Youn Su Kim;Kwang Yoon Song;In Hong Chang
- Journal of Integrative Natural Science / v.17 no.1 / pp.1-12 / 2024
Many abnormal climate events are occurring around the world. The cause of abnormal climate is related to temperature. Factors that affect temperature include excessive emissions of carbon and greenhouse gases from a global perspective, and air circulation from a local perspective. Due to the air circulation, many abnormal climate phenomena such as abnormally high temperature and abnormally low temperature are occurring in certain areas, which can cause very serious human damage. Therefore, the problem of abnormal temperature should not be approached only as a case of climate change, but should be studied as a new category of climate crisis. In this study, we proposed a model for the classification of abnormal temperature using random forests based on various meteorological data such as longitudinal observations, yellow dust, ultraviolet radiation from 2018 to 2022 for each region in Korea. Here, the meteorological data had an imbalance problem, so the imbalance problem was solved by oversampling. As a result, we found that the variables affecting abnormal temperature are different in different regions. In particular, the central and southern regions are influenced by high pressure (Mainland China, Siberian high pressure, and North Pacific high pressure) due to their regional characteristics, so pressure-related variables had a significant impact on the classification of abnormal temperature. This suggests that a regional approach can be taken to predict abnormal temperatures from the surrounding meteorological environment. In addition, in the event of an abnormal temperature, it seems that it is possible to take preventive measures in advance according to regional characteristics.
https://doi.org/10.13160/ricns.2024.17.1.1

The Development of Biodegradable Fiber Tensile Tenacity and Elongation Prediction Model Considering Data Imbalance and Measurement Error (데이터 불균형과 측정 오차를 고려한 생분해성 섬유 인장 강신도 예측 모델 개발)

Se-Chan, Park;Deok-Yeop, Kim;Kang-Bok, Seo;Woo-Jin, Lee
- KIPS Transactions on Software and Data Engineering / v.11 no.12 / pp.489-498 / 2022
Recently, the labor-intensive textile industry has been attempting to reduce process costs and optimize quality through artificial intelligence. However, data collection in the fiber spinning process is costly and there is no systematic data collection and processing system, so the amount of accumulated data is small. In addition, data imbalance arises because data are collected preferentially for changes in specific variables according to the purpose of fiber spinning, and measurements differ even between samples collected under the same spinning conditions because the environments in which physical properties are measured differ. If these data characteristics are not taken into account when building AI models, problems such as overfitting and performance degradation may occur. Therefore, in this paper we propose an outlier handling technique and a data augmentation technique that consider the characteristics of spinning process data. A comparison with existing outlier handling and data augmentation techniques shows that the proposed techniques are more suitable for spinning process data. In addition, applying various models to the original data and to the data processed with the proposed method shows that the tensile tenacity and elongation prediction models perform better with the proposed methods than without them.
https://doi.org/10.3745/KTSDE.2022.11.12.489

Detecting Malicious Social Robots with Generative Adversarial Networks

Wu, Bin;Liu, Le;Dai, Zhengge;Wang, Xiujuan;Zheng, Kangfeng
- KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.11 / pp.5594-5615 / 2019
Malicious social robots, which are disseminators of malicious information on social networks, seriously affect information security and network environments. The detection of malicious social robots is a hot topic and a significant concern for researchers. A method based on classification has been widely used for social robot detection. However, this method of classification is limited by an unbalanced data set in which legitimate, negative samples outnumber malicious robots (positive samples), which leads to unsatisfactory detection results. This paper proposes the use of generative adversarial networks (GANs) to extend the unbalanced data sets before training classifiers to improve the detection of social robots. Five popular oversampling algorithms were compared in the experiments, and the effects of imbalance degree and the expansion ratio of the original data on oversampling were studied. The experimental results showed that the proposed method achieved better detection performance compared with other algorithms in terms of the F1 measure. The GAN method also performed well when the imbalance degree was smaller than 15%.
https://doi.org/10.3837/tiis.2019.11.018

Effects of Iyengar Yoga Practice for 12 weeks on Lower Body Imbalance in Middle-aged Women (중년여성의 12주간 아헹가 요가 수련이 하체 불균형에 미치는 영향)

Park, Yunha;Kim, Donghee
- Journal of the Korea Academia-Industrial cooperation Society / v.18 no.1 / pp.431-440 / 2017
The purpose of this study was to investigate the effects of Iyengar yoga practice on lower body imbalance in middle-aged women. The subjects (n=24), who had not performed yoga training prior to this study and were not attending any other training programs, participated after undergoing an X-ray examination with the Gonstead Technique, and their lower body imbalance was then reevaluated. The subjects completed the yoga program for 12 weeks (3 times per week, 90 minutes per session). The data were analyzed with the paired t-test, and alpha was set at 0.05. It was found that 1) the height difference between the right and left iliac crests (p < 0.001), the width (p < 0.001) and length (p < 0.001) differences between the right and left iliac fossa, and the width difference between the right and left sacrum (p < 0.001) were significantly reduced after the training program. In addition, 2) the lower limb length discrepancy was significantly reduced (p < 0.001). Our data suggest that Iyengar yoga training for 12 weeks reduces pelvic imbalance and the length difference between the right and left lower limbs in middle-aged women.
https://doi.org/10.5762/KAIS.2017.18.1.431

The Detection of Online Manipulated Reviews Using Machine Learning and GPT-3 (기계학습과 GPT3를 시용한 조작된 리뷰의 탐지)

Chernyaeva, Olga;Hong, Taeho
- Journal of Intelligence and Information Systems / v.28 no.4 / pp.347-364 / 2022
Fraudulent companies or sellers strategically manipulate reviews to influence customers' purchase decisions; therefore, the reliability of reviews has become crucial for customer decision-making. Since customers increasingly rely on online reviews to find more detailed information about products or services before purchasing, many researchers focus on detecting manipulated reviews. However, the main problem in detecting manipulated reviews is the difficulty of obtaining enough manipulated reviews to train machine learning techniques. In addition, manipulated reviews are far fewer than non-manipulated reviews, so a class imbalance problem occurs. The class with fewer examples is under-represented, which can hamper a model's accuracy, so solving the class imbalance problem is important for building an accurate model for detecting manipulated reviews. Thus, we propose an OpenAI-based review generation model to solve the imbalance in manipulated reviews, thereby enhancing the accuracy of manipulated review detection. In this research, we applied the autoregressive language model GPT-3 to generate reviews based on manipulated reviews. Moreover, we found that applying the GPT-3 model to oversample manipulated reviews can recover a satisfactory portion of the performance losses and yields better classification performance (logit, decision tree, neural networks) than traditional oversampling methods such as random oversampling and SMOTE.
https://doi.org/10.13088/jiis.2022.28.4.347

Effective Gait Imbalance Judgment Method based on Thigh Location (대퇴부 위치 기반 효과적인 보행 불균형 측정 방법)

Kim, Seojun;Kim, Yoohyun;Shim, Hyeonmin;Lee, Sangmin
- The Transactions of The Korean Institute of Electrical Engineers / v.63 no.4 / pp.541-545 / 2014
In this paper, the thigh angles that appear during normal walking are used to estimate the balance between the left and right legs. To overcome the limitations of gait analysis based on image processing or foot pressure, which has been widely used in previous work, the angle of the thigh was used to estimate asymmetric gait. Five healthy adult males served as test subjects, and gait cycle data were obtained 10 times. For this research, a thigh-angle measurement device was developed and attached at a position of $20^{\circ}$ for flexion and $15^{\circ}$ for extension to measure the angle of the thigh. To verify the reliability of asymmetric gait estimation using the thigh angle, it was compared with the result of asymmetric gait estimation using foot pressure. The results show that using the thigh angle is, on average, 16.84% more accurate than using foot pressure in determining gait imbalance.
https://doi.org/10.5370/KIEE.2014.63.4.541

The associations between dietary behavior and subjective measurements of serious dental diseases in nursing home staff (일부 병원종사자의 식행동과 주관적 중대 구강병과의 연관성)

Shim, Youn-Soo;An, So-Youn;Park, So-Young
- Journal of Korean society of Dental Hygiene / v.13 no.3 / pp.377-385 / 2013
Objectives: The objective of this study is to determine the associations between dietary behavior and subjective measurements of dental caries and periodontal disease in a cohort of nursing home staff. Methods: A self-reported survey was carried out among 280 nursing home staff in Jeollabukdo Province, Korea. The collected data were analyzed using the SPSS Version 19.0 program. Multiple regression analysis was conducted to examine the effects of dietary behavior and food intake on subjective measurements of the two serious dental diseases. Results: Irregular meals tended to increase dietary imbalance and periodontal disease in the nursing staff. For example, they influenced the imbalance of sugar, vegetable, and seafood intake. Conclusions: It is important to eat regular meals because irregular eating behavior tended to increase dietary imbalance and periodontal disease in the nursing staff.
https://doi.org/10.13065/jksdh.2013.13.3.377
