• Title/Summary/Keyword: model minority

Optimization of Uneven Margin SVM to Solve Class Imbalance in Bankruptcy Prediction (비대칭 마진 SVM 최적화 모델을 이용한 기업부실 예측모형의 범주 불균형 문제 해결)

  • Sung Yim Jo;Myoung Jong Kim
    • Information Systems Review
    • /
    • v.24 no.4
    • /
    • pp.23-40
    • /
    • 2022
  • Although the support vector machine (SVM) has been used in various fields, including bankruptcy prediction, the hyperplane learned by an SVM under class imbalance can be severely skewed toward the minority class. This degrades performance because the region of the majority class expands while the region of the minority class is invaded. This study proposed an optimized uneven margin SVM (OPT-UMSVM) that combines a threshold moving or post scaling method with UMSVM to cope with the limitation of the traditional even margin SVM (EMSVM) under class imbalance. OPT-UMSVM readjusted the skewed hyperplane toward the majority class and showed better generalization ability than EMSVM, improving the sensitivity of the minority class and yielding optimized performance. To validate OPT-UMSVM, 10-fold cross-validation was performed on five sub-datasets with different imbalance ratios. The empirical results showed two main findings. First, UMSVM had only a weak effect on improving the performance of EMSVM in balanced datasets, but it greatly outperformed EMSVM in severely imbalanced datasets. Second, compared with EMSVM and conventional UMSVM, OPT-UMSVM performed better in both balanced and imbalanced datasets, and the performance difference was especially significant in severely imbalanced datasets.
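
The threshold moving idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' OPT-UMSVM: the decision scores are hypothetical, and selecting the cut-off by the geometric mean of sensitivity and specificity is an assumption, since the abstract does not state which criterion was optimized.

```python
from math import sqrt

def g_mean(scores, labels, threshold):
    """Geometric mean of sensitivity and specificity at a given threshold.

    labels: 1 = minority class, 0 = majority class.
    A sample is predicted as minority when score >= threshold.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sqrt(sensitivity * specificity)

def best_threshold(scores, labels):
    """Try every observed score as a candidate threshold and keep the best."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: g_mean(scores, labels, t))

# Hypothetical SVM decision values (assumption, not data from the paper).
# The default cut at 0 misses two of the three minority samples.
scores = [-1.6, -1.2, -0.9, -0.7, -0.5, -0.4, -0.3, 0.4]
labels = [0, 0, 0, 0, 0, 1, 1, 1]
t = best_threshold(scores, labels)
```

On this toy data the selected threshold moves below zero, which recovers the minority samples that the default cut-off would misclassify.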

Method for Assessing Landslide Susceptibility Using SMOTE and Classification Algorithms (SMOTE와 분류 기법을 활용한 산사태 위험 지역 결정 방법)

  • Yoon, Hyung-Koo
    • Journal of the Korean Geotechnical Society
    • /
    • v.39 no.6
    • /
    • pp.5-12
    • /
    • 2023
  • Proactive assessment of landslide susceptibility is necessary for minimizing casualties. This study proposes a methodology for classifying the landslide safety factor using machine learning classification algorithms. The high-risk area model is adopted to perform the classification, and eight geotechnical parameters are used as inputs. Four classification algorithms (decision tree, k-nearest neighbor, logistic regression, and random forest) are employed to compare classification accuracy for safety factors ranging between 1.2 and 2.0. Accuracy is high in the safety factor range of 1.2-1.7 but relatively low in the range of 1.8-2.0. To overcome this issue, the synthetic minority over-sampling technique (SMOTE) is adopted to generate additional data. The application of SMOTE improves the average accuracy by ~250% in the safety factor range of 1.8-2.0. The results demonstrate that the SMOTE algorithm improves the accuracy of classification algorithms when applied to geotechnical data.
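
The core step of SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration using only the standard library, not the implementation used in the study; the function name `smote`, the parameter names, and the sample points are all hypothetical.

```python
import random
from math import dist

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical two-feature minority samples (assumption for illustration).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority samples, the new data stay inside the region already occupied by the minority class.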

A Study on the Improvement of Image Classification Performance in the Defense Field through Cost-Sensitive Learning of Imbalanced Data (불균형데이터의 비용민감학습을 통한 국방분야 이미지 분류 성능 향상에 관한 연구)

  • Jeong, Miae;Ma, Jungmok
    • Journal of the Korea Institute of Military Science and Technology
    • /
    • v.24 no.3
    • /
    • pp.281-292
    • /
    • 2021
  • With the development of deep learning technology, researchers and engineers keep attempting to apply deep learning in various industrial and academic fields, including defense. Most of these attempts assume that the data are balanced. In reality, much of the data is imbalanced, so the classifier is not properly built and the model's performance can be low. Therefore, this study proposes cost-sensitive learning as a solution to the imbalanced data problem of image classification in the defense field. In the proposed model, cost-sensitive learning assigns a higher weight to the minority class in the cost function. Across 160 experiments using a submarine/non-submarine dataset and a warship/non-warship dataset, the test F1-score is higher with cost-sensitive learning than with general learning. Furthermore, statistical tests show that the differences are significant.
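
A minimal sketch of a cost-sensitive loss is given below. The abstract only says that the minority class receives a higher weight in the cost function; the inverse-frequency weighting formula and the toy labels and probabilities used here are assumptions for illustration.

```python
from math import log

def class_weights(labels):
    """Weight each class inversely to its frequency: n / (n_classes * count)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, n_classes = len(labels), len(counts)
    return {c: n / (n_classes * cnt) for c, cnt in counts.items()}

def weighted_cross_entropy(probs, labels, weights):
    """Mean of -w[y] * log(p[y]) over samples; probs[i] maps class -> probability."""
    total = sum(-weights[y] * log(p[y]) for p, y in zip(probs, labels))
    return total / len(labels)

# Hypothetical data: 9 majority samples (class 0) and 1 minority sample (class 1).
labels = [0] * 9 + [1]
weights = class_weights(labels)
# A model that always favours the majority class.
probs = [{0: 0.9, 1: 0.1}] * 10

loss = weighted_cross_entropy(probs, labels, weights)
plain_loss = weighted_cross_entropy(probs, labels, {0: 1.0, 1: 1.0})
```

With these weights the single misclassified minority sample dominates the loss, so a model that ignores the minority class is penalized far more than under the unweighted loss.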

The Effect of Applying a Model of Reading Comprehension of Geometry Proof (기하 증명 읽기 이해 모델의 적용 효과)

  • Hwang, Chul-Ju;Lee, Ji-Youn;Kim, Sun-Hee
    • East Asian mathematical journal
    • /
    • v.25 no.3
    • /
    • pp.299-320
    • /
    • 2009
  • In mathematics, the teaching of geometry proof has played an important role in promoting logical thinking by developing deductive reasoning. Despite this importance, in the present state of geometry proof education in middle schools, most classes are still led mainly by teachers using a cramming style of education, and students in those classes have many difficulties in learning geometry proof. Accordingly, this thesis suggests a method that differs from previous approaches to proof education. The paper by Kai-Lin Yang and Fou-Lai Lin, 'A Model of Reading Comprehension of Geometry Proof (RCGP)', published in 2007, offers various practical examples based on the model. After designing classes based on those examples and teaching geometry proof, we identified a problem and then proposed a new teaching model that amends and supplements the original. However, the study is limited because the number of subjects was small and the classes were conducted one-to-one. We hope that proof education methods will be developed further through more active research in the near future.

Machine learning application to seismic site classification prediction model using Horizontal-to-Vertical Spectral Ratio (HVSR) of strong-ground motions

  • Francis G. Phi;Bumsu Cho;Jungeun Kim;Hyungik Cho;Yun Wook Choo;Dookie Kim;Inhi Kim
    • Geomechanics and Engineering
    • /
    • v.37 no.6
    • /
    • pp.539-554
    • /
    • 2024
  • This study explores the development of a prediction model for seismic site classification by integrating machine learning techniques with horizontal-to-vertical spectral ratio (HVSR) methodologies. To improve model accuracy, the research employs outlier detection methods and the synthetic minority over-sampling technique (SMOTE) for data balancing, and evaluates seven machine learning models on seismic data from KiK-net. Notably, the light gradient boosting method (LGBM), gradient boosting, and decision tree models exhibit improved performance when coupled with SMOTE, while the multiple linear regression (MLR) and support vector machine (SVM) models show reduced efficacy. Outlier detection techniques significantly enhance accuracy, particularly for LGBM, gradient boosting, and voting boosting. The combination of LGBM with the isolation forest and SMOTE achieves the highest accuracy of 0.91, while LGBM with the local outlier factor yields the highest F1-score of 0.79. Consistently outperforming other models, LGBM proves most efficient for seismic site classification when supported by appropriate preprocessing procedures. These findings show the significance of outlier detection and data balancing for precise seismic soil classification prediction and highlight the potential of machine learning in optimizing site classification accuracy.

Evaluation of Multi-classification Model Performance for Algal Bloom Prediction Using CatBoost (머신러닝 CatBoost 다중 분류 알고리즘을 이용한 조류 발생 예측 모형 성능 평가 연구)

  • Juneoh Kim;Jungsu Park
    • Journal of Korean Society on Water Environment
    • /
    • v.39 no.1
    • /
    • pp.1-8
    • /
    • 2023
  • Monitoring and prediction of water quality are essential for effective river pollution prevention and water quality management. In this study, a multi-classification model was developed to predict the chlorophyll-a (Chl-a) level in rivers. The model was developed using CatBoost, a novel ensemble machine learning algorithm, and hourly field monitoring data collected from January 1 to December 31, 2015. For model development, Chl-a was classified into class 1 (Chl-a≤10 ㎍/L), class 2 (10<Chl-a≤50 ㎍/L), and class 3 (Chl-a>50 ㎍/L), with 27,192, 11,031, and 511 training samples, respectively. The macro averages of precision, recall, and F1-score for the three classes were 0.58, 0.58, and 0.58, respectively, while the weighted averages were 0.89, 0.90, and 0.89, respectively. The model performed relatively poorly for class 3, where the number of observations was much smaller than in the other two classes. The imbalance among the three classes was addressed with the synthetic minority over-sampling technique (SMOTE) algorithm, which evened out the training data to 26,868 samples per class. After SMOTE application, the macro averages of precision, recall, and F1-score for the three classes were 0.58, 0.70, and 0.59, respectively, while the weighted averages were 0.88, 0.84, and 0.86.
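
The gap between macro and weighted averages reported in this abstract follows from how the two are computed, which the sketch below illustrates. The class sizes are the training counts quoted in the abstract, used here only as illustrative class proportions; the per-class recall values are hypothetical and are not results from the paper.

```python
def macro_average(per_class_scores):
    """Unweighted mean: every class counts equally."""
    return sum(per_class_scores) / len(per_class_scores)

def weighted_average(per_class_scores, supports):
    """Mean weighted by the number of samples in each class."""
    total = sum(s * n for s, n in zip(per_class_scores, supports))
    return total / sum(supports)

# Class sizes quoted in the abstract (class 1, class 2, class 3).
supports = [27192, 11031, 511]
# Hypothetical per-class recall values (assumption, NOT from the paper).
recalls = [0.95, 0.80, 0.10]

macro = macro_average(recalls)
weighted = weighted_average(recalls, supports)
```

Because class 3 holds only a small share of the samples, its poor score barely moves the weighted average but pulls the macro average down sharply, which is why the macro figures are the more informative ones for the minority class.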

The Optimum Frequency Response of GaAs/(Ga, Al) As DH-LED for Optical Communication (광통신용 GaAs/(Ga, Al)As DH-LED의 최적 주파수 응용에 대한 연구)

  • 오환술;김영권
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.21 no.3
    • /
    • pp.60-65
    • /
    • 1984
  • In this paper, we use a symmetrical GaAs/(Ga, Al)As DH-LED as a model for optimizing the frequency response, which is the most important design parameter of an LED for optical communication. Optimum design parameters are chosen by computer simulation to improve the performance factors of the DH-LED. The aim is a systematic consideration of the interrelation of physical parameters such as the impurity concentration of the active layer, the thickness of the active layer, minority carrier lifetime, space charge capacitance, and injected current density.

Carrier Lifetime and Abnormal Conduction Phenomena in Silicon Epitaxial Layer-substrate Junction (Epitaxial에 의한 Si epi층의 케리어 수명과 P-N접합의 이상전도현상)

  • 성영권;민남기;김승배
    • 전기의세계
    • /
    • v.26 no.5
    • /
    • pp.83-89
    • /
    • 1977
  • This paper described the minority carrier lifetime in a Si epitaxial layer, and also the voltage (V) versus current (I) characteristics of a high resistivity Si epitaxial layer-substrate junction. The measured lifetime in the Si epi-layer was much shorter than in bulk, and the temperature dependence of the lifetime was found to agree well with the Shockley-Read model of recombination, which applies to high resistivity n-type materials. The V-I curve showed an ohmic region ($I \propto V$), a sublinear region ($I \propto V^{1/2}$), a space charge limited current region ($I \propto V^{2}$), and finally a negative resistance region. We investigated these phenomena using the theory of the relaxation semiconductor.

BERT-based Classification Model Improvement through Minority Class Data Augmentation (소수 클래스 데이터 증강을 통한 BERT 기반의 유형 분류 모델 성능 개선)

  • Kim, Jeong-Woo;Jang, Kwangho;Lee, Yong Tae;Park, Won-joo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2020.11a
    • /
    • pp.810-813
    • /
    • 2020
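  • In natural language processing, deep learning-based classification models have shown groundbreaking performance. In particular, Google's BERT, released in 2018, achieves high performance on a variety of tasks. In this paper, we examine how well BERT performs on data with severe class imbalance and choose EDA as a way to address the problem and improve performance. To apply EDA appropriately to BERT, we implemented it in various ways and evaluated the resulting performance.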
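
The abstract does not expand "EDA". Assuming it refers to the Easy Data Augmentation family of token-level text operations, two of those operations (random swap and random deletion) can be sketched as below. The function names, the example sentence, and the parameter values are hypothetical, and this is not the authors' implementation.

```python
import random

def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen token positions, n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p, rng):
    """Drop each token with probability p, but never return an empty list."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(list(tokens))]

rng = random.Random(0)
# Hypothetical minority-class sentence (assumption for illustration).
sentence = "the report describes a minor traffic accident".split()
augmented = [
    random_swap(sentence, 1, rng),
    random_deletion(sentence, 0.2, rng),
]
```

Applying such operations only to minority-class sentences produces extra training examples for that class without collecting new data.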

Enhancing Malware Detection with TabNetClassifier: A SMOTE-based Approach

  • Rahimov Faridun;Eul Gyu Im
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2024.05a
    • /
    • pp.294-297
    • /
    • 2024
  • Malware detection has become increasingly critical with the proliferation of end devices, including smartphones, Internet of Things devices, and personal computers. To improve detection rates and efficiency, the research focus in malware detection has shifted towards machine learning and deep learning approaches. Machine learning techniques are employed to train models on extensive datasets and evaluate various features, while deep learning algorithms have been extensively utilized to achieve these objectives. In this research, we apply TabNet, a novel architecture designed for deep learning with tabular data, to enhance malware detection. Furthermore, the Synthetic Minority Over-Sampling Technique (SMOTE) is utilized in this work to counteract the challenges posed by imbalanced datasets in machine learning. SMOTE efficiently balances class distributions, thereby improving model performance and classification accuracy. Our study demonstrates that SMOTE can effectively neutralize class imbalance bias, resulting in more dependable and precise machine learning models.