• Title/Summary/Keyword: method validations

Search Result 76, Processing Time 0.02 seconds

Optimization of Uneven Margin SVM to Solve Class Imbalance in Bankruptcy Prediction (비대칭 마진 SVM 최적화 모델을 이용한 기업부실 예측모형의 범주 불균형 문제 해결)

  • Sung Yim Jo;Myoung Jong Kim
    • Information Systems Review / v.24 no.4 / pp.23-40 / 2022
  • Although the support vector machine (SVM) has been used in various fields, including bankruptcy prediction, the hyperplane it learns under class imbalance can be severely skewed toward the minority class. This degrades performance because the region of the majority class expands while the region of the minority class is invaded. This study proposed an optimized uneven margin SVM (OPT-UMSVM) that combines a threshold moving or post-scaling method with UMSVM to cope with the limitation of the traditional even margin SVM (EMSVM) under class imbalance. OPT-UMSVM readjusted the skewed hyperplane toward the majority class and generalized better than EMSVM, improving the sensitivity for the minority class and computing the optimized performance. To validate OPT-UMSVM, 10-fold cross-validation was performed on five sub-datasets with different imbalance ratios. The empirical results showed two main findings. First, UMSVM only weakly improved on EMSVM in balanced datasets, but it greatly outperformed EMSVM in severely imbalanced datasets. Second, compared with EMSVM and conventional UMSVM, OPT-UMSVM performed better in both balanced and imbalanced datasets, and the performance difference was especially significant in severely imbalanced datasets.
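The threshold moving idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' OPT-UMSVM: the decision scores, the labels, and the choice of the geometric mean of sensitivity and specificity as the selection criterion are all assumptions made for illustration. The point is only that shifting the cut-off on a classifier's decision scores away from 0 can recover minority-class sensitivity.

```python
import math

def g_mean(labels, scores, threshold):
    """Geometric mean of sensitivity and specificity at a given cut-off."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

def best_threshold(labels, scores):
    """Try every observed score as a cut-off and keep the best G-mean."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: g_mean(labels, scores, t))

# Hypothetical SVM decision scores: 1 = bankrupt (minority), 0 = healthy.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [-2.1, -1.8, -1.5, -1.2, -1.0, -0.9, -0.7, -0.2, -0.4, 0.3]

default_gm = g_mean(labels, scores, 0.0)   # conventional cut-off at 0
moved = best_threshold(labels, scores)     # cut-off chosen from the data
moved_gm = g_mean(labels, scores, moved)
print(f"default G-mean={default_gm:.3f}, moved threshold={moved}, G-mean={moved_gm:.3f}")
```

In practice the threshold would be chosen on a validation split rather than on the data used to report performance.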

A Study on Phthalate Analysis of Nail Related Products (네일 관련 제품들의 프탈레이트 분석에 관한 연구)

  • Rark, Sin-Hee;Song, Seo-Hyeon;Kim, Hyun-Joo;Cho, Youn-Sik;Kim, Ae-Ran;Kim, Beom-Ho;Hong, Mi-Yeun;Park, Sang-Hyun;Yoon, Mi-Hye
    • Journal of the Society of Cosmetic Scientists of Korea / v.45 no.3 / pp.217-224 / 2019
  • Phthalates are endocrine-disrupting chemicals that are structurally similar to sex hormones and mainly exhibit reproductive and developmental toxicity. In this study, we analyzed 11 phthalates, including 3 phthalates prohibited in cosmetics and 8 phthalates regulated under the 'Common standards for children's products safety' and the EU cosmetic regulation (EC No. 1223/2009). The phthalate analysis was optimized using GC-MS/MS. In the analytical method validation, the method met the criteria for specificity, linearity, recovery rate, accuracy, and MQL. We therefore used it to analyze 82 nail cosmetic and nail polish products. Six phthalates (DBP, BBP, DEHP, DPP, DIBP, and DIDP) were detected at concentrations of $1.0{\sim}59.8{\mu}g/g$, but these levels complied with Korean cosmetic standards. DIBP and DBP were detected at $1.1{\sim}2.6{\mu}g/g$ in artificial nails, DBP and DEHP at $1.4{\sim}2.5{\mu}g/g$ in nail glue, and DIBP, DBP, and DEHP at $2.5{\sim}33.3{\mu}g/g$ in nail stickers. Although substances such as DBP and DEHP were detected in artificial nails, nail glue, and nail stickers, the levels complied with the 'Common safety standards for children's products'. DIBP is not a regulated substance in Korea, but it showed the third highest detection rate, following DBP (84.6%) and DEHP (63.4%). The phthalate concentrations detected in nail products are considered safe under current standards, but continuous monitoring of and research on non-regulated substances are also needed.

Monitoring and Risk Assessment of Pesticide Residues for Circulated Agricultural Commodities in Korea-2013 (국내 유통 농산물의 잔류농약 모니터링 및 위해평가-2013년)

  • Kim, Jae-Young;Lee, Sang-Mok;Lee, Han-Jin;Chang, Moon-Ik;Kang, Nam-Sook;Kim, Nam-Sun;Kim, Heejung;Cho, Yoon-Jae;Jeong, Jiyoon;Kim, Mee Kyung;Rhee, Gyu-Seek
    • Journal of Applied Biological Chemistry / v.57 no.3 / pp.235-242 / 2014
  • The purpose of this study is to establish scientific processes for making food safety policies. To this end, we investigated pesticide residue levels in agricultural commodities on the market and performed a risk assessment. Fifteen agricultural items were chosen based on how frequently they are consumed in Korea. Samples were collected from 9 cities with populations of more than one million. A total of 283 active ingredients were monitored in 232 samples: three possible growth regulators by single-residue analysis and 280 pesticides by multicomponent analysis. Before monitoring, the analytical methods were improved through method validations under the CODEX analytical method development guidelines, so that they can produce metrics that meet the international standards applied under those guidelines. For the residual pesticides detected during monitoring, we compared the estimated daily intake (EDI) with the acceptable daily intake (ADI), using the detection results and dietary consumption data extracted from the annual market basket survey. No residues were detected in 163 of the 232 samples, meaning that 70.3% of the samples were free of detectable pesticide residues. Residual pesticides were detected in 69 cases (29.7%). Two samples (0.9%) violated the Korean MRL. The ratio of EDI to ADI ranged from only 0.00087% to 0.902%. These results suggest that all detected pesticide residues are at very safe levels and that current Korean pesticide control policies may be working.
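The EDI-to-ADI comparison described in this abstract can be sketched as follows. The formula used here (residue concentration times daily food intake, divided by body weight) is a common way to estimate dietary exposure and is an assumption, since the abstract does not give the authors' exact calculation. The residue level, intake, body weight, and ADI are hypothetical values. The last lines only recompute the detection percentages from the sample counts quoted in the abstract.

```python
def estimated_daily_intake(residue_mg_per_kg, intake_g_per_day, body_weight_kg):
    """EDI in mg per kg body weight per day (assumed formula)."""
    intake_kg_per_day = intake_g_per_day / 1000.0
    return residue_mg_per_kg * intake_kg_per_day / body_weight_kg

def percent_of_adi(edi, adi):
    """Express the EDI as a percentage of the ADI (same units for both)."""
    return 100.0 * edi / adi

# Hypothetical inputs: 0.05 mg/kg residue, 20 g/day intake, 55 kg body
# weight, and an ADI of 0.01 mg/kg bw/day.
edi = estimated_daily_intake(0.05, 20.0, 55.0)
ratio = percent_of_adi(edi, 0.01)
print(f"EDI = {edi:.2e} mg/kg bw/day, {ratio:.3f}% of ADI")

# Percentages recomputed from the counts reported in the abstract.
total_samples = 232
not_detected_pct = round(100 * 163 / total_samples, 1)
detected_pct = round(100 * 69 / total_samples, 1)
violation_pct = round(100 * 2 / total_samples, 1)
print(not_detected_pct, detected_pct, violation_pct)
```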

Wind Corridor Analysis and Climate Evaluation with Biotop Map and Airborne LiDAR Data (비오톱 지도와 항공라이다 자료를 이용한 바람통로 분석 및 기후평가)

  • Kim, Yeon-Mee;An, Seung-Man;Moon, Soo-Young;Kim, Hyeon-Soo;Jang, Dae-Hee
    • Journal of the Korean Institute of Landscape Architecture / v.40 no.6 / pp.148-160 / 2012
  • The main purpose of this paper is to present a GIS-based climate analysis and evaluation method that uses airborne LiDAR data and a Biotop type map, and to provide spatial climate analysis and evaluation information based on the Biotop type map. In the first stage, the area, slope, slope length, surface, wind corridor function and width, and obstacle factors were analyzed to evaluate cold/fresh air production and wind corridors. In the second stage, the climate evaluation was derived from those two results. Airborne LiDAR data proved useful for the wind corridor analysis. The correlation analysis shows that ColdAir_GRD was highly correlated with Surface_GRD (-0.967461139), and that WindCorridor_GRD was highly correlated with Function_GRD (-0.883883476) and Obstacle_GRD (-0.834057656). The Climate Evaluation GRID was more highly correlated with WindCorridor_GRD (0.927554516) than with ColdAir_GRD (0.855051646). The climate analysis and evaluation results were validated visually against aerial ortho-photo images, which showed that the climate evaluation results corresponded well with in-situ conditions. Finally, we applied the climate analysis and evaluation using the Biotop map and airborne LiDAR data to Gwangmyung-Shiheung City, a candidate for the Bogeumjari Housing District. The results show that the areal percentage of the 1st Grade is 18.5%, the 2nd Grade 18.2%, the 3rd Grade 30.7%, the 4th Grade 25.2%, and the 5th Grade 7.4%. This study process provided both spatial analysis and evaluation of climate information and statistics for each Biotop type.
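The correlation figures quoted in this abstract are coefficients between grade layers. As a rough illustration, the sketch below computes a Pearson correlation between two lists of grades. The abstract does not state which coefficient the authors used, so Pearson is an assumption, and the grade values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical grades (1 to 5) for seven Biotop polygons.
surface_grd = [1, 2, 3, 4, 5, 2, 4]
cold_air_grd = [5, 4, 3, 2, 1, 5, 2]

r = pearson(surface_grd, cold_air_grd)
print(f"r = {r:.4f}")  # strongly negative for these made-up grades
```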

Introduction of GOCI-II Atmospheric Correction Algorithm and Its Initial Validations (GOCI-II 대기보정 알고리즘의 소개 및 초기단계 검증 결과)

  • Ahn, Jae-Hyun;Kim, Kwang-Seok;Lee, Eun-Kyung;Bae, Su-Jung;Lee, Kyeong-Sang;Moon, Jeong-Eon;Han, Tai-Hyun;Park, Young-Je
    • Korean Journal of Remote Sensing / v.37 no.5_2 / pp.1259-1268 / 2021
  • The 2nd Geostationary Ocean Color Imager (GOCI-II) is the successor to the Geostationary Ocean Color Imager (GOCI). It employs one near-ultraviolet wavelength (380 nm), eight visible wavelengths (412, 443, 490, 510, 555, 620, 660, 680 nm), and three near-infrared wavelengths (709, 745, 865 nm) to observe the marine environment in Northeast Asia, including the Korean Peninsula. However, the multispectral radiance image observed at satellite altitude includes both the water-leaving radiance and the atmospheric path radiance. Therefore, an atmospheric correction process that estimates the water-leaving radiance without the path radiance is essential for analyzing the ocean environment. This manuscript describes the GOCI-II standard atmospheric correction algorithm and its initial-phase validation. The GOCI-II atmospheric correction method is theoretically based on the previous GOCI atmospheric correction and is partially improved for turbid water using GOCI-II's two additional bands, i.e., 620 and 709 nm. The match-up showed an acceptable result, with mean absolute percentage errors within 5% in the blue bands. Part of the deviation over case-II waters is thought to arise from a lack of near-infrared vicarious calibration. We expect the GOCI-II atmospheric correction algorithm to be improved and regularly updated in the GOCI-II data processing system through continuous calibration and validation activities.
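The match-up metric named in this abstract, mean absolute percentage error (MAPE), can be computed as in the short sketch below. The paired values are hypothetical and stand in for in-situ and satellite-derived measurements of one band. They are not taken from the paper.

```python
def mape(reference, estimate):
    """Mean absolute percentage error of estimates against reference values."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    total = sum(abs(e - r) / abs(r) for r, e in zip(reference, estimate))
    return 100.0 * total / len(reference)

# Hypothetical match-up pairs for a single blue band (arbitrary units).
in_situ = [0.0100, 0.0080, 0.0120, 0.0050]
satellite = [0.0104, 0.0078, 0.0117, 0.0052]

error = mape(in_situ, satellite)
print(f"MAPE = {error:.2f}%")
```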

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining investors' returns. As a result, predicting companies' credit ratings with statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rest on strict assumptions. These include linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. Those strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). In particular, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically and achieves high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, and thus overfitting is unlikely to occur with SVM. Furthermore, SVM does not require many training samples, since it builds prediction models using only some representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance. 
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not improve performance in multi-class classification problems as much as SVM does for binary-class classification. Second, approximation algorithms (e.g., decomposition methods and the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class problems, but they can deteriorate classification performance. Third, multi-class prediction problems suffer from the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often cause a default classifier to be built because of the skewed boundary, which reduces the classification accuracy of that classifier. SVM ensemble learning is one machine learning method for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than those that are correctly predicted. Thus, boosting attempts to produce new classifiers that are better able to predict the examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem. 
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can take into account the geometric mean-based accuracy and errors across multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross-validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we obtained results for the classifiers on each of the 30 experiments. In terms of arithmetic mean-based prediction accuracy, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of the classifiers over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
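Two ideas from this abstract lend themselves to a small sketch: the gap between arithmetic and geometric mean-based accuracy on imbalanced classes, and the three repetitions of 10-fold partitioning that yield 30 test sets. The abstract does not define geometric mean-based accuracy, so the geometric mean of per-class recalls is assumed here. The labels, predictions, sample count, and seeds are hypothetical. This is not an implementation of MGM-Boost itself.

```python
import math
import random

def per_class_recall(y_true, y_pred):
    """Recall of each class, in sorted class order."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return recalls

def arithmetic_accuracy(y_true, y_pred):
    """Share of all observations that are predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition)."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls) ** (1.0 / len(recalls))

def k_fold_indices(n, k, seed):
    """Shuffle indices 0..n-1 with a seed and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Hypothetical ratings: class A dominates, B and C are minority classes.
y_true = ["A"] * 6 + ["B"] * 2 + ["C"] * 2
y_pred = ["A"] * 6 + ["B", "A"] + ["C", "A"]

arith = arithmetic_accuracy(y_true, y_pred)
gmean = geometric_mean_accuracy(y_true, y_pred)
print(f"arithmetic={arith:.3f}, geometric={gmean:.3f}")

# Three repetitions of 10-fold partitioning give 30 test sets.
all_folds = []
for seed in (0, 1, 2):
    all_folds.extend(k_fold_indices(30, 10, seed))
print(len(all_folds))
```

The geometric mean drops sharply when any single class is predicted poorly, which is why it is lower than the arithmetic accuracy in this example.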