Search | Korea Science

Default Prediction for Real Estate Companies with Imbalanced Dataset

Dong, Yuan-Xiang;Xiao, Zhi;Xiao, Xue
- Journal of Information Processing Systems / v.10 no.2 / pp.314-333 / 2014
When analyzing default predictions in real estate companies, the number of non-defaulted cases always greatly exceeds the defaulted ones, which creates the two-class imbalance problem. This lowers the ability of prediction models to distinguish the default sample. To avoid this sample selection bias and to improve the prediction model, this paper applies a minority sample generation approach to create new minority samples. Logistic regression, support vector machine (SVM) classification, and neural network (NN) classification trained on the imbalanced dataset serve as benchmarks for the same single prediction models trained on a balanced dataset corrected by the minority sample generation approach. Instead of prediction-oriented tests and overall accuracy, the true positive rate (TPR), the true negative rate (TNR), G-mean, and F-score are used to measure the performance of default prediction models on the imbalanced dataset. In this paper, we describe an empirical experiment on a sample of 14 default and 315 non-default listed real estate companies in China and report that most single prediction models performed better with the balanced dataset than with the imbalanced one.
https://doi.org/10.3745/JIPS.04.0002
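
The reason the abstract sets overall accuracy aside can be illustrated with a small sketch. It assumes the default class is the positive class and that the F-score is the balanced F1 variant (the abstract does not say which). The function name and the "predict everything as non-default" classifier are hypothetical; only the class sizes (14 and 315) come from the abstract.

```python
import math

def imbalance_metrics(tp, fn, tn, fp):
    """Return TPR, TNR, G-mean, F-score (F1, assumed), and accuracy from confusion counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # recall on the default class
    tnr = tn / (tn + fp) if (tn + fp) else 0.0        # recall on the non-default class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    g_mean = math.sqrt(tpr * tnr)                     # geometric mean of the two recalls
    f_score = (2 * precision * tpr / (precision + tpr)) if (precision + tpr) else 0.0
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {"TPR": tpr, "TNR": tnr, "G-mean": g_mean,
            "F-score": f_score, "accuracy": accuracy}

# Hypothetical classifier that labels all 329 companies as non-default.
trivial = imbalance_metrics(tp=0, fn=14, tn=315, fp=0)
print(trivial)
```

Such a classifier reaches about 95.7% accuracy (315 of 329) while its TPR and G-mean are both zero, which is why the imbalance-aware measures are more informative for this kind of dataset.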

Smart composite repetitive-control design for nonlinear perturbation

ZY Chen;Ruei-Yuan Wang;Yahui Meng;Timothy Chen
- Steel and Composite Structures / v.51 no.5 / pp.473-485 / 2024
This paper proposes a composite form of fuzzy adaptive control plan based on a robust observer. The fuzzy 2D control gains are regulated by the parameters in the LMIs. Then, control and learning performance indices with weight matrices are constructed as the cost functions, which allows the trade-off between the two performance indices to be regulated by setting appropriate weight matrices. The design of the 2D control gains is equivalent to an LMI-constrained multi-objective optimization problem under dual performance indices. By using this proposed smart tracking design via a fuzzy nonlinear criterion, the data link can be further extended. To evaluate the performance of the controller, the proposed controller was compared with other control technologies. This ensures the execution of the control program used to track position and trajectory in the presence of great model uncertainty and external disturbances. The performance of monitoring and control is verified by quantitative analysis. The goals of this paper are directed towards access to adequate, safe and affordable housing and basic services, promotion of inclusive and sustainable urbanization and participation, implementation of sustainable and disaster-resilient buildings, and sustainable human settlement planning and management. These goals are believed to be achievable in the near future through the ongoing development of AI and control theory.
https://doi.org/10.12989/scs.2024.51.5.473

Machine Learning Based Structural Health Monitoring System using Classification and NCA (분류 알고리즘과 NCA를 활용한 기계학습 기반 구조건전성 모니터링 시스템)

Shin, Changkyo;Kwon, Hyunseok;Park, Yurim;Kim, Chun-Gon
- Journal of Advanced Navigation Technology / v.23 no.1 / pp.84-89 / 2019
This is a pilot study of a machine learning based structural health monitoring system using flight data of composite aircraft. In this study, the most suitable machine learning algorithm for structural health monitoring was selected, and a dimensionality reduction method was applied with a view to use on actual flight data. For these tasks, an impact test was conducted on a cantilever beam with added mass, which simulates damage in the aircraft wing structure, and a classification model for damage states (damage location and level) was trained. Through a vibration test of the cantilever beam with a fiber Bragg grating (FBG) sensor, data for the normal state and 12 damaged states were acquired, and the most suitable algorithm was selected by comparing algorithms such as tree, discriminant, support vector machine (SVM), kNN, and ensemble. In addition, dimensionality reduction, which is necessary to deal with high-dimensional flight data, was performed through neighborhood component analysis (NCA) feature selection. As a result, quadratic SVMs performed best, with 98.7% without NCA and 95.9% with NCA. It is also shown that the application of NCA improved prediction speed, training time, and model memory.
https://doi.org/10.12673/jant.2019.23.1.84

Multi-dimensional Analysis and Prediction Model for Tourist Satisfaction

Shrestha, Deepanjal;Wenan, Tan;Gaudel, Bijay;Rajkarnikar, Neesha;Jeong, Seung Ryul
- KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.2 / pp.480-502 / 2022
This work assesses the degree of satisfaction tourists receive as final recipients in a tourism destination, based on the fact that satisfied tourists can make a significant contribution to the growth and continuous improvement of a tourism business. The work considers Pokhara, the tourism capital of Nepal, as the study area. A stratified sampling methodology with open-ended survey questions is used as the primary source of data, with a sample size of 1019 covering both international and domestic tourists. The data collected through the survey is processed using a data mining tool to perform multi-dimensional analysis, discover information patterns, and visualize clusters. Further, the supervised machine learning algorithms kNN, Decision tree, Support vector machine, Random forest, Neural network, Naive Bayes, and Gradient boost are used to develop models for training and prediction on the survey data. To find the best model for prediction, different performance metrics are used to evaluate each model for performance, accuracy, and robustness. The best model is used in constructing a learning-enabled model for classifying tourists as satisfied, neutral, or unsatisfied visitors. This work is important for tourism business personnel, government agencies, and tourism stakeholders seeking information on tourist satisfaction and the factors that influence it. Though this work was carried out for Pokhara city of Nepal, the study is equally relevant to any other tourism destination of a similar nature.
https://doi.org/10.3837/tiis.2022.02.007

Comparison of Forest Carbon Stocks Estimation Methods Using Forest Type Map and Landsat TM Satellite Imagery (임상도와 Landsat TM 위성영상을 이용한 산림탄소저장량 추정 방법 비교 연구)

Kim, Kyoung-Min;Lee, Jung-Bin;Jung, Jaehoon
- Korean Journal of Remote Sensing / v.31 no.5 / pp.449-459 / 2015
The conventional National Forest Inventory (NFI)-based forest carbon stock estimation method is suitable for national-scale estimation, but not for regional-scale estimation due to the lack of NFI plots. In this study, for the purpose of regional-scale carbon stock estimation, we created grid-based forest carbon stock maps using spatial ancillary data and two types of up-scaling methods. Chungnam province was chosen as the study area, and the 5th NFI (2006-2009) data was collected for it. The first method (method 1) selects a forest type map as ancillary data and uses a regression model for forest carbon stock estimation, whereas the second method (method 2) uses satellite imagery and the k-Nearest Neighbor (k-NN) algorithm. Additionally, in order to consider uncertainty effects, the final AGB carbon stock maps were generated by performing 200 iterative processes with Monte Carlo simulation. As a result, compared to the NFI-based estimation (21,136,911 tonC), the total carbon stock was over-estimated by method 1 (22,948,151 tonC), but was under-estimated by method 2 (19,750,315 tonC). In the paired t-test with 186 independent data, the average carbon stock estimation by the NFI-based method was statistically different from method 2 (p<0.01), but was not different from method 1 (p>0.01). In particular, by means of Monte Carlo simulation, it was found that the smoothing effect of the k-NN algorithm and the mis-registration error between NFI plots and the satellite image can lead to large uncertainty in carbon stock estimation. Although method 1 was found suitable for carbon stock estimation of forest stands that feature heterogeneous trees in Korea, a satellite-based method is still in demand to provide periodic estimates of un-investigated, large forest areas. In these respects, future work will focus on the spatial and temporal extent of the study area and on robust carbon stock estimation with various satellite images and estimation methods.
https://doi.org/10.7780/kjrs.2015.31.5.9
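
A rough sketch of k-NN estimation, in the spirit of method 2, may help explain the smoothing effect the abstract mentions. The feature vectors, carbon values, and function name are hypothetical, and plain Euclidean distance with an unweighted mean is an assumption; the abstract does not specify the distance measure or the weighting.

```python
import math

def knn_estimate(target, references, k=3):
    """Estimate a value for `target` as the mean of its k nearest reference plots.

    `references` is a list of (feature_vector, observed_value) pairs,
    for example (spectral band values, plot-level carbon stock).
    """
    by_distance = sorted(references, key=lambda ref: math.dist(target, ref[0]))
    nearest = by_distance[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Hypothetical reference plots: (two spectral features, carbon stock).
plots = [((0.0, 0.0), 10.0), ((1.0, 0.0), 20.0), ((5.0, 5.0), 90.0)]

local = knn_estimate((0.1, 0.0), plots, k=2)     # uses only the two closest plots
smoothed = knn_estimate((0.1, 0.0), plots, k=3)  # also averages in the distant plot
print(local, smoothed)
```

Because each estimate is an average over neighbors, larger k pulls predictions toward the overall mean, so extreme values are under-represented. This is one plausible reading of the "smoothing effect" noted in the abstract.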

Fuzzy neural network controller of interconnected method for civil structures

Chen, Z.Y.;Meng, Yahui;Wang, Ruei-yuan;Chen, Timothy
- Advances in concrete construction / v.13 no.5 / pp.385-394 / 2022
Recently, an increasing number of cutting-edge studies have shown that designing a smart active control for real-time implementation requires many demanding criteria in the design process, including controller performance in reducing tracking errors and tolerance to external interference and to perturbations of the measured system. This article proposes an effective artificial-intelligence method using these rigorous criteria, which can be translated into general control plants for the management of civil engineering installations. To facilitate the calculation, an efficient solution process based on linear matrix inequalities (LMI) has been introduced to verify the relevance of the proposed method, and extensive simulations have been carried out for the numerical constructive model under seismic excitation of the active rigidity. Additionally, a fuzzy model of the neural network (NN) based system is developed using an interconnected method for a linear differential inclusion (LDI) representation determined for arbitrary dynamics. This expression is constructed with a nonlinear sector, which converts the nonlinear model into multiple linear models, together with a new condition sufficient to guarantee asymptotic stability via a Lyapunov function and linear matrix inequalities. In the control design, we incorporated an H-infinity optimized development algorithm and performance and stability analysis. Finally, a practical numerical example with simulations is given to show the results. The results report the RMS response with and without a tuned mass damper (TMD) of the benchmark building under external excitation, the El-Centro Earthquake, and also show the simulation using evolved bat algorithm LMI fuzzy controllers in terms of RMS acceleration and displacement of the building.
https://doi.org/10.12989/acc.2022.13.5.385

Optimization of Support Vector Machines for Financial Forecasting (재무예측을 위한 Support Vector Machine의 최적화)

Kim, Kyoung-Jae;Ahn, Hyun-Chul
- Journal of Intelligence and Information Systems / v.17 no.4 / pp.241-254 / 2011
Financial time-series forecasting is one of the most important issues because it is essential for the risk management of financial institutions. Therefore, researchers have tried to forecast financial time-series using various data mining techniques such as regression, artificial neural networks, decision trees, and k-nearest neighbor. Recently, support vector machines (SVMs) have been widely applied to this research area because they do not require huge training data and have a low possibility of overfitting. However, a user must determine several design factors by heuristics in order to use SVM. For example, the selection of an appropriate kernel function and its parameters and proper feature subset selection are major design factors of SVM. Other than these factors, the proper selection of an instance subset may also improve the forecasting performance of SVM by eliminating irrelevant and distorting training instances. Nonetheless, there have been few studies that have applied instance selection to SVM, especially in the domain of stock market prediction. Instance selection tries to choose proper instance subsets from the original training data. It may be considered a method of knowledge refinement, and it maintains the instance-base. This study proposes a novel instance selection algorithm for SVMs. The proposed technique uses a genetic algorithm (GA) to optimize the instance selection process and the parameters simultaneously. We call the model ISVM (SVM with Instance selection) in this study. Experiments on stock market data are implemented using ISVM. In this study, the GA searches for optimal or near-optimal values of kernel parameters and relevant instances for SVMs. This study needs two sets of parameters in the chromosomes of the GA setting: the codes for kernel parameters and for instance selection.
For the controlling parameters of the GA search, the population size is set at 50 organisms and the crossover rate is set at 0.7, while the mutation rate is 0.1. As the stopping condition, 50 generations are permitted. The application data used in this study consists of technical indicators and the direction of change in the daily Korea stock price index (KOSPI). The total number of samples is 2218 trading days. We separate the whole data into three subsets: training, test, and hold-out data sets. The number of data in each subset is 1056, 581, and 581, respectively. This study compares ISVM to several comparative models including logistic regression (logit), backpropagation neural networks (ANN), nearest neighbor (1-NN), conventional SVM (SVM), and SVM with optimized parameters (PSVM). In particular, PSVM uses kernel parameters optimized by the genetic algorithm. The experimental results show that ISVM outperforms 1-NN by 15.32%, ANN by 6.89%, Logit and SVM by 5.34%, and PSVM by 4.82% for the holdout data. For ISVM, only 556 of the 1056 original training data are used to produce the result. In addition, the two-sample test for proportions is used to examine whether ISVM significantly outperforms the other comparative models. The results indicate that ISVM outperforms ANN and 1-NN at the 1% statistical significance level. In addition, ISVM performs better than Logit, SVM and PSVM at the 5% statistical significance level.
https://doi.org/10.13088/jiis.2011.17.4.241
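
The two-sample test for proportions named in the abstract can be sketched as follows. This is a minimal illustration assuming the pooled-variance, one-sided z-test variant, which the abstract does not confirm. The hit ratios 0.65 and 0.55 are hypothetical placeholders; only the hold-out size of 581 comes from the abstract.

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """One-sided pooled z-test of H0: p1 == p2 against H1: p1 > p2."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical hold-out hit ratios for two models, each evaluated on 581 days.
z, p_value = two_proportion_z_test(0.65, 581, 0.55, 581)
print(f"z = {z:.3f}, one-sided p = {p_value:.5f}")
```

With these placeholder inputs the difference would be significant at the 1% level; the actual hit ratios of the compared models are not given in the abstract.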

Water Balance Projection Using Climate Change Scenarios in the Korean Peninsula (기후변화 시나리오를 활용한 미래 한반도 물수급 전망)

Kim, Cho-Rong;Kim, Young-Oh;Seo, Seung Beom;Choi, Su-Woong
- Journal of Korea Water Resources Association / v.46 no.8 / pp.807-819 / 2013
This study proposes a new methodology for future water balance projection considering climate change by assigning a weight to each scenario instead of inputting future streamflows based on GCMs into a water balance model directly. The K-nearest neighbor algorithm was employed to assign weights, and streamflow in the non-flood period (October to the following June) was selected as the criterion for assigning weights. GCM-driven precipitation was input to the TANK model to simulate future streamflow scenarios, and Quantile Mapping was applied to correct the bias between the GCM hindcast and historical data. Based on these bias-corrected streamflows, different weights were assigned to each streamflow scenario to calculate water shortage for the projection periods: 2020s (2010~2039), 2050s (2040~2069), and 2080s (2070~2099). When the proposed methodology is applied to project water shortage over the Korean Peninsula, average water shortage for the 2020s is projected to increase to 10~32% compared to the baseline period (1967~2003). In addition, as streamflow in the non-flood period gradually decreases toward the 2080s, average water shortage for the 2080s is projected to increase by up to 97% (516.5 million m³/yr) compared to the baseline. While existing research on climate change projects a radical increase in future water shortage, the results projected by the weighting method show a more conservative change. This study is significant for the applicability of water balance projection under climate change, since it keeps the existing framework of national water resources planning and thereby lessens confusion for decision-makers in the water sector.
https://doi.org/10.3741/JKWRA.2013.46.8.807
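
The bias-correction step can be illustrated with a small empirical quantile mapping sketch. The abstract does not say which quantile mapping variant was used, so the nearest-rank empirical form below is an assumption, and the function name and sample values are hypothetical.

```python
from bisect import bisect_right

def empirical_quantile_map(value, model_hindcast, observed):
    """Map a model value onto the observed distribution at the same quantile."""
    model_sorted = sorted(model_hindcast)
    obs_sorted = sorted(observed)
    # Number of hindcast values <= value, i.e. its rank in the model distribution.
    rank = bisect_right(model_sorted, value)
    # Nearest-rank index of the same quantile in the observations (integer ceiling).
    n_model, n_obs = len(model_sorted), len(obs_sorted)
    idx = (rank * n_obs + n_model - 1) // n_model - 1
    idx = min(max(idx, 0), n_obs - 1)  # clamp values outside the hindcast range
    return obs_sorted[idx]

# Hypothetical hindcast and historical samples with a systematic bias.
hindcast = [10, 20, 30, 40]
historical = [1, 2, 3, 4]
corrected = [empirical_quantile_map(v, hindcast, historical) for v in (5, 20, 100)]
print(corrected)
```

A model value at the median of the hindcast is replaced by the median of the historical record, which removes the systematic offset between the two distributions; values beyond the hindcast range are clamped to the observed extremes in this simplified form.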

A Study of the Feature Classification and the Predictive Model of Main Feed-Water Flow for Turbine Cycle (주급수 유량의 형상 분류 및 추정 모델에 대한 연구)

Yang, Hac Jin;Kim, Seong Kun;Choi, Kwang Hee
- Journal of Energy Engineering / v.23 no.4 / pp.263-271 / 2014
Corrective thermal performance analysis is required for thermal power plants to determine the performance status of the turbine cycle. We developed a classification method for main feed-water flow to make precise corrections for performance analysis based on the ASME (American Society of Mechanical Engineers) PTC (Performance Test Code). The classification is based on feature identification of the status of the main feed-water flow. We also developed predictive algorithms for corrected main feed-water flow using a Support Vector Machine (SVM) model for each classified feature area. The results were compared to estimations using a Neural Network (NN) and Kernel Regression (KR). The feature classification and predictive model of main feed-water flow provide more practical methods for corrective thermal performance analysis of the turbine cycle.
https://doi.org/10.5855/ENERGY.2014.23.4.263

Discriminant Metric Learning Approach for Face Verification

Chen, Ju-Chin;Wu, Pei-Hsun;Lien, Jenn-Jier James
- KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.2 / pp.742-762 / 2015
In this study, we propose a distance metric learning approach called discriminant metric learning (DML) for face verification, which addresses the binary-class problem of classifying whether or not two input images are of the same subject. The critical issue in solving this problem is determining the method used to measure the distance between two images. Among various methods, the large margin nearest neighbor (LMNN) method is a state-of-the-art algorithm. However, to compensate for LMNN's entangled data distribution caused by high levels of appearance variation in unconstrained environments, DML aims to penalize violations of the distance relationship for negative pairs, i.e., images with different labels, while being integrated with LMNN to model the distance relationship between positive pairs, i.e., images with the same label. The likelihoods of the input images, estimated using the DML and LMNN metrics, are then weighted and combined for further analysis. Additionally, rather than using the k-nearest neighbor (k-NN) classification mechanism, we propose a verification mechanism that measures the correlation of the class label distribution of neighbors to reduce the false negative rate of positive pairs. The experimental results show that DML can modify the relation of negative pairs in the original LMNN space and compensate for LMNN's performance on faces with large variances, such as pose and expression.
https://doi.org/10.3837/tiis.2015.02.015

