• Title/Summary/Keyword: Genetic Algorithms(GA)

Domain Knowledge Incorporated Counterfactual Example-Based Explanation for Bankruptcy Prediction Model (부도예측모형에서 도메인 지식을 통합한 반사실적 예시 기반 설명력 증진 방법)

  • Cho, Soo Hyun;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.307-332
    • /
    • 2022
  • One of the most intensively studied topics in business applications is the bankruptcy prediction model, a representative classification problem related to loan lending, investment decision making, and the profitability of financial institutions. Many studies have demonstrated outstanding performance for bankruptcy prediction models using artificial intelligence techniques. However, since most machine learning algorithms are "black-box," AI has been identified as a prominent research topic for providing users with an explanation. Although there are many different approaches to explanation, this study focuses on explaining a bankruptcy prediction model using a counterfactual example. Users can obtain the desired output from the model by using a counterfactual-based explanation, which provides an alternative case. This study introduces a counterfactual generation technique based on a genetic algorithm (GA) that leverages both domain knowledge (i.e., causal feasibility) and feature importance from a black-box model along with other critical counterfactual variables, including proximity, distribution, and sparsity. The proposed method was evaluated quantitatively and qualitatively to measure its quality and validity.

Land Use Optimization using Genetic Algorithms - Focused on Yangpyeong-eup - (유전 알고리즘을 적용한 토지이용 최적화 배분 연구 - 양평군 양평읍 일대를 대상으로 -)

  • Park, Yoonsun;Lee, Dongkun;Yoon, Eunjoo;Mo, Yongwon;Leem, Jihun
    • Journal of Environmental Impact Assessment
    • /
    • v.26 no.1
    • /
    • pp.44-56
    • /
    • 2017
  • Sustainable development is important because the ultimate objective is efficient development combining the economic, social, and environmental aspects of urban conservation. Despite Korea's rapid urbanization and economic development, the distribution of resources is inefficient, and land use is no exception. Land use distribution is difficult, as it requires considering a variety of purposes, whose solutions lie in a multipurpose optimization process. In this study, Yangpyeong-eup, Yangpyeong, Gyeonggi-do, is selected, as the site has ecological balance, is well-preserved, and has the potential to support population increases. Further, we have used the genetic algorithm method, as it helps to evolve solutions for complex spatial problems such as planning and distribution of land use. This study modifies the way mutation is applied. With four goals and restrictions of area, spatial objectives, minimizing land use conversion, ecological conservation, maximizing economic profit, restricting area to a specific land use, and setting a fixed area, we developed an optimal planning map. No urban areas at the site needed preservation and the high urban area growth rate coincided with the optimization of purpose and maximization of economic profit. When the minimum point of the fitness score is taken as the convergence point, we found that optimization occurred at approximately 1500 generations. The results of this study can support planning at Yangpyeong-eup.

Prediction of Lung Cancer Based on Serum Biomarkers by Gene Expression Programming Methods

  • Yu, Zhuang;Chen, Xiao-Zheng;Cui, Lian-Hua;Si, Hong-Zong;Lu, Hai-Jiao;Liu, Shi-Hai
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.15 no.21
    • /
    • pp.9367-9373
    • /
    • 2014
  • In the diagnosis of lung cancer, rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important. Serum markers, including lactate dehydrogenase (LDH), C-reactive protein (CRP), carcino-embryonic antigen (CEA), neurone specific enolase (NSE) and Cyfra21-1, are reported to reflect lung cancer characteristics. In this study, classification of lung tumors was made based on biomarkers (measured in 120 NSCLC and 60 SCLC patients) by setting up optimal biomarker joint models with a powerful computerized tool, gene expression programming (GEP). GEP is a learning algorithm that combines the advantages of genetic programming (GP) and genetic algorithms (GA). It specifically focuses on relationships between variables in sets of data and then builds models to explain these relationships, and has been successfully used in formula finding and function mining. As a basis for defining a GEP environment for SCLC and NSCLC prediction, three explicit predictive models were constructed. CEA and NSE are frequently used lung cancer markers in clinical trials, and CRP, LDH and Cyfra21-1 have significant meaning in lung cancer; on the basis of CEA and NSE we set up three GEP models: GEP1 (CEA, NSE, Cyfra21-1), GEP2 (CEA, NSE, LDH), and GEP3 (CEA, NSE, CRP). The best classification result was obtained when CEA, NSE and Cyfra21-1 were combined: 128 of 135 subjects in the training set and 40 of 45 subjects in the test set were classified correctly, giving accuracy rates of 94.8% in the training set and 88.9% in the test set. With GEP2, the accuracy was significantly decreased by 1.5% and 6.6% in the training set and test set, and with GEP3 by 0.82% and 4.45%, respectively. Serum Cyfra21-1 is a useful and sensitive serum biomarker in discriminating between NSCLC and SCLC. GEP modeling is a promising and excellent tool in the diagnosis of lung cancer.
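
The accuracy rates quoted in this abstract follow directly from the reported counts. As a quick arithmetic check (the helper name `accuracy_pct` is only illustrative and does not come from the paper):

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Percentage of correctly classified subjects, rounded to one decimal."""
    return round(100.0 * correct / total, 1)

# Counts reported for the GEP1 model (CEA, NSE, Cyfra21-1).
train_acc = accuracy_pct(128, 135)
test_acc = accuracy_pct(40, 45)
print(train_acc, test_acc)
```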

A Method of Assigning Weight Values for Qualitative Attributes in CBR Cost Model (사례기반추론 코스트 모델의 정성변수 속성가중치 산정방법)

  • Lee, Hyun-Soo;Kim, Soo-Young;Park, Moon-Seo;Ji, Sae-Hyun;Seong, Ki-Hoon;Pyeon, Jae-Ho
    • Korean Journal of Construction Engineering and Management
    • /
    • v.12 no.1
    • /
    • pp.53-61
    • /
    • 2011
  • For construction projects, the importance of early cost estimates is highly recognized by the project team and sponsoring organization because early cost estimates are frequently a foundation of business decisions as well as a basis for identifying any changes as the project progresses from design to construction. However, it is difficult to accurately estimate construction cost in the early stage of a project due to various uncertainties in construction. To deal with these uncertainties, cost estimates should be made several times over the course of the project. In particular, early cost estimates are an essential process for successful project management. For accurate construction cost estimates, it is necessary to compare cost estimates with actual costs based on historical project data. In this context, case-based reasoning (CBR), which is the process of solving new problems based on the solutions of similar past problems, can be considered an effective method for cost estimating. To apply CBR, it is also necessary to define the attribute similarities and the attribute weights. However, no existing method is capable of determining attribute weights of qualitative variables. Consequently, this has been a well-known barrier to accurate early cost estimates. Using Genetic Algorithms (GA), this research suggests a method of determining the attribute weights of qualitative variables. The proposed methodology was validated on building project case studies.

A Layout Planning Optimization Model for Finishing Work (건축물 마감공사 자재 배치 최적화 모델)

  • Park, Moon-Seo;Yang, Young-Jun;Lee, Hyun-Soo;Han, Sang-Won;Ji, Sae-Hyun
    • Korean Journal of Construction Engineering and Management
    • /
    • v.12 no.1
    • /
    • pp.43-52
    • /
    • 2011
  • Unnecessary transportation of resources is one of the major causes that adversely affect construction site work productivity. Therefore, layout-related studies have been conducted with efforts to develop management technologies and techniques to minimize the resource transportation made at site level. However, although the necessity for floor-level layout planning studies has been increasing as buildings have become larger and floors have become more complicated, studies to optimize the transportation of materials inside buildings are currently not being actively conducted. Therefore, in this study, a model was developed using genetic algorithms (GA) that enables the optimization of the locations of finishing materials on the work floor. With the established model, the arrangement of diverse materials on complicated floors can be planned, and the optimized material layout plan derived from the model can minimize the total material transportation time spent by laborers during their working day. In addition, to calculate travel distances between work sites and materials realistically, the concept of actual travel distances was applied. To assess the applicability of the developed model and compare it with existing methodologies, the model was applied to actual high-rise residential complexes.

Design of Fuzzy PI Controllers for the Temperature Control of Soldering Systems (솔더링 시스템의 온도 제어를 위한 퍼지 PI 제어기 설계)

  • Oh, Kabsuk;Kang, Geuntaek
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.17 no.2
    • /
    • pp.325-333
    • /
    • 2016
  • This paper proposes controller design algorithms for a ceramic soldering iron temperature control system, and reports their effectiveness in a control experiment. Because the responses of the ceramic soldering iron temperature to the control input are non-linear and very slow, precise modeling and controller design are difficult. In this study, the temperature characteristics of a ceramic soldering iron are represented by TSK fuzzy models consisting of TSK fuzzy rules. In the fuzzy rules, the premise variable is the control input and the consequences are the transfer functions. The transfer functions in the fuzzy model were obtained from the step input responses. As the responses of the ceramic soldering iron temperature are very slow, it is difficult to obtain the complete step input responses. This paper proposes a genetic algorithm to obtain the transfer functions from incomplete step input responses, and shows its effectiveness in examples. This paper also reports a fuzzy controller design method based on the TSK fuzzy model, with examples. The proposed methods were applied to temperature control experiments on a ceramic soldering iron. The TSK fuzzy model consisted of 7 TSK fuzzy rules, and the consequences were PI controllers. The experimental results of the proposed fuzzy PI controller were superior to those of the linear controller and were as good as those of previous studies using a fuzzy PID controller.

Using GA based Input Selection Method for Artificial Neural Network Modeling Application to Bankruptcy Prediction (유전자 알고리즘을 활용한 인공신경망 모형 최적입력변수의 선정: 부도예측 모형을 중심으로)

  • 홍승현;신경식
    • Journal of Intelligence and Information Systems
    • /
    • v.9 no.1
    • /
    • pp.227-249
    • /
    • 2003
  • Prediction of corporate failure using past financial data is a well-documented topic. Early studies of bankruptcy prediction used statistical techniques such as multiple discriminant analysis, logit and probit. Recently, however, numerous studies have demonstrated that artificial intelligence such as neural networks can be an alternative methodology for classification problems to which traditional statistical methods have long been applied. In building a neural network model, the selection of independent and dependent variables should be approached with great care and should be treated as a model construction process. Irrespective of the efficiency of a learning procedure in terms of convergence, generalization and stability, the ultimate performance of the estimator will depend on the relevance of the selected input variables and the quality of the data used. Approaches developed in statistical methods such as correlation analysis and stepwise selection are often very useful. These methods, however, may not be the optimal ones for the development of a neural network model. In this paper, we propose a genetic algorithms approach to find optimal or near-optimal input variables for neural network modeling. The proposed approach is demonstrated by applications to bankruptcy prediction modeling. Our experimental results show that this approach increases the overall classification accuracy rate significantly.
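
The idea of searching for input variables with a genetic algorithm can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each chromosome is a boolean mask over the input variables, and, to keep the example self-contained, fitness is the validation accuracy of a simple nearest-centroid classifier rather than the neural network used in the paper. The function names, GA settings (tournament selection, uniform crossover, bit-flip mutation, elitism), and synthetic data are all hypothetical.

```python
import numpy as np

def fitness(mask, X_tr, y_tr, X_va, y_va):
    """Validation accuracy of a nearest-centroid classifier on the selected columns.
    Assumption: this stands in for the neural network used in the paper."""
    if not mask.any():
        return 0.0  # an empty variable set cannot classify anything
    cols = np.flatnonzero(mask)
    c0 = X_tr[y_tr == 0][:, cols].mean(axis=0)
    c1 = X_tr[y_tr == 1][:, cols].mean(axis=0)
    d0 = np.linalg.norm(X_va[:, cols] - c0, axis=1)
    d1 = np.linalg.norm(X_va[:, cols] - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return float((pred == y_va).mean())

def ga_select(X_tr, y_tr, X_va, y_va, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Return the best variable mask found and its validation accuracy."""
    rng = np.random.default_rng(seed)
    n = X_tr.shape[1]
    pop = rng.random((pop_size, n)) < 0.5  # random initial masks
    best_mask, best_fit = None, -1.0
    for _ in range(generations):
        fits = np.array([fitness(m, X_tr, y_tr, X_va, y_va) for m in pop])
        i = int(fits.argmax())
        if fits[i] > best_fit:
            best_fit, best_mask = float(fits[i]), pop[i].copy()
        # Binary tournament selection: keep the fitter of two random individuals.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(fits[a] >= fits[b], a, b)]
        # Uniform crossover between each parent and its neighbour.
        mates = np.roll(parents, 1, axis=0)
        take = rng.random((pop_size, n)) < 0.5
        children = np.where(take, parents, mates)
        # Bit-flip mutation.
        children ^= rng.random((pop_size, n)) < p_mut
        children[0] = best_mask  # elitism: carry the best mask forward
        pop = children
    return best_mask, best_fit

# Synthetic data (hypothetical): only the first two of ten variables are informative.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 10))
X[:, 0] += 2.0 * y
X[:, 1] -= 2.0 * y
mask, acc = ga_select(X[:300], y[:300], X[300:], y[300:])
print(mask.astype(int), acc)
```

Scoring each mask on a held-out split, rather than on the training data, is one simple way to keep the search from rewarding variable sets that merely overfit.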

Shape Optimum Design of Pultruded FRP Bridge Decks (인발성형된 FRP 바닥판의 형상 최적설계)

  • 조효남;최영민;김희성;김형열;이종순
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.17 no.3
    • /
    • pp.319-332
    • /
    • 2004
  • Due to its high strength-to-weight ratio and excellent durability, fiber reinforced polymer (FRP) is widely used in the construction industry. In this paper, a shape optimum design of FRP bridge decks having a pultruded cellular cross-section is presented. In the problem formulation, an objective function is selected to minimize the volume. The cross-sectional dimensions and material properties of the deck of FRP bridges are used as the design variables. On the other hand, deflection limits in the design code, material failure criteria, buckling load, minimum height, and stress are selected as the design constraints to enhance the structural performance of FRP decks. In order to treat the optimization process efficiently, the cross-sectional shape of bridge decks is assumed to be a tube shape. The optimization process utilizes an improved Genetic Algorithm incorporating an indexing technique. For the structural analysis using three-dimensional finite elements, a commercial package (ABAQUS) is used. Using a computer program coded for this study, an example problem is solved and the results are presented with a sensitivity analysis. The bridge has a deck width of 12.14m and is supported by five 40m long steel girders spaced at 2.5m. The bridge is designed to carry a standard DB-24 truck loading according to the Standard Specifications for Highway Bridges in Korea. Based on the optimum design, viable cross-sectional dimensions for FRP decks suitable for the pultrusion process are proposed.

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.173-198
    • /
    • 2020
  • For a long time, many studies have been conducted on predicting the success of campaigns for customers in academia, and prediction models applying various techniques are still being studied. Recently, as campaign channels have been expanded in various ways due to the rapid revitalization of online, various types of campaigns are being carried out by companies at a level that cannot be compared to the past. However, customers tend to perceive it as spam as the fatigue of campaigns due to duplicate exposure increases. Also, from a corporate standpoint, there is a problem that the effectiveness of the campaign itself is decreasing, such as increasing the cost of investing in the campaign, which leads to the low actual campaign success rate. Accordingly, various studies are ongoing to improve the effectiveness of the campaign in practice. This campaign system has the ultimate purpose to increase the success rate of various campaigns by collecting and analyzing various data related to customers and using them for campaigns. In particular, recent attempts to make various predictions related to the response of campaigns using machine learning have been made. It is very important to select appropriate features due to the various features of campaign data. If all of the input data are used in the process of classifying a large amount of data, it takes a lot of learning time as the classification class expands, so the minimum input data set must be extracted and used from the entire data. In addition, when a trained model is generated by using too many features, prediction accuracy may be degraded due to overfitting or correlation between features. Therefore, in order to improve accuracy, a feature selection technique that removes features close to noise should be applied, and feature selection is a necessary process in order to analyze a high-dimensional data set. 
Among the greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), SFFS (Sequential Floating Forward Selection), etc. are widely used as traditional feature selection techniques. However, when there are many risks and many features, these methods are limited by poor classification prediction performance and long learning times. Therefore, in this study, we propose an improved feature selection algorithm to enhance the effectiveness of existing campaigns. The purpose of this study is to improve the existing SFFS sequential method in the process of searching for feature subsets that are the basis for improving machine learning model performance, using statistical characteristics of the data to be processed in the campaign system. In the proposed method, features that have a large influence on performance are derived first, features that have a negative effect are removed, and then the sequential method is applied to increase search efficiency and to enable generalized prediction. It was confirmed that the proposed model showed better search and prediction performance than the traditional greedy algorithm. Campaign success prediction was higher than with the original data set, the greedy algorithm, the genetic algorithm (GA), and recursive feature elimination (RFE). In addition, when performing campaign success prediction, the improved feature selection algorithm was found to be helpful in analyzing and interpreting the prediction results by providing the importance of the derived features. These include important features such as age, customer rating, and sales, which were previously known statistically.
In addition, features that campaign planners had rarely used to select campaign targets, such as the combined product name, the average 3-month data consumption rate, and the last 3-month wireless data usage, were unexpectedly selected as important features for the campaign response. It was confirmed that base attributes can also be very important features depending on the type of campaign. Through this, it is possible to analyze and understand the important characteristics of each campaign type.

Steel Plate Faults Diagnosis with S-MTS (S-MTS를 이용한 강판의 표면 결함 진단)

  • Kim, Joon-Young;Cha, Jae-Min;Shin, Junguk;Yeom, Choongsub
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.1
    • /
    • pp.47-67
    • /
    • 2017
  • Steel plate faults are one of the important factors affecting the quality and price of steel plates. So far, many steelmakers have generally used a visual inspection method based on an inspector's intuition or experience. Specifically, the inspector checks for steel plate faults by looking at the surface of the steel plates. However, the accuracy of this method is so low that it can cause judgment errors above 30%. Therefore, an accurate steel plate faults diagnosis system has been continuously required in the industry. In order to meet this need, this study proposes a new steel plate faults diagnosis system using Simultaneous MTS (S-MTS), which is an advanced Mahalanobis Taguchi System (MTS) algorithm, to classify various surface defects of steel plates. MTS has generally been used to solve binary classification problems in various fields, but it has not been used for multi-class classification due to its low accuracy. The reason is that only one Mahalanobis space is established in MTS. In contrast, S-MTS is suitable for multi-class classification because it establishes an individual Mahalanobis space for each class. 'Simultaneous' implies comparing Mahalanobis distances at the same time. The proposed steel plate faults diagnosis system was developed in four main stages. In the first stage, after various reference groups and related variables are defined, data on steel plate faults are collected and used to establish an individual Mahalanobis space for each reference group and to construct the full measurement scale. In the second stage, the Mahalanobis distances of the test groups are calculated based on the established Mahalanobis spaces of the reference groups. Then, the appropriateness of the spaces is verified by examining the separability of the Mahalanobis distances. In the third stage, orthogonal arrays and the dynamic-type Signal-to-Noise (SN) ratio are applied for variable optimization.
Also, the overall SN ratio gain is derived from the SN ratio and SN ratio gain. If the derived overall SN ratio gain is negative, the variable should be removed, whereas a variable with a positive gain may be considered worth keeping. Finally, in the fourth stage, the measurement scale composed of the selected useful variables is reconstructed. Next, an experimental test is carried out to verify the multi-class classification ability and to obtain the classification accuracy. If the accuracy is acceptable, the diagnosis system can be used for future applications. This study also compared the accuracy of the proposed steel plate faults diagnosis system with that of other popular classification algorithms, including Decision Tree, Multi-Layer Perceptron Neural Network (MLPNN), Logistic Regression (LR), Support Vector Machine (SVM), Tree Bagger Random Forest, Grid Search (GS), Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The steel plate faults dataset used in the study is taken from the University of California at Irvine (UCI) machine learning repository. The proposed steel plate faults diagnosis system based on S-MTS achieves a classification accuracy of 90.79%. The accuracy of the proposed diagnosis system is 6-27% higher than that of MLPNN, LR, GS, GA and PSO. Given that the accuracy of commercial systems is only about 75-80%, the proposed system has sufficient classification performance to be applied in the industry. In addition, the proposed system can reduce the number of measurement sensors installed in the field because of the variable optimization process. These results show that the proposed system not only performs well in steel plate faults diagnosis but can also reduce operation and maintenance costs.
In future work, the proposed system will be applied in the field to validate its actual effectiveness, and we plan to improve its accuracy based on the results.
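
The central idea described above, one Mahalanobis space per class with the class chosen by comparing all distances at once, can be sketched as follows. This is a simplified illustration under assumptions, not the authors' S-MTS: it omits the orthogonal-array and SN-ratio variable optimization, uses the class covariance matrix (with a pseudo-inverse) to define each space, divides the squared distance by the number of variables as is common in MTS, and runs on synthetic data. All function names are hypothetical.

```python
import numpy as np

def fit_spaces(X, y):
    """Build one Mahalanobis space (mean, inverse covariance) per class."""
    spaces = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False)
        spaces[label] = (mean, np.linalg.pinv(cov))  # pinv tolerates singular covariances
    return spaces

def mahalanobis_sq(X, mean, inv_cov):
    """Squared Mahalanobis distance of each row of X, scaled by the number of variables."""
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff) / X.shape[1]

def predict(X, spaces):
    """Assign each sample to the class whose space gives the smallest distance."""
    labels = list(spaces)
    dists = np.column_stack([mahalanobis_sq(X, *spaces[l]) for l in labels])
    return np.array(labels)[dists.argmin(axis=1)]

# Synthetic three-class data (hypothetical), well separated for illustration.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
y = np.repeat(np.arange(3), 100)
X = centers[y] + rng.normal(size=(300, 2))
spaces = fit_spaces(X, y)
acc = float((predict(X, spaces) == y).mean())
print(acc)
```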