• Title/Summary/Keyword: GA(genetic algorithm)


Domain Knowledge Incorporated Counterfactual Example-Based Explanation for Bankruptcy Prediction Model (부도예측모형에서 도메인 지식을 통합한 반사실적 예시 기반 설명력 증진 방법)

  • Cho, Soo Hyun;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.307-332
    • /
    • 2022
  • Bankruptcy prediction is one of the most intensively studied areas in business applications: it is a representative classification problem related to loan lending, investment decision making, and the profitability of financial institutions. Many studies have demonstrated outstanding performance for bankruptcy prediction models using artificial intelligence techniques. However, since most machine learning algorithms are "black-box," AI that provides users with an explanation has been identified as a prominent research topic. Although there are many different approaches to explanation, this study focuses on explaining a bankruptcy prediction model using counterfactual examples. A counterfactual-based explanation provides an alternative case that shows users what would have to change to obtain the desired output from the model. This study introduces a counterfactual generation technique based on a genetic algorithm (GA) that leverages both domain knowledge (i.e., causal feasibility) and feature importance from a black-box model, along with other critical counterfactual variables, including proximity, distribution, and sparsity. The proposed method was evaluated quantitatively and qualitatively to measure its quality and validity.
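
    A minimal sketch of GA-based counterfactual search, for illustration only. The toy model `predict_proba`, the three features, the immutable-feature rule standing in for causal feasibility, and all weights are assumptions rather than the authors' implementation; feature importance and distribution terms are left out.

```python
import math
import random

# Hypothetical stand-in for a black-box bankruptcy model: returns P(bankrupt).
def predict_proba(x):
    debt_ratio, liquidity, firm_age = x
    score = 2.0 * debt_ratio - 1.5 * liquidity - 0.1 * firm_age
    return 1.0 / (1.0 + math.exp(-score))

def predict_label(x):
    return 1 if predict_proba(x) > 0.5 else 0

QUERY = (0.9, 0.3, 5.0)        # assumed instance, predicted as bankrupt
MUTABLE = (True, True, False)  # assumed domain rule: firm age cannot be changed
SCALE = (1.0, 1.0, 10.0)       # assumed feature scales for the distance term
TARGET_PROBA = 0.45            # aim slightly past the 0.5 decision boundary

def fitness(x):
    """Lower is better: validity penalty + proximity + sparsity."""
    changes = [abs(a - b) / s for a, b, s in zip(x, QUERY, SCALE)]
    proximity = sum(changes)
    sparsity = sum(1 for c in changes if c > 1e-9)
    validity = max(0.0, predict_proba(x) - TARGET_PROBA)
    return 10.0 * validity + proximity + 0.1 * sparsity

def mutate(x, rng, step=0.2):
    """Perturb one mutable feature, keeping ratios inside [0, 1]."""
    y = list(x)
    i = rng.choice([k for k, ok in enumerate(MUTABLE) if ok])
    y[i] = min(1.0, max(0.0, y[i] + rng.uniform(-step, step)))
    return tuple(y)

def crossover(a, b, rng):
    """Uniform crossover: each feature comes from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(a, b))

def generate_counterfactual(generations=200, pop_size=30, seed=0):
    rng = random.Random(seed)
    pop = [mutate(QUERY, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]  # elitist truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return min(pop, key=fitness)

cf = generate_counterfactual()
print("query:", QUERY, "->", predict_label(QUERY))
print("counterfactual:", cf, "->", predict_label(cf))
```

    The validity term uses the predicted probability rather than the hard label so that the search has a gradient to follow before any candidate crosses the decision boundary.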

An Integrated Model based on Genetic Algorithms for Implementing Cost-Effective Intelligent Intrusion Detection Systems (비용효율적 지능형 침입탐지시스템 구현을 위한 유전자 알고리즘 기반 통합 모형)

  • Lee, Hyeon-Uk;Kim, Ji-Hun;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.1
    • /
    • pp.125-141
    • /
    • 2012
  • Malicious attacks and hacks on networked systems are increasing dramatically, and their patterns are changing rapidly. Consequently, handling these attacks appropriately has become more important, and there is considerable interest in and demand for effective network security systems such as intrusion detection systems. Intrusion detection systems are network security systems that detect, identify, and respond appropriately to unauthorized or abnormal activities. Conventional intrusion detection systems have generally been designed using experts' implicit knowledge of network intrusions or hackers' abnormal behaviors. However, although they perform very well under normal conditions, they cannot handle new or unknown attack patterns. As a result, recent studies on intrusion detection systems use artificial intelligence techniques, which can respond proactively to unknown threats. Researchers have long adopted and tested various artificial intelligence techniques, such as artificial neural networks, decision trees, and support vector machines, to detect network intrusions. However, most have applied these techniques individually, even though combining them may lead to better detection. For this reason, we propose a new integrated model for intrusion detection. Our model combines the prediction results of four different binary classification models, which may complement each other: logistic regression (LOGIT), decision trees (DT), artificial neural networks (ANN), and support vector machines (SVM). Genetic algorithms (GA) are used to find the optimal combining weights. The proposed model is built in two steps. In the first step, the optimal integration model, the one with the lowest prediction error (i.e., misclassification rate), is generated. 
In the second step, the model explores the optimal classification threshold for determining intrusions, which minimizes the total misclassification cost. Calculating the total misclassification cost of an intrusion detection system requires understanding its asymmetric error cost scheme. Generally, there are two common forms of error in intrusion detection. The first is the False-Positive Error (FPE), where the wrong judgment may result in unnecessary corrective action. The second is the False-Negative Error (FNE), which mainly misjudges malicious programs as normal. Compared to FPE, FNE is more fatal, so the total misclassification cost is affected more by FNE than by FPE. To validate the practical applicability of our model, we applied it to a real-world dataset for network intrusion detection. The experimental dataset was collected from the IDS sensor of an official institution in Korea from January to June 2010. We collected 15,000 log records in total and selected 10,000 samples from them by random sampling. We also compared the results from our model with the results from the single techniques to confirm the superiority of the proposed model. The LOGIT and DT experiments were run using PASW Statistics v18.0, and the ANN experiments using Neuroshell R4.0. For SVM, LIBSVM v2.90, a freeware tool for training SVM classifiers, was used. Empirical results showed that our proposed GA-based model outperformed all the other comparative models in detecting network intrusions in terms of accuracy. They also showed that the proposed model outperformed all the other comparative models in terms of total misclassification cost. Consequently, we expect that our study may contribute to building cost-effective intelligent intrusion detection systems.
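
The second step can be sketched as a threshold search under asymmetric costs. The sample scores, the equal combining weights (which the paper obtains with a GA), and the 1:5 cost ratio are all assumptions for illustration, not values from the study.

```python
# Each row: hypothetical intrusion probabilities from (LOGIT, DT, ANN, SVM).
MODEL_PROBS = [
    (0.90, 0.80, 0.95, 0.85),
    (0.40, 0.50, 0.45, 0.35),
    (0.60, 0.40, 0.50, 0.50),
    (0.20, 0.10, 0.15, 0.25),
    (0.30, 0.35, 0.40, 0.45),
    (0.10, 0.30, 0.20, 0.20),
]
LABELS = [1, 1, 0, 0, 1, 0]          # 1 = intrusion, 0 = normal
WEIGHTS = (0.25, 0.25, 0.25, 0.25)   # assumed; a GA would tune these in step one

def combined_score(probs, weights):
    """Weighted average of the individual model outputs."""
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)

def total_cost(scores, labels, threshold, cost_fp=1.0, cost_fn=5.0):
    """Total misclassification cost; a false negative costs more than a false positive."""
    cost = 0.0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 0:
            cost += cost_fp
        elif predicted == 0 and label == 1:
            cost += cost_fn
    return cost

def best_threshold(scores, labels):
    """Grid search over thresholds 0.01 .. 0.99 for the lowest total cost."""
    candidates = [i / 100 for i in range(1, 100)]
    return min(candidates, key=lambda t: total_cost(scores, labels, t))

scores = [combined_score(p, WEIGHTS) for p in MODEL_PROBS]
threshold = best_threshold(scores, LABELS)
print("threshold:", threshold, "cost:", total_cost(scores, LABELS, threshold))
```

Because missed intrusions are priced higher, the cost-minimizing threshold in this toy data falls below the default 0.5, trading one false alarm for fewer misses.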

Prediction of Lung Cancer Based on Serum Biomarkers by Gene Expression Programming Methods

  • Yu, Zhuang;Chen, Xiao-Zheng;Cui, Lian-Hua;Si, Hong-Zong;Lu, Hai-Jiao;Liu, Shi-Hai
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.15 no.21
    • /
    • pp.9367-9373
    • /
    • 2014
  • In the diagnosis of lung cancer, rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important. Serum markers, including lactate dehydrogenase (LDH), C-reactive protein (CRP), carcinoembryonic antigen (CEA), neuron-specific enolase (NSE), and Cyfra21-1, are reported to reflect lung cancer characteristics. In this study, lung tumors were classified based on biomarkers (measured in 120 NSCLC and 60 SCLC patients) by setting up optimal joint biomarker models with a powerful computational tool, gene expression programming (GEP). GEP is a learning algorithm that combines the advantages of genetic programming (GP) and genetic algorithms (GA). It focuses specifically on relationships between variables in data sets and builds models to explain these relationships, and it has been used successfully in formula finding and function mining. As a basis for defining a GEP environment for SCLC and NSCLC prediction, three explicit predictive models were constructed. CEA and NSE are frequently used lung cancer markers in clinical trials, and CRP, LDH, and Cyfra21-1 are significant in lung cancer; on the basis of CEA and NSE, we set up three GEP models: GEP1 (CEA, NSE, Cyfra21-1), GEP2 (CEA, NSE, LDH), and GEP3 (CEA, NSE, CRP). The best GEP classification result was obtained when CEA, NSE, and Cyfra21-1 were combined: 128 of 135 subjects in the training set and 40 of 45 subjects in the test set were classified correctly, giving accuracy rates of 94.8% in the training set and 88.9% in the test set. With GEP2, the accuracy decreased significantly, by 1.5% and 6.6% in the training and test sets respectively; with GEP3, the decreases were 0.82% and 4.45%. Serum Cyfra21-1 is a useful and sensitive serum biomarker for discriminating between NSCLC and SCLC. GEP modeling is a promising and excellent tool in the diagnosis of lung cancer.
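
    The reported accuracy rates follow directly from the counts in the abstract. A short check, where only the helper name `accuracy_pct` is introduced for illustration:

```python
def accuracy_pct(correct, total):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)

# Counts taken from the abstract: 135 training and 45 test subjects (180 in total,
# matching 120 NSCLC + 60 SCLC patients).
train_accuracy = accuracy_pct(128, 135)
test_accuracy = accuracy_pct(40, 45)
print(train_accuracy, test_accuracy)
```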

A Method of Assigning Weight Values for Qualitative Attributes in CBR Cost Model (사례기반추론 코스트 모델의 정성변수 속성가중치 산정방법)

  • Lee, Hyun-Soo;Kim, Soo-Young;Park, Moon-Seo;Ji, Sae-Hyun;Seong, Ki-Hoon;Pyeon, Jae-Ho
    • Korean Journal of Construction Engineering and Management
    • /
    • v.12 no.1
    • /
    • pp.53-61
    • /
    • 2011
  • For construction projects, the importance of early cost estimates is widely recognized by the project team and the sponsoring organization, because early cost estimates are frequently the foundation of business decisions as well as the basis for identifying changes as the project progresses from design to construction. However, it is difficult to estimate construction cost accurately in the early stage of a project because of various uncertainties in construction. To deal with these uncertainties, cost estimates should be made several times over the course of the project. In particular, early cost estimates are an essential process for successful project management. For accurate construction cost estimates, it is necessary to compare cost estimates with actual costs based on historical project data. In this context, case-based reasoning (CBR), which is the process of solving new problems based on the solutions of similar past problems, can be considered an effective method for cost estimating. Applying CBR also requires defining the attribute similarities and the attribute weights. However, no existing method is capable of determining the attribute weights of qualitative variables, which has been a well-known barrier to accurate early cost estimates. Using genetic algorithms (GA), this research proposes a method for determining the attribute weights of qualitative variables. The proposed methodology was validated through building project case studies.
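
    A minimal sketch of weighted case retrieval with one qualitative attribute, showing why its weight matters. The attribute names, cases, ranges, and weights are hypothetical; in the paper the weights are determined by a GA, which is not reproduced here.

```python
def attribute_similarity(a, b, value_range=None):
    """Exact match for qualitative attributes, scaled distance for quantitative ones."""
    if value_range is None:
        return 1.0 if a == b else 0.0
    return 1.0 - abs(a - b) / value_range

def case_similarity(new_case, old_case, weights, ranges):
    """Weighted average of attribute similarities."""
    total = sum(weights.values())
    return sum(
        w * attribute_similarity(new_case[k], old_case[k], ranges.get(k))
        for k, w in weights.items()
    ) / total

def retrieve(new_case, cases, weights, ranges):
    """Return the stored case most similar to the new case."""
    return max(cases, key=lambda c: case_similarity(new_case, c, weights, ranges))

# Hypothetical case base: "structure" is qualitative, the rest are quantitative.
CASES = [
    {"name": "A", "structure": "RC", "floor_area": 10000, "floors": 20},
    {"name": "B", "structure": "steel", "floor_area": 10500, "floors": 21},
]
RANGES = {"floor_area": 10000, "floors": 40}  # assumed value ranges
NEW = {"structure": "RC", "floor_area": 10400, "floors": 21}

with_qualitative = {"structure": 0.5, "floor_area": 0.3, "floors": 0.2}
without_qualitative = {"structure": 0.0, "floor_area": 0.3, "floors": 0.2}

print(retrieve(NEW, CASES, with_qualitative, RANGES)["name"])
print(retrieve(NEW, CASES, without_qualitative, RANGES)["name"])
```

    With a nonzero weight on the qualitative attribute the structurally matching case is retrieved; with a zero weight the numerically closer but structurally different case wins.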

A Layout Planning Optimization Model for Finishing Work (건축물 마감공사 자재 배치 최적화 모델)

  • Park, Moon-Seo;Yang, Young-Jun;Lee, Hyun-Soo;Han, Sang-Won;Ji, Sae-Hyun
    • Korean Journal of Construction Engineering and Management
    • /
    • v.12 no.1
    • /
    • pp.43-52
    • /
    • 2011
  • Unnecessary transportation of resources is one of the major causes of reduced work productivity on construction sites. Layout-related studies have therefore been conducted to develop management technologies and techniques that minimize resource transportation at the site level. However, although the need for floor-level layout planning has grown as buildings have become larger and floors more complicated, studies on optimizing the transportation of materials inside buildings are not being actively conducted. In this study, a model was therefore developed using genetic algorithms (GA) to optimize the locations of finishing materials on the work floor. With the established model, the arrangement of diverse materials on complicated floors can be planned, and the optimized material layout derived from the model can minimize the total material transportation time spent by laborers during their working day. In addition, to calculate travel distances between work sites and materials realistically, the concept of actual travel distance was applied. To assess its applicability and compare it with existing methodologies, the developed model was applied to actual high-rise residential complexes.
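
    A simplified sketch of layout search as a permutation problem. The trip counts, the distance matrix, and the mutation-only evolutionary loop are assumptions for illustration; the paper's GA operators and its actual-travel-distance calculation are not specified in the abstract.

```python
import random

# Hypothetical data: TRIPS[m] is the number of daily trips for material m, and
# DIST[m][s] is the walking distance from storage slot s to where material m is used.
TRIPS = [12, 5, 8, 3]
DIST = [
    [4, 9, 7, 12],
    [10, 3, 8, 6],
    [6, 7, 2, 9],
    [11, 5, 9, 4],
]

def transport_cost(layout):
    """Total travel: layout[m] is the storage slot assigned to material m."""
    return sum(TRIPS[m] * DIST[m][slot] for m, slot in enumerate(layout))

def swap_mutation(layout, rng):
    """Swap the slots of two materials, so the result stays a valid permutation."""
    i, j = rng.sample(range(len(layout)), 2)
    child = list(layout)
    child[i], child[j] = child[j], child[i]
    return child

def optimize_layout(baseline, generations=50, pop_size=20, seed=1):
    rng = random.Random(seed)
    # Seed the population with the baseline so the result is never worse than it.
    pop = [list(baseline)] + [
        rng.sample(baseline, len(baseline)) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=transport_cost)
        survivors = pop[: pop_size // 2]  # elitist selection
        offspring = [
            swap_mutation(rng.choice(survivors), rng)
            for _ in range(pop_size - len(survivors))
        ]
        pop = survivors + offspring
    return min(pop, key=transport_cost)

baseline = [3, 2, 1, 0]
best = optimize_layout(baseline)
print(best, transport_cost(best), "vs baseline", transport_cost(baseline))
```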

On the Design of Multi-layered Polygonal Helix Antennas (다각 다단 구조 헬릭스 안테나 설계)

  • Choo Jae-Yul;Choo Ho-Sung;Park Ik-Mo;Oh Yi-Sok
    • The Journal of Korean Institute of Electromagnetic Engineering and Science
    • /
    • v.17 no.3 s.106
    • /
    • pp.249-258
    • /
    • 2006
  • In this letter, we propose a novel printed helix antenna for RFID readers in the UHF band. The printed strip line of the antenna is first wound around the outside of a polygonal layer, and the winding then continues on an inner layer to control the overall gain and the radiation pattern. In addition, the winding pitch angles on each layer take either negative or positive values, resulting in a broad CP bandwidth. The detailed structure of the antenna was optimized using a Pareto genetic algorithm (GA) to obtain excellent performance for RFID reader antennas. The optimized two-layered polygonal helix was fabricated on a flexible cardboard substrate, and its performance was measured and compared with the simulations. The fabricated antenna was made of copper tape, which can adhere to flexible cardboard, and had a 21.4 % matching bandwidth, a 31.9 % CP bandwidth, and a readable range of $5.5m^2$ with kr=3.2. Based on the current distribution along the strip line and the sensitivity of the antenna's bend points, we also confirmed that the antenna has a quarter-wave transformer near the feed, which gives the broad matching bandwidth, and that the bent strip line radiates a traveling wave, which gives the broad CP bandwidth.
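
    The core of any Pareto GA is keeping the designs that no other design beats on every objective. A minimal sketch of that non-dominated filter, using made-up (gain, CP bandwidth) pairs where both objectives are maximized; the objectives and values are assumptions, not the paper's data:

```python
def dominates(a, b):
    """True if design a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(designs):
    """Keep only the designs that no other design dominates."""
    return [p for p in designs if not any(dominates(q, p) for q in designs)]

# Hypothetical candidate designs: (gain, CP bandwidth in %).
designs = [(20.0, 30.0), (25.0, 22.0), (18.0, 28.0), (22.0, 26.0), (15.0, 33.0)]
front = pareto_front(designs)
print(front)
```

    Here (18.0, 28.0) is dropped because (20.0, 30.0) is better on both objectives; the remaining designs represent different trade-offs.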

Comparative analysis of auto-calibration methods using QUAL2Kw and assessment on the water quality management alternatives for Sum River (QUAL2Kw 모형을 이용한 자동보정 방법 비교분석과 섬강의 수질관리 대안 평가)

  • Cho, Jae Heon
    • Journal of Environmental Impact Assessment
    • /
    • v.25 no.5
    • /
    • pp.345-356
    • /
    • 2016
  • In this study, auto-calibration methods for a water quality model were compared and analyzed using QUAL2Kw, which can estimate the optimum parameters by integrating a genetic algorithm with QUAL2K. QUAL2Kw was applied to the Sum River, which is greatly affected by the pollution loads of Wonju city. Two auto-calibration methods were examined: applying a single parameter set to the whole river reach, and applying separate parameters to each of multiple reaches. Analysis of the CV(RMSE) and the fitness of the GA shows that the separate-parameter auto-calibration method is more precise than the single-parameter method. The separate-parameter method was therefore applied to the water quality modelling in this study. The calibrated QUAL2Kw was used to evaluate three scenarios for water quality management of the Sum River, and the water quality impact on the river was analyzed. In scenario 1, which improves the effluent water quality of the Wonju WWTP, BOD and TP concentrations at the Sum River 4-1 station, the representative station of the mid-watershed, decrease by 17.7% and 29.1%, respectively. Immediately after the confluence with the Wonjucheon, BOD and TP concentrations decrease by 50.4% and 40.5%, respectively. In scenario 2, the Wonju water supply intake is closed and a multi-regional water supply, which comes from watersheds other than the Sum River, is provided. The Sum River water quality in scenario 2 improves slightly as the flow of the river increases. Immediately after the confluence with the Wonjucheon, BOD and TP concentrations decrease by 0.18mg/L and 0.0063mg/L, respectively. In scenario 3, the water quality management alternatives of scenarios 1 and 2 are applied simultaneously, and the Sum River water quality improves slightly more than in scenario 1. 
Water quality prediction of the three scenarios indicates that effluent water quality improvement of Wonju WWTP is the most efficient alternative in water quality management of the Sum River. Particularly the Sum River water quality immediately after joining the Wonjucheon is greatly improved. When Wonju water supply intake is closed and multi-regional water supply is provided, the Sum River water quality is slightly improved.
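
A short sketch of the CV(RMSE) statistic used to compare calibrations, assuming the common definition (RMSE divided by the mean of the observations). How the paper aggregates it across water quality constituents is not stated in the abstract, and the sample values are made up.

```python
import math

def cv_rmse(observed, simulated):
    """Coefficient of variation of the RMSE: RMSE normalized by the observed mean."""
    n = len(observed)
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)
    return rmse / (sum(observed) / n)

# Hypothetical observed and simulated concentrations (mg/L) at three stations.
observed = [1.0, 2.0, 3.0]
simulated = [2.0, 3.0, 4.0]
print(cv_rmse(observed, simulated))
```

A lower CV(RMSE) indicates a closer fit, which is the sense in which one calibration method is judged more precise than another.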

Machinability investigation and sustainability assessment in FDHT with coated ceramic tool

  • Panda, Asutosh;Das, Sudhansu Ranjan;Dhupal, Debabrata
    • Steel and Composite Structures
    • /
    • v.34 no.5
    • /
    • pp.681-698
    • /
    • 2020
  • The paper contributes to the modeling and optimization of major machinability parameters (cutting force, surface roughness, and tool wear) in finish dry hard turning (FDHT), for machinability evaluation of hardened AISI grade die steel D3 with a PVD-TiN coated (Al2O3-TiCN) mixed ceramic tool insert. The turning trials are performed based on Taguchi's L18 orthogonal array design of experiments to develop the regression model and to check the adequacy of its predictions, considering tool approach angle, nose radius, cutting speed, feed rate, and depth of cut as the major machining parameters. The models or correlations are developed using multiple regression analysis (MRA). In addition, a statistical technique (response surface methodology) followed by computational approaches (genetic algorithm and particle swarm optimization) is employed for multiple response optimization. Thereafter, the effectiveness of the three proposed optimization techniques (RSM, GA, PSO) is evaluated by a confirmation test, and the best optimization results are then used to estimate energy consumption, including carbon footprint savings towards green machining, and to estimate tool life, followed by cost analysis to justify the economic feasibility of the PVD-TiN coated Al2O3+TiCN mixed ceramic tool in FDHT operation. Finally, estimation of energy savings, economic analysis, and sustainability assessment are performed using carbon footprint analysis, the Gilbert approach, and the Pugh matrix, respectively. 
As novel aspects, the present work: (i) contributes to the practical industrial application of finish hard turning, helping shaft and die makers select the optimum cutting conditions in a hardness range of 45-60 HRC, (ii) demonstrates the replacement of the expensive, time-consuming conventional cylindrical grinding process and proposes ceramic tools as an alternative to costlier CBN tools in hard turning, considering technological, economic, and ecological aspects that are helpful and efficient from an industrial point of view, (iii) provides environmentally friendly, cleaner production for machining hardened steels, (iv) helps to improve the desirable machinability characteristics, and (v) serves as knowledge for developing a common language for sustainable manufacturing in both research and industrial practice.

Steel Plate Faults Diagnosis with S-MTS (S-MTS를 이용한 강판의 표면 결함 진단)

  • Kim, Joon-Young;Cha, Jae-Min;Shin, Junguk;Yeom, Choongsub
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.1
    • /
    • pp.47-67
    • /
    • 2017
  • Steel plate faults are one of the important factors affecting the quality and price of steel plates. So far, many steelmakers have generally used visual inspection based on an inspector's intuition or experience. Specifically, the inspector checks for faults by looking at the surface of the steel plates. However, the accuracy of this method is so low that it can cause judgment errors above 30%. An accurate steel plate fault diagnosis system has therefore been in continuous demand in the industry. To meet this need, this study proposes a new steel plate fault diagnosis system using Simultaneous MTS (S-MTS), an advanced Mahalanobis Taguchi System (MTS) algorithm, to classify various surface defects of steel plates. MTS has generally been used to solve binary classification problems in various fields, but it has not been used for multi-class classification because of its low accuracy. The reason is that only one Mahalanobis space is established in MTS. In contrast, S-MTS is suitable for multi-class classification because it establishes an individual Mahalanobis space for each class. 'Simultaneous' implies comparing Mahalanobis distances at the same time. The proposed steel plate fault diagnosis system was developed in four main stages. In the first stage, after the reference groups and related variables are defined, steel plate fault data are collected and used to establish an individual Mahalanobis space for each reference group and to construct the full measurement scale. In the second stage, the Mahalanobis distances of the test groups are calculated based on the established Mahalanobis spaces of the reference groups. The appropriateness of the spaces is then verified by examining the separability of the Mahalanobis distances. In the third stage, orthogonal arrays and the dynamic-type Signal-to-Noise (SN) ratio are applied for variable optimization. 
The overall SN ratio gain is also derived from the SN ratio and the SN ratio gain. If the derived overall SN ratio gain of a variable is negative, the variable should be removed, whereas a variable with a positive gain may be considered worth keeping. Finally, in the fourth stage, the measurement scale is reconstructed from the selected useful variables. An experimental test is then carried out to verify the multi-class classification ability and to obtain the classification accuracy. If the accuracy is acceptable, the diagnosis system can be used for future applications. This study also compared the accuracy of the proposed steel plate fault diagnosis system with that of other popular classification algorithms, including Decision Tree, Multilayer Perceptron Neural Network (MLPNN), Logistic Regression (LR), Support Vector Machine (SVM), Tree Bagger Random Forest, Grid Search (GS), Genetic Algorithm (GA), and Particle Swarm Optimization (PSO). The steel plate faults dataset used in the study is taken from the University of California at Irvine (UCI) machine learning repository. The proposed S-MTS-based diagnosis system achieves a classification accuracy of 90.79%, which is 6-27% higher than MLPNN, LR, GS, GA, and PSO. Given that the accuracy of commercial systems is only about 75-80%, the proposed system has sufficient classification performance to be applied in the industry. In addition, thanks to the variable optimization process, the proposed system can reduce the number of measurement sensors installed in the field. These results show that the proposed system not only diagnoses steel plate faults well but also reduces operation and maintenance costs. 
In future work, we plan to apply the proposed system in the field to validate its actual effectiveness and to improve its accuracy based on the results.
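
The idea of one Mahalanobis space per class, with distances compared simultaneously, can be sketched for two features in plain Python. The class names and samples are hypothetical, and this sketch omits the standardization, orthogonal arrays, and SN-ratio variable selection that the full S-MTS procedure uses.

```python
def build_space(samples):
    """Mean vector and sample covariance (sxx, sxy, syy) of 2-D reference samples."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(point, space):
    """Squared Mahalanobis distance, using the closed-form 2x2 covariance inverse."""
    (mx, my), (sxx, sxy, syy) = space
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy * sxy
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def classify(point, spaces):
    """Assign the class whose Mahalanobis space is closest to the point."""
    return min(spaces, key=lambda label: mahalanobis_sq(point, spaces[label]))

# Hypothetical reference groups, one Mahalanobis space per fault class.
reference = {
    "scratch": [(1.0, 2.0), (2.0, 1.0), (1.5, 2.5), (2.5, 2.0)],
    "stain": [(7.0, 8.0), (8.0, 7.0), (7.5, 8.5), (8.5, 8.0)],
}
spaces = {label: build_space(samples) for label, samples in reference.items()}
print(classify((1.8, 1.9), spaces), classify((7.9, 8.1), spaces))
```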

Development of Neural Network Based Cycle Length Design Model Minimizing Delay for Traffic Responsive Control (실시간 신호제어를 위한 신경망 적용 지체최소화 주기길이 설계모형 개발)

  • Lee, Jung-Youn;Kim, Jin-Tae;Chang, Myung-Soon
    • Journal of Korean Society of Transportation
    • /
    • v.22 no.3 s.74
    • /
    • pp.145-157
    • /
    • 2004
  • The cycle length design model of the Korean traffic responsive signal control systems is devised to vary the cycle length in real time in response to changes in traffic demand, using parameters specified by a system operator and field information such as the degrees of saturation of through phases. Since no explicit guideline is provided to the system operator, the system tends to be ambiguous in terms of system optimization. In addition, it has not yet been verified whether the cycle lengths produced by the existing model are comparable to the ones minimizing delay. This paper presents studies conducted (1) to find shortcomings of the existing model by comparing the cycle lengths it produces against the ones minimizing delay and (2) to propose a new direction for designing a cycle length that minimizes delay and excludes such operator-oriented parameters. The study found that the cycle lengths from the existing model fail to minimize delay and lead to unsatisfactory intersection operating conditions when traffic volume is low, because of the changed target operational volume-to-capacity ratio embedded in the model. Sixty-four different neural network based cycle length design models were developed from simulation data used as a surrogate for field data. The CORSIM optimal cycle lengths minimizing delay were found with the COST software developed for the study. COST searches for the CORSIM optimal cycle length minimizing delay with a heuristic search method, a hybrid genetic algorithm. Among the 64 models, the best one, which produces cycle lengths close enough to the optimal, was selected through statistical tests. The verification test found that the best model designs cycle lengths that follow a pattern similar to the ones minimizing delay. The cycle lengths from the proposed model are comparable to the ones from TRANSYT-7F.