• Title/Summary/Keyword: Validation data set

Defect Severity-based Ensemble Model using FCM (FCM을 적용한 결함심각도 기반 앙상블 모델)

  • Lee, Na-Young; Kwon, Ki-Tae
    • KIISE Transactions on Computing Practices / v.22 no.12 / pp.681-686 / 2016
  • Software defect prediction is an important factor in efficient project management and project success. The severity of a defect usually determines the degree to which the project is affected. However, existing studies focus only on the presence or absence of a defect, not on its severity. In this study, we propose a defect severity-based ensemble model using FCM. The defect severities in PC4, one of the NASA data sets, were reclassified. To select the input columns that affect defect severity, we extracted the important defect factors of the data set using Random Forest (RF). We evaluated the performance of the model by varying its parameters under 10-fold cross-validation. The evaluation results were as follows. First, defect severities were reclassified from 58, 40, 80 to 30, 20, 128. Second, BRANCH_COUNT was an important input column for the degree of severity in terms of accuracy and node impurity. Third, a smaller number of trees required more variables to achieve good performance.
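
The 10-fold cross-validation mentioned in this abstract can be illustrated with a minimal sketch of how the folds are formed. The function name `k_fold_indices`, the seed, and the sample size of 178 are assumptions for illustration only (178 is simply the sum of the three severity figures quoted above, assuming they are class counts); this is not the authors' evaluation code.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are not ordered
    # Spread samples as evenly as possible: the first (n % k) folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Hypothetical sample size, see the note above.
folds = list(k_fold_indices(178, k=10))
for train_idx, test_idx in folds:
    # A model would be fitted on train_idx and scored on test_idx here.
    pass
```

Each sample appears in exactly one test fold, so averaging the ten fold scores uses every observation once for testing.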

Structural identification of Humber Bridge for performance prognosis

  • Rahbari, R.; Niu, J.; Brownjohn, J.M.W.; Koo, K.Y.
    • Smart Structures and Systems / v.15 no.3 / pp.665-682 / 2015
  • Structural identification or St-Id is 'the parametric correlation of structural response characteristics predicted by a mathematical model with analogous characteristics derived from experimental measurements'. This paper describes a St-Id exercise on Humber Bridge that adopted a novel two-stage approach to first calibrate and then validate a mathematical model. This model was then used to predict effects of wind and temperature loads on global static deformation that would be practically impossible to observe. The first stage of the process was an ambient vibration survey in 2008 that used operational modal analysis to estimate a set of modes classified as vertical, torsional or lateral. In the more recent second stage, a finite element model (FEM) was developed with an appropriate level of refinement to provide a corresponding set of modal properties. A series of manual adjustments to model parameters such as cable tension and bearing stiffness resulted in a FEM that produced excellent correspondence for vertical and torsional modes, along with correspondence for the lower frequency lateral modes. In the third stage, traffic, wind and temperature data, along with deformation measurements from a sparse structural health monitoring (SHM) system installed in 2011, were compared with equivalent predictions from the partially validated FEM. The match of static response between FEM and SHM data proved good enough for the FEM to be used to predict the un-measurable global deformed shape of the bridge due to vehicle and temperature effects, but the FEM had limited capability to reproduce static effects of wind. In addition, the FEM was used to show internal forces due to a heavy vehicle and to estimate the worst-case bearing movements under extreme combinations of wind, traffic and temperature loads. The paper shows that in this case, but with limitations, such a two-stage FEM calibration/validation process can be an effective tool for performance prognosis.

Estimation of Forest Productive Area of Quercus acutissima and Quercus mongolica Using Site Environmental Variables (산림 입지토양 환경요인에 의한 상수리나무와 신갈나무의 적지추정)

  • Lee, Seung-Woo; Won, Hyung-Kyu; Shin, Man-Yong; Son, Young-Mo; Lee, Yoon-Young
    • Korean Journal of Soil Science and Fertilizer / v.40 no.5 / pp.429-434 / 2007
  • This study was conducted to estimate the site productivity of Quercus acutissima and Quercus mongolica in four forest climatic zones. We used site environmental variables (28 geographical and pedological factors) and site index, as the site productivity indicator, from 23,315 stands nationwide. Based on multiple regression analysis between site index and the major environmental variables, best-fit multivariate models were built for each species and forest climatic zone. Most of the site index prediction models were regressed on seven to eight factors, including altitude, relief, soil depth, and soil moisture. For those models, three evaluation statistics (mean difference, standard deviation of difference, and standard error of difference) were applied to the test data set to validate the results. According to the evaluation statistics, the models by climatic zone and species fitted the test data set well, with relatively low bias and variation. Taking sites with a site index above the middle of the range as productive, the total area of productive sites for the two Quercus spp. estimated by those models would be about 6% of the total forest area. The northern and central temperate forest zones had more productive area than the southern temperate and warm temperate forest zones. It was concluded that regression-based prediction from site environmental variables, by climatic zone and species, is sufficiently capable of estimating forest site productivity.
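
The three evaluation statistics named in this abstract can be sketched as follows. The exact formulas used by the authors are not given, so this assumes common definitions: the mean of the observed-minus-predicted differences (bias), their sample standard deviation, and that standard deviation divided by the square root of the sample size. The function name and the site index values are hypothetical.

```python
import math

def difference_statistics(observed, predicted):
    """Return (mean difference, SD of difference, SE of difference)."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    n = len(diffs)
    mean_diff = sum(diffs) / n  # bias of the model on the test set
    # Sample standard deviation (n - 1 in the denominator): spread of the errors.
    sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    se_diff = sd_diff / math.sqrt(n)  # uncertainty of the mean difference
    return mean_diff, sd_diff, se_diff

# Hypothetical site index values (m) for four test stands.
observed = [14.0, 16.0, 18.0, 20.0]
predicted = [13.0, 17.0, 17.0, 21.0]
mean_diff, sd_diff, se_diff = difference_statistics(observed, predicted)
print(mean_diff, sd_diff, se_diff)
```

A mean difference near zero indicates low bias, while the standard deviation and standard error describe how much individual predictions vary around the observations.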

Use of deep learning in nano image processing through the CNN model

  • Xing, Lumin; Liu, Wenjian; Liu, Xiaoliang; Li, Xin; Wang, Han
    • Advances in nano research / v.12 no.2 / pp.185-195 / 2022
  • Deep learning is a field of artificial intelligence (AI) used for computer-aided diagnosis (CAD) and image processing in scientific research. Reading image slices involves many mechanical, repetitive tasks, takes time, and is subject to geographical limits; interpreting image information is also difficult because it is strongly subjective, which raises the rate of misdiagnosis. Because lung cancer has the highest mortality rate, a biopsy is needed to determine its class before further treatment. Deep learning has recently provided strong tools for diagnosing lung cancer and planning therapeutic regimens. However, identifying the pathological class of lung cancer from CT images at an early stage is difficult because of the absence of powerful AI models and public training data sets. A convolutional neural network (CNN) was proposed for recognizing pathological CT images. A total of 472 patients who underwent staging FDG-PET/CT within 2 months prior to surgery or biopsy were selected. The CNN achieved accuracies of 87%, 69%, and 69% on the training, validation, and test sets, respectively, for classifying T1-T2 versus T3-T4 lung cancer. Consequently, the CNN (or deep learning) could improve the use of the CT image data set, indicating that such classifiers are adequate for distinguishing pathological CT images more accurately, and that the approach performs better than several deep learning models, such as ResNet-34, AlexNet, and DenseNet, with or without Softmax weights.

A Study on the Forecasting of Daily Streamflow using the Multilayer Neural Networks Model (다층신경망모형에 의한 일 유출량의 예측에 관한 연구)

  • Kim, Seong-Won
    • Journal of Korea Water Resources Association / v.33 no.5 / pp.537-550 / 2000
  • In this study, neural network models were used to forecast daily streamflow at Jindong station in the Nakdong River basin. The neural network models consist of CASE 1 (5-5-1) and CASE 2 (5-5-5-1); the criterion that separates the two models is the number of hidden layers. Each model was trained with the Fletcher-Reeves Conjugate Gradient BackPropagation (FR-CGBP) and Scaled Conjugate Gradient BackPropagation (SCGBP) algorithms, which are better than the original BackPropagation (BP) in convergence of global error and training tolerance. The data available for model training and validation were composed of wet, average, dry, wet+average, wet+dry, average+dry, and wet+average+dry years, respectively. During model training, the optimal connection weights and biases were determined using each data set, and the daily streamflow was calculated at the same time. Except for the wet+dry years, the training results were good according to statistical analysis of the forecast errors. Model validation was then carried out using the connection weights and biases obtained from model training, and the validation results were as satisfactory as those of training. The daily streamflow forecasts of the neural network models were compared with those of the Multiple Regression Analysis Model (MRAM). The neural network models gave slightly better results than the MRAM in this study. Thus, neural network models have the advantages of providing a more systematic approach, reducing model parameters, and shortening the time spent on model development.
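
To make the two architectures concrete, the sketch below reads "5-5-1" as 5 inputs, 5 hidden nodes, and 1 output, and "5-5-5-1" as the same network with a second hidden layer; that reading is an assumption. The tanh hidden activation, the linear output, the random initial weights, and all function names are also assumptions for illustration. Training by FR-CGBP or SCGBP is not shown.

```python
import math
import random

def count_parameters(layer_sizes):
    """Connection weights plus biases in a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def init_network(layer_sizes, seed=0):
    """Create one (weights, biases) pair per layer with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def forward(layers, inputs):
    """Propagate one input vector through the network."""
    activations = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        sums = [sum(w * a for w, a in zip(row, activations)) + b
                for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        # Assumption: tanh hidden units and a linear output unit.
        activations = sums if is_output else [math.tanh(s) for s in sums]
    return activations

case1 = [5, 5, 1]      # one hidden layer
case2 = [5, 5, 5, 1]   # two hidden layers
print(count_parameters(case1))  # 36
print(count_parameters(case2))  # 66

# Hypothetical scaled inputs, e.g. five antecedent daily values.
print(forward(init_network(case2), [0.1, 0.2, 0.3, 0.4, 0.5]))
```

Under these assumptions the extra hidden layer adds 30 trainable parameters (25 weights and 5 biases), which is the only structural difference between the two cases.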

A New Look at the Statistical Method for Remote Sensing of Daily Maximum Air Temperature (위성자료를 이용한 일최고온도 산출의 통계적 접근에 관한 고찰)

  • 변민정; 한경수; 김영섭
    • Korean Journal of Remote Sensing / v.20 no.2 / pp.65-76 / 2004
  • This study aims to estimate daily maximum air temperature using satellite-derived surface temperature and the Elevation Derivative Database (EDD). The analysis focuses on establishing a semi-empirical technique for estimating daily maximum air temperature through multiple regression analysis. It tests the contribution of EDD to air temperature estimation when it is added to the regression model as an independent variable. The correlation is better with the EDD data than without this data set. To provide a progressive estimation technique, we propose and compare three approaches: 1) seasonal estimation without considering landcover, 2) seasonal estimation considering landcover, and 3) estimation by landcover type without considering season. The last method shows the best fit, with a root-mean-square error between 0.56$^{\circ}$C and 3.14$^{\circ}$C. A cross-validation procedure is performed for the third method to validate the estimated values for two major landcover types (cropland and forest). For both landcover types, the validation results show reasonable agreement with the estimation results. Therefore, the proposed estimation technique may be applicable to most parts of South Korea.

Design of Data Fusion and Data Processing Model According to Industrial Types (산업유형별 데이터융합과 데이터처리 모델의 설계)

  • Jeong, Min-Seung; Jin, Seon-A; Cho, Woo-Hyun
    • KIPS Transactions on Software and Data Engineering / v.6 no.2 / pp.67-76 / 2017
  • At industrial sites in various fields, large amounts of correlated data are generated in combination. A variety of data can be collected for each type of industrial process, but the associations between processes cannot be integrated. With existing industrial data, when a problem occurs in the work process, the operator enters arbitrary values as the set values of the molding condition table. In this paper, we design a data fusion and analysis processing model for the data collected in each industrial type. In a prediction case (Automobile Connect) for process manufacturing industries, aimed at improving corporate earnings, master data such as the standard molding condition table are compared with the production history files collected during the manufacturing process. The failure rate is reduced by a new molding condition table that digitizes the arbitrary values entered by workers, and various malfunction factors and exceptions are reinterpreted through new pattern analysis, leading to increased productivity, process improvement, and cost savings. The model can be designed for a variety of data analysis and model validation tasks. In addition, standard set values that have been analyzed and verified secure the objectivity, consistency, and optimization of the manufacturing process, and optimization (standard setting) techniques based on various pattern types can support optimization suited to each industry type.

Dynamic RNN-CNN malware classifier correspond with Random Dimension Input Data (임의 차원 데이터 대응 Dynamic RNN-CNN 멀웨어 분류기)

  • Lim, Geun-Young; Cho, Young-Bok
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.5 / pp.533-539 / 2019
  • This study proposes a malware classification model that can handle input data of arbitrary length, using the Microsoft Malware Classification Challenge dataset. The approach builds on existing methods that convert malware data into images. The proposed model generates many images when the malware data is large and a small image when the data is small. The generated images are learned as time-series data by a Dynamic RNN. Applying the Attention technique, only the highest-weighted outputs of the RNN are used, and the RNN output values are learned again by a Residual CNN to classify the malware. Experiments on the proposed model showed a micro-average F1 score of 92% on the validation data set. The experimental results confirm the performance of a model that can learn and classify arbitrary-length data without special feature extraction or dimension reduction.
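
The micro-average F1 score reported here pools true positives, false positives, and false negatives over all classes before computing precision and recall. A minimal sketch follows; the function name and the labels are hypothetical, and this is not the authors' evaluation code.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, FN over every class, then compute F1."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label and truth != label:
                fp += 1
            elif truth == label and pred != label:
                fn += 1
    if tp == 0:
        return 0.0  # no correct predictions, so precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical malware family labels for five validation samples.
y_true = [1, 2, 3, 1, 2]
y_pred = [1, 2, 1, 1, 3]
print(micro_f1(y_true, y_pred))
```

When every sample has exactly one true label and one predicted label, each misclassification counts as one false positive and one false negative, so the micro-average F1 equals plain accuracy (3 of 5 correct in this example).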

Design Optimization of Transonic Wing/Fuselage System Using Proper Orthogonal Decomposition (Proper Orthogonal Decomposition을 이용한 천음속 날개/동체 모델의 최적설계)

  • Park, Kyung-Hyun; Jun, Sang-Ook; Cho, Maeng-Hyo; Lee, Dong-Ho
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.38 no.5 / pp.414-420 / 2010
  • This paper validates the accuracy of a reduced order model (ROM) and the efficiency of design optimization using Proper Orthogonal Decomposition (POD) for a transonic wing/fuselage system. The three-dimensional Euler equations are solved to extract snapshot data from the full order aerodynamic analysis, and a set of POD basis vectors reproducing the behavior of the flow around the wing/fuselage system is then calculated from these snapshots. The reduced order model constructed through this procedure is applied to several validation cases, confirming that the ROM can predict the flow field in the space of interest. Additionally, after design optimization of the wing/fuselage system is performed with the ROM, its results are compared with those of design optimization using a response surface model (RSM). These comparisons confirm that design optimization with the ROM is more efficient than with the RSM.

Effect of Sample Preparation on Prediction of Fermentation Quality of Maize Silages by Near Infrared Reflectance Spectroscopy

  • Park, H.S.; Lee, J.K.; Fike, J.H.; Kim, D.A.; Ko, M.S.; Ha, Jong Kyu
    • Asian-Australasian Journal of Animal Sciences / v.18 no.5 / pp.643-648 / 2005
  • Near infrared reflectance spectroscopy (NIRS) is increasingly used as a rapid, accurate method of evaluating some chemical constituents in cereal grains and forages. If samples could be analyzed without drying and grinding, sample preparation time and costs could be reduced. This study was conducted to develop robust NIRS equations to predict the fermentation quality of corn (Zea mays) silage and to select acceptable sample preparation methods for predicting fermentation products in corn silage by NIRS. Prior to analysis, samples (n = 112) were oven-dried and ground (OD), frozen in liquid nitrogen and ground (LN), or left intact and fresh (IF). Samples were scanned from 400 to 2,500 nm with an NIRS 6500 monochromator. The samples were divided into calibration and validation sets. The spectral data were regressed on a range of dry matter (DM), pH and short chain organic acids using modified multivariate partial least squares (MPLS) analysis that used first and second order derivatives. All chemical analyses were conducted with fresh samples. From these treatments, calibration equations were developed successfully for concentrations of all constituents except butyric acid. Prediction accuracy, represented by standard error of prediction (SEP) and $R^2_{v}$ (variance accounted for in the validation set), was slightly better with the LN treatment ($R^2$ 0.75-0.90) than with the OD ($R^2$ 0.43-0.81) or IF ($R^2$ 0.62-0.79) treatments. Fermentation characteristics could be successfully predicted by NIRS analysis with either dry or fresh silage. Although statistical results for the OD and IF treatments were lower than those of the LN treatment, the intact fresh (IF) treatment may be acceptable when processing is costly or when component alterations are expected.
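
As an illustration of "variance accounted for in the validation set", the sketch below computes a coefficient of determination from reference laboratory values and NIRS predictions. It assumes the definition 1 - SS_res / SS_tot; NIRS software sometimes reports the squared correlation instead, and the abstract does not say which was used. The function name and the values are hypothetical.

```python
def r_squared(reference, predicted):
    """Share of the variance in the reference values explained by the predictions."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))  # unexplained
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)              # total
    return 1.0 - ss_res / ss_tot

# Hypothetical validation-set values for one constituent.
reference = [1.0, 2.0, 3.0, 4.0]   # wet-chemistry results
predicted = [1.5, 2.5, 2.5, 3.5]   # values from a calibration equation
print(r_squared(reference, predicted))
```

Computing the statistic only on samples held out from calibration is what makes it a measure of prediction accuracy and not of fit.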