• Title/Summary/Keyword: Validation data set


Algorithm for Determining Whether Work Data is Normal using Autoencoder (오토인코더를 이용한 작업 데이터 정상 여부 판단 알고리즘)

  • Kim, Dong-Hyun;Oh, Jeong Seok
    • Journal of the Korean Institute of Gas
    • /
    • v.25 no.5
    • /
    • pp.63-69
    • /
    • 2021
  • In this study, we established an algorithm that uses a threshold on the reconstruction error of an autoencoder to determine whether work in a gas facility is normal or abnormal. The algorithm trains the autoencoder only on time-series data of normal work and derives an optimized reconstruction-error threshold for normal work. We apply the algorithm to the time-series data of a new work to obtain its reconstruction error, then compare it with the reconstruction-error threshold of normal work to decide whether the work is normal or abnormal. To train and validate the algorithm, we defined work in a virtual gas facility and constructed a training data set consisting only of normal work data and a validation data set containing both normal and abnormal work data.
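The decision rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's method: it assumes a mean-plus-k·σ threshold over the normal-work reconstruction errors (the paper derives an optimized threshold instead), and it takes the reconstruction errors as given, produced by a separately trained autoencoder.

```python
import statistics

def reconstruction_threshold(normal_errors, k=3.0):
    """Derive a threshold from reconstruction errors of normal-work data.

    The mean + k*sigma rule is an assumption for illustration; the paper
    optimizes the threshold on a validation set instead.
    """
    mu = statistics.mean(normal_errors)
    sigma = statistics.stdev(normal_errors)
    return mu + k * sigma

def is_normal_work(error, threshold):
    """New work is judged normal iff its reconstruction error is small."""
    return error <= threshold

# Reconstruction errors of known-normal work (made-up numbers)
thr = reconstruction_threshold([0.10, 0.12, 0.11, 0.09, 0.13])
print(is_normal_work(0.11, thr))  # True: error in the normal range
print(is_normal_work(0.50, thr))  # False: large error -> abnormal work
```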

A Pre-processing Study to Solve the Problem of Rare Class Classification of Network Traffic Data (네트워크 트래픽 데이터의 희소 클래스 분류 문제 해결을 위한 전처리 연구)

  • Ryu, Kyung Joon;Shin, DongIl;Shin, DongKyoo;Park, JeongChan;Kim, JinGoog
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.12
    • /
    • pp.411-418
    • /
    • 2020
  • In the field of information security, an IDS (Intrusion Detection System) is normally classified into two categories: signature-based IDS and anomaly-based IDS. Many studies on anomaly-based IDS analyze network traffic data generated in cyberspace with machine learning algorithms. In this paper, we studied pre-processing methods to overcome the performance degradation caused by rare classes. We experimented with the classification performance of a machine learning algorithm by reconstructing the data set around its rare and semi-rare classes. After reconstructing the data into three different sets, wrapper and filter feature selection methods were applied in turn. Each data set was regularized with a quantile scaler. A deep neural network model was used for learning and validation. The evaluation results were compared by true positive and false negative values. We obtained improved classification performance on all three data sets.
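The quantile-scaling step can be illustrated with a minimal rank-based scaler. This is a rough stand-in, not the paper's implementation (which would typically use scikit-learn's QuantileTransformer); it assumes distinct values for simplicity.

```python
def quantile_scale(values):
    """Map each value to its empirical quantile in [0, 1].

    A minimal stand-in for the quantile scaler applied to each
    reconstructed data set; assumes all values are distinct.
    """
    order = sorted(values)
    n = len(values)
    return [order.index(v) / (n - 1) for v in values]

# Heavy-tailed feature values get squashed onto an even [0, 1] grid,
# which keeps rare extreme values from dominating a neural network.
scaled = quantile_scale([120.0, 3.0, 45.0, 7.0])
print(scaled[0], scaled[1])  # 1.0 0.0 (largest and smallest values)
```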

Accuracy evaluation of liver and tumor auto-segmentation in CT images using 2D CoordConv DeepLab V3+ model in radiotherapy

  • An, Na young;Kang, Young-nam
    • Journal of Biomedical Engineering Research
    • /
    • v.43 no.5
    • /
    • pp.341-352
    • /
    • 2022
  • Medical image segmentation is one of the most important tasks in radiation therapy. The liver is among the most difficult organs to segment because it takes various shapes and lies close to other organs, so automatic segmentation of the liver in computed tomography (CT) images is a challenging task. Tumors also have low contrast against surrounding tissue, and their shape, location, size, and number vary from patient to patient, so accurate tumor segmentation takes a long time. In this study, we propose an algorithm for automatically segmenting the liver and tumor. The liver and tumor were automatically segmented from CT images using the 2D CoordConv DeepLab V3+ model, whose CoordConv layer helps set tumor boundaries. For tumors, only cropped liver images were used to improve accuracy. Additionally, augmentation, preprocessing, the loss function, and hyperparameters were tuned to find optimal values and increase segmentation accuracy. We compared the DeepLab V3+ model with and without the CoordConv layer to determine whether it affects segmentation accuracy. The data used were the 131 liver tumor segmentation (LiTS) challenge data sets (100 training, 16 validation, and 15 test sets). The trained models were additionally tested on 15 clinical data sets from Seoul St. Mary's Hospital, and the evaluation was compared with results from previous two-dimensional deep-learning-based models. Without the CoordConv layer, Dice values reached 0.965 ± 0.01 for liver segmentation and 0.925 ± 0.04 for tumor segmentation on the LiTS data set, and 0.927 ± 0.02 and 0.903 ± 0.05 on the clinical data set. With the CoordConv layer, Dice values reached 0.989 ± 0.02 for liver segmentation and 0.937 ± 0.07 for tumor segmentation on the LiTS data set, and 0.944 ± 0.02 and 0.916 ± 0.18 on the clinical data set. The use of CoordConv layers therefore improves segmentation accuracy. The highest recently published values were 0.960 and 0.749 for liver and tumor segmentation, respectively, whereas the algorithm proposed in this study achieved 0.989 and 0.937. The proposed algorithm can play a useful role in treatment planning by improving contouring accuracy and reducing the time needed for segmentation of the liver and tumor. Accurate identification of liver anatomy in medical imaging applications such as surgical planning, as well as in radiotherapy, can also aid clinical evaluation of the risks and benefits of liver intervention.
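The CoordConv idea can be sketched independently of any deep-learning framework: before a convolution, two extra channels holding normalized pixel coordinates are concatenated to the input, letting the network reason about absolute position when setting organ boundaries. A minimal pure-Python sketch of those coordinate channels (the paper itself uses a full 2D CoordConv DeepLab V3+ model):

```python
def coordconv_channels(h, w):
    """Build the two coordinate channels a CoordConv layer concatenates.

    Each channel is h x w; row/column indices are normalized to [-1, 1],
    the convention from the original CoordConv formulation.
    """
    ys = [[2 * i / (h - 1) - 1 for _ in range(w)] for i in range(h)]
    xs = [[2 * j / (w - 1) - 1 for j in range(w)] for _ in range(h)]
    return ys, xs

# For a 3x3 feature map: corners land on -1/+1, the center on 0.
ys, xs = coordconv_channels(3, 3)
print(ys[0][0], ys[2][0])  # -1.0 1.0 (top row vs bottom row)
print(xs[1][1])            # 0.0 (center column)
```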

Power Consumption Forecasting Scheme for Educational Institutions Based on Analysis of Similar Time Series Data (유사 시계열 데이터 분석에 기반을 둔 교육기관의 전력 사용량 예측 기법)

  • Moon, Jihoon;Park, Jinwoong;Han, Sanghoon;Hwang, Eenjun
    • Journal of KIISE
    • /
    • v.44 no.9
    • /
    • pp.954-965
    • /
    • 2017
  • A stable power supply is very important for the maintenance and operation of power infrastructure, so accurate power consumption forecasting is needed. In particular, a university campus is among the institutions with the highest power consumption and tends to show wide variation in electrical load depending on time and environment. For this reason, a model that can accurately predict power consumption is required for effective operation of the power system. The disadvantage of existing time-series forecasting techniques is that prediction performance degrades greatly as the gap between the training period and the prediction time grows, widening the prediction interval. In this paper, we first classify power data into groups with similar time-series patterns, considering the date, day of the week, holidays, and semester. Next, an ARIMA model is constructed for each classified data set, and a daily power consumption forecasting method for the university campus is proposed via time-series cross-validation over the forecast horizon. To evaluate forecasting accuracy, we confirmed the validity of the proposed method by applying performance indicators.
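The pre-classification step can be sketched as a day-type labelling function, after which each group would get its own ARIMA model. The group labels, the semester months, and the holiday handling below are assumptions for illustration, not the paper's exact scheme:

```python
import datetime

def day_group(date, holidays=(), semester_months=(3, 4, 5, 6, 9, 10, 11, 12)):
    """Assign a date to a similar-load-pattern group.

    Groups combine semester/vacation with weekday/weekend, with holidays
    split out separately -- an assumed labelling mirroring the paper's
    date / day-of-week / holiday / semester criteria.
    """
    if date in holidays:
        return "holiday"
    term = "semester" if date.month in semester_months else "vacation"
    kind = "weekend" if date.weekday() >= 5 else "weekday"
    return f"{term}-{kind}"

print(day_group(datetime.date(2017, 9, 4)))  # semester-weekday (a Monday)
print(day_group(datetime.date(2017, 8, 5)))  # vacation-weekend (a Saturday)
```

Training one ARIMA model per group keeps each series homogeneous, which is what lets the forecast stay accurate further from the training window.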

Analysis and Cut-off Adjustment of Dried Blood Spot 17alpha-hydroxyprogesterone Concentration by Birth Weight (신생아의 출생 체중에 따른 혈액 여과지 17alpha-hydroxyprogesterone의 농도 분석 및 판정 기준 조정)

  • Park, Seungman;Kwon, Aerin;Yang, Songhyeon;Park, Euna;Choi, Jaehwang;Hwang, Mijung;Nam, Hyeongyeong;Lee, Eunhee
    • Journal of The Korean Society of Inherited Metabolic disease
    • /
    • v.14 no.2
    • /
    • pp.150-155
    • /
    • 2014
  • The measurement of 17α-hydroxyprogesterone (17α-OHP) in a dried blood spot on filter paper is important for screening for congenital adrenal hyperplasia (CAH). Since high levels of 17α-OHP are frequently observed in premature infants without congenital adrenal hyperplasia, we evaluated cut-offs based on birth weight and performed validation. Birth weight and 17α-OHP concentration data of 292,204 newborn screening subjects at Greencross Laboratories were analyzed. The cut-off values based on birth weight were newly evaluated and validated against the original data. The mean 17α-OHP concentration was 7.25 ng/mL in the very low birth weight (VLBW) group, 4.02 ng/mL in the low birth weight (LBW) group, 2.53 ng/mL in the normal birth weight (NBW) group, and 2.24 ng/mL in the heavy birth weight (HBW) group. The cut-offs for CAH were set as follows: 21.12 ng/mL for the VLBW and LBW groups and 11.14 ng/mL for the NBW and HBW groups. When the new cut-offs were applied to the original data, positive rates in the VLBW and LBW groups decreased and positive rates in the NBW and HBW groups increased. Cut-offs based on birth weight should be used in screening for CAH. We believe that our new cut-offs reduce the false positive and false negative rates, and that our experience with cut-off setup and validation will be helpful for other laboratories performing newborn screening tests.
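The resulting screening rule reduces to a weight-stratified threshold check. A sketch using the cut-offs reported above; the 2,500 g boundary between the (V)LBW and NBW/HBW strata is the conventional low-birth-weight definition and an assumption here, since the abstract does not restate the exact group boundaries:

```python
def cah_screen_positive(birth_weight_g, ohp_ng_ml):
    """Flag a dried-blood-spot 17a-OHP result as screen-positive for CAH.

    Cut-offs from the study: 21.12 ng/mL for VLBW/LBW infants,
    11.14 ng/mL for NBW/HBW infants. The 2500 g stratum boundary is an
    assumed conventional value, not taken from the abstract.
    """
    cutoff = 21.12 if birth_weight_g < 2500 else 11.14
    return ohp_ng_ml >= cutoff

print(cah_screen_positive(1200, 15.0))  # False: high 17a-OHP tolerated in LBW
print(cah_screen_positive(3400, 15.0))  # True: same level exceeds NBW cut-off
```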

Prediction and Verification of Distribution Potential of the Debris Landforms in the Southwest Region of the Korean Peninsula (한반도 서남부 암설사면지형의 분포가능성 예측 및 검증)

  • Lee, Seong-Ho;Jang, Dong-Ho
    • Journal of The Geomorphological Association of Korea
    • /
    • v.27 no.2
    • /
    • pp.1-17
    • /
    • 2020
  • This study evaluated a debris landform distribution potential map of the southwest region of the Korean peninsula. A GIS spatial integration technique and a logistic regression method were used to produce the distribution potential map. Seven topographic and environmental factors were considered for analysis, and 28 different data-set combinations were used to obtain the most effective results. In the accuracy assessment, the extracted distribution potential results were evaluated with a cross-validation module. Block stream showed the highest accuracy in combination No. 6, and the DEM (digital elevation model) and TWI (topographic wetness index) had relatively high influence on the production of the block stream distribution potential map. Talus showed the highest accuracy in combination No. 13, and slope, TWI, and geology had relatively high influence on the production of the talus distribution potential map. In addition, fieldwork confirmed the accuracy of the input data used in this study; the slope and geology were similar, and these input data were judged to be relatively accurate. In the case of angularity, the block streams were composed of sub-rounded and sub-angular clasts, while talus showed differences according to terrain formation. Although rebound strain measurements with a Schmidt hammer did not show differences across topographic conditions, the rebound results are judged to reflect the underlying geological setting.
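The logistic-regression side of such a GIS integration can be sketched as fitting landform presence/absence against factor values sampled per grid cell. A minimal gradient-descent version with made-up single-factor data for illustration (a real workflow would use all seven factors and a GIS raster stack):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Plain stochastic-gradient logistic regression.

    X holds per-cell factor values (e.g. slope, TWI); y is 1 where the
    landform is present. A minimal stand-in for the study's method.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

X = [[0.0], [1.0], [2.0], [3.0]]  # a single topographic factor (made up)
y = [0, 0, 1, 1]                  # landform absent / present
w, b = fit_logistic(X, y)
print(sigmoid(w[0] * 0.0 + b) < 0.5, sigmoid(w[0] * 3.0 + b) > 0.5)  # True True
```

Evaluating the fitted probability at every grid cell yields the distribution potential map.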

Hydrologic Calibration of HSPF Model using Parameter Estimation (PEST) Program at Imha Watershed (PEST를 이용한 임하호유역 HSPF 수문 보정)

  • Jeon, Ji-Hong;Kim, Tae-Il;Choi, Donghyuk;Lim, Kyung-Jae;Kim, Tae-Dong
    • Journal of Korean Society on Water Environment
    • /
    • v.26 no.5
    • /
    • pp.802-809
    • /
    • 2010
  • An automatic calibration tool for the Hydrological Simulation Program-Fortran (HSPF), the Parameter Estimation (PEST) program, was applied to the Imha lake watershed to obtain optimal hydrological parameters for HSPF. HSPF parameters were calibrated with PEST over 2004-2008, and validation was carried out to examine the model's ability using another data set covering 1999-2003. The calibrated HSPF parameters tended to minimize water loss to the soil layer by infiltration and deep percolation and to the atmosphere by evapotranspiration, and to maximize the runoff rate. The calibration results indicated that the PEST program could calibrate the hydrological parameters of HSPF, showing Nash-Sutcliffe coefficients (NS) of 0.83 and 0.97 for daily and monthly stream flow and a relative error of -3% for yearly stream flow. The validation results also showed high model efficiency, with NS values of 0.88 and 0.95 and a relative error of -10% for daily, monthly, and yearly stream flow, respectively. These statistics indicate 'very good' agreement between observed and simulated values for both calibration and validation. Overall, the PEST program was useful for automatic calibration of HSPF, reduced the time and effort needed for model calibration, and improved model setup.
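The Nash-Sutcliffe coefficient used to judge the calibration is simple to compute; a minimal version (how PEST internally weights its objective is not shown here):

```python
def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations.

    1.0 is a perfect fit; values near 0 mean the model is no better
    than predicting the observed mean.
    """
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    var = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / var

# Made-up daily stream flows (m^3/s): observed vs simulated
print(round(nash_sutcliffe([3.0, 4.0, 5.0], [2.9, 4.2, 5.1]), 2))  # 0.97
print(nash_sutcliffe([3.0, 4.0, 5.0], [3.0, 4.0, 5.0]))            # 1.0
```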

Validation of a CFD Analysis Model for the Calculation of CANDU6 Moderator Temperature Distribution (CANDU6 감속재 온도분포 계산을 위한 CFD 해석모델의 타당성 검토)

  • Yoon, Churl;Rhee, Bo-Wook;Min, Byung-Joo
    • Proceedings of the KSME Conference
    • /
    • 2001.11b
    • /
    • pp.499-504
    • /
    • 2001
  • A validation of a 3D CFD model for predicting local subcooling of the moderator in the vicinity of calandria tubes in a CANDU reactor is performed. The small-scale moderator experiments performed at the Sheridan Park Experimental Laboratory (SPEL) in Ontario, Canada [1] are used for the validation. A comparison is also made between previous CFD analyses based on 2DMOTH and PHOENICS and the current model analysis for the same SPEL experiment. For the current model, a grid structure for the same geometry as the experimental test section is generated, and the momentum, heat, and continuity equations are solved by CFX-4.3, a CFD code developed by AEA Technology. The matrix of calandria tubes is simplified by a porous media approach. The standard k-ε turbulence model with logarithmic wall treatment and the SIMPLEC algorithm on a body-fitted grid are used, and buoyancy effects are accounted for by the Boussinesq approximation. For the test conditions simulated in this study, the identified flow pattern is buoyancy-dominated, generated by the interaction between the dominant buoyancy force from heating and the inertial momentum forces of the inlet jets. As a result, the current CFD moderator analysis model predicts the moderator temperature reasonably well, and the maximum error against the experimental data is kept under 2.0°C over the whole domain. The simulated velocity field matches the visualization of the SPEL experiments quite well.
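Under the Boussinesq approximation mentioned above, density variations enter only through the gravity term of the momentum equation. A sketch of that buoyancy source term per unit volume; the symbols (ρ_ref, β, T_ref) and the water-like property values in the example are illustrative assumptions, not the study's inputs:

```python
def boussinesq_buoyancy(rho_ref, beta, temp, temp_ref, g=9.81):
    """Upward buoyancy force per unit volume: rho_ref * g * beta * (T - T_ref).

    Everywhere else the fluid density is held constant at rho_ref; only
    this term lets heated moderator rise against the inlet jets, giving
    the buoyancy-dominated flow pattern described above.
    """
    return rho_ref * g * beta * (temp - temp_ref)

# Region heated 20 degC above reference, water-like properties (assumed)
print(round(boussinesq_buoyancy(1000.0, 2.0e-4, 320.0, 300.0), 2))  # 39.24 N/m^3
```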


Use of Near-Infrared Spectroscopy for Estimating Lignan Glucosides Contents in Intact Sesame Seeds

  • Kim, Kwan-Su;Park, Si-Hyung;Shim, Kang-Bo;Ryu, Su-Noh
    • Journal of Crop Science and Biotechnology
    • /
    • v.10 no.3
    • /
    • pp.185-192
    • /
    • 2007
  • Near-infrared spectroscopy (NIRS) was used to develop a rapid and efficient method to determine lignan glucosides in intact seeds of sesame (Sesamum indicum L.) germplasm accessions in Korea. A total of 93 samples (about 2 g of intact seeds) were scanned in the reflectance mode of a scanning monochromator, and the reference values for lignan glucoside contents were measured by high-performance liquid chromatography. Calibration equations for sesaminol triglucoside, sesaminol (1→2) diglucoside, sesamolinol diglucoside, sesaminol (1→6) diglucoside, and the total amount of lignan glucosides were developed using modified partial least squares regression with internal cross-validation (n=63), which exhibited low SECV (standard error of cross-validation), high R² (coefficient of determination in calibration), and high 1-VR (one minus the ratio of unexplained variance to total variance) values. Prediction of an external validation set (n=30) showed a significant correlation between reference values and NIRS-estimated values based on the SEP (standard error of prediction), r² (coefficient of determination in prediction), and the ratio of the standard deviation (SD) of the reference data to the SEP, the factors used to evaluate the accuracy of the equations. The models for each glucoside content had relatively high SD/SEP(C) and r² values (more than 2.0 and 0.80, respectively), characterizing those equations as carrying good quantitative information, while sesaminol (1→2) diglucoside, present only in minor quantity, had the lowest SD/SEP(C) and r² values (1.7 and 0.74, respectively), indicating a poor correlation between reference values and NIRS-estimated values. The results indicated that NIRS could be used to rapidly determine lignan glucoside content in sesame seeds in breeding programs for high-quality sesame varieties.
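The accuracy statistics used in this and the following NIRS study (SEP and the SD/SEP ratio) are easy to compute from an external validation set. A sketch assuming the bias-corrected SEP convention, which is one common definition of SEP(C); the numbers in the example are made up:

```python
import math

def sep_and_sd_ratio(reference, predicted):
    """Return (SEP, SD/SEP) for an external validation set.

    SEP is computed bias-corrected (an assumed convention matching the
    SEP(C) notation); SD/SEP > 2-3 is usually read as a calibration
    carrying good quantitative information.
    """
    n = len(reference)
    bias = sum(p - r for r, p in zip(reference, predicted)) / n
    sep = math.sqrt(
        sum((p - r - bias) ** 2 for r, p in zip(reference, predicted)) / (n - 1)
    )
    mean_r = sum(reference) / n
    sd = math.sqrt(sum((r - mean_r) ** 2 for r in reference) / (n - 1))
    return sep, sd / sep

# Made-up reference (HPLC) vs NIRS-predicted contents
sep, ratio = sep_and_sd_ratio([1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.1])
print(round(sep, 3), round(ratio, 2))  # 0.1 12.91
```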


Use of Near-Infrared Spectroscopy for Estimating Fatty Acid Composition in Intact Seeds of Rapeseed

  • Kim, Kwan-Su;Park, Si-Hyung;Choung, Myoung-Gun;Jang, Young-Seok
    • Journal of Crop Science and Biotechnology
    • /
    • v.10 no.1
    • /
    • pp.13-18
    • /
    • 2007
  • Near-infrared spectroscopy (NIRS) was used as a rapid and nondestructive method to determine the fatty acid composition in intact seed samples of rapeseed (Brassica napus L.). A total of 349 samples (about 2 g of intact seeds) were scanned in the reflectance mode of a scanning monochromator, and the reference values for fatty acid composition were measured by gas-liquid chromatography. Calibration equations for individual fatty acids were developed using modified partial least squares regression with internal cross-validation (n=249). The equations had low SECV (standard error of cross-validation) and high R² (coefficient of determination in calibration) values (>0.8), except for palmitic and eicosenoic acid. Prediction of an external validation set (n=100) showed significant correlation between reference values and NIRS-estimated values based on the SEP (standard error of prediction), r² (coefficient of determination in prediction), and the ratio of the standard deviation (SD) of the reference data to the SEP. The models developed in this study had relatively high SD/SEP(C) and r² values (>3.0 and 0.9, respectively) for oleic, linoleic, and erucic acid, characterizing those equations as carrying good quantitative information. The results indicated that NIRS could be used to rapidly determine the fatty acid composition of rapeseed seeds in breeding programs for high-quality rapeseed oil.
