• Title/Abstract/Keywords: Reference dataset

Search results: 120 items, processing time 0.021 seconds

A Study on Effective Adversarial Attack Creation for Robustness Improvement of AI Models

  • 정시온;한태현;임승범;이태진
    • Journal of Internet Computing and Services (인터넷정보학회논문지)
    • /
    • Vol. 24, No. 4
    • /
    • pp.25-36
    • /
    • 2023
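  • Today, AI (Artificial Intelligence) technology is being adopted in many fields, including security, and its development is accelerating. Alongside this progress, however, attack techniques that subtly evade malicious-behavior detection are also advancing. Among them is the adversarial attack, which induces misclassification and reduced confidence by making small adjustments to the input during an AI model's classification. Future attacks are expected not to be created from scratch but, like adversarial attacks, to evade an AI model's detection by slightly modifying existing attacks. Robust models that can cope with such malware variants are therefore needed. This paper proposes two efficient adversarial attack generation techniques for improving the robustness of AI models: an XAI based attack that uses XAI techniques, and a Reference based attack that searches the model's decision boundary. To verify performance, a classification model was built on a malware dataset and the proposed techniques were compared with the PGD attack, an existing adversarial attack. In generation speed, the XAI based attack and the Reference based attack took 0.35 seconds and 0.47 seconds, respectively, compared with 20 minutes for the PGD attack. In particular, the Reference based attack achieved a generation rate of 97.7%, higher than the 75.5% of the PGD attack. The proposed techniques thus enable more efficient adversarial attacks and are expected to contribute to future research on building robust AI models.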
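For readers unfamiliar with the PGD baseline named in this abstract, the following is a minimal sketch of an L-infinity projected gradient descent attack against a toy logistic-regression classifier. It is not the authors' code or setup: the model, the weights `w` and `b`, and the parameters `eps`, `alpha`, and `steps` are all illustrative assumptions, and the proposed XAI based and Reference based attacks are not reproduced here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    # Returns -1, 0, or 1.
    return (v > 0) - (v < 0)


def logit(x, w, b):
    # Linear score of the toy model: w . x + b
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def pgd_attack(x, y, w, b, eps=0.3, alpha=0.1, steps=10):
    """L-infinity PGD on a logistic-regression model (hypothetical toy model).

    x: input features, y: true label in {0, 1}.
    eps: maximum per-feature perturbation, alpha: step size.
    """
    x_adv = list(x)
    for _ in range(steps):
        p = sigmoid(logit(x_adv, w, b))
        # Gradient of the cross-entropy loss with respect to the input.
        grad = [(p - y) * wi for wi in w]
        # Ascend the loss with a signed gradient step.
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into the eps-ball around the original input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


if __name__ == "__main__":
    x, w, b = [1.0, -1.0], [2.0, -1.0], 0.0
    adv = pgd_attack(x, 1, w, b, eps=2.0, alpha=0.5, steps=10)
    print("original score:", logit(x, w, b))
    print("adversarial score:", logit(adv, w, b))
```

The iterative loop is what makes PGD comparatively slow on large models, which is consistent with the speed gap the abstract reports, although the timings there depend on the authors' own model and data.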

Standardized Breast Cancer Mortality Rate Compared to the General Female Population of Iran

  • Haghighat, S.;Akbari, M.E.;Ghaffari, S.;Yavari, P.
    • Asian Pacific Journal of Cancer Prevention
    • /
    • Vol. 13, No. 11
    • /
    • pp.5525-5528
    • /
    • 2012
  • Introduction: Breast cancer is the most common cancer in women. Improvements in early diagnosis modalities have led to longer survival. This study aimed to determine the 5-, 10- and 15-year mortality rates of breast cancer patients compared with the general female population. Materials and Methods: The follow-up data of a cohort of 615 breast cancer patients referred to the Iranian Breast Cancer Research Center (BCRC) from 1986 to 1996 were used as the reference breast cancer dataset. The dataset was divided into 5-year age groups, and the 5-, 10- and 15-year probability of death was estimated for each group. The annual mortality rate of Iranian women was obtained from the Death Registry system. Standardized mortality ratios (SMRs) of breast cancer patients were calculated as the ratio of the mortality rate in breast cancer patients to that in the general female population. Results: The mean age of breast cancer patients at diagnosis was 45.9 (±10.5) years, ranging from 24 to 74. A total of 73, 32 and 2 deaths were recorded at 5, 10 and 15 years after diagnosis, respectively. The SMRs for breast cancer patients at 5, 10 and 15 years after diagnosis were 6.74 (95% CI, 5.5-8.2), 6.55 (95% CI, 5-8.1) and 1.26 (95% CI, 0.65-2.9), respectively. Conclusion: The observed mortality rate of breast cancer patients 15 years after diagnosis was very similar to the expected rate in the general female population. This finding would be useful for clinicians and health policy makers in adopting strategies to improve breast cancer survival. Longer follow-up with a larger sample size and a pooled analysis of survival rates from different centres may shed more light on mortality patterns of breast cancer.
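The SMR calculation described in this abstract can be sketched as observed deaths divided by the deaths expected if each age group had died at the reference population's rate. The sketch below is an illustration only: the function names and the example numbers are hypothetical, and the confidence interval uses a common large-sample log approximation (standard error of ln SMR taken as 1/sqrt(observed)), which may differ from the method the authors used.

```python
import math


def standardized_mortality_ratio(observed_deaths, group_sizes, reference_rates):
    """Return (SMR, expected deaths).

    group_sizes: number of patients in each age group (hypothetical input).
    reference_rates: death probability of the general population per group
    over the same follow-up period.
    """
    expected = sum(n * r for n, r in zip(group_sizes, reference_rates))
    return observed_deaths / expected, expected


def smr_confidence_interval(observed_deaths, expected, z=1.96):
    """Approximate CI, assuming SE(ln SMR) is about 1 / sqrt(observed)."""
    ratio = observed_deaths / expected
    half_width = z / math.sqrt(observed_deaths)
    return ratio * math.exp(-half_width), ratio * math.exp(half_width)


if __name__ == "__main__":
    # Made-up numbers, not data from the study.
    smr, expected = standardized_mortality_ratio(10, [100, 200], [0.01, 0.02])
    low, high = smr_confidence_interval(10, expected)
    print(f"SMR = {smr:.2f}, 95% CI about ({low:.2f}, {high:.2f})")
```

An SMR whose interval includes 1, as reported for the 15-year mark in the abstract, is what supports the conclusion that mortality had become similar to the general population.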

FAFS: A Fuzzy Association Feature Selection Method for Network Malicious Traffic Detection

  • Feng, Yongxin;Kang, Yingyun;Zhang, Hao;Zhang, Wenbo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 14, No. 1
    • /
    • pp.240-259
    • /
    • 2020
  • Analyzing network traffic is the basis for dealing with network security issues. Most network security systems depend on feature selection over network traffic data, and the ability to detect malicious traffic can be improved by choosing the right feature selection method. This paper proposes the Fuzzy Association Feature Selection (FAFS) method for network malicious traffic detection. Association rules, which reflect the relationships among different characteristic attributes of network traffic data, are mined by association analysis. The membership values of the association rules are obtained by fuzzy reasoning. The data features with the highest correlation intensity in the network datasets are identified by comparing the membership values of the association rules. The FAFS method reduces the dimension of the data features and improves the detection ability of malicious traffic detection algorithms. To verify the effect of feature selection by FAFS, the method is used to select data features from different datasets. The K-Nearest Neighbor, C4.5 Decision Tree, and Naïve Bayes algorithms are then tested on these datasets. FAFS is also compared with classical feature selection methods. The analysis of the experimental results shows that FAFS can significantly improve the precision and recall of malicious traffic detection, which provides a valuable reference for building network security systems.

Detection and Correction of Noisy Pixels Embedded in NDVI Time Series Based on the Spatio-temporal Continuity

  • 박주희;조아라;강전호;서명석
    • Atmosphere (대기)
    • /
    • Vol. 21, No. 4
    • /
    • pp.337-347
    • /
    • 2011
  • In this paper, we developed a method for detecting and correcting noisy pixels embedded in time series of normalized difference vegetation index (NDVI) data, based on the spatio-temporal continuity of vegetation conditions. To apply the method, the 25-year (1982-2006) GIMMS (Global Inventory Modeling and Mapping Study) NDVI dataset over the Korean peninsula was used. The spatial resolution and temporal frequency of this dataset are 8 × 8 km² and 15 days, respectively. A land cover map of East Asia was also used. Noisy pixels are detected by a temporal continuity check against reference values and dynamic threshold values that depend on season and location. In general, the number of noisy pixels is larger in summer than in other seasons. The detected noisy pixels are corrected iteratively until none remain. First, a noisy pixel is replaced by the weighted arithmetic mean of the two adjacent NDVI values when both are normal. The remaining noisy pixels are then corrected by the distance-weighted average of the NDVI of pixels with the same land cover. After correction, the NDVI values increase by 5% and their variances decrease by 50%. Compared with other correction methods, this method gives better results, especially when noisy pixels occur more than twice in succession and the temporal change rates of NDVI are very high. This means that the correction method developed in this study is superior in reconstructing the maximum NDVI and the NDVI at the start and end of the growing season.
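The first correction step described in this abstract (flag a value that breaks temporal continuity, then replace it with a weighted mean of its two temporal neighbors when both are normal) can be sketched as follows. This is a simplified illustration under stated assumptions: a pixel is flagged when it falls more than a threshold below a reference value, the weights default to equal, and the function names are hypothetical. The second step, the distance-weighted average over pixels of the same land cover, is not covered.

```python
def detect_noisy(series, reference, thresholds):
    """Flag time steps whose NDVI drops more than the threshold below the reference.

    thresholds is given per time step so it can vary by season (assumption).
    """
    return [(r - v) > t for v, r, t in zip(series, reference, thresholds)]


def correct_by_temporal_neighbors(series, noisy, w_prev=0.5, w_next=0.5):
    """Replace a noisy value by the weighted mean of its two neighbors.

    Only applied when both neighbors are normal. Returns the corrected
    series and the flags of values that are still noisy.
    """
    corrected = list(series)
    remaining = list(noisy)
    for i in range(1, len(series) - 1):
        if noisy[i] and not noisy[i - 1] and not noisy[i + 1]:
            total = w_prev + w_next
            corrected[i] = (w_prev * series[i - 1] + w_next * series[i + 1]) / total
            remaining[i] = False
    return corrected, remaining


if __name__ == "__main__":
    # Made-up NDVI values for one pixel, not data from the study.
    ndvi = [0.50, 0.10, 0.70, 0.20, 0.15, 0.60]
    ref = [0.50, 0.60, 0.70, 0.70, 0.65, 0.60]
    flags = detect_noisy(ndvi, ref, [0.2] * len(ndvi))
    fixed, left = correct_by_temporal_neighbors(ndvi, flags)
    print(fixed, left)
```

In the made-up example, the two consecutive noisy values are left flagged after this step, which is the case the abstract hands over to the spatial correction.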

Plants Disease Phenotyping using Quinary Patterns as Texture Descriptor

  • Ahmad, Wakeel;Shah, S.M. Adnan;Irtaza, Aun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 14, No. 8
    • /
    • pp.3312-3327
    • /
    • 2020
  • Plant diseases are a significant yield and quality constraint for farmers around the world because of their severe impact on agricultural productivity. Such losses can substantially affect the economy, reducing farmers' income and raising prices for consumers. They may also cause severe food shortages, leading to hunger and starvation, especially in less-developed countries where access to disease prevention methods is limited. This research investigates Directional Local Quinary Patterns (DLQP) as a feature descriptor for plant leaf disease detection, with a Support Vector Machine (SVM) as the classifier. This is the first use of DLQP as a feature descriptor for disease detection in horticulture. DLQP provides directional edge information by relating the reference pixel to its neighboring pixels: their grey-level difference is coded as a quinary value (-2, -1, 0, 1, 2) in the 0°, 45°, 90°, and 135° directions of a selected window of the plant leaf image. To assess the robustness of DLQP as a texture descriptor, we used the research-oriented Plant Village dataset: Tomato plant (3,900 leaf images) comprising 6 diseased classes, and Potato plant (1,526 leaf images) and Apple plant (2,600 leaf images) comprising 3 diseased classes. Accuracies of 95.6%, 96.2% and 97.8% were achieved for these crops, respectively, which are higher than those obtained on the same dataset with other standard feature descriptors such as Local Binary Pattern (LBP) and Local Ternary Patterns (LTP). The effectiveness of the proposed method is further demonstrated by comparing it with existing algorithms for plant disease phenotyping.
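The quinary coding step mentioned in this abstract can be illustrated with a small sketch: the grey-level difference between a neighbor and the reference pixel is mapped to one of five values using two thresholds, once per direction. The thresholds `t1` and `t2`, the choice of a single neighbor per direction, and the function names are assumptions made for illustration. The abstract does not give the exact DLQP formulation, so this should not be read as the authors' descriptor.

```python
def quinary_code(diff, t1, t2):
    """Map a grey-level difference to {-2, -1, 0, 1, 2} with thresholds t1 < t2."""
    if diff >= t2:
        return 2
    if diff >= t1:
        return 1
    if diff > -t1:
        return 0
    if diff > -t2:
        return -1
    return -2


# (row offset, column offset) of the neighbor in each direction,
# with row indices increasing downward as in image arrays.
DIRECTION_OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def directional_codes(image, row, col, t1=2, t2=5):
    """Quinary code of the neighbor in each of the four directions."""
    center = image[row][col]
    return {
        angle: quinary_code(image[row + dr][col + dc] - center, t1, t2)
        for angle, (dr, dc) in DIRECTION_OFFSETS.items()
    }


if __name__ == "__main__":
    # Made-up 3 x 3 grey-level window.
    window = [
        [46, 50, 57],
        [0, 50, 53],
        [0, 0, 0],
    ]
    print(directional_codes(window, 1, 1))
```

A full descriptor would typically collect such codes over every pixel of the window and build histograms as the feature vector for the SVM, but those details are left out here.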

Experimental Analysis of Recent Works on the Overlap Phase of De Novo Sequence Assembly

  • 임지혁;김선;박근수
    • Journal of KIISE (정보과학회 논문지)
    • /
    • Vol. 45, No. 3
    • /
    • pp.200-210
    • /
    • 2018
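  • Given multiple DNA read sequences, de novo sequence assembly reconstructs a single sequence without a reference sequence. To do so, it requires an overlap phase that computes all overlaps between reads. Because the overlap phase is the most expensive part of the whole computation, it determines the computational performance of the assembly. Many studies on the overlap phase have been published in various fields; the three most recent are the Readjoiner, SOF, and Lim-Park algorithms. Recent major advances in sequencing technology have made it possible to produce DNA read datasets in bulk at lower cost than before, and several platforms for generating such datasets have been developed. Because the statistical characteristics of the datasets differ from platform to platform, performance evaluation of the overlap phase should cover datasets with diverse statistical characteristics. This paper compares and analyzes the performance of the three algorithms above using DNA read datasets with various statistical characteristics.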
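To make concrete what the overlap phase computes, here is a deliberately naive sketch that finds, for every ordered pair of reads, the longest suffix of one read that equals a prefix of the other. It is a quadratic baseline written for illustration, not Readjoiner, SOF, or the Lim-Park algorithm, and the parameter `min_len` (assumed to be at least 1) and the example reads are hypothetical.

```python
def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len exists (min_len >= 1).
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def all_overlaps(reads, min_len=3):
    """Map (i, j) to the overlap length between read i and read j."""
    overlaps = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i == j:
                continue
            k = suffix_prefix_overlap(a, b, min_len)
            if k:
                overlaps[(i, j)] = k
    return overlaps


if __name__ == "__main__":
    # Made-up reads.
    reads = ["ACGTAC", "TACGGA", "GGATTT"]
    print(all_overlaps(reads))
```

The cost of comparing every pair of reads in this way is what makes the overlap phase the bottleneck the abstract describes, and why specialized algorithms are compared on datasets with different characteristics.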

External Validation of a Gastric Cancer Nomogram Derived from a Large-volume Center Using Dataset from a Medium-volume Center

  • Kim, Pyeong Su;Lee, Kyung-Muk;Han, Dong-Seok;Yoo, Moon-Won;Han, Hye Seung;Yang, Han-Kwang;Bang, Ho Yoon
    • Journal of Gastric Cancer
    • /
    • Vol. 17, No. 3
    • /
    • pp.204-211
    • /
    • 2017
  • Purpose: Recently, a nomogram predicting overall survival after gastric resection was developed and externally validated in Korea and Japan. However, this gastric cancer nomogram was derived from large-volume centers, and its applicability in smaller centers must be proven. The purpose of this study is to externally validate the gastric cancer nomogram using a dataset from a medium-volume center in Korea. Materials and Methods: We retrospectively analyzed 610 patients who underwent radical gastrectomy for gastric cancer from August 1, 2005 to December 31, 2011. Age, sex, number of metastatic lymph nodes (LNs), number of examined LNs, depth of invasion, and location of the tumor were investigated as variables for validation of the nomogram. Both discrimination and calibration of the nomogram were evaluated. Results: Discrimination was evaluated using Harrell's C-index, which was 0.83, indicating that the discrimination of the gastric cancer nomogram was appropriate. Regarding calibration, the 95% confidence interval of predicted survival appeared to be on the ideal reference line except in the poorest survival group. However, we observed a tendency for actual survival to be consistently higher than predicted survival in this cohort. Conclusions: Although the discrimination power was good, actual survival was slightly higher than that predicted by the nomogram. This might be explained by the longer life span of the recent patient cohort owing to advances in adjuvant chemotherapy and improved nutritional status. Future gastric cancer nomograms should account for the increase in life span over time.
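Harrell's C-index, used in this abstract to assess discrimination, is the fraction of comparable patient pairs in which the patient who died earlier was also assigned the higher predicted risk. The sketch below is a minimal illustration on made-up data. It treats a pair as comparable only when the patient with the shorter time had an observed event, counts tied risk scores as half concordant, and ignores tied survival times, which is a simplification. The function name is hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Concordance index for right-censored survival data.

    times: follow-up times, events: 1 if death observed, 0 if censored,
    risk_scores: higher score means higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Made-up data, not from the study.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.4, 0.5, 0.1]
    print(harrell_c_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.83 is judged.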

A Bibliometric Approach for Department-Level Disciplinary Analysis and Science Mapping of Research Output Using Multiple Classification Schemes

  • Gautam, Pitambar
    • Journal of Contemporary Eastern Asia
    • /
    • Vol. 18, No. 1
    • /
    • pp.7-29
    • /
    • 2019
  • This study describes an approach for comparative bibliometric analysis of scientific publications related to (i) individual or several departments comprising a university, and (ii) broader integrated subject areas using multiple disciplinary schemes. It uses a custom dataset of scientific publications (ca. 15,000 articles and reviews, published during 2009-2013, and recorded in the Web of Science Core Collection) with author affiliations to the research departments, dedicated to science, technology, engineering, mathematics, and medicine (STEMM), of a comprehensive university. The dataset was first subjected to department-level and discipline-level analyses using the newly available KAKEN-L3 classification (based on the MEXT/JSPS Grants-in-Aid system), hierarchical clustering and correspondence analysis to decipher the major departmental and disciplinary clusters, and visualization of the department-discipline relationships using two-dimensional stacked bar diagrams. The next step involved the creation of subsets covering integrated subject areas and a comparative analysis of departmental contributions to a specific area (medical, health and life science) using several disciplinary schemes: Essential Science Indicators (ESI) 22 research fields, SCOPUS 27 subject areas, OECD Frascati 38 subordinate research fields, and KAKEN-L3 66 subject categories. To illustrate the effective use of science mapping techniques, the same subset for the medical, health and life science area was subjected to network analyses of co-occurrences of keywords, bibliographic coupling of the publication sources, and co-citation of sources in the reference lists. The science mapping approach demonstrates ways to extract information on the prolific research themes, the journals most frequently used for publishing research findings, and the knowledge base underlying the research activities covered by the publications concerned.
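Bibliographic coupling, one of the network analyses named in this abstract, links two items by the number of references they share. The sketch below computes that strength for every pair of items. It is an illustration with hypothetical identifiers, shown at the level of individual publications for brevity, whereas the abstract applies coupling to publication sources. It is not the tooling used in the study.

```python
from itertools import combinations


def bibliographic_coupling(reference_lists):
    """Map each pair of items to the number of references they share.

    reference_lists: dict from item id to an iterable of cited reference ids.
    Pairs with no shared reference are omitted.
    """
    strengths = {}
    for (a, refs_a), (b, refs_b) in combinations(sorted(reference_lists.items()), 2):
        shared = len(set(refs_a) & set(refs_b))
        if shared:
            strengths[(a, b)] = shared
    return strengths


if __name__ == "__main__":
    # Made-up reference lists.
    papers = {
        "P1": ["r1", "r2", "r3"],
        "P2": ["r2", "r3"],
        "P3": ["r9"],
    }
    print(bibliographic_coupling(papers))
```

The resulting pair weights can serve as edge weights of a coupling network, which is the kind of structure a science map visualizes.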

Dynamic characteristics monitoring of wind turbine blades based on improved YOLOv5 deep learning model

  • W.H. Zhao;W.R. Li;M.H. Yang;N. Hong;Y.F. Du
    • Smart Structures and Systems
    • /
    • Vol. 31, No. 5
    • /
    • pp.469-483
    • /
    • 2023
  • The dynamic characteristics of wind turbine blades are usually monitored by contact sensors, which have the disadvantages of high cost, difficult installation, easy damage to the structure, and difficult signal transmission. To address these problems, a non-contact dynamic characteristic monitoring method for wind turbine blades is proposed, based on computer vision technology and an improved YOLOv5 (You Only Look Once v5) deep learning model. First, the original YOLOv5l model with the CSP (Cross Stage Partial) structure is improved by introducing the CSP2_2 structure, which reduces the number of residual components to improve the network training speed. On this basis, combined with the DeepSORT algorithm, the accuracy of structural displacement monitoring is improved. Second, because deep learning sample datasets are difficult to collect, the Blender software is used to model the wind turbine structure while varying conditions, illumination, and other settings to resemble practical engineering environments. In addition, combined with image expansion technology, a modeling-based dataset augmentation method is proposed. Finally, the feasibility of the proposed algorithm is verified by experiments, followed by an analysis of the influence of YOLOv5 models, lighting conditions, and angles on the recognition results. The results show that the improved YOLOv5 deep learning model not only performs well compared with many other YOLOv5 models but also achieves high accuracy in vibration monitoring in different environments. The method can accurately identify the dynamic characteristics of wind turbine blades and can therefore provide a reference for evaluating their condition.

Total Intracranial Volume Measurement for Children by Using an Automatized Program

  • 이정환;김지은;임성진;주가원;김시경;손정우;신철진;이상익;김혜리
    • Korean Journal of Biological Psychiatry (생물정신의학)
    • /
    • Vol. 21, No. 3
    • /
    • pp.81-86
    • /
    • 2014
  • Objectives: Total intracranial volume (TIV) is a major nuisance variable in neuroimaging research on interindividual differences in brain structure and function. The authors aimed to test the reliability of the atlas scaling factor (ASF) method for TIV estimation in FreeSurfer by comparing it with manual tracing as the reference method. Methods: The TIVs of 26 normal children and 26 children with attention-deficit hyperactivity disorder (ADHD) were obtained by FreeSurfer reconstruction and by manual tracing on T1-weighted images. Manual tracing was performed on every 10th slice of the MRI dataset from the midline of the sagittal plane by one researcher who was blinded to the clinical data. Another researcher independently performed manual tracing on 20 randomly selected datasets to verify interrater reliability. Results: The interrater reliability was excellent (intraclass correlation coefficient = 0.91, p < 7.1e-07). There were no significant differences in age or gender distribution between the normal and ADHD groups. No significant differences were found between TIVs from the ASF method and manual tracing. A strong correlation was found between the TIVs from the two methods (r = 0.90, p < 2.2e-16). Conclusions: The ASF method for TIV estimation using FreeSurfer showed good agreement with the reference method. The TIV from the ASF method can be used for correction in structural and functional neuroimaging studies not only with elderly subjects but also with children, including those with ADHD.