• Title/Summary/Keyword: Confusion Matrix


Classification of Tablets Using a Handheld NIR/Visible-Light Spectrometer (휴대형 근적외선/가시광선 분광기를 이용한 의약품 분류기법)

  • Kim, Tae-Dong;Lee, Seung-hyun;Baik, Kyung-Jin;Jang, Byung-Jun;Jung, Kyeong-Hoon
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.28 no.8 / pp.628-635 / 2017
  • Because medicines are closely tied to human health and life, it is important to prescribe and take medicines appropriate to the symptoms. It is also becoming more important to accurately distinguish genuine medicines from counterfeits, since the number of counterfeits is increasing worldwide. However, the number of experts with enough experience to classify them properly is limited, so an automatic technique for classifying medicine tablets is needed. In this paper, we propose a method for classifying tablets using a handheld spectrometer that provides both near-infrared (NIR) and visible-light spectra. We adopted a Support Vector Machine (SVM) as the machine learning algorithm for tablet classification. In simulation, we obtained an average classification accuracy of 99.9% using both the NIR and visible-light spectra. We also propose a two-step SVM approach to discriminate counterfeit tablets from genuine ones. This method was able to improve both the accuracy and the processing time.

Variation of Seasonal Groundwater Recharge Analyzed Using Landsat-8 OLI Data and a CART Algorithm (CART알고리즘과 Landsat-8 위성영상 분석을 통한 계절별 지하수함양량 변화)

  • Park, Seunghyuk;Jeong, Gyo-Cheol
    • The Journal of Engineering Geology / v.31 no.3 / pp.395-432 / 2021
  • Groundwater recharge rates vary widely by location and with time. They are difficult to measure directly and are thus often estimated using simulations. This study employed frequency and regression analysis, together with a classification and regression tree (CART) algorithm as a machine learning method, to estimate groundwater recharge. The CART inputs considered were the distribution of precipitation by subbasin (PCP), geomorphological data, indices of the relationship between vegetation and land use, and soil type. The geomorphological data were the digital elevation model (DEM), surface slope (SLOP), and surface aspect (ASPT); the indices were the perpendicular vegetation index (PVI), normalized difference vegetation index (NDVI), normalized difference tillage index (NDTI), and normalized difference residue index (NDRI). The spatio-temporal distribution of groundwater recharge from the SWAT-MODFLOW program was classified into four groups. In R, the data were randomly sampled, a model was trained, and groundwater recharge was predicted by CART considering modified PVI, NDVI, NDTI, NDRI, PCP, and geomorphological data. To assess inter-rater reliability for the four-group groundwater recharge classification, the Kappa coefficient, overall accuracy, and confusion matrix were calculated using K-fold cross-validation. The model obtained a Kappa coefficient of 0.3-0.6 and an overall accuracy of 0.5-0.7, indicating that the proposed model for estimating groundwater recharge with respect to soil type and vegetation cover is quite reliable.
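The Kappa coefficient and overall accuracy named in this abstract are both derived from a confusion matrix. The following is a minimal Python sketch of those two formulas only. The 4-class matrix `example_cm` is invented for illustration and is not the study's data, and the function names are hypothetical.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(n)) / total
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement: product of the marginal proportions, summed over classes.
    p_expected = sum(row_totals[k] * col_totals[k] for k in range(n)) / total**2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical 4-class matrix (rows: reference group, columns: predicted group).
example_cm = [
    [30, 10, 5, 5],
    [8, 28, 9, 5],
    [4, 10, 27, 9],
    [3, 5, 12, 30],
]
print("overall accuracy:", round(overall_accuracy(example_cm), 3))
print("kappa:", round(cohen_kappa(example_cm), 3))
```

Because kappa subtracts the agreement expected by chance, it is always lower than overall accuracy for an imperfect classifier, which is why the two are usually reported together.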

Evaluation of International Quality Control Procedures for Detecting Outliers in Water Temperature Time-series at Ieodo Ocean Research Station (이어도 해양과학기지 수온 시계열 자료의 이상값 검출을 위한 국제 품질검사의 성능 평가)

  • Min, Yongchim;Jun, Hyunjung;Jeong, Jin-Yong;Park, Sung-Hwan;Lee, Jaeik;Jeong, Jeongmin;Min, Inki;Kim, Yong Sun
    • Ocean and Polar Research / v.43 no.4 / pp.229-243 / 2021
  • Quality control (QC) of observed time series has become more critical as the types and amount of observed data have increased along with the development of ocean observing sensors and communication technology. International ocean observing institutions have developed and operated automatic QC procedures for these observed time series. In this study, the performance of the automated QC procedures proposed by the U.S. IOOS (Integrated Ocean Observing System), NDBC (National Data Buoy Center), and OOI (Ocean Observatories Initiative) was evaluated with a confusion matrix for observed time series, particularly from the Yellow and East China Seas. We focused on detecting additive outliers (AO) and temporary change outliers (TCO) in ocean temperature observations from the Ieodo Ocean Research Station (I-ORS) in 2013. Our results show that the IOOS variability check tends to classify normal data as AO or TCO. The NDBC variability check tracks outliers well but also tends to classify a lot of normal data as abnormal, particularly for rapidly fluctuating time series. The OOI procedure seems to detect AO and TCO most effectively, and its rate of classifying normal data as abnormal is the lowest among the international checks. However, all three checks need additional scrutiny. They often fail to identify outliers when observations are intermittent or when systematic errors occur. They also tend to classify normal data as outliers when the observed data change abruptly because the sensor is located within a sharp boundary between two water masses, which is a common feature of shallow-water observations. Therefore, this study underlines the necessity of developing a new QC algorithm for time series from shallow seas.
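The comparison described in this abstract (outliers caught versus normal data wrongly flagged) maps onto the four cells of a binary confusion matrix. The sketch below only shows that bookkeeping. The `truth` and `flagged` lists are invented, the function name is hypothetical, and none of the IOOS, NDBC, or OOI checks is implemented here.

```python
def confusion_counts(truth, flagged):
    """Count TP, FP, FN, TN for boolean outlier labels and boolean QC flags."""
    pairs = list(zip(truth, flagged))
    tp = sum(t and f for t, f in pairs)              # outlier, flagged
    fp = sum((not t) and f for t, f in pairs)        # normal, flagged
    fn = sum(t and (not f) for t, f in pairs)        # outlier, missed
    tn = sum((not t) and (not f) for t, f in pairs)  # normal, passed
    return tp, fp, fn, tn


# Hypothetical series of 10 samples: True marks a real outlier / a QC flag.
truth = [False, False, True, False, False, False, True, False, False, False]
flagged = [False, True, True, False, False, False, False, False, True, False]

tp, fp, fn, tn = confusion_counts(truth, flagged)
detection_rate = tp / (tp + fn)    # share of true outliers that were flagged
false_alarm_rate = fp / (fp + tn)  # share of normal data flagged as abnormal
print(tp, fp, fn, tn, detection_rate, false_alarm_rate)
```

A check that "tracks outliers well but flags a lot of normal data" would show a high detection rate together with a high false alarm rate in this notation.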

Discrimination model for cultivation origin of paper mulberry bast fiber and Hanji based on NIR and MIR spectral data combined with PLS-DA (닥나무 인피섬유와 한지의 원산지 판별모델 개발을 위한 NIR 및 MIR 스펙트럼 데이터의 PLS-DA 적용)

  • Jang, Kyung-Ju;Jung, So-Yoon;Go, In-Hee;Jeong, Seon-Hwa
    • Analytical Science and Technology / v.32 no.1 / pp.7-16 / 2019
  • The objective of this study was to develop a discrimination model for the cultivation origin of paper mulberry bast fiber and Hanji using near-infrared (NIR) and mid-infrared (MIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA). Paper mulberry bast fiber was purchased in 10 different regions of Korea and used to make Hanji. PLS-DA was performed using pre-treated FT-NIR and FT-MIR spectral data for paper mulberry bast fiber and Hanji. PLS-DA of the paper mulberry bast fiber and Hanji samples using FT-NIR spectral data showed 100% performance in cross-validation and in the confusion matrix metrics (accuracy, sensitivity, and specificity). The discrimination models revealed four regional groups, which were separated more clearly in the score plots of the NIR-based model than in those of the MIR-based model. Furthermore, the score plot of the discrimination model based on the NIR spectral data of paper mulberry bast fiber closely resembled that of the model based on the NIR spectral data of Hanji.

Evaluation of Grid-Based ROI Extraction Method Using a Seamless Digital Map (연속수치지형도를 활용한 격자기준 관심 지역 추출기법의 평가)

  • Jeong, Jong-Chul
    • Journal of Cadastre & Land InformatiX / v.49 no.1 / pp.103-112 / 2019
  • Extracting regions of interest for satellite image classification is an important technique for the efficient management of national land. However, recent studies on satellite image classification often rely on information from the selected image itself when choosing the region of interest. This study proposes an effective method of selecting the region of interest using the continuous digital topographic map constructed from high-resolution images. The spatial information used in this research is based on the 2013-2017 digital topographic maps provided by the National Geographic Information Institute and the 2015 Sejong City land cover map provided by the Ministry of Environment. To verify the accuracy of the extracted region of interest, KOMPSAT-3A (K3A) satellite images taken on October 28, 2018 and July 7, 2018 were used. The baseline samples for 2015 were extracted from the unchanged area of the 2013-2015 continuous digital topographic maps and the 2015 land cover map, and the baseline samples for 2018 were extracted from the unchanged area of the 2015-2017 continuous digital topographic maps and the 2015 land cover map. Redundant areas that arose when merging the continuous digital topographic maps and the land cover map were removed to avoid confusion in the data. Finally, checkpoints were generated within the region of interest, and the accuracy of the region of interest was evaluated against the K3A satellite images using error matrices for 2015 and 2018; the accuracy was approximately 93% and 72%, respectively. The correctly classified region can be used as a region of interest, and the misclassified region can be used as a reference for change detection.

A Study on the Design of Supervised and Unsupervised Learning Models for Fault and Anomaly Detection in Manufacturing Facilities (제조 설비 이상탐지를 위한 지도학습 및 비지도학습 모델 설계에 관한 연구)

  • Oh, Min-Ji;Choi, Eun-Seon;Roh, Kyung-Woo;Kim, Jae-Sung;Cho, Wan-Sup
    • The Journal of Bigdata / v.6 no.1 / pp.23-35 / 2021
  • In the era of the 4th industrial revolution, smart factories, where production and manufacturing technology converge with ICT, have received great attention. With the development of IoT technology and big data, automation of production systems has become possible. In the advanced manufacturing industry, production systems are subject to unscheduled performance degradation and downtime, and there is a demand to reduce safety risks by detecting and repairing potential errors as soon as possible. This study designs supervised and unsupervised learning models for detecting anomalies. As supervised learning methods, the accuracy of XGBoost, LightGBM, and CNN models was compared. Evaluation indices based on the confusion matrix confirmed that LightGBM was the most predictive (97%). In addition, MD, AE, and LSTM-AE models were constructed as unsupervised learning methods. Among these three methods, the LSTM-AE model detected 75% of anomalies and showed the best performance. This study aims to contribute to the advancement of the smart factory by combining supervised and unsupervised learning techniques to accurately diagnose equipment failures and predict when abnormal situations occur, thereby laying the foundation for preemptive responses to abnormal situations.

Effectiveness of the Detection of Pulmonary Emphysema using VGGNet with Low-dose Chest Computed Tomography Images (저선량 흉부 CT를 이용한 VGGNet 폐기종 검출 유용성 평가)

  • Kim, Doo-Bin;Park, Young-Joon;Hong, Joo-Wan
    • Journal of the Korean Society of Radiology / v.16 no.4 / pp.411-417 / 2022
  • This study aimed to train VGGNet and evaluate its effectiveness in detecting pulmonary emphysema using low-dose chest computed tomography images. In total, 8000 images with normal findings and 3189 images showing pulmonary emphysema were used. For model training, 60%, 24%, and 16% of the normal and emphysema data were randomly assigned to the training, validation, and test datasets, respectively. VGG16 and VGG19 were trained, and the accuracy, loss, confusion matrix, precision, recall, specificity, and F1-score were evaluated. The accuracy and loss for pulmonary emphysema detection on the low-dose chest CT test dataset were 92.35% and 0.21% for VGG16 and 95.88% and 0.09% for VGG19, respectively. The precision, recall, and specificity were 91.60%, 98.36%, and 77.08% for VGG16 and 96.55%, 97.39%, and 92.72% for VGG19, respectively. The F1-scores were 94.86% and 96.97% for VGG16 and VGG19, respectively. Based on these evaluation indices, VGG19 is judged to be more useful in detecting pulmonary emphysema. The findings of this study may be useful as basic data for research on pulmonary emphysema detection models using VGGNet and artificial neural networks.
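The F1-scores quoted in this abstract can be cross-checked from the quoted precision and recall, since F1 is their harmonic mean. The short Python sketch below does only that arithmetic. The dictionary layout and function name are illustrative choices, and the numbers are copied from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract, expressed as fractions.
reported = {
    "VGG16": {"precision": 0.9160, "recall": 0.9836, "f1": 0.9486},
    "VGG19": {"precision": 0.9655, "recall": 0.9739, "f1": 0.9697},
}

for name, m in reported.items():
    computed = f1_score(m["precision"], m["recall"])
    print(f"{name}: computed F1 = {computed:.4f}, reported F1 = {m['f1']:.4f}")
```

Both recomputed values round to the reported figures, so the three metrics are mutually consistent for each model.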

Prediction of Safety Grade of Bridges Using the Classification Models of Decision Tree and Random Forest (의사결정나무 및 랜덤포레스트 분류 모델을 이용한 교량 안전등급 예측)

  • Hong, Jisu;Jeon, Se-Jin
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.3 / pp.397-411 / 2023
  • The number of deteriorated bridges with a service period of more than 30 years has been rapidly increasing in Korea. Accordingly, the importance of advanced maintenance technologies that predict the age-induced deterioration, condition, and performance of bridges is increasingly recognized. This study proposes a method for predicting the safety grade of bridges using Decision Tree and Random Forest classification models based on machine learning. These models were analyzed for 8,850 bridges on national roads using various evaluation indices such as the confusion matrix, balanced accuracy, recall, ROC curve, and AUC, and the Random Forest generally showed better predictive performance than the Decision Tree. In particular, for the C and D grade bridges, which need more attention in maintenance because of their significant deterioration, random under-sampling in the Random Forest showed higher predictive performance than the other sampling techniques, with a recall of 83.4%. The proposed model can be usefully applied to rapidly identify the safety grade of bridges that have not recently been inspected and to establish an efficient and economical maintenance plan for them.
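Random under-sampling, mentioned in this abstract, balances an imbalanced training set by discarding samples from the larger classes. The sketch below shows one common variant, which trims every class to the size of the rarest class. The abstract does not state how the authors implemented it, so the function name, the seed, and the toy grade counts are all assumptions.

```python
import random
from collections import defaultdict


def random_under_sample(samples, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = min(len(group) for group in by_class.values())
    balanced_samples, balanced_labels = [], []
    for label, group in by_class.items():
        # Sample without replacement from each class.
        for sample in rng.sample(group, target):
            balanced_samples.append(sample)
            balanced_labels.append(label)
    return balanced_samples, balanced_labels


# Hypothetical imbalanced grade labels; sample values are just row indices.
labels = ["A"] * 6 + ["B"] * 10 + ["C"] * 4 + ["D"] * 3
samples = list(range(len(labels)))
balanced_samples, balanced_labels = random_under_sample(samples, labels)
print(len(balanced_samples), sorted(set(balanced_labels)))
```

The trade-off is that majority-class information is thrown away, which can raise recall on rare grades at some cost elsewhere; only the training split should be resampled, so that evaluation reflects the real class balance.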

Performance Evaluation of Loss Functions and Composition Methods of Log-scale Train Data for Supervised Learning of Neural Network (신경 망의 지도 학습을 위한 로그 간격의 학습 자료 구성 방식과 손실 함수의 성능 평가)

  • Donggyu Song;Seheon Ko;Hyomin Lee
    • Korean Chemical Engineering Research / v.61 no.3 / pp.388-393 / 2023
  • The analysis of engineering data using neural networks based on supervised learning has been utilized in various engineering fields, such as optimization of chemical engineering processes, concentration prediction of particulate matter pollution, prediction of thermodynamic phase equilibria, and prediction of physical properties for transport phenomena systems. Supervised learning requires training data, and its performance is affected by the composition and configuration of the given training data. Frequently observed engineering data, such as the length of DNA or the concentration of analytes, are often given on a log scale. In this study, for widely distributed log-scaled training data of virtual 100×100 images, the available loss functions were quantitatively evaluated in terms of (i) confusion matrix, (ii) maximum relative error, and (iii) mean relative error. As a result, the mean-absolute-percentage-error and mean-squared-logarithmic-error loss functions were found to be optimal for the log-scaled training data. Furthermore, we found that uniformly selected training data lead to the best prediction performance. The optimal loss functions and the method for composing training data studied in this work could be applied to engineering problems such as evaluating DNA length, analyzing biomolecules, and predicting the concentration of colloidal suspensions.
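To illustrate why relative-error losses suit targets that span several orders of magnitude, the following Python sketch compares mean squared error with the two losses named in this abstract. The formulas are the commonly used definitions (MSLE written with log(1 + x), as in several deep learning libraries), which is an assumption because the abstract does not give the authors' exact forms. The target values are invented.

```python
import math


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE in percent; assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_logarithmic_error(y_true, y_pred):
    """MSLE using log(1 + x); assumes all values are greater than -1."""
    return sum(
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ) / len(y_true)


# Hypothetical log-spaced targets, each predicted 10% too high.
y_true = [1.0, 1e3, 1e6]
y_pred = [1.1 * t for t in y_true]

squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
largest_share = max(squared_errors) / sum(squared_errors)

print("MSE :", mean_squared_error(y_true, y_pred))
print("MAPE:", mean_absolute_percentage_error(y_true, y_pred))
print("MSLE:", mean_squared_logarithmic_error(y_true, y_pred))
print("share of MSE from the largest target:", largest_share)
```

With the same 10% relative error on every sample, almost all of the squared-error loss comes from the largest target, whereas MAPE weights each sample equally and MSLE compares values on a logarithmic scale.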

Establishment of a deep learning-based defect classification system for optimizing textile manufacturing equipment

  • YuLim Kim;Jaeil Kim
    • Journal of the Korea Society of Computer and Information / v.28 no.10 / pp.27-35 / 2023
  • In this paper, we propose a process for increasing productivity by applying a deep learning-based defect detection and classification system to the prepreg fiber manufacturing process, which is in high demand in the production of composite materials. To apply the system to tow prepreg manufacturing equipment, which requires a solution because a large number of defects occur under various conditions, an optimal imaging environment was first established by selecting the cameras and lights needed for defect detection and for building the classification model. In addition, the data needed to build a multi-class classification model were collected and labeled according to normal and defective conditions. The multi-class classification model is based on a CNN and applies pre-trained models such as VGGNet, MobileNet, and ResNet; their performance was compared, and directions for improvement were identified using accuracy and loss graphs. Overfitting was identified as a major problem, and data augmentation and dropout were applied to mitigate it. The performance of the model was evaluated using the confusion matrix as a performance indicator, and a performance of more than 99% was confirmed. In addition, the model was applied to the actual process, and the classification results for images acquired in real time were checked to confirm that the predictions were accurate.