• Title/Summary/Keyword: Performance evaluation metrics (성능평가 지표)


A Study on the Drug Classification Using Machine Learning Techniques (머신러닝 기법을 이용한 약물 분류 방법 연구)

  • Anmol Kumar Singh;Ayush Kumar;Adya Singh;Akashika Anshum;Pradeep Kumar Mallick
    • Advanced Industrial Science / v.3 no.2 / pp.8-16 / 2024
  • This paper presents a drug-classification system whose goal is to predict the appropriate drug for a patient from demographic and physiological traits. The dataset contains attributes such as Age, Sex, BP (blood pressure), Cholesterol Level, and Na_to_K (sodium-to-potassium ratio), with the objective of determining the drug to be given. The models used are K-Nearest Neighbors (KNN), Logistic Regression, and Random Forest. GridSearchCV with 5-fold cross-validation was used to fine-tune hyperparameters, and each model was trained and tested on the dataset. Accuracy, confusion matrices, and classification reports were used to assess each model with and without hyperparameter tuning; the accuracies without GridSearchCV were 0.7, 0.875, and 0.975, and with GridSearchCV 0.75, 1.0, and 0.975. Under GridSearchCV, Logistic Regression is the most suitable of the three models for drug classification, followed by K-Nearest Neighbors. Na_to_K also proved an essential feature for predicting the outcome.
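
The evaluation metrics this abstract relies on (accuracy and a confusion matrix) can be computed directly; a minimal sketch on hypothetical drug-class predictions, not the paper's data:

```python
# Accuracy and a multi-class confusion matrix from scratch.
# The labels below are illustrative only.
from collections import Counter

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def confusion_matrix(y_true, y_pred, labels):
    # rows = true class, columns = predicted class
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

y_true = ["drugA", "drugB", "drugA", "drugC", "drugB", "drugC", "drugA", "drugB"]
y_pred = ["drugA", "drugB", "drugB", "drugC", "drugB", "drugA", "drugA", "drugB"]

acc = accuracy(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred, ["drugA", "drugB", "drugC"])
print(acc)  # 0.75
print(cm)   # [[2, 1, 0], [0, 3, 0], [1, 0, 1]]
```

In practice, scikit-learn's `GridSearchCV(estimator, param_grid, cv=5)` performs the 5-fold tuning the abstract describes and exposes the best configuration via `best_params_`.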

Discovering Promising Convergence Technologies Using Network Analysis of Maturity and Dependency of Technology (기술 성숙도 및 의존도의 네트워크 분석을 통한 유망 융합 기술 발굴 방법론)

  • Choi, Hochang;Kwahk, Kee-Young;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.101-124 / 2018
  • Recently, most technologies have developed either through the advancement of a single technology or through interaction with other technologies. In particular, many exhibit convergence, arising from the interaction of two or more techniques. Efforts to respond to technological change in advance, by forecasting promising convergence technologies that will emerge in the near future, are continuously increasing, and many researchers are attempting various analyses for this purpose. Because a convergence technology inherits the characteristics of the technologies it combines, forecasting promising convergence technologies is much more difficult than forecasting general technologies with high growth potential. Nevertheless, some progress has been made using big-data analysis and social network analysis, and data-driven studies of convergence technology actively pursue the discovery of new convergence technologies and the analysis of their trends, so information about them is more abundant than in the past. Existing methods of analyzing convergence technology nonetheless have limitations. First, most studies analyze data through predefined technology classifications. Recent technologies tend to be convergent and thus combine technologies from various fields, so a new convergence technology may not belong to any predefined class; the existing approach therefore fails to reflect the dynamic change of the convergence phenomenon.
Second, to forecast promising convergence technologies, most existing analyses use general-purpose indicators, which do not fully exploit the specificity of the convergence phenomenon. A new convergence technology is highly dependent on the existing technologies from which it originates, and depending on how those technologies change it may grow into an independent field or disappear rapidly. Traditional general-purpose indicators do not reflect this principle of convergence: they ignore the fact that new technologies emerge from two or more mature technologies and that grown technologies in turn affect the creation of other technologies. Third, previous studies provide no objective method for evaluating the accuracy of models that forecast promising convergence technologies. Because of the complexity of the field, such forecasting has been studied relatively little, making it difficult to find an evaluation method; to activate the field, it is important to establish a way of objectively verifying and evaluating the accuracy of each proposed model. To overcome these limitations, we propose a new method for analyzing convergence technologies. First, through topic modeling we derive a new technology classification from text content, reflecting the dynamic change of the actual technology market rather than a fixed classification standard.
In addition, we identify the influence relationships between technologies through the topic correspondence weights of each document and structure them into a network. We then devise a centrality indicator, PGC (potential growth centrality), that forecasts the future growth of a technology from the centrality information of each technology, reflecting its convergence characteristics in terms of technology maturity and interdependence between technologies. We also propose a method to evaluate the accuracy of the forecasting model by measuring the growth rate of promising technologies, based on the variation of potential growth centrality by period. We conduct experiments with 13,477 patent documents to evaluate the performance and practical applicability of the proposed method. The results confirm that a forecasting model based on the proposed centrality indicator achieves up to about 2.88 times higher accuracy than models based on currently used network indicators.
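
The paper's PGC formula is not reproduced in the abstract; the sketch below shows only the generic idea it builds on, scoring each technology topic by the dependency weights flowing into it from other topics. The topics and weights are hypothetical:

```python
# Toy weighted dependency network between technology topics.
# (source topic, target topic) -> influence weight; values are illustrative,
# not derived from the paper's patent corpus.
edges = {
    ("T1", "T2"): 0.6,
    ("T1", "T3"): 0.4,
    ("T2", "T3"): 0.7,
}

def weighted_in_centrality(edges):
    """Sum the weights of incoming influence edges for each topic."""
    scores = {}
    for (_, dst), w in edges.items():
        scores[dst] = scores.get(dst, 0.0) + w
    return scores

scores = weighted_in_centrality(edges)
print(scores)  # T3 accumulates influence from both T1 and T2
```

The actual PGC additionally weights this by technology maturity and tracks its variation across periods, which the abstract uses to score forecast accuracy.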

An Improved Online Algorithm to Minimize Total Error of the Imprecise Tasks with 0/1 Constraint (0/1 제약조건을 갖는 부정확한 태스크들의 총오류를 최소화시키기 위한 개선된 온라인 알고리즘)

  • Song, Gi-Hyeon
    • Journal of KIISE: Computer Systems and Theory / v.34 no.10 / pp.493-501 / 2007
  • Imprecise real-time systems provide flexibility in scheduling time-critical tasks. Most scheduling problems that satisfy both the 0/1 constraint and timing constraints while minimizing total error are NP-complete when the optional tasks have arbitrary processing times. Liu suggested a reasonable strategy for scheduling tasks with the 0/1 constraint on uniprocessors to minimize total error, and Song et al. suggested one for multiprocessors, but these are both off-line algorithms. In online scheduling, the NORA algorithm can find a schedule with the minimum total error for an imprecise online task system; it adopts the EDF (earliest deadline first) strategy for scheduling optional tasks. For task systems with the 0/1 constraint, however, EDF scheduling may not be optimal in the sense of minimizing total error; in particular, when the optional tasks are scheduled in ascending order of their required processing times, the EDF-based NORA algorithm may not produce the minimum total error. This paper therefore proposes an online algorithm that minimizes total error for imprecise task systems with the 0/1 constraint, and a series of experiments compares it with NORA. The comparison shows that the proposed algorithm produces a total error similar to NORA's when the optional tasks are scheduled in random order of their required processing times, but less total error than NORA when they are scheduled in ascending order of their required processing times.
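
As a toy illustration of the ideas above (not the paper's algorithm or NORA itself): under the 0/1 constraint an optional part either runs fully or is discarded entirely, and every discarded part contributes its whole processing time to the total error. A minimal EDF-ordered sketch with hypothetical tasks and a single capacity budget:

```python
def edf_total_error(tasks, capacity):
    """tasks: list of (deadline, processing_time) for optional parts.
    Greedy EDF sketch: visit tasks earliest-deadline-first; schedule an
    optional part only if it fits fully before both its deadline and the
    capacity budget (0/1 constraint), otherwise count it all as error."""
    error = 0
    used = 0
    for deadline, ptime in sorted(tasks):           # earliest deadline first
        if used + ptime <= min(deadline, capacity):  # fits fully -> schedule
            used += ptime
        else:                                        # 0/1: discard entirely
            error += ptime
    return error

tasks = [(4, 2), (6, 3), (10, 4)]
print(edf_total_error(tasks, capacity=8))  # the (10, 4) part cannot fit -> error 4
```

This greedy EDF choice is exactly what can be suboptimal under the 0/1 constraint: a different subset of optional parts may yield less total error, which motivates the paper's improved online algorithm.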

Improvement of Mid-and Low-flow Estimation Using Variable Nonlinear Catchment Wetness Index (비선형 유역습윤지수를 이용한 평갈수기 유출모의개선)

  • Hyun, Sukhoon;Kang, Boosik;Kim, Jin-Gyeom
    • KSCE Journal of Civil and Environmental Engineering Research / v.36 no.5 / pp.779-789 / 2016
  • Effective rainfall is calculated with soil moisture taken into account, either by using observed data directly in the rainfall-runoff model or by computing soil moisture indirectly within the model. The rainfall-runoff model used in this study, IHACRES, first computes a temperature-dependent catchment wetness index (CWI) and uses it to estimate precipitation loss. The nonlinear relationship between the CWI and effective rainfall in the Hapcheondam watershed was derived and applied to long-term runoff calculation, and the effects of variable versus constant CWI during calibration and validation were examined by flow regime. The results show that the variable CWI is generally more effective than the constant CWI. The $R^2$ during the high-flow period is relatively higher than during the normal- or low-flow periods, but the difference between the variable and constant CWI cases was insignificant, indicating that high flow is relatively insensitive to the evaporation and soil moisture associated with temperature. The variable CWI, by contrast, gives more desirable results during the normal- and low-flow periods, which means it is crucial to incorporate temperature-dependent evaporation and soil moisture into long-term continuous runoff simulation. The NSE tends to decrease during high-flow periods with high variability, which is to be expected because the NSE is strongly influenced by outliers of the underlying variable; nevertheless, the overall NSE stays in a satisfactory range above 0.9. Using a variable CWI during the normal- and low-flow periods would thus improve long-term rainfall-runoff simulation.
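
The NSE cited above is defined as one minus the ratio of residual variance to the variance of the observations, which is why a few large high-flow residuals dominate it. A minimal implementation on hypothetical flow series (not the Hapcheondam data):

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum((obs-sim)^2) / sum((obs-mean_obs)^2).
    1.0 is a perfect fit; values <= 0 mean the model is no better than the
    mean of the observations."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

obs = [10.0, 12.0, 9.0, 30.0, 11.0]   # one high-flow value (30.0) dominates
sim = [10.5, 11.0, 9.5, 28.0, 11.5]
print(round(nse(obs, sim), 4))  # 0.9814
```

Note how the single peak contributes most of both the numerator and the denominator, illustrating the abstract's point about NSE's sensitivity to outliers of the underlying variable.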

A Method of Reproducing the CCT of Natural Light using the Minimum Spectral Power Distribution for each Light Source of LED Lighting (LED 조명의 광원별 최소 분광분포를 사용하여 자연광 색온도를 재현하는 방법)

  • Yang-Soo Kim;Seung-Taek Oh;Jae-Hyun Lim
    • Journal of Internet Computing and Services / v.24 no.2 / pp.19-26 / 2023
  • Humans have adapted and evolved under natural light, but as people stay indoors longer in modern times, disturbances of the biorhythm have emerged. To address this, research is being conducted on lighting that reproduces the correlated color temperature (CCT) of natural light as it varies from sunrise to sunset. To reproduce the CCT of natural light, lighting is built from multiple LED light sources with different CCTs; a control index DB is then constructed by measuring and collecting the light characteristics of input-current combinations for each light source over hundreds to thousands of steps, and the lighting is controlled by matching against those characteristics. The problem with this control method is that the finer the input-current steps, the greater the time and economic cost. This paper proposes an LED lighting control method that reproduces the CCT of natural light by applying interpolation and combination calculations to the minimum spectral power distribution (SPD) information of each light source. First, the minimum SPD of each of the five channels was measured and collected for an LED lighting fixture composed of light-source channels with different CCTs and a 256-step input-current control function per channel. Interpolation generated a 256-step SPD for each channel from the minimum SPD information, and combination calculations over the per-channel SPDs generated the SPD for every control combination of the lighting. Illuminance and CCT were calculated from the generated SPDs, a control index DB was constructed, and the CCT of natural light was reproduced through a matching technique. In the performance evaluation, the method reproduced the CCT of natural light within an average error rate of 0.18% while meeting the recommended indoor illuminance standard.
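
The interpolation step described above can be sketched as follows. The exact scheme the paper uses is not given in the abstract, so this assumes simple linear interpolation between the two nearest measured current steps; the wavelength grid, steps, and power values are hypothetical:

```python
def interp_spd(spd_by_step, step):
    """spd_by_step: {input_current_step: [spectral power per wavelength bin]}.
    Returns the SPD at an intermediate step by linear interpolation between
    the two nearest measured steps."""
    steps = sorted(spd_by_step)
    lo = max(s for s in steps if s <= step)
    hi = min(s for s in steps if s >= step)
    if lo == hi:
        return spd_by_step[lo]
    t = (step - lo) / (hi - lo)
    return [a + t * (b - a) for a, b in zip(spd_by_step[lo], spd_by_step[hi])]

# Three measured steps per channel, three wavelength bins (illustrative).
measured = {0: [0.0, 0.0, 0.0], 128: [0.4, 1.0, 0.6], 255: [0.8, 2.0, 1.2]}
print(interp_spd(measured, 64))  # halfway between steps 0 and 128
```

Summing the interpolated per-channel SPDs for a given control combination then yields the combined SPD from which illuminance and CCT are computed for the control index DB.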

Estimation of Chlorophyll-a Concentration in Nakdong River Using Machine Learning-Based Satellite Data and Water Quality, Hydrological, and Meteorological Factors (머신러닝 기반 위성영상과 수질·수문·기상 인자를 활용한 낙동강의 Chlorophyll-a 농도 추정)

  • Soryeon Park;Sanghun Son;Jaegu Bae;Doi Lee;Dongju Seo;Jinsoo Kim
    • Korean Journal of Remote Sensing / v.39 no.5_1 / pp.655-667 / 2023
  • Algal bloom outbreaks are frequently reported around the world, and serious water pollution problems arise every year in Korea, so the aquatic ecosystem must be protected through continuous management and rapid response. Many studies use satellite imagery to estimate the concentration of chlorophyll-a (Chl-a), an indicator of algal bloom occurrence, but because spectral characteristics and atmospheric correction errors vary by water system, accurately calculating Chl-a is difficult and machine learning models have recently been adopted. Factors affecting algal blooms must be considered alongside satellite spectral indices, so this study constructed a dataset combining water quality, hydrological, and meteorological factors with Sentinel-2 images. Two representative ensemble models, random forest and extreme gradient boosting (XGBoost), were used to predict Chl-a concentration at eight weirs on the Nakdong River over the past five years. The R-squared score (R2), root mean square error (RMSE), and mean absolute error (MAE) were used as evaluation indicators; XGBoost achieved an R2 of 0.80, an RMSE of 6.612, and an MAE of 4.457. Shapley additive explanations (SHAP) analysis showed that the water quality factors (suspended solids, biochemical oxygen demand, dissolved oxygen) and the band ratio using red-edge bands were highly important in both models. The varied input data were confirmed to improve model performance, and the approach appears applicable to algal bloom detection both domestically and internationally.
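
The three evaluation indicators used here (R2, RMSE, MAE) can be written out directly; the Chl-a values below are hypothetical, not the Nakdong River dataset:

```python
import math

def r2_rmse_mae(y_true, y_pred):
    """R2 = 1 - SS_res/SS_tot; RMSE = sqrt(mean squared error);
    MAE = mean absolute error."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae

y_true = [5.0, 20.0, 35.0, 50.0]   # illustrative Chl-a observations
y_pred = [8.0, 18.0, 30.0, 52.0]   # illustrative model outputs
r2, rmse, mae = r2_rmse_mae(y_true, y_pred)
print(round(r2, 3), round(rmse, 3), round(mae, 3))
```

Equivalent results come from `sklearn.metrics.r2_score` and `mean_absolute_error`; note that RMSE squares residuals, so it penalizes the largest Chl-a errors more heavily than MAE does.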

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.57-73 / 2021
  • Maintenance and failure prevention through anomaly detection of ICT infrastructure is becoming important. System monitoring data is multidimensional time series data, which is difficult to handle because both its multidimensional and its time-series characteristics must be considered. With multidimensional data, correlations between variables must be taken into account, and existing probability-based, linear, and distance-based methods degrade under the curse of dimensionality. Time series data, meanwhile, is typically preprocessed with sliding windows and time series decomposition for autocorrelation analysis; these techniques further increase the dimensionality of the data and therefore need to be supplemented. Anomaly detection is an old research field: statistical methods and regression analysis were used early on, and machine learning and artificial neural network techniques are now being actively applied. Statistical methods are difficult to apply to non-homogeneous data and do not detect local outliers well. Regression-based methods learn a regression formula under parametric statistics and flag anomalies by comparing predicted and actual values; their performance drops when the model is not solid or the data contain noise or outliers, and they are restricted to training data free of such contamination. An autoencoder built on artificial neural networks is trained to reproduce its input as closely as possible. It has many advantages over existing probability-based and linear models, cluster analysis, and supervised learning: it can be applied to data that satisfy neither a probability distribution nor a linearity assumption.
It also supports unsupervised learning, requiring no labeled training data. However, autoencoders remain limited in identifying local outliers in multidimensional data, and the characteristics of time series data greatly increase the dimensionality. This study proposes CMAE (Conditional Multimodal Autoencoder), which enhances anomaly detection performance by considering both local outliers and time-series characteristics. First, a Multimodal Autoencoder (MAE) is applied to mitigate the local-outlier limitation in multidimensional data. Multimodal architectures are commonly used to learn inputs of different types, such as voice and image; the different modals share the autoencoder's bottleneck and thereby learn correlations. In addition, a Conditional Autoencoder (CAE) is used to learn the characteristics of time series data effectively without increasing the dimensionality. Conditional inputs are usually categorical variables, but here time is used as the condition so that periodicity can be learned. The proposed CMAE was verified by comparison with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). Restoration performance over 41 variables was examined for the proposed and comparison models. Restoration quality differs by variable: in all three autoencoders the loss is small for the Memory, Disk, and Network modals, so restoration works well; the Process modal shows no significant difference across models; and the CPU modal performs best in CMAE. ROC curves were prepared to evaluate anomaly detection performance, and AUC, accuracy, precision, recall, and F1-score were compared; on every indicator the ranking was CMAE, then MAE, then UAE.
In particular, CMAE's recall was 0.9828, confirming that it detects almost all anomalies. Its accuracy improved to 87.12% and its F1-score reached 0.8883, values considered suitable for anomaly detection. Practically, the proposed model offers advantages beyond raw performance: techniques such as time series decomposition and sliding windows add procedures that must be managed, and the dimensional increase they cause slows inference, whereas the proposed model is easy to apply to practical tasks in terms of inference speed and model management.
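
The detection criterion behind these numbers is the usual autoencoder one: a sample is flagged as anomalous when its reconstruction error exceeds a threshold, and the flags are scored with precision, recall, and F1. A minimal sketch on hypothetical reconstruction errors and labels, not the paper's monitoring data:

```python
def prf1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = anomaly)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

recon_error = [0.02, 0.85, 0.03, 0.60, 0.04, 0.90]   # illustrative AE losses
labels      = [0,    1,    0,    0,    0,    1]       # 1 = true anomaly
pred = [1 if e > 0.5 else 0 for e in recon_error]     # threshold detection
p, r, f1 = prf1(labels, pred)
print(round(p, 3), round(r, 3), round(f1, 3))
```

Sweeping the threshold over the reconstruction errors and plotting the true-positive rate against the false-positive rate yields the ROC curve whose AUC the abstract compares across CMAE, MAE, and UAE.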

Nitroglycerin-Challenged Tc-99m MIBI Quantitative Gated SPECT to Predict Functional Recovery After Coronary Artery Bypass Surgery (니트로글리세린 투여 Tc-99m-MIBI 정량 게이트 심근SPECT를 이용한 관상동맥우회로술 후 심근 기능 회복 예측)

  • Lee, Dong-Soo;Kim, Yu-Kyeong;Cheon, Gi-Jeong;Paeng, Jin-Chul;Lee, Myoung-Mook;Kim, Ki-Bong;Chung, June-Key;Lee, Myung-Chul
    • The Korean Journal of Nuclear Medicine / v.37 no.5 / pp.278-287 / 2003
  • Purpose: The performance of nitroglycerin-challenged Tc-99m-MIBI quantitative gated SPECT for detecting viable myocardium was compared with rest/24-hour redistribution Tl-201 SPECT. Materials and Methods: In 22 patients with coronary artery disease, rest Tl-201 / dipyridamole-stress Tc-99m-MIBI gated / 24-hour redistribution Tl-201 SPECT were performed, and gated SPECT was repeated on-site after sublingual administration of nitroglycerin (0.6 mg). Follow-up gated SPECT was done 3 months after coronary artery bypass graft surgery. For 20 segments per patient, perfusion at rest and at 24-hour redistribution, and wall motion and thickening at baseline and in the nitroglycerin-challenged state, were quantified. Four quantitative viability markers were evaluated and compared: (1) rest thallium uptake, (2) thallium uptake on 24-hour redistribution SPECT, (3) systolic wall thickening at baseline, and (4) systolic wall thickening with nitroglycerin challenge. Results: Among 100 revascularized dysfunctional segments, wall motion improved in 66 segments (66%) on follow-up gated myocardial SPECT after bypass surgery. On receiver operating characteristic (ROC) curve analysis, the sensitivity and specificity of rest and 24-hour delayed redistribution Tl-201 SPECT were 79%/44% and 82%/44%, respectively, at the optimal cutoff of 50% Tl-201 uptake. The sensitivity and specificity of systolic wall thickening at baseline and under nitroglycerin challenge were 49%/50% and 64%/65%, respectively, at the optimal cutoff of 15% systolic wall thickening. The area under the ROC curve of nitroglycerin-challenged systolic wall thickening was significantly larger than that of baseline systolic wall thickening (p=0.004). Conclusion: Nitroglycerin-challenged quantitative gated Tc-99m-MIBI SPECT is a useful method for predicting functional recovery of dysfunctional myocardium.
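
The sensitivity/specificity pairs above come from thresholding a continuous marker (uptake or thickening) at a cutoff and comparing against the recovery outcome. A minimal sketch with hypothetical uptake values and recovery labels, not the study's patient data:

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) when a segment is
    called viable if its score is at or above the cutoff (1 = recovered)."""
    tp = sum(s >= cutoff and l == 1 for s, l in zip(scores, labels))
    fn = sum(s < cutoff and l == 1 for s, l in zip(scores, labels))
    tn = sum(s < cutoff and l == 0 for s, l in zip(scores, labels))
    fp = sum(s >= cutoff and l == 0 for s, l in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp)

uptake = [62, 40, 55, 71, 35, 48, 66, 30]   # illustrative % Tl-201 uptake
recovered = [1, 0, 1, 1, 0, 1, 1, 0]        # 1 = wall motion improved
sens, spec = sens_spec(uptake, recovered, cutoff=50)
print(sens, spec)
```

Evaluating this pair at every candidate cutoff traces the ROC curve, and the "optimal cutoff" reported in the abstract is the point chosen to balance the two rates.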

Performance Improvement of Dielectric Barrier Plasma Reactor for Advanced Oxidation Process (고급산화공정용 유전체 장벽 플라즈마 반응기의 성능 개선)

  • Kim, Dong-Seog;Park, Young-Seek
    • Journal of Korean Society of Environmental Engineers / v.34 no.7 / pp.459-466 / 2012
  • To improve the treatment performance of dielectric barrier discharge (DBD) plasma, a plasma + UV process and a gas-liquid mixing method were investigated, using the degradation of N,N-dimethyl-4-nitrosoaniline (RNO, an indicator of OH radical generation) as the measure. The basic DBD plasma reactor of this study consisted of the plasma reactor proper (a quartz dielectric tube with a titanium discharge (inner) electrode and a ground (outer) electrode) together with air and power supply systems. The reactor was improved by combining the basic plasma reactor with a UV process and adopting a gas-liquid mixer. The effects of the UV power of the plasma + UV process (0-10 W), the presence and type of gas-liquid mixer, air flow rate (1-6 L/min), diffuser pore size range (16-160 μm), water circulation rate (2.8-9.4 L/min), and the UV power of the improved plasma + UV process (0-10 W) were evaluated. The experiments showed that RNO degradation with the optimal plasma + UV process was 7.36% higher than with the basic plasma reactor, and that RNO decomposition with the gas-liquid mixing method was higher than with the plasma + UV process. RNO degradation performance decreased in the order gas-liquid mixing type > pump type > basic reactor, and the improved reactor fitted with a diffuser-type gas-liquid mixer increased removal efficiency by 17.42%. The optimal air flow rate, diffuser pore size range, and water circulation rate for RNO degradation in the improved reactor system were 4 L/min, 40-100 μm, and 6.9 L/min, respectively. The synergistic effect of combining the gas-liquid mixing plasma with the UV process was insignificant.

Estimation for Ground Air Temperature Using GEO-KOMPSAT-2A and Deep Neural Network (심층신경망과 천리안위성 2A호를 활용한 지상기온 추정에 관한 연구)

  • Taeyoon Eom;Kwangnyun Kim;Yonghan Jo;Keunyong Song;Yunjeong Lee;Yun Gon Lee
    • Korean Journal of Remote Sensing / v.39 no.2 / pp.207-221 / 2023
  • This study proposes deep neural network models for estimating air temperature from the Level 1B (L1B) datasets of GEO-KOMPSAT-2A (GK-2A). The temperature at 1.5 m above the ground affects not only daily life but also weather warnings such as cold and heat waves. Many studies estimate air temperature from the land surface temperature (LST) retrieved from satellites, because the two are strongly related; however, the LST algorithm, a Level 2 product of GK-2A, works only for clear-sky pixels. To overcome cloud effects, we apply a deep neural network (DNN) that estimates air temperature from L1B data (radiometrically and geometrically calibrated raw satellite data) and compare it with a linear regression model between LST and air temperature, using the root mean square error (RMSE) of the estimated air temperature to evaluate the models. The 95 in-situ sites provided 2,496,634 air temperature records, of which 42.1% could be paired with LST and 98.4% with L1B. Data from 2020 and 2021 were used for training and data from 2022 for validation. The DNN takes the 16 L1B channels as input and passes them through four fully connected hidden layers to estimate air temperature. Using the 16 L1B bands, the DNN achieved an RMSE of 2.22°C under clear-sky conditions, clearly better than the baseline model's 3.55°C, and its total RMSE including overcast samples was 3.33°C, suggesting that the DNN can overcome cloud effects. However, the model behaved differently across seasons and hours, and the summer and winter seasons showed low coefficients of determination with high standard deviations, so solar information needs to be appended as input to build a more general DNN model.
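
The baseline in this comparison is an ordinary least-squares regression from LST to air temperature, scored by RMSE. A self-contained sketch on hypothetical (and deliberately perfectly linear) temperatures, not the GK-2A data:

```python
import math

def fit_linear(x, y):
    """Ordinary least squares for y = slope*x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

lst  = [5.0, 10.0, 15.0, 20.0, 25.0]   # illustrative land surface temps (°C)
tair = [4.0,  8.5, 13.0, 17.5, 22.0]   # illustrative 1.5 m air temps (°C)
slope, intercept = fit_linear(lst, tair)
pred = [slope * v + intercept for v in lst]
print(round(slope, 2), round(intercept, 2), round(rmse(tair, pred), 6))
```

The DNN in the study replaces this single LST predictor with all 16 L1B channels, which is what lets it produce estimates for cloudy pixels where no LST exists.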