• Title/Summary/Keyword: model predictive

Development of Kimchi Cabbage Growth Prediction Models Based on Image and Temperature Data (영상 및 기온 데이터 기반 배추 생육예측 모형 개발)

  • Min-Seo Kang;Jae-Sang Shim;Hye-Jin Lee;Hee-Ju Lee;Yoon-Ah Jang;Woo-Moon Lee;Sang-Gyu Lee;Seung-Hwan Wi
    • Journal of Bio-Environment Control / v.32 no.4 / pp.366-376 / 2023
  • This study was conducted to develop a model for predicting the growth of kimchi cabbage using image data and environmental data. Kimchi cabbages of the 'Cheongmyeong Gaual' variety were planted on three dates (July 11, July 19, and July 27) in a test field in Pyeongchang-gun, Gangwon-do (37°37' N, 128°32' E; elevation 510 m), and growth, image, and environmental data were collected until September 12. To select key factors for the growth prediction model, a correlation analysis was conducted on the collected growth and meteorological data. Fresh weight was highly correlated with both growing degree days (GDD) and integrated solar radiation, with a correlation coefficient of 0.88 in each case. Fresh weight was also significantly correlated with plant height and leaf area, with correlation coefficients of 0.78 and 0.79, respectively. Based on previous research, canopy coverage was selected from the image data and GDD from the environmental data. Prediction models for kimchi cabbage biomass, leaf count, and leaf area were developed by combining GDD, canopy coverage, and growth data. Among the single-factor models (quadratic, sigmoid, and logistic), the sigmoid model showed the best explanatory power. A multi-factor model combining GDD and canopy coverage improved the coefficients of determination to 0.90, 0.95, and 0.89 for biomass, leaf count, and leaf area, respectively, compared with the single-factor models. In validation, the coefficient of determination between measured and predicted fresh weight was 0.91, with an RMSE of 134.2 g, indicating high prediction accuracy. 
In the past, kimchi cabbage growth prediction was usually based on either meteorological or image data alone, which limited predictive accuracy because it could not reflect on-site conditions or the heading of kimchi cabbage. Combining the two prediction methods is expected to improve the accuracy of crop yield prediction by compensating for the weaknesses of each observation method.
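The GDD accumulation and the single-factor sigmoid fit described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the base temperature of 5 °C, the simple-averaging GDD method, and the three-parameter logistic-shaped curve are assumptions, since the abstract gives neither the base temperature nor the exact sigmoid equation. The multi-factor (GDD plus canopy coverage) model is not reproduced because its form is not stated.

```python
import math

def daily_gdd(t_max, t_min, t_base=5.0):
    """Daily growing degree days by the simple averaging method.
    t_base = 5.0 degC is an assumed base temperature, not the study's value."""
    return max(0.0, (t_max + t_min) / 2.0 - t_base)

def cumulative_gdd(daily_temps, t_base=5.0):
    """Running GDD total from planting, for a list of (t_max, t_min) pairs."""
    total, series = 0.0, []
    for t_max, t_min in daily_temps:
        total += daily_gdd(t_max, t_min, t_base)
        series.append(total)
    return series

def sigmoid_growth(gdd, a, b, c):
    """Assumed logistic-shaped growth curve: asymptote a, steepness b, inflection at GDD = c."""
    return a / (1.0 + math.exp(-b * (gdd - c)))

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean squared error, in the units of the data (e.g. grams of fresh weight)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

# Hypothetical daily (max, min) air temperatures in degC
gdd = cumulative_gdd([(28.0, 18.0), (30.0, 20.0), (25.0, 15.0)])
print(gdd)  # [18.0, 38.0, 53.0]
```

In practice a, b, and c would be estimated by nonlinear least squares against measured fresh weight, and `r_squared` and `rmse` would then give fit statistics of the kind reported in the abstract.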

Long Term Follow Up of Interferon-alpha Treatment in Children with Chronic Hepatitis B (만성 B형간염 환아에 대한 Interferon-alpha 치료결과의 장기 추적관찰)

  • Baek, Seoung-Yon;Eom, Ji-Hyun;Chung, Ki-Sup
    • Pediatric Gastroenterology, Hepatology & Nutrition / v.6 no.2 / pp.140-151 / 2003
  • Purpose: We evaluated the long-term efficacy of interferon-alpha treatment in children with chronic hepatitis B and the factors predicting a positive response. Methods: The study population comprised 113 children who received interferon therapy for chronic hepatitis B at the Department of Pediatrics, Yonsei University College of Medicine, between May 1982 and July 2002 (20 years). The male-to-female ratio was 2.3:1, and the mean age at diagnosis was $11.1 \pm 4.1$ years. Response to treatment was defined as normalization of alanine aminotransferase (ALT) and disappearance of HBeAg and HBV-DNA. Eighty-two children responded, while 32 did not. Interferon-alpha was given intramuscularly at a dosage of $3 \times 10^6$ units three times weekly for 6 months. Relapsed cases were retreated with lamivudine or interferon. Results: After treatment, the seroconversion rate was 77.0% for HBeAg and 74.3% for HBV-DNA, and the ALT normalization rate was 80.5%. The rate of seroconversion of both HBeAg and HBV-DNA was 72.6%. Life-table analysis showed that the treatment effect was maintained for over 10 years after cessation of therapy. Pre-treatment ALT level was the only significant positive predictive factor of response. Eleven cases (13.4%) relapsed; 2 of 3 responded to lamivudine, and 1 of 3 responded to interferon retreatment. Conclusion: Interferon-alpha showed significant efficacy in the treatment of chronic hepatitis B in our study. Further studies on the effect of interferon therapy on complications of hepatitis, such as hepatocellular carcinoma and cirrhosis, are warranted.
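The life-table (actuarial) method mentioned in the results estimates, interval by interval, the proportion of responders whose response is still maintained. A minimal Python sketch under the standard actuarial assumption is shown below: subjects censored in an interval are at risk for half of it. The interval counts in any call are hypothetical, and this is not the study's analysis.

```python
def actuarial_life_table(n_start, intervals):
    """Actuarial (life-table) estimate of the proportion whose response is maintained.

    n_start: number of responders entering follow-up
    intervals: list of (relapses, censored) counts, one per follow-up interval
    Returns a list of (entering, effective_at_risk, q, cumulative) tuples.
    """
    entering = n_start
    cumulative = 1.0
    table = []
    for relapses, censored in intervals:
        # Censored subjects are assumed to be at risk for half the interval
        at_risk = entering - censored / 2.0
        q = relapses / at_risk if at_risk > 0 else 0.0  # conditional relapse probability
        cumulative *= 1.0 - q                            # product of interval "survival"
        table.append((entering, at_risk, q, cumulative))
        entering -= relapses + censored
    return table

# Hypothetical counts: (relapses, censored) in three successive yearly intervals
for row in actuarial_life_table(82, [(5, 2), (3, 10), (2, 20)]):
    print(row)
```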

Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront (비정형 정보와 CNN 기법을 활용한 이진 분류 모델의 고객 행태 예측: 전자상거래 사례를 중심으로)

  • Kim, Seungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.221-241 / 2018
  • Deep learning has recently attracted considerable attention. The deep learning technique applied in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and in AlphaGo is the convolutional neural network (CNN). A CNN divides the input image into small regions, recognizes local features in each, and combines them to recognize the image as a whole. Deep learning technologies are expected to bring many changes to our lives, but so far their applications have been limited to image recognition and natural language processing, and the use of deep learning for business problems is still at an early stage of research. If its performance is proven, deep learning can be applied to traditional business problems such as marketing response prediction, fraudulent transaction detection, and bankruptcy prediction. It is therefore a meaningful experiment to assess whether deep learning can solve business problems, using the case of online shopping companies, which hold big data, can identify customer behavior relatively easily, and can make high-value use of the results. In particular, the competitive environment of online shopping companies is changing rapidly and becoming more intense, so analyzing customer behavior to maximize profit is increasingly important for them. In this study, we propose a 'CNN model of Heterogeneous Information Integration' to improve the predictive power for customer behavior in online shopping enterprises. 
The proposed model is a CNN-based model with a multi-layer perceptron structure that learns from combined structured and unstructured information. To optimize its performance, we evaluated each of three architectural components, 'heterogeneous information integration', 'unstructured information vector conversion', and 'multi-layer perceptron design', and finalized the proposed model based on the results. The target variables for predicting customer behavior are defined as six binary classification problems: re-purchaser, churn, frequent shopper, frequent refund shopper, high-amount shopper, and high-discount shopper. To verify the usefulness of the proposed model, we conducted experiments using actual transaction, customer, and VOC (voice of the customer) data from a specific online shopping company in Korea. The data cover 47,947 customers who registered at least one VOC during January 2011 (1 month); their customer profiles, a total of 19 months of transaction data from September 2010 to March 2012, and the VOCs posted during that month were used. The experiment consists of two stages. In the first stage, we evaluate three architectures that affect the performance of the proposed model and select the optimal parameters. In the second stage, we evaluate the performance of the proposed model. Experimental results show that the proposed model, which combines structured and unstructured information, outperforms Naïve Bayes classification (NBC), support vector machines (SVM), and artificial neural networks (ANN). This indicates that unstructured information contributes to predicting customer behavior and that CNNs can be applied to business problems as well as to image recognition and natural language processing. 
The experiments also confirm that the CNN is more effective at understanding and interpreting the meaning of context in text VOC data. In addition, this empirical study on actual e-commerce data shows that very meaningful information for predicting customer behavior can be extracted from VOC text written directly by customers. Finally, the various experiments provide useful information for future research on parameter selection and model performance.
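The core text-CNN idea behind combining VOC text with structured data can be sketched in plain Python. Filters slide over word vectors, ReLU and max-over-time pooling produce one feature per filter, and the pooled features are concatenated with structured customer features for a binary prediction. This is an illustrative sketch, not the authors' architecture. The embeddings, filter weights, plain concatenation as "integration", and single logistic output (standing in for the paper's multi-layer perceptron) are all assumptions.

```python
import math

def conv1d_max_pool(token_vectors, filt, bias=0.0):
    """Slide one convolution filter over a sequence of word vectors,
    apply ReLU, and keep the largest activation (max-over-time pooling).

    token_vectors: list of equal-length embedding vectors, one per token
    filt: list of `width` weight vectors, same dimension as the tokens
    """
    width = len(filt)
    best = 0.0  # ReLU outputs are non-negative, so 0.0 is a safe starting maximum
    for start in range(len(token_vectors) - width + 1):
        window = token_vectors[start:start + width]
        s = bias + sum(w * x
                       for w_row, x_row in zip(filt, window)
                       for w, x in zip(w_row, x_row))
        best = max(best, max(0.0, s))
    return best

def predict_proba(token_vectors, structured, filters, weights, bias):
    """Concatenate pooled text features with structured features and apply a
    logistic output unit for one binary target (e.g. churn vs. no churn)."""
    features = [conv1d_max_pool(token_vectors, f) for f in filters] + list(structured)
    if len(weights) != len(features):
        raise ValueError("need one weight per feature")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 2-dimensional embeddings for a three-word VOC message
EMBEDDINGS = {"please": [0.0, 1.0], "refund": [1.0, 0.0], "slow": [1.0, 1.0]}
tokens = [EMBEDDINGS[w] for w in ["please", "refund", "slow"]]
bigram_filter = [[1.0, 0.0], [1.0, 0.0]]
print(conv1d_max_pool(tokens, bigram_filter))  # 2.0
```

In a trained model the embeddings, filters, and output weights would all be learned; here they are fixed by hand only to make the computation traceable.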

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.57-73 / 2021
  • Anomaly detection for maintaining ICT infrastructure and preventing failures is becoming increasingly important. System monitoring data are multidimensional time series, and handling them requires considering the characteristics of both multidimensional data and time series data. For multidimensional data, correlations between variables should be considered, and existing probability-based, linear, and distance-based methods degrade because of the curse of dimensionality. In addition, time series data are typically preprocessed with sliding windows and time series decomposition for autocorrelation analysis. These techniques increase the dimension of the data, so they need to be supplemented. Anomaly detection is a long-established research field: statistical methods and regression analysis were used in the early days, and there are currently active studies applying machine learning and artificial neural networks. Statistical methods are difficult to apply when data are non-homogeneous and do not detect local outliers well. Regression-based methods fit a regression equation based on parametric statistics and detect anomalies by comparing predicted and actual values. Their performance drops when the model is not robust or when the data contain noise or outliers, and they impose restrictions on using training data that contain noise or outliers. An autoencoder is an artificial neural network trained to make its output as similar as possible to its input. It has many advantages over existing probabilistic and linear models, cluster analysis, and supervised learning: it can be applied to data that do not satisfy a probability distribution or linearity assumption. 
In addition, it can be trained without labeled data (unsupervised learning). However, it has limited ability to identify local outliers in multidimensional data, and the characteristics of time series data can greatly increase the data dimension. In this study, we propose a Conditional Multimodal Autoencoder (CMAE) that improves anomaly detection performance by considering local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to overcome the limited identification of local outliers in multidimensional data. Multimodal models are commonly used to learn from different types of input, such as voice and images; the different modalities share the autoencoder's bottleneck, through which the model learns their correlations. In addition, a Conditional Autoencoder (CAE) was used to learn the characteristics of time series data effectively without increasing the data dimension. Conditional inputs are generally categorical variables, but in this study time was used as the condition so that the model could learn periodicity. The proposed CMAE model was verified by comparison with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). Reconstruction performance for 41 variables was examined for the proposed and comparison models. It differed by variable: the Memory, Disk, and Network modalities had small loss values and were reconstructed well in all three autoencoder models. The Process modality showed no significant difference among the three models, and the CPU modality performed best with CMAE. ROC curves were plotted to evaluate the anomaly detection performance of the proposed and comparison models, and AUC, accuracy, precision, recall, and F1-score were compared. On all metrics, CMAE performed best, followed by MAE and then UAE. 
In particular, CMAE achieved a recall of 0.9828, detecting almost all anomalies. Its accuracy also improved, to 87.12%, and its F1-score of 0.8883 is considered suitable for anomaly detection. In practice, the proposed model offers an additional advantage beyond performance. Techniques such as time series decomposition and sliding windows add procedures that must be managed, and the dimension increase they cause can slow inference. The proposed model avoids these steps, making it easy to apply to practical tasks in terms of inference speed and model management.
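The surrounding steps of the CMAE pipeline can be sketched in Python without the network itself: attaching a time condition to the input, scoring samples by reconstruction error, thresholding, and computing the precision, recall, and F1-score reported above. The encoder and decoder are omitted, and reconstructions are taken as given. The sin/cos encoding of hour of day and the fixed threshold are assumptions for illustration, since the abstract says only that time was used as the condition.

```python
import math

def time_condition(hour_of_day):
    """Encode time of day as a point on the unit circle, so 23:00 and 00:00 are close.
    This cyclic encoding is an assumption; the study only says time was the condition."""
    angle = 2.0 * math.pi * hour_of_day / 24.0
    return [math.sin(angle), math.cos(angle)]

def conditioned_input(metrics, hour_of_day):
    """Append the time condition to one modality's metrics instead of stacking
    sliding-window lags, so the input grows by 2 values rather than by the window size."""
    return list(metrics) + time_condition(hour_of_day)

def reconstruction_error(x, x_hat):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def detect(errors, threshold):
    """Flag samples whose reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]

def precision_recall_f1(predicted, actual):
    """Precision, recall, and F1-score for boolean anomaly flags."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

In a conditional autoencoder the same condition vector is typically fed to both the encoder and the decoder, which lets the network learn daily periodicity without lagged copies of the input.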

Comparative analysis of activation functions of artificial neural network for prediction of optimal groundwater level in the middle mountainous area of Pyoseon watershed in Jeju Island (제주도 표선유역 중산간지역의 최적 지하수위 예측을 위한 인공신경망의 활성화함수 비교분석)

  • Shin, Mun-Ju;Kim, Jin-Woo;Moon, Duk-Chul;Lee, Jeong-Han;Kang, Kyung Goo
    • Journal of Korea Water Resources Association / v.54 no.spc1 / pp.1143-1154 / 2021
  • The choice of activation function strongly influences the groundwater level prediction performance of an artificial neural network (ANN) model. In this study, five activation functions were applied to an ANN model for two groundwater level observation wells in the middle mountainous area of the Pyoseon watershed on Jeju Island. The resulting groundwater level predictions were compared and analyzed to derive the optimal activation function. In addition, the results of the ANN models with each activation function were compared with those of a long short-term memory (LSTM) model, a widely used recurrent neural network model. ELU and Leaky ReLU were derived as the optimal activation functions for the observation well with relatively large groundwater level fluctuations and the one with relatively small fluctuations, respectively. In contrast, the sigmoid function had the lowest predictive performance of the five activation functions for the training period and produced inappropriate results when predicting peak and lowest groundwater levels. The ANN-ELU and ANN-Leaky ReLU models showed groundwater level prediction performance comparable to that of the LSTM model and thus have sufficient potential for application. The methods and results of this study can be useful for other studies.
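For reference, the three activation functions named above can be written in Python as below, together with a one-hidden-layer forward pass that takes the activation as a parameter. The ELU scale alpha = 1.0 and the Leaky ReLU slope 0.01 are common defaults assumed here, not values reported in the study. The other two of the five functions tested are not named in the abstract and are omitted.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function (avoids overflow for large |z|)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def elu(z, alpha=1.0):
    # Identity for z > 0; smooth exponential saturation toward -alpha for z <= 0
    return z if z > 0 else alpha * (math.exp(z) - 1.0)

def leaky_relu(z, slope=0.01):
    # Identity for z > 0; small linear slope instead of zero for z <= 0
    return z if z > 0 else slope * z

def ann_forward(x, w_hidden, b_hidden, w_out, b_out, act):
    """One hidden layer with activation `act` and a linear output
    (e.g. a predicted groundwater level)."""
    hidden = [act(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out
```

Because sigmoid saturates at 0 and 1 while ELU and Leaky ReLU stay linear for positive inputs, the latter pass extreme values through more directly. This is one plausible reason for the differences in peak prediction reported above, though the study does not state this explanation.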

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.239-251 / 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market prediction. However, these statistical methods have not produced superior performance. In recent years, machine learning techniques such as artificial neural networks, SVM, and genetic algorithms have been widely used in stock market prediction. In particular, k-nearest neighbor (k-NN), a case-based reasoning method, is widely used for stock price prediction. When a new problem occurs, case-based reasoning retrieves several similar cases from previous cases and combines their class labels to classify the new problem. However, case-based reasoning has some problems. First, it searches for a fixed number of neighbors in the observation space, always selecting the same number of neighbors rather than the most similar neighbors for the target case. It may therefore take more cases into account even when fewer cases are applicable to the subject. Second, it may select neighbors that are far away from the target case. Case-based reasoning thus does not guarantee an optimal pseudo-neighborhood for various target cases, and predictability can degrade when the selected neighbors deviate from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability with k-nearest neighbor and compares the predictability of k-nearest neighbor with that of the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted using two learning datasets. To predict the next day's closing price, we used four variables: opening price, daily high, daily low, and closing price. 
In the first experiment, data from January 1, 2000 to December 31, 2017 were used for learning. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for learning. The test data cover January 1, 2018 to August 31, 2018 for both experiments. We compared the performance of k-NN with that of the random walk model using the two learning datasets. In the first experiment, when the learning data were small, the mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN. In the second experiment, when the learning data were large, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN. These results show that predictive power is higher when more learning data are used than when less learning data are used. This paper also shows that k-NN generally produces better predictive power than the random walk model for larger learning datasets but not when the learning dataset is relatively small. Future studies should consider macroeconomic variables related to stock price forecasting in addition to the opening, low, high, and closing prices. To produce better results, it is also recommended that k-NN find nearest neighbors using a second-step filtering method that considers fundamental economic variables, together with a sufficient amount of learning data.
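The k-NN versus random walk comparison can be sketched in Python. Each day's (open, high, low, close) vector is paired with the next day's close. k-NN averages the next-day closes of the k most similar past days. The random walk predicts that tomorrow's close equals today's, and MAPE scores both. This is a minimal sketch assuming raw, unnormalized prices, Euclidean distance, and an unweighted mean of neighbors; the paper's distance measure, scaling, and choice of k are not given in the abstract.

```python
import math

def build_cases(ohlc):
    """Pair each day's (open, high, low, close) with the next day's close."""
    return [(ohlc[t], ohlc[t + 1][3]) for t in range(len(ohlc) - 1)]

def knn_predict(cases, query, k):
    """Average next-day close of the k cases whose OHLC vectors are nearest
    (Euclidean distance) to today's OHLC vector."""
    nearest = sorted(cases, key=lambda case: math.dist(case[0], query))[:k]
    return sum(next_close for _, next_close in nearest) / len(nearest)

def random_walk_predict(query):
    """Random walk benchmark: tomorrow's close equals today's close."""
    return query[3]

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

Growing the learning set simply adds more `(features, next_close)` pairs to `cases`. The paper's finding is that with enough such history, the nearest neighbors become informative enough to beat the random walk.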

Abnormal Water Temperature Prediction Model Near the Korean Peninsula Using LSTM (LSTM을 이용한 한반도 근해 이상수온 예측모델)

  • Choi, Hey Min;Kim, Min-Kyu;Yang, Hyun
    • Korean Journal of Remote Sensing / v.38 no.3 / pp.265-282 / 2022
  • Sea surface temperature (SST) is a factor that greatly influences ocean circulation and ecosystems in the Earth system. As global warming changes the SST near the Korean Peninsula, abnormal water temperature phenomena (high and low water temperatures) occur, causing continuous damage to the marine ecosystem and the fishery industry. This study therefore proposes a methodology to predict the SST near the Korean Peninsula and to prevent damage by predicting abnormal water temperature phenomena. The study area was set near the Korean Peninsula, and ERA5 SST data for the same period from the European Centre for Medium-Range Weather Forecasts (ECMWF) were used. Considering the time series characteristics of SST data, we used the Long Short-Term Memory (LSTM) algorithm, a deep learning model specialized for time series prediction. The model predicts SST near the Korean Peninsula 1 to 7 days ahead and predicts high or low water temperature phenomena. The accuracy of SST prediction was evaluated with the coefficient of determination (R²), root mean squared error (RMSE), and mean absolute percentage error (MAPE). For 1-day predictions, the summer (JAS) results were R² = 0.996, RMSE = 0.119 ℃, and MAPE = 0.352%, and the winter (JFM) results were R² = 0.999, RMSE = 0.063 ℃, and MAPE = 0.646%. The accuracy of abnormal SST prediction based on the predicted SST was evaluated with the F1 score: 0.98 for high water temperature prediction in summer (2021/08/05) and 1.0 for low water temperature prediction in winter (2021/02/19). As the prediction period increased, the model tended to underestimate SST, which also reduced the accuracy of abnormal water temperature prediction. 
Future work should therefore analyze the cause of this underestimation and improve prediction accuracy.
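The recurrence at the core of the LSTM model can be illustrated with a single scalar LSTM cell, sketched below in Python. This is a generic textbook LSTM with input, forget, and output gates and a candidate cell state, not the authors' network. The weight-dictionary layout, zero initial state, linear readout, and the threshold-based `label_event` helper with its thresholds are hypothetical, since the abstract states neither the architecture nor the criteria for high and low water temperature.

```python
import math

GATES = ("i", "f", "o", "g")

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell.
    p maps 'w<gate>', 'u<gate>', 'b<gate>' to input weight, recurrent weight, bias."""
    pre = {k: p["w" + k] * x + p["u" + k] * h_prev + p["b" + k] for k in GATES}
    i = sigmoid(pre["i"])      # input gate
    f = sigmoid(pre["f"])      # forget gate
    o = sigmoid(pre["o"])      # output gate
    g = math.tanh(pre["g"])    # candidate cell state
    c = f * c_prev + i * g     # new cell state
    h = o * math.tanh(c)       # new hidden state
    return h, c

def run_lstm(sequence, p, readout_w, readout_b):
    """Feed a sequence (e.g. normalized daily SST) and map the final hidden
    state to a one-step-ahead prediction with a linear readout."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
    return readout_w * h + readout_b

def label_event(sst, high_threshold, low_threshold):
    """Hypothetical rule: flag a predicted SST as a high or low water temperature event."""
    if sst >= high_threshold:
        return "high"
    if sst <= low_threshold:
        return "low"
    return "normal"
```

The forget gate lets the cell state carry slowly varying seasonal information across many steps, which is why LSTMs suit series like daily SST. Multi-day forecasts are typically produced either with one output per lead time or by feeding predictions back in, and the latter lets errors compound as the lead time grows.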

Development and Testing of a RIVPACS-type Model to Assess the Ecosystem Health in Korean Streams: A Preliminary Study (저서성 대형무척추동물을 이용한 RIVPACS 유형의 하천생태계 건강성 평가법 국내 하천 적용성)

  • Da-Yeong Lee;Dae-Seong Lee;Joong-Hyuk Min;Young-Seuk Park
    • Korean Journal of Ecology and Environment / v.56 no.1 / pp.45-56 / 2023
  • In stream ecosystem assessment, RIVPACS (River Invertebrate Prediction and Classification System), which provides a simple but clear evaluation based on the macroinvertebrate community, is widely used. In this study, a preliminary study was conducted to develop a RIVPACS-type model suitable for Korean streams nationwide. Reference streams were classified into two types (upstream and downstream), and a family-level prediction model for macroinvertebrates was developed for each type. Data for the upstream model were split 7:3 into training and test sets, and the downstream model was built using a leave-one-out method. Model variables were selected by non-metric multidimensional scaling, and seven were chosen: elevation, slope, annual average temperature, stream width, forest ratio in land use, riffle ratio in hydrological characteristics, and boulder ratio in substrate composition. Based on stream order, 3,224 sites were classified as upstream or downstream, and the community composition of each site was predicted for 30 macroinvertebrate families. Expected (E) and observed (O) fauna were compared using the ASPT (average score per taxon) biotic index, which is computed by dividing the BMWPK score by the number of families in a community. Ecological quality ratio (EQR) values for ASPT (i.e., O/E) were used to assess stream condition. Lastly, we compared EQR with BMI, an index commonly used in such assessments. The average observed ASPT was 4.82 (±2.04 SD), lower than the expected ASPT of 6.30 (±0.79 SD). EQR generally showed higher values than the BMI index.
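The ASPT and EQR calculations above reduce to a few lines of Python. Observed ASPT divides the summed family scores by the number of scoring families present. Expected ASPT is computed here with the commonly described RIVPACS convention, assumed rather than taken from this study: expected score total (sum of probability times score) divided by expected family count (sum of probabilities). EQR is their ratio. The family names and scores in the test are hypothetical, not actual BMWPK values.

```python
def observed_aspt(families_present, scores):
    """ASPT: sum of family scores divided by the number of scoring families present."""
    scored = [scores[f] for f in families_present if f in scores]
    return sum(scored) / len(scored) if scored else 0.0

def expected_aspt(probabilities, scores):
    """Expected ASPT from predicted probabilities of occurrence:
    sum(p * score) / sum(p) over scoring families (assumed RIVPACS convention)."""
    exp_score = sum(p * scores[f] for f, p in probabilities.items() if f in scores)
    exp_n = sum(p for f, p in probabilities.items() if f in scores)
    return exp_score / exp_n if exp_n else 0.0

def eqr(observed, expected):
    """Ecological quality ratio, O/E; values near 1 indicate reference-like condition."""
    return observed / expected
```

Because expected ASPT is a probability-weighted average, families the model considers unlikely at a site contribute little, so the site is judged against what its physical setting predicts rather than against a fixed national list.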

A study on solar radiation prediction using medium-range weather forecasts (중기예보를 이용한 태양광 일사량 예측 연구)

  • Sujin Park;Hyojeoung Kim;Sahm Kim
    • The Korean Journal of Applied Statistics / v.36 no.1 / pp.49-62 / 2023
  • Solar energy, whose share is increasing rapidly, continues to attract development and investment. With the Green New Deal renewable energy policy and the growing installation of home solar panels, the supply of solar energy in Korea is gradually expanding, and research on accurately predicting power generation demand is actively underway. Solar radiation prediction is important because solar radiation is the factor that most influences power generation demand prediction. The main difference of this study from previous ones is that it attempts to predict solar radiation using medium-range forecast weather data, which previous studies did not use. We combined multiple linear regression, KNN, random forest, and SVR models with the K-means clustering technique to predict hourly solar radiation by calculating a probability density function for each cluster. Before applying medium-range forecast data, the models' prediction results were compared using mean absolute error (MAE) and root mean squared error (RMSE). Data from March 1, 2017 to February 28, 2022 were converted into daily data to match the medium-range forecast data format. In the comparison of predictive performance, the best method predicted daily solar radiation with random forest, grouped dates with similar climate factors, and calculated the probability density function of solar radiation for each cluster. When this methodology was applied to medium-range forecast data, the prediction error increased for later forecast dates. This appears to be due to prediction errors in the medium-range forecast weather data. 
Future studies should add exogenous variables available in the medium-range forecast data, such as precipitation, or apply time series clustering techniques.
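The clustering step can be sketched in Python with Lloyd's K-means algorithm on daily climate features. Each cluster then gets an hourly distribution profile that spreads a predicted daily total across the hours. The abstract does not say how the per-cluster probability density was built or applied. Treating it as the average hourly share of daily radiation within each cluster is an assumption, one plausible reading of "predict solar radiation by hour". The daily total itself, which the paper predicts with random forest, is taken as an input here.

```python
import math
import random

def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for each point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; points are tuples of daily climate features."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster becomes empty
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

def hourly_profile(hourly_days):
    """Assumed per-cluster density: average share of daily radiation in each hour."""
    shares = [0.0] * len(hourly_days[0])
    for day in hourly_days:
        total = sum(day)
        for h, v in enumerate(day):
            shares[h] += v / total / len(hourly_days)
    return shares

def hourly_forecast(daily_total, profile):
    """Distribute a predicted daily total across hours using a cluster's profile."""
    return [daily_total * s for s in profile]
```

A forecast day would be assigned to the cluster whose centroid is nearest its forecast climate features, and that cluster's profile would convert the daily prediction into hourly values.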

Multivariate Analysis of Predictive Factors for the Severity in Stable Patients with Severe Injury Mechanism (중증 손상 기전의 안정된 환자에서 중증도 예측 인자들에 대한 다변량 분석)

  • Lee, Jae Young;Lee, Chang Jae;Lee, Hyoung Ju;Chung, Tae Nyoung;Kim, Eui Chung;Choi, Sung Wook;Kim, Ok Jun;Cho, Yun Kyung
    • Journal of Trauma and Injury / v.25 no.2 / pp.49-56 / 2012
  • Purpose: The prognosis of critically injured patients depends on transport to medical facilities capable of proper assessment and management, on rapid assessment and decision-making, and on aggressive resuscitation. Given the high mortality and morbidity rates of critically injured patients, various studies have sought to reduce those rates. However, studies on diagnostic factors for predicting severity in critically injured patients are still lacking. Furthermore, patients who are injured by a severe trauma mechanism but show stable vital signs and alert mental status may be at risk of not receiving rapid assessment and management. This study therefore investigates diagnostic factors, including physical examination and laboratory results, that may help predict severity in trauma patients injured by a severe trauma mechanism but showing stable vital signs. Methods: All trauma patients from March 2010 to December 2011 who met a diagnostic category that activated the major trauma team at CHA Bundang Medical Center were analyzed retrospectively. The retrospective analysis was based on prospective medical records completed on arrival at the emergency department and on sequential laboratory test results. PASW Statistics 18 (SPSS Inc., Chicago, IL, USA) was used for the statistical analysis. Patients with relatively stable vital signs and alert mental status were selected based on a revised trauma score of more than 7 points. Major trauma was finally diagnosed based on an injury severity score of greater than 16 points. Diagnostic variables included systolic blood pressure, respiratory rate, Glasgow Coma Scale, initial focused abdominal sonography for trauma results, and blood test and urine analysis results. 
To check whether the measured values were normally distributed, we applied the Kolmogorov-Smirnov one-sample test and the Shapiro-Wilk test. When normality was confirmed, Student's t-test was used for comparison; otherwise, the Mann-Whitney U test was used. The results of focused abdominal sonography for trauma (FAST) and urine analysis factors were analyzed using the chi-square test or Fisher's exact test. Variables with statistical significance were selected as prognostic factors and analyzed with a multivariate logistic regression model. Results: A total of 269 patients activated the major trauma team. After 91 patients with a revised trauma score of less than 7 points were excluded, the remaining 178 patients were classified by injury severity score into major and minor trauma patients. Twenty-one of the 106 major trauma patients and 9 of the 72 minor trauma patients were further excluded because of missing medical records or missing blood and urine tests. The variables with p-values less than 0.05 were the Glasgow Coma Scale, respiratory rate, white blood cell count (WBC), serum AST and ALT, serum creatinine, blood in spot urine, and protein in spot urine, which could therefore be prognostic factors in major trauma patients. Multivariate logistic regression on these 8 variables showed respiratory rate (p=0.034), WBC (p=0.005), and blood in spot urine (p=0.041) to be independent prognostic factors for predicting the clinical course of major trauma patients. Conclusion: In trauma patients injured by a severe trauma mechanism but showing stable vital signs and alert mental status, respiratory rate, WBC count, and blood in the urine can be used as predictive factors for severity. Rapid assessment of major trauma patients using these results may shorten the time to diagnosis and treatment.
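The two patient-selection thresholds in the methods rest on standard scoring formulas: a revised trauma score above 7 for stable patients and an injury severity score above 16 for major trauma. Both are sketched in Python below. The coding bands and weights follow the commonly published Revised Trauma Score, and the ISS follows its standard definition. Both come from general knowledge, not from this abstract, and the helper names and example values are hypothetical.

```python
def code_gcs(gcs):
    # Glasgow Coma Scale -> coded value 0-4
    if gcs >= 13: return 4
    if gcs >= 9: return 3
    if gcs >= 6: return 2
    if gcs >= 4: return 1
    return 0

def code_sbp(sbp):
    # Systolic blood pressure (mmHg) -> coded value 0-4
    if sbp > 89: return 4
    if sbp >= 76: return 3
    if sbp >= 50: return 2
    if sbp >= 1: return 1
    return 0

def code_rr(rr):
    # Respiratory rate (breaths/min) -> coded value 0-4
    if 10 <= rr <= 29: return 4
    if rr > 29: return 3
    if rr >= 6: return 2
    if rr >= 1: return 1
    return 0

def revised_trauma_score(gcs, sbp, rr):
    """Weighted RTS; the maximum is 7.8408 for fully normal coded values."""
    return 0.9368 * code_gcs(gcs) + 0.7326 * code_sbp(sbp) + 0.2908 * code_rr(rr)

def injury_severity_score(max_ais_by_region):
    """Sum of squares of the three highest AIS values from different body regions;
    by convention any AIS of 6 sets ISS to 75."""
    values = sorted(max_ais_by_region.values(), reverse=True)
    if values and values[0] == 6:
        return 75
    return sum(v * v for v in values[:3])

def is_stable(gcs, sbp, rr):
    return revised_trauma_score(gcs, sbp, rr) > 7      # selection rule in the methods

def is_major_trauma(max_ais_by_region):
    return injury_severity_score(max_ais_by_region) > 16  # major trauma rule in the methods
```

With these weights, a patient can exceed an RTS of 7 with one mildly abnormal sign (for example, a respiratory rate above 29 still gives 7.55). This is consistent with the study's point that such patients can appear stable despite a severe injury mechanism.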