• Title/Summary/Keyword: predicting method


Abnormal Water Temperature Prediction Model Near the Korean Peninsula Using LSTM (LSTM을 이용한 한반도 근해 이상수온 예측모델)

  • Choi, Hey Min;Kim, Min-Kyu;Yang, Hyun
    • Korean Journal of Remote Sensing / v.38 no.3 / pp.265-282 / 2022
  • Sea surface temperature (SST) is a factor that greatly influences ocean circulation and ecosystems in the Earth system. As global warming changes the SST near the Korean Peninsula, abnormal water temperature phenomena (high and low water temperatures) occur, causing continuous damage to the marine ecosystem and the fishery industry. This study therefore proposes a methodology for predicting the SST near the Korean Peninsula and preventing damage by forecasting abnormal water temperature events. The study area was set near the Korean Peninsula, and ERA5 reanalysis data from the European Centre for Medium-Range Weather Forecasts (ECMWF) were used to obtain SST data for the same period. Given the time-series character of SST data, the Long Short-Term Memory (LSTM) algorithm, a deep learning model specialized for time-series prediction, was used. The model predicts the SST near the Korean Peninsula 1 to 7 days ahead and flags high or low water temperature events. Prediction accuracy was evaluated with the coefficient of determination (R²), root mean squared error (RMSE), and mean absolute percentage error (MAPE). For 1-day predictions, the model achieved R² = 0.996, RMSE = 0.119℃, and MAPE = 0.352% in summer (JAS), and R² = 0.999, RMSE = 0.063℃, and MAPE = 0.646% in winter (JFM). Using the predicted SST, abnormal water temperature prediction was evaluated with the F1 score: F1 = 0.98 for high water temperature in summer (2021/08/05) and F1 = 1.0 for low water temperature in winter (2021/02/19). As the prediction horizon lengthened, the model tended to underestimate the SST, which also reduced the accuracy of the abnormal water temperature prediction. Future work should therefore analyze the cause of this underestimation and improve prediction accuracy.
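
The three accuracy indicators the abstract reports (R², RMSE, MAPE) can be sketched in a few lines; the observed and predicted SST values below are hypothetical stand-ins, not the paper's data.

```python
import numpy as np

def evaluate_sst(y_true, y_pred):
    """R^2, RMSE, and MAPE, the three indicators used in the paper."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0
    return r2, rmse, mape

# Hypothetical observed vs. 1-day-ahead predicted SST (degC)
obs = [24.1, 24.3, 24.8, 25.0, 25.4]
pred = [24.0, 24.4, 24.7, 25.1, 25.3]
r2, rmse, mape = evaluate_sst(obs, pred)
```

Note that MAPE is scale-sensitive: for SST around 25℃ a 0.1℃ error is roughly 0.4%, which is why the reported MAPE values are well under 1%.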

Development of tracer concentration analysis method using drone-based spatio-temporal hyperspectral image and RGB image (드론기반 시공간 초분광영상 및 RGB영상을 활용한 추적자 농도분석 기법 개발)

  • Gwon, Yeonghwa;Kim, Dongsu;You, Hojun;Han, Eunjin;Kwon, Siyoon;Kim, Youngdo
    • Journal of Korea Water Resources Association / v.55 no.8 / pp.623-634 / 2022
  • River maintenance projects such as the creation of hydrophilic areas and the Four Rivers Project have continuously altered river flow characteristics, and the risk of water quality accidents from the inflow of various pollutants is increasing. When a water quality accident occurs, the downstream impact must be minimized by predicting the concentration and arrival time of pollutants in light of the river's flow characteristics. Tracking pollutant behavior requires calculating diffusion and dispersion coefficients for each river section; the dispersion coefficient, in particular, is used to analyze the spreading range of soluble pollutants. Existing experimental approaches to pollutant tracking demand substantial manpower and cost, and limited equipment operation makes spatially high-resolution data difficult to obtain. Recent studies have tracked contaminants with RGB drones, but RGB imagery collects only limited spectral information. To address these limitations, this study mounted a hyperspectral sensor on a drone-based remote sensing platform to collect data at higher temporal and spatial resolution than conventional contact measurement. From the collected spatio-temporal hyperspectral images, the tracer concentration was calculated and the transverse dispersion coefficient was derived. By overcoming the limitations of the drone platform and refining the dispersion coefficient calculation in future work, it should become possible to detect various pollutants leaking into the water system and to track changes in water quality parameters and river factors.
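
A transverse dispersion coefficient is commonly estimated by the method of moments, taking half the growth rate of the variance of the cross-channel concentration profile between two snapshots. The sketch below illustrates that idea with synthetic Gaussian tracer profiles; the paper's actual procedure from hyperspectral imagery may differ.

```python
import numpy as np

def transverse_variance(y, c):
    """Second central moment of a transverse concentration profile c(y),
    assuming a uniform grid in y (grid spacing cancels in the ratios)."""
    c = np.asarray(c, float)
    y = np.asarray(y, float)
    m0 = c.sum()                         # total tracer mass (per unit length)
    ybar = (y * c).sum() / m0            # plume centroid
    return ((y - ybar) ** 2 * c).sum() / m0

def transverse_dispersion(var1, var2, t1, t2):
    """Method of moments: Dy ~ 0.5 * d(sigma^2)/dt between two snapshots."""
    return 0.5 * (var2 - var1) / (t2 - t1)

# Synthetic Gaussian tracer profiles across the channel (y in m, t in s)
y = np.linspace(-10.0, 10.0, 401)
c1 = np.exp(-y**2 / (2 * 1.0**2))   # sigma^2 = 1 m^2 at t = 100 s
c2 = np.exp(-y**2 / (2 * 2.0**2))   # sigma^2 = 4 m^2 at t = 400 s
Dy = transverse_dispersion(transverse_variance(y, c1),
                           transverse_variance(y, c2), 100.0, 400.0)
# Expect Dy close to 0.5 * (4 - 1) / 300 = 0.005 m^2/s
```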

A study on solar radiation prediction using medium-range weather forecasts (중기예보를 이용한 태양광 일사량 예측 연구)

  • Sujin Park;Hyojeoung Kim;Sahm Kim
    • The Korean Journal of Applied Statistics / v.36 no.1 / pp.49-62 / 2023
  • Solar energy, whose share of generation is rapidly increasing, is the object of continuous development and investment. With the Green New Deal renewable-energy policy and growing home solar panel installations, the supply of solar energy in Korea is gradually expanding, and research on accurately forecasting power generation demand is active. Solar radiation prediction matters because radiation is the factor that most strongly influences generation forecasts. The main novelty of this study is that it predicts solar radiation using medium-range forecast weather data, which previous studies have not used. We combined multiple linear regression, KNN, random forest, and SVR models with the K-means clustering technique to predict hourly solar radiation, calculating a probability density function of radiation for each cluster. Mean absolute error (MAE) and root mean squared error (RMSE) were used to compare model predictions before applying the medium-range forecast data. Data from March 1, 2017 to February 28, 2022 were converted to daily values matching the medium-range forecast format. Comparing predictive performance, the best approach predicted daily solar radiation with a random forest, grouped dates with similar climate factors, and calculated the probability density function of radiation per cluster. When the fitted model was applied to the medium-range forecast data, the prediction error grew with forecast date, which appears to stem from errors in the medium-range forecast weather data itself. Future studies should add exogenous variables available in the medium-range forecasts, such as precipitation, or apply time-series clustering techniques.
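
A loose sketch of the cluster-then-predict idea (group days with similar climate factors, then fit a model per cluster) is shown below with synthetic data; the paper additionally estimates a probability density function of radiation per cluster, which this sketch omits.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical daily climate factors: [temperature, humidity, cloud cover]
X = rng.normal(size=(300, 3))
# Hypothetical daily solar radiation driven by two of the factors
y = 5.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.3, size=300)

# Step 1: group dates with similar climate factors
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Step 2: fit one random forest per cluster of similar dates
models = {k: RandomForestRegressor(n_estimators=100, random_state=0)
              .fit(X[km.labels_ == k], y[km.labels_ == k])
          for k in range(4)}

# Predict new days by routing each to its own cluster's model
x_new = X[:5]
preds = np.array([models[lbl].predict(x_new[i:i + 1])[0]
                  for i, lbl in enumerate(km.predict(x_new))])
mae = np.mean(np.abs(preds - y[:5]))
```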

Radiomics Analysis of Gray-Scale Ultrasonographic Images of Papillary Thyroid Carcinoma > 1 cm: Potential Biomarker for the Prediction of Lymph Node Metastasis (Radiomics를 이용한 1 cm 이상의 갑상선 유두암의 초음파 영상 분석: 림프절 전이 예측을 위한 잠재적인 바이오마커)

  • Hyun Jung Chung;Kyunghwa Han;Eunjung Lee;Jung Hyun Yoon;Vivian Youngjean Park;Minah Lee;Eun Cho;Jin Young Kwak
    • Journal of the Korean Society of Radiology / v.84 no.1 / pp.185-196 / 2023
  • Purpose This study aimed to investigate radiomics analysis of ultrasonographic images to develop a potential biomarker for predicting lymph node metastasis in papillary thyroid carcinoma (PTC) patients. Materials and Methods This study included 431 PTC patients from August 2013 to May 2014, divided into training and validation sets. A total of 730 radiomics features were extracted, including texture matrices (gray-level co-occurrence matrix and gray-level run-length matrix) and single-level discrete two-dimensional wavelet transform features, among others. The least absolute shrinkage and selection operator (LASSO) method was used to select the most predictive features in the training set. Results Lymph node metastasis was associated with the radiomics score (p < 0.001), as well as with clinical variables such as young age (p = 0.007) and large tumor size (p = 0.007). The area under the receiver operating characteristic curve was 0.687 (95% confidence interval: 0.616-0.759) for the training set and 0.650 (95% confidence interval: 0.575-0.726) for the validation set. Conclusion This study showed the potential of ultrasonography-based radiomics to predict cervical lymph node metastasis in patients with PTC; ultrasonography-based radiomics may thus serve as a biomarker for PTC.
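
The LASSO-style selection step can be sketched with an L1-penalized logistic regression, which shrinks uninformative coefficients to exactly zero. The radiomics matrix below is synthetic, and the penalty strength C is an arbitrary choice for illustration, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic radiomics matrix: 300 patients x 50 texture features,
# with signal about nodal metastasis only in the first three features
X = rng.normal(size=(300, 50))
signal = 1.2 * X[:, 0] - 0.8 * X[:, 1] + 0.5 * X[:, 2]
y = (signal + rng.normal(size=300) > 0).astype(int)

train, val = slice(0, 200), slice(200, 300)
scaler = StandardScaler().fit(X[train])
Xs = scaler.transform(X)

# L1 (LASSO-type) penalty zeroes out most uninformative features
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(Xs[train], y[train])

selected = np.flatnonzero(clf.coef_[0])        # surviving feature indices
auc = roc_auc_score(y[val], clf.decision_function(Xs[val]))
```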

Prediction of Amyloid β-Positivity with both MRI Parameters and Cognitive Function Using Machine Learning (뇌 MRI와 인지기능평가를 이용한 아밀로이드 베타 양성 예측 연구)

  • Hye Jin Park;Ji Young Lee;Jin-Ju Yang;Hee-Jin Kim;Young Seo Kim;Ji Young Kim;Yun Young Choi
    • Journal of the Korean Society of Radiology / v.84 no.3 / pp.638-652 / 2023
  • Purpose To investigate MRI markers for predicting amyloid β (Aβ)-positivity in mild cognitive impairment (MCI) and Alzheimer's disease (AD), and to evaluate differences in MRI markers between Aβ-positive (Aβ [+]) and Aβ-negative (Aβ [-]) groups using machine learning (ML). Materials and Methods This study included 139 patients with MCI and AD who underwent amyloid PET-CT and brain MRI. Patients were divided into Aβ (+) (n = 84) and Aβ (-) (n = 55) groups. Visual analysis used the Fazekas scale of white matter hyperintensity (WMH) and cerebral microbleed (CMB) scores; WMH volume and regional brain volumes were measured quantitatively. Multivariable logistic regression and ML (support vector machine and logistic regression) were used to identify the best MRI predictors of Aβ-positivity. Results The Fazekas scale of WMH (p = 0.02) and CMB scores (p = 0.04) were higher in the Aβ (+) group. The volumes of the hippocampus, entorhinal cortex, and precuneus were smaller (p < 0.05), and the third ventricle volume was larger (p = 0.002), in the Aβ (+) group. The ML logistic regression showed good accuracy (81.1%) using the mini-mental state examination (MMSE) and regional brain volumes. Conclusion Applying ML with the MMSE, third ventricle volume, and hippocampal volume helps predict Aβ-positivity with good accuracy.
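
A minimal sketch of the cross-validated ML comparison, using synthetic stand-ins for the three predictors the conclusion names (MMSE, third-ventricle volume, hippocampal volume); the cohort size matches the paper, but the group effect sizes and units are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 139  # same cohort size as the paper; the values themselves are invented

abeta = rng.integers(0, 2, size=n)                       # 1 = Abeta(+)
mmse = 26 - 3 * abeta + rng.normal(scale=2.0, size=n)    # lower in Abeta(+)
v3 = 1.5 + 0.6 * abeta + rng.normal(scale=0.3, size=n)   # larger in Abeta(+)
hip = 3.5 - 0.5 * abeta + rng.normal(scale=0.3, size=n)  # smaller in Abeta(+)
X = np.column_stack([mmse, v3, hip])

# 5-fold cross-validated accuracy of a scaled logistic regression
clf = make_pipeline(StandardScaler(), LogisticRegression())
acc = cross_val_score(clf, X, abeta, cv=5).mean()
```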

Development of Kimchi Cabbage Growth Prediction Models Based on Image and Temperature Data (영상 및 기온 데이터 기반 배추 생육예측 모형 개발)

  • Min-Seo Kang;Jae-Sang Shim;Hye-Jin Lee;Hee-Ju Lee;Yoon-Ah Jang;Woo-Moon Lee;Sang-Gyu Lee;Seung-Hwan Wi
    • Journal of Bio-Environment Control / v.32 no.4 / pp.366-376 / 2023
  • This study was conducted to develop a model for predicting the growth of kimchi cabbage using image data and environmental data. Kimchi cabbages of the 'Cheongmyeong Gaual' variety were planted three times, on July 11th, 19th, and 27th, at a test field in Pyeongchang-gun, Gangwon-do (37°37' N 128°32' E, 510 m elevation), and growth, image, and environmental data were collected until September 12th. To select key factors for the growth prediction model, a correlation analysis was conducted on the collected growth and meteorological data. Fresh weight was highly correlated with both growing degree days (GDD) and integrated solar radiation, each with a correlation coefficient of 0.88, and was also significantly correlated with plant height and leaf area (coefficients of 0.78 and 0.79, respectively). Based on previous research, canopy coverage was selected from the image data and GDD from the environmental data. Prediction models for kimchi cabbage biomass, leaf count, and leaf area were developed by combining GDD, canopy coverage, and growth data. Among the single-factor models tested (quadratic, sigmoid, and logistic), the sigmoid model showed the best explanatory power. A multi-factor model combining GDD and canopy coverage improved the determination coefficients to 0.90, 0.95, and 0.89 for biomass, leaf count, and leaf area, respectively, compared to the single-factor models. In validation, the determination coefficient between measured and predicted fresh weight was 0.91, with an RMSE of 134.2 g, indicating high prediction accuracy. Past kimchi cabbage growth prediction often relied on meteorological or image data alone, yielding low accuracy because on-site conditions or the heading-up of the cabbage could not be reflected. Combining the two observation methods is expected to enhance yield prediction accuracy by compensating for the weaknesses of each.
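
The sigmoid single-factor model and the GDD input can be sketched as follows; the GDD base temperature of 5℃ and all observation values are assumptions for illustration, not the paper's data.

```python
import numpy as np
from scipy.optimize import curve_fit

def gdd_series(tmax, tmin, base=5.0):
    """Cumulative growing degree days; the 5 degC base is an assumption."""
    tmean = (np.asarray(tmax, float) + np.asarray(tmin, float)) / 2.0
    return np.cumsum(np.maximum(tmean - base, 0.0))

def sigmoid(gdd, wmax, k, gdd50):
    """Logistic single-factor growth model: fresh weight vs. GDD."""
    return wmax / (1.0 + np.exp(-k * (gdd - gdd50)))

# Example GDD accumulation over four days of max/min temperatures
g = gdd_series([28, 30, 27, 29], [18, 20, 17, 19])

# Hypothetical observations: GDD vs. fresh weight (g)
gdd = np.array([100, 200, 300, 400, 500, 600, 700, 800], float)
w = np.array([30, 80, 200, 500, 1100, 1900, 2500, 2800], float)

# Fit the three sigmoid parameters by nonlinear least squares
params, _ = curve_fit(sigmoid, gdd, w, p0=[3000.0, 0.01, 500.0], maxfev=10000)
rmse = np.sqrt(np.mean((sigmoid(gdd, *params) - w) ** 2))
```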

Peak Expiratory Flow(PEF) Measured by Peak Flow Meter and Correlation Between PEF and Other Ventilatory Parameters in Healthy Children (정상 소아에서 최고호기유량계(peak flow meter)로 측정한 최고호기유량(PEF)와 기타 환기기능검사와의 상관관계)

  • Oak, Chul-Ho;Sohn, Kai-Hag;Park, Ki-Ryong;Cho, Hyun-Myung;Jang, Tae-Won;Jung, Maan-Hong
    • Tuberculosis and Respiratory Diseases / v.51 no.3 / pp.248-259 / 2001
  • Background: In diagnosing or monitoring airway obstruction in bronchial asthma, measurement of FEV1 is the standard method because of its reproducibility and accuracy. However, measuring peak expiratory flow (PEF) with a peak flow meter is much simpler and easier than measuring FEV1, especially in children. There have been no data on predicted normal values of PEF measured by peak flow meter in Korean children. This study was conducted to provide equations predicting normal PEF and the correlation between PEF and FEV1 in healthy children. Method: PEF was measured with a MiniWright peak flow meter, and forced expiratory volumes and maximum expiratory flow-volume curves were measured with a Microspiro HI 501 (Chest Co.) in 346 healthy children (age 5-16 years; 194 boys and 152 girls) with no respiratory symptoms during the 2 weeks before the study. Regression equations for various ventilatory parameters by age and/or height, and regression equations for FEV1 from PEF, were derived. Results: 1. The regression equation for PEF (L/min) was: 12.6 × age (years) + 3.4 × height (cm) - 263 (R² = 0.85) in boys, and 6 × age (years) + 3.9 × height (cm) - 293 (R² = 0.82) in girls. 2. When FEFmax (L/sec) from the maximum expiratory flow-volume curves was multiplied by 60 for comparison with PEF (L/min), PEF was faster by 125 L/min in boys and 118 L/min in girls. 3. The regression equation for FEV1 (ml) from PEF (L/min) was: 7 × PEF - 550 (R² = 0.82) in boys, and 5.8 × PEF - 146 (R² = 0.81) in girls. Conclusion: This study provides regression equations predicting normal PEF by age and/or height in children, and equations predicting FEV1, a gold standard of ventilatory function, from PEF. In caring for children with airway obstruction, PEF measured with a peak flow meter can therefore provide useful information.
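
The reported regression equations are easy to apply directly; a small sketch using the coefficients from the abstract (the example child is hypothetical):

```python
def pef_boys(age_yr, height_cm):
    """Predicted normal PEF (L/min) for boys (paper's regression, R^2 = 0.85)."""
    return 12.6 * age_yr + 3.4 * height_cm - 263

def pef_girls(age_yr, height_cm):
    """Predicted normal PEF (L/min) for girls (R^2 = 0.82)."""
    return 6.0 * age_yr + 3.9 * height_cm - 293

def fev1_boys(pef_l_min):
    """FEV1 (ml) predicted from PEF (L/min) for boys (R^2 = 0.82)."""
    return 7.0 * pef_l_min - 550

# Example: a 10-year-old boy who is 140 cm tall
pef = pef_boys(10, 140)    # 12.6*10 + 3.4*140 - 263 = 339 L/min
fev1 = fev1_boys(pef)      # 7*339 - 550 = 1823 ml
```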


Predicting Oxygen Uptake for Men with Moderate to Severe Chronic Obstructive Pulmonary Disease (COPD환자에서 6분 보행검사를 이용한 최대산소섭취량 예측)

  • Kim, Changhwan;Park, Yong Bum;Mo, Eun Kyung;Choi, Eun Hee;Nam, Hee Seung;Lee, Sung-Soon;Yoo, Young Won;Yang, Yun Jun;Moon, Joung Wha;Kim, Dong Soon;Lee, Hyang Yi;Jin, Young-Soo;Lee, Hye Young;Chun, Eun Mi
    • Tuberculosis and Respiratory Diseases / v.64 no.6 / pp.433-438 / 2008
  • Background: Measurement of maximum oxygen uptake in patients with chronic obstructive pulmonary disease (COPD) has been used to set exercise intensity and to estimate response to treatment during pulmonary rehabilitation, but cardiopulmonary exercise testing is not widely available in Korea. The 6-minute walk test (6MWT) is a simple measure of exercise capacity that provides highly reliable data and, with a standardized protocol, reflects fluctuations in exercise capacity relatively well. The primary objective of this study was to develop a regression equation estimating peak oxygen uptake (VO2) in men with moderate to very severe COPD from 6MWT results. Methods: A total of 33 male patients with moderate to very severe COPD agreed to participate. Pulmonary function testing, cardiopulmonary exercise testing, and a 6MWT were performed at the first visit. A work index (6Mwork = 6-minute walk distance [6MWD] × body weight) was calculated for each patient. Variables closely related to peak VO2 were identified through correlation analysis, and an equation predicting peak VO2 was generated by multiple linear regression with those variables. Results: Peak VO2 averaged 1,015 ± 392 ml/min, and mean 6MWD was 516 ± 195 meters. The 6Mwork (r = .597) correlated better with peak VO2 than the 6MWD (r = .415). Other variables highly correlated with peak VO2 were FEV1 (r = .742), DLco (r = .734), and FVC (r = .679). The derived prediction equation was: VO2 (ml/min) = (274.306 × FEV1) + (36.242 × DLco) + (0.007 × 6Mwork) - 84.867. Conclusion: When measurement of peak VO2 is not possible, the 6MWT is a simple alternative for estimating it, although a much larger trial is needed to validate the prediction equation.
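
Applying the derived equation is straightforward; in the sketch below the patient's values are hypothetical, and FEV1 is assumed to be in liters (the abstract does not state units explicitly):

```python
def peak_vo2(fev1_l, dlco, six_m_work):
    """Peak VO2 (ml/min) from the paper's regression equation.
    fev1_l: FEV1 (assumed liters); dlco: diffusing capacity;
    six_m_work: 6MWD (m) x body weight (kg)."""
    return 274.306 * fev1_l + 36.242 * dlco + 0.007 * six_m_work - 84.867

# Hypothetical patient: FEV1 = 1.5 L, DLco = 12, 6MWD = 450 m, weight = 65 kg
vo2 = peak_vo2(1.5, 12.0, 450 * 65)
```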

Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul;Kim, Jaeseong;Choi, Sangok
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.105-129 / 2020
  • This study uses corporate data from 2012 to 2018, when K-IFRS was applied in earnest, to predict default risk. The data totaled 10,545 rows and 160 columns: 38 from the statement of financial position, 26 from the statement of comprehensive income, 11 from the statement of cash flows, and 76 financial-ratio indices. Unlike most previous studies, which learned default risk from default events, this study calculated default risk from each company's market capitalization and stock price volatility based on the Merton model. This resolves the data imbalance caused by the scarcity of default events, a limitation of the existing methodology, and captures the differences in default risk that exist among ordinary companies. Because learning used only corporate information available for unlisted companies, default risk can be appropriately derived for unlisted companies without stock price information. This enables stable default risk assessment for companies whose risk is difficult to determine with traditional credit rating models, such as small and medium-sized enterprises and startups. Although machine learning has recently been applied actively to corporate default risk prediction, most studies rely on a single model, which raises model bias issues. A stable and reliable valuation methodology, and strict calculation standards, are required, given that default risk information is widely used in the market and sensitivity to differences in default risk is high. The credit rating method stipulated by the Financial Services Commission in the Financial Investment Regulations calls for evaluation methods to be prepared, including verification of their adequacy, in consideration of past statistical data and experience with credit ratings and of changes in future market conditions. This study reduced individual model bias by using stacking ensemble techniques that synthesize various machine learning models, capturing complex nonlinear relationships between default risk and corporate information while preserving the short computation time of machine learning-based default risk prediction. To produce the sub-model forecasts used as input to the stacking ensemble model, the training data were divided into seven pieces and sub-models were trained on the divided sets. To compare predictive power, Random Forest, MLP, and CNN models were trained on the full training data and evaluated on the test set. The stacking ensemble model exceeded the predictive power of the Random Forest model, the best-performing single model. To check for statistically significant differences between the stacking ensemble model and each individual model, pairs of forecasts were constructed. Because Shapiro-Wilk normality tests showed that none of the pairs followed a normal distribution, the nonparametric Wilcoxon rank-sum test was used; the stacking ensemble model's forecasts differed significantly from those of the MLP and CNN models. In addition, this study provides a methodology by which existing credit rating agencies can apply machine learning-based default risk prediction, since traditional credit rating models can also be incorporated as sub-models when calculating the final default probability. The stacking ensemble techniques proposed here can also help designs meet the requirements of the Financial Investment Business Regulations through the combination of various sub-models. We hope this research helps increase practical adoption by overcoming the limitations of existing machine learning-based models.
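
The stacking step can be sketched with scikit-learn's StackingClassifier, which produces out-of-fold sub-model forecasts for the meta-learner much as the paper's seven-way split of the training data does. The data below are synthetic, and the sub-model choice (random forest plus MLP) mirrors only part of the paper's lineup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the financial-ratio feature matrix
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# cv=7 mirrors the paper's seven-way split: each sub-model's out-of-fold
# forecasts become the meta-learner's training inputs
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("mlp", MLPClassifier(max_iter=500, random_state=0))],
    final_estimator=LogisticRegression(),
    cv=7,
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
```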

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.173-198 / 2020
  • Predicting the success of customer campaigns has long been studied in academia, and prediction models applying various techniques continue to be developed. Recently, as campaign channels have expanded with the rapid growth of online media, companies run far more varied campaigns than in the past. However, customers increasingly perceive campaigns as spam as fatigue from duplicate exposure grows, and from the corporate standpoint the effectiveness of campaigns is declining, with rising investment costs leading to low actual success rates. Accordingly, various studies aim to improve campaign effectiveness in practice. A campaign system has the ultimate purpose of increasing the success rate of campaigns by collecting and analyzing customer-related data and using it for campaigns, and recent work attempts to predict campaign responses with machine learning. Selecting appropriate features is critical given the variety of campaign data: if all input data are used to classify a large data set, learning time grows as the number of classes expands, so a minimal input data set must be extracted from the whole. Moreover, training a model on too many features can degrade prediction accuracy through overfitting or correlation between features. A feature selection technique that removes features close to noise is therefore needed to improve accuracy, and feature selection is a necessary step in analyzing high-dimensional data sets. Among greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), and SFFS (Sequential Floating Forward Selection) are widely used as traditional feature selection techniques, but with many features they suffer from poor classification performance and long learning times. This study therefore proposes an improved feature selection algorithm to enhance the effectiveness of existing campaigns. The aim is to improve the existing sequential SFFS method when searching for the feature subsets that underpin machine learning model performance, using the statistical characteristics of the data processed in the campaign system. Features with strong influence on performance are derived first and features with negative effects are removed; the sequential method is then applied to increase search efficiency and enable generalized prediction. The proposed model showed better search and prediction performance than the traditional greedy algorithm, and campaign success prediction was higher than with the original data set, the greedy algorithm, a genetic algorithm (GA), or recursive feature elimination (RFE). The improved algorithm also aids analysis and interpretation of prediction results by providing the importance of the derived features. These include features already known to be statistically important, such as age, customer rating, and sales. Unexpectedly, features that campaign planners rarely used for target selection, such as the combined product name, the average 3-month data consumption rate, and the last 3 months' wireless data usage, were also selected as important for campaign response, confirming that base attributes can be very important features depending on the campaign type. This makes it possible to analyze and understand the important characteristics of each campaign type.
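
The traditional sequential forward selection (SFS) baseline the study improves on can be sketched with scikit-learn; the campaign data below are synthetic, and the stopping point of five features is an arbitrary illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic campaign-response data: 30 candidate features, few informative
X, y = make_classification(n_samples=400, n_features=30, n_informative=5,
                           n_redundant=2, random_state=0)

# SFS greedily adds the single feature that most improves the
# cross-validated score, until n_features_to_select is reached
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=5, direction="forward", cv=5,
)
sfs.fit(X, y)
selected = np.flatnonzero(sfs.get_support())   # indices of chosen features
```

Setting `direction="backward"` gives the SBS variant; the floating SFFS variant, which also revisits and drops previously added features, is not in scikit-learn.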